PBS_2024_12_21
An audio podcast where Bart Busschots is teaching the audience to program. Associated tutorial shownotes are available at https://pbs.bartificer.net.
Automatic Shownotes
Chapters
Introduction to Steve Matten
Steve's Background and Experience
Running Large Language Models Locally
Exploring Ollama and Model Selection
User Interfaces for Language Models
Integrating Continue with Development Tools
Learning Programming with AI Assistance
Understanding Model Limitations and Context
The Future of Programming and AI
Closing Thoughts and Support Options
Long Summary
The interview features Allison Sheridan speaking with Steve Matten about running large language models (LLMs) locally on his Mac and Raspberry Pi. Steve, a New Jersey resident with a background in physics and computer science, shares his experiences transitioning from a management role in tech to exploring AI tools. He mentions that despite not actively coding professionally, he has contributed to programming projects and gained insight through experimenting with LLMs.
Steve's journey into AI began after encountering the excitement around GitHub Copilot. In a corporate environment where AI technologies were gaining traction, he was tasked with evaluating whether AI could replace traditional programming roles. This led him to try out GitHub Copilot for learning Python, during which he realized the tool was somewhat limited and often provided unreliable solutions, teaching him more about assessing code than actually coding itself.
Through further exploration, Steve discovered Ollama, a tool that allows users to download and run various models locally. He explains how Ollama enables users to interact with these models via the terminal, emphasizing the importance of RAM in hosting LLMs rather than storage size alone. He elaborates on the nuances of different model sizes, explaining that models with more parameters yield greater capabilities, and discusses the ease of downloading models through the Ollama interface.
By utilizing Ollama in conjunction with Enchanted, a Mac-specific GUI interface, and Continue, a plugin for Visual Studio Code, Steve enhanced his programming workflow. Enchanted allows for a more user-friendly interaction with LLMs, mirroring the user experience found in systems like ChatGPT while avoiding the command-line hesitations many face. The Continue plugin integrates with Visual Studio Code, enabling different models tailored for specific coding tasks, significantly boosting his efficacy.
During their discussion, Steve highlights the challenges and potential pitfalls of relying on AI tools like Continue. He emphasizes that while they can produce helpful suggestions and code completion prompts, they often make common mistakes found in student submissions and can produce inaccurate results. His approach has been to leverage these tools for assistance while maintaining a solid understanding of the underlying technologies to catch mistakes, rely on context from the documentation, and apply tests to verify that his code maintains its intended functionality.
The conversation touches upon the future of programming, positing that the skills required will shift from rote syntax memorization to creating better specification requirements and test cases. Allison and Steve both express excitement and caution regarding AI's role, with Steve sharing a thought-provoking perspective on the potential for programmer skill dilution as automation becomes more pervasive in the field.
In closing, Steve shares how he discovered the Programming by Stealth podcast while looking for resources to better understand the terminal, leading him to engage with the larger community. He encourages listeners to connect with him via the podcast's Slack channel for further discussion. This engaging conversation highlights the practicalities and implications of using AI for programming, providing insights that are accessible to both novice and experienced developers.
Brief Summary
In this interview, Allison Sheridan speaks with Steve Matten about his experiences running large language models (LLMs) on his Mac and Raspberry Pi. Steve reflects on his transition from tech management to exploring AI tools, beginning with GitHub Copilot. He discusses using Ollama for local model management, emphasizing the importance of RAM and model parameters. Enhancing his workflow with Enchanted and the Continue plugin for Visual Studio Code, he shares insights on the challenges of relying on AI tools while maintaining coding fundamentals. They also explore the future of programming skills and invite listeners to engage with the podcast community for further discussion.
Tags
Allison Sheridan
Steve Matten
large language models
Mac
Raspberry Pi
tech management
AI tools
GitHub Copilot
local model management
Visual Studio Code
coding fundamentals
programming skills
Transcript
[0:00]Music
[0:07]Well, it's that time of the week again. It's time for Programming by Stealth,
[0:14]
Introduction to Steve Matten
[0:11]and this is Tidbit 10 for December 21st, 2024. I'm your host, Allison Sheridan, and I am not joined by Bart Busschots. As you'll hear in the upcoming segment, I'll be talking to a lovely new friend named Steve Matten about how he's running large language models, LLMs, locally on his Mac. Steve is absolutely delightful, and you'll be surprised to know this is the first time he's ever done a recording like this. This conversation was recorded for Programming by Stealth, but also for my more mainstream show, the NosillaCast. As a result, you'll hear me breaking things down to simplify what he says a bit more than you might need. Okay, enough preamble. Let's get into it. I'm going to do something now that I've never done before, and that's introduce a guest about whom I know virtually nothing. I have evidence that Steve Matten is a listener to the Programming by Stealth podcast because he has contributed several fixes and valuable suggestions through GitHub to the project. He also participates in our Slack at podfeet.com/slack. Beyond that, I literally don't know anything about his background. I don't know where he lives. I don't know how tall he is. I don't know anything,
[1:18]
Steve's Background and Experience
[1:16]but I do know that he has something super interesting to tell us about. So with that great introduction, welcome to the show, Steve.
Oh, thank you very much. I can fill in some of those blanks. I live in Southampton, New Jersey, which is just inside the New Jersey Pinelands, or the Pinelands National Reserve, which means I live in the woods. Not the North Woods, the East Woods. There we go. Steve from the East Woods. We could do that. I have a bachelor's degree in physics and a master's degree in computer science. And as soon as I got the master's degree, the company that was paying for it moved me into management. So I've never written a bit of code in anger in all those years.
[2:01]You know, that's kind of the path that Bart went along. He was getting his PhD in physics when I first met him. And I always worried that getting into podcasting was why he stopped his PhD, but he swears it has nothing to do with it. But I'm pretty sure he's got a degree in computer science too. So you are definitely our people. That seems to be all we need to know. That's very interesting. You haven't written any code in anger because most of the contributions you've made have been about code.
[2:27]Yep. So I did... I wrote some utilities for myself back at that job. The company paid for me to go to grad school.
[2:38]Oh, nice.
I went back in college, RPI, Troy, New York. I graduated in 1982. 1982 was the height of the Reagan recession. One of the paths, when you're tired of going to school, was to get a job and have the company pay for your grad school for physics. Well, nobody got jobs that year.
[2:59]Okay.
So eventually I got a job, kind of pseudo-programming. And then I got hired. I got moved down to our corporate office, the development shop. And I was put in charge of testing. I was the one tester for 12 programmers, because, infamously, one of the VPs told all of our customers that they were supposed to test it for us. And why weren't they sending in all those bug reports? Why are they just complaining? That didn't go over big. Anyway, I started writing small utilities for myself and said, oh, this is interesting. My little project in physics was computer modeling of the surface-enhanced Raman effect, which was in Fortran many, many years ago.
Ooh, Fortran IV with WATFIV.
There you go. So then I said, okay, well, I'll get them to pay for me to go to grad school. I did. And as soon as I got the degree, they said, nope, we want you to be a manager now, and put me in charge of building out an actual testing group.
So from what I've seen, it appears to be in your DNA to mess around with code. So when we started talking, you actually did a post in Slack where you described, and I'm going to give the opening pitch for it, but then I want you to get a
[4:18]
Running Large Language Models Locally
[4:17]little bit into the details. We are definitely not going to get into the weeds because we could go here for hours and have too much fun. But you figured out a way to run a large language model, AI if we'll call it that, on your Mac without actually being connected to the internet. Like it runs locally. Is that correct?
[4:37]I didn't figure it out.
[4:38]Well.
[4:38]But I was able to implement it.
[4:40]You let go to pieces.
[4:42]So, yes. But even it gets better. I've got, oh, and that screen over there running on a Raspberry Pi.
[4:50]Oh, nice.
Oh, I love it. So the tool... all right, so let's step back. This is a book. You can't see the book if you're listening, but it's Learn AI-Assisted Programming with GitHub Copilot. My company, the one I'm currently working for, the C-suite people have, just like you were just now, they've drunk all of the Kool-Aid. Not just some of the Kool-Aid, all of the Kool-Aid about AI. And one of them actually came to my boss, who's the EVP of development, and said, I'm getting people calling me up and saying that they've got these programs that we could use in our business, and they don't even have any programmers. They just ask ChatGPT and it writes programs for them. Why do we need programmers? And he's there, well, trying to convince them that it really doesn't work that way. We started a project there and they said, Steve, you're not writing any code now. Go figure out if this thing can make you a programmer.
[5:53]Oh, that's actually an interesting way to do it because you are a programmer, but you're not a programmer.
[5:58]Right, right. So I could understand it enough, but I'm not writing any code.
[6:03]Okay.
And so we decided to do GitHub Copilot as our test, which... yeah, GitHub Copilot. That's the book again. And I was going to learn Python with that in VS Code. And they paid for the, um, Copilot for the experiment.
And I tried it. Just for people who don't know what he's talking about: basically, when you're in a code editor, GitHub Copilot is trained specifically to help you write code. And it's written by Microsoft, so it's a large language model.
Yeah, right. And it's fun, because you got in there and it worked, kind of. Um, even so, most of this book is spent telling you that it's not 100% accurate, that 50% of the time it's going to be wrong. And it teaches you not how to do code in Python, but how to figure out whether or not that code is right. And they even have some nice chapters in there where they say, many of the people that train, teach people, and put up GitHub repositories in academia will have exercises, and it'll say, here's the beginning, you know, and then there's the part that says "your code goes here" as a comment. And people will put in, you know, hey, I'm trying to write this, and they'll get back from GitHub Copilot: "your code goes here," because that's what they're trained on. And we tried it, and we decided, yeah, this really wasn't going to help us, and we're not using that. But that got me interested, just like anybody else who's probably listening to this, in large language models and all that. So I was one day wandering around the web and I saw this thing about Ollama and Continue.
Let's spell that out real quick: O-L-L-A-M-A. And Ollama is the large language model that Facebook has written, right? Meta?
Um, nope. No. Llama, llama with no O, is the set of models that Meta has written. Okay. Um, Ollama is a tool that allows you to pick from many, many models to download and run locally on your machine.
Okay. And I said Mac, but it's obviously more than Mac. It's any Unix, or does it run on Windows?
Yep. If you were to go to the Ollama site.
Actually, you did put a link in the show notes already. Yeah. Ollama.com.
Ollama is a tool that allows you to run large language models locally. And it has a whole variety of tools, and it helps you pick different models and different sizes of the models. On my Raspberry Pi, I downloaded a model that's really tiny, 1B, 1 billion.
[8:57]Yeah, let me explain what I think that means. A small large language model and a large, large language model, the difference is in the number of parameters in the matrices that make this thing up. I don't know if that's quite the right wording, but a lot of people think, well, a large language model is built on the whole internet, and a small language model must be on fewer words. And that's not what it is at all. It's the size of the matrix of the model, the number of parameters. So when you said 1B, you mean, what, 1 billion parameters?
1 billion parameters, yep. And there are even smaller models, but you can run models that size on your Pi. On the Mac that I have, I have a Studio with 64 gigs of RAM. The more memory you have, the more you can run, because it loads the model into memory to do all the calculations.
[9:56]Okay. Okay. So you're not caring about disk size. You're caring about RAM. Oh, that's really interesting. Okay.
And you're caring about video RAM, because it tries to do it on the GPU, not the CPU.
Oh, really? Huh, okay. But when you buy a Mac, can you control how much video RAM you get?
It's all of it. All of it. That's the nice thing about the M-series Macs: all of the RAM is both.
Oh, okay. That's the...
So my machine has plenty of RAM and plenty of CPU power. It's the original Studio M1 Max, um, chip. So if you have an M4 with, you know, plenty of RAM in your machine, you should be just fine, actually.
They don't have an M4, uh, Studio yet, I don't think. Do they?
I don't... M3, I think, is the latest.
You're the expert. I'm just... you're...
Just not a programmer. Okay, all right. So Ollama is this app that you download from ollama.com, and it literally just downloads, and then when you open it, it's going to ask you to mess around in the terminal, I think, right? That's what happened.
Yes, it's a terminal app. And what you do is you go into the terminal, and the command to start it is ollama. But like a number of different terminal apps, that's the command and then there's a secondary command, just like Git. When you say git and then the secondary command; it's ollama and the secondary command. So ollama run and the model name will, if you don't have that model, pull it down, load it into your system,
[11:50]
Exploring Ollama and Model Selection
[11:46]and then start running it so you can ask questions of Ollama. Or not of Ollama, of the model you selected.
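For listeners following along, the whole dance Steve just described is a couple of terminal commands; llama3.2 here is just an example model name, and the Ollama models page lists the current ones:

    # Download the model on first use, then start an interactive chat with it.
    ollama run llama3.2
    # You land at a >>> prompt where you type your questions;
    # typing /bye ends the session.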
Okay, so let's get those pieces again. So Ollama is a way to talk to your terminal, to get your terminal to start installing these large language models. Because I think when I first ran it, that's when it said, okay, this is what you're going to have to go type into the terminal. The first time I typed it, it put glop all over the screen, and then it said, okay, I'm ready. And it was just sitting there waiting. And then I could say, ollama run that model, and then I could start asking it questions. And that was a command line. Now, Steve, my husband Steve, asked me an interesting question. He says, how big was it? And that gets back to this thing about thinking of large language models as being big. I said, I have no idea. I don't know where it is. I can't figure out where they are because I don't even know what they're called. And I used my favorite app for finding things, Find Any File, and it found it buried down in some library file somewhere.
If you go to the Ollama website, you'll see, up over here, there'll be a link that says Models.
[13:01]Right.
If you look in the models, you can click on a model and it will show you all of the variants of that model. So for example, you mentioned Llama earlier. So there's a Llama, a new Llama, Llama 3.2. Right. And that comes in two sizes, 1 billion and 3 billion.
[13:21]Okay.
And you can look at it and you can click on that. And then it has more information on the page about what that model is, what it does, some of the parameters. Okay. Like, there's the Llama 3.1 that has 8, 70, and 405 billion parameter versions.
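On the models page, each size is a tag after the model name, so grabbing a specific variant looks like this (tags as listed around the time of this episode; check ollama.com/library before copying):

    ollama pull llama3.2:1b     # the 1-billion-parameter variant (Pi-sized)
    ollama pull llama3.2:3b     # the 3-billion-parameter variant
    ollama show llama3.2:3b     # print details about a downloaded model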
I did end up going down a rabbit hole later, asking it to just install different ones, but I didn't know what it was doing, really, and what it meant to go into the different ones. I ended up loading Llama 3.2 Vision, thinking, wow, that sounds better, but I think it had nothing to do with what I needed it to do. So, so we've got, okay, again, we've got this app, sorry, yeah, the app Ollama. Once that's running, it tells you to go to the terminal and type in ollama run, I'll get it yet, ollama run llama3.2. So it does all the glop on the screen. From now on, you're pretty much not really talking to Ollama anymore.
Nope, you're talking to whatever model. So in your example right there, you're talking to Llama 3.2. And you're asking that model the questions. So just like if you're using ChatGPT and you ask ChatGPT a question, now you're asking Llama 3.2
a question. Right. And it was just sitting there at the command line. I'm not talking to Ollama anymore; it's already done its job. Yep. Which is disconcerting. Like, while we were talking here, I haven't played with it in a little while, and I typed in ollama and, you know, said okay, launch it, and it went, uh-huh, my Mac just didn't do anything. But over at the command line, I think it should be working. I think I'm doing something wrong, but I can tell it's... oh, we've got to start Ollama first. We've got to serve it up. So there's a little command line. But at that point, you told us that that was fun, but you took it up a level beyond that, and this is where the third piece came into it, right? Yeah.
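The "serve it up" step Allison is bumping into, plus a couple of commands for checking what's going on, look like this (on a Mac, the Ollama app normally starts the server for you):

    ollama serve    # start the Ollama server by hand
    ollama list     # show every model you've downloaded
    ollama ps       # show which models are loaded into memory right now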
So there's a couple... many pieces now. I've got all kinds of pieces. So the next thing was, because I was interested in using it as a programming tool to make suggestions similar to Copilot, what I had read was you could use this other tool called Continue. And Continue...
[15:44]
User Interfaces for Language Models
Can we wait to get to Continue? Because I think you did Enchanted next.
Well, Enchanted is a good one. And if you go to the Ollama GitHub page (again, if you're on the main Ollama page, there'll be a little GitHub link up over here; people can't see me pointing to the top of the screen here, right? Right over there. It's over there. You'll see GitHub). And if you go to GitHub and scroll down, you'll see integrations. And there'll be a whole list of tools that are front ends or other tools for using it.
[16:11]And that's where you found.
Yep. The first one is called Open WebUI, or WebUI Open-something. But the second one is called Enchanted. And it said Mac native. And I said, okay, I use a Mac mostly, let's try that one. And it's just a simple GUI where you can select the model you want and then run your question.
So at this point, I'm going to keep saying it again: Ollama is what let us download the models, and now we have them available to us in the terminal, and we can type little questions and it spits little answers back to us. But as soon as you run Enchanted, Enchanted is talking to that same terminal application, or the model that's been installed. And now, you say it's a little GUI, but it looks exactly like ChatGPT to me. It's pretty. It's Mac-like. It's got the right buttons. It's light gray, dark gray. It's got Enchanted written in colors. I mean, this is a very nerdy beginning step that immediately becomes not nerdy at all.
Exactly. So if you are intimidated by the command line... and you shouldn't be, because you've already listened to Taming the Terminal. That's been out for years now, right? Right. But if you are, and you're more comfortable with a GUI, then this is a nice one that, yep, you can use. Now, the other thing I did was I took Keyboard Maestro and set it up so that as soon as I start Ollama, it automatically launches Enchanted. So I never have to go into the terminal. No, I just, you know, go Alfred, Ollama, Ollama starts, Enchanted starts, and there you have it.
Okay, okay. By the way, I am doing this at a much higher level each time, because we're going to be talking to the NosillaCastaways, not necessarily Programming by Stealth folks. Some subset of the two. So as soon as I got to that stage, I was pretty excited.
[18:14]
Integrating Continue with Development Tools
[18:11]So now we've got Enchanted running. We've got a happy little GUI. Everything's pretty. We don't have to get our fingers too dirty at the terminal, but you really wanted to be able to use this for development. So where did you go next?
So that's where that Continue comes in. And that's a tool that is integrated in both VS Code and, um, JetBrains tools. I don't use JetBrains tools, so I'm not really familiar with what the names are.
But for the non-programmer, Visual Studio Code is a code editor that does a whole bunch of stuff, and it's kind of nice. It's not kind of nice, it's really nice. It's got a plugin architecture, so people can write these plugins to do things like install this Continue app that will allow us to talk to the same model.
Yep, exactly right. So it's a plugin. You go into the plugin marketplace, you type in Continue, it'll bring it up. You click install, it installs it. It says, move the Continue over from the left sidebar to the right sidebar, so you have a little chat window there all the time. And it allows you to do multiple things. There's a little bit of configuration. You have to tell it which model you want for code completion, which model you want for chatting, which model you want for embedding. And they tell you which ones they recommend for different things. So they tell you, for Ollama, we recommend these models. And those models will change over time as new models become available and the old ones get replaced by better ones, and things like that. But they'll walk you through those steps. You set it up.
Okay. So this is where, hang on. This is where I didn't make it. I didn't make it past this step. I got as far as installing it into Visual Studio Code, and, um, it took me a while to get it to start answering questions for me, but I didn't ever notice it telling me how to talk to what model. Because I thought we were using the stuff we installed over with Ollama.
Yes. Now, Continue doesn't just use Ollama. Continue will, if you have a ChatGPT API key, you can put that in. It'll use ChatGPT. It'll use Claude. It'll use any of those, Gemini, any of those models. I was more interested in doing everything locally. Yeah. See what I could do without having to pay anybody, you know, for sending things out to ChatGPT or anybody like that. So I chose Ollama as my provider.
[20:39]Okay.
And then, since I chose Ollama, any of the models I had downloaded via Ollama were then available in VS Code. And in the configuration for Continue, you want to tell it, this is the model I want to use for code completion. And they suggest, I think it's called StarCoder. So you select that. And they said, this is the one you want to use for chat when you're asking it questions about code, and I forget the one I selected for that. But now you've got to go to Ollama and download those models, because Continue won't see them unless you've downloaded them.
[21:15]Okay.
So again, you go to Ollama. But when you look at the documentation on Continue, it'll tell you, it'll suggest: if you're using Ollama, these are the models we suggest. They're small enough to run, but they give good results, and they're optimized for code completion and code work. So you want to go to the Continue site, read the documentation. It's very simple steps. I mean, you know, I did it in a half hour. I was up and running.
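At the time of this episode, Continue kept its settings in a JSON file (~/.continue/config.json). The keys and recommended models change over time, so treat this as a sketch of the shape rather than something to copy verbatim, and follow the Continue documentation:

    {
      "models": [
        { "title": "Llama 3.1 8B (chat)", "provider": "ollama", "model": "llama3.1:8b" }
      ],
      "tabAutocompleteModel": {
        "title": "StarCoder2 (completion)", "provider": "ollama", "model": "starcoder2:3b"
      },
      "embeddingsProvider": { "provider": "ollama", "model": "nomic-embed-text" }
    }

As Steve says, each of those models has to be pulled in Ollama first (ollama pull starcoder2:3b, and so on) or Continue won't see it.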
[21:47]
Learning Programming with AI Assistance
And then, as I sent you, I sent you a couple of things that were, to me, extremely spooky. I'm learning Python with a textbook called, what's it? I don't have it over here. But Python Crash Course. It's from No Starch Press, and it's the third edition. And apparently it's been quite widely used by people who have uploaded things to places where these models can be trained on them. Because I'll type in, you know, the start of one of the exercises, and it'll finish the exercise for me completely. And it'll even do, as I showed you, I was doing, like, exercise three, and it then did exercise four and exercise five and just knocked them all out and said, here, you're done.
[22:27]That's actually terrible. It is. It is. Because you never even learn anything.
[22:32]It was delightful and spooky and terrible at the same time.
[22:36]Yeah.
Now, it wasn't always right. Right. That's the other thing, because it's being trained on students' answers, and the students don't always get it right. You know, it gets graded, and some of the things... And again, you ask it questions. There was, you know, I wanted to do a loop, and I couldn't get the loop right. It's like, okay, the loop is several chapters ahead, but I know loops because, you know, I'm not a programmer that doesn't write programs. So I knew, okay, I should be able to do a loop here, and I'm writing a loop like in C, and the syntax is not quite the same, so I'm trying to guess. So I asked it in the chat window how to do this. And it gave me an answer that was wrong, but it was close enough that I could say, oh, I see what I'm doing wrong now, and figure it out. And that's where I'm learning, because otherwise I was just frustrated, like, I'm not getting anywhere. Now it said, okay, here's a hint. And the hint wasn't complete. I still had to figure it out.
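For the curious, this is the syntax trip-up Steve is describing: a C-style counted loop has no direct Python equivalent, because Python iterates over a sequence instead. A minimal illustration:

    # In C you'd write: for (int i = 0; i < 5; i++) { printf("%d\n", i); }
    # Python has no three-part for loop; it walks a sequence instead.
    for i in range(5):      # range(5) yields 0, 1, 2, 3, 4
        print(i)

    # More often you skip the index entirely and loop over the items.
    for language in ["Python", "C", "Fortran"]:
        print(f"{language} has loops too")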
[23:31]Oh, that's good. How did you get it to stop answering for you?
Um, it doesn't. So when you do the code completion, you can either have it take everything, or there's another keystroke, I don't remember it off the top of my head right now, where you can say just take it a word at a time, or a line at a time. So it was showing me all this other stuff that I didn't really want, so I would just say, no, just give me this little bit. Right, that's the help. Or I could go over to the chat window and say, hey, I'm trying to do this. Help me out.
[24:02]Talk to me.
[24:03]I'm trying to do a for loop for, you know, this, how to, what's the syntax?
Okay. Tell me that. That's interesting. So you're chatting with it, and it's doing autocomplete, though. And like you said, those were two different models you had it pointed to.
The Continue documentation suggests that you use different models for different things, because they are optimized for different use cases.
One of the things I've heard is that the ChatGPTs of the world, Gemini, Claude, those kinds of things, use a lot of resources to build and design because they're training on the internet. But specialized models are much more energy efficient, so you can feel less guilty about how they were built. Because if you've got a model, for example, that only knows Python, that won't have used much energy at all. But okay, if it's got Python and C and Swift, and it's got a bunch of languages, then it's going to be a little bit bigger. But that's still really small compared to the internet being part of the training data that went into building those matrices, right?
Yep, exactly. When you read the documentation around these things, they talk about how smaller models and targeted models are often much better for this than the generalized model. Now, there are models that are trained on code. GitHub Copilot, for example, was just trained on GitHub.
[25:36]Right.
[25:36]Now, that's good because it's just code. And it's bad because a lot of people like yourself put code out on GitHub. Now, I'm not saying that.
[25:45]Common mistakes become common.
[25:47]Exactly, right? Or if you're trying to do JavaScript, it's using all this old JavaScript.
[25:55]Oh, yeah.
[25:56]It's not using, you know, it's using var instead of let.
[26:00]Okay.
[26:00]And practices like that.
[26:02]Yeah.
And that's where more targeted ones... Now, there are other techniques you can do for that. There's context. In that chat window in Continue, and in other tools as well, you can say, here are some documents. Let me load these documents in, and I want you to use these documents as context. Oh, nice. So you could use the chapter, right? Now, the problem is that you can only have so many tokens. A token, you can think of as a word. It's really just parts of words, but for our discussion, you can think of a token as a word. So if it says I can have 2,000 tokens, 2K tokens, well, your document can only have 2,000 words.
[26:40]
Understanding Model Limitations and Context
Now, the models will tell you how many they can use. Ollama, when it downloads them, sets everything to 2,000. So if you want more, you've got to go in and tweak that. Some of them can have up to 128,000.
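For reference, that tweak is typically done with an Ollama Modelfile, a small recipe that derives a new model from an existing one. The names and the 8192 value here are just examples, and the base model has to actually support the larger window:

    # Contents of a file named Modelfile:
    FROM llama3.2
    PARAMETER num_ctx 8192

    # Then, back in the terminal:
    #   ollama create llama3.2-8k -f Modelfile
    #   ollama run llama3.2-8k
    # (Inside a running session, /set parameter num_ctx 8192 does the
    # same thing temporarily.)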
[26:58]What's the purpose of the token limit?
The token limit is because that uses resources. If you're going to, say, ChatGPT, and you pay for tokens, right, you're not going to want to upload all this stuff to ChatGPT and run up a big bill. Or it's also using memory. Um, the other thing is that...
But it's artificial in our example, right? Because we're doing it all locally.
Now, right. But remember, Continue isn't just local.
Oh. Okay, right. Okay.
Now, the other thing is that even though you have this large context window, everybody can see my hands getting bigger and smaller on the screen here, right? Um, it doesn't always remember it all, because it has to load it all into memory. Okay. It's already got the model in memory. It's got whatever else you're running in memory. And now it's got to load this into memory. And there's a lot of documentation out there that says it remembers the beginning, it remembers the end, it doesn't necessarily remember the middle. So you want to target that stuff in context. The other way to do this is RAG. I think it means retrieval-augmented generation. And what you do there is you take a whole bunch of documents. I could take that Python book, turn it into a PDF, OCR it. There are steps, I don't know how to do them, I haven't done it yet, but you can get it into a format that the system will understand. Then you upload those into a RAG, and then you say, just look at this stuff. So if you wanted a Python expert, you could take multiple Python textbooks, do whatever process it is to ragify them, and then say, okay, you know.
[28:45]That's the only
[28:45]Place you're allowed to get content from. Just look at this.
[28:49]How interesting.
[28:50]And it's one of the ways that they're trying to make it so these things don't hallucinate, which is a big problem.
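To make "ragify" concrete, here is a minimal retrieval-augmented generation sketch in Python against Ollama's local HTTP API. It assumes the server is running on its default port and that the two model names (examples, not recommendations) have already been pulled; a real system would add proper chunking, a vector store, and error handling:

    import json
    import math
    import urllib.request

    OLLAMA = "http://localhost:11434"

    def post(path, payload):
        # Minimal JSON POST helper for the local Ollama HTTP API.
        req = urllib.request.Request(
            OLLAMA + path,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def embed(text):
        # Turn a chunk of text into a vector with a local embedding model.
        return post("/api/embeddings",
                    {"model": "nomic-embed-text", "prompt": text})["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    # "Ragify" some document chunks: embed each one once, up front.
    chunks = [
        "A for loop in Python iterates over a sequence: for item in items: ...",
        "Python uses indentation instead of curly braces to delimit blocks.",
        "The range() function produces a sequence of integers for counted loops.",
    ]
    index = [(chunk, embed(chunk)) for chunk in chunks]

    # At question time, retrieve the most relevant chunk and prepend it as context.
    question = "How do I write a counted loop in Python?"
    q_vec = embed(question)
    best_chunk = max(index, key=lambda pair: cosine(q_vec, pair[1]))[0]

    prompt = "Answer using only this context:\n" + best_chunk + "\n\nQuestion: " + question
    result = post("/api/generate", {"model": "llama3.2", "prompt": prompt, "stream": False})
    print(result["response"])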
One of the things I've liked about using large language models is that I can be really tailored in my question. So, like, if I try to ask using Google, "in Keyboard Maestro 5.4 running on macOS Sequoia, what is the syntax to do blah, blah, blah," it's just going to barf on me. I mean, it's just going to give me, like, the Keyboard Maestro website or something. But in ChatGPT, I'm able to give it that very, very specific thing. Like, where is the menu that tells me how to do this thing that I'm trying to do? Like, I know the menu's there somewhere and I can't find it. And it's not always right, but it understood the question. Well, it didn't understand anything; it regurgitated something that showed that it was at least on the right path. And then I can start to narrow it down. Whereas with Google, the answers are always too broad. What I want is, I want it in this operating system, on this application, this version of this application. And so I would think with this, you could be able to say, I'm running, you know, JavaScript version blah, blah, blah, or HTML5 or Bootstrap 5. Give me the answer to my question. Can you do that?
You can. And it's good that you asked it in the context of a programming language, because programming languages are very structured. There's a very well-defined syntax that they have to follow. Thus, the answers are going to be well known. You can tell whether it's right or not right away. Natural language is much less structured. So if you ask it a question about natural language topics, it's much more likely to confabulate, because the word choices are much bigger.
[30:47]Right.
[30:47]Whereas in code, it's pretty much, hey, if you put a, you know, parentheses, then, you know, there's going to be another parentheses eventually. Right. And if you're doing a for loop, there's going to be certain things that have to be in that for loop. And there has to be, you know, depending on the language, a colon at the end and curly braces or whatever. Yes.
So... it could be wrong, but that's different than hallucinating.
Yes, wrong...
is, boy, everybody makes this mistake.
Yep, yep. Everybody makes this mistake, here you go. But it's close enough that you can say, oh yeah, I could... or run it through, you know, try to run it in VS Code, and it'll say, oh hey, you've got an error here. Right. Finding logic errors or security errors, right, things like that, are more problematic, because it's not going to be able to help you with those, and if you're not a good coder... And this book over here, you know, Learn AI-Assisted Programming, keeps saying don't use this for mission-critical stuff like medical controls, because there's no guarantee that it's going to have the right safety protocols or the right security protocols in there. You still need to know. And that's again why it's good for programmers. There was a thing I read recently that said programming is one of the use cases that'll work, because the people that are using it usually have some domain knowledge. Like, if we were to use it for the programmers at work, it wouldn't replace them, but they could ask it a question, they'll get it back, and they'll be able to say, yeah, I understand this. Oh, it made this same mistake everybody makes, you know, with SQL injection. So I'll be able to fix that.
[32:29]
The Future of Programming and AI
[32:30]But the old SQL injection, right? Yeah, that's interesting. My reaction to that is for now, that's true, right? But as people get lazier because it's doing more and more for us, we're going to be less and less skilled. And that's where it gets a little bit scary.
[32:50]Oh, it's going to be.
[32:51]Yeah, it's going to be a hot mess.
It is, it is. I mean, I read another one, and this is getting a little bit off topic here, but about hallucinations. And it's real easy to tell, you know, if it tells you to use glue to keep cheese on your pizza or eat pebbles to get your daily dose of minerals, yeah, you can pretty much tell that's wrong. But if you were to ask ChatGPT to give you a review of the movie Wicked, it would happily do that for you. It's never seen the movie Wicked.
[33:18]Right.
[33:19]It's just making it all up.
[33:21]That's actually a really good example because it can't have seen the movie,
[33:25]Right? No. Right. So everything it does is that. It just makes things up.
[33:30]Well, it's putting words in an order that is likely to be said because somebody else has already said it.
[33:38]Mm-hmm.
Yeah, that's interesting. One other thing I did see, that I did succeed in doing in VS Code running Continue, was you could ask it to, I forget what it was, it was like "tighten this code up" or "make this more readable." So it took code I had written and it just kind of made it nicer. You know, it made it a little bit tighter. You can ask it to explain things.
Why is this doing it the way it is? You can do, exactly like you said, refactor it.
Yeah, refactor, yeah. And it even put in some nice comments, and I was like, man, that's better than what I wrote. Yeah. But again, you've got to know that it's still doing the same thing that you thought it was doing before. Yeah. Maybe the real trick is going to be, uh, people writing tests. Back to where you started from, right? If you've got good tests, you'd be able to verify that it still did what you thought it was going to do. So maybe. Pointing to
the book again, which everybody who can't see this on the radio... but anyway. Exactly. That's what they keep saying: that the skills are going to change from knowing the syntax and the semantics of the language to writing better requirements up front and test cases at the end to make sure.
Or, like Helma keeps trying to convince me, write the tests up front.
Well, you write the tests along with the code. Yeah.
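To make that concrete, here is a minimal sketch of the safety net they're describing, in pytest style. slugify is a hypothetical function under test; the point is that if these assertions pass both before and after an AI-assisted refactor, the behavior you cared about survived:

    # test_slugify.py -- run with: pytest test_slugify.py
    def slugify(title):
        # The function under test (imagine an LLM just refactored it).
        return "-".join(title.lower().split())

    def test_lowercases_and_hyphenates():
        assert slugify("Programming By Stealth") == "programming-by-stealth"

    def test_collapses_extra_whitespace():
        assert slugify("  Tidbit   10 ") == "tidbit-10"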
Yeah. This is really fun. I love that you grabbed all these pieces. I know Steve's making it sound like, oh, there was this, and then they told me to do this, and this, and this. But he did a lot of reading between the lines to pull the pieces out, because I go to the Ollama GitHub page and I'm like, look at all this stuff, I don't know what any of it is. I would never have known to look at integrations, for example. I would never have clicked on that. But it is as easy as he's saying to do these things. And I really appreciate you showing us where these pieces are and the puzzle pieces we can put together. I've just had a great time talking about this, Steve. I think this is super fun. If people wanted to chat with you about this, would the right place be to go to our Slack, maybe to the PBS channel?
Yep, they can contact me there.
Uh, any other place you want to plug? You a big old Mastodon user or anything like that?
I pretty much stay at home, but I'm not in Madagascar on 12-hour bus rides and all that.
[36:15]Okay. Yeah. We did talk about that. Let me ask you one more question. Do you remember how you found programming by stealth?
Um, no. Well, yes. Because I listened to Taming the Terminal. I wanted to learn how to use the terminal, and when I was Googling around for references, that came up. So I listened to
[36:41]
Closing Thoughts and Support Options
[36:35]that, and then from there I found Podfeet and Programming by Stealth.
Well, very good. I, for one, am very glad that you did, and I appreciate you coming on the show. This was, this was super fun. Thanks again.
Good, good. I enjoyed it as well.
[36:51]If you learn as much from Bart each week as I do, I'd like you to go over to lets-talk.ie and press one of the buttons over there to help support him. He does 98% of the work here. I'm just the stooge that listens to him and asks the dumb questions. If you go over to lets-talk.ie, you can support him on Patreon, you can donate via PayPal, or you can use one of his referral links.
[37:15]Music