Summary (AI Tinkerers Ottawa): Open Interpreter, hardware x LLM, and Accessibility - Killian Lucas (YouTube)
7,539 words - YouTube video
Speaker 0 Most of us here are in Ottawa. Myself and Rami are actually in this open space called Bayview Yards, and we started, I think, around June of last year. And recently we started doing webinars with people who are doing cool stuff around the world. And me, personally, I'd seen your project before, and I didn't realize that you were part of AI Tinkerers and that you're based in Seattle.
Speaker 0 Is that correct?
Speaker 1 Yeah. I'm in Seattle.
Speaker 0 Awesome. And I was just like, I'm gonna shoot my shot. I wanted to send you a DM because you mentioned AI Tinkerers Ottawa. So I'm like, this is my shot.
Speaker 1 First of all, hello, everybody. My name is Killian, and I am the lead developer on Open Interpreter, which is an open-source project. The thesis of it is: how do you get language models to control computers? And the way that looks right now is like a code interpreter. So it's letting these things run code.
Speaker 1 Yeah. So, by the way, just on the AI Tinkerers front: I am such an AI tinkerer. It is in my blood. I've been hosting the last couple of events in Seattle, and we are literally constantly looking to Ottawa, like, wow.
Speaker 1 What is it that you are doing? I wanna copy everything you're doing, because there's something very, very special happening in Ottawa around AI Tinkerers. Like an anomaly in the data, it's special. I'm sure you know, if you've been on Joe's web page in the background where you can see all the stats: something's happened in Ottawa.
Speaker 1 Something's happened in Berlin too. We gotta figure out what's going on over there. They had, like, 700 people or something after only, like, two meetups. And not too different in Ottawa. Seriously, seriously impressive.
Speaker 1 And it's crazy too, because Open Interpreter is an AI Tinkerers project. It is very much the baby of AI Tinkerers, because I've been building it with Ty Fiero, who I met through AI Tinkerers. And the only reason I was able to do it was because the main AI Tinkerers guy out here, who was hosting it, Joe Heitzeberg, offered me a job. On my first day, I drove my three-hour commute to go to it, and in the first 20 minutes, I quit, because I had launched Open Interpreter and it was already number one on GitHub. So it was a bit of a burned bridge, just for that moment. But yeah.
Speaker 1 So how Open Interpreter started with me: first of all, go back to 2020, because I was going to Western Washington University to be a middle school science teacher. A few years earlier, I had dropped out of high school because I wanted to make dubstep, which did not pan out. But then I got my GED and went back to college to be a middle school science teacher. It's not that I was smart enough to really do the science itself.
Speaker 1 But I think I could talk about it, and I could teach it. And then GPT-3 came out in 2020. And it was like, okay, either I'm gonna be a part of this, or I'm gonna watch it happen over the next few years.
Speaker 1 Because it seemed clear then that this thing is powerful, and there's no world where it doesn't happen, or where it happens very slowly. I thought it was gonna happen very quickly. So I dropped out and broke my mother's heart twice over. She's like, you're back.
Speaker 1 You're doing it. You're on track. I'm like, I've got some bad news. And then I go to my first tech job that I get out of it, and I'm like, hey, I quit, on day one.
Speaker 1 This is a pattern. But in 2020, when GPT-3 came out, I dropped out. I started learning to code on, like, Colab notebooks and stuff, and then I moved to Replit. Up until very recently, I did not run any code at all on my own laptop. I'm, like, an entirely cloud-born programmer.
Speaker 1 Open Interpreter was built on Replit. And a lot of it came just from me trying to build startups in that area. Once GPT-3 came out, I was like, okay, I need to be a part of this, and I tried a bunch of stuff out.
Speaker 1 All of it failed. One of the big ones, though, was trying to make something that was basically the GPT Store, where you would be able to make a GPT and connect it to some knowledge and some tools. It still didn't do well, but there was something I noticed: one type of GPT, one that had access to only one tool, was blowing every other one out of the water on every benchmark. It could do everything that every other GPT could do, with way more flexibility and the ability to move information around the system. And that tool was a code interpreter.
Speaker 1 It was able to just write the code for whatever tool I went and built for it. So, like, okay, something weird is happening there. And it started to make the whole function-calling tool approach seem like we were reinventing the wheel. These things are language models, they make language, and we have been trying for decades to figure out what the ideal language interface to computers is: it's code. We've poured tons of time into making this a very beautiful way for humans to control computers through language.
Speaker 1 So I just open-sourced that part. I threw the rest of it out. I was like, alright, this is just a thing you can talk to, like a language model that can run code. And the idea, too, is that by just running code, you actually don't need something trained on function calling.
Speaker 1 So, for example, a really popular model to use with Open Interpreter is Mixtral. And that works because you can actually just have this thing parsing... I don't know how technical the audience is, if that's something that would be fun to go into, or if we should keep it high level. What do you think?
Speaker 0 Yeah, we're fairly technical. I think, at least on this call, maybe 100% are at least developers.
Speaker 1 Yeah. Yeah. Yeah. Oh, great. Okay.
Speaker 1 That's great.
Speaker 0 Let's do it.
Speaker 1 Okay. Sweet. Yeah. So, I mean, this is a simple thing, but the idea of Open Interpreter is super simple.
Speaker 1 It's the idea that you equip language models with the ability to run code. For a function-calling model, that means they have one function, the execute function, and the parameters are code and language. So you let them provide some code that, like, gets all the files on your desktop or something, and a language, like Python. Then we run it and send the result back to the model. It's very much like ChatGPT's code interpreter.
Speaker 1 Have you ever played with that? It just happens automatically. I think of it as, like, just the analysis portion of ChatGPT, but it runs locally. And by just doing the code part, you can do things like... you know, the way that it runs code, calls the function, is actually just by using three little markdown backticks and then specifying the language. We've all seen that.
Speaker 1 If you ask it for, like, shell code, it goes tick-tick-tick, shell, then writes some shell script. And once it closes it out, we just go and run it. This is very, very intuitive for language models. Way more intuitive than function calling. It does not need to be fine-tuned.
Speaker 1 And it's intuitive because it's in distribution. There's a tremendous amount of data out there of these things just spitting out code in markdown code blocks, and that means that Open Interpreter is totally language-model agnostic. You can plug anything in there. We make it very easy to connect to Claude, to everything on Together AI, all the Nous Hermes models, Phi-2, Mixtral. It is all capable of piloting your computer. And there's a lot of other stuff we're working on.
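The parsing step described here, watching the model's output for triple-backtick fences with a language tag, can be sketched in a few lines. This is a minimal illustration; the function name and regex are mine, not Open Interpreter's actual internals.

```python
import re

def extract_code_blocks(text):
    """Find every fenced code block in a model's reply.

    Returns (language, code) pairs, e.g. ("python", "print(1)").
    """
    pattern = r"```(\w+)\n(.*?)```"
    return [(lang, code.strip())
            for lang, code in re.findall(pattern, text, re.DOTALL)]

# A typical model reply: prose, then a fenced block we can execute.
reply = (
    "Sure, I'll list the files on your desktop:\n"
    "```python\n"
    "import os\n"
    "print(os.listdir(os.path.expanduser('~/Desktop')))\n"
    "```\n"
)
blocks = extract_code_blocks(reply)
# blocks[0] is ("python", <the code to hand to the runtime>)
```

Because any instruction-tuned model emits blocks like this, the same parser works for Claude, Mixtral, or a local model, which is the language-model-agnostic point being made above.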
Speaker 1 Just trying to figure out how to build the technology that lets language models control computers. That's really the goal. And there's more to it than that. But yeah, I'd say that's how it came about, and that's a little bit about the project itself.
Speaker 0 Yeah. So, in case any of us don't know, Open Interpreter has about 40,000 stars on GitHub now. So, like you mentioned before, it blew up. It became, like, the number one project on GitHub in the world for, like, five days straight. So are you currently working by yourself, or do you have a team that's maintaining this full time?
Speaker 1 Yeah. So on the open-source repo, it's mainly me and a couple of folks who have come on. Over time, there have been different people that cycle through the open-source community, and I've gotten the chance to work with dozens of extraordinary people, because it's fully open source. And I'm still, like, lead maintainer and write most of the code. But part of this, too, is that we also have a desktop app that we've been working really hard on, which is trying to be like, alright...
Speaker 1 Well, can we make it so that you can use this technology, use Open Interpreter, without having to know how to use Python and have a terminal? Where you just download a desktop app and you start talking to it. The guy who's leading the desktop app right now is Ty, and Ty is who I met through AI Tinkerers. There's also this guy Mike, Mike Bird, who's up in Canada. He was, within the first week or two, one of the first people to get it running on an Android, which is very interesting: the idea that you can just use this to make any device have a natural-language interface. You know, take a picture with the front-facing camera, that kind of stuff.
Speaker 1 And Mike is now gonna be joining us full time. His first day is tomorrow. So now it's the three of us as the team. And there is other stuff happening, as we're trying to build an open-source Rabbit R1, which is a little bit of a different project, but it has a lot to do with Open Interpreter, and that team is different.
Speaker 1 But yeah, on Open Interpreter, it's me and Ty and Mike.
Speaker 0 Great. Somebody has to remind us that we should reach out to Mike, get him on the webinar at some point. That'd be great.
Speaker 1 Yeah. He's not far from Ottawa.
Speaker 0 Is he in Toronto or or Montreal or something?
Speaker 1 I don't know. No, he's in a tiny 600-person town, and he's, like, on the board of the Chamber of Commerce out there. And he's telling me about doing, like, events where he recognizes everybody. I don't know.
Speaker 1 Yeah, I should send it to you. He has a bed and breakfast. Not an Airbnb, a bed and breakfast, out in a tiny town.
Speaker 1 I think not far out of Ottawa. He'd love to come up. I've told him, I'm like, you gotta get to the AI Tinkerers events. They're the best ones.
Speaker 0 Awesome. So, about generating text and then having that feedback loop: it generates text, and then there's the code block, and you run that code block. In my head, I'm thinking that sounds like some really simple text parsing at the end. Right? Like, you just have it generate whatever it wants, and then at the end, if there's code, run whatever that code is.
Speaker 0 Is that right?
Speaker 1 Exactly. Exactly right.
Speaker 0 That's it. That's literally it.
Speaker 1 Yeah. So, I mean, here's the thing. Here, to me, is what made this project the one that people were drawn to, because this has been done before. I mean, Sharif Shameem was having these things run code in 2020, and that was one of the big things that, honestly, inspired me to drop out. This is not a new idea.
Speaker 1 And I think the reason this was the one was because, first of all, it is a total rethink of the way these things use tools. "The alternative to function calling" is a totally fair way of looking at it, I think. So it is seriously central, this one thing. And yeah, it's just a parsing thing going on at that part of it. Most of the work has been: okay...
Speaker 1 What does a real-time code execution environment look like for language models? Something that can very robustly handle a bunch of different languages and also emit their outputs in real time. And this surprised me, because this actually is not even a language-model problem. I feel like a lot of people are finding this: you go to build something for language models, for AI in this space, and 99% of what you do has nothing to do with language models. It's just building stuff. I guess we're application-layer companies, most of us.
Speaker 1 We're not the intelligence layer, so that makes sense. But you'd think there would be something really great that would just let you put in any language, run it, and get real-time streaming output of it. The closest thing is Jupyter, and Jupyter is running at the center of the latest interpreter now, but it still requires a lot of cajoling to really get it to look good and feel good and be useful for the language model to use. But the idea of real-time streaming output in your terminal is, I think, quite a lot of what's fun about this. So a lot of it is, honestly, thanks to Rich, which is the library by Textualize, which gives us these beautiful code blocks and code-output blocks in the terminal.
Speaker 1 It's just eye candy. Active line highlighting of which line is being run as the code executes, things like that. And trying to build a library that people can build their applications on top of. So that, for example, you can import interpreter and then say interpreter.chat, ask it to do something like "what time is it in Ottawa", and actually treat it like a generator in Python. You can be streaming out chunks, first from the language model, and then code that the language model wants to run, in a format that specifies everything about the code. And then as soon as it's done writing the code, it starts streaming the output of that code, along with the active line that's being run and things like that, so you can build a very beautiful UI around it.
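A rough sketch of consuming a streaming interface like the one described. The chunk dictionaries below are illustrative stand-ins for whatever schema the real library emits, and `fake_chat_stream` mocks the model entirely; the point is just routing typed chunks (message text, code, console output) to a UI as they arrive.

```python
def fake_chat_stream():
    """Stand-in for something like interpreter.chat(..., stream=True).

    The real library yields typed chunks; this shape is illustrative.
    """
    yield {"type": "message", "content": "The time in Ottawa is "}
    yield {"type": "code",
           "content": "from datetime import datetime\nprint(datetime.now())"}
    yield {"type": "console", "content": "2024-02-09 11:04:00"}

def render(stream):
    """Collect each chunk as it arrives. A real UI would draw code blocks,
    active-line highlights, and console output in place instead."""
    transcript = []
    for chunk in stream:
        transcript.append((chunk["type"], chunk["content"]))
    return transcript

out = render(fake_chat_stream())
```

Treating the chat as a generator is what makes the terminal UI possible: the consumer starts drawing the message before the code exists, and drawing the code before its output exists.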
Speaker 1 But yeah, it's a laser-focused project on code interpretation and on a real-time code execution environment for language models. And then there's another part to this whole thing, too. 2023, to me, was the focus of building a great real-time code execution environment for language models. 2024 has been... you know, it seems that some of these packages that it can run are very easy for it to understand. They're very easy for it to control.
Speaker 1 Some of them are not. This method of running code is not really something you see in GitHub repositories: it's this type of code where you're trying to run code, then figure out things about your system, then navigate around the world. It's like you're in a black box and you can only run code. Some libraries are great at that. They're verbose.
Speaker 1 They fail very quickly, and they work very well with that kind of thing. So biasing it towards using those packages is something we play with, you know, how you do that with RAG and stuff like that. But what's way better is to be like, alright, let's start making some of our own. So 2024, really this last month, has been all about the computer API.
Speaker 1 So this is a new part of Open Interpreter, which is trying to say: let's build our own tools for language models. They're not really tools, it's literally a Python package, the computer API, and it's meant to expose all the foundational capabilities of a computer through a Python package. It's especially great for multimodal models. For example, you can run computer.view(), and that returns a screenshot of what's on the screen, so that a multimodal model can see what you're looking at. Or computer.mouse.move("Apple logo"), or the name of some icon, and it will find the Apple logo on the screen and go and click it.
Speaker 1 So you've now built a way for multimodal models, in that case, to control your mouse and your keyboard and to see what's on your screen. And in a way that's intuitive for them, because they know how to write this code and how to pass it around. This new update, 0.2.1, is gonna be coming out either tomorrow or maybe on Monday, and it introduces a ton of new computer API stuff. Like, what does it look like to rewrite the browser for a language model to operate? That's just wrapping Selenium, but with all this stuff we've learned about what makes these things very good at operating it.
Speaker 1 What does it look like to redo the file system? That's gonna be computer.files. It has all this incredible new technology in it that lets you semantic-search over anything on your computer. A lot of things that were never built, because... the whole Python ecosystem, as fun as it is to expose that to everybody, and to me that is actually a lot of what's cool about this project: for people that don't really know how to use Python, pretty much every library became, like, an app that you could call by just asking your computer to do it. And as much fun as that is, the way forward really seems to be giving it the flexibility of a real-time code execution environment, but biasing it towards using these really LLM-friendly packages.
Speaker 1 I mean, the computer API is written for language models to use. It's probably the first Python package where that's the case. And yeah, that's a bit of the roadmap and a bit deeper into the technology of it: it's a real-time code execution environment for language models, and also the computer API, which is for language models.
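The shape of such a computer API can be sketched as plain Python objects a model calls. This is a stub under stated assumptions: the method names (`computer.view`, `computer.mouse.click`) follow the talk, but the class internals and the `locator` callback are my illustration, not the actual package.

```python
class Mouse:
    """Stub mouse. The real version would call e.g. pyautogui.click(x, y)."""
    def __init__(self, locator):
        self._locate = locator   # callback: on-screen text -> (x, y)
        self.clicks = []         # record of clicks, for this sketch only
    def click(self, text):
        x, y = self._locate(text)
        self.clicks.append((x, y))

class Computer:
    """Foundational capabilities of the machine, exposed as one object."""
    def __init__(self, locator):
        self.mouse = Mouse(locator)
    def view(self):
        # Real version: capture the screen and return it for a vision model.
        return b"<screenshot bytes>"

# A toy locator that "finds" the Compose button via a lookup table;
# the real one would use OCR or an icon-detection model.
computer = Computer(locator=lambda text: {"Compose": (40, 120)}[text])
computer.mouse.click("Compose")
```

The design point is that the model never sees pixels-to-coordinates math; it writes ordinary method calls, which is squarely in distribution for code-trained models.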
Speaker 0 That's incredible. So, to recap what you said: language models like to use Python libraries that are more verbose, that have a lot of logging, so they can understand what's going on and then think and write more code afterwards. And what was one thing that was surprisingly easy when you were building this new library, the computer API, and what was something that was surprisingly difficult about building it?
Speaker 1 It was surprisingly easy to get the I/O stuff to work. Someone recognized the other day, too, that OI, Open Interpreter, is IO flipped. Something fun there. But getting it to control things, have access to the clipboard, the mouse, the keyboard, is just PyAutoGUI. I kinda thought that would be a little bit more work. It's a very thin wrapper around PyAutoGUI, with an extremely thick amount of technology for getting it to figure out where to click. That has been shockingly hard.
Speaker 1 One way folks have approached this: for example, Josh Bickett, who's extraordinary, made this thing called the self-operating-computer repository. If anybody's heard of that, it also blew up. Fantastic library. And the way it was working was by having screenshots get sent to GPT-4 Vision, and then you would ask it to emit coordinates for where it was gonna click. And what they would do is draw a grid on the screen that had the x and y coordinates along it, so you could imagine it would be very easy.
Speaker 1 Like, you just look at where the button is that you'd want, and you can see the text there. I think it wasn't even pixel coordinates. It was, like, you'd see 5 x, 6 y or something like that. And then the language model would emit something like that, and it would use that to go and click. And it became pretty clear to me that this is kind of like asking language models to do math, bizarrely. This spatial, bounding-box sort of thing was, for the current language models, a little bit closer to them doing math. It was not really in distribution, and it was kinda hard for them to do.
Speaker 1 But what they're great at is saying, you know, "click Compose". They easily could do that with these multimodal vision models. So, okay, can we give them some tools? I mean, that's what the computer API is.
Speaker 1 It's like, let's bootstrap a language model to do the stuff that otherwise I'd have to write a ton of code for. So computer.mouse.click takes text. You can put in computer.mouse.click("Compose"), and it will use OCR to find "Compose" on the screen and click it. So that's a very, very different level of difficulty for a language model: trying to find the coordinates of something, versus just saying the text on that button or that link or in that text box. For YouTube, it can very easily click the search bar. It just says click, you know, "Search here" or whatever is in the YouTube search bar.
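The click-by-text step described here, OCR first, then clicking the center of the matching word's bounding box, might look like the sketch below. The OCR pass itself (e.g. tesseract over a screenshot) is assumed to have already produced word boxes; `click_target` is an illustrative name, not the library's.

```python
def click_target(ocr_words, query):
    """Pick the pixel to click for a text query.

    ocr_words: list of (text, left, top, width, height) boxes from an
    OCR pass over a screenshot. Returns the center of the first box
    whose text contains the query, or None if nothing matches.
    """
    for text, left, top, width, height in ocr_words:
        if query.lower() in text.lower():
            return (left + width // 2, top + height // 2)
    return None

# Two OCR'd words from a hypothetical mail client screenshot.
words = [("Inbox", 10, 50, 60, 20), ("Compose", 10, 100, 90, 24)]
click_target(words, "compose")  # -> (55, 112), center of the Compose box
```

The model only ever emits the string "compose"; the geometry lives in ordinary code, which is exactly the division of labor argued for above.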
Speaker 1 You can click on YouTube videos by putting out the title of the video, which it can totally easily read. By the way, Open Interpreter is, again, language-model agnostic. I think of it totally as an application-layer thing. This works with other multimodal language models, including locally running ones; GPT-4V isn't the only one you can use with it.

Speaker 1 It's just the thing that lets any language model run code and return image outputs, or text outputs, or any kind of output, and then make decisions based off of that. And what's even harder than that... this is what I've been working on quite a lot with Vik, if you've seen him on Twitter, who just released Moondream. Has anybody here heard of Moondream? Leonard... Moondream is a 1.6-billion-parameter vision model, and it is not fair.
Speaker 1 It is not reasonable how intelligent this thing is. It's ridiculous, and I really encourage anybody to try it out on Hugging Face. It's a lot of fun. 1.6 billion parameters totally changes things. You know?
Speaker 1 The dream was always to have a really, really good local mode in Open Interpreter, where you go "interpreter --local" and everything runs locally. You turn off your Internet. And so far, working with Justine Tunney on llamafile at Mozilla, we're trying to figure out the best way to do that. I'm working with Yags at LM Studio to figure out the best way to do that. These are great, brilliant people, the most brilliant people of our time in inference.
Speaker 1 And Moondream is just way ahead of where we should be for these really tiny models that can run on really small devices. And it has the capability of emitting bounding boxes for things. You can ask it for something in an image, and it will emit coordinates around where that thing is. So I'm working with Vik to be like, alright, let's train one of these things for the GUI.
Speaker 1 Let's train one of these things to operate a computer, to be able to put a box around any icon, even with dragging actions or something. It'd probably be something that another language model, a planning-heavy language model that understood how to write code, would use to identify objects on the screen, the same way that OCR trick works. But even in rudimentary tests, and I'm actually about to tweet out a demo of what we got, everything's different since that thing came out. And the promise of this being something that can run locally, or run on very small devices, is feasible. But yeah.
Speaker 1 So that would be my very long answer. It's very easy to get it to do the things that actually do the stuff on the computer. I'd say the same thing about the AI file system. Like, why is there not a library that lets me do semantic search over a folder? It's just ridiculous. So we had to make one.
Speaker 1 It's called the AI file system. MIT licensed and everything. That was cool. And it's very hard to get it to find things on the screen. And yeah, just shocking how hard it is to actually do some good semantic search.
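Semantic search over a folder reduces to: embed every file, embed the query, rank by similarity. Here's a toy sketch of that pipeline, with a bag-of-words counter standing in for a real sentence-embedding model; the function names and the in-memory `files` dict are illustrative, not the AI file system's API.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would use a neural
    sentence-embedding model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(files, query, top_k=1):
    """files: dict of path -> contents. Returns the best-matching paths."""
    q = embed(query)
    ranked = sorted(files, key=lambda p: cosine(embed(files[p]), q),
                    reverse=True)
    return ranked[:top_k]

docs = {"taxes.txt": "annual tax return forms",
        "recipes.txt": "chocolate cake recipe"}
semantic_search(docs, "file my tax forms")  # -> ["taxes.txt"]
```

The hard parts he alludes to, chunking large files, keeping the index fresh, and embedding quality, all live outside this core loop, which is why "good" semantic search is harder than it looks.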
Speaker 0 Yeah. Great. I have a few more questions about that, like Segment Anything, and maybe using that to figure out where to click and which zone to click. And I also wanna ask your opinion about other projects, like MultiOn and all those agents running on your computer, but I wanna be mindful of time. Do you still have a couple more minutes for, like, open Q&A with everybody?
Speaker 1 Oh my god, yeah. I totally forgot. So let me... I think my next thing is at 11:30.
Speaker 2 Thanks for the explanation of where you are, very interesting stuff. I have a ridiculous suggestion that you've probably thought about. But with regard to finding things on the screen, have you thought about using, for example on Apple, their assistive display? It makes everything really high contrast, and the buttons pop out. Use the operating system's capabilities to your advantage, in other words.
Speaker 1 Yes. Yes. I actually could talk so much about this, so I'll try to... okay. So this goes both ways. In the sense that you really quickly realize that, one, pretty much everything that has been built for accessibility is very smart to rely on for language models.
Speaker 1 Because in a sense, these things, at least especially the pre-vision models, are essentially like people with low vision. And the technology that was built to let people use computers when they were blind or had low vision is the same technology that makes language models able to use computers. The accessibility tree is a very powerful interface for language models, for example, to be able to use a browser or use the computer. And high-contrast display is absolutely a part of it.
Speaker 1 So it's 100% part of the stuff that gets fed into the model. We do a lot of work beforehand, assuming this thing is gonna have difficulty seeing things and locating items. So there's a bunch of transformations that get applied that look a lot like what we do for people who have low vision. So this goes that way, and there are many things like that, things that were built for accessibility. Even, frankly, this... you know, I think we're gonna start to find more and more that stuff that was built for remote work...
Speaker 1 So, like, this thing that we're all on now, where I can open up a whiteboard and do things, this way that I can interact with other humans that are not physically here, is going to be extremely useful for language models, which are also not physically here. Tools that we built for accessibility and tools that we built for remote work, I think, are gonna really help realize this dream. And at the same time, it goes the other way. I really think that when Vik and I train this GUI-understanding model, where you can say "click the search bar"...
Speaker 1 And right now, by the way, there's a version of it that works. It's very small, and the demo that I'll post shows our progress on optimizing it. It is capable of finding icons on the screen. Click the crop icon. Click the X button.
Speaker 1 Click these things that you don't use OCR for. And it could be a lot better. But even where it's at, it is better than any accessibility tool I have ever used. When we build stuff for language models like this... I mean, it feels sort of embarrassing for humanity, but, honestly, I think a lot of the things we end up building for language models will be the best accessibility tools that ever existed. Especially Open Interpreter. To me, if you get a language model to control a computer... you know, the reason I was going to become a middle school science teacher, and why I thought that was the most important thing I could be doing with my life, was computer literacy.
Speaker 1 And it's the fact that nobody talks about it, but the difference in somebody's life when they know how to use a computer versus when they don't, that is one of the greatest divides in society, because of how much we have organized all essential services, governmental services, everything, gated behind being able to use a computer. I had a very formative experience where my grandma asked me to fill out a VA housing form, as a veteran. And, man, I couldn't do it. I went online, and I could not figure it out. It was just terrible design, and that was cruel.
Speaker 1 That was cruel of the government. I think bad design, when you are trying to disseminate essential services, is cruel. And there is an opportunity with language models to make it so that folks who don't know how to use a computer don't have to. You ask it to fill out, like, a Medicaid form.
Speaker 1 You ask it. You have a way of lowering the barriers to all of humanity's greatest digital tools, even the simple ones, which I care about even more than the more powerful, workstation-y ones. I really care about: this thing filled out a Medicaid form. This thing filled out a VA housing form. And I really think we're gonna end up building the greatest accessibility tool that has ever existed, if we solve this.
Speaker 1 And let people in, the zero-to-one folks on computer literacy, because nothing will change your life as much as really knowing how to use a computer. I mean, imagine a society where everybody you knew could use a computer as well as the best computer users you know. So this goes both ways. It's great to use accessibility technology with language models, and I think the development of language models and tools for them is going to radically improve accessibility technology, especially, in my case, around computer literacy. That's a great point, though. I really love that.
Speaker 2 Well, thank you very much. I don't wanna monopolize your time, But just going back to Moon Dream for 1 minute. So, yeah, you you see my Twitter feed. There's not a lot.
Speaker 1 There's not a
Speaker 2 lot I haven't heard of, I think. But, you know, I don't know if anybody is training a model specifically for GUIs. Moondream is, like you said, amazing. I don't know how many images they've trained it on. But it seems to me that if you only trained a model on GUIs, you should get an even smaller, faster model.
Speaker 1 I totally agree. Yes. So Vik, again, who made Moondream, he's on the 01 core team, quickly becoming a great friend of mine. I met him at AI Tinkerers, and, yeah, he's the creator of Moondream. And the idea is that we're gonna try to figure out, you know, how to get this dataset. The number one thing that we're talking about is, let's make a version of Moondream that is specifically for controlling GUIs, by showing it, just, yeah, tons and tons of bounding-box pairs to elements on the screen.
Speaker 1 And I think a lot of that data can be generated synthetically, to be honest. But yes. Exactly. I think we will be the first to do it. And, also, of course, there's, you know, Adept, whose landing page I copied, and a bunch of other people too who have thought about this.
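The synthetic bounding-box idea above can be sketched in plain Python: emit labelled (element, box) records of the kind a GUI-tuned vision model might be trained to predict. Everything here is illustrative, a minimal sketch under assumptions; the element vocabulary, screen size, and record shape are made up, not the project's actual data format.

```python
import random

# Illustrative element vocabulary; the real training taxonomy is an assumption.
ELEMENT_TYPES = ["button", "text_field", "checkbox", "menu_item"]

def synth_gui_sample(width=1280, height=800, n_elements=5, seed=None):
    """One synthetic training record: a screen size plus a list of labelled
    bounding boxes, i.e. the (element -> box) pairs mentioned above."""
    rng = random.Random(seed)  # seeded for reproducible samples
    elements = []
    for _ in range(n_elements):
        w = rng.randint(40, 200)   # element width in pixels
        h = rng.randint(20, 60)    # element height in pixels
        x = rng.randint(0, width - w)
        y = rng.randint(0, height - h)
        elements.append({
            "label": rng.choice(ELEMENT_TYPES),
            "box": (x, y, x + w, y + h),  # (x1, y1, x2, y2) in pixels
        })
    return {"size": (width, height), "elements": elements}

sample = synth_gui_sample(seed=0)
```

In a real pipeline each record would be paired with a rendered screenshot; the sketch only shows the label side of that pair.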
Speaker 1 And MultiOn, you mentioned, too. You know, folks that have thought about ways that this whole on-computer interface can be built. But I'm really, really a fan of tiny models in a way that none of these other people seem to be thinking about. And thanks to AI Tinkerers, I have access to who I think is the smartest person in the world about tiny vision models.
Speaker 2 Got a minute? I'm curious if you've seen anything being done with the Raspberry Pi 5 so far. They just finally came back in stock, and I'm tempted to order one, but I wanna do it, obviously, with an AI angle.
Speaker 1 Yes. Yeah. I just did. Like, I literally just ordered a Raspberry Pi. So, yeah, actually, this would be a good time, too. I'd love to talk a bit about the 01 project, which is this idea of creating a oh, no way, Leonard. Is that the 5, or is that a 4?
Speaker 2 That's a 3B. We have to get things running on a 3B. Sorry.
Speaker 1 Got it. Yeah. But the idea, you know, a lot of what I talk about with Vik is like, okay, Raspberry Pi 5:
Speaker 1 What is our tokens per second on that thing? Because this is very, very exciting. So there's this other project, too, which is the 01. It's this idea of, you know, the Rabbit R1 came out, if anybody's kind of familiar with that, where this thing, you can talk to it, and it seems to control a computer in the background, which I love, and I'm so, so into it. And the idea that, you know, we could have an open source version of that thing that other people could build startups on top of, and that, if you're just, like, a hacker at home and you have a Raspberry Pi, you could throw everything together, would be sick. So the Saturday before this last one, actually, we had about 15 people or so come out. I had just tweeted out, like, is anyone gonna build this with me?
Speaker 1 I'm in Seattle. And a bunch of people responded, and a lot of them were in Seattle. And so I got together with the 12 or 15 best of them and just whiteboarded it out. What does this look like? What can we do?
Speaker 1 Then the next Saturday, so this last Saturday, I brought everybody out to an Airbnb, and from 12 PM to 5 AM, we were coding, whiteboarding, and just being, like, alright, we're gonna bang this thing out, because it's time-boxed. We have to launch it within six weeks from when I first said that: an open source Rabbit R1. And that starts by building an operating system.
Speaker 1 So we built it. We built an operating system in a day. I mean, it took a lot of people and a lot of hours. So that next one, there were 20 people. I think they're gonna just keep growing. There's more people that are a part of this.
Speaker 1 And, you know, I'm very interested in getting everything working on a Raspberry Pi. To me, I wanna get the operating system really, really nice. Like, press a nice ISO that is gonna be the one OS. And this thing, I could talk for a long time about this idea of a language model computer, which is like, alright, if you're starting to have these ideas about how language models control computers, you can kind of rethink what a computer is and ask yourself, what does it look like to build a new kind of computer with a language model at the center of it? Karpathy has a great tweet, LLM OS, that kind of sketched out an idea for this. That's what we ripped apart at our first meet, and we were like, what do we do to actually implement this?
Speaker 1 Then we built it. And loading this thing up on a Raspberry Pi is the goal. And what we wanna do is have it so that it basically would look at the hardware that it's been loaded up onto, and it would download a language model that would give you some good speed for that. So if it was a real beast that the thing was loaded up onto, then it would fire up Mixtral. And if it was a Raspberry Pi, it would fire up, like, Phi-2 and Moondream.
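The hardware-aware model selection described above could look something like this sketch. The model names echo the conversation (Mixtral for a beast, Phi-2 class models for a Pi), but the RAM thresholds, ladder, and probe are all illustrative assumptions, not the 01 project's actual logic.

```python
import os

# Hypothetical ladder of (minimum RAM in GiB, model name); thresholds are
# made up for illustration.
MODEL_LADDER = [
    (48, "mixtral-8x7b"),
    (16, "mistral-7b"),
    (4, "phi-2"),
    (0, "tinyllama-1.1b"),
]

def detect_ram_gib():
    """Best-effort total-RAM probe (POSIX); falls back to a Pi-sized default."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 2**30
    except (OSError, ValueError, AttributeError):
        return 8.0  # assume a Raspberry Pi 5 class board

def pick_model(ram_gib):
    """Return the largest model whose RAM floor the detected hardware clears."""
    for floor, name in MODEL_LADDER:
        if ram_gib >= floor:
            return name
    return MODEL_LADDER[-1][1]

chosen = pick_model(detect_ram_gib())
```

A real installer would also weigh GPU/NPU presence and disk space, but RAM is the simplest first cut at "give you some good speed for that hardware."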
Speaker 1 And, yeah, I actually am not a hardware person, really. Like, I don't know too much about what it is gonna look like. I just trust these brilliant, incredible people that have decided to be a part of this project in Seattle, and the Raspberry Pi seems like it needs to work on that. I'm actually even more excited about it working on, like, old Android phones with a bunch of stickers on them. We'll get there. But the idea is of it kind of being the Raspberry Pi for a new generation of startups that could build products like the R1, the Tab, the Pin, all that kind of stuff, and making a great open source operating system that was built for those kinds of products.
Speaker 1 Yeah. Getting it working on a Raspberry Pi is super important. And then this next one, I just ordered a ton of stuff. I ordered, like, an NVIDIA Jetson Nano, you know, a $130 thing that's apparently pretty good at inference, and some Coral stuff, Coral AI, which you can plug in as, like, a TPU kind of accelerator, and a bunch of Raspberry Pis. So we're just gonna riff on that and build something.
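A minimal way to answer the "what is our tokens per second on that thing" question across boards like these is a timing harness around whatever backend the device runs. Here `generate` is a hypothetical adapter you would write yourself (around llama.cpp, an ONNX runtime, or similar), not a real library call.

```python
import time

def tokens_per_second(generate, prompt, n_tokens=64):
    """Time one generation call and report decode throughput.

    `generate(prompt, n_tokens)` is a hypothetical adapter around the
    device's actual inference backend; it should return the list of
    generated tokens.
    """
    start = time.perf_counter()          # high-resolution monotonic clock
    tokens = generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed
```

Running the same harness with the same prompt on a Pi 5, a Jetson Nano, and a Coral-accelerated board gives directly comparable numbers, though a careful benchmark would separate prompt processing from decode time.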
Speaker 1 By the way, it's a different question, too, of making something that can run everything locally versus something that, I think this is still useful, could easily be configured to use some streaming LLM. Like, I'm, in general, a total fan of this thing running locally. I want a tape recorder that I can hit a button on, talk to this thing, and let go. This is a code-running language model in my hands that can, you know, the whole entire Python ecosystem is exposed to this thing, but it runs locally so that I can bury this thing. And you would find it in 50 years and, like, dust it off, and, you know, who's the president? It's got all wrong information.
Speaker 1 That's what I really, really want. But short of it running locally, there's the idea of having these things work on much smaller devices that could stream some of those operations over an Internet connection. Anyway, I've actually forgotten what the question was, but, you know, yeah. Raspberry Pi is a lot of fun. Thank you for that.
Speaker 1 I gotta head out. It was a pleasure. Oh, yeah. Take care. It was great meeting you, man.
Speaker 1 Excellent. Take care. Bye. Right.
Speaker 0 So I gotta head out too, but, thank you for hopping on to this webinar today.
Speaker 1 Yeah. It's all fine.
Speaker 0 We should follow up with some sort of talk about hardware next time. Maybe, if you're free. I know you're super busy.
Speaker 1 I'm busy.
Speaker 0 So before we go, how can people find you online?
Speaker 1 Oh, yeah. So Open Interpreter is at openinterpreter.com. And on Twitter, my handle is hellokillian. That's probably the easiest place to find me.
Speaker 0 Awesome. Well, thank you for your time. That's it for the webinar. So, appreciate it.
Speaker 1 Thank you so much. Thank you so much. Yeah. Nick, Ross, Charles, Jonathan, Leonard, Carla, that was a lot of fun.