Summary LangChain "OpenSource LLMs" Webinar - YouTube (Youtube) www.youtube.com
10,831 words - YouTube video
One Line
The LangChain webinar on YouTube discusses the significance of open-source LLMs to the LangChain ecosystem, introducing MosaicML's Composer, Streaming, and LLM Foundry libraries, while emphasizing cost-effectiveness and efficiency in training methods for large models.
Slides
Slide Presentation (16 slides)
Key Points
- The LangChain "OpenSource LLMs" webinar focuses on the importance of open-source large language models (LLMs) to the LangChain ecosystem.
- The webinar includes presentations from experts in the field, discussing how open documentation, open code, open weights, and open data determine whether a model is truly open source.
- Brandon, the founder and CEO of Nomic, talks about GPT4All and how Nomic approaches open source and open data.
- Daniel, a machine learning engineer at MosaicML, discusses their Composer, Streaming, and LLM Foundry libraries.
- MosaicML provides an open-source platform for training large language models (LLMs), emphasizing efficient training methods and the challenges of deploying large models at scale.
Summaries
87 word summary
The LangChain webinar on YouTube highlights the importance of open-source LLMs to the LangChain ecosystem. Presentations discuss the significance of open documentation, code, weights, and data in determining if a model is truly open source. MosaicML's Composer, Streaming, and LLM Foundry libraries are introduced. The motivation behind MosaicML's open-source LLM platform is explained, emphasizing cost-effectiveness and suitability for specialized AI systems. MosaicML emphasizes efficient training methods and the challenges of deploying large models at scale. They support startups and open-source communities in LLM training efforts.
237 word summary
The LangChain "OpenSource LLMs" webinar on YouTube includes presentations on the importance of open-source LLMs to the LangChain ecosystem. Brandon discusses GPT4All and Nomic's approach to open source and open data, while Daniel presents MosaicML's Composer, Streaming, and LLM Foundry libraries. Von talks about their open-source training and inference stack. The presentations emphasize the significance of open documentation, code, weights, and data in determining if a model is truly open source. Nomic's tool, Atlas, for training data analysis and improving explainability and accessibility in AI models is also discussed. The webinar concludes with upcoming releases and audience questions.
Ban and Daniel explain the motivation behind building MosaicML as an open-source LLM platform, highlighting the cost-effectiveness of training company-specific models and the suitability of specialized AI systems for high-value workflows. They introduce MPT-7B, the model they trained, along with Composer and streaming datasets. They invite people to check out their open-source tooling and join their community.
MosaicML provides an open-source platform for training large language models. The team emphasizes efficient training methods and the challenges of deploying large models at scale. They discuss starting small, iterating quickly, and using their Streaming library to avoid cloud and vendor lock-in. They highlight robust evaluation methods, unit testing, open-source leaderboards, and using the MosaicML training stack for startups and resource-constrained communities. The team is dedicated to supporting startups and open-source communities in LLM training efforts.
326 word summary
The LangChain "OpenSource LLMs" webinar is available on YouTube and features quick introductions, 15-minute presentations, and a Q&A session. The focus is on open-source LLMs and their importance to the LangChain ecosystem. The first presentation by Brandon discusses GPT4All and Nomic's approach to open source and open data. Daniel presents on MosaicML's Composer, Streaming, and LLM Foundry libraries, while Von talks about their open-source training and inference stack. The presentations highlight the importance of open documentation, code, weights, and data in determining if a model is truly open source. Atlas, a tool developed by Nomic, is also discussed for training data analysis and improving explainability and accessibility in AI models. The webinar concludes with upcoming releases and audience questions.
Ban and Daniel explain the motivation behind building MosaicML as an open-source LLM platform. They argue for the cost-effectiveness of training company-specific models and the suitability of specialized AI systems for high-value workflows. They envision a future where people can buy an external API or build and deploy their own models. They share examples of successful model training with small teams and introduce MPT-7B, the model they trained, along with Composer and streaming datasets. They discuss architecture choices, software stack, infrastructure, data protection, privacy, inference infrastructure, a hosted API, and an enterprise tier. They invite people to check out their open-source tooling and join their community.
MosaicML provides an open-source platform for training large language models. The team emphasizes efficient training methods and the challenges of deploying large models at scale. They suggest starting small, iterating quickly, and using their Streaming library to avoid cloud and vendor lock-in. They discuss robust evaluation methods, the importance of unit testing, open-source leaderboards, and how startups and resource-constrained communities can use the MosaicML training stack to train their own models. They encourage users to start with the open-source tooling and scale up to the platform when needed. The team is dedicated to supporting startups and open-source communities in LLM training efforts.
697 word summary
The LangChain "OpenSource LLMs" webinar was recorded and will be accessible on YouTube. The format of the webinar includes quick introductions, 15-minute presentations from each group, and a general Q&A session. The focus of the webinar is on open-source LLMs (large language models) and their importance to the LangChain ecosystem, with the goal of learning from experts in the field. The first presentation is by Brandon, the founder and CEO of Nomic, who talks about GPT4All and how Nomic approaches open source and open data. The second presentation is by Daniel, a machine learning engineer at MosaicML, who discusses their Composer, Streaming, and LLM Foundry libraries. Von, who manages the engineering team at MosaicML, also presents on their open-source training and inference stack. The presentations highlight the importance of open documentation, open code, open weights, and open data in determining whether a model is truly open source. Brandon also discusses the use of Atlas, a tool developed by Nomic, for training data analysis and its role in improving explainability and accessibility in AI models. He shares case studies and examples to illustrate these concepts, and emphasizes the importance of low-resource models and privacy in accessible AI. The webinar concludes with a mention of upcoming releases and the opportunity for questions from the audience.
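To make the GPT4All part of this concrete, here is a minimal sketch of running one of Nomic's GPT4All models locally with the `gpt4all` Python package. The specific model filename is just an example from the public model catalog, not something taken from the webinar.

```python
# Minimal sketch: local inference with Nomic's `gpt4all` Python bindings.
# The model filename is an example from the GPT4All catalog (an assumption here);
# it is downloaded automatically on first use.
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

# A chat session keeps conversational context between generate() calls.
with model.chat_session():
    reply = model.generate(
        "Explain in one sentence what 'open weights' means for a language model.",
        max_tokens=100,
    )
    print(reply)
```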
Ban and Daniel discuss the motivation behind MosaicML's work on open-source large language models (LLMs). They believe that there should be a place for both pre-trained models from big companies and models trained by individual companies. They argue that it is more cost-effective for companies to train their own models and that specialized AI systems are better suited for high-value workflows. They envision a future where people can either buy an external API or build and deploy their own models. They address the perception that building language models is difficult and expensive and provide examples of successful model training with small teams. They introduce MPT-7B, the model they trained, and the tools they used, including Composer for training and the Streaming library for high-performance data streaming. They describe the architecture and training choices that went into creating MPT-7B, such as using ALiBi for long-context models and the Adam optimizer. They also discuss their software stack and infrastructure, including the Mosaic control plane and compute plane, which allow deployment on any cloud. They emphasize the importance of data protection and privacy. They mention their inference infrastructure and products, including a hosted API and an enterprise tier for customization and training. They encourage people to check out their open-source tooling and join their community. They conclude by mentioning future improvements and inviting questions from the audience.
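As a rough illustration of the data tooling mentioned above, the following sketch uses MosaicML's `streaming` library to write a tiny dataset in the MDS shard format and then stream it back through a standard PyTorch DataLoader. The paths and the single text column are placeholder assumptions, not the actual MPT-7B setup; in practice the shards would live in object storage (for example an s3:// URI), which is what makes the same dataset usable from any cloud.

```python
# Minimal sketch of MosaicML's `streaming` library (MDS shards + StreamingDataset).
# Paths and the single "text" column are illustrative placeholders.
from streaming import MDSWriter, StreamingDataset
from torch.utils.data import DataLoader

# 1) Convert raw samples into MDS shards. In real use, `out` would typically be an
#    object-store URI such as "s3://my-bucket/my-dataset-mds".
columns = {"text": "str"}
with MDSWriter(out="/tmp/mds-example", columns=columns) as writer:
    for sample in [{"text": "hello world"}, {"text": "open-source LLMs"}]:
        writer.write(sample)

# 2) Stream the shards back for training. With a `remote` URI, shards are downloaded
#    and cached under `local`; here we read the local copy directly.
dataset = StreamingDataset(local="/tmp/mds-example", shuffle=True, batch_size=2)
loader = DataLoader(dataset, batch_size=2)

for batch in loader:
    print(batch["text"])
```

A DataLoader like this is the kind of input Composer's Trainer consumes, which is how the open-source training stack fits together.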
MosaicML's platform focuses on improving the efficiency and cost-effectiveness of training large language models (LLMs). The team emphasizes the importance of both training and inference, highlighting the need for efficient training methods and the challenges of deploying large models at scale. They suggest starting small and iterating quickly to discover the best approach. In terms of tooling, the team is proud of their Streaming library, which frees users from cloud and vendor lock-in. They also discuss the need for robust evaluation methods for LLMs, as automated metrics can have biases and human evaluation is often necessary. They encourage users to create their own test datasets and develop their own evaluation metrics. The team also stresses the importance of unit testing for LLMs, comparing it to writing unit tests for software, and argues that having a test dataset is critical to ensure the model performs as intended. In terms of open-source leaderboards, they suggest looking at various options but emphasize the importance of evaluating models within one's own framework. Finally, they discuss how startups and resource-constrained communities can use the MosaicML training stack to train their own custom models. They encourage users to start with the open-source tooling and then scale up to the platform when needed; the seamless transition allows users to easily scale their compute resources. Overall, the team at MosaicML is dedicated to supporting startups and open-source communities in their LLM training efforts.
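The unit-testing analogy can be made concrete with a small, hand-built test set and a pytest-style check. The prompts, the substring match, and the `generate` stub below are illustrative assumptions rather than anything prescribed in the webinar; the point is simply that a fixed test dataset turns "is the model still behaving?" into a repeatable check.

```python
# Minimal sketch of treating LLM evaluation like unit testing: a hand-written
# regression set plus a pytest-style assertion. Swap in your own model call
# and a metric appropriate to your task (exact match here is only illustrative).
TEST_CASES = [
    {"prompt": "What is the capital of France? Answer in one word.", "expected": "paris"},
    {"prompt": "Is 17 a prime number? Answer yes or no.", "expected": "yes"},
]


def generate(prompt: str) -> str:
    """Call your model here (a hosted API or a local open-source model)."""
    raise NotImplementedError


def test_model_passes_regression_set():
    failures = []
    for case in TEST_CASES:
        answer = generate(case["prompt"]).strip().lower()
        if case["expected"] not in answer:
            failures.append((case["prompt"], answer))
    # Fail loudly with the offending prompts so regressions are easy to spot.
    assert not failures, f"{len(failures)} prompt(s) regressed: {failures}"
```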
Raw indexed text (58,338 chars / 10,831 words)
Source: https://www.youtube.com/watch?v=9pmCM-JMJrE
Page title: LangChain "OpenSource LLMs" Webinar - YouTube