Summary of Google Gemini 1.5 with Arthur Soroken (YouTube)

One Line

Google's Arthur Soroken introduces Gemini 1.5 with an unprecedented 1,000,000-token context window, pushing the boundaries of AI capabilities.

Slides

Slide Presentation (9 slides)

Unveiling Gemini 1.5: A Revolutionary Leap in AI Technology

Gemini 1.5 Introduces a 1,000,000-Token Context Window

• Significant milestone in AI history

• Allows processing of large amounts of information

• Pushes the boundaries of AI capabilities

Visual: Image of Gemini 1.5 interface

Advancements in AI Technology

• OpenAI's launch of Sora, a text-to-video model

• Google's jump from a 32,000-token to a 1,000,000-token context window

• Marking a substantial shift in AI capabilities

Visual: Comparison chart between Gemini 1.5 and previous models

End User Benefits of Gemini 1.5

• Ability to input entire books or videos for tasks

• Practical applications for summarization and categorization

• Monumental moment in AI history for non-technical users

Visual: Example of inputting a video prompt into Gemini 1.5

Technical Achievements with Gemini 1.5

• Major advancement for Google in AI technology

• Rapid progression of machine learning capabilities

• Potential to transform daily lives with AI assistance

Visual: Technical diagram showcasing the architecture of Gemini 1.5

Accessibility of Large Language Models (LLMs)

• Transformation for non-technical users and technical experts

• Expediting tasks like code testing and content creation

• Broadening accessibility to machine learning capabilities

Visual: Infographic on the impact of LLMs on various industries

Future Trends in AI Technology

• Likely next modalities such as audio and tools for building things

• Opportunities for creativity and content generation

• Empowering creators and artists with AI tools

Visual: Concept art depicting future AI applications

Soroken's Vision for Generative AI

• Emphasis on zero to one innovation and developer tools

• Safety measures to evaluate model input and output

• Democratizing effect of generative AI on various applications

Visual: Soroken discussing generative AI applications

Embracing the Future of AI Technology

• Transformational impact of Gemini 1.5 on AI history

• Opportunities for innovation and creativity across industries

• Reminder to explore the potential of generative AI tools

Key Points

Google introduced Gemini 1.5, a significant milestone in AI history, featuring a 1,000,000-token context window for processing large amounts of information.
OpenAI launched Sora, a text-to-video model, on the same day, marking another major advancement in AI technology.
The jump from a 32,000-token to a 1,000,000-token context window in Gemini 1.5 represents a significant technical achievement for Google.
The increased accessibility of large language models (LLMs) like Gemini 1.5 opens up new possibilities for end users and technical experts.
Generative AI has the potential to revolutionize industries and technologies, democratizing access to advanced tools and thought partners.
Arthur Soroken emphasizes the transformative nature of generative AI and the importance of safety measures in evaluating model input and output.

Summaries

18 word summary

Google's Arthur Soroken introduces Gemini 1.5 with a 1,000,000-token context window, surpassing previous limits and revolutionizing AI capabilities.

54 word summary

Google's Arthur Soroken unveiled Gemini 1.5, featuring a 1,000,000-token context window for processing vast amounts of information, up from Google's previous 32,000 tokens. It launched the same day as OpenAI's Sora. The expanded window makes tasks like summarization and translation more accessible to non-technical users. This launch marks a significant moment in AI history.

139 word summary

Google's Arthur Soroken introduced Gemini 1.5, a milestone in AI with a 1,000,000-token context window for processing large amounts of information. This surpasses Google's previous 32,000-token window and coincided with OpenAI's Sora launch. The expanded context window has broad implications, allowing non-technical users to input books or videos for tasks like summarization and translation. It also signifies AI's increasing accessibility and rapid progression. Large language models (LLMs) like Gemini 1.5 have the potential to transform daily life by assisting with tasks and content generation. This accessibility benefits both non-technical users and technical experts, who can leverage LLMs to enhance productivity. Looking ahead, AI advancements may include new modalities like audio and app-building, offering opportunities for creativity and content generation. Overall, Gemini 1.5's launch marks a significant moment in AI history, opening up new practical applications and transforming daily life.

401 word summary

Arthur Soroken, a Google representative, recently introduced Gemini 1.5, a significant milestone in AI history. This new release features a 1,000,000-token context window, allowing for input of large amounts of information such as video, audio, code, and text. The increased capacity of the context window opens up numerous possibilities for practical use and is a significant technical achievement for Google. OpenAI recently launched Sora, a high-quality text-to-video model that can generate one-minute videos based on a prompt. Additionally, Google's previous launch of a 32,000-token context window was surpassed by the 1,000,000-token context window in Gemini 1.5. These developments mark a substantial shift in the capabilities of AI models and have significant implications for both end users and technical experts.

The increased capacity of the context window in Gemini 1.5 has far-reaching implications for both end users and technical experts. From an end user perspective, the ability to input large amounts of information in a single prompt opens up numerous possibilities for practical applications. For example, users can now upload entire books or videos and request tasks such as summarization, categorization, translations, and more. This significantly expands the range of use cases that are accessible to non-technical users. From a technical perspective, the advancement from a 32,000-token to a 1,000,000-token context window represents a major achievement for Google. It demonstrates the rapid progression of AI technology and the increasing accessibility of machine learning capabilities to a broader audience.

The increased accessibility of large language models (LLMs) such as Gemini 1.5 has the potential to transform daily lives by providing assistance and co-piloting capabilities. This accessibility allows for a wide range of applications that were previously inaccessible to non-technical users. The ability to prompt LLMs with natural language text opens up possibilities for tasks such as summarization, categorization, and content generation. The increased accessibility of LLMs also has implications for technical experts who can now leverage these capabilities to enhance productivity and streamline processes.

Looking ahead, the future of AI is likely to involve further advancements in modalities such as audio and app-building, as well as new opportunities for creativity and content generation. The increasing accessibility of LLMs has the potential to empower creators and artists to augment their work using AI tools. In conclusion, the launch of Gemini 1.5 with its 1,000,000-token context window represents a monumental moment in AI history, opening up new possibilities for practical applications and transforming daily lives.

632 word summary

The launch of Gemini 1.5 comes at a time when other significant developments in AI are also taking place. OpenAI recently launched Sora, a high-quality text-to-video model that can generate one-minute videos based on a prompt. Additionally, Google's previous launch of a 32,000-token context window was surpassed by the 1,000,000-token context window in Gemini 1.5. These developments mark a substantial shift in the capabilities of AI models and have significant implications for both end users and technical experts.

From a technical perspective, the advancement from a 32,000-token to a 1,000,000-token context window represents a major achievement for Google. It demonstrates the rapid progression of AI technology and the increasing accessibility of machine learning capabilities to a broader audience. This shift in the capabilities of AI models has the potential to transform daily lives by providing assistance, co-piloting capabilities, and accessibility to a wide range of applications.

Arthur Soroken discusses the release of Gemini 1.5 and its focus on providing tools for building innovative applications through its low-code, no-code tool called Google AI Studio. His team at Google is focused on zero-to-one innovation and building tools for developers, rather than competing directly with OpenAI. He also discusses the safety measures in place to ensure that the input and output of the models are thoroughly evaluated for explicit or inappropriate content.

Soroken expresses his belief that generative AI has the potential to revolutionize various industries and technologies. He emphasizes the democratizing effect of generative AI, allowing more people to access advanced tools and thought partners for various applications.

In conclusion, Soroken emphasizes the need for continued investment in safety measures for generative AI and encourages users to provide feedback on inappropriate content. He highlights the potential for generative AI to advance and augment technology across various industries.

986 word summary

Arthur Soroken, a Google representative, recently introduced Gemini 1.5, a significant milestone in AI history. This new release features a 1,000,000-token context window, which allows for input of large amounts of information such as video, audio, code, and text. This is a monumental advancement from previous context windows, which were limited to much smaller amounts of information. The increased capacity of the context window opens up numerous possibilities for practical use and is a significant technical achievement for Google.

The launch of Gemini 1.5 comes at a time when other significant developments in AI are also taking place. OpenAI recently launched Sora, a high-quality text-to-video model that can generate one-minute videos based on a prompt. Soroken described it as the first high-quality text-to-video model, a major advancement in the field of AI. Additionally, Google's previous launch of a 32,000-token context window was surpassed by the 1,000,000-token context window in Gemini 1.5. These developments mark a substantial shift in the capabilities of AI models and have significant implications for both end users and technical experts.

The increased capacity of the context window in Gemini 1.5 has far-reaching implications for both end users and technical experts. From an end user perspective, the ability to input large amounts of information in a single prompt opens up numerous possibilities for practical applications. For example, users can now upload entire books or videos and request tasks such as summarization, categorization, translations, and more. This represents a monumental moment in AI history as it significantly expands the range of use cases that are accessible to non-technical users.

The increased accessibility of large language models (LLMs) such as Gemini 1.5 has the potential to transform daily lives by providing assistance and co-piloting capabilities. This accessibility allows for a wide range of applications that were previously inaccessible to non-technical users. The ability to prompt LLMs with natural language text opens up possibilities for tasks such as summarization, categorization, and content generation. This represents a significant shift in the accessibility of machine learning capabilities to a broader audience.

The increased accessibility of LLMs also has implications for technical experts who can now leverage these capabilities to enhance productivity and streamline processes. Tasks such as code testing, error identification, and content creation can be significantly expedited using LLMs with large context windows. This represents a major advancement in the field of AI and has the potential to spur a new wave of innovation across various industries.

Looking ahead, the future of AI is likely to involve further advancements in modalities such as audio and app-building, as well as new opportunities for creativity and content generation. The increasing accessibility of LLMs has the potential to empower creators and artists to augment their work using AI tools. This represents an exciting frontier for the future of AI and has the potential to revolutionize the creative process.

In conclusion, the launch of Gemini 1.5 with its 1,000,000-token context window represents a monumental moment in AI history. It has far-reaching implications for both end users and technical experts, opening up new possibilities for practical applications and transforming daily lives. As AI technology continues to advance, the future holds exciting opportunities for further innovation and creativity across various industries.

Arthur Soroken, a tech professional with a background in engineering, emphasizes the significance of the current moment in technological history. He highlights the transformative nature of generative AI and urges people to educate themselves on its potential. Unlike previous incremental improvements, this represents a monumental leap that should not be overlooked. He encourages people to play with the tools, create new content, and explore the potential for productivity and education using these technologies.

Soroken discusses the release of Gemini 1.5, which has been made available to around 900 people. He compares it to ChatGPT 4 and emphasizes the differences in functionality and use cases. While ChatGPT focuses on the chat interface, Gemini 1.5 is more focused on providing tools for building innovative applications, particularly through its low-code, no-code tool called Google AI Studio.

He explains that his team at Google is focused on zero to one innovation and building tools for developers, rather than competing directly with OpenAI. He also discusses the safety measures in place to ensure that the input and output of the models are thoroughly evaluated for explicit or inappropriate content.

Soroken expresses his belief that generative AI has the potential to revolutionize various industries and technologies. He discusses its potential applications in reducing carbon footprint, advancing technological innovations, and augmenting existing technologies. He emphasizes the democratizing effect of generative AI, allowing more people to access advanced tools and thought partners for various applications.

Overall, Soroken's insights shed light on the transformative potential of generative AI and the importance of understanding and utilizing these tools for innovation and advancement in various fields.

Audio transcript

Raw indexed text (39,750 chars / 7,536 words)

[00:00:06 - 00:00:28]

Speaker 0: Alright. Welcome, everybody. I know last time we saw each other was over Mardi Gras, and I was sitting on Friday night trying to think about what the hell am I gonna write on Sunday, after 10 days of Mardi Gras and fun. And I don't wanna talk about Mardi Gras again. And at 7:39 on Friday, I got this email from Arthur.

[00:00:28 - 00:00:54]

Speaker 0: He goes, hey, Tim. My team at Google just launched a 1,000,000 context window and video input, text output, which I have no idea what he's talking about. I'd love to share this with the new society community. This would be early access since it's behind the wait list. And even though we made a public announcement, this is a pretty big monumental time in AI history.

[00:00:54 - 00:01:29]

Speaker 0: And so, with that, we are so excited to have Arthur here who's gonna help bring us through the new release of Gemini 1.5. A little bit about Arthur. Arthur is from the Bay Area, but is a New Orleans resident. Welcome to New Orleans. He's been working in high-tech the last 9 years, over 9 years with Sun Microsystems, Cisco Systems, with, Sonza, which was acquired by Google, which is now, he's at Google.

[00:01:30 - 00:01:48]

Speaker 0: He also managed Google's in-house incubator program for a while. He is now the lead of growth and community for AI, Bard Gemini, at Google for Google Labs. And most importantly, the most important part of this, he is a board member with the Rolling Elvi. So

[00:01:48 - 00:01:51]

Speaker 1: right? Yes. Right here. Good.

[00:01:52 - 00:02:04]

Speaker 0: Okay. Just to kick this off, one, Arthur, welcome. Thank you for being here. And thank you for sharing this with the group. So why is this a pretty big and monumental time in AI history?

[00:02:04 - 00:02:12]

Speaker 1: Awesome. So I will answer that, but I do wanna say I am missing an Elvi board meeting right now for this, just to be super clear.

[00:02:12 - 00:02:14]

Speaker 0: Don't tell Scott. Right?

[00:02:14 - 00:02:29]

Speaker 1: Oh, god. Yeah. Don't tell Scott. So on Thursday, two very big things happened: one of them Google wasn't aware was happening, and one Google was part of. And the two things that happened, does anyone know?

[00:02:29 - 00:02:36]

Speaker 1: Anyone wanna shout out one of them? Yeah? Sora launched. Does anyone know what Sora is? Yeah.

[00:02:36 - 00:02:47]

Speaker 1: Okay. Fantastic. So OpenAI launched a new modality, which is text to video. Incredible. I think it does one-minute videos.

[00:02:47 - 00:03:01]

Speaker 1: You can enter a prompt and get a very high quality video. I mean, the physics were amazing in the video. Very monumental. This is the first high-quality text-to-video modality. So this happened.

[00:03:02 - 00:03:19]

Speaker 1: But right before that launch, Google launched a 1,000,000 context window. What is a 1,000,000 context window? Well, to date, most models handle small tokens. And so, small tokens: what is a token? Let's start with that first.

[00:03:19 - 00:04:09]

Speaker 1: The token is a piece of information, so that could be a word, that could be an image, that could be a video, and those tokens limit how much you can send to the model. There's cost implications, there's compute implications, etcetera, and so tokenization is the currency, essentially, for dealing with an LLM. To date, Google had a 32,000 context window, and think about a token as about 3 to 4 words. So you could add 3 to 4 words, you know, by 32,000. You can input that into the prompt as you're trying to converse with the LLM.

[00:04:10 - 00:04:27]

Speaker 1: The next big monumental moment that happened many, many, many months ago was Claude. Does anyone know what Claude's launch was? Cool. It was about 200... did someone say it? 200,000, which was very monumental at the time many, many months ago, which is a lot.

[00:04:27 - 00:04:50]

Speaker 1: It's a lot of tokens, mostly text at that point, so there's a lot of text that you could prompt in. But then Google came and added a 1,000,000 context window. So that still doesn't say, like, why this is important, but you can at least get an idea of how big this is. You can add lots of information in a single one-shot prompt. When I say one-shot prompt, it's literally saying, hey, LLM.

[00:04:50 - 00:05:05]

Speaker 1: Hey, large language model. Hey, Gemini. I would love for you to make my, give me an itinerary for a week, for a vacation. I'm going to Austin, which is happening coming up here soon. I'm going on vacation to Austin.

[00:05:06 - 00:05:18]

Speaker 1: Give me an itinerary. You could send that. That's not that many tokens. That's not that many words, and you could do that. And that's all you could do for a while, but now with a 1,000,000 context window, what can you do?

[00:05:18 - 00:05:40]

Speaker 1: You could put, and if you guys can see over there, you could do 1 hour of video. You could upload an entire video, a 1 hour video, and that would count under the 1,000,000 context window. You could do 11 hours of audio. You could do greater than 30,000 lines of code. You could also do 700,000 words.

[00:05:41 - 00:05:59]

Speaker 1: That's still, I mean, you get an idea of, like, wow, that sounds like a lot, but imagine you went and took 3 books. You were a student, and you went and you took 3 to 4 books. You uploaded them, and you said, create me a quiz. I have a test tomorrow. Create me a bunch of flashcards.

[00:05:59 - 00:06:29]

Speaker 1: You couldn't do that before. You could only do single small 32,000 text input. Now you can actually send entire books, entire videos, etcetera, and you can get summarization, categorization, translations, etcetera. This is a very monumental moment. It's monumental from the number of use cases for any practical person, like, any end user, you have so much you can do, so you don't need to be a technical person to get the value of this.

[00:06:29 - 00:06:51]

Speaker 1: From a technical perspective, this is also very monumental because we went from, in just a year, 32,000 to 200,000 to 1,000,000. It was, (a), a flex, but, (b), it was just a very big technical monumental moment for Google. So stop there.
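
The capacity figures quoted in this part of the talk can be turned into a rough budget check. The sketch below is illustrative only: the per-unit rates are assumptions back-derived from the numbers Soroken mentions (1 hour of video, 11 hours of audio, or 700,000 words each filling roughly 1,000,000 tokens), and the names `CONTEXT_WINDOW`, `estimate_tokens`, and `fits` are hypothetical helpers, not part of any Google API. Real tokenizers count differently, so treat the result as an order-of-magnitude estimate.

```python
# Rough context-window budget check.
# All rates are assumptions back-derived from figures quoted in the talk,
# not real tokenizer output.

CONTEXT_WINDOW = 1_000_000  # tokens, the Gemini 1.5 window discussed in the talk

# Assumed tokens per unit: each quoted quantity is taken to fill the whole window.
TOKENS_PER_VIDEO_SECOND = CONTEXT_WINDOW / (1 * 3600)   # "1 hour of video"
TOKENS_PER_AUDIO_SECOND = CONTEXT_WINDOW / (11 * 3600)  # "11 hours of audio"
TOKENS_PER_WORD = CONTEXT_WINDOW / 700_000              # "700,000 words"


def estimate_tokens(words=0, video_seconds=0, audio_seconds=0):
    """Return an approximate token count for a mixed text/video/audio prompt."""
    total = (
        words * TOKENS_PER_WORD
        + video_seconds * TOKENS_PER_VIDEO_SECOND
        + audio_seconds * TOKENS_PER_AUDIO_SECOND
    )
    return round(total)


def fits(tokens, window=CONTEXT_WINDOW):
    """True if the estimated prompt fits inside the given context window."""
    return tokens <= window


if __name__ == "__main__":
    # Assumed workload: three books of about 100,000 words each,
    # plus a 20-word instruction such as "create me a quiz".
    three_books = estimate_tokens(words=3 * 100_000 + 20)
    print("estimated tokens:", three_books)
    print("fits in 1,000,000 window:", fits(three_books))
    print("fits in 32,000 window:", fits(three_books, window=32_000))
```

Under these assumptions the three-book prompt sits comfortably inside the 1,000,000 window and far outside a 32,000 one, which is the contrast the student example in the talk is making.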

[00:06:52 - 00:07:09]

Speaker 0: I got a lot of questions, but, so stepping back. So as a human being using this thing, how is this gonna transform our daily lives? What could I use this for? Mhmm. What are some applications that you see that we're gonna realize, oh my gosh.

[00:07:09 - 00:07:10]

Speaker 0: I never knew I couldn't do this before?

[00:07:13 - 00:07:49]

Speaker 1: So I'll say this first before I get into specific applications. What I'll say first is that when you start to think about large language models as an assistant, as a copilot, as an aid. All these words mean the same thing, but as an aid, it's really a tool for you and whatever you're trying to do. So the applications, they're endless. I know you guys have to look over there to see what I'm saying, and so I just give a few examples here, but I'll say this in words as well.

[00:07:52 - 00:08:23]

Speaker 1: And I said this the last time I came here. Machine learning has been around for a very long time, but this is the first time, in the last few years, where machine learning has become accessible to anyone. You don't have to be a technical person creating transformers, going in, creating classifiers, all these things. Now you can just prompt.

[00:08:23 - 00:08:44]

Speaker 1: You can send text, natural language text, to a large language model and say, do this thing for me. Give me this thing. Summarize this thing. Categorize this thing. Come up with the top 100 tweets that I should copy and paste here just based off this image so that I can copy and paste it right into my X account.

[00:08:44 - 00:09:04]

Speaker 1: When you think about it as a copilot and the accessibility of it, it's incredible. We've taken something so deeply technical and put it in everyone's pocket. We've given everyone the ability to do this. How many people in here use ChatGPT? Awesome.

[00:09:04 - 00:09:22]

Speaker 1: How many people in here are technical? How crazy was that? Did you guys take a look at that? So, like, everyone's used ChatGPT, but not many technical people have used it. If you had talked about machine learning 10 years ago, 5 years ago, I think everyone would have said, probably, (a), what's ChatGPT?

[00:09:22 - 00:09:30]

Speaker 1: But even if it did exist, you know, people would say, machine learning? No. No. I can't. That's way out of my reach.

[00:09:30 - 00:09:59]

Speaker 1: So I'll just say, first of all, it's accessible to everyone. So that's a huge win. When we talk about applications, I'll give a few interesting ones that I'm seeing people use, just based off of this launch, and they fall under a few categories: productivity, fun, and copilot. They're all somewhat similar, but I think there's those 3 categories.

[00:09:59 - 00:10:16]

Speaker 1: So let's start with copilot. From a copilot perspective, let's take the technical perspective just for a second, and I think you guys will see value in it if you're not a technical person. Writing code is hard. It's hard for anyone. It's hard for developers.

[00:10:16 - 00:10:39]

Speaker 1: It's hard to write good code. It's hard to test your code. It takes time. We now live in a world with a 1,000,000 context window where you could take your entire code base, you could upload it, and you could say, write me a bunch of tests to verify that this will work. You could do that.

[00:10:39 - 00:10:56]

Speaker 1: Before, you would have to go through every function and method, and you would have to write unit tests, and you would have to... you could just ask the LLM to do it for you. So this is one. Quite technical. Number 2, I have an error. Find all the errors in my code base and give me suggestions on how to fix it.

[00:10:56 - 00:11:11]

Speaker 1: I mean, when you think about that perspective, I probably would have passed my classes way faster in CS, but it's incredible. I mean, you can translate this to any field. Right? Write me an essay. Here's the book, upload the book.

[00:11:11 - 00:11:24]

Speaker 1: Write me an essay about this book and do it in my voice. You could do that. Whereas, before, you couldn't. So copilot. Number 2, I'm seeing fun.

[00:11:24 - 00:11:55]

Speaker 1: So I'm seeing a lot of people who, and I'll demo this later, but I'm seeing a lot of people who love the ability to find the quirky and nicheness of certain things. So I'll give you an example. People are uploading videos and saying, summarize this... let's start with the basics. Summarize this video for me. Just huge, huge video, takes, you know, 500,000 tokens.

[00:11:55 - 00:12:14]

Speaker 1: It's a 5 to 10 minute video. Summarize this for me. I don't even need to watch it. That's a total bummer because watching movies is great, but summarize this for me. Next, I want you to find that moment in this video where somebody handed somebody a gift, and this is a 10 minute video.

[00:12:14 - 00:12:28]

Speaker 1: Just, it's a gift, anything. This moment where someone entered the door, the actor that was speaking with this other actor at this point in the video. The LLM can do this. This is incredible. This is niche.

[00:12:28 - 00:12:54]

Speaker 1: It's fun. Productivity. I'm seeing probably the most use cases where I think I start to get excited about the world and the power of AI when I think about productivity. I'm seeing people use this for, I want you to help me create. I don't have the funds...

[00:12:54 - 00:13:14]

Speaker 1: I'm a small startup. I don't have the funds to create a marketing department. I don't have the funds to create a social department. I want to use the LLM to craft me a bunch of blog posts, X posts, social posts, and, also, what is something that will go viral? What are the steps to do that?

[00:13:14 - 00:13:34]

Speaker 1: And this is incredible. This is incredible. When you think about it without the productivity hack or the, I'm a small startup, I'm not good in this field. When you start to think about this, this is something that's a companion with you and it can really increase productivity.

[00:13:35 - 00:13:55]

Speaker 1: Now I hope that we increase productivity and then we all spend more time with our family and our friends and our loved ones. We'll probably just end up working more and more. But I do love the fact that we are creating tools and people are seeing this and are rounding out their knowledge and taking things that were once difficult fields and democratizing them.

[00:13:56 - 00:14:20]

Speaker 0: So a couple of things you said that sparked a question of this could spur a whole new wave of innovation. This is gonna open up like the Internet did in others. If I'm an entrepreneur, in your mind, where's the puck going? Where would be some remarkable spaces that's next, that is gonna get to us fast and I could be there first?

[00:14:20 - 00:14:36]

Speaker 1: I love this question because I see this as an arc. I get asked this question a lot, and I think people jump to robots are taking over, AI's taking over the world. Robots are taking over the world. The world is over. People see that and, like, oh, that's the next 10 years.

[00:14:36 - 00:14:53]

Speaker 1: That's where we're going. But I actually see this as an arc. And anyone who's not seeing it as an arc, I get kind of scared because things are happening daily. I mean, the technological advancements are just happening so fast that to think too far out is almost missing. We're hallucinating.

[00:14:54 - 00:15:26]

Speaker 1: Just like our LLMs, we're hallucinating a bit here. So I will say that there are a few frontiers that I think, if I were giving ideas of what to focus on, I think I would start with very technical. I think what you're seeing now more than anything is that chips, resources, and when I say resources, I mean TPUs, GPUs, the processing power that runs these LLMs, are very expensive. There's also limited supply.

[00:15:26 - 00:15:57]

Speaker 1: And so when I think about opportunities, I think from a very technical perspective, someone needs to solve this. Using an LLM, there are only a few big players in the space that can build them and can offer them, because you need lots of infrastructure and you need a lot of money to purchase this. And, again, supply chains are quite limited. So I think this is an immediate next step future thing. If I take it out of the less technical, I start to think about a world where... well, let's actually go to the next part of the arc.

[00:15:57 - 00:16:24]

Speaker 1: The next part of the arc would be more modalities. So today, even with this Gemini 1.5 launch and what you've seen with Sora and what you've seen in Adobe's Firefly, you're starting to see this modality of: we went from text, we now have text to image, and, you know, you send an image and you get text output. We have video as a modality. What do you guys think is naturally next? Oh, give me something.

[00:16:25 - 00:16:30]

Speaker 1: Smell. Smell would be amazing. Wow. A totally new thing. Now we're hallucinating again.

[00:16:30 - 00:16:37]

Speaker 1: I love that. I want what he's drinking. So I would say audio is probably a natural modality. That's next. What was that?

[00:16:38 - 00:17:00]

Speaker 1: Oh, creating. You can imagine another modality of building and creating things. I want to create x. I want a social platform that does y. I need a personality that does z. So I think these are natural modalities that will come next.

[00:17:00 - 00:17:31]

Speaker 1: It's a natural arc. I don't think there's any crazy surprises by this. The third thing I would say is, and I don't think everyone's on board with this yet, but I start to imagine a world where the power of LLMs is... LLMs, large language models, are the best autocomplete in the world. They're trained on natural language. They're incredible at processing lots of natural language, making connections, and outputting content.

[00:17:33 - 00:18:01]

Speaker 1: When I start to think about that, that it's one of the best autocompletes in the world, I start to think about what happens to creativity? What happens to new content? What happens to new innovations? What happens when we get to this world where it's just trained on all previous content that exists in the world? So I actually, and I said this last time, and I firmly believe this, but I think this is probably the longest-term future, is I start to think about creatives.

[00:18:01 - 00:18:14]

Speaker 1: I think about the creative people. I think about artists. I think about people who are generating unique and creative content, and I start to get really excited. And, by the way, none of that's technical. Right?

[00:18:14 - 00:18:30]

Speaker 1: I'm not talking about technical creators and artists. I'm talking about artists in general. These are the people who create delight in the world that we love to consume. As New Orleanians, we love to go watch Mardi Gras parades. We love our jazz bands.

[00:18:31 - 00:18:54]

Speaker 1: We love Big Freedia. Like, we love these people who create amazing moments for us that the LLMs can't give us. They're giving us that now because there's shock value in everything that's happening. But I imagine a world where, and you're actually seeing this too, creators are at the forefront of creating and generating new content, as they will always do. It's not a new thing for a creator.

[00:18:54 - 00:19:32]

Speaker 1: They will always do that. But now they will be using this Copilot or this tool, this LLM, as a tool to help them augment, or maybe they become the, I'm sure you guys have heard this term, prompt engineers, for example. But imagine that they are the new era of creators, having these tools and using these tools to augment what they do and generate new content for all of us to still benefit from. And you're actually seeing this. I think about a year ago or less than a year ago, although this got pulled very quickly, you saw a music artist.

[00:19:32 - 00:19:58]

Speaker 1: I forget what label it was from, but they had a complete AI artist. And it was generating content, and it wasn't good. It was very early days. But I look at that, and I'm, like, well, that's interesting. That's someone thinking in this space of how do I take an artist or how do I take a persona from the arts and use the LLM to power it.

[00:19:58 - 00:20:00]

Speaker 1: I do see a future in that.

[00:20:01 - 00:20:27]

Speaker 0: So if the people who create the amazing moments are creators, and New Orleans has a lot of these creators, and if they were all in this room and said, guys, Skouros, people, what would be your message to them now for New Orleans to be positioned for the future? Maybe we have a shot of being in the forefront. What would be your message to creators now?

[00:20:28 - 00:20:44]

Speaker 1: It would be get rid of all your Apple devices. Use Android only. I mean, I think this one's the simplest one. I believe this. When I said a monumental moment, I get really tingly.

[00:20:44 - 00:21:02]

Speaker 1: I've been in tech for quite some time. There's very few moments where I'm watching change happen live and that change feels big. It feels really big. It feels so big. I mean, as somebody who, you know, I went to undergraduate and studied engineering.

[00:21:02 - 00:21:23]

Speaker 1: I've been doing engineering for quite some time. I'm watching a moment going, oh, man. If I don't hold on to this, I will lose out. Like, this is more powerful than anything I've seen. And so I say that unlike other times where you're watching incremental improvements and innovations, this is a huge leap.

[00:21:23 - 00:21:41]

Speaker 1: And I would say, unlike crypto or other things, I would say that this one, you don't want to have this pass you by. And I would say get educated. And I wouldn't say this if we were talking machine learning. I wouldn't say everyone go get a CS degree. This is a totally different world.

[00:21:41 - 00:21:54]

Speaker 1: What I'm actually saying is play with these tools. I asked how many people are playing with ChatGPT, and everyone in here is. That's the first step. I think the next step is how do you create and generate new content? How do you become a creator yourself?

[00:21:54 - 00:22:22]

Speaker 1: How do you use these tools to create productivity? How do you educate others to use these tools? Do not let this moment pass you by because this will be here for quite some time. I'll probably eat my words a year from now, like VR and AR, but well, actually, Vision Pro is out, right, so everyone's excited again. But I would say this is a monumental time to educate yourself on generative AI in general.

[00:22:22 - 00:22:40]

Speaker 0: I think you mentioned earlier there's 900 people that have been allowed off the waiting list. And so far, what in your mind do you see as the differences, or the advantages and disadvantages, of Gemini 1.5 versus ChatGPT 4? How do you see it?

[00:22:41 - 00:22:48]

Speaker 1: So I love this. Hopefully, I can't wait till this report I do that. I love all of it. Woof. So let's hit that one.

[00:22:48 - 00:23:08]

Speaker 1: So I'll hit the first thing. This moment, talk about a monumental moment in just technological history. This is also a monumental moment right here, right now because it's exactly what you said. We announced this publicly. The number of people who have this is around 900 people today.

[00:23:08 - 00:23:25]

Speaker 1: It's announced publicly. Like, the world's talking about this. How many people are on X? How many people are watching the Gemini 1.5 amazingness happening all over X? How many are watching this or seeing it? Only 900 people in the world have this thing and are playing with it.

[00:23:25 - 00:23:43]

Speaker 1: That's incredible. So this is just for you guys, this is a monumental moment that we're just able to sit here and talk about this. The world's talking about it, but not many people have it. We'll change that later today. I think that remind me of your question.

[00:23:43 - 00:23:43]

Speaker 1: Yeah.

[00:23:43 - 00:23:44]

Speaker 0: ChatGPT.

[00:23:44 - 00:23:45]

Speaker 1: Oh, yes. Okay. Yeah. Sorry.

[00:23:45 - 00:24:14]

Speaker 1: So what ChatGPT has done really well is ChatGPT has created a brand awareness around their chat interface, this web tool. How many people use the ChatGPT or OpenAI APIs? How many people use it? Okay. Cool.

[00:24:14 - 00:24:21]

Speaker 1: So there's, like, 1 or 2 people, a few people. How many people are using the chat app? Okay. Perfect. Most people.

[00:24:21 - 00:24:45]

Speaker 1: So they've done a very good job of leaning into the chat interface for them. On our side of the house, if you wanna call it a competitor, Bard is our competitor. It's a little bit different in how it's handled, but it's a chat interface and engagement. So has anyone used Bard, actually? Oh, wow.

[00:24:45 - 00:25:04]

Speaker 1: Okay. Great. So slightly fewer hands. So that's a competitor. On my side of the house, so on the Gemini API and Google AI Studio, my side of the house is more focused on giving people the tools to build amazing things.

[00:25:04 - 00:25:51]

Speaker 1: We just so happen to have a low code, no code tool called Google AI Studio, formerly called MakerSuite. I think I've talked about MakerSuite in previous sessions. And so I think when you start to compare OpenAI to Google, and where do we see the comparisons, I think you would start to make a comparison of OpenAI's APIs and Gemini's APIs, but I think that the functionality is a bit different, and I think the use cases are tailored a bit differently. On my side of the house, we're focused on zero-to-one innovation. So we're kind of a startup within Google, I guess you could say, called Labs.

[00:25:52 - 00:26:45]

Speaker 1: On the cloud side of our house, it's more focused on the enterprise, etcetera. And so when you start to think about that, on Labs, although we're focused on scale and getting adoption and all those great things, they all pay the bills, we're actually really focused on the innovation side of things. And this is why you see things like the 1,000,000 context window coming out. And so I think we don't compare ourselves on the Labs side of the house with OpenAI much. If anything, we root them on because we love just technical innovation, but we're focused more on the innovation, less on the traditional cloud side of the house, which is looking towards closing enterprise, creating, you know, amazing reliability, availability, support, all those great things, sales teams, etcetera.

[00:26:46 - 00:27:04]

Speaker 1: My team does launch an API. I'm on the product team. Me and a few other people are essentially the sales team, but we're on the product team. With support as well, I mean, it's a tiny team. So I think about them very, very differently.

[00:27:04 - 00:27:09]

Speaker 1: Think about us as, like, an innovation arm, and we don't really see ourselves competing with OpenAI.

[00:27:10 - 00:27:18]

Speaker 0: Yeah. Here's my final question before we open it up to questions. You guys get ready. A nine-inning game. If this is a nine-inning game, where are we?

[00:27:18 - 00:27:20]

Speaker 1: Oh, god. We haven't even got started yet.

[00:27:22 - 00:27:31]

Speaker 0: Yeah. What do you see? I mean, if, you know, like the microprocessor, is there gonna be, one day, a trillion? I mean, you know, where are we in this? Where is this going in your mind?

[00:27:35 - 00:28:12]

Speaker 1: I think, like VR, AR, crypto, all of these technologies, I think we're all searching for the killer app. And so I don't know if we've created the killer app, and I don't even know if that would be our focus on Labs, on my team. I think our interest is, how do we create the tools for people to build? And so when I say we're really early on, it's less on, again, the what comes next. You could think about modalities and more copiloting and agents.

[00:28:12 - 00:28:41]

Speaker 1: You can imagine, like, even if I take it outside the Labs side of the house, you could imagine, I mean, how many people use Gmail? Great. How many people use Google Search? Fantastic. All of these surface areas are just gonna slowly, I mean, already, most of them have gotten some AI element, but you're just gonna start to see Google's surface areas that have tons of users that rely on this every day start to pull these generative AI advancements into the technology.

[00:28:41 - 00:29:13]

Speaker 1: So I think, from what you know today about what exists today, we still have a lot of surface areas we haven't even gone into yet and that we haven't launched yet. And so I think we have still a lot of room just in that. As for what happens next in the future, I actually think it depends on you guys. And I say that because, again, if you remember, my mission statement, my team's mission statement is to build developer tools for all of you. Developer tools.

[00:29:13 - 00:29:43]

Speaker 1: When I say developer tools, I mean technical and nontechnical. Our goal is to build tools for you. We probably won't get this working, but you guys can come up after this, and I can show you the tool we have, which is a low code, no code tool that you can actually start to experiment in. But you're gonna see it's pretty raw. I mean, the goal of these tools that we build is all about giving you the power to tell us what's next.

[00:29:43 - 00:30:01]

Speaker 1: And so we're pretty close to the customer. Again, we have no sales team. It's me and a few other people who are meeting with thousands of developers, individuals, companies, sitting down with them saying, like, what's working? What's not? What's next?

[00:30:04 - 00:30:23]

Speaker 1: And I think that you guys dictate what's next for us. But, yeah, I think we still have a long way to go in just getting generative AI, and what exists today, into just our surface areas. And then, hopefully, we figure out what's next or, oof, that'll get rough. So are you guys familiar with PaLM and MakerSuite? Okay.

[00:30:23 - 00:30:33]

Speaker 1: I know you are. Okay. So PaLM was just the predecessor to Gemini. It was our state-of-the-art model last year, I guess, at this point. Jeez.

[00:30:33 - 00:31:09]

Speaker 1: And MakerSuite was our low code, no code tool at that point. So we usually launch a low code, no code tool and the APIs together, and we give both different types of users access. People who are super technical, or not super technical, but people who feel comfortable using an API, would use the APIs. And people who do not feel comfortable could use the low code, no code tool. I think we had an assumption we were going after developers, like the people who were trying to build infrastructure and trying to build and launch production apps for their companies or for themselves.

[00:31:10 - 00:31:34]

Speaker 1: I think we were going after that. I think that was pretty clear. It was the stance we were taking. But when Gemini launched, Gemini Pro and then Gemini 1.5, I think we relooked at it and just said, man, this modality, video modality, file upload modality, this is crazy. Like, this is something that lots of people wanna touch and experiment with, and I think we're open now.

[00:31:34 - 00:32:11]

Speaker 1: Maybe we have a new user, and I think we're still exploring. But our goal is always give people the tools, and let's see what happens with it. One of the biggest issues we have in generative AI in this space is that OpenAI has done such a great job of setting the standard and creating a brand around, again, I keep saying this, but this chat interface. So when people think about OpenAI, they think about ChatGPT. Actually, how many people know ChatGPT?

[00:32:11 - 00:32:25]

Speaker 1: How many people knew before I just said this that ChatGPT was owned by OpenAI? Okay. Cool. So you do have an association with it. But what's incredible is that when people think about OpenAI, they think about ChatGPT immediately, like, directly.

[00:32:25 - 00:32:43]

Speaker 1: They don't think about DALL-E. They don't think about the API. They don't even think about Sora until, I mean, maybe now because it's so public. But I say that to say that what's really hard in this industry now is that everything is compared to it. And that's a great thing for OpenAI.

[00:32:44 - 00:33:09]

Speaker 1: It becomes a little difficult at other companies. Yeah, Google is building Bard, also a chat companion or a chat interface. And, I mean, Bard has incredible power. The quality is great. But there's so many things that Google is offering, like, on my side of the house, the tools, the API.

[00:33:09 - 00:33:51]

Speaker 1: When I say tools, I mean developer tools, like, the programming language, the client libraries, the API, the access to the LLM, so that you could write literally a few lines of code and have this type of functionality right out of the box, I mean, pretty easily. You need to write a few lines of code. Although, I could show you from the low code, no code tool how to go from building this thing to exporting code, and copy and paste and do it. So I think the first thing I'll say is that today my team is not building extensions. But Bard, on the other hand, is playing in this space and doing this, but my side of the house is, like, build the tools for you.

[00:33:52 - 00:34:33]

Speaker 1: So I think when it comes to extensions and building productivity tools with the GPTs, the agents, all this stuff, that is a special secret sauce of OpenAI, and I think they've done a great job. Will something similar to ChatGPT ever exist in Google, and will this be in the surface areas that I use every day? Again, I don't have the road map of all the product areas at Google, but this functionality, you have to imagine pretty soon, and it's already happening to some degree. Like, anyone use Google Docs and the generative AI features in Google Docs? Okay.

[00:34:33 - 00:34:41]

Speaker 1: So that's rolling out. Right? God, I'm so glad people raised their hand. I was like, did I leak something? I think you're already seeing that.

[00:34:41 - 00:35:06]

Speaker 1: I think it's just a matter of time to give similar functionality of like, oh, I don't have to build an agent or a GPT over here. I can just use my Gmail that has this thing baked in, and use it. You have to imagine that that's probably on the roadmap for Google. But the functionality exists today. And by the way, I keep leaning into functionality exists today because my team is focused on building tools for internal too.

[00:35:07 - 00:36:06]

Speaker 1: So when I talk about the API and I talk about the Gemini API and I talk about the large language models, my team supports all the products. Oh, I know you guys are gonna have lots of questions for me, but all the products that are launching internally at Google and all the external developers and influencers and creators as well. So, again, if you're thinking about Gemini 1.5 as, like, ChatGPT, it won't do those things like, take this and put it in this folder, put it on my calendar, and also send me a reminder email, which, that's incredible functionality. The LLM doesn't do that today, but with a little bit of code, you could create all that linkage very easily. You could even say, hey, LLM, give me the code to do this.

[00:36:06 - 00:36:25]

Speaker 1: Copy and paste. Do this. So you do need to do a little bit of programming, and that's why I keep coming back. It's like, Labs is very focused on giving you the tools to build it. Now let's imagine Workspace. Do you use Workspace?

[00:36:26 - 00:36:59]

Speaker 1: I don't know their road map, but you can imagine a world where Workspace has a lot of your productivity tools. It has your Google Sheets. It has your Docs. You could imagine being able to move around Workspace or even the Google ecosystem of things. You have to imagine it, because the tools exist today. You can imagine a world pretty easily where Google is building this functionality and building a UI tool for you to do these things very quickly.

[00:36:59 - 00:37:21]

Speaker 1: Doesn't exist today. It would be great. But build the tools first, see if they work, if people use them, and then build the killer app, which this would be a killer app, by the way. I would use this quite a lot. Then build the killer app, I think, is what we'll probably see now, but you couldn't do that today specifically.

[00:37:22 - 00:38:07]

Speaker 1: If you use the generative AI tools at Google specifically, I'll talk about that because I know that one a lot better, we are pretty conservative on safety controls, where we have a team building a safety service for ensuring that the input and the output is completely looked at and making sure that it passes a safety bar. And we invest a ton of resources in this space. A, it's the right thing to do, but, b, the liability at a big company like this is extreme. So there's a ton of resourcing that goes in this. Now we also do tons of evals on all of these looking for explicit content, inappropriate content, etcetera.

[00:38:07 - 00:38:32]

Speaker 1: So we do a lot of safety work and evaluating on each of the models that we release. So I think we're heavily invested. Do things happen? Definitely. And when these things get flagged, you know, you can imagine we're rolling back and ensuring that the model is updated and relaunched very quickly.

[00:38:33 - 00:39:07]

Speaker 1: But I would say that, and I'm not on that safety team, it's heavily resourced. I mean, every model that goes out is heavily evaluated for all of this type of content. So yeah. And how do you protect against it? I think there's a thing for us as Googlers and for you as end users of this.

[00:39:07 - 00:39:32]

Speaker 1: And I'd say for us as Googlers, it's to continue to invest in safety. It's the right thing to do. Misinformation, we see what misinformation does in this city and beyond. I think it's on Google to continue to resource this. And just to be super clear, in my organization, James Manyika is the head of my organization.

[00:39:32 - 00:40:04]

Speaker 1: I don't know if anyone knows who James Manyika is. So under James Manyika, we are under, I think it's Safety AI and Compliance. I forget the exact name of the org, but it falls under a safety part of the org, and so we heavily invest in this. Google as a whole, as an entire organization, has heavily invested. But you as end users, I will say, you know, as you guys find inappropriate content, and we get this all the time, by the way, flagging it through the system is an incredible thing for us too.

[00:40:04 - 00:40:32]

Speaker 1: I mean, you know, hopefully, we don't miss things, but, obviously, we do, and we love the feedback, and we love when users reach out to us, and we get it quite a bit and react pretty quickly. There's a few things in the space that are already being flagged. I think plagiarism is a big risk. I think we're seeing this. And actually, this is the beauty of an LLM: for every use case, there's a counter to use the LLM to solve that use case.

[00:40:32 - 00:40:51]

Speaker 1: So plagiarism is, like, a really good example of, you know, using the LLM, write this paper, and I will just take this and submit it. And people are writing on the opposite side. Tell me if this is fact or fiction. Tell me if this is plagiarized. Tell me if this is written.

[00:40:51 - 00:41:14]

Speaker 1: So people are doing both sides of the spectrum. I look at compute power and electricity, which is bigger than your question, but I look at this as, you know, are we doing the right thing for the world? Are there CO2 emissions with all this compute that's happening? But, again, on the opposite side. And I think I said this last time too, and I believe this.

[00:41:15 - 00:41:40]

Speaker 1: So you have this problem, carbon footprint or CO2 emissions. And on the opposite side of the spectrum, now you have a tool, you have a thought partner, a very technical, deep-minded, very researched thought partner, a large language model, that you can ask and say, hey. I have this thing I'm doing. How do I reduce our carbon footprint? How do I reduce emissions?

[00:41:40 - 00:42:08]

Speaker 1: And you could now have something that you didn't have before. You only had a few researchers that could, you know, study this, but now we've democratized it. So I think that's probably another space. Are we using, and there's actually two ways to answer this. There's, are we using LLMs to come up with technological advancements that are already happening in the world?

[00:42:08 - 00:42:28]

Speaker 1: Like, can we use an LLM to do that? And then the second part of that is, are these technological advancements that we're aware of, like fission or fusion, using AI itself, like, inside the system as opposed to being a thought partner with somebody? Are we using them inside the system to help accelerate these things? Is that safe to say? Yeah.

[00:42:32 - 00:43:00]

Speaker 1: Okay. Definitely. I don't know about that specific use case. I mean, I know a lot about the physics behind that specific use case, but I don't know about the generative AI aspects of it. But definitely, I mean, looking at our devices and the Vision Pro and our computers and everything we've talked about tonight, I think you're already seeing generative AI baked into these things, improving these things.

[00:43:00 - 00:43:52]

Speaker 1: Like using generative AI to create CPU efficiency, compute efficiency, using generative AI to, oh, god. Someone did an amazing demo the other day of a visually impaired individual putting a camera on their glasses, and having it take screenshot images and feeding that to a generative AI or an LLM and having it give a description of what's happening. So that's possible today. But something that was possible a while back, just to hit this home as well, like a real-time translator, this is done with AI as well. So I think this is definitely the case.

[00:43:52 - 00:44:19]

Speaker 1: We are definitely using generative AI to advance and even augment technology. I think that's the goal of it. Right? I think that it's about the system itself. On the other side of the spectrum, I think you're also seeing today, if I were, like, using the nontechnical version of this, it's like having your LLM friend, Gemini, sitting next to you being like, write that blog post.

[00:44:19 - 00:44:41]

Speaker 1: Here it is. This is happening. So you're having this thought partner with you. So, again, I don't know specifically about generative AI and the fission use case, but it's already taking over, augmenting and advancing all the technologies that are happening today. It's everywhere, for sure.

[00:44:41 - 00:45:06]

Speaker 0: Well, what I'd love to do is phase this part of it. And I wanna thank Arthur for a couple of things. One is spending time here; secondarily, you know, helping New Orleans get a competitive advantage. I mean, we can sort of have a competitive advantage of understanding these tools and using those tools and, more importantly, challenge New Orleanians to focus on what's next. So give me a round of applause for Arthur, please.