Summary: Google "We Have No Moat, And Neither Does OpenAI" (www.semianalysis.com)
2,912 words - html page
One Line
Open source AI is gaining ground on Google and OpenAI: the barrier to entry for AI training and experimentation has dropped, many major open problems are already solved and in people's hands today, and the competitive advantage of both companies is eroding.
Key Points
- The open-source community is rapidly releasing capable models and high-quality datasets built entirely on freely available data.
- Google should consider taking a more open approach to language models and working with the open source community.
- Open source innovations have solved problems that Google is still struggling with, and paying more attention to that work could help Google avoid reinventing the wheel.
- The barrier to entry for AI training and experimentation has dropped, allowing for a tremendous outpouring of innovation from the open-source community.
- Giant models are slowing down progress, and the best hope is to learn from and collaborate with others outside of Google.
- A leaked internal document from a Google researcher claims that open source AI will outcompete Google and OpenAI.
Summaries
286 word summary
A leaked internal Google document argues that open source AI will outcompete both Google and OpenAI: the barrier to entry for AI training and experimentation has dropped, allowing a tremendous outpouring of innovation from the open-source community. Google's models still hold a slight edge in quality, but the gap is closing quickly, and many major open problems are already solved and in people's hands today. The uncomfortable truth is that neither Google nor OpenAI has a moat in this arms race. Open source innovations have solved problems that Google is still struggling with, and paying more attention to that work could help it avoid reinventing the wheel. Google should consider whether each new application really needs a whole new model, or whether aggressive forms of distillation could instead retain previous generations' capabilities. The open-source timeline is striking: Berkeley released Koala, a dialogue model trained entirely on freely available data; LLaMA-Adapter introduced instruction tuning and multimodality with just 1.2M learnable parameters in one hour of training; Cerebras trained the GPT-3 architecture with an optimal compute schedule and scaling, yielding an open-source GPT-3-class model; Nomic created GPT4All, a model and an ecosystem around it; and Vicuna, a 13B model from a cross-university collaboration, claims rough parity with Bard and uses GPT-4-powered evaluation to provide qualitative comparisons of model outputs. Low-rank fine-tuning adapters can be distributed easily and separately from the original weights. There is a similar outpouring of creativity in image generation models, with individuals creating and sharing models for their particular subgenres. The document concludes that Google should take a more open approach to its language models and join the broader conversation in the open-source community: the value of owning the ecosystem cannot be overstated.
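The "aggressive distillation" the summary mentions can be sketched in a few lines. This is an illustrative toy in NumPy, not anything from Google's or OpenAI's pipelines; the function names, temperature, and tensor sizes are all assumptions. The core idea is that a new (student) model is trained to match the softened output distribution of a previous-generation (teacher) model, so capabilities transfer instead of being retrained from scratch.

```python
import numpy as np

# Toy sketch of knowledge distillation (names and sizes are illustrative).
# A student is scored against the teacher's softened output distribution;
# temperature T > 1 exposes the teacher's relative preferences among
# "wrong" classes, which is the signal distillation transfers.

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions.
    Minimized exactly when the student reproduces the teacher."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

rng = np.random.default_rng(0)
teacher_logits = rng.standard_normal((4, 10))  # 4 examples, 10 classes

# A student that matches the teacher attains the minimum possible loss
# (the entropy of the teacher's own distribution); a random student is worse.
loss_match = distillation_loss(teacher_logits, teacher_logits)
loss_rand = distillation_loss(rng.standard_normal((4, 10)), teacher_logits)
assert loss_match <= loss_rand
```

In a real training loop this loss (or a KL-divergence variant, often mixed with a hard-label term) would be backpropagated through the student only; the teacher stays frozen.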
623 word summary
A leaked internal Google document argues that neither Google nor OpenAI has a moat, meaning no durable competitive advantage over the open-source community. The open-source timeline it recounts is rapid. Berkeley released Koala, a dialogue model trained entirely on freely available data. LLaMA-Adapter introduced instruction tuning and multimodality with just 1.2M learnable parameters in one hour of training, achieving a new SOTA on multimodal ScienceQA. Cerebras trained the GPT-3 architecture using the optimal compute schedule implied by Chinchilla and the optimal scaling implied by μ-parameterization, yielding an open-source GPT-3-class model. Nomic created GPT4All, a model and an ecosystem around it. Vicuna, a 13B model from a cross-university collaboration, achieves rough parity with Bard and uses GPT-4-powered evaluation to provide qualitative comparisons of model outputs. Low-rank fine-tuning adapters can be distributed easily and separately from the original weights, making them independent of the original license from Meta. Google should take a more open approach to its language models and join the broader conversation in the open-source community. The release patterns of Google and OpenAI aim to retain tight control over how their models are used, but this control is a fiction. By owning the platform where innovation happens, Google could cement itself as a thought leader and direction-setter, shaping the narrative on ideas that are larger than itself; the value of owning the ecosystem cannot be overstated. The one clear winner in all of this is Meta: the leaked model was theirs, and they have effectively garnered an entire planet's worth of free labor. There is a vast outpouring of creativity in image generation models, with individuals creating and sharing models for their particular subgenres. Individuals enjoy legal cover and access to cutting-edge technologies, while corporations are constrained by licenses. Open source has significant advantages that proprietary models cannot replicate.
High-quality datasets are open source and rapidly becoming the standard way to do training outside of Google. Focusing on maintaining some of the largest models on the planet puts Google at a disadvantage, since small models can be fine-tuned quickly and iterated on faster. Google should consider whether each new application really needs a whole new model, or whether aggressive forms of distillation could retain previous generations' capabilities instead. Training giant models from scratch throws away the pretraining and all the iterative improvements built on top of it, making a full retrain extremely costly. LoRA is effective because it allows models to be fine-tuned at a fraction of the cost and time; this vastly cheaper fine-tuning mechanism, low-rank adaptation, is what enabled low-cost public involvement. Open source innovations have solved problems that Google is still struggling with, and paying more attention to that work could help it avoid reinventing the wheel. The current renaissance in open-source LLMs comes hot on the heels of a renaissance in image generation, with many calling this the LLMs' Stable Diffusion moment. The barrier to entry for AI training and experimentation has dropped, allowing a tremendous outpouring of innovation from the open-source community. While Google's models still hold a slight edge in quality, the gap is closing astonishingly quickly, and many major open problems are already solved and in people's hands today. Giant models are slowing progress down; the best hope is to learn from and collaborate with the community outside Google. The uncomfortable truth is that neither Google nor OpenAI has a moat in this arms race. The leaked internal document making these claims, written by a Google researcher, has been verified as authentic, but it is only the opinion of one employee and not representative of the entire firm; the document raises interesting points, but other researchers do not agree with all of its contents.
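The low-rank adaptation mechanism described above can be sketched in a few lines of NumPy. This is an illustrative toy, not the actual LoRA implementation; the dimensions, names, and zero-initialization convention are assumptions. It shows the two properties the summary relies on: the adapter starts as a no-op on the frozen pretrained weight, and it is a tiny fraction of the full matrix, so it can be shipped separately from the original weights.

```python
import numpy as np

# Toy sketch of low-rank adaptation (LoRA); sizes are illustrative.
# The frozen base weight W is never modified: fine-tuning learns only
# the small factors A (r x d_in) and B (d_out x r), so the adapter
# B @ A can be distributed separately from, and later merged into, W.

rng = np.random.default_rng(0)

d_in, d_out, r = 1024, 1024, 8          # r << d is the "low rank" part

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, init 0

def forward(x, scale=1.0):
    """Base output plus the low-rank update. With B initialized to zero,
    this is identical to the pretrained model at the start of training."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(forward(x), W @ x)   # zero-init adapter is a no-op

full_params = W.size
lora_params = A.size + B.size
print(f"adapter is {lora_params / full_params:.2%} of the full matrix")
# → adapter is 1.56% of the full matrix
```

After fine-tuning, the update can be merged (`W + B @ A`) for inference at zero extra cost, or kept separate so many task-specific adapters share one copy of the base weights; this separability is why adapters can circulate independently of the original model license.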
A separate piece will be published for subscribers with their opinions on the matter.