One Line
LLMs and ChatGPT have been useful for NLP, but they have limitations and environmental costs, and they raise the risk of relying too heavily on models for knowledge and for understanding culture.
Key Points
- LLMs (Large Language Models) can ground symbols related to textual manipulation, but it is unclear whether they can be trained to ground symbols such as "floor", "chair", or "dog".
- Modularity is a major question in the field of AI and language understanding, and it is important to separate core language understanding from specific factual knowledge.
- ChatGPT is different from "traditional" language models and performs better on many tasks because it is supervised, has access to an external modality, and is trained explicitly, by demonstration, to follow a large set of instructions.
- Current-day language models have achieved remarkable performance, but have not yet "solved" all language understanding problems.
- Yoav Goldberg argues that achieving perfect language modeling is equivalent to achieving human-level intelligence.
- The source essay, LLMs.md, is published as a GitHub gist by Yoav Goldberg (yoavg).
Summaries
255 word summary
LLMs (Large Language Models) are important tools for Natural Language Processing (NLP); Yoav Goldberg's GitHub gist LLMs.md discusses them. Goldberg argues that achieving perfect language modeling is equivalent to achieving human-level intelligence. Current-day language models (e.g. GPT-3) have achieved remarkable performance, but have not yet "solved" all language understanding problems.
ChatGPT differs from GPT-3 in three ways: it is trained on programming language code data, it uses supervised learning, and it is grounded. Traditional language models do not have a connection to anything outside the text, so they cannot access meaning or communicative intent.
A discussion was also started on the topic of LLMs, their efficiency, and environmental costs. It was suggested that increases in efficiency of hardware correlate with higher demand, which may not necessarily lower emissions. It was proposed that training and using LLMs is expensive.
A discussion was started on the topic of ChatGPT and its evolving abilities and limitations. It was asked whether it can be used to clarify its own abilities and limitations, and whether it can relate multiple texts to each other. Modularity is an issue when it comes to language understanding, and machine translation cannot capture nuances of culture. Large language models have been useful, but there is a risk of them fabricating plausible statements out of thin air. Citing sources and understanding epistemic authority are important for inclusive language understanding. Lastly, there is a risk of relying too heavily on models for knowledge, which may contain traps such as the Gettier problem.
447 word summary
Large Language Models (LLMs) are important tools for Natural Language Processing (NLP); Yoav Goldberg's GitHub gist LLMs.md discusses them. Goldberg argues that achieving perfect language modeling is equivalent to achieving human-level intelligence, because playing the word-prediction game perfectly requires understanding the text and the situations it describes. Current-day language models (e.g. GPT-3) have achieved remarkable performance, but have not yet "solved" all language understanding problems.
ChatGPT differs from GPT-3 in three ways: it is trained on programming language code data, it uses supervised learning, and it is grounded. Traditional language models have no connection to anything outside the text, so they cannot access meaning or communicative intent. The latest wave of models is trained on programming language code data, which includes natural language instructions or descriptions and the corresponding programming language code. Models do not understand language like humans do, but they can model observed language and encode biases. Training these models is expensive, but the total cost is minuscule compared to other forms of energy consumption. ChatGPT, trained with RLHF, performs better than traditional language models because it is supervised and trained explicitly by demonstration. However, such models cannot understand real-world events and form a coherent world view, as they lack mechanisms to distinguish between knowledge and guesswork.
Modularity is an issue when it comes to language understanding, and machine translation cannot capture nuances of culture. Large language models have been useful, but there is a risk of them fabricating plausible statements out of thin air. Citing sources and understanding epistemic authority are important for inclusive language understanding. There is also a risk of relying too heavily on models for knowledge, which may contain traps such as the Gettier problem.
LLMs can ground symbols related to textual manipulation, such as "summarize", "translate", and "paraphrase". Chomsky's basic argument about linguistics suggests that humans have already evolved to possess pre-built patterns for language usage, and humans learn a language quickly compared to the data set used to train LLMs. LLMs are data hungry, requiring large sets of data to achieve impressive performance.
A discussion was started on the topic of ChatGPT and its evolving abilities and limitations. It was asked if it can be used to clarify its own abilities and limitations, as well as if it can relate multiple texts to each other.
A discussion was also started on the topic of LLMs, their efficiency, and environmental costs. It was suggested that increases in efficiency of hardware correlate with higher demand, which may not necessarily lower emissions. It was proposed that training and using LLMs is expensive.
1221 word summary
A discussion was started on the topic of LLMs (Large Language Models), their efficiency, and environmental costs. It was suggested that increases in the efficiency of hardware correlate with higher demand, which may not necessarily lower emissions. It was also proposed that training and using LLMs is expensive.
The discussion then shifted to the topic of ChatGPT and its evolving abilities and limitations. It was asked if it can be used to clarify its own abilities and limitations, as well as if it can relate multiple texts to each other.
Finally, it was commented that the post was not written by ChatGPT, and the author's spelling was corrected by another user.
LLMs (Large Language Models) are capable of grounding symbols related to textual manipulation, such as "summarize", "translate", and "paraphrase". It is unclear if they can be trained to ground symbols such as "floor" or "chair" or "dog". Chomsky's basic argument about linguistics suggests that humans have already evolved to possess pre-built patterns for language usage, and humans learn a language quickly compared to the data set used to train LLMs. However, LLMs are data hungry and require large sets of data to achieve impressive performance.
Other comments suggest that ML models may not be able to incorporate disruptive changes in knowledge once they have been running for a while and start to drift. Additionally, it was suggested that citing the source is important. Finally, there was a discussion of the risk of creating a world that relies on a better form of Galactica for its knowledge, which could contain traps such as the Gettier problem.
Large language models have been incredibly useful as knowledge tools, but it is important that they not be allowed to fabricate plausible statements out of thin air. Determining epistemic authority, and how one knows what one knows, is as important as language understanding. Current language models are more than language models and can do much more than expected, but language modeling is not enough for inclusive language understanding. Modularity is a major question in the field of AI and language understanding: to address data hunger and cultural knowledge gaps, it is important to separate core language understanding from specific factual knowledge. Machine translation can provide superficial understanding, but cannot capture nuances of culture, norms, and events. This means that results achieved in English cannot be replicated in other languages that have less data available, a major technical issue that needs to be addressed.
The author is also suspicious of the models' ability to learn from rare events or to recall rare occurrences, which matters in high-recall, high-coverage setups. Their basic building blocks are "word pieces" which do not correspond to numbers, making it difficult for them to perform math. They also lack explicit mechanisms to distinguish between knowledge and guesswork, and have no notion of time; this can lead to them "confidently making stuff up" and believing contradictory statements. Models trained on multiple texts cannot understand how these texts relate to real-world events and form a coherent world view. Several challenges in current "large language models" prevent them from fully understanding language in some real sense, such as not being able to cite sources or to understand the effects these models may have on society.
The models can't learn anything meaningful based only on form, but they are not trained only on form. They can still tell us a lot about language structure, and for what they don't tell us, we can look elsewhere. Models do not understand language like humans do, but they cover certain aspects very well. Those who want to really understand language may prefer to look elsewhere, but the models do model observed human language and encode many biases and stereotypes. Training these models is expensive, but the total cost is minuscule compared to other forms of energy consumption.
ChatGPT is different from "traditional" language models and performs better on many tasks because it is supervised, has access to an external modality, and is trained explicitly by demonstration to follow a large set of instructions. RLHF also helps the model learn how dialogs work by observing two humans in a conversation, one playing the role of a user and the other playing the role of "the AI". The latest wave of models is additionally trained on programming language code data, which includes natural language instructions or descriptions and the corresponding programming language code; this produces a direct form of grounding between human language and programming language, allowing us to learn more from it than we could learn "from form alone".
This makes it easier for the model to learn from direct instructions than from non-instruction data, and allows for the use of less training text.
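To make this concrete, here is a minimal sketch of what such natural-language/code training pairs might look like. The field names and the two examples are invented for illustration only and are not drawn from any actual training corpus.

```python
# Hypothetical illustration of natural-language/code pairs of the kind described
# above: each record links a human description (the communicative intent) to the
# code that realizes it, giving the model a form of grounding that plain text
# alone does not provide. The records below are invented for this sketch.
code_training_pairs = [
    {
        "instruction": "Return the largest value in a list of numbers.",
        "code": "def largest(xs):\n    return max(xs)",
    },
    {
        "instruction": "Check whether a string reads the same forwards and backwards.",
        "code": "def is_palindrome(s):\n    return s == s[::-1]",
    },
]

# During training, the description and the code are typically concatenated into
# one sequence, and the model still predicts the next token, so the usual LM
# objective now spans both natural language and programming language.
for pair in code_training_pairs:
    training_sequence = pair["instruction"] + "\n" + pair["code"]
    print(training_sequence)
    print("---")
```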
Instructions provide a level of grounding to the text, allowing the model to learn the communicative intent of the user who asks for a "summary", for example. This is known as "supervised learning".
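As an illustration of the kind of instruction-following demonstrations this refers to, here is a small sketch; the records and field names are hypothetical and not taken from any real instruction-tuning dataset.

```python
# Hypothetical instruction-tuning demonstrations. Each record pairs an instruction
# (carrying the communicative intent: summarize, translate, paraphrase) with a
# human-written target output, so that a word like "summarize" is grounded in the
# demonstrated behaviour. The examples and field names are illustrative only.
demonstrations = [
    {
        "instruction": "Summarize the following sentence briefly.",
        "input": "The committee postponed the vote because too few members attended.",
        "output": "Vote postponed due to low attendance.",
    },
    {
        "instruction": "Translate to French.",
        "input": "The library closes at eight.",
        "output": "La bibliothèque ferme à huit heures.",
    },
    {
        "instruction": "Paraphrase the sentence.",
        "input": "He arrived after the meeting had started.",
        "output": "He got there once the meeting was already underway.",
    },
]

# Supervised fine-tuning then maximizes the likelihood of the output given the
# instruction and the input; only the response tokens contribute to the loss.
for d in demonstrations:
    prompt = f"{d['instruction']}\n{d['input']}\n"
    print(prompt + d["output"])
```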
Traditional language models, by contrast, have no connection to anything outside the text, so they cannot access meaning or communicative intent; such models are said to be "not grounded".
ChatGPT is different from GPT-3 in three ways: it is trained on programming language code data, it uses supervised learning, and it is grounded. Current-day language models (e.g. GPT-3) differ from the traditional understanding of language modeling (LM) as defined by Shannon's guessing game. These models have achieved remarkable performance, but have not yet "solved" all language understanding problems. The performance of current-day LMs is not obtained through language modeling over naturally occurring text alone, but through a combination of that and other techniques; building a large LM alone will not "solve everything".
Yoav Goldberg argues that achieving perfect language modeling is equivalent to achieving human-level intelligence, because the game requires understanding the text and the situation it describes, and responding appropriately. In 2014-2017, while giving lectures about this idea, he was asked in a panel what he would do if given infinite compute and no need to worry about labour costs. He responded that he would train a really huge language model, an answer that may or may not have aged well. He shares his perspective on language understanding in relation to ChatGPT and similar models.
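To make Shannon's guessing game concrete, here is a toy sketch of the language-modeling objective it describes: guess the next word from the words seen so far, and measure how surprised you are by the real continuation. The tiny corpus and bigram "model" below are invented purely for illustration and bear no resemblance to the systems discussed.

```python
import math
from collections import Counter, defaultdict

# Toy illustration of language modeling as a guessing game: given the words seen
# so far, assign probabilities to the next word, then score how surprised you are
# by the word that actually follows. Playing this game perfectly would require
# whatever knowledge the text presupposes, which is the intuition behind the
# "perfect language modeling = human-level intelligence" argument summarized above.
# The corpus and the bigram "model" here are invented for illustration only.
corpus = "the dog chased the cat . the cat chased the mouse .".split()

# counts[w][v] = how often word v followed word w in the corpus
counts = defaultdict(Counter)
for w, v in zip(corpus, corpus[1:]):
    counts[w][v] += 1

def next_word_probs(prev_word):
    """Return a probability distribution over the next word given the previous one."""
    c = counts[prev_word]
    total = sum(c.values())
    if total == 0:  # unseen context: fall back to a uniform guess over the vocabulary
        vocab = set(corpus)
        return {w: 1.0 / len(vocab) for w in vocab}
    return {w: n / total for w, n in c.items()}

# Play the game on a held-out sentence and report the average surprise (log loss).
test = "the dog chased the mouse".split()
log_loss = 0.0
for prev, actual in zip(test, test[1:]):
    p = next_word_probs(prev).get(actual, 1e-9)
    log_loss += -math.log2(p)
print(f"average surprise: {log_loss / (len(test) - 1):.2f} bits per word")
```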
Yoav Goldberg argues perfect language modeling is AI-complete, meaning it would require solving every AI problem to reach human-level intelligence. Computers are still not very good at this game compared to humans, but by teaching them to play it we gain implicit knowledge of language. Humans are great at this game without even practicing, but it is hard for them to get much better at it. He provides examples of the game demonstrating the various levels of linguistic understanding needed to play it well.
Large Language Models (LLMs) are an important tool for Natural Language Processing (NLP). The source of this summary is LLMs.md, a GitHub gist maintained by yoavg and last active on January 8, 2023.