Summary
Ted Xiao on Twitter: "The golden days of internet-scale models achieving unprecedented zero-shot results seem to be waning. The new Big Thing is subsequent fine tuning with humans increasingly out of the loop. How does this work? Let’s explore *Prior Amplification* 🔎 (1/N) https://t.co/5JWFik2hgX" (twitter.com)
521 words - html page
One Line
Ted Xiao's Twitter thread argues that zero-shot results from internet-scale models are plateauing and frames Prior Amplification as the successor: fine-tuning a large base model with learned priors, exemplified by GPT-3's RLHF, which fine-tunes against a human preference model learned from "good" dialogue.
Key Points
- Ted Xiao's Twitter thread proposes Prior Amplification as the pattern replacing pure reliance on zero-shot results from internet-scale models: train a base model on large amounts of unstructured data, then learn "good" priors from engineered assumptions or small, high-quality datasets (see the sketch after this list)
- Fine-tuning of GPT-3 has evolved from supervised training on human-curated dialogue to RLHF, in which a human preference model learned from "good" dialogue guides fine-tuning with humans out of the loop
- Xiao argues this recipe, exemplified by Prior Amplification and GPT-3's RLHF, is the new Big Thing in the AI field
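A minimal sketch of the Prior Amplification loop described in the points above, assuming the learned prior acts as a filter over base-model generations; all names here (train_base_model, learn_prior, amplify) are illustrative stand-ins, not terminology from the thread:

```python
# Hypothetical sketch of the Prior Amplification loop: pretrain at scale,
# learn a prior from a small curated set, then use the prior to build a
# new dataset for fine-tuning. Toy implementations throughout.

import random

def train_base_model(unstructured_corpus):
    """Stand-in for large-scale pretraining: here, just memorize the corpus."""
    return list(unstructured_corpus)

def learn_prior(small_curated_set):
    """Learn a 'good' prior from a small, high-quality dataset.
    Here the prior is a trivial filter: keep samples sharing a keyword."""
    keywords = {word for sample in small_curated_set for word in sample.split()}
    return lambda sample: any(word in keywords for word in sample.split())

def amplify(base_model, prior, n_samples=100):
    """Apply the learned prior to base-model generations to produce a new,
    larger training set -- the 'amplification' step."""
    generations = [random.choice(base_model) for _ in range(n_samples)]
    return [g for g in generations if prior(g)]

corpus = ["the cat sat", "stocks fell today", "the dog ran", "rain expected"]
curated = ["the cat purred"]          # small, high-quality seed data

base = train_base_model(corpus)
prior = learn_prior(curated)
new_dataset = amplify(base, prior)    # prior-filtered data for fine-tuning
print(new_dataset)
```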
Summary
89 word summary
Ted Xiao's Twitter thread examines the waning returns of internet-scale models relying on zero-shot results and suggests Prior Amplification as the successor recipe, leveraging "good" priors to combine scale and alignment. The process trains a base model on large amounts of unstructured data, learns priors from engineered assumptions or small, high-quality datasets, and applies those priors to the base model's outputs to create a new dataset for further training. GPT-3 exemplifies the shift: fine-tuning has moved from human-curated dialogue to a human preference model learned from "good" dialogue.
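The preference-model step can be made concrete with a toy sketch. Assuming "good" dialogue is expressed as pairwise comparisons (a common RLHF setup, though the thread does not specify the details), a Bradley-Terry style reward model can be fit and then stand in for human labels during fine-tuning; the featurizer and training loop below are illustrative stand-ins, not OpenAI's pipeline:

```python
# Toy preference model: fit weights so that P(chosen beats rejected)
# = sigmoid(reward(chosen) - reward(rejected)), then use reward() as the
# fine-tuning signal instead of direct human labels.

import math, random

def features(text):
    # Toy featurizer: response length and a politeness marker (illustrative).
    return [len(text.split()), 1.0 if "please" in text else 0.0]

def reward(w, text):
    return sum(wi * xi for wi, xi in zip(w, features(text)))

def train_preference_model(pairs, lr=0.1, steps=200):
    """Gradient ascent on log sigmoid(reward(chosen) - reward(rejected))."""
    w = [0.0, 0.0]
    for _ in range(steps):
        chosen, rejected = random.choice(pairs)
        margin = reward(w, chosen) - reward(w, rejected)
        grad_scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))  # d log sigmoid / d margin
        for i, (xc, xr) in enumerate(zip(features(chosen), features(rejected))):
            w[i] += lr * grad_scale * (xc - xr)
    return w

pairs = [("could you please help", "no"),
         ("here is a detailed answer, please ask more", "go away")]
w = train_preference_model(pairs)
# The fine-tuning stage (not shown) would optimize the dialogue model
# against reward(w, response), keeping humans out of the inner loop.
print(reward(w, "please explain further"), reward(w, "no"))
```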