Summary: Retrieval-Augmented Language Generation with Continuous Information Gathering (arxiv.org)
14,392 words - PDF document
One Line
Researchers propose retrieval-augmented language models as a solution to inaccurate output and demonstrate the effectiveness of their method, FLARE, on generating natural language text for complex questions.
Key Points
- Retrieval-augmented language models (LMs) can improve accuracy in long-form generation tasks.
- FLARE is a forward-looking active retrieval-augmented generation framework that decides when and what to retrieve during long-form generation.
- FLARE achieves superior or competitive performance compared to single-time and multi-time retrieval baselines.
- Multi-time retrieval baselines trigger retrieval at fixed intervals, every token or every few tokens; FLARE instead retrieves adaptively when generation confidence is low.
- The approach involves implicit and explicit query formulation methods and iteratively retrieves information based on the upcoming sentence.
- The proposed method for generating natural language responses to ambiguous questions involves identifying and answering each possible interpretation separately.
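As a rough illustration of the loop described in the key points, the following sketch uses stub functions in place of a real language model and retriever. All names, the canned sentences, the stop condition, and the confidence threshold are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of the FLARE generation loop (not the paper's code).
# A stub LM proposes a tentative next sentence with a confidence score; when
# confidence is low, the sentence is used as a search query and regenerated.

def lm_generate_sentence(context, docs=None):
    # Stand-in for a real LM call. Returns (sentence, min_token_probability).
    if docs:
        return ("Joe Biden attended the University of Delaware.", 0.95)
    return ("Joe Biden attended a university.", 0.40)

def retrieve(query, k=3):
    # Stand-in for a retriever such as BM25 over a Wikipedia corpus.
    return [f"doc about: {query}"][:k]

def flare_generate(user_input, max_sentences=5, threshold=0.6):
    answer = []
    for _ in range(max_sentences):
        context = user_input + " " + " ".join(answer)
        tentative, confidence = lm_generate_sentence(context)
        if confidence < threshold:
            # Forward-looking retrieval: the tentative sentence is the query.
            docs = retrieve(tentative)
            tentative, _ = lm_generate_sentence(context, docs=docs)
        answer.append(tentative)
        if tentative.endswith("University of Delaware."):  # stub stop condition
            break
    return " ".join(answer)
```

The key design point this sketch mirrors is that the query reflects what the model is *about* to say, rather than what it has already said.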
Summaries
210 word summary
Researchers propose retrieval-augmented language models as a solution to inaccurate output from relying solely on input. They introduce active retrieval-augmented generation, where a language model actively decides when and what to retrieve during the generation process. The study tests FLARE along with baselines over 4 long-form knowledge-intensive generation tasks and achieves superior or competitive performance on all tasks, demonstrating the effectiveness of the method. FLARE generates natural language text sentence by sentence, retrieving as needed rather than at the fixed intervals (every token or every few tokens) used by multi-time retrieval baselines. The paper evaluates the effectiveness of FLARE on 4 different datasets, including generating comprehensive answers to complex questions and utilizing world and commonsense knowledge to generate an answer. The document presents a natural language processing system that generates multiple answers to ambiguous questions by identifying and answering each possible interpretation separately. One related paper discusses the challenges of open-domain multi-document summarization and retrieval for open-domain question answering, proposing a retrieval-augmented language model that combines retrieval and language models within a single transformer. Another paper explores the use of retrieval-augmented language models for generating long-form answers to multi-hop questions, emphasizing the importance of chain-of-thought reasoning and consistency. The document also includes tables with statistics and experimental settings for different tasks/datasets, as well as hyperparameters and excluded domains.
390 word summary
The document presents a natural language processing system that generates multiple answers to ambiguous questions by identifying and answering each possible interpretation separately. The method involves considering different criteria or versions to determine the correct interpretation and generate a concise answer. The document provides examples of 2WikiMultihopQA questions and their answers, emphasizing the use of the Search API and step-by-step thinking. One paper discusses the challenges of open-domain multi-document summarization and retrieval for open-domain question answering, proposing a retrieval-augmented language model that combines retrieval and language models within a single transformer. Another paper explores the use of retrieval-augmented language models for generating long-form answers to multi-hop questions, emphasizing the importance of chain-of-thought reasoning and consistency. The document also includes tables with statistics and experimental settings for different tasks/datasets, as well as hyperparameters and excluded domains. FLARE generates natural language text sentence by sentence, retrieving as needed rather than at the fixed intervals (every token or every few tokens) used by multi-time retrieval baselines. It generates questions that can be answered by retrieved information using explicit and implicit query generation. The paper evaluates the effectiveness of FLARE on 4 different datasets, including generating comprehensive answers to complex questions and utilizing world and commonsense knowledge to generate an answer. The paper proposes an active retrieval augmented generation framework called FLARE that iteratively generates a temporary next sentence, uses it as a query to retrieve relevant documents, and then regenerates the next sentence conditioning on the retrieved documents. The approach involves iteratively generating search queries and retrieving relevant documents to regenerate the next sentence.
FLARE achieves superior or competitive performance compared to single-time and multi-time retrieval baselines. Researchers propose retrieval-augmented language models as a solution to inaccurate output from relying solely on input. They introduce active retrieval-augmented generation, where a language model actively decides when and what to retrieve during the generation process. The paper discusses approaches to active retrieval, including forward-looking active retrieval augmented generation (FLARE), which iteratively generates a temporary next sentence based on user input and initial retrieval results. Retrieving extra information during the generation process is necessary for generating complex content. Long-form generation tasks, such as open-domain summarization and multihop question answering, can benefit from active retrieval-augmented generation. The study tests FLARE along with baselines over 4 long-form knowledge-intensive generation tasks and achieves superior or competitive performance on all tasks, demonstrating the effectiveness of the method.
933 word summary
Retrieval-augmented language models (LMs) are proposed as a solution to the issue of LMs generating inaccurate output due to relying solely on input. The researchers from Carnegie Mellon University's Language Technologies Institute and Sea AI Lab propose a new paradigm for long-form generation tasks called active retrieval-augmented generation, where a language model (LM) actively decides when and what to retrieve throughout the generation process. The paper discusses several approaches to active retrieval, including forward-looking active retrieval augmented generation (FLARE), which iteratively generates a temporary next sentence based on user input and initial retrieval results. The paper highlights the importance of retrieving extra information as needed during the generation process to ensure that the LM has enough knowledge to generate complex content. The paper also presents several examples of long-form generation tasks, including open-domain summarization and multihop question answering, where active retrieval-augmented generation can be applied. The study tests FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks and achieves superior or competitive performance on all tasks, demonstrating the effectiveness of the method. The paper proposes a retrieval-augmented language generation framework called FLARE, which uses active retrieval augmented generation methods to aid long-form generation with retrieval. FLARE iteratively generates a temporary next sentence, uses it as a query to retrieve relevant documents, and then regenerates the next sentence conditioning on the retrieved documents. FLARE achieves superior or competitive performance compared to single-time and multi-time retrieval baselines. The paper discusses confidence-based active retrieval and direct FLARE methods to address issues with generating search queries and disrupting answer generation. 
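The confidence-based trigger mentioned above can be sketched as a simple check over token probabilities: retrieval fires only if the tentative next sentence contains any token whose probability falls below a threshold. The threshold value here is illustrative, not the paper's setting.

```python
def needs_retrieval(token_probs, theta=0.5):
    # Active retrieval trigger: fire only when the LM is uncertain, i.e. when
    # any token in the tentative next sentence has probability below theta.
    # theta = 0 never retrieves; a high theta retrieves for nearly every sentence.
    return any(p < theta for p in token_probs)

# High-confidence sentence: no retrieval needed.
print(needs_retrieval([0.9, 0.8, 0.95]))   # False
# A single uncertain token is enough to trigger retrieval.
print(needs_retrieval([0.9, 0.3, 0.95]))   # True
```

This matches the paper's finding summarized later in the document: high-confidence sentences may not need retrieval at all, so retrieving for every sentence wastes calls and can inject distracting documents.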
The approach involves iteratively generating search queries and retrieving relevant documents to regenerate the next sentence. FLARE is a method for generating natural language text by iteratively retrieving and incorporating information from external sources. Unlike multi-time retrieval baselines, which are triggered either by every token or every few tokens, FLARE retrieves only when the model lacks the needed knowledge. FLARE generates questions that can be answered by the retrieved information using explicit and implicit query generation. The paper discusses retrieval-augmented language generation for multihop question answering and commonsense reasoning, and evaluates the effectiveness of FLARE on 4 different datasets. The tasks include generating comprehensive answers to complex questions, answering ambiguous questions, and utilizing world and commonsense knowledge to generate an answer. The article discusses open-domain summarization and its evaluation metrics, including ROUGE and UniEval. It presents the FLARE method for retrieval-augmented language generation, which outperforms other baseline methods in multihop QA tasks and is effective at generating on-topic output. The paper also conducts an ablation study to evaluate the efficacy of various design choices, including the importance of forward-looking retrieval and question decomposition. The paper proposes an active retrieval augmentation framework for long-form language generation that retrieves relevant information during generation. The framework uses a combination of implicit and explicit query formulation methods and iteratively retrieves information based on the upcoming sentence. The paper concludes by acknowledging the support of funding agencies and individuals who contributed to the experiments. Finally, it mentions the authors and references related to the paper, including the use of FLARE for answering open-domain questions and several datasets used for evaluating language models.
This collection of papers covers various topics in natural language processing and machine learning, showcasing the current state of the art in the field. One paper discusses the challenges of open-domain multi-document summarization and retrieval for open-domain question answering, proposing a retrieval-augmented language model that combines retrieval and language models within a single transformer. Another paper explores the use of retrieval-augmented language models for generating long-form answers to multi-hop questions, emphasizing the importance of chain-of-thought reasoning and consistency. The document also includes tables with statistics and experimental settings for different tasks/datasets, as well as hyperparameters and excluded domains. Retrieval-Augmented Language Generation with Continuous Information Gathering provides examples of 2WikiMultihopQA questions and their answers, emphasizing the use of the Search API and step-by-step thinking. The questions include identifying director's death dates, determining Nepalese Royals' grandchildren, identifying film directors who died later, and more. The document also includes a compilation of yes or no questions with their respective answers.
The proposed method for generating natural language responses to ambiguous questions involves identifying and answering each possible interpretation separately. Examples of ambiguous questions include release dates of Bohemian Rhapsody in different countries, the second largest city in the USA by population or area, and actresses who played Bonnie in different versions of Gone with the Wind. The method involves considering different criteria or versions to determine the correct interpretation and generate a concise answer. The document also includes examples of straightforward questions that can be answered with a yes or no. The document, Retrieval-Augmented Language Generation with Continuous Information Gathering, presents a natural language processing system that generates multiple answers to ambiguous questions. The system considers different interpretations of the question and provides answers for each. Examples of ambiguous questions include those about music, movies, and sports teams. For each question, hints are provided to identify which aspect of the question is ambiguous. The system considers different charts, regions, releases, versions, locations, and venues to provide accurate answers for each interpretation. The Aslanhane Mosque in Turkey was built in 1290 and restored in 2010-2013. "Untold Legends: The Warrior's Code" is a hack n' slash action role-playing game with mixed reviews. St. Augustine was destroyed and then rebuilt by Spanish settlers after the Treaty of Nonsuch and the Raid on St. Augustine in 1586 by Sir Francis Drake. Lakewood is a historic mansion in Alabama with a rare plan for early Alabama architecture. Carlos Moedas is a Portuguese politician known for promoting innovation and entrepreneurship in Europe.
3730 word summary
Carlos Moedas, a Portuguese politician, has had a successful career in politics. He was born in Beja, Portugal in 1970 and was educated at the University of Lisbon where he studied engineering. He worked at Goldman Sachs before entering politics. He served as the Minister of Environment, Spatial Planning and Energy from 2011 to 2014 and later as the European Commissioner for Research, Science and Innovation from 2014 to 2019. Moedas is known for his work on promoting innovation and entrepreneurship in Europe.
Lakewood is a historic mansion located in Livingston, Alabama. Built in 1840, it features a rare plan for early Alabama architecture, with a brick ground floor topped by one-and-a-half-stories of wood-frame construction. The facade is divided into five bays with central entrance doors on the ground and principal floors. The bays are divided by two-story Doric pilasters, with the middle third of the facade occupied by a two-tiered tetrastyle Doric portico. The house has formal rooms on the principal floor and bedrooms on the upper floor, with a central hallway present on all levels.
In 1586, Sir Francis Drake led an expedition to raid and plunder the Spanish New World. He had previously attacked Santiago in November 1585 and Santo Domingo in January 1586. He then attacked the important city of Cartagena on February 19th before heading north towards the Spanish settlement of St. Augustine. There, he hoped to avenge his fellow Protestants who had been ordered to leave by the Spanish under Pedro Menéndez de Avilés. After passing the Cuban coast, Drake's fleet traveled north within sight of land along the Florida peninsula. They stopped to gather supplies and counter an outbreak of dysentery before making the Transatlantic crossing back to England. Raid on St. Augustine: Background: After the Treaty of Nonsuch, Philip II of Spain declared war on England. The fort in St. Augustine was destroyed and replaced with Castillo de San Marcos. The Spanish settlers had to rebuild after the English left. Aftermath: When the Spanish settlers returned, they found the settlement in ruins with very little left.
Untold Legends: The Warrior's Code: Gameplay: The game is a hack n' slash action role-playing game with a focus on action-oriented combat. Reception: The game received "mixed or average reviews" according to Metacritic.
Aslanhane Mosque: History: The mosque is one of the oldest mosques in Turkey and was built during the reign of Mesud II of the Anatolian Seljuks in 1290. It was commissioned by Ahi leaders Husamettin and Hasaneddin and was later repaired by Şerafettin in 1330. The mosque was restored by the Directorate General of Foundations in the 2010–2013 period. Location: The mosque is located in the old quarter of Ankara next to Ankara Castle and overlooks Ankara at an altitude of 947 meters.
The Breakfast Club: Interpretation: The question is ambiguous in terms of which specific name is being referred to - the character's name or the actor's name. Therefore, both interpretations need to be considered when answering the question.
Roy Orbison's First Number One Hit: Interpretation: The question has two interpretations - one for the US Hot 100 and Canada, and one for the UK and Ireland. Retrieval-Augmented Language Generation with Continuous Information Gathering is a document that discusses a system for generating answers to ambiguous questions. The system considers different interpretations of the question and provides answers for each. The document provides examples of ambiguous questions and their different interpretations. The examples include questions about music, movies, and sports teams. For each question, the document provides hints on which aspect of the question is ambiguous and suggests possible interpretations. The system considers different charts, regions, releases, versions, locations, and venues to provide accurate answers for each interpretation. Retrieval-Augmented Language Generation with Continuous Information Gathering is a document that discusses a natural language processing system. The system is designed to interpret ambiguous questions and provide multiple possible answers based on different interpretations. The document provides examples of ambiguous questions and how the system can be used to provide multiple answers. The examples include questions about the release date of Bohemian Rhapsody, the second largest city in the USA, who played Bonnie in Gone with the Wind, and the name of the criminal's character in The Breakfast Club. The system uses hints to determine which aspect of the question is ambiguous and provides answers based on all possible interpretations. Question 1: What is the criminal's name in the breakfast club? This question is ambiguous in terms of which specific name is being referred to - the character's name or the actor's - so we need to figure out its interpretations and answer them one by one.
Question 2: What was roy orbison's first number one hit? This question is ambiguous in terms of which specific chart or region is being referred to, so we need to figure out its interpretations and answer them one by one.
Question 3: When does the movie summer of 84 come out? This question is ambiguous in terms of which release of the movie is being referred to, so we need to figure out its interpretations and answer them one by one.
Question 4: When did xbox one come out in australia? This question is ambiguous in terms of which specific version of the Xbox One is being referred to, so we need to figure out its interpretations and answer them one by one.
Question 5: Where do the Philadelphia Eagles play their home games? This question has 3 interpretations: (1) What city do the Philadelphia Eagles play their home games? (2) In what sports complex do the Philadelphia Eagles play their home games? (3) What stadium do the Philadelphia Eagles play their home games? The answers to all interpretations are provided. Retrieval-Augmented Language Generation with Continuous Information Gathering proposes a method for generating natural language responses to ambiguous questions by identifying and answering each possible interpretation separately. The document provides examples of ambiguous questions and their multiple interpretations, including questions about the release date of Bohemian Rhapsody in different countries, the second largest city in the USA by population or area, and the actress who played Bonnie in different versions of Gone with the Wind. The method involves considering different criteria or versions to determine the correct interpretation and generate a concise answer. The document also includes examples of straightforward questions that can be answered with a yes or no, such as whether a pear would sink in water or whether a llama could give birth twice during the War in Vietnam. The following text excerpt is a compilation of yes or no questions with their respective answers. The first question is about whether a llama can birth twice during the War in Vietnam, to which the answer is no. The second question asks if it is common to see frost during some college commencements, to which the answer is yes. The third question is about whether Hydrogen's atomic number squared exceeds the number of Spice Girls, to which the answer is no. The fourth question asks if Brooke Shields could succeed at the University of Pennsylvania, to which the answer is yes. The fifth question is about whether hamsters provide food for any animals, to which the answer is yes. 
The sixth question is about which country the director of the film Citizen Mavzik is from, to which the answer is not provided in the text. The seventh question asks who is the grandchild of Krishna Shah (Nepalese Royal), to which the answer is Prithvipati Shah. The eighth question is about which film had the director die later, The Gal Who Took the West or Twenty Plus Two, to which the answer is Twenty Plus Two. The ninth question asks when did the director of film Laughter In Hell die, to which the answer is August 25, 1963. The tenth question asks who was born first out of Martin Hodge and Ivania Martinich, to which the answer is Martin Hodge. The eleventh question asks who is Boraqchin (Wife Of Ogedei)'s father-in-law, to which the answer is Genghis Khan. The twelfth question asks if the director of film Coolie No. 1 (1995 Film) and director of film The Sensational Trial have the same nationality, to which the answer is no. The thirteenth question asks if Kurram Garhi and Trojkrsti are located in the same country, to which the answer is no. Retrieval-Augmented Language Generation with Continuous Information Gathering provides examples of 2WikiMultihopQA questions and their answers. The document emphasizes two skills for answering questions: utilizing the Search API and thinking step-by-step. The questions include identifying a director's death date, determining the grandchild of a Nepalese Royal, identifying a film's director who died later, determining the birth order of two individuals, identifying a father-in-law, determining if two directors have the same nationality, identifying the location of two places, and identifying the meaning behind colors on a flag. The answers are provided with step-by-step reasoning and in some cases, utilizing the Search API. This document discusses retrieval-augmented language generation using continuous information gathering.
The document includes tables with statistics and experimental settings for different tasks/datasets, as well as prompts and exemplars for these tasks/datasets. The hyperparameters of FLARE on different datasets and the Wikipedia-related domains excluded from Bing's search results are also listed in tables. The references include papers from the proceedings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Retrieval-augmented language generation with continuous information gathering is a topic covered in a research paper. The paper discusses the use of models in generating long-form answers to multi-hop questions. It also covers the use of chain-of-thought reasoning and Toolformer-style models in language generation. The paper emphasizes the importance of consistency in chain-of-thought reasoning and highlights self-ask-style prompting for knowledge-intensive multi-step questions. Additionally, the paper covers the use of open-domain question answering and probabilistic relevance frameworks in information retrieval. It also cites the paper by Adam Roberts, Colin Raffel, and Noam Shazeer asking how much knowledge can be packed into the parameters of a language model, alongside references on unsupervised multitask learners and on transfer learning with a unified text-to-text transformer. Other referenced work discusses the limits of these models, narrowing the compositionality gap in language models, measuring the quality of abstractive summarization, and training language models to follow instructions. Further papers mentioned include Language Models as Knowledge Bases?, Retrieval-Augmented Generation for Question Answering, and You Only Need One Model for Open-Domain Question Answering.
The article discusses the retrieval-augmented language generation approach for solving complex tasks in natural language processing (NLP). It explores the challenges of open-domain multi-document summarization and retrieval for open-domain question answering. The authors propose a method of composing retrieval and language models within a single transformer. They also discuss the use of dense passage retrieval and demonstrate-search-predict techniques in NLP. Additionally, the article highlights the importance of memorization and nearest neighbor language models in learning representations. The authors also introduce a supervised challenge dataset for reading comprehension and discuss the hurdles to progress in long-form question answering. Finally, the article mentions various researchers and their contributions in the field of NLP. This document is a collection of various papers related to natural language processing and machine learning. The papers cover topics such as retrieval-augmented language models, question answering, few-shot learning, summarization, and language modeling. Some of the papers highlight new datasets or benchmarks for evaluating models, while others propose new methods or techniques for improving model performance. Overall, the papers showcase the current state of the art in natural language processing research. This is a list of authors and references related to the paper "Retrieval-Augmented Language Generation with Continuous Information Gathering". The paper discusses the use of retrieval-based models for generating text, specifically in the context of answering open-domain questions. The authors experiment with a technique called FLARE, which involves continuous retrieval and integration of information during generation. They note that this approach can be computationally expensive and may not always improve performance. 
The paper also mentions several datasets used for evaluating language models, including ELI5 and Wizard of Wikipedia. The paper proposes an active retrieval augmentation framework for long-form language generation that retrieves relevant information during generation. The framework uses a combination of implicit and explicit query formulation methods and iteratively retrieves information based on the upcoming sentence. The effectiveness of the framework is demonstrated through experimental results on four tasks/datasets. The paper also investigates the importance of active retrieval and compares the performance of different query formulation methods. The results show that high-confidence sentences may not always need retrieval and that the use of low-confidence tokens can impede generation. The paper concludes by acknowledging the support of funding agencies and individuals who contributed to the experiments. The paper presents FLARE, a retrieval-augmented language generation method that uses forward-looking retrieval to improve question answering performance. FLARE outperforms baselines on StrategyQA, ASQA, ASQA-hint, WikiAsp, and 2WikiMultihopQA datasets. The paper also conducts an ablation study to evaluate the efficacy of various design choices, including the importance of forward-looking retrieval, the use of retrieval instructions and exemplars, and the effectiveness of question decomposition. The study shows that forward-looking retrieval is more powerful than past-context-based retrieval, and that retrieval instructions and exemplars are challenging to generate manually. The study also shows that the question decomposition approach achieves the best performance among all baselines. The article discusses the FLARE (Forward-Looking Active Retrieval augmented generation) method for retrieval-augmented language generation with continuous information gathering. 
A baseline variant uses previous sentences to retrieve related entities and relations for use in generating subsequent sentences; FLARE instead queries with the forthcoming sentence. FLARE outperforms other baseline methods in multihop QA tasks, and is effective at identifying and generating on-topic output. The article provides thorough comparisons with baselines, experimental results, and evaluations using various metrics such as ROUGE and UniEval. The FLARE method can retrieve information from open-web documents, and is effective for generating aspect-based summaries about entities from 20 domains in Wikipedia. This text excerpt discusses various natural language processing tasks and their evaluation metrics. The first task is open-domain summarization, which aims to generate a comprehensive summary about a specific topic by gathering information from the open web. Evaluation metrics for this task include ROUGE, named-entity F1, and UniEval.
The second task is answering ambiguous questions, where systems must provide answers that cover all possible interpretations of the question. The testbed for this task is a collection of crowdsourced questions with multiple interpretations, and outputs are evaluated using exact match and Disambig-F1 scores.
The third task is generating comprehensive answers to questions seeking complex information. The testbed for this task is a dataset of long-form QA, and outputs are evaluated with exact match and F1, with BM25 used as the retriever.
The final task is commonsense reasoning, which requires systems to utilize both world and commonsense knowledge to generate an answer. Evaluation metrics for this task include EM and token-level F1, precision, and recall. The paper discusses retrieval-augmented language generation for multihop question answering. The authors work within the maximum input length of text-davinci-003 and retrieve several documents using the format in Prompt 3.3. They use exemplars to retrieve documents and incorporate retrieval results for exemplars. The authors use Wikipedia articles as the retrieval corpus, BM25 as the retriever, and state-of-the-art CoT methods. The goal of multihop QA is to answer complex questions through a process of information retrieval and reasoning, and the authors evaluate the effectiveness of FLARE on 4 different datasets. Question decomposition approaches require task-specific prompt engineering, which restricts their generalizability to new tasks. The paper discusses different approaches to generating follow-up questions and concludes that dynamic retrieval is effective for generating follow-up questions. The article presents a framework for retrieval-augmented language generation. Baseline approaches involve multi-time retrieval triggered either by every token or every few tokens. The retrieved information is used to guide the language model in generating relevant outputs. The article compares different retrieval strategies and proposes three baseline categories based on when information is retrieved. The document corpus is obtained either from open web search engines or from curated datasets. The user input is formatted to include relevant documents, which are ranked according to relevance. This document outlines a method called Retrieval-Augmented Language Generation with Continuous Information Gathering (FLARE) that generates natural language text by iteratively retrieving and incorporating information from external sources.
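The EM and token-level F1 metrics mentioned above can be sketched as follows. This is a minimal version: the SQuAD-style answer normalization that such evaluations typically apply (article and punctuation stripping) is omitted for brevity, and only lowercasing and whitespace tokenization are kept.

```python
from collections import Counter

def exact_match(pred, gold):
    # EM: 1 if the normalized strings are identical, else 0.
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    # Token-level F1 over whitespace tokens, built from precision and recall.
    p_toks, g_toks = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p_toks) & Counter(g_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p_toks)
    recall = overlap / len(g_toks)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the cat sat", "the cat")` gives precision 2/3 and recall 1, hence F1 = 0.8.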
The process begins with an initial query, which can be generated using a language model (LM) or provided by the user. Then, FLARE generates sentences by iteratively retrieving and incorporating information from documents, using a ranked document list to guide subsequent sentences. To form queries that the retrieved information can answer, FLARE uses two methods: implicit query formulation, which masks out low-confidence tokens and uses the remaining sentence directly as the query, and explicit query generation, which prompts the LM to generate a question about each span whose token probabilities fall below a threshold. FLARE aims to improve retrieval accuracy and remove potential distractions from the generated text. The paper discusses retrieval-augmented language generation using continuous information gathering. The approach involves using a query to retrieve relevant information from a large dataset and generate long-form content. The authors propose confidence-based active retrieval and direct FLARE methods to address issues with generating search queries and disrupting answer generation. The approach involves iteratively generating search queries and retrieving relevant documents to regenerate the next sentence. The authors found that LMs can effectively combine two skills and generate meaningful search queries to aid subsequent generation. The paper proposes a retrieval-augmented language generation framework called FLARE. Two methods are proposed to instruct the LM to generate answers using retrieval-encouraging instructions and retrieval queries when necessary while generating. The framework is inspired by Toolformer and implements active retrieval augmented generation methods to reflect the intent of upcoming generation. The structure of the prompt is shown in Prompt 3.1, which generates search queries while performing the task.
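The two query formulation strategies can be sketched as follows. The token lists, probabilities, and threshold are illustrative; the LM call that would turn each low-confidence span into an explicit question is omitted, so only the span-extraction step of the explicit variant is shown.

```python
def implicit_query(tokens, probs, theta=0.5):
    # Implicit formulation: mask out low-confidence tokens and use the
    # remaining high-confidence tokens directly as the retrieval query.
    return " ".join(t for t, p in zip(tokens, probs) if p >= theta)

def low_confidence_spans(tokens, probs, theta=0.5):
    # Explicit formulation (first step): collect contiguous low-confidence
    # spans; each span would then be turned into a question by prompting
    # the LM (that call is omitted here).
    spans, cur = [], []
    for t, p in zip(tokens, probs):
        if p < theta:
            cur.append(t)
        elif cur:
            spans.append(" ".join(cur))
            cur = []
    if cur:
        spans.append(" ".join(cur))
    return spans
```

On a tentative sentence like "Joe Biden attended the University of Pennsylvania" where the university name is uncertain, the implicit query keeps only the confident prefix, while the explicit variant isolates "University of Pennsylvania" as the span to ask about.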
Retrieved documents are prepended before the generation, and LMs should issue queries only when needed so as to avoid unnecessary or inappropriate retrieval. At each step, generation conditions on the retrieved documents and continues until the next search query is generated or the output ends. The input for each test case consists of several task-related exemplars and several search-related exemplars; since the LM is accessible only through an API, this behavior is elicited by few-shot prompting. A straightforward way of expressing an information need is to generate "[Search(query)]" inline; in this FLARE-with-retrieval-instructions variant, the user input serves as the initial query and subsequent generation conditions on each new retrieval.

More broadly, the paper proposes a framework for active retrieval augmented generation, in which the model decides when and what to retrieve throughout the generation process, with the goal of aiding long-form generation with retrieval. The most common prior approach retrieves once, using the input as the query, and generates the answer from the retrieved documents. The paper examines various ways of determining when and what to retrieve, and proposes Forward-Looking Active Retrieval augmented generation (FLARE), which iteratively generates a temporary next sentence, uses it as a query to retrieve relevant documents, and then regenerates the next sentence conditioned on the retrieved documents. FLARE is evaluated on 4 diverse tasks/datasets involving long outputs, including multihop QA, open-domain summarization, commonsense reasoning, and long-form QA, and achieves superior or competitive performance compared to single-time and multi-time retrieval baselines, demonstrating its effectiveness and generalizability. The paper argues that what LMs intend to generate in the future should inform what is retrieved, and adopts an active retrieval strategy that retrieves only when LMs generate low-probability tokens.
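The inline "[Search(query)]" mechanism implies a small parsing step: when the call appears in generated text, generation pauses, the query is issued, and the call is stripped before continuing. A minimal sketch of that step, with the function name and cleanup behavior as assumptions rather than the paper's code:

```python
import re

# Matches an inline retrieval call of the form "[Search(query)]".
SEARCH_PATTERN = re.compile(r"\[Search\((.+?)\)\]")

def extract_search_query(generated_text):
    """Return (query, text_with_call_removed), or (None, text)
    when the generated text contains no retrieval call."""
    match = SEARCH_PATTERN.search(generated_text)
    if match is None:
        return None, generated_text
    cleaned = generated_text[:match.start()] + generated_text[match.end():]
    return match.group(1), cleaned
```

In a full loop, a non-`None` query would trigger the retriever, and generation would resume conditioned on the new documents with the call removed from the running output.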
The paper proposes a new paradigm for long-form generation tasks called active retrieval-augmented generation, in which a language model (LM) actively decides when and what to retrieve throughout the generation process. The goal is to avoid unnecessary or inappropriate retrieval and to ensure that LMs retrieve information only when they lack the knowledge required to continue generating. Among the approaches discussed is forward-looking active retrieval augmented generation (FLARE), which iteratively generates a temporary next sentence conditioned on the user input and earlier retrieval results, retrieving extra information as needed during generation so the LM has enough knowledge to produce complex content. Example long-form generation tasks where this applies include open-domain summarization and multihop question answering.

Generative LMs have become a foundational component of many natural language processing (NLP) systems. Retrieval-augmented LMs pair a parametric LM with a non-parametric retrieval component that looks up relevant information in external knowledge resources such as document corpora. In recent years, increasingly powerful large LMs have demonstrated the ability to handle complex tasks from the input alone, but they are prone to hallucination; a promising remedy is to augment generation with retrieval, fetching relevant knowledge once information needs become clear from the user's input and then generating a complete answer conditioned on the retrieved documents.
Forward-Looking Active Retrieval augmented generation (FLARE) is a generic retrieval-augmented generation method that actively decides when and what to retrieve over the course of generation. FLARE iteratively predicts the upcoming sentence to anticipate future content, then uses that prediction as a query to retrieve relevant documents. The authors test FLARE along with baselines comprehensively over 4 long-form knowledge-intensive generation tasks and achieve superior or competitive performance on all of them, demonstrating the method's effectiveness. The work provides a generalized view of active retrieval augmented generation for tasks involving long-form output, such as long-form QA, open-domain summarization, and chain-of-thought (CoT) reasoning. Retrieval-augmented language models (LMs) retrieve information from external knowledge resources to ground generated text, addressing the factually inaccurate output that LMs produce when relying solely on their input. Large language models have remarkable abilities, but they tend to hallucinate; continuously gathering information during text generation is a promising way to mitigate this. The study was conducted by researchers from Carnegie Mellon University's Language Technologies Institute and Sea AI Lab.