Improving Language Models by Retrieving from Trillions of Tokens | NLP Journal Club
Link: https://arxiv.org/abs/2112.04426
Blog post: https://deepmind.com/research/publications/2021/improving-language-models-by-retrieving-from-trillions-of-tokens
Abstract: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
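The retrieval step the abstract describes is easy to sketch: the input is split into fixed-size chunks, each chunk is embedded with a frozen encoder, and the most similar database chunks under that embedding are fetched for the model to condition on. Below is a minimal Python sketch of that chunk-level lookup; the 64-token chunk length matches the paper, but `embed` is a hypothetical stand-in for the frozen BERT retriever, and the brute-force similarity search stands in for the approximate nearest-neighbour index needed at 2-trillion-token scale.

```python
# Minimal sketch of RETRO-style chunked retrieval. `embed` is a
# hypothetical stand-in for the frozen BERT encoder, and the exhaustive
# dot-product search stands in for the approximate-NN index the paper
# uses over its 2T-token database.
import numpy as np

CHUNK_LEN = 64  # RETRO splits sequences into 64-token chunks
K = 2           # number of neighbour chunks retrieved per input chunk

def embed(chunk_tokens):
    # Stand-in for the frozen retriever: a deterministic random
    # projection, L2-normalised so a dot product is cosine similarity.
    rng = np.random.default_rng(abs(hash(tuple(chunk_tokens))) % 2**32)
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def retrieve(input_ids, db_chunks):
    # For each 64-token chunk of the input, return the K database
    # chunks whose frozen embeddings are most similar to it.
    db_vecs = np.stack([embed(c) for c in db_chunks])
    neighbours = []
    for start in range(0, len(input_ids), CHUNK_LEN):
        query = embed(input_ids[start:start + CHUNK_LEN])
        scores = db_vecs @ query       # cosine similarity per db chunk
        top = np.argsort(-scores)[:K]  # indices of the K best matches
        neighbours.append([db_chunks[j] for j in top])
    return neighbours

# Toy usage: a 10-chunk "database" and a 2-chunk input.
db = [list(range(s, s + CHUNK_LEN)) for s in range(0, 10 * CHUNK_LEN, CHUNK_LEN)]
hits = retrieve(list(range(2 * CHUNK_LEN)), db)
print(len(hits), len(hits[0]))  # 2 input chunks, K neighbours each
```

In the full model the retrieved neighbour chunks are passed through the differentiable encoder and attended to via chunked cross-attention; the sketch above covers only the lookup.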
Video "Improving Language Models by Retrieving from Trillions of Tokens | NLP Journal Club" from the channel The NLP Lab
Other videos from this channel
CTRLsum: Towards Generic Controllable Text Summarization
Jukebox: A Generative Model for Music | NLP Journal Club
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | NLP journal club
Memorizing Transformers
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval | NLP Journal Club
Pitfalls of Static Language Modelling | NLP Journal Club
How Close is ChatGPT to Human Experts?
Best General-purpose NLP Libraries to Use in 2021
Who will Win the Large Language Model App Race? with @Slator
Learning to Reason and Memorize with Self-Notes | Paper summary
Falcon LLM: the Best Open-source LLM Available at the Moment
Tree of Thoughts: Deliberate Problem Solving with Large Language Models | Paper summary
When to use a large language model? 4 points to consider in 2023.
QURIOUS: Question Generation Pretraining for Text Generation | NLP Journal Club
GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
A Distributional Approach to Controlled Text Generation | NLP Journal Club
REALM: Retrieval-Augmented Language Model Pre-Training | NLP Journal Club
Multi-scale Transformer Language Models
*Paper summary* ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
Mirror-Generative Neural Machine Translation | NLP Journal Club
Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT | NLP Journal Club