
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (paper explained)

ELECTRA (Pre-training Text Encoders as Discriminators Rather Than Generators) is a novel and very efficient way to pre-train text encoders. It is not only efficient in compute and parameters but also achieves state-of-the-art results on the SQuAD machine reading comprehension dataset.
Connect
Linkedin https://www.linkedin.com/in/xue-yong-fu-955723a6/
Twitter https://twitter.com/home
Email edwindeeplearning@gmail.com

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Paper: https://openreview.net/pdf?id=r1xMH1BtvB
Code: https://github.com/google-research/electra

Abstract
A text encoder trained to distinguish real input tokens from plausible fakes efficiently learns effective language representations.
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate this new pre-training task is more efficient than MLM because the task is defined over all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when using the same amount of compute.

Video: "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (paper explained)" from the Deep Learning Explainer channel.
Video information
March 16, 2020, 7:44:05
Duration: 00:54:00
Other videos from this channel:
- ChatGPTs Take Over a Town: 25 Agents Experience Love, Friendships, and Life!
- ChatGPT Plugins, Github Copilot X, Bard, Bing Image Creator - Crazy Week for AI
- Can Machines Learn Like Humans - In-context Learning\Meta\Zero-shot Learning | #GPT3 (part 3)
- Introduction of GPT-3: The Most Powerful Language Model Ever - #GPT3 Explained Series (part 1)
- What Is A Language Model? GPT-3: Language Models Are Few-Shot Learners #GPT3 (part 2)
- Question and Answer Test-Train Overlap in Open Domain Question Answering Datasets
- Wav2CLIP: Connecting Text, Images, and Audio
- Multitask Prompted Training Enables Zero-shot Task Generalization (Explained)
- Magical Way of Self-Training and Task Augmentation for NLP Models
- Well-Read Students Learn Better: On The Importance Of Pre-training Compact Models
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning (Paper Explained)
- Vokenization: Improving Language Understanding with Visual Grounded Supervision (Paper Explained)
- Sandwich Transformer: Improving Transformer Models by Reordering their Sublayers
- Too many papers to read? Try TLDR - Extreme Summarization of Scientific Documents
- REALM: Retrieval-Augmented Language Model Pre-training | Open Question Answering SOTA #OpenQA
- Teach Computers to Connect Videos and Text without Labeled Data - VideoClip
- Transformer Architecture Explained | Attention Is All You Need | Foundation of BERT, GPT-3, RoBERTa
- BART: Denoising Sequence-to-Sequence Pre-training for NLG & Translation (Explained)
- GAN BERT: Generative Adversarial Learning for Robust Text Classification (Paper Explained) #GANBERT
- Revealing Dark Secrets of BERT (Analysis of BERT's Attention Heads) - Paper Explained
- Linkedin's New Search Engine | DeText: A Deep Text Ranking Framework with BERT | Deep Ranking Model