Can Machines Learn Like Humans - In-context Learning\Meta\Zero-shot Learning | #GPT3 (part 3)

This video explains how GPT-3 uses in-context learning to perform complex NLP tasks on the fly, from a task description and few or no examples, without any gradient updates.
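As a rough sketch of what "learning a task on the fly" means here, the snippet below assembles the zero-shot, one-shot, and few-shot prompt formats the video and paper describe. The English-to-French task, the demonstrations, and the formatting are illustrative assumptions in the spirit of the paper's figures, not quoted from it.

```python
# Illustrative prompt formats for the three in-context learning settings.
# Task description, demonstrations, and query are concatenated into one
# text prompt; the model's weights are never updated.

task_description = "Translate English to French:"
demonstrations = [
    "sea otter => loutre de mer",
    "peppermint => menthe poivrée",
    "cheese => fromage",
]
query = "plush giraffe =>"

# Zero-shot: task description + query, no examples at all.
zero_shot = "\n".join([task_description, query])

# One-shot: a single demonstration precedes the query.
one_shot = "\n".join([task_description, demonstrations[0], query])

# Few-shot: several demonstrations; the "learning" happens entirely
# inside the prompt (the model's context window).
few_shot = "\n".join([task_description, *demonstrations, query])

print(few_shot)
```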
GPT-3 Explained Series:
Introduction of GPT-3: The Most Powerful Language Model Ever (part 1)
https://youtu.be/Rv5SeM7LxLQ

Connect
Linkedin https://www.linkedin.com/in/xue-yong-fu-955723a6/
Twitter https://twitter.com/home
Email edwindeeplearning@gmail.com

Paper: Language Models are Few-Shot Learners
https://arxiv.org/abs/2005.14165

Abstract
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
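To make "tasks and few-shot demonstrations specified purely via text interaction with the model" concrete, the sketch below runs a few-shot prompt through a frozen pretrained causal language model with the Hugging Face transformers library. GPT-3's weights are not publicly downloadable, so the much smaller gpt2 checkpoint stands in here (an assumption: its completions will be far weaker than GPT-3's); the point is only the mechanics, namely that the prompt carries the task and no gradient update ever happens.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "gpt2"  # stand-in; GPT-3 itself is not publicly available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # inference only: in-context learning uses no fine-tuning

# Few-shot prompt: task description + demonstrations + query, all as text.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese =>"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():  # no backpropagation; the weights stay frozen
    output_ids = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```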

Video: Can Machines Learn Like Humans - In-context Learning\Meta\Zero-shot Learning | #GPT3 (part 3), from the Deep Learning Explainer channel
Video information
Published: August 15, 2020, 21:08:50
Duration: 00:24:04
Other videos from this channel
ChatGPTs Take Over a Town: 25 Agents Experience Love, Friendships, and Life!
ChatGPT Plugins, Github Copilot X, Bard, Bing Image Creator - Crazy Week for AI
Introduction of GPT-3: The Most Powerful Language Model Ever - #GPT3 Explained Series (part 1)
What Is A Language Model? GPT-3: Language Models Are Few-Shot Learners #GPT3 (part 2)
Question and Answer Test-Train Overlap in Open Domain Question Answering Datasets
Wav2CLIP: Connecting Text, Images, and Audio
Multitask Prompted Training Enables Zero-shot Task Generalization (Explained)
Magical Way of Self-Training and Task Augmentation for NLP Models
Well read Students Learn Better: On The Importance Of Pre-training Compact Models
Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning (Paper Explained)
Vokenization Improving Language Understanding with Visual Grounded Supervision (Paper Explained)
Sandwich Transformer: Improving Transformer Models by Reordering their Sublayers
Too many papers to read? Try TLDR - Extreme Summarization of Scientific Documents
REALM: Retrieval-Augmented Language Model Pre-training | Open Question Answering SOTA #OpenQA
Teach Computers to Connect Videos and Text without Labeled Data - VideoClip
BART: Denoising Sequence-to-Sequence Pre-training for NLG & Translation (Explained)
GAN BERT: Generative Adversarial Learning for Robust Text Classification (Paper Explained) #GANBERT
Revealing Dark Secrets of BERT (Analysis of BERT's Attention Heads) - Paper Explained
Transformer Architecture Explained | Attention Is All You Need | Foundation of BERT, GPT-3, RoBERTa
Linkedin's New Search Engine | DeText: A Deep Text Ranking Framework with BERT | Deep Ranking Model