BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
https://arxiv.org/abs/1810.04805
Abstract:
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7% (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.
Authors:
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Video: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, from the Yannic Kilcher channel
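As the abstract notes, downstream use adds just one task-specific output layer on top of the pre-trained bidirectional encoder. Below is a minimal sketch of that recipe, not code from the paper: it assumes the Hugging Face transformers library with PyTorch, the "bert-base-uncased" checkpoint, and a 2-class sentence-pair task, all of which are illustrative choices.

import torch
from transformers import BertModel, BertTokenizer

# Pre-trained bidirectional encoder and its tokenizer (checkpoint name is an assumption).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

# The "one additional output layer": a linear classifier over the [CLS] representation,
# here for a hypothetical 2-class sentence-pair task such as entailment.
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)

# Encode a sentence pair; the tokenizer inserts [CLS] and [SEP] markers.
inputs = tokenizer("A man is playing a guitar.", "Someone is making music.",
                   return_tensors="pt")
with torch.no_grad():
    cls_vec = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] token vector
logits = classifier(cls_vec)  # task scores for the pair

In actual fine-tuning, the encoder and the new layer are trained jointly on labeled task data; that lightweight adaptation is what produces the results quoted in the abstract.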
Other videos from this channel:
BERT Neural Network - EXPLAINED!
Transformer Neural Networks - EXPLAINED! (Attention is all you need)
[Classic] Word2Vec: Distributed Representations of Words and Phrases and their Compositionality
BERT Research - Ep. 1 - Key Concepts & Sources
Attention Is All You Need
Transformers, explained: Understand the model behind GPT, BERT, and T5
Applying BERT to Question Answering (SQuAD v1.1)
Chinese NLP - Transformer Models (2.2): Practical BERT for Sentiment Classification - doc2vec - Corpus Preprocessing - Data Augmentation - Tackling Overfitting - Deep Learning Training Tricks
Language Learning with BERT - TensorFlow and Deep Learning Singapore
A Historic NLP Breakthrough! A Quick Look at Google's BERT Model + Word Embedding
CS480/680 Lecture 19: Attention and Transformer Networks
Language Model Overview: From word2vec to BERT
Illustrated Guide to Transformers Neural Network: A step by step explanation
GPT-3: Language Models are Few-Shot Learners (Paper Explained)
XLNet: Generalized Autoregressive Pretraining for Language Understanding
[Code] PyTorch sentiment classifier from scratch with Huggingface NLP Library (Full Tutorial)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)
Vision Transformer in PyTorch
How I Read a Paper: Facebook's DETR (Video Tutorial)