
ELECTRA: Pre-Training Text Encoders as Discriminators Rather than Generators

This video explains the new Replaced Token Detection pre-training objective introduced in ELECTRA. ELECTRA is much more compute-efficient because it defines the loss over the entire input sequence and avoids introducing the [MASK] token into the self-supervised learning task. ELECTRA-Small is trained on 1 GPU for 4 days and outperforms GPT, which was trained with 30x more compute. ELECTRA is on par with RoBERTa and XLNet using 1/4 of their compute, and surpasses those models at the same level of compute!
Thanks for watching! Please Subscribe!
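Below is a minimal sketch of the replaced token detection objective in PyTorch, not the paper's implementation: the toy encoders, vocabulary size, masking rate, and function names are illustrative stand-ins, while the discriminator-loss weight of 50 follows the value reported in the paper.

```python
# Illustrative sketch of ELECTRA's replaced token detection objective.
# ToyEncoder stands in for the small transformer encoders used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, MASK_ID, MASK_PROB = 1000, 64, 0, 0.15

class ToyEncoder(nn.Module):
    """Stand-in for a transformer encoder: embedding + linear head."""
    def __init__(self, out_dim):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.head = nn.Linear(HIDDEN, out_dim)
    def forward(self, ids):
        return self.head(self.embed(ids))          # (batch, seq, out_dim)

generator = ToyEncoder(VOCAB)      # small masked-language model
discriminator = ToyEncoder(1)      # per-token "was this token replaced?" classifier

def electra_step(input_ids):
    # 1) Mask a random subset of positions, as in BERT's MLM setup.
    mask = torch.rand(input_ids.shape) < MASK_PROB
    masked_ids = input_ids.masked_fill(mask, MASK_ID)

    # 2) Generator predicts the masked tokens; MLM loss on masked positions only.
    gen_logits = generator(masked_ids)
    mlm_loss = F.cross_entropy(gen_logits[mask], input_ids[mask])

    # 3) Sample from the generator to build the corrupted sequence
    #    (sampling is discrete, so no gradient flows through it).
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits[mask]).sample()
    corrupted_ids = input_ids.clone()
    corrupted_ids[mask] = sampled

    # 4) Discriminator predicts, for EVERY token, whether it differs from the
    #    original; this is why the loss covers the whole input sequence rather
    #    than only the ~15% of masked positions.
    is_replaced = (corrupted_ids != input_ids).float()
    disc_logits = discriminator(corrupted_ids).squeeze(-1)
    rtd_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)

    # Joint objective: generator MLM loss + weighted discriminator loss
    # (the paper weights the discriminator term by lambda = 50).
    return mlm_loss + 50.0 * rtd_loss

loss = electra_step(torch.randint(1, VOCAB, (8, 128)))
loss.backward()
```

Note that when the generator happens to sample the original token, that position is labeled "original" rather than "replaced", matching the paper's definition of the discriminator targets.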

Paper Link:
ELECTRA: https://openreview.net/pdf?id=r1xMH1BtvB
BERT: https://arxiv.org/abs/1810.04805

Video: ELECTRA: Pre-Training Text Encoders as Discriminators Rather than Generators, from the Connor Shorten channel
Video information
March 10, 2020, 20:10:53
Duration: 00:11:42