Strategies for pre-training the BERT-based Transformer architecture – language (and vision)

What is masked language modelling? Or next sentence prediction? And why do they work so well? If you have ever wondered which tasks Transformer architectures are pre-trained on and how the Multimodal Transformer learns about the connection between images and text, then this is the right video for you! (A rough code sketch of masked language modelling follows the contents list below.)

🎬 Ms. Coffee Bean explained the Multimodal Transformer: https://youtu.be/dd7nE4nbxN0
🎬 She also explained the Language-based Transformer: https://youtu.be/FWFA4DGuzSc

Content:
* 00:00 Pre-training strategies
* 00:48 Masked language modelling
* 03:37 Next sentence prediction
* 04:31 Sentence image alignment
* 05:07 Image region classification
* 06:14 Image region regression
* 06:53 Pre-training and fine-tuning on the downstream task
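To make the first objective (00:48) concrete, here is a minimal Python sketch of the input-corruption step in BERT-style masked language modelling, using the 80/10/10 replacement scheme from the BERT paper. The whitespace tokenizer and tiny vocabulary are illustrative assumptions; real pipelines operate on subword tokens with vocabularies of tens of thousands of entries.

```python
# Minimal sketch of BERT-style masked language modelling (MLM) corruption.
# Assumption: toy whitespace tokens and a tiny vocabulary for illustration only.
import random

MASK = "[MASK]"
VOCAB = ["coffee", "bean", "model", "mask", "token", "video"]

def mask_tokens(tokens, mask_prob=0.15):
    """Pick ~15% of positions as prediction targets, then corrupt them:
    80% become [MASK], 10% a random token, 10% stay unchanged."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)  # the model is trained to recover this token
            r = random.random()
            if r < 0.8:
                corrupted.append(MASK)                  # 80%: [MASK]
            elif r < 0.9:
                corrupted.append(random.choice(VOCAB))  # 10%: random token
            else:
                corrupted.append(tok)                   # 10%: unchanged
        else:
            corrupted.append(tok)
            labels.append(None)  # not a prediction target
    return corrupted, labels

tokens = "ms coffee bean explains the masked language modelling objective".split()
print(mask_tokens(tokens))
```

Leaving 10% of the targets unchanged helps reduce the mismatch between pre-training and fine-tuning, since the [MASK] token never appears in downstream tasks.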

📄 This video has been enabled by the beautiful overview table in the Appendix of this paper:
VL-BERT: Su, Weijie, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. "VL-BERT: Pre-training of generic visual-linguistic representations." arXiv preprint arXiv:1908.08530 (2019). https://arxiv.org/pdf/1908.08530.pdf
🔗 Links:
YouTube: https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean

Video and thumbnail contain emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0

Channel: AI Coffee Break with Letitia
Video information:
Published: July 15, 2020, 22:58:33
Duration: 00:08:23