What are Transformer Neural Networks?
This short tutorial covers the basics of the Transformer, a neural network architecture designed for handling sequential data in machine learning.
Timestamps:
0:00 - Intro
1:18 - Motivation for developing the Transformer
2:44 - Input embeddings (start of encoder walk-through)
3:29 - Attention
6:29 - Multi-head attention
7:55 - Positional encodings
9:59 - Add & norm, feedforward, & stacking encoder layers
11:14 - Masked multi-head attention (start of decoder walk-through)
12:35 - Cross-attention
13:38 - Decoder output & prediction probabilities
14:46 - Complexity analysis
16:00 - Transformers as graph neural networks
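The attention mechanism the video walks through (3:29) is the scaled dot-product attention of the original paper linked below. As a rough reference while watching, here is a minimal NumPy sketch (the function name and shapes are illustrative, not from the video):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    """
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep softmax inputs in a stable range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each query's weights over the keys sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted average of the values.
    return weights @ V
```

Multi-head attention (6:29) runs several of these in parallel on learned linear projections of Q, K, and V, then concatenates the results.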
Original Transformers paper:
Attention is All You Need - https://arxiv.org/abs/1706.03762
Other papers mentioned:
(GPT-3) Language Models are Few-Shot Learners - https://arxiv.org/abs/2005.14165
(DALL-E) Zero-Shot Text-to-Image Generation - https://arxiv.org/abs/2102.12092
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - https://arxiv.org/abs/1810.04805
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity - https://arxiv.org/abs/2101.03961
Finetuning Pretrained Transformers into RNNs - https://arxiv.org/abs/2103.13076
Efficient Transformers: A Survey - https://arxiv.org/abs/2009.06732
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth - https://arxiv.org/abs/2103.03404
Do Transformer Modifications Transfer Across Implementations and Applications? - https://arxiv.org/abs/2102.11972
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies - https://ml.jku.at/publications/older/ch7.pdf
Transformers are Graph Neural Networks (blog post) - https://thegradient.pub/transformers-are-graph-neural-networks
Video style inspired by 3Blue1Brown
Music: Trinkets by Vincent Rubinetti
Video "What are Transformer Neural Networks?" from the Ari Seff channel