What are Transformer Neural Networks?
This short tutorial covers the basics of the Transformer, a neural network architecture designed for handling sequential data in machine learning.
Timestamps:
0:00 - Intro
1:18 - Motivation for developing the Transformer
2:44 - Input embeddings (start of encoder walk-through)
3:29 - Attention
6:29 - Multi-head attention
7:55 - Positional encodings
9:59 - Add & norm, feedforward, & stacking encoder layers
11:14 - Masked multi-head attention (start of decoder walk-through)
12:35 - Cross-attention
13:38 - Decoder output & prediction probabilities
14:46 - Complexity analysis
16:00 - Transformers as graph neural networks
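The attention mechanism the video walks through (3:29) is the scaled dot-product attention of the original paper linked below. As a rough reference while watching, here is a minimal NumPy sketch (the function name and shapes are illustrative, not from the video):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    """
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep softmax inputs in a stable range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax: each query's weights over the keys sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted average of the values.
    return weights @ V
```

Multi-head attention (6:29) runs several of these in parallel on learned linear projections of Q, K, and V, then concatenates the results.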
Original Transformers paper:
Attention is All You Need - https://arxiv.org/abs/1706.03762
Other papers mentioned:
(GPT-3) Language Models are Few-Shot Learners - https://arxiv.org/abs/2005.14165
(DALL-E) Zero-Shot Text-to-Image Generation - https://arxiv.org/abs/2102.12092
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - https://arxiv.org/abs/1810.04805
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity - https://arxiv.org/abs/2101.03961
Finetuning Pretrained Transformers into RNNs - https://arxiv.org/abs/2103.13076
Efficient Transformers: A Survey - https://arxiv.org/abs/2009.06732
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth - https://arxiv.org/abs/2103.03404
Do Transformer Modifications Transfer Across Implementations and Applications? - https://arxiv.org/abs/2102.11972
Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies - https://ml.jku.at/publications/older/ch7.pdf
Transformers are Graph Neural Networks (blog post) - https://thegradient.pub/transformers-are-graph-neural-networks
Video style inspired by 3Blue1Brown
Music: Trinkets by Vincent Rubinetti
Video "What are Transformer Neural Networks?" from the Ari Seff channel