- Популярные видео
- Авто
- Видео-блоги
- ДТП, аварии
- Для маленьких
- Еда, напитки
- Животные
- Закон и право
- Знаменитости
- Игры
- Искусство
- Комедии
- Красота, мода
- Кулинария, рецепты
- Люди
- Мото
- Музыка
- Мультфильмы
- Наука, технологии
- Новости
- Образование
- Политика
- Праздники
- Приколы
- Природа
- Происшествия
- Путешествия
- Развлечения
- Ржач
- Семья
- Сериалы
- Спорт
- Стиль жизни
- ТВ передачи
- Танцы
- Технологии
- Товары
- Ужасы
- Фильмы
- Шоу-бизнес
- Юмор
Transformers Visually Explained
in this video, we build transformers from scratch and understand how they actually work — starting from embeddings and self-attention to multi-head attention, positional encoding, and the full encoder-decoder architecture , along with masking, cross-attention, and the difference between training and inference, so by the end you get a complete and intuitive understanding of how modern LLMs like GPT are built
Attention is all you need paper:- https://arxiv.org/abs/1706.03762
These are very important to know before you understand tranformers:-
Neural Networks:- https://youtu.be/sE6OaMndGZg
Backpropagation:- https://youtu.be/nAMkcgxKwfA
Normalization:- https://youtu.be/W2vqsTg-rDU
BatchNorm:-https://youtu.be/PaIKIXb3v9Q
RNNs:- https://youtu.be/eCwTQYcNG3o
Residual Connections:- https://youtu.be/M108HPERPc8
Link for the animation codes:- https://github.com/ByteQuest0/Animation_codes/tree/main/2026/Transfomers
00:00 Introduction – Why Transformers?
02:44 Tokenization and One-Hot Encoding
04:59 Word Embeddings Explained
08:37 Static Embeddings Problem (Bank Example)
15:12 Self-Attention
18:00 Why Scaling by √dk?
20:42 Self Attention Recap
21:33 Multihead Self Attention
25:29 Positional Encoding Intuition
30:45 Transformer Architecture Overview
31:36 Residual Connections + LayerNorm
33:00 Feed Forward Network Explained
33:25 Transformer Architecture Overview
34:00 Masked Multi-Head Attention
37:00 Cross Attention Explained
38:28 Transformer Architecture Overview
39:12 Stacked Layers (Nx)
39:37 Training vs Inference
42:43 Transformers Advantage
🎥 Animations created using Manim:
Manim is an open-source Python library for creating mathematical animations. Learn more or try it yourself:
🔗 https://www.manim.community
Let's Connect:-
GitHub:- https://github.com/ByteQuest0
Reddit:- https://www.reddit.com/r/ByteQuest/
Видео Transformers Visually Explained канала ByteQuest
Attention is all you need paper:- https://arxiv.org/abs/1706.03762
These are very important to know before you understand tranformers:-
Neural Networks:- https://youtu.be/sE6OaMndGZg
Backpropagation:- https://youtu.be/nAMkcgxKwfA
Normalization:- https://youtu.be/W2vqsTg-rDU
BatchNorm:-https://youtu.be/PaIKIXb3v9Q
RNNs:- https://youtu.be/eCwTQYcNG3o
Residual Connections:- https://youtu.be/M108HPERPc8
Link for the animation codes:- https://github.com/ByteQuest0/Animation_codes/tree/main/2026/Transfomers
00:00 Introduction – Why Transformers?
02:44 Tokenization and One-Hot Encoding
04:59 Word Embeddings Explained
08:37 Static Embeddings Problem (Bank Example)
15:12 Self-Attention
18:00 Why Scaling by √dk?
20:42 Self Attention Recap
21:33 Multihead Self Attention
25:29 Positional Encoding Intuition
30:45 Transformer Architecture Overview
31:36 Residual Connections + LayerNorm
33:00 Feed Forward Network Explained
33:25 Transformer Architecture Overview
34:00 Masked Multi-Head Attention
37:00 Cross Attention Explained
38:28 Transformer Architecture Overview
39:12 Stacked Layers (Nx)
39:37 Training vs Inference
42:43 Transformers Advantage
🎥 Animations created using Manim:
Manim is an open-source Python library for creating mathematical animations. Learn more or try it yourself:
🔗 https://www.manim.community
Let's Connect:-
GitHub:- https://github.com/ByteQuest0
Reddit:- https://www.reddit.com/r/ByteQuest/
Видео Transformers Visually Explained канала ByteQuest
transformers transformer architecture self attention multi head attention attention is all you need transformers explained gpt architecture llm explained deep learning neural networks positional encoding encoder decoder model machine learning natural language processing nlp tutorial transformer from scratch attention mechanism how transformers work gpt llm masked multihead attention cross attention query key value residual connection
Комментарии отсутствуют
Информация о видео
15 апреля 2026 г. 12:57:49
00:44:00
Другие видео канала




















