Transformers explained | The architecture behind LLMs
All you need to know about the transformer architecture: how to structure the inputs, attention (Queries, Keys, Values), positional embeddings, and residual connections. Bonus: an overview of the differences between Recurrent Neural Networks (RNNs) and transformers.
Correction at 9:19: the order of multiplication should be the opposite: x1 (vector) * Wq (matrix) = q1 (vector). Otherwise we do not get the 1x3 dimensionality at the end. Sorry for messing up the animation!
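To check the corrected order concretely, here is a minimal NumPy sketch; the 1x4 input and 4x3 weight shapes are illustrative, not the exact numbers from the video:

import numpy as np

# Illustrative shapes: one token embedding x1 (1x4) and a query
# projection matrix Wq (4x3).
x1 = np.random.randn(1, 4)   # row vector for a single token
Wq = np.random.randn(4, 3)   # maps embedding dim 4 -> query dim 3

q1 = x1 @ Wq                 # corrected order: vector * matrix
print(q1.shape)              # (1, 3) -- the 1x3 query vector

# The order shown in the animation, Wq @ x1, raises a shape error:
# (4, 3) @ (1, 4) does not align.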
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
Outline:
00:00 Transformers explained
00:47 Text inputs
02:29 Image inputs
03:57 Next word prediction / Classification
06:08 The transformer layer: 1. MLP sublayer
06:47 2. Attention explained (sketched in code after this outline)
07:57 Attention vs. self-attention
08:35 Queries, Keys, Values
09:19 Correction: multiplication order (see note above)
11:26 Multi-head attention
13:04 Attention scales quadratically
13:53 Positional embeddings
15:11 Residual connections and Normalization Layers
17:09 Masked Language Modelling
17:59 Difference to RNNs
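If you prefer the attention chapters (06:47–13:04) in code, here is a minimal single-head self-attention sketch in NumPy, with toy dimensions chosen purely for illustration (not the video's exact numbers):

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy sizes: n tokens, embedding dim d, head dim dk (all illustrative).
n, d, dk = 5, 8, 4
X = np.random.randn(n, d)                         # one row per input token
Wq, Wk, Wv = (np.random.randn(d, dk) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # queries, keys, values

scores = Q @ K.T / np.sqrt(dk)   # n x n score matrix
A = softmax(scores, axis=-1)     # row i: how much token i attends to every token
out = A @ V                      # weighted sum of values, shape (n, dk)
print(out.shape)                 # (5, 4)

Multi-head attention repeats this with several independent Wq/Wk/Wv triples and concatenates the outputs; the n x n score matrix is where the quadratic scaling in sequence length (13:04) comes from.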
Thanks to our Patrons who support us in Tiers 2, 3, and 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, @Mutual_Information , Kshitij
📺 Our older Transformer explained video: https://youtu.be/FWFA4DGuzSc
📺 Tokenization explained: https://youtu.be/D8j1c4NJRfo
📺 Word embeddings: https://youtu.be/YkK5IKgxp-c
📽️ Replacing Self-Attention: https://www.youtube.com/playlist?list=PLpZBeKTZRGPM8PNRyv6fNMcAW3dMDq_A-
📽️ Position embeddings: https://www.youtube.com/playlist?list=PLpZBeKTZRGPOQtbCIES_0hAvwukcs-y-x
@SerranoAcademy Transformer series: https://www.youtube.com/watch?v=OxCpWwDCDFQ&list=PLs8w1Cdi-zva4fwKkl9EK13siFvL9Wewf
📄 Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." Advances in neural information processing systems 30 (2017).
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Music 🎵 : Sunset n Beachz - Ofshane
Video editing: Nils Trost
Video "Transformers explained | The architecture behind LLMs" from the channel AI Coffee Break with Letitia