The math behind Attention: Keys, Queries, and Values matrices
This is the second of a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples.
Video 1: The attention mechanism in high level https://youtu.be/OxCpWwDCDFQ
Video 2: The attention mechanism with math (this one)
Video 3 (upcoming): Transformer models
If you like this material, check out LLM University from Cohere!
https://llm.university
00:00 Introduction
01:18 Recap: Embeddings and Context
04:46 Similarity
11:09 Attention
20:46 The Keys and Queries Matrices
25:02 The Values Matrix
28:41 Self and Multi-head attention
33:54 Conclusion
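The chapters above cover similarity, attention, and the keys, queries, and values matrices. As a rough companion, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention. It assumes the standard formulation softmax(QK^T / sqrt(d_k)) V from the original Transformer paper; the video may present the scaling or normalization differently. The helper names (matmul, transpose, softmax, attention) and the toy matrices are hypothetical, chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as lists of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (hypothetical sketch).

    x: n x d word embeddings, one row per word.
    w_q, w_k, w_v: projection matrices producing queries, keys, and values.
    Returns (outputs, attention_weights).
    """
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(k[0])
    # Similarity of every query with every key (dot products), scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    # Each row of weights is a probability distribution over the words.
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights
```

For example, with three two-dimensional embeddings and identity projection matrices, `attention(x, eye, eye, eye)` returns one context-aware vector per word, and each row of the returned weights sums to 1. In the standard Transformer, multi-head attention runs several such heads in parallel, each with its own query, key, and value matrices, then concatenates their outputs and applies a final linear projection.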
Video "The math behind Attention: Keys, Queries, and Values matrices" from the Serrano.Academy channel
Other videos from the channel
How Large Language Models are Shaping the Future
What are Transformer Models and how do they work?
The Attention Mechanism in Large Language Models
The Binomial and Poisson Distributions
Euler's number, derivatives, and the bank at the end of the universe
Decision trees - A friendly introduction
Thank you for 100K subscribers! I’m planning tons of new content coming soon, so excited!
How do you minimize a function when you can't take derivatives? CMA-ES and PSO
What is Quantum Machine Learning?
Denoising and Variational Autoencoders
Eigenvectors and Generalized Eigenspaces
Thompson sampling, one armed bandits, and the Beta distribution
The Beta distribution in 12 minutes!
A friendly introduction to deep reinforcement learning, Q-networks and policy gradients
The Gini Impurity Index explained in 8 minutes!
The covariance matrix
Gaussian Mixture Models
Singular Value Decomposition (SVD) and Image Compression
ROC (Receiver Operating Characteristic) Curve in 10 minutes!
Restricted Boltzmann Machines (RBM) - A friendly introduction