What are Transformer Models and how do they work?
This is the last of a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples.
Video 1: The attention mechanism at a high level https://www.youtube.com/watch?v=OxCpWwDCDFQ
Video 2: The attention mechanism with math https://www.youtube.com/watch?v=UPtG_38Oq8o
Video 3 (This one): Transformer models
If you like this material, check out LLM University from Cohere!
https://llm.university
Get the Grokking Machine Learning book!
https://manning.com/books/grokking-machine-learning
Discount code (40%): serranoyt
(Use the discount code at checkout)
00:00 Introduction
01:50 What is a transformer?
04:35 Generating one word at a time
08:59 Sentiment Analysis
13:05 Neural Networks
18:18 Tokenization
19:12 Embeddings
25:06 Positional encoding
27:54 Attention
32:29 Softmax
35:48 Architecture of a Transformer
39:00 Fine-tuning
42:20 Conclusion
Video "What are Transformer Models and how do they work?" from the Serrano.Academy channel