Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models
Reinforcement Learning with Human Feedback (RLHF) is a method used for training Large Language Models (LLMs). At the heart of RLHF lies a powerful reinforcement learning method called Proximal Policy Optimization (PPO). Learn about it in this simple video!
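For a concrete picture of the PPO step inside RLHF, here is a minimal sketch (plain Python/NumPy, not taken from the video) of the clipped surrogate objective that PPO maximizes; the function and variable names are illustrative assumptions, not code from the series.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the chosen actions (tokens)
    under the current and old policies; advantages: per-action advantage
    estimates, e.g. derived from the reward model's scores.
    """
    ratio = np.exp(logp_new - logp_old)                        # probability ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return np.minimum(unclipped, clipped).mean()               # pessimistic (clipped) bound
```

The clipping keeps the updated policy from drifting too far from the old one in a single step, which is the stabilizing idea the video builds on.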
This is the second in a series of 3 videos dedicated to the reinforcement learning methods used for training LLMs.
Full Playlist: https://www.youtube.com/playlist?list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-
Video 0 (Optional): Introduction to deep reinforcement learning https://www.youtube.com/watch?v=SgC6AZss478
Video 1: Proximal Policy Optimization https://www.youtube.com/watch?v=TjHH_--7l8g
Video 2 (This one): Reinforcement Learning with Human Feedback
Video 3 (Coming soon!): Deterministic Policy Optimization
00:00 Introduction
00:48 Intro to Reinforcement Learning (RL)
02:47 Intro to Proximal Policy Optimization (PPO)
04:17 Intro to Large Language Models (LLMs)
06:50 Reinforcement Learning with Human Feedback (RLHF)
13:08 Interpretation of the Neural Networks
14:36 Conclusion
Get the Grokking Machine Learning book!
https://manning.com/books/grokking-machine-learning
Discount code (40%): serranoyt
(Use the discount code at checkout)
Other videos from the channel:
Geometric series and my Irish heritage
How do you minimize a function when you can't take derivatives? CMA-ES and PSO
Proximal Policy Optimization (PPO) - How to train Large Language Models
Decision trees - A friendly introduction
The Binomial and Poisson Distributions
Singular Value Decomposition (SVD) and Image Compression
You are much better at math than you think
Training Latent Dirichlet Allocation: Gibbs Sampling (Part 2 of 2)
How Large Language Models are Shaping the Future
The Attention Mechanism in Large Language Models
Thompson sampling, one armed bandits, and the Beta distribution
Book by Luis Serrano - "Grokking Machine Learning" (40% off promo code)
Latent Dirichlet Allocation (Part 1 of 2)
The Gini Impurity Index explained in 8 minutes!
Principal Component Analysis (PCA)
Machine Learning: Testing and Error Metrics
Logistic Regression and the Perceptron Algorithm: A friendly introduction
Clustering: K-means and Hierarchical
What are Transformer Models and how do they work?
A Friendly Introduction to Generative Adversarial Networks (GANs)