
Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models

Reinforcement Learning with Human Feedback (RLHF) is a method used to fine-tune Large Language Models (LLMs) so that their outputs align with human preferences. At the heart of RLHF lies a very powerful reinforcement learning method called Proximal Policy Optimization (PPO). Learn about it in this simple video!
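For reference, the objective at the core of PPO is the clipped surrogate loss; this formula is not spelled out in the description, it is added here as standard background for the method named above:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here \hat{A}_t is the advantage estimate and \epsilon (often around 0.2) limits how far the updated policy can drift from the old one in a single step.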

This is the second in a series of 3 videos dedicated to the reinforcement learning methods used for training LLMs.

Full Playlist: https://www.youtube.com/playlist?list=PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-

Video 0 (Optional): Introduction to deep reinforcement learning https://www.youtube.com/watch?v=SgC6AZss478
Video 1: Proximal Policy Optimization https://www.youtube.com/watch?v=TjHH_--7l8g
Video 2 (This one): Reinforcement Learning with Human Feedback
Video 3 (Coming soon!): Direct Preference Optimization (DPO)

00:00 Introduction
00:48 Intro to Reinforcement Learning (RL)
02:47 Intro to Proximal Policy Optimization (PPO)
04:17 Intro to Large Language Models (LLMs)
06:50 Reinforcement Learning with Human Feedback (RLHF)
13:08 Interpretation of the Neural Networks
14:36 Conclusion
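
To make the RLHF chapter (06:50) concrete, below is a minimal sketch of the reward-model training step that RLHF relies on: the model scores a preferred ("chosen") and a dispreferred ("rejected") response to the same prompt and is trained with the pairwise loss -log sigmoid(r_chosen - r_rejected). This is illustrative code, not the video's own material; the tiny linear model and random embeddings stand in for a real Transformer and real human-ranked data.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Stand-in for a Transformer reward model (illustrative only)."""

    def __init__(self, embed_dim: int = 16):
        super().__init__()
        # Maps a pooled response embedding to a scalar reward.
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        return self.head(pooled_embedding).squeeze(-1)

model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake pooled embeddings for a batch of (chosen, rejected) response pairs;
# in real RLHF these come from encoding human-ranked model outputs.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Pairwise (Bradley-Terry) preference loss: push the reward of the
# chosen response above the reward of the rejected one.
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"pairwise preference loss: {loss.item():.4f}")
```

The trained reward model then supplies the reward signal that PPO maximizes when fine-tuning the LLM itself.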

Get the Grokking Machine Learning book!
https://manning.com/books/grokking-machine-learning
Discount code (40%): serranoyt
(Use the discount code at checkout)

Video "Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models" from the Serrano.Academy channel
Video information
Published: February 12, 2024, 21:00:11
Duration: 00:15:31