Learn to post-train LLMs in this free course
Learn more: https://bit.ly/4lqtWmr
Before a large language model can follow instructions, it undergoes two key stages: pre-training and post-training. In pre-training, it learns to predict the next word or token from large amounts of unlabeled text. In post-training, it learns useful behaviors such as following instructions, tool use, and reasoning.
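For intuition, here is a minimal sketch of the pre-training objective, next-token prediction, using the Hugging Face Transformers library; the "gpt2" checkpoint is just an illustrative choice, not something the course prescribes.

```python
# Minimal sketch of next-token prediction (the pre-training objective).
# Assumes Hugging Face Transformers; "gpt2" is only an illustrative model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Large language models learn to predict the next", return_tensors="pt")

# Passing labels=input_ids makes the model compute the shifted cross-entropy
# loss: each position is trained to predict the token that follows it.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"next-token prediction loss: {outputs.loss.item():.3f}")
```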
In our latest short course, Post-training of LLMs, you’ll learn how to use three of the most common post-training techniques, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Online Reinforcement Learning (RL), to reshape model behavior for specific tasks or capabilities.
Taught by Banghua Zhu, Assistant Professor at the University of Washington, Principal Research Scientist at Nvidia, and co-founder of NexusFlow, this course covers:
- When to apply post-training and how it compares to pre-training
- How to curate and structure training data for each method
- How to use SFT to turn a base model into an instruct model
- How contrastive learning in DPO improves output quality
- How to design reward functions for RL tasks like math or code (a minimal reward sketch follows this list)
- How to evaluate whether post-training improved or degraded model behavior
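To make the reward-design point concrete, here is a minimal sketch of a verifiable reward for a math task: score 1.0 when a completion’s final answer matches a known reference, 0.0 otherwise. The helper and function names are hypothetical, not the course’s code; online-RL trainers such as TRL’s accept reward functions of roughly this shape.

```python
# Minimal sketch of a verifiable reward for math (illustrative, not the
# course's implementation): 1.0 if the final answer matches, else 0.0.
import re

def extract_final_answer(text):
    # Hypothetical convention: treat the last number in the text as the answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None

def math_reward(completions, ground_truths):
    """Return one reward per completion: 1.0 on an exact answer match."""
    return [
        1.0 if extract_final_answer(c) == t else 0.0
        for c, t in zip(completions, ground_truths)
    ]

# Two sampled completions for "What is 12 * 12?"
print(math_reward(["12 * 12 = 144", "Maybe 142?"], ["144", "144"]))  # [1.0, 0.0]
```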
You’ll also get hands-on experience implementing each technique with Hugging Face’s TRL library to:
- Fine-tune a base model into an instruction-following assistant (see the TRL sketch after this list)
- Modify a model’s responses using preferred and rejected examples
- Improve a model’s reasoning with online RL and verifiable rewards
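As a taste of the hands-on portion, here is a minimal SFT sketch using TRL’s SFTTrainer. The model checkpoint and dataset are illustrative placeholders, and the course notebooks may differ in detail.

```python
# Minimal SFT sketch with Hugging Face TRL (model and dataset names are
# placeholders, not necessarily what the course uses).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# A conversational dataset with a "messages" column works out of the box.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",            # base model to turn into an assistant
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```

For the DPO step, the training data instead pairs each prompt with a preferred and a rejected response ("prompt", "chosen", "rejected" columns), which is the format TRL’s DPOTrainer consumes.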
Whether you’re building safer assistants or targeting domain-specific improvements, this course will help you adapt LLMs with precision.
Enroll now: https://bit.ly/4lqtWmr
Video information
Published: July 9, 2025, 19:28:32
Duration: 00:02:53