Lecture 21 - Build your first Reasoning Model using GRPO | Reasoning LLMs from Scratch
In this lecture, we establish the connection between GRPO and reasoning. This connection was first established in the DeepSeek R1 paper, which came out in January 2025. The first model, DeepSeek-R1-Zero, used pure reinforcement learning to fine-tune the LLM, and the authors observed that the model develops reasoning capabilities autonomously, as shown by two things:
1. A graph that clearly shows the average response length increasing with training time.
2. The famous "aha moment", which demonstrates the model learning to re-evaluate its initial answers.
This lecture marks the end of the second phase of this course, which covered pure reinforcement learning. We have gone step by step from the basics of RL to where we are now, so that people understand how to build reasoning models using pure reinforcement learning.
At the end, we see how to convert a non-reasoning model into a reasoning model. We take the model Qwen-2.5 3B and use GRPO to see how its reasoning capabilities can be improved.
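As a rough illustration of the "group relative" part of GRPO, here is a minimal sketch of how per-completion advantages can be computed from the rewards of several completions sampled for the same prompt. The function name, the example rewards, and the choice of population standard deviation are assumptions made for illustration; they are not taken from the lecture or the Colab notebook.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` guards against division by zero when all rewards are equal.
    Assumption: population std is used; some implementations use sample std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions of one prompt
# (e.g. 1.0 if the final answer is correct, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every completion gets the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Completions that score above their group's average get a positive advantage and are reinforced; those below average get a negative one. Because the baseline comes from the group itself, no separate value model is needed.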
Here is the Google Colab link for this project: https://colab.research.google.com/drive/1nMr10tuAE6XIjIIBo9HnIkPy5sFw9HYG?usp=sharing
Video "Lecture 21 - Build your first Reasoning Model using GRPO | Reasoning LLMs from Scratch" from the Vizuara channel
Video information
July 30, 2025, 9:30:32
00:21:38