
Lecture 21 - Build your first Reasoning Model using GRPO | Reasoning LLMs from Scratch

In this lecture, we establish the connection between GRPO and reasoning. This connection was first demonstrated in the DeepSeek-R1 paper, which came out in January 2025. The first model, DeepSeek-R1-Zero, was fine-tuned with pure reinforcement learning, and the authors observed that the model developed reasoning capabilities autonomously, as shown by two things:
1. A graph that clearly shows the average response length increasing over the course of training.
2. The famous "aha moment", in which the model learns to re-evaluate its initial answers.

This lecture marks the end of the second phase of this course, which covered pure reinforcement learning. We have gone step by step from the basics of RL to where we are now, so that people understand how to build reasoning models using pure reinforcement learning.

Finally, we look at how to convert a non-reasoning model into a reasoning model. We take the Qwen-2.5 3B model and see how GRPO can improve its reasoning capabilities.
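
For readers who want a feel for what GRPO computes, here is a minimal sketch of its group-relative advantage: several responses are sampled for the same prompt, each gets a reward, and each reward is normalized against the mean and standard deviation of its own group. This is not the code from the lecture or the Colab notebook. The function name, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero; real implementations may make different choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average get a positive advantage and are reinforced, and those below get a negative one. If every response in a group receives the same reward, all advantages are zero and that group gives no learning signal.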

Here is the Google Colab link for this project: https://colab.research.google.com/drive/1nMr10tuAE6XIjIIBo9HnIkPy5sFw9HYG?usp=sharing

Video "Lecture 21 - Build your first Reasoning Model using GRPO | Reasoning LLMs from Scratch" from the Vizuara channel