Lecture 21 - Build your first Reasoning Model using GRPO | Reasoning LLMs from Scratch

In this lecture, we establish the connection between GRPO (Group Relative Policy Optimization) and reasoning. This connection was first made in the DeepSeek R1 paper, published in January 2025. The first model described there, DeepSeek-R1-Zero, used pure reinforcement learning to fine-tune the LLM, and the authors observed that the model developed reasoning capabilities autonomously, signified by two things:
1. A graph that clearly shows the average response length increasing with training time.
2. The famous "aha moment", which demonstrates the model learning to re-evaluate its initial answers.

This lecture marks the end of the second phase of this course, which covered pure reinforcement learning. We have gone step by step from the basics of RL to where we are now, so that people understand how to build reasoning models using pure reinforcement learning.

At the end, we see how to convert a non-reasoning model into a reasoning model. We take the Qwen-2.5 3B model and, using GRPO, show how its reasoning capabilities can be improved.
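
To make the GRPO idea concrete, here is a minimal sketch of its core step: sample a group of completions for one prompt, score each with a rule-based reward, and normalize the rewards within the group to get advantages. This is not the code from the Colab notebook. The `toy_reward` function, its 0.5 and 1.0 reward values, the sample completions, and the choice of population standard deviation are all assumptions made for illustration; only the `<think>`/`<answer>` output format and the group normalization follow the DeepSeek R1 setup.

```python
import re
from statistics import mean, pstdev


def toy_reward(completion: str, gold_answer: str) -> float:
    """Hypothetical rule-based reward: a format bonus plus a correctness bonus."""
    reward = 0.0
    # Format check: reasoning in <think>...</think>, then <answer>...</answer>.
    match = re.search(
        r"<think>.*?</think>\s*<answer>(.*?)</answer>", completion, re.DOTALL
    )
    if match:
        reward += 0.5  # assumed value for following the format
        if match.group(1).strip() == gold_answer.strip():
            reward += 1.0  # assumed value for a correct final answer
    return reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, variants exist
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt ("What is 2 + 2?") with a group of four sampled completions.
completions = [
    "<think>2 + 2 is 4</think> <answer>4</answer>",
    "<think>maybe 5?</think> <answer>5</answer>",
    "The answer is 4",
    "<think>2 + 2 = 4, check: 4 - 2 = 2</think> <answer>4</answer>",
]
rewards = [toy_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f}  advantage={a:+.3f}")
```

Completions that score above the group average get a positive advantage and are reinforced, while those below get a negative one. Because the baseline is the group mean, no separate value network is needed, which is the main practical difference from PPO.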

Here is the Google Colab link for this project: https://colab.research.google.com/drive/1nMr10tuAE6XIjIIBo9HnIkPy5sFw9HYG?usp=sharing

Video "Lecture 21 - Build your first Reasoning Model using GRPO | Reasoning LLMs from Scratch" from the Vizuara channel

Video information

July 30, 2025, 9:30:32

00:21:38
