Lecture 11 - Function Approximation Methods|Reinforcement Learning Phase|Reasoning LLMs from Scratch

So far, in the Reinforcement Learning Phase, we have looked at tabular methods for calculating the value functions. That is, the states and their values are represented in the form of tables.

In most practical problems, these methods are not useful, since the number of states is very large. For example, the number of states in a game of chess is ~10^46.

From this lecture onwards, we will look at function approximation methods, which are used to estimate the values of certain states and then generalize to other states.

It is quite similar to supervised learning except:

(1) The Target is not known beforehand
(2) The Target is non-stationary
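
One way to make these two differences concrete is to write the targets side by side. The notation below is an assumption for illustration and is not taken from the lecture: \hat{v}(s, \mathbf{w}) is the approximate value function with weights \mathbf{w}, and the second line assumes a bootstrapped, TD(0)-style target.

```latex
% Supervised regression: each label y_i is given in advance and never changes.
L_{\text{sup}}(\mathbf{w}) = \tfrac{1}{2} \sum_i \bigl( y_i - \hat{v}(s_i, \mathbf{w}) \bigr)^2

% Value estimation with a bootstrapped target: the "label" is built from the
% current estimate, so it is unknown beforehand and shifts whenever w changes.
U_t = R_{t+1} + \gamma \, \hat{v}(S_{t+1}, \mathbf{w}_t)
```

Because U_t depends on \mathbf{w}_t, every weight update also moves the quantity being regressed toward, which is what makes the target non-stationary.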

We will learn how to use a function to express the value function, and also how to use gradient descent to optimize this function.
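
As a rough illustration of the idea, here is a minimal sketch of a linear value function trained with a gradient-style update. Everything in it is an assumption chosen for the example rather than the lecture's own code: the random-walk environment, the state-aggregation features, the semi-gradient TD(0) update, and the names `features`, `v_hat`, and `semi_gradient_td0`.

```python
import random


def features(state, n_states, n_groups):
    """One-hot state-aggregation features: neighbouring states share one weight."""
    x = [0.0] * n_groups
    x[state * n_groups // n_states] = 1.0
    return x


def v_hat(w, x):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def semi_gradient_td0(n_states=20, n_groups=5, episodes=3000,
                      alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values on a hypothetical random walk.

    The agent starts in the middle and steps left or right at random.
    Leaving on the right gives reward 1, leaving on the left gives reward 0.
    """
    rng = random.Random(seed)
    w = [0.0] * n_groups  # far fewer weights than states
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0

            x = features(s, n_states, n_groups)
            if done:
                target = reward
            else:
                # Bootstrapped target: it uses the current weights,
                # so it moves as learning progresses.
                x_next = features(s_next, n_states, n_groups)
                target = reward + gamma * v_hat(w, x_next)

            td_error = target - v_hat(w, x)
            # For a linear v_hat, the gradient with respect to w is x.
            for i in range(n_groups):
                w[i] += alpha * td_error * x[i]

            if done:
                break
            s = s_next
    return w


if __name__ == "__main__":
    weights = semi_gradient_td0()
    print([round(wi, 2) for wi in weights])
```

With 20 states but only 5 weights, an update made in one state also changes the estimate for the other states in the same group, which is the generalization that a table cannot provide.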

We are now getting closer to understanding how reinforcement learning is used in language models. This lecture marks the beginning of this transition.

Video "Lecture 11 - Function Approximation Methods|Reinforcement Learning Phase|Reasoning LLMs from Scratch" from the Vizuara channel

Video information

June 2, 2025, 9:30:00

00:53:21
