
Lecture 11 - Function Approximation Methods|Reinforcement Learning Phase|Reasoning LLMs from Scratch

So far in the Reinforcement Learning Phase, we have looked at tabular methods for computing value functions. In these methods, each state and its value are stored as an entry in a table.

In most practical problems these methods are not usable, because the number of states is far too large. For example, a game of chess has roughly 10^46 states.
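A quick back-of-the-envelope calculation shows why a table cannot work at that scale. This sketch assumes one 8-byte floating-point value per state (an illustrative choice) and uses the 10^46 figure quoted above.

```python
# Back-of-the-envelope size of a tabular value function for chess.
# Assumption (illustrative): one 64-bit float (8 bytes) stored per state.
NUM_STATES = 10 ** 46        # order-of-magnitude state count quoted in the text
BYTES_PER_VALUE = 8          # size of one float64 table entry

table_bytes = NUM_STATES * BYTES_PER_VALUE
zettabytes = table_bytes / 10 ** 21   # 1 ZB = 10^21 bytes

print(f"table size: {table_bytes:.1e} bytes (~{zettabytes:.1e} ZB)")
```

A linear function approximator with d features, by contrast, stores only d weights, no matter how many states exist.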

From this lecture onwards, we will look at function approximation methods. These methods learn the values of some states and then generalize those estimates to other states.
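Here is a minimal illustration of generalization. The sketch assumes a linear value function v̂(s, w) = w · x(s) with hypothetical "state aggregation" features: 10 states are split into two groups, and each group shares one weight. A single gradient step toward a target for one state also changes the estimate of every other state in the same group.

```python
# Minimal sketch of generalization with a linear value function.
# Assumption (hypothetical): 10 states aggregated into 2 groups, one-hot group features.
NUM_STATES = 10
NUM_GROUPS = 2


def features(state: int) -> list[float]:
    """One-hot 'state aggregation' features: which group the state falls into."""
    group = state * NUM_GROUPS // NUM_STATES
    x = [0.0] * NUM_GROUPS
    x[group] = 1.0
    return x


def v_hat(state: int, w: list[float]) -> float:
    """Linear value estimate w . x(state)."""
    return sum(wi * xi for wi, xi in zip(w, features(state)))


def update(state: int, target: float, w: list[float], alpha: float = 0.5) -> list[float]:
    """One gradient step on 0.5 * (target - v_hat)^2; for a linear model, grad v_hat = x(state)."""
    x = features(state)
    error = target - v_hat(state, w)
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0]
w = update(2, 1.0, w)                    # learn only from state 2
print(v_hat(2, w), v_hat(4, w), v_hat(7, w))  # 0.5 0.5 0.0
```

State 4 shares a group with state 2, so its estimate moves too. State 7 is in the other group and is unaffected. With one feature per state, this model reduces to a table. Coarser features are what let an update at one state inform others.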

This is quite similar to supervised learning, except that:

(1) the target is not known beforehand, and
(2) the target is non-stationary.
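The two differences can be made concrete with the standard stochastic-gradient update for a parameterized value estimate v̂(s, w). The notation below follows Sutton and Barto's textbook, which is an assumption; the lecture's own notation may differ. The first line is the objective, a μ-weighted squared error over states. The second line is the update, with constant factors absorbed into the step size α.

```latex
\begin{aligned}
\overline{\mathrm{VE}}(\mathbf{w}) &= \sum_{s} \mu(s)\,\bigl[v_\pi(s) - \hat{v}(s,\mathbf{w})\bigr]^2 \\
\mathbf{w}_{t+1} &= \mathbf{w}_t + \alpha\,\bigl[U_t - \hat{v}(S_t,\mathbf{w}_t)\bigr]\,\nabla_{\mathbf{w}}\hat{v}(S_t,\mathbf{w}_t) \\
U_t &= G_t \quad \text{(Monte Carlo target)} \\
U_t &= R_{t+1} + \gamma\,\hat{v}(S_{t+1},\mathbf{w}_t) \quad \text{(TD(0) target)}
\end{aligned}
```

The true value v_π(s) is never available, so a sampled target U_t stands in for it (difference 1). The TD(0) target contains the current weights w_t, so it shifts every time the weights are updated (difference 2). This is why that variant is usually called a "semi-gradient" method.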

We will learn how to represent the value function with a parameterized function, and how to use gradient descent to optimize that function's parameters.
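The gradient-descent idea can be sketched with gradient Monte Carlo prediction, which is one standard algorithm of this kind; the lecture may use a different one. The sketch makes the following assumptions, all of them illustrative choices:

- the 5-state random walk from Sutton and Barto, with states 1 to 5, a start in state 3, reward 1 for exiting on the right and 0 otherwise, and γ = 1, so the true values are s/6;
- a hand-picked linear feature vector x(s) = [1, (s − 3)/2];
- the step size and episode count shown in the code.

```python
import random

# Minimal sketch of gradient Monte Carlo prediction with a linear value function.
# Assumptions (illustrative): 5-state random walk, gamma = 1, hand-picked features.
NUM_STATES = 5   # non-terminal states 1..5; states 0 and 6 are terminal
START = 3


def features(s: int) -> list[float]:
    """Hypothetical features: a bias term and the centred position of the state."""
    return [1.0, (s - 3) / 2.0]


def v_hat(s: int, w: list[float]) -> float:
    """Linear value estimate w . x(s)."""
    x = features(s)
    return w[0] * x[0] + w[1] * x[1]


def run_episode(rng: random.Random) -> tuple[list[int], float]:
    """Random walk from START; return visited states and the final reward."""
    s = START
    visited = []
    while 1 <= s <= NUM_STATES:
        visited.append(s)
        s += rng.choice((-1, 1))
    reward = 1.0 if s == NUM_STATES + 1 else 0.0
    return visited, reward


def gradient_mc(num_episodes: int = 30_000, alpha: float = 0.0005, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(num_episodes):
        visited, reward = run_episode(rng)
        # With gamma = 1 and a single terminal reward, the return G_t equals `reward` at every step.
        for s in visited:
            x = features(s)
            error = reward - v_hat(s, w)
            # SGD step on 0.5 * (G_t - v_hat)^2; for a linear model, grad v_hat = x(s).
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    return w


w = gradient_mc()
for s in range(1, NUM_STATES + 1):
    print(s, round(v_hat(s, w), 2), round(s / 6, 2))  # state, estimate, true value
```

The true values happen to be linear in these features, so the weights should approach [0.5, 1/3]. The printed estimates should therefore land near s/6, with some residual noise from the constant step size. The target G_t is only known once each episode ends, which is difference (1) from supervised learning in practice.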

We are now getting closer to understanding how reinforcement learning is used in language models. This lecture marks the beginning of this transition.

Video: Lecture 11 - Function Approximation Methods|Reinforcement Learning Phase|Reasoning LLMs from Scratch, from the Vizuara channel