Lecture 11 - Function Approximation Methods | Reinforcement Learning Phase | Reasoning LLMs from Scratch
So far in the Reinforcement Learning Phase, we have looked at tabular methods for computing value functions. That is, the states and their values are stored in a table, with one entry per state.
In most practical problems, these methods are not usable, because the number of states is far too large to enumerate. For example, the number of states in a game of chess is roughly 10^46.
From this lecture onwards, we will look at function approximation methods, which are used to estimate the values of the states we actually visit and then generalize those estimates to other states.
Function approximation is quite similar to supervised learning, except that:
(1) The target is not known beforehand
(2) The target is non-stationary
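The two differences above can be made concrete by comparing update rules. The block below is a sketch under assumptions, not the lecture's own notation: it assumes a squared-error loss, a parameter vector \mathbf{w}, a step size \alpha, a discount factor \gamma, and a one-step bootstrapped (TD(0)-style) target, none of which are stated in the description.

```latex
% Supervised learning: the label y_i is given in advance and never changes.
\mathbf{w} \leftarrow \mathbf{w}
  + \alpha \,\bigl[\, y_i - f(x_i; \mathbf{w}) \,\bigr]\,
    \nabla_{\mathbf{w}} f(x_i; \mathbf{w})

% Value-function approximation (assumed one-step bootstrapped target):
% the target is built from the current estimate \hat{v}(\cdot\,; \mathbf{w}).
\mathbf{w} \leftarrow \mathbf{w}
  + \alpha \,\bigl[\, R_{t+1} + \gamma\, \hat{v}(S_{t+1}; \mathbf{w})
      - \hat{v}(S_t; \mathbf{w}) \,\bigr]\,
    \nabla_{\mathbf{w}} \hat{v}(S_t; \mathbf{w})
```

In the second rule the quantity playing the role of the label is not available before interaction with the environment, and it shifts every time \mathbf{w} is updated, which is one way to read "not known beforehand" and "non-stationary".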
We will learn how to use a function to express the value function, and also how to use gradient descent to optimize this function.
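As a rough illustration of expressing the value function as a parameterized function and fitting it with gradient steps, here is a minimal sketch. Everything specific in it is an assumption made for the example and is not taken from the lecture: the 5-state random-walk environment, the two-component feature vector, the semi-gradient TD(0) update, and the names `features`, `v_hat`, and `train`.

```python
import numpy as np

N_STATES = 5   # hypothetical random walk: states 1..5, terminals at 0 and 6
GAMMA = 1.0    # undiscounted episodic task (assumption)


def features(state):
    """Feature vector x(s) = [1, s / (N_STATES + 1)] (illustrative choice)."""
    return np.array([1.0, state / (N_STATES + 1)])


def v_hat(state, w):
    """Linear value estimate w . x(s); terminal states have value 0."""
    if state == 0 or state == N_STATES + 1:
        return 0.0
    return float(w @ features(state))


def train(episodes=5000, alpha=0.02, seed=0):
    """Semi-gradient TD(0) with a linear approximator (assumed algorithm)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(2)
    for _ in range(episodes):
        s = (N_STATES + 1) // 2                  # start in the middle state
        while s not in (0, N_STATES + 1):
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == N_STATES + 1 else 0.0
            # The target depends on the current weights, so it moves
            # as learning proceeds (the non-stationary target).
            target = r + GAMMA * v_hat(s_next, w)
            td_error = target - v_hat(s, w)
            # For a linear model the gradient of v_hat w.r.t. w is x(s).
            w += alpha * td_error * features(s)
            s = s_next
    return w


w = train()
estimates = [v_hat(s, w) for s in range(1, N_STATES + 1)]
# For this particular walk the true values are s / (N_STATES + 1).
true_values = [s / (N_STATES + 1) for s in range(1, N_STATES + 1)]
```

Only two weights are learned here instead of one table entry per state, so every update to one state's estimate also shifts the estimates of the others. That is the generalization the tabular methods lack. The feature choice happens to represent this toy problem exactly; in general the approximation would carry some residual error.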
We are now getting closer to understanding how reinforcement learning is used in language models. This lecture marks the beginning of this transition.
Video "Lecture 11 - Function Approximation Methods | Reinforcement Learning Phase | Reasoning LLMs from Scratch" from the Vizuara channel
Video information
June 2, 2025, 9:30:00
00:53:21