Notebook 3: Reward Modeling — Part 2 of 2 | The Frontier Path
Part 2 of 2 of the complete Notebook 3 (Reward Modeling) walkthrough from The Frontier Path — every concept built from scratch and explained out loud.
▶ IN THIS PART
00:21 Reading The Curves
01:02 Dig Into The Scores
01:40 Length Bias
02:15 Sycophancy
02:54 Reward Hacking
03:33 More Ways It Breaks
04:13 Pairwise vs. Pointwise
04:51 The KL Leash
05:27 Overoptimization Laws
06:11 RM vs. LLM-as-Judge
06:52 RLAIF & The Trend
07:35 What Makes an RM Good
08:21 Production Scale
09:04 Monitor & Retrain
10:25 You Now Get Reward Models
▶ FULL SERIES (both parts, in order)
https://www.youtube.com/playlist?list=PLTotE_hCoIRw
▶ RUN IT YOURSELF (free + MIT)
Notebook: https://github.com/mootvstherubric-l/frontier-ml-toolkit/blob/main/01-rlhf/notebooks/03-reward-modeling.ipynb
Colab: https://colab.research.google.com/github/mootvstherubric-l/frontier-ml-toolkit/blob/main/01-rlhf/notebooks/03-reward-modeling.ipynb
Representative scenarios, not any company's real questions. AI-generated.
#machinelearning #mlinterview #frontierai #aiengineering #deeplearning
Questions? DM @mootvstherubric on Instagram: https://instagram.com/mootvstherubric
Video "Notebook 3: Reward Modeling — Part 2 of 2 | The Frontier Path" from the moot-vs-the-rubric channel
Video information
Uploaded: yesterday, 10:38:19
Duration: 00:11:05