Policy Gradient methods and Proximal Policy Optimization (PPO): diving into Deep RL!
In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.
After a general overview, I dive into Proximal Policy Optimization: an algorithm designed at OpenAI that strikes a balance between sample efficiency and implementation simplicity. PPO is the algorithm used to train the OpenAI Five system and is also used in a wide range of other benchmarks, such as Atari games and robotic control tasks.
If you want to support this channel, here is my Patreon link:
https://patreon.com/ArxivInsights --- You are amazing!! ;)
If you have questions you would like to discuss with me personally, you can book a 1-on-1 video call through Pensight: https://pensight.com/x/xander-steenbrugge
Links mentioned in the video:
⦁ PPO paper: https://arxiv.org/abs/1707.06347
⦁ TRPO paper: https://arxiv.org/abs/1502.05477
⦁ OpenAI PPO blogpost: https://blog.openai.com/openai-baselines-ppo/
⦁ Aurelien Geron: KL divergence and entropy in ML: https://youtu.be/ErfnhcEV1O8
⦁ Deep RL Bootcamp - Lecture 5: https://youtu.be/xvRrgxcpaHY
⦁ RL-adventure PyTorch implementation: https://github.com/higgsfield/RL-Adventure-2
⦁ OpenAI Baselines TensorFlow implementation: https://github.com/openai/baselines
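The balance the video describes comes from PPO's clipped surrogate objective (Eq. 7 in the PPO paper linked above). As a rough illustration only, not the Baselines implementation, it can be sketched in a few lines of NumPy; the function name and arguments here are my own:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Illustrative sketch of PPO's clipped surrogate objective.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting (old) policy.
    advantages: advantage estimates for those actions.
    """
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(logp_new - logp_old)
    # Clip the ratio to [1 - eps, 1 + eps]
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Take the pessimistic minimum of the clipped and unclipped terms,
    # so the objective gives no incentive to move the policy far from
    # the old one in a single update.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

With identical old and new log-probabilities the ratio is 1 and the objective reduces to the mean advantage; with a large ratio and a positive advantage, the clip caps the gain at (1 + eps) times the advantage.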
Video: Policy Gradient methods and Proximal Policy Optimization (PPO): diving into Deep RL!, from the Arxiv Insights channel