Policy Gradient methods and Proximal Policy Optimization (PPO): diving into Deep RL!
In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.
After a general overview, I dive into Proximal Policy Optimization: an algorithm designed at OpenAI that strikes a balance between sample efficiency and implementation simplicity. PPO is the algorithm used to train the OpenAI Five system and is also used in a wide range of other benchmarks, such as Atari games and robotic control tasks.
If you want to support this channel, here is my Patreon link:
https://patreon.com/ArxivInsights --- You are amazing!! ;)
If you have questions you would like to discuss with me personally, you can book a 1-on-1 video call through Pensight: https://pensight.com/x/xander-steenbrugge
Links mentioned in the video:
⦁ PPO paper: https://arxiv.org/abs/1707.06347
⦁ TRPO paper: https://arxiv.org/abs/1502.05477
⦁ OpenAI PPO blogpost: https://blog.openai.com/openai-baselines-ppo/
⦁ Aurelien Geron: KL divergence and entropy in ML: https://youtu.be/ErfnhcEV1O8
⦁ Deep RL Bootcamp - Lecture 5: https://youtu.be/xvRrgxcpaHY
⦁ RL-adventure PyTorch implementation: https://github.com/higgsfield/RL-Adventure-2
⦁ OpenAI Baselines TensorFlow implementation: https://github.com/openai/baselines
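The balance the video describes comes from PPO's clipped surrogate objective (Eq. 7 in the PPO paper linked above). As a rough illustration only, not the Baselines implementation, it can be sketched in a few lines of NumPy; the function name and arguments here are my own:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Illustrative sketch of PPO's clipped surrogate objective.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and the data-collecting (old) policy.
    advantages: advantage estimates for those actions.
    """
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(logp_new - logp_old)
    # Clip the ratio to [1 - eps, 1 + eps]
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Take the pessimistic minimum of the clipped and unclipped terms,
    # so the objective gives no incentive to move the policy far from
    # the old one in a single update.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

With identical old and new log-probabilities the ratio is 1 and the objective reduces to the mean advantage; with a large ratio and a positive advantage, the clip caps the gain at (1 + eps) times the advantage.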
Video: Policy Gradient methods and Proximal Policy Optimization (PPO): diving into Deep RL!, from the Arxiv Insights channel