Policy Gradient Theorem Explained - Reinforcement Learning
In this video, I explain the policy gradient theorem used in reinforcement learning (RL). Instead of showing the typical mathematical derivation of the proof, I explain the resulting formula by walking through an example of playing a game and figuring out how we can estimate the policy gradient of the expected return by sampling episodes from the environment. I also show some graph visualizations that give an intuition for how the partial derivatives with respect to the action probabilities are backpropagated to get the correct policy gradient within the constrained action space (where all probabilities have to sum to 1). I then explain how we can use the log probabilities instead of the direct probabilities (the log-derivative trick) for improved computational efficiency, and I walk through some pseudocode (Python / PyTorch inspired) of the derived policy gradient algorithm, which is a variant of the REINFORCE algorithm. Finally, I show how we can reduce the variance by normalizing the future returns and by dividing by the number of steps instead of the number of episodes.
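As a rough companion to the pseudocode mentioned above, here is a minimal, self-contained sketch of the REINFORCE-style update described in the video: sample actions from a softmax policy, normalize the sampled returns to reduce variance, and ascend the gradient of the log probability of each sampled action weighted by its normalized return. The two-armed bandit environment and all names here are my own illustrative assumptions, not the exact code from the video.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, samples, lr=0.1):
    """One policy gradient update from sampled (action, return) pairs.

    For a softmax policy, the gradient of log pi(a) with respect to
    logit j is (1 if j == a else 0) - probs[j], so the REINFORCE
    estimate is the mean over samples of G * (one_hot(a) - probs).
    """
    probs = softmax(logits)
    # Normalize returns (subtract mean, divide by std) to reduce variance.
    returns = [g for _, g in samples]
    mean = sum(returns) / len(returns)
    var = sum((g - mean) ** 2 for g in returns) / len(returns)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero
    grad = [0.0] * len(logits)
    for a, g in samples:
        adv = (g - mean) / std
        for j in range(len(logits)):
            grad[j] += adv * ((1.0 if j == a else 0.0) - probs[j])
    # Average over samples and take a gradient ascent step.
    return [l + lr * gj / len(samples) for l, gj in zip(logits, grad)]

random.seed(0)
logits = [0.0, 0.0]
# Hypothetical one-step environment: action 1 pays 1.0, action 0 pays 0.0.
for _ in range(200):
    probs = softmax(logits)
    samples = []
    for _ in range(16):
        a = 0 if random.random() < probs[0] else 1
        samples.append((a, float(a)))  # the return equals the payout
    logits = reinforce_step(logits, samples)

# The policy should now strongly prefer the rewarding action (action 1).
print(softmax(logits))
```

In a real PyTorch implementation you would instead accumulate `-log_prob * normalized_return` into a loss and let autograd compute this same gradient; the hand-written `one_hot(a) - probs` term above is exactly what backpropagating through log-softmax produces.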
Policy gradient methods are used in many of the current state-of-the-art reinforcement learning algorithms, and I think it is likely that policy gradient methods will play an important role in advancing the field of RL. I'm excited to continue exploring this field and sharing what I learn along the way.
Join our Discord community:
💬 https://discord.gg/cdQhRgw
Connect with me:
🐦 Twitter - https://twitter.com/elliotwaite
📷 Instagram - https://www.instagram.com/elliotwaite
👱 Facebook - https://www.facebook.com/elliotwaite
💼 LinkedIn - https://www.linkedin.com/in/elliotwaite
🎵 Kazukii - Return
→ https://soundcloud.com/ohthatkazuki
→ https://open.spotify.com/artist/5d07MpiIaNmmEMTq79KAga
→ https://www.youtube.com/user/OfficialKazuki
Video: Policy Gradient Theorem Explained - Reinforcement Learning, from the channel Elliot Waite