
Intuition of the DQN algorithm

Back in 2015, Deep Q Networks (DQN) created quite a splash in the AI world. DQN was certainly not the first to combine neural networks with Reinforcement Learning. However, it was the first to successfully learn to win at a range of 1980s arcade games using only high-dimensional information (i.e., images of the screen) as input.

This video aims to help build an intuition of how the algorithm works. We will look at:
- the schematics of the network,
- the role of the replay memory,
- the significance of Q-values,
- the loss for the gradient updates, and
- the need for a target network.
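
For those who like to see things in code, here is a minimal sketch of how three of these pieces could fit together: the replay memory, the bootstrapped Q-value target, and the squared loss. It is an illustration only, not the code from the papers or from my repo. The names `ReplayMemory`, `td_target`, `squared_td_error` and `target_q` are made up for this example, and `target_q` stands in for the target network as any function that maps a state to a list of Q-values (one per action).

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # Once the buffer is full, the oldest transition is dropped automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling, so a batch mixes old and recent experience.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_state, done, target_q, gamma=0.99):
    """Bootstrapped 'label': r + gamma * max_a' Q_target(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(target_q(next_state))


def squared_td_error(q_values, action, target):
    """Squared difference between the target and the Q-value of the action taken."""
    return (target - q_values[action]) ** 2
```

In a training loop one would typically sample a batch from the memory, compute `td_target` for each transition using the (periodically updated) target network, and take a gradient step on the averaged `squared_td_error` of the online network.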

Oh, and the best part, all this with Garfield helping us through it. 🤪

---------------

Hey there, you wonderful being! 👋
Hope you are all doing well. Thanks for stopping by. Hope you enjoyed the video and found it helpful. Hope you have a magical day. ✨

---------------

// Resources related to this topic that you might find interesting: 📑
1. Human-level control through deep reinforcement learning (the 2015 Nature paper): https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
2. Playing Atari with Deep Reinforcement Learning (the 2013 vanilla-DQN paper): https://arxiv.org/pdf/1312.5602.pdf
3. Formalisation of key concepts like return, state-action value function and so on: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#key-concepts-and-terminology
4. A series of posts, by me, on DQN: https://www.saashanair.com/dqn-theory/
5. The GitHub repo associated with my blog posts: https://github.com/saashanair/rl-series/tree/master/dqn

// Timestamps ⌛️
0:00 DQN wins at Breakout
0:21 Hello there 👋
0:43 Say hello to Garfield 😸
1:07 Garfield Gridworld and the need for DQN
2:08 But we don’t have a labelled dataset here… 🤔
2:49 What is a Replay Memory? 🧠
3:39 What do Q-values represent: Intuition 🪄
4:14 What do Q-values represent: Formalisation ➗
4:37 Wait, Q-values are recursive 🌀
5:02 NN estimates Q-value
5:26 We finally have labels… 🎉
5:35 Erm, but the targets are estimates… (Bootstrapping) 🧐
5:55 DQN Paper: 2013 vs 2015 (Introducing the Target Network)
7:05 Updated loss function
7:15 Don’t forget to explore during training
8:04 Let’s summarise this information overload 😰
8:42 Don’t forget to check out the blog posts
8:52 Seeeeee ya 🤗

// Who am I? 👩‍💻
I respond to the names: Saasha, Sash and Nair. Unless, of course, I am lost in my own world, which, I must warn you, happens quite often. 🤪
I am interested in all things AI, especially topics relating to Reinforcement Learning and Safety. I am currently living in Pisa, Italy, where I am pursuing a year-long research fellowship at the Sant'Anna School of Advanced Studies. My work focuses on improving the trustworthiness and robustness of the DL-based components in Autonomous Vehicles. 🚗

// Why this YouTube channel? 🎥
First of all, good on you for taking your learning into your own hands. I am proud of you for wanting to expand your limits and for putting in the effort. But self-learning can be quite a lonely journey. So, if you are interested, let's claim this little slice of the internet as our own and build a community where our quirky nerdy selves can shine. Let's support one another as we follow our curiosities and explore the vast and expansive world of AI. 💙

// Let's connect 📮
Website: https://www.saashanair.com
LinkedIn: https://www.linkedin.com/in/saashanair/
Twitter: https://twitter.com/nair_saasha
Mail: saasha.allthingsai@gmail.com

---------------

Credits

// Video snippets
Computer teaches itself to play games, BBC News: https://youtu.be/nwx96e7qck0
Deep Q-Network Plays Atari 2600 Pong: https://youtu.be/p88R2_3yWPA
Garfield VS Odie, Garfield & Friends: https://youtu.be/9BilR0kUkbs
No More Mondays!, Garfield & Friends: https://youtu.be/N-WE2KQT9lM
Garfield Eating Lasagna Compilation: https://youtu.be/8aKav5zEaTQ

// Music
Colorful Flowers by Tokyo Music Walker https://soundcloud.com/user-356546060
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: https://bit.ly/al-colorful-flowers
Music promoted by Audio Library https://youtu.be/vYp14UesizY

---------------

Subscriber count: 52

Video: Intuition of the DQN algorithm, from the channel Saasha Nair
Video information
Published: 22 April 2021, 16:39:42
Duration: 00:09:14