Deep Reinforcement Learning, part 1 - Doina Precup - MLSS 2020, Tübingen
Table of Contents (powered by https://videoken.com)
0:00:00 Speaker Introduction
0:01:22 Introduction to Reinforcement Learning, Part 1: Prediction, Value-Based, Model-Free Control (including DQN)
0:05:18 Reinforcement Learning
0:07:01 Example: AlphaGo & AlphaZero
0:12:39 Key Features of RL
0:14:03 Reinforcement Learning
0:14:36 Example: TD-Gammon
0:16:49 Some RL Successes
0:24:44 Computational framework
0:25:37 The Agent-Environment Interface
0:27:21 Supervised vs Reinforcement Learning
0:28:40 Agent's learning task
0:29:34 Return
0:30:59 Episodic Tasks
0:31:19 Example: Mountain Car
0:35:40 Continuing Tasks
0:40:58 4 value functions
0:44:24 Value function approximation
0:45:00 A natural objective in VFA is to minimize the Mean Square Value Error
0:46:02 Simple Monte Carlo
0:48:55 Gradient MC works well on the 1000-state random walk using state aggregation
0:51:09 Markov Decision Processes
0:53:24 Optimal Value Functions
0:54:40 What About Optimal Action-Value Functions?
0:55:20 Bellman Equation for a Policy
0:57:37 cf. Dynamic Programming
0:58:56 Recall: Monte Carlo
0:59:29 Simplest TD Method
1:01:34 TD Prediction
1:03:24 You are the Predictor
1:06:03 TD vs MC
1:07:22 Semi-gradient TD is less accurate than MC on the 1000-state random walk using state aggregation
1:09:00 n-step TD Prediction
1:11:13 Mathematics of n-step TD Targets
1:12:13 The λ-return is a compound update target
1:12:45 Unified View
1:22:48 Value function approximation (VFA) replaces the table with a general parameterized form
1:23:03 Stochastic Gradient Descent (SGD) is the idea behind most approximate learning
1:25:07 Geometric intuition
1:29:30 TD converges to the TD fixed point, a biased but interesting answer
1:32:35 Summing up policy evaluation
1:33:56 TD(λ) performance with a
1:34:31 Q&A
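The prediction segments of the outline (Simplest TD Method at 0:59:29, TD Prediction, TD vs MC) center on the tabular TD(0) update. As a companion to the outline, here is a minimal Python sketch, not taken from the lecture slides: tabular TD(0) on a small random walk, where the state count, step size, and 0/+1 terminal rewards are illustrative choices.

```python
import random

# Tabular TD(0) prediction on a 5-state random walk (toy version; the
# lecture's example at 0:48:55 uses a 1000-state walk).
# Non-terminal states 0..4; terminating right of state 4 gives reward +1,
# terminating left of state 0 gives reward 0. True values are (s+1)/6.
N_STATES = 5
ALPHA = 0.1    # step size
GAMMA = 1.0    # undiscounted episodic task

V = [0.5] * N_STATES  # value estimates, initialized to 0.5

def run_episode(V):
    s = N_STATES // 2                      # start in the middle
    while True:
        s2 = s + random.choice([-1, 1])    # move left or right
        if s2 == N_STATES:                 # right terminal: reward +1, V(terminal)=0
            V[s] += ALPHA * (1.0 - V[s])
            return
        if s2 == -1:                       # left terminal: reward 0
            V[s] += ALPHA * (0.0 - V[s])
            return
        # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)), r = 0
        V[s] += ALPHA * (GAMMA * V[s2] - V[s])
        s = s2

random.seed(0)
for _ in range(5000):
    run_episode(V)
print([round(v, 2) for v in V])  # approaches the true values 1/6 .. 5/6
```

With a constant step size the estimates keep fluctuating around the true values rather than converging exactly, which is one of the TD-vs-MC trade-offs the lecture discusses.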
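The function-approximation entries (Semi-gradient TD at 1:07:22, VFA at 1:22:48, SGD at 1:23:03) combine the TD update with a parameterized value function. The sketch below, again illustrative rather than the lecture's code, shows semi-gradient TD(0) with state aggregation, scaled down from the lecture's 1000-state walk to 50 states in 5 groups; all sizes and the step size are assumptions.

```python
import random

# Semi-gradient TD(0) with state aggregation (linear VFA).
# 50 non-terminal states in 5 groups of 10; each group shares one weight,
# so the approximate value function is piecewise constant.
N, GROUP, ALPHA = 50, 10, 0.05
w = [0.0] * (N // GROUP)   # one weight per aggregated group

def v(s):
    return w[s // GROUP]   # approximate value: weight of s's group

def run_episode():
    s = N // 2                             # start in the middle
    while True:
        s2 = s + random.choice([-1, 1])    # move left or right
        done = s2 < 0 or s2 >= N
        if done:
            target = 1.0 if s2 >= N else 0.0   # terminal reward: +1 right, 0 left
        else:
            target = v(s2)                     # bootstrap from current estimate
        # Semi-gradient step: the feature gradient is 1 for s's group, 0 elsewhere,
        # so only s's group weight moves.
        w[s // GROUP] += ALPHA * (target - v(s))
        if done:
            return
        s = s2

random.seed(1)
for _ in range(800):
    run_episode()
# Weights should roughly increase from the leftmost to the rightmost group.
```

This converges near the TD fixed point mentioned at 1:29:30, a biased solution: within each group the single weight cannot match every state's true value, so it settles on a weighted compromise.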
Video: Deep Reinforcement Learning, part 1 - Doina Precup - MLSS 2020, Tübingen, from the channel virtual mlss2020