Deep Reinforcement Learning, part 1 - Doina Precup - MLSS 2020, Tübingen
Table of Contents (powered by https://videoken.com)
0:00:00 Speaker Introduction
0:01:22 Introduction to Reinforcement Learning, Part 1: Prediction, Value-Based, Model-Free Control (including DQN)
0:05:18 Reinforcement Learning
0:07:01 Example: AlphaGo & AlphaZero
0:12:39 Key Features of RL
0:14:03 Reinforcement Learning
0:14:36 Example: TD-Gammon
0:16:49 Some RL Successes
0:24:44 Computational framework
0:25:37 The Agent-Environment Interface
0:27:21 Supervised vs Reinforcement Learning
0:28:40 Agent's learning task
0:29:34 Return
0:30:59 Episodic Tasks
0:31:19 Example: Mountain Car
0:35:40 Continuing Tasks
0:40:58 4 value functions
0:44:24 Value function approximation
0:45:00 A natural objective in VFA is to minimize the Mean Square Value Error
0:46:02 Simple Monte Carlo
0:48:55 Gradient MC works well on the 1000-state random walk using state aggregation
0:51:09 Markov Decision Processes
0:53:24 Optimal Value Functions
0:54:40 What About Optimal Action-Value Functions?
0:55:20 Bellman Equation for a Policy
0:57:37 cf. Dynamic Programming
0:58:56 Recall: Monte Carlo
0:59:29 Simplest TD Method
1:01:34 TD Prediction
1:03:24 You are the Predictor
1:06:03 TD vs MC
1:07:22 Semi-gradient TD is less accurate than MC on the 1000-state random walk using state aggregation
1:09:00 n-step TD Prediction
1:11:13 Mathematics of n-step TD Targets
1:12:13 The λ-return is a compound update target
1:12:45 Unified View
1:22:48 Value function approximation (VFA) replaces the table with a general parameterized form
1:23:03 Stochastic Gradient Descent (SGD) is the idea behind most approximate learning
1:25:07 Geometric intuition
1:29:30 TD converges to the TD fixed point, a biased but interesting answer
1:32:35 Summing up policy evaluation
1:33:56 TD(λ) performance with a
1:34:31 Q&A
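The prediction segments of the outline (Simplest TD Method at 0:59:29, TD Prediction, TD vs MC) center on the tabular TD(0) update. As a companion to the outline, here is a minimal Python sketch, not taken from the lecture slides: tabular TD(0) on a small random walk, where the state count, step size, and 0/+1 terminal rewards are illustrative choices.

```python
import random

# Tabular TD(0) prediction on a 5-state random walk (toy version; the
# lecture's example at 0:48:55 uses a 1000-state walk).
# Non-terminal states 0..4; terminating right of state 4 gives reward +1,
# terminating left of state 0 gives reward 0. True values are (s+1)/6.
N_STATES = 5
ALPHA = 0.1    # step size
GAMMA = 1.0    # undiscounted episodic task

V = [0.5] * N_STATES  # value estimates, initialized to 0.5

def run_episode(V):
    s = N_STATES // 2                      # start in the middle
    while True:
        s2 = s + random.choice([-1, 1])    # move left or right
        if s2 == N_STATES:                 # right terminal: reward +1, V(terminal)=0
            V[s] += ALPHA * (1.0 - V[s])
            return
        if s2 == -1:                       # left terminal: reward 0
            V[s] += ALPHA * (0.0 - V[s])
            return
        # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)), r = 0
        V[s] += ALPHA * (GAMMA * V[s2] - V[s])
        s = s2

random.seed(0)
for _ in range(5000):
    run_episode(V)
print([round(v, 2) for v in V])  # approaches the true values 1/6 .. 5/6
```

With a constant step size the estimates keep fluctuating around the true values rather than converging exactly, which is one of the TD-vs-MC trade-offs the lecture discusses.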
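The function-approximation entries (Semi-gradient TD at 1:07:22, VFA at 1:22:48, SGD at 1:23:03) combine the TD update with a parameterized value function. The sketch below, again illustrative rather than the lecture's code, shows semi-gradient TD(0) with state aggregation, scaled down from the lecture's 1000-state walk to 50 states in 5 groups; all sizes and the step size are assumptions.

```python
import random

# Semi-gradient TD(0) with state aggregation (linear VFA).
# 50 non-terminal states in 5 groups of 10; each group shares one weight,
# so the approximate value function is piecewise constant.
N, GROUP, ALPHA = 50, 10, 0.05
w = [0.0] * (N // GROUP)   # one weight per aggregated group

def v(s):
    return w[s // GROUP]   # approximate value: weight of s's group

def run_episode():
    s = N // 2                             # start in the middle
    while True:
        s2 = s + random.choice([-1, 1])    # move left or right
        done = s2 < 0 or s2 >= N
        if done:
            target = 1.0 if s2 >= N else 0.0   # terminal reward: +1 right, 0 left
        else:
            target = v(s2)                     # bootstrap from current estimate
        # Semi-gradient step: the feature gradient is 1 for s's group, 0 elsewhere,
        # so only s's group weight moves.
        w[s // GROUP] += ALPHA * (target - v(s))
        if done:
            return
        s = s2

random.seed(1)
for _ in range(800):
    run_episode()
# Weights should roughly increase from the leftmost to the rightmost group.
```

This converges near the TD fixed point mentioned at 1:29:30, a biased solution: within each group the single weight cannot match every state's true value, so it settles on a weighted compromise.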
Video: Deep Reinforcement Learning, part 1 - Doina Precup - MLSS 2020, Tübingen, from the channel virtual mlss2020