Загрузка страницы

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

The goal of hierarchical reinforcement learning is to divide a task into different levels of coarseness with the top-level agent planning only over a high-level view of the world and each subsequent layer having a more detailed view. This paper proposes to learn a set of important states as well as their connections to each other as a high-level abstraction.

https://arxiv.org/abs/1907.00664

Abstract:
In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train a latent pivotal state model and a curiosity-driven goal-conditioned policy in a task-agnostic manner. Second, provided with the information from the world graph, a high-level Manager quickly finds solution to new tasks and expresses subgoals in reference to pivotal states to a low-level Worker. The Worker can then also leverage the graph to easily traverse to the pivotal states of interest, even across long distance, and explore non-locally. We perform a thorough ablation study to evaluate our approach on a suite of challenging maze tasks, demonstrating significant advantages from the proposed framework over baselines that lack world graph knowledge in terms of performance and efficiency.

Authors: Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher

Видео Learning World Graphs to Accelerate Hierarchical Reinforcement Learning канала Yannic Kilcher
Показать
Комментарии отсутствуют
Введите заголовок:

Введите адрес ссылки:

Введите адрес видео с YouTube:

Зарегистрируйтесь или войдите с
Информация о видео
8 августа 2019 г. 13:56:43
00:18:39
Другие видео канала
WHO ARE YOU? 10k Subscribers Special (w/ Channel Analytics)WHO ARE YOU? 10k Subscribers Special (w/ Channel Analytics)Datasets for Data-Driven Reinforcement LearningDatasets for Data-Driven Reinforcement LearningReinforcement Learning with Augmented Data (Paper Explained)Reinforcement Learning with Augmented Data (Paper Explained)The Odds are Odd: A Statistical Test for Detecting Adversarial ExamplesThe Odds are Odd: A Statistical Test for Detecting Adversarial ExamplesAMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (Paper Explained)AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control (Paper Explained)Expire-Span: Not All Memories are Created Equal: Learning to Forget by Expiring (Paper Explained)Expire-Span: Not All Memories are Created Equal: Learning to Forget by Expiring (Paper Explained)REALM: Retrieval-Augmented Language Model Pre-Training (Paper Explained)REALM: Retrieval-Augmented Language Model Pre-Training (Paper Explained)Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation (Paper Explained)Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation (Paper Explained)[Classic] Playing Atari with Deep Reinforcement Learning (Paper Explained)[Classic] Playing Atari with Deep Reinforcement Learning (Paper Explained)Symbolic Knowledge Distillation: from General Language Models to Commonsense Models (Explained)Symbolic Knowledge Distillation: from General Language Models to Commonsense Models (Explained)Gradient Origin Networks (Paper Explained w/ Live Coding)Gradient Origin Networks (Paper Explained w/ Live Coding)Perceiver: General Perception with Iterative Attention (Google DeepMind Research Paper Explained)Perceiver: General Perception with Iterative Attention (Google DeepMind Research Paper Explained)PonderNet: Learning to Ponder (Machine Learning Research Paper Explained)PonderNet: Learning to Ponder (Machine Learning Research Paper Explained)GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton's Paper Explained)GLOM: How to represent part-whole hierarchies in a neural network (Geoff Hinton's Paper Explained)ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolationALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolationListening to You! - Channel Update (Author Interviews)Listening to You! - Channel Update (Author Interviews)[ML News] Uber: Deep Learning for ETA | MuZero Video Compression  | Block-NeRF | EfficientNet-X[ML News] Uber: Deep Learning for ETA | MuZero Video Compression | Block-NeRF | EfficientNet-XOn the Measure of Intelligence by François Chollet - Part 1: Foundations (Paper Explained)On the Measure of Intelligence by François Chollet - Part 1: Foundations (Paper Explained)Growing Neural Cellular AutomataGrowing Neural Cellular Automata[ML News] DeepMind's Flamingo Image-Text model | Locked-Image Tuning | Jurassic X & MRKL[ML News] DeepMind's Flamingo Image-Text model | Locked-Image Tuning | Jurassic X & MRKLAvoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments (Review)Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments (Review)
Яндекс.Метрика