MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
https://arxiv.org/abs/1911.08265
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
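The abstract says MuZero learns a model that, applied iteratively, predicts the reward, the policy and the value. The following is a minimal, stdlib-only toy sketch of that unrolling. It assumes the three-function split described in the full paper: a representation function h maps an observation to a hidden state, a dynamics function g maps a hidden state and an action to a reward and the next hidden state, and a prediction function f maps a hidden state to a policy and a value. Everything else is hypothetical and chosen only for illustration: the names `ToyMuZeroModel`, `unroll` and `random_model`, the single linear-plus-tanh "networks", the dimensions, and the random, untrained weights. The sketch shows no training and no tree search.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

STATE_DIM = 4    # hypothetical hidden-state size
NUM_ACTIONS = 3  # hypothetical action-space size
OBS_DIM = 5      # hypothetical observation size


def _linear(weights: List[List[float]], vec: List[float]) -> List[float]:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def _softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax turning logits into a probability vector."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class ToyMuZeroModel:
    """Hypothetical stand-in for MuZero's h (represent), g (dynamics), f (predict)."""
    repr_w: List[List[float]]    # observation -> hidden state
    dyn_w: List[List[float]]     # [hidden state, one-hot action] -> next hidden state
    reward_w: List[float]        # [hidden state, one-hot action] -> scalar reward
    policy_w: List[List[float]]  # hidden state -> policy logits
    value_w: List[float]         # hidden state -> scalar value

    def represent(self, observation: List[float]) -> List[float]:
        # h: encode the observation into an abstract hidden state in [-1, 1].
        return [math.tanh(x) for x in _linear(self.repr_w, observation)]

    def dynamics(self, state: List[float], action: int) -> Tuple[List[float], float]:
        # g: advance the hidden state by one action and predict the reward.
        one_hot = [1.0 if a == action else 0.0 for a in range(NUM_ACTIONS)]
        inp = state + one_hot
        next_state = [math.tanh(x) for x in _linear(self.dyn_w, inp)]
        reward = sum(w * x for w, x in zip(self.reward_w, inp))
        return next_state, reward

    def predict(self, state: List[float]) -> Tuple[List[float], float]:
        # f: predict the action-selection policy and the value of a hidden state.
        policy = _softmax(_linear(self.policy_w, state))
        value = sum(w * x for w, x in zip(self.value_w, state))
        return policy, value


def unroll(model: ToyMuZeroModel, observation: List[float],
           actions: List[int]) -> List[Tuple[float, List[float], float]]:
    """Apply the model iteratively along `actions`, never touching a real simulator.

    Returns one (reward, policy, value) triple per step; step 0 has no
    preceding action, so its reward is 0.0.
    """
    state = model.represent(observation)
    policy, value = model.predict(state)
    outputs = [(0.0, policy, value)]
    for action in actions:
        state, reward = model.dynamics(state, action)
        policy, value = model.predict(state)
        outputs.append((reward, policy, value))
    return outputs


def random_model(seed: int = 0) -> ToyMuZeroModel:
    """Build an untrained toy model with reproducible random weights."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return ToyMuZeroModel(
        repr_w=mat(STATE_DIM, OBS_DIM),
        dyn_w=mat(STATE_DIM, STATE_DIM + NUM_ACTIONS),
        reward_w=[rng.uniform(-1, 1) for _ in range(STATE_DIM + NUM_ACTIONS)],
        policy_w=mat(NUM_ACTIONS, STATE_DIM),
        value_w=[rng.uniform(-1, 1) for _ in range(STATE_DIM)],
    )


if __name__ == "__main__":
    model = random_model(seed=0)
    steps = unroll(model, [0.1, -0.2, 0.3, 0.0, 0.5], actions=[2, 0, 1])
    for k, (r, p, v) in enumerate(steps):
        print(k, round(r, 3), [round(x, 3) for x in p], round(v, 3))
```

In the paper these functions are deep networks trained end to end so that the unrolled predictions match observed rewards, search policies and value targets. The hidden state is not required to reconstruct the observation, which is why the toy dynamics function never maps back to observation space; a tree search would call `dynamics` and `predict` in the same way to expand nodes.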
Video "MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" from the channel Julian Schrittwieser