From AlphaGo to MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
3/31/2021 Colloquium
Speaker: Thore Graepel (DeepMind/UCL)
Title: From AlphaGo to MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
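The abstract's idea of a model that is "applied iteratively" to predict reward, policy, and value can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `represent`, `dynamics`, `predict`, and `unroll` and the toy arithmetic inside them are hypothetical stand-ins, not the networks or training procedure described in the talk. It only shows the shape of the loop: encode an observation once, then step a learned model forward through a sequence of actions without consulting the real environment.

```python
from typing import List, Tuple

State = List[float]  # hypothetical hidden-state type, for illustration only


def represent(observation: List[float]) -> State:
    """Encode an observation into a hidden state (toy stand-in: scale it)."""
    return [0.5 * x for x in observation]


def dynamics(state: State, action: int) -> Tuple[State, float]:
    """Predict the next hidden state and the reward (toy stand-in rule)."""
    next_state = [x + action for x in state]
    reward = sum(next_state) / len(next_state)
    return next_state, reward


def predict(state: State, num_actions: int = 2) -> Tuple[List[float], float]:
    """Predict a policy over actions and a scalar value (toy: uniform policy)."""
    policy = [1.0 / num_actions] * num_actions
    value = sum(state)
    return policy, value


def unroll(observation: List[float], actions: List[int]):
    """Apply the model iteratively, collecting (reward, policy, value) per step."""
    state = represent(observation)
    outputs = []
    for action in actions:
        # The model advances its own hidden state; no simulator is queried.
        state, reward = dynamics(state, action)
        policy, value = predict(state)
        outputs.append((reward, policy, value))
    return outputs


if __name__ == "__main__":
    for step, (reward, policy, value) in enumerate(unroll([2.0, 4.0], [1, 0]), 1):
        print(step, reward, policy, value)
```

In the system the abstract describes, these three quantities feed a tree-based search; the sketch omits the search and any learning, and is meant only to make the iterative-prediction loop concrete.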
Video "From AlphaGo to MuZero - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" from the Harvard CMSA channel