Загрузка страницы

Training more effective learned optimizers, and using them to train themselves (Paper Explained)

#ai #research #optimization

Optimization is still the domain of hand-crafted, simple algorithms. An ML engineer not only has to pick a suitable one for their problem but also often do grid-search over various hyper-parameters. This paper proposes to learn a single, unified optimization algorithm, given not by an equation, but by an LSTM-based neural network, to act as an optimizer for any deep learning problem, and ultimately to optimize itself.

OUTLINE:
0:00 - Intro & Outline
2:20 - From Hand-Crafted to Learned Features
4:25 - Current Optimization Algorithm
9:40 - Learned Optimization
15:50 - Optimizer Architecture
22:50 - Optimizing the Optimizer using Evolution Strategies
30:30 - Task Dataset
34:00 - Main Results
36:50 - Implicit Regularization in the Learned Optimizer
41:05 - Generalization across Tasks
41:40 - Scaling Up
45:30 - The Learned Optimizer Trains Itself
47:20 - Pseudocode
49:45 - Broader Impact Statement
52:55 - Conclusion & Comments

Paper: https://arxiv.org/abs/2009.11243

Abstract:
Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.

Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Видео Training more effective learned optimizers, and using them to train themselves (Paper Explained) канала Yannic Kilcher
Показать
Комментарии отсутствуют
Введите заголовок:

Введите адрес ссылки:

Введите адрес видео с YouTube:

Зарегистрируйтесь или войдите с
Информация о видео
3 октября 2020 г. 21:06:32
00:53:36
Яндекс.Метрика