Training more effective learned optimizers, and using them to train themselves (Paper Explained)
#ai #research #optimization
Optimization is still the domain of hand-crafted, simple algorithms. An ML engineer not only has to pick a suitable one for their problem but also often has to run a grid search over its hyperparameters. This paper proposes to learn a single, unified optimization algorithm, given not by an equation but by an LSTM-based neural network, to act as an optimizer for any deep learning problem, and ultimately to optimize itself.
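The core idea can be sketched in a few lines: instead of a fixed formula like SGD's `w -= lr * g`, a small network computes each parameter's update from features such as the gradient and momentum. This is a hedged toy sketch, not the paper's architecture (which is a hierarchical LSTM-based optimizer); the weights here are illustrative and untrained.

```python
import numpy as np

# Toy sketch of a learned optimizer: a tiny feed-forward net stands in for
# the paper's hierarchical LSTM. It maps per-parameter features
# (gradient, momentum) to an update step, with shared, made-up weights.

rng = np.random.default_rng(0)

# "Optimizer network": 2 input features -> 8 hidden units -> 1 update value.
W1 = rng.normal(scale=0.1, size=(2, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))

def learned_update(grad, momentum):
    # Each parameter is processed independently with the same shared weights.
    feats = np.stack([grad, momentum], axis=-1)  # (n_params, 2)
    hidden = np.tanh(feats @ W1)                 # (n_params, 8)
    return (hidden @ W2).squeeze(-1)             # (n_params,)

# A few steps on a toy quadratic loss L(w) = 0.5 * ||w||^2 (gradient is w).
w = np.array([1.0, -2.0, 0.5])
m = np.zeros_like(w)
for _ in range(3):
    g = w
    m = 0.9 * m + g               # momentum is just an extra input feature
    w = w - learned_update(g, m)  # the network's output *is* the update step

# W1 and W2 are untrained here, so the updates are arbitrary; meta-training
# (the evolution-strategies part of the video) is what makes them useful.
```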
OUTLINE:
0:00 - Intro & Outline
2:20 - From Hand-Crafted to Learned Features
4:25 - Current Optimization Algorithm
9:40 - Learned Optimization
15:50 - Optimizer Architecture
22:50 - Optimizing the Optimizer using Evolution Strategies
30:30 - Task Dataset
34:00 - Main Results
36:50 - Implicit Regularization in the Learned Optimizer
41:05 - Generalization across Tasks
41:40 - Scaling Up
45:30 - The Learned Optimizer Trains Itself
47:20 - Pseudocode
49:45 - Broader Impact Statement
52:55 - Conclusion & Comments
Paper: https://arxiv.org/abs/2009.11243
Abstract:
Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.
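The meta-training approach covered in the video (evolution strategies) can be sketched as follows. This is a hedged, minimal version: `meta_loss` is a toy stand-in for "train a task with the perturbed optimizer and report its final loss" (not the paper's task suite), and all constants are illustrative.

```python
import numpy as np

# Evolution strategies (ES) sketch: perturb the optimizer's meta-parameters
# theta with Gaussian noise, score each perturbation, and move theta toward
# perturbations that lowered the loss. ES needs no gradients through the
# inner training loop, which is why it suits meta-training optimizers.

rng = np.random.default_rng(0)

def meta_loss(theta):
    # Toy stand-in: pretend the best meta-parameters sit at 1.0. In the
    # paper this would be the loss after training a sampled task.
    return float(np.sum((theta - 1.0) ** 2))

theta = np.zeros(4)              # meta-parameters of the learned optimizer
sigma, alpha, pop = 0.1, 0.05, 64

for step in range(200):
    eps = rng.normal(size=(pop, theta.size))      # one noise vector per sample
    losses = np.array([meta_loss(theta + sigma * e) for e in eps])
    # Standardize returns so the update is invariant to the loss scale.
    adv = (losses - losses.mean()) / (losses.std() + 1e-8)
    grad_est = (adv[:, None] * eps).mean(axis=0) / sigma
    theta = theta - alpha * grad_est              # descend the meta-loss
```

After a few hundred steps, `theta` drifts toward the (toy) optimum; in the paper this outer loop runs over thousands of sampled tasks with far more compute.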
Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
If you want to support me, the best thing to do is to share the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Video "Training more effective learned optimizers, and using them to train themselves (Paper Explained)" from the Yannic Kilcher channel