Dmitry Kropotov: Optimizing NN using Kronecker-factored Approximate Curvature, bayesgroup.ru
In classical optimization, second-order methods and their variants (Hessian-free Newton, natural gradient, L-BFGS, and others) form a state-of-the-art approach: they need no user-tunable parameters and can far outperform simple strategies like gradient descent. However, applying these methods in a stochastic, non-convex setting (the most notable example being the training of neural networks) remains very challenging. Numerous attempts in this direction (e.g. stochastic L-BFGS, Hessian-free optimization for deep learning) have not produced a successful and popular algorithm, so many practitioners still prefer plain stochastic gradient descent (SGD) and its simple modifications.
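For reference, all of these methods instantiate the same preconditioned update (standard notation, not taken from the talk):

$$\theta_{t+1} = \theta_t - \eta\, C_t^{-1} \nabla_\theta L(\theta_t),$$

where $C_t$ is a curvature matrix: the Hessian for Newton's method, a quasi-Newton estimate for L-BFGS, or the Fisher information matrix $F = \mathbb{E}\big[\nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^{\top}\big]$ for natural gradient; plain SGD is the special case $C_t = I$.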
Recently, a new second-order optimization method with a Kronecker-factored approximate Fisher matrix (K-FAC) has been proposed. One of its key advantages is a high-quality approximation of the full Fisher matrix built from stochastic mini-batches. The per-iteration time and memory costs of the method are only a constant factor higher than those of SGD. At the same time, thanks to the second-order information, in practice the method needs several orders of magnitude fewer iterations to converge and has no problem-specific parameters.
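To make the Kronecker-factored idea concrete, here is a minimal NumPy sketch of the update for a single fully connected layer. The function name and hyperparameters are illustrative, not from the talk or the paper; real K-FAC adds momentum, factored Tikhonov damping, and amortized recomputation of the inverses.

```python
import numpy as np

def kfac_update(W, a, g, damping=1e-2, lr=1e-1):
    """One sketched K-FAC step for a layer s = W @ a.

    W: (d_out, d_in) weights; a: (d_in, batch) layer inputs;
    g: (d_out, batch) back-propagated gradients w.r.t. s.
    """
    batch = a.shape[1]
    # Kronecker factors of this layer's Fisher block: F ~= A (x) G,
    # with A = E[a a^T] and G = E[g g^T] estimated on the mini-batch.
    A = a @ a.T / batch + damping * np.eye(a.shape[0])   # damping keeps the
    G = g @ g.T / batch + damping * np.eye(g.shape[0])   # factors invertible
    dW = g @ a.T / batch            # ordinary mini-batch gradient w.r.t. W
    # Kronecker identity: (A (x) G)^{-1} vec(dW) = vec(G^{-1} dW A^{-1}),
    # so the full Fisher block is never formed or inverted explicitly.
    step = np.linalg.solve(G, dW)          # left factor:  G^{-1} dW
    step = np.linalg.solve(A, step.T).T    # right factor: ... A^{-1} (A is symmetric)
    return W - lr * step
```

Inverting the two small factors costs $O(d_{\text{in}}^3 + d_{\text{out}}^3)$ per layer instead of the $O(d_{\text{in}}^3 d_{\text{out}}^3)$ needed for the full Fisher block, which is where the constant-factor overhead relative to SGD comes from.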
In my talk, I cover the basics of the natural gradient approach and the ideas behind the K-FAC approximation, and present the resulting algorithm for optimizing fully connected neural networks. I also give a glimpse of the KFC approximation, a recent modification of this method for convolutional networks.
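A rough sketch of the KFC idea, under my own simplifications: unroll each convolutional layer im2col-style so that every spatial output location acts as an extra mini-batch sample, then reuse `kfac_update` from the sketch above (the paper's normalization of the two factors differs by a constant that the learning rate can absorb).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def im2col(x, k):
    """Unroll x: (c_in, H, W) into (c_in*k*k, T) columns of k x k patches."""
    win = sliding_window_view(x, (k, k), axis=(1, 2))  # (c_in, H-k+1, W-k+1, k, k)
    c, h, w = win.shape[:3]
    return win.transpose(0, 3, 4, 1, 2).reshape(c * k * k, h * w)

x = np.random.randn(3, 8, 8)               # toy input, c_in = 3
patches = im2col(x, k=3)                   # (27, 36): 6 x 6 output locations
g = np.random.randn(5, patches.shape[1])   # pretend per-location gradients, c_out = 5
W = np.random.randn(5, 27)                 # conv filters flattened into a matrix
W = kfac_update(W, patches, g)             # same update as the dense case
```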
Video: Dmitry Kropotov: Optimizing NN using Kronecker-factored Approximate Curvature, bayesgroup.ru, from the Arsenii Ashukha channel.