
Dmitry Kropotov: Optimizing NN using Kronecker-factored Approximate Curvature, bayesgroup.ru

In classical optimization, second-order methods and their variants (Hessian-free Newton, natural gradient, L-BFGS, and others) form the state-of-the-art approach: they need few user-tunable parameters and can far outperform simple strategies like gradient descent. However, applying these methods in the stochastic, non-convex setting (the most notable example being the training of neural networks) remains very challenging. Numerous attempts in this area (e.g. stochastic L-BFGS, Hessian-free optimization for deep learning) have not led to a successful and popular algorithm, so many practitioners still prefer plain stochastic gradient descent (SGD) and its simple modifications.
Recently, a new second-order optimization method with a Kronecker-factored approximate Fisher matrix (K-FAC) has been proposed. One of its key advantages is a high-quality approximation of the full Fisher matrix built from stochastic mini-batches. The per-iteration time and memory costs of the method are only a constant factor higher than those of SGD. Thanks to the second-order information, however, in practice the method requires several orders of magnitude fewer iterations to converge and has no problem-specific parameters.
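The cheap-inverse property mentioned above comes from a standard Kronecker identity: if a layer's Fisher block is approximated as G ⊗ A (with A estimated from layer inputs and G from pre-activation gradients), then (G ⊗ A)⁻¹ vec(∇W) = vec(G⁻¹ ∇W A⁻¹), so only two small matrices are ever inverted. Below is a minimal NumPy sketch of this trick for one fully connected layer — an illustration of the identity, not the paper's implementation; the damping value and the mini-batch factor estimates are assumptions for the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mini-batch quantities for one fully connected layer:
# a: layer inputs (batch x in_dim), g: pre-activation gradients (batch x out_dim).
batch, d_in, d_out = 64, 5, 3
a = rng.standard_normal((batch, d_in))
g = rng.standard_normal((batch, d_out))
dW = g.T @ a / batch              # mini-batch gradient w.r.t. W, shape (out, in)

# Kronecker factors of the Fisher block, F ≈ G_hat ⊗ A_hat,
# estimated from the same mini-batch.
A_hat = a.T @ a / batch           # in_dim  x in_dim
G_hat = g.T @ g / batch           # out_dim x out_dim

# Damping keeps the small factors invertible (value is an assumption).
lam = 1e-2
A_d = A_hat + lam * np.eye(d_in)
G_d = G_hat + lam * np.eye(d_out)

# K-FAC update via the identity (G ⊗ A)^{-1} vec(dW) = vec(G^{-1} dW A^{-1}):
# two small inverses instead of one (out*in) x (out*in) inverse.
update_kfac = np.linalg.solve(G_d, dW) @ np.linalg.inv(A_d)

# Sanity check: form the explicit Kronecker product and invert it directly.
F = np.kron(G_d, A_d)             # (out*in) x (out*in) — huge for real layers
update_full = np.linalg.solve(F, dW.ravel()).reshape(d_out, d_in)
```

For realistic layer sizes the explicit `np.kron` matrix is intractable, which is exactly why K-FAC never forms it; the check is only there to confirm the two computations agree.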
In my talk, I cover the basics of the natural gradient approach and the ideas behind the K-FAC approximation, and introduce the resulting algorithm for optimizing fully connected neural networks. I also give a glimpse of the KFC approximation, a recent extension of this method to convolutional networks.
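To fix intuition for the natural gradient part of the talk: the natural gradient preconditions the ordinary gradient by the inverse Fisher matrix, θ ← θ − η F⁻¹∇L. The sketch below applies this to a toy problem where F is known in closed form — fitting a univariate Gaussian N(μ, σ²), whose Fisher matrix in (μ, σ) coordinates is diag(1/σ², 2/σ²). The step size and iteration count are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 2.0, size=1000)  # samples from the "true" Gaussian

mu, sigma = 0.0, 1.0
lr = 0.5                                 # illustrative step size (assumption)
for _ in range(200):
    # Gradients of the average negative log-likelihood of N(mu, sigma^2).
    d_mu = -(data - mu).mean() / sigma**2
    d_sigma = 1.0 / sigma - ((data - mu) ** 2).mean() / sigma**3
    # Natural gradient step: precondition by F^{-1} = diag(sigma^2, sigma^2 / 2).
    mu -= lr * sigma**2 * d_mu
    sigma -= lr * (sigma**2 / 2.0) * d_sigma

# mu converges to the sample mean, sigma to the sample standard deviation.
```

The same update in plain gradient descent would need a step size tuned to σ; the Fisher preconditioning removes that dependence, which is the parameter-invariance property K-FAC scales up to neural networks.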

Video "Dmitry Kropotov: Optimizing NN using Kronecker-factored Approximate Curvature, bayesgroup.ru" from the channel Arsenii Ashukha
Video information
October 9, 2016, 16:24:51
Duration: 02:09:35