Dmitry Kropotov: Optimizing NN using Kronecker-factored Approximate Curvature, bayesgroup.ru
In classical optimization, second-order methods and their variants (Hessian-free Newton, natural gradient, L-BFGS, and others) form a state-of-the-art approach: they need no user-tunable parameters and can far outperform simple strategies like gradient descent. However, applying these methods in a stochastic, non-convex setting (the most notable example being the training of neural networks) remains very challenging. Numerous attempts in this direction (e.g. stochastic L-BFGS, Hessian-free optimization for deep learning) have not produced a successful and popular algorithm, so many practitioners still prefer plain stochastic gradient descent (SGD) and its simple modifications.
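For reference, all of these methods instantiate the same preconditioned update (standard notation, not taken from the talk):

$$\theta_{t+1} = \theta_t - \eta\, C_t^{-1} \nabla_\theta L(\theta_t),$$

where $C_t$ is a curvature matrix: the Hessian for Newton's method, a quasi-Newton estimate for L-BFGS, or the Fisher information matrix $F = \mathbb{E}\big[\nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^{\top}\big]$ for natural gradient; plain SGD is the special case $C_t = I$.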
Recently, a new second-order optimization method with a Kronecker-factored approximate Fisher matrix (K-FAC) has been proposed. One of its key advantages is a high-quality approximation of the full Fisher matrix built from stochastic mini-batches. The per-iteration time and memory costs of the method are only a constant factor higher than those of SGD. At the same time, thanks to the second-order information, in practice the method needs several orders of magnitude fewer iterations to converge and has no problem-specific parameters.
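To make the Kronecker-factored idea concrete, here is a minimal NumPy sketch of the update for a single fully connected layer. The function name and hyperparameters are illustrative, not from the talk or the paper; real K-FAC adds momentum, factored Tikhonov damping, and amortized recomputation of the inverses.

```python
import numpy as np

def kfac_update(W, a, g, damping=1e-2, lr=1e-1):
    """One sketched K-FAC step for a layer s = W @ a.

    W: (d_out, d_in) weights; a: (d_in, batch) layer inputs;
    g: (d_out, batch) back-propagated gradients w.r.t. s.
    """
    batch = a.shape[1]
    # Kronecker factors of this layer's Fisher block: F ~= A (x) G,
    # with A = E[a a^T] and G = E[g g^T] estimated on the mini-batch.
    A = a @ a.T / batch + damping * np.eye(a.shape[0])   # damping keeps the
    G = g @ g.T / batch + damping * np.eye(g.shape[0])   # factors invertible
    dW = g @ a.T / batch            # ordinary mini-batch gradient w.r.t. W
    # Kronecker identity: (A (x) G)^{-1} vec(dW) = vec(G^{-1} dW A^{-1}),
    # so the full Fisher block is never formed or inverted explicitly.
    step = np.linalg.solve(G, dW)          # left factor:  G^{-1} dW
    step = np.linalg.solve(A, step.T).T    # right factor: ... A^{-1} (A is symmetric)
    return W - lr * step
```

Inverting the two small factors costs $O(d_{\text{in}}^3 + d_{\text{out}}^3)$ per layer instead of the $O(d_{\text{in}}^3 d_{\text{out}}^3)$ needed for the full Fisher block, which is where the constant-factor overhead relative to SGD comes from.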
In my talk, I cover the basics of the natural gradient approach and the ideas behind the K-FAC approximation, and present the resulting algorithm for optimizing fully connected neural networks. I also give a glimpse of the KFC approximation, a recent modification of this method for convolutional networks.
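A rough sketch of the KFC idea, under my own simplifications: unroll each convolutional layer im2col-style so that every spatial output location acts as an extra mini-batch sample, then reuse `kfac_update` from the sketch above (the paper's normalization of the two factors differs by a constant that the learning rate can absorb).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def im2col(x, k):
    """Unroll x: (c_in, H, W) into (c_in*k*k, T) columns of k x k patches."""
    win = sliding_window_view(x, (k, k), axis=(1, 2))  # (c_in, H-k+1, W-k+1, k, k)
    c, h, w = win.shape[:3]
    return win.transpose(0, 3, 4, 1, 2).reshape(c * k * k, h * w)

x = np.random.randn(3, 8, 8)               # toy input, c_in = 3
patches = im2col(x, k=3)                   # (27, 36): 6 x 6 output locations
g = np.random.randn(5, patches.shape[1])   # pretend per-location gradients, c_out = 5
W = np.random.randn(5, 27)                 # conv filters flattened into a matrix
W = kfac_update(W, patches, g)             # same update as the dense case
```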
Video: Dmitry Kropotov: Optimizing NN using Kronecker-factored Approximate Curvature, bayesgroup.ru, from the Arsenii Ashukha channel.