
Anirbit Mukherjee - Provable Training of Neural Nets With One Layer of Activation

Abstract: Provable neural training is a fundamental challenge in the field of deep-learning theory, and it remains largely an open question for almost any neural net of practical relevance. The quest for provable convergence of neural training algorithms almost always leads to exciting new questions in mathematics. In this talk, I shall give an overview of three of our convergence proofs in this territory: (1) In 2016 we gave the first deterministic algorithm that converges to the exact global minima of any convex loss function for any depth-2 ReLU neural net on any training data, in time that is only polynomial in the training data size. (2) In 2020 we gave the first stochastic algorithm that converges to the global minima of a single ReLU gate in linear time (exponentially fast convergence) for realizable data, whilst not assuming any specific distribution for the inputs. (3) In 2022, in a first-of-its-kind result, we leveraged the theory of SDEs and Villani functions to show that SGD converges to the global minima of an appropriately Frobenius-norm-regularized squared loss on any depth-2 neural net with tanh or sigmoid activations, for arbitrary width and data. We shall end the talk by delineating various open questions in this direction that can possibly be tackled in the near future.
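To make the setting of result (3) concrete, here is a minimal sketch of plain SGD on a Frobenius-norm-regularized squared loss for a depth-2 tanh network. This is only an illustration of the objective being optimized, not the papers' algorithm: the width, step size, regularization weight lam, and the synthetic data are all arbitrary assumptions, and the actual results involve specific conditions on the regularization that this sketch does not reproduce.

import numpy as np

# Hypothetical, illustrative setup: none of these constants or data
# choices come from the talk or the underlying papers.
rng = np.random.default_rng(0)
n, d, width, lam, lr = 200, 5, 16, 1e-3, 1e-2

X = rng.standard_normal((n, d))            # arbitrary training inputs
y = np.tanh(X @ rng.standard_normal(d))    # arbitrary targets

W = rng.standard_normal((width, d))        # inner-layer weights
a = rng.standard_normal(width) / width     # outer-layer weights

for step in range(20000):
    i = rng.integers(n)                    # sample one data point uniformly
    x, t = X[i], y[i]
    h = np.tanh(W @ x)                     # hidden-layer activations
    err = a @ h - t                        # prediction residual
    # Gradients of 0.5*err**2 + 0.5*lam*(||W||_F**2 + ||a||**2)
    grad_a = err * h + lam * a
    grad_W = err * np.outer(a * (1.0 - h**2), x) + lam * W
    a -= lr * grad_a
    W -= lr * grad_W

The Frobenius-norm term appears in the gradients as the lam * a and lam * W contributions; it is this regularization of the squared loss that the 2022 result shows SGD provably minimizes globally for such networks.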

Video: Anirbit Mukherjee - Provable Training of Neural Nets With One Layer of Activation, from the One world theoretical machine learning channel
Video information
March 9, 2023, 23:58:14
Duration: 00:43:24
Other videos on the channel
Lukasz Szpruch - Mean-Field Neural ODEs, Relaxed Control and Generalization Errors
Matthew Colbrook - Smale’s 18th Problem and the Barriers of Deep Learning
Yu Bai - How Important is the Train-Validation Split in Meta-Learning?
Anna Korba - Kernel Stein Discrepancy Descent
Kevin Miller - Ensuring Exploration and Exploitation in Graph-Based Active Learning
Theo Bourdais - Computational Hypergraph Discovery, a Gaussian Process framework
Yaoqing Yang - Predicting & improving generalization by measuring loss landscapes & weight matrices
Konstantinos Spiliopoulos - Mean field limits of neural networks: typical behavior and fluctuations
Nadia Drenska - A PDE Interpretation of Prediction with Expert Advice
Matthias Ehrhardt - Bilevel Learning for Inverse Problems
Peter Richtarik - The Resolution of a Question Related to Local Training in Federated Learning
Marcus Hutter - Testing Independence of Exchangeable Random Variables
Yury Korolev - Approximation properties of two-layer neural networks with values in a Banach space
Sophie Langer - Circumventing the curse of dimensionality with deep neural networks
Stephan Mandt - Compressing Variational Bayes: From neural data compression to video prediction
Derek Driggs - Barriers to Deploying Deep Learning Models During the COVID-19 Pandemic
Gal Vardi - Implications of the implicit bias in neural networks
Ziwei Ji - The dual of the margin: improved analyses and rates for gradient descent’s implicit bias
Qi Lei - Predicting What You Already Know Helps: Provable Self-Supervised Learning
Alessandro Scagliotti - Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems