
Rama Cont - Asymptotic Analysis of Deep Residual Networks

Presentation given by Rama Cont on 27th October at the One World Seminar on the Mathematics of Machine Learning, on the topic "Asymptotic Analysis of Deep Residual Networks".

Abstract: Residual networks (ResNets) have displayed impressive results in pattern recognition and, recently, have garnered considerable theoretical interest due to a perceived link with neural ordinary differential equations (neural ODEs). This link relies on the convergence of network weights to a smooth function as the number of layers increases. We investigate the properties of weights trained by stochastic gradient descent and their scaling with network depth through detailed numerical experiments. We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature: one may obtain an alternative ODE limit, a stochastic differential equation or neither of these. The scaling regime one ends up with depends on certain features of the network architecture, such as the smoothness of the activation function. These findings cast doubts on the validity of the neural ODE model as an adequate asymptotic description of deep ResNets and point to an alternative class of differential equations as a better description of the deep network limit. In the case where the scaling limit is a stochastic differential equation, the deep network limit is shown to be described by a system of forward-backward stochastic differential equations. Joint work with: Alain-Sam Cohen (InstaDeep Ltd), Alain Rossier (Oxford), RenYuan Xu (University of Southern California).
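
For context, the neural ODE link referenced in the abstract can be sketched as follows. This is a minimal illustration under assumed notation (depth L, scaling factor delta_L, layer function f, weights theta_k); it is not the precise parametrization used in the talk.

% ResNet layer update with depth L and a depth-dependent scaling \delta_L:
\[
  h_{k+1} \;=\; h_k \;+\; \delta_L\, f(h_k, \theta_k), \qquad k = 0, \dots, L-1.
\]
% If \delta_L = 1/L and the trained weights \theta_k \approx \theta(k/L) converge to a
% smooth function of the (rescaled) layer index, this recursion is an Euler scheme for
% the neural ODE
\[
  \frac{dh}{dt} \;=\; f\bigl(h(t), \theta(t)\bigr), \qquad t \in [0,1].
\]
% Under other depth scalings, or when the trained weights behave like increments of a
% random process rather than samples of a smooth function, the deep-network limit can
% instead be a stochastic differential equation, or fail to be a differential equation
% at all; this is the dichotomy the abstract refers to.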

Video "Rama Cont - Asymptotic Analysis of Deep Residual Networks" from the channel One World Theoretical Machine Learning
Video information
28 October 2021, 23:29:08
00:57:39
Other videos on the channel
Lukasz Szpruch - Mean-Field Neural ODEs, Relaxed Control and Generalization Errors
Matthew Colbrook - Smale’s 18th Problem and the Barriers of Deep Learning
Yu Bai - How Important is the Train-Validation Split in Meta-Learning?
Anna Korba - Kernel Stein Discrepancy Descent
Anirbit Mukherjee - Provable Training of Neural Nets With One Layer of Activation
Kevin Miller - Ensuring Exploration and Exploitation in Graph-Based Active Learning
Theo Bourdais - Computational Hypergraph Discovery, a Gaussian Process framework
Yaoqing Yang - Predicting & improving generalization by measuring loss landscapes & weight matrices
Hao Ni - Path development network for sequential data analysis
Konstantinos Spiliopoulos - Mean field limits of neural networks: typical behavior and fluctuations
Nadia Drenska - A PDE Interpretation of Prediction with Expert Advice
Matthias Ehrhardt - Bilevel Learning for Inverse Problems
Peter Richtarik - The Resolution of a Question Related to Local Training in Federated Learning
Marcus Hutter - Testing Independence of Exchangeable Random Variables
Yury Korolev - Approximation properties of two-layer neural networks with values in a Banach space
Sophie Langer - Circumventing the curse of dimensionality with deep neural networks
Stephan Mandt - Compressing Variational Bayes: From neural data compression to video prediction
Derek Driggs - Barriers to Deploying Deep Learning Models During the COVID-19 Pandemic
Gal Vardi - Implications of the implicit bias in neural networks
Ziwei Ji - The dual of the margin: improved analyses and rates for gradient descent’s implicit bias
Qi Lei - Predicting What You Already Know Helps: Provable Self-Supervised Learning