Stefano Soatto: "Invariance and disentanglement in deep representations"
New Deep Learning Techniques 2018
"Invariance and disentanglement in deep representations"
Stefano Soatto, University of California, Los Angeles (UCLA)
Abstract: Theories of deep learning are like anatomical parts best not named explicitly in an abstract: everyone seems to have one. That is why it is important for a theory to be inclusive: it has to be compatible with all known results and, at the very least, explain known empirical phenomena. I will describe the basic elements of the Emergence Theory of Deep Learning, which started as a general theory of representations and comprises three parts: (1) a formalization of the desirable properties a representation should possess, based on classical principles of statistical decision and information theory: sufficiency, invariance, minimality, and independence. This has nothing to do with deep learning per se, but is closely tied to the notions of the Information Bottleneck and variational inference. (2) A description of common empirical losses employed in deep learning (e.g., empirical cross-entropy) and of implicit or explicit regularization practices, including dropout and pooling, as well as recently proven additive entropic components of the loss computed by SGD. Finally, (3) theorems and bounds showing that minimizing suitably (implicitly or explicitly) regularized losses with SGD with respect to the weights implies optimization of the loss described in (1) with respect to the activations of a deep network, and therefore achievement of the desirable properties of the resulting representation formalized in (1). The link between the two is specific to the architecture of deep networks. The theory is related to the Information Bottleneck, but not the one described in recent theories: it is instead a new Information Bottleneck for the weights of a network, rather than for the activations. It is also related to PAC-Bayes, and could be derived through that lens, providing independent validation. It is also related to Kolmogorov complexity.
It is also related to "flat minima," in the sense that the crucial regularizing quantity, the information in the weights, bounds the nuclear norm of the Hessian around critical points. It also shows that there is no need to rethink regularization, and that, unlike the Hessian, information is invariant to reparametrization.
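The abstract describes a loss of the form "empirical cross-entropy plus a regularizer on the information in the weights, I(w; D)". A minimal sketch of that structure, assuming a diagonal-Gaussian posterior over the weights and a standard-normal prior so that the KL divergence upper-bounds I(w; D) (this is an illustrative reconstruction, not the authors' code; all function names are hypothetical):

```python
import numpy as np

def kl_gaussian(mu, log_sigma2):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over weights.

    Serves as a tractable upper bound on the information in the
    weights, I(w; D), under the mean-field Gaussian assumption."""
    sigma2 = np.exp(log_sigma2)
    return 0.5 * np.sum(sigma2 + mu ** 2 - 1.0 - log_sigma2)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    z = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_p[np.arange(len(labels)), labels])

def ib_weights_loss(logits, labels, mu, log_sigma2, beta=1e-3):
    """Information Bottleneck for the weights (sketch):
    empirical cross-entropy + beta * bound on I(w; D)."""
    return cross_entropy(logits, labels) + beta * kl_gaussian(mu, log_sigma2)
```

With beta = 0 this reduces to the usual empirical cross-entropy; increasing beta trades training fit for less information stored in the weights, which is the quantity the theory links to flat minima and generalization.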
Joint work with Alessandro Achille and Pratik Chaudhari.
References: https://arxiv.org/pdf/1706.01350.pdf and https://arxiv.org/abs/1710.11029
Institute for Pure and Applied Mathematics, UCLA
February 8, 2018
For more information: http://www.ipam.ucla.edu/programs/workshops/new-deep-learning-techniques/?tab=overview
Video: Stefano Soatto: "Invariance and disentanglement in deep representations", from the Institute for Pure & Applied Mathematics (IPAM) channel
Video information: uploaded February 17, 2018, 4:00:45. Duration: 00:36:01