mixup: Beyond Empirical Risk Minimization (Paper Explained)
Neural Networks often draw hard boundaries in high-dimensional space, which makes them very brittle. Mixup is a technique that linearly interpolates between data and labels at training time and achieves much smoother and more regular class boundaries.
OUTLINE:
0:00 - Intro
0:30 - The problem with ERM
2:50 - Mixup
6:40 - Code
9:35 - Results
https://arxiv.org/abs/1710.09412
Abstract:
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Видео mixup: Beyond Empirical Risk Minimization (Paper Explained) канала Yannic Kilcher
OUTLINE:
0:00 - Intro
0:30 - The problem with ERM
2:50 - Mixup
6:40 - Code
9:35 - Results
https://arxiv.org/abs/1710.09412
Abstract:
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Видео mixup: Beyond Empirical Risk Minimization (Paper Explained) канала Yannic Kilcher
Показать
Комментарии отсутствуют
Информация о видео
Другие видео канала
Manifold Mixup: Better Representations by Interpolating Hidden StatesFixMatch: Simplifying Semi-Supervised Learning with Consistency and ConfidenceEmpirical Risk MinimizationGroup Normalization (Paper Explained)A critical analysis of self-supervision, or what we can learn from a single image (Paper Explained)ISO Invariance Tutorial[ML News] Google introduces Pathways | OpenAI solves Math Problems | Meta goes First PersonML Basics: Losses & Risk MinimizationiMAML: Meta-Learning with Implicit Gradients (Paper Explained)An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained)On the Measure of Intelligence by François Chollet - Part 1: Foundations (Paper Explained)DETR: End-to-End Object Detection with Transformers (Paper Explained)Negative Data AugmentationVariational Autoencoders - EXPLAINED!Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained)Synthesizer: Rethinking Self-Attention in Transformer Models (Paper Explained)Beyond Empirical Risk Minimization: the lessons of deep learningAdversarial Examples Are Not Bugs, They Are FeaturesGrokking: Generalization beyond Overfitting on small algorithmic datasets (Paper Explained)