
[Live Machine Learning Research] Plain Self-Ensembles (I actually DISCOVER SOMETHING) - Part 1

I share my progress implementing a research idea from scratch. I attempt to build an ensemble out of students trained by label-free self-distillation, without any additional data or augmentation. It turns out that this actually works, and, interestingly, the more students I use, the better the accuracy. This leads to the hypothesis that the ensemble effect does not come from extracting more information from the labels.
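To make the setup concrete, here is a minimal sketch in plain Python of label-free self-distillation followed by a plain ensemble. It is not the code from the linked repository: the linear "teacher", the toy Gaussian data, the temperature of 2.0, the learning rate, and every function name (`train_student`, `distill_loss`, `ensemble_probs`, and so on) are hypothetical stand-ins chosen for illustration. Each student sees only the teacher's soft predictions on the same inputs (no labels, no extra data, no augmentation); the students differ only in their random initialization, and the ensemble averages their softmax outputs.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, x):
    """Logits of a bias-free linear classifier (one weight row per class)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]


def distill_loss(student_w, inputs, teacher_probs, temperature):
    """Mean cross-entropy of the student against the teacher's soft targets; no labels involved."""
    total = 0.0
    for x, t in zip(inputs, teacher_probs):
        p = softmax(linear_logits(student_w, x), temperature)
        total -= sum(t_c * math.log(p_c) for t_c, p_c in zip(t, p))
    return total / len(inputs)


def train_student(inputs, teacher_probs, n_classes, seed, temperature=2.0, lr=0.5, epochs=200):
    """Fit one student to the teacher's soft targets by full-batch gradient descent.

    Hypothetical helper: only the seed (random init) differs between students.
    """
    rng = random.Random(seed)
    dim = len(inputs[0])
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_classes)]
        for x, t in zip(inputs, teacher_probs):
            p = softmax(linear_logits(w, x), temperature)
            for c in range(n_classes):
                # d(loss)/d(logit_c) = (p_c - t_c) / temperature for soft-target cross-entropy
                g = (p[c] - t[c]) / temperature
                for d in range(dim):
                    grad[c][d] += g * x[d]
        for c in range(n_classes):
            for d in range(dim):
                w[c][d] -= lr * grad[c][d] / len(inputs)
    return w


def ensemble_probs(students, x):
    """Plain ensemble: average the students' softmax outputs (temperature 1 at test time)."""
    probs = [softmax(linear_logits(w, x)) for w in students]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical toy data: 50 unlabeled 4-d inputs, 3 classes.
data_rng = random.Random(0)
inputs = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
# Assumption: a random linear model stands in for an already-trained teacher.
teacher_w = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
T = 2.0
teacher_probs = [softmax(linear_logits(teacher_w, x), T) for x in inputs]

# Five students, identical except for their random seed; none of them sees a label.
students = [train_student(inputs, teacher_probs, 3, seed=s, temperature=T) for s in range(1, 6)]

teacher_preds = [argmax(t) for t in teacher_probs]
ensemble_preds = [argmax(ensemble_probs(students, x)) for x in inputs]
agreement = sum(a == b for a, b in zip(teacher_preds, ensemble_preds)) / len(inputs)
print(f"ensemble/teacher agreement: {agreement:.2f}")
```

Since the students never see a label directly, this setup separates the ensemble effect from any further use of the labels, which is what the hypothesis above is about. The toy problem is convex, so all students drift toward the same solution; it only illustrates the mechanics and is not expected to reproduce the accuracy gains from the video.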

OUTLINE:
0:00 - Introduction
2:10 - Research Idea
4:15 - Adjusting the Codebase
25:00 - Teacher and Student Models
52:30 - Shipping to the Server
1:03:40 - Results
1:14:50 - Conclusion

Code: https://github.com/yk/PyTorch_CIFAR10

References:
My Video on SimCLRv2: https://youtu.be/2lkUNDZld-4
Born-Again Neural Networks: https://arxiv.org/abs/1805.04770
Deep Ensembles: A Loss Landscape Perspective: https://arxiv.org/abs/1912.02757

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher

Video "[Live Machine Learning Research] Plain Self-Ensembles (I actually DISCOVER SOMETHING) - Part 1" from the Yannic Kilcher channel
Video information
Date: July 6, 2020, 18:23:54
Duration: 01:18:43