
Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained)

This paper proposes SimCLRv2 and shows that semi-supervised learning benefits a lot from self-supervised pre-training. And stunningly, that effect gets larger the fewer labels are available and the more parameters the model has.

OUTLINE:
0:00 - Intro & Overview
1:40 - Semi-Supervised Learning
3:50 - Pre-Training via Self-Supervision
5:45 - Contrastive Loss
10:50 - Retaining Projection Heads
13:10 - Supervised Fine-Tuning
13:45 - Unsupervised Distillation & Self-Training
18:45 - Architecture Recap
22:25 - Experiments
34:15 - Broader Impact

Paper: https://arxiv.org/abs/2006.10029
Code: https://github.com/google-research/simclr

Abstract:
One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during pretraining and fine-tuning. We find that, the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels (≤13 labeled images per class) using ResNet-50, a 10× improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
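
To make the three-step recipe concrete, here is a minimal, hypothetical PyTorch sketch (the official repository linked above is TensorFlow, and none of these function or variable names come from it): an NT-Xent-style contrastive loss for the self-supervised pre-training step, and the soft-label distillation loss used in step three to transfer the fine-tuned teacher's task-specific knowledge to a smaller student using only unlabeled images.

```python
# Hypothetical illustration only; this is not the authors' (TensorFlow) code.
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.1):
    """Contrastive loss over two augmented views (z1, z2) of the same N images.

    z1, z2: projection-head outputs of shape (N, d).
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, d), unit norm
    sim = z @ z.t() / temperature                            # pairwise cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))               # ignore self-similarity
    # The positive for view i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Step 3: the student matches the fine-tuned teacher's soft labels
    on unlabeled images (task-specific use of the unlabeled data)."""
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    return -(p_teacher * log_p_student).sum(dim=1).mean()
```

In this sketch, pre-training minimizes the contrastive loss on unlabeled data, the big network is then fine-tuned on the few labeled examples with ordinary cross-entropy, and finally the distillation loss is minimized over the unlabeled set to produce the smaller student model.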

Authors: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher

Video: Big Self-Supervised Models are Strong Semi-Supervised Learners (Paper Explained), from the Yannic Kilcher channel
Video information
June 20, 2020, 19:03:33
Duration: 00:37:31