Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained)
This paper dives into the inner workings of the Lottery Ticket Hypothesis and sheds some light on which of its components matter and which don't.
https://arxiv.org/abs/1905.01067
Abstract:
The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keeping the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights. The performance of these networks often exceeds the performance of the non-sparse base model, but for reasons that were not well understood. In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied significantly without impacting the overall results. Ablating these factors leads to new insights for why LT networks perform as well as they do. We show why setting weights to zero is important, how signs are all you need to make the reinitialized network train, and why masking behaves like training. Finally, we discover the existence of Supermasks, masks that can be applied to an untrained, randomly initialized network to produce a model with performance far better than chance (86% on MNIST, 41% on CIFAR-10).
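The core operation the abstract refers to — keeping the large weights and rewinding them to their initial values — can be sketched roughly as follows. This is a minimal NumPy illustration of magnitude-based masking on a single hypothetical weight matrix, not the paper's actual implementation; the weight shapes and the keep fraction `p` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: its initial weights and its weights after training.
w_init = rng.standard_normal((4, 4))
w_trained = w_init + 0.5 * rng.standard_normal((4, 4))

# LT-style mask: keep the fraction p of weights with the largest
# trained magnitude, zero out the rest.
p = 0.5
k = int(p * w_trained.size)
threshold = np.sort(np.abs(w_trained).ravel())[-k]
mask = (np.abs(w_trained) >= threshold).astype(w_init.dtype)

# A "lottery ticket" rewinds the surviving weights to their original
# initialization; the masked-out weights are set to zero.
ticket = mask * w_init
```

A Supermask, by contrast, applies such a mask directly to `w_init` and never trains the weights at all — masking alone already moves the network toward a working solution.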
Authors: Hattie Zhou, Janice Lan, Rosanne Liu, Jason Yosinski
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Video "Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained)" from the channel Yannic Kilcher