RepNet: Counting Out Time - Class Agnostic Video Repetition Counting in the Wild (Paper Explained)
Counting repeated actions in a video is trivial for humans, yet remains remarkably hard for machines. RepNet achieves state-of-the-art results by introducing an information bottleneck in the form of a temporal self-similarity matrix, which relates video frames to one another in a way that forces the model to surface exactly the information relevant for counting. Alongside the model, the authors introduce a new dataset for evaluating counting models.
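As a rough sketch of the core idea (not the authors' exact implementation, which uses a learned convolutional frame encoder and a specific softmax temperature), a temporal self-similarity matrix can be computed from per-frame embeddings like this — the `temperature` parameter here is an illustrative hyperparameter:

```python
import numpy as np

def self_similarity_matrix(embeddings, temperature=1.0):
    """Row-softmaxed negative squared L2 distances between all pairs of
    per-frame embeddings: entry (i, j) is high when frames i and j look
    alike, so repetitions show up as diagonal stripes."""
    # embeddings: (num_frames, dim)
    sq = np.sum(embeddings ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T
    sims = -dists / temperature
    # row-wise softmax turns each row into a distribution over frames
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# toy example: 12 one-hot "frame embeddings" that repeat with period 3
frames = np.tile(np.eye(3), (4, 1))
tsm = self_similarity_matrix(frames)
# tsm[0, 3] (one period apart) exceeds tsm[0, 1] (within a period)
```

Because the downstream period predictor sees only this matrix, it cannot memorize class-specific appearance — that is the bottleneck the video discusses.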
OUTLINE:
0:00 - Intro & Overview
2:30 - Problem Statement
5:15 - Output & Loss
6:25 - Per-Frame Embeddings
11:20 - Temporal Self-Similarity Matrix
19:00 - Periodicity Predictor
25:50 - Architecture Recap
27:00 - Synthetic Dataset
30:15 - Countix Dataset
31:10 - Experiments
33:35 - Applications
35:30 - Conclusion & Comments
Paper Website: https://sites.google.com/view/repnet
Colab: https://colab.research.google.com/github/google-research/google-research/blob/master/repnet/repnet_colab.ipynb
Abstract:
We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck that allows generalization to unseen repetitions in videos in the wild. We train this model, called RepNet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds state-of-the-art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (~90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos.
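The synthetic data generation described in the abstract can be sketched as follows; the function name, sampling scheme, and per-frame period labels are illustrative assumptions, not the authors' exact pipeline (which adds further augmentations such as camera motion):

```python
import numpy as np

def make_synthetic_repetition(video, period, count, rng=None):
    """Sample a `period`-frame clip from an unlabeled video and tile it
    `count` times, yielding a repeating video with known ground truth."""
    rng = np.random.default_rng() if rng is None else rng
    start = rng.integers(0, len(video) - period + 1)
    clip = video[start:start + period]
    repeated = np.concatenate([clip] * count, axis=0)
    # supervision for free: every frame is labeled with the period length
    labels = np.full(len(repeated), period)
    return repeated, labels

# toy "video": 100 frames of 8x8 grayscale noise
video = np.random.rand(100, 8, 8)
frames, labels = make_synthetic_repetition(video, period=5, count=4)
# frames has 20 frames, and frames 0-4 equal frames 5-9 exactly
```

Varying `period` and `count` across many source videos gives an effectively unlimited, class-agnostic training set without manual annotation.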
Authors: Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Video "RepNet: Counting Out Time - Class Agnostic Video Repetition Counting in the Wild (Paper Explained)" from the Yannic Kilcher channel