MAMBA and State Space Models explained | SSM explained
We explain and illustrate Mamba, State Space Models (SSMs), and Selective SSMs in simple terms.
SSMs match the performance of Transformers, but are faster and more memory-efficient. This is crucial for long sequences!
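Why are SSMs so efficient at inference? Here is a minimal NumPy sketch (toy sizes and random placeholder values, not the official Mamba code): the whole history is compressed into a fixed-size state vector h, so each new token costs constant time and memory, whereas a Transformer's attention cache grows with sequence length.

```python
# Minimal sketch of SSM inference (illustrative placeholder values only).
import numpy as np

N = 16                          # state size (toy value)
A_bar = 0.9 * np.eye(N)         # discretized state transition (placeholder)
B_bar = np.random.randn(N, 1)   # discretized input projection (placeholder)
C = np.random.randn(1, N)       # output projection (placeholder)

h = np.zeros((N, 1))            # fixed-size hidden state = the whole "memory"
ys = []
for x_t in np.random.randn(100):      # stream of scalar inputs
    h = A_bar @ h + B_bar * x_t       # h_t = A_bar h_{t-1} + B_bar x_t, O(1) per token
    ys.append((C @ h).item())         # y_t = C h_t
```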
AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/ To celebrate our merch launch, here is a limited-time offer! 👉 Get a 25% discount on AI Coffee Break Merch with the code MAMBABEAN.
This video also comes in blog post format: 👉 https://open.substack.com/pub/aicoffeebreakwl/p/mamba-and-ssms-explained?r=r8s20&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael
Outline:
00:00 Mamba to replace Transformers!?
02:04 State Space Models (SSMs) – high level
03:09 State Space Models (SSMs) – more detail
05:45 Discretization step in SSMs (sketched in code after this outline)
08:14 SSMs are fast! Here is why.
09:55 SSM training: Convolution trick (sketched in code after this outline)
12:01 Selective SSMs
15:44 MAMBA Architecture
17:57 Mamba results
20:15 Building on Mamba
21:00 Do RNNs have a comeback?
21:42 AICoffeeBreak Merch
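For the 05:45 and 09:55 chapters, here is a toy sketch (illustrative sizes and values; SciPy's expm stands in for whatever fused kernel a real implementation uses) of the zero-order-hold (ZOH) discretization and of the convolution trick: with fixed, non-selective parameters, unrolling the linear recurrence equals a 1D convolution, which is what lets S4-style models train in parallel.

```python
# Toy demo: ZOH discretization + recurrence-vs-convolution equivalence.
import numpy as np
from scipy.linalg import expm             # matrix exponential

N, L = 4, 8                               # state size, sequence length (toy)
A = -np.diag(np.arange(1.0, N + 1))       # continuous-time A (placeholder)
B = np.ones((N, 1))                       # continuous-time B (placeholder)
C = np.random.randn(1, N)                 # output projection (placeholder)
delta = 0.1                               # step size (learned per input in Mamba)

# ZOH: A_bar = exp(dA),  B_bar = (dA)^{-1} (exp(dA) - I) dB
A_bar = expm(delta * A)
B_bar = np.linalg.solve(delta * A, A_bar - np.eye(N)) @ (delta * B)

x = np.random.randn(L)

# 1) Recurrent view: h_t = A_bar h_{t-1} + B_bar x_t,  y_t = C h_t
h, y_rec = np.zeros((N, 1)), []
for t in range(L):
    h = A_bar @ h + B_bar * x[t]
    y_rec.append((C @ h).item())

# 2) Convolutional view: y = K * x with kernel K_k = C A_bar^k B_bar
K = [(C @ np.linalg.matrix_power(A_bar, k) @ B_bar).item() for k in range(L)]
y_conv = [sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(L)]

assert np.allclose(y_rec, y_conv)  # both views produce the same outputs
```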
📄 Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023). https://arxiv.org/abs/2312.00752
📄 MoE-Mamba https://arxiv.org/abs/2401.04081
📄 Vision Mamba https://arxiv.org/abs/2401.09417
📄 MambaByte https://arxiv.org/abs/2401.13660
🕊️ Mamba rejected from ICLR: https://twitter.com/srush_nlp/status/1750526956452577486
📖 Prefix sum (scan) with CUDA: https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
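That GPU Gems chapter matters here because selective SSMs change their parameters per input, so the convolution trick no longer applies; Mamba instead evaluates the recurrence with a parallel associative scan, implemented as a hardware-aware CUDA kernel. A tiny Python sketch of the idea, using plain addition as the associative operator:

```python
# Inclusive prefix sum (scan): sequential vs. naive parallel formulation.
import numpy as np

def scan_sequential(x):
    out, acc = [], 0.0
    for v in x:                 # O(L) strictly sequential steps
        acc += v
        out.append(acc)
    return np.array(out)

def scan_hillis_steele(x):
    # Hillis-Steele scan: O(log L) rounds of vectorized shift-and-add,
    # each round doubling how far every prefix reaches.
    x = np.asarray(x, dtype=float).copy()
    d = 1
    while d < len(x):
        x[d:] = x[d:] + x[:-d]
        d *= 2
    return x

x = np.random.randn(16)
assert np.allclose(scan_sequential(x), scan_hillis_steele(x))
```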
📺 Transformer explained: https://www.youtube.com/playlist?list=PLpZBeKTZRGPNdymdEsSSSod5YQ3Vu0sKY
Great resources to learn about Mamba:
📙 Mamba: https://jameschen.io/jekyll/update/2024/02/12/mamba.html
📕 The Annotated S4: https://srush.github.io/annotated-s4/
📘 Mamba The Easy Way: https://jackcook.com/2024/02/23/mamba.html
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, buy us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
Join this channel to get access to perks:
https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/join
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Scientific advising by Mara Popescu
Video editing: Nils Trost
Music 🎵 : Sunny Days – Anno Domini Beats