DeepSeek's Biggest Leap Yet: V4 Delivers the Impossible

DeepSeek just dropped V4: a 1.6-trillion-parameter open-source model that handles 1 million tokens of context using only 10% of its predecessor's KV-cache memory. It is MIT-licensed and 6× cheaper than GPT-5.5, and the architecture inside it might be the biggest shift in transformer design since FlashAttention.

Paper: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

Timestamps:
00:00 The 90% memory cut
00:29 From V3 to V4: the architectural lineage
01:02 The quadratic wall
01:45 The three architectural pillars
02:32 Hybrid Attention deep dive: CSA + HCA
03:22 mHC and Muon: stabilizing 1.6T parameters
04:09 On-Policy Distillation
04:58 The efficiency leap: 27% FLOPs, 10% KV cache
05:42 Benchmarks: SimpleQA, Codeforces, MRCR
06:36 Head-to-head vs GPT-5.5 and Gemini 3.1 Pro
07:20 What 1M tokens unlocks in production
08:09 Open-source is closing the gap
08:54 Outro

Video "DeepSeek's Biggest Leap Yet: V4 Delivers the Impossible" from the channel Sebastian Buzdugan

AI DeepLearning DeepSeek DeepSeekV4 LLM MachineLearning MoE OpenSource Transformer ai ai news deepseek llm llm explained ml moe open source paper review tech review transformer

Video information

May 9, 2026, 16:00:53

00:09:08

Sebastian Buzdugan
