OpenAI's $500 billion AI infrastructure: Why NVIDIA GPU scaling is broken

Every major AI lab is spending billions on GPUs. But there's a number almost
nobody discusses publicly: the efficiency rate of those clusters.

When Meta trained Llama 3.1, one of the most advanced AI models ever
released, they achieved an MFU (Model FLOPs Utilization) of just 38–43%.
That means more than half the compute they paid for was not doing useful work.
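For a rough sense of what MFU measures, here is a minimal Python sketch. It assumes the widely used approximation of about 6 FLOPs per parameter per training token for a dense transformer. The model size, per-GPU throughput, and peak FLOPs figure below are hypothetical placeholders chosen for illustration, not Meta's reported numbers.

```python
def model_flops_utilization(params, tokens_per_sec_per_gpu, peak_flops_per_gpu):
    """MFU = useful model FLOPs per second / peak hardware FLOPs per second.

    Assumes ~6 FLOPs per parameter per token (forward + backward pass)
    for a dense transformer. This is an approximation, not an exact count.
    """
    achieved_flops_per_sec = 6.0 * params * tokens_per_sec_per_gpu
    return achieved_flops_per_sec / peak_flops_per_gpu


# Hypothetical inputs (assumptions for illustration only):
mfu = model_flops_utilization(
    params=70e9,                 # 70B-parameter model
    tokens_per_sec_per_gpu=950,  # measured training throughput per GPU
    peak_flops_per_gpu=989e12,   # assumed dense BF16 peak of one accelerator
)
print(f"MFU: {mfu:.1%}")  # roughly 40% with these inputs
```

With these made-up inputs the result lands near 40%, the same range as the figure quoted above: about 60% of the peak FLOPs are spent waiting on communication, memory, or other overhead rather than on model math.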

In this video I break down exactly why this happens, and why it gets WORSE as clusters get bigger.
⚡ Amdahl's Law and why GPU cluster efficiency collapses at scale
⚡ The Straggler Problem: why 100,000 GPUs run at the speed of the slowest 1%
⚡ MFU explained: the single number that should change how you evaluate AI investments
⚡ Why NVIDIA caps NVLink at exactly 72 GPUs (power integrity — not just networking)
⚡ CoWoS packaging yield: why you manufacture 100 chips and ship only 55
⚡ Where the real AI infrastructure value is moving: Arista, Broadcom, SK Hynix HBM4
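
The first two points above can be sketched in a few lines of Python. Amdahl's Law is the standard formula. The straggler model is a simplifying assumption of mine (each GPU is independently slow on a given step with a small fixed probability), and the 99.9% parallel fraction and 0.1% slow probability are hypothetical values, not measurements from any real cluster.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's Law: speedup is capped by the serial fraction of the work."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def prob_any_straggler(p_slow, n_workers):
    """Chance that at least one worker is slow on a synchronous step,
    assuming workers are slow independently with probability p_slow."""
    return 1.0 - (1.0 - p_slow) ** n_workers


# Hypothetical inputs: 99.9% of the work parallelizes, and each GPU has a
# 0.1% chance of being slow on any given step.
for n in (8, 72, 1_000, 100_000):
    speedup = amdahl_speedup(0.999, n)
    efficiency = speedup / n
    straggler = prob_any_straggler(0.001, n)
    print(f"{n:>7} GPUs: speedup {speedup:8.1f}x, "
          f"efficiency {efficiency:6.1%}, "
          f"P(at least one straggler) {straggler:6.1%}")
```

Under these assumptions, efficiency per GPU falls steadily as the cluster grows, while the chance that some GPU holds up a synchronous step approaches certainty. That is the intuition behind "efficiency collapses at scale."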

📌 TIMESTAMPS
0:00 — The $500B Problem
0:55 — Who I Am and Why This Matters
1:28 — Chapter 1: Amdahl's Law & The Coordination Problem
2:42 — Insider Insight 1: Why Interconnect Gets the Leftovers
3:32 — Chapter 2: The Straggler Problem
4:22 — Chapter 3: MFU — The Real Efficiency Number
5:42 — Insider Insight 2: The Real Reason NVLink Caps at 72
6:32 — Insider Insight 3: The Yield Math Nobody Discusses
7:18 — Investor Angle: Who Profits From This
8:02 — Conclusion: We Solved Compute. Now We're Stuck On Coordination.
——————————————————————

🔔 New videos every week — semiconductor geopolitics, AI hardware, chip war investing.

──────────────────────────────
I am a semiconductor engineer, not a financial advisor.
Nothing in this video constitutes investment advice. I used my own digital twin to present this video, but the research, analysis, script, and narration are entirely my own.
All analysis is based on publicly available information only, but it is examined from an industry insider's point of view rather than a journalist's.
——————————————————————

#GPU #AI #semiconductor #NVIDIA #AIinfrastructure

Video "OpenAI's $500 billion AI infrastructure: Why NVIDIA GPU scaling is broken" from the channel AI Chip Insider