Strix Halo: 128GB to Run Big AI Locally #AMD #LocalAI #AI #nvidia
A lunchbox-sized desktop can now run a 70-billion-parameter model fully offline, with no data center and no monthly bill. The chip that makes it possible is AMD's Ryzen AI Max+ 395, codenamed Strix Halo.
The breakthrough is memory, not raw compute. Strix Halo is a single APU (16 Zen 5 cores, a 40-CU Radeon 8060S iGPU, and a 50-TOPS XDNA 2 NPU) fed by up to 128 GB of unified LPDDR5X that the CPU and GPU share as one pool, with around 96 GB allocatable to the GPU. That gets past the consumer-GPU memory wall: an RTX 5080 tops out at 16 GB and a 4090 at 24 GB, so genuinely large models do not fit on them. Here, they do.
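The capacity argument above can be checked with back-of-the-envelope arithmetic. The following is a minimal sketch, assuming the weights dominate memory use; it ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function names are illustrative and not taken from any library.

```python
# Rough weight-only footprint: parameters x bits per weight / 8 bits per byte.
# Ignores KV cache, activations, and runtime overhead, so real needs are higher.

def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= memory_gb


# Memory budgets quoted in the description above.
BUDGETS_GB = {"RTX 5080": 16, "RTX 4090": 24, "Strix Halo (GPU share)": 96}

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_footprint_gb(70, bits)
        verdict = {name: fits(70, bits, gb) for name, gb in BUDGETS_GB.items()}
        print(f"70B @ {bits}-bit: {size:.0f} GB -> {verdict}")
```

Under these assumptions a 70B model needs about 35 GB at 4-bit and 70 GB at 8-bit, which exceeds both discrete cards but stays inside the ~96 GB GPU share. At 16-bit (about 140 GB) it fits on none of the three.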
The honest fine print every senior dev should know: AMD's headline "over 3x faster than an RTX 5080 on DeepSeek R1" is a first-party benchmark that wins only once the model is too big for the card to hold. It is a capacity win, not a compute win. Unified memory also runs at ~256 GB/s versus ~1 TB/s on a discrete GPU, so you trade bandwidth for the ability to fit the model at all (Llama 3.3 70B at ~14 tok/s; a 235B MoE in INT4 at ~8 tok/s: runnable, not blazing).
Why it matters: a one-time ~$2,000 mini PC versus ChatGPT Pro at $200/mo or cloud H100s billed by the hour. The model also never phones home, which is the killer feature for healthcare, law, finance, and government. AMD isn't alone (NVIDIA's own 128 GB DGX Spark ships with full CUDA; Apple has done unified memory since the M1), and NVIDIA still owns ~86% of the data center behind a 20-year CUDA moat. This isn't a coup. It's a new lane: local AI.
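The cost comparison reduces to a break-even calculation. A minimal sketch, using the ~$2,000 and $200/mo figures from the description; the `monthly_running_cost` parameter (electricity, for example) is a hypothetical addition that the description does not quantify.

```python
def break_even_months(hardware_cost: float,
                      monthly_subscription: float,
                      monthly_running_cost: float = 0.0) -> float:
    """Months until a one-time purchase costs less than a subscription."""
    saving = monthly_subscription - monthly_running_cost
    if saving <= 0:
        raise ValueError("running cost must be lower than the subscription")
    return hardware_cost / saving


if __name__ == "__main__":
    # Figures from the description: ~$2,000 box versus $200 per month.
    print(f"{break_even_months(2000, 200):.1f} months")
```

Ignoring running costs, the box pays for itself after about ten months of a $200/mo subscription. This compares price only, not model quality or speed.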
Could you ditch the cloud for a box on your desk? What model would you run first? Drop a comment, and watch it again: it hits different once you see the memory math.
#Shorts #AI #AMD #StrixHalo #LocalAI
Video "Strix Halo: 128GB to Run Big AI Locally #AMD #LocalAI #AI #nvidia" from the channel The Concept Caviar
Video information
Posted: 6 h 52 min ago
Duration: 00:02:24