
Fast AI inference on World’s Most Powerful AI Workstation GPUs with 2x NVIDIA RTX PRO 6000 Blackwell

I'm putting two brand-new NVIDIA RTX PRO 6000 Blackwell GPUs to the test! With a combined 192GB of VRAM, is this the ultimate rig for running massive AI models locally? Watch as I unbox, install, and benchmark this absolute beast of a workstation.
In this video, we'll see how this dual-GPU setup handles some of the largest and most powerful open-source Large Language Models (LLMs). We test the 235-billion-parameter Qwen 3 model fully offloaded to VRAM, then push the limits with huge Mixture-of-Experts (MoE) models like DeepSeek R1 and Llama 4 Maverick using llama.cpp. We'll look at tokens per second, prompt processing speed, power consumption, and the quality of the AI-generated code. Is all this power worth it? Let's find out.
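
For anyone who wants to try the fully-offloaded Qwen 3 run themselves, here is a minimal sketch (not the exact commands from the video) using the llama-cpp-python bindings; the GGUF filename and the even tensor split across both cards are assumptions:

from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)
import time

llm = Llama(
    model_path="Qwen3-235B-Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,          # -1 = offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # spread the weights evenly across both cards
    n_ctx=32768,              # context window; the video pushes toward 128k
)

start = time.time()
out = llm("Write a snake game in Python.", max_tokens=256)
n = out["usage"]["completion_tokens"]
print(f"{n / (time.time() - start):.1f} tokens/sec")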

0:00 - Unboxing the second RTX 6000 Blackwell GPU
1:25 - The "VOID" Tamper-Proof Seal Explained
4:21 - First Boot with Dual Blackwell GPUs
5:24 - nvidia-smi Confirms 192GB VRAM!
6:02 - Full System Specs Overview
8:01 - Test 1: Qwen 3 235B (Fully GPU Offloaded)
10:55 - Qwen 3 235B Loaded - Insane Performance!
12:18 - Qwen 3 235B Benchmark: 58 tokens/sec
18:21 - Qwen 3 235B Pushing the Limit: 128k Context Test
21:14 - Test 2: DeepSeek MoE Model (Partial Offload)
26:43 - Experimenting with Layer Offloading
31:29 - DeepSeek Benchmark & Power Draw
35:27 - DeepSeek's Impressive Snake Game
41:35 - DeepSeek Performance Results (12 tokens/sec)
44:27 - Test 3: DeepSeek on ik_llama.cpp (IQ3 Quant)
59:36 - ik_llama.cpp Performance Results (15 tokens/sec)
1:08:31 - Test 4: Llama 4 Maverick MoE Model
1:20:22 - Maverick Performance Results (57 tokens/sec!)
1:26:19 - Final Thoughts & Is It Worth It?

🖥️ System Specifications:
GPUs: 2x NVIDIA RTX PRO 6000 (Blackwell Architecture) - 96GB VRAM each, 192GB Total
CPU: Intel Xeon Sapphire Rapids (QYFS Engineering Sample) - 56 Cores / 112 Threads
Motherboard: ASUS Pro WS W790E-SAGE SE
RAM: 512GB DDR5 ECC @ 4800MHz (Octa-Channel)
OS: Ubuntu 24.04
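
As a quick sanity check of these specs, the 5:24 chapter uses nvidia-smi; the same totals can be read programmatically. A small sketch, assuming the nvidia-ml-py (pynvml) package is installed:

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
total = 0
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(h)
    watts = pynvml.nvmlDeviceGetPowerUsage(h) / 1000  # reported in milliwatts
    total += mem.total
    print(f"GPU {i}: {pynvml.nvmlDeviceGetName(h)}, "
          f"{mem.total / 2**30:.0f} GiB, {watts:.0f} W")
print(f"Total VRAM: {total / 2**30:.0f} GiB")  # ~192 GiB on this rig
pynvml.nvmlShutdown()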

Models & Software Tested:
Framework: llama.cpp & ik_llama.cpp
UI: Open WebUI
Models:
Qwen 3 235B (Q4_K_M)
DeepSeek R1 MoE 685B (Q4_K_M and IQ3 quants; run with partial GPU offload - see the sketch after this list)
Llama 4 Maverick (MoE)
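
DeepSeek R1 at Q4_K_M is far larger than 192GB, which is why the video offloads only part of it to the GPUs and leaves the rest in the 512GB of system RAM. A minimal sketch of that partial-offload idea, again assuming llama-cpp-python and a hypothetical GGUF path:

from llama_cpp import Llama

# n_gpu_layers below the model's layer count keeps the remaining layers in
# system RAM and runs them on the CPU; tune it until VRAM is nearly full,
# as done in the 26:43 chapter.
llm = Llama(
    model_path="DeepSeek-R1-Q4_K_M-00001-of-00009.gguf",  # hypothetical path
    n_gpu_layers=24,
    n_ctx=8192,
)
print(llm("Hello", max_tokens=32)["choices"][0]["text"])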

What do you think of this setup? Is 192GB of VRAM overkill, or the future of local AI? Let me know what you would do with this much power in the comments below!

#NVIDIA #Blackwell #RTX6000 #AI #LocalLLM #DeepLearning #PCBuilding #TechReview #Llamacpp #Qwen #DeepSeek #AIbenchmark

Video "Fast AI inference on World's Most Powerful AI Workstation GPUs with 2x NVIDIA RTX PRO 6000 Blackwell" from the Mukul Tripathi channel