
Fast AI inference on World’s Most Powerful AI Workstation GPUs with 2x NVIDIA RTX PRO 6000 Blackwell

I'm putting two brand-new NVIDIA RTX PRO 6000 Blackwell GPUs to the test! With a combined 192GB of VRAM, is this the ultimate rig for running massive AI models locally? Watch as I unbox, install, and benchmark this absolute beast of a workstation.
In this video, we'll see how this dual-GPU setup handles some of the largest and most powerful open-source Large Language Models (LLMs). We test the 235-billion-parameter Qwen 3 model fully offloaded to VRAM, then push the limits with huge Mixture-of-Experts (MoE) models like DeepSeek R1 and Llama 4 Maverick using llama.cpp. We'll look at tokens per second, prompt processing speed, power consumption, and the quality of the AI-generated code. Is all this power worth it? Let's find out.
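
For anyone who wants to try the fully-offloaded Qwen 3 run themselves, here is a minimal sketch (not the exact commands from the video) using the llama-cpp-python bindings; the GGUF filename and the even tensor split across both cards are assumptions:

from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)
import time

llm = Llama(
    model_path="Qwen3-235B-Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,          # -1 = offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # spread the weights evenly across both cards
    n_ctx=32768,              # context window; the video pushes toward 128k
)

start = time.time()
out = llm("Write a snake game in Python.", max_tokens=256)
n = out["usage"]["completion_tokens"]
print(f"{n / (time.time() - start):.1f} tokens/sec")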

0:00 - Unboxing the second RTX 6000 Blackwell GPU
1:25 - The "VOID" Tamper-Proof Seal Explained
4:21 - First Boot with Dual Blackwell GPUs
5:24 - nvidia-smi Confirms 192GB VRAM!
6:02 - Full System Specs Overview
8:01 - Test 1: Qwen 3 235B (Fully GPU Offloaded)
10:55 - Qwen 3 235B Loaded - Insane Performance!
12:18 - Qwen 3 235B Benchmark: 58 tokens/sec
18:21 - Qwen 3 235B Pushing the Limit: 128k Context Test
21:14 - Test 2: DeepSeek MoE Model (Partial Offload)
26:43 - Experimenting with Layer Offloading
31:29 - DeepSeek Benchmark & Power Draw
35:27 - DeepSeek's Impressive Snake Game
41:35 - DeepSeek Performance Results (12 tokens/sec)
44:27 - Test 3: DeepSeek on ik_llama.cpp (IQ3 Quant)
59:36 - ik_llama.cpp Performance Results (15 tokens/sec)
1:08:31 - Test 4: Llama 4 Maverick MoE Model
1:20:22 - Maverick Performance Results (57 tokens/sec!)
1:26:19 - Final Thoughts & Is It Worth It?

🖥️ System Specifications:
GPUs: 2x NVIDIA RTX PRO 6000 (Blackwell Architecture) - 96GB VRAM each, 192GB Total
CPU: Intel Xeon Sapphire Rapids (QYFS Engineering Sample) - 56 Cores / 112 Threads
Motherboard: ASUS Pro WS W790E-SAGE SE
RAM: 512GB DDR5 ECC @ 4800MHz (Octa-Channel)
OS: Ubuntu 24.04
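
As a quick sanity check of these specs, the 5:24 chapter uses nvidia-smi; the same totals can be read programmatically. A small sketch, assuming the nvidia-ml-py (pynvml) package is installed:

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
total = 0
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(h)
    watts = pynvml.nvmlDeviceGetPowerUsage(h) / 1000  # reported in milliwatts
    total += mem.total
    print(f"GPU {i}: {pynvml.nvmlDeviceGetName(h)}, "
          f"{mem.total / 2**30:.0f} GiB, {watts:.0f} W")
print(f"Total VRAM: {total / 2**30:.0f} GiB")  # ~192 GiB on this rig
pynvml.nvmlShutdown()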

Models & Software Tested:
Framework: llama.cpp & ik_llama.cpp
UI: Open WebUI
Models:
Qwen 3 235B (Q4_K_M)
DeepSeek R1 MoE 685B (Q4_K_M and IQ3 quants; run with partial GPU offload - see the sketch after this list)
Llama 4 Maverick (MoE)
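
DeepSeek R1 at Q4_K_M is far larger than 192GB, which is why the video offloads only part of it to the GPUs and leaves the rest in the 512GB of system RAM. A minimal sketch of that partial-offload idea, again assuming llama-cpp-python and a hypothetical GGUF path:

from llama_cpp import Llama

# n_gpu_layers below the model's layer count keeps the remaining layers in
# system RAM and runs them on the CPU; tune it until VRAM is nearly full,
# as done in the 26:43 chapter.
llm = Llama(
    model_path="DeepSeek-R1-Q4_K_M-00001-of-00009.gguf",  # hypothetical path
    n_gpu_layers=24,
    n_ctx=8192,
)
print(llm("Hello", max_tokens=32)["choices"][0]["text"])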

What do you think of this setup? Is 192GB of VRAM overkill, or the future of local AI? Let me know what you would do with this much power in the comments below!

#NVIDIA #Blackwell #RTX6000 #AI #LocalLLM #DeepLearning #PCBuilding #TechReview #Llamacpp #Qwen #DeepSeek #AIbenchmark

Video "Fast AI inference on World's Most Powerful AI Workstation GPUs with 2x NVIDIA RTX PRO 6000 Blackwell" from the Mukul Tripathi channel