NVIDIA DGX Spark: From “Inference Box” to Dev Rig (What It Actually Is) | Ep 2

Everyone keeps calling NVIDIA DGX Spark an “inference box”… but in practice, it behaves more like a dev rig.
In Ep 2 of Domesticating AI, we break down what Spark is actually good for (AI dev + fine-tuning) vs what it isn’t (a magical drop-in inference server), and why unified memory changes the whole experience for local AI.

What we cover

Training vs inference (and why “inference server” gets misused)

What unified memory changes for model loading + workflows

The “gateway stack”: Ollama + Open WebUI

When you outgrow turnkey UIs and need more control (sampling, behavior, workflows)

Why Spark shines for fine-tuning (QLoRA/Unsloth-style workflows)

Homelab reality: Docker “recipes,” troubleshooting, and why you need friends

Remote access done safely: Tailscale

Cloud vs home economics (when cloud is cheaper… and when it explodes)

Why accelerator workloads can be painful in “Kubernetes everything” land

Links & Resources

NVIDIA / DGX Spark

DGX Spark product page: https://www.nvidia.com/en-us/products/workstations/dgx-spark/

Start building on Spark (recipes + docs hub): https://build.nvidia.com/spark

NIM on Spark (playbook): https://build.nvidia.com/spark/nim-llm

Local AI runners + UIs

Ollama: https://ollama.com/

Open WebUI (GitHub): https://github.com/open-webui/open-webui

Open WebUI docs: https://docs.openwebui.com/

llama.cpp: https://github.com/ggml-org/llama.cpp

LM Studio: https://lmstudio.ai/

vLLM: https://github.com/vllm-project/vllm

Jan: https://jan.ai/

Image / Workflow / Fine-tuning tools

ComfyUI: https://github.com/Comfy-Org/ComfyUI

AUTOMATIC1111 SD WebUI: https://github.com/AUTOMATIC1111/stable-diffusion-webui

Unsloth: https://github.com/unslothai/unsloth

Networking / Remote access

Tailscale: https://tailscale.com/

Cloud GPU alternatives (mentioned)

Runpod pricing: https://www.runpod.io/pricing

Modal pricing: https://modal.com/pricing

Hosts

Miriah Peterson (Host): Miriah Peterson is a software engineer, Go educator, and community builder focused on production-first AI—treating LLM systems like real software with real users. She runs SoyPete Tech (streams + writing + open-source projects) and stays active in the Utah dev community through meetups and events, with a practical focus on shipping local and cloud AI systems.
Connect:

SoyPete Tech (YouTube): https://www.youtube.com/@SoyPete_Tech

SoyPete Tech (Substack): https://soypetetech.substack.com/

LinkedIn: https://www.linkedin.com/in/miriah-peterson-35649b5b/

Matt Sharp (Host): Matt Sharp is an AI Engineer and Strategist for a tech consulting firm and co-author of LLMs in Production. He’s a recovering data scientist and MLOps expert with 10+ years of experience operationalizing ML systems in production. Matt also teaches a graduate-level MLOps-in-production course at Utah State University as an adjunct professor. You can find him on Substack (Data Pioneer), LinkedIn, and on his other podcast, the Learning Curve.
Connect:

Data Pioneer (Substack): https://thedatapioneer.substack.com/

Chris Brousseau (Host): Chris Brousseau is a linguist by training and an NLP practitioner by trade, with a career spanning linguistically informed NLP, modern LLM systems, and MLOps practices. He’s co-author of LLMs in Production and is currently VP of AI at VEOX. You can find him as IMJONEZZ (two Z’s) on YouTube, GitHub, and LinkedIn.
Connect:

YouTube (IMJONEZZ): https://www.youtube.com/channel/UCPtkaw_x97yP4WevW7axk0g

LinkedIn: https://www.linkedin.com/in/chris-brousseau/en

📘 LLMs in Production (Matt Sharp & Chris Brousseau): https://www.manning.com/books/llms-in-production

Subscribe + Community

If you’re building local AI at home (or trying to), drop your setup in the comments:
GPU/CPU | RAM | Runner (Ollama/llama.cpp/vLLM) | Model + quant | Use case

And don’t forget to like, subscribe, and comment — it helps the show a ton.
