LLM Observability, Evaluation, Experimentation Platform — Dat Ngo, Arize

Your agent called tool B before tool A, and B depends on A. You did not catch it because nothing in your code audits the agent's behavior. The telemetry does. Dat Ngo from Arize AI walks through what observability actually means when the system you are debugging is nondeterministic and the execution path changes with every run.
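The ordering bug described above can be caught with a simple check over the recorded tool calls. The following is a minimal sketch, not Arize's or Phoenix's API: the function name `find_order_violations`, the tool names, and the dependency map are all hypothetical, and the trajectory is assumed to be already extracted from trace telemetry as an ordered list of tool names.

```python
from typing import Dict, List, Set, Tuple


def find_order_violations(
    calls: List[str], depends_on: Dict[str, Set[str]]
) -> List[Tuple[int, str, str]]:
    """Return (position, tool, missing_dependency) for every out-of-order call.

    `calls` is the ordered list of tool names from one agent run.
    `depends_on` maps a tool to the tools that must have run before it.
    Both are hypothetical inputs for this sketch.
    """
    seen: Set[str] = set()
    violations: List[Tuple[int, str, str]] = []
    for index, tool in enumerate(calls):
        # A call is a violation if any required tool has not run yet.
        for required in sorted(depends_on.get(tool, set())):
            if required not in seen:
                violations.append((index, tool, required))
        seen.add(tool)
    return violations


# Hypothetical run: tool_b was called before the tool_a it depends on.
trajectory = ["tool_b", "tool_a"]
dependencies = {"tool_b": {"tool_a"}}
print(find_order_violations(trajectory, dependencies))
# → [(0, 'tool_b', 'tool_a')]
```

Because the check needs the whole ordered sequence of calls, it cannot be evaluated from any single span in isolation.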

The talk covers the five flavors of eval signal (LLM as judge, human feedback, golden datasets, deterministic checks, business metrics), the scope at which to run them (single span, multispan, trajectory, session), and where this is heading. Arize Phoenix is open source and runs as a single container, with no Kubernetes required. The enterprise product adds an AI layer called Alex that scans traces, surfaces high latency and errors, and creates evals automatically. The stated goal: automate you out of the observability loop entirely.
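To make the idea of scope concrete, here is a minimal sketch of one of the five signals, a deterministic check, applied first to a single span and then rolled up across all spans in a trace. It does not use the Phoenix or Arize APIs. The `Span` record, its field names, the sample data, and the 2000 ms latency threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical, simplified span record (not a real tracing schema)."""
    trace_id: str
    name: str
    latency_ms: float
    error: bool = False


def span_check(span: Span, max_latency_ms: float = 2000.0) -> bool:
    """Span-level check: passes if the span had no error and was fast enough."""
    return not span.error and span.latency_ms <= max_latency_ms


def trace_pass_rates(
    spans: List[Span], max_latency_ms: float = 2000.0
) -> Dict[str, float]:
    """Multispan roll-up: fraction of passing spans per trace."""
    results: Dict[str, List[bool]] = defaultdict(list)
    for span in spans:
        results[span.trace_id].append(span_check(span, max_latency_ms))
    return {tid: sum(checks) / len(checks) for tid, checks in results.items()}


# Hypothetical telemetry: one slow span in t1, one errored span in t2.
spans = [
    Span("t1", "retrieve", 120.0),
    Span("t1", "llm_call", 3500.0),
    Span("t2", "llm_call", 800.0),
    Span("t2", "tool_call", 50.0, error=True),
    Span("t3", "llm_call", 400.0),
]
print(trace_pass_rates(spans))
# → {'t1': 0.5, 't2': 0.5, 't3': 1.0}
```

The same per-span result can feed a wider scope, such as a session, by grouping on a different key instead of `trace_id`.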

Speaker info:
- https://www.linkedin.com/in/datdarylngo/
- https://x.com/dat_attacked


Video information

June 7, 2026, 23:00:06

00:16:32

AI Engineer
