How To Debug AI Agents: Tracing, Observability & Evals
When you build an AI agent using instructions, LLMs, and tools, it quickly becomes a black box. Your traditional unit tests pass, yet silent failures sneak straight through to your users in production. In this technical walkthrough, we break down how to achieve full production visibility using open-source observability, agent tracing, and non-deterministic evaluations (evals).
Using a real-world procurement-approval agent built with the Microsoft Agent Framework and Azure OpenAI on Microsoft Foundry, you’ll discover how to transition from guessing to data-driven orchestration.
Learn how to:
Implement OpenInference: Extend the standard OpenTelemetry semantic conventions to capture AI-specific data such as tool calls, token counts, prompt variations, and exact costs.
Visualize with Phoenix: Stream live traces into Phoenix, the free, open-source AI observability platform, to track complex multi-turn workflows.
Deploy LLM-as-a-Judge: Construct robust grounding checks using an AI judge to evaluate agent decisions at scale when manual human validation is impossible.
Automate Self-Improving Loops: Combine evaluation harnesses with coding agents (such as Claude Code and Copilot CLI) to iterate systematically on prompts; in the video's example, the pass rate climbs from 40% to 90%.
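The OpenInference item comes down to attaching AI-specific attributes to ordinary OpenTelemetry spans. As a minimal, dependency-free sketch, the code below builds the attribute dictionaries for an LLM span and a tool span using OpenInference attribute keys (`openinference.span.kind`, `llm.token_count.*`, `tool.name`, `input.value`, `output.value`), then derives a cost estimate from the recorded token counts. The price table, model name, and `check_budget` tool are hypothetical, and the key names should be checked against the current OpenInference specification. In a real setup an OpenInference instrumentor typically sets these attributes on spans for you rather than you building dictionaries by hand.

```python
import json

# Hypothetical USD prices per 1,000 tokens; replace with your provider's real rates.
PRICE_PER_1K = {"prompt": 0.0025, "completion": 0.01}


def llm_span_attributes(model: str, prompt_tokens: int, completion_tokens: int) -> dict:
    """Attributes for an LLM span, keyed by OpenInference semantic-convention names."""
    return {
        "openinference.span.kind": "LLM",
        "llm.model_name": model,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }


def tool_span_attributes(tool_name: str, arguments: dict, result: str) -> dict:
    """Attributes for a TOOL span: which tool ran, its JSON input, and its output."""
    return {
        "openinference.span.kind": "TOOL",
        "tool.name": tool_name,
        "input.value": json.dumps(arguments),
        "input.mime_type": "application/json",
        "output.value": result,
    }


def estimate_cost_usd(attrs: dict) -> float:
    """Estimate an LLM span's cost from its recorded token counts and the price table."""
    prompt_cost = attrs["llm.token_count.prompt"] / 1000 * PRICE_PER_1K["prompt"]
    completion_cost = attrs["llm.token_count.completion"] / 1000 * PRICE_PER_1K["completion"]
    return round(prompt_cost + completion_cost, 6)


if __name__ == "__main__":
    # Hypothetical values for one agent step: a model call followed by a tool call.
    llm = llm_span_attributes("hypothetical-model", prompt_tokens=1200, completion_tokens=300)
    tool = tool_span_attributes("check_budget", {"department": "IT"}, "remaining=5000")
    print(llm["llm.token_count.total"], estimate_cost_usd(llm))
    print(tool["tool.name"], tool["input.value"])
```

Because cost is derived from token counts recorded on each span, a trace viewer can sum it per request or per multi-turn session without any extra bookkeeping in the agent itself.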
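A grounding check with an LLM judge asks a second model whether every claim in the agent's decision is supported by what its tools actually returned, then aggregates the verdicts into a pass rate. The sketch below shows that shape with a pluggable `judge` callable. The prompt wording, the `grounded`/`ungrounded` labels, and `stub_judge` are hypothetical; `stub_judge` is an offline stand-in (a simple number-matching heuristic) so the example runs without a model, and in practice you would replace it with a function that sends the prompt to your LLM and returns its reply.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge prompt; the wording and the two labels are assumptions.
JUDGE_TEMPLATE = """You are checking whether a procurement agent's decision is grounded.

Context (tool outputs the agent saw):
{context}

Agent decision and rationale:
{decision}

Is every factual claim in the decision supported by the context?
Answer with exactly one word: grounded or ungrounded."""


@dataclass
class Example:
    context: str   # what the agent's tools returned
    decision: str  # what the agent concluded


def parse_label(raw: str) -> bool:
    """Map the judge's reply to pass/fail; any unexpected reply counts as a failure."""
    return raw.strip().lower().rstrip(".") == "grounded"


def grounding_pass_rate(examples: list[Example], judge: Callable[[str], str]) -> float:
    """Ask the judge about every example and return the fraction labeled grounded."""
    if not examples:
        return 0.0
    passes = sum(
        parse_label(judge(JUDGE_TEMPLATE.format(context=ex.context, decision=ex.decision)))
        for ex in examples
    )
    return passes / len(examples)


NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def stub_judge(prompt: str) -> str:
    """Offline stand-in for an LLM call: 'grounded' only if every number the
    decision cites also appears in the context. A real judge would call a model."""
    context_part, rest = prompt.split("Agent decision and rationale:", 1)
    decision_part = rest.split("Is every factual claim", 1)[0]
    cited = NUMBER.findall(decision_part)
    return "grounded" if all(n in context_part for n in cited) else "ungrounded"
```

Treating any unexpected judge reply as a failure keeps the metric conservative. The value returned by `grounding_pass_rate` is the kind of number a coding agent can be asked to push up when iterating on prompts in a self-improving loop.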
Chapters:
00:00 Why your AI agent is a black box
00:26 The example: a procurement agent in Microsoft Agent Framework
02:17 Why OpenTelemetry alone isn't enough for AI
02:49 OpenInference: OTEL semantic conventions for agents
04:43 Reading a real agent trace in Phoenix
08:40 The harder question: is your agent actually working?
10:43 Evals 101: code evals vs. LLM-as-a-judge
12:10 Building a grounding-check judge
14:00 Reading the eval results (we only pass 40% of the time)
15:48 Experiments: swap the model, watch the score change
17:21 Phoenix AI skills and self-improving agent loops
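The chapters separate code evals from LLM-as-a-judge evals and then compare models in experiments. As a hedged sketch of the code-eval side, `code_eval` below applies deterministic checks to an agent's raw output, and `run_experiment` scores several variants (for example, two models) on the same requests so their pass rates can be compared side by side. The JSON output shape, the `approve`/`reject`/`escalate` labels, and the 10,000 auto-approval limit are hypothetical rules chosen for illustration, not the agent from the video.

```python
import json

# Hypothetical business rules for the procurement agent's output.
ALLOWED_DECISIONS = {"approve", "reject", "escalate"}
AUTO_APPROVE_LIMIT = 10_000  # orders above this must not be auto-approved


def code_eval(raw_output: str, amount: float) -> bool:
    """Deterministic check: valid JSON, an allowed decision, and the approval limit respected."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict):
        return False
    decision = parsed.get("decision")
    if not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        return False
    # Large orders may be rejected or escalated, but never approved automatically.
    return not (decision == "approve" and amount > AUTO_APPROVE_LIMIT)


def run_experiment(outputs_by_variant: dict[str, list[str]], amounts: list[float]) -> dict[str, float]:
    """Score each variant (for example, a different model) on the same set of requests."""
    scores: dict[str, float] = {}
    for variant, outputs in outputs_by_variant.items():
        if not outputs or len(outputs) != len(amounts):
            raise ValueError(f"{variant}: need exactly one output per request")
        passed = sum(code_eval(out, amt) for out, amt in zip(outputs, amounts))
        scores[variant] = passed / len(outputs)
    return scores
```

Code evals like this are cheap and fully reproducible, so they suit structural and policy rules; judgments that need reading comprehension, such as whether a rationale is grounded in tool output, are where an LLM judge fits instead.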
Resources:
🔬 Phoenix (open source): https://phoenix.arize.com
🔗 Arize AX: https://arize.com
📖 OpenInference: https://github.com/Arize-ai/openinference
📖 Phoenix docs: https://docs.arize.com/phoenix
Got an agent that's been driving you nuts in production? Drop the specific failure mode in the comments below; our engineering team reads and reviews every single one.
If this deep dive leveled up your AI infrastructure toolkit, make sure to like, subscribe, and hit the bell for more technical agent engineering videos: https://www.youtube.com/@arizeai?sub_confirmation=1
#AIEngineering #AIAgents #AgentEvals
Video "How To Debug AI Agents: Tracing, Observability & Evals" from the Arize AI channel
Video information
Date: June 2, 2026, 19:00:39
Duration: 00:19:27