Evaluating AI Agents: Building Reliable AI Applications with Kotlin & Spring AI
In this talk from the Kotlin Meetup Rotterdam, senior software engineer and trainer Peter explains why AI agent applications fail in production so often — and what you can do about it. While traditional software relies on deterministic systems with clear pass/fail tests, AI agents are inherently probabilistic: LLMs hallucinate, embedding models return inconsistent results, and tool calls aren't guaranteed. That demands a completely new approach to quality assurance.
Peter introduces Eval-Driven Development (EDD) as the answer: a methodology that brings TDD into the AI era. Using live code in Kotlin and Spring AI, he demonstrates how to set up an evaluation harness, define quality criteria, use LLM-as-Judge, and feed production data back into your test suite. The talk also covers observability (Langfuse), red teaming, and user feedback loops.
00:00 Introduction — who is Peter?
02:30 Why AI projects fail: stats, the vending machine benchmark & the sorcerer's apprentice
06:20 Traditional testing vs. AI agents: deterministic vs. probabilistic systems
13:30 Introducing Eval-Driven Development (EDD): accuracy, cost & latency
16:20 Step 1: Define goals, users, scenarios and a Minimum Viable Evaluation
21:20 Live demo: Kotlin eval framework — contains evaluator & LLM-as-Judge
26:00 Live demo: advanced evaluators — RAG, hallucination, tool calls & conversation simulation
35:10 Calibrating judges: off-the-shelf vs. manual labeling
36:00 Security: red teaming and LLM vulnerabilities
37:20 In production: observability (Langfuse), monitoring & user feedback loops
41:20 Conclusion: build your harness from day one
42:40 Q&A: go-live baseline, red teaming costs, judge bias & GDPR
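To make the idea of an evaluation harness with a contains evaluator and an LLM-as-Judge more concrete, here is a minimal sketch in plain Kotlin. It is not the code shown in the talk and it does not use Spring AI. Every name in it (EvalCase, Evaluator, containsEvaluator, judgeEvaluator, runEvals, passRate) is hypothetical, and both the agent and the judge are plain functions standing in for real model calls.

```kotlin
// Hypothetical names throughout; none of this is taken from the talk's code.
data class EvalCase(val input: String, val expectedSubstring: String)

data class EvalResult(val case: EvalCase, val output: String, val passed: Boolean)

// An evaluator decides whether one agent output is acceptable for one case.
fun interface Evaluator {
    fun passes(case: EvalCase, output: String): Boolean
}

// "Contains" evaluator: passes when the expected text appears, ignoring case.
val containsEvaluator = Evaluator { case, output ->
    output.contains(case.expectedSubstring, ignoreCase = true)
}

// LLM-as-Judge evaluator: `judge` stands in for a real model call and is
// assumed to answer with "PASS" or "FAIL" for the prompt it receives.
fun judgeEvaluator(judge: (String) -> String) = Evaluator { case, output ->
    val prompt = "Question: ${case.input}\nAnswer: $output\n" +
        "Does the answer mention '${case.expectedSubstring}'? Reply PASS or FAIL."
    judge(prompt).trim().uppercase().startsWith("PASS")
}

// Runs every case through the agent and records a result per case.
fun runEvals(
    cases: List<EvalCase>,
    agent: (String) -> String,
    evaluator: Evaluator,
): List<EvalResult> = cases.map { c ->
    val output = agent(c.input)
    EvalResult(c, output, evaluator.passes(c, output))
}

// Pass rate in [0.0, 1.0]; an empty run counts as 0.0 so it cannot pass a gate.
fun passRate(results: List<EvalResult>): Double =
    if (results.isEmpty()) 0.0 else results.count { it.passed }.toDouble() / results.size

fun main() {
    val cases = listOf(
        EvalCase("What is the capital of France?", "Paris"),
        EvalCase("What is the capital of the Netherlands?", "Amsterdam"),
    )
    // Stub agent: deterministic here, whereas a real agent would call an LLM.
    val agent: (String) -> String = { q ->
        if ("France" in q) "The capital is Paris." else "I am not sure."
    }
    val results = runEvals(cases, agent, containsEvaluator)
    println("pass rate = ${passRate(results)}")
    // Gate on a threshold instead of requiring every single case to pass.
    val threshold = 0.5
    println(if (passRate(results) >= threshold) "OK" else "BELOW THRESHOLD")
}
```

Gating on a pass-rate threshold rather than on every case is one way to cope with probabilistic output. The threshold value used here is arbitrary and would need to be chosen per application.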
Video "Evaluating AI Agents: Building Reliable AI Applications with Kotlin & Spring AI" from the Maqqie channel
Tags: Kotlin, Spring AI, AI agents, eval-driven development, EDD, LLM evaluation, LLM-as-Judge, AI testing, probabilistic systems, RAG, hallucination detection, red teaming, Langfuse, observability, Spring Boot, AI reliability, AI in production, agent evaluation, test harness, Kotlin meetup Rotterdam, AI development, generative AI, LLM, tool calling, MCP, vector database, embedding models, AI quality assurance, software engineering
Video information
Posted 23 h 32 min ago
Duration: 00:52:41