How We Evaluate Large Language Models | Patrycja Cieplicka | LLMday Warsaw 2026 Q1
LLMday Warsaw 2026 Q1 - February 12
Grab your ticket for the next LLMday: https://www.llmday.com
Upcoming LLMday CFPs: https://cfp.ninja/?q=llmday&status=open&page=1
Chapters
00:00 Welcome & Speaker Intro: Evaluating Large Language Models
00:11 Two Blocks Overview: What We Build for Clients
00:36 LLM Work in E‑commerce: Adaptation, Evaluation & Optimization
01:29 Four Ways to Measure LLM Performance (Metrics Landscape)
02:24 Pros/Cons of Each Evaluation Method
03:34 Using Open-Source Benchmarks the Right Way
04:34 Benchmark Pitfalls: Overfitting, Setup Differences & Comparability
06:25 Don’t Trust Tiny Gains: Statistical Significance Checks
07:18 Building Your Own Eval: Core Principles for Real-World Apps
09:26 Evaluation-Driven Development: Iterate Evals and Models Together
10:18 Tuning the Evaluator: Human-Labeled Test Sets & Validator Drift
13:43 LLM-as-a-Judge Methods: Scoring vs Pairwise Comparisons
14:34 Prompting Best Practices for LLM Judges (and Avoiding Bias)
19:15 Wrap-Up: Keep Evals Robust, Practical, and Business-Focused
20:06 Q&A: User Feedback in Eval Frameworks + E‑commerce Use Cases
22:25 Final Thanks & Closing
Video: How We Evaluate Large Language Models | Patrycja Cieplicka | LLMday Warsaw 2026 Q1, from the LLMday channel
Video information
3 March 2026, 20:55:31
Duration: 00:22:35