Building Agents and Eval Harnesses with Local LLMs with Ravin Kumar (Google DeepMind)

Our Q&A will happen on Discord so come join us there and ask questions live (#workshops channel)!
https://discord.gg/W2z3D8uA

The industry is moving rapidly beyond simple chatbots toward autonomous agents: systems that don’t just talk, but actually act. But building an agent that reliably triggers the right tool at the right time is a massive engineering challenge. It requires moving past the "vibe check" era of AI and into a world of rigorous evaluation, an understanding of the subtle gradient between a single function call and a multi-step agent, and sometimes even specialized fine-tuning for on-device applications.

Join host Hugo Bowne-Anderson for an inside look at the frontier of agents, evals, tool-use, and local LLMs with Ravin Kumar, a Researcher at Google DeepMind. Ravin has been on the front lines of developing the Gemma model family, including the newly released FunctionGemma. Whether you are building massive agentic loops with frontier models like Gemini 3 Flash or deploying specialized, on-device models to mobile hardware, this session provides the technical blueprint for making agents production-ready.

In this live, code-forward workshop, we will dive into the practicalities of evaluation and fine-tuning for tool-calling models. We’ll move from basic prompts to building specialized agents capable of executing complex mobile actions.

Hugo's course: Building LLM Applications for Data Scientists and Software Engineers —
https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=vgyt (25% off for viewers).

00:00 Overview of the Workshop
03:10 Ravin Kumar, DeepMind, & Putting Open-Weight Models on Rocket-ships in Outer Space
05:13 Deep Dive into FunctionGemma
07:38 Loading and Setting Up the Model
30:40 Understanding the Mobile Actions Dataset
39:02 Running AI Models on Edge Devices
39:21 Dolphin Communication and AI
40:18 Applications of AI in Wildlife Research
40:46 Understanding Data Complexities
41:05 Hugging Face Dataset Format
41:26 Gemma Model Formatting
42:39 Context Engineering in AI Models
44:27 Deconstructing Prompts and Control Tokens
46:44 AI Agents & Function Calling in AI Models
54:05 Evaluating AI Models
01:08:54 Vibe Testing and Model Calibration
01:14:09 Guardrails and Security in AI Models
01:17:23 Programming and Unit Tests
01:17:46 Deploying AI Models in Different Environments
01:18:49 Failure Analysis and Evaluation
01:19:33 Functionality and Multi-Step Calls
01:24:50 Training AI Models
01:25:52 Fine-Tuning and Model Deployment
01:31:23 Edge Device Optimization
01:32:58 Q&A and Future Insights

Video: Building Agents and Eval Harnesses with Local LLMs with Ravin Kumar (Google DeepMind), from the Vanishing Gradients channel
