AI Security 4.2: Human-in-the-Loop Controls for AI Agents - When to Block, When to Allow

A system prompt saying "ask before sending" is not a security control — it's a suggestion the model can be talked out of. This video shows how to build real approval gates in application code that AI agents cannot bypass, with risk-tier classification and patterns that prevent both catastrophic actions and alert fatigue.

In this video, you'll learn:

WHY HITL MATTERS
- Why prompt-level guards fail: the Freysa incident (about $47K transferred on the 482nd attempt, after 481 failed social engineering attempts)
- The critical difference between prompt-level guards and application-level gates
- How CVE-2025-32711 (EchoLeak) exploited missing checkpoints in Microsoft 365 Copilot

RISK TIER CLASSIFICATION
- Tier 1 (Autonomous): read-only, sandboxed actions — execute and log
- Tier 2 (Log & Notify): reversible writes — execute with audit trail and real-time notification
- Tier 3 (Require Confirmation): irreversible actions — hard block until human approves
- When uncertain, always classify up
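
The tier scheme above can be sketched as a lookup table in application code. This is a minimal illustration, not the video's implementation: the tool names and the `classify` helper are assumptions made for the example. The one rule taken from the list is that anything unknown is classified up to Tier 3.

```python
from enum import IntEnum


class Tier(IntEnum):
    AUTONOMOUS = 1            # read-only, sandboxed: execute and log
    LOG_AND_NOTIFY = 2        # reversible writes: audit trail plus notification
    REQUIRE_CONFIRMATION = 3  # irreversible: hard block until a human approves


# Hypothetical tool names, chosen only for illustration.
TOOL_TIERS = {
    "search_docs": Tier.AUTONOMOUS,
    "update_draft": Tier.LOG_AND_NOTIFY,
    "send_email": Tier.REQUIRE_CONFIRMATION,
}


def classify(tool_name: str) -> Tier:
    """Return the tier for a tool; unlisted tools are classified up."""
    return TOOL_TIERS.get(tool_name, Tier.REQUIRE_CONFIRMATION)
```

Defaulting to the strictest tier means a newly added tool cannot run unattended just because nobody remembered to classify it.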

IMPLEMENTATION PATTERNS
- Per-tool confirmation callbacks with "approve with changes" support
- Approval queues for background/async agents with TTL expiration
- Dry run mode for batch operations (the Terraform "plan before apply" model)
- Vulnerable vs. secure email-sending code comparison
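
A per-tool confirmation callback with "approve with changes" might look like the sketch below. It assumes a stand-in `send_email` function and a hypothetical `Decision` type; neither comes from the video. The point it illustrates is that the gate lives in the application's call path, so the model cannot skip it no matter what the prompt says.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    approved: bool
    changed_args: Optional[dict] = None  # set when the human approves with changes


# The callback is supplied by the application (UI, chat prompt, ticket system),
# never by the model. Signature is an assumption for this sketch.
ConfirmFn = Callable[[str, dict], Decision]


def send_email(to: str, subject: str, body: str) -> str:
    """Stand-in for a real mail client; returns a description of what was sent."""
    return f"sent to {to}: {subject}"


def gated_send_email(args: dict, confirm: ConfirmFn) -> str:
    """Tier 3 gate: the email is sent only after a human decision."""
    # Hand the reviewer a copy so edits never mutate the agent's original request.
    decision = confirm("send_email", dict(args))
    if not decision.approved:
        return "blocked: human declined"
    final_args = decision.changed_args if decision.changed_args is not None else args
    return send_email(**final_args)
```

The vulnerable version of this code would call `send_email(**args)` directly and rely on the system prompt to make the model ask first.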

PREVENTING ALERT FATIGUE
- Why Tier 3 actions must be rare — if confirmations are frequent, your tier classification is wrong
- Making high-risk prompts visually distinct from routine notifications
- Progressive trust: an action moves to a lower-oversight tier only when a human administrator promotes it; an agent never promotes itself
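
One way to enforce the progressive-trust rule is to keep tier changes behind an administrator check, as in this sketch. The `TrustPolicy` class and its method names are hypothetical. It also assumes the caller's identity was authenticated by the application, not supplied by the agent.

```python
class TrustPolicy:
    """Holds per-tool tiers; only listed administrators may relax them."""

    def __init__(self, tiers: dict, admins: set):
        self._tiers = dict(tiers)
        self._admins = set(admins)

    def tier_of(self, tool: str) -> int:
        # Unlisted tools fall back to Tier 3 (require confirmation).
        return self._tiers.get(tool, 3)

    def promote(self, tool: str, new_tier: int, requested_by: str) -> None:
        """Move a tool to a less restrictive tier (a lower tier number)."""
        # requested_by must come from the application's own authentication,
        # never from text the agent produced.
        if requested_by not in self._admins:
            raise PermissionError(f"{requested_by} may not change tiers")
        if not 1 <= new_tier < self.tier_of(tool):
            raise ValueError("promotion must move to a lower, valid tier")
        self._tiers[tool] = new_tier
```

Because the agent has no code path that reaches `promote` with an administrator identity, a persuasive prompt cannot lower its own oversight level.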

LIMITATIONS
- HITL does not replace Least Privilege — Tier 1 reads can still exfiltrate data silently
- Confirmation quality depends on showing predicted impact, not just action names
- Defense in depth remains necessary alongside HITL
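
The point about predicted impact pairs naturally with dry run mode: compute the plan first, show the human what will change, and apply only after approval. The sketch below assumes a made-up record-cleanup task. The function names and record fields are illustrative only.

```python
def plan_deletions(records: list, older_than_days: int) -> list:
    """Dry run: select what would be deleted without changing anything."""
    return [r for r in records if r["age_days"] > older_than_days]


def describe_impact(plan: list) -> str:
    """Build the text shown to the approver: the effect, not just the action name."""
    ids = ", ".join(str(r["id"]) for r in plan[:3])
    more = "" if len(plan) <= 3 else f" and {len(plan) - 3} more"
    return f"Will permanently delete {len(plan)} record(s): {ids}{more}"


def apply_deletions(records: list, plan: list, approved: bool) -> list:
    """Apply exactly the reviewed plan, and only if a human approved it."""
    if not approved:
        return records
    doomed = {r["id"] for r in plan}
    return [r for r in records if r["id"] not in doomed]
```

Applying the stored plan, instead of recomputing it after approval, keeps what runs identical to what the human reviewed.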

This is Section 4.2 in the AI Agent Security series. Previous: Section 4.1 — Excessive Agency. Next: Section 4.3 — Multi-Agent Trust.

#AISecurity #HumanInTheLoop #HITL #AgentSecurity #CVE202532711 #EchoLeak #PromptInjection #SecureCoding #DevSecOps #AIAgents #MicrosoftCopilot #Freysa #RiskManagement #ApprovalGates #AlertFatigue #LLMSecurity #OWASP #ApplicationSecurity #AIGovernance #CyberSecurity

Video "AI Security 4.2: Human-in-the-Loop Controls for AI Agents - When to Block, When to Allow" from the WiseBuilder channel