AI Security 4.2: Human-in-the-Loop Controls for AI Agents - When to Block, When to Allow

A system prompt saying "ask before sending" is not a security control — it's a suggestion the model can be talked out of. This video shows how to build real approval gates in application code that AI agents cannot bypass, with risk-tier classification and patterns that prevent both catastrophic actions and alert fatigue.

In this video, you'll learn:

WHY HITL MATTERS
- Why prompt-level guards fail: the Freysa incident ($47K transferred on the 482nd social engineering attempt)
- The critical difference between prompt-level guards and application-level gates
- How CVE-2025-32711 (EchoLeak) exploited missing checkpoints in Microsoft 365 Copilot
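
The gap between a prompt-level guard and an application-level gate can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the video: `send_email`, `TOOLS`, `GATED_TOOLS`, `dispatch`, and the `approver` callback are all hypothetical names. The check lives in the dispatcher, which the model cannot edit or argue with.

```python
from typing import Callable, Dict

# Hypothetical tool implementation; a real one would call a mail API.
def send_email(to: str, body: str) -> str:
    return f"sent to {to}"

TOOLS: Dict[str, Callable[..., str]] = {"send_email": send_email}

# Assumed set of tools that must never run without a human decision.
GATED_TOOLS = {"send_email"}


class ApprovalDenied(Exception):
    """Raised when a human rejects a gated tool call."""


def dispatch(tool: str, args: dict, approver: Callable[[str, dict], bool]) -> str:
    """Run a model-requested tool call, enforcing the gate in application code.

    The model only produces (tool, args). It never calls TOOLS directly,
    so no prompt text can skip this check.
    """
    if tool not in TOOLS:
        raise KeyError(f"unknown tool: {tool}")
    if tool in GATED_TOOLS and not approver(tool, args):
        raise ApprovalDenied(f"{tool} rejected by reviewer")
    return TOOLS[tool](**args)
```

In a real system `approver` would block on a UI prompt or ticket; here it is any callable that returns True or False.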

RISK TIER CLASSIFICATION
- Tier 1 (Autonomous): read-only, sandboxed actions — execute and log
- Tier 2 (Log & Notify): reversible writes — execute with audit trail and real-time notification
- Tier 3 (Require Confirmation): irreversible actions — hard block until human approves
- When uncertain, always classify up
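
One way to encode the three tiers, as a rough sketch: the tool names and the `TOOL_TIERS` registry below are invented for illustration. The only rule taken from the outline above is that anything not explicitly classified falls into the highest tier.

```python
from enum import IntEnum


class Tier(IntEnum):
    AUTONOMOUS = 1            # read-only, sandboxed: execute and log
    LOG_AND_NOTIFY = 2        # reversible writes: execute, audit, notify
    REQUIRE_CONFIRMATION = 3  # irreversible: hard block until a human approves


# Hypothetical registry; in practice humans would review and maintain it.
TOOL_TIERS = {
    "search_docs": Tier.AUTONOMOUS,
    "update_draft": Tier.LOG_AND_NOTIFY,
    "send_email": Tier.REQUIRE_CONFIRMATION,
    "delete_records": Tier.REQUIRE_CONFIRMATION,
}


def classify(tool: str) -> Tier:
    """Return the tool's tier; unregistered tools are classified up to Tier 3."""
    return TOOL_TIERS.get(tool, Tier.REQUIRE_CONFIRMATION)
```

Defaulting to `REQUIRE_CONFIRMATION` means a newly added tool is blocked until someone deliberately assigns it a lower tier.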

IMPLEMENTATION PATTERNS
- Per-tool confirmation callbacks with "approve with changes" support
- Approval queues for background/async agents with TTL expiration
- Dry run mode for batch operations (the Terraform "plan before apply" model)
- Vulnerable vs. secure email-sending code comparison
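
The approval-queue pattern for background agents might look roughly like the sketch below. It assumes an in-memory queue and hypothetical names (`ApprovalQueue`, `PendingAction`, `decide`); a production version would persist requests and authenticate the reviewer. Expired requests fail closed, and "approve with changes" lets the reviewer overwrite arguments before anything runs.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PendingAction:
    tool: str
    args: dict
    expires_at: float
    status: str = "pending"  # pending | approved | denied | expired


class ApprovalQueue:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.items: Dict[str, PendingAction] = {}

    def submit(self, tool: str, args: dict) -> str:
        """Called by the agent runtime: park the action and return a request id."""
        request_id = uuid.uuid4().hex
        self.items[request_id] = PendingAction(tool, dict(args), self.clock() + self.ttl)
        return request_id

    def decide(self, request_id: str, approve: bool,
               changes: Optional[dict] = None) -> PendingAction:
        """Called by a human reviewer. Expired requests can no longer be approved."""
        item = self.items[request_id]
        if item.status != "pending":
            raise ValueError(f"request already {item.status}")
        if self.clock() >= item.expires_at:
            item.status = "expired"
        elif approve:
            item.args.update(changes or {})  # "approve with changes"
            item.status = "approved"
        else:
            item.status = "denied"
        return item
```

The caller executes the tool only when the returned status is "approved", using the possibly edited `args`.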

PREVENTING ALERT FATIGUE
- Why Tier 3 actions must be rare — if confirmations are frequent, your tier classification is wrong
- Making high-risk prompts visually distinct from routine notifications
- Progressive trust: actions are promoted to a lower-friction tier only by a human administrator, never by the agent itself
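
Progressive trust could be enforced along these lines. This is a sketch with hypothetical names (`TierPolicy`, `promote`), and it assumes `actor` is an authenticated identity supplied by the application's auth layer, never a value taken from model output.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Tier(IntEnum):
    AUTONOMOUS = 1
    LOG_AND_NOTIFY = 2
    REQUIRE_CONFIRMATION = 3


class TierPolicy:
    def __init__(self, tiers: Dict[str, Tier], admins: Iterable[str]):
        self._tiers = dict(tiers)
        self._admins = frozenset(admins)

    def tier_of(self, tool: str) -> Tier:
        # Unknown tools get the strictest tier.
        return self._tiers.get(tool, Tier.REQUIRE_CONFIRMATION)

    def promote(self, tool: str, new_tier: Tier, actor: str) -> None:
        """Change a tool's tier. Only a human administrator may do this.

        `actor` must come from the application's own authentication,
        so an agent cannot promote its own actions.
        """
        if actor not in self._admins:
            raise PermissionError(f"{actor} may not change tiers")
        if tool not in self._tiers:
            raise KeyError(f"unknown tool: {tool}")
        self._tiers[tool] = new_tier
```

Keeping `promote` off the agent's tool list entirely is the simpler half of the control; the admin check covers the case where it is reachable by mistake.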

LIMITATIONS
- HITL does not replace Least Privilege — Tier 1 reads can still exfiltrate data silently
- Confirmation quality depends on showing predicted impact, not just action names
- Defense in depth remains necessary alongside HITL
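
To make "predicted impact, not just action names" concrete, here is a small sketch that pairs a dry run with the confirmation text. The record shape (`id`, `age_days`) and the functions `plan_delete` and `render_prompt` are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Impact:
    action: str
    affected: List[str]
    reversible: bool


def plan_delete(records: List[dict], older_than_days: int) -> Impact:
    """Dry run: compute what a delete would touch without touching anything."""
    affected = [r["id"] for r in records if r["age_days"] > older_than_days]
    return Impact(action="delete_records", affected=affected, reversible=False)


def render_prompt(impact: Impact, preview: int = 3) -> str:
    """Build confirmation text that states scope and reversibility."""
    shown = ", ".join(impact.affected[:preview])
    more = len(impact.affected) - preview
    if more > 0:
        shown += f" (+{more} more)"
    undo = "can be undone" if impact.reversible else "CANNOT be undone"
    return f"{impact.action}: {len(impact.affected)} record(s) [{shown}]. This {undo}."
```

A reviewer who sees the count, a sample of affected IDs, and the reversibility has something to judge; a bare "delete_records?" prompt does not give them that.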

This is Section 4.2 in the AI Agent Security series. Previous: Section 4.1 — Excessive Agency. Next: Section 4.3 — Multi-Agent Trust.

#AISecurity #HumanInTheLoop #HITL #AgentSecurity #CVE202532711 #EchoLeak #PromptInjection #SecureCoding #DevSecOps #AIAgents #MicrosoftCopilot #Freysa #RiskManagement #ApprovalGates #AlertFatigue #LLMSecurity #OWASP #ApplicationSecurity #AIGovernance #CyberSecurity

Video: AI Security 4.2: Human-in-the-Loop Controls for AI Agents - When to Block, When to Allow, from the WiseBuilder channel

Video information

Yesterday, 9:52:00

00:12:31

WiseBuilder
