- Популярные видео
- Авто
- Видео-блоги
- ДТП, аварии
- Для маленьких
- Еда, напитки
- Животные
- Закон и право
- Знаменитости
- Игры
- Искусство
- Комедии
- Красота, мода
- Кулинария, рецепты
- Люди
- Мото
- Музыка
- Мультфильмы
- Наука, технологии
- Новости
- Образование
- Политика
- Праздники
- Приколы
- Природа
- Происшествия
- Путешествия
- Развлечения
- Ржач
- Семья
- Сериалы
- Спорт
- Стиль жизни
- ТВ передачи
- Танцы
- Технологии
- Товары
- Ужасы
- Фильмы
- Шоу-бизнес
- Юмор
AI Security 3.6: Vector & Embedding Weaknesses - How RAG Knowledge Bases Become Attack Surfaces
Your AI's response is only as trustworthy as the documents it retrieves. If a document in the vector database contains false information or malicious instructions, the AI treats it as authoritative. This section covers two distinct risks: retrieval poisoning (injecting crafted documents into the knowledge base) and embedding inversion (reversing stored vectors to recover original text). Both undermine the assumption that your RAG pipeline is secure.
In this video, you'll learn:
- How RAG pipelines work: document embedding, similarity search, context injection, and response generation
- Retrieval poisoning: how adversaries inject crafted documents that get retrieved for legitimate queries
- The dual condition for a successful attack: retrieval similarity + generation manipulation
- Poisoned document flow: how 2 legitimate + 1 poisoned result produces attacker-chosen answers
- USENIX Security 2025 (Poisoned RAG study): 5 malicious documents in millions achieved 90-97% attack success rate against GPT-4
- Black-box attack: no access to embedding model or vector DB internals required
- Why studied defenses (paraphrasing, perplexity filtering, duplicate detection) provided only modest protection
- Injection surfaces: web scraping, user uploads, shared wikis, email/ticket ingestion, third-party data feeds
- RAG as a delivery mechanism for indirect prompt injection (Slack AI incident, August 2024)
- Embedding inversion: VectuTex study achieved 92% exact token recovery (BLEU score 97.3) from OpenAI ada-002 vectors
- 2024 follow-up: inversion works even without access to the original embedding model
- A breached vector database = approximately a loss of original source documents
- Data at risk: customer PII, internal documents, medical/legal records vs. public docs and anonymized data
- The access control gap: vector similarity search has no built-in permission concept (unlike SQL row-level security)
- The post-filtering antipattern: timing side channels reveal restricted content exists
- Four-stage RAG security: ingestion validation, retrieval access control, context assembly filtering, output scanning
- Vulnerable vs. secure ingestion code: allowlisting sources, scanning for injection patterns, trust-level metadata
- Complete defense summary by risk category: retrieval poisoning, embedding inversion, cross-user leakage
This is Section 3.6 of the LLM Threat Landscape series. Treat document ingestion as a security boundary with the same rigor you apply to any database write path. Control what goes in, and you control what comes out.
#RAGSecurity #VectorDatabase #EmbeddingInversion #RetrievalPoisoning #LLMSecurity #OWASP #KnowledgeBase #SemanticSearch #AccessControl #DataExfiltration #AISafety #DevSecOps #PineconeDB #Embeddings #PromptInjection #AIRisk
Видео AI Security 3.6: Vector & Embedding Weaknesses - How RAG Knowledge Bases Become Attack Surfaces канала WiseBuilder
In this video, you'll learn:
- How RAG pipelines work: document embedding, similarity search, context injection, and response generation
- Retrieval poisoning: how adversaries inject crafted documents that get retrieved for legitimate queries
- The dual condition for a successful attack: retrieval similarity + generation manipulation
- Poisoned document flow: how 2 legitimate + 1 poisoned result produces attacker-chosen answers
- USENIX Security 2025 (Poisoned RAG study): 5 malicious documents in millions achieved 90-97% attack success rate against GPT-4
- Black-box attack: no access to embedding model or vector DB internals required
- Why studied defenses (paraphrasing, perplexity filtering, duplicate detection) provided only modest protection
- Injection surfaces: web scraping, user uploads, shared wikis, email/ticket ingestion, third-party data feeds
- RAG as a delivery mechanism for indirect prompt injection (Slack AI incident, August 2024)
- Embedding inversion: VectuTex study achieved 92% exact token recovery (BLEU score 97.3) from OpenAI ada-002 vectors
- 2024 follow-up: inversion works even without access to the original embedding model
- A breached vector database = approximately a loss of original source documents
- Data at risk: customer PII, internal documents, medical/legal records vs. public docs and anonymized data
- The access control gap: vector similarity search has no built-in permission concept (unlike SQL row-level security)
- The post-filtering antipattern: timing side channels reveal restricted content exists
- Four-stage RAG security: ingestion validation, retrieval access control, context assembly filtering, output scanning
- Vulnerable vs. secure ingestion code: allowlisting sources, scanning for injection patterns, trust-level metadata
- Complete defense summary by risk category: retrieval poisoning, embedding inversion, cross-user leakage
This is Section 3.6 of the LLM Threat Landscape series. Treat document ingestion as a security boundary with the same rigor you apply to any database write path. Control what goes in, and you control what comes out.
#RAGSecurity #VectorDatabase #EmbeddingInversion #RetrievalPoisoning #LLMSecurity #OWASP #KnowledgeBase #SemanticSearch #AccessControl #DataExfiltration #AISafety #DevSecOps #PineconeDB #Embeddings #PromptInjection #AIRisk
Видео AI Security 3.6: Vector & Embedding Weaknesses - How RAG Knowledge Bases Become Attack Surfaces канала WiseBuilder
Комментарии отсутствуют
Информация о видео
31 мая 2026 г. 18:43:13
00:12:48
Другие видео канала




















