Accidental LLM Backdoor - Prompt Tricks
In this video we explore various prompt tricks to manipulate the AI to respond in ways we want, even when the system instructions want something else. This can help us better understand the limitations of LLMs.
Get my font (advertisement): https://shop.liveoverflow.com
Watch the complete AI series:
https://www.youtube.com/playlist?list=PLhixgUqwRTjzerY4bJgwpxCLyfqNYwDVB
The Game: https://gpa.43z.one
The OpenAI API cost is pretty high, thus if you want to play the game, use the OpenAI Playground with your own account: https://platform.openai.com/playground?mode=chat
Chapters:
00:00 - Intro
00:39 - Content Moderation Experiment with Chat API
02:19 - Learning to Attack LLMs
03:06 - Attack 1: Single Symbol Differences
03:51 - Attack 2: Context Switch to Write Stories
05:20 - Attack 3: Large Attacker Inputs
06:31 - Attack 4: TLDR Backdoor
08:27 - "This is just a game"
08:56 - Attack 5: Different Languages
09:19 - Attack 6: Translate Text
10:30 - Quote about LLM Based Games
11:11 - advertisement shop.liveoverflow.com
=[ ❤️ Support ]=
→ per Video: https://www.patreon.com/join/liveoverflow
→ per Month: https://www.youtube.com/channel/UClcE-kVhqyiHCcjYwcpfj9w/join
2nd Channel: https://www.youtube.com/LiveUnderflow
=[ 🐕 Social ]=
→ Twitter: https://twitter.com/LiveOverflow/
→ Streaming: https://twitch.tvLiveOverflow/
→ TikTok: https://www.tiktok.com/@liveoverflow_
→ Instagram: https://instagram.com/LiveOverflow/
→ Blog: https://liveoverflow.com/
→ Subreddit: https://www.reddit.com/r/LiveOverflow/
→ Facebook: https://www.facebook.com/LiveOverflow/
Видео Accidental LLM Backdoor - Prompt Tricks канала LiveOverflow
Get my font (advertisement): https://shop.liveoverflow.com
Watch the complete AI series:
https://www.youtube.com/playlist?list=PLhixgUqwRTjzerY4bJgwpxCLyfqNYwDVB
The Game: https://gpa.43z.one
The OpenAI API cost is pretty high, thus if you want to play the game, use the OpenAI Playground with your own account: https://platform.openai.com/playground?mode=chat
Chapters:
00:00 - Intro
00:39 - Content Moderation Experiment with Chat API
02:19 - Learning to Attack LLMs
03:06 - Attack 1: Single Symbol Differences
03:51 - Attack 2: Context Switch to Write Stories
05:20 - Attack 3: Large Attacker Inputs
06:31 - Attack 4: TLDR Backdoor
08:27 - "This is just a game"
08:56 - Attack 5: Different Languages
09:19 - Attack 6: Translate Text
10:30 - Quote about LLM Based Games
11:11 - advertisement shop.liveoverflow.com
=[ ❤️ Support ]=
→ per Video: https://www.patreon.com/join/liveoverflow
→ per Month: https://www.youtube.com/channel/UClcE-kVhqyiHCcjYwcpfj9w/join
2nd Channel: https://www.youtube.com/LiveUnderflow
=[ 🐕 Social ]=
→ Twitter: https://twitter.com/LiveOverflow/
→ Streaming: https://twitch.tvLiveOverflow/
→ TikTok: https://www.tiktok.com/@liveoverflow_
→ Instagram: https://instagram.com/LiveOverflow/
→ Blog: https://liveoverflow.com/
→ Subreddit: https://www.reddit.com/r/LiveOverflow/
→ Facebook: https://www.facebook.com/LiveOverflow/
Видео Accidental LLM Backdoor - Prompt Tricks канала LiveOverflow
Показать
Комментарии отсутствуют
Информация о видео
Другие видео канала
Binary Exploitation vs. Web SecurityHacking Google Cloud?Trying to Find a Bug in WordPressAuthentication Bypass Using Root ArrayMy YouTube Financials - The Future of LiveOverflowDefending LLM - Prompt InjectionAttacking LLM - Prompt InjectionOur Future As Hackers Is At Stake!Cyber Security Challenge Germany (2023)Cybercrime is Not Hacking!Attacking Language Server JSON RPCAdvanced Teleport Hack (stolen from cheaters)VPNs, Proxies and Secure Tunnels Explained (Deepdive)I’m moving, no videos sorryComputer Networking (Deepdive)Revisiting 2b2t Tamed Animal Coordinate ExploitPain in your Hand (RSI)?What is a Protocol? (Deepdive)The Future Of Hacking #shortsGoogle vs. ChatGPT for Hackers #shorts