Is Your Sensitive Data Secure in AI?

On April 24th, GitHub Copilot started training on Free, Pro, and Pro+ users' code by default. The community thread has 291 downvotes and no response from GitHub. Copilot is one of many AI tools, like ChatGPT, Claude, and Gemini, that data professionals use every day, so I sat down and read the privacy policies of every single one. What gets sent. Where it goes. How long it's kept. Which plan tier you actually have to be on to keep it out of the training set.

This is a learning-together video, not a security lecture. A few of the findings surprised me, including the gap between "opting out of training" and "zero data retention" and the personal-account-vs-company-account trap.

In this video, you'll learn:
1) Exactly what each AI tool sends to its servers, and why your prompt is usually a tiny fraction of what gets transmitted
2) How ChatGPT, Claude, Gemini, GitHub Copilot, Microsoft 365 Copilot, Cursor, OpenCode, and OpenRouter handle data at every plan tier
3) Why "opt out of training" does not mean "zero retention," and which tools offer which level of protection
4) Why your personal AI account follows different rules than your company's enterprise plan, even when you're using it for work
5) A practical four-tier data classification framework adapted from ISO 27001 to decide what is safe to share with which tool
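
To make point 5 a little more concrete, here is a minimal Python sketch of how a four-tier classification could gate what gets shared. Everything in it is an assumption for illustration: the tier names (Public, Internal, Confidential, Restricted), the mapping from tier to required protection, and the `may_share` helper are hypothetical and are not taken from the video or from ISO 27001 itself. The two protection levels mirror the "no training" vs "zero data retention" distinction from point 3.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Hypothetical four-tier labels; the video's actual names may differ."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


class Protection(IntEnum):
    """Protection a tool/plan offers, ordered from weakest to strongest."""
    NONE = 0            # prompts may be used for training
    NO_TRAINING = 1     # excluded from training, but may still be retained
    ZERO_RETENTION = 2  # zero data retention (ZDR)


# Assumed policy: the minimum protection each tier requires.
# None means "do not share with any external AI tool".
MIN_PROTECTION: dict[Tier, Optional[Protection]] = {
    Tier.PUBLIC: Protection.NONE,
    Tier.INTERNAL: Protection.NO_TRAINING,
    Tier.CONFIDENTIAL: Protection.ZERO_RETENTION,
    Tier.RESTRICTED: None,
}


def may_share(tier: Tier, offered: Protection) -> bool:
    """Return True if data of this tier may go to a tool offering `offered`."""
    required = MIN_PROTECTION[tier]
    if required is None:
        return False
    return offered >= required
```

Under these assumed rules, `may_share(Tier.INTERNAL, Protection.NO_TRAINING)` is allowed, while `may_share(Tier.CONFIDENTIAL, Protection.NO_TRAINING)` is not, because opting out of training alone does not guarantee zero retention. Your own organization's tiers and thresholds will likely differ.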

~~~~~
RESOURCES

Infographics:
- What AI Receives: https://drive.google.com/file/d/1_2NCca7iLq4RgLaYUGpbxyHbg6p3iZLa/view?usp=sharing
- Comparison Matrix: https://drive.google.com/file/d/11TTwT1PCc-PuJI5gjDmlTxZdh15c_B9l/view?usp=sharing
- Two Levels of Data AI Protection: https://drive.google.com/file/d/1RzGBW4VnLhpcNNRjlBkf08im9hJOrdja/view?usp=sharing
- ISO Framework: https://drive.google.com/file/d/1-_F3QG6s0JlWa0vH2JkQGLAbQkcbFBfd/view?usp=sharing

Official Policies:
- GitHub Copilot policy update: https://github.blog/news-insights/company-news/updates-to-github-copilot-interaction-data-usage-policy/
- GitHub Community Discussion: https://github.com/orgs/community/discussions/188488
- OpenAI Data Controls FAQ: https://help.openai.com/en/articles/7730893-data-controls-faq
- OpenAI Enterprise Privacy: https://openai.com/enterprise-privacy/
- OpenAI Codex CLI Security: https://developers.openai.com/codex/security
- Anthropic, Is My Data Used for Training: https://privacy.claude.com/en/articles/10023580-is-my-data-used-for-model-training
- Anthropic Consumer Terms Update: https://www.anthropic.com/news/updates-to-our-consumer-terms
- Claude Code Data Usage: https://code.claude.com/docs/en/data-usage
- Google Gemini Apps Privacy Hub: https://support.google.com/gemini/answer/13594961
- Microsoft 365 Copilot Privacy: https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-privacy
- Microsoft Copilot Chat Privacy: https://learn.microsoft.com/en-us/copilot/privacy-and-protections
- Cursor Data Use: https://cursor.com/data-use
- OpenRouter Data Collection: https://openrouter.ai/docs/guides/privacy/data-collection

Reports & Research:
- Cyberhaven, Sensitive Data Flowing Into AI Tools: https://www.cyberhaven.com/blog/sensitive-data-flowing-into-ai-tools
- Cisco 2025 Data Privacy Benchmark Study: https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2025/m01/cisco-2025-data-privacy-benchmark-study.html
- LayerX Enterprise AI & SaaS Security Report 2025: https://go.layerxsecurity.com/the-layerx-enterprise-ai-saas-data-security-report-2025
- GitGuardian State of Secrets Sprawl 2026: https://blog.gitguardian.com/the-state-of-secrets-sprawl-2026-pr/
- LLM Memorization Research (Berkeley + Google DeepMind): https://arxiv.org/html/2507.05578v1
- ISO 27001 Annex A 5.12 Information Classification: https://hightable.io/iso-27001-annex-a-5-12-classification-of-information/

~~~~~
CHAPTERS
00:00 – The GitHub Copilot Wake-Up Call
02:08 – Why The Privacy Policies All Changed
02:57 – The Stats Behind the Data AI Gets
05:21 – What AI Really Gets When a Prompt is Sent
07:07 – ChatGPT, Claude, and the Five-Year Retention
09:38 – Gemini's Toggles for Your Data
10:39 – Comparison of Major AI Platforms with Data Privacy and Security
13:14 – Microsoft Copilot, GitHub Copilot, and Cursor
15:39 – OpenCode and OpenRouter (The Intermediary Processor)
18:03 – The Two Levels of Protection (No Training vs ZDR)
20:26 – Your Personal Account vs Your Company's Plan
21:30 – "Isn't This Overblown?" The Memorization Research
25:34 – A Data Classification Framework You Can Adopt
28:48 – Wrap-Up + What's Coming Next

I'm Kyle Chalmers, and after 9+ years leading data teams, I'm here to make AI approachable for all data professionals and enthusiasts.

Contact me: https://kclabs.ai/contact
Follow me on Substack: https://kylechalmerslabs.substack.com/

#aidataprivacy #githubcopilot #datasecurity #chatgpt #claude #gemini #cursor #aigovernance #dataengineer #dataanalyst #ai

Video "Is Your Sensitive Data Secure in AI?" from the channel Kyle Chalmers | Data + AI