Sensitive data doesn't mean no AI

“I can’t use AI. My data is sensitive.”

I hear that often from technical staff at the university where I work.

I manage web and mobile platforms at Concordia University in Montreal. We work across student-facing systems, content workflows, codebases, logs, and integrations where confidential or internal data can appear quickly. So I understand the concern.

And for some data, the concern is the correct answer. There are real consequences when sensitive information crosses a boundary it shouldn’t. Community Bank, a regional US bank, disclosed in 2026 that customer names, Social Security numbers, and dates of birth were exposed through an unauthorized AI application. Vercel investigated unauthorized access to internal systems after a small third-party AI tool’s Google Workspace OAuth app was compromised. Samsung restricted internal use of generative AI tools in 2023 after engineers reportedly pasted proprietary code into ChatGPT.

Different failures, but the same underlying problem: data or access moved somewhere the organization could no longer fully control.

But the answer to “my data is sensitive” is not “then don’t use AI.” The answer is different habits, different tools, and different boundaries. I’ve been using AI tools myself since the GitHub Copilot beta in 2021, and I’ve spent the past two years figuring out what safe use looks like in practice for my own team and through the learning community I run.

The gap is habits, not knowledge

Many organizations have AI guidelines now. Mine does. They specify which tools are permitted, which data goes where, and who to contact for approval. That’s necessary. It’s also not sufficient.

In a 2026 EDUCAUSE survey, 94% of respondents said they had used AI tools for work in the past six months. Only 54% said they were aware of the policies and guidelines meant to guide that use. But even among those who know the policy, there’s a gap between understanding “don’t paste sensitive data into ChatGPT” and knowing what to do instead when you’re sitting in front of a real task with real data at 2 p.m. on a Tuesday.

This isn’t unique to universities. Anyone working with customer data, health records, financial information, internal systems, or proprietary code faces the same problem. Policy tells you what not to do. Habits tell you what to do instead.

Keep the pattern, remove the values

The single most useful technique I’ve found is this: give AI the structure of your data, not the data itself.

My team manages the Concordia mobile app (among other things). One of its features shows students their weekly class schedule. The API response includes instructor names, course times, and room locations, all tied to a real student’s schedule. Sensitive, identifiable data. I can’t paste any of it into a consumer AI tool.

But I don’t need to. I need the shape of the data, not the data itself.

I take the API schema, just the field names and structure, with no real values in it. Then I ask AI to generate realistic fake data that matches it: fabricated student and instructor names, real Concordia buildings, a mix of lectures, tutorials, and labs, with enough variety to catch the edge cases (a packed day, a lone evening class, rooms across campus). I can scale it up to as much fake data as the feature needs. None of these students exist.

I use that fake data to build and test a new feature (colour-coding the schedule by class type). The feature works. No real student data touched.
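The same idea also works without an AI tool in the loop. What follows is a minimal sketch, assuming a flat schedule record; the field names, course codes, instructor names, and room labels are all hypothetical placeholders, since the real API schema isn't shown here.

```python
import random
from dataclasses import dataclass, asdict

# Hypothetical field names; the real API schema is not shown in this article.
@dataclass
class ClassSession:
    course_code: str
    class_type: str   # "lecture", "tutorial", or "lab"
    instructor: str
    day: str
    start_time: str   # 24-hour "HH:MM"
    room: str

# Every value below is invented for illustration.
FAKE_COURSES = ["TEST 101", "TEST 202", "DEMO 310", "DEMO 415"]
FAKE_INSTRUCTORS = ["A. Example", "B. Placeholder", "C. Sample"]
FAKE_ROOMS = ["Building A 1.101", "Building B 2.202", "Building C 3.303"]
CLASS_TYPES = ["lecture", "tutorial", "lab"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

def fake_week(seed: int, sessions: int = 8) -> list[dict]:
    """Build one fabricated weekly schedule; the seed makes it reproducible."""
    rng = random.Random(seed)
    week = []
    for _ in range(sessions):
        hour = rng.choice([8, 10, 13, 15, 18, 20])  # includes evening slots
        week.append(asdict(ClassSession(
            course_code=rng.choice(FAKE_COURSES),
            class_type=rng.choice(CLASS_TYPES),
            instructor=rng.choice(FAKE_INSTRUCTORS),
            day=rng.choice(DAYS),
            start_time=f"{hour:02d}:00",
            room=rng.choice(FAKE_ROOMS),
        )))
    return week
```

Calling `fake_week(1)` and `fake_week(2)` gives two different but repeatable schedules, which makes a failing edge case easy to reproduce.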

This scales to anything with structure. Database teams: give AI the schema, not the rows. Report teams: give it the column headers, not the data. Anyone with a form, template, or config file: same move. Structure in, sensitive values out.
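One way to get "structure in, sensitive values out" from an existing record is to replace every value with its type name. This is a rough sketch, not a full schema generator: `shape_of` is a hypothetical helper, the sample record is fabricated, and it assumes the keys themselves are not sensitive.

```python
def shape_of(value):
    """Replace every value with its type name, keeping keys and nesting."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Only the first element is inspected, so mixed lists lose detail.
        return [shape_of(value[0])] if value else []
    return type(value).__name__

record = {  # fabricated record, for illustration only
    "student_id": "00000000",
    "classes": [{"course": "TEST 101", "start": "10:00", "credits": 3}],
}
skeleton = shape_of(record)
```

Here `skeleton` keeps `student_id` and `classes` as keys but holds only type names such as `str` and `int`, so it can be shared where the record cannot.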

That covers data you can generate. For a document you can’t synthesize (a real PDF, a contract, a report you need AI to read), redact instead.

Redact names, IDs, and sensitive fields first using a real redaction tool (Preview’s Redact feature, Acrobat’s Redaction tool, or similar) that removes the underlying text. A black rectangle drawn on top covers text visually but leaves it in the file, so anyone (including the AI) can still read it. Then give AI the cleaned copy.
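A quick way to check whether a redaction actually removed the text is to extract the text from the cleaned copy (select-all and copy, or any text extractor you trust) and search it for the values you meant to remove. The sketch below assumes that extraction has already happened; `leaked_terms` is a hypothetical helper, and it only catches exact matches, not misspellings or partial values.

```python
def leaked_terms(extracted_text: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms still present in text extracted from a redacted file."""
    haystack = extracted_text.casefold()
    return [term for term in sensitive_terms if term.casefold() in haystack]
```

An empty list is a necessary sign, not a sufficient one: metadata, comments, and embedded images can still carry the original values.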

Your file has controls. The AI copy may not.

A question that comes up often: does any of this matter if the file is already in the cloud?

It does. A private file in OneDrive or SharePoint is still primarily a stored document. It has permissions, sharing rules, retention settings, and audit trails. When AI uses that file, the content may enter a different path: stored prompts and responses, cached context, a searchable index of the content, or pieces quoted back in an answer.

Turning text into tokens for a model isn’t anonymization. The model context still represents the original content, so a student ID, salary, or API key in the source text is still present in the AI’s context. In an approved enterprise tool, those paths may be governed. In a consumer tool, you may have created a second copy under completely different terms.
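A toy example makes the point concrete. The tokenizer below is an assumption made for illustration (one token per UTF-8 byte); production tokenizers use larger subword vocabularies, but they are likewise built so that tokens can be decoded back into text.

```python
def to_tokens(text: str) -> list[int]:
    """Toy tokenizer: one integer token per UTF-8 byte."""
    return list(text.encode("utf-8"))

def from_tokens(tokens: list[int]) -> str:
    """Decode the toy tokens back into the original text."""
    return bytes(tokens).decode("utf-8")

prompt = "Student ID 00000000 reported a login error"  # fabricated ID
tokens = to_tokens(prompt)
recovered = from_tokens(tokens)
```

`recovered` is identical to `prompt`, ID included. The numbers look opaque, but nothing was removed.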

Paying is not protecting

There’s a setting many people don’t know about. It controls whether your conversations are used to train or improve future models. Defaults vary by tool, account type, and region. The unsafe assumption is that paying for a Plus, Pro, or Premium tier automatically protects you. Paid accounts may include better controls, but you still have to verify them.

Check the settings. Turn training off wherever the option exists. Then check again monthly, because defaults, toggles, and terms change without notice.

But training off is not the whole picture. Memory carries details across conversations and may be stored separately. Clicking thumbs up or down can also create a separate review path, and on some providers that path may not follow the same training-off setting. Local transcripts and cache may persist on your device. Retention can also be affected by legal holds or compliance obligations.

Training off keeps your data out of the normal model-training path. Storage and retention still vary by tool, so the setting alone doesn’t make a personal account safe for sensitive data.

An extension can read everything you type

Turning off training closes one path. There’s another one underneath it that no AI setting touches: your browser.

A browser extension with page access can read everything you type, not just what you put in a chat box. It sees the text in your email, your CMS, your forms, and your AI prompts, on every site where it’s active.

Grammarly is the relatable example, and many people have it installed. On individual Grammarly Free, Premium, and single-user Pro accounts, the setting that lets it train on what you write is on by default. You can turn it off in Grammarly’s privacy settings. In a January 2026 study, Incogni ranked it among the most potentially privacy-damaging of the widely used Chrome extensions, mostly because of how much it can access and how many people run it.

Then there’s the malicious version, which security researchers call “Man-in-the-Prompt.” Any extension with page access can silently read, change, and steal every prompt and response in ChatGPT, Claude, Gemini, or Copilot. It needs no special permissions. A malicious coupon extension or theme is all it takes, because the AI chat is just a web page in your browser, and the extension can read and modify it the same way it would any other site. It bypasses whatever protections the AI provider built, because the weak link is your browser, not the AI company.

Extensions are useful, so the goal isn’t to remove them. Limit what they can reach. Restrict an extension’s site access so it only runs where you need it, and disable extensions on sensitive portals like student records or HR dashboards.

AI no longer just answers. Now it acts.

Everything above applies to chat: you paste, it answers. That’s changing. Coding tools like Cursor, Claude Code, and Codex don’t wait for you to paste. Once you start them, they can read files, write code, run commands, and search your project on their own. The safety question shifts from “what am I sending in?” to “what is it doing while I’m not looking?”

Five questions to ask before you let an AI agent act (an agent is a tool that can take actions on its own, like the ones above):

What can it see?
What can it change?
What can it send out?
What can it remember?
How do I undo its changes?

Here’s the catch: you can undo file changes (git, version history, ctrl-z). You can’t unsend data already exposed to a provider. That’s why what you do before the agent runs matters more than what you do after. Set boundaries up front, not after something goes wrong.

The agent isn’t the only point of access. It uses tools to search, read files, and run commands, and each tool has its own permissions. An agent that respects your boundary can still expose data through a tool that doesn’t. I ran into this myself. The next section shows how.

When the agent finishes, don’t trust the summary it writes. Read what it actually changed, ideally with a tool that highlights every change (your editor or version control has a view for this). A summary can skip a file it touched or smooth over a change you would want to see.
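Version control is the usual way to see every change. Where it isn't available, one option is to snapshot the workspace before and after the run and compare the result with the files the summary mentions. This is a minimal sketch under that assumption; the function names are hypothetical, and `snapshot` reads every file under the folder, so point it at a small working copy.

```python
import hashlib
from pathlib import Path

def snapshot(root: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def changed_files(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}

def unmentioned(before: dict[str, str], after: dict[str, str],
                summary_files: set[str]) -> set[str]:
    """Changed paths that the agent's summary did not list."""
    return changed_files(before, after) - summary_files
```

Take `snapshot(Path("."))` before starting the agent and again when it finishes. Anything `unmentioned` returns is a file worth opening first.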

The prompt is no longer the whole job. AI tools can now reach through files, search tools, shell commands, browser control, connected apps, and agents. So the question is no longer only “what did I paste?” It’s “what can this tool reach, remember, change, or send?”

Test the guardrails before real data

You don’t know if your guardrails are enough until you test them. Find the gaps, add layers where needed, then decide whether the setup is ready for real data.

One distinction worth holding onto: guardrails control what AI can reach, not whether its output is correct. Those are two different problems. Bad output can often be caught and corrected. This section is about the problem you can’t correct after the fact: sensitive data reaching an AI tool that shouldn’t have it.

While setting up a log-analysis workflow at work, I knew from the start that I wouldn’t get the AI guardrails right on the first try, so I built a safety scaffold. I took a real application error log and replaced the sensitive values with realistic fake values.
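The exact replacement method isn't described here, so what follows is only one possible approach: a regex-based sketch that swaps IPv4 addresses and email addresses for stable fakes. The patterns are deliberately simple assumptions and will miss other sensitive values (names, tokens, IPv6 addresses), so the output still needs a manual read.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

class Sanitizer:
    """Replace sensitive values with stable fakes so repeated values still line up."""

    def __init__(self):
        self._ips = {}
        self._emails = {}

    def _fake_ip(self, match):
        real = match.group(0)
        if real not in self._ips:
            # 192.0.2.0/24 is reserved for documentation (RFC 5737).
            # Fakes repeat after 254 distinct addresses.
            self._ips[real] = f"192.0.2.{len(self._ips) % 254 + 1}"
        return self._ips[real]

    def _fake_email(self, match):
        real = match.group(0)
        if real not in self._emails:
            self._emails[real] = f"user{len(self._emails) + 1}@example.com"
        return self._emails[real]

    def clean(self, line: str) -> str:
        """Return the line with emails, then IPv4 addresses, replaced."""
        line = EMAIL.sub(self._fake_email, line)
        return IPV4.sub(self._fake_ip, line)
```

Because each real value always maps to the same fake one, patterns such as "the same client failed five times" survive sanitizing, which keeps the log useful for analysis.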

Then I started configuring boundaries on the sanitized version. I configured an ignore file to hide the log from AI. I added a shell hook to block commands that could access it. Then I discovered that the editor’s built-in search tool bypassed both, because it’s not a shell command and doesn’t respect the ignore file. A single search returned lines with IP addresses sitting in the AI’s context window. Because the IPs were fabricated, nothing real was exposed. That was the point of the scaffold.

I added a third layer: a tool-access hook that intercepts search and file-access tools before they run. That closed the gap.
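The decision logic inside such a hook can be small. The sketch below is not any particular tool's hook API: the tool names, the protected patterns, and the way a hook receives its arguments are all assumptions made for illustration. It treats a search with no explicit paths as workspace-wide and blocks it, since that is the kind of call that could return protected lines.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical values: real agents name their tools and hook payloads differently.
PROTECTED_PATTERNS = ["logs/*.log", "*.env", "secrets/*"]
WORKSPACE_WIDE_TOOLS = {"search"}

def is_protected(path: str) -> bool:
    """True when the path matches any protected pattern."""
    # Collapses "./" segments; ".." segments and symlinks are NOT resolved here.
    normalized = PurePosixPath(path).as_posix()
    return any(fnmatch(normalized, pattern) for pattern in PROTECTED_PATTERNS)

def allow_tool_call(tool: str, paths: list[str]) -> bool:
    """Return False when a tool call could reach a protected file."""
    if tool in WORKSPACE_WIDE_TOOLS and not paths:
        return False  # an unscoped search could return protected lines
    return not any(is_protected(p) for p in paths)
```

A check like this only sees the paths a tool declares. Shell commands that embed a path in the command string still need their own layer, which is why it is worth testing on sanitized data first.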

No policy can hand you this part. AI tools, editors, extensions, agents, and account settings change too quickly for one static checklist to cover every case. You build it on safe data, so you don’t have to risk the real data to find the gaps.

Simpler still: if the data truly can’t be exposed, keep it out of the workspace entirely. The “Keep the pattern, remove the values” technique from earlier scales to whole files and whole projects, not just prompts. No real data, no layers needed.

The habits, distilled

Check training, memory, and extension access. Turn off training and memory where you can, limit what extensions can reach, then check monthly.
Keep the pattern, remove the values. For testing, drafting, and debugging, AI needs the structure, not your real records.
Sensitive data only through approved paths. Use the institutional path your organization names, not a personal account.
Set boundaries before AI acts. Ignore files, deny rules, hooks, and similar access controls. Then read what it actually did, not its summary.
If it truly can’t be exposed, keep it out of the workspace entirely. No combination of guardrails is a perfect sandbox.

The practice, not the policy

I’ve been helping my team adopt AI for about two years now. We’re a small team of four leading digital platforms, design systems, and standards at one of Canada’s largest universities. Everyone on the team uses AI tools daily, but that didn’t happen by mandate. It happened through practice: pairing, experimenting, sharing what works, and building habits around what’s safe. We stay current together.

Policy gives people AI literacy. Practice builds AI operational fluency. The difference is whether you know the rules or whether you’ve built the muscle memory to follow them when you’re tired, when the deadline is tomorrow, when the shortcut is right there.

You learn to work safely with AI by working with AI. Deliberately.