Governing AI: A Practical Guide to Data Privacy
- Jini Stupak
AI companies know a lot about you. Unlike most software, LLMs invite people to share freely: business logic, customer records, internal strategy, source code, personal information, you name it. People copy-paste, think out loud, and hand over data they would never email to a stranger. But that's what they're doing: every prompt sent to an LLM like Claude or ChatGPT is a data transfer to a third party.
Most organizations know that using LLMs carries risk — to their data, their clients, their compliance obligations, and in some industries, to the users whose sensitive information they are trusted to protect. But, increasingly, the productivity gains from AI are just too difficult to give up.
At Pyyne, we know the question is not whether to use AI. It is how to use it without giving away more than you intend.
This post maps out five AI privacy levels, each with meaningfully different protections, setup requirements, pricing, and other tradeoffs. When working with clients, we do our best to help find an approach to using AI that strikes the right balance between cost, productivity, and privacy.
Level 1: Personal accounts
ChatGPT Free, Plus, Pro. Claude Free, Pro, Max. Gemini Free, Plus, Pro, Ultra.
Price: $
When someone opens a personal ChatGPT or Claude account for work, that data is governed by consumer terms, not a business agreement.
Anthropic, OpenAI, and Google default to using consumer conversations to train their models unless users manually opt out. They also retain user data longer by default than business plans do, sometimes for up to five years. Beyond training, providers collect usage data, device information, and interaction patterns, and may share data with third-party service providers, each with its own terms and privacy policies. Employees accept these terms and manage their accounts entirely on their own, so corporate data can enter training pipelines without anyone at the company realizing it.
Sensitive data that enters a training pipeline could theoretically surface in responses to other users. Regulated personal data such as employee records, client information, and health data processed without agreements may put a company in breach of contracts and of regulations such as the GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act). In legal contexts, sharing privileged communications with a third-party AI service can waive attorney-client privilege.
Without audit logs, admin controls, or enterprise-level data safeguards, consumer plans are generally considered inappropriate for use in a business context.
Level 2: Enterprise accounts
ChatGPT Enterprise. Claude for Work. Google Workspace (Gemini for Workspace).
Price: $$
Companies like OpenAI, Anthropic, and Google offer enterprise plans that contractually prohibit training on customer data by default.
These plans tend to include a Data Processing Addendum that formally governs how your data is handled and restricts vendors from using it outside the business relationship. Major providers have also passed SOC 2 Type II audits, which verify that certain data protections are in place.
Enterprise plans also give companies more granular control over how and where data is stored. Some offer customer-controlled encryption keys or the ability to store data in specific regions, which matters in certain regulatory contexts.
For most of our clients this is the starting point we recommend. The protections are a meaningful step up from consumer plans, the setup is straightforward, and the plans offer the visibility and access controls needed to build a proper AI policy.
For companies handling regulated personal data, operating under HIPAA or GDPR, processing sensitive client information under NDA, or running high volumes of queries where per-token costs add up, it may be worth looking at the options below.
Level 3: Enterprise API
OpenAI API. Anthropic API. Google Vertex API.
Price: $$-$$$$ (Depends on usage)
Most major AI providers offer access via an application programming interface, or API. Developers use APIs to integrate AI into products, but companies also use them to build internal tools, automate workflows, or add AI capabilities to existing systems. Pyyne regularly works with companies to build custom AI tools into their workflows.
With an API, rather than employees typing directly into a chat interface, your engineering team sits between the user and the model. You control what data gets sent, how it is formatted, and what gets logged. Sensitive information can be stripped or anonymized before it ever reaches the model. You can build your own audit trails and set access rules by team or data type.
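For instance, a thin gateway layer can scrub obvious PII before a prompt ever leaves your network. A minimal sketch in Python; the patterns and placeholder tokens here are illustrative, and a production system would pair this with a proper PII-detection service rather than a handful of regexes:

```python
import re

# Illustrative PII patterns; real systems would use a dedicated
# detection service and a broader rule set.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace common PII patterns with placeholder tokens
    before the prompt leaves your infrastructure."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Follow up with jane.doe@example.com, SSN 123-45-6789."
print(redact(prompt))  # -> Follow up with [EMAIL], SSN [SSN].
```

The same gateway is a natural place to attach audit logging and per-team access rules, since every request already passes through it.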
At the API level, data is generally not used for training and log retention windows tend to be shorter. Zero-data-retention (ZDR) options are also available, where nothing is stored at all.
The main tradeoff of using an API is that you need engineers to build and maintain your tools. But for companies building AI into their products, or those with strict data-handling requirements, it offers a level of control that a chat interface simply cannot match.
Level 4: Access models through a major cloud provider
AWS Bedrock. Azure AI Foundry. GCP.
Price: $$-$$$$ (Depends on usage)
The major cloud providers like Amazon, Microsoft, and Google offer access to a range of frontier AI models through their own infrastructure. Rather than using the Claude app, for example, you can elect to access Anthropic's API through Amazon's infrastructure instead. Cloud providers contractually guarantee your data is not used to train the underlying models.
This is a natural fit for organizations already running on AWS, Azure, or GCP. The audit logging, encryption, and access controls you have already set up apply here too. The main advantage over a standard enterprise account is that you can configure your data to stay within the infrastructure you already manage, rather than being sent to a model vendor's servers directly.
A downside to this approach is that if you are not already on one of these platforms, standing up new cloud infrastructure just to access AI models can be a significant undertaking.
Level 5: Self-host models on private infrastructure
Private cloud (e.g. Radium Cloud). Local hardware. Open-weight models (Qwen, DeepSeek, Llama).
Price: $-$$$$
If you wish to take third parties entirely out of the picture, it is possible to run some AI models on your own hardware. In this case, GDPR, HIPAA, and SOC 2 compliance becomes an infrastructure question rather than a contractual one.
Running top-of-the-line models on a regular laptop, or even a high-powered desktop, is not yet practical. The best models need serious GPU hardware. But running them on a private cloud or dedicated server is very much a real option. Neocloud providers like Radium Cloud offer dedicated GPU infrastructure with full data isolation and less overhead than managing your own hardware.
A tradeoff of this approach is that it generally restricts you to open-weight models such as Llama 3.3 70B, Mistral, Qwen 2.5, and DeepSeek Coder V2. These are not quite at the level of the latest GPT or Claude Opus models for complex reasoning, but they are good enough for many business tasks, and the gap is closing.
We have built and run these stacks for healthcare providers, financial institutions, and companies with sensitive IP where the data cannot leave the building.
Which level is right for you?
Most organizations end up using more than one level. General productivity work goes through enterprise accounts. Sensitive but non-regulated data goes through the API or a managed cloud. The most sensitive data stays on private infrastructure.
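That split can be captured in a simple routing policy. A sketch in Python, where the sensitivity labels and backend names are our own illustrative choices rather than a standard classification scheme:

```python
# Map data-classification tiers to AI backends; labels and backend
# names are illustrative, not a standard scheme.
ROUTES = {
    "public": "enterprise_chat",     # Level 2: enterprise account
    "confidential": "internal_api",  # Level 3: API behind your own tooling
    "regulated": "self_hosted",      # Level 5: model on private infrastructure
}

def route(sensitivity: str) -> str:
    """Pick a backend for a request based on its data classification.
    Unknown labels fall through to the most restrictive option."""
    return ROUTES.get(sensitivity, "self_hosted")

print(route("confidential"))  # -> internal_api
```

Defaulting unknown labels to the most restrictive backend keeps a misclassified request from leaking to a less protected tier.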
The table below is a rough guide. Engineering overhead refers to whether you need a technical team to implement and maintain the solution.
| Level | Best for | Data sensitivity | Engineering overhead | Cost |
| --- | --- | --- | --- | --- |
| L1 Personal | Personal use only | Low | None | $ |
| L2 Enterprise | Standard business use | Low to medium | None | $$ |
| L3 API | Strict data handling or custom tools | Medium to high | Medium | $$-$$$$ |
| L4 Cloud provider | Orgs already on AWS / Azure / GCP | Medium to high | Low to medium | $$-$$$$ |
| L5 Self-host | Highest sensitivity; sovereign data | Highest | High | $-$$$$ |
