HIPAA and AI: A Practical Engineering Guide
Deploying AI in healthcare means navigating HIPAA at every layer of the stack. This is the engineering guide we wish existed when we started building healthcare AI systems — covering BAAs, PHI data flows, model training constraints, and audit logging.

HIPAA was written in 1996, before cloud computing, modern software architecture, or anything resembling today's AI existed. Yet it governs every AI system that touches protected health information. The challenge is not that HIPAA prohibits AI — it does not. The challenge is that HIPAA's technical safeguard requirements were written for a world of on-premises databases, and applying them to AI pipelines that span cloud services, model APIs, and edge inference requires careful architectural translation.
The PHI Data Flow Problem
The first step in HIPAA-compliant AI engineering is mapping every path that PHI travels through your system. This includes obvious paths (patient records in your database) and non-obvious paths (LLM prompts containing patient names, model training data derived from clinical notes, log files that capture request payloads containing PHI, error messages that include patient identifiers).
Most HIPAA violations in AI systems happen in the non-obvious paths. A debugging log that captures the full LLM prompt — including the patient history that was injected as context — is a PHI exposure if that log is stored without encryption or transmitted without access controls.
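One pragmatic mitigation is to scrub PHI-shaped tokens at the logging layer itself, so a forgotten debug statement cannot leak a prompt payload. A minimal sketch using Python's `logging.Filter`; the regexes are illustrative only and are no substitute for a dedicated PHI detection tool:

```python
import logging
import re

# Illustrative patterns only -- a real deployment would use a dedicated
# PHI detector (e.g. Presidio) rather than hand-rolled regexes.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"), "[PHONE]"),
]

class PHIRedactionFilter(logging.Filter):
    """Scrub PHI-looking tokens from log records before they are emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern, placeholder in REDACTIONS:
            msg = pattern.sub(placeholder, msg)
        # Replace the formatted message so handlers never see the raw PHI.
        record.msg, record.args = msg, ()
        return True  # always emit the (scrubbed) record
```

Attach the filter to every handler that might receive request context, including the ones used by your error-tracking integration.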
PHI Flow Mapping for AI Systems
- Trace PHI from ingestion through processing to output. Document every service, API, database, and queue that PHI touches.
- Where does PHI go when something fails? Error logs, dead letter queues, retry stores, exception tracking services — all potential PHI exposure points.
- If you fine-tune models on clinical data, the training pipeline is a PHI processing system. The model weights themselves may constitute PHI if the training data can be extracted.
- Distributed traces, application logs, and metrics that include request context can capture PHI. Ensure your observability stack either excludes PHI or meets HIPAA safeguards.
- Every third-party service that PHI touches requires a BAA. Cloud provider, LLM API, logging service, error tracking — all of them.
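The last point (every third-party service that touches PHI needs a BAA) can be checked mechanically once you maintain a service inventory. A sketch over a hypothetical inventory; the service names and schema are ours, not a standard:

```python
# Hypothetical PHI-flow inventory: every service PHI touches must have an
# executed BAA. Maintain this as part of your architecture documentation
# and fail CI if a gap appears.
PHI_SERVICES = {
    "aws-s3":        {"touches_phi": True,  "baa_executed": True},
    "llm-api":       {"touches_phi": True,  "baa_executed": True},
    "error-tracker": {"touches_phi": True,  "baa_executed": False},  # gap!
    "marketing-crm": {"touches_phi": False, "baa_executed": False},  # fine: no PHI
}

def baa_gaps(inventory: dict) -> list[str]:
    """Return services that touch PHI without an executed BAA."""
    return [name for name, svc in inventory.items()
            if svc["touches_phi"] and not svc["baa_executed"]]
```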
LLM API Considerations
Using third-party LLM APIs (OpenAI, Anthropic, Google) with PHI requires specific configurations. Most providers offer HIPAA-eligible endpoints with BAAs, but these endpoints often have restrictions: no data retention for training, specific API versions, and sometimes higher pricing. Azure OpenAI Service and AWS Bedrock provide HIPAA-eligible LLM access within their respective cloud compliance frameworks.
The critical architectural decision is whether to use API-based inference or self-hosted models. API-based inference is simpler to operate but requires trusting the provider's PHI handling. Self-hosted models (running open-source models on your own HIPAA-compliant infrastructure) give you complete control over PHI but dramatically increase operational complexity.
| Approach | PHI Control | Operational Burden | BAA Required | Cost |
|---|---|---|---|---|
| OpenAI API (HIPAA-eligible) | Provider-managed | Low | Yes, enterprise plan | $$ |
| Azure OpenAI | Azure-managed | Low-medium | Yes, Azure BAA | $$ |
| AWS Bedrock | AWS-managed | Low-medium | Yes, AWS BAA | $$ |
| Self-hosted (open-source) | Full control | Very high | No (you are the host) | $$$+ |
Technical Safeguards Checklist
- Encryption at rest for all datastores containing PHI (AES-256 or equivalent)
- Encryption in transit for all PHI transmission (TLS 1.2+ minimum)
- Unique user identification for every human and service account accessing PHI
- Automatic session timeout for interactive PHI access
- Audit logging for every PHI access, modification, and deletion — with tamper-evident storage
- Emergency access procedures for break-glass PHI access scenarios
- PHI backup and recovery with encryption and access controls matching production
- Integrity controls to detect unauthorized PHI modification
Audit Logging for AI
HIPAA requires audit logs for PHI access. In an AI system, this means logging every time PHI is used as input to a model, every time model output contains PHI, and every time a human reviews AI-generated clinical content. The logs themselves must be protected — stored with encryption, access-controlled, and retained for six years.
The practical challenge is volume. An AI system processing thousands of clinical documents daily generates massive audit logs. Design your audit infrastructure for scale from day one — append-only log stores, efficient compression, and automated retention management.
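Tamper evidence can be approximated in application code by hash-chaining entries, so that altering any stored record invalidates every later link. A minimal sketch, which a production system would pair with write-once storage such as S3 Object Lock:

```python
import hashlib
import json

def append_audit_event(chain: list[dict], event: dict) -> list[dict]:
    """Append an audit event linked to the previous entry's hash.

    Because each record's hash covers the previous hash, modifying any
    stored entry breaks every subsequent link.
    """
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"event": event, "prev_hash": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(record)
    return chain

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; returns False if any entry was altered."""
    prev_hash = "0" * 64
    for record in chain:
        payload = json.dumps(
            {"event": record["event"], "prev_hash": record["prev_hash"]},
            sort_keys=True).encode()
        expected = hashlib.sha256(payload).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True
```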
“HIPAA compliance in AI is not a feature you add — it is an architectural constraint that shapes every decision from model selection to logging infrastructure. Retrofitting HIPAA compliance onto an existing AI system is an order of magnitude harder than building it in from the start.”
Business Associate Agreements: What They Actually Require
A Business Associate Agreement (BAA) is a contract required by HIPAA when a covered entity (hospital, insurer, healthcare provider) shares Protected Health Information with a vendor (your software, your cloud provider, your analytics tool). The BAA does not make HIPAA compliance happen — it documents that both parties understand their obligations and assigns liability for breaches.
What a BAA must contain under 45 CFR § 164.504(e): permitted uses and disclosures of PHI, a requirement that the business associate will not use PHI outside those permitted uses, safeguard requirements (administrative, physical, technical), breach notification obligations to the covered entity within 60 days of discovery, and a provision requiring the business associate to pass these obligations down to subcontractors who also handle PHI.
The subcontractor chain is where most engineering teams get caught out. If you process PHI and you use AWS for compute, you need a BAA with AWS. But you also need BAAs with your database provider, your logging service, your error monitoring tool, your customer support platform — any service that might process or store PHI. AWS, GCP, and Azure all offer BAAs. Most major SaaS tools (Datadog, Twilio, Segment, Intercom) also offer BAAs, but you must specifically request them and enable HIPAA-eligible service tiers.
A common engineering mistake: using a service that does not offer a BAA (e.g., a standard Slack workspace or a free-tier analytics tool) in a workflow that touches PHI. Using a non-BAA service for PHI — even incidentally — is a HIPAA violation. Audit your full data flow, including error messages and logs that might contain PHI snippets. For supply chain visibility practices that apply equally here, see our AI dependency audit guide.
AWS HIPAA-Eligible Services and GCP Healthcare API
AWS maintains a list of HIPAA-eligible services under its Business Associate Addendum (BAA). As of 2025, the list includes core services that most healthcare applications use: EC2, S3, RDS, Aurora, DynamoDB, Lambda, ECS, EKS, API Gateway, CloudWatch, CloudTrail, KMS, Secrets Manager, Cognito, and others. Notably absent from the eligible list: some newer services and third-party integrations from the AWS Marketplace. Always verify against the current AWS HIPAA eligible services page before adding a new service to a PHI workflow.
| Cloud provider | BAA process | Key eligible services | Notable gaps |
|---|---|---|---|
| AWS | Accept online via console (Healthcare compliance section) | EC2, S3, RDS, Lambda, EKS, KMS, CloudWatch, CloudTrail | Not all Marketplace products; verify each service |
| GCP | Request via sales/support for Healthcare API; separate BAA | Healthcare API (FHIR, DICOM, HL7v2), Cloud Storage, BigQuery, GKE, Cloud Run | Standard GCP services require separate BAA coverage verification |
| Azure | Included in Microsoft Online Services BAA (accept in portal) | Azure Blob, SQL, Kubernetes, Functions, Key Vault, Monitor | Some preview services excluded; check Azure compliance docs |
GCP Healthcare API provides native FHIR R4, DICOM, and HL7v2 store/query capabilities with audit logging built in. For AI applications that process clinical notes, imaging, or lab results, the Healthcare API is the most compliant path — it handles consent, de-identification, and FHIR resource validation natively. The trade-off is cost and lock-in: Healthcare API is significantly more expensive than storing the same data in Cloud Storage with custom code.
PHI Handling Patterns: Encryption, Access Logging, and Minimum Necessary
- Access controls: unique user identification, automatic logoff, encryption/decryption capability
- Audit controls: hardware/software activity logging in systems containing PHI — must be retained 6 years
- Integrity controls: mechanisms to ensure PHI is not improperly altered or destroyed
- Transmission security: encryption of PHI in transit (TLS 1.2+ minimum; TLS 1.3 recommended)
- Encryption at rest: addressable standard — industry practice is AES-256; justify any deviation in writing
Encryption at rest is technically "addressable" under HIPAA, meaning you can document a reason for not implementing it. In practice, no reasonable security program skips encryption at rest in 2025. Use AWS KMS or GCP KMS with customer-managed keys (CMKs) for PHI data stores. This gives you key rotation control and the ability to effectively delete data by destroying the key.
Access logging is non-negotiable. Every access to PHI — reads, writes, deletes — must be logged with user identity, timestamp, and the resource accessed. In AWS, CloudTrail covers API-level access; enable S3 server access logging and RDS audit logging separately. Store audit logs in a separate account or bucket with write-once permissions (S3 Object Lock or similar) so that a compromised application cannot delete its own access trail.
AI-Specific HIPAA Concerns: LLM APIs and PHI
The growth of LLM-powered healthcare applications creates a compliance surface that the drafters of HIPAA's Privacy and Security Rules (finalised in 2003) did not anticipate. The core rule is simple: if you send PHI to an LLM API, you need a BAA with that provider and you need to know how they handle your data. In 2025, the vendor landscape is split:
| LLM provider | HIPAA BAA available | PHI handling policy | Recommended approach |
|---|---|---|---|
| Azure OpenAI (GPT-4/o) | Yes (covered under Azure BAA) | Data not used for training on enterprise tiers | Preferred for HIPAA-covered PHI workloads |
| AWS Bedrock (Claude, Titan, etc.) | Yes (covered under AWS BAA) | No data retention or training on AWS Bedrock | Strong option; verify specific model BAA coverage |
| Google Cloud Vertex AI | Yes (covered under GCP Healthcare BAA extension) | No training on customer data; CMEK support | Good option; confirm scope of BAA coverage |
| OpenAI API (direct) | Enterprise plan only; not standard API | Enterprise: zero data retention; standard: data may be retained 30 days | Only via Enterprise with executed BAA |
| Anthropic API (direct) | Not publicly available as of 2025 | Standard data handling policy applies | Do not send PHI without confirmed BAA |
Fine-tuning on medical data creates additional concerns. Under HIPAA, using PHI to fine-tune a model is a "use" of PHI that must fall within a permitted purpose. Research and public health are permitted purposes with proper authorisation; improving a commercial product is not self-evidently permitted. Before fine-tuning any model on PHI, consult healthcare legal counsel and document the permitted purpose in your policies.
The zero-trust principle is the safest engineering stance: treat every LLM API call as a potential PHI leak surface and design your data flows to minimise PHI exposure. De-identify or pseudonymise data before sending it to LLM APIs wherever the use case allows. For governance frameworks that formalise this approach, see our guide on AI governance and the NIST RMF.
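One way to pseudonymise before an LLM call is a keyed HMAC over each direct identifier, producing stable tokens that are meaningless outside your boundary. A sketch, assuming the key lives in your HIPAA-covered environment; the key value and token format here are illustrative:

```python
import hashlib
import hmac

# Hypothetical key -- in production, fetch from a KMS-backed secret store
# inside your HIPAA-covered environment; never hard-code it.
SECRET_KEY = b"example-key-from-kms"

def pseudonymize(identifier: str) -> str:
    """Derive a stable, keyed token for a direct identifier.

    The same identifier always maps to the same token (so downstream
    joins still work), but the token cannot be reversed without the key.
    """
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()
    return f"PT-{digest[:12]}"
```

Substitute the token into the prompt before the API call, and keep any token-to-identifier mapping inside your covered environment.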
De-Identification: Safe Harbor vs Expert Determination
De-identified data is not PHI under HIPAA, which means it falls outside the BAA requirement and handling restrictions. HIPAA provides two methods for de-identification. Safe Harbor requires removing 18 specific identifiers (names, dates finer than year, geographic data smaller than state, phone numbers, email addresses, SSNs, medical record numbers, IP addresses, device identifiers, URLs, full-face photographs, and any unique identifiers). Expert Determination requires a qualified statistician to certify that the risk of re-identification is very small.
Safe Harbor is mechanistic and auditor-friendly — either the identifiers are removed or they are not. Expert Determination offers more flexibility (you can keep more data attributes) but requires formal statistical analysis and documentation. For most engineering teams, Safe Harbor via an automated de-identification pipeline is the practical choice. NLP-based de-identification tools (AWS Comprehend Medical, Google Healthcare Natural Language API, Microsoft Presidio) can detect and redact PHI entities from unstructured clinical text with 90-98% recall.
- Names (patient, family members, employers)
- Geographic subdivisions smaller than state (street, city, county, zip except first 3 digits)
- Dates (except year) — including admission, discharge, birth, death dates
- Phone numbers, fax numbers
- Email addresses
- Social Security numbers
- Medical record, health plan, account numbers
- Certificate and license numbers
- VIN and vehicle serial numbers
- Device identifiers and serial numbers
- URLs and IP addresses
- Biometric identifiers (finger and voice prints)
- Full-face photos and comparable images
- Any unique identifying number, code, or characteristic
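A toy Safe Harbor pass over clinical text, covering only a few of the pattern-matchable identifier classes above. Real pipelines should use the NLP-based tools mentioned earlier, which also catch names and free-text identifiers that regexes miss:

```python
import re

# Covers only a handful of the 18 identifier classes -- illustrative, not
# a complete Safe Harbor implementation.
SAFE_HARBOR_PATTERNS = {
    "[SSN]":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[DATE]":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "[PHONE]": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact_safe_harbor(text: str) -> str:
    """Replace pattern-matchable identifiers with placeholder tokens."""
    for placeholder, pattern in SAFE_HARBOR_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text
```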
Breach Notification Timelines
HIPAA breach notification has three layers: to affected individuals (within 60 days of discovery), to HHS (within 60 days for breaches affecting 500+ individuals; annual report for smaller breaches), and to prominent media outlets in affected states for breaches affecting 500+ residents of that state. Discovery is defined as when the organisation knew or reasonably should have known about the breach — not when it was reported to security or legal.
A breach is defined as the acquisition, access, use, or disclosure of PHI in a manner not permitted under HIPAA. Importantly, impermissible disclosure is presumed to be a breach unless the covered entity or business associate can demonstrate a low probability that PHI was compromised using a four-factor risk assessment: nature of PHI involved, who made the unauthorised use, whether PHI was actually acquired or viewed, and extent to which risk has been mitigated.
Engineer your incident response playbook to start the 60-day clock from the moment you detect any anomalous PHI access, not from when legal concludes their risk assessment. The risk assessment determines whether notification is required — the 60-day clock does not pause during the assessment. For the broader incident response and zero-trust architecture that supports HIPAA operations, see our zero-trust architecture guide.
Incident Response for PHI Breaches
HIPAA breach notification has specific timelines that differ from general security incident response. If a breach of unsecured PHI is discovered, the covered entity must notify affected individuals within 60 calendar days of discovery — not 60 days from when the breach occurred, but from when it was discovered or reasonably should have been discovered. For breaches affecting 500 or more individuals, the entity must also notify HHS and prominent media outlets in the affected state within the same 60-day window. Breaches affecting fewer than 500 individuals are reported to HHS annually.
The definition of "unsecured PHI" is critical: PHI that has been encrypted with NIST-approved algorithms (AES-128, AES-256) or destroyed is considered "secured" and does not trigger notification requirements even if the encrypted data is exposed. This is the single strongest argument for encrypting PHI at rest and in transit — it converts a potential breach notification event into a security incident that can be handled internally. The cost difference is enormous: a notifiable breach involving 10,000 records typically costs $500K-2M in notification, legal, and remediation costs. An encryption-protected exposure costs the internal investigation time and nothing more.
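The 60-day clock is simple enough to encode directly in incident-response tooling so nobody recomputes it by hand under pressure. A sketch; the function name is ours:

```python
from datetime import date, timedelta

def notification_deadline(discovery: date) -> date:
    """Individual notification is due within 60 calendar days of discovery.

    The clock starts at discovery (or when the breach reasonably should
    have been discovered), not when the risk assessment concludes.
    """
    return discovery + timedelta(days=60)
```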
AI Model Training on Healthcare Data
Training AI models on healthcare data introduces HIPAA obligations that go beyond standard PHI handling. When PHI enters a training dataset, every copy of that dataset — including intermediate processing artifacts, model weights that memorise training data, and evaluation datasets derived from the training data — is subject to HIPAA protection. This means your ML training infrastructure must meet the same security and access control requirements as your production PHI storage.
The practical implication: you cannot train on PHI using consumer-grade cloud ML services that are not covered by a BAA. AWS SageMaker, Google Vertex AI, and Azure ML are all HIPAA-eligible when configured correctly and covered by a BAA. Databricks, Weights & Biases, and other ML platform tools require individual BAA evaluation. If your model training pipeline includes any service without a BAA, you have a HIPAA violation regardless of whether the PHI was de-identified at other points in the pipeline — the obligation follows the data through every processing step.
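Because the obligation follows the data through every processing step, it helps to track PHI classification through artifact lineage: anything derived from a PHI source inherits the classification. A sketch with a hypothetical lineage map (child to parent):

```python
# Hypothetical artifact lineage: child -> parent. A PHI classification on
# any ancestor propagates to every derived artifact, so model weights and
# eval splits carry the same safeguards as the raw clinical dataset.
LINEAGE = {
    "train-split":   "clinical-notes-raw",
    "eval-split":    "clinical-notes-raw",
    "model-weights": "train-split",
}
PHI_SOURCES = {"clinical-notes-raw"}

def is_phi(artifact: str) -> bool:
    """Walk up the lineage; an artifact is PHI if any ancestor is."""
    while artifact in LINEAGE:
        artifact = LINEAGE[artifact]
    return artifact in PHI_SOURCES
```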
Related articles
- Building for SOC 2 Compliance from Day One
- Zero-Trust Architecture for AI-Native Applications
- Before You Deploy an AI Agent: 12 Things Engineers Skip
- AI Governance Frameworks: NIST AI RMF vs EU AI Act
