SOC 2 (System and Organization Controls 2) has become the de facto standard for demonstrating that your technology stack handles customer data responsibly. But when your stack includes machine learning models that make autonomous decisions, the traditional audit playbook falls short. Here's what engineering and compliance teams need to know.
Why SOC 2 Matters for AI
Enterprise buyers increasingly require SOC 2 Type II reports before signing contracts. If your product includes AI components -- recommendation engines, fraud detectors, chatbots, or any model that touches user data -- auditors will expect evidence that these systems meet the same trust principles as your traditional infrastructure.
The difficulty is that SOC 2 was designed with deterministic systems in mind. AI systems are probabilistic, can change behavior when they are retrained or updated, and may produce different outputs for identical inputs depending on model version, training data, or inference configuration.
The Five Trust Service Criteria Applied to AI
1. Security. How do you protect model weights, training data, and inference endpoints from unauthorized access? This includes access controls on model registries, encryption of model artifacts at rest and in transit, and audit logging of all model deployments and rollbacks.
2. Availability. What is the uptime guarantee for your AI-powered features? Auditors want to see SLAs, monitoring dashboards, failover mechanisms, and incident response procedures specific to model serving infrastructure.
3. Processing Integrity. This is where AI gets tricky. Can you demonstrate that your model produces reliable, accurate results? You need version-controlled model artifacts, reproducible training pipelines, automated testing of model outputs, and drift detection to catch when accuracy degrades.
4. Confidentiality. Training data often contains sensitive information. Can you prove that PII is properly handled throughout the ML pipeline -- from data collection through feature engineering, training, and inference? Data lineage tracking and access controls on training datasets are essential.
5. Privacy. If your model was trained on personal data, you need to demonstrate that you handle it the way your privacy notice promises. Depending on that notice and on applicable law such as GDPR, this can include the ability to delete training data on request (right to erasure), explain model decisions (right to explanation), and prevent models from memorizing and regurgitating personal information.
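For the processing-integrity criterion above, one useful piece of evidence is a content hash that ties a deployed model to one exact artifact. The following is a minimal sketch, assuming the artifact can be read as a binary stream; the function name `fingerprint_artifact`, the model name, and the version label are hypothetical and not part of SOC 2 or any particular model registry.

```python
import hashlib
import io


def fingerprint_artifact(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Illustrative stand-in for a real model file opened with open(path, "rb").
fake_weights = io.BytesIO(b"example model weights")
record = {
    "model": "fraud-detector",   # hypothetical model name
    "version": "1.4.0",          # hypothetical version label
    "sha256": fingerprint_artifact(fake_weights),
}
```

Storing the digest next to the version label lets you show later that the bytes in production match the bytes that were tested and approved.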
Common Gaps Teams Miss
No model inventory. You can't govern what you don't know exists. Shadow AI -- models deployed by individual teams without central oversight -- is a recurring source of audit findings.
Missing version control. If you can't trace which model version was serving on a specific date, you can't demonstrate processing integrity.
No drift monitoring. A model that was accurate at deployment may degrade over weeks or months. Without continuous monitoring, you're flying blind.
Incomplete access logs. Who accessed the model? Who changed its configuration? Who approved the last deployment? Without an immutable audit trail, auditors will flag this immediately.
No incident response for AI. When a model makes a bad decision, what's the process? Most teams have incident response for infrastructure outages but not for model failures.
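The version-control gap above comes down to one question: which model version was serving at a given moment? A minimal sketch of that lookup, assuming you keep a time-sorted deployment log; the `DEPLOYMENTS` data and the function name are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical deployment log: (deployed_at, model_version), sorted by time.
DEPLOYMENTS = [
    (datetime(2025, 1, 10, 9, 0), "v1.0"),
    (datetime(2025, 3, 2, 14, 30), "v1.1"),
    (datetime(2025, 3, 5, 8, 15), "v1.0"),   # rollback to v1.0
    (datetime(2025, 6, 20, 11, 0), "v2.0"),
]


def version_serving_at(when, deployments=DEPLOYMENTS):
    """Return the version live at `when`, or None if nothing was deployed yet."""
    times = [deployed_at for deployed_at, _ in deployments]
    index = bisect_right(times, when)
    return deployments[index - 1][1] if index else None
```

Because rollbacks are logged as ordinary entries, the same lookup also answers "when did we stop serving the bad version?"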
How Automation Helps
Manual compliance tracking doesn't scale. By the time you've documented one model's compliance status, three more have been deployed. Automated governance platforms can continuously scan your AI inventory, assess compliance against SOC 2 criteria, detect drift before it causes incidents, and maintain tamper-proof audit trails -- all without slowing down your ML team.
The goal isn't to add bureaucracy. It's to make compliance a natural byproduct of good engineering practices, so your team can ship fast while your auditors get the evidence they need.
You deploy a model that performs well in testing. Three months later, customers start complaining about incorrect predictions. The model hasn't been retrained -- nothing in your code changed. What happened? Drift.
Three Types of Drift
Data drift occurs when the statistical distribution of input features changes over time. For example, a fraud detection model trained on pre-pandemic transaction patterns will see dramatically different input distributions as consumer behavior shifts. The model itself hasn't changed, but the world it operates in has.
Concept drift occurs when the relationship between inputs and the target variable changes. What counted as "fraudulent" behavior two years ago may be normal today, and vice versa. The ground truth has shifted, making the model's learned patterns obsolete.
Behavioral drift is the observable change in a model's output patterns -- the downstream effect of data drift, concept drift, or both. It's what users actually experience: predictions that used to be accurate are now wrong, confidence scores that were calibrated are now misleading, or edge cases that were handled well are now misclassified.
Why Drift Is Dangerous
Unlike a server crash, drift is silent. There's no error message, no stack trace, no alert from your monitoring system (unless you've built drift detection). The model keeps serving predictions with high confidence -- they're just increasingly wrong.
In regulated industries, undetected drift can have serious consequences: a credit scoring model that gradually becomes discriminatory, a medical diagnosis system that misses an emerging disease pattern, or a content moderation system that fails to catch new forms of harmful content.
Detection Methods
Statistical tests -- The Population Stability Index (PSI), the Kolmogorov-Smirnov test, and chi-squared tests can each detect changes in input distributions.
Embedding comparison -- For NLP and unstructured data, compare embeddings of current inputs against a reference set. Cosine distance above a threshold signals drift.
Performance monitoring -- Track accuracy, precision, recall, and F1 against labeled ground truth. This catches the effect of drift but requires labeled data, which is often delayed.
Prediction distribution monitoring -- Track the distribution of model outputs. A fraud detector that suddenly classifies 30% more transactions as fraudulent is drifting, even if you don't have ground truth yet.
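Two of the methods above are simple enough to sketch with the standard library. The first function computes PSI as the sum over bins of (current - reference) * ln(current / reference); the second measures the relative change in the share of positive predictions, as in the fraud-detector example. The equal-width binning, the `eps` floor for empty bins, and the function names are assumptions of this sketch, not a standard implementation.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the reference range; values in `current`
    that fall outside that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            index = int((v - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def positive_rate_change(reference_preds, current_preds):
    """Relative change in the share of positive (1) predictions."""
    ref_rate = sum(reference_preds) / len(reference_preds)
    cur_rate = sum(current_preds) / len(current_preds)
    return (cur_rate - ref_rate) / ref_rate
```

PSI needs no labels, so it can run on every batch of inputs; the prediction-rate check is a cheap early warning while ground truth is still delayed.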
What to Do When You Detect Drift
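First, quantify the severity. The right thresholds depend on the drift metric you use and should be calibrated against your own history, but as illustrative values: a drift score below 0.1 might be normal seasonal variation. Above 0.3, you likely need to retrain or investigate. Above 0.5, consider rolling back to a previous model version while you investigate.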
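The thresholds above can be encoded as a small triage function so that the response is consistent and reviewable. This is a sketch only: the action names are hypothetical, and the "watch" band between 0.1 and 0.3 is an assumption, since the text does not say what to do in that range.

```python
def triage_drift(score, warn=0.1, retrain=0.3, rollback=0.5):
    """Map a drift score to a suggested action using the thresholds above."""
    if score > rollback:
        return "rollback_and_investigate"
    if score > retrain:
        return "retrain_or_investigate"
    if score < warn:
        return "no_action"
    # Assumed behavior for the unspecified 0.1 to 0.3 band.
    return "watch"
```

Keeping the thresholds as parameters makes it easy to tune them per model without changing the logic.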
Second, diagnose the root cause. Is it data drift (new input patterns), concept drift (changed ground truth), or a data pipeline issue (upstream schema change, missing features)? The fix depends on the cause.
Third, log everything. Your audit trail should capture when drift was detected, what the severity was, what action was taken, and by whom. This is essential for SOC 2 and EU AI Act compliance.
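A structured record makes the "when, how severe, what action, by whom" fields hard to forget. A minimal sketch, assuming JSON lines as the log format; the `DriftEvent` class, its field names, and the sample values are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class DriftEvent:
    detected_at: str      # when drift was detected (ISO 8601, UTC)
    model_version: str
    metric: str           # which drift measure fired, e.g. "psi"
    severity: float
    action_taken: str
    actor: str            # who took the action


def to_log_line(event):
    """Serialize an event as one JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


event = DriftEvent(
    detected_at=datetime(2025, 3, 5, 8, 15, tzinfo=timezone.utc).isoformat(),
    model_version="v1.1",
    metric="psi",
    severity=0.34,
    action_taken="retrain_scheduled",
    actor="alice@example.com",
)
line = to_log_line(event)
```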
EU AI Act 2025: What Engineering Teams Need to Know
Regulation | 10 min read | April 2026
The EU AI Act is the world's first comprehensive AI regulation, and its obligations are phasing in: some are already enforceable, and the rest follow on a fixed schedule. If your AI systems serve EU users -- or if you're building products that EU customers will evaluate -- here's what your engineering team needs to understand.
Risk Classification
The Act classifies AI systems into four risk tiers, and your compliance obligations scale with the tier:
Unacceptable risk -- Banned outright. Social scoring systems, real-time biometric identification in public spaces (with exceptions), and manipulative AI that exploits vulnerabilities.
High risk -- Subject to strict requirements. This includes AI used in employment decisions, credit scoring, law enforcement, critical infrastructure, and education. Enterprise AI used in any of these areas falls here, so check your use cases carefully.
Limited risk -- Transparency obligations only. Chatbots must disclose they're AI. Deepfakes must be labeled.
Minimal risk -- No specific obligations. Spam filters, AI-powered video games, etc.
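For a first-pass inventory, the four tiers can be captured as an enum with a lookup from use case to tier. This is a sketch for triage only, not a legal determination: the use-case tags are hypothetical, and the mapping simply mirrors the examples listed above. Anything not in the table returns `None` so that it gets a manual review.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Hypothetical use-case tags mapped to the tiers named in the text above.
USE_CASE_TIERS = {
    "social_scoring": RiskTier.UNACCEPTABLE,
    "employment_decisions": RiskTier.HIGH,
    "credit_scoring": RiskTier.HIGH,
    "customer_chatbot": RiskTier.LIMITED,
    "spam_filter": RiskTier.MINIMAL,
}


def first_pass_tier(use_case):
    """Return the mapped tier, or None to signal that manual review is needed."""
    return USE_CASE_TIERS.get(use_case)
```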
Requirements for High-Risk Systems
If your system is classified as high-risk, you must implement:
Risk management system. A documented, continuous process for identifying, analyzing, and mitigating risks throughout the AI lifecycle.
Data governance. Training, validation, and testing datasets must meet quality criteria. You need to document data sources, collection methods, and preprocessing steps.
Technical documentation. Detailed descriptions of the system's purpose, architecture, training methodology, performance metrics, and known limitations.
Record-keeping. Automatic logging of events throughout the system's lifecycle. Logs must be retained for a period appropriate to the system's purpose.
Transparency. Users must be informed that they're interacting with AI. Instructions for use must be provided to downstream deployers.
Human oversight. The system must be designed to allow effective human oversight, including the ability to intervene, override, or shut down the system.
Accuracy, robustness, cybersecurity. The system must achieve appropriate levels of accuracy and be resilient to errors and attacks.
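The record-keeping requirement above calls for logging that happens automatically rather than by convention. One way to get that is to wrap the prediction function so that every call is recorded. A minimal sketch: `logged_inference`, the in-memory `EVENT_LOG`, and the toy `predict` rule are all hypothetical stand-ins for a real model and an append-only log store.

```python
import functools
import json
import time

EVENT_LOG = []  # stand-in for an append-only log store


def logged_inference(model_version):
    """Decorator that records every call to a predict function."""
    def decorator(predict):
        @functools.wraps(predict)
        def wrapper(features):
            output = predict(features)
            EVENT_LOG.append(json.dumps({
                "ts": time.time(),
                "model_version": model_version,
                "input": features,
                "output": output,
            }))
            return output
        return wrapper
    return decorator


@logged_inference(model_version="v1.1")
def predict(features):
    # Toy rule standing in for a real model.
    return "review" if features["amount"] > 1000 else "approve"


decision = predict({"amount": 2500})
```

In a real system the logged inputs may contain personal data, so the log itself needs the same access controls and retention rules as the training data.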
Timeline
The Act entered into force in August 2024, with a phased rollout:
February 2025: Bans on unacceptable-risk AI take effect.
August 2025: Requirements for general-purpose AI models apply.
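August 2026: Most remaining obligations apply, including the requirements for most high-risk systems. High-risk AI embedded in products already covered by EU product-safety law has until August 2027.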
Practical Steps for Engineering Teams
Inventory your AI systems. You can't assess risk for systems you don't know about. Start with a complete catalog of every AI/ML component in your stack.
Classify each system. Determine which risk tier each system falls into based on its use case and the populations it affects.
Gap analysis. For high-risk systems, compare your current practices against the Act's requirements. Where are you already compliant? Where are the gaps?
Implement logging and documentation. This is usually the biggest lift. Automate what you can -- manual documentation doesn't scale and becomes stale quickly.
Establish human oversight. Define clear escalation paths, override mechanisms, and kill switches for autonomous decision-making systems.
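The override and kill-switch ideas in the last step can be sketched as a small gate in front of the model's decisions. This is one possible design under stated assumptions: the `OversightGate` class, the 0.9 confidence floor, and the handler labels are hypothetical, and the Act does not prescribe any particular mechanism.

```python
from dataclasses import dataclass


@dataclass
class OversightGate:
    confidence_floor: float = 0.9   # assumed threshold, tune per system
    kill_switch_on: bool = False    # when True, nothing is auto-decided

    def route(self, prediction, confidence):
        """Return (decision, handler): automated, or escalated to a human."""
        if self.kill_switch_on:
            return None, "human_review"
        if confidence < self.confidence_floor:
            return None, "human_review"
        return prediction, "automated"
```

For example, `OversightGate().route("approve", 0.95)` lets the decision through, while flipping `kill_switch_on` sends every decision to a person regardless of confidence.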
The EU AI Act isn't just a European concern. It is likely to shape regulation in other jurisdictions, and some multinational companies are already applying its standards across all their markets. Getting compliant now positions your team ahead of the curve.
You know you need AI governance. Your legal team is asking about it. Your enterprise customers are putting it in RFPs. But where do you start when you have zero governance infrastructure in place?
Why Governance Matters Now
AI governance isn't bureaucracy for bureaucracy's sake. It's risk management. Without it, you face regulatory fines (EU AI Act penalties for the most serious violations reach up to EUR 35 million or 7% of global annual turnover, whichever is higher), customer trust erosion, security vulnerabilities in model supply chains, and liability exposure when models make harmful decisions.
You don't need a 50-person compliance team. A pragmatic, engineering-first approach can get you from zero to auditable in weeks, not months.
The Five-Step Framework
Step 1: Inventory. Catalog every AI/ML system in your organization. Include production models, staging experiments, third-party AI APIs, and yes -- that GPT-4 API key someone in marketing is using. For each system, record: name, owner, purpose, data sources, deployment status, and risk tier.
Step 2: Assess. For each system, evaluate risk across multiple dimensions: data sensitivity (does it process PII?), autonomy level (does it make decisions without human review?), impact scope (how many users are affected?), bias potential (could it discriminate?), and regulatory exposure (does it fall under EU AI Act, SOC 2, or industry-specific rules?).
Step 3: Policy. Define clear governance policies based on your assessment. Policies should include: risk thresholds (what risk score triggers a review?), approval workflows (who approves model deployments?), monitoring requirements (what metrics must be tracked?), incident response procedures (what happens when a model fails?), and data retention rules.
Step 4: Monitor. Implement continuous monitoring for every production system. At minimum, track: model performance metrics (accuracy, latency, error rates), behavioral drift (is the model's output distribution changing?), data quality (are input features within expected ranges?), and access patterns (who is querying the model, and how often?).
Step 5: Audit. Establish a tamper-proof audit trail that captures every governance event: model deployments, configuration changes, policy evaluations, drift alerts, incident responses, and human approvals. This trail is what auditors, regulators, and enterprise customers will review.
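Steps 1 to 3 above fit naturally into one small data model: an inventory record, a risk score over the assessment dimensions, and a policy threshold that triggers review. A minimal sketch, assuming each dimension is scored 0 to 3 by a reviewer and summed without weights; the scale, the threshold of 8, and the example systems are illustrative assumptions, not values from any standard.

```python
from dataclasses import dataclass, field


@dataclass
class AISystem:
    name: str
    owner: str
    purpose: str
    data_sources: list = field(default_factory=list)
    deployment_status: str = "staging"
    # Step 2 dimensions, each scored 0 (low) to 3 (high) by a reviewer.
    data_sensitivity: int = 0
    autonomy: int = 0
    impact_scope: int = 0
    bias_potential: int = 0
    regulatory_exposure: int = 0

    def risk_score(self):
        """Unweighted sum of the five dimensions (range 0 to 15)."""
        return (self.data_sensitivity + self.autonomy + self.impact_scope
                + self.bias_potential + self.regulatory_exposure)


REVIEW_THRESHOLD = 8  # assumed policy value, not from any standard


def needs_review(system, threshold=REVIEW_THRESHOLD):
    """Step 3 policy check: flag systems at or above the threshold."""
    return system.risk_score() >= threshold


inventory = [
    AISystem("fraud-detector", "risk-team", "flag suspicious payments",
             ["transactions"], "production", 3, 2, 3, 2, 2),
    AISystem("blog-tagger", "marketing", "suggest post tags",
             ["public posts"], "production", 0, 1, 1, 0, 0),
]
flagged = [s.name for s in inventory if needs_review(s)]
```

A check like `needs_review` can run in a deployment pipeline, which turns the policy into something enforced rather than merely documented.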
Common Mistakes
Starting too big. Don't try to govern everything at once. Start with your highest-risk systems and expand from there.
Manual processes. Spreadsheets and documents become stale immediately. Automate governance checks so they run as part of your existing CI/CD and deployment pipelines.
Ignoring shadow AI. The models you don't know about are the ones that will cause problems. Regular discovery scans are essential.
Treating governance as a one-time project. Governance is a continuous process. Models change, data changes, regulations change. Your governance program must evolve with them.
Tools You Need
A functional AI governance program requires: a model registry (inventory), risk assessment engine (scoring), policy engine (automated rule evaluation), drift detection (monitoring), and an immutable audit ledger (compliance evidence). You can build these yourself, use open-source tools, or adopt a platform that provides all five. The key is automation -- manual governance doesn't scale.
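If you build the audit ledger yourself, a hash chain is one common way to make tampering detectable: each entry stores the hash of the previous one, so editing an old entry breaks every hash after it. The sketch below is tamper-evident rather than strictly tamper-proof, and the function names and entry layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, event):
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(ledger, event):
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"event": event, "prev": prev_hash,
                   "hash": _entry_hash(prev_hash, event)})


def verify(ledger):
    """Return True if the chain is internally consistent from start to end."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Note that this check alone cannot detect entries dropped from the end of the ledger; for that, the latest hash has to be stored somewhere the ledger's writers cannot modify.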