Building AI Governance: A Practical Framework for Enterprise LLM Deployment
Learn how to implement robust AI governance frameworks for enterprise LLM deployments. This comprehensive guide covers safety policies, content filtering, and compliance strategies.
As organizations rapidly integrate Large Language Models (LLMs) into their production systems, the need for robust AI governance has never been more critical. Without proper guardrails, LLMs can generate harmful content, leak sensitive information, or produce outputs that violate regulatory requirements.
In this comprehensive guide, we'll walk through building a production-ready AI governance framework that balances innovation with safety, compliance, and user protection.
Why AI Governance Matters
The risks of ungoverned LLM deployment are substantial:
- Harmful Content Generation: Models can produce toxic, biased, or offensive outputs
- Data Leakage: Without proper controls, LLMs may expose PII or confidential information
- Regulatory Violations: Non-compliant outputs can result in legal penalties
- Reputational Damage: A single viral incident can destroy brand trust
- Security Vulnerabilities: Prompt injection attacks can compromise system integrity
According to a 2024 Stanford study, over 73% of enterprise AI deployments lack comprehensive governance frameworks, leading to an average of 2.3 safety incidents per quarter.
The Four Pillars of AI Governance
Our framework is built on four essential pillars:
- Input Validation & Safety
- Output Moderation & Filtering
- Audit Logging & Compliance
- Continuous Monitoring & Improvement
Let's dive deep into each pillar with practical implementation examples.
Pillar 1: Input Validation & Safety
The first line of defense is validating and sanitizing user inputs before they reach your LLM.
Implementing Input Guards
import { z } from 'zod'
// Define strict input schema
const userPromptSchema = z.object({
message: z
.string()
.min(1, 'Message cannot be empty')
.max(4000, 'Message too long')
.refine(msg => !containsInjectionPatterns(msg), 'Potential prompt injection detected'),
context: z.record(z.string()).optional(),
maxTokens: z.number().min(1).max(2000).default(500),
})
// Input validation middleware
export async function validateUserInput(rawInput: unknown) {
try {
const validated = userPromptSchema.parse(rawInput)
// Additional safety checks
const safetyScore = await checkContentSafety(validated.message)
    if (!safetyScore.safe) {
      throw new Error('Input flagged by content safety check')
}
return validated
} catch (error) {
if (error instanceof z.ZodError) {
throw new ValidationError('Invalid input format', error.errors)
}
throw error
}
}
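The snippet above leans on two helpers that aren't shown: ValidationError and checkContentSafety. Here's a minimal sketch of both; we assume checkContentSafety simply delegates to the moderateOutput function we build in Pillar 2, but any safety classifier would work in its place.
// Hypothetical definitions for the two helpers referenced above
export class ValidationError extends Error {
  constructor(message: string, public details: unknown) {
    super(message)
    this.name = 'ValidationError'
  }
}
// Delegates input safety to the same moderation layer built in Pillar 2
async function checkContentSafety(text: string): Promise<ContentSafetyResult> {
  return moderateOutput(text)
}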
Detecting Prompt Injection Attacks
Prompt injection is one of the most common attack vectors. Here's a detection pattern:
const INJECTION_PATTERNS = [
/ignore (previous|all) instructions/i,
/you are now (a|an) .+ (assistant|bot|ai)/i,
/disregard (your|the) (system|safety|ethical) (prompt|guidelines)/i,
/reveal (your|the) (system prompt|instructions)/i,
  /<\|.*?\|>/, // Special tokens like <|system|> (no global flag, so .test() stays stateless across calls)
]
function containsInjectionPatterns(text: string): boolean {
return INJECTION_PATTERNS.some(pattern => pattern.test(text))
}
// Rate limiting by user
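// Note: an in-memory Map is per-process only; a multi-instance deployment would typically back this with a shared store such as Redis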
const rateLimiter = new Map<string, number[]>()
export function checkRateLimit(userId: string): boolean {
const now = Date.now()
const userRequests = rateLimiter.get(userId) || []
// Remove requests older than 1 minute
const recentRequests = userRequests.filter(timestamp => now - timestamp < 60_000)
if (recentRequests.length >= 20) {
return false // Rate limit exceeded
}
recentRequests.push(now)
rateLimiter.set(userId, recentRequests)
return true
}
Pillar 2: Output Moderation & Filtering
Even with strict input validation, LLMs can still generate problematic content. Output moderation is essential.
Multi-Layer Content Filtering
interface ContentSafetyResult {
safe: boolean
categories: {
hate: number
violence: number
sexual: number
selfHarm: number
profanity: number
}
flaggedPhrases: string[]
}
export async function moderateOutput(content: string): Promise<ContentSafetyResult> {
// Layer 1: Keyword-based filtering (fast)
const keywordFlags = checkBlockedKeywords(content)
// Layer 2: ML-based moderation (OpenAI Moderation API)
const moderationResult = await openai.moderations.create({
input: content,
})
const categories = moderationResult.results[0].category_scores
// Layer 3: PII detection
const piiDetected = detectPII(content)
// Determine if content is safe
const safe =
!keywordFlags.hasBlockedContent &&
categories.hate < 0.7 &&
categories.violence < 0.7 &&
categories.sexual < 0.7 &&
categories['self-harm'] < 0.5 &&
!piiDetected.hasPII
return {
safe,
categories: {
hate: categories.hate,
violence: categories.violence,
sexual: categories.sexual,
selfHarm: categories['self-harm'],
profanity: keywordFlags.profanityScore,
},
flaggedPhrases: [...keywordFlags.flagged, ...piiDetected.findings],
}
}
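The moderation function assumes an initialized openai client from the official SDK, plus a checkBlockedKeywords helper that isn't shown. A minimal sketch of that helper might look like this (the blocklist and scoring are placeholders you'd replace with your own policy):
// Hypothetical keyword layer used by moderateOutput above
const BLOCKED_KEYWORDS = ['example-banned-term'] // placeholder; maintain this list with your policy team
function checkBlockedKeywords(content: string) {
  const lower = content.toLowerCase()
  const flagged = BLOCKED_KEYWORDS.filter(term => lower.includes(term))
  return {
    hasBlockedContent: flagged.length > 0,
    profanityScore: flagged.length > 0 ? 1 : 0, // crude score: any hit is treated as maximal
    flagged,
  }
}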
PII Detection & Redaction
const PII_PATTERNS = {
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
creditCard: /\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g,
}
function detectPII(text: string) {
const findings: string[] = []
for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
const matches = text.match(pattern)
if (matches) {
findings.push(`${type}: ${matches.length} occurrence(s)`)
}
}
return {
hasPII: findings.length > 0,
findings,
}
}
export function redactPII(text: string): string {
let redacted = text
redacted = redacted.replace(PII_PATTERNS.email, '[EMAIL REDACTED]')
redacted = redacted.replace(PII_PATTERNS.phone, '[PHONE REDACTED]')
redacted = redacted.replace(PII_PATTERNS.ssn, '[SSN REDACTED]')
redacted = redacted.replace(PII_PATTERNS.creditCard, '[CARD REDACTED]')
return redacted
}
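A quick usage example of the redaction helper (these regex patterns are intentionally simple and US-centric; many production systems pair them with an NER-based PII detection service):
const safeText = redactPII('Contact me at jane.doe@example.com or 555-123-4567')
// -> 'Contact me at [EMAIL REDACTED] or [PHONE REDACTED]'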
Pillar 3: Audit Logging & Compliance
Comprehensive logging is critical for compliance, debugging, and continuous improvement.
Structured Audit Logging
interface AuditLog {
id: string
timestamp: Date
userId: string
sessionId: string
input: {
message: string
metadata: Record<string, any>
}
output: {
response: string
model: string
tokens: number
latency: number
}
safety: {
inputScore: ContentSafetyResult
outputScore: ContentSafetyResult
flagged: boolean
}
compliance: {
gdprCompliant: boolean
dataRetention: string
}
}
export class AuditLogger {
async log(entry: AuditLog): Promise<void> {
// Store in database with encryption
await db.auditLogs.create({
data: {
...entry,
// Encrypt sensitive fields
input: encrypt(JSON.stringify(entry.input)),
output: encrypt(JSON.stringify(entry.output)),
},
})
// Also stream to monitoring system
await this.streamToMonitoring(entry)
}
async queryLogs(filter: {
userId?: string
flaggedOnly?: boolean
startDate?: Date
endDate?: Date
}): Promise<AuditLog[]> {
return db.auditLogs.findMany({
where: {
userId: filter.userId,
'safety.flagged': filter.flaggedOnly ? true : undefined,
timestamp: {
gte: filter.startDate,
lte: filter.endDate,
},
},
})
}
}
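The logger calls encrypt and decrypt without showing them. Here's a minimal sketch using Node's built-in crypto module with AES-256-GCM and a key read from an environment variable; both choices are assumptions, not part of the original snippet:
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'
// 32-byte key (64 hex characters) supplied via environment — hypothetical variable name
const KEY = Buffer.from(process.env.AUDIT_LOG_KEY!, 'hex')
export function encrypt(plaintext: string): string {
  const iv = randomBytes(12)
  const cipher = createCipheriv('aes-256-gcm', KEY, iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  // Store IV and auth tag alongside the ciphertext so decryption is self-contained
  return [iv.toString('base64'), cipher.getAuthTag().toString('base64'), ciphertext.toString('base64')].join('.')
}
export function decrypt(payload: string): string {
  const [iv, tag, ciphertext] = payload.split('.').map(part => Buffer.from(part, 'base64'))
  const decipher = createDecipheriv('aes-256-gcm', KEY, iv)
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8')
}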
GDPR Compliance
export class GDPRCompliance {
// Right to erasure (Article 17)
async deleteUserData(userId: string): Promise<void> {
await db.$transaction([
db.auditLogs.deleteMany({ where: { userId } }),
db.userSessions.deleteMany({ where: { userId } }),
db.userPreferences.delete({ where: { userId } }),
])
}
// Data portability (Article 20)
async exportUserData(userId: string): Promise<any> {
const [logs, sessions, preferences] = await Promise.all([
db.auditLogs.findMany({ where: { userId } }),
db.userSessions.findMany({ where: { userId } }),
db.userPreferences.findUnique({ where: { userId } }),
])
return {
exportDate: new Date().toISOString(),
userId,
auditLogs: logs.map(log => ({
...log,
input: decrypt(log.input),
output: decrypt(log.output),
})),
sessions,
preferences,
}
}
}
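Beyond erasure and portability, GDPR's storage-limitation principle means audit logs shouldn't live forever. A small scheduled job can enforce the 30-day retention window used in the complete pipeline below (the function name and schedule are illustrative):
// Hypothetical retention sweep: run daily via cron or a job scheduler
export async function purgeExpiredAuditLogs(retentionDays = 30): Promise<number> {
  const cutoff = new Date(Date.now() - retentionDays * 86_400_000)
  const result = await db.auditLogs.deleteMany({
    where: { timestamp: { lt: cutoff } },
  })
  return result.count
}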
Pillar 4: Continuous Monitoring & Improvement
AI governance is not a "set and forget" system. Continuous monitoring and improvement are essential.
Real-Time Monitoring Dashboard
export class GovernanceMonitor {
async getMetrics(timeRange: '1h' | '24h' | '7d' = '24h') {
const metrics = await db.auditLogs.aggregate({
where: {
timestamp: {
gte: this.getTimeRangeStart(timeRange),
},
},
_count: true,
_avg: {
'safety.inputScore.hate': true,
'safety.inputScore.violence': true,
'output.tokens': true,
'output.latency': true,
},
})
const flaggedCount = await db.auditLogs.count({
where: {
'safety.flagged': true,
timestamp: {
gte: this.getTimeRangeStart(timeRange),
},
},
})
return {
totalRequests: metrics._count,
flaggedRequests: flaggedCount,
      flagRate: metrics._count ? (flaggedCount / metrics._count) * 100 : 0,
averageSafetyScores: {
hate: metrics._avg['safety.inputScore.hate'],
violence: metrics._avg['safety.inputScore.violence'],
},
performance: {
avgTokens: metrics._avg['output.tokens'],
avgLatency: metrics._avg['output.latency'],
},
}
}
// Alert on anomalies
async checkAnomalies(): Promise<Alert[]> {
const alerts: Alert[] = []
const metrics = await this.getMetrics('1h')
// High flag rate
if (metrics.flagRate > 5) {
alerts.push({
severity: 'high',
type: 'high_flag_rate',
message: `Flag rate is ${metrics.flagRate.toFixed(2)}% (threshold: 5%)`,
})
}
// Unusual latency
if (metrics.performance.avgLatency > 3000) {
alerts.push({
severity: 'medium',
type: 'high_latency',
message: `Average latency is ${metrics.performance.avgLatency}ms`,
})
}
return alerts
  }

  private getTimeRangeStart(range: '1h' | '24h' | '7d'): Date {
    const windowMs = { '1h': 3_600_000, '24h': 86_400_000, '7d': 604_800_000 }[range]
    return new Date(Date.now() - windowMs)
  }
}
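checkAnomalies returns Alert objects, but the Alert type isn't defined in the snippet. A minimal shape consistent with how it's used:
interface Alert {
  severity: 'low' | 'medium' | 'high'
  type: string
  message: string
}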
Putting It All Together: The Complete Pipeline
Here's how all four pillars work together in a production request:
export async function handleAIRequest(rawInput: unknown, userId: string): Promise<AIResponse> {
const auditLogger = new AuditLogger()
const startTime = Date.now()
try {
// 1. Rate limiting
if (!checkRateLimit(userId)) {
throw new Error('Rate limit exceeded')
}
// 2. Input validation & safety
const validatedInput = await validateUserInput(rawInput)
const inputSafety = await checkContentSafety(validatedInput.message)
if (!inputSafety.safe) {
await auditLogger.log({
userId,
input: validatedInput,
safety: { inputScore: inputSafety, flagged: true },
// ... other fields
})
throw new Error('Input flagged by safety system')
}
// 3. Generate LLM response
const llmResponse = await generateLLMResponse(validatedInput.message)
// 4. Output moderation
const outputSafety = await moderateOutput(llmResponse.content)
if (!outputSafety.safe) {
await auditLogger.log({
userId,
output: llmResponse,
safety: { outputScore: outputSafety, flagged: true },
// ... other fields
})
return {
success: false,
error: 'Response flagged by moderation system',
}
}
// 5. PII redaction
const sanitizedResponse = redactPII(llmResponse.content)
// 6. Audit logging
await auditLogger.log({
id: generateId(),
timestamp: new Date(),
userId,
sessionId: validatedInput.context?.sessionId || generateId(),
      input: { message: validatedInput.message, metadata: validatedInput.context ?? {} },
output: {
response: sanitizedResponse,
model: llmResponse.model,
tokens: llmResponse.usage.totalTokens,
latency: Date.now() - startTime,
},
safety: {
inputScore: inputSafety,
outputScore: outputSafety,
flagged: false,
},
compliance: {
gdprCompliant: true,
dataRetention: '30days',
},
})
// 7. Return safe response
return {
success: true,
content: sanitizedResponse,
metadata: {
tokens: llmResponse.usage.totalTokens,
latency: Date.now() - startTime,
},
}
} catch (error) {
// Error logging
console.error('AI request failed:', error)
throw error
}
}
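To show where this pipeline sits in a real service, here's a sketch of exposing it through an Express route; the route path and header-based identity are assumptions for the example, not part of the framework itself:
import express from 'express'

const app = express()
app.use(express.json())

app.post('/api/chat', async (req, res) => {
  // Identity resolution is simplified here; real deployments derive userId from an authenticated session
  const userId = req.header('x-user-id') ?? 'anonymous'
  try {
    const result = await handleAIRequest(req.body, userId)
    res.json(result)
  } catch (error) {
    res.status(400).json({ success: false, error: (error as Error).message })
  }
})

app.listen(3000)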
Best Practices & Recommendations
Based on our experience deploying AI governance at scale, here are our top recommendations:
1. Start with Strict Policies, Then Relax
It's easier to relax strict policies than to tighten loose ones. Begin with conservative safety thresholds and adjust based on real-world data.
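One practical way to make "relax later" easy is to keep every threshold in a single versioned config instead of scattering literals through the moderation code. A sketch with deliberately conservative starting values (the numbers are illustrative):
// Hypothetical centralized policy: tighten or loosen in one place, and track changes in version control
export const SAFETY_THRESHOLDS = {
  hate: 0.5, // stricter than the 0.7 used in the moderation example above
  violence: 0.5,
  sexual: 0.5,
  selfHarm: 0.3,
  maxFlagRatePercent: 5, // alerting threshold for the monitoring dashboard
} as const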
2. Layer Your Defenses
No single safety mechanism is perfect. Use multiple layers:
- Input validation
- Prompt engineering
- Output moderation
- Human review for edge cases
3. Monitor Continuously
Set up real-time dashboards and alerts. Review flagged content weekly to identify patterns and improve your filters.
4. Educate Your Team
AI governance is a team sport. Ensure developers, product managers, and legal teams understand the risks and controls.
5. Plan for Incidents
Have a clear incident response plan:
- Who gets alerted?
- How do you shut down the system if needed? (see the kill-switch sketch below)
- What's your communication plan?
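To make the shutdown question concrete, here's a minimal kill-switch sketch. It uses an in-process flag for brevity; a real deployment would more likely read a shared feature flag so every instance shuts off together:
// Hypothetical kill switch: handleAIRequest could call assertAIEnabled() as its first step
let aiEnabled = true

export function disableAI(reason: string): void {
  aiEnabled = false
  console.error(`AI pipeline disabled: ${reason}`)
}

export function assertAIEnabled(): void {
  if (!aiEnabled) {
    throw new Error('AI features are temporarily disabled')
  }
}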
6. Document Everything
Maintain comprehensive documentation of:
- Safety policies and thresholds
- Incident reports and resolutions
- Model versions and changes
- Compliance audits
Conclusion
Building robust AI governance is not optional—it's a fundamental requirement for responsible LLM deployment. By implementing the four pillars outlined in this guide, you can:
- Protect your users from harmful content
- Maintain regulatory compliance
- Build trust with stakeholders
- Enable rapid, confident AI innovation
Remember, AI governance is an ongoing journey, not a destination. As models evolve and new risks emerge, your governance framework must evolve too.
Additional Resources
- NIST AI Risk Management Framework
- OpenAI Safety Best Practices
- EU AI Act Compliance Guide
- OWASP LLM Top 10
This article is part of our Enterprise AI series. Stay tuned for our next post, "Optimizing LLM Performance at Scale", coming next week.