DevOps for AI Applications: From Development to Production
Master the DevOps practices essential for deploying AI applications at scale. Learn CI/CD pipelines, monitoring, and deployment strategies for LLM-powered systems.
Deploying AI applications is fundamentally different from deploying traditional web apps. You're not just shipping code—you're shipping models, managing inference costs, monitoring for model drift, and ensuring safety guardrails are active at all times.
In this guide, we'll walk through a production-ready DevOps pipeline for AI applications, from local development to multi-region deployment.
The Unique Challenges of AI DevOps
Traditional DevOps practices need adaptation for AI:
Traditional App:
- Code changes are deterministic
- Performance is predictable
- Rollbacks are straightforward
- Monitoring is well-understood
AI App:
- Model outputs are probabilistic
- Costs vary with usage patterns
- Safety issues may emerge gradually
- Need specialized monitoring (hallucination detection, bias tracking)
Architecture Overview
Our production stack:
  ┌───────────────────────────────────────┐
  │      Load Balancer (Cloudflare)       │
  └───────────────────┬───────────────────┘
                      │
           ┌──────────┴──────────┐
           │                     │
      ┌────▼────┐           ┌────▼────┐
      │  Edge   │           │  Edge   │
      │ Region  │           │ Region  │
      │   US    │           │   EU    │
      └────┬────┘           └────┬────┘
           │                     │
    ┌──────▼─────────────────────▼──────┐
    │     Application Cluster (K8s)     │
    │  ┌──────┐  ┌───────┐  ┌───────┐   │
    │  │ API  │  │ Cache │  │ Queue │   │
    │  └──┬───┘  └───┬───┘  └───┬───┘   │
    │     │          │          │       │
    │  ┌──▼──────────▼──────────▼───┐   │
    │  │  LLM Gateway + Governance  │   │
    │  └──────────────┬─────────────┘   │
    └─────────────────┼─────────────────┘
                      │
             ┌────────▼────────┐
             │  LLM Providers  │
             │ (OpenAI, etc.)  │
             └─────────────────┘
Docker Configuration
Start with a multi-stage Dockerfile optimized for production:
# Stage 1: Dependencies
FROM node:20-alpine AS deps
WORKDIR /app
# Copy package manifests and install all dependencies (dev deps are needed for the build)
COPY package.json package-lock.json ./
RUN npm ci

# Stage 2: Builder
FROM node:20-alpine AS builder
WORKDIR /app
# Reuse the dependency layer from the deps stage instead of reinstalling
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
# Stage 3: Runner
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
# Copy built application
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public
USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
CMD ["node", "server.js"]
Docker Compose for Local Development
version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - '3000:3000'
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/ranex
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis
    volumes:
      - ./src:/app/src
      - ./public:/app/public

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: ranex
    ports:
      - '5432:5432'
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    ports:
      - '6379:6379'
    volumes:
      - redis_data:/data

  # Monitoring stack
  prometheus:
    image: prom/prometheus
    ports:
      - '9090:9090'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana
    ports:
      - '3001:3000'
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  postgres_data:
  redis_data:
  prometheus_data:
  grafana_data:
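The Prometheus service mounts a prometheus.yml that isn't shown above. A minimal sketch that scrapes the app's /api/metrics route (defined later in this guide) via the compose service name:

# prometheus.yml (minimal sketch)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'ai-app'
    metrics_path: /api/metrics
    static_configs:
      - targets: ['app:3000']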
CI/CD Pipeline (GitHub Actions)
name: Deploy AI Application

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run type check
        run: npm run type-check

      - name: Run tests
        run: npm test

      - name: Run E2E tests
        run: npm run test:e2e

      - name: Security audit
        run: npm audit --audit-level=high

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=sha,prefix={{branch}}-

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: staging
    steps:
      - name: Deploy to staging
        run: |
          echo "Deploying to staging cluster"
          # kubectl apply -f k8s/staging/

      - name: Run smoke tests
        run: |
          echo "Running smoke tests"
          # npm run test:smoke -- --env=staging

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Deploy to production
        run: |
          echo "Deploying to production cluster"
          # kubectl apply -f k8s/production/

      - name: Verify deployment
        run: |
          echo "Verifying deployment health"
          # kubectl rollout status deployment/ai-application
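The deploy steps above are left as echo stubs. One way the staging step might look, assuming cluster credentials are stored as a base64-encoded KUBE_CONFIG repository secret and that the k8s/staging/ manifests target a staging namespace; both are assumptions, not part of the stack defined elsewhere in this guide:

- name: Deploy to staging
  run: |
    echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > kubeconfig
    export KUBECONFIG="$PWD/kubeconfig"
    kubectl apply -f k8s/staging/
    kubectl rollout status deployment/ai-application -n staging --timeout=180s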
Kubernetes Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-application
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: ai-application
  template:
    metadata:
      labels:
        app: ai-application
    spec:
      containers:
        - name: app
          image: ghcr.io/yourorg/ai-app:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: 'production'
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: openai-api-key
          resources:
            requests:
              memory: '512Mi'
              cpu: '250m'
            limits:
              memory: '1Gi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: ai-application
  namespace: production
spec:
  selector:
    app: ai-application
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-application-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-application
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
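The container reads OPENAI_API_KEY from an ai-secrets Secret, which must exist in the production namespace before the deployment rolls out. A sketch using stringData; in practice, create this from your secret manager or CI pipeline rather than committing real values:

# ai-secrets.yaml (sketch; never commit real keys, inject them from a vault or CI secret)
apiVersion: v1
kind: Secret
metadata:
  name: ai-secrets
  namespace: production
type: Opaque
stringData:
  openai-api-key: '<openai-api-key-here>'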
Monitoring & Observability
Health Check Endpoints
// app/api/health/route.ts
// Aggregated health check: reports "degraded" if any dependency check fails.
// checkDatabase/checkRedis/checkOpenAI are app-specific helpers (a sketch of two of them follows below).
export async function GET() {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    openai: await checkOpenAI(),
  }

  const healthy = Object.values(checks).every(c => c.healthy)

  return Response.json(
    {
      status: healthy ? 'healthy' : 'degraded',
      timestamp: new Date().toISOString(),
      checks,
    },
    {
      status: healthy ? 200 : 503,
    }
  )
}

// app/api/ready/route.ts
export async function GET() {
  // Readiness check (lighter than the health check above)
  const ready = await checkDatabaseConnection()

  return Response.json(
    {
      ready,
    },
    {
      status: ready ? 200 : 503,
    }
  )
}
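The checkDatabase, checkRedis, checkOpenAI, and checkDatabaseConnection helpers are app-specific and not defined in this guide. A hypothetical sketch of some of them, assuming a pg pool and an ioredis client; adapt the clients and queries to your own stack:

// lib/health-checks.ts (hypothetical helpers; adapt to your own clients)
import { Pool } from 'pg'
import Redis from 'ioredis'

const pool = new Pool({ connectionString: process.env.DATABASE_URL })
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379')

export type Check = { healthy: boolean; latencyMs?: number; error?: string }

export async function checkDatabase(): Promise<Check> {
  const start = Date.now()
  try {
    await pool.query('SELECT 1') // cheap round trip to verify connectivity
    return { healthy: true, latencyMs: Date.now() - start }
  } catch (err) {
    return { healthy: false, error: (err as Error).message }
  }
}

export async function checkRedis(): Promise<Check> {
  try {
    const pong = await redis.ping()
    return { healthy: pong === 'PONG' }
  } catch (err) {
    return { healthy: false, error: (err as Error).message }
  }
}

// Lighter readiness variant used by /api/ready
export async function checkDatabaseConnection(): Promise<boolean> {
  return (await checkDatabase()).healthy
}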
Prometheus Metrics
// lib/metrics.ts
import { Registry, Counter, Histogram, Gauge } from 'prom-client'

export const register = new Registry()

export const llmRequests = new Counter({
  name: 'llm_requests_total',
  help: 'Total number of LLM requests',
  labelNames: ['model', 'status'],
  registers: [register],
})

export const llmLatency = new Histogram({
  name: 'llm_latency_seconds',
  help: 'LLM request latency in seconds',
  labelNames: ['model'],
  buckets: [0.1, 0.5, 1, 2, 5, 10],
  registers: [register],
})

export const llmTokens = new Counter({
  name: 'llm_tokens_total',
  help: 'Total tokens consumed',
  labelNames: ['model', 'type'],
  registers: [register],
})

export const activeRequests = new Gauge({
  name: 'llm_active_requests',
  help: 'Number of active LLM requests',
  registers: [register],
})

// app/api/metrics/route.ts
import { register } from '@/lib/metrics' // adjust the import path to your project layout

export async function GET() {
  const metrics = await register.metrics()

  return new Response(metrics, {
    headers: {
      'Content-Type': register.contentType,
    },
  })
}
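None of these metrics record themselves; the LLM call path has to update them. A sketch of a small wrapper that does so; the call argument stands in for whatever provider client you use, and the usage field shape is an assumption:

// lib/with-llm-metrics.ts (illustrative wiring around the metrics above)
import { llmRequests, llmLatency, llmTokens, activeRequests } from './metrics'

type Usage = { promptTokens?: number; completionTokens?: number }

export async function withLlmMetrics<T extends { usage?: Usage }>(
  model: string,
  call: () => Promise<T>
): Promise<T> {
  activeRequests.inc()
  const stopTimer = llmLatency.startTimer({ model })
  try {
    const result = await call()
    llmRequests.inc({ model, status: 'success' })
    llmTokens.inc({ model, type: 'prompt' }, result.usage?.promptTokens ?? 0)
    llmTokens.inc({ model, type: 'completion' }, result.usage?.completionTokens ?? 0)
    return result
  } catch (err) {
    llmRequests.inc({ model, status: 'error' })
    throw err
  } finally {
    stopTimer()         // records request duration in the histogram
    activeRequests.dec()
  }
}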
Environment Management
# .env.example
NODE_ENV=production
# Database
DATABASE_URL=postgresql://user:pass@host:5432/db
# Redis
REDIS_URL=redis://host:6379
# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
# Monitoring
SENTRY_DSN=https://...
AXIOM_TOKEN=...
# Feature Flags
ENABLE_GPT4=true
ENABLE_STREAMING=true
# Rate Limits
MAX_REQUESTS_PER_MINUTE=60
MAX_TOKENS_PER_REQUEST=2000
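It helps to validate these variables at startup so a missing key fails the deployment immediately rather than surfacing on the first request. A minimal sketch using zod; zod is an assumption here, not a dependency used elsewhere in this guide:

// lib/env.ts (minimal sketch; assumes zod is installed)
import { z } from 'zod'

const envSchema = z.object({
  NODE_ENV: z.enum(['development', 'test', 'production']),
  DATABASE_URL: z.string().min(1),
  REDIS_URL: z.string().min(1),
  OPENAI_API_KEY: z.string().min(1),
  MAX_REQUESTS_PER_MINUTE: z.coerce.number().default(60),
  MAX_TOKENS_PER_REQUEST: z.coerce.number().default(2000),
})

// Throws (and crashes the pod) if a required variable is missing or malformed
export const env = envSchema.parse(process.env)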
Deployment Checklist
Before deploying to production:
- All tests passing (unit, integration, E2E)
- Security audit clean
- Environment variables configured
- Secrets stored securely (Vault, K8s Secrets)
- Monitoring and alerting configured
- Rate limiting enabled
- Content moderation active
- Backup and disaster recovery tested
- Documentation updated
- Rollback plan documented
- On-call rotation scheduled
Best Practices
- Use staged rollouts - Deploy to 10% of traffic first
- Monitor key metrics - Latency, error rate, token usage
- Set up alerts - PagerDuty for critical issues
- Log everything - Structured logging with correlation IDs
- Test disaster recovery - Regular DR drills
- Keep dependencies updated - Automated Dependabot PRs
- Use feature flags - Enable/disable features without deployment (a minimal sketch follows below)
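For the last point, a minimal env-driven flag helper wired to the ENABLE_GPT4 and ENABLE_STREAMING variables from the environment section; this is a sketch only, and you'd swap in a dedicated flag service (LaunchDarkly, Unleash, etc.) when you need per-user targeting or runtime toggles:

// lib/flags.ts (minimal env-driven flags; not a full flag system)
const flags = {
  gpt4: process.env.ENABLE_GPT4 === 'true',
  streaming: process.env.ENABLE_STREAMING === 'true',
} as const

export type FlagName = keyof typeof flags

export function isEnabled(flag: FlagName): boolean {
  return flags[flag]
}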
Conclusion
DevOps for AI applications requires special attention to monitoring, cost management, and safety. By implementing proper CI/CD pipelines, comprehensive monitoring, and robust deployment strategies, you can confidently ship AI features to production.
Looking for AI governance best practices? Check out our guide on Building AI Governance.