DevOps for AI Applications: From Development to Production

Master the DevOps practices essential for deploying AI applications at scale. Learn CI/CD pipelines, monitoring, and deployment strategies for LLM-powered systems.

Tony O
7 min read

Deploying AI applications is fundamentally different from deploying traditional web apps. You're not just shipping code—you're shipping models, managing inference costs, monitoring for model drift, and ensuring safety guardrails are active at all times.

In this guide, we'll walk through a production-ready DevOps pipeline for AI applications, from local development to multi-region deployment.

The Unique Challenges of AI DevOps

Traditional DevOps practices need adaptation for AI:

Traditional App:

  • Code changes are deterministic
  • Performance is predictable
  • Rollbacks are straightforward
  • Monitoring is well-understood

AI App:

  • Model outputs are probabilistic
  • Costs vary with usage patterns (see the cost-tracking sketch below)
  • Safety issues may emerge gradually
  • Need specialized monitoring (hallucination detection, bias tracking)
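
Of these, cost is the one that bites first: every request consumes tokens, so per-request cost tracking should be wired in from day one. Here is a minimal sketch; the module path, the estimateCost helper, and the idea of passing per-model prices in as configuration are illustrative assumptions rather than any particular SDK's API.

// lib/cost.ts (hypothetical helper; prices are supplied by you as config)
export interface ModelPricing {
  inputPerMillionTokens: number   // USD per 1M prompt tokens
  outputPerMillionTokens: number  // USD per 1M completion tokens
}

export interface UsageRecord {
  model: string
  promptTokens: number
  completionTokens: number
  estimatedCostUsd: number
}

// Estimate the cost of a single LLM call from its token usage.
export function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number,
  pricing: Record<string, ModelPricing>
): UsageRecord {
  const price = pricing[model]
  const estimatedCostUsd = price
    ? (promptTokens * price.inputPerMillionTokens +
        completionTokens * price.outputPerMillionTokens) / 1_000_000
    : 0 // unknown model: record zero and alert on it elsewhere
  return { model, promptTokens, completionTokens, estimatedCostUsd }
}

Persisting these records alongside a request ID lets you attribute spend to features or tenants later.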

Architecture Overview

Our production stack:

┌─────────────────────────────────────────┐
│       Load Balancer (Cloudflare)        │
└────────────────────┬────────────────────┘
                     │
          ┌──────────┴──────────┐
          │                     │
     ┌────▼────┐           ┌────▼────┐
     │  Edge   │           │  Edge   │
     │ Region  │           │ Region  │
     │   US    │           │   EU    │
     └────┬────┘           └────┬────┘
          │                     │
  ┌───────▼─────────────────────▼───────┐
  │      Application Cluster (K8s)      │
  │  ┌───────┐  ┌───────┐  ┌───────┐    │
  │  │  API  │  │ Cache │  │ Queue │    │
  │  └───┬───┘  └───┬───┘  └───┬───┘    │
  │      │          │          │        │
  │  ┌───▼──────────▼──────────▼───┐    │
  │  │  LLM Gateway + Governance   │    │
  │  └──────────────┬──────────────┘    │
  └─────────────────┼───────────────────┘
                    │
           ┌────────▼────────┐
           │  LLM Providers  │
           │ (OpenAI, etc.)  │
           └─────────────────┘
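
The LLM Gateway + Governance box is where policy checks, provider routing, and usage accounting live before any request leaves your infrastructure. A rough sketch of that layer follows; policyAllows, the provider map, and the module path are hypothetical placeholders for illustration, not a real gateway API.

// lib/gateway.ts (hypothetical sketch of the gateway layer)
interface GatewayRequest {
  model: string
  prompt: string
  userId: string
}

// Placeholder policy check; in practice this is your governance layer
// (content rules, per-user rate limits, allowed models, and so on).
async function policyAllows(req: GatewayRequest): Promise<boolean> {
  return req.prompt.length <= 8_000 // example rule only
}

// Map models to upstream providers; extend as you add providers.
const PROVIDER_BASE_URL: Record<string, string> = {
  'gpt-4o-mini': 'https://api.openai.com/v1',
}

export async function routeLlmRequest(req: GatewayRequest): Promise<Response> {
  if (!(await policyAllows(req))) {
    return new Response('Blocked by governance policy', { status: 403 })
  }
  const baseUrl = PROVIDER_BASE_URL[req.model]
  if (!baseUrl) {
    return new Response(`No provider configured for ${req.model}`, { status: 400 })
  }
  // Forward to the provider's chat completions endpoint.
  return fetch(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: req.model,
      messages: [{ role: 'user', content: req.prompt }],
    }),
  })
}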

Docker Configuration

Start with a multi-stage Dockerfile optimized for production:

# Stage 1: Dependencies
FROM node:20-alpine AS deps
WORKDIR /app

# Copy package files
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Stage 2: Builder
FROM node:20-alpine AS builder
WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci

COPY . .
RUN npm run build

# Stage 3: Runner
FROM node:20-alpine AS runner
WORKDIR /app

ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

# Copy built application
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
COPY --from=builder --chown=nextjs:nodejs /app/public ./public

USER nextjs

EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "server.js"]

Docker Compose for Local Development

version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - '3000:3000'
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/ranex
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis
    volumes:
      - ./src:/app/src
      - ./public:/app/public

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: ranex
    ports:
      - '5432:5432'
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    ports:
      - '6379:6379'
    volumes:
      - redis_data:/data

  # Monitoring stack
  prometheus:
    image: prom/prometheus
    ports:
      - '9090:9090'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana
    ports:
      - '3001:3000'
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  postgres_data:
  redis_data:
  prometheus_data:
  grafana_data:
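
The Prometheus service above mounts a ./prometheus.yml that is not shown. A minimal scrape config for this stack might look like the following; the app hostname and 15-second interval are assumptions, and the /api/metrics path matches the metrics route defined later in this post:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'ai-app'
    metrics_path: /api/metrics
    static_configs:
      - targets: ['app:3000']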

CI/CD Pipeline (GitHub Actions)

name: Deploy AI Application

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run type check
        run: npm run type-check

      - name: Run tests
        run: npm test

      - name: Run E2E tests
        run: npm run test:e2e

      - name: Security audit
        run: npm audit --audit-level=high

  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4

      # Buildx is required for the GitHub Actions (gha) layer cache used below
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=sha,prefix={{branch}}-

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          # Build on pull requests, but only push images for branch builds
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: staging

    steps:
      - name: Deploy to staging
        run: |
          echo "Deploying to staging cluster"
          # kubectl apply -f k8s/staging/

      - name: Run smoke tests
        run: |
          echo "Running smoke tests"
          # npm run test:smoke -- --env=staging

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production

    steps:
      - name: Deploy to production
        run: |
          echo "Deploying to production cluster"
          # kubectl apply -f k8s/production/

      - name: Verify deployment
        run: |
          echo "Verifying deployment health"
          # kubectl rollout status deployment/ai-app

Kubernetes Deployment

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-application
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: ai-application
  template:
    metadata:
      labels:
        app: ai-application
    spec:
      containers:
        - name: app
          image: ghcr.io/yourorg/ai-app:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: 'production'
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: openai-api-key
          resources:
            requests:
              memory: '512Mi'
              cpu: '250m'
            limits:
              memory: '1Gi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: ai-application
  namespace: production
spec:
  selector:
    app: ai-application
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-application-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-application
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
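
The Deployment above reads the OpenAI key from an ai-secrets Secret, which must exist before the pods can start. One way to provide it is a plain Secret manifest like the sketch below; for anything beyond a demo, prefer an external secret manager or sealed secrets over committing keys to Git:

# secret.yaml (placeholder value; never commit real keys)
apiVersion: v1
kind: Secret
metadata:
  name: ai-secrets
  namespace: production
type: Opaque
stringData:
  openai-api-key: sk-...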

Monitoring & Observability

Health Check Endpoints

// app/api/health/route.ts
export async function GET() {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    openai: await checkOpenAI(),
  }

  const healthy = Object.values(checks).every(c => c.healthy)

  return Response.json(
    {
      status: healthy ? 'healthy' : 'degraded',
      timestamp: new Date().toISOString(),
      checks,
    },
    {
      status: healthy ? 200 : 503,
    }
  )
}

// app/api/ready/route.ts
export async function GET() {
  // Readiness check (lighter than health)
  const ready = await checkDatabaseConnection()

  return Response.json(
    {
      ready,
    },
    {
      status: ready ? 200 : 503,
    }
  )
}
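
The checkDatabase, checkRedis, and checkOpenAI helpers above are app-specific. As one example, an OpenAI reachability check can be a lightweight authenticated request; the sketch below calls the public /v1/models endpoint with a five-second timeout, and the helper name and return shape are this post's convention rather than an SDK API.

// lib/health.ts (sketch of one health-check helper)
export async function checkOpenAI(): Promise<{ healthy: boolean; error?: string }> {
  try {
    const res = await fetch('https://api.openai.com/v1/models', {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
      signal: AbortSignal.timeout(5000), // fail fast so /health stays responsive
    })
    return { healthy: res.ok, error: res.ok ? undefined : `HTTP ${res.status}` }
  } catch (err) {
    return { healthy: false, error: err instanceof Error ? err.message : 'unknown error' }
  }
}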

Prometheus Metrics

// lib/metrics.ts
import { Registry, Counter, Histogram, Gauge } from 'prom-client'

export const register = new Registry()

export const llmRequests = new Counter({
  name: 'llm_requests_total',
  help: 'Total number of LLM requests',
  labelNames: ['model', 'status'],
  registers: [register],
})

export const llmLatency = new Histogram({
  name: 'llm_latency_seconds',
  help: 'LLM request latency in seconds',
  labelNames: ['model'],
  buckets: [0.1, 0.5, 1, 2, 5, 10],
  registers: [register],
})

export const llmTokens = new Counter({
  name: 'llm_tokens_total',
  help: 'Total tokens consumed',
  labelNames: ['model', 'type'],
  registers: [register],
})

export const activeRequests = new Gauge({
  name: 'llm_active_requests',
  help: 'Number of active LLM requests',
  registers: [register],
})

// app/api/metrics/route.ts
export async function GET() {
  const metrics = await register.metrics()
  return new Response(metrics, {
    headers: {
      'Content-Type': register.contentType,
    },
  })
}
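
These metrics only pay off if every LLM call records them. Below is a sketch of a wrapper that does so; callProvider stands in for whatever client you actually use, and the usage field names follow the OpenAI-style prompt_tokens / completion_tokens convention.

// lib/llm-instrumented.ts (sketch; callProvider is a placeholder for your client)
import { llmRequests, llmLatency, llmTokens, activeRequests } from './metrics'

interface LlmResult {
  text: string
  usage: { prompt_tokens: number; completion_tokens: number }
}

export async function instrumentedCall(
  model: string,
  prompt: string,
  callProvider: (model: string, prompt: string) => Promise<LlmResult>
): Promise<LlmResult> {
  activeRequests.inc()
  const stopTimer = llmLatency.startTimer({ model }) // records duration when called
  try {
    const result = await callProvider(model, prompt)
    llmRequests.inc({ model, status: 'success' })
    llmTokens.inc({ model, type: 'prompt' }, result.usage.prompt_tokens)
    llmTokens.inc({ model, type: 'completion' }, result.usage.completion_tokens)
    return result
  } catch (err) {
    llmRequests.inc({ model, status: 'error' })
    throw err
  } finally {
    stopTimer()
    activeRequests.dec()
  }
}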

Environment Management

# .env.example
NODE_ENV=production

# Database
DATABASE_URL=postgresql://user:pass@host:5432/db

# Redis
REDIS_URL=redis://host:6379

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...

# Monitoring
SENTRY_DSN=https://...
AXIOM_TOKEN=...

# Feature Flags
ENABLE_GPT4=true
ENABLE_STREAMING=true

# Rate Limits
MAX_REQUESTS_PER_MINUTE=60
MAX_TOKENS_PER_REQUEST=2000
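
A missing or misspelled variable is easier to catch at boot than in production traffic. Below is a minimal validation sketch in plain TypeScript with no extra dependencies; the module path and the variable list are assumptions matching the example above.

// lib/env.ts (sketch: fail fast if required configuration is missing)
const REQUIRED_ENV_VARS = ['DATABASE_URL', 'REDIS_URL', 'OPENAI_API_KEY'] as const

export function validateEnv(): void {
  const missing = REQUIRED_ENV_VARS.filter(name => !process.env[name])
  if (missing.length > 0) {
    // Crash at startup so the orchestrator never routes traffic to a broken pod.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`)
  }
}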

Deployment Checklist

Before deploying to production:

  • All tests passing (unit, integration, E2E)
  • Security audit clean
  • Environment variables configured
  • Secrets stored securely (Vault, K8s Secrets)
  • Monitoring and alerting configured
  • Rate limiting enabled
  • Content moderation active
  • Backup and disaster recovery tested
  • Documentation updated
  • Rollback plan documented
  • On-call rotation scheduled

Best Practices

  1. Use staged rollouts - Deploy to 10% of traffic first
  2. Monitor key metrics - Latency, error rate, token usage
  3. Set up alerts - PagerDuty for critical issues
  4. Log everything - Structured logging with correlation IDs
  5. Test disaster recovery - Regular DR drills
  6. Keep dependencies updated - Automated Dependabot PRs
  7. Use feature flags - Enable/disable features without redeploying (see the sketch below)
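
For the feature-flag point, the environment variables from the example above are enough to start with. A minimal sketch follows; dedicated flag services (LaunchDarkly, Unleash) are the natural next step once you need per-user targeting.

// lib/flags.ts (sketch: env-driven feature flags matching the .env example)
const flag = (name: string): boolean => process.env[name] === 'true'

export const featureFlags = {
  gpt4Enabled: flag('ENABLE_GPT4'),
  streamingEnabled: flag('ENABLE_STREAMING'),
}

// Usage: pick the model per request without redeploying.
export function chooseModel(): string {
  return featureFlags.gpt4Enabled ? 'gpt-4' : 'gpt-3.5-turbo'
}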

Conclusion

DevOps for AI applications requires special attention to monitoring, cost management, and safety. By implementing proper CI/CD pipelines, comprehensive monitoring, and robust deployment strategies, you can confidently ship AI features to production.


Looking for AI governance best practices? Check out our guide on Building AI Governance.

About the Author

Tony O

AI Infrastructure Engineer specializing in LLM governance and deployment