
Deployment Runbook

Historical Context & Merged Sources

The following documents were consolidated into this file for better maintainability:
  • CLOUD_RUN_DEPLOYMENT_GUIDE.md
  • DEPLOYMENT_GUIDE.md

🚀 Cloud Run Deployment Guide (Merged)

Production-Ready FastAPI Backend Deployment

This section covers deploying your FastAPI backend to Google Cloud Run with secure JWT authentication, BigQuery integration, and structured logging.

Quick Deployment

Bash (Linux/Mac):
# Set your project ID
export GCP_PROJECT_ID="your-project-id"

# Run the deployment script
./deploy-to-cloud-run.sh
Manual Deployment:
# 1. Enable APIs
gcloud services enable run.googleapis.com
gcloud services enable secretmanager.googleapis.com
gcloud services enable bigquery.googleapis.com

# 2. Deploy to Cloud Run
gcloud run deploy fastapi-backend \
  --source ./api \
  --region us-central1 \
  --allow-unauthenticated \
  --service-account fastapi-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --set-secrets JWT_SECRET_KEY=jwt-secret:latest \
  --set-env-vars GCP_PROJECT_ID=YOUR_PROJECT_ID,CORS_ORIGINS=https://your-frontend.vercel.app

🚨 Critical Configuration Requirements (Merged)

Frontend (Vercel) Configuration

Environment Variable: NEXT_PUBLIC_API_URL
  • Correct Value: https://payroll-pipeline-cbs-238826317621.us-central1.run.app

AI Chatbot Deployment Runbook (Original)

Last Updated: December 24, 2025
Service: payroll-backend-prod (Cloud Run)
Frontend: Vercel (payroll-pipeline-cbs.vercel.app)

Critical Discovery: Two-Production-Service Architecture

⚠️ IMPORTANT: We have TWO production Cloud Run services by design. We are NOT collapsing them.
Service Responsibilities:
  • Config Plane (payroll-pipeline-cbs-api): Rate Plans, versions, rule authoring, admin configuration UIs
  • System of Record (payroll-backend-prod): Agent identities, assignments, execution, audit-grade logic, AI chatbot
Routing:
  • /api/v1/admin/rate-plans/* → payroll-pipeline-cbs-api (Config Plane)
  • /api/v1/admin/agent-profiles/create → payroll-backend-prod (System of Record)
  • /api/v1/ai/* → payroll-backend-prod (System of Record)
Always verify which Cloud Run service the dashboard points to before deploying.
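
The routing table above can be expressed as a small lookup, which is handy when checking that a request is aimed at the right service. This is a minimal sketch: `service_for_path` is a hypothetical helper name, and it covers only the three routes listed here, so any other path is reported as unknown.

```bash
# Hypothetical helper: map a request path to the Cloud Run service that owns it,
# following the routing table above. Paths not listed there print "unknown".
service_for_path() {
  case "$1" in
    /api/v1/admin/rate-plans/*)          echo "payroll-pipeline-cbs-api" ;;  # Config Plane
    /api/v1/admin/agent-profiles/create) echo "payroll-backend-prod" ;;      # System of Record
    /api/v1/ai/*)                        echo "payroll-backend-prod" ;;      # System of Record
    *)                                   echo "unknown" ;;
  esac
}

# Usage:
#   service_for_path /api/v1/ai/query-public
```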

Backend Deployment (Cloud Run)

Prerequisites

  1. Verify Current Service Configuration:
    gcloud run services describe payroll-backend-prod \
      --region us-central1 \
      --project payroll-bi-gauntlet
    
  2. Confirm Dashboard Points to This Service:
    • Check NEXT_PUBLIC_AI_URL in Vercel environment variables
    • Should be: https://payroll-backend-prod-evndxpcirq-uc.a.run.app
    • If different, update Vercel env vars first
  3. Verify Local Repo State:
    # Confirm you're on the intended commit
    git rev-parse --abbrev-ref HEAD
    git rev-parse HEAD
    git show -s --format=fuller HEAD
    
    # Verify dual router mounts exist in code
    grep -n "include_router.*ai_router" api/main.py
    grep -n 'prefix="/api/v1/ai"' api/main.py
    grep -n 'prefix="/ai"' api/main.py
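
The grep checks above print matches for a human to read. If a pass/fail result is preferable (for example in a pre-deploy script), they could be wrapped as in the sketch below. `has_dual_mounts` is a hypothetical helper name, and it assumes the prefixes appear in api/main.py exactly as `prefix="/api/v1/ai"` and `prefix="/ai"`.

```bash
# Hypothetical helper: succeed only if the given file contains both the
# canonical mount prefix and the legacy alias prefix as fixed strings.
has_dual_mounts() {
  grep -qF 'prefix="/api/v1/ai"' "$1" && grep -qF 'prefix="/ai"' "$1"
}

# Usage from repo root:
#   has_dual_mounts api/main.py && echo "both mounts present" || echo "mount missing"
```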
    

Pre-Deploy Verification

  1. Check Current Revision:
    gcloud run services describe payroll-backend-prod \
      --region us-central1 \
      --project payroll-bi-gauntlet \
      --format="value(status.latestReadyRevisionName)"
    
  2. Check Current Traffic:
    gcloud run services describe payroll-backend-prod \
      --region us-central1 \
      --project payroll-bi-gauntlet \
      --format="value(status.traffic)"
    
  3. Verify Router Mounts in Code:
    • Confirm api/main.py includes both mounts:
      • /api/v1/ai (canonical)
      • /ai (legacy alias)
    • See api/main.py lines 274-280

Deploy Command

⚠️ CRITICAL: Deploy from repo root (not api/ directory), and mirror existing service settings.
# From repo root (c:\Projects\payroll-pipeline-cbs)
gcloud run deploy payroll-backend-prod \
  --source . \
  --region us-central1 \
  --project payroll-bi-gauntlet \
  --allow-unauthenticated \
  --set-env-vars "GCP_PROJECT_ID=payroll-bi-gauntlet,STAGE=prod" \
  --service-account 238826317621-compute@developer.gserviceaccount.com
Alternative: Use deployment script (if available):
./DEPLOY.sh prod
Important Flags (mirror existing service):
  • --allow-unauthenticated: Keep consistent with current service config
  • --service-account: Keep consistent (don’t change unless you know it matches)
  • --set-env-vars: Set required environment variables

Post-Deploy Verification

1. Endpoint Parity (No 404)

Both endpoints must return a non-404 status:
AI_ORIGIN="https://payroll-backend-prod-evndxpcirq-uc.a.run.app"

# Test canonical endpoint
curl -i -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Content-Type: application/json" \
  -d '{"question":"ping"}'
# Expected: 401 (not 404) - endpoint exists, auth required

# Test legacy alias
curl -i -X POST "$AI_ORIGIN/ai/query-public" \
  -H "Content-Type: application/json" \
  -d '{"question":"ping"}'
# Expected: 401 (not 404) - legacy alias works
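
The two curl checks above can be reduced to a status-code comparison so the result is easy to read at a glance. The following is a sketch only: `http_status` and `parity_result` are hypothetical helper names, and the labels they print are illustrative rather than anything the service emits.

```bash
# Hypothetical helper: POST the same ping payload and print only the HTTP status code.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' -X POST "$1" \
    -H "Content-Type: application/json" \
    -d '{"question":"ping"}'
}

# Hypothetical helper: classify a status code for the parity check.
# 404 means the route is not mounted; any other HTTP status means it exists.
parity_result() {
  case "$1" in
    404) echo "MISSING" ;;
    000) echo "UNREACHABLE" ;;  # curl prints 000 when no HTTP response was received
    *)   echo "MOUNTED" ;;
  esac
}

# Usage (requires network access and AI_ORIGIN set as above):
#   for path in /api/v1/ai/query-public /ai/query-public; do
#     echo "$path: $(parity_result "$(http_status "$AI_ORIGIN$path")")"
#   done
```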

2. Router Mount Logs

Check Cloud Run logs for router mount confirmation:
gcloud logging read \
  "resource.type=cloud_run_revision AND \
   resource.labels.service_name=payroll-backend-prod AND \
   textPayload=~'ROUTER.*mounted'" \
  --project payroll-bi-gauntlet \
  --limit 10 \
  --format="value(textPayload)" \
  --order=asc
Expected Log Entries:
[ROUTER] AI query router mounted at /api/v1/ai (canonical)
[ROUTER] AI query router mounted at /ai (legacy alias)

3. Auth Gating

Test 3a: No Token → 401
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Content-Type: application/json" \
  -d '{"question":"test"}'
# Expected: 401 Unauthorized
Test 3b: Invalid Token → 401
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer invalid_token_12345" \
  -H "Content-Type: application/json" \
  -d '{"question":"test"}'
# Expected: 401 Unauthorized (not demo fallback)

4. RBAC Matrix (Requires Valid JWT Tokens)

Test 4a: Agent → Self → 200
# As Kenny Young (agent)
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer <kenny_token>" \
  -H "Content-Type: application/json" \
  -d '{"question":"How much did I make this month?"}'
# Expected: 200 OK with commission data
Test 4b: Agent → Non-Downline → 403
# As Kenny Young, ask about Tommy Dang (not in downline)
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer <kenny_token>" \
  -H "Content-Type: application/json" \
  -d '{"question":"How much did Tommy Dang make?"}'
# Expected: 403 Forbidden
Test 4c: Admin → Any Agent → 200
# As admin
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer <admin_token>" \
  -H "Content-Type: application/json" \
  -d '{"question":"How much did Tommy Dang make?"}'
# Expected: 200 OK (admin can access any agent)

5. Firestore Readiness Verification (Phase 8S+)

Applies when: api/** changes OR any Phase 8S+ ledger work is deployed.
Firestore is used as the exactly-once coordination layer (mutex). BigQuery remains the immutable ledger.
Both backend services must have Firestore access verified post-deploy.
Step 1: Confirm Both Services Deployed
Because api/** changed, both workflows must complete:
  • deploy_cloudrun.yml → payroll-backend-prod
  • deploy_pipeline_api.yml → payroll-pipeline-cbs-api
Confirm both show green in GitHub Actions.
Step 2: Confirm Correct Revision Is Serving
For each service:
gcloud run services describe payroll-backend-prod \
  --region=us-central1 \
  --project=payroll-bi-gauntlet \
  --format="value(status.url)"

gcloud run services describe payroll-pipeline-cbs-api \
  --region=us-central1 \
  --project=payroll-bi-gauntlet \
  --format="value(status.url)"
Call /healthz on each URL (unauthenticated) and confirm git_commit_sha matches the expected commit.
Expected response:
{"ok": true, "git_commit_sha": "<sha or unknown>", "service": "<name>"}
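
Comparing git_commit_sha by eye is error-prone, so the check can be scripted. The sketch below assumes the flat JSON shape shown above and uses sed so it runs without jq. `healthz_sha` and `sha_matches` are hypothetical helper names. A response of "unknown" will simply fail the comparison.

```bash
# Hypothetical helper: extract the git_commit_sha value from a /healthz JSON body.
# Assumes the flat, single-line response shape shown above.
healthz_sha() {
  printf '%s\n' "$1" | sed -n 's/.*"git_commit_sha"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed only if the served SHA ($1 = body) equals the expected SHA ($2).
sha_matches() {
  [ "$(healthz_sha "$1")" = "$2" ]
}

# Usage (requires network access and a local checkout; SERVICE_URL is a placeholder):
#   body="$(curl -s "$SERVICE_URL/healthz")"
#   sha_matches "$body" "$(git rev-parse HEAD)" && echo "revision OK" || echo "SHA mismatch"
```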
Step 3: Firestore Preflight Check
Using an admin JWT:
curl -s -H "Authorization: Bearer <ADMIN_JWT>" \
  https://<SERVICE_URL>/api/v1/admin/firestore-preflight
Expected:
{"ok": true}
Run against:
  • payroll-backend-prod
  • payroll-pipeline-cbs-api
Both must return 200 + {"ok": true}.
Step 4: If Preflight Fails
Verify:
  1. Firestore API enabled:
    gcloud services enable firestore.googleapis.com --project=payroll-bi-gauntlet
    
  2. Firestore DB exists (Native mode, us-central1)
  3. Runtime service account has IAM role: roles/datastore.user
  4. Run make firestore-prereqs to print the gcloud commands, then execute them manually.
Preflight must pass before enabling ledger or /process changes in production.
Note: Frontend (Vercel) deployment is NOT required unless API contracts changed. Backend deploy verification must now include Firestore readiness for all Phase 8S+ work.
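
The Step 3 pass condition (HTTP 200 plus a body containing "ok": true) can be checked mechanically for both services. This is a sketch under assumptions: `preflight_passed` is a hypothetical helper name, and BACKEND_URL, PIPELINE_API_URL, and ADMIN_JWT in the usage comment are placeholders that do not appear elsewhere in this runbook.

```bash
# Hypothetical helper: succeed only when the status ($1) is 200 and the
# response body ($2) contains "ok": true.
preflight_passed() {
  [ "$1" = "200" ] && printf '%s\n' "$2" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}

# Usage (requires network access; all variables below are placeholders):
#   for url in "$BACKEND_URL" "$PIPELINE_API_URL"; do
#     status="$(curl -s -o /tmp/preflight.json -w '%{http_code}' \
#       -H "Authorization: Bearer $ADMIN_JWT" "$url/api/v1/admin/firestore-preflight")"
#     preflight_passed "$status" "$(cat /tmp/preflight.json)" \
#       && echo "$url: OK" || echo "$url: FAILED"
#   done
```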

Troubleshooting

If Deployment Fails

  1. Check Service Account Permissions:
    gcloud projects get-iam-policy payroll-bi-gauntlet \
      --flatten="bindings[].members" \
      --filter="bindings.members:238826317621-compute@developer.gserviceaccount.com"
    
  2. Verify Source Directory:
    • Must run from repo root (contains api/main.py)
    • Not from api/ directory
  3. Check Build Logs:
    gcloud builds list \
      --project payroll-bi-gauntlet \
      --limit 5
    

If Endpoints Return 404

  1. Check Router Mount Logs (see Post-Deploy Verification #2)
  2. Verify Code Includes Dual Mounts:
    grep -A 5 "include_router.*ai_router" api/main.py
    
  3. Check Import Errors:
    gcloud logging read \
      "resource.type=cloud_run_revision AND \
       resource.labels.service_name=payroll-backend-prod AND \
       (textPayload=~'CRITICAL.*router' OR textPayload=~'ImportError')" \
      --project payroll-bi-gauntlet \
      --limit 10
    

Frontend Deployment (Vercel)

Prerequisites

  1. Verify Environment Variable:
    • NEXT_PUBLIC_AI_URL must be set in Vercel dashboard
    • Format: Origin-only (e.g., https://payroll-backend-prod-evndxpcirq-uc.a.run.app)
    • DO NOT include /api/v1 or any path components
  2. Check Current Value:
    • Vercel Dashboard → Project Settings → Environment Variables
    • Verify NEXT_PUBLIC_AI_URL points to payroll-backend-prod service
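
The origin-only rule can be checked before pasting a value into Vercel. The sketch below uses a hypothetical helper, `is_origin_only`. It makes two assumptions of its own: the value must use https, and a trailing slash counts as a path component, which is stricter than the rule as written.

```bash
# Hypothetical helper: succeed only if the value is https://host (optionally with
# a port) and carries no path, query string, fragment, or trailing slash.
is_origin_only() {
  case "$1" in
    https://*) ;;           # must start with https://
    *) return 1 ;;
  esac
  case "${1#https://}" in
    ""|*/*|*\?*|*\#*) return 1 ;;  # empty host, or any path, query, or fragment
    *) return 0 ;;
  esac
}

# Usage:
#   is_origin_only "$NEXT_PUBLIC_AI_URL" && echo "origin-only" || echo "contains a path"
```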

Deploy Procedure

  1. Clear Build Cache (Critical for stale bundle issues):
    • Vercel Dashboard → Deployments → Click “Redeploy” → Uncheck “Use existing Build Cache”
    • Or use Vercel CLI:
      vercel --prod --force
      
  2. Trigger Redeploy:
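    # From repo root: push an empty commit so the rebuild does not depend on
    # having local changes and does not stage unrelated files
    git commit --allow-empty -m "Trigger Vercel rebuild for AI chatbot routing"
    git push origin main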
    
  3. Monitor Deployment:
    • Vercel Dashboard → Deployments
    • Wait for build to complete
    • Check build logs for any errors

Post-Deploy Verification

1. Network Tab Verification

  1. Open browser DevTools → Network tab
  2. Navigate to dashboard and open AI Assistant
  3. Send a test query
  4. Verify Request URL:
    • Correct: https://payroll-backend-prod-evndxpcirq-uc.a.run.app/api/v1/ai/query-public
    • Wrong: https://payroll-backend-prod-.../ai/query-public (missing /api/v1)
    • Wrong: https://fastapi-backend-... (wrong service)
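
The three outcomes listed above can be told apart mechanically when a Request URL is copied out of the Network tab. This is a sketch only: `classify_request_url` is a hypothetical helper name, and the labels it prints are illustrative.

```bash
# Origin of payroll-backend-prod as used throughout this runbook.
EXPECTED_ORIGIN="https://payroll-backend-prod-evndxpcirq-uc.a.run.app"

# Hypothetical helper: classify a Request URL copied from the Network tab.
classify_request_url() {
  case "$1" in
    "$EXPECTED_ORIGIN"/api/v1/ai/query-public) echo "correct" ;;
    "$EXPECTED_ORIGIN"/ai/query-public)        echo "wrong: legacy path (missing /api/v1)" ;;
    "$EXPECTED_ORIGIN"/*)                      echo "wrong: unexpected path" ;;
    *)                                         echo "wrong: different service" ;;
  esac
}

# Usage:
#   classify_request_url "https://payroll-backend-prod-evndxpcirq-uc.a.run.app/ai/query-public"
```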

2. Console Verification

  1. Open browser DevTools → Console tab
  2. Look for [aiClient] logs:
    [aiClient] AI_BASE_URL=https://payroll-backend-prod-evndxpcirq-uc.a.run.app
    [aiClient] POST https://payroll-backend-prod-.../api/v1/ai/query-public
    

3. Endpoint Verification

  • Requests should hit canonical path: /api/v1/ai/query-public
  • Legacy /ai/query-public should be considered temporary compat only
  • If the legacy path is still used, the frontend bundle is stale (clear cache and redeploy)

Troubleshooting

If Frontend Still Calls Wrong Service

  1. Check Environment Variable:
    • Vercel Dashboard → Settings → Environment Variables
    • Verify NEXT_PUBLIC_AI_URL is correct
    • Must be origin-only (no /api/* paths)
  2. Clear Build Cache:
    • Redeploy with “Use existing Build Cache” unchecked
    • Or delete .next directory locally and rebuild
  3. Check Browser Cache:
    • Hard refresh (Ctrl+Shift+R or Cmd+Shift+R)
    • Or clear browser cache

If Frontend Calls Legacy Path (/ai/query-public)

  1. Clear Build Cache (see Deploy Procedure #1)
  2. Verify Code:
    • dashboard/src/lib/aiClient.ts should use buildAiQueryPublicUrl() which constructs /api/v1/ai/query-public
  3. Check Build Logs:
    • Vercel Dashboard → Deployments → Build Logs
    • Look for any errors or warnings

Known Wrong Deploy Paths

⚠️ DO NOT USE THESE (they deploy to wrong service):
  • GitHub Actions / Cloud Build config deploying to payroll-pipeline-cbs
  • GitHub Actions / Cloud Build config deploying to fastapi-backend
  • Any deploy path that doesn’t target payroll-backend-prod
Note: payroll-pipeline-cbs may appear in older docs as a legacy/compat name; use payroll-pipeline-cbs-api for current Config Plane references.
Always verify: Check which Cloud Run service NEXT_PUBLIC_AI_URL points to before deploying.

Legacy Alias Removal Plan

After confirming all frontend deployments use /api/v1/ai/query-public:
  1. Monitor Usage (7-14 days):
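    # Exclude canonical URLs, which also contain the substring /ai/query-public
    gcloud logging read \
      "resource.type=cloud_run_revision AND \
       resource.labels.service_name=payroll-backend-prod AND \
       httpRequest.requestUrl=~'/ai/query-public' AND \
       NOT httpRequest.requestUrl=~'/api/v1/ai/query-public'" \
      --project payroll-bi-gauntlet \
      --limit 100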
    
  2. If Usage Drops to Near-Zero:
    • Remove legacy mount from api/main.py:
      # Remove this line:
      app.include_router(ai_router, prefix="/ai", tags=["AI-legacy"])
      
    • Deploy backend
    • Verify /ai/query-public returns 404 (expected)
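
When judging whether usage has dropped to near-zero, note that the canonical URL /api/v1/ai/query-public also contains the substring /ai/query-public, so a plain substring count overstates legacy traffic. The sketch below counts legacy hits only. `count_legacy_hits` is a hypothetical helper name, and it assumes one request URL per line on stdin.

```bash
# Hypothetical helper: read request URLs on stdin (one per line) and print how
# many hit the legacy alias. Canonical /api/v1/ai/query-public URLs are dropped
# first because they also contain the substring "/ai/query-public".
count_legacy_hits() {
  grep -v '/api/v1/ai/query-public' | grep -c '/ai/query-public'
}

# Usage (requires gcloud access; prints a single number):
#   gcloud logging read \
#     "resource.type=cloud_run_revision AND resource.labels.service_name=payroll-backend-prod" \
#     --project payroll-bi-gauntlet --limit 100 \
#     --format="value(httpRequest.requestUrl)" | count_legacy_hits
```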

Related Documentation

  • docs/guides/AI_CHATBOT_INCIDENT_2025_12_24.md - Complete incident summary
  • docs/reference/AI_CHATBOT_SECURITY.md - Security features and API contract
  • docs/archive/phases/DEPLOYMENT_VERIFICATION.md - Post-deployment verification steps
  • docs/runbooks/DASHBOARD_SMOKE_TEST.md - Smoke test checklist