Deployment Runbook

Historical Context & Merged Sources

The following documents were consolidated into this file for better maintainability:

CLOUD_RUN_DEPLOYMENT_GUIDE.md
DEPLOYMENT_GUIDE.md

🚀 Cloud Run Deployment Guide (Merged)

Production-Ready FastAPI Backend Deployment

This section covers deploying your FastAPI backend to Google Cloud Run with secure JWT authentication, BigQuery integration, and structured logging.

Quick Deployment

Bash (Linux/Mac):

# Set your project ID
export GCP_PROJECT_ID="your-project-id"

# Run the deployment script
./deploy-to-cloud-run.sh

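Manual Deployment (generic template; fastapi-backend is listed under Known Wrong Deploy Paths, so for this project use the payroll-backend-prod procedure under Backend Deployment (Cloud Run)):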

# 1. Enable APIs
gcloud services enable run.googleapis.com
gcloud services enable secretmanager.googleapis.com
gcloud services enable bigquery.googleapis.com

# 2. Deploy to Cloud Run
gcloud run deploy fastapi-backend \
  --source ./api \
  --region us-central1 \
  --allow-unauthenticated \
  --service-account fastapi-service-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --set-secrets JWT_SECRET_KEY=jwt-secret:latest \
  --set-env-vars GCP_PROJECT_ID=YOUR_PROJECT_ID,CORS_ORIGINS=https://your-frontend.vercel.app

🚨 Critical Configuration Requirements (Merged)

Frontend (Vercel) Configuration

Environment Variable: NEXT_PUBLIC_API_URL

Correct Value: https://payroll-pipeline-cbs-238826317621.us-central1.run.app

AI Chatbot Deployment Runbook (Original)

Last Updated: December 24, 2025
Service: payroll-backend-prod (Cloud Run)
Frontend: Vercel (payroll-pipeline-cbs.vercel.app)

Critical Discovery: Two-Production-Service Architecture

⚠️ IMPORTANT: We have TWO production Cloud Run services by design. We are NOT collapsing them.

Service Responsibilities:

Config Plane (payroll-pipeline-cbs-api): Rate Plans, versions, rule authoring, admin configuration UIs
System of Record (payroll-backend-prod): Agent identities, assignments, execution, audit-grade logic, AI chatbot

Routing:

/api/v1/admin/rate-plans/* → payroll-pipeline-cbs-api (Config Plane)
/api/v1/admin/agent-profiles/create → payroll-backend-prod (System of Record)
/api/v1/ai/* → payroll-backend-prod (System of Record)
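The routing table above can also be written as a small lookup, which is handy in scripts that must pick the right service before calling an endpoint. This is a minimal sketch that covers only the three routes listed. The function name `service_for_path` is hypothetical, and any other path is reported as unknown instead of being guessed.

```bash
# Hypothetical helper: map an API path to the Cloud Run service that owns it,
# following the routing table above. Unknown paths print "unknown" and return 1.
service_for_path() {
  case "$1" in
    /api/v1/admin/rate-plans/*)          echo "payroll-pipeline-cbs-api" ;;  # Config Plane
    /api/v1/admin/agent-profiles/create) echo "payroll-backend-prod" ;;      # System of Record
    /api/v1/ai/*)                        echo "payroll-backend-prod" ;;      # System of Record
    *)                                   echo "unknown"; return 1 ;;
  esac
}

# Usage: service_for_path /api/v1/ai/query-public   # prints payroll-backend-prod
```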

Always verify which Cloud Run service the dashboard points to before deploying.

Backend Deployment (Cloud Run)

Prerequisites

Verify Current Service Configuration:

gcloud run services describe payroll-backend-prod \
  --region us-central1 \
  --project payroll-bi-gauntlet

Confirm Dashboard Points to This Service:
- Check NEXT_PUBLIC_AI_URL in Vercel environment variables
- Should be: https://payroll-backend-prod-evndxpcirq-uc.a.run.app
- If different, update Vercel env vars first

Verify Local Repo State:

# Confirm you're on the intended commit
git rev-parse --abbrev-ref HEAD
git rev-parse HEAD
git show -s --format=fuller HEAD

# Verify dual router mounts exist in code
grep -n "include_router.*ai_router" api/main.py
grep -n 'prefix="/api/v1/ai"' api/main.py
grep -n 'prefix="/ai"' api/main.py
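The grep checks above can be turned into a single pass/fail gate to run before deploying. This is a sketch that assumes the mounts are written literally as `prefix="/api/v1/ai"` and `prefix="/ai"`, matching the patterns above. `check_dual_mounts` is a hypothetical name.

```bash
# Sketch: fail fast if api/main.py is missing either AI router mount.
# Note: 'prefix="/ai"' does not match 'prefix="/api/v1/ai"', because of the closing quote.
check_dual_mounts() {
  local file="${1:-api/main.py}" missing=0
  if ! grep -q 'prefix="/api/v1/ai"' "$file"; then
    echo "missing canonical mount /api/v1/ai in $file" >&2
    missing=1
  fi
  if ! grep -q 'prefix="/ai"' "$file"; then
    echo "missing legacy mount /ai in $file" >&2
    missing=1
  fi
  return "$missing"
}

# Usage (from repo root): check_dual_mounts && echo "both mounts present"
```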

Pre-Deploy Verification

Check Current Revision:

gcloud run services describe payroll-backend-prod \
  --region us-central1 \
  --project payroll-bi-gauntlet \
  --format="value(status.latestReadyRevisionName)"

Check Current Traffic:

gcloud run services describe payroll-backend-prod \
  --region us-central1 \
  --project payroll-bi-gauntlet \
  --format="value(status.traffic)"

Verify Router Mounts in Code:
- Confirm api/main.py includes both mounts:
  - /api/v1/ai (canonical)
  - /ai (legacy alias)
- See api/main.py lines 274-280

Deploy Command

⚠️ CRITICAL: Deploy from repo root (not api/ directory), and mirror existing service settings.

# From repo root (c:\Projects\payroll-pipeline-cbs)
gcloud run deploy payroll-backend-prod \
  --source . \
  --region us-central1 \
  --project payroll-bi-gauntlet \
  --allow-unauthenticated \
  --set-env-vars "GCP_PROJECT_ID=payroll-bi-gauntlet,STAGE=prod" \
  --service-account 238826317621-compute@developer.gserviceaccount.com
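Because the deploy must run from the repo root and not from api/, a guard can precede the command above. This sketch uses this runbook's own marker for the repo root (the presence of api/main.py). It assumes there is no nested api/api/main.py that would make the check pass from inside api/. `assert_repo_root` is a hypothetical name.

```bash
# Sketch: refuse to deploy unless the directory looks like the repo root,
# i.e. it contains api/main.py. From inside api/ that path does not resolve.
assert_repo_root() {
  local dir="${1:-.}"
  if [ ! -f "$dir/api/main.py" ]; then
    echo "not at repo root: $dir/api/main.py not found (are you inside api/?)" >&2
    return 1
  fi
}

# Usage: assert_repo_root && gcloud run deploy payroll-backend-prod --source . \
#          (remaining flags as in the deploy command above)
```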

Alternative: Use deployment script (if available):

./DEPLOY.sh prod

Important Flags (mirror existing service):

--allow-unauthenticated: Keep consistent with current service config
--service-account: Keep consistent (don’t change unless you know it matches)
--set-env-vars: Set required environment variables

Post-Deploy Verification

1. Endpoint Parity (No 404)

Both endpoints must return a status other than 404:

AI_ORIGIN="https://payroll-backend-prod-evndxpcirq-uc.a.run.app"

# Test canonical endpoint
curl -i -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Content-Type: application/json" \
  -d '{"question":"ping"}'
# Expected: 401 (not 404) - endpoint exists, auth required

# Test legacy alias
curl -i -X POST "$AI_ORIGIN/ai/query-public" \
  -H "Content-Type: application/json" \
  -d '{"question":"ping"}'
# Expected: 401 (not 404) - legacy alias works
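The two parity probes above can be wrapped so the check fails loudly when either path returns 404. This is a sketch that reuses the same request body. It also treats curl's `000` status (no connection) as a failure, because that would otherwise slip past a simple "not 404" test. Function names are hypothetical.

```bash
# Sketch: run the endpoint-parity check for both mounts and fail on any 404.
post_status() {
  # Print only the HTTP status code for an unauthenticated POST to $1.
  # "|| true" keeps a connection failure (status 000) from aborting under set -e.
  curl -s -o /dev/null -w '%{http_code}' -X POST "$1" \
    -H "Content-Type: application/json" \
    -d '{"question":"ping"}' || true
}

check_parity() {
  local origin="$1" path code rc=0
  for path in /api/v1/ai/query-public /ai/query-public; do
    code="$(post_status "${origin}${path}")"
    case "$code" in
      404|000) echo "FAIL ${path} -> ${code}"; rc=1 ;;
      *)       echo "OK   ${path} -> ${code}" ;;
    esac
  done
  return "$rc"
}

# Usage: check_parity "$AI_ORIGIN"
```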

2. Router Mount Logs

Check Cloud Run logs for router mount confirmation:

gcloud logging read \
  "resource.type=cloud_run_revision AND \
   resource.labels.service_name=payroll-backend-prod AND \
   textPayload=~'ROUTER.*mounted'" \
  --project payroll-bi-gauntlet \
  --limit 10 \
  --format="value(textPayload)" \
  --order=asc

Expected Log Entries:

[ROUTER] AI query router mounted at /api/v1/ai (canonical)
[ROUTER] AI query router mounted at /ai (legacy alias)
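To check the log output mechanically, pipe the `gcloud logging read` command above into a matcher for both expected lines. This is a minimal sketch. The two strings are copied from the expected entries above, and `check_mount_logs` is a hypothetical name.

```bash
# Sketch: read log text on stdin and confirm both router-mount messages appear.
check_mount_logs() {
  local logs expected
  logs="$(cat)"
  for expected in \
    "[ROUTER] AI query router mounted at /api/v1/ai (canonical)" \
    "[ROUTER] AI query router mounted at /ai (legacy alias)"; do
    # -F: match the message as a fixed string ([ and ( are not regex syntax here)
    if ! printf '%s\n' "$logs" | grep -qF -- "$expected"; then
      echo "missing log line: $expected" >&2
      return 1
    fi
  done
}

# Usage: gcloud logging read "<filter above>" --project payroll-bi-gauntlet \
#          --limit 10 --format="value(textPayload)" | check_mount_logs
```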

3. Auth Gating

Test 3a: No Token → 401

curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Content-Type: application/json" \
  -d '{"question":"test"}'
# Expected: 401 Unauthorized

Test 3b: Invalid Token → 401

curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer invalid_token_12345" \
  -H "Content-Type: application/json" \
  -d '{"question":"test"}'
# Expected: 401 Unauthorized (not demo fallback)

4. RBAC Matrix (Requires Valid JWT Tokens)

Test 4a: Agent → Self → 200

# As Kenny Young (agent)
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer <kenny_token>" \
  -H "Content-Type: application/json" \
  -d '{"question":"How much did I make this month?"}'
# Expected: 200 OK with commission data

Test 4b: Agent → Non-Downline → 403

# As Kenny Young, ask about Tommy Dang (not in downline)
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer <kenny_token>" \
  -H "Content-Type: application/json" \
  -d '{"question":"How much did Tommy Dang make?"}'
# Expected: 403 Forbidden

Test 4c: Admin → Any Agent → 200

# As admin
curl -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
  -H "Authorization: Bearer <admin_token>" \
  -H "Content-Type: application/json" \
  -d '{"question":"How much did Tommy Dang make?"}'
# Expected: 200 OK (admin can access any agent)
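Sections 3 and 4 can be run together as one status table, so a regression in either auth gating or RBAC shows up in a single run. This sketch assumes the agent and admin JWTs are exported in the hypothetical variables `KENNY_TOKEN` and `ADMIN_TOKEN`, and that `AI_ORIGIN` is set as in section 1. An empty token column means no Authorization header is sent. Questions must not contain double quotes, because they are inlined into the JSON body.

```bash
# Sketch: auth-gating + RBAC matrix against $AI_ORIGIN/api/v1/ai/query-public.
status_as() {
  # $1 = bearer token (may be empty), $2 = question text
  local -a auth=()
  if [ -n "$1" ]; then
    auth=(-H "Authorization: Bearer $1")
  fi
  curl -s -o /dev/null -w '%{http_code}' -X POST "$AI_ORIGIN/api/v1/ai/query-public" \
    "${auth[@]}" \
    -H "Content-Type: application/json" \
    -d "{\"question\":\"$2\"}" || true
}

run_matrix() {
  local expected token question actual rc=0
  # Columns: expected status | token | question
  while IFS='|' read -r expected token question; do
    actual="$(status_as "$token" "$question")"
    if [ "$actual" = "$expected" ]; then
      echo "OK   $expected <- $question"
    else
      echo "FAIL expected $expected, got $actual <- $question"
      rc=1
    fi
  done <<EOF
401||test
401|invalid_token_12345|test
200|${KENNY_TOKEN:-}|How much did I make this month?
403|${KENNY_TOKEN:-}|How much did Tommy Dang make?
200|${ADMIN_TOKEN:-}|How much did Tommy Dang make?
EOF
  return "$rc"
}
```

If a token variable is unset, its rows are sent without a token and fail against the expected 200/403, so a missing token cannot be mistaken for a pass.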

5. Firestore Readiness Verification (Phase 8S+)

Applies when: api/** changes OR any Phase 8S+ ledger work is deployed. Firestore is used as the exactly-once coordination layer (mutex). BigQuery remains the immutable ledger.
Both backend services must have Firestore access verified post-deploy.

Step 1 — Confirm Both Services Deployed

Because api/** changed, both workflows must complete:

deploy_cloudrun.yml → payroll-backend-prod
deploy_pipeline_api.yml → payroll-pipeline-cbs-api

Confirm both show green in GitHub Actions.

Step 2 — Confirm Correct Revision Is Serving

For each service:

gcloud run services describe payroll-backend-prod \
  --region=us-central1 \
  --project=payroll-bi-gauntlet \
  --format="value(status.url)"

gcloud run services describe payroll-pipeline-cbs-api \
  --region=us-central1 \
  --project=payroll-bi-gauntlet \
  --format="value(status.url)"

Call /healthz on each URL (unauthenticated) and confirm that git_commit_sha matches the expected commit.

Expected response:

{"ok": true, "git_commit_sha": "<sha or unknown>", "service": "<name>"}
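The SHA comparison can be scripted. This sketch parses the flat JSON shape shown above with `sed`, so it has no jq dependency. A differently formatted response would need a real JSON parser. It assumes the service reports the same SHA form (full or short) that you pass in, and it treats `unknown` as a mismatch. Function names are hypothetical.

```bash
# Sketch: compare the git_commit_sha reported by /healthz with an expected commit.
healthz_sha() {
  # Read a /healthz JSON body on stdin and print the git_commit_sha value.
  sed -n 's/.*"git_commit_sha"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

check_serving_sha() {
  # $1 = service URL, $2 = expected commit SHA (e.g. from: git rev-parse HEAD)
  local actual
  actual="$(curl -s "$1/healthz" | healthz_sha)"
  if [ "$actual" = "$2" ]; then
    echo "OK   $1 serves $actual"
  else
    echo "FAIL $1 serves ${actual:-<none>}, expected $2"
    return 1
  fi
}

# Usage: check_serving_sha "<SERVICE_URL>" "$(git rev-parse HEAD)"
```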

Step 3 — Firestore Preflight Check

Using an admin JWT:

curl -s -H "Authorization: Bearer <ADMIN_JWT>" \
  https://<SERVICE_URL>/api/v1/admin/firestore-preflight

Expected:

{"ok": true}

Run against:

payroll-backend-prod
payroll-pipeline-cbs-api
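Running the preflight against both services can be folded into one command that requires HTTP 200 and an `"ok": true` body from each. This sketch assumes the admin token is in `ADMIN_JWT` and that you pass the two service URLs from the Step 2 describe commands. `preflight` and `preflight_all` are hypothetical names.

```bash
# Sketch: Firestore preflight for each service URL given as an argument.
preflight() {
  local out code body
  # Append the status code on its own line after the response body.
  out="$(curl -s -w '\n%{http_code}' \
    -H "Authorization: Bearer ${ADMIN_JWT}" \
    "$1/api/v1/admin/firestore-preflight" || true)"
  code="${out##*$'\n'}"   # last line: status code
  body="${out%$'\n'*}"    # everything before it: response body
  if [ "$code" = "200" ] && printf '%s' "$body" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'; then
    echo "OK   $1"
  else
    echo "FAIL $1 -> ${code} ${body}"
    return 1
  fi
}

preflight_all() {
  local url rc=0
  for url in "$@"; do
    preflight "$url" || rc=1
  done
  return "$rc"
}

# Usage: preflight_all "<payroll-backend-prod URL>" "<payroll-pipeline-cbs-api URL>"
```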

Both must return 200 + {"ok": true}.

Step 4 — If Preflight Fails

Verify:

Firestore API enabled:

gcloud services enable firestore.googleapis.com --project=payroll-bi-gauntlet

Firestore DB exists (Native mode, us-central1)
Runtime service account has IAM role: roles/datastore.user
Run make firestore-prereqs to print the gcloud commands, then execute them manually.

Preflight must pass before enabling ledger or /process changes in production.

Note: Frontend (Vercel) deployment is NOT required unless API contracts have changed. Backend deploy verification must now include Firestore readiness for all Phase 8S+ work.

Troubleshooting

If Deployment Fails

Check Service Account Permissions:

gcloud projects get-iam-policy payroll-bi-gauntlet \
  --flatten="bindings[].members" \
  --filter="bindings.members:238826317621-compute@developer.gserviceaccount.com"

Verify Source Directory:
- Must run from repo root (contains api/main.py)
- Not from api/ directory

Check Build Logs:

gcloud builds list \
  --project payroll-bi-gauntlet \
  --limit 5

If Endpoints Return 404

Check Router Mount Logs (see Post-Deploy Verification #2)

Verify Code Includes Dual Mounts:

grep -A 5 "include_router.*ai_router" api/main.py

Check Import Errors:

gcloud logging read \
  "resource.type=cloud_run_revision AND \
   resource.labels.service_name=payroll-backend-prod AND \
   (textPayload=~'CRITICAL.*router' OR textPayload=~'ImportError')" \
  --project payroll-bi-gauntlet \
  --limit 10

Frontend Deployment (Vercel)

Prerequisites

Verify Environment Variable:
- NEXT_PUBLIC_AI_URL must be set in Vercel dashboard
- Format: Origin-only (e.g., https://payroll-backend-prod-evndxpcirq-uc.a.run.app)
- DO NOT include /api/v1 or any path components
Check Current Value:
- Vercel Dashboard → Project Settings → Environment Variables
- Verify NEXT_PUBLIC_AI_URL points to payroll-backend-prod service
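The two rules above (origin-only, and pointing at payroll-backend-prod) can be checked before you save the value in Vercel. This is a sketch. The service check is a heuristic based on Cloud Run hostnames beginning with the service name, so review any near-miss names by hand. Function names are hypothetical.

```bash
# Sketch: validate a candidate NEXT_PUBLIC_AI_URL value.
is_origin_only() {
  # https scheme, host only: no path component and no trailing slash.
  case "$1" in
    https://*/*) echo "has a path component: $1" >&2; return 1 ;;
    https://?*)  return 0 ;;
    *)           echo "must start with https://: $1" >&2; return 1 ;;
  esac
}

points_to_backend_prod() {
  # Heuristic: Cloud Run hostnames start with the service name.
  case "$1" in
    https://payroll-backend-prod-*) return 0 ;;
    *) echo "not the payroll-backend-prod service: $1" >&2; return 1 ;;
  esac
}

# Usage: v="https://payroll-backend-prod-evndxpcirq-uc.a.run.app"
#        is_origin_only "$v" && points_to_backend_prod "$v"
```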

Deploy Procedure

Clear Build Cache (Critical for stale bundle issues):
- Vercel Dashboard → Deployments → Click “Redeploy” → Uncheck “Use existing Build Cache”
- Or use Vercel CLI:
  vercel --prod --force

Trigger Redeploy:

# From repo root: an empty commit triggers a rebuild without staging unrelated local changes
git commit --allow-empty -m "Trigger Vercel rebuild for AI chatbot routing"
git push origin main

Monitor Deployment:
- Vercel Dashboard → Deployments
- Wait for build to complete
- Check build logs for any errors

Post-Deploy Verification

1. Network Tab Verification

Open browser DevTools → Network tab
Navigate to dashboard and open AI Assistant
Send a test query
Verify Request URL:
- ✅ Correct: https://payroll-backend-prod-evndxpcirq-uc.a.run.app/api/v1/ai/query-public
- ❌ Wrong: https://payroll-backend-prod-.../ai/query-public (missing /api/v1)
- ❌ Wrong: https://fastapi-backend-... (wrong service)
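When comparing many captured requests, the three cases above can be classified automatically. This is a minimal sketch that encodes only the listed patterns. `classify_request_url` is a hypothetical name. Pattern order matters, because the canonical path also ends in /ai/query-public.

```bash
# Sketch: classify a Request URL copied from the Network tab.
classify_request_url() {
  case "$1" in
    https://payroll-backend-prod-*/api/v1/ai/query-public) echo "correct" ;;
    https://payroll-backend-prod-*/ai/query-public)        echo "wrong: missing /api/v1" ;;
    https://fastapi-backend-*)                             echo "wrong: wrong service" ;;
    *)                                                     echo "unrecognized" ;;
  esac
}
```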

2. Console Verification

Open browser DevTools → Console tab

Look for [aiClient] logs:

[aiClient] AI_BASE_URL=https://payroll-backend-prod-evndxpcirq-uc.a.run.app
[aiClient] POST https://payroll-backend-prod-.../api/v1/ai/query-public

3. Endpoint Verification

Requests should hit canonical path: /api/v1/ai/query-public
Legacy /ai/query-public should be considered temporary compat only
If legacy path still used, frontend bundle is stale (clear cache and redeploy)

Troubleshooting

If Frontend Still Calls Wrong Service

Check Environment Variable:
- Vercel Dashboard → Settings → Environment Variables
- Verify NEXT_PUBLIC_AI_URL is correct
- Must be origin-only (no /api/* paths)
Clear Build Cache:
- Redeploy with “Use existing Build Cache” unchecked
- Or delete .next directory locally and rebuild
Check Browser Cache:
- Hard refresh (Ctrl+Shift+R or Cmd+Shift+R)
- Or clear browser cache

If Frontend Calls Legacy Path (`/ai/query-public`)

Clear Build Cache (see Deploy Procedure #1)
Verify Code:
- dashboard/src/lib/aiClient.ts should use buildAiQueryPublicUrl() which constructs /api/v1/ai/query-public
Check Build Logs:
- Vercel Dashboard → Deployments → Build Logs
- Look for any errors or warnings

Known Wrong Deploy Paths

⚠️ DO NOT USE THESE (they deploy to wrong service):

GitHub Actions / Cloud Build config deploying to payroll-pipeline-cbs
GitHub Actions / Cloud Build config deploying to fastapi-backend
Any deploy path that doesn’t target payroll-backend-prod
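To find deploy configs that still reference the wrong services, a repo scan can help. This sketch assumes the configs live in .github/workflows and in cloudbuild*.yaml/yml files at the repo root; adjust for other locations. "payroll-pipeline-cbs" is flagged only when it is not followed by "-", so payroll-pipeline-cbs-api (the current Config Plane) is not reported. Matches such as a repo path containing the name are possible false positives, so review the output by hand. `find_wrong_deploy_targets` is a hypothetical name.

```bash
# Sketch: print suspicious references; return 1 if any are found.
find_wrong_deploy_targets() {
  local root="${1:-.}" f
  local -a files=()
  for f in "$root"/.github/workflows/*.yml "$root"/.github/workflows/*.yaml \
           "$root"/cloudbuild*.yaml "$root"/cloudbuild*.yml; do
    [ -f "$f" ] && files+=("$f")   # unmatched globs stay literal and are skipped
  done
  if [ "${#files[@]}" -eq 0 ]; then
    return 0
  fi
  if grep -nE 'payroll-pipeline-cbs([^-]|$)|fastapi-backend' "${files[@]}"; then
    return 1
  fi
}
```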

Note: payroll-pipeline-cbs may appear in older docs as a legacy/compat name; use payroll-pipeline-cbs-api for current Config Plane references.

Always verify which Cloud Run service NEXT_PUBLIC_AI_URL points to before deploying.

Legacy Alias Removal Plan

After confirming all frontend deployments use /api/v1/ai/query-public:

Monitor Usage (7-14 days):

gcloud logging read \
  "resource.type=cloud_run_revision AND \
   resource.labels.service_name=payroll-backend-prod AND \
   httpRequest.requestUrl=~'/ai/query-public' AND \
   NOT httpRequest.requestUrl=~'/api/v1/ai/'" \
  --project payroll-bi-gauntlet \
  --limit 100
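To judge whether usage is near zero, it helps to see legacy and canonical counts side by side. This sketch reads request URLs on stdin, one per line, for example from the same query run with `--format="value(httpRequest.requestUrl)"` and without the NOT clause. Canonical URLs are matched first because /ai/query-public is also a suffix of the canonical path. `count_ai_paths` is a hypothetical name.

```bash
# Sketch: count canonical vs legacy AI query requests from URLs on stdin.
count_ai_paths() {
  awk '
    /\/api\/v1\/ai\/query-public/ { canonical++; next }
    /\/ai\/query-public/          { legacy++ }
    END { printf "canonical=%d legacy=%d\n", canonical, legacy }
  '
}
```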

If Usage Drops to Near-Zero:
- Remove legacy mount from api/main.py:
  # Remove this line: app.include_router(ai_router, prefix="/ai", tags=["AI-legacy"])
- Deploy backend
- Verify /ai/query-public returns 404 (expected)

Related Documentation

docs/guides/AI_CHATBOT_INCIDENT_2025_12_24.md - Complete incident summary
docs/reference/AI_CHATBOT_SECURITY.md - Security features and API contract
docs/archive/phases/DEPLOYMENT_VERIFICATION.md - Post-deployment verification steps
docs/runbooks/DASHBOARD_SMOKE_TEST.md - Smoke test checklist
