Deployment Runbook
Historical Context & Merged Sources
The following documents were consolidated into this file for better maintainability:
- CLOUD_RUN_DEPLOYMENT_GUIDE.md
- DEPLOYMENT_GUIDE.md
🚀 Cloud Run Deployment Guide (Merged)
Production-Ready FastAPI Backend Deployment
This section covers deploying your FastAPI backend to Google Cloud Run with secure JWT authentication, BigQuery integration, and structured logging.
Quick Deployment
Bash (Linux/Mac):
🚨 Critical Configuration Requirements (Merged)
Frontend (Vercel) Configuration
Environment Variable: NEXT_PUBLIC_API_URL
- Correct Value: https://payroll-pipeline-cbs-238826317621.us-central1.run.app
AI Chatbot Deployment Runbook (Original)
Last Updated: December 24, 2025
Service: payroll-backend-prod (Cloud Run)
Frontend: Vercel (payroll-pipeline-cbs.vercel.app)
Critical Discovery: Two-Production-Service Architecture
⚠️ IMPORTANT: We have TWO production Cloud Run services by design. We are NOT collapsing them.
Service Responsibilities:
- Config Plane (payroll-pipeline-cbs-api): Rate Plans, versions, rule authoring, admin configuration UIs
- System of Record (payroll-backend-prod): Agent identities, assignments, execution, audit-grade logic, AI chatbot
Routing:
- /api/v1/admin/rate-plans/* → payroll-pipeline-cbs-api (Config Plane)
- /api/v1/admin/agent-profiles/create → payroll-backend-prod (System of Record)
- /api/v1/ai/* → payroll-backend-prod (System of Record)
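The path-to-service split above can be expressed as a small lookup, which is handy when checking that a given request is being sent to the right Cloud Run service. This is a minimal sketch: the `ROUTES` table and `resolve_service` helper are hypothetical names for illustration, and the wildcard rules are approximated with simple prefix matching.

```python
from typing import List, Optional, Tuple

# Hypothetical routing table mirroring the three rules listed above.
# Wildcard rules ("/*") are approximated as prefix matches.
ROUTES: List[Tuple[str, str]] = [
    ("/api/v1/admin/rate-plans/", "payroll-pipeline-cbs-api"),        # Config Plane
    ("/api/v1/admin/agent-profiles/create", "payroll-backend-prod"),  # System of Record
    ("/api/v1/ai/", "payroll-backend-prod"),                          # System of Record
]


def resolve_service(path: str) -> Optional[str]:
    """Return the service expected to own `path`, or None if no rule matches."""
    for prefix, service in ROUTES:
        if path.startswith(prefix):
            return service
    return None
```

A path that resolves to `None` is not covered by the rules in this runbook and should be checked manually.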
Backend Deployment (Cloud Run)
Prerequisites
- Verify Current Service Configuration:
- Confirm Dashboard Points to This Service:
  - Check NEXT_PUBLIC_AI_URL in Vercel environment variables
  - Should be: https://payroll-backend-prod-evndxpcirq-uc.a.run.app
  - If different, update Vercel env vars first
- Verify Local Repo State:
Pre-Deploy Verification
- Check Current Revision:
- Check Current Traffic:
- Verify Router Mounts in Code:
  - Confirm api/main.py includes both mounts: /api/v1/ai (canonical) and /ai (legacy alias)
  - See api/main.py lines 274-280
Deploy Command
⚠️ CRITICAL: Deploy from repo root (not api/ directory), and mirror existing service settings.
- --allow-unauthenticated: Keep consistent with current service config
- --service-account: Keep consistent (don’t change unless you know it matches)
- --set-env-vars: Set required environment variables
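The original deploy command is not reproduced in this copy of the runbook. As a hedged sketch of the two constraints above (run from the repo root, mirror existing settings), the helper below checks for api/main.py and assembles a `gcloud run deploy` argument list. The function names are hypothetical, and the use of `--source .` and `--region` is an assumption about how the service is deployed; compare against the live service configuration before running anything.

```python
from pathlib import Path
from typing import Dict, List


def is_repo_root(directory: Path) -> bool:
    """The runbook identifies the repo root as the directory containing api/main.py."""
    return (directory / "api" / "main.py").is_file()


def build_deploy_command(
    service: str,
    region: str,
    service_account: str,
    env_vars: Dict[str, str],
    allow_unauthenticated: bool,
) -> List[str]:
    """Assemble (but do not run) a gcloud deploy command mirroring current settings."""
    cmd = [
        "gcloud", "run", "deploy", service,
        "--source", ".",  # assumption: source-based deploy from the current directory
        "--region", region,
        "--service-account", service_account,
    ]
    # Keep the auth flag consistent with the existing service config.
    cmd.append(
        "--allow-unauthenticated" if allow_unauthenticated else "--no-allow-unauthenticated"
    )
    if env_vars:
        pairs = ",".join(f"{key}={value}" for key, value in sorted(env_vars.items()))
        cmd += ["--set-env-vars", pairs]
    return cmd
```

Returning the argument list instead of executing it lets the operator review the exact command before deploying.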
Post-Deploy Verification
1. Endpoint Parity (No 404)
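A minimal sketch of this parity check, assuming the two mounts are probed through their query-public routes. The `check_parity` and `http_status` names are hypothetical; any status other than 404 (for example 401 or 405) is treated as proof that the route is mounted.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Canonical and legacy paths named elsewhere in this runbook.
PATHS = ["/api/v1/ai/query-public", "/ai/query-public"]


def http_status(url: str) -> int:
    """Return the HTTP status of a GET on `url`, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 or 405 still shows the route exists


def check_parity(origin: str, fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each path to True when it does NOT return 404."""
    base = origin.rstrip("/")
    return {path: fetch(base + path) != 404 for path in PATHS}
```

Passing a custom `fetch` makes the check easy to dry-run without touching the live service.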
Both endpoints must return a non-404 status:
2. Router Mount Logs
Check Cloud Run logs for router mount confirmation:
3. Auth Gating
Test 3a: No Token → 401
4. RBAC Matrix (Requires Valid JWT Tokens)
Test 4a: Agent → Self → 200
5. Firestore Readiness Verification (Phase 8S+)
Applies when: api/** changes OR any Phase 8S+ ledger work is deployed.
Firestore is used as the exactly-once coordination layer (mutex). BigQuery remains the immutable ledger.
Both backend services must have Firestore access verified post-deploy.
Step 1: Confirm Both Services Deployed
Because api/** changed, both workflows must complete:
- deploy_cloudrun.yml → payroll-backend-prod
- deploy_pipeline_api.yml → payroll-pipeline-cbs-api
Check /healthz on each URL (unauthenticated) and confirm git_commit_sha matches the expected commit. Expected response:
- payroll-backend-prod
- payroll-pipeline-cbs-api
{"ok": true}.
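The commit comparison against /healthz can be scripted. The sketch below assumes the /healthz body is a JSON object with a top-level `git_commit_sha` field; the field name comes from this runbook, but the rest of the payload shape and both helper names are assumptions.

```python
import json
from typing import Dict, List


def commit_matches(healthz_body: str, expected_sha: str) -> bool:
    """True when the /healthz JSON body reports the expected git_commit_sha."""
    try:
        payload = json.loads(healthz_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("git_commit_sha") == expected_sha


def stale_services(bodies: Dict[str, str], expected_sha: str) -> List[str]:
    """Return the services whose /healthz body does not report `expected_sha`."""
    return [name for name, body in bodies.items() if not commit_matches(body, expected_sha)]
```

An empty result from `stale_services` means both services report the expected commit.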
Step 4: If Preflight Fails
Verify:
- Firestore API enabled:
- Firestore DB exists (Native mode, us-central1)
- Runtime service account has IAM role: roles/datastore.user
Run make firestore-prereqs to print the gcloud commands, then execute them manually.
Troubleshooting
If Deployment Fails
- Check Service Account Permissions:
- Verify Source Directory:
  - Must run from repo root (contains api/main.py)
  - Not from api/ directory
- Check Build Logs:
If Endpoints Return 404
- Check Router Mount Logs (see Post-Deploy Verification #2)
- Verify Code Includes Dual Mounts:
- Check Import Errors:
Frontend Deployment (Vercel)
Prerequisites
- Verify Environment Variable:
  - NEXT_PUBLIC_AI_URL must be set in Vercel dashboard
  - Format: Origin-only (e.g., https://payroll-backend-prod-evndxpcirq-uc.a.run.app)
  - DO NOT include /api/v1 or any path components
- Check Current Value:
  - Vercel Dashboard → Project Settings → Environment Variables
  - Verify NEXT_PUBLIC_AI_URL points to payroll-backend-prod service
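The origin-only rule is easy to get wrong by hand, so a small validator can help before saving the variable. This is an illustrative Python sketch, not the frontend's actual code: `is_origin_only` and `build_ai_query_public_url` are hypothetical helpers, and requiring `https` is an added assumption.

```python
from urllib.parse import urlsplit

CANONICAL_PATH = "/api/v1/ai/query-public"


def is_origin_only(url: str) -> bool:
    """True when `url` is scheme + host (+ optional port) with no path, query, or fragment."""
    parts = urlsplit(url)
    return (
        parts.scheme == "https"  # assumption: only HTTPS origins are acceptable
        and bool(parts.netloc)
        and parts.path in ("", "/")
        and not parts.query
        and not parts.fragment
    )


def build_ai_query_public_url(origin: str) -> str:
    """Append the canonical path to an origin-only base URL."""
    if not is_origin_only(origin):
        raise ValueError(f"NEXT_PUBLIC_AI_URL must be origin-only, got: {origin!r}")
    return origin.rstrip("/") + CANONICAL_PATH
```

Rejecting values that already contain a path avoids doubled segments such as /api/v1/api/v1/ai/query-public.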
Deploy Procedure
- Clear Build Cache (Critical for stale bundle issues):
  - Vercel Dashboard → Deployments → Click “Redeploy” → Uncheck “Use existing Build Cache”
  - Or use Vercel CLI:
- Trigger Redeploy:
- Monitor Deployment:
  - Vercel Dashboard → Deployments
  - Wait for build to complete
  - Check build logs for any errors
Post-Deploy Verification
1. Network Tab Verification
- Open browser DevTools → Network tab
- Navigate to dashboard and open AI Assistant
- Send a test query
- Verify Request URL:
  - ✅ Correct: https://payroll-backend-prod-evndxpcirq-uc.a.run.app/api/v1/ai/query-public
  - ❌ Wrong: https://payroll-backend-prod-.../ai/query-public (missing /api/v1)
  - ❌ Wrong: https://fastapi-backend-... (wrong service)
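The three cases above can be told apart programmatically when reviewing captured request URLs (for example from a HAR export). A minimal sketch, with a hypothetical `classify_request_url` helper and label strings chosen for illustration:

```python
from urllib.parse import urlsplit

EXPECTED_HOST = "payroll-backend-prod-evndxpcirq-uc.a.run.app"
CANONICAL_PATH = "/api/v1/ai/query-public"
LEGACY_PATH = "/ai/query-public"


def classify_request_url(url: str) -> str:
    """Classify a request URL seen in the Network tab."""
    parts = urlsplit(url)
    if parts.hostname != EXPECTED_HOST:
        return "wrong-service"
    if parts.path == CANONICAL_PATH:
        return "ok"
    if parts.path == LEGACY_PATH:
        return "legacy-path"  # missing /api/v1: likely a stale bundle
    return "unexpected-path"
```

The host is checked first because a request to the wrong service is the more serious failure, whatever path it uses.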
2. Console Verification
- Open browser DevTools → Console tab
- Look for [aiClient] logs:
3. Endpoint Verification
- Requests should hit canonical path: /api/v1/ai/query-public
- Legacy /ai/query-public should be considered temporary compat only
- If legacy path still used, frontend bundle is stale (clear cache and redeploy)
Troubleshooting
If Frontend Still Calls Wrong Service
- Check Environment Variable:
  - Vercel Dashboard → Settings → Environment Variables
  - Verify NEXT_PUBLIC_AI_URL is correct
  - Must be origin-only (no /api/* paths)
- Clear Build Cache:
  - Redeploy with “Use existing Build Cache” unchecked
  - Or delete .next directory locally and rebuild
- Check Browser Cache:
  - Hard refresh (Ctrl+Shift+R or Cmd+Shift+R)
  - Or clear browser cache
If Frontend Calls Legacy Path (/ai/query-public)
- Clear Build Cache (see Deploy Procedure #1)
- Verify Code:
  - dashboard/src/lib/aiClient.ts should use buildAiQueryPublicUrl(), which constructs /api/v1/ai/query-public
- Check Build Logs:
  - Vercel Dashboard → Deployments → Build Logs
  - Look for any errors or warnings
Known Wrong Deploy Paths
⚠️ DO NOT USE THESE (they deploy to the wrong service):
- GitHub Actions / Cloud Build config deploying to payroll-pipeline-cbs
- GitHub Actions / Cloud Build config deploying to fastapi-backend
- Any deploy path that doesn’t target payroll-backend-prod
payroll-pipeline-cbs may appear in older docs as a legacy/compat name; use payroll-pipeline-cbs-api for current Config Plane references.
Always verify: Check which Cloud Run service NEXT_PUBLIC_AI_URL points to before deploying.
Legacy Alias Removal Plan
After confirming all frontend deployments use /api/v1/ai/query-public:
- Monitor Usage (7-14 days):
- If Usage Drops to Near-Zero:
  - Remove legacy mount from api/main.py:
  - Deploy backend
  - Verify /ai/query-public returns 404 (expected)
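To judge whether legacy usage has dropped to near-zero, one option is to compute the share of AI requests that still hit the legacy mount over the monitoring window. The sketch below assumes request paths have already been extracted from Cloud Run request logs; the extraction step and the `legacy_share` helper are hypothetical, and the runbook does not define a numeric threshold for "near-zero".

```python
from typing import Iterable

CANONICAL_PREFIX = "/api/v1/ai/"
LEGACY_PREFIX = "/ai/"


def legacy_share(request_paths: Iterable[str]) -> float:
    """Fraction of AI requests still using the legacy /ai mount (0.0 if none seen)."""
    legacy = 0
    canonical = 0
    for path in request_paths:
        if path.startswith(CANONICAL_PREFIX):
            canonical += 1
        elif path.startswith(LEGACY_PREFIX):
            legacy += 1
        # Paths outside both mounts (e.g. /healthz) are ignored.
    total = legacy + canonical
    return legacy / total if total else 0.0
```

Tracking the share rather than a raw count keeps the signal comparable across days with different traffic volumes.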
Related Documentation
- docs/guides/AI_CHATBOT_INCIDENT_2025_12_24.md - Complete incident summary
- docs/reference/AI_CHATBOT_SECURITY.md - Security features and API contract
- docs/archive/phases/DEPLOYMENT_VERIFICATION.md - Post-deployment verification steps
- docs/runbooks/DASHBOARD_SMOKE_TEST.md - Smoke test checklist