Cloud Run Deployment Timeout Fix
Problem
GitHub Actions workflow “Deploy FastAPI Backend to Cloud Run” fails at “Wait for deployment to be ready” with:Deployment did not become ready within 600 seconds- Revision is created but never reaches Ready status
Root Cause Analysis
The workflow now includes auto-diagnostics that will identify the exact failure reason on timeout:- Revision Conditions - Shows why the revision isn’t Ready
- Container Config - Shows image, env vars, ports, concurrency settings
- Recent Logs - Last 200 lines of logs from the failing revision
Common Issues (from diagnostics)
- PORT binding - Container must listen on
0.0.0.0:$PORT(usually 8080) - Missing env vars - JWT_SECRET_KEY, GCP_PROJECT_ID, CORS_ORIGINS, etc.
- Import errors - ModuleNotFoundError in logs
- Missing files - FileNotFoundError (auth_users.yaml, etc.)
- Permission errors - BigQuery/secret access (sa-worker service account)
- Startup timeout - App takes > 600s to start (unlikely)
Changes Made
1. Enhanced Workflow Diagnostics (.github/workflows/deploy_cloudrun.yml)
Added comprehensive auto-diagnostics on deployment timeout:
- Gets latest created revision (may differ from latest ready)
- Shows revision conditions (why it’s not Ready)
- Shows container config (image, env vars, ports)
- Shows last 200 log lines
- Lists common issues to check
2. Diagnostic Script (scripts/diagnose_cloudrun_deployment.sh)
Standalone script to diagnose deployment failures:
3. Production Test Script (scripts/test_prod_save_endpoint.ps1)
PowerShell script to test the save endpoint after deployment:
Verification Steps
Step 1: Check Dockerfile Configuration
The Dockerfile is correctly configured:- ✅ Uses
--host 0.0.0.0 --port ${PORT:-8080} - ✅ Sets
ENV PORT=8080as default - ✅ Uses
execform for proper signal handling
Step 2: Trigger Deployment
The workflow auto-triggers on push tomain when api/** files change, or can be manually triggered.
Step 3: Check Diagnostics on Failure
If deployment times out, the workflow will automatically:- Show revision conditions
- Show container config
- Show recent logs (200 lines)
- List common issues
Step 4: Verify Deployment Success
After successful deployment:- Check revision is Ready
- Verify deployed SHA matches commit (>= 7eebf15)
- Test save endpoint:
Expected Outcomes
After fix:- ✅ Cloud Run revision becomes Ready
- ✅ Deployed SHA >= 7eebf15
- ✅ Two successful prod saves (one via script, one via UI)
- ✅ CI logs print revision failure reason automatically if it happens again
Next Steps
- Commit these changes to
main - Trigger deployment (auto or manual)
- If timeout occurs, check the auto-diagnostics output
- Apply fix based on diagnostic output
- Redeploy and verify save endpoint works
Files Changed
.github/workflows/deploy_cloudrun.yml- Enhanced diagnosticsscripts/diagnose_cloudrun_deployment.sh- New diagnostic scriptscripts/test_prod_save_endpoint.ps1- New test scriptdocs/CLOUDRUN_DEPLOY_TIMEOUT_FIX.md- This document