Root Cause Analysis - Readiness Not Seeing Saved Config
Summary
- Investigates why readiness stayed at 0/113 ready after a successful save call.
- Identifies backend service mismatch/split-brain as the leading hypothesis.
- Correlates network timing, endpoint paths, and repeated readiness outcomes.
- Enumerates alternate hypotheses (org scope, effective date, write/query lag).
- Provides targeted next-step checks across headers, logs, and direct BigQuery queries.
When to use this
- When save operations succeed but readiness does not reflect configuration.
- During cross-service or environment routing investigations.
- When validating org scope, period scope, and data visibility assumptions.
- As a template for structured production forensic triage.
What you’ll walk away with
- A ranked set of root-cause hypotheses with supporting evidence.
- A practical investigation sequence to confirm backend/data-path issues.
- A concise action plan to move from symptom to verified fix.
Business ID: a7ac092572c6fa5c
Period Label: 2025-12-01
🔍 Critical Finding
Backend Service Mismatch Detected: Console shows requests going to:
- payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app
- payroll-backend-prod-238826317621.us-central1.run.app
📊 Network Analysis
Save Request
- URL: PUT https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app/api/v1/admin/onboarding/businesses/a7ac092572c6fa5c/save
- Status: 200 OK
- Delay: 14 seconds (unusually long)
Readiness Requests (3 attempts)
- URL: GET https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app/api/v1/intake/ingestion-wizard/readiness?batch_id=0489e906-d84a-429a-a9b1-384505e99b05&period_label=2025-12-01
- Status: All 200 OK
- Result: 0/113 ready (all 3 attempts)
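The host comparison behind the "service mismatch" finding can be made mechanical. A minimal sketch, using only the URLs and hostnames captured above; the helper names are illustrative and not part of any existing tooling:

```python
from urllib.parse import urlsplit

# URLs and hostnames copied from the network capture in this report.
SAVE_URL = (
    "https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app"
    "/api/v1/admin/onboarding/businesses/a7ac092572c6fa5c/save"
)
READINESS_URL = (
    "https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app"
    "/api/v1/intake/ingestion-wizard/readiness"
    "?batch_id=0489e906-d84a-429a-a9b1-384505e99b05&period_label=2025-12-01"
)
OTHER_HOST = "payroll-backend-prod-238826317621.us-central1.run.app"


def hosts_by_request(urls: dict[str, str]) -> dict[str, str]:
    """Map each labelled request URL to its (lowercased) hostname."""
    return {label: (urlsplit(url).hostname or "") for label, url in urls.items()}


def same_host(urls: dict[str, str]) -> bool:
    """True when every request in the capture targets one hostname."""
    return len(set(hosts_by_request(urls).values())) == 1


if __name__ == "__main__":
    captured = {"save": SAVE_URL, "readiness": READINESS_URL}
    print(hosts_by_request(captured))
    print("same host:", same_host(captured))
    print("matches other host:", OTHER_HOST in hosts_by_request(captured).values())
```

For the captured pair this reports a single shared host, so any split would have to sit behind that URL (dataset, revision, or instance) rather than in front-end routing.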
🎯 Root Cause Hypothesis
Hypothesis 1: Backend Service Split-Brain (MOST LIKELY)
Issue:
- Save writes to payroll-pipeline-cbs-api service
- Readiness queries payroll-pipeline-cbs-api service
- But they might be using different BigQuery datasets or different backend instances
- Both requests go to same service URL (good)
- But readiness still shows 0/113 ready after save succeeds
- 14-second delay on save suggests backend might be slow or using different data source
Hypothesis 2: Org ID Scope Mismatch
Issue:
- Save writes with org_id=NULL (platform-scope)
- Readiness queries with org_id=cbs-main (from x-org-id header)
- Readiness filter: (aa.org_id = @org_id OR aa.org_id IS NULL) should still match
- But if save writes to wrong org scope, readiness won’t see it
- Verify save request includes x-org-id: cbs-main header
- Verify save writes with correct org_id scope
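To reason about which stored values the org filter accepts, the predicate can be mirrored outside SQL. This is a sketch of the filter as quoted above, not the backend's actual code. The empty-string and other-org cases are assumed failure modes to rule out, not observed behaviour:

```python
from typing import Optional


def org_scope_matches(row_org_id: Optional[str], request_org_id: str) -> bool:
    """Mirror of (aa.org_id = @org_id OR aa.org_id IS NULL).

    None stands in for SQL NULL. Any other stored value must equal the
    request's org id exactly for the row to be visible.
    """
    return row_org_id is None or row_org_id == request_org_id


if __name__ == "__main__":
    # Candidate stored values for the saved row (assumptions for illustration).
    for stored in (None, "cbs-main", "", "other-org"):
        print(repr(stored), "->", org_scope_matches(stored, "cbs-main"))
```

A row saved with a true NULL or with cbs-main passes. A row saved with an empty string or a different org id does not, which is the case the header and log checks need to exclude.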
Hypothesis 3: Effective Date Mismatch
Issue:
- Save writes with effective_start_date: 2025-12-01
- Readiness queries with as_of_date: 2025-12-01 (from period_label)
- Readiness filter: effective_start_date <= @as_of_date
- 2025-12-01 <= 2025-12-01 should be TRUE
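The date comparison is inclusive at the boundary, which a quick check confirms. A minimal sketch assuming both values are plain ISO calendar dates, as the request fields above suggest. If the backend stores either side as a timestamp, this check does not apply as written:

```python
from datetime import date


def effective_on(effective_start_date: str, as_of_date: str) -> bool:
    """Mirror of: effective_start_date <= @as_of_date (dates as YYYY-MM-DD)."""
    return date.fromisoformat(effective_start_date) <= date.fromisoformat(as_of_date)


if __name__ == "__main__":
    # Values from the save body and the readiness period_label.
    print(effective_on("2025-12-01", "2025-12-01"))
```

With equal dates the filter passes, so Hypothesis 3 only holds if the stored value differs from what the save body sent.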
Hypothesis 4: BigQuery Write → Query Lag
Issue:
- Save writes to BigQuery
- Readiness queries BigQuery immediately
- The write may take longer than 11 seconds to become queryable
- 3 retries over 11 seconds
- Still shows 0/113 ready
- Suggests write not queryable even after delay
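One way to separate lag from a write that never landed is to poll well past the 11-second window and record every observation. A minimal sketch: `fetch_ready` is a hypothetical callable that returns the ready count from the readiness endpoint, and the delay schedule is an assumption, not the client's actual retry policy:

```python
import time
from typing import Callable, Optional


def poll_ready_count(
    fetch_ready: Callable[[], int],
    delays_s: list[float],
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[Optional[int], list[int]]:
    """Fetch once immediately, then once after each delay.

    Returns (index of the first attempt with ready > 0, all observations);
    the index is None when no attempt ever saw a ready count above zero.
    """
    observed: list[int] = []
    for attempt in range(len(delays_s) + 1):
        if attempt > 0:
            sleep(delays_s[attempt - 1])
        ready = fetch_ready()
        observed.append(ready)
        if ready > 0:
            return attempt, observed
    return None, observed


if __name__ == "__main__":
    # Stand-in for the real endpoint: always reports 0 ready.
    first_hit, seen = poll_ready_count(lambda: 0, [5, 15, 30], sleep=lambda s: None)
    print(first_hit, seen)
```

If the count is still zero after a minute or more, lag becomes much less plausible than a scope or data-path problem.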
🔬 Investigation Steps
Step 1: Verify Save Request Details
Check PUT request:
- Headers: x-org-id: cbs-main (should be present)
- Body: period_label: 2025-12-01
- Body: effective_start_date: 2025-12-01
- Body: mode: AGENT_PEPM
- Body: agent_pepm_lines: [...]
Step 2: Verify Readiness Query Details
Check GET request:
- Query params: period_label=2025-12-01 ✅ (confirmed)
- Query params: batch_id=0489e906-d84a-429a-a9b1-384505e99b05 ✅ (confirmed)
- Headers: x-org-id: cbs-main (need to verify)
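The header checks in Steps 1 and 2 can be run against headers copied out of the browser's network panel. A minimal sketch: the header dictionaries are whatever was captured, and the function names are illustrative:

```python
from typing import Optional


def find_header(headers: dict[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup; returns None when the header is absent."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value.strip()
    return None


def org_header_problems(
    save_headers: dict[str, str],
    readiness_headers: dict[str, str],
    expected_org: str = "cbs-main",
) -> list[str]:
    """List every way the two requests deviate from the expected x-org-id."""
    problems: list[str] = []
    for label, headers in (("save", save_headers), ("readiness", readiness_headers)):
        value = find_header(headers, "x-org-id")
        if value is None:
            problems.append(f"{label}: x-org-id header missing")
        elif value != expected_org:
            problems.append(f"{label}: x-org-id is {value!r}, expected {expected_org!r}")
    return problems


if __name__ == "__main__":
    # Example input only; paste the real captured headers here.
    print(org_header_problems({"X-Org-Id": "cbs-main"}, {"accept": "application/json"}))
```

An empty result means both requests carried the expected org id, which narrows Hypothesis 2 to how the backend persists the scope.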
Step 3: Check Backend Logs
Verify:
- Save write succeeded to BigQuery
- Readiness query includes business a7ac092572c6fa5c
- Readiness query uses correct as_of_date and org_id filters
Step 4: Direct BigQuery Query
Run query:
🎯 Most Likely Root Cause
Backend Readiness Query Not Seeing Saved Config Due To:
- Org ID Scope Mismatch:
  - Save writes with org_id=NULL (platform-scope)
  - Readiness queries with org_id=cbs-main filter
  - Filter should match (OR org_id IS NULL), but might not be working
- OR Backend Write → Query Lag:
  - BigQuery write takes >11 seconds to be queryable
  - Unlikely but possible
- OR Different Backend Instances:
  - Save and readiness hit different backend instances
  - Different BigQuery connections or datasets
✅ Next Actions
- Check save request headers/body (verify org_id and period_label)
- Check readiness request headers (verify x-org-id matches)
- Check backend logs for save write and readiness query
- Run direct BigQuery query to verify config was written
- Verify backend service (is payroll-pipeline-cbs-api the correct service?)
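For the direct BigQuery check listed above, a parameterized query could be assembled along these lines. This is a sketch under stated assumptions: the table name and the business_id column are hypothetical placeholders, since the real schema is not shown in this report. Only the org_id and effective_start_date predicates come from the readiness filter quoted earlier.

```python
# HYPOTHETICAL table name: replace with the real config table.
CONFIG_TABLE = "your-project.your_dataset.your_config_table"

# Values taken from this report.
PARAMS = {
    "business_id": "a7ac092572c6fa5c",
    "org_id": "cbs-main",
    "as_of_date": "2025-12-01",
}


def build_config_check_query(table: str = CONFIG_TABLE) -> str:
    """Build SQL that applies the same scope and date filters as readiness."""
    return (
        "SELECT aa.*\n"
        f"FROM `{table}` AS aa\n"
        # business_id is an assumed column name for the business key.
        "WHERE aa.business_id = @business_id\n"
        "  AND (aa.org_id = @org_id OR aa.org_id IS NULL)\n"
        "  AND effective_start_date <= @as_of_date"
    )


if __name__ == "__main__":
    print(build_config_check_query())
    print(PARAMS)
```

Running the same query once with and once without the org and date predicates shows whether the row is missing entirely or merely filtered out. The values in PARAMS should be bound as query parameters by whichever client runs the query, not interpolated into the string.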
Status: ⚠️ RETRY LOGIC WORKS, BUT BACKEND READINESS NOT SEEING SAVED CONFIG - NEED BACKEND INVESTIGATION