Readiness Data Source Trace — Audit Report
Date: 2026-01-28
Goal: Identify exactly where GET readiness?period_label=... and GET businesses?period_label=... pull from, and confirm whether discovery from upload is currently executed.
1) Route Handlers
GET /api/v1/admin/onboarding/businesses/readiness
- File: api/routes/business_onboarding.py
- Handler: get_business_readiness_endpoint() (line 911)
- Query Function: get_business_readiness() (line 932)
- Query File: api/bigquery/business_onboarding_queries.py (line 44)
GET /api/v1/admin/onboarding/businesses
- File: api/routes/business_onboarding.py
- Handler: list_businesses_endpoint() (line 77)
- Query Function: list_businesses_for_onboarding() (line 96)
- Query File: api/bigquery/business_onboarding_queries.py (line 285)
2) BigQuery Query Functions & Source Tables
get_business_readiness() (line 44)
Source Tables:
- config_business_onboarding (lines 139, 153): PRIMARY SOURCE
- config_business_policy (line 142): JOIN for OWNER_ROLLUP policies
- config_business_agent_assignment (line 140): JOIN for agent assignments
- config_business_agent_pepm_assignment (line 141): JOIN for PEPM rows
Uses config_business_onboarding as the base table. Does NOT join with staged upload rows or transaction_events_raw.
list_businesses_for_onboarding() (line 285)
Source Tables:
- config_business_onboarding (line 424): PRIMARY SOURCE
- dim_business_mapping (line 423): JOIN for business identity
- config_business_agent_assignment (line 425): JOIN for assignments
- config_business_agent_pepm_assignment (line 428): JOIN for PEPM
Uses a FULL OUTER JOIN between dim_business_mapping and config_business_onboarding, but still requires businesses to exist in config_business_onboarding to appear in the list (preseeds have last_seen_date IS NULL and pass period filters).
3) Discovery Function Analysis
Existing Discovery Function
Function: refresh_dim_business_mapping_from_stage1() (line 2084)
Source Table: payroll_raw.stage1_snapshots
Target Table: dim_business_mapping (line 2116)
Does NOT upsert to: config_business_onboarding
Invocation Points:
- Manual endpoint: POST /api/v1/admin/onboarding/businesses/refresh-dim (line 881 in routes)
- NOT invoked automatically during:
  - Preflight start
  - Readiness refresh
  - Upload mapping completion
Intake Flow Discovery Gap
Upload Flow:
- POST /api/v1/intake/upload → Creates batch in ingestion_batches
- POST /api/v1/intake/{batch_id}/map → Saves column mapping
- POST /api/v1/intake/{batch_id}/process → Writes to transaction_events_raw (line 473 in intake_processor.py)
Missing Step: nothing in this flow does the following:
- Queries transaction_events_raw for new businesses
- Normalizes business labels
- Upserts into config_business_onboarding
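As a rough illustration of where the missing step would sit, the sketch below runs a discovery callable only after processing succeeds. The names `BatchResult` and `process_then_discover`, and the assumption that processing reports success as a boolean, are hypothetical and not taken from intake_processor.py; the real `process_batch` signature may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchResult:
    """Hypothetical outcome of handling one intake batch."""
    batch_id: str
    success: bool
    discovered: List[str] = field(default_factory=list)


def process_then_discover(
    batch_id: str,
    process_batch: Callable[[str], bool],
    discover: Callable[[str], List[str]],
) -> BatchResult:
    """Run processing, then run discovery only if processing succeeded."""
    ok = process_batch(batch_id)
    if not ok:
        # Processing failed, so discovery is skipped for this batch.
        return BatchResult(batch_id=batch_id, success=False)
    # Processing succeeded: discover businesses introduced by this batch.
    return BatchResult(batch_id=batch_id, success=True, discovered=discover(batch_id))
```

Passing the two steps in as callables keeps the sketch independent of BigQuery and makes the ordering easy to test in isolation.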
4) Confirmation: Discovery NOT Invoked
Answer: NO. Discovery is NOT automatically invoked.
Evidence:
- refresh_dim_business_mapping_from_stage1() only upserts to dim_business_mapping, not config_business_onboarding
- Intake processor (process_batch) writes to transaction_events_raw but does NOT call any discovery function
- Readiness endpoint queries ONLY config_business_onboarding (does not join with transaction_events_raw or stage1_snapshots)
Result: businesses from uploads exist in transaction_events_raw but do NOT appear in config_business_onboarding, so they are invisible to the readiness and businesses list endpoints.
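The gap can be pictured as a set difference between uploaded labels and onboarding rows. The following is a minimal in-memory sketch, not the production query: `find_invisible_businesses` is a hypothetical helper, and the inline normalization (collapse whitespace, uppercase) is an assumption standing in for whatever the real normalizer does.

```python
from typing import Iterable, List, Set


def find_invisible_businesses(
    raw_labels: Iterable[str],
    onboarding_names: Set[str],
) -> List[str]:
    """Return normalized upload labels with no onboarding row, in first-seen order."""
    seen: Set[str] = set()
    missing: List[str] = []
    for label in raw_labels:
        # Assumed normalization: collapse whitespace and uppercase.
        key = " ".join(label.split()).upper()
        if key in seen:
            continue
        seen.add(key)
        if key not in onboarding_names:
            missing.append(key)
    return missing
```

Any label this returns would be present in the uploaded data yet absent from both endpoint responses.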
5) P0 Implementation Required
Function Name: discover_businesses_from_upload()
Location: api/bigquery/business_onboarding_queries.py
Requirements:
- Query transaction_events_raw filtered by batch_id (or period_label)
- Extract DISTINCT business_name_raw values
- Normalize using normalize_business_name() (matches Python pattern)
- Generate business_id using generate_business_id() (MD5 hash)
- Upsert into config_business_onboarding with:
  - first_seen_date = MIN(period_label) from batch
  - last_seen_date = MAX(period_label) from batch
  - instance_count = COUNT(DISTINCT business_name_raw) per normalized group
  - source = 'INGESTED'
  - scope = NULL (default, can be set later)
  - ignored = FALSE
Invocation:
- After process_batch completes successfully (in intake_processor.py)
- Optional: Manual endpoint for batch-level discovery
Debug Endpoint:
- GET /api/v1/admin/onboarding/debug/discover/{batch_id}
- Returns raw labels + normalized labels + business_ids (before upsert)
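The aggregation in the requirements can be sketched in plain Python before committing to a BigQuery MERGE. This is only an illustration under stated assumptions: the bodies of `normalize_business_name()` and `generate_business_id()` below are stand-ins (the real implementations are not shown in this report), `build_discovery_rows` and the `business_name` field are hypothetical, and period labels are assumed to sort correctly as strings (for example `2026-01`).

```python
import hashlib
from typing import Dict, Iterable, List, Set, Tuple


def normalize_business_name(raw: str) -> str:
    """Stand-in normalizer (assumption): trim, collapse whitespace, uppercase."""
    return " ".join(raw.split()).upper()


def generate_business_id(normalized: str) -> str:
    """Stand-in ID generator (assumption): MD5 hex digest of the normalized name."""
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_discovery_rows(events: Iterable[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Aggregate (business_name_raw, period_label) pairs into upsert-ready rows."""
    groups: Dict[str, Dict[str, Set[str]]] = {}
    for raw_name, period_label in events:
        key = normalize_business_name(raw_name)
        group = groups.setdefault(key, {"raw": set(), "periods": set()})
        group["raw"].add(raw_name)
        group["periods"].add(period_label)

    rows: List[Dict[str, object]] = []
    for name in sorted(groups):
        group = groups[name]
        rows.append({
            "business_id": generate_business_id(name),
            "business_name": name,                    # hypothetical column name
            "first_seen_date": min(group["periods"]),  # MIN(period_label)
            "last_seen_date": max(group["periods"]),   # MAX(period_label)
            "instance_count": len(group["raw"]),       # distinct raw labels per group
            "source": "INGESTED",
            "scope": None,
            "ignored": False,
        })
    return rows
```

Because the function returns rows rather than writing them, the same output could back the debug endpoint's preview of raw labels, normalized labels, and business_ids before any upsert happens.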
Summary
| Endpoint | Query Function | Primary Source Table | Joins Staged Upload? |
|---|---|---|---|
| GET /readiness | get_business_readiness() | config_business_onboarding | ❌ NO |
| GET /businesses | list_businesses_for_onboarding() | config_business_onboarding | ❌ NO |
Discovery Gap: The refresh-dim endpoint exists but only populates dim_business_mapping, not config_business_onboarding.
P0 Fix Required: Implement discover_businesses_from_upload() that queries transaction_events_raw and upserts to config_business_onboarding.