Skip to main content

Readiness Data Source Trace — Audit Report

Date: 2026-01-28
Goal: Identify exactly where GET readiness?period_label=... and GET businesses?period_label=... pull from, and confirm whether discovery from upload is currently executed.

1) Route Handlers

GET /api/v1/admin/onboarding/businesses/readiness

  • File: api/routes/business_onboarding.py
  • Handler: get_business_readiness_endpoint() (line 911)
  • Query Function: get_business_readiness() (line 932)
  • Query File: api/bigquery/business_onboarding_queries.py (line 44)

GET /api/v1/admin/onboarding/businesses

  • File: api/routes/business_onboarding.py
  • Handler: list_businesses_endpoint() (line 77)
  • Query Function: list_businesses_for_onboarding() (line 96)
  • Query File: api/bigquery/business_onboarding_queries.py (line 285)

2) BigQuery Query Functions & Source Tables

get_business_readiness() (line 44)

Source Tables:
  • config_business_onboarding (line 139, 153) — PRIMARY SOURCE
  • config_business_policy (line 142) — JOIN for OWNER_ROLLUP policies
  • config_business_agent_assignment (line 140) — JOIN for agent assignments
  • config_business_agent_pepm_assignment (line 141) — JOIN for PEPM rows
Query Pattern:
WITH onboarding_businesses AS (
  SELECT DISTINCT co.business_id, co.scope, co.owning_org_id
  FROM config_business_onboarding co
  WHERE co.tenant_id = @tenant_id
    AND co.ignored = FALSE
)
-- LEFT JOINs to assignments, PEPM, policies
Key Finding: Queries ONLY config_business_onboarding as the base table. Does NOT join with staged upload rows or transaction_events_raw.

list_businesses_for_onboarding() (line 285)

Source Tables:
  • config_business_onboarding (line 424) — PRIMARY SOURCE
  • dim_business_mapping (line 423) — JOIN for business identity
  • config_business_agent_assignment (line 425) — JOIN for assignments
  • config_business_agent_pepm_assignment (line 428) — JOIN for PEPM
Query Pattern:
WITH business_groups AS (
  SELECT ...
  FROM dim_business_mapping bm
  FULL OUTER JOIN config_business_onboarding co
    ON bm.business_id = co.business_id
  WHERE ...
)
Key Finding: Uses FULL OUTER JOIN between dim_business_mapping and config_business_onboarding, but still requires businesses to exist in config_business_onboarding to appear in the list (preseeds have last_seen_date IS NULL and pass period filters).

3) Discovery Function Analysis

Existing Discovery Function

Function: refresh_dim_business_mapping_from_stage1() (line 2084)
Source Table: payroll_raw.stage1_snapshots
Target Table: dim_business_mapping (line 2116)
Does NOT upsert to: config_business_onboarding
Invocation Points:
  • Manual endpoint: POST /api/v1/admin/onboarding/businesses/refresh-dim (line 881 in routes)
  • NOT invoked automatically during:
    • Preflight start
    • Readiness refresh
    • Upload mapping completion

Intake Flow Discovery Gap

Upload Flow:
  1. POST /api/v1/intake/upload → Creates batch in ingestion_batches
  2. POST /api/v1/intake/{batch_id}/map → Saves column mapping
  3. POST /api/v1/intake/{batch_id}/process → Writes to transaction_events_raw (line 473 in intake_processor.py)
Missing Step: No automatic discovery that:
  • Queries transaction_events_raw for new businesses
  • Normalizes business labels
  • Upserts into config_business_onboarding

4) Confirmation: Discovery NOT Invoked

Answer: NO — Discovery is NOT automatically invoked. Evidence:
  1. refresh_dim_business_mapping_from_stage1() only upserts to dim_business_mapping, not config_business_onboarding
  2. Intake processor (process_batch) writes to transaction_events_raw but does NOT call any discovery function
  3. Readiness endpoint queries ONLY config_business_onboarding (does not join with transaction_events_raw or stage1_snapshots)
Result: New businesses from uploads appear in transaction_events_raw but do NOT appear in config_business_onboarding, so they are invisible to the readiness and businesses list endpoints.

5) P0 Implementation Required

Function Name: discover_businesses_from_upload()
Location: api/bigquery/business_onboarding_queries.py
Requirements:
  1. Query transaction_events_raw filtered by batch_id (or period_label)
  2. Extract DISTINCT business_name_raw values
  3. Normalize using normalize_business_name() (matches Python pattern)
  4. Generate business_id using generate_business_id() (MD5 hash)
  5. Upsert into config_business_onboarding with:
    • first_seen_date = MIN(period_label) from batch
    • last_seen_date = MAX(period_label) from batch
    • instance_count = COUNT(DISTINCT business_name_raw) per normalized group
    • source = ‘INGESTED’
    • scope = NULL (default, can be set later)
    • ignored = FALSE
Invocation Points:
  • After process_batch completes successfully (in intake_processor.py)
  • Optional: Manual endpoint for batch-level discovery
Debug Endpoint:
  • GET /api/v1/admin/onboarding/debug/discover/{batch_id}
  • Returns raw labels + normalized labels + business_ids (before upsert)

Summary

EndpointQuery FunctionPrimary Source TableJoins Staged Upload?
GET /readinessget_business_readiness()config_business_onboarding❌ NO
GET /businesseslist_businesses_for_onboarding()config_business_onboarding❌ NO
Discovery Status: ❌ NOT AUTOMATIC — Manual refresh-dim endpoint exists but only populates dim_business_mapping, not config_business_onboarding. P0 Fix Required: Implement discover_businesses_from_upload() that queries transaction_events_raw and upserts to config_business_onboarding.