
P0 Discovery Implementation — Deliverable

Date: 2026-01-28
Status: ✅ COMPLETE

Summary

Problem: New businesses from uploads appear in transaction_events_raw but do NOT appear in config_business_onboarding, which makes them invisible to the readiness and businesses list endpoints.
Solution: Implemented discover_businesses_from_upload(), which queries transaction_events_raw, normalizes business labels, and upserts them into config_business_onboarding.

1) Route File Paths + Handler Names

Readiness Endpoint

  • File: api/routes/business_onboarding.py
  • Handler: get_business_readiness_endpoint() (line 911)
  • Query Function: get_business_readiness() (line 932)
  • Query File: api/bigquery/business_onboarding_queries.py (line 44)

Businesses List Endpoint

  • File: api/routes/business_onboarding.py
  • Handler: list_businesses_endpoint() (line 77)
  • Query Function: list_businesses_for_onboarding() (line 96)
  • Query File: api/bigquery/business_onboarding_queries.py (line 285)

2) Query Function Names + Table Names

get_business_readiness()

Source Tables:
  • config_business_onboarding (line 139) — PRIMARY SOURCE
  • config_business_policy (line 142) — JOIN for OWNER_ROLLUP policies
  • config_business_agent_assignment (line 140) — JOIN for agent assignments
  • config_business_agent_pepm_assignment (line 141) — JOIN for PEPM rows
Joins staged upload rows: ❌ NO

list_businesses_for_onboarding()

Source Tables:
  • config_business_onboarding (line 424) — PRIMARY SOURCE
  • dim_business_mapping (line 423) — JOIN for business identity
  • config_business_agent_assignment (line 425) — JOIN for assignments
  • config_business_agent_pepm_assignment (line 428) — JOIN for PEPM
Joins staged upload rows: ❌ NO

3) Discovery Confirmation

Answer: ❌ NO. Discovery was NOT invoked automatically before the P0 fix.
Evidence:
  1. refresh_dim_business_mapping_from_stage1() only upserts to dim_business_mapping, not config_business_onboarding
  2. Intake processor (process_batch) wrote to transaction_events_raw but did NOT call any discovery function
  3. Readiness endpoint queries ONLY config_business_onboarding (does not join with transaction_events_raw)
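The gap described in the evidence above amounts to a set difference: labels present in transaction_events_raw that have no matching row in config_business_onboarding. A minimal in-memory illustration follows; the label values are made up and the sets merely stand in for the two BigQuery tables.

```python
# Hypothetical stand-ins for the two tables; the real rows live in BigQuery.
raw_event_labels = {"ACME CORP", "GLOBEX LLC", "INITECH"}  # seen in transaction_events_raw
onboarded_labels = {"ACME CORP"}                           # rows in config_business_onboarding


def undiscovered(raw_labels, onboarded):
    """Return labels that uploads introduced but onboarding never registered."""
    return sorted(set(raw_labels) - set(onboarded))


missing = undiscovered(raw_event_labels, onboarded_labels)
print(missing)
```

Before the fix, every label in `missing` was invisible to both endpoints because neither query reads transaction_events_raw.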

4) P0 Implementation

New Function: discover_businesses_from_upload()

Location: api/bigquery/business_onboarding_queries.py (line 2935)
Functionality:
  1. Queries transaction_events_raw filtered by batch_id (or period_label)
  2. Extracts DISTINCT business_name_raw values
  3. Normalizes them using a SQL pattern that mirrors the Python normalize_business_name() function
  4. Generates business_id using MD5 hash (first 16 hex chars)
  5. Upserts into config_business_onboarding with:
    • first_seen_date = MIN(period_label) from batch
    • last_seen_date = MAX(period_label) from batch
    • scope = NULL (default, can be set later via adoption)
    • owning_org_id = NULL (default, can be set later via adoption)
    • ignored = FALSE
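The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the shipped SQL: `normalize_label` is a hypothetical stand-in for normalize_business_name() (its real rules are not shown in this document), and hashing the normalized name alone is also an assumption about what feeds the MD5.

```python
import hashlib
import re


def normalize_label(raw_name):
    """Hypothetical stand-in for normalize_business_name(): trim, collapse spaces, uppercase."""
    return re.sub(r"\s+", " ", raw_name.strip()).upper()


def make_business_id(normalized_name):
    """First 16 hex characters of the MD5 digest (hash input is an assumption)."""
    return hashlib.md5(normalized_name.encode("utf-8")).hexdigest()[:16]


def discover(rows):
    """rows: iterable of (business_name_raw, period_label) pairs from one batch."""
    found = {}
    for raw_name, period_label in rows:
        name = normalize_label(raw_name)
        entry = found.setdefault(name, {
            "raw_name": raw_name,
            "normalized_name": name,
            "business_id": make_business_id(name),
            "first_seen_date": period_label,
            "last_seen_date": period_label,
            "scope": None,          # set later via adoption
            "owning_org_id": None,  # set later via adoption
            "ignored": False,
        })
        # ISO-formatted dates order correctly as plain strings.
        entry["first_seen_date"] = min(entry["first_seen_date"], period_label)
        entry["last_seen_date"] = max(entry["last_seen_date"], period_label)
    return list(found.values())


batch = [
    ("Acme Corp", "2026-01-01"),
    ("  acme   corp ", "2026-02-01"),
    ("Globex LLC", "2026-01-01"),
]
businesses = discover(batch)
```

Because the ID is derived from the normalized label, the two spellings of Acme collapse into one entry whose first and last seen dates span both periods.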
Invocation Points:
  • Automatic: After process_batch completes successfully (in api/routes/intake_processor.py, line 702)
  • Manual: Debug endpoint GET /api/v1/admin/onboarding/debug/discover/{batch_id} (line 910 in business_onboarding.py)
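One way the automatic invocation could be wired is shown below. This is a sketch, not the code in intake_processor.py: `run_discovery_after_batch` and its `discover_fn` parameter are hypothetical names, and treating a discovery error as non-fatal to the batch is an assumption.

```python
import logging

logger = logging.getLogger("intake")


def run_discovery_after_batch(batch_id, discover_fn):
    """Run discovery once a batch has processed, without failing the batch on error.

    discover_fn is a hypothetical callable that takes a batch_id and returns a
    dict with 'discovered_count' and 'upserted_count' keys.
    """
    try:
        result = discover_fn(batch_id)
    except Exception:
        # Assumption: the batch itself already succeeded, so only log the failure.
        logger.exception("[INTAKE] Discovery failed for batch %s", batch_id)
        return None
    logger.info(
        "[INTAKE] Discovered %d businesses, upserted %d to config_business_onboarding",
        result["discovered_count"],
        result["upserted_count"],
    )
    return result
```

Passing the discovery function in as an argument keeps the wrapper testable without BigQuery access.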

Debug Endpoint

Route: GET /api/v1/admin/onboarding/debug/discover/{batch_id}
Handler: debug_discover_businesses_endpoint() (line 910)
Returns: Raw labels + normalized labels + business_ids (before upsert)
Response Format:
{
  "success": true,
  "tenant_id": "...",
  "batch_id": "...",
  "period_label": "...",
  "discovered_count": 5,
  "upserted_count": 5,
  "businesses": [
    {
      "raw_name": "Acme Corp",
      "normalized_name": "ACME CORP",
      "business_id": "a1b2c3d4e5f6a7b8",
      "first_seen_date": "2026-01-01",
      "last_seen_date": "2026-01-01"
    }
  ]
}
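A response in this shape can be sanity-checked with a small helper when testing by hand. The sketch below is hypothetical and not part of the codebase; the rule that upserted_count never exceeds discovered_count is an assumption, while the 16-character hex check follows from the MD5-based ID described earlier.

```python
import re

HEX16 = re.compile(r"^[0-9a-f]{16}$")


def check_discovery_response(payload):
    """Return a list of problems found in a debug-discovery payload (empty = looks sane)."""
    problems = []
    if not payload.get("success"):
        problems.append("success is not true")
    # Assumption: a batch never upserts more rows than it discovered.
    if payload.get("upserted_count", 0) > payload.get("discovered_count", 0):
        problems.append("upserted_count exceeds discovered_count")
    for biz in payload.get("businesses", []):
        label = biz.get("raw_name")
        if not HEX16.match(biz.get("business_id", "")):
            problems.append(f"bad business_id for {label!r}")
        # ISO-formatted dates order correctly as plain strings.
        if biz.get("first_seen_date", "") > biz.get("last_seen_date", ""):
            problems.append(f"first_seen_date after last_seen_date for {label!r}")
    return problems


sample = {
    "success": True,
    "discovered_count": 1,
    "upserted_count": 1,
    "businesses": [
        {
            "raw_name": "Acme Corp",
            "normalized_name": "ACME CORP",
            "business_id": "a1b2c3d4e5f6a7b8",
            "first_seen_date": "2026-01-01",
            "last_seen_date": "2026-01-01",
        }
    ],
}
problems = check_discovery_response(sample)
print(problems)
```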

Files Modified

  1. api/bigquery/business_onboarding_queries.py
    • Added discover_businesses_from_upload() function (line 2935)
  2. api/routes/business_onboarding.py
    • Added import for discover_businesses_from_upload (line 60)
    • Added debug endpoint GET /debug/discover/{batch_id} (line 910)
  3. api/routes/intake_processor.py
    • Added import for discover_businesses_from_upload (line 28)
    • Integrated discovery call after batch processing (line 702)
  4. docs/READINESS_DATA_SOURCE_TRACE.md (new)
    • Complete trace of data sources and discovery gap
  5. docs/P0_DISCOVERY_IMPLEMENTATION.md (this file)
    • Implementation summary and deliverable

Testing

Manual Test Steps

  1. Upload a CSV file via POST /api/v1/intake/upload
  2. Map columns via POST /api/v1/intake/{batch_id}/map
  3. Process batch via POST /api/v1/intake/{batch_id}/process
  4. Verify discovery:
    • Check logs for: "[INTAKE] Discovered X businesses, upserted Y to config_business_onboarding"
    • Query GET /api/v1/admin/onboarding/businesses?period_label=YYYY-MM-01 — new businesses should appear
    • Query GET /api/v1/admin/onboarding/businesses/readiness?period_label=YYYY-MM-01 — new businesses should appear
  5. Debug endpoint test:
    • GET /api/v1/admin/onboarding/debug/discover/{batch_id} — returns raw+normalized labels

Commit Message

feat(P0): implement automatic business discovery from upload batches

- Add discover_businesses_from_upload() to query transaction_events_raw
- Normalize business labels and upsert into config_business_onboarding
- Integrate discovery into intake processor after batch processing
- Add debug endpoint for discovery verification
- Fixes: new businesses from uploads now appear in readiness/businesses endpoints

Files:
- api/bigquery/business_onboarding_queries.py (discover_businesses_from_upload)
- api/routes/business_onboarding.py (debug endpoint)
- api/routes/intake_processor.py (auto-discovery integration)
- docs/READINESS_DATA_SOURCE_TRACE.md (trace document)
- docs/P0_DISCOVERY_IMPLEMENTATION.md (implementation summary)

Next Steps

  1. Deploy backend with P0 discovery changes
  2. Test in production:
    • Upload a test CSV with new businesses
    • Verify businesses appear in onboarding list
    • Verify readiness endpoint includes new businesses
  3. Monitor logs for discovery success/failure rates
  4. Consider: Add discovery retry logic if batch processing succeeds but discovery fails
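For step 4, retry logic could take roughly the following form. This is a sketch under assumptions: `discover_with_retry`, its parameters, and the exponential backoff policy are all hypothetical, and it relies on the upsert being safe to repeat.

```python
import time


def discover_with_retry(discover_fn, batch_id, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call discover_fn(batch_id), retrying with exponential backoff on any exception.

    Re-raises the last error if every attempt fails. The sleep function is
    injectable so tests do not have to wait.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return discover_fn(batch_id)
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Since discovery upserts rather than inserts, repeating a partially completed run should not create duplicate rows, which is what makes a simple retry loop a reasonable first option.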