
P0 Discovery Runtime Validation Steps

Purpose

Validate at runtime that the deployed P0 discovery fix executes correctly and remains idempotent.

When to run this

  • Immediately after deploying 65c90d3
  • When investigating regressions in discovery or readiness behavior

Prerequisites

  • Cloud Run deployment access and log access in the payroll-bi-gauntlet project
  • Admin JWT and intake UI access
  • BigQuery query access for duplicate checks

Inputs

  • Target commit SHA (65c90d3)
  • Cloud Run service (payroll-pipeline-cbs-api)
  • Uploaded batch_id
  • Tenant (creative_benefit_strategies)

Procedure

Deployment status

Commit SHA: 65c90d3
Full SHA: 65c90d3351bc3981491bf8a047828300683f3da2
Deployment Command Executed:
gcloud run deploy payroll-pipeline-cbs-api --source . --region=us-central1 ...
Status: Deployment initiated (may take 5-10 minutes to complete)

1) Verify deployment completed

# Check latest revision (sorted newest first so --limit=1 returns it)
gcloud run revisions list \
  --service=payroll-pipeline-cbs-api \
  --region=us-central1 \
  --project=payroll-bi-gauntlet \
  --sort-by="~metadata.creationTimestamp" \
  --limit=1 \
  --format="value(name,metadata.labels.'git-sha')"

# Verify GIT_COMMIT_SHA env var
gcloud run services describe payroll-pipeline-cbs-api \
  --region=us-central1 \
  --project=payroll-bi-gauntlet \
  --format="value(spec.template.spec.containers[0].env)"
Expected:
  • Latest revision shows git-sha=65c90d3
  • Environment variable GIT_COMMIT_SHA=65c90d3
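The checks in this step can be scripted. The following is a minimal bash sketch. It assumes the revisions command above prints the revision name and the `git-sha` label on one line, separated by whitespace. `TARGET_SHA`, `sha_matches`, and `check_revision_line` are hypothetical names and are not part of the service's codebase. Prefix matching lets the short SHA (65c90d3) match a full 40-character SHA, in case `GIT_COMMIT_SHA` holds the full value.

```shell
# Hypothetical helpers for step 1; not part of the payroll-pipeline-cbs-api repo.
TARGET_SHA="65c90d3"

# Succeeds when one SHA is a prefix of the other, so the short SHA matches the
# full SHA and vice versa. Values shorter than 7 characters never match.
sha_matches() {
  local expected="$1" actual="$2"
  (( ${#expected} >= 7 && ${#actual} >= 7 )) || return 1
  [[ "$actual" == "$expected"* || "$expected" == "$actual"* ]]
}

# Takes one "REVISION<whitespace>GIT_SHA" line from the revisions list command
# and reports whether its git-sha label matches TARGET_SHA.
check_revision_line() {
  local line="$1" rev sha
  read -r rev sha <<< "$line"
  if sha_matches "$TARGET_SHA" "$sha"; then
    echo "OK: $rev is labeled git-sha=$sha"
  else
    echo "MISMATCH: $rev is labeled git-sha=${sha:-<none>}, expected $TARGET_SHA" >&2
    return 1
  fi
}
```

Pass the single line printed by the revisions command above to `check_revision_line`. A non-zero exit corresponds to failure mode 1 (revision mismatch).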

2) Upload December payroll

  1. Navigate to intake UI
  2. Upload December payroll file
  3. Capture batch_id from response or logs
Example batch_id format: UUID string (e.g., a1b2c3d4-e5f6-7890-abcd-ef1234567890)
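A quick format check on the captured batch_id catches a truncated or mis-pasted value before it is sent to the debug endpoint. The following is a minimal bash sketch. It assumes batch IDs are always canonical hyphenated UUIDs, as in the example above. `is_batch_id` is a hypothetical helper.

```shell
# Hypothetical helper: succeeds only for a canonical 8-4-4-4-12 hex UUID,
# the batch_id format shown in the example above.
is_batch_id() {
  local re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  [[ "$1" =~ $re ]]
}
```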

3) Call debug endpoint (dry-run)

Endpoint:
GET /api/v1/admin/onboarding/debug/discover/{batch_id}
Headers:
Authorization: Bearer {admin_jwt_token}
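The request can be issued with curl. The following is a minimal bash sketch. `SERVICE_URL` (the Cloud Run service's base URL) and `ADMIN_JWT` are hypothetical environment variables that you set yourself. `debug_discover_url` and `call_debug_discover` are illustrative helper names.

```shell
# Hypothetical helpers for step 3. SERVICE_URL and ADMIN_JWT are assumptions;
# this runbook does not define where they come from.

# Builds the debug-discovery URL, tolerating a trailing slash on the base URL.
debug_discover_url() {
  local base="${1%/}" batch_id="$2"
  printf '%s/api/v1/admin/onboarding/debug/discover/%s\n' "$base" "$batch_id"
}

# GETs the endpoint with the admin JWT and prints the JSON body.
# --fail makes curl exit non-zero on HTTP errors such as 401/403.
call_debug_discover() {
  local batch_id="$1"
  if [[ -z "${SERVICE_URL:-}" || -z "${ADMIN_JWT:-}" ]]; then
    echo "set SERVICE_URL and ADMIN_JWT first" >&2
    return 1
  fi
  curl --silent --show-error --fail \
    -H "Authorization: Bearer ${ADMIN_JWT}" \
    "$(debug_discover_url "$SERVICE_URL" "$batch_id")"
}
```

One way to fill in `SERVICE_URL` is `gcloud run services describe payroll-pipeline-cbs-api --region=us-central1 --project=payroll-bi-gauntlet --format="value(status.url)"`. An HTTP 401 or 403 makes `curl --fail` exit non-zero, which corresponds to failure mode 2 below.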
Expected Response:
{
  "success": true,
  "tenant_id": "creative_benefit_strategies",
  "batch_id": "{batch_id}",
  "discovered_count": 3,
  "dim_upserted_count": 0,
  "onboarding_upserted_count": 0,
  "invalid_period_count": 0,
  "dry_run": true,
  "businesses": [
    {
      "raw_name": "AIC",
      "normalized_name": "AIC",
      "business_id": "{16-char-hex}",
      "first_seen_date": "2026-12-01",
      "last_seen_date": "2026-12-01"
    },
    {
      "raw_name": "Motor Medics",
      "normalized_name": "MOTOR MEDICS",
      "business_id": "{16-char-hex}",
      "first_seen_date": "2026-12-01",
      "last_seen_date": "2026-12-01"
    },
    {
      "raw_name": "Auto Intensive Care of Savannah",
      "normalized_name": "AUTO INTENSIVE CARE OF SAVANNAH",
      "business_id": "{16-char-hex}",
      "first_seen_date": "2026-12-01",
      "last_seen_date": "2026-12-01"
    }
  ]
}
Validation Checklist:
  • discovered_count > 0
  • At least one of AIC, Motor Medics, or Auto Intensive Care of Savannah is present
  • invalid_period_count is present and equals 0
  • dry_run is true
  • dim_upserted_count is 0 (dry-run, no writes)
  • onboarding_upserted_count is 0 (dry-run, no writes)
Capture: JSON response snippet showing at least one discovered business
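The checklist can be applied to a saved response with jq. This sketch assumes a recent jq (1.5 or later, for `any/1`) is installed and that the response body is piped in on stdin. `check_debug_response` is a hypothetical helper. It exits 0 only when every item passes, and a missing field counts as a failure.

```shell
# Hypothetical validator for the step 3 checklist. Assumes jq is installed.
# jq -e exits non-zero if the final result is false/null or if jq errors
# (for example when .businesses is missing and cannot be iterated).
check_debug_response() {
  jq -e '
    .success == true
    and .dry_run == true
    and .discovered_count > 0
    and .invalid_period_count == 0
    and .dim_upserted_count == 0
    and .onboarding_upserted_count == 0
    and ([.businesses[].raw_name]
         | any(. == "AIC" or . == "Motor Medics"
               or . == "Auto Intensive Care of Savannah"))
  ' > /dev/null
}
```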

4) Verify discovery auto-runs on intake

Discovery should run automatically after intake processing completes (via api/routes/intake_processor.py line 707). Check logs:
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=payroll-pipeline-cbs-api AND textPayload=~'Discover'" \
  --limit=10 \
  --format=json \
  --project=payroll-bi-gauntlet
Expected: Log entries showing:
  • [ONBOARDING] Discovering businesses from upload: tenant=..., batch_id=...
  • [ONBOARDING] Discovered X distinct businesses from upload
  • [ONBOARDING] Discovery completed: tenant=..., discovered=X, dim_upserted=X, onboarding_upserted=X
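The counters can be extracted from the completion log line with a minimal bash sketch. It assumes the message format matches the expected entry above exactly, including field order and the `, ` separators. `parse_discovery_completed` is a hypothetical helper.

```shell
# Hypothetical helper for step 4: parses a "Discovery completed" log message
# (format taken from the expected log entries above) and prints its counters.
parse_discovery_completed() {
  local line="$1"
  local re='Discovery completed: tenant=([^,]+), discovered=([0-9]+), dim_upserted=([0-9]+), onboarding_upserted=([0-9]+)'
  if [[ "$line" =~ $re ]]; then
    printf 'tenant=%s discovered=%s dim_upserted=%s onboarding_upserted=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    return 1   # not a completion message
  fi
}
```

Pass it one log message at a time. For example, run the `gcloud logging read` command above with `--format="value(textPayload)"` instead of `--format=json`, and feed each output line to the helper.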

5) Preflight/readiness check

  1. Navigate to Preflight / Readiness UI
  2. Refresh readiness check
  3. Expected: Newly discovered businesses appear under “Not ready for ingest”
Capture: Screenshot or log confirming at least one newly discovered business appears in “Not ready” list

6) Idempotency check (no duplicates)

Option A: Re-upload same December payroll
  • Upload same file again
  • Refresh Preflight
  • Expected: No duplicate entries
Option B: Re-trigger discovery for same batch_id
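  • Call debug endpoint again with same batch_id
  • Expected: Same discovered_count. Because the debug endpoint runs as a dry-run (dry_run: true), this confirms that discovery output is stable but does not exercise the write path. Use Option A together with the BigQuery checks below to confirm that no duplicate rows are written.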
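For Option B, you can save both responses to files and compare a small fingerprint of each. This bash sketch assumes jq is installed. It treats two runs as equivalent when `discovered_count` and the sorted list of `business_id` values match. `discovery_fingerprint` and `same_discovery_result` are hypothetical helpers.

```shell
# Hypothetical helpers for Option B. Assumes jq is installed.

# Prints a compact, order-independent fingerprint of one saved response file.
discovery_fingerprint() {
  jq -c '{count: .discovered_count, ids: ([.businesses[].business_id] | sort)}' "$1"
}

# Succeeds when two saved responses have the same count and business_id set.
same_discovery_result() {
  local first second
  first=$(discovery_fingerprint "$1") || return 1
  second=$(discovery_fingerprint "$2") || return 1
  [[ "$first" == "$second" ]]
}
```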
BigQuery Verification:
-- Check for duplicates in config_business_onboarding
SELECT tenant_id, business_id, COUNT(*) as count
FROM `payroll-bi-gauntlet.payroll_analytics.config_business_onboarding`
WHERE tenant_id = 'creative_benefit_strategies'
GROUP BY tenant_id, business_id
HAVING COUNT(*) > 1;

-- Check for duplicates in dim_business_mapping
SELECT tenant_id, normalized_name, COUNT(*) as count
FROM `payroll-bi-gauntlet.payroll_analytics.dim_business_mapping`
WHERE tenant_id = 'creative_benefit_strategies'
GROUP BY tenant_id, normalized_name
HAVING COUNT(*) > 1;
Expected: 0 rows returned (no duplicates)
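The two duplicate checks can also be run from the shell with the `bq` CLI. This is a minimal bash sketch with three assumptions: `bq` is authenticated, `payroll-bi-gauntlet` is an acceptable project to run query jobs in, and `bq` writes only the CSV result (header plus rows) to stdout. `count_csv_rows` and `assert_no_duplicates` are hypothetical helpers.

```shell
# Hypothetical helpers for the BigQuery duplicate checks.

# Counts data rows in CSV read from stdin, skipping the header and blank lines.
count_csv_rows() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Runs one standard-SQL duplicate-check query.
# Returns 0 if no rows come back, 1 if duplicates exist, 2 if bq fails.
assert_no_duplicates() {
  local label="$1" sql="$2" out rows
  if ! out=$(bq --project_id=payroll-bi-gauntlet query --use_legacy_sql=false --format=csv "$sql"); then
    echo "ERROR: bq query failed for $label" >&2
    return 2
  fi
  rows=$(count_csv_rows <<< "$out")
  if (( rows > 0 )); then
    echo "FAIL: $label has $rows duplicated key(s)" >&2
    printf '%s\n' "$out" >&2
    return 1
  fi
  echo "PASS: $label has no duplicates"
}
```

Call it once per query. For example, put the `config_business_onboarding` query above in `$SQL` and run `assert_no_duplicates config_business_onboarding "$SQL"`. Exit status 1 corresponds to failure mode 4 (duplicates found).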

Verification

P0 Fully Closed if ALL are true:
  1. ✅ Debug endpoint returns discovered businesses (dry-run verified)
  2. ✅ Discovery auto-runs on intake completion
  3. ✅ Preflight/readiness shows missing businesses as “Not ready for ingest”
  4. ✅ No duplicates on re-run (idempotency verified)
  5. ✅ Tenant isolation preserved (no cross-tenant data)

Failure modes & fixes

  1. Revision mismatch
    • Re-deploy and verify revision labels/env values.
  2. Debug endpoint auth failure
    • Verify admin JWT scope and route authorization.
  3. No discovery logs
    • Inspect intake completion flow and Cloud Run logs.
  4. Duplicates found in BigQuery
    • Audit upsert keys and idempotent write logic.

Artifacts produced

  • Deployment verification output
  • Discovery debug payload
  • Log evidence for auto-discovery
  • Duplicate-check SQL results

Supporting reference

  • docs/runbooks/P0_DISCOVERY_DEPLOYMENT_RUNBOOK.md
  • docs/guides/P0_DISCOVERY_BLOCKER_FIXES_V2.md
  • docs/reference/READINESS_DATA_SOURCE_TRACE.md

Rollback (if needed)

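# Get previous revision (second-newest by creation time; confirm it was the
# last revision serving traffic before shifting traffic to it)
PREV_REV=$(gcloud run revisions list \
  --service=payroll-pipeline-cbs-api \
  --region=us-central1 \
  --project=payroll-bi-gauntlet \
  --sort-by="~metadata.creationTimestamp" \
  --limit=2 \
  --format="value(name)" | tail -n 1)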

# Rollback to previous revision
gcloud run services update-traffic payroll-pipeline-cbs-api \
  --to-revisions=${PREV_REV}=100 \
  --region=us-central1 \
  --project=payroll-bi-gauntlet