Real Payroll File Wizard E2E Test Results

Date: 2026-01-28
File: data/input/December 2025 Combnied Payroll.csv
Size: 821.99 KB
Rows: 2,841
Batch ID: f97d7357-986d-46d4-822c-74eb3d467ac3
Period Label: 2025-12-01
Tenant: creative_benefit_strategies
Org ID: null (Phase 8D)

Test Execution Summary

✅ Step A: Health Check

  • Status: 200 OK
  • Git SHA: ccfb79ea0f617cad88b235ae0ac4f4d5c11004aa
  • Service: payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app
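
To reproduce Step A, a request like the sketch below suffices; the /health path is an assumption (the exact route is not recorded in this report), so adjust it to the service's actual health route.

# Sketch only: re-run the Step A health check.
# The /health path and response field names are assumptions.
import requests

resp = requests.get(
    "https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app/health",
    timeout=10,
)
resp.raise_for_status()
print(resp.status_code, resp.json())  # expect 200 and a payload including the git SHA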

✅ Step B: Upload

  • Status: 200 OK
  • Batch ID: f97d7357-986d-46d4-822c-74eb3d467ac3
  • Row Count: 2,841
  • Batch Status: UPLOADED
  • Org ID: null ✅ (Phase 8D compliance)

⚠️ Step C: Preview

  • Status: 404 Not Found (the preview endpoint may not be implemented, or the tested path is incorrect)
  • Impact: Non-critical, skipped

⚠️ Step D: Map

  • Status: HTTP Timeout (>300 seconds)
  • Rows Written: 279 (partial)
  • Batch Status: Still UPLOADED (not MAPPED)
  • Issue: Synchronous processing of 2,841 rows exceeds HTTP timeout limits
Mapping Used:
{
  "payee_agent_name": "Payee Agent Label",
  "business_label": "Label",
  "member_id": "Member ID",
  "credit": "Credit",
  "debit": "Debit",
  "period": "Period",
  "total": "Total"
}
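
For reference, the sketch below shows how this mapping turns one parsed CSV row into canonical fields; the sample row values are invented for illustration, and only the header names come from the mapping above.

# Illustrative sketch: apply the column mapping above to one parsed CSV row.
mapping = {
    "payee_agent_name": "Payee Agent Label",
    "business_label": "Label",
    "member_id": "Member ID",
    "credit": "Credit",
    "debit": "Debit",
    "period": "Period",
    "total": "Total",
}
csv_row = {  # invented sample row
    "Payee Agent Label": "J. Smith",
    "Label": "Acme Co",
    "Member ID": "M-0001",
    "Credit": "125.00",
    "Debit": "0.00",
    "Period": "2025-12",
    "Total": "125.00",
}
# Money fields stay strings (no float parsing), consistent with the
# "no float leakage" observation later in this report.
canonical = {field: source_col and csv_row.get(source_col) for field, source_col in mapping.items()}
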
Root Cause: The /map endpoint processes files synchronously:
  1. Downloads file from GCS (821 KB)
  2. Parses all 2,841 rows into memory
  3. Processes each row
  4. Writes all rows to BigQuery in one batch
For large files, this exceeds Cloud Run’s HTTP timeout (default 300s, configurable up to 3600s).
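
One way to bound per-request work is to write in fixed-size chunks instead of one large batch. A minimal sketch, assuming the google-cloud-bigquery client; this is not the service's actual code, and since the real pipeline MERGEs rows (see MERGE Key Analysis below), a production fix would chunk the MERGE source rather than use raw streaming inserts.

# Sketch only: chunked streaming inserts instead of one large write.
# Table name comes from this report; the chunking logic is an assumption.
from google.cloud import bigquery

client = bigquery.Client(project="payroll-bi-gauntlet")
TABLE = "payroll-bi-gauntlet.payroll_analytics.transaction_events_raw"
CHUNK = 500

def write_rows(rows: list[dict]) -> None:
    # Bound memory and per-call latency by inserting CHUNK rows at a time.
    for start in range(0, len(rows), CHUNK):
        chunk = rows[start : start + CHUNK]
        errors = client.insert_rows_json(TABLE, chunk)
        if errors:
            raise RuntimeError(f"insert failed at offset {start}: {errors}")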

⏸️ Steps E-G: Not Executed

  • Discover: Not executed (requires MAPPED status)
  • Preflight: Not executed (requires MAPPED status)
  • Process: Not executed (requires MAPPED status)

BigQuery Status (Current State)

ingestion_batches

SELECT batch_id, status, row_count, org_id, created_at 
FROM `payroll-bi-gauntlet.payroll_analytics.ingestion_batches` 
WHERE batch_id = 'f97d7357-986d-46d4-822c-74eb3d467ac3';
Result:
  • batch_id: f97d7357-986d-46d4-822c-74eb3d467ac3
  • status: UPLOADED (not MAPPED)
  • row_count: 2841
  • org_id: null

transaction_events_raw

SELECT 
  COUNT(*) as row_count,
  COUNTIF(row_status = 'ACCEPTED') as accepted_count,
  COUNTIF(row_status = 'REJECTED') as rejected_count
FROM `payroll-bi-gauntlet.payroll_analytics.transaction_events_raw` 
WHERE batch_id = 'f97d7357-986d-46d4-822c-74eb3d467ac3';
Result:
  • row_count: 279 (partial - only 9.8% of total rows)
  • accepted_count: 279
  • rejected_count: 0
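
Given the partial write, a quick declared-vs-written consistency check is worth keeping around; a sketch, assuming the google-cloud-bigquery client.

# Sketch: compare the batch's declared row_count to rows actually written.
from google.cloud import bigquery

client = bigquery.Client(project="payroll-bi-gauntlet")
BATCH_ID = "f97d7357-986d-46d4-822c-74eb3d467ac3"
query = """
SELECT
  (SELECT row_count FROM `payroll-bi-gauntlet.payroll_analytics.ingestion_batches`
   WHERE batch_id = @batch_id) AS declared,
  (SELECT COUNT(*) FROM `payroll-bi-gauntlet.payroll_analytics.transaction_events_raw`
   WHERE batch_id = @batch_id) AS written
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("batch_id", "STRING", BATCH_ID)]
)
row = next(iter(client.query(query, job_config=job_config).result()))
print(f"declared={row.declared} written={row.written}")  # 2841 vs 279 as of this report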

MERGE Key Analysis (Task C)

transaction_events_raw MERGE Keys

Source: api/bigquery/intake_processor_queries.py:102-104
ON target.tenant_id = source.tenant_id
   AND target.batch_id = source.batch_id
   AND target.row_index = source.row_index
Answer: Re-running the same batch_id updates existing rows (keyed by row_index); a new batch_id appends rows.

stage1_bridge_rows MERGE Keys

Source: api/bigquery/intake_processor_queries.py:679-683
ON target.tenant_id = source.tenant_id
   AND IFNULL(target.org_id, '') = IFNULL(source.org_id, '')
   AND target.batch_id = source.batch_id
   AND target.row_index = source.row_index
Answer: Re-running the same batch_id updates existing rows (keyed by row_index); a new batch_id appends rows.

ingestion_batch_businesses MERGE Keys

Source: api/bigquery/discovery_queries.py:120-124
ON target.tenant_id = source.tenant_id
  AND (target.org_id IS NULL AND source.org_id IS NULL OR target.org_id = source.org_id)
  AND target.batch_id = source.batch_id
  AND target.period_label = source.period_label
  AND target.normalized_business_name = source.normalized_business_name
Answer: The same batch_id + period_label + normalized_business_name combination updates the existing row; new combinations append.
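
Because all three targets MERGE on batch-scoped keys, retrying the map for this batch should update rows rather than duplicate them. A quick duplicate check on the raw table (sketch, same client assumptions as above):

# Sketch: confirm no duplicate (batch_id, row_index) pairs after a retry.
from google.cloud import bigquery

client = bigquery.Client(project="payroll-bi-gauntlet")
query = """
SELECT row_index, COUNT(*) AS n
FROM `payroll-bi-gauntlet.payroll_analytics.transaction_events_raw`
WHERE batch_id = 'f97d7357-986d-46d4-822c-74eb3d467ac3'
GROUP BY row_index
HAVING n > 1
"""
dupes = list(client.query(query).result())
print(f"duplicate row_index values: {len(dupes)}")  # expect 0 if the MERGE is idempotent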

Preflight Readiness Check (Task D)

Source: api/routes/intake_preflight.py:764-780
Query Function: get_business_readiness() from api/bigquery/business_onboarding_queries.py
Tables Checked:
  1. config_business_onboarding (base table)
  2. config_business_agent_assignment (JOIN for agent assignments)
  3. config_business_agent_pepm_assignment (JOIN for PEPM rows)
  4. config_business_policy (JOIN for OWNER_ROLLUP policies)
Readiness Criteria (any of):
  1. Has OWNER_ROLLUP policy active (as of period)
  2. Has ≥ 1 agent assignment AND all assigned agents have PEPM rows AND effective_start_date present
Minimal Operational Steps (if preflight blocks):
  • Option 1: Use existing admin UI “Go to Business Onboarding” to configure businesses
  • Option 2: Use existing admin endpoints (if available; see the example request after this block):
    • POST /api/v1/admin/onboarding/businesses/{business_id}/assignments (assign agents)
    • POST /api/v1/admin/onboarding/businesses/{business_id}/pepm (set PEPM)
    • POST /api/v1/admin/onboarding/businesses/{business_id}/policy (set OWNER_ROLLUP)
Note: Preflight was not executed due to map timeout, so blocker count is unknown.
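
For Option 2, a request might look like the sketch below; the endpoint path comes from this report, but the JSON body fields are pure assumptions, so verify against the actual route handlers before use.

# Hypothetical sketch: assign agents to a business via the admin endpoint.
# business_id and the body schema are placeholders, not the real contract.
import requests

BASE = "https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app"
business_id = "example-business-id"
resp = requests.post(
    f"{BASE}/api/v1/admin/onboarding/businesses/{business_id}/assignments",
    json={"agent_ids": ["agent-123"], "effective_start_date": "2025-12-01"},
    timeout=30,
)
print(resp.status_code, resp.text)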

Red Flags Identified

  1. ⚠️ Performance Issue: /map endpoint times out on large files (>2000 rows)
    • Impact: Cannot process production payroll files synchronously
    • Recommendation: Consider async processing or chunked writes for large files
  2. ⚠️ Partial Write: 279 rows written but batch status not updated to MAPPED
    • Impact: Inconsistent state - rows exist but batch appears incomplete
    • Recommendation: Ensure atomic batch status updates or implement idempotent retry
  3. ✅ Tenant Isolation: Confirmed org_id = null for batch (Phase 8D compliance)
  4. ✅ Decimal Handling: No float leakage detected (all money fields stored as strings)

Recommendations

  1. Immediate: Increase Cloud Run timeout for /map endpoint (if not already maxed at 3600s)
  2. Short-term: Implement async processing for the /map endpoint (return 202, poll status; see the sketch after this list)
  3. Long-term: Consider chunked BigQuery writes to avoid memory issues on very large files
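
A minimal sketch of the 202-and-poll pattern from item 2, assuming a FastAPI service (the framework is not confirmed in this report); a production version would enqueue durable work (e.g., Cloud Tasks) instead of an in-process background task, and the status poll would read ingestion_batches.

# Sketch only: accept the map request, return 202, process in the background.
# FastAPI, the route shapes, and handler names are illustrative assumptions.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def run_map(batch_id: str) -> None:
    # Download from GCS, parse, write rows in chunks, then set
    # ingestion_batches.status = 'MAPPED' in one final update.
    ...

@app.post("/map/{batch_id}", status_code=202)
def map_batch(batch_id: str, background: BackgroundTasks) -> dict:
    background.add_task(run_map, batch_id)
    return {"batch_id": batch_id, "status": "MAPPING"}

@app.get("/batches/{batch_id}")
def batch_status(batch_id: str) -> dict:
    # Poll here until status becomes MAPPED (placeholder lookup).
    return {"batch_id": batch_id, "status": "UPLOADED"}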

Next Steps

  1. Retry Map: Attempt the map again with an increased timeout (the MERGE keys above make the retry idempotent)
  2. Check Logs: Review Cloud Run logs for /map endpoint to identify bottleneck
  3. Complete Flow: Once map succeeds, proceed with Discover → Preflight → Process
  4. Validate Outputs: Run BigQuery invariant queries after Process completes

Cloud Run Log Query (for debugging)

gcloud logging read \
  "resource.type=cloud_run_revision AND resource.labels.service_name=payroll-pipeline-cbs-api AND jsonPayload.batch_id=f97d7357-986d-46d4-822c-74eb3d467ac3" \
  --project payroll-bi-gauntlet \
  --limit 100 \
  --format json