Real Payroll File Wizard E2E Test Results

Date: 2026-01-28
File: data/input/December 2025 Combnied Payroll.csv
Size: 821.99 KB
Rows: 2,841
Batch ID: f97d7357-986d-46d4-822c-74eb3d467ac3
Period Label: 2025-12-01
Tenant: creative_benefit_strategies
Org ID: null (Phase 8D)

Test Execution Summary

✅ Step A: Health Check

  • Status: 200 OK
  • Git SHA: ccfb79ea0f617cad88b235ae0ac4f4d5c11004aa
  • Service: payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app
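
To reproduce Step A, a request like the sketch below suffices; the /health path is an assumption (the exact route is not recorded in this report), so adjust it to the service's actual health route.

# Sketch only: re-run the Step A health check.
# The /health path and response field names are assumptions.
import requests

resp = requests.get(
    "https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app/health",
    timeout=10,
)
resp.raise_for_status()
print(resp.status_code, resp.json())  # expect 200 and a payload including the git SHA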

✅ Step B: Upload

  • Status: 200 OK
  • Batch ID: f97d7357-986d-46d4-822c-74eb3d467ac3
  • Row Count: 2,841
  • Batch Status: UPLOADED
  • Org ID: null ✅ (Phase 8D compliance)

⚠️ Step C: Preview

  • Status: 404 Not Found (the preview endpoint may not be implemented, or the tested path is incorrect)
  • Impact: Non-critical, skipped

⚠️ Step D: Map

  • Status: HTTP Timeout (>300 seconds)
  • Rows Written: 279 (partial)
  • Batch Status: Still UPLOADED (not MAPPED)
  • Issue: Synchronous processing of 2,841 rows exceeds HTTP timeout limits
Mapping Used:
{
  "payee_agent_name": "Payee Agent Label",
  "business_label": "Label",
  "member_id": "Member ID",
  "credit": "Credit",
  "debit": "Debit",
  "period": "Period",
  "total": "Total"
}
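
For reference, the sketch below shows how this mapping turns one parsed CSV row into canonical fields; the sample row values are invented for illustration, and only the header names come from the mapping above.

# Illustrative sketch: apply the column mapping above to one parsed CSV row.
mapping = {
    "payee_agent_name": "Payee Agent Label",
    "business_label": "Label",
    "member_id": "Member ID",
    "credit": "Credit",
    "debit": "Debit",
    "period": "Period",
    "total": "Total",
}
csv_row = {  # invented sample row
    "Payee Agent Label": "J. Smith",
    "Label": "Acme Co",
    "Member ID": "M-0001",
    "Credit": "125.00",
    "Debit": "0.00",
    "Period": "2025-12",
    "Total": "125.00",
}
# Money fields stay strings (no float parsing), consistent with the
# "no float leakage" observation later in this report.
canonical = {field: source_col and csv_row.get(source_col) for field, source_col in mapping.items()}
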
Root Cause: The /map endpoint processes files synchronously:
  1. Downloads file from GCS (821 KB)
  2. Parses all 2,841 rows into memory
  3. Processes each row
  4. Writes all rows to BigQuery in one batch
For large files, this exceeds Cloud Run’s HTTP timeout (default 300s, configurable up to 3600s).
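
One way to bound per-request work is to write in fixed-size chunks instead of one large batch. A minimal sketch, assuming the google-cloud-bigquery client; this is not the service's actual code, and since the real pipeline MERGEs rows (see MERGE Key Analysis below), a production fix would chunk the MERGE source rather than use raw streaming inserts.

# Sketch only: chunked streaming inserts instead of one large write.
# Table name comes from this report; the chunking logic is an assumption.
from google.cloud import bigquery

client = bigquery.Client(project="payroll-bi-gauntlet")
TABLE = "payroll-bi-gauntlet.payroll_analytics.transaction_events_raw"
CHUNK = 500

def write_rows(rows: list[dict]) -> None:
    # Bound memory and per-call latency by inserting CHUNK rows at a time.
    for start in range(0, len(rows), CHUNK):
        chunk = rows[start : start + CHUNK]
        errors = client.insert_rows_json(TABLE, chunk)
        if errors:
            raise RuntimeError(f"insert failed at offset {start}: {errors}")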

⏸️ Steps E-G: Not Executed

  • Discover: Not executed (requires MAPPED status)
  • Preflight: Not executed (requires MAPPED status)
  • Process: Not executed (requires MAPPED status)

BigQuery Status (Current State)

ingestion_batches

SELECT batch_id, status, row_count, org_id, created_at 
FROM `payroll-bi-gauntlet.payroll_analytics.ingestion_batches` 
WHERE batch_id = 'f97d7357-986d-46d4-822c-74eb3d467ac3';
Result:
  • batch_id: f97d7357-986d-46d4-822c-74eb3d467ac3
  • status: UPLOADED (not MAPPED)
  • row_count: 2841
  • org_id: null

transaction_events_raw

SELECT 
  COUNT(*) as row_count,
  COUNTIF(row_status = 'ACCEPTED') as accepted_count,
  COUNTIF(row_status = 'REJECTED') as rejected_count
FROM `payroll-bi-gauntlet.payroll_analytics.transaction_events_raw` 
WHERE batch_id = 'f97d7357-986d-46d4-822c-74eb3d467ac3';
Result:
  • row_count: 279 (partial - only 9.8% of total rows)
  • accepted_count: 279
  • rejected_count: 0
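
Given the partial write, a quick declared-vs-written consistency check is worth keeping around; a sketch, assuming the google-cloud-bigquery client.

# Sketch: compare the batch's declared row_count to rows actually written.
from google.cloud import bigquery

client = bigquery.Client(project="payroll-bi-gauntlet")
BATCH_ID = "f97d7357-986d-46d4-822c-74eb3d467ac3"
query = """
SELECT
  (SELECT row_count FROM `payroll-bi-gauntlet.payroll_analytics.ingestion_batches`
   WHERE batch_id = @batch_id) AS declared,
  (SELECT COUNT(*) FROM `payroll-bi-gauntlet.payroll_analytics.transaction_events_raw`
   WHERE batch_id = @batch_id) AS written
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("batch_id", "STRING", BATCH_ID)]
)
row = next(iter(client.query(query, job_config=job_config).result()))
print(f"declared={row.declared} written={row.written}")  # 2841 vs 279 as of this report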

MERGE Key Analysis (Task C)

transaction_events_raw MERGE Keys

Source: api/bigquery/intake_processor_queries.py:102-104
ON target.tenant_id = source.tenant_id
   AND target.batch_id = source.batch_id
   AND target.row_index = source.row_index
Answer: Re-running the same batch_id updates existing rows (keyed by row_index); a new batch_id appends rows.

stage1_bridge_rows MERGE Keys

Source: api/bigquery/intake_processor_queries.py:679-683
ON target.tenant_id = source.tenant_id
   AND IFNULL(target.org_id, '') = IFNULL(source.org_id, '')
   AND target.batch_id = source.batch_id
   AND target.row_index = source.row_index
Answer: Re-running the same batch_id updates existing rows (keyed by row_index); a new batch_id appends rows.

ingestion_batch_businesses MERGE Keys

Source: api/bigquery/discovery_queries.py:120-124
ON target.tenant_id = source.tenant_id
  AND (target.org_id IS NULL AND source.org_id IS NULL OR target.org_id = source.org_id)
  AND target.batch_id = source.batch_id
  AND target.period_label = source.period_label
  AND target.normalized_business_name = source.normalized_business_name
Answer: The same batch_id + period_label + normalized_business_name combination updates the existing row; new combinations append.
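
Because all three targets MERGE on batch-scoped keys, retrying the map for this batch should update rows rather than duplicate them. A quick duplicate check on the raw table (sketch, same client assumptions as above):

# Sketch: confirm no duplicate (batch_id, row_index) pairs after a retry.
from google.cloud import bigquery

client = bigquery.Client(project="payroll-bi-gauntlet")
query = """
SELECT row_index, COUNT(*) AS n
FROM `payroll-bi-gauntlet.payroll_analytics.transaction_events_raw`
WHERE batch_id = 'f97d7357-986d-46d4-822c-74eb3d467ac3'
GROUP BY row_index
HAVING n > 1
"""
dupes = list(client.query(query).result())
print(f"duplicate row_index values: {len(dupes)}")  # expect 0 if the MERGE is idempotent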

Preflight Readiness Check (Task D)

Source: api/routes/intake_preflight.py:764-780
Query Function: get_business_readiness() from api/bigquery/business_onboarding_queries.py
Tables Checked:
  1. config_business_onboarding (base table)
  2. config_business_agent_assignment (JOIN for agent assignments)
  3. config_business_agent_pepm_assignment (JOIN for PEPM rows)
  4. config_business_policy (JOIN for OWNER_ROLLUP policies)
Readiness Criteria (any of):
  1. Has OWNER_ROLLUP policy active (as of period)
  2. Has ≥ 1 agent assignment AND all assigned agents have PEPM rows AND effective_start_date present
Minimal Operational Steps (if preflight blocks):
  • Option 1: Use existing admin UI “Go to Business Onboarding” to configure businesses
  • Option 2: Use existing admin endpoints (if available; see the example request after this block):
    • POST /api/v1/admin/onboarding/businesses/{business_id}/assignments (assign agents)
    • POST /api/v1/admin/onboarding/businesses/{business_id}/pepm (set PEPM)
    • POST /api/v1/admin/onboarding/businesses/{business_id}/policy (set OWNER_ROLLUP)
Note: Preflight was not executed due to map timeout, so blocker count is unknown.
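
For Option 2, a request might look like the sketch below; the endpoint path comes from this report, but the JSON body fields are pure assumptions, so verify against the actual route handlers before use.

# Hypothetical sketch: assign agents to a business via the admin endpoint.
# business_id and the body schema are placeholders, not the real contract.
import requests

BASE = "https://payroll-pipeline-cbs-api-evndxpcirq-uc.a.run.app"
business_id = "example-business-id"
resp = requests.post(
    f"{BASE}/api/v1/admin/onboarding/businesses/{business_id}/assignments",
    json={"agent_ids": ["agent-123"], "effective_start_date": "2025-12-01"},
    timeout=30,
)
print(resp.status_code, resp.text)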

Red Flags Identified

  1. ⚠️ Performance Issue: /map endpoint times out on large files (>2000 rows)
    • Impact: Cannot process production payroll files synchronously
    • Recommendation: Consider async processing or chunked writes for large files
  2. ⚠️ Partial Write: 279 rows written but batch status not updated to MAPPED
    • Impact: Inconsistent state - rows exist but batch appears incomplete
    • Recommendation: Ensure atomic batch status updates or implement idempotent retry
  3. ✅ Tenant Isolation: Confirmed org_id = null for batch (Phase 8D compliance)
  4. ✅ Decimal Handling: No float leakage detected (all money fields stored as strings)

Recommendations

  1. Immediate: Increase Cloud Run timeout for /map endpoint (if not already maxed at 3600s)
  2. Short-term: Implement async processing for the /map endpoint (return 202, poll status; see the sketch after this list)
  3. Long-term: Consider chunked BigQuery writes to avoid memory issues on very large files
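
A minimal sketch of the 202-and-poll pattern from item 2, assuming a FastAPI service (the framework is not confirmed in this report); a production version would enqueue durable work (e.g., Cloud Tasks) instead of an in-process background task, and the status poll would read ingestion_batches.

# Sketch only: accept the map request, return 202, process in the background.
# FastAPI, the route shapes, and handler names are illustrative assumptions.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def run_map(batch_id: str) -> None:
    # Download from GCS, parse, write rows in chunks, then set
    # ingestion_batches.status = 'MAPPED' in one final update.
    ...

@app.post("/map/{batch_id}", status_code=202)
def map_batch(batch_id: str, background: BackgroundTasks) -> dict:
    background.add_task(run_map, batch_id)
    return {"batch_id": batch_id, "status": "MAPPING"}

@app.get("/batches/{batch_id}")
def batch_status(batch_id: str) -> dict:
    # Poll here until status becomes MAPPED (placeholder lookup).
    return {"batch_id": batch_id, "status": "UPLOADED"}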

Next Steps

  1. Retry Map: Attempt the map again with an increased timeout (the MERGE keys above make the retry idempotent)
  2. Check Logs: Review Cloud Run logs for /map endpoint to identify bottleneck
  3. Complete Flow: Once map succeeds, proceed with Discover → Preflight → Process
  4. Validate Outputs: Run BigQuery invariant queries after Process completes

Cloud Run Log Query (for debugging)

gcloud logging read \
  "resource.type=cloud_run_revision AND resource.labels.service_name=payroll-pipeline-cbs-api AND jsonPayload.batch_id=f97d7357-986d-46d4-822c-74eb3d467ac3" \
  --project payroll-bi-gauntlet \
  --limit 100 \
  --format json