Skip to main content

Stage 1 Certification

Purpose

Certify Stage 1 ingestion boundary as audit-grade before Stage 2+ progression.

When to run this

  • At initial Stage 1 certification
  • After Stage 1 hotfixes affecting ingestion/bridge behavior

Prerequisites

  • Access to Stage 1 test suite
  • BigQuery read access for proof queries
  • Tenant/batch IDs for evidence collection

Inputs

  • Certification date/branch/commit metadata
  • Stage 1 test output
  • SQL proof query output

Procedure

Date: 2026-01-31
Branch: feat/partb-stage1-bridge
Commit: 7697419 (initial certification)
Production Certified: 2026-02-01
Production Certified Revision: 5a8e373

1) Stage 1 scope definition

Stage 1 is the ingestion boundary: transaction_events_raw → stage1_bridge_rows

Boundaries

  • Input: CSV upload via /api/v1/intake/upload
  • Raw Storage: transaction_events_raw table (all rows, ACCEPTED + REJECTED)
  • Bridge: stage1_bridge_rows table (MERGE from raw, typed fields via SAFE_CAST)
  • Output: Validated, typed rows ready for Stage 2 processing

Explicitly Deferred to Stage 2

  • Business onboarding/discovery logic
  • Agent PEPM assignments
  • Rate plan resolution
  • Commission calculations
  • Aggregation/shaping (stage1_snapshots)
  • Dashboard fields

2) Required wizard inputs

period_label (YYYY-MM-01 format)

  • Required: Yes (wizard input, not column-mapped)
  • Format: YYYY-MM-01 (e.g., 2025-01-01)
  • Validation: Strict regex ^\d{4}-\d{2}-01$
  • Source: Wizard UI input, passed in /process request payload

3) Verified endpoints

/api/v1/intake/{batch_id}/discover

  • Discovers businesses from mapped rows
  • Returns classification (configured_ok, unconfigured_seen_before, new_discovered)
  • Status: Verified in test suite

/api/v1/intake/{batch_id}/preflight

  • Server-side preflight check (authoritative)
  • Validates normalization drift invariant
  • Checks policy requirements
  • Returns preflight hash for idempotency
  • Status: Verified in test suite

/api/v1/intake/{batch_id}/process

  • Processes mapped rows → transaction_events_raw
  • Bridges to stage1_bridge_rows (MERGE)
  • Row-resilient: bad rows REJECTED, batch still PROCESSED
  • Returns processing summary with counts
  • Status: Verified in test suite (17 tests passing)

4) Invariants (audit-grade)

1. NUMERIC Money Fields (ADR-002)

  • All money fields stored as NUMERIC (not FLOAT)
  • Fields: credit_num, debit_num, total_num
  • Parsed via SAFE_CAST(credit_raw AS NUMERIC)
  • Verified: Decimal safety tests passing

2. Idempotent MERGE Keys

  • MERGE key: (tenant_id, batch_id, row_index) + NULL-safe org_id matching
  • NULL org_id handling: IFNULL(target.org_id,'') = IFNULL(source.org_id,'')
  • Verified: Idempotency tests passing (no duplicates on rerun)

3. Lineage Integrity

  • Every bridge row links to raw row via (tenant_id, batch_id, row_index)
  • Full traceability: mapped_payload_json preserved in raw table
  • Verified: Lineage join tests passing

4. Safe Error Payloads

  • No str(e) in client payloads
  • Exceptions logged server-side only
  • Error codes: structured arrays (not free-form strings)
  • Verified: Error handling tests passing

5. Missing Batch Metadata: Controlled 404 + Metric

  • Pre-check: Query ingestion_batches before MERGE
  • If batch metadata missing: raise RuntimeError("BATCH_METADATA_MISSING")
  • Route handler maps to HTTPException(404) with safe payload
  • Structured log field: "metric": "stage1_bridge.batch_metadata_missing"
  • Verified: Missing metadata tests passing

5) Targeted test gate

Command

pytest api/tests/test_stage1_bridge.py api/tests/test_intake_processor.py -q

Results (2026-01-31)

====================== 17 passed, 130 warnings in 0.18s =======================

Test Coverage

  • test_stage1_bridge.py: Bridge MERGE, idempotency, validation, missing metadata handling
  • test_intake_processor.py: Full ingestion flow, decimal safety, RBAC, row resilience

6) Proof queries

Counts Verification

-- Raw table counts
SELECT 
  COUNT(*) as raw_total,
  COUNTIF(row_status = 'ACCEPTED') as raw_accepted,
  COUNTIF(row_status = 'REJECTED') as raw_rejected
FROM `payroll-bi-gauntlet.payroll_raw.transaction_events_raw`
WHERE tenant_id = @tenant_id AND batch_id = @batch_id;

-- Bridge table counts
SELECT 
  COUNT(*) as bridge_total,
  COUNTIF(is_valid = TRUE) as bridge_valid,
  COUNTIF(is_valid = FALSE) as bridge_invalid
FROM `payroll-bi-gauntlet.payroll_raw.stage1_bridge_rows`
WHERE tenant_id = @tenant_id AND batch_id = @batch_id;

Lineage Integrity

-- Every bridge row must link to raw row
SELECT 
  COUNT(*) as total_bridge_rows,
  COUNTIF(b.tenant_id = r.tenant_id 
          AND b.batch_id = r.batch_id 
          AND b.row_index = r.row_index) as matched_lineage
FROM `payroll-bi-gauntlet.payroll_raw.stage1_bridge_rows` b
LEFT JOIN `payroll-bi-gauntlet.payroll_raw.transaction_events_raw` r
  ON b.tenant_id = r.tenant_id 
  AND b.batch_id = r.batch_id 
  AND b.row_index = r.row_index
WHERE b.tenant_id = @tenant_id AND b.batch_id = @batch_id;
-- Expected: matched_lineage = total_bridge_rows

Duplicate Check (Idempotency)

-- No duplicates for same (tenant_id, batch_id, row_index)
SELECT 
  tenant_id, batch_id, row_index, COUNT(*) as dup_count
FROM `payroll-bi-gauntlet.payroll_raw.stage1_bridge_rows`
WHERE tenant_id = @tenant_id AND batch_id = @batch_id
GROUP BY tenant_id, batch_id, row_index
HAVING COUNT(*) > 1;
-- Expected: 0 rows (no duplicates)

7) Log verification

STAGE1_BRIDGE_COMPLETE Event

Check logs for structured event:
{
  "event": "STAGE1_BRIDGE_COMPLETE",
  "trace_id": "<uuid>",
  "tenant_id": "<tenant_id>",
  "batch_id": "<batch_id>",
  "raw_total": <count>,
  "raw_accepted": <count>,
  "raw_rejected": <count>,
  "bridge_total": <count>,
  "bridge_valid": <count>,
  "bridge_invalid": <count>,
  "elapsed_ms": <milliseconds>
}

8) Known limitations (deferred to Stage 2)

  1. Business Onboarding: Discovery logic exists but onboarding decisions are Stage 2
  2. Agent PEPM Assignments: Assignment logic is Stage 2+
  3. Rate Plan Resolution: Rate plan lookup is Stage 2+
  4. Commission Calculations: Commission math is Stage 2+
  5. Aggregation: stage1_snapshots shaping is Stage 2+

Verification

CERTIFIED - Stage 1 ingestion boundary is audit-grade and ready for Stage 2 development. Evidence:
  • All 17 Stage 1 targeted tests passing
  • Row resilience verified (bad rows REJECTED, batch still PROCESSED)
  • Idempotency verified (no duplicates on rerun)
  • Lineage integrity verified (every bridge row links to raw row)
  • Decimal safety verified (NUMERIC fields, no FLOAT leakage)
  • Missing metadata handling verified (controlled 404 + metric)

Failure modes & fixes

  1. Test gate failures
    • Fix regression before certification; rerun targeted gate.
  2. Duplicate rows in bridge
    • Re-check merge key and NULL-safe org match logic.
  3. Lineage mismatch
    • Verify (tenant_id, batch_id, row_index) join contract.
  4. Unsafe error payload exposure
    • Ensure safe, structured error responses only.

Artifacts produced

  • Certification metadata (date/branch/commit/revision)
  • Test gate output (17 passed baseline)
  • SQL proof outputs for counts, lineage, and duplicates
  • Structured log evidence (STAGE1_BRIDGE_COMPLETE)
  • docs/runbooks/STAGE2_CERTIFICATION.md
  • docs/reference/INVARIANT_ENFORCEMENT.md
  • docs/reference/BQ_CONTRACT.md