Skip to main content

Phase 8D Active Batch Query Implementation

Function: get_active_ingestion_batch_id(tenant_id, period_label)

Location: api/bigquery/intake_queries.py

Query Logic

SELECT DISTINCT
    b.batch_id
FROM `payroll-bi-gauntlet.payroll_analytics.ingestion_batches` b
INNER JOIN `payroll-bi-gauntlet.payroll_analytics.ingestion_batch_businesses` d
    ON b.batch_id = d.batch_id
    AND b.tenant_id = d.tenant_id
WHERE b.tenant_id = @tenant_id
  AND b.status IN ('MAPPED', 'PROCESSED')
  AND d.period_label = @period_label
  AND d.org_id IS NULL  -- Phase 8D: discovery is platform-scoped (org_id IS NULL)
ORDER BY b.uploaded_at DESC, b.batch_id ASC
LIMIT 1

Tenant Isolation

Confirmed: Query filters by b.tenant_id = @tenant_id as the first WHERE condition, ensuring strict tenant isolation.

Deterministic Ordering

Confirmed:
  • Primary: b.uploaded_at DESC (most recent first)
  • Tie-break: b.batch_id ASC (deterministic UUID ordering)

Status Filtering

Confirmed: Only returns batches with status MAPPED or PROCESSED.

Period Association

Confirmed: Joins with ingestion_batch_businesses table which stores period_label per discovery run. Only batches with discovery records for the given period_label are considered.

Phase 8D Invariant

Confirmed: Filters d.org_id IS NULL to match Phase 8D platform-scoped discovery.

Fix 1 Implementation

All three endpoints (preflight, discover-businesses, /process) now:
  1. ✅ Check batch status is MAPPED or PROCESSED
  2. NEW: Verify received_batch_id == active_batch_id for (tenant_id, period_label)
  3. ✅ Return 409 Conflict with BatchMismatch error if mismatch detected

Error Response Format

{
  "error": "BatchMismatch",
  "message": "Preflight/process must use the active mapped batch_id",
  "expected_batch_id": "<active_batch_id>",
  "received_batch_id": "<received_batch_id>",
  "tenant_id": "<tenant_id>",
  "period_label": "<period_label>"
}

Degraded Mode

If get_active_ingestion_batch_id fails (BigQuery unavailable), endpoints log error but do not block (allows degraded operation).

Edge Cases

  • No active batch exists: Logs warning but allows request (first discovery scenario)
  • Invalid period_label format: Logs warning, fails later in flow
  • BigQuery unavailable: Logs error, allows degraded operation