Skip to main content

Stage3 Insert Root Cause — Partition Window

Evidence (from scripts/repro_stage3_insert_error.py)

first_5_errors (first error):
FieldValue
index0
location(empty)
reasoninvalid
messageValue 47117 for partition column period_label of the destination table payroll-bi-gauntlet:payroll_processed.stage3_snapshots_v2 is outside the allowed bounds. You can only stream to partitions of date range within 3650 days in the past and 366 days in the future relative to the current date. Integer value mapping to a BigQuery date column has to be number of days since Unix Epoch time, please double check if your input data is valid.
Matching schema field: period_label (DATE, partition column)

Root Cause

stage3_snapshots_v2 is partitioned by period_label. BigQuery streaming insert (insert_rows_json) to partitioned tables enforces:
  • 3650 days in the past
  • 366 days in the future
The sandbox smoke test uses period_label=2099-01-01. That date is beyond the 366‑day future limit, so BQ rejects the streaming insert.

Category

CategoryExample
Date mismatch2099-01-01 outside BQ streaming window
Not a type mismatch. The date format (ISO YYYY-MM-DD) is correct; the value is outside the allowed partition range.

Fix Options

  1. Use period within bounds for smoke – Default PeriodLabel to something like 2026-02-01 in the smoke script. Requires a batch with data for that period.
  2. Reject out-of-window in API – Add validation in write_stage3_rows and return STAGE3_WRITE_FAILED with a clear message before calling BQ.
  3. Batch load for out-of-window – Use a BQ load job instead of streaming when the period is out of window. More complex and out of scope for the initial fix.
  • Add validation in stage3_queries.py that rejects period_label outside the BQ streaming window and raises a descriptive STAGE3_WRITE_FAILED error.
  • Update pr3d_deploy_smoke.ps1 default PeriodLabel to 2026-02-01 (or another date within 366 days of today).
  • Document this constraint in the smoke script.