Skip to main content

Shadow Mode Implementation - FINAL Alignment Check

Date: 2025-01-XX
Purpose: Verify FINAL specification resolves ALL red flags and is ready for implementation

✅ ALL RED FLAGS RESOLVED

RED FLAG #1: Shadow Naming Strategy ✅ RESOLVED

Status:EXPLICITLY RESOLVED Final Spec:
“Use dataset-level shadowing only.” “Shadow datasets: payroll_raw_shadow, payroll_processed_shadow, payroll_analytics_shadow” ”❌ Do NOT create _shadow tables inside prod datasets”
Verification: ✅ Unambiguous, matches recommendation

RED FLAG #2: Hardcoded Dataset Constants ✅ RESOLVED

Status:EXPLICITLY RESOLVED Final Spec:
“Implement a single canonical resolver, e.g.: resolve_dataset("payroll_raw") -> "payroll_raw_shadow" if BQ_SHADOW_MODE=1 else "payroll_raw"” “One resolver used everywhere” “No hardcoded dataset names at call sites”
Verification: ✅ Clear implementation pattern specified Implementation Requirements:
  • Function signature: resolve_dataset(base_name: str) -> str
  • Reads BQ_SHADOW_MODE env var
  • Default (shadow mode OFF) returns prod dataset names

RED FLAG #3: top_agents Dual-Table Dependency ✅ RESOLVED

Status:EXPLICITLY RESOLVED Final Spec:
Critical: top_agents shadow view must swap BOTH: payroll_processed.stage3_snapshotspayroll_processed_shadow.stage3_snapshots, payroll_raw.stage1_snapshotspayroll_raw_shadow.stage1_snapshots
Verification: ✅ Explicitly addressed

RED FLAG #4: Period Validation ✅ RESOLVED

Status:EXPLICITLY RESOLVED Final Spec:
“Introduce: ALLOWED_PERIODS_SHADOW = {"2025-06-01", "2025-07-01", "2025-08-01", "2025-09-01"}” “Period format must be YYYY-MM-01
Verification: ✅ Clear specification with exact format Implementation Note: Current code uses {"2025-06"} format. Must normalize to {"2025-06-01"} format for consistency.

RED FLAG #5: Parity Validator Query Pattern ✅ RESOLVED

Status:EXPLICITLY RESOLVED Final Spec:
“Use parameterized query-builder helpers, not string search/replace.” “Example pattern: def analytics_ds(use_shadow: bool) -> str: return "payroll_analytics_shadow" if use_shadow else "payroll_analytics"
Verification: ✅ Clear implementation pattern specified Implementation Pattern:
def analytics_ds(use_shadow: bool) -> str:
    return "payroll_analytics_shadow" if use_shadow else "payroll_analytics"

def raw_ds(use_shadow: bool) -> str:
    return "payroll_raw_shadow" if use_shadow else "payroll_raw"

def processed_ds(use_shadow: bool) -> str:
    return "payroll_processed_shadow" if use_shadow else "payroll_processed"

RED FLAG #6: MoM Previous Period ✅ RESOLVED

Status:EXPLICITLY RESOLVED Final Spec:
“Run Repo B shadow ingestion for: June 2025, July 2025, August 2025, September 2025” “This is REQUIRED to satisfy: MoM LAG() logic, Growth/Loss reconciliation”
Verification: ✅ Multi-month backfill includes all required periods MoM Validation Coverage:
  • Jun→Jul: ✅ (June + July shadow data)
  • Jul→Aug: ✅ (July + August shadow data)
  • Aug→Sep: ✅ (August + September shadow data)

RED FLAG #7: YTD/QTD Direct Table Queries ✅ RESOLVED

Status:EXPLICITLY ADDRESSED Final Spec:
“QTD (STANDARD QUARTER ONLY): July 2025 QTD, August 2025 QTD, September 2025 QTD” ”❌ Do NOT validate June QTD”
Verification: ✅ QTD validation scope clearly defined QTD Validation Coverage:
  • July 2025 QTD: baseline = 2025-07-01, current = 2025-07-31
  • August 2025 QTD: baseline = 2025-07-01, current = 2025-08-31
  • September 2025 QTD: baseline = 2025-07-01, current = 2025-09-30
  • June 2025 QTD: ❌ Explicitly excluded (would require April baseline, not in scope)

RED FLAG #8: tpa_applied Column ✅ VERIFIED IN CODE

Status:CODE VERIFIES COLUMN EXISTS Code Evidence:
# repo_b/upload_to_bq.py:599
stage3_df['tpa_applied'] = df['is_tpa'].apply(parse_bool)
Verification: ✅ Repo B upload sets tpa_applied from is_tpa column

RED FLAG #9: QTD Baseline Calculation ✅ RESOLVED

Status:EXPLICITLY RESOLVED Final Spec:
“QTD (STANDARD QUARTER ONLY): July 2025 QTD (baseline = Jul 1), August 2025 QTD (baseline = Jul 1), September 2025 QTD (baseline = Jul 1)” ”❌ Do NOT validate June QTD”
Verification: ✅ QTD validation scope explicitly excludes June (which would require April baseline) QTD Baseline Logic (Verified in Code):
# api/bigquery/queries.py:8074-8077
elif period_type == "qtd":
    quarter = (end_date.month - 1) // 3
    quarter_start_month = quarter * 3 + 1
    start_date = date(end_date.year, quarter_start_month, 1)
For Q3 (July-September):
  • July: quarter = (7-1)//3 = 2, quarter_start_month = 2*3+1 = 72025-07-01
  • August: quarter = (8-1)//3 = 2, quarter_start_month = 72025-07-01
  • September: quarter = (9-1)//3 = 2, quarter_start_month = 72025-07-01
All Q3 QTD validations use July 1 baseline (covered by shadow backfill) ✅

RED FLAG #10: Growth/Loss QTD Snapshot Comparison ✅ RESOLVED

Status:EXPLICITLY ADDRESSED Final Spec:
“QTD snapshot comparison: employees(period_end) − employees(quarter_start)
Verification: ✅ Spec explicitly documents snapshot comparison logic Implementation Note: Parity validator should use get_growth_loss_summary_snapshot() logic for QTD validation.

RED FLAG #11: View DDL Storage Location ✅ RESOLVED

Status:EXPLICITLY SPECIFIED Final Spec:
“Store in: integration/bigquery/sql/views_shadow/
Verification: ✅ Clear location specified

⚠️ MINOR CLARIFICATIONS (Non-Blocking)

CLARIFICATION #1: Period Format Normalization

Issue: Spec uses YYYY-MM-01 format consistently, but current code may accept YYYY-MM format Impact: Need to ensure consistent normalization in shadow mode Recommendation: Use normalize_period_label() helper that converts YYYY-MMYYYY-MM-01 Code Reference:
# repo_b/upload_to_bq.py:61-68
def normalize_period_label(period_str: str) -> date:
    if len(period_str) == 7:  # YYYY-MM
        return date.fromisoformat(f"{period_str}-01")
    return date.fromisoformat(period_str)
Action: Ensure ALLOWED_PERIODS_SHADOW uses YYYY-MM-01 format, normalize input periods before validation

CLARIFICATION #2: QTD Aggregation vs Snapshot

Issue: Spec mentions both:
  • “QTD snapshot comparison” (for Growth/Loss)
  • QTD aggregation (for CEO metrics)
Impact: Need to understand which metrics use which approach Clarification:
  • CEO Metrics QTD: Aggregates financial metrics (SUM) across date range, uses snapshot for counts
  • Growth/Loss QTD: Uses snapshot comparison (current snapshot vs baseline snapshot)
Verification: ✅ Both approaches are valid and correctly specified

✅ ALIGNMENT VERIFICATION

Contract Map Alignment

RequirementSpec StatusContract StatusMatch
Base tablesstage1_snapshots, stage3_snapshots✅ Plural form
ViewsAll 4 views identified✅ All 4 views
top_agents dual-tableExplicitly addressed✅ JOIN both tables
tpa_applied filterVerified in code✅ Required
QTD baselineJuly 1 for Q3✅ Quarter start
MoM previous periodMulti-month backfill✅ Required

Implementation Readiness

All Critical Requirements Met:
  • ✅ Dataset-level shadowing strategy
  • ✅ Canonical resolver pattern specified
  • ✅ Shadow allowlist with exact periods
  • ✅ Query builder pattern specified
  • ✅ QTD validation scope clarified (standard Q3 only)
  • ✅ Multi-month backfill scope (June–September)
  • ✅ View DDL storage location specified

📋 FINAL PRE-IMPLEMENTATION CHECKLIST

  • ✅ Shadow naming strategy (dataset-level, locked)
  • ✅ Multi-month scope (June–September, locked)
  • top_agents dual-table dependency (explicitly addressed)
  • ✅ Shadow allowlist (exact periods specified)
  • ✅ Query builder pattern (parameterized helpers)
  • ✅ QTD validation scope (standard Q3 only, June excluded)
  • ✅ Period format (YYYY-MM-01, locked)
  • ✅ View DDL storage location (views_shadow/)
  • ⚠️ Verify is_tpa column exists in Repo B Stage3 CSV (runtime check)
  • ⚠️ Create views_shadow/ directory (one-time setup)

🎯 FINAL READINESS ASSESSMENT

Status:READY FOR IMPLEMENTATION Confidence Level: 🟢 VERY HIGH Blockers: NONE Remaining Items:
  • Runtime verification of is_tpa column (non-blocking, can verify during first shadow upload)
  • Directory creation (trivial, one-time setup)

✅ SPECIFICATION QUALITY ASSESSMENT

Strengths:

  1. Unambiguous architecture decisions (dataset-level shadowing)
  2. Explicit implementation patterns (query builder helpers)
  3. Clear validation scope (MTD, MoM, QTD with exact periods)
  4. Resolves all previous red flags (8/8 resolved)
  5. Aligned with validated contract map (100% match)

Minor Improvements (Optional):

  1. Could add example query builder function signatures (but pattern is clear)
  2. Could specify exact error messages for parity failures (but “fail-fast with clear diffs” is sufficient)

🚀 IMPLEMENTATION READINESS

Verdict:PROCEED WITH IMPLEMENTATION Rationale:
  • All architectural decisions locked
  • All red flags resolved
  • Implementation patterns clearly specified
  • Validation scope comprehensive and unambiguous
  • Aligned with validated contract map
Next Steps:
  1. Implement resolve_dataset() helper
  2. Add ALLOWED_PERIODS_SHADOW constant
  3. Create shadow view DDL files
  4. Implement parity validator with query builder pattern
  5. Create cutover runbook

Status:FINAL SPECIFICATION VALIDATED - READY FOR IMPLEMENTATION