Skip to main content

Shadow Mode Implementation - Updated Alignment Check

Date: 2025-01-XX
Purpose: Verify updated specification resolves all red flags and check for remaining issues

✅ RESOLVED RED FLAGS

RED FLAG #1: Shadow Naming Strategy ✅ RESOLVED

Status:EXPLICITLY RESOLVED Updated Spec:
“Use dataset-level shadowing only.” “Shadow datasets: payroll_raw_shadow, payroll_processed_shadow, payroll_analytics_shadow” ”❌ Do NOT suffix individual tables”
Verification: ✅ Spec is unambiguous and matches recommendation

RED FLAG #2: Hardcoded Dataset Constants ✅ PARTIALLY RESOLVED

Status: ⚠️ SPEC MENTIONS SOLUTION, NEEDS IMPLEMENTATION DETAIL Updated Spec:
“Introduce a single helper (example): resolve_dataset("payroll_raw") -> "payroll_raw_shadow" if BQ_SHADOW_MODE=1” “One canonical resolver” “No hardcoded dataset strings elsewhere”
Verification: ✅ Spec provides clear direction, but implementation details needed:
  • Exact function signature
  • Where to place helper function
  • How to integrate with existing load_stage1_to_bigquery() and load_stage3_to_bigquery() calls
Action Required: Implementation must ensure:
  • All call sites use helper (not hardcoded strings)
  • Helper reads BQ_SHADOW_MODE env var
  • Default behavior (shadow mode OFF) writes to prod datasets

RED FLAG #3: top_agents Dual-Table Dependency ✅ RESOLVED

Status:EXPLICITLY ADDRESSED Updated Spec:
Critical: top_agents shadow view must replace both: payroll_processed.stage3_snapshots, payroll_raw.stage1_snapshots
Verification: ✅ Spec explicitly calls out this requirement

RED FLAG #4: Period Validation ✅ RESOLVED

Status:EXPLICITLY RESOLVED Updated Spec:
“Introduce ALLOWED_PERIODS_SHADOW” “Shadow allowlist must include: 2025-06-01, 2025-07-01, 2025-08-01” “Shadow mode must not bypass validation silently”
Verification: ✅ Spec provides clear solution Implementation Note: Current code has:
ALLOWED_PERIODS_SEATBELT = {"2025-06"}  # Line 30
Must add:
ALLOWED_PERIODS_SHADOW = {"2025-06", "2025-07", "2025-08"}

RED FLAG #5: Parity Validator Dual-Dataset Queries ⚠️ NEEDS CLARIFICATION

Status: ⚠️ SCOPE CLEAR, IMPLEMENTATION PATTERN NOT SPECIFIED Updated Spec:
“Compare Repo A (prod) vs Repo B (shadow)” Lists all metrics to compare
Verification: ✅ Scope is comprehensive, but spec doesn’t show:
  • How to construct queries with configurable dataset names
  • Whether to use query builder functions or string replacement
  • Example query pattern
Recommendation: Implementation should use:
def build_ceo_metrics_query(use_shadow: bool, period_label: str) -> str:
    dataset_suffix = "_shadow" if use_shadow else ""
    return f"""
        SELECT * FROM `payroll-bi-gauntlet.payroll_analytics{dataset_suffix}.ceo_snapshot`
        WHERE period_label = @period_label
    """

RED FLAG #6: MoM Calculations Require Previous Period ✅ RESOLVED

Status:EXPLICITLY RESOLVED Updated Spec:
“Run Repo B shadow ingestion for: June 2025, July 2025, August 2025” “This is required to satisfy: MoM LAG() logic in views, Growth/Loss reconciliation”
Verification: ✅ Spec explicitly includes multi-month backfill

RED FLAG #7: YTD/QTD Direct Table Queries ✅ RESOLVED

Status:EXPLICITLY ADDRESSED Updated Spec:
“QTD (June–August)” listed in parity validator scope Implies validator will query shadow tables directly for QTD
Verification: ✅ QTD validation included in scope Implementation Note: QTD for August 2025 should aggregate:
  • Start: 2025-07-01 (Q3 start)
  • End: 2025-08-31 (August end)
  • Baseline: 2025-07-01 (for snapshot comparison)

RED FLAG #8: tpa_applied Column ✅ VERIFIED IN CODE

Status:CODE VERIFIES COLUMN EXISTS Code Evidence:
# repo_b/upload_to_bq.py:599
stage3_df['tpa_applied'] = df['is_tpa'].apply(parse_bool)
Verification: ✅ Repo B upload sets tpa_applied from is_tpa column Action Required: Verify is_tpa column exists in Repo B Stage3 CSV outputs

🔴 REMAINING RED FLAGS

RED FLAG #9: QTD Baseline Calculation for Multi-Month Validation

Issue: Spec mentions “QTD (June–August)” but QTD logic depends on selected period:
  • If validating August 2025 QTD: baseline = July 1, current = August 31
  • If validating July 2025 QTD: baseline = July 1, current = July 31
  • If validating June 2025 QTD: baseline = April 1 (Q2), current = June 30
Impact:
  • Parity validator needs to understand QTD baseline varies by selected period
  • For August 2025 QTD validation, need July 1 baseline (already covered by multi-month backfill)
  • For June 2025 QTD validation, need April 1 baseline (NOT in shadow backfill scope)
Recommendation:
  • Clarify: Are we validating QTD for each month separately (June QTD, July QTD, August QTD)?
  • OR: Are we validating Q3 QTD only (aggregating June–August)?
  • Current spec says “QTD (June–August)” which suggests Q3 aggregation, but needs clarification
Action Required: Clarify QTD validation scope

RED FLAG #10: Growth/Loss QTD/YTD Uses Snapshot Comparison

Issue: get_growth_loss_summary_snapshot() (for QTD/YTD) uses snapshot comparison:
  • Current snapshot: Selected period end
  • Baseline snapshot: Quarter start (for QTD) or Jan 1 (for YTD)
Impact:
  • QTD Growth/Loss for August 2025 compares:
    • Current: August 31 snapshot
    • Baseline: July 1 snapshot
  • This requires July shadow data (✅ covered)
  • But also requires understanding that QTD uses snapshot comparison, not cumulative aggregation
Recommendation:
  • Parity validator should use get_growth_loss_summary_snapshot() logic for QTD validation
  • Document that QTD Growth/Loss uses snapshot comparison (not cumulative)
Action Required: Document QTD Growth/Loss validation approach

RED FLAG #11: View DDL Storage Location

Issue: Spec says:
“Store shadow view SQL in: integration/bigquery/sql/views_shadow/
Verification: ✅ Clear location specified Action Required: Create directory structure if it doesn’t exist

⚠️ WARNINGS (Non-Blocking)

WARNING #1: Project ID Hardcoding

Issue: All view SQL hardcodes payroll-bi-gauntlet project ID Impact: Shadow views will also hardcode project ID (acceptable for Phase 1) Recommendation: Acceptable for shadow mode. Consider parameterization in future.

WARNING #2: Period Format Consistency

Issue: Spec uses both formats:
  • 2025-06-01 (with day)
  • 2025-06 (without day)
Impact: Need to ensure consistent normalization Recommendation: Use YYYY-MM-01 format consistently (matches BigQuery DATE partitioning)

WARNING #3: QTD Quarter Definition

Issue: Q3 2025 = July, August, September (not June, July, August) Impact:
  • If validating “QTD (June–August)”, this is NOT a standard quarter
  • Standard Q3 = July–September
  • June is end of Q2
Recommendation: Clarify:
  • Are we validating Q2 QTD (April–June) + Q3 QTD (July–September)?
  • OR: Are we validating a custom 3-month period (June–August)?
Current Spec Says: “QTD (June–August)” - this suggests custom period, not standard quarter

📋 UPDATED PRE-IMPLEMENTATION CHECKLIST

  • ✅ Shadow naming strategy clarified (dataset-level)
  • ✅ Multi-month scope defined (June, July, August)
  • top_agents dual-table dependency acknowledged
  • ✅ Shadow allowlist specified
  • ⚠️ Clarify QTD validation scope (standard quarter vs custom period)
  • ⚠️ Verify is_tpa column exists in Repo B Stage3 CSV
  • ⚠️ Design query builder pattern for parity validator (or use string replacement)
  • ⚠️ Create views_shadow/ directory structure
  • ⚠️ Document QTD Growth/Loss snapshot comparison logic

✅ ALIGNMENT SUMMARY

Overall Assessment: Updated specification resolves 6 of 8 original red flags explicitly. 2 red flags need minor clarification (QTD scope, query builder pattern).

Resolved (6):

  1. ✅ Shadow naming strategy (dataset-level)
  2. ✅ Hardcoded constants (helper function specified)
  3. top_agents dual-table dependency
  4. ✅ Period validation (shadow allowlist)
  5. ✅ MoM previous period requirement (multi-month backfill)
  6. ✅ YTD/QTD scope (included in validator)

Needs Clarification (2):

  1. ⚠️ QTD validation scope (standard quarter vs custom June–August period)
  2. ⚠️ Parity validator query builder pattern (implementation detail)

New Red Flags (1):

  1. 🔴 QTD baseline calculation for multi-month validation

🎯 READINESS ASSESSMENT

Status:MOSTLY READY FOR IMPLEMENTATION Blockers: None (all can be resolved during implementation) Recommendations:
  1. Clarify QTD validation scope before writing parity validator SQL
  2. Verify is_tpa column exists in Repo B outputs before shadow backfill
  3. Design query builder helper early (before writing validator)
Confidence Level: 🟢 HIGH - Specification is well-aligned with validated contract map
Next Step: Proceed with implementation, addressing clarifications as they arise