Shadow Mode Implementation - Updated Alignment Check

Date: 2025-01-XX
Purpose: Verify updated specification resolves all red flags and check for remaining issues

✅ RESOLVED RED FLAGS

RED FLAG #1: Shadow Naming Strategy ✅ RESOLVED

Status: ✅ EXPLICITLY RESOLVED Updated Spec:

“Use dataset-level shadowing only.” “Shadow datasets: payroll_raw_shadow, payroll_processed_shadow, payroll_analytics_shadow” ”❌ Do NOT suffix individual tables”

Verification: ✅ Spec is unambiguous and matches recommendation

RED FLAG #2: Hardcoded Dataset Constants ✅ PARTIALLY RESOLVED

Status: ⚠️ SPEC MENTIONS SOLUTION, NEEDS IMPLEMENTATION DETAIL Updated Spec:

“Introduce a single helper (example): resolve_dataset("payroll_raw") -> "payroll_raw_shadow" if BQ_SHADOW_MODE=1” “One canonical resolver” “No hardcoded dataset strings elsewhere”

Verification: ✅ Spec provides clear direction, but implementation details needed:

Exact function signature
Where to place helper function
How to integrate with existing load_stage1_to_bigquery() and load_stage3_to_bigquery() calls

Action Required: Implementation must ensure:

All call sites use helper (not hardcoded strings)
Helper reads BQ_SHADOW_MODE env var
Default behavior (shadow mode OFF) writes to prod datasets

RED FLAG #3: `top_agents` Dual-Table Dependency ✅ RESOLVED

Status: ✅ EXPLICITLY ADDRESSED Updated Spec:

“Critical: top_agents shadow view must replace both: payroll_processed.stage3_snapshots, payroll_raw.stage1_snapshots”

Verification: ✅ Spec explicitly calls out this requirement

RED FLAG #4: Period Validation ✅ RESOLVED

Status: ✅ EXPLICITLY RESOLVED Updated Spec:

“Introduce ALLOWED_PERIODS_SHADOW” “Shadow allowlist must include: 2025-06-01, 2025-07-01, 2025-08-01” “Shadow mode must not bypass validation silently”

Verification: ✅ Spec provides clear solution Implementation Note: Current code has:

ALLOWED_PERIODS_SEATBELT = {"2025-06"}  # Line 30

Must add:

ALLOWED_PERIODS_SHADOW = {"2025-06", "2025-07", "2025-08"}

RED FLAG #5: Parity Validator Dual-Dataset Queries ⚠️ NEEDS CLARIFICATION

Status: ⚠️ SCOPE CLEAR, IMPLEMENTATION PATTERN NOT SPECIFIED Updated Spec:

“Compare Repo A (prod) vs Repo B (shadow)” Lists all metrics to compare

Verification: ✅ Scope is comprehensive, but spec doesn’t show:

How to construct queries with configurable dataset names
Whether to use query builder functions or string replacement
Example query pattern

Recommendation: Implementation should use:

def build_ceo_metrics_query(use_shadow: bool, period_label: str) -> str:
    dataset_suffix = "_shadow" if use_shadow else ""
    return f"""
        SELECT * FROM `payroll-bi-gauntlet.payroll_analytics{dataset_suffix}.ceo_snapshot`
        WHERE period_label = @period_label
    """

RED FLAG #6: MoM Calculations Require Previous Period ✅ RESOLVED

Status: ✅ EXPLICITLY RESOLVED Updated Spec:

“Run Repo B shadow ingestion for: June 2025, July 2025, August 2025” “This is required to satisfy: MoM LAG() logic in views, Growth/Loss reconciliation”

Verification: ✅ Spec explicitly includes multi-month backfill

RED FLAG #7: YTD/QTD Direct Table Queries ✅ RESOLVED

Status: ✅ EXPLICITLY ADDRESSED Updated Spec:

“QTD (June–August)” listed in parity validator scope Implies validator will query shadow tables directly for QTD

Verification: ✅ QTD validation included in scope Implementation Note: QTD for August 2025 should aggregate:

Start: 2025-07-01 (Q3 start)
End: 2025-08-31 (August end)
Baseline: 2025-07-01 (for snapshot comparison)

RED FLAG #8: `tpa_applied` Column ✅ VERIFIED IN CODE

Status: ✅ CODE VERIFIES COLUMN EXISTS Code Evidence:

# repo_b/upload_to_bq.py:599
stage3_df['tpa_applied'] = df['is_tpa'].apply(parse_bool)

Verification: ✅ Repo B upload sets tpa_applied from is_tpa column Action Required: Verify is_tpa column exists in Repo B Stage3 CSV outputs

🔴 REMAINING RED FLAGS

RED FLAG #9: QTD Baseline Calculation for Multi-Month Validation

Issue: Spec mentions “QTD (June–August)” but QTD logic depends on selected period:

If validating August 2025 QTD: baseline = July 1, current = August 31
If validating July 2025 QTD: baseline = July 1, current = July 31
If validating June 2025 QTD: baseline = April 1 (Q2), current = June 30

Impact:

Parity validator needs to understand QTD baseline varies by selected period
For August 2025 QTD validation, need July 1 baseline (already covered by multi-month backfill)
For June 2025 QTD validation, need April 1 baseline (NOT in shadow backfill scope)

Recommendation:

Clarify: Are we validating QTD for each month separately (June QTD, July QTD, August QTD)?
OR: Are we validating Q3 QTD only (aggregating June–August)?
Current spec says “QTD (June–August)” which suggests Q3 aggregation, but needs clarification

Action Required: Clarify QTD validation scope

RED FLAG #10: Growth/Loss QTD/YTD Uses Snapshot Comparison

Issue: get_growth_loss_summary_snapshot() (for QTD/YTD) uses snapshot comparison:

Current snapshot: Selected period end
Baseline snapshot: Quarter start (for QTD) or Jan 1 (for YTD)

Impact:

QTD Growth/Loss for August 2025 compares:
- Current: August 31 snapshot
- Baseline: July 1 snapshot
This requires July shadow data (✅ covered)
But also requires understanding that QTD uses snapshot comparison, not cumulative aggregation

Recommendation:

Parity validator should use get_growth_loss_summary_snapshot() logic for QTD validation
Document that QTD Growth/Loss uses snapshot comparison (not cumulative)

Action Required: Document QTD Growth/Loss validation approach

RED FLAG #11: View DDL Storage Location

Issue: Spec says:

“Store shadow view SQL in: integration/bigquery/sql/views_shadow/”

Verification: ✅ Clear location specified Action Required: Create directory structure if it doesn’t exist

⚠️ WARNINGS (Non-Blocking)

WARNING #1: Project ID Hardcoding

Issue: All view SQL hardcodes payroll-bi-gauntlet project ID Impact: Shadow views will also hardcode project ID (acceptable for Phase 1) Recommendation: Acceptable for shadow mode. Consider parameterization in future.

WARNING #2: Period Format Consistency

Issue: Spec uses both formats:

2025-06-01 (with day)
2025-06 (without day)

Impact: Need to ensure consistent normalization Recommendation: Use YYYY-MM-01 format consistently (matches BigQuery DATE partitioning)

WARNING #3: QTD Quarter Definition

Issue: Q3 2025 = July, August, September (not June, July, August) Impact:

If validating “QTD (June–August)”, this is NOT a standard quarter
Standard Q3 = July–September
June is end of Q2

Recommendation: Clarify:

Are we validating Q2 QTD (April–June) + Q3 QTD (July–September)?
OR: Are we validating a custom 3-month period (June–August)?

Current Spec Says: “QTD (June–August)” - this suggests custom period, not standard quarter

📋 UPDATED PRE-IMPLEMENTATION CHECKLIST

✅ Shadow naming strategy clarified (dataset-level)
✅ Multi-month scope defined (June, July, August)
✅ top_agents dual-table dependency acknowledged
✅ Shadow allowlist specified
⚠️ Clarify QTD validation scope (standard quarter vs custom period)
⚠️ Verify is_tpa column exists in Repo B Stage3 CSV
⚠️ Design query builder pattern for parity validator (or use string replacement)
⚠️ Create views_shadow/ directory structure
⚠️ Document QTD Growth/Loss snapshot comparison logic

✅ ALIGNMENT SUMMARY

Overall Assessment: Updated specification resolves 6 of 8 original red flags explicitly. 2 red flags need minor clarification (QTD scope, query builder pattern).

Resolved (6):

✅ Shadow naming strategy (dataset-level)
✅ Hardcoded constants (helper function specified)
✅ top_agents dual-table dependency
✅ Period validation (shadow allowlist)
✅ MoM previous period requirement (multi-month backfill)
✅ YTD/QTD scope (included in validator)

Needs Clarification (2):

⚠️ QTD validation scope (standard quarter vs custom June–August period)
⚠️ Parity validator query builder pattern (implementation detail)

New Red Flags (1):

🔴 QTD baseline calculation for multi-month validation

🎯 READINESS ASSESSMENT

Status: ✅ MOSTLY READY FOR IMPLEMENTATION Blockers: None (all can be resolved during implementation) Recommendations:

Clarify QTD validation scope before writing parity validator SQL
Verify is_tpa column exists in Repo B outputs before shadow backfill
Design query builder helper early (before writing validator)

Confidence Level: 🟢 HIGH - Specification is well-aligned with validated contract map

Next Step: Proceed with implementation, addressing clarifications as they arise

Start Here

Operate & Support

Reference

Build & Integrate

Shadow mode alignment check v2

Shadow Mode Implementation - Updated Alignment Check

✅ RESOLVED RED FLAGS

RED FLAG #1: Shadow Naming Strategy ✅ RESOLVED

RED FLAG #2: Hardcoded Dataset Constants ✅ PARTIALLY RESOLVED

RED FLAG #3: `top_agents` Dual-Table Dependency ✅ RESOLVED

RED FLAG #4: Period Validation ✅ RESOLVED

RED FLAG #5: Parity Validator Dual-Dataset Queries ⚠️ NEEDS CLARIFICATION

RED FLAG #6: MoM Calculations Require Previous Period ✅ RESOLVED

RED FLAG #7: YTD/QTD Direct Table Queries ✅ RESOLVED

RED FLAG #8: `tpa_applied` Column ✅ VERIFIED IN CODE

🔴 REMAINING RED FLAGS

RED FLAG #9: QTD Baseline Calculation for Multi-Month Validation

RED FLAG #10: Growth/Loss QTD/YTD Uses Snapshot Comparison

RED FLAG #11: View DDL Storage Location

⚠️ WARNINGS (Non-Blocking)

WARNING #1: Project ID Hardcoding

WARNING #2: Period Format Consistency

WARNING #3: QTD Quarter Definition

📋 UPDATED PRE-IMPLEMENTATION CHECKLIST

✅ ALIGNMENT SUMMARY

Resolved (6):

Needs Clarification (2):

New Red Flags (1):

🎯 READINESS ASSESSMENT

Start Here

Operate & Support

Reference

Build & Integrate

​Shadow Mode Implementation - Updated Alignment Check

​✅ RESOLVED RED FLAGS

​RED FLAG #1: Shadow Naming Strategy ✅ RESOLVED

​RED FLAG #2: Hardcoded Dataset Constants ✅ PARTIALLY RESOLVED

​RED FLAG #3: top_agents Dual-Table Dependency ✅ RESOLVED

​RED FLAG #4: Period Validation ✅ RESOLVED

​RED FLAG #5: Parity Validator Dual-Dataset Queries ⚠️ NEEDS CLARIFICATION

​RED FLAG #6: MoM Calculations Require Previous Period ✅ RESOLVED

​RED FLAG #7: YTD/QTD Direct Table Queries ✅ RESOLVED

​RED FLAG #8: tpa_applied Column ✅ VERIFIED IN CODE

​🔴 REMAINING RED FLAGS

​RED FLAG #9: QTD Baseline Calculation for Multi-Month Validation

​RED FLAG #10: Growth/Loss QTD/YTD Uses Snapshot Comparison

​RED FLAG #11: View DDL Storage Location

​⚠️ WARNINGS (Non-Blocking)

​WARNING #1: Project ID Hardcoding

​WARNING #2: Period Format Consistency

​WARNING #3: QTD Quarter Definition

​📋 UPDATED PRE-IMPLEMENTATION CHECKLIST

​✅ ALIGNMENT SUMMARY

​Resolved (6):

​Needs Clarification (2):

​New Red Flags (1):

​🎯 READINESS ASSESSMENT

Shadow Mode Implementation - Updated Alignment Check

✅ RESOLVED RED FLAGS

RED FLAG #1: Shadow Naming Strategy ✅ RESOLVED

RED FLAG #2: Hardcoded Dataset Constants ✅ PARTIALLY RESOLVED

RED FLAG #3: `top_agents` Dual-Table Dependency ✅ RESOLVED

RED FLAG #4: Period Validation ✅ RESOLVED

RED FLAG #5: Parity Validator Dual-Dataset Queries ⚠️ NEEDS CLARIFICATION

RED FLAG #6: MoM Calculations Require Previous Period ✅ RESOLVED

RED FLAG #7: YTD/QTD Direct Table Queries ✅ RESOLVED

RED FLAG #8: `tpa_applied` Column ✅ VERIFIED IN CODE

🔴 REMAINING RED FLAGS

RED FLAG #9: QTD Baseline Calculation for Multi-Month Validation

RED FLAG #10: Growth/Loss QTD/YTD Uses Snapshot Comparison

RED FLAG #11: View DDL Storage Location

⚠️ WARNINGS (Non-Blocking)

WARNING #1: Project ID Hardcoding

WARNING #2: Period Format Consistency

WARNING #3: QTD Quarter Definition

📋 UPDATED PRE-IMPLEMENTATION CHECKLIST

✅ ALIGNMENT SUMMARY

Resolved (6):

Needs Clarification (2):

New Red Flags (1):

🎯 READINESS ASSESSMENT