Skip to main content

Repo B Shadow Mode Execution Guide

Purpose

Guide for running Repo B in shadow mode and validating dashboard parity.

When to run this

  • Before Repo B cutover
  • After data/logic changes that may impact dashboard parity
  • During parity incident investigation

Prerequisites

  1. Environment Setup:
    # Set shadow mode environment variable
    export BQ_SHADOW_MODE=1
    
    # Verify GCP authentication
    gcloud auth application-default login
    
    # Verify project ID
    echo $GCP_PROJECT_ID  # Should output: payroll-bi-gauntlet
    
  2. Required Files:
    • Repo B Stage1_Raw CSV files for June-September 2025
    • Repo B Stage3_Full_Detail CSV files for June-September 2025
    • Location: data/outputs/repo_b/dashboard_data/{YYYY-MM}/dashboard_csvs/

Inputs

  • Periods to backfill (June-September 2025 baseline in this guide)
  • Tenant (creative_benefit_strategies in examples)
  • Repo B output CSV files
  • Shadow validation output file path (for example parity_results.json)

Procedure

Last Updated

2025-12-19

Overview

Shadow mode allows Repo B to write to parallel datasets (payroll_raw_shadow, payroll_processed_shadow, payroll_analytics_shadow) without affecting production data. This enables safe validation of Repo B output before cutover.

1) Shadow backfill (June-September 2025)

Upload Single Period

# Set shadow mode
export BQ_SHADOW_MODE=1

# Upload June 2025
python -m repo_b.upload_to_bq --period 2025-06

# Upload July 2025
python -m repo_b.upload_to_bq --period 2025-07

# Upload August 2025
python -m repo_b.upload_to_bq --period 2025-08

# Upload September 2025
python -m repo_b.upload_to_bq --period 2025-09

Batch Upload Script

Create scripts/shadow_backfill_june_september.sh:
#!/bin/bash
set -e

export BQ_SHADOW_MODE=1

PERIODS=("2025-06" "2025-07" "2025-08" "2025-09")

for period in "${PERIODS[@]}"; do
    echo "Uploading $period..."
    python -m repo_b.upload_to_bq --period "$period"
    echo "✅ $period complete"
done

echo "✅ Shadow backfill complete"
Run:
chmod +x scripts/shadow_backfill_june_september.sh
./scripts/shadow_backfill_june_september.sh

Verification

After upload, verify shadow datasets:
-- Check shadow dataset row counts
SELECT 
  'stage1_shadow' as table_name,
  period_label,
  COUNT(*) as row_count
FROM `payroll-bi-gauntlet.payroll_raw_shadow.stage1_snapshots`
WHERE tenant_id = 'creative_benefit_strategies'
  AND period_label IN (DATE('2025-06-01'), DATE('2025-07-01'), DATE('2025-08-01'), DATE('2025-09-01'))
GROUP BY period_label
ORDER BY period_label

UNION ALL

SELECT 
  'stage3_shadow' as table_name,
  period_label,
  COUNT(*) as row_count
FROM `payroll-bi-gauntlet.payroll_processed_shadow.stage3_snapshots`
WHERE tenant_id = 'creative_benefit_strategies'
  AND period_label IN (DATE('2025-06-01'), DATE('2025-07-01'), DATE('2025-08-01'), DATE('2025-09-01'))
GROUP BY period_label
ORDER BY period_label
Expected: Row counts should match production datasets for the same periods.

2) Create shadow analytics views

Deploy Views

# Deploy CEO snapshot view
bq query --use_legacy_sql=false < integration/bigquery/sql/views_shadow/analytics_ceo_snapshot.sql

# Deploy top businesses view
bq query --use_legacy_sql=false < integration/bigquery/sql/views_shadow/analytics_top_businesses.sql

# Deploy top agents view
bq query --use_legacy_sql=false < integration/bigquery/sql/views_shadow/analytics_top_agents.sql

# Deploy business growth/loss view
bq query --use_legacy_sql=false < integration/bigquery/sql/views_shadow/analytics_business_growth_loss.sql

Verification

-- Test shadow views
SELECT COUNT(*) as ceo_snapshot_rows
FROM `payroll-bi-gauntlet.payroll_analytics_shadow.ceo_snapshot`
WHERE tenant_id = 'creative_benefit_strategies'
  AND period_label IN (DATE('2025-06-01'), DATE('2025-07-01'), DATE('2025-08-01'), DATE('2025-09-01'))

SELECT COUNT(*) as top_businesses_rows
FROM `payroll-bi-gauntlet.payroll_analytics_shadow.top_businesses`
WHERE tenant_id = 'creative_benefit_strategies'
  AND period_label IN (DATE('2025-06-01'), DATE('2025-07-01'), DATE('2025-08-01'), DATE('2025-09-01'))

SELECT COUNT(*) as top_agents_rows
FROM `payroll-bi-gauntlet.payroll_analytics_shadow.top_agents`
WHERE tenant_id = 'creative_benefit_strategies'
  AND period_label IN (DATE('2025-06-01'), DATE('2025-07-01'), DATE('2025-08-01'), DATE('2025-09-01'))

SELECT COUNT(*) as growth_loss_rows
FROM `payroll-bi-gauntlet.payroll_analytics_shadow.business_growth_loss`
WHERE tenant_id = 'creative_benefit_strategies'
  AND period_label IN (DATE('2025-06-01'), DATE('2025-07-01'), DATE('2025-08-01'), DATE('2025-09-01'))
Expected: All views return non-zero row counts.

3) Run parity validation

Basic Execution

# Run parity validator
python tests/verify_dashboard_parity_shadow.py

With JSON Output

# Save results to JSON file
python tests/verify_dashboard_parity_shadow.py --json-output parity_results.json

Example Output

================================================================================
DASHBOARD PARITY VALIDATION: Repo A (prod) vs Repo B (shadow)
================================================================================
Tenant ID: creative_benefit_strategies

MTD VALIDATION
--------------------------------------------------------------------------------

Period: 2025-07-01
  [PASS] CEO Metrics
  [PASS] Top Businesses Ranking
  [PASS] Top Agents Ranking

Period: 2025-08-01
  [PASS] CEO Metrics
  [PASS] Top Businesses Ranking
  [PASS] Top Agents Ranking

================================================================================
MoM VALIDATION
--------------------------------------------------------------------------------

MoM: 2025-07-01 ↔ 2025-06-01
  [PASS] Growth/Loss Reconciliation

MoM: 2025-08-01 ↔ 2025-07-01
  [PASS] Growth/Loss Reconciliation

================================================================================
QTD VALIDATION (Q3: baseline = 2025-07-01)
--------------------------------------------------------------------------------

Period: 2025-07-01 QTD
  [PASS] CEO Metrics QTD
  [PASS] Growth/Loss QTD

Period: 2025-08-01 QTD
  [PASS] CEO Metrics QTD
  [PASS] Growth/Loss QTD

================================================================================
SUMMARY
--------------------------------------------------------------------------------
Total Checks: 24
Passed: 24
Failed: 0

✅ ALL PARITY CHECKS PASSED

Interpreting Results

  • [PASS]: Metric matches within tolerance
    • Counts: exact match (diff = 0)
    • Money: abs diff ≤ $0.05
    • Rankings: identical ordered list
  • [FAIL]: Metric mismatch
    • Check diff values in output
    • Review JSON output for detailed comparison
    • Investigate root cause before proceeding

Common Issues

  1. Missing Shadow Data:
    • Error: No shadow data found for {period}
    • Fix: Run shadow backfill for the missing period
  2. View Not Created:
    • Error: Table not found: payroll_analytics_shadow.{view_name}
    • Fix: Deploy shadow views (Step 2)
  3. Money Tolerance Exceeded:
    • Error: abs_diff > 0.05
    • Fix: Investigate rounding differences or calculation logic
  4. Ranking Mismatch:
    • Error: prod != shadow (ordered lists differ)
    • Fix: Check business/agent name normalization

4) Review parity results

JSON Output Structure

{
  "mtd": {
    "2025-07-01": {
      "ceo_metrics": {
        "total_businesses": {
          "metric": "total_businesses",
          "prod": 150,
          "shadow": 150,
          "diff": 0,
          "match": true,
          "tolerance": 0
        },
        "gross_payout": {
          "metric": "gross_payout",
          "prod": 18523.78,
          "shadow": 18523.80,
          "diff": 0.02,
          "abs_diff": 0.02,
          "match": true,
          "tolerance": 0.05
        }
      },
      "top_businesses": {
        "ranking": {
          "metric": "top_businesses_ranking",
          "prod": ["Business A", "Business B", ...],
          "shadow": ["Business A", "Business B", ...],
          "match": true
        }
      }
    }
  },
  "mom": {
    "2025-07-01_vs_2025-06-01": {
      "reconciliation": {
        "comparison": {
          "current_employees": {
            "metric": "current_employees",
            "prod": 5000,
            "shadow": 5000,
            "diff": 0,
            "match": true
          }
        }
      }
    }
  },
  "qtd": {
    "2025-07-01": {
      "ceo_metrics": {
        "gross_payout": {
          "metric": "gross_payout_qtd",
          "prod": 18523.78,
          "shadow": 18523.80,
          "diff": 0.02,
          "match": true
        }
      }
    }
  },
  "summary": {
    "total_checks": 24,
    "passed": 24,
    "failed": 0
  }
}

Review Checklist

  • All MTD checks pass for June-September
  • All MoM checks pass for Jul↔Jun, Aug↔Jul, Sep↔Aug
  • All QTD checks pass for July, August, September
  • No money diffs exceed $0.05
  • All rankings match exactly
  • All counts match exactly

Verification

  • Shadow datasets contain expected row counts for all target periods
  • Shadow analytics views are deployed and queryable
  • Parity validator completes and reports pass/fail breakdown
  • Any failing metrics are documented with root-cause notes

Failure modes & fixes

Issue: is_tpa Column Missing

Error:
CRITICAL: Stage3 CSV missing required column 'is_tpa'
Fix:
  • Regenerate Stage3 CSV with Repo B engine
  • Ensure is_tpa column is included in CSV export

Issue: Shadow Mode Not Active

Error:
SEATBELT: Refusing to run for period=2025-07
Fix:
export BQ_SHADOW_MODE=1
# Verify
python -c "import os; print('Shadow mode:', os.getenv('BQ_SHADOW_MODE'))"

Issue: Period Not in Shadow Allowlist

Error:
SHADOW SEATBELT: Refusing to run for period=2025-10-01
Fix:
  • Add period to ALLOWED_PERIODS_SHADOW in repo_b/upload_to_bq.py
  • Or use --force flag (dangerous, only for testing)

Artifacts produced

  • Shadow backfill execution logs
  • Shadow view deployment results
  • Parity output (console and optional parity_results.json)
  • Review checklist with pass/fail status
  • docs/runbooks/engine_cutover_repoB.md
  • docs/shadow_mode_final_alignment.md
  • docs/runbooks/STAGE3_KICKOFF.md

Supporting reference

Next steps

After successful parity validation:
  1. Review Results: Ensure all checks pass
  2. Document Findings: Note any minor diffs or edge cases
  3. Proceed to Cutover: Follow docs/runbooks/engine_cutover_repoB.md

Support

For issues or questions:
  • Check docs/shadow_mode_final_alignment.md for architecture details
  • Review docs/runbooks/engine_cutover_repoB.md for cutover procedure
  • Contact Data Engineering team