Shadow Mode Implementation - Red Flags & Alignment Check

Date: 2025-01-XX
Purpose: Identify red flags and alignment issues before implementing Repo B Shadow Mode

✅ ALIGNMENT VERIFIED

1. Base Objects Match Contract

✅ payroll_raw.stage1_snapshots (plural) - CORRECT
✅ payroll_processed.stage3_snapshots (plural) - CORRECT
✅ All 4 views correctly identified: ceo_snapshot, top_businesses, top_agents, business_growth_loss

2. View Dependencies Correctly Identified

✅ ceo_snapshot → stage1_snapshots only
✅ top_businesses → stage1_snapshots only
✅ top_agents → stage3_snapshots + stage1_snapshots (JOIN) - CRITICAL
✅ business_growth_loss → stage1_snapshots only

3. Parity Validator Scope Correct

✅ CEO MTD metrics (all required fields identified)
✅ Top businesses MTD (order + values)
✅ Top agents MTD (order + values)
✅ Growth/Loss reconciliation (MoM formula)

🔴 RED FLAGS

RED FLAG #1: Ambiguous Shadow Table Naming Strategy

Issue: Task description is ambiguous about shadow table naming:

Option A: Table-level suffixing: stage1_snapshots_shadow, stage3_snapshots_shadow
Option B: Dataset-level shadowing: payroll_raw_shadow.stage1_snapshots, payroll_processed_shadow.stage3_snapshots
Task mentions BOTH: “parallel tables” with *_shadow suffix AND “parallel dataset payroll_analytics_shadow”

Impact:

If using table suffix: Views need to reference stage1_snapshots_shadow (same dataset)
If using dataset suffix: Views need to reference payroll_raw_shadow.stage1_snapshots (different dataset)

Recommendation:

Use dataset-level shadowing (Option B) for cleaner separation
Shadow tables: payroll_raw_shadow.stage1_snapshots, payroll_processed_shadow.stage3_snapshots
Shadow views: payroll_analytics_shadow.* (all views)
Rationale: Easier to drop entire shadow dataset for cleanup, clearer separation

Action Required: Clarify in task specification

RED FLAG #2: Hardcoded Dataset Constants in Upload Functions

Issue: repo_b/upload_to_bq.py uses hardcoded constants:

DEFAULT_DATASET_RAW = "payroll_raw"
DEFAULT_DATASET_PROCESSED = "payroll_processed"
DEFAULT_DATASET_ANALYTICS = "payroll_analytics"

Functions load_stage1_to_bigquery() and load_stage3_to_bigquery() accept dataset_id parameter but default to these constants. Impact:

Shadow mode needs to override dataset names
Current code structure allows this (parameters exist), but need to ensure all call sites pass shadow dataset names

Recommendation:

Add BQ_SHADOW_MODE env var that sets dataset suffix
Create helper function: get_dataset_name(base_name: str, shadow_mode: bool) -> str
Update all upload function calls to use helper

Action Required: Implement configurable dataset naming

RED FLAG #3: `top_agents` View Joins BOTH Shadow Tables

Issue: analytics_top_agents.sql view joins:

payroll_processed.stage3_snapshots (for commissions)
payroll_raw.stage1_snapshots (for employee counts)

Impact:

Shadow view MUST reference BOTH shadow tables:
- payroll_processed_shadow.stage3_snapshots
- payroll_raw_shadow.stage1_snapshots
If shadow view only references one shadow table, parity test will fail

Recommendation:

Shadow view SQL must replace ALL table references:

-- Production:
FROM `payroll-bi-gauntlet.payroll_processed.stage3_snapshots`
FROM `payroll-bi-gauntlet.payroll_raw.stage1_snapshots`

-- Shadow:
FROM `payroll-bi-gauntlet.payroll_processed_shadow.stage3_snapshots`
FROM `payroll-bi-gauntlet.payroll_raw_shadow.stage1_snapshots`

Action Required: Ensure shadow view DDL replaces ALL table references (not just one)

RED FLAG #4: July 2025 Period Validation

Issue: Task specifies July 2025 (2025-07-01) for parity validation, but:

ALLOWED_PERIODS_SEATBELT = {"2025-06"} in upload_to_bq.py (line 30)
July 2025 may not be in allowlist

Impact:

Shadow mode upload may fail if July 2025 not in allowlist
Need to either:
- Add July 2025 to allowlist, OR
- Use --force flag (dangerous), OR
- Update seatbelt to allow shadow mode periods

Recommendation:

Add July 2025 to ALLOWED_PERIODS_SEATBELT OR
Create separate ALLOWED_PERIODS_SHADOW constant
Shadow mode should bypass seatbelt OR use separate allowlist

Action Required: Update seatbelt logic for shadow mode

RED FLAG #5: Parity Validator Needs to Query BOTH Production and Shadow

Issue: Parity validator must compare:

Production: Query payroll_analytics.* views (or payroll_raw.stage1_snapshots directly)
Shadow: Query payroll_analytics_shadow.* views (or payroll_raw_shadow.stage1_snapshots directly)

Impact:

Validator needs to construct queries with configurable dataset names
Cannot hardcode production dataset names

Recommendation:

Create query builder functions that accept dataset_suffix parameter
Example: get_ceo_metrics_query(use_shadow: bool) -> str
Or: Use parameterized queries with dataset names as parameters

Action Required: Design query builder pattern for dual-dataset comparison

RED FLAG #6: MoM Calculations Require Previous Period Data

Issue:

ceo_snapshot view uses LAG() window function (requires previous period)
business_growth_loss view uses LAG() window function (requires previous period)
Growth/Loss endpoints use FULL OUTER JOIN between current and previous month

Impact:

Shadow tables for July 2025 alone may not produce correct MoM calculations
Need June 2025 shadow data for July MoM to work correctly

Recommendation:

Shadow mode backfill should include June 2025 (previous period) for July validation
OR: Document that MoM calculations will be NULL/incorrect without previous period
OR: Use direct table queries (bypass views) for parity validation

Action Required: Clarify whether June 2025 shadow data is required for July validation

RED FLAG #7: YTD/QTD Functions Query Tables Directly

Issue: Some query functions bypass views and query tables directly:

get_top_businesses_ytd() → stage1_snapshots (direct)
get_top_businesses_qtd() → stage1_snapshots (direct)
get_top_agents_ytd() → stage3_snapshots (direct)
get_top_agents_qtd() → stage3_snapshots (direct)

Impact:

Parity validator for YTD/QTD must query shadow tables directly
Cannot rely on shadow views for YTD/QTD validation

Recommendation:

Parity validator should query shadow tables directly for YTD/QTD comparisons
OR: Create shadow views that support YTD/QTD (but this may be overkill)

Action Required: Document that YTD/QTD parity validation queries tables directly

RED FLAG #8: Agent Commissions Query Uses `tpa_applied = TRUE` Filter

Issue: get_agent_commissions_range() queries:

FROM `payroll-bi-gauntlet.payroll_processed.stage3_snapshots`
WHERE tpa_applied = TRUE

Impact:

Parity validator must ensure Repo B shadow data includes tpa_applied column
Repo B upload must set tpa_applied correctly (matches Repo A logic)

Recommendation:

Verify Repo B load_stage3_to_bigquery() sets tpa_applied correctly
Add validation check: SELECT COUNT(*) WHERE tpa_applied IS NULL should be 0

Action Required: Verify tpa_applied column population in Repo B upload

⚠️ WARNINGS (Non-Blocking)

WARNING #1: View SQL Hardcodes Project ID

Issue: All view SQL files hardcode payroll-bi-gauntlet project ID:

FROM `payroll-bi-gauntlet.payroll_raw.stage1_snapshots`

Impact:

Shadow views will also hardcode project ID (acceptable for now)
If project changes, need to update all shadow view DDL

Recommendation: Acceptable for Phase 1 (shadow mode). Consider parameterization in Phase 2.

WARNING #2: Rank Computation Mismatch

Issue:

Views use QUALIFY ROW_NUMBER() ... <= 10 to LIMIT but don’t expose rank
Backend adds ROW_NUMBER() OVER (ORDER BY ...) as rank in SELECT

Impact:

Parity validator should ignore rank field (compare order by name/ID instead)
Task already specifies: “ignore internal rank numbering fields” ✅

Recommendation: Already handled in task spec. No action needed.

WARNING #3: Tolerance Rules May Need Adjustment

Issue: Task specifies:

Money: abs diff ≤ 0.05 (5 cents)
Counts: exact match (0 tolerance)

Impact:

BigQuery NUMERIC precision may cause floating-point differences
Need to verify tolerance is appropriate for actual data

Recommendation: Start with task spec tolerances, adjust if needed based on actual parity results.

📋 PRE-IMPLEMENTATION CHECKLIST

Before implementing shadow mode:

Clarify shadow naming strategy (dataset-level vs table-level)
Update seatbelt allowlist to include July 2025 (or create shadow allowlist)
Verify Repo B upload sets tpa_applied correctly
Decide on June 2025 shadow data requirement for July MoM validation
Design query builder pattern for dual-dataset parity validation
Verify all table references in shadow view DDL (especially top_agents JOIN)
Test shadow dataset creation in BigQuery (permissions, quotas)

✅ ALIGNMENT SUMMARY

Overall Assessment: Task specification is mostly aligned with validated contract map, but has 7 red flags that must be resolved before implementation:

🔴 Shadow naming strategy ambiguity
🔴 Hardcoded dataset constants
🔴 top_agents view dual-table dependency
🔴 July 2025 seatbelt restriction
🔴 Parity validator dual-dataset query pattern
🔴 MoM calculations require previous period
🔴 YTD/QTD direct table queries

Recommendation: Resolve red flags #1-4 before starting implementation. Red flags #5-7 can be addressed during implementation with proper design.

Status: ⚠️ RED FLAGS IDENTIFIED - RESOLUTION REQUIRED BEFORE IMPLEMENTATION

Start Here

Operate & Support

Reference

Build & Integrate

Shadow mode red flags

Shadow Mode Implementation - Red Flags & Alignment Check

✅ ALIGNMENT VERIFIED

1. Base Objects Match Contract

2. View Dependencies Correctly Identified

3. Parity Validator Scope Correct

🔴 RED FLAGS

RED FLAG #1: Ambiguous Shadow Table Naming Strategy

RED FLAG #2: Hardcoded Dataset Constants in Upload Functions

RED FLAG #3: `top_agents` View Joins BOTH Shadow Tables

RED FLAG #4: July 2025 Period Validation

RED FLAG #5: Parity Validator Needs to Query BOTH Production and Shadow

RED FLAG #6: MoM Calculations Require Previous Period Data

RED FLAG #7: YTD/QTD Functions Query Tables Directly

RED FLAG #8: Agent Commissions Query Uses `tpa_applied = TRUE` Filter

⚠️ WARNINGS (Non-Blocking)

WARNING #1: View SQL Hardcodes Project ID

WARNING #2: Rank Computation Mismatch

WARNING #3: Tolerance Rules May Need Adjustment

📋 PRE-IMPLEMENTATION CHECKLIST

✅ ALIGNMENT SUMMARY

Start Here

Operate & Support

Reference

Build & Integrate

​Shadow Mode Implementation - Red Flags & Alignment Check

​✅ ALIGNMENT VERIFIED

​1. Base Objects Match Contract

​2. View Dependencies Correctly Identified

​3. Parity Validator Scope Correct

​🔴 RED FLAGS

​RED FLAG #1: Ambiguous Shadow Table Naming Strategy

​RED FLAG #2: Hardcoded Dataset Constants in Upload Functions

​RED FLAG #3: top_agents View Joins BOTH Shadow Tables

​RED FLAG #4: July 2025 Period Validation

​RED FLAG #5: Parity Validator Needs to Query BOTH Production and Shadow

​RED FLAG #6: MoM Calculations Require Previous Period Data

​RED FLAG #7: YTD/QTD Functions Query Tables Directly

​RED FLAG #8: Agent Commissions Query Uses tpa_applied = TRUE Filter

​⚠️ WARNINGS (Non-Blocking)

​WARNING #1: View SQL Hardcodes Project ID

​WARNING #2: Rank Computation Mismatch

​WARNING #3: Tolerance Rules May Need Adjustment

​📋 PRE-IMPLEMENTATION CHECKLIST

​✅ ALIGNMENT SUMMARY

Shadow Mode Implementation - Red Flags & Alignment Check

✅ ALIGNMENT VERIFIED

1. Base Objects Match Contract

2. View Dependencies Correctly Identified

3. Parity Validator Scope Correct

🔴 RED FLAGS

RED FLAG #1: Ambiguous Shadow Table Naming Strategy

RED FLAG #2: Hardcoded Dataset Constants in Upload Functions

RED FLAG #3: `top_agents` View Joins BOTH Shadow Tables

RED FLAG #4: July 2025 Period Validation

RED FLAG #5: Parity Validator Needs to Query BOTH Production and Shadow

RED FLAG #6: MoM Calculations Require Previous Period Data

RED FLAG #7: YTD/QTD Functions Query Tables Directly

RED FLAG #8: Agent Commissions Query Uses `tpa_applied = TRUE` Filter

⚠️ WARNINGS (Non-Blocking)

WARNING #1: View SQL Hardcodes Project ID

WARNING #2: Rank Computation Mismatch

WARNING #3: Tolerance Rules May Need Adjustment

📋 PRE-IMPLEMENTATION CHECKLIST

✅ ALIGNMENT SUMMARY