Skip to main content

Shadow Mode Implementation - Red Flags & Alignment Check

Date: 2025-01-XX
Purpose: Identify red flags and alignment issues before implementing Repo B Shadow Mode

βœ… ALIGNMENT VERIFIED

1. Base Objects Match Contract

  • βœ… payroll_raw.stage1_snapshots (plural) - CORRECT
  • βœ… payroll_processed.stage3_snapshots (plural) - CORRECT
  • βœ… All 4 views correctly identified: ceo_snapshot, top_businesses, top_agents, business_growth_loss

2. View Dependencies Correctly Identified

  • βœ… ceo_snapshot β†’ stage1_snapshots only
  • βœ… top_businesses β†’ stage1_snapshots only
  • βœ… top_agents β†’ stage3_snapshots + stage1_snapshots (JOIN) - CRITICAL
  • βœ… business_growth_loss β†’ stage1_snapshots only

3. Parity Validator Scope Correct

  • βœ… CEO MTD metrics (all required fields identified)
  • βœ… Top businesses MTD (order + values)
  • βœ… Top agents MTD (order + values)
  • βœ… Growth/Loss reconciliation (MoM formula)

πŸ”΄ RED FLAGS

RED FLAG #1: Ambiguous Shadow Table Naming Strategy

Issue: Task description is ambiguous about shadow table naming:
  • Option A: Table-level suffixing: stage1_snapshots_shadow, stage3_snapshots_shadow
  • Option B: Dataset-level shadowing: payroll_raw_shadow.stage1_snapshots, payroll_processed_shadow.stage3_snapshots
  • Task mentions BOTH: β€œparallel tables” with *_shadow suffix AND β€œparallel dataset payroll_analytics_shadow”
Impact:
  • If using table suffix: Views need to reference stage1_snapshots_shadow (same dataset)
  • If using dataset suffix: Views need to reference payroll_raw_shadow.stage1_snapshots (different dataset)
Recommendation:
  • Use dataset-level shadowing (Option B) for cleaner separation
  • Shadow tables: payroll_raw_shadow.stage1_snapshots, payroll_processed_shadow.stage3_snapshots
  • Shadow views: payroll_analytics_shadow.* (all views)
  • Rationale: Easier to drop entire shadow dataset for cleanup, clearer separation
Action Required: Clarify in task specification

RED FLAG #2: Hardcoded Dataset Constants in Upload Functions

Issue: repo_b/upload_to_bq.py uses hardcoded constants:
DEFAULT_DATASET_RAW = "payroll_raw"
DEFAULT_DATASET_PROCESSED = "payroll_processed"
DEFAULT_DATASET_ANALYTICS = "payroll_analytics"
Functions load_stage1_to_bigquery() and load_stage3_to_bigquery() accept dataset_id parameter but default to these constants. Impact:
  • Shadow mode needs to override dataset names
  • Current code structure allows this (parameters exist), but need to ensure all call sites pass shadow dataset names
Recommendation:
  • Add BQ_SHADOW_MODE env var that sets dataset suffix
  • Create helper function: get_dataset_name(base_name: str, shadow_mode: bool) -> str
  • Update all upload function calls to use helper
Action Required: Implement configurable dataset naming

RED FLAG #3: top_agents View Joins BOTH Shadow Tables

Issue: analytics_top_agents.sql view joins:
  • payroll_processed.stage3_snapshots (for commissions)
  • payroll_raw.stage1_snapshots (for employee counts)
Impact:
  • Shadow view MUST reference BOTH shadow tables:
    • payroll_processed_shadow.stage3_snapshots
    • payroll_raw_shadow.stage1_snapshots
  • If shadow view only references one shadow table, parity test will fail
Recommendation:
  • Shadow view SQL must replace ALL table references:
    -- Production:
    FROM `payroll-bi-gauntlet.payroll_processed.stage3_snapshots`
    FROM `payroll-bi-gauntlet.payroll_raw.stage1_snapshots`
    
    -- Shadow:
    FROM `payroll-bi-gauntlet.payroll_processed_shadow.stage3_snapshots`
    FROM `payroll-bi-gauntlet.payroll_raw_shadow.stage1_snapshots`
    
Action Required: Ensure shadow view DDL replaces ALL table references (not just one)

RED FLAG #4: July 2025 Period Validation

Issue: Task specifies July 2025 (2025-07-01) for parity validation, but:
  • ALLOWED_PERIODS_SEATBELT = {"2025-06"} in upload_to_bq.py (line 30)
  • July 2025 may not be in allowlist
Impact:
  • Shadow mode upload may fail if July 2025 not in allowlist
  • Need to either:
    • Add July 2025 to allowlist, OR
    • Use --force flag (dangerous), OR
    • Update seatbelt to allow shadow mode periods
Recommendation:
  • Add July 2025 to ALLOWED_PERIODS_SEATBELT OR
  • Create separate ALLOWED_PERIODS_SHADOW constant
  • Shadow mode should bypass seatbelt OR use separate allowlist
Action Required: Update seatbelt logic for shadow mode

RED FLAG #5: Parity Validator Needs to Query BOTH Production and Shadow

Issue: Parity validator must compare:
  • Production: Query payroll_analytics.* views (or payroll_raw.stage1_snapshots directly)
  • Shadow: Query payroll_analytics_shadow.* views (or payroll_raw_shadow.stage1_snapshots directly)
Impact:
  • Validator needs to construct queries with configurable dataset names
  • Cannot hardcode production dataset names
Recommendation:
  • Create query builder functions that accept dataset_suffix parameter
  • Example: get_ceo_metrics_query(use_shadow: bool) -> str
  • Or: Use parameterized queries with dataset names as parameters
Action Required: Design query builder pattern for dual-dataset comparison

RED FLAG #6: MoM Calculations Require Previous Period Data

Issue:
  • ceo_snapshot view uses LAG() window function (requires previous period)
  • business_growth_loss view uses LAG() window function (requires previous period)
  • Growth/Loss endpoints use FULL OUTER JOIN between current and previous month
Impact:
  • Shadow tables for July 2025 alone may not produce correct MoM calculations
  • Need June 2025 shadow data for July MoM to work correctly
Recommendation:
  • Shadow mode backfill should include June 2025 (previous period) for July validation
  • OR: Document that MoM calculations will be NULL/incorrect without previous period
  • OR: Use direct table queries (bypass views) for parity validation
Action Required: Clarify whether June 2025 shadow data is required for July validation

RED FLAG #7: YTD/QTD Functions Query Tables Directly

Issue: Some query functions bypass views and query tables directly:
  • get_top_businesses_ytd() β†’ stage1_snapshots (direct)
  • get_top_businesses_qtd() β†’ stage1_snapshots (direct)
  • get_top_agents_ytd() β†’ stage3_snapshots (direct)
  • get_top_agents_qtd() β†’ stage3_snapshots (direct)
Impact:
  • Parity validator for YTD/QTD must query shadow tables directly
  • Cannot rely on shadow views for YTD/QTD validation
Recommendation:
  • Parity validator should query shadow tables directly for YTD/QTD comparisons
  • OR: Create shadow views that support YTD/QTD (but this may be overkill)
Action Required: Document that YTD/QTD parity validation queries tables directly

RED FLAG #8: Agent Commissions Query Uses tpa_applied = TRUE Filter

Issue: get_agent_commissions_range() queries:
FROM `payroll-bi-gauntlet.payroll_processed.stage3_snapshots`
WHERE tpa_applied = TRUE
Impact:
  • Parity validator must ensure Repo B shadow data includes tpa_applied column
  • Repo B upload must set tpa_applied correctly (matches Repo A logic)
Recommendation:
  • Verify Repo B load_stage3_to_bigquery() sets tpa_applied correctly
  • Add validation check: SELECT COUNT(*) WHERE tpa_applied IS NULL should be 0
Action Required: Verify tpa_applied column population in Repo B upload

⚠️ WARNINGS (Non-Blocking)

WARNING #1: View SQL Hardcodes Project ID

Issue: All view SQL files hardcode payroll-bi-gauntlet project ID:
FROM `payroll-bi-gauntlet.payroll_raw.stage1_snapshots`
Impact:
  • Shadow views will also hardcode project ID (acceptable for now)
  • If project changes, need to update all shadow view DDL
Recommendation: Acceptable for Phase 1 (shadow mode). Consider parameterization in Phase 2.

WARNING #2: Rank Computation Mismatch

Issue:
  • Views use QUALIFY ROW_NUMBER() ... <= 10 to LIMIT but don’t expose rank
  • Backend adds ROW_NUMBER() OVER (ORDER BY ...) as rank in SELECT
Impact:
  • Parity validator should ignore rank field (compare order by name/ID instead)
  • Task already specifies: β€œignore internal rank numbering fields” βœ…
Recommendation: Already handled in task spec. No action needed.

WARNING #3: Tolerance Rules May Need Adjustment

Issue: Task specifies:
  • Money: abs diff ≀ 0.05 (5 cents)
  • Counts: exact match (0 tolerance)
Impact:
  • BigQuery NUMERIC precision may cause floating-point differences
  • Need to verify tolerance is appropriate for actual data
Recommendation: Start with task spec tolerances, adjust if needed based on actual parity results.

πŸ“‹ PRE-IMPLEMENTATION CHECKLIST

Before implementing shadow mode:
  • Clarify shadow naming strategy (dataset-level vs table-level)
  • Update seatbelt allowlist to include July 2025 (or create shadow allowlist)
  • Verify Repo B upload sets tpa_applied correctly
  • Decide on June 2025 shadow data requirement for July MoM validation
  • Design query builder pattern for dual-dataset parity validation
  • Verify all table references in shadow view DDL (especially top_agents JOIN)
  • Test shadow dataset creation in BigQuery (permissions, quotas)

βœ… ALIGNMENT SUMMARY

Overall Assessment: Task specification is mostly aligned with validated contract map, but has 7 red flags that must be resolved before implementation:
  1. πŸ”΄ Shadow naming strategy ambiguity
  2. πŸ”΄ Hardcoded dataset constants
  3. πŸ”΄ top_agents view dual-table dependency
  4. πŸ”΄ July 2025 seatbelt restriction
  5. πŸ”΄ Parity validator dual-dataset query pattern
  6. πŸ”΄ MoM calculations require previous period
  7. πŸ”΄ YTD/QTD direct table queries
Recommendation: Resolve red flags #1-4 before starting implementation. Red flags #5-7 can be addressed during implementation with proper design.
Status: ⚠️ RED FLAGS IDENTIFIED - RESOLUTION REQUIRED BEFORE IMPLEMENTATION