Shadow Mode Implementation - Red Flags & Alignment Check
Date: 2025-01-XXPurpose: Identify red flags and alignment issues before implementing Repo B Shadow Mode
β ALIGNMENT VERIFIED
1. Base Objects Match Contract
- β
payroll_raw.stage1_snapshots(plural) - CORRECT - β
payroll_processed.stage3_snapshots(plural) - CORRECT - β
All 4 views correctly identified:
ceo_snapshot,top_businesses,top_agents,business_growth_loss
2. View Dependencies Correctly Identified
- β
ceo_snapshotβstage1_snapshotsonly - β
top_businessesβstage1_snapshotsonly - β
top_agentsβstage3_snapshots+stage1_snapshots(JOIN) - CRITICAL - β
business_growth_lossβstage1_snapshotsonly
3. Parity Validator Scope Correct
- β CEO MTD metrics (all required fields identified)
- β Top businesses MTD (order + values)
- β Top agents MTD (order + values)
- β Growth/Loss reconciliation (MoM formula)
π΄ RED FLAGS
RED FLAG #1: Ambiguous Shadow Table Naming Strategy
Issue: Task description is ambiguous about shadow table naming:- Option A: Table-level suffixing:
stage1_snapshots_shadow,stage3_snapshots_shadow - Option B: Dataset-level shadowing:
payroll_raw_shadow.stage1_snapshots,payroll_processed_shadow.stage3_snapshots - Task mentions BOTH: βparallel tablesβ with
*_shadowsuffix AND βparallel datasetpayroll_analytics_shadowβ
- If using table suffix: Views need to reference
stage1_snapshots_shadow(same dataset) - If using dataset suffix: Views need to reference
payroll_raw_shadow.stage1_snapshots(different dataset)
- Use dataset-level shadowing (Option B) for cleaner separation
- Shadow tables:
payroll_raw_shadow.stage1_snapshots,payroll_processed_shadow.stage3_snapshots - Shadow views:
payroll_analytics_shadow.*(all views) - Rationale: Easier to drop entire shadow dataset for cleanup, clearer separation
RED FLAG #2: Hardcoded Dataset Constants in Upload Functions
Issue:repo_b/upload_to_bq.py uses hardcoded constants:
load_stage1_to_bigquery() and load_stage3_to_bigquery() accept dataset_id parameter but default to these constants.
Impact:
- Shadow mode needs to override dataset names
- Current code structure allows this (parameters exist), but need to ensure all call sites pass shadow dataset names
- Add
BQ_SHADOW_MODEenv var that sets dataset suffix - Create helper function:
get_dataset_name(base_name: str, shadow_mode: bool) -> str - Update all upload function calls to use helper
RED FLAG #3: top_agents View Joins BOTH Shadow Tables
Issue: analytics_top_agents.sql view joins:
payroll_processed.stage3_snapshots(for commissions)payroll_raw.stage1_snapshots(for employee counts)
- Shadow view MUST reference BOTH shadow tables:
payroll_processed_shadow.stage3_snapshotspayroll_raw_shadow.stage1_snapshots
- If shadow view only references one shadow table, parity test will fail
- Shadow view SQL must replace ALL table references:
RED FLAG #4: July 2025 Period Validation
Issue: Task specifies July 2025 (2025-07-01) for parity validation, but:
ALLOWED_PERIODS_SEATBELT = {"2025-06"}inupload_to_bq.py(line 30)- July 2025 may not be in allowlist
- Shadow mode upload may fail if July 2025 not in allowlist
- Need to either:
- Add July 2025 to allowlist, OR
- Use
--forceflag (dangerous), OR - Update seatbelt to allow shadow mode periods
- Add July 2025 to
ALLOWED_PERIODS_SEATBELTOR - Create separate
ALLOWED_PERIODS_SHADOWconstant - Shadow mode should bypass seatbelt OR use separate allowlist
RED FLAG #5: Parity Validator Needs to Query BOTH Production and Shadow
Issue: Parity validator must compare:- Production: Query
payroll_analytics.*views (orpayroll_raw.stage1_snapshotsdirectly) - Shadow: Query
payroll_analytics_shadow.*views (orpayroll_raw_shadow.stage1_snapshotsdirectly)
- Validator needs to construct queries with configurable dataset names
- Cannot hardcode production dataset names
- Create query builder functions that accept
dataset_suffixparameter - Example:
get_ceo_metrics_query(use_shadow: bool) -> str - Or: Use parameterized queries with dataset names as parameters
RED FLAG #6: MoM Calculations Require Previous Period Data
Issue:ceo_snapshotview usesLAG()window function (requires previous period)business_growth_lossview usesLAG()window function (requires previous period)- Growth/Loss endpoints use FULL OUTER JOIN between current and previous month
- Shadow tables for July 2025 alone may not produce correct MoM calculations
- Need June 2025 shadow data for July MoM to work correctly
- Shadow mode backfill should include June 2025 (previous period) for July validation
- OR: Document that MoM calculations will be NULL/incorrect without previous period
- OR: Use direct table queries (bypass views) for parity validation
RED FLAG #7: YTD/QTD Functions Query Tables Directly
Issue: Some query functions bypass views and query tables directly:get_top_businesses_ytd()βstage1_snapshots(direct)get_top_businesses_qtd()βstage1_snapshots(direct)get_top_agents_ytd()βstage3_snapshots(direct)get_top_agents_qtd()βstage3_snapshots(direct)
- Parity validator for YTD/QTD must query shadow tables directly
- Cannot rely on shadow views for YTD/QTD validation
- Parity validator should query shadow tables directly for YTD/QTD comparisons
- OR: Create shadow views that support YTD/QTD (but this may be overkill)
RED FLAG #8: Agent Commissions Query Uses tpa_applied = TRUE Filter
Issue: get_agent_commissions_range() queries:
- Parity validator must ensure Repo B shadow data includes
tpa_appliedcolumn - Repo B upload must set
tpa_appliedcorrectly (matches Repo A logic)
- Verify Repo B
load_stage3_to_bigquery()setstpa_appliedcorrectly - Add validation check:
SELECT COUNT(*) WHERE tpa_applied IS NULLshould be 0
tpa_applied column population in Repo B upload
β οΈ WARNINGS (Non-Blocking)
WARNING #1: View SQL Hardcodes Project ID
Issue: All view SQL files hardcodepayroll-bi-gauntlet project ID:
- Shadow views will also hardcode project ID (acceptable for now)
- If project changes, need to update all shadow view DDL
WARNING #2: Rank Computation Mismatch
Issue:- Views use
QUALIFY ROW_NUMBER() ... <= 10to LIMIT but donβt expose rank - Backend adds
ROW_NUMBER() OVER (ORDER BY ...) as rankin SELECT
- Parity validator should ignore
rankfield (compare order by name/ID instead) - Task already specifies: βignore internal rank numbering fieldsβ β
WARNING #3: Tolerance Rules May Need Adjustment
Issue: Task specifies:- Money:
abs diff β€ 0.05(5 cents) - Counts: exact match (0 tolerance)
- BigQuery NUMERIC precision may cause floating-point differences
- Need to verify tolerance is appropriate for actual data
π PRE-IMPLEMENTATION CHECKLIST
Before implementing shadow mode:- Clarify shadow naming strategy (dataset-level vs table-level)
- Update seatbelt allowlist to include July 2025 (or create shadow allowlist)
- Verify Repo B upload sets
tpa_appliedcorrectly - Decide on June 2025 shadow data requirement for July MoM validation
- Design query builder pattern for dual-dataset parity validation
- Verify all table references in shadow view DDL (especially
top_agentsJOIN) - Test shadow dataset creation in BigQuery (permissions, quotas)
β ALIGNMENT SUMMARY
Overall Assessment: Task specification is mostly aligned with validated contract map, but has 7 red flags that must be resolved before implementation:- π΄ Shadow naming strategy ambiguity
- π΄ Hardcoded dataset constants
- π΄
top_agentsview dual-table dependency - π΄ July 2025 seatbelt restriction
- π΄ Parity validator dual-dataset query pattern
- π΄ MoM calculations require previous period
- π΄ YTD/QTD direct table queries
Status: β οΈ RED FLAGS IDENTIFIED - RESOLUTION REQUIRED BEFORE IMPLEMENTATION