Skip to main content

BigQuery Contract - Payroll Commission Pipeline

Summary (3–6 bullets)

  • Defines BigQuery schema, dataset, table, and view expectations for payroll analytics.
  • Documents ingestion paths, ETL mappings, and downstream Excel-aligned analytics semantics.
  • Lists contract-critical fields, partitioning/clustering, and source-of-truth artifacts.
  • Specifies breaking vs safe change policy for schema and pipeline evolution.
  • Provides implementation-level mappings from Repo B outputs into BigQuery objects.

When to use this (3–6 bullets)

  • Before changing BigQuery schemas, ETL mappings, or analytics views.
  • During ELT/ETL onboarding to understand required contracts.
  • When validating if Repo B output changes are backward-compatible.
  • During incident response for dashboard/data mismatches tied to BigQuery objects.

What you’ll walk away with (2–5 bullets)

  • A concrete map of what BigQuery expects and what Repo B must preserve.
  • A change-control checklist to avoid breaking dashboards and analytics.
  • A fast way to trace table/view lineage and contract dependencies.
Last Updated: 2025-12-15
Purpose: Defines BigQuery schema expectations, data flow, and Excel dependencies to prevent breaking changes when aligning Repo B Excel outputs.

Table of Contents

  1. Source of Truth
  2. Data Flow
  3. Datasets
  4. Tables
  5. Views
  6. Excel Dependencies
  7. CSV/DataFrame Mappings
  8. Change Policy

Source of Truth

Schema Definitions

  • JSON Schemas: integration/bigquery/schemas/*.json
    • Primary source for table column definitions
    • Used by ETL loader for validation
    • Files: stage1_schema.json, stage3_schema.json, ceo_metrics_schema.json, growth_loss_schema.json, audit_checks_schema.json, dim_business_mapping_schema.json, dim_agent_hierarchy_schema.json
  • SQL Table Definitions: integration/bigquery/sql/tables/*.sql
    • Used for table creation and schema evolution
    • Files: bq_schema_agents.sql, monthly_rollup_all.sql, business_churn_metadata.sql, period_toggle_audit.sql
  • SQL View Definitions: integration/bigquery/sql/views/*.sql
    • Analytics views that align with Excel outputs
    • Key views: analytics_agent_report_v5.sql, analytics_ceo_snapshot.sql, analytics_business_growth_loss.sql
  • Configuration: configs/bq_config.yaml
    • Dataset definitions, partitioning, clustering, access control
    • ETL settings, data quality rules

ETL Loader

  • Main Loader: integration/bigquery/scripts/etl_loader.py
    • BigQueryETLLoader class handles all data loading
    • Methods: upload_period_data(), _upload_stage3_v10_3(), _upload_ceo_metrics_v10_3(), _upload_stage1_v10_3()
    • Loads from DataFrames (not Excel directly)
  • Client Initialization: api/bigquery/client.py
    • get_client() function initializes BigQuery client
    • Uses GCP_PROJECT_ID environment variable or ADC

Data Flow

Primary Path (BigQuery Ingestion)

Repo B Engine → CSV Snapshots/DataFrames → BigQuery ETL Loader → BigQuery Tables
Details:
  • Repo B generates DataFrames from canonical JSON
  • ETL loader (etl_loader.py) receives DataFrames directly
  • Data is uploaded via load_table_from_dataframe() (not Excel files)
  • CSV snapshots are used for historical backfill only

Secondary Path (Excel Outputs)

Repo B Engine → Excel Workbooks → (Human Consumption Only)
Note: Excel files are NOT directly loaded into BigQuery. They are:
  • Generated for human review and reporting
  • Used for validation and reconciliation
  • May inform downstream processes, but BigQuery loads from DataFrames/CSV

View Alignment

Some BigQuery views align with Excel outputs conceptually:
  • analytics_agent_report_v5 matches “V5 Excel output” (per view comment)
  • Views aggregate from tables, not Excel files

Datasets

1. payroll_raw (Ingestion Layer)

Purpose: Raw Stage 1 payroll data snapshots
Location: US
Retention: 7 years (220752000000 ms)
Access: Aaron (Owner), Richard/Robin/Sheleeha (Data Viewer)
Tables:
  • stage1_snapshots (partitioned by period_label, clustered by tenant_id, business_label, payee_agent_id)

2. payroll_processed (Transformation Layer)

Purpose: Stage 3 collapsed data with TPA normalization
Location: US
Retention: 7 years
Access: Aaron (Owner), Richard/Robin/Sheleeha (Data Viewer)
Tables:
  • stage3_snapshots (partitioned by period_label, clustered by tenant_id, business_name, agent_id, tpa_applied)

3. payroll_analytics (BI Layer)

Purpose: Pre-aggregated analytics and executive reporting
Location: US
Retention: 7 years
Access: Aaron (Owner), Richard/Robin/Sheleeha (Data Viewer)
Tables:
  • ceo_metrics (partitioned by period_label)
  • growth_loss (partitioned by period_label)
  • audit_checks (partitioned by period_label)
  • dim_business_mapping (dimension table)
  • dim_agent_hierarchy (dimension table)
  • dim_ingest_audit (audit dimension)
  • agent_monthly_rollup (agent audit tables)
  • agent_catchups (agent audit tables)
  • agent_churn_register (agent audit tables)
  • agent_summary (agent audit tables)
  • monthly_rollup_all (monthly business rollup)
Views: See Views section

Tables

payroll_raw.stage1_snapshots

Purpose: Raw Stage 1 payroll data snapshots - raw commission records
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, business_label, payee_agent_id
Source: Repo B Stage 1 DataFrame or CSV snapshot
Schema (from integration/bigquery/schemas/stage1_schema.json):
ColumnTypeModeDescription
tenant_idSTRINGREQUIREDTenant identifier for SaaS multi-tenant isolation
period_labelDATEREQUIREDPayroll period in YYYY-MM-DD format (partitioning field)
business_labelSTRINGREQUIREDBusiness name from payroll data (Label column)
member_idSTRINGREQUIREDUnique employee identifier
member_first_nameSTRINGNULLABLEEmployee first name
member_last_nameSTRINGNULLABLEEmployee last name
payee_agent_idINTEGERREQUIREDAgent ID (721995=Robin, 668078=Richard)
payee_agent_nameSTRINGNULLABLEAgent name
creditFLOATREQUIREDGross commission amount before chargebacks
debitFLOATREQUIREDChargeback amount (usually 0)
totalFLOATREQUIREDNet amount after chargebacks (Credit - Debit)
periodSTRINGREQUIREDPay period cadence (Weekly, Bi-Weekly, etc.)
period_codeSTRINGREQUIREDNormalized period code (W, BW, SM, 4M, M)
pay_periodsINTEGERREQUIREDNumber of pay periods per year
trunc2_appliedBOOLEANREQUIREDWhether tunc2 truncation was applied (audit trail)
created_atTIMESTAMPREQUIREDTimestamp when record was created
source_fileSTRINGREQUIREDSource file or process
pipeline_versionSTRINGREQUIREDPipeline version
Column Mapping (from ETL loader _upload_stage1_v10_3()):
  • Labelbusiness_label
  • Member IDmember_id
  • Member First Namemember_first_name
  • Member Last Namemember_last_name
  • Payee Agent IDpayee_agent_id
  • Payee Agent Namepayee_agent_name
  • Periodperiodperiod_code (mapped via period_map)
  • Creditcredit
  • Debitdebit
  • Totaltotal

payroll_processed.stage3_snapshots

Purpose: Historical Stage 3 collapsed payroll data with TPA normalization
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, business_name, agent_id, tpa_applied
Source: Repo B Stage 3 DataFrame
Schema (from integration/bigquery/schemas/stage3_schema.json):
ColumnTypeModeDescription
tenant_idSTRINGREQUIREDTenant identifier
period_labelDATEREQUIREDPayroll period (partitioning field)
business_nameSTRINGREQUIREDBusiness name after normalization
business_idSTRINGNULLABLEStable business ID (MD5 hash)
agent_idSTRINGREQUIREDAgent identifier
agent_nameSTRINGREQUIREDAgent full name
downline_agent_nameSTRINGNULLABLEDownline agent name for TPA commissions
owner_nameSTRINGNULLABLEOwner name (Robin Bundy or Richard Ballard)
period_codeSTRINGREQUIREDPay period cadence code (W, BW, SM, 4M, M)
pay_periodsINTEGERREQUIREDNumber of pay periods per year
pepm_rateFLOATNULLABLEPer Employee Per Month rate from agent2_pepm.csv
pepm_rate_cleanFLOATNULLABLECleaned PEPM rate after normalization
absorbed_countFLOATREQUIREDNumber of employees for this agent-business-cadence
tpa_per_pay_amountFLOATNULLABLECommission per pay period (PEPM * 12 / pay_periods)
agent_totalFLOATREQUIREDTotal commission for this agent-business-cadence
tpa_appliedBOOLEANREQUIREDWhether TPA normalization was applied
match_typeSTRINGNULLABLEType of agent matching (single_agent, multi_agent)
match_methodSTRINGNULLABLEMethod used (exact, fuzzy)
slice_noteSTRINGNULLABLEAdditional notes about commission slice
trunc2_appliedBOOLEANREQUIREDWhether tunc2 truncation was applied
created_atTIMESTAMPREQUIREDTimestamp when record was created
source_fileSTRINGREQUIREDSource file or process
pipeline_versionSTRINGREQUIREDPipeline version
commission_typeSTRINGNULLABLECommission type: TPA, MIXED_OWNER, or Owner
owner_residual_totalFLOATNULLABLEOwner residual amount for MIXED_OWNER slices
cadence_override_appliedBOOLEANNULLABLEWhether cadence override was applied
cadence_override_reasonSTRINGNULLABLEReason for cadence override
posted_dateDATENULLABLEPosted date from source payroll file
Column Mapping (from ETL loader _upload_stage3_v10_3()):
  • business_labelbusiness_name (if business_name not present)
  • employee_countabsorbed_count (if absorbed_count not present)
  • business_id generated via BusinessIDGenerator
  • commission_type derived from is_tpa_commission and owner_residual_total
  • pay_periods mapped from period_code if missing: {'W': 52, 'BW': 26, 'SM': 24, '4M': 48, 'M': 12}

payroll_analytics.ceo_metrics

Purpose: Historical CEO metrics and executive KPIs by period
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, period_label
Source: Repo B CEO metrics dictionary
Schema (from integration/bigquery/schemas/ceo_metrics_schema.json):
ColumnTypeModeDescription
tenant_idSTRINGREQUIREDTenant identifier
period_labelDATEREQUIREDPayroll period (partitioning field)
total_businessesINTEGERREQUIREDTotal number of businesses with commission activity
total_employeesINTEGERREQUIREDTotal number of unique employees
total_agentsINTEGERREQUIREDTotal number of TPA agents
commissionable_agentsINTEGERREQUIREDNumber of commissionable agents
gross_payoutFLOATREQUIREDTotal gross payout before chargebacks
chargebacksFLOATREQUIREDTotal chargebacks and refunds
agent_payout_netFLOATREQUIREDTotal TPA agent commission payout (net)
business_owner_commissionFLOATREQUIREDBusiness owner commission (Gross - Chargebacks - Agent Payout)
average_commission_per_employeeFLOATREQUIREDAverage commission per employee (Gross / Total Employees)
stage1_grossFLOATREQUIREDStage 1 gross amount (for reconciliation)
stage1_chargebacksFLOATREQUIREDStage 1 chargebacks (for reconciliation)
stage3_totalFLOATREQUIREDStage 3 total amount (for reconciliation)
reconciliation_statusSTRINGREQUIREDReconciliation status (PASS, FAIL, WARNING)
reconciliation_deltaFLOATREQUIREDReconciliation delta amount
trunc2_appliedBOOLEANREQUIREDWhether tunc2 truncation was applied
created_atTIMESTAMPREQUIREDTimestamp when record was created
source_fileSTRINGREQUIREDSource file or process
pipeline_versionSTRINGREQUIREDPipeline version
Column Mapping (from ETL loader _upload_ceo_metrics_v10_3()):
  • CEO metrics dictionary keys map directly to columns
  • period_label converted to DATE format

payroll_analytics.growth_loss

Purpose: Employee gains and losses by business over time
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, business_name
Source: Repo B Growth/Loss DataFrame or calculation
Schema (from integration/bigquery/schemas/growth_loss_schema.json):
ColumnTypeModeDescription
tenant_idSTRINGREQUIREDTenant identifier
period_labelDATEREQUIREDPayroll period (partitioning field)
business_nameSTRINGREQUIREDBusiness name for growth/loss analysis
employees_gainedINTEGERREQUIREDNumber of employees gained this period
employees_lostINTEGERREQUIREDNumber of employees lost this period
net_changeINTEGERREQUIREDNet change in employees (gained - lost)
percent_changeFLOATREQUIREDPercentage change in employees
previous_employeesINTEGERREQUIREDPrevious period employee count
current_employeesINTEGERREQUIREDCurrent period employee count
previous_periodDATENULLABLEPrevious period label for comparison
comparison_availableBOOLEANREQUIREDWhether previous period data was available
statusSTRINGREQUIREDGrowth status (GROWING, DECLINING, STABLE, NEW, CHURNED)
trunc2_appliedBOOLEANREQUIREDWhether tunc2 truncation was applied
created_atTIMESTAMPREQUIREDTimestamp when record was created
source_fileSTRINGREQUIREDSource file or process
pipeline_versionSTRINGREQUIREDPipeline version

payroll_analytics.audit_checks

Purpose: Historical audit check results and data quality validation
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, check_type, status
Source: Repo B audit checks DataFrame
Schema (from integration/bigquery/schemas/audit_checks_schema.json):
ColumnTypeModeDescription
tenant_idSTRINGREQUIREDTenant identifier
period_labelDATEREQUIREDPayroll period (partitioning field)
check_typeSTRINGREQUIREDType of audit check performed
check_nameSTRINGREQUIREDSpecific name of the audit check
statusSTRINGREQUIREDCheck result status (PASS, WARNING, FAIL)
expected_valueFLOATNULLABLEExpected value for the check
actual_valueFLOATNULLABLEActual value found
thresholdFLOATNULLABLEThreshold value for the check
variance_percentFLOATNULLABLEPercentage variance from expected value
messageSTRINGREQUIREDHuman-readable message about check result
severitySTRINGREQUIREDSeverity level (INFO, WARNING, ERROR, CRITICAL)
categorySTRINGREQUIREDCheck category (IDENTITY, RECONCILIATION, ANOMALY, DATA_QUALITY)
business_nameSTRINGNULLABLEBusiness name if check is business-specific
agent_nameSTRINGNULLABLEAgent name if check is agent-specific
detailsSTRINGNULLABLEAdditional details about check result
trunc2_appliedBOOLEANREQUIREDWhether tunc2 truncation was applied
created_atTIMESTAMPREQUIREDTimestamp when record was created
source_fileSTRINGREQUIREDSource file or process
pipeline_versionSTRINGREQUIREDPipeline version

Dimension Tables

payroll_analytics.dim_business_mapping

Purpose: Stable business ID mapping dimension
Clustering: business_id, normalized_name
Source: BusinessIDGenerator utility
Key Columns:
  • business_id (STRING, REQUIRED) - Stable MD5 hash identifier
  • business_name (STRING, REQUIRED) - Business name as appears in source
  • normalized_name (STRING, REQUIRED) - Normalized name (uppercase, trimmed)
  • is_active (BOOLEAN, REQUIRED)
  • first_seen_date, last_seen_date (DATE, NULLABLE)

payroll_analytics.dim_agent_hierarchy

Purpose: Agent hierarchy dimension with parent-child relationships
Clustering: agent_id, parent_agent_id
Source: agent_hierarchy.csv reference file
Key Columns:
  • agent_id (STRING, REQUIRED)
  • agent_name (STRING, REQUIRED)
  • parent_agent_id, parent_agent_name (STRING, NULLABLE)
  • hierarchy_level (INTEGER, NULLABLE)
  • is_root_owner (BOOLEAN, REQUIRED) - Robin/Richard flag
  • is_metrics_only (BOOLEAN, REQUIRED) - Richard Ballard exclusion flag
  • is_active, is_deleted (BOOLEAN, REQUIRED)
  • effective_date, end_date (DATE, NULLABLE) - SCD Type 2

Views

payroll_analytics.analytics_agent_report_v5

Purpose: TPA Producing Agent Commissions (matches V5 Excel output)
Source: integration/bigquery/sql/views/analytics_agent_report_v5.sql
Base Tables: stage3_snapshots, stage1_snapshots
Key Columns:
  • tenant_id, period_label, agent_name, business_name
  • gross_payout (SUM of agent_total)
  • employee_count (from Stage 1, authoritative)
  • record_count, avg_pepm
  • cadence_codes, pay_periods
  • override_applied, rounding_applied (audit flags)
Note: View comment states “matches V5 Excel output” - this is conceptual alignment, not direct Excel ingestion.

payroll_analytics.analytics_ceo_snapshot

Purpose: CEO snapshot view that mirrors Excel/Python logic
Source: integration/bigquery/sql/views/analytics_ceo_snapshot.sql
Base Tables: stage1_snapshots
Key Columns:
  • tenant_id, period_label
  • total_businesses, total_employees
  • gross_payout, chargebacks, net_payout
  • MoM changes: business_change_mom, employee_change_mom, gross_payout_change_mom, net_payout_change_mom
  • MoM percentages: business_change_pct_mom, employee_change_pct_mom, etc.

payroll_analytics.analytics_business_growth_loss

Purpose: Business growth/loss analysis
Source: integration/bigquery/sql/views/analytics_business_growth_loss.sql
Base Tables: stage1_snapshots
Key Columns:
  • tenant_id, business_name, period_label
  • current_employees, previous_employees
  • net_change, pct_change

Other Analytics Views

Located in integration/bigquery/sql/views/:
  • analytics_agent_at_risk_businesses.sql
  • analytics_agent_attrition_trends.sql
  • analytics_agent_business_health_scores.sql
  • analytics_agent_business_lifecycle.sql
  • analytics_agent_business_recovery.sql
  • analytics_agent_business_summary.sql
  • analytics_agent_churn_metrics_v.sql
  • analytics_agent_churn_summary.sql
  • analytics_agent_lost_businesses.sql
  • analytics_agent_portfolio_metrics.sql
  • analytics_agent_trends.sql
  • analytics_agent2_pepm_view.sql
  • analytics_churn_metrics.sql
  • analytics_owner_commission.sql
  • analytics_top_agents.sql
  • analytics_top_businesses.sql
  • business_health.sql
  • stage1_agent_detailed_export_vw.sql
  • top_anomalies.sql
  • v_business_absent_2m.sql
  • v_business_full_churn.sql
  • v_business_permanently_lost.sql
  • v_business_recovered.sql
  • v_business_watchlist.sql

Excel Dependencies

Important Note

BigQuery does NOT directly read Excel files. Excel workbooks are generated for human consumption only. However, Excel sheet structures may inform downstream processes, so we document them for completeness.

Excel Sheet Names (from repo_b/excel_contract.py)

V5 Mode Sheets

Required:
  • Agent Commissions - Agent commission breakdown
  • Audit View - Audit trail view

Executive Mode Sheets

Required:
  • CEO Snapshot - Executive KPIs
  • Business Summary - Business-level totals
  • Agent2 Breakdown - Agent commission breakdown (same as “Agent Commissions” with Posted Date)
  • Agent2 Audit - Audit trail (same as “Audit View” with Posted Date)
  • Full Detail - Detailed transaction data (Posted Date must be first column)
Optional:
  • Growth/Loss or Growth & Loss - Growth/loss analysis
  • Audit Checks - Data quality checks
  • Business Analytics - Business analytics

Excel Column Requirements

Agent Commissions / Agent2 Breakdown

Required Columns (from repo_b/excel_contract.py):
  • business - Business name
  • agent_first - Agent first name
  • agent_last - Agent last name
  • agent_pepm - Agent PEPM rate
  • pay_periods - Pay periods per year
  • employee_count - Employee count (Stage 1 count)
  • normalized_pepm - Normalized PEPM rate
  • agent_total - Total agent commission
  • note - Additional notes
Note: Agent2 Breakdown has Posted Date as first column (added by add_posted_date_first()).

Full Detail

Required: Posted Date must be first column
Other columns: Varies based on Stage 3 DataFrame structure

CEO Snapshot

Required Columns:
  • Metric - Metric name
  • Current Value - Metric value

Business Summary

Required Columns:
  • Business Name - Business name
  • Total Employees - Total employee count
  • Gross Commission - Gross commission amount
  • Chargebacks - Chargeback amount
  • Net Commission - Net commission amount

Excel → BigQuery Mapping (Indirect)

Since BigQuery doesn’t read Excel directly, the mapping is:
Excel Sheet → Repo B DataFrame → BigQuery Table
Example:
  • Agent2 Breakdown Excel sheet → build_agent2_breakdown() DataFrame → Not directly loaded (Excel is output only)
  • Stage 3 DataFrame (source of Agent2 Breakdown) → stage3_snapshots table

CSV/DataFrame Mappings

Stage 1 → stage1_snapshots

Source: Repo B Stage 1 DataFrame (from canonical JSON)
ETL Method: _upload_stage1_v10_3()
Column Mappings:
  • DataFrame columns map directly to BigQuery columns
  • period_label converted to DATE format (YYYY-MM-01)
  • member_id converted to STRING
  • pay_periods converted to INT64
  • period_code derived from period column if missing

Stage 3 → stage3_snapshots

Source: Repo B Stage 3 DataFrame (from output_adapter.py)
ETL Method: _upload_stage3_v10_3()
Column Mappings:
  • business_labelbusiness_name (if business_name not present)
  • employee_countabsorbed_count (if absorbed_count not present)
  • business_id generated via BusinessIDGenerator.get_or_create_business_id()
  • commission_type derived from is_tpa_commission and owner_residual_total
  • pay_periods mapped from period_code: {'W': 52, 'BW': 26, 'SM': 24, '4M': 48, 'M': 12}
  • Currency columns cleaned (remove $, strip spaces, convert to float)

CEO Metrics → ceo_metrics

Source: Repo B CEO metrics dictionary
ETL Method: _upload_ceo_metrics_v10_3()
Column Mappings:
  • Dictionary keys map directly to BigQuery columns
  • period_label converted to DATE format

Growth/Loss → growth_loss

Source: Repo B Growth/Loss DataFrame (from repo_b/reporting/growth_analysis.py or similar)
ETL Method: Not yet implemented in ETL loader (placeholder)
Expected Mappings:
  • business_namebusiness_name
  • current_employeescurrent_employees
  • previous_employeesprevious_employees
  • net_changenet_change
  • percent_changepercent_change

Change Policy

What Repo B CAN Change

Safe Changes:
  • Add new optional Excel sheets (e.g., Growth & Loss, Business Analytics)
  • Add new columns to Excel sheets (as long as required columns remain)
  • Change internal DataFrame column names (as long as ETL mapping is updated)
  • Add new BigQuery tables/views (with proper schema definitions)
  • Modify Excel formatting/styling (doesn’t affect BigQuery)

What Repo B MUST NOT Change

Breaking Changes:
  1. Required Excel Sheet Names:
    • Agent Commissions (V5 mode) - DO NOT RENAME
    • Agent2 Breakdown (Executive mode) - DO NOT RENAME
    • Full Detail - DO NOT RENAME
    • CEO Snapshot - DO NOT RENAME
    • Business Summary - DO NOT RENAME
  2. Required Excel Columns:
    • All columns listed in REQUIRED_COLUMNS dict (repo_b/excel_contract.py)
    • Posted Date must remain first column in Full Detail and Agent2 Breakdown
  3. BigQuery Schema:
    • DO NOT remove or rename existing columns in tables
    • DO NOT change column types (e.g., FLOAT → INTEGER)
    • DO NOT change partitioning/clustering fields
    • DO NOT remove required fields (tenant_id, period_label, created_at, etc.)
  4. Data Types:
    • period_label must remain DATE type (YYYY-MM-DD format)
    • agent_total, gross_payout, etc. must remain FLOAT/NUMERIC
    • pay_periods must remain INTEGER
  5. ETL Column Mappings:
    • If changing DataFrame column names, MUST update ETL loader mappings
    • business_labelbusiness_name mapping must be preserved
    • employee_countabsorbed_count mapping must be preserved
  6. View Dependencies:
    • Views depend on specific table columns - changes to base tables may break views
    • analytics_agent_report_v5 expects downline_agent_name, agent_total, tpa_applied
    • analytics_ceo_snapshot expects business_label, member_id, credit, debit, total

Migration Strategy

If schema changes are needed:
  1. Additive Changes Only: Add new columns, don’t remove existing ones
  2. Versioning: Use pipeline_version field to track schema versions
  3. Backward Compatibility: Maintain old column names alongside new ones during transition
  4. Documentation: Update this contract document when changes are made
  5. Testing: Validate ETL loader and views after schema changes

Summary: What BQ Expects / What Repo B Must Preserve

Critical Requirements

  1. Excel Sheet Names: Agent Commissions, Agent2 Breakdown, Full Detail, CEO Snapshot, Business Summary must not change
  2. Excel Columns: Required columns in REQUIRED_COLUMNS must remain present
  3. BigQuery Tables: Schema must remain stable - no column removals or type changes
  4. Data Types: period_label (DATE), currency fields (FLOAT), pay_periods (INTEGER)
  5. ETL Mappings: Column mappings in etl_loader.py must be maintained
  6. Partitioning: period_label partitioning must be preserved for cost control

Safe to Change

  • Optional Excel sheets
  • Excel formatting/styling
  • Internal DataFrame column names (with ETL updates)
  • New tables/views (with proper schemas)

End of Contract Document