BigQuery Contract - Payroll Commission Pipeline

Summary (3–6 bullets)

Defines BigQuery schema, dataset, table, and view expectations for payroll analytics.
Documents ingestion paths, ETL mappings, and downstream Excel-aligned analytics semantics.
Lists contract-critical fields, partitioning/clustering, and source-of-truth artifacts.
Specifies breaking vs safe change policy for schema and pipeline evolution.
Provides implementation-level mappings from Repo B outputs into BigQuery objects.

When to use this (3–6 bullets)

Before changing BigQuery schemas, ETL mappings, or analytics views.
During ELT/ETL onboarding to understand required contracts.
When validating if Repo B output changes are backward-compatible.
During incident response for dashboard/data mismatches tied to BigQuery objects.

What you’ll walk away with (2–5 bullets)

A concrete map of what BigQuery expects and what Repo B must preserve.
A change-control checklist to avoid breaking dashboards and analytics.
A fast way to trace table/view lineage and contract dependencies.

Last Updated: 2025-12-15
Purpose: Defines BigQuery schema expectations, data flow, and Excel dependencies to prevent breaking changes when aligning Repo B Excel outputs.

Source of Truth
Data Flow
Datasets
Tables
Views
Excel Dependencies
CSV/DataFrame Mappings
Change Policy

Source of Truth

Schema Definitions

JSON Schemas: integration/bigquery/schemas/*.json
- Primary source for table column definitions
- Used by ETL loader for validation
- Files: stage1_schema.json, stage3_schema.json, ceo_metrics_schema.json, growth_loss_schema.json, audit_checks_schema.json, dim_business_mapping_schema.json, dim_agent_hierarchy_schema.json
SQL Table Definitions: integration/bigquery/sql/tables/*.sql
- Used for table creation and schema evolution
- Files: bq_schema_agents.sql, monthly_rollup_all.sql, business_churn_metadata.sql, period_toggle_audit.sql
SQL View Definitions: integration/bigquery/sql/views/*.sql
- Analytics views that align with Excel outputs
- Key views: analytics_agent_report_v5.sql, analytics_ceo_snapshot.sql, analytics_business_growth_loss.sql
Configuration: configs/bq_config.yaml
- Dataset definitions, partitioning, clustering, access control
- ETL settings, data quality rules

ETL Loader

Main Loader: integration/bigquery/scripts/etl_loader.py
- BigQueryETLLoader class handles all data loading
- Methods: upload_period_data(), _upload_stage3_v10_3(), _upload_ceo_metrics_v10_3(), _upload_stage1_v10_3()
- Loads from DataFrames (not Excel directly)
Client Initialization: api/bigquery/client.py
- get_client() function initializes BigQuery client
- Uses GCP_PROJECT_ID environment variable or ADC

Data Flow

Primary Path (BigQuery Ingestion)

Repo B Engine → CSV Snapshots/DataFrames → BigQuery ETL Loader → BigQuery Tables

Details:

Repo B generates DataFrames from canonical JSON
ETL loader (etl_loader.py) receives DataFrames directly
Data is uploaded via load_table_from_dataframe() (not Excel files)
CSV snapshots are used for historical backfill only

Secondary Path (Excel Outputs)

Repo B Engine → Excel Workbooks → (Human Consumption Only)

Note: Excel files are NOT directly loaded into BigQuery. They are:

Generated for human review and reporting
Used for validation and reconciliation
May inform downstream processes, but BigQuery loads from DataFrames/CSV

View Alignment

Some BigQuery views align with Excel outputs conceptually:

analytics_agent_report_v5 matches “V5 Excel output” (per view comment)
Views aggregate from tables, not Excel files

Datasets

1. `payroll_raw` (Ingestion Layer)

Purpose: Raw Stage 1 payroll data snapshots
Location: US
Retention: 7 years (220752000000 ms)
Access: Aaron (Owner), Richard/Robin/Sheleeha (Data Viewer) Tables:

stage1_snapshots (partitioned by period_label, clustered by tenant_id, business_label, payee_agent_id)

2. `payroll_processed` (Transformation Layer)

Purpose: Stage 3 collapsed data with TPA normalization
Location: US
Retention: 7 years
Access: Aaron (Owner), Richard/Robin/Sheleeha (Data Viewer) Tables:

stage3_snapshots (partitioned by period_label, clustered by tenant_id, business_name, agent_id, tpa_applied)

3. `payroll_analytics` (BI Layer)

Purpose: Pre-aggregated analytics and executive reporting
Location: US
Retention: 7 years
Access: Aaron (Owner), Richard/Robin/Sheleeha (Data Viewer) Tables:

ceo_metrics (partitioned by period_label)
growth_loss (partitioned by period_label)
audit_checks (partitioned by period_label)
dim_business_mapping (dimension table)
dim_agent_hierarchy (dimension table)
dim_ingest_audit (audit dimension)
agent_monthly_rollup (agent audit tables)
agent_catchups (agent audit tables)
agent_churn_register (agent audit tables)
agent_summary (agent audit tables)
monthly_rollup_all (monthly business rollup)

Views: See Views section

Tables

`payroll_raw.stage1_snapshots`

Purpose: Raw Stage 1 payroll data snapshots - raw commission records
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, business_label, payee_agent_id
Source: Repo B Stage 1 DataFrame or CSV snapshot Schema (from integration/bigquery/schemas/stage1_schema.json):

Column	Type	Mode	Description
`tenant_id`	STRING	REQUIRED	Tenant identifier for SaaS multi-tenant isolation
`period_label`	DATE	REQUIRED	Payroll period in YYYY-MM-DD format (partitioning field)
`business_label`	STRING	REQUIRED	Business name from payroll data (Label column)
`member_id`	STRING	REQUIRED	Unique employee identifier
`member_first_name`	STRING	NULLABLE	Employee first name
`member_last_name`	STRING	NULLABLE	Employee last name
`payee_agent_id`	INTEGER	REQUIRED	Agent ID (721995=Robin, 668078=Richard)
`payee_agent_name`	STRING	NULLABLE	Agent name
`credit`	FLOAT	REQUIRED	Gross commission amount before chargebacks
`debit`	FLOAT	REQUIRED	Chargeback amount (usually 0)
`total`	FLOAT	REQUIRED	Net amount after chargebacks (Credit - Debit)
`period`	STRING	REQUIRED	Pay period cadence (Weekly, Bi-Weekly, etc.)
`period_code`	STRING	REQUIRED	Normalized period code (W, BW, SM, 4M, M)
`pay_periods`	INTEGER	REQUIRED	Number of pay periods per year
`trunc2_applied`	BOOLEAN	REQUIRED	Whether tunc2 truncation was applied (audit trail)
`created_at`	TIMESTAMP	REQUIRED	Timestamp when record was created
`source_file`	STRING	REQUIRED	Source file or process
`pipeline_version`	STRING	REQUIRED	Pipeline version

Column Mapping (from ETL loader _upload_stage1_v10_3()):

Label → business_label
Member ID → member_id
Member First Name → member_first_name
Member Last Name → member_last_name
Payee Agent ID → payee_agent_id
Payee Agent Name → payee_agent_name
Period → period → period_code (mapped via period_map)
Credit → credit
Debit → debit
Total → total

`payroll_processed.stage3_snapshots`

Purpose: Historical Stage 3 collapsed payroll data with TPA normalization
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, business_name, agent_id, tpa_applied
Source: Repo B Stage 3 DataFrame Schema (from integration/bigquery/schemas/stage3_schema.json):

Column	Type	Mode	Description
`tenant_id`	STRING	REQUIRED	Tenant identifier
`period_label`	DATE	REQUIRED	Payroll period (partitioning field)
`business_name`	STRING	REQUIRED	Business name after normalization
`business_id`	STRING	NULLABLE	Stable business ID (MD5 hash)
`agent_id`	STRING	REQUIRED	Agent identifier
`agent_name`	STRING	REQUIRED	Agent full name
`downline_agent_name`	STRING	NULLABLE	Downline agent name for TPA commissions
`owner_name`	STRING	NULLABLE	Owner name (Robin Bundy or Richard Ballard)
`period_code`	STRING	REQUIRED	Pay period cadence code (W, BW, SM, 4M, M)
`pay_periods`	INTEGER	REQUIRED	Number of pay periods per year
`pepm_rate`	FLOAT	NULLABLE	Per Employee Per Month rate from agent2_pepm.csv
`pepm_rate_clean`	FLOAT	NULLABLE	Cleaned PEPM rate after normalization
`absorbed_count`	FLOAT	REQUIRED	Number of employees for this agent-business-cadence
`tpa_per_pay_amount`	FLOAT	NULLABLE	Commission per pay period (PEPM * 12 / pay_periods)
`agent_total`	FLOAT	REQUIRED	Total commission for this agent-business-cadence
`tpa_applied`	BOOLEAN	REQUIRED	Whether TPA normalization was applied
`match_type`	STRING	NULLABLE	Type of agent matching (single_agent, multi_agent)
`match_method`	STRING	NULLABLE	Method used (exact, fuzzy)
`slice_note`	STRING	NULLABLE	Additional notes about commission slice
`trunc2_applied`	BOOLEAN	REQUIRED	Whether tunc2 truncation was applied
`created_at`	TIMESTAMP	REQUIRED	Timestamp when record was created
`source_file`	STRING	REQUIRED	Source file or process
`pipeline_version`	STRING	REQUIRED	Pipeline version
`commission_type`	STRING	NULLABLE	Commission type: TPA, MIXED_OWNER, or Owner
`owner_residual_total`	FLOAT	NULLABLE	Owner residual amount for MIXED_OWNER slices
`cadence_override_applied`	BOOLEAN	NULLABLE	Whether cadence override was applied
`cadence_override_reason`	STRING	NULLABLE	Reason for cadence override
`posted_date`	DATE	NULLABLE	Posted date from source payroll file

Column Mapping (from ETL loader _upload_stage3_v10_3()):

business_label → business_name (if business_name not present)
employee_count → absorbed_count (if absorbed_count not present)
business_id generated via BusinessIDGenerator
commission_type derived from is_tpa_commission and owner_residual_total
pay_periods mapped from period_code if missing: {'W': 52, 'BW': 26, 'SM': 24, '4M': 48, 'M': 12}

`payroll_analytics.ceo_metrics`

Purpose: Historical CEO metrics and executive KPIs by period
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, period_label
Source: Repo B CEO metrics dictionary Schema (from integration/bigquery/schemas/ceo_metrics_schema.json):

Column	Type	Mode	Description
`tenant_id`	STRING	REQUIRED	Tenant identifier
`period_label`	DATE	REQUIRED	Payroll period (partitioning field)
`total_businesses`	INTEGER	REQUIRED	Total number of businesses with commission activity
`total_employees`	INTEGER	REQUIRED	Total number of unique employees
`total_agents`	INTEGER	REQUIRED	Total number of TPA agents
`commissionable_agents`	INTEGER	REQUIRED	Number of commissionable agents
`gross_payout`	FLOAT	REQUIRED	Total gross payout before chargebacks
`chargebacks`	FLOAT	REQUIRED	Total chargebacks and refunds
`agent_payout_net`	FLOAT	REQUIRED	Total TPA agent commission payout (net)
`business_owner_commission`	FLOAT	REQUIRED	Business owner commission (Gross - Chargebacks - Agent Payout)
`average_commission_per_employee`	FLOAT	REQUIRED	Average commission per employee (Gross / Total Employees)
`stage1_gross`	FLOAT	REQUIRED	Stage 1 gross amount (for reconciliation)
`stage1_chargebacks`	FLOAT	REQUIRED	Stage 1 chargebacks (for reconciliation)
`stage3_total`	FLOAT	REQUIRED	Stage 3 total amount (for reconciliation)
`reconciliation_status`	STRING	REQUIRED	Reconciliation status (PASS, FAIL, WARNING)
`reconciliation_delta`	FLOAT	REQUIRED	Reconciliation delta amount
`trunc2_applied`	BOOLEAN	REQUIRED	Whether tunc2 truncation was applied
`created_at`	TIMESTAMP	REQUIRED	Timestamp when record was created
`source_file`	STRING	REQUIRED	Source file or process
`pipeline_version`	STRING	REQUIRED	Pipeline version

Column Mapping (from ETL loader _upload_ceo_metrics_v10_3()):

CEO metrics dictionary keys map directly to columns
period_label converted to DATE format

`payroll_analytics.growth_loss`

Purpose: Employee gains and losses by business over time
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, business_name
Source: Repo B Growth/Loss DataFrame or calculation Schema (from integration/bigquery/schemas/growth_loss_schema.json):

Column	Type	Mode	Description
`tenant_id`	STRING	REQUIRED	Tenant identifier
`period_label`	DATE	REQUIRED	Payroll period (partitioning field)
`business_name`	STRING	REQUIRED	Business name for growth/loss analysis
`employees_gained`	INTEGER	REQUIRED	Number of employees gained this period
`employees_lost`	INTEGER	REQUIRED	Number of employees lost this period
`net_change`	INTEGER	REQUIRED	Net change in employees (gained - lost)
`percent_change`	FLOAT	REQUIRED	Percentage change in employees
`previous_employees`	INTEGER	REQUIRED	Previous period employee count
`current_employees`	INTEGER	REQUIRED	Current period employee count
`previous_period`	DATE	NULLABLE	Previous period label for comparison
`comparison_available`	BOOLEAN	REQUIRED	Whether previous period data was available
`status`	STRING	REQUIRED	Growth status (GROWING, DECLINING, STABLE, NEW, CHURNED)
`trunc2_applied`	BOOLEAN	REQUIRED	Whether tunc2 truncation was applied
`created_at`	TIMESTAMP	REQUIRED	Timestamp when record was created
`source_file`	STRING	REQUIRED	Source file or process
`pipeline_version`	STRING	REQUIRED	Pipeline version

`payroll_analytics.audit_checks`

Purpose: Historical audit check results and data quality validation
Partitioning: period_label (DATE, DAY) - partition filter required
Clustering: tenant_id, check_type, status
Source: Repo B audit checks DataFrame Schema (from integration/bigquery/schemas/audit_checks_schema.json):

Column	Type	Mode	Description
`tenant_id`	STRING	REQUIRED	Tenant identifier
`period_label`	DATE	REQUIRED	Payroll period (partitioning field)
`check_type`	STRING	REQUIRED	Type of audit check performed
`check_name`	STRING	REQUIRED	Specific name of the audit check
`status`	STRING	REQUIRED	Check result status (PASS, WARNING, FAIL)
`expected_value`	FLOAT	NULLABLE	Expected value for the check
`actual_value`	FLOAT	NULLABLE	Actual value found
`threshold`	FLOAT	NULLABLE	Threshold value for the check
`variance_percent`	FLOAT	NULLABLE	Percentage variance from expected value
`message`	STRING	REQUIRED	Human-readable message about check result
`severity`	STRING	REQUIRED	Severity level (INFO, WARNING, ERROR, CRITICAL)
`category`	STRING	REQUIRED	Check category (IDENTITY, RECONCILIATION, ANOMALY, DATA_QUALITY)
`business_name`	STRING	NULLABLE	Business name if check is business-specific
`agent_name`	STRING	NULLABLE	Agent name if check is agent-specific
`details`	STRING	NULLABLE	Additional details about check result
`trunc2_applied`	BOOLEAN	REQUIRED	Whether tunc2 truncation was applied
`created_at`	TIMESTAMP	REQUIRED	Timestamp when record was created
`source_file`	STRING	REQUIRED	Source file or process
`pipeline_version`	STRING	REQUIRED	Pipeline version

Dimension Tables

`payroll_analytics.dim_business_mapping`

Purpose: Stable business ID mapping dimension
Clustering: business_id, normalized_name
Source: BusinessIDGenerator utility Key Columns:

business_id (STRING, REQUIRED) - Stable MD5 hash identifier
business_name (STRING, REQUIRED) - Business name as appears in source
normalized_name (STRING, REQUIRED) - Normalized name (uppercase, trimmed)
is_active (BOOLEAN, REQUIRED)
first_seen_date, last_seen_date (DATE, NULLABLE)

`payroll_analytics.dim_agent_hierarchy`

Purpose: Agent hierarchy dimension with parent-child relationships
Clustering: agent_id, parent_agent_id
Source: agent_hierarchy.csv reference file Key Columns:

agent_id (STRING, REQUIRED)
agent_name (STRING, REQUIRED)
parent_agent_id, parent_agent_name (STRING, NULLABLE)
hierarchy_level (INTEGER, NULLABLE)
is_root_owner (BOOLEAN, REQUIRED) - Robin/Richard flag
is_metrics_only (BOOLEAN, REQUIRED) - Richard Ballard exclusion flag
is_active, is_deleted (BOOLEAN, REQUIRED)
effective_date, end_date (DATE, NULLABLE) - SCD Type 2

Views

`payroll_analytics.analytics_agent_report_v5`

Purpose: TPA Producing Agent Commissions (matches V5 Excel output)
Source: integration/bigquery/sql/views/analytics_agent_report_v5.sql
Base Tables: stage3_snapshots, stage1_snapshots Key Columns:

tenant_id, period_label, agent_name, business_name
gross_payout (SUM of agent_total)
employee_count (from Stage 1, authoritative)
record_count, avg_pepm
cadence_codes, pay_periods
override_applied, rounding_applied (audit flags)

Note: View comment states “matches V5 Excel output” - this is conceptual alignment, not direct Excel ingestion.

`payroll_analytics.analytics_ceo_snapshot`

Purpose: CEO snapshot view that mirrors Excel/Python logic
Source: integration/bigquery/sql/views/analytics_ceo_snapshot.sql
Base Tables: stage1_snapshots Key Columns:

tenant_id, period_label
total_businesses, total_employees
gross_payout, chargebacks, net_payout
MoM changes: business_change_mom, employee_change_mom, gross_payout_change_mom, net_payout_change_mom
MoM percentages: business_change_pct_mom, employee_change_pct_mom, etc.

`payroll_analytics.analytics_business_growth_loss`

Purpose: Business growth/loss analysis
Source: integration/bigquery/sql/views/analytics_business_growth_loss.sql
Base Tables: stage1_snapshots Key Columns:

tenant_id, business_name, period_label
current_employees, previous_employees
net_change, pct_change

Other Analytics Views

Located in integration/bigquery/sql/views/:

analytics_agent_at_risk_businesses.sql
analytics_agent_attrition_trends.sql
analytics_agent_business_health_scores.sql
analytics_agent_business_lifecycle.sql
analytics_agent_business_recovery.sql
analytics_agent_business_summary.sql
analytics_agent_churn_metrics_v.sql
analytics_agent_churn_summary.sql
analytics_agent_lost_businesses.sql
analytics_agent_portfolio_metrics.sql
analytics_agent_trends.sql
analytics_agent2_pepm_view.sql
analytics_churn_metrics.sql
analytics_owner_commission.sql
analytics_top_agents.sql
analytics_top_businesses.sql
business_health.sql
stage1_agent_detailed_export_vw.sql
top_anomalies.sql
v_business_absent_2m.sql
v_business_full_churn.sql
v_business_permanently_lost.sql
v_business_recovered.sql
v_business_watchlist.sql

Excel Dependencies

Important Note

BigQuery does NOT directly read Excel files. Excel workbooks are generated for human consumption only. However, Excel sheet structures may inform downstream processes, so we document them for completeness.

Excel Sheet Names (from `repo_b/excel_contract.py`)

V5 Mode Sheets

Required:

Agent Commissions - Agent commission breakdown
Audit View - Audit trail view

Executive Mode Sheets

Required:

CEO Snapshot - Executive KPIs
Business Summary - Business-level totals
Agent2 Breakdown - Agent commission breakdown (same as “Agent Commissions” with Posted Date)
Agent2 Audit - Audit trail (same as “Audit View” with Posted Date)
Full Detail - Detailed transaction data (Posted Date must be first column)

Optional:

Growth/Loss or Growth & Loss - Growth/loss analysis
Audit Checks - Data quality checks
Business Analytics - Business analytics

Excel Column Requirements

`Agent Commissions` / `Agent2 Breakdown`

Required Columns (from repo_b/excel_contract.py):

business - Business name
agent_first - Agent first name
agent_last - Agent last name
agent_pepm - Agent PEPM rate
pay_periods - Pay periods per year
employee_count - Employee count (Stage 1 count)
normalized_pepm - Normalized PEPM rate
agent_total - Total agent commission
note - Additional notes

Note: Agent2 Breakdown has Posted Date as first column (added by add_posted_date_first()).

`Full Detail`

Required: Posted Date must be first column
Other columns: Varies based on Stage 3 DataFrame structure

`CEO Snapshot`

Required Columns:

Metric - Metric name
Current Value - Metric value

`Business Summary`

Required Columns:

Business Name - Business name
Total Employees - Total employee count
Gross Commission - Gross commission amount
Chargebacks - Chargeback amount
Net Commission - Net commission amount

Excel → BigQuery Mapping (Indirect)

Since BigQuery doesn’t read Excel directly, the mapping is:

Excel Sheet → Repo B DataFrame → BigQuery Table

Example:

Agent2 Breakdown Excel sheet → build_agent2_breakdown() DataFrame → Not directly loaded (Excel is output only)
Stage 3 DataFrame (source of Agent2 Breakdown) → stage3_snapshots table

CSV/DataFrame Mappings

Stage 1 → `stage1_snapshots`

Source: Repo B Stage 1 DataFrame (from canonical JSON)
ETL Method: _upload_stage1_v10_3() Column Mappings:

DataFrame columns map directly to BigQuery columns
period_label converted to DATE format (YYYY-MM-01)
member_id converted to STRING
pay_periods converted to INT64
period_code derived from period column if missing

Stage 3 → `stage3_snapshots`

Source: Repo B Stage 3 DataFrame (from output_adapter.py)
ETL Method: _upload_stage3_v10_3() Column Mappings:

business_label → business_name (if business_name not present)
employee_count → absorbed_count (if absorbed_count not present)
business_id generated via BusinessIDGenerator.get_or_create_business_id()
commission_type derived from is_tpa_commission and owner_residual_total
pay_periods mapped from period_code: {'W': 52, 'BW': 26, 'SM': 24, '4M': 48, 'M': 12}
Currency columns cleaned (remove $, strip spaces, convert to float)

CEO Metrics → `ceo_metrics`

Source: Repo B CEO metrics dictionary
ETL Method: _upload_ceo_metrics_v10_3() Column Mappings:

Dictionary keys map directly to BigQuery columns
period_label converted to DATE format

Growth/Loss → `growth_loss`

Source: Repo B Growth/Loss DataFrame (from repo_b/reporting/growth_analysis.py or similar)
ETL Method: Not yet implemented in ETL loader (placeholder) Expected Mappings:

business_name → business_name
current_employees → current_employees
previous_employees → previous_employees
net_change → net_change
percent_change → percent_change

Change Policy

What Repo B CAN Change

✅ Safe Changes:

Add new optional Excel sheets (e.g., Growth & Loss, Business Analytics)
Add new columns to Excel sheets (as long as required columns remain)
Change internal DataFrame column names (as long as ETL mapping is updated)
Add new BigQuery tables/views (with proper schema definitions)
Modify Excel formatting/styling (doesn’t affect BigQuery)

What Repo B MUST NOT Change

❌ Breaking Changes:

Required Excel Sheet Names:
- Agent Commissions (V5 mode) - DO NOT RENAME
- Agent2 Breakdown (Executive mode) - DO NOT RENAME
- Full Detail - DO NOT RENAME
- CEO Snapshot - DO NOT RENAME
- Business Summary - DO NOT RENAME
Required Excel Columns:
- All columns listed in REQUIRED_COLUMNS dict (repo_b/excel_contract.py)
- Posted Date must remain first column in Full Detail and Agent2 Breakdown
BigQuery Schema:
- DO NOT remove or rename existing columns in tables
- DO NOT change column types (e.g., FLOAT → INTEGER)
- DO NOT change partitioning/clustering fields
- DO NOT remove required fields (tenant_id, period_label, created_at, etc.)
Data Types:
- period_label must remain DATE type (YYYY-MM-DD format)
- agent_total, gross_payout, etc. must remain FLOAT/NUMERIC
- pay_periods must remain INTEGER
ETL Column Mappings:
- If changing DataFrame column names, MUST update ETL loader mappings
- business_label → business_name mapping must be preserved
- employee_count → absorbed_count mapping must be preserved
View Dependencies:
- Views depend on specific table columns - changes to base tables may break views
- analytics_agent_report_v5 expects downline_agent_name, agent_total, tpa_applied
- analytics_ceo_snapshot expects business_label, member_id, credit, debit, total

Migration Strategy

If schema changes are needed:

Additive Changes Only: Add new columns, don’t remove existing ones
Versioning: Use pipeline_version field to track schema versions
Backward Compatibility: Maintain old column names alongside new ones during transition
Documentation: Update this contract document when changes are made
Testing: Validate ETL loader and views after schema changes

Summary: What BQ Expects / What Repo B Must Preserve

Critical Requirements

Excel Sheet Names: Agent Commissions, Agent2 Breakdown, Full Detail, CEO Snapshot, Business Summary must not change
Excel Columns: Required columns in REQUIRED_COLUMNS must remain present
BigQuery Tables: Schema must remain stable - no column removals or type changes
Data Types: period_label (DATE), currency fields (FLOAT), pay_periods (INTEGER)
ETL Mappings: Column mappings in etl_loader.py must be maintained
Partitioning: period_label partitioning must be preserved for cost control

Safe to Change

Optional Excel sheets
Excel formatting/styling
Internal DataFrame column names (with ETL updates)
New tables/views (with proper schemas)

End of Contract Document

Start Here

Operate & Support

Reference

Build & Integrate

​BigQuery Contract - Payroll Commission Pipeline

​Summary (3–6 bullets)

​When to use this (3–6 bullets)

​What you’ll walk away with (2–5 bullets)

​Table of Contents

​Source of Truth

​Schema Definitions

​ETL Loader

​Data Flow

​Primary Path (BigQuery Ingestion)

​Secondary Path (Excel Outputs)

​View Alignment

​Datasets

​1. payroll_raw (Ingestion Layer)

​2. payroll_processed (Transformation Layer)

​3. payroll_analytics (BI Layer)

​Tables

​payroll_raw.stage1_snapshots

​payroll_processed.stage3_snapshots

​payroll_analytics.ceo_metrics

​payroll_analytics.growth_loss

​payroll_analytics.audit_checks

​Dimension Tables

​payroll_analytics.dim_business_mapping

​payroll_analytics.dim_agent_hierarchy

​Views

​payroll_analytics.analytics_agent_report_v5

​payroll_analytics.analytics_ceo_snapshot

​payroll_analytics.analytics_business_growth_loss

​Other Analytics Views

​Excel Dependencies

​Important Note

​Excel Sheet Names (from repo_b/excel_contract.py)

​V5 Mode Sheets

​Executive Mode Sheets

​Excel Column Requirements

​Agent Commissions / Agent2 Breakdown

​Full Detail

​CEO Snapshot

​Business Summary

​Excel → BigQuery Mapping (Indirect)

​CSV/DataFrame Mappings

​Stage 1 → stage1_snapshots

​Stage 3 → stage3_snapshots

​CEO Metrics → ceo_metrics

​Growth/Loss → growth_loss

​Change Policy

​What Repo B CAN Change

​What Repo B MUST NOT Change

​Migration Strategy

​Summary: What BQ Expects / What Repo B Must Preserve

​Critical Requirements

​Safe to Change