Skip to main content

Audit Views Documentation

Status: ✅ PRODUCTION READY
Phase: 1.5
Last Updated: 2025-01-XX

Overview

This document defines the six audit artifact views that provide audit-grade traceability for commission allocations, agent payouts, business lifecycle events, headcount changes, and churn analysis. All views are read-only projections and do not modify any existing Stage tables or semantic views. Key Principles:
  • Additive Only: No modifications to Phase 0 artifacts or existing stage tables/views
  • Read-Only: All views are projections only (no INSERT/UPDATE/DELETE)
  • Tenant Isolation: tenant_id must be included in ALL joins
  • Traceability: Money/allocation views MUST have stage3_component_id; headcount views have OPTIONAL stage3_component_ids
  • NUMERIC Safety: All money fields use NUMERIC (not FLOAT64)

View Inventory

  1. stage1_headcount_by_business_period - Performance helper view (pre-aggregates employee counts)
  2. audit_commission_summary - Component reconciliation summary (money/allocation - REQUIRES stage3_component_id)
  3. audit_agent_payout_trace - Detailed trace of component allocations to Stage1 rows (money/allocation - REQUIRES stage3_component_id)
  4. audit_business_lifecycle - Business lifecycle events with commission impact (lifecycle - OPTIONAL stage3_component_ids)
  5. audit_employee_headcount_timeline - Employee count changes over time (headcount-only - OPTIONAL stage3_component_ids)
  6. audit_business_churn_events - Churned businesses with peak snapshot and commission impact (churn - OPTIONAL stage3_component_ids)
  7. audit_business_churn_agent_impact - Churn events tied to affected agents (money/allocation - REQUIRES stage3_component_id)

Definitions

Attrition

Definition: Within-company employee reduction while the company remains active. Characteristics:
  • Company remains active (employee_count > 0)
  • Employee count decreased from previous period
  • Threshold: pct_change <= -0.15 AND prior_employee_count >= 20 (minimum base threshold to avoid noise)
Example: Business had 100 employees, now has 80 employees (20% attrition) ✅ Counter-Example: Business had 5 employees, now has 4 employees (20% change but < 20 base) → MINOR_CHANGE ⚠️

Churn

Definition: Business disappears or goes to 0 employees (company no longer active). Types:
  1. FULL_CHURN:
    • Business went from >0 employees to 0 employees
    • Still appears in stage1_snapshots with 0 count
    • churn_confidence: HIGH
  2. DISAPPEARED:
    • Business no longer appears in stage1_snapshots after last_active_period
    • Absent for >= grace_window consecutive expected periods (see Cadence Inference below)
    • churn_confidence: HIGH (if >= grace_window), MEDIUM (if < grace_window but > 1 period), LOW (single period or unknown cadence)

Growth

Definition: Within-company employee increase while the company remains active. Characteristics:
  • Company remains active (employee_count > 0)
  • Employee count increased from previous period
  • Threshold: pct_change >= 0.15 AND prior_employee_count >= 20 (minimum base threshold to avoid noise)
Example: Business had 100 employees, now has 120 employees (20% growth) ✅ Counter-Example: Business had 5 employees, now has 6 employees (20% change but < 20 base) → MINOR_CHANGE ⚠️

MINOR_CHANGE

Definition: Small-base or small-percentage employee changes that don’t meet growth/attrition thresholds. Characteristics:
  • ABS(employee_change) > 0
  • Either prior_employee_count < 20 (small base) OR ABS(pct_change) < 0.15 (small percentage change)
Purpose: Avoids noise from small businesses or minor fluctuations.

Cadence Inference

Purpose

For DISAPPEARED churn detection, we need to determine the “grace window” (number of consecutive absent periods required to classify as churn). This depends on the tenant’s payroll cadence (weekly, bi-weekly, monthly, etc.).

Method

  1. Calculate Period Gaps: For each tenant, calculate diff_days between consecutive period_label values in stage1_headcount_by_business_period.
  2. Classify Cadence: Use MODE(diff_days) per tenant to determine the most common period gap:
    • diff_days ≈ 7: WEEKLY → grace_window = 6 periods
    • diff_days ≈ 14: BIWEEKLY → grace_window = 4 periods
    • diff_days ≈ 30: MONTHLY → grace_window = 2 periods
    • diff_days ≈ 60: BIMONTHLY → grace_window = 2 periods
    • Other/Unknown: UNKNOWN → grace_window = 2 periods (default), churn_confidence = LOW
  3. Apply Grace Window: A business is considered DISAPPEARED if it’s absent for >= grace_window consecutive expected periods.

Implementation

See audit_business_churn_events.sql for the full implementation using APPROX_TOP_COUNT to find the mode of diff_days per tenant.

Churn Confidence

The churn_confidence field in audit_business_churn_events indicates the reliability of churn detection:
  • HIGH:
    • FULL_CHURN (employee_count = 0), OR
    • DISAPPEARED with >= grace_window periods (known cadence)
  • MEDIUM:
    • DISAPPEARED with < grace_window but > 1 period
  • LOW:
    • Single period absence (may be data gap), OR
    • Unknown cadence (cannot determine expected period frequency)

Assumptions

1. Employee Count Authority

  • Authoritative Source: COUNT(DISTINCT member_id) from stage1_snapshots
  • Helper View: stage1_headcount_by_business_period pre-aggregates this calculation for performance
  • DO NOT USE: absorbed_count from stage3_snapshots (it’s a commission calculation artifact, not actual headcount)

2. Growth/Attrition Thresholds

  • Minimum Base Employee Count: prior_employee_count >= 20 (to avoid noise from small businesses)
  • Minimum Percentage Change: ABS(pct_change) >= 0.15 (15%)
  • Classification: Changes that don’t meet both thresholds are classified as MINOR_CHANGE

3. Churn Detection

  • FULL_CHURN: employee_count = 0 in current period AND employee_count > 0 in previous period
  • DISAPPEARED: Business appears in period N but is absent for >= grace_window consecutive expected periods
  • Grace Window: Determined by cadence inference (see Cadence Inference section above)
  • churn_confidence: Indicates reliability (HIGH/MEDIUM/LOW)

4. Commission Impact

  • All commission impact calculations are read-only projections from authoritative allocated_commission_amount_raw values in view_commission_traceability
  • No new payout logic is introduced
  • NULL Safety: loss_amount = COALESCE(commission_at_peak, 0) - COALESCE(commission_at_churn, 0) (positive = loss)

5. Peak Snapshot

  • Peak employee count and peak period are calculated as MAX over all historical periods for a business, regardless of current status

6. Agent Role Derivation

  • Agent roles (DIRECT/UPLINE/OVERRIDE) are derived from joins to stage1_snapshots and dim_agent_hierarchy
  • CRITICAL: Joins to stage1_snapshots using stage1_row_id MUST include tenant_id:
    ON v.tenant_id = s.tenant_id AND v.stage1_row_id = s.stage1_row_id
    
  • If role cannot be determined, it is marked as UNKNOWN

7. Traceability Requirements

  • Money/Allocation Views (#2, #3, #7): MUST have stage3_component_id (REQUIRED)
    • audit_commission_summary
    • audit_agent_payout_trace
    • audit_business_churn_agent_impact
  • Headcount-Only View (#5): stage3_component_id is OPTIONAL (may be NULL)
    • audit_employee_headcount_timeline
  • Lifecycle/Churn Views (#4, #6): Include stage3_component_ids when commission exists, NULL/empty array otherwise
    • audit_business_lifecycle
    • audit_business_churn_events

Validation Methods

Employee Count Validation

-- Validate employee counts match between audit view and helper view
SELECT 
  a.tenant_id,
  a.business_id,
  a.period_label,
  a.employee_count AS audit_count,
  h.employee_count AS helper_count,
  ABS(CAST(a.employee_count AS NUMERIC) - CAST(h.employee_count AS NUMERIC)) AS diff
FROM audit_employee_headcount_timeline a
LEFT JOIN stage1_headcount_by_business_period h
  ON a.tenant_id = h.tenant_id 
  AND a.business_id = h.business_id 
  AND a.period_label = h.period_label
WHERE ABS(CAST(a.employee_count AS NUMERIC) - CAST(h.employee_count AS NUMERIC)) > 0;
-- Expected: 0 rows (perfect match)

Commission Impact Validation

-- Validate commission impact (loss_amount) ties to authoritative source
SELECT 
  c.tenant_id,
  c.business_id,
  c.churn_period,
  c.commission_at_peak AS churn_view_peak,
  c.commission_at_churn AS churn_view_churn,
  c.loss_amount AS churn_view_loss,
  s.peak_commission AS source_peak,
  s.churn_commission AS source_churn,
  (s.peak_commission - s.churn_commission) AS source_loss
FROM audit_business_churn_events c
LEFT JOIN (
  SELECT 
    tenant_id,
    business_id,
    SUM(CASE WHEN period_label = peak_period THEN allocated_commission_amount_raw ELSE 0 END) AS peak_commission,
    SUM(CASE WHEN period_label = churn_period THEN allocated_commission_amount_raw ELSE 0 END) AS churn_commission
  FROM view_commission_traceability
  GROUP BY tenant_id, business_id
) s ON c.tenant_id = s.tenant_id AND c.business_id = s.business_id
WHERE ABS(CAST(c.commission_at_peak AS NUMERIC) - CAST(s.peak_commission AS NUMERIC)) > CAST('0.01' AS NUMERIC)
   OR ABS(CAST(c.commission_at_churn AS NUMERIC) - CAST(s.churn_commission AS NUMERIC)) > CAST('0.01' AS NUMERIC)
   OR ABS(CAST(c.loss_amount AS NUMERIC) - CAST((s.peak_commission - s.churn_commission) AS NUMERIC)) > CAST('0.01' AS NUMERIC);
-- Expected: 0 rows (tolerance: $0.01)
-- Note: loss_amount should be positive (commission_at_peak > commission_at_churn for churned businesses)

Churn Event Count Validation

-- Validate churn events are correctly identified with confidence levels
SELECT 
  churn_type,
  churn_confidence,
  COUNT(*) AS churn_count,
  COUNT(DISTINCT business_id) AS unique_businesses_churned
FROM audit_business_churn_events
WHERE tenant_id = 'creative_benefit_strategies'
GROUP BY churn_type, churn_confidence
ORDER BY churn_confidence DESC, churn_type;
-- Expected: 
-- - HIGH confidence churn events match manual review (FULL_CHURN or DISAPPEARED with >= grace_window)
-- - MEDIUM/LOW confidence events are flagged for review (may be data gaps)

-- Validate grace window logic for DISAPPEARED businesses
SELECT 
  business_id,
  business_name,
  last_active_period,
  churn_period,
  DATE_DIFF(churn_period, last_active_period, MONTH) AS months_absent,
  churn_confidence
FROM audit_business_churn_events
WHERE churn_type = 'DISAPPEARED'
  AND tenant_id = 'creative_benefit_strategies'
ORDER BY months_absent DESC;
-- Expected: DISAPPEARED businesses with HIGH confidence have >= 2 months absent (monthly) or >= 6 weeks (weekly)

Stage3 Component ID Traceability

-- Validate money/allocation views have stage3_component_id (REQUIRED)
SELECT 
  'audit_commission_summary' AS view_name,
  COUNT(*) AS total_rows,
  COUNT(stage3_component_id) AS rows_with_component_id,
  COUNT(*) - COUNT(stage3_component_id) AS rows_missing_component_id
FROM audit_commission_summary
UNION ALL
SELECT 
  'audit_agent_payout_trace',
  COUNT(*),
  COUNT(stage3_component_id),
  COUNT(*) - COUNT(stage3_component_id)
FROM audit_agent_payout_trace
UNION ALL
SELECT 
  'audit_business_churn_agent_impact',
  COUNT(*),
  COUNT(ARRAY_LENGTH(stage3_component_ids_at_peak)) + COUNT(ARRAY_LENGTH(stage3_component_ids_at_churn)),
  COUNT(*) - (COUNT(ARRAY_LENGTH(stage3_component_ids_at_peak)) + COUNT(ARRAY_LENGTH(stage3_component_ids_at_churn)))
FROM audit_business_churn_agent_impact;
-- Expected: rows_missing_component_id = 0 for money/allocation views (all rows traceable)

-- Validate headcount-only view has OPTIONAL stage3_component_id (may be NULL)
SELECT 
  'audit_employee_headcount_timeline' AS view_name,
  COUNT(*) AS total_rows,
  COUNT(ARRAY_LENGTH(stage3_component_ids)) AS rows_with_component_ids,
  COUNT(*) - COUNT(ARRAY_LENGTH(stage3_component_ids)) AS rows_without_component_ids
FROM audit_employee_headcount_timeline;
-- Expected: rows_without_component_ids >= 0 (NULL is valid for headcount-only view)

-- Validate lifecycle/churn views include stage3_component_ids when commission exists
SELECT 
  'audit_business_lifecycle' AS view_name,
  COUNT(*) AS total_rows,
  COUNT(ARRAY_LENGTH(stage3_component_ids)) AS rows_with_commission,
  COUNT(*) - COUNT(ARRAY_LENGTH(stage3_component_ids)) AS rows_without_commission
FROM audit_business_lifecycle;
-- Expected: rows_without_commission >= 0 (NULL/empty array is valid when no commission)

Tenant Isolation Validation

-- Validate tenant isolation: no cross-tenant data leakage
SELECT 
  'audit_commission_summary' AS view_name,
  COUNT(*) AS cross_tenant_rows
FROM audit_commission_summary a
INNER JOIN stage3_snapshots s3
  ON a.tenant_id != s3.tenant_id
  AND a.stage3_component_id = TO_HEX(SHA256(
    CONCAT(
      CAST(s3.tenant_id AS STRING), '|',
      CAST(s3.period_label AS STRING), '|',
      CAST(UPPER(TRIM(s3.business_name)) AS STRING), '|',
      CAST(s3.period_code AS STRING), '|',
      CAST(s3.agent_id AS STRING), '|',
      CAST(COALESCE(s3.commission_type, 'NULL') AS STRING), '|',
      CAST(COALESCE(s3.downline_agent_name, s3.agent_name, 'NULL') AS STRING)
    )
  ));
-- Expected: 0 rows (no cross-tenant matches)

Deployment Order

  1. First: stage1_headcount_by_business_period.sql (helper view - required by other views)
  2. Second: audit_business_churn_events.sql (required by churn_agent_impact)
  3. Third: audit_business_churn_agent_impact.sql (depends on churn_events)
  4. Independent: audit_commission_summary.sql, audit_agent_payout_trace.sql, audit_business_lifecycle.sql, audit_employee_headcount_timeline.sql (can be deployed in any order)

Notes

  • These views are additive - they do not modify Phase 0 or Phase 1 code
  • Views can be deployed independently (no migration dependencies)
  • Views are optimized for audit queries (not real-time UI filtering)
  • Employee count is authoritative at Stage1 level (COUNT(DISTINCT member_id))
  • Commission impact is read-only projection from authoritative values (no new payout logic)
  • All joins include tenant_id for tenant isolation
  • stage1_row_id joins MUST include tenant_id: ON v.tenant_id = s.tenant_id AND v.stage1_row_id = s.stage1_row_id

Validation Queries

These SQL queries validate data quality, tenant isolation, traceability completeness, and reconciliation accuracy. Run these periodically to ensure audit-grade integrity.

A) Cross-Tenant Leakage Check

Purpose: Verify tenant isolation - no data from one tenant should appear in another tenant’s results. Expected Result: 0 rows (any rows indicate tenant isolation failure)
-- Check for cross-tenant leakage in audit views
-- This query checks if any business_id or agent_id appears with mismatched tenant_id
SELECT 
  'audit_commission_summary' AS view_name,
  COUNT(*) AS leakage_count
FROM `payroll-bi-gauntlet.payroll_analytics.audit_commission_summary` acs
WHERE EXISTS (
  SELECT 1
  FROM `payroll-bi-gauntlet.payroll_analytics.dim_business_mapping` bm
  WHERE bm.business_id = acs.business_id
    AND bm.tenant_id != acs.tenant_id
)
UNION ALL
SELECT 
  'audit_agent_payout_trace' AS view_name,
  COUNT(*) AS leakage_count
FROM `payroll-bi-gauntlet.payroll_analytics.audit_agent_payout_trace` apt
WHERE EXISTS (
  SELECT 1
  FROM `payroll-bi-gauntlet.payroll_raw.stage1_snapshots` s1
  WHERE TO_HEX(SHA256(
    CONCAT(
      CAST(s1.tenant_id AS STRING), '|',
      CAST(s1.period_label AS STRING), '|',
      CAST(s1.business_label AS STRING), '|',
      CAST(s1.member_id AS STRING), '|',
      CAST(s1.payee_agent_id AS STRING), '|',
      CAST(s1.period_code AS STRING), '|',
      CAST(s1.credit AS STRING), '|',
      CAST(s1.debit AS STRING), '|',
      CAST(s1.total AS STRING)
    )
  )) = apt.stage1_row_id
    AND s1.tenant_id != apt.tenant_id
);

B) Money View Traceability Completeness

Purpose: Verify that money/allocation views have stage3_component_id where required. Expected Result: 0 rows (any rows indicate missing traceability)
-- Check for missing stage3_component_id in money/allocation views
SELECT 
  'audit_commission_summary' AS view_name,
  COUNT(*) AS rows_missing_component_id
FROM `payroll-bi-gauntlet.payroll_analytics.audit_commission_summary`
WHERE stage3_component_id IS NULL
UNION ALL
SELECT 
  'audit_agent_payout_trace' AS view_name,
  COUNT(*) AS rows_missing_component_id
FROM `payroll-bi-gauntlet.payroll_analytics.audit_agent_payout_trace`
WHERE stage3_component_id IS NULL
UNION ALL
SELECT 
  'audit_business_churn_agent_impact' AS view_name,
  COUNT(*) AS rows_missing_component_id
FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_agent_impact`
WHERE ARRAY_LENGTH(stage3_component_ids_at_peak) = 0
  AND ARRAY_LENGTH(stage3_component_ids_at_churn) = 0
  AND commission_at_peak > 0;  -- Only flag if commission exists but no component_ids

C) Headcount Reconciliation

Purpose: Verify that headcount counts match authoritative Stage1 source (COUNT(DISTINCT member_id)). Expected Result: 0 rows (any rows indicate reconciliation failure)
-- Compare headcount from audit views vs direct Stage1 count
WITH audit_headcounts AS (
  SELECT 
    tenant_id,
    business_id,
    period_label,
    employee_count AS audit_employee_count
  FROM `payroll-bi-gauntlet.payroll_analytics.stage1_headcount_by_business_period`
),
stage1_direct_counts AS (
  SELECT 
    tenant_id,
    period_label,
    UPPER(TRIM(business_label)) AS business_name_normalized,
    COUNT(DISTINCT member_id) AS direct_employee_count
  FROM `payroll-bi-gauntlet.payroll_raw.stage1_snapshots`
  WHERE business_label IS NOT NULL
    AND TRIM(business_label) != ''
    AND period_label >= DATE('2024-01-01')
  GROUP BY tenant_id, period_label, UPPER(TRIM(business_label))
),
business_mapping AS (
  SELECT
    tenant_id,
    business_id,
    UPPER(TRIM(normalized_name)) AS normalized_name
  FROM `payroll-bi-gauntlet.payroll_analytics.dim_business_mapping`
  WHERE is_active = TRUE
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY tenant_id, UPPER(TRIM(normalized_name))
    ORDER BY COALESCE(last_seen_date, DATE '1900-01-01') DESC
  ) = 1
)
SELECT 
  ah.tenant_id,
  ah.business_id,
  ah.period_label,
  ah.audit_employee_count,
  s1.direct_employee_count,
  ABS(ah.audit_employee_count - s1.direct_employee_count) AS diff
FROM audit_headcounts ah
INNER JOIN business_mapping bm
  ON ah.tenant_id = bm.tenant_id
  AND ah.business_id = bm.business_id
INNER JOIN stage1_direct_counts s1
  ON ah.tenant_id = s1.tenant_id
  AND ah.period_label = s1.period_label
  AND bm.normalized_name = s1.business_name_normalized
WHERE ah.audit_employee_count != s1.direct_employee_count;

D) Churn View Determinism

Purpose: Verify that churn views use deterministic as-of dates (no CURRENT_DATE()). Expected Result: Query should run successfully and show asof_period values (not CURRENT_DATE())
-- Verify churn view uses tenant_asof (deterministic) not CURRENT_DATE()
-- This query checks that asof_period is derived from MAX(period_label) per tenant
WITH tenant_asof_check AS (
  SELECT 
    tenant_id,
    MAX(period_label) AS expected_asof_period
  FROM `payroll-bi-gauntlet.payroll_analytics.stage1_headcount_by_business_period`
  GROUP BY tenant_id
),
churn_events_check AS (
  SELECT DISTINCT
    tenant_id,
    churn_period,
    last_active_period
  FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_events`
  WHERE churn_type = 'DISAPPEARED'
)
SELECT 
  ta.tenant_id,
  ta.expected_asof_period,
  ce.churn_period,
  ce.last_active_period,
  DATE_DIFF(ta.expected_asof_period, ce.last_active_period, DAY) AS days_since_last_active
FROM tenant_asof_check ta
LEFT JOIN churn_events_check ce
  ON ta.tenant_id = ce.tenant_id
WHERE ce.churn_period IS NOT NULL
  AND ce.churn_period > ta.expected_asof_period;  -- Should be 0 rows (churn_period should be <= asof_period)

E) Commission Reconciliation

Purpose: Verify that component source totals reconcile with allocated totals (within tolerance). Expected Result: All rows should have reconciliation_status = 'RECONCILED' or abs_diff <= 0.01
-- Verify commission reconciliation: stage3_source_total vs allocated_total +/- unallocated_total
SELECT 
  tenant_id,
  period_label,
  business_id,
  stage3_component_id,
  stage3_source_total,
  allocated_total,
  unallocated_total,
  (allocated_total + unallocated_total) AS total_accounted,
  ABS(stage3_source_total - (allocated_total + unallocated_total)) AS abs_diff,
  reconciliation_status
FROM `payroll-bi-gauntlet.payroll_analytics.audit_commission_summary`
WHERE ABS(stage3_source_total - (allocated_total + unallocated_total)) > CAST('0.01' AS NUMERIC)
  AND reconciliation_status != 'RECONCILED'
ORDER BY abs_diff DESC
LIMIT 100;  -- Show top 100 reconciliation gaps

F) Early-History Tenant Churn Suppression

Purpose: Document and verify that tenants with insufficient timeline depth (< grace_window periods) do not emit false DISAPPEARED churn events. Expected Behavior: Tenants with < grace_window periods should have LOW confidence or no DISAPPEARED events.
-- Check for DISAPPEARED events from tenants with insufficient history
WITH tenant_period_counts AS (
  SELECT 
    tenant_id,
    COUNT(DISTINCT period_label) AS total_periods
  FROM `payroll-bi-gauntlet.payroll_analytics.stage1_headcount_by_business_period`
  GROUP BY tenant_id
),
churn_events_by_tenant AS (
  SELECT 
    tenant_id,
    COUNT(*) AS disappeared_count,
    COUNTIF(churn_confidence = 'HIGH') AS high_confidence_count,
    COUNTIF(churn_confidence = 'LOW') AS low_confidence_count
  FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_events`
  WHERE churn_type = 'DISAPPEARED'
  GROUP BY tenant_id
)
SELECT 
  tpc.tenant_id,
  tpc.total_periods,
  COALESCE(ce.disappeared_count, 0) AS disappeared_count,
  COALESCE(ce.high_confidence_count, 0) AS high_confidence_count,
  COALESCE(ce.low_confidence_count, 0) AS low_confidence_count,
  CASE
    WHEN tpc.total_periods < 3 THEN 'INSUFFICIENT_HISTORY'
    WHEN tpc.total_periods < 6 THEN 'EARLY_HISTORY'
    ELSE 'SUFFICIENT_HISTORY'
  END AS history_category
FROM tenant_period_counts tpc
LEFT JOIN churn_events_by_tenant ce
  ON tpc.tenant_id = ce.tenant_id
WHERE COALESCE(ce.disappeared_count, 0) > 0
ORDER BY tpc.total_periods ASC, ce.high_confidence_count DESC;
Note: Early-history tenants (with < grace_window periods) may suppress churn events until enough periods exist for reliable cadence inference. This is expected behavior and prevents false positives.

Validation Queries (Phase 1.5)

These queries validate the audit_business_churn_agent_impact view specifically, ensuring correctness, tenant isolation, role derivation, and NUMERIC-only math safety.

Q1: Key uniqueness (expect 0 rows)

Purpose: Verify that the view maintains one row per (tenant_id, business_id, agent_id) grain. Expected Result: 0 rows (any rows indicate duplicate keys)
SELECT tenant_id, business_id, agent_id, COUNT(*) c
FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_agent_impact`
GROUP BY 1,2,3
HAVING COUNT(*) > 1;

Q2: Cross-tenant leakage sanity (expect 0 rows)

Purpose: Verify tenant isolation - no business should appear with mismatched tenant_id between churn_events and churn_agent_impact. Expected Result: 0 rows (any rows indicate tenant isolation failure)
SELECT *
FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_agent_impact` a
JOIN `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_events` e
  ON a.business_id = e.business_id
 AND a.tenant_id != e.tenant_id
LIMIT 1;

Q3: Role distribution (UNKNOWN allowed, but should not dominate)

Purpose: Understand role derivation coverage - how many agents are classified as DIRECT, OVERRIDE, UPLINE, or UNKNOWN. Expected Result: UNKNOWN should be a small percentage; if it dominates, investigate stage1_row_id hash mismatch or missing hierarchy data.
SELECT agent_role, COUNT(*) c
FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_agent_impact`
GROUP BY 1
ORDER BY c DESC;

Q4: Stage1 join coverage sanity (expect low miss rate; investigate spikes)

Purpose: Measure % of rows where role derivation fell back to UNKNOWN, grouped by tenant_id, business_id, and churn_period. High UNKNOWN rates may indicate stage1_row_id canonicalization mismatch. Expected Result: Low pct_unknown for most businesses; investigate spikes where pct_unknown > 50%.
-- Measure % of rows where role derivation fell back to UNKNOWN
-- grouped by tenant_id, business_id, and churn_period
SELECT
  tenant_id,
  business_id,
  churn_period,
  COUNT(*) AS total_rows,
  SUM(CASE WHEN agent_role = 'UNKNOWN' THEN 1 ELSE 0 END) AS unknown_rows,
  CAST(
    CASE
      WHEN COUNT(*) = 0 THEN NULL
      ELSE (CAST(SUM(CASE WHEN agent_role = 'UNKNOWN' THEN 1 ELSE 0 END) AS NUMERIC) / CAST(COUNT(*) AS NUMERIC)) * CAST(100 AS NUMERIC)
    END
  AS NUMERIC) AS pct_unknown
FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_agent_impact`
GROUP BY tenant_id, business_id, churn_period
ORDER BY pct_unknown DESC, total_rows DESC;

Q5: NUMERIC-only division safety (expect percent_of_total_loss NULL when denom=0, never INF/NaN)

Purpose: Verify that percent_of_total_loss IS NULL when total_loss_amount=0, and that no SAFE_DIVIDE/FLOAT64 exists in the view definition. Expected Result: pct_nulls should match zero_or_null_denom (NULL only when denom=0/null). Also verify view definition has no SAFE_DIVIDE or FLOAT64 casts.
-- Verify percent_of_total_loss IS NULL when business total loss=0
-- and that no SAFE_DIVIDE/FLOAT64 exists in the view definition
WITH business_totals AS (
  SELECT
    tenant_id,
    business_id,
    SUM(CAST(loss_amount AS NUMERIC)) AS total_loss_amount
  FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_agent_impact`
  GROUP BY tenant_id, business_id
)
SELECT
  a.tenant_id,
  COUNTIF(a.percent_of_total_loss IS NULL) AS pct_nulls,
  COUNTIF(bt.total_loss_amount IS NULL OR CAST(bt.total_loss_amount AS NUMERIC) = CAST(0 AS NUMERIC)) AS zero_or_null_denom,
  COUNT(*) AS total
FROM `payroll-bi-gauntlet.payroll_analytics.audit_business_churn_agent_impact` a
LEFT JOIN business_totals bt
  ON a.tenant_id = bt.tenant_id
 AND a.business_id = bt.business_id
GROUP BY a.tenant_id;
-- Expect: pct_nulls should match zero_or_null_denom (NULL only when denom=0/null)
-- Also verify view definition has no SAFE_DIVIDE or FLOAT64 casts
Note: To verify the view definition has no SAFE_DIVIDE or FLOAT64, query the INFORMATION_SCHEMA:
-- Check view definition for SAFE_DIVIDE or FLOAT64
SELECT
  view_name,
  view_definition
FROM `payroll-bi-gauntlet.payroll_analytics.INFORMATION_SCHEMA.VIEWS`
WHERE view_name = 'audit_business_churn_agent_impact'
  AND (
    view_definition LIKE '%SAFE_DIVIDE%'
    OR view_definition LIKE '%FLOAT64%'
  );
-- Expect: 0 rows (no SAFE_DIVIDE or FLOAT64 in definition)