Skip to main content

Semantic Layer Implementation - Revision Summary

Revision Requests Addressed

1. ✅ NUMERIC/BIGNUMERIC for Money Fields

Changed: All FLOAT64 money fields → NUMERIC(18, 2) Files Updated:
  • semantic_view_commission_traceability.sql:
    • stage1_credit, stage1_debit, stage1_total, allocated_commission_amountNUMERIC(18, 2)
Rationale: NUMERIC provides exact decimal precision for financial calculations, preventing floating-point rounding errors.

2. ✅ Stage1 Row-Grain Redesign

Changed: view_commission_traceability grain from transaction-level → Stage1 row-level New Grain: (tenant_id, period_label, stage1_row_id, component_type) - one row per Stage1 record + component allocation Key Fields Added:
  • stage1_row_id: Deterministic hash of (tenant_id, period_label, business_label, member_id, payee_agent_id, period_code)
  • allocated_commission_amount: How much of this Stage1 row went to each component (proportional allocation)
  • component_type: TPA, OWNER, MIXED_OWNER, etc.
Supports: “12 transactions” drilldown - shows how each Stage1 row is split across commission components.

3. ✅ Deterministic IDs (No created_at)

Changed: Removed created_at from ID generation Old: CONCAT(..., CAST(created_at AS STRING))
New: TO_HEX(SHA256(CONCAT(tenant_id, period_label, business_label, member_id, payee_agent_id, period_code)))
Fields Updated:
  • stage1_row_id: Deterministic hash (no timestamps)
  • rule_hash: Deterministic from inputs + plan_version_id (no timestamps)
Rationale: IDs must be stable across reruns for version comparison views.

4. ✅ Proper Stubbing Strategy

Changed: Stub views now reference empty tables (not WHERE 1=0) Files Created:
  • dim_plan_versions.sql: Empty table with schema
  • dim_modifier_windows.sql: Empty table with schema
  • semantic_view_plan_versions.sql: Queries empty table (returns 0 rows)
  • semantic_view_modifier_windows.sql: Queries empty table (returns 0 rows)
Rationale: Views can deploy successfully even when tables are empty. No WHERE 1=0 workarounds needed.

5. ✅ Tenant-Aware dim_rule_definitions

Changed: Added tenant_id and plan_version_id to schema Schema Updates:
CREATE TABLE dim_rule_definitions (
  tenant_id STRING NOT NULL,  -- NEW
  rule_id STRING NOT NULL,
  plan_version_id STRING,  -- NEW (optional)
  effective_start_date DATE,  -- NEW
  effective_end_date DATE,  -- NEW
  ...
)
CLUSTER BY tenant_id, rule_id, rule_type;
Initial Data: Populated with 3 rules for creative_benefit_strategies tenant:
  • TPA_PEPM_RATE
  • OWNER_RESIDUAL
  • MIXED_OWNER_RESIDUAL

6. ✅ Tenant_ID in All JOIN ON Clauses

Changed: All dimension table JOINs include tenant_id in ON clause Pattern:
LEFT JOIN rule_metadata rm
  ON s1c.tenant_id = rm.tenant_id  -- Tenant-scoped join
  AND s1c.rule_id = rm.rule_id
Applied To:
  • business_mapping join
  • agent_hierarchy join
  • rule_metadata join
  • plan_metadata join
Rationale: Ensures tenant isolation and prevents cross-tenant data leakage.

7. ✅ QUALIFY ROW_NUMBER = 1 for Join Uniqueness

Changed: Replaced DISTINCT with QUALIFY ROW_NUMBER() = 1 pattern Pattern:
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY tenant_id, rule_id
  ORDER BY effective_start_date DESC
) = 1
Applied To:
  • rule_metadata CTE (ensures one rule per tenant+rule_id)
  • plan_metadata CTE (ensures one plan per tenant+plan_version_id)
  • business_mapping CTE (ensures one business per tenant+normalized_name)
  • agent_hierarchy CTE (ensures one agent per tenant+agent_id)
  • stage1_with_components CTE (ensures one row per Stage1+component)
Rationale: Explicit uniqueness strategy prevents row multiplication without DISTINCT band-aids.

8. ✅ Enhanced QA Plan

Added Tests:
  1. No-Row-Drop Checks:
    -- Count Stage1 rows vs traceability rows (should match or exceed)
    SELECT 
      (SELECT COUNT(DISTINCT stage1_row_id) FROM stage1_snapshots 
       WHERE tenant_id = 'creative_benefit_strategies' AND period_label = '2025-08-01') as stage1_count,
      (SELECT COUNT(DISTINCT stage1_row_id) FROM view_commission_traceability 
       WHERE tenant_id = 'creative_benefit_strategies' AND period_label = '2025-08-01') as traceability_count
    -- traceability_count should be >= stage1_count (one Stage1 row can have multiple components)
    
  2. Deterministic ID Stability:
    -- Run same query twice, compare stage1_row_id values
    -- Should be identical (no created_at dependency)
    SELECT stage1_row_id, COUNT(*) as id_count
    FROM view_commission_traceability
    WHERE tenant_id = 'creative_benefit_strategies' AND period_label = '2025-08-01'
    GROUP BY stage1_row_id
    HAVING id_count > 1  -- Should return 0 rows (IDs are unique)
    
  3. Full Tie-Out:
    -- Stage1 → Traceability → Stage3 reconciliation
    WITH stage1_total AS (
      SELECT SUM(stage1_total) as total
      FROM view_commission_traceability
      WHERE tenant_id = 'creative_benefit_strategies' AND period_label = '2025-08-01'
    ),
    traceability_total AS (
      SELECT SUM(allocated_commission_amount) as total
      FROM view_commission_traceability
      WHERE tenant_id = 'creative_benefit_strategies' AND period_label = '2025-08-01'
    ),
    stage3_total AS (
      SELECT SUM(agent_total) as total
      FROM stage3_snapshots
      WHERE tenant_id = 'creative_benefit_strategies' AND period_label = '2025-08-01'
    )
    SELECT 
      s1.total as stage1_gross,
      t.total as traceability_allocated,
      s3.total as stage3_total,
      ABS(s1.total - s3.total) as reconciliation_delta
    FROM stage1_total s1, traceability_total t, stage3_total s3
    -- reconciliation_delta should be < 0.01 (rounding tolerance)
    

SQL Files Created

Tables

  1. integration/bigquery/sql/tables/dim_rule_definitions.sql - Tenant-aware rule metadata
  2. integration/bigquery/sql/tables/dim_plan_versions.sql - Empty stub table
  3. integration/bigquery/sql/tables/dim_modifier_windows.sql - Empty stub table

Views

  1. integration/bigquery/sql/views/semantic_view_commission_traceability.sql - Stage1 grain, NUMERIC, deterministic IDs
  2. integration/bigquery/sql/views/semantic_view_plan_versions.sql - Stub view (queries empty table)
  3. integration/bigquery/sql/views/semantic_view_modifier_windows.sql - Stub view (queries empty table)

Deployment Order

  1. Create dimension tables (empty stubs OK):
    -- Run dim_plan_versions.sql
    -- Run dim_modifier_windows.sql
    -- Run dim_rule_definitions.sql (includes initial data)
    
  2. Create semantic views:
    -- Run semantic_view_commission_traceability.sql
    -- Run semantic_view_plan_versions.sql
    -- Run semantic_view_modifier_windows.sql
    
  3. Validate:
    • Run QA tests (no-row-drop, deterministic IDs, full tie-out)
    • Check partition pruning (EXPLAIN plan)
    • Verify tenant isolation

Key Design Decisions

Stage1 Grain Allocation Logic

The view uses proportional allocation to map Stage1 rows to Stage3 components:
  1. Aggregate Stage1 by business: SUM(stage1_total) per business
  2. Aggregate Stage3 by business: SUM(agent_total) per business
  3. Calculate ratio: stage3_total / stage1_total
  4. Allocate each Stage1 row proportionally: (Stage1 row total) * (component amount / business Stage3 total)
Example:
  • Stage1: Business “NUG” has 10 rows totaling $6,883.11
  • Stage3: Business “NUG” has 4 components totaling $6,883.11
    • Agent1: $1,106.30
    • Agent2: $1,553.62
    • Agent3: $442.52
    • Owner: $3,766.36
  • Each Stage1 row gets allocated proportionally across these 4 components
Result: One Stage1 row can produce multiple traceability rows (one per component).

Remaining Work

  • Create semantic_view_rep_performance.sql (not yet implemented)
  • Create semantic_view_owner_hierarchy.sql (not yet implemented)
  • Add query functions to api/bigquery/queries.py
  • Create validation test suite (tests/test_semantic_views.py)