Identity Stability Guarantees - Audit-Grade Documentation
Date: 2025-01-XXStatus: ⚠️ REQUIRES IMMEDIATE FIXES
Impact: Critical for semantic view deterministic IDs and tenant isolation
Executive Summary
This document provides audit-grade guarantees for business_id and agent identity stability, which are foundational for:- Deterministic
stage1_row_idandstage3_component_idhashes - Tenant isolation enforcement
- Historical data continuity
- Commission traceability accuracy
agent_key pattern.
1. Business ID Stability
1.1 Current Implementation
Generation Location:utils/business_id_generator.pyGeneration Method: MD5 hash of normalized business name (first 16 hex characters)
Normalization Formula:
UPPER(TRIM(business_name))
Code Reference:
- CSV file:
data/reference/business_id_mapping.csv - BigQuery table:
payroll-bi-gauntlet.payroll_analytics.dim_business_mapping(orpayroll_raw.dim_business_mapping)
1.2 Stability Guarantees
✅ GUARANTEED STABLE:
- Deterministic Generation: Same
normalized_name→ samebusiness_id(MD5 hash) - Cross-Period Stability:
business_iddoes not change across months/periods - Cross-Run Stability: Re-running same period generates identical
business_id - Immutable Once Assigned: No automatic regeneration of existing IDs
❌ NOT GUARANTEED (Critical Gaps):
-
Missing
tenant_idin Schema:- Current schema (
dim_business_mapping_schema.json) does NOT includetenant_id - Risk: Cannot enforce tenant isolation at dimension level
- Impact: Cross-tenant business name collisions possible
- Current schema (
-
No MERGE Strategy:
- Current loader (
load_reference_tables.py) usesWRITE_TRUNCATE - Risk: Data loss on reload, no idempotent upserts
- Impact: Cannot safely update
last_seen_dateoris_activestatus
- Current loader (
-
No Upsert Key Definition:
- No explicit MERGE key documented
- Required MERGE Key:
(tenant_id, normalized_name)(once tenant_id added)
1.3 Required Fixes
Fix 1: Add tenant_id to Schema
File: integration/bigquery/schemas/dim_business_mapping_schema.json
Change:
Fix 2: Create MERGE SQL Script
File:integration/bigquery/sql/tables/dim_business_mapping.sql (NEW)
Required MERGE Statement:
(tenant_id, normalized_name)Immutable Field:
business_id (never updated in MERGE, only inserted)
Fix 3: Update BusinessIDGenerator to Include tenant_id
File:utils/business_id_generator.py
Change: Add tenant_id parameter to all methods and CSV schema.
1.4 Guarantee Statement (Post-Fix)
After fixes are applied:BUSINESS ID STABILITY GUARANTEE:Invariant Check:
For any given(tenant_id, normalized_name)tuple:
business_idis deterministically generated fromnormalized_nameusing MD5 hash (first 16 hex chars)business_idis immutable once assigned (never updated in MERGE operations)business_idis stable across periods and reruns (same input → same output)business_idis tenant-scoped (same normalized name in different tenants → different IDs)- MERGE operations use
(tenant_id, normalized_name)as the unique key- No automatic regeneration occurs for existing businesses
2. Agent Identity Stability
2.1 Current Implementation
Problem: Upstreamagent_id is NOT stable or may be missing.
Current Approach:
agent_idis minted fromagent_nameusingmint_agent_id_from_name()function- Function generates deterministic hash-based IDs:
"NAME:" + SHA256(normalized_name)[:12] - Used in
stage3_snapshotsand semantic views
repo_b/upload_to_bq.py):
2.2 Stability Issues
❌ NOT GUARANTEED:
- Name Variations: “Kenny Young” vs “KENNY YOUNG” → different hashes (if normalization inconsistent)
- No Canonical Agent Key: Current system uses
agent_idwhich is derived, not authoritative - Hierarchy References:
dim_agent_hierarchyusesagent_idbut source may be unstable - Semantic View Dependency:
stage3_component_idhash includesagent_id, making it fragile
2.3 Required Solution: agent_key Pattern
Proposal: Create dim_agent_identity Table
Purpose: Canonical agent identity mapping with aliases and effective dates
Schema (integration/bigquery/schemas/dim_agent_identity_schema.json):
(tenant_id, agent_key)
Generation Logic:
2.4 Required Code Changes
Change 1: Update Semantic View to Use agent_key
File: integration/bigquery/sql/views/semantic_view_commission_traceability.sql
Current (line 65):
Change 2: Update Tenant Isolation Tests
File:tests/test_semantic_view_qa_invariants.sql
Current Check A uses agent_id:
agent_key):
2.5 Guarantee Statement (Post-Migration)
After migration toagent_key:
AGENT IDENTITY STABILITY GUARANTEE:
agent_keyis deterministically generated from normalizeddisplay_nameusing SHA256 hashagent_keyis immutable once assigned (never updated in MERGE)agent_keyis stable across periods and rerunsagent_keyis tenant-scoped (same name in different tenants → different keys)- Name variations are handled via
aliasesarray (all map to sameagent_key)- MERGE operations use
(tenant_id, agent_key)as the unique keyupstream_agent_idis nullable and may change without affectingagent_keystability
3. Required Migrations (Non-Breaking)
3.1 Business Mapping Migration
Steps:- Add
tenant_idcolumn to existingdim_business_mappingtable (ALTER TABLE) - Backfill
tenant_idfor existing rows (default to'creative_benefit_strategies') - Update
BusinessIDGeneratorto includetenant_idin all operations - Switch loader from
WRITE_TRUNCATEto MERGE-based upserts - Run QA invariant to verify no
business_idchurn
3.2 Agent Identity Migration
Steps:- Create
dim_agent_identitytable - Populate from existing
dim_agent_hierarchy+stage3_snapshots(extract unique agents) - Generate
agent_keyfor all existing agents - Update semantic view to use
agent_key(backward compatible: keepagent_id) - Update QA invariants to check
agent_keystability
4. QA Invariants
4.1 Business ID Churn Detection
File:tests/test_semantic_view_qa_invariants.sql (add new invariant)
4.2 Agent Key Stability
File:tests/test_semantic_view_qa_invariants.sql (add new invariant)
5. Summary: Guarantee Statements
✅ BUSINESS ID STABILITY (Post-Fix)
MERGE Key:(tenant_id, normalized_name)Immutable Field:
business_idGeneration: MD5 hash of
normalized_name (first 16 hex chars)Guarantee: Same
(tenant_id, normalized_name) → same business_id forever
✅ AGENT IDENTITY STABILITY (Post-Migration)
MERGE Key:(tenant_id, agent_key)Immutable Field:
agent_keyGeneration: SHA256 hash of normalized
display_name (prefix: "AGENT:" + 16 hex chars)Guarantee: Same
(tenant_id, normalized_name) → same agent_key foreverAlias Handling: Multiple name variations map to same
agent_key via aliases array
6. Action Items
Immediate (Blocking):
- ⚠️ Add
tenant_idtodim_business_mappingschema - ⚠️ Create MERGE SQL for
dim_business_mapping(key:tenant_id, normalized_name) - ⚠️ Update
BusinessIDGeneratorto includetenant_id - ⚠️ Migrate loader from
WRITE_TRUNCATEto MERGE
Short-term (Required for Semantic View):
- Create
dim_agent_identitytable - Update semantic view to use
agent_key(backward compatible) - Add QA invariants for ID stability checks
Long-term (Enhancement):
- Implement alias resolution for business names
- Add audit logging for ID generation
- Create admin UI for manual ID overrides
Status: ⚠️ CRITICAL FIXES REQUIRED BEFORE PRODUCTION USE