Skip to main content

Legacy agent_id Minting Function & Join Usage Analysis

Date: 2025-01-XX
Purpose: Document current agent_id minting function and all join dependencies before migration to agent_key

1. Legacy agent_id Minting Function

Function Location

File: repo_b/upload_to_bq.py (lines 169-198)

Function Definition

def mint_agent_id_from_name(agent_name: str) -> str:
    """
    Generate stable agent_id from agent_name.
    
    Rules:
    - "OWNER" -> "OWNER"
    - "UNASSIGNED" -> "UNASSIGNED"  
    - Empty/NaN -> "UNASSIGNED"
    - Otherwise -> "NAME:" + sha256(normalized_name)[:12]
    
    Uses same normalization as owner detection (strip -> upper -> collapse whitespace).
    
    Args:
        agent_name: Agent name string
        
    Returns:
        Stable agent_id string
    """
    normalized = normalize_agent_name(agent_name)
    
    if normalized == "":
        return "UNASSIGNED"
    
    if normalized in {"OWNER", "UNASSIGNED"}:
        return normalized
    
    # Generate stable hash-based ID
    hash_obj = hashlib.sha256(normalized.encode('utf-8'))
    hash_hex = hash_obj.hexdigest()[:12]
    return f"NAME:{hash_hex}"

Normalization Function

def normalize_agent_name(name: str) -> str:
    """Normalize agent name: strip, upper, collapse spaces"""
    if pd.isna(name) or name is None:
        return ""
    return re.sub(r'\s+', ' ', str(name).strip().upper())

Can It Be Frozen Forever?

✅ YES - Function Can Be Frozen Rationale:
  1. Deterministic: Same input → same output (SHA256 hash)
  2. No External Dependencies: Pure function, no database lookups
  3. Stable Normalization: normalize_agent_name() is deterministic
  4. Backward Compatible: Existing agent_id values remain valid
Freeze Strategy:
  • Move function to utils/agent_id_legacy.py (frozen module)
  • Mark as @deprecated with migration path to agent_key
  • Keep for backward compatibility during migration
  • Document that new code should use agent_key from dim_agent_identity
Frozen Function Location (Proposed):
utils/agent_id_legacy.py  # Frozen, never modified

2. agent_id Join Usage Analysis

2.1 BigQuery Views (SQL Files)

A. semantic_view_commission_traceability.sql

File: integration/bigquery/sql/views/semantic_view_commission_traceability.sql Join Usage:
  1. Line 227: PARTITION BY tenant_id, agent_id (QUALIFY deduplication)
  2. Line 371: AND s1c.agent_id = ah.agent_id (LEFT JOIN to dim_agent_hierarchy)
Impact: ⚠️ CRITICAL - Used in semantic view for hierarchy joins Migration Path:
  • Add agent_identity CTE join
  • Use agent_key for stage3_component_id hash
  • Keep agent_id join for backward compatibility during transition

B. bq_schema_agents.sql

File: integration/bigquery/sql/tables/bq_schema_agents.sql Join Usage:
  1. Line 108: ON a.agent_id = c.agent_id (agent to commission join)
  2. Line 112: ON a.agent_id = s.agent_id (agent to snapshot join)
Impact: ⚠️ HIGH - Used in agent schema views Migration Path: Update joins to use agent_key via dim_agent_identity

C. stage1_agent_detailed_export_vw.sql

File: integration/bigquery/sql/views/stage1_agent_detailed_export_vw.sql Join Usage:
  1. Line 28: AND payee_agent_id = @agent_id (WHERE clause filter)
Impact: ⚠️ MEDIUM - Used for agent-specific exports Migration Path: Convert @agent_id parameter to @agent_key, join via dim_agent_identity to resolve

2.2 API Query Functions (api/bigquery/queries.py)

A. get_agent_stage1_detail() (Lines 4645-5082)

Function: Get row-level Stage-1 credit detail for an agent Join Usage:
  1. Line 4839: AND s.{agent_id_col} = 721995 (hardcoded Robin Bundy filter)
  2. Line 4840: AND s.{agent_id_col} != 668078 (exclude Richard Ballard)
  3. Line 4918-4919: Same hardcoded filters in subquery
  4. Line 5005-5006: Same hardcoded filters in summary query
  5. Line 5081-5082: Same hardcoded filters in count query
Impact: ⚠️ CRITICAL - Hardcoded numeric IDs (721995, 668078) are NOT generated by mint_agent_id_from_name() Note: These are upstream numeric IDs from source data, not minted IDs. They should be preserved as upstream_agent_id in dim_agent_identity. Migration Path:
  • Keep hardcoded filters (they’re business requirements)
  • Map to agent_key via dim_agent_identity.upstream_agent_id

B. get_agent_commission_with_authoritative_pepm() (Lines 6206-6271)

Function: Get agent commission with authoritative PEPM Join Usage:
  1. Line 6244: WHERE payee_agent_id IN (721995, 668078) (hardcoded numeric IDs)
Impact: ⚠️ MEDIUM - Uses hardcoded upstream IDs Migration Path: Same as above - preserve as upstream_agent_id

C. get_agent_metrics_range() (Lines 2923-3174)

Function: Get agent metrics for date range Join Usage:
  • No direct agent_id joins - Uses agent_name for filtering
Impact: ✅ LOW - Already uses agent_name, no migration needed

D. Window Functions

Line 524: PARTITION BY agent_id, business_name, period_label Impact: ⚠️ MEDIUM - Used in windowing for deduplication Migration Path: Update to use agent_key in partition

2.3 API Routes (api/routes/)

A. agents.py

File: api/routes/agents.py Join Usage:
  • Routes extract agent_name from JWT token
  • Pass agent_name to query functions (not agent_id)
Impact: ✅ LOW - Already uses agent_name, no direct agent_id joins

2.4 Dimension Tables

A. dim_agent_hierarchy

File: integration/bigquery/sql/tables/bq_schema_agents.sql (referenced) Join Usage:
  • Primary key: agent_id
  • Used in semantic view join (line 371)
Impact: ⚠️ HIGH - Dimension table uses agent_id as primary key Migration Path:
  • Option 1: Add agent_key column, keep agent_id for backward compatibility
  • Option 2: Create mapping view that joins dim_agent_hierarchy to dim_agent_identity

B. dim_users

File: Referenced in multiple scripts Join Usage:
  • Stores agent_id for user records
  • Used in authentication/authorization
Impact: ⚠️ MEDIUM - User records reference agent_id Migration Path: Add agent_key column, migrate gradually

3. Summary: Join Dependencies

Critical Dependencies (Must Migrate)

  1. semantic_view_commission_traceability - Uses agent_id in hierarchy join
  2. bq_schema_agents.sql - Uses agent_id in multiple joins
  3. ⚠️ Hardcoded numeric IDs (721995, 668078) - These are NOT minted IDs, preserve as upstream_agent_id

Medium Dependencies (Should Migrate)

  1. stage1_agent_detailed_export_vw.sql - Uses @agent_id parameter
  2. Window functions - Use agent_id in PARTITION BY
  3. dim_agent_hierarchy - Primary key is agent_id

Low Dependencies (No Migration Needed)

  1. API routes - Already use agent_name (no agent_id joins)
  2. Most query functions - Filter by agent_name, not agent_id

4. Migration Strategy

Phase 1: Freeze Legacy Function

  1. ✅ Move mint_agent_id_from_name() to utils/agent_id_legacy.py
  2. ✅ Mark as @deprecated with migration notice
  3. ✅ Document that new code should use agent_key

Phase 2: Create dim_agent_identity

  1. ✅ Create table (SQL already exists)
  2. ✅ Populate from existing stage3_snapshots data
  3. ✅ Map existing agent_id values to agent_key

Phase 3: Update Semantic View (Backward Compatible)

  1. Add agent_identity CTE join
  2. Update stage3_component_id hash to use agent_key (fallback to agent_id)
  3. Keep agent_id join to dim_agent_hierarchy for backward compatibility
  4. Add agent_key to final SELECT

Phase 4: Update Other Views/Queries

  1. Update bq_schema_agents.sql joins to use agent_key
  2. Update window functions to use agent_key
  3. Update stage1_agent_detailed_export_vw.sql to accept @agent_key

Phase 5: Deprecate agent_id (Future)

  1. Mark agent_id columns as deprecated
  2. Migrate all queries to agent_key
  3. Remove agent_id columns (breaking change, requires coordination)

5. Hardcoded Numeric IDs (721995, 668078)

Important Discovery

These numeric IDs are NOT generated by mint_agent_id_from_name(). They are:
  • Upstream source IDs from payroll data
  • Hardcoded business requirements (Robin Bundy = 721995, Richard Ballard = 668078)
  • Used in WHERE clauses for Stage1 filtering

Handling Strategy

  1. Preserve in dim_agent_identity: Store as upstream_agent_id
  2. Keep hardcoded filters: These are business requirements, not technical artifacts
  3. Map to agent_key: Create mapping so queries can use agent_key but still filter by upstream ID when needed

Example Mapping

-- In dim_agent_identity:
agent_key: 'AGENT:robin_bundy_canonical'
display_name: 'ROBIN BUNDY'
upstream_agent_id: '721995'  -- Preserve hardcoded ID

6. Freeze Guarantee Statement

LEGACY agent_id MINTING FUNCTION FREEZE GUARANTEE:
The function mint_agent_id_from_name() in repo_b/upload_to_bq.py is FROZEN FOREVER as of [DATE].
Frozen Behavior:
  • Same agent_name input → same agent_id output (deterministic SHA256 hash)
  • Normalization: UPPER(TRIM(collapse_spaces(agent_name)))
  • Special cases: "OWNER""OWNER", empty → "UNASSIGNED"
  • Format: "NAME:" + SHA256(normalized)[:12]
Migration Path:
New code MUST use agent_key from dim_agent_identity table.
Legacy agent_id values remain valid for backward compatibility during migration period.
Frozen Function Location: utils/agent_id_legacy.py (never modified)

7. Action Items

Immediate

  1. ✅ Document current mint_agent_id_from_name() function
  2. ✅ Identify all agent_id join dependencies
  3. ⏳ Create frozen function module (utils/agent_id_legacy.py)

Short-term

  1. ⏳ Create dim_agent_identity table
  2. ⏳ Populate with existing agents (map agent_idagent_key)
  3. ⏳ Update semantic view to use agent_key (backward compatible)

Long-term

  1. ⏳ Migrate all views/queries to agent_key
  2. ⏳ Deprecate agent_id columns
  3. ⏳ Remove agent_id dependencies (breaking change)

Status: ✅ ANALYSIS COMPLETE - Ready for migration planning