Skip to main content

Plan Red Flags & Alignment Review

Date: 2025-01-XX
Status: πŸ”΄ CRITICAL RED FLAGS FOUND

πŸ”΄ RED FLAG #1: dim_agent_hierarchy Missing tenant_id

Issue

Semantic view references tenant_id in agent_hierarchy CTE, but schema doesn’t have it Location: integration/bigquery/sql/views/semantic_view_commission_traceability.sql (lines 214, 227) Current Code:
agent_hierarchy AS (
  SELECT
    tenant_id,  -- ❌ NOT IN SCHEMA
    agent_id,
    agent_name,
    ...
  FROM `payroll-bi-gauntlet.payroll_analytics.dim_agent_hierarchy`
  ...
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY tenant_id, agent_id  -- ❌ tenant_id doesn't exist
    ...
  ) = 1
)
Schema Reality: dim_agent_hierarchy_schema.json has NO tenant_id field Impact:
  • ⚠️ CRITICAL: View will fail at runtime (column not found)
  • ⚠️ Tenant isolation broken: Cannot enforce tenant scoping in hierarchy joins
  • ⚠️ QA tests will fail: Tenant isolation checks assume tenant_id exists
Required Fix:
  1. Add tenant_id to dim_agent_hierarchy schema
  2. Create migration script to backfill tenant_id
  3. Update loader to include tenant_id
  4. OR: Remove tenant_id from semantic view CTE (if single-tenant assumption)

πŸ”΄ RED FLAG #2: agent_key Generation Inconsistency

Issue

Three different agent_key generation strategies documented Strategy A (Execution Plan - Line 102):
CONCAT('AGENT:', TO_HEX(SHA256(UPPER(TRIM(agent_name))))[:16]) AS agent_key
Strategy B (Guarantees Doc - Line 275):
hash_obj = hashlib.sha256(normalized.encode('utf-8'))
return f"AGENT:{hash_obj.hexdigest()[:16]}"
Strategy C (dim_agent_identity.sql Example - Line 31):
'AGENT:721995' AS agent_key,  -- Example: use upstream ID if stable
Problem:
  • Strategy A/B: Hash-based (deterministic, but different from legacy agent_id)
  • Strategy C: Uses upstream numeric ID (721995) - NOT deterministic from name
Impact:
  • ⚠️ Inconsistent keys: Same agent could get different agent_key depending on which strategy is used
  • ⚠️ Breaks determinism: If using upstream ID, agent_key is not deterministic from agent_name
  • ⚠️ Migration confusion: Unclear which strategy to use
Required Decision:
  • Option 1: Always use hash-based (deterministic from agent_name)
  • Option 2: Use upstream ID if available, hash otherwise (requires mapping logic)
  • Option 3: Use hash for all, store upstream ID separately in upstream_agent_id
Recommendation: Option 3 (hash-based, deterministic, upstream ID preserved separately)

πŸ”΄ RED FLAG #3: Join Strategy Unclear

Issue

No explicit join strategy documented for dim_agent_hierarchy β†’ dim_agent_identity Current State:
  • dim_agent_hierarchy.agent_id: STRING (can be numeric β€œ721995” or minted β€œNAME:abc123”)
  • dim_agent_identity.upstream_agent_id: STRING (nullable, stores upstream ID)
  • dim_agent_identity.agent_key: STRING (hash-based, deterministic)
Semantic View Join (Line 371):
LEFT JOIN agent_hierarchy ah
  ON s1c.tenant_id = ah.tenant_id  -- ❌ tenant_id doesn't exist in hierarchy
  AND s1c.agent_id = ah.agent_id
Problem:
  • How to join dim_agent_identity to dim_agent_hierarchy?
  • Should join be: agent_key β†’ upstream_agent_id β†’ dim_agent_hierarchy.agent_id?
  • Or: agent_name match?
  • Or: Direct agent_id match (if hierarchy uses minted IDs)?
Required Clarification:
  1. What is the join key between dim_agent_identity and dim_agent_hierarchy?
  2. Does dim_agent_hierarchy.agent_id contain:
    • Upstream numeric IDs (721995, 668078)?
    • Minted hash IDs (β€œNAME:abc123”)?
    • Both?
  3. Should semantic view join via agent_key β†’ upstream_agent_id β†’ dim_agent_hierarchy.agent_id?

🟑 YELLOW FLAG #1: agent_key Hash Length Mismatch

Issue

Legacy agent_id uses 12 hex chars, agent_key uses 16 hex chars Legacy (mint_agent_id_from_name):
return f"NAME:{hash_obj.hexdigest()[:12]}"  # 12 chars
New (agent_key):
CONCAT('AGENT:', TO_HEX(SHA256(...))[:16])  # 16 chars
Impact:
  • ⚠️ Different lengths: agent_id = β€œNAME:abc123def456” (18 chars), agent_key = β€œAGENT:abc123def4567890” (22 chars)
  • ⚠️ Not directly comparable: Cannot use agent_id as fallback for agent_key in hash
Status: βœ… ACCEPTABLE - Different prefixes and lengths prevent confusion, but should be documented

🟑 YELLOW FLAG #2: Migration Script Syntax Error

Issue

BigQuery array slicing syntax [:16] may not work Location: docs/IDENTITY_STABILITY_EXECUTION_PLAN.md (Line 102) Code:
CONCAT('AGENT:', TO_HEX(SHA256(UPPER(TRIM(agent_name))))[:16]) AS agent_key
Problem: BigQuery doesn’t support Python-style array slicing [:16] Correct Syntax:
CONCAT('AGENT:', SUBSTR(TO_HEX(SHA256(UPPER(TRIM(agent_name)))), 1, 16)) AS agent_key
Impact: ⚠️ Migration script will fail at runtime Required Fix: Update all SQL snippets to use SUBSTR() instead of [:16]

🟑 YELLOW FLAG #3: Missing Migration File

Issue

Migration script referenced but not created Location: docs/IDENTITY_STABILITY_EXECUTION_PLAN.md (Step 1.3, Step 2.3) Files Referenced:
  • integration/bigquery/sql/migrations/add_tenant_id_to_business_mapping.sql (NOT CREATED)
  • integration/bigquery/sql/migrations/populate_agent_identity.sql (NOT CREATED)
Status: ⚠️ MISSING - Need to create these files

βœ… ALIGNMENT CHECKS

Schema Alignment

  • βœ… dim_business_mapping_schema.json includes tenant_id (updated)
  • βœ… dim_agent_identity_schema.json created with correct fields
  • ❌ dim_agent_hierarchy_schema.json missing tenant_id (RED FLAG #1)

MERGE Key Alignment

  • βœ… dim_business_mapping: MERGE key (tenant_id, normalized_name) documented
  • βœ… dim_agent_identity: MERGE key (tenant_id, agent_key) documented
  • βœ… Both immutable fields (business_id, agent_key) correctly marked as never updated

Data Type Alignment

  • βœ… stage3_snapshots.agent_id: STRING
  • βœ… dim_agent_hierarchy.agent_id: STRING
  • βœ… Type-compatible for joins

Hash Signature Alignment

  • βœ… stage3_component_id hash documented in both view and tests
  • βœ… stage1_row_id hash documented in both view and tests
  • ⚠️ agent_key generation inconsistent (RED FLAG #2)

πŸ”΄ CRITICAL ACTION ITEMS

Immediate (Blocking)

  1. Fix RED FLAG #1: Add tenant_id to dim_agent_hierarchy schema OR remove from semantic view
  2. Fix RED FLAG #2: Standardize agent_key generation strategy (recommend hash-based)
  3. Fix RED FLAG #3: Document explicit join strategy for dim_agent_hierarchy β†’ dim_agent_identity

Short-term (Required)

  1. Fix YELLOW FLAG #2: Update SQL syntax ([:16] β†’ SUBSTR(..., 1, 16))
  2. Fix YELLOW FLAG #3: Create migration SQL files
  3. Clarify: How to handle hardcoded numeric IDs (721995, 668078) in dim_agent_hierarchy

Summary: Red Flags Found

FlagSeverityIssueImpact
#1πŸ”΄ CRITICALdim_agent_hierarchy missing tenant_idView will fail, tenant isolation broken
#2πŸ”΄ CRITICALagent_key generation inconsistentNon-deterministic keys, migration confusion
#3πŸ”΄ CRITICALJoin strategy unclearCannot implement semantic view correctly
#4🟑 HIGHSQL syntax error ([:16])Migration script will fail
#5🟑 MEDIUMMigration files missingCannot execute plan
Status: ⚠️ PLAN NOT READY FOR EXECUTION - Must fix RED FLAGS first