
Follow-up Issue: Bring agent_profiles create-mode into changeset bulk import pipeline

Issue Summary

The current bulk import pipeline (POST /api/v1/admin/agent-mapping/bulk/upload, /validate, /commit) supports agent_profiles only in update-mode: every row must reference an existing agent_entity_id. Path A onboarding (new agents without an agent_entity_id) goes through the “Bulk Bootstrap Agents” UI instead. That UI writes directly to the tables, with no changeset tracking.

Goal: Extend bulk import to support create-mode for agent_profiles. New agents should then get the same audit-grade tracking that update-mode already has in the changeset pipeline: the validate/review/commit workflow and the audit trail.

Current State

Bulk Import (changeset pipeline)

  • ✅ Uses changeset tracking (config_bulk_import_changesets table)
  • ✅ Validate → Review → Commit workflow
  • ✅ Rollback capability (changesets can be marked invalid)
  • ✅ Audit trail (who/what/when)
  • Only supports update-mode (requires existing agent_entity_id)

Bulk Bootstrap (direct writes)

  • ✅ Creates new agent_entity_id (UUIDv4 or UUIDv5)
  • ✅ Idempotent (via idempotency-key or deterministic UUIDv5)
  • ✅ Honors tenant/org scoping and effective_start_date
  • No changeset tracking (direct INSERT/MERGE)
  • No validation/review/commit workflow
  • No rollback capability
  • No audit trail (no changeset record)

Acceptance Criteria

1. Validation Phase (create-mode support)

  • validate_upload() detects create-mode when agent_entity_id is missing/blank
  • For create-mode rows:
    • Resolve agent_entity_id using same precedence as Bulk Bootstrap:
      • If agent_id provided → deterministic UUIDv5 (idempotent)
      • If agent_id missing → generate UUIDv4 (with idempotency-key)
    • Validate uniqueness within upload (dedupe key logic)
    • Validate against existing agents in BigQuery (prevent duplicates)
    • Return validation report with agent_entity_id pre-resolved for each row
  • Changeset includes agent_entity_id for all rows (both create and update modes)
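The sketch below shows how create-mode detection and agent_entity_id resolution could work. It follows the precedence listed above. Several details are assumptions, not the real resolve_agent_identity() logic:

- the UUIDv5 namespace and the `tenant|org|agent_id` name string are hypothetical placeholders;
- the idempotency table is modeled as a plain dict;
- the function names are illustrative.

```python
import uuid
from typing import Dict, Tuple

# HYPOTHETICAL namespace: the real one belongs to resolve_agent_identity()
# in api/services/identity_seed.py and is not specified in this issue.
AGENT_NAMESPACE = uuid.NAMESPACE_OID


def is_create_mode(row: dict) -> bool:
    """Create-mode when agent_entity_id is missing, None, or blank."""
    return not (row.get("agent_entity_id") or "").strip()


def resolve_agent_entity_id(
    row: dict,
    tenant_id: str,
    org_id: str,
    idempotency_key: str,
    idempotency_store: Dict[str, str],  # hypothetical stand-in for an idempotency table
) -> Tuple[str, str]:
    """Return (agent_entity_id, mode) where mode is 'create' or 'update'."""
    if not is_create_mode(row):
        return row["agent_entity_id"].strip(), "update"

    agent_id = (row.get("agent_id") or "").strip()
    if agent_id:
        # Deterministic path: same tenant/org/agent_id always maps to the same UUIDv5.
        name = f"{tenant_id}|{org_id}|{agent_id}"  # assumed name format
        return str(uuid.uuid5(AGENT_NAMESPACE, name)), "create"

    # UUIDv4 path: the ID is random, so reuse any ID already recorded for this key.
    existing = idempotency_store.get(idempotency_key)
    if existing is not None:
        return existing, "create"
    new_id = str(uuid.uuid4())
    idempotency_store[idempotency_key] = new_id
    return new_id, "create"
```

Recording the UUIDv4 against the idempotency key at validation time means a re-validated upload resolves to the same IDs. The changeset hash then stays stable.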

2. Review Phase (no workflow changes)

  • UI displays resolved agent_entity_id for create-mode rows
  • UI distinguishes create vs update rows (visual indicator)
  • Changeset hash includes agent_entity_id (deterministic)
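A deterministic changeset hash that includes agent_entity_id could be computed as in the minimal sketch below. It assumes the hash covers the fully resolved rows, serialized as canonical JSON. It also assumes that row order and column order in the CSV should not affect the result; that is a design choice, not something this issue specifies.

```python
import hashlib
import json


def changeset_hash(rows: list[dict]) -> str:
    """SHA-256 over canonicalized rows (agent_entity_id included as a normal field).

    Each row is serialized with sorted keys and compact separators, and the
    serialized rows are sorted, so reordering rows or columns yields the same hash.
    """
    canonical = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
```

This hash is only deterministic if agent_entity_id itself is. For UUIDv4 rows, the ID must therefore be reused via the idempotency key rather than regenerated on each validation.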

3. Commit Phase (create-mode writes)

  • commit_changeset_writes() handles create-mode rows:
    • For rows with resolved agent_entity_id but no existing identity:
      • Call create_agent_profile_identity() (same as Bulk Bootstrap)
      • Call upsert_agent_profile_contact() (same as Bulk Bootstrap)
    • For rows with existing agent_entity_id:
      • Use existing update logic (no changes)
  • All writes are atomic within changeset (rollback on failure)
  • Changeset status updated to COMMITTED only after all writes succeed
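The commit branching and its all-or-nothing guarantee are sketched below. This version runs against in-memory tables; in BigQuery the same guarantee would come from a multi-statement transaction, which the snapshot/restore here only imitates. The writer functions are hypothetical stand-ins for create_agent_profile_identity(), upsert_agent_profile_contact() and the existing update logic, whose real signatures are not shown in this issue.

```python
import copy
from typing import Callable


class InMemoryTables:
    """Hypothetical stand-in for the identity and contact tables."""

    def __init__(self) -> None:
        self.identities: dict[str, dict] = {}
        self.contacts: dict[str, dict] = {}


Writer = Callable[[InMemoryTables, dict], None]


# HYPOTHETICAL writers standing in for the real query functions.
def create_identity(tables: InMemoryTables, row: dict) -> None:
    tables.identities[row["agent_entity_id"]] = dict(row)


def upsert_contact(tables: InMemoryTables, row: dict) -> None:
    tables.contacts[row["agent_entity_id"]] = {"email": row["email"]}


def update_identity(tables: InMemoryTables, row: dict) -> None:
    tables.identities[row["agent_entity_id"]].update(row)


def commit_changeset(
    tables: InMemoryTables, changeset: dict, create: Writer, contact: Writer, update: Writer
) -> None:
    """Apply every row or none; set COMMITTED only after all writes succeed."""
    snapshot = copy.deepcopy((tables.identities, tables.contacts))
    try:
        for row in changeset["rows"]:
            if row["agent_entity_id"] not in tables.identities:
                # Create-mode: resolved ID but no identity exists yet.
                create(tables, row)
                contact(tables, row)
            else:
                # Update-mode: existing logic, unchanged.
                update(tables, row)
    except Exception:
        # Restore the pre-commit state; the changeset status is left untouched.
        tables.identities, tables.contacts = snapshot
        raise
    changeset["status"] = "COMMITTED"
```

If any row fails, the tables return to their pre-commit state and the exception propagates. The changeset status is not changed in that case.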

4. Idempotency

  • Create-mode rows use stable idempotency-key:
    • Format: sha256(tenant_id|org_id|normalized_email|effective_start_date|first_name|last_name)
    • Stored in changeset metadata or idempotency table
  • Re-running same CSV with same changeset hash → no duplicates
  • Idempotency-key used for UUIDv4 path (when agent_id missing)
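The idempotency-key format above can be computed as shown. The sketch assumes that "normalized_email" means trimmed and lowercased, and that effective_start_date is rendered as an ISO date string. Both rules are assumptions the issue does not pin down.

```python
import datetime
import hashlib
from typing import Union


def idempotency_key(
    tenant_id: str,
    org_id: str,
    email: str,
    effective_start_date: Union[str, datetime.date],
    first_name: str,
    last_name: str,
) -> str:
    """sha256(tenant_id|org_id|normalized_email|effective_start_date|first_name|last_name)."""
    normalized_email = email.strip().lower()  # assumed normalization rule
    parts = [
        tenant_id,
        org_id,
        normalized_email,
        str(effective_start_date),  # date objects render as YYYY-MM-DD
        first_name,
        last_name,
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

With a plain "|" join, two different inputs can collide if any field itself contains "|". Escaping the delimiter or length-prefixing each field would close that gap.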

5. Rollback Capability

  • Changeset can be marked ROLLED_BACK (new status)
  • Rollback operation:
    • For create-mode rows: Delete identity + contact rows (by agent_entity_id)
    • For update-mode rows: Revert to previous SCD2 window (existing logic)
  • Rollback is atomic (all-or-nothing)
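The rollback rules can be sketched against in-memory tables. The sketch assumes that each identity is stored as a list of SCD2 versions, and that each version carries the changeset_id that wrote it plus effective_start_date and effective_end_date, with None meaning "current". That layout is hypothetical. Create-mode rows lose their identity and contact; update-mode rows drop the versions this changeset wrote and reopen the latest earlier window.

```python
import copy


def rollback_changeset(tables: dict, changeset: dict) -> None:
    """All-or-nothing rollback of a COMMITTED changeset (in-memory sketch)."""
    if changeset["status"] != "COMMITTED":
        raise ValueError("only COMMITTED changesets can be rolled back")
    snapshot = copy.deepcopy(tables)
    try:
        for row in changeset["rows"]:
            eid = row["agent_entity_id"]
            if row["mode"] == "create":
                # Create-mode: delete identity and contact by agent_entity_id.
                del tables["identities"][eid]
                tables["contacts"].pop(eid, None)
            else:
                # Update-mode: drop versions written by this changeset...
                versions = [
                    v for v in tables["identities"][eid]
                    if v["changeset_id"] != changeset["id"]
                ]
                if not versions:
                    raise RuntimeError(f"no prior SCD2 window for {eid}")
                # ...and reopen the most recent remaining window.
                previous = max(versions, key=lambda v: v["effective_start_date"])
                previous["effective_end_date"] = None
                tables["identities"][eid] = versions
    except Exception:
        # Any failure restores the full pre-rollback state.
        tables.clear()
        tables.update(snapshot)
        raise
    changeset["status"] = "ROLLED_BACK"
```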

6. Audit Trail

  • Changeset record includes:
    • created_count (new identities created)
    • updated_count (existing identities updated)
    • contact_created_count (new contacts created)
    • contact_updated_count (existing contacts updated)
  • Changeset metadata includes idempotency-keys for create-mode rows
  • All writes include created_by from changeset committed_by
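The audit fields could be derived from per-row write outcomes. The sketch assumes that each committed row reports an (identity_action, contact_action) pair, with each action being "created" or "updated". It also assumes the record keeps the committing user and a metadata dict of idempotency keys. These shapes are illustrative, not the actual changeset schema.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_audit_record(
    outcomes: List[Tuple[str, str]],  # hypothetical (identity_action, contact_action) per row
    committed_by: str,
    idempotency_keys: Iterable[str],
) -> dict:
    """Summarize row outcomes into the changeset audit fields."""
    identity = Counter(identity_action for identity_action, _ in outcomes)
    contact = Counter(contact_action for _, contact_action in outcomes)
    return {
        "created_count": identity["created"],  # Counter returns 0 for missing keys
        "updated_count": identity["updated"],
        "contact_created_count": contact["created"],
        "contact_updated_count": contact["updated"],
        "committed_by": committed_by,  # copied into created_by on every write
        "metadata": {"idempotency_keys": sorted(idempotency_keys)},
    }
```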

7. Backward Compatibility

  • Update-mode continues to work (no breaking changes)
  • Existing changesets remain valid
  • API contract unchanged (same request/response shapes)

Implementation Notes

Key Functions to Modify

  1. api/services/bulk_import.py:
    • validate_upload(): Add create-mode detection and agent_entity_id resolution
    • _build_dedupe_key_agent_profiles(): Already supports missing agent_entity_id (no changes needed)
  2. api/bigquery/phase4_bulk_import_queries.py:
    • commit_changeset_writes(): Add create-mode branch (call create_agent_profile_identity() + upsert_agent_profile_contact())
  3. api/bigquery/phase4_profile_queries.py:
    • Reuse existing create_agent_profile_identity() and upsert_agent_profile_contact() (no changes)
  4. api/services/identity_seed.py:
    • Reuse existing resolve_agent_identity() for UUIDv5 path (no changes)

Testing Requirements

  • Unit tests: Create-mode validation logic
  • Integration tests: Full create-mode workflow (upload → validate → commit)
  • Idempotency tests: Re-run same CSV → no duplicates
  • Rollback tests: Rollback create-mode changeset → identities deleted
  • Backward compatibility tests: Update-mode still works

Affected Files

  • api/services/bulk_import.py (validation logic)
  • api/bigquery/phase4_bulk_import_queries.py (commit logic)
  • api/bigquery/phase4_profile_queries.py (identity/contact creation)
  • api/services/identity_seed.py (UUIDv5 resolution)
  • api/routes/bulk_import.py (endpoints)
  • dashboard/src/features/agentMapping/components/BulkImportWizard.tsx (UI)

Priority

Medium: Path A onboarding already works via Bulk Bootstrap but lacks audit-grade tracking. This is a quality-of-life improvement driven by compliance/audit requirements.

Dependencies

  • Bulk Bootstrap must remain functional (current Path A solution)
  • Changeset pipeline must remain backward compatible