
Business-Level Attribution Parity Analysis (September 2025)

Summary

After fixing business name normalization (whitespace collapse and uppercasing), Repo B now correctly matches businesses with Repo A. However, real attribution differences remain, producing a $1,925.71 delta in the agent/owner split.

Root Cause: Normalization Mismatch (FIXED)

Problem

  • Repo A stores business names with single spaces (e.g., "CANYON PLUMBING LLC")
  • Repo B was storing business names exactly as they appeared in the canonical input, including runs of multiple spaces and mixed casing (e.g., "canyon plumbing LLC")
  • This caused businesses to appear as “MISSING_IN_REPO_A” or “MISSING_IN_REPO_B” in comparisons

Fix

Added normalize_business_name_for_storage() function that:
  1. Collapses each run of whitespace to a single space
  2. Uppercases the result
  3. Matches Repo A’s storage format exactly
Location: repo_b/output_adapter.py, lines 214-228.
Applied to all four places where business_name is set in build_stage3_like_dataframe():
  • Agent commission rows (line 632)
  • Owner residual rows (line 753)
  • Missing agent mapping → Owner rows (line 840)
  • Richard Ballard owner rows (line 916)
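A minimal sketch of what normalize_business_name_for_storage() could look like, built only from the three steps listed above. It is not the code at lines 214-228; stripping leading/trailing whitespace and mapping None to an empty string are assumptions.

```python
import re
from typing import Optional

# Any run of whitespace (spaces, tabs, newlines; Unicode-aware for str patterns).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_business_name_for_storage(name: Optional[str]) -> str:
    """Sketch: collapse whitespace runs to one space, then uppercase.

    Assumptions (not confirmed by the source): leading/trailing whitespace
    is stripped, and None is stored as an empty string.
    """
    if name is None:
        return ""
    collapsed = _WHITESPACE_RUN.sub(" ", name).strip()
    return collapsed.upper()
```

For example, an input of "canyon  plumbing LLC" (two spaces) becomes "CANYON PLUMBING LLC", the same key Repo A stores, so both sides of the comparison join on identical strings.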

Remaining Attribution Differences

Top Businesses by Delta (September 2025)

| Business Name | Repo A Agent | Repo B Agent | Agent Delta | Root Cause Class |
|---|---|---|---|---|
| CANYON PLUMBING LLC | $553.84 | $2,408.00 | $1,854.16 | ATTRIBUTION_DIFF |
| HAWAIIAN HUT INC | $162.00 | $198.00 | $36.00 | ATTRIBUTION_DIFF |
| HAWAIIAN HUT II INC | $90.00 | $110.00 | $20.00 | ATTRIBUTION_DIFF |
| C&J LIQUORS LLC DBA SPIRITS WORLD | $45.00 | $55.00 | $10.00 | ATTRIBUTION_DIFF |
| NUG | $3,102.45 | $3,108.00 | $5.55 | ATTRIBUTION_DIFF (within tolerance) |
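The five rows in the table account for the full net agent delta. Summing the per-business differences with Decimal (to avoid float rounding on currency amounts) reproduces the $1,925.71 total; the values below are copied from the table, and the helper name is illustrative.

```python
from decimal import Decimal

# (business, Repo A agent amount, Repo B agent amount), copied from the table.
ROWS = [
    ("CANYON PLUMBING LLC", Decimal("553.84"), Decimal("2408.00")),
    ("HAWAIIAN HUT INC", Decimal("162.00"), Decimal("198.00")),
    ("HAWAIIAN HUT II INC", Decimal("90.00"), Decimal("110.00")),
    ("C&J LIQUORS LLC DBA SPIRITS WORLD", Decimal("45.00"), Decimal("55.00")),
    ("NUG", Decimal("3102.45"), Decimal("3108.00")),
]


def agent_deltas(rows):
    """Return {business: repo_b_agent - repo_a_agent}."""
    return {name: repo_b - repo_a for name, repo_a, repo_b in rows}


deltas = agent_deltas(ROWS)
total = sum(deltas.values(), Decimal("0"))
print(total)  # 1925.71
```

Because NUG's $5.55 is part of that total, fixing only the four ATTRIBUTION_DIFF rows above it would still leave the month above the $0.01 threshold used in Task 3.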

Total Deltas

  • Agent Delta: $1,925.71 (Repo B > Repo A)
  • Owner Delta: -$1,925.71 (Repo B < Repo A)
  • Net: $0.00 (invariant preserved: gross = agent + owner + chargebacks)

Root Cause Classification

Primary Issue: CANYON PLUMBING LLC ($1,854.16 delta)

Hypothesis: Repo B attributes Canyon Plumbing to agents where Repo A attributes it to the owner, or Repo B uses different PEPM rates or row counts.
Investigation Needed:
  1. Check Repo A’s agent2_pepm mapping for Canyon Plumbing
  2. Check Repo B’s agent2_pepm mapping for Canyon Plumbing
  3. Compare row_count/employee_count between Repo A and Repo B
  4. Verify PEPM rate calculation (normalized_pepm = pepm_rate * 12 / pay_periods)
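The formula in step 4 converts a per-employee-per-month rate into a per-pay-period rate by spreading 12 months over the year's pay periods. A minimal sketch, assuming pay_periods means pay periods per year (52 weekly, 26 biweekly, 24 semimonthly, 12 monthly) and that rounding to cents happens elsewhere; normalize_pepm is a hypothetical helper name.

```python
from decimal import Decimal


def normalize_pepm(pepm_rate: Decimal, pay_periods: int) -> Decimal:
    """normalized_pepm = pepm_rate * 12 / pay_periods (hypothetical helper).

    Assumes pay_periods is the number of pay periods per year.
    No rounding is applied here.
    """
    if pay_periods <= 0:
        raise ValueError("pay_periods must be positive")
    return pepm_rate * 12 / pay_periods
```

Evaluating this with Canyon Plumbing's rate and pay frequency as each repo sees them is a quick check on step 4: a disagreement on pay_periods alone (for example 24 vs 26) shifts the normalized rate by about 8%.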

Secondary Issues: Smaller Deltas ($36, $20, $10)

These are likely due to:
  • Fuzzy match threshold differences: Repo B uses ≥95% similarity; Repo A may use a different threshold
  • Agent2/downline attribution differences: Repo B may be matching different agents than Repo A
  • Row count differences: Repo B uses row_count (post-pairing transactions) as the commission basis; Repo A may use a different basis
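To show how a threshold difference alone can change attribution, here is a sketch that uses Python's difflib.SequenceMatcher ratio as a stand-in similarity score. The document does not say which metric Repo B uses; best_agent_match and the agent names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def best_agent_match(name: str, candidates: Iterable[str],
                     threshold: float = 0.95) -> Optional[str]:
    """Return the most similar candidate if its score meets threshold, else None.

    Stand-in metric: SequenceMatcher.ratio() on uppercased strings.
    None means "no agent match", which in Repo B falls through to owner rows.
    """
    best_name, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, name.upper(), cand.upper()).ratio()
        if score > best_score:
            best_name, best_score = cand, score
    return best_name if best_score >= threshold else None
```

A near-miss such as "JON SMITH" for "JOHN SMITH" scores 18/19 (about 0.947) under this metric, so it is rejected at 95% but accepted at 90%. If Repo A used a lower threshold, the same business would be credited to an agent in Repo A and routed to the owner in Repo B.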

Next Steps

Task 1: Investigate Canyon Plumbing Attribution

  1. Query Repo A’s stage3_snapshots for Canyon Plumbing (agent vs owner breakdown)
  2. Query Repo B’s Stage3_Full_Detail CSV for Canyon Plumbing
  3. Compare:
    • Which agents are attributed
    • PEPM rates used
    • Row counts / employee counts
    • Commission calculation basis
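Once both sources are loaded as rows (for example via csv.DictReader), the comparison in step 3 can start from per-payee totals for one business. A minimal sketch; the payee and amount column names are hypothetical, as is the convention of recording owner amounts under a payee such as "OWNER".

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Mapping, Tuple

Row = Mapping[str, str]  # e.g. one csv.DictReader row; column names hypothetical


def totals_by_payee(rows: Iterable[Row], business: str) -> Dict[str, Decimal]:
    """Sum amounts per payee (an agent name or the owner) for one business."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        if row["business_name"] == business:
            totals[row["payee"]] += Decimal(row["amount"])
    return dict(totals)


def attribution_diff(a_rows: Iterable[Row], b_rows: Iterable[Row],
                     business: str) -> Dict[str, Tuple[Decimal, Decimal, Decimal]]:
    """Return {payee: (repo_a, repo_b, repo_b - repo_a)} for payees that differ."""
    a = totals_by_payee(a_rows, business)
    b = totals_by_payee(b_rows, business)
    diff = {}
    for payee in sorted(a.keys() | b.keys()):
        a_amt = a.get(payee, Decimal("0"))
        b_amt = b.get(payee, Decimal("0"))
        if a_amt != b_amt:
            diff[payee] = (a_amt, b_amt, b_amt - a_amt)
    return diff
```

A payee present in only one repo shows up with a zero on the other side, which separates "different agent" from "agent vs owner" cases before looking at PEPM rates or row counts.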

Task 2: Verify Agent Matching Logic

  1. Check if Repo B’s fuzzy matching is finding different agents than Repo A
  2. Verify agent2_pepm normalization matches Repo A exactly
  3. Check if Repo B is defaulting to owner when Repo A has agents (or vice versa)

Task 3: Regression Guardrail

Add a regression check for September 2025:
  • ABS(agent_delta) <= 0.01 (currently $1,925.71, so FAIL)
  • ABS(owner_delta) <= 0.01 (currently -$1,925.71, so FAIL)
  • Keep invariant enforced: gross = agent + owner + chargebacks ✅ (PASS)
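The three checks above can be expressed as one function that returns the failed conditions. A sketch, assuming the deltas are Repo B minus Repo A, that chargebacks are already signed so the invariant is a plain sum, and that the invariant uses the same $0.01 tolerance; parity_failures is a hypothetical name.

```python
from decimal import Decimal
from typing import List

TOLERANCE = Decimal("0.01")  # threshold from the checks above


def parity_failures(agent_delta: Decimal, owner_delta: Decimal,
                    gross: Decimal, agent: Decimal, owner: Decimal,
                    chargebacks: Decimal) -> List[str]:
    """Return a list of failed checks; an empty list means PASS.

    agent_delta/owner_delta: Repo B minus Repo A for the month.
    gross/agent/owner/chargebacks: one repo's monthly totals (assumed signed).
    """
    failures = []
    if abs(agent_delta) > TOLERANCE:
        failures.append(f"agent_delta {agent_delta} exceeds {TOLERANCE}")
    if abs(owner_delta) > TOLERANCE:
        failures.append(f"owner_delta {owner_delta} exceeds {TOLERANCE}")
    if abs(gross - (agent + owner + chargebacks)) > TOLERANCE:
        failures.append("invariant gross = agent + owner + chargebacks violated")
    return failures
```

With the September deltas (1925.71 and -1925.71), the first two checks fail while the invariant check passes as long as the repo's totals reconcile, matching the FAIL/PASS pattern listed above.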

Files Modified

  1. repo_b/output_adapter.py:
    • Added normalize_business_name_for_storage() function (lines 214-228)
    • Updated 4 locations to normalize business_name before storage (lines 632, 753, 840, 916)

Parity Status

  • June 2025: FAIL (agent delta: $255.72)
  • July 2025: FAIL (agent delta: $1,130.61)
  • August 2025: FAIL (agent delta: $1,925.71)
  • September 2025: FAIL (agent delta: $1,925.71) + 2 UNDERFUNDED_TPA anomalies (expected)
All months: Gross, chargebacks, and net payout match perfectly ✅
All months: Agent/owner split has deltas (needs investigation) ❌

Conclusion

The normalization fix successfully resolved the business name matching issue. The remaining $1,925.71 delta is a real attribution difference that requires investigation into:
  1. Agent matching logic (fuzzy vs exact)
  2. PEPM rate application
  3. Row count vs employee count basis for commission calculation
  4. Agent2/downline attribution rules
The invariant gross = agent + owner + chargebacks is preserved, indicating that the monthly totals reconcile and the discrepancy lies in how Repo A and Repo B split those totals between agents and the owner, not in the gross or chargeback figures.