Business-Level Attribution Parity Analysis (September 2025)

Summary

After fixing business name normalization (whitespace collapse), Repo B now correctly matches businesses with Repo A. However, real attribution differences remain: for September 2025, Repo B attributes $1,925.71 more to agents and $1,925.71 less to the owner than Repo A.

Root Cause: Normalization Mismatch (FIXED)

Problem

Repo A stores business names with single spaces (e.g., "CANYON PLUMBING LLC")
Repo B was storing business names with double spaces carried over from canonical input (e.g., "canyon  plumbing LLC")
This caused businesses to appear as “MISSING_IN_REPO_A” or “MISSING_IN_REPO_B” in comparisons

Fix

Added normalize_business_name_for_storage() function that:

Collapses multiple whitespace to single space
Uppercases the result
Matches Repo A’s storage format exactly
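A minimal sketch of what such a function could look like, assuming only the behaviors listed above. The actual body in repo_b/output_adapter.py is not reproduced in this analysis; trimming leading/trailing whitespace and mapping None to an empty string are assumptions added here for illustration.

```python
import re

# Matches any run of one or more whitespace characters (spaces, tabs, newlines).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_business_name_for_storage(name):
    """Collapse whitespace runs to a single space and uppercase the result.

    Sketch only: stripping the ends and returning "" for None are
    assumptions, not behavior confirmed by this analysis.
    """
    if name is None:
        return ""
    collapsed = _WHITESPACE_RUN.sub(" ", str(name)).strip()
    return collapsed.upper()


# A double-spaced, mixed-case input maps to Repo A's single-spaced uppercase form.
print(normalize_business_name_for_storage("canyon  plumbing LLC"))
```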

Location: repo_b/output_adapter.py, lines 214-228
Applied to: all four places where business_name is set in build_stage3_like_dataframe():

Agent commission rows (line 632)
Owner residual rows (line 753)
Missing agent mapping → Owner rows (line 840)
Richard Ballard owner rows (line 916)

Remaining Attribution Differences

Top Businesses by Delta (September 2025)

Business Name	Repo A Agent	Repo B Agent	Agent Delta	Root Cause Class
CANYON PLUMBING LLC	$553.84	$2,408.00	$1,854.16	ATTRIBUTION_DIFF
HAWAIIAN HUT INC	$162.00	$198.00	$36.00	ATTRIBUTION_DIFF
HAWAIIAN HUT II INC	$90.00	$110.00	$20.00	ATTRIBUTION_DIFF
C&J LIQUORS LLC DBA SPIRITS WORLD	$45.00	$55.00	$10.00	ATTRIBUTION_DIFF
NUG	$3,102.45	$3,108.00	$5.55	ATTRIBUTION_DIFF (within tolerance)

Total Deltas

Agent Delta: $1,925.71 (Repo B > Repo A)
Owner Delta: -$1,925.71 (Repo B < Repo A)
Net: $0.00 (invariant preserved: gross = agent + owner + chargebacks)
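As a quick arithmetic check, the five rows in the table above can be summed and compared with the reported totals. The sketch below uses only figures from this analysis; the variable names are illustrative.

```python
from decimal import Decimal

# (Repo A agent amount, Repo B agent amount) per business, from the table above.
agent_amounts = {
    "CANYON PLUMBING LLC": (Decimal("553.84"), Decimal("2408.00")),
    "HAWAIIAN HUT INC": (Decimal("162.00"), Decimal("198.00")),
    "HAWAIIAN HUT II INC": (Decimal("90.00"), Decimal("110.00")),
    "C&J LIQUORS LLC DBA SPIRITS WORLD": (Decimal("45.00"), Decimal("55.00")),
    "NUG": (Decimal("3102.45"), Decimal("3108.00")),
}

# Agent delta per business is Repo B minus Repo A.
agent_deltas = {name: b - a for name, (a, b) in agent_amounts.items()}
total_agent_delta = sum(agent_deltas.values(), Decimal("0"))

# Owner delta as reported in this analysis.
reported_owner_delta = Decimal("-1925.71")

print(total_agent_delta)                         # 1925.71
print(total_agent_delta + reported_owner_delta)  # 0.00
```

The five listed rows sum exactly to the reported $1,925.71 agent delta, and the agent and owner deltas cancel.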

Root Cause Classification

Primary Issue: CANYON PLUMBING LLC ($1,854.16 delta)

Hypothesis: Repo B attributes Canyon Plumbing to agents where Repo A attributes it to the owner, or Repo B uses different PEPM rates or row counts.

Investigation needed:

Check Repo A’s agent2_pepm mapping for Canyon Plumbing
Check Repo B’s agent2_pepm mapping for Canyon Plumbing
Compare row_count/employee_count between Repo A and Repo B
Verify PEPM rate calculation (normalized_pepm = pepm_rate * 12 / pay_periods)
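The normalization formula in the last item can be written out as a small helper for spot-checking rates on both sides. This is a sketch of the stated formula only; the function name, the use of Decimal, and the sample rates are assumptions, and no rounding is applied because this analysis does not say where either repo rounds.

```python
from decimal import Decimal


def normalized_pepm(pepm_rate, pay_periods):
    """Apply normalized_pepm = pepm_rate * 12 / pay_periods.

    pepm_rate is a Decimal; pay_periods is the number of pay periods per year.
    """
    if pay_periods <= 0:
        raise ValueError("pay_periods must be positive")
    return pepm_rate * Decimal(12) / Decimal(pay_periods)


# Hypothetical rate of 2.00 under three pay-period counts (not values from either repo).
for periods in (12, 24, 26):
    print(periods, normalized_pepm(Decimal("2.00"), periods))
```

With 12 pay periods the rate is unchanged, and with 24 it is halved. If the two repos disagree on pay_periods for a business, the commission basis diverges even when pepm_rate is identical.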

Secondary Issues: Smaller Deltas ($36, $20, $10)

These are likely due to:

Fuzzy match threshold differences: Repo B uses ≥95% similarity; Repo A may use different threshold
Agent2/downline attribution differences: Repo B may be matching to different agents than Repo A
Row count differences: Repo B uses row_count (post-pairing transactions) for commission; Repo A may use different basis
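To illustrate how sensitive a 95% threshold can be, the sketch below scores two business names with Python's standard-library difflib. The similarity metric Repo B actually uses is not stated in this analysis, so SequenceMatcher is only a stand-in, and the helper names are hypothetical.

```python
from difflib import SequenceMatcher

# Repo B's stated threshold (>= 95% similarity).
FUZZY_THRESHOLD = 0.95


def similarity(a, b):
    """Return a 0..1 similarity score (difflib's ratio, used here as a stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()


def is_fuzzy_match(a, b, threshold=FUZZY_THRESHOLD):
    """True when the two names meet or exceed the threshold."""
    return similarity(a, b) >= threshold


pair = ("HAWAIIAN HUT INC", "HAWAIIAN HUT II INC")
print(round(similarity(*pair), 3), is_fuzzy_match(*pair))
```

Under this stand-in metric the two Hawaiian Hut names score below 0.95 and stay distinct. A lower threshold or a different metric on the Repo A side could merge or split near-identical names differently, which is one way small per-business deltas could arise.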

Next Steps

Task 1: Investigate Canyon Plumbing Attribution

Query Repo A’s stage3_snapshots for Canyon Plumbing (agent vs owner breakdown)
Query Repo B’s Stage3_Full_Detail CSV for Canyon Plumbing
Compare:
- Which agents are attributed
- PEPM rates used
- Row counts / employee counts
- Commission calculation basis
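One way to structure the comparison is to total commission per payee for a single business on each side, then subtract. The sketch below assumes rows have already been loaded as dictionaries; the column names (business_name, payee_type, payee_name, commission) and the sample rows are hypothetical and do not reflect the real schema of stage3_snapshots or the Stage3_Full_Detail CSV.

```python
from collections import defaultdict
from decimal import Decimal


def commission_by_payee(rows, business_name):
    """Sum commission per (payee_type, payee_name) for one business."""
    totals = defaultdict(Decimal)  # Decimal() is 0
    for row in rows:
        if row["business_name"] != business_name:
            continue
        key = (row["payee_type"], row["payee_name"])
        totals[key] += Decimal(str(row["commission"]))
    return dict(totals)


def breakdown_delta(repo_a_totals, repo_b_totals):
    """Return Repo B minus Repo A for every payee seen on either side, dropping zeros."""
    zero = Decimal("0")
    keys = sorted(set(repo_a_totals) | set(repo_b_totals))
    deltas = {k: repo_b_totals.get(k, zero) - repo_a_totals.get(k, zero) for k in keys}
    return {k: d for k, d in deltas.items() if d != zero}


# Made-up rows for a made-up business; not data from either repo.
repo_a_rows = [
    {"business_name": "EXAMPLE CO", "payee_type": "agent", "payee_name": "AGENT X", "commission": "10.00"},
    {"business_name": "EXAMPLE CO", "payee_type": "owner", "payee_name": "OWNER", "commission": "30.00"},
]
repo_b_rows = [
    {"business_name": "EXAMPLE CO", "payee_type": "agent", "payee_name": "AGENT X", "commission": "25.00"},
    {"business_name": "EXAMPLE CO", "payee_type": "owner", "payee_name": "OWNER", "commission": "15.00"},
]

delta = breakdown_delta(
    commission_by_payee(repo_a_rows, "EXAMPLE CO"),
    commission_by_payee(repo_b_rows, "EXAMPLE CO"),
)
for payee, amount in delta.items():
    print(payee, amount)
```

A payee that appears on only one side shows up with its full amount as the delta, which makes an agent-versus-owner reassignment easy to spot.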

Task 2: Verify Agent Matching Logic

Check if Repo B’s fuzzy matching is finding different agents than Repo A
Verify agent2_pepm normalization matches Repo A exactly
Check if Repo B is defaulting to owner when Repo A has agents (or vice versa)

Task 3: Regression Guardrail

Add a regression check for Sep 2025:

ABS(agent_delta) <= 0.01 (currently $1,925.71, so FAIL)
ABS(owner_delta) <= 0.01 (currently -$1,925.71, so FAIL)
Keep invariant enforced: gross = agent + owner + chargebacks ✅ (PASS)
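The three checks could be expressed as a small function that a test suite calls per month. This is a sketch under stated assumptions: the function name and result keys are hypothetical, the invariant is compared exactly on Decimal values, and the gross/agent/owner/chargeback amounts in the example are placeholders because this analysis reports only the deltas.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")


def parity_checks(agent_delta, owner_delta, gross, agent, owner, chargebacks):
    """Return a PASS/FAIL flag (True/False) for each guardrail condition."""
    return {
        "agent_delta": abs(agent_delta) <= TOLERANCE,
        "owner_delta": abs(owner_delta) <= TOLERANCE,
        "invariant": gross == agent + owner + chargebacks,
    }


# September 2025 deltas from this analysis; the four amounts are placeholders.
result = parity_checks(
    agent_delta=Decimal("1925.71"),
    owner_delta=Decimal("-1925.71"),
    gross=Decimal("100.00"),
    agent=Decimal("60.00"),
    owner=Decimal("35.00"),
    chargebacks=Decimal("5.00"),
)
for name, passed in result.items():
    print(name, "PASS" if passed else "FAIL")
```

With the current September deltas, both delta checks fail while the invariant check passes, matching the status listed above.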

Files Modified

repo_b/output_adapter.py:
- Added normalize_business_name_for_storage() function (lines 214-228)
- Updated 4 locations to normalize business_name before storage (lines 632, 753, 840, 916)

Parity Status

June 2025: FAIL (agent delta: $255.72)
July 2025: FAIL (agent delta: $1,130.61)
August 2025: FAIL (agent delta: $1,925.71)
September 2025: FAIL (agent delta: $1,925.71) + 2 UNDERFUNDED_TPA anomalies (expected)

All months: Gross, chargebacks, and net payout match perfectly ✅
All months: Agent/owner split has deltas (needs investigation) ❌

Conclusion

The normalization fix successfully resolved the business name matching issue. The remaining $1,925.71 delta is a real attribution difference that requires investigation into:

Agent matching logic (fuzzy vs exact)
PEPM rate application
Row count vs employee count basis for commission calculation
Agent2/downline attribution rules

The invariant gross = agent + owner + chargebacks is preserved, indicating the math is correct but the attribution rules differ between Repo A and Repo B.
