Business-Level Attribution Parity Analysis (September 2025)
Summary
After fixing business name normalization (whitespace collapse), Repo B now correctly matches businesses with Repo A. However, real attribution differences remain, causing a $1,925.71 agent/owner split delta.
Root Cause: Normalization Mismatch (FIXED)
Problem
- Repo A stores business names with single spaces (e.g., "CANYON PLUMBING LLC")
- Repo B was storing business names with double spaces from canonical input (e.g., "canyon  plumbing LLC")
- This caused businesses to appear as “MISSING_IN_REPO_A” or “MISSING_IN_REPO_B” in comparisons
Fix
Added normalize_business_name_for_storage() function that:
- Collapses multiple whitespace to single space
- Uppercases the result
- Matches Repo A’s storage format exactly
Location: repo_b/output_adapter.py lines 214-228
Applied to: All four places where business_name is set in build_stage3_like_dataframe():
- Agent commission rows (line 632)
- Owner residual rows (line 753)
- Missing agent mapping → Owner rows (line 840)
- Richard Ballard owner rows (line 916)
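A minimal sketch of what the normalization described above might look like. The real body of normalize_business_name_for_storage() in repo_b/output_adapter.py is not reproduced here; the regex-based collapse and the stripping of leading/trailing whitespace are assumptions.

```python
import re


def normalize_business_name_for_storage(name: str) -> str:
    """Sketch only: collapse whitespace runs and uppercase the result."""
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    collapsed = re.sub(r"\s+", " ", name)
    # Assumption: leading/trailing whitespace is dropped as well.
    return collapsed.strip().upper()


# A double-spaced, mixed-case input ends up in the single-space,
# uppercase form that Repo A stores.
print(normalize_business_name_for_storage("canyon  plumbing LLC"))
```

Applying the same function at every point where business_name is written keeps the stored key identical regardless of which row type produced it.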
Remaining Attribution Differences
Top Businesses by Delta (September 2025)
| Business Name | Repo A Agent | Repo B Agent | Agent Delta | Root Cause Class |
|---|---|---|---|---|
| CANYON PLUMBING LLC | $553.84 | $2,408.00 | $1,854.16 | ATTRIBUTION_DIFF |
| HAWAIIAN HUT INC | $162.00 | $198.00 | $36.00 | ATTRIBUTION_DIFF |
| HAWAIIAN HUT II INC | $90.00 | $110.00 | $20.00 | ATTRIBUTION_DIFF |
| C&J LIQUORS LLC DBA SPIRITS WORLD | $45.00 | $55.00 | $10.00 | ATTRIBUTION_DIFF |
| NUG | $3,102.45 | $3,108.00 | $5.55 | ATTRIBUTION_DIFF (within tolerance) |
Total Deltas
- Agent Delta: $1,925.71 (Repo B > Repo A)
- Owner Delta: -$1,925.71 (Repo B < Repo A)
- Net: $0.00 (invariant preserved: gross = agent + owner + chargebacks)
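As a cross-check, the totals can be recomputed from the per-business table. This sketch copies the five rows listed above and assumes gross and chargebacks are identical in both repos, so the owner delta is simply the negated agent delta. The names agent_amounts and agent_delta_total are illustrative, not part of either repo.

```python
from decimal import Decimal

# (Repo A agent, Repo B agent) per business, copied from the table above.
agent_amounts = {
    "CANYON PLUMBING LLC": (Decimal("553.84"), Decimal("2408.00")),
    "HAWAIIAN HUT INC": (Decimal("162.00"), Decimal("198.00")),
    "HAWAIIAN HUT II INC": (Decimal("90.00"), Decimal("110.00")),
    "C&J LIQUORS LLC DBA SPIRITS WORLD": (Decimal("45.00"), Decimal("55.00")),
    "NUG": (Decimal("3102.45"), Decimal("3108.00")),
}


def agent_delta_total(amounts):
    """Sum of (Repo B - Repo A) agent amounts across businesses."""
    return sum((b - a for a, b in amounts.values()), Decimal("0"))


agent_delta = agent_delta_total(agent_amounts)
# Assumption: gross and chargebacks match, so owner moves by the opposite amount.
owner_delta = -agent_delta

print(agent_delta, owner_delta, agent_delta + owner_delta)
```

The five listed rows account for the full $1,925.71, with Canyon Plumbing alone contributing $1,854.16 of it.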
Root Cause Classification
Primary Issue: CANYON PLUMBING LLC ($1,854.16 delta)
Hypothesis: Repo B is attributing Canyon Plumbing to agents when Repo A attributes it to owner, OR Repo B is using different PEPM rates/row counts.
Investigation Needed:
- Check Repo A’s agent2_pepm mapping for Canyon Plumbing
- Check Repo B’s agent2_pepm mapping for Canyon Plumbing
- Compare row_count/employee_count between Repo A and Repo B
- Verify PEPM rate calculation (normalized_pepm = pepm_rate * 12 / pay_periods)
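The PEPM normalization formula in the last item can be sketched as follows. This assumes pay_periods means pay periods per year and uses Decimal to avoid float rounding; the rates in the example are made up for illustration and are not taken from either repo.

```python
from decimal import Decimal


def normalized_pepm(pepm_rate: Decimal, pay_periods: int) -> Decimal:
    """normalized_pepm = pepm_rate * 12 / pay_periods (sketch)."""
    if pay_periods <= 0:
        raise ValueError("pay_periods must be positive")
    return pepm_rate * Decimal(12) / Decimal(pay_periods)


# Hypothetical rate of 10.00 per employee per month:
monthly = normalized_pepm(Decimal("10.00"), 12)      # 12 periods: unchanged
semimonthly = normalized_pepm(Decimal("10.00"), 24)  # 24 periods: halved
print(monthly, semimonthly)
```

If the two repos disagree on pay_periods for the same business, the per-period rate differs by a constant factor, which would scale that business's commission accordingly.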
Secondary Issues: Smaller Deltas ($36, $20, $10)
These are likely due to:
- Fuzzy match threshold differences: Repo B uses ≥95% similarity; Repo A may use a different threshold
- Agent2/downline attribution differences: Repo B may be matching to different agents than Repo A
- Row count differences: Repo B uses row_count (post-pairing transactions) for commission; Repo A may use a different basis
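To illustrate how a threshold difference alone can change a match decision, here is a small sketch. Neither repo's similarity metric is stated in this report, so Python's difflib.SequenceMatcher is used purely as a stand-in, and the two strings are only sample inputs, not a claim about what either repo actually compares.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; stand-in for the repos' real metric."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(a: str, b: str, threshold: float) -> bool:
    """True when the similarity meets or exceeds the threshold."""
    return similarity(a, b) >= threshold


left, right = "HAWAIIAN HUT INC", "HAWAIIAN HUT II INC"

# The same pair is rejected at a 95% threshold but accepted at 90%.
print(is_match(left, right, 0.95))
print(is_match(left, right, 0.90))
```

A pair that falls between the two thresholds is attributed to an agent in one repo and falls through to a different rule in the other, which is the kind of small per-business delta seen above.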
Next Steps
Task 1: Investigate Canyon Plumbing Attribution
- Query Repo A’s stage3_snapshots for Canyon Plumbing (agent vs owner breakdown)
- Query Repo B’s Stage3_Full_Detail CSV for Canyon Plumbing
- Compare:
  - Which agents are attributed
  - PEPM rates used
  - Row counts / employee counts
  - Commission calculation basis
Task 2: Verify Agent Matching Logic
- Check if Repo B’s fuzzy matching is finding different agents than Repo A
- Verify agent2_pepm normalization matches Repo A exactly
- Check if Repo B is defaulting to owner when Repo A has agents (or vice versa)
Task 3: Regression Guardrail
Add a regression check for Sep 2025:
- ABS(agent_delta) <= 0.01 (currently $1,925.71, so FAIL)
- ABS(owner_delta) <= 0.01 (currently -$1,925.71, so FAIL)
- Keep invariant enforced: gross = agent + owner + chargebacks ✅ (PASS)
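A regression check along these lines might be sketched as below. The function names, the result shape, and the exact-equality treatment of the invariant are assumptions; only the 0.01 tolerance and the current deltas come from this report, and the amounts passed to invariant_holds are hypothetical.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")


def check_parity(agent_delta: Decimal, owner_delta: Decimal) -> dict:
    """Return PASS/FAIL per delta using ABS(delta) <= 0.01."""
    def status(delta: Decimal) -> str:
        return "PASS" if abs(delta) <= TOLERANCE else "FAIL"

    return {"agent_delta": status(agent_delta), "owner_delta": status(owner_delta)}


def invariant_holds(gross, agent, owner, chargebacks) -> bool:
    """gross = agent + owner + chargebacks, compared exactly on Decimals."""
    return gross == agent + owner + chargebacks


# Current September 2025 deltas from this report: both checks fail.
print(check_parity(Decimal("1925.71"), Decimal("-1925.71")))

# Hypothetical amounts, only to show the invariant check in use.
print(invariant_holds(Decimal("100.00"), Decimal("60.00"),
                      Decimal("45.00"), Decimal("-5.00")))
```

Keeping the delta checks separate from the invariant makes it clear when the split is wrong but the totals still balance, which is the current state.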
Files Modified
repo_b/output_adapter.py:
- Added normalize_business_name_for_storage() function (lines 214-228)
- Updated 4 locations to normalize business_name before storage (lines 632, 753, 840, 916)
- Added
Parity Status
- June 2025: FAIL (agent delta: $255.72)
- July 2025: FAIL (agent delta: $1,130.61)
- August 2025: FAIL (agent delta: $1,925.71)
- September 2025: FAIL (agent delta: $1,925.71) + 2 UNDERFUNDED_TPA anomalies (expected)
Conclusion
The normalization fix successfully resolved the business name matching issue. The remaining $1,925.71 delta is a real attribution difference that requires investigation into:
- Agent matching logic (fuzzy vs exact)
- PEPM rate application
- Row count vs employee count basis for commission calculation
- Agent2/downline attribution rules
The invariant gross = agent + owner + chargebacks is preserved, indicating the math is correct but the attribution rules differ between Repo A and Repo B.