QA Audit: Member-ID-Based MIXED_CADENCE Normalization Proof
Date: 2025-01-XX
File: repo_b/output_adapter.py
Status: ✅ PASS (with fixes applied)
File Integrity Check
✅ PASS: File integrity verified
- NUL bytes: 0
- UTF-8 encoding: Valid
- File size: 1262 lines
Audit Results
A) Commission Math Audit
✅ PASS: No employee_count usage in commission calculation
- Line 478: row_count = len(agent_group_df) (per agent, ROW-BASED)
- Line 479: employee_count = agent_group_df["member_id"].nunique() (KPI ONLY, explicitly marked)
- Line 510: agent_total = normalized_rate * Decimal(str(_row_count_for_commission)) (uses row_count, NOT employee_count)
- Lines 481-490: Strict-mode assertions prevent employee_count usage
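The difference between the two counts is easiest to see on a tiny example. The sketch below uses plain Python lists instead of a DataFrame, and the rows and the rate are made-up values for illustration only. It shows that one member appearing on two rows contributes twice to the commission basis but only once to the KPI.

```python
from decimal import Decimal

# Hypothetical rows for one agent in one month: member M1 appears twice.
agent_rows = [
    {"member_id": "M1", "source_agent_id": "A7"},
    {"member_id": "M1", "source_agent_id": "A7"},
    {"member_id": "M2", "source_agent_id": "A7"},
]

# Commission basis: one unit per row (ROW-BASED).
row_count = len(agent_rows)

# KPI only: distinct members, never used in commission math.
employee_count = len({row["member_id"] for row in agent_rows})

# Assumed rate per pay period, already a Decimal.
normalized_rate = Decimal("2.50")
agent_total = normalized_rate * Decimal(str(row_count))

print(row_count, employee_count, agent_total)
```

Using employee_count here would underpay the agent for every member with more than one row in the month, which is the error the audit checks for.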
B) Cadence Inference Scope Audit
✅ FIXED: Cadence inference now computed at (business, agent, month) scope
- Call site: Line 461 (inside agent loop)
- Input: agent_group_df (filtered per agent via agent_hierarchy → agent_id → source_agent_id)
- Lines 437-458: Agent filtering logic ensures per-agent scope
C) Member ID vs member_key Audit
✅ PASS: Uses Member ID (NOT member_key)
- Line 250: member_counts = group_df["member_id"].value_counts()
- Line 249: Assertion ensures member_id column exists
- Line 253: member_id_duplication_map built from member_id column
D) Float Contamination Audit
⚠️ FIXED: Removed float conversion in PEPM rate loading
- Line 128: Changed from .astype(float) to .str.strip() (preserves as string for Decimal parsing)
- Lines 906, 925-928, 985, 995-996, 1001: Float conversions in reporting functions (display only, not commission math) ✅
Implementation Fixes Applied
Fix 1: Cadence Inference Logic - “if ANY” Rule
Location: repo_b/output_adapter.py:274-327
Before: Used “most common duplication pattern”
After: Uses “if ANY Member ID appears N times” rule:
- max_dup_count >= 5 → Weekly (52 PPY)
- max_dup_count == 4 → 4×/month (48 PPY) or Weekly (52 PPY) based on spacing
- max_dup_count == 2 → Bi-weekly (26 PPY)
- max_dup_count == 1 → Monthly (12 PPY)
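The “if ANY” rule can be sketched in a few lines of plain Python. This is not the code in output_adapter.py: the function name infer_ppy_sketch is hypothetical, the spacing check that separates 48 from 52 PPY is left out (the sketch returns the 48 default), and raising on max_dup_count == 3 is an assumption, since the audit does not say how that case is handled.

```python
from collections import Counter


def infer_ppy_sketch(member_ids):
    """Return (pay_periods_per_year, label) for one agent's member_id rows."""
    if not member_ids:
        raise ValueError("no rows to infer cadence from")

    member_counts = Counter(member_ids)
    # "if ANY" rule: the most-duplicated member decides, not the most common pattern.
    max_dup_count = max(member_counts.values())

    if max_dup_count >= 5:
        return 52, "Weekly"
    if max_dup_count == 4:
        # The real rule also looks at spacing to choose 48 vs 52; omitted here.
        return 48, "4×/month"
    if max_dup_count == 2:
        return 26, "Bi-weekly"
    if max_dup_count == 1:
        return 12, "Monthly"
    # Assumption: 3 appearances is not covered by the audit, so refuse to guess.
    raise ValueError(f"unhandled max_dup_count: {max_dup_count}")


# One member with two rows is enough to move the whole group to bi-weekly.
print(infer_ppy_sketch(["M1", "M1", "M2", "M3"]))
```

Under the previous “most common duplication pattern” approach, the same input would have been read as Monthly, because two of the three members appear once.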
Fix 2: PEPM Rate Float Contamination
Location: repo_b/output_adapter.py:128
Before: agent2_pepm["pepm_rate"] = ...astype(float)
After: agent2_pepm["pepm_rate"] = ...str.strip() (preserves as string for Decimal parsing)
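The reason a float round trip counts as contamination can be shown with the standard decimal module alone. The raw value below is a made-up example. Parsing the stripped string yields the exact decimal, while going through float first carries the binary approximation into the Decimal.

```python
from decimal import Decimal

raw = " 12.35 "  # hypothetical cell value as read from the rate file

# String path: exact decimal value, as written in the source.
from_string = Decimal(raw.strip())

# Float path: Decimal receives the binary approximation of 12.35.
from_float = Decimal(float(raw))

print(from_string)
print(from_float)
print(from_string == from_float)
```

The two values differ because 12.35 has no exact binary representation, so any later multiplication or quantization starts from a slightly different number.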
Fix 3: Strict Guardrails Added
Location: repo_b/output_adapter.py:481-492, 504-509
- Lines 491-492: Assert member_id column exists
- Lines 504-505: Assert row_count is int >= 0
- Lines 507-509: Explicit _row_count_for_commission variable to prevent employee_count usage
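A minimal sketch of these three guardrails, assuming rows are plain dicts instead of DataFrame rows. The function name row_count_for_commission is hypothetical. Only the variable name _row_count_for_commission comes from the audit.

```python
def row_count_for_commission(agent_rows):
    """Return the row count used as the commission basis, with guardrails."""
    # Guardrail 1: every row must carry a member_id.
    assert all("member_id" in row for row in agent_rows), "member_id column missing"

    row_count = len(agent_rows)
    # Guardrail 2: the basis must be a non-negative int.
    assert isinstance(row_count, int) and row_count >= 0, "row_count must be int >= 0"

    # Guardrail 3: a dedicated name, so employee_count cannot be
    # substituted into the commission formula by accident.
    _row_count_for_commission = row_count
    return _row_count_for_commission


print(row_count_for_commission([{"member_id": "M1"}, {"member_id": "M1"}]))
```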
Exact Call Site Evidence
Cadence Inference Function
Location: repo_b/output_adapter.py:207-339
Call Site (Inside Agent Loop)
Location: repo_b/output_adapter.py:461
- Line 432: Agent loop starts: for _, agent_row in agent_matches.iterrows():
- Lines 437-458: Agent filtering: agent_group_df = group_df[group_df["source_agent_id"].astype(str) == agent_id]
- Line 461: Cadence inference called with agent_group_df (per-agent rows)
- Line 478: row_count = len(agent_group_df) (per-agent row count)
Commission Calculation Evidence
Location: repo_b/output_adapter.py:494-511
Audit Fields Evidence
All required fields present on agent rows (Lines 538-550):
- ✅ inferred_pay_periods_per_year (int)
- ✅ inferred_cadence_label (str)
- ✅ mixed_cadence_flag (bool)
- ✅ source_pepm_monthly (CENT string)
- ✅ rate_per_pay_period (CENT string)
- ✅ row_count_basis (int)
- ✅ member_id_duplication_map (JSON: Member ID → count)
- ✅ member_dup_count_distribution (JSON: {1×, 2×, 4×, 4×+})
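As an illustration of the member_id_duplication_map field, the sketch below serializes a Member ID → row count mapping to JSON using only the standard library. The member IDs are made up, and the exact serialization options used in output_adapter.py are not shown in this audit, so sort_keys is an assumption. The distribution field is not attempted here because its bucketing rules are not spelled out.

```python
import json
from collections import Counter

# Hypothetical member_id column for one agent's rows.
member_ids = ["M1", "M1", "M2"]

# Member ID -> number of rows that member appears on.
dup_counts = Counter(member_ids)

# Counter is a dict subclass, so json.dumps can serialize it directly.
member_id_duplication_map = json.dumps(dup_counts, sort_keys=True)

print(member_id_duplication_map)
```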
Regression Tests
Location: tests/test_cadence_inference.py
✅ Test 1: 1 member labeled Monthly but appearing in 2 rows → PPY=26, mixed_cadence_flag=True
✅ Test 2: 1 member appearing in 4 rows → PPY=48 by default
✅ Test 3: Decimal purity assertions (no floats in outputs)
Validation Script
Location: scripts/validate_canyon_plumbing_september.py
Status: ⏳ PENDING EXECUTION
Expected Output (per agent):
- member_id_duplication_map: JSON mapping Member ID → row count
- inferred_ppy: Inferred pay_periods_per_year
- inferred_cadence_label: Cadence label
- normalized_rate: Rate per pay period (CENT-quantized)
- agent_commission: Agent total commission
September 2025 Parity Gate
Status: ⏳ PENDING EXECUTION
Note: Validation script currently fails on 4TK Holdings (expected UNDERFUNDED_TPA case). The hard error at Lines 571-577 should be replaced with soft-fail anomaly emission (separate fix).
Commands:
- python scripts/validate_canyon_plumbing_september.py (may fail on 4TK - expected)
- python -m repo_b.run --period 2025-09 --canonical data\canonical\2025-09.json --config configs\repo_b_config.json
- python scripts/publish_repo_b_to_shadow.py 2025-09
- python scripts/run_jun_sep_parity_validation.py (Sep only)
Known Issues
⚠️ UNDERFUNDED_TPA Hard Error: Lines 571-577 raise ValueError for a negative owner residual. This should be a soft-fail anomaly emission (separate fix, not part of cadence inference scope).
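One possible shape for the soft-fail behaviour is sketched below. Everything in it is hypothetical: the function name, the anomaly record fields, and the example business and amount are illustrative and do not come from repo_b. The idea is only that a negative owner residual is recorded and reported, and the run continues.

```python
from decimal import Decimal


def check_owner_residual(business, owner_residual, anomalies):
    """Record a negative owner residual as an anomaly instead of raising."""
    if owner_residual < Decimal("0"):
        anomalies.append(
            {
                "code": "UNDERFUNDED_TPA",
                "business": business,
                # Keep the amount as a string so no float enters the record.
                "owner_residual": str(owner_residual),
            }
        )
        return False
    return True


anomalies = []
ok = check_owner_residual("Example Business", Decimal("-125.00"), anomalies)
print(ok, anomalies)
```

With this pattern the validation script could finish its run and list underfunded businesses at the end, instead of stopping at the first one.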
Summary
✅ File Integrity: PASS (NUL bytes: 0, UTF-8: Valid)
✅ Commission Math: PASS (ROW-based, never employee_count)
✅ Cadence Scope: FIXED (per business, agent, month)
✅ Member ID Usage: PASS (uses member_id, not member_key)
✅ Cadence Logic: FIXED (“if ANY” rule implemented)
✅ Float Contamination: FIXED (PEPM rate preserved as string)
✅ Guardrails: PASS (assertions added)
✅ Audit Fields: PASS (all required fields present, including member_id_duplication_map)
⏳ Validation Run: BLOCKED (4TK UNDERFUNDED_TPA hard error - separate fix needed)
⏳ Parity Gate: PENDING
Changes Made
- Fixed cadence inference logic to use “if ANY” rule (Lines 274-330)
  - max_dup_count >= 5 → Weekly (52 PPY)
  - max_dup_count == 4 → 4×/month (48 PPY) or Weekly (52 PPY) based on spacing
  - max_dup_count == 2 → Bi-weekly (26 PPY)
  - max_dup_count == 1 → Monthly (12 PPY)
- Removed float conversion in PEPM rate loading (Line 128)
  - Changed from .astype(float) to .str.strip() (preserves as string for Decimal parsing)
- Added strict guardrails for employee_count prevention (Lines 481-492, 507-512)
  - Assert row_count is int >= 0
  - Assert member_id column exists
  - Explicit _row_count_for_commission variable to prevent employee_count usage
- Updated regression tests to match “if ANY” rule (tests/test_cadence_inference.py)
- Added member_id_duplication_map audit field (Lines 253, 336, 522, 554)
Exact Line Numbers (Proof)
Cadence Inference Function
- Definition: Lines 207-339
- Member ID usage: Line 250 (member_counts = group_df["member_id"].value_counts())
- “if ANY” rule: Lines 279-330 (max_dup_count = int(member_counts.max()))
Call Site (Per Agent)
- Agent loop start: Line 432
- Agent filtering: Lines 437-458
- Cadence inference call: Line 461 (infer_cadence_from_member_duplication(agent_group_df, period_label))
- Row count calculation: Line 478 (row_count = len(agent_group_df))
Commission Calculation
- PEPM parsing: Line 496 (pepm_rate = _d_raw(pepm_rate_raw).quantize(CENT))
- Normalized rate: Line 502 (normalized_rate = (pepm_rate * Decimal("12")) / Decimal(str(pay_periods)))
- Commission: Line 513 (agent_total = normalized_rate * Decimal(str(_row_count_for_commission)))
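The three expressions above chain into a short worked example. This is a sketch under assumptions: _d_raw is not shown in this audit, so a direct Decimal parse of the stripped string stands in for it, and the function name commission_sketch and the input values are made up. Where the real code quantizes the normalized rate for output is also not shown, so the sketch leaves it unrounded.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def commission_sketch(pepm_rate_raw, pay_periods, row_count):
    """Annualize a monthly PEPM rate, spread it over pay periods, pay per row."""
    # Stand-in for _d_raw: parse the string directly, never via float.
    pepm_rate = Decimal(pepm_rate_raw.strip()).quantize(CENT)

    # Monthly rate * 12 months, divided by the inferred pay periods per year.
    normalized_rate = (pepm_rate * Decimal("12")) / Decimal(str(pay_periods))

    # ROW-BASED commission: one normalized rate per row.
    agent_total = normalized_rate * Decimal(str(row_count))
    return normalized_rate, agent_total


# Hypothetical inputs: 13.00 PEPM, bi-weekly cadence (26 PPY), 40 rows.
rate, total = commission_sketch("13.00", 26, 40)
print(rate, total)
```

Here 13.00 × 12 = 156.00 per year, 156.00 / 26 = 6.00 per pay period, and 6.00 × 40 rows = 240.00.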
Audit Fields
- All fields: Lines 546-554
- member_id_duplication_map: Line 554
Next Steps
- Restore UNDERFUNDED_TPA soft-fail (separate fix - Lines 571-577)
- Run validation: python scripts/validate_canyon_plumbing_september.py (after UNDERFUNDED_TPA fix)
- Run parity gate: Execute Sep 2025 run + publish + parity validation