Dashboard Stability Lock-In
Date: January 2026
Status: Active
Purpose: Prevent dashboards and agent detail/export from “going dark” again
What Actually Broke
Impact: The outage affected the agent detail/export endpoints (/api/v1/agent/{agent_name}/stage1/detail and /api/v1/agent/{agent_name}/stage1/export). Stage-v5 KPI cards/charts were NOT affected; this was not a data-truth failure. Stage-v5 data was correct throughout the outage.
1. BigQuery Client Initialization Bug
Symptom: The Agent Detail Credit Report returned empty results (0 rows with a valid 200 response).
Root Cause: get_agent_stage1_detail() checked the module-level client variable (always None) instead of calling _get_bq_client().
Why Agent Detail/Export Appeared “Down”:
- Queries never executed (client was None)
- Returned empty results silently
- Stage-v5 data was correct, but the Detail Credit Report showed no rows
- Users saw “No credit rows found” even when data existed
Fix: get_agent_stage1_detail() now obtains its client through the bq_client = _get_bq_client() pattern.
Prevention:
- Test: test_get_agent_stage1_detail_calls_get_bq_client() ensures the function calls _get_bq_client().
- CI Guard: .github/workflows/lint_bq_client_pattern.yml fails if the “if client is None:” pattern is found in query functions.
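The workflow's actual implementation isn't shown in this document. Below is a Python sketch of an equivalent check, assuming a line-based regex match; the escape-hatch marker text and the find_violations()/main() names are hypothetical.

```python
import re
from pathlib import Path

# Forbidden pattern: a query function checking the module-level client.
PATTERN = re.compile(r"^\s*if client is None:")
ESCAPE_HATCH = "# bq-client-pattern: allow"  # hypothetical marker text


def find_violations(source: str) -> list[int]:
    """Return 1-based line numbers that use the forbidden pattern."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if PATTERN.match(line) and ESCAPE_HATCH not in line:
            violations.append(lineno)
    return violations


def main(paths: list[str]) -> int:
    """Scan the given files; return a nonzero exit code on any violation.

    Callers pass production files only (e.g. api/bigquery/queries.py);
    test files are deliberately not scanned.
    """
    failed = False
    for path in paths:
        for lineno in find_violations(Path(path).read_text()):
            print(f"{path}:{lineno}: use bq_client = _get_bq_client() instead")
            failed = True
    return 1 if failed else 0
```

In CI, a small entry point would call sys.exit(main(["api/bigquery/queries.py"])) so the job fails whenever a violation is printed.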
2. Missing Export Dependencies
Symptom: Excel export returned a 500 error: “No module named ‘reportlab’”.
Root Cause: reportlab and openpyxl were not listed in api/requirements-api.txt.
Why Agent Detail/Export Appeared “Down”:
- Export endpoint crashed with 500 error
- Users couldn’t download reports
- Frontend showed error message
Fix: Added reportlab>=4.0.0 and openpyxl>=3.1.0 to api/requirements-api.txt.
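One way an export path could turn a missing package into a clear, named error instead of a bare ModuleNotFoundError is to probe for the modules before building the file. This is a sketch under that assumption, not the project's code: REQUIRED_EXPORT_MODULES mirrors the two packages above, and both function names are hypothetical.

```python
import importlib.util

# Import names for the packages listed in api/requirements-api.txt.
REQUIRED_EXPORT_MODULES = ("reportlab", "openpyxl")


def missing_export_modules(modules=REQUIRED_EXPORT_MODULES) -> list[str]:
    """Return the modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def assert_export_dependencies(modules=REQUIRED_EXPORT_MODULES) -> None:
    """Raise a descriptive error naming every missing export dependency."""
    missing = missing_export_modules(modules)
    if missing:
        raise RuntimeError(
            "Export dependencies missing: " + ", ".join(missing)
            + " (check api/requirements-api.txt)"
        )
```

find_spec() only locates the module without importing it, so the check is cheap enough to run at startup or at the top of the export handler.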
Prevention:
- Test: test_export_returns_200_with_valid_excel() verifies the export returns 200 with valid Excel bytes.
- Test: test_export_handles_missing_dependencies_gracefully() ensures graceful error handling when export dependencies are missing.
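A sketch of the kind of checks test_export_returns_200_with_valid_excel could make, with the response modeled as a plain (status, headers, body) triple. It relies on the fact that an .xlsx file is an OOXML ZIP package containing [Content_Types].xml. The pinned Content-Disposition value isn't given in this document, so the "attachment; ...xlsx" shape checked here is an assumption, as are the helper names.

```python
import io
import zipfile


def is_valid_xlsx(body: bytes) -> bool:
    """An .xlsx file is a ZIP package that contains [Content_Types].xml."""
    if not body.startswith(b"PK\x03\x04"):  # ZIP local-file-header signature
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(body)) as zf:
            return "[Content_Types].xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False


def check_export_response(status: int, headers: dict, body: bytes) -> list[str]:
    """Return a list of problems; an empty list means the export looks valid.

    headers is a plain dict here; real test clients expose a
    case-insensitive mapping.
    """
    problems = []
    if status != 200:
        problems.append(f"expected 200, got {status}")
    disposition = headers.get("Content-Disposition", "")
    # Assumed pinned shape: attachment; filename="<name>.xlsx"
    if not (disposition.startswith("attachment") and '.xlsx"' in disposition):
        problems.append(f"unexpected Content-Disposition: {disposition!r}")
    if not is_valid_xlsx(body):
        problems.append("body is not a valid .xlsx package")
    return problems
```

Returning a list of problems rather than asserting one condition at a time makes a failing CI run report everything that is wrong with the response at once.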
3. Date Normalization Inconsistency
Symptom: Detail fetch and export could normalize dates differently (potential data mismatch).
Root Cause: Code duplication; normalizePeriodInput() was defined in AgentDetail.tsx and used inconsistently.
Why Agent Detail/Export Appeared “Down”:
- Edge cases (e.g., 2025-9 vs. 2025-09) could cause mismatched API calls
- Export might query a different date range than the detail view
- Users might see inconsistent data
Fix: Established dashboard/src/lib/dateUtils.ts as the single source of truth and refactored AgentDetail.tsx to import from it.
Prevention:
- Test: test_detail_fetch_uses_dateUtils() ensures the detail fetch uses the shared utility.
- Test: test_export_uses_dateUtils() ensures the export uses the shared utility.
- Test: Edge cases normalize correctly (2025-9 → 2025-09).
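The real utility is normalizePeriodInput() in dashboard/src/lib/dateUtils.ts. What follows is only a Python rendering of the 2025-9 → 2025-09 rule for illustration; the accepted input formats and the error behavior (raising on anything else) are assumptions, and normalize_period_input is a hypothetical name.

```python
import re

# Accepts "YYYY-M" or "YYYY-MM", with optional surrounding whitespace.
_PERIOD_RE = re.compile(r"^\s*(\d{4})-(\d{1,2})\s*$")


def normalize_period_input(value: str) -> str:
    """Normalize a year-month period to zero-padded 'YYYY-MM'."""
    match = _PERIOD_RE.match(value)
    if not match:
        raise ValueError(f"unrecognized period: {value!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {value!r}")
    return f"{year:04d}-{month:02d}"
```

Because the detail fetch and the export both call one function, 2025-9 and 2025-09 collapse to the same key before either request is built, which is what keeps the two date ranges aligned.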
Regression Prevention Tests
| Test File | Test Case | Prevents |
|---|---|---|
| api/tests/test_bigquery_client_pattern.py | test_get_agent_stage1_detail_calls_get_bq_client | Client init bug (silent empty results) |
| api/tests/test_bigquery_client_pattern.py | test_get_agent_stage1_detail_handles_none_client | Client init bug (graceful handling + no silent empty) |
| api/tests/test_agent_dashboard_routes.py | test_export_returns_200_with_valid_excel | Missing dependencies (500 errors) + Content-Disposition header pin |
| api/tests/test_agent_dashboard_routes.py | test_export_handles_missing_dependencies_gracefully | Missing dependencies (graceful errors) |
| dashboard/src/lib/__tests__/dateUtils.test.ts | normalizePeriodInput edge cases | Date normalization bugs |
| dashboard/src/components/__tests__/AgentDetail.test.tsx | test_detail_fetch_uses_dateUtils | Code duplication |
| dashboard/src/components/__tests__/AgentDetail.test.tsx | test_export_uses_dateUtils | Code duplication |
CI/CD Guards
- BigQuery Client Pattern Check: .github/workflows/lint_bq_client_pattern.yml fails if the “if client is None:” pattern is found (an escape-hatch comment allows exceptions). It scans only production code (api/bigquery/queries.py); tests are excluded.
- Export Dependency Test: Route tests verify that export returns 200 with valid Excel bytes and a pinned Content-Disposition header.
- No Silent Empty Check: A test verifies that a distinct log code is emitted when the client is unavailable (Part 1.3).
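A sketch of a "no silent empty" test using unittest's assertLogs. Only the idea (a distinct log code when the client is unavailable) comes from this document: the log code BQ_CLIENT_UNAVAILABLE, the logger name, and fetch_detail() are hypothetical, and whether the real function raises or returns after logging isn't stated, so this sketch raises.

```python
import logging
import unittest

logger = logging.getLogger("api.bigquery.queries")  # hypothetical logger name
LOG_CODE = "BQ_CLIENT_UNAVAILABLE"                  # hypothetical log code


def fetch_detail(get_client, agent_name: str) -> list[dict]:
    """Hypothetical query function that refuses to return [] silently."""
    bq_client = get_client()
    if bq_client is None:
        # Emit a distinct, greppable code instead of returning an empty list.
        logger.error("%s: no BigQuery client for agent=%s", LOG_CODE, agent_name)
        raise RuntimeError(LOG_CODE)
    return bq_client.query_rows(agent_name)


class NoSilentEmptyTest(unittest.TestCase):
    def test_logs_distinct_code_when_client_unavailable(self):
        with self.assertLogs("api.bigquery.queries", level="ERROR") as captured:
            with self.assertRaises(RuntimeError):
                fetch_detail(lambda: None, "alice")
        self.assertTrue(any(LOG_CODE in line for line in captured.output))
```

Asserting on the code string rather than the full message keeps the test stable if the human-readable wording changes.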
Important Notes
Detail Credit Report is Legacy
The Detail Credit Report (/api/v1/agent/{agent_name}/stage1/detail) is a legacy implementation that:
- Queries Stage-1 data directly (with Stage-3 RBAC filtering)
- Will be replaced by engine-backed Stage-v5 detail endpoint in a future phase
- Current implementation is stable and will not be rewired until new engine output is ready
Watch Items
Prevent regression on these critical behaviors:
- HTTPException wrapping in BigQuery queries: get_agent_stage1_detail() intentionally wraps internal errors (e.g., RuntimeError) as HTTPException for consistent API error handling. If this behavior changes, update both the implementation and the test_get_agent_stage1_detail_calls_get_bq_client() test expectation together.
- Locale-safe month name assertions: Tests for generateMonthOptions() must use locale-safe assertions (non-empty string labels, values 1..12) instead of hardcoded English month names (e.g., “January”, “December”) to prevent i18n regressions.
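A minimal sketch of the wrapping contract, assuming a FastAPI-style HTTPException(status_code, detail); a stand-in class is defined when FastAPI isn't installed. stage1_detail_sketch() and its run_query parameter are a simplified, hypothetical signature, and the 500 status and detail text are assumptions.

```python
try:
    from fastapi import HTTPException
except ImportError:  # stand-in so the sketch runs without FastAPI
    class HTTPException(Exception):
        def __init__(self, status_code: int, detail=None):
            super().__init__(detail)
            self.status_code = status_code
            self.detail = detail


def stage1_detail_sketch(agent_name: str, run_query) -> list[dict]:
    """Run the query; surface internal failures as a consistent API error."""
    try:
        return run_query(agent_name)
    except HTTPException:
        raise  # already an API error; pass it through unchanged
    except RuntimeError as exc:
        # Intentional wrapping; chaining keeps the original cause for logs.
        raise HTTPException(
            status_code=500, detail=f"Stage-1 detail query failed: {exc}"
        ) from exc
```

A test written against this contract asserts on HTTPException and its status_code, not on RuntimeError, which is why changing the wrapping means changing the test expectation in the same commit.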
Maintenance
- Run tests before every deployment: pytest api/tests/test_bigquery_client_pattern.py api/tests/test_agent_dashboard_routes.py
- Run frontend tests: npm test or vitest in the dashboard directory
- Verify CI guards pass on every PR
- Update this document if new stability issues are found
Why Agent Detail/Export Appeared “Down” Even Though Stage-v5 Data Was Correct
The outage was not a data problem: Stage-v5 data was correct throughout. Stage-v5 KPI cards/charts were NOT affected; this was an agent detail/export endpoint issue only. The problems were:
- Client initialization bug: Queries never executed and returned empty results silently
- Missing dependencies: Export endpoint crashed with 500 error
- Date normalization: Potential mismatches between detail and export