# TEST_REPORTING.md

## How Automated Test Suites Update the TUI Live Footer

**Audience**: developers writing or extending dolphin test suites

**Last updated**: 2026-04-05

---

## Overview
`dolphin_tui_v5.py` displays a live test-results footer showing the latest automated test run
per category. The footer is driven by a single JSON file:

```
/mnt/dolphinng5_predict/run_logs/test_results_latest.json
```

Any test script, pytest fixture, or CI runner can update this file by calling the
`write_test_results()` function exported from `dolphin_tui_v5.py`, or by writing the JSON
directly with the correct schema.

---
## JSON Schema

```json
{
  "_run_at": "2026-04-05T14:30:00",
  "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
  "finance_fuzz": {"passed": 8, "total": 8, "status": "PASS"},
  "signal_fill": {"passed": 6, "total": 6, "status": "PASS"},
  "degradation": {"passed": 12, "total": 12, "status": "PASS"},
  "actor": {"passed": 46, "total": 46, "status": "PASS"}
}
```
### Fields

| Field | Type | Description |
|---|---|---|
| `_run_at` | ISO-8601 string | UTC timestamp of the run. Set automatically by `write_test_results()`. |
| `data_integrity` | CategoryResult | Arrow schema + HZ key integrity tests |
| `finance_fuzz` | CategoryResult | Financial edge cases: negative capital, zero price, NaN signals |
| `signal_fill` | CategoryResult | Signal path continuity: vel_div → posture → order |
| `degradation` | CategoryResult | Graceful-degradation tests: missing HZ keys, stale data |
| `actor` | CategoryResult | DolphinActor unit + integration tests |
### CategoryResult

```json
{
  "passed": 15,     // int, or null if not yet run
  "total": 15,      // int, or null if not yet run
  "status": "PASS"  // "PASS" | "FAIL" | "N/A"
}
```

Use `"status": "N/A"` and `null` counts for categories not yet automated.
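For writers that cannot import `dolphin_tui_v5` (for instance, a script running under a different interpreter), the file can be produced directly. Below is a minimal sketch that follows the schema above; `write_results_direct` is a hypothetical helper of mine, and unlike `write_test_results()` it does not merge with the existing file:

```python
# Hypothetical direct writer: builds a schema-conforming payload and replaces
# the file atomically so the TUI never reads a half-written JSON.
import json
import os
import tempfile
from datetime import datetime, timezone

RESULTS_FILE = "/mnt/dolphinng5_predict/run_logs/test_results_latest.json"

def write_results_direct(categories: dict, path: str = RESULTS_FILE) -> None:
    payload = {"_run_at": datetime.now(timezone.utc).isoformat()}
    for name, cat in categories.items():
        if cat["status"] not in ("PASS", "FAIL", "N/A"):
            raise ValueError(f"bad status for {name}: {cat['status']}")
        payload[name] = {"passed": cat.get("passed"),
                        "total": cat.get("total"),
                        "status": cat["status"]}
    # Write to a temp file in the same directory, then atomically replace
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f, indent=2)
    os.replace(tmp, path)
```

The temp-file-plus-`os.replace` dance matters: a plain `open(path, "w")` can leave a truncated file visible to the TUI mid-write.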
---
## Python API — Preferred Method

```python
# At the end of a test run (pytest session, script, etc.)
import sys
sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results

write_test_results({
    "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
    "finance_fuzz": {"passed": 8, "total": 8, "status": "PASS"},
    "signal_fill": {"passed": 6, "total": 6, "status": "PASS"},
    "degradation": {"passed": 12, "total": 12, "status": "PASS"},
    "actor": {"passed": 46, "total": 46, "status": "PASS"},
})
```

`write_test_results()` does the following atomically:

1. Injects `"_run_at"` = current UTC ISO timestamp
2. Merges the provided categories into any existing file (categories not supplied are preserved)
3. Writes to `run_logs/test_results_latest.json`
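Those three steps can be sketched as follows. This is an illustrative approximation only; the real implementation lives in `dolphin_tui_v5.py` and also handles the atomic write:

```python
# Illustrative sketch of the merge-and-write behaviour; not the actual
# dolphin_tui_v5 implementation.
import json
from datetime import datetime, timezone
from pathlib import Path

def write_test_results_sketch(categories: dict, results_file: Path) -> None:
    # 2. Merge with any existing file so unsupplied categories survive
    merged = json.loads(results_file.read_text()) if results_file.exists() else {}
    merged.update(categories)
    # 1. Inject the run timestamp
    merged["_run_at"] = datetime.now(timezone.utc).isoformat()
    # 3. Write back (the real function does this atomically)
    results_file.write_text(json.dumps(merged, indent=2))
```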
---
## pytest Integration — conftest.py Pattern

Add the following hooks to `conftest.py` at the repo root or the relevant test package:

```python
# conftest.py
import sys

sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results

_outcomes = {}  # nodeid -> bool


def pytest_runtest_logreport(report):
    """Record each item's outcome from its call phase (a failed setup also counts)."""
    if report.when == "call" or (report.when == "setup" and report.failed):
        _outcomes[report.nodeid] = report.passed


def pytest_sessionfinish(session, exitstatus):
    """Push test results to the TUI footer after every pytest run."""
    summary = {}
    for item in session.items:
        counts = summary.setdefault(_category_for(item), {"passed": 0, "total": 0})
        counts["total"] += 1
        if _outcomes.get(item.nodeid, False):
            counts["passed"] += 1

    for counts in summary.values():
        counts["status"] = "PASS" if counts["passed"] == counts["total"] else "FAIL"

    write_test_results(summary)


def _category_for(item) -> str:
    """Map a test item to a footer category based on module path."""
    path = str(item.fspath)
    if "data_integrity" in path:
        return "data_integrity"
    if "finance" in path:
        return "finance_fuzz"
    if "signal" in path:
        return "signal_fill"
    if "degradation" in path:
        return "degradation"
    return "actor"  # default bucket; also catches actor test paths
```

Capturing per-item outcomes with `pytest_runtest_logreport` is deliberate: inferring pass/fail
from `session.testsfailed` only reveals whether *any* test in the session failed, not which ones.

---
## Shell / CI Script Pattern

For shell-level CI (Prefect flows, bash scripts):

```bash
#!/bin/bash
source /home/dolphin/siloqy_env/bin/activate

# Run the suite
python -m pytest prod/tests/test_data_integrity.py -v --tb=short
EXIT=$?

# Push the result. Per-test counts are not readily available at shell level,
# so report status only and leave the counts null.
STATUS=$( [ "$EXIT" -eq 0 ] && echo "PASS" || echo "FAIL" )

python3 - <<EOF
import sys
sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results
write_test_results({
    "data_integrity": {"passed": None, "total": None, "status": "$STATUS"}
})
EOF
```
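If real pass counts are wanted at shell level, one possible extension (not part of the current scripts) is to run pytest with `--junitxml=report.xml` and parse the suite attributes before pushing results:

```python
# Sketch: derive (passed, total) from a pytest --junitxml report instead of
# hardcoding null counts. The file path is whatever --junitxml wrote.
import xml.etree.ElementTree as ET

def junit_counts(xml_path: str) -> tuple[int, int]:
    """Return (passed, total) from a JUnit XML file produced by pytest."""
    root = ET.parse(xml_path).getroot()
    # Recent pytest wraps the suite in <testsuites>; older versions emit
    # <testsuite> at the top level.
    suite = root if root.tag == "testsuite" else root.find("testsuite")
    total = int(suite.get("tests", 0))
    not_passed = (int(suite.get("failures", 0))
                  + int(suite.get("errors", 0))
                  + int(suite.get("skipped", 0)))  # skips counted as not passed
    return total - not_passed, total
```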
---
## Test Categories — Definitions

### `data_integrity`

Verifies structural correctness of data at system boundaries:

- Arrow IPC files match `SCAN_SCHEMA` (27 fields, `schema_version="5.0.0"`)
- HZ keys present and non-empty after scanner startup
- JSON payloads deserialise without error
- Scan number monotonically increases

### `finance_fuzz`

Financial edge-case property tests (Hypothesis or manual):

- AlphaBetSizer with `capital=0`, `capital<0`, `price=0`, `vel_div=NaN`
- ACB boost clamped to `[0.5, 2.0]` under all inputs
- Position sizing never produces `quantity < 0`
- Fee model: slippage + commission never exceeds gross PnL on the minimum position

### `signal_fill`

End-to-end signal path:

- `vel_div < -0.02` → posture becomes APEX → order generated
- `vel_div >= 0` → no new orders
- Signal correctly flows through `NDAlphaEngine → DolphinActor → NautilusOrder`
- Dedup: the same `scan_number` never generates two orders

### `degradation`

Graceful degradation under missing/stale inputs:

- TUI renders without crash when any HZ key is absent
- `mc_forewarner_latest` absent → "not deployed" rendered, no exception
- `ext_features_latest` fields `None` → `_exf_str()` substitutes `"?"`
- Scanner starts with no prior Arrow files (scan_number starts at 1)
- MHS missing subsystem → RM_META excludes it gracefully

### `actor`

DolphinActor Nautilus integration:

- `on_bar()` incremental step (not batch)
- Threading lock on ACB prevents races
- `_GateSnap` stale-state detection fires within 1 bar
- Capital sync on `on_start()` matches the Nautilus portfolio balance
- MC-Forewarner wired and returning envelope gate signal
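To make the clamp invariant above concrete, a property-style check could look like this. The `acb_boost` function below is a hypothetical stand-in of mine; the real boost logic lives in the sizer:

```python
# Hypothetical stand-in for the ACB boost; the invariant under test is the
# clamp to [0.5, 2.0] for every input, including non-finite ones.
import math
import random

def acb_boost(raw: float) -> float:
    if math.isnan(raw):
        return 1.0  # assumed policy: neutral boost on NaN input
    return min(2.0, max(0.5, raw))

def test_boost_clamped() -> None:
    inputs = [random.uniform(-1e9, 1e9) for _ in range(1000)]
    inputs += [0.0, -0.0, float("inf"), float("-inf"), float("nan")]
    for raw in inputs:
        boost = acb_boost(raw)
        assert 0.5 <= boost <= 2.0, f"boost {boost} out of range for {raw}"
```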
---
## File Location Contract

The file path is **hardcoded** relative to the TUI module:

```python
_RESULTS_FILE = Path(__file__).parent.parent.parent / "run_logs" / "test_results_latest.json"
# resolves to: /mnt/dolphinng5_predict/run_logs/test_results_latest.json
```

Do not move `run_logs/` or rename the file — the TUI footer will silently show stale data.
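Because a stale file fails silently, a freshness check is cheap insurance. A sketch (the helper name is mine; it assumes `_run_at` parses with `datetime.fromisoformat`):

```python
# Sketch: hours elapsed since the last recorded test run, per _run_at.
import json
from datetime import datetime, timezone
from pathlib import Path

RESULTS_FILE = Path("/mnt/dolphinng5_predict/run_logs/test_results_latest.json")

def results_age_hours(path: Path = RESULTS_FILE) -> float:
    run_at = datetime.fromisoformat(json.loads(path.read_text())["_run_at"])
    if run_at.tzinfo is None:  # naive timestamps are taken as UTC
        run_at = run_at.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - run_at).total_seconds() / 3600.0
```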
---
## TUI Footer Refresh

The footer is read once on `on_mount()`. To force a live reload:

- Press **`t`** — toggles footer visibility (hide/show re-reads the file)
- Press **`r`** — forces a full panel refresh

The footer does **not** auto-watch the file for changes (no inotify). Press `t` twice after
a test run to see updated results without restarting the TUI.

---
## Bootstrap File

`run_logs/test_results_latest.json` ships with a bootstrap entry (prior manual run results)
so the footer is never blank on first launch:

```json
{
  "_run_at": "2026-04-05T00:00:00",
  "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
  "finance_fuzz": {"passed": null, "total": null, "status": "N/A"},
  "signal_fill": {"passed": null, "total": null, "status": "N/A"},
  "degradation": {"passed": 12, "total": 12, "status": "PASS"},
  "actor": {"passed": null, "total": null, "status": "N/A"}
}
```

---

*See also: `SYSTEM_BIBLE.md` §28.4 — TUI test footer architecture*