# TEST_REPORTING.md

## How Automated Test Suites Update the TUI Live Footer

**Audience**: developers writing or extending dolphin test suites

**Last updated**: 2026-04-05

---

## Overview
`dolphin_tui_v5.py` displays a live test-results footer showing the latest automated test run
per category. The footer is driven by a single JSON file:

```
/mnt/dolphinng5_predict/run_logs/test_results_latest.json
```

Any test script, pytest fixture, or CI runner can update this file by calling the
`write_test_results()` function exported from `dolphin_tui_v5.py`, or by writing the JSON
directly with the correct schema.

---
## JSON Schema

```json
{
  "_run_at": "2026-04-05T14:30:00",
  "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
  "finance_fuzz": {"passed": 8, "total": 8, "status": "PASS"},
  "signal_fill": {"passed": 6, "total": 6, "status": "PASS"},
  "degradation": {"passed": 12, "total": 12, "status": "PASS"},
  "actor": {"passed": 46, "total": 46, "status": "PASS"}
}
```
### Fields

| Field | Type | Description |
|---|---|---|
| `_run_at` | ISO-8601 string | UTC timestamp of the run. Set automatically by `write_test_results()`. |
| `data_integrity` | CategoryResult | Arrow schema + HZ key integrity tests |
| `finance_fuzz` | CategoryResult | Financial edge cases: negative capital, zero price, NaN signals |
| `signal_fill` | CategoryResult | Signal path continuity: vel_div → posture → order |
| `degradation` | CategoryResult | Graceful-degradation tests: missing HZ keys, stale data |
| `actor` | CategoryResult | DolphinActor unit + integration tests |
### CategoryResult

```json
{
  "passed": 15,     // int, or null if not yet run
  "total": 15,      // int, or null if not yet run
  "status": "PASS"  // "PASS" | "FAIL" | "N/A"
}
```

Use `"status": "N/A"` and `null` counts for categories not yet automated.
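For writers that cannot import `dolphin_tui_v5` (for instance, a script running under a different interpreter), the file can be produced directly. Below is a minimal sketch that follows the schema above; `write_results_direct` is a hypothetical helper of mine, and unlike `write_test_results()` it does not merge with the existing file:

```python
# Hypothetical direct writer: builds a schema-conforming payload and replaces
# the file atomically so the TUI never reads a half-written JSON.
import json
import os
import tempfile
from datetime import datetime, timezone

RESULTS_FILE = "/mnt/dolphinng5_predict/run_logs/test_results_latest.json"

def write_results_direct(categories: dict, path: str = RESULTS_FILE) -> None:
    payload = {"_run_at": datetime.now(timezone.utc).isoformat()}
    for name, cat in categories.items():
        if cat["status"] not in ("PASS", "FAIL", "N/A"):
            raise ValueError(f"bad status for {name}: {cat['status']}")
        payload[name] = {"passed": cat.get("passed"),
                        "total": cat.get("total"),
                        "status": cat["status"]}
    # Write to a temp file in the same directory, then atomically replace
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f, indent=2)
    os.replace(tmp, path)
```

The temp-file-plus-`os.replace` dance matters: a plain `open(path, "w")` can leave a truncated file visible to the TUI mid-write.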
---
## Python API — Preferred Method

```python
# At the end of a test run (pytest session, script, etc.)
import sys
sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results

write_test_results({
    "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
    "finance_fuzz": {"passed": 8, "total": 8, "status": "PASS"},
    "signal_fill": {"passed": 6, "total": 6, "status": "PASS"},
    "degradation": {"passed": 12, "total": 12, "status": "PASS"},
    "actor": {"passed": 46, "total": 46, "status": "PASS"},
})
```

`write_test_results()` does the following atomically:

1. Injects `"_run_at"` = current UTC ISO timestamp
2. Merges the provided categories into any existing file (categories not supplied are preserved)
3. Writes to `run_logs/test_results_latest.json`
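Those three steps can be sketched as follows. This is an illustrative approximation only; the real implementation lives in `dolphin_tui_v5.py` and also handles the atomic write:

```python
# Illustrative sketch of the merge-and-write behaviour; not the actual
# dolphin_tui_v5 implementation.
import json
from datetime import datetime, timezone
from pathlib import Path

def write_test_results_sketch(categories: dict, results_file: Path) -> None:
    # 2. Merge with any existing file so unsupplied categories survive
    merged = json.loads(results_file.read_text()) if results_file.exists() else {}
    merged.update(categories)
    # 1. Inject the run timestamp
    merged["_run_at"] = datetime.now(timezone.utc).isoformat()
    # 3. Write back (the real function does this atomically)
    results_file.write_text(json.dumps(merged, indent=2))
```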
---
## pytest Integration — conftest.py Pattern

Add the following hooks to `conftest.py` at the repo root or the relevant test package:

```python
# conftest.py
import sys

sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results

_outcomes = {}  # nodeid -> bool


def pytest_runtest_logreport(report):
    """Record each item's outcome from its call phase (a failed setup also counts)."""
    if report.when == "call" or (report.when == "setup" and report.failed):
        _outcomes[report.nodeid] = report.passed


def pytest_sessionfinish(session, exitstatus):
    """Push test results to the TUI footer after every pytest run."""
    summary = {}
    for item in session.items:
        counts = summary.setdefault(_category_for(item), {"passed": 0, "total": 0})
        counts["total"] += 1
        if _outcomes.get(item.nodeid, False):
            counts["passed"] += 1

    for counts in summary.values():
        counts["status"] = "PASS" if counts["passed"] == counts["total"] else "FAIL"

    write_test_results(summary)


def _category_for(item) -> str:
    """Map a test item to a footer category based on module path."""
    path = str(item.fspath)
    if "data_integrity" in path:
        return "data_integrity"
    if "finance" in path:
        return "finance_fuzz"
    if "signal" in path:
        return "signal_fill"
    if "degradation" in path:
        return "degradation"
    return "actor"  # default bucket; also catches actor test paths
```

Capturing per-item outcomes with `pytest_runtest_logreport` is deliberate: inferring pass/fail
from `session.testsfailed` only reveals whether *any* test in the session failed, not which ones.

---
## Shell / CI Script Pattern

For shell-level CI (Prefect flows, bash scripts):

```bash
#!/bin/bash
source /home/dolphin/siloqy_env/bin/activate

# Run the suite
python -m pytest prod/tests/test_data_integrity.py -v --tb=short
EXIT=$?

# Push the result. Per-test counts are not readily available at shell level,
# so report status only and leave the counts null.
STATUS=$( [ "$EXIT" -eq 0 ] && echo "PASS" || echo "FAIL" )

python3 - <<EOF
import sys
sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results
write_test_results({
    "data_integrity": {"passed": None, "total": None, "status": "$STATUS"}
})
EOF
```
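If real pass counts are wanted at shell level, one possible extension (not part of the current scripts) is to run pytest with `--junitxml=report.xml` and parse the suite attributes before pushing results:

```python
# Sketch: derive (passed, total) from a pytest --junitxml report instead of
# hardcoding null counts. The file path is whatever --junitxml wrote.
import xml.etree.ElementTree as ET

def junit_counts(xml_path: str) -> tuple[int, int]:
    """Return (passed, total) from a JUnit XML file produced by pytest."""
    root = ET.parse(xml_path).getroot()
    # Recent pytest wraps the suite in <testsuites>; older versions emit
    # <testsuite> at the top level.
    suite = root if root.tag == "testsuite" else root.find("testsuite")
    total = int(suite.get("tests", 0))
    not_passed = (int(suite.get("failures", 0))
                  + int(suite.get("errors", 0))
                  + int(suite.get("skipped", 0)))  # skips counted as not passed
    return total - not_passed, total
```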
---
## Test Categories — Definitions

### `data_integrity`

Verifies structural correctness of data at system boundaries:

- Arrow IPC files match `SCAN_SCHEMA` (27 fields, `schema_version="5.0.0"`)
- HZ keys present and non-empty after scanner startup
- JSON payloads deserialise without error
- Scan number monotonically increases

### `finance_fuzz`

Financial edge-case property tests (Hypothesis or manual):

- AlphaBetSizer with `capital=0`, `capital<0`, `price=0`, `vel_div=NaN`
- ACB boost clamped to `[0.5, 2.0]` under all inputs
- Position sizing never produces `quantity < 0`
- Fee model: slippage + commission never exceeds gross PnL on the minimum position

### `signal_fill`

End-to-end signal path:

- `vel_div < -0.02` → posture becomes APEX → order generated
- `vel_div >= 0` → no new orders
- Signal correctly flows through `NDAlphaEngine → DolphinActor → NautilusOrder`
- Dedup: the same `scan_number` never generates two orders

### `degradation`

Graceful degradation under missing/stale inputs:

- TUI renders without crash when any HZ key is absent
- `mc_forewarner_latest` absent → "not deployed" rendered, no exception
- `ext_features_latest` fields `None` → `_exf_str()` substitutes `"?"`
- Scanner starts with no prior Arrow files (scan_number starts at 1)
- MHS missing subsystem → RM_META excludes it gracefully

### `actor`

DolphinActor Nautilus integration:

- `on_bar()` incremental step (not batch)
- Threading lock on ACB prevents races
- `_GateSnap` stale-state detection fires within 1 bar
- Capital sync on `on_start()` matches the Nautilus portfolio balance
- MC-Forewarner wired and returning envelope gate signal
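To make the clamp invariant above concrete, a property-style check could look like this. The `acb_boost` function below is a hypothetical stand-in of mine; the real boost logic lives in the sizer:

```python
# Hypothetical stand-in for the ACB boost; the invariant under test is the
# clamp to [0.5, 2.0] for every input, including non-finite ones.
import math
import random

def acb_boost(raw: float) -> float:
    if math.isnan(raw):
        return 1.0  # assumed policy: neutral boost on NaN input
    return min(2.0, max(0.5, raw))

def test_boost_clamped() -> None:
    inputs = [random.uniform(-1e9, 1e9) for _ in range(1000)]
    inputs += [0.0, -0.0, float("inf"), float("-inf"), float("nan")]
    for raw in inputs:
        boost = acb_boost(raw)
        assert 0.5 <= boost <= 2.0, f"boost {boost} out of range for {raw}"
```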
---
## File Location Contract

The file path is **hardcoded** relative to the TUI module:

```python
_RESULTS_FILE = Path(__file__).parent.parent.parent / "run_logs" / "test_results_latest.json"
# resolves to: /mnt/dolphinng5_predict/run_logs/test_results_latest.json
```

Do not move `run_logs/` or rename the file — the TUI footer will silently show stale data.
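Because a stale file fails silently, a freshness check is cheap insurance. A sketch (the helper name is mine; it assumes `_run_at` parses with `datetime.fromisoformat`):

```python
# Sketch: hours elapsed since the last recorded test run, per _run_at.
import json
from datetime import datetime, timezone
from pathlib import Path

RESULTS_FILE = Path("/mnt/dolphinng5_predict/run_logs/test_results_latest.json")

def results_age_hours(path: Path = RESULTS_FILE) -> float:
    run_at = datetime.fromisoformat(json.loads(path.read_text())["_run_at"])
    if run_at.tzinfo is None:  # naive timestamps are taken as UTC
        run_at = run_at.replace(tzinfo=timezone.utc)
    return (datetime.now(timezone.utc) - run_at).total_seconds() / 3600.0
```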
---
## TUI Footer Refresh

The footer is read once on `on_mount()`. To force a live reload:

- Press **`t`** — toggles footer visibility (hide/show re-reads the file)
- Press **`r`** — forces a full panel refresh

The footer does **not** auto-watch the file for changes (no inotify). Press `t` twice after
a test run to see updated results without restarting the TUI.

---
## Bootstrap File

`run_logs/test_results_latest.json` ships with a bootstrap entry (prior manual run results)
so the footer is never blank on first launch:

```json
{
  "_run_at": "2026-04-05T00:00:00",
  "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
  "finance_fuzz": {"passed": null, "total": null, "status": "N/A"},
  "signal_fill": {"passed": null, "total": null, "status": "N/A"},
  "degradation": {"passed": 12, "total": 12, "status": "PASS"},
  "actor": {"passed": null, "total": null, "status": "N/A"}
}
```

---

*See also: `SYSTEM_BIBLE.md` §28.4 — TUI test footer architecture*