# TEST_REPORTING.md

**Audience:** developers writing or extending dolphin test suites
**Last updated:** 2026-04-05


## Overview

`dolphin_tui_v5.py` displays a live test-results footer showing the latest automated test run per category. The footer is driven by a single JSON file:

```
/mnt/dolphinng5_predict/run_logs/test_results_latest.json
```

Any test script, pytest fixture, or CI runner can update this file by calling the `write_test_results()` function exported from `dolphin_tui_v5.py`, or by writing the JSON directly with the correct schema.


## JSON Schema

```json
{
  "_run_at": "2026-04-05T14:30:00",
  "data_integrity":  {"passed": 15, "total": 15, "status": "PASS"},
  "finance_fuzz":    {"passed":  8, "total":  8, "status": "PASS"},
  "signal_fill":     {"passed":  6, "total":  6, "status": "PASS"},
  "degradation":     {"passed": 12, "total": 12, "status": "PASS"},
  "actor":           {"passed": 46, "total": 46, "status": "PASS"}
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `_run_at` | ISO-8601 string | UTC timestamp of the run. Set automatically by `write_test_results()`. |
| `data_integrity` | CategoryResult | Arrow schema + HZ key integrity tests |
| `finance_fuzz` | CategoryResult | Financial edge cases: negative capital, zero price, NaN signals |
| `signal_fill` | CategoryResult | Signal path continuity: vel_div → posture → order |
| `degradation` | CategoryResult | Graceful-degradation tests: missing HZ keys, stale data |
| `actor` | CategoryResult | DolphinActor unit + integration tests |

### CategoryResult

```jsonc
{
  "passed": 15,       // int, or null if not yet run
  "total":  15,       // int, or null if not yet run
  "status": "PASS"    // "PASS" | "FAIL" | "N/A"
}
```

Use `"status": "N/A"` and null counts for categories that are not yet automated.


## Python API — Preferred Method

```python
# At the end of a test run (pytest session, script, etc.)
import sys
sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results

write_test_results({
    "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
    "finance_fuzz":   {"passed":  8, "total":  8, "status": "PASS"},
    "signal_fill":    {"passed":  6, "total":  6, "status": "PASS"},
    "degradation":    {"passed": 12, "total": 12, "status": "PASS"},
    "actor":          {"passed": 46, "total": 46, "status": "PASS"},
})
```

`write_test_results()` does the following atomically:

1. Injects `"_run_at"` = current UTC ISO timestamp
2. Merges the provided categories with any existing file (categories missing from this call are preserved)
3. Writes to `run_logs/test_results_latest.json`

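The merge behavior in step 2 means partial updates are safe: a suite that exercises only one category will not wipe the others' results. Below is a rough, illustrative equivalent of the behavior described above (the real implementation lives in `dolphin_tui_v5.py` and may differ in detail):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

RESULTS = Path("merged_results.json")  # stand-in for run_logs/test_results_latest.json


def merge_write(new_categories: dict) -> dict:
    """Merge new category results into the existing file, stamping _run_at."""
    existing = json.loads(RESULTS.read_text()) if RESULTS.exists() else {}
    existing.update(new_categories)  # categories absent from this call are preserved
    existing["_run_at"] = (
        datetime.now(timezone.utc).replace(tzinfo=None).isoformat(timespec="seconds")
    )
    tmp = RESULTS.with_suffix(".tmp")
    tmp.write_text(json.dumps(existing, indent=2))
    tmp.replace(RESULTS)  # atomic rename: readers never see a half-written file
    return existing


# Two partial updates; the second does not erase the first.
merge_write({"degradation": {"passed": 12, "total": 12, "status": "PASS"}})
merged = merge_write({"actor": {"passed": 46, "total": 46, "status": "PASS"}})
```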
## pytest Integration — `conftest.py` Pattern

Add the following hooks to `conftest.py` at the repo root or in the relevant test package:

```python
# conftest.py
import sys

sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results

_outcomes: dict[str, bool] = {}  # nodeid -> passed?


def pytest_runtest_logreport(report):
    """Record each item's call-phase outcome as it completes."""
    if report.when == "call":
        _outcomes[report.nodeid] = report.passed


def pytest_sessionfinish(session, exitstatus):
    """Push test results to the TUI footer after every pytest run."""
    summary = {}
    for item in session.items:
        cat = _category_for(item)
        summary.setdefault(cat, {"passed": 0, "total": 0})
        summary[cat]["total"] += 1
        if _outcomes.get(item.nodeid, False):
            summary[cat]["passed"] += 1

    for counts in summary.values():
        counts["status"] = "PASS" if counts["passed"] == counts["total"] else "FAIL"

    write_test_results(summary)


def _category_for(item) -> str:
    """Map a test item to a footer category based on its module path."""
    path = str(item.fspath)
    if "data_integrity" in path:
        return "data_integrity"
    if "finance" in path:
        return "finance_fuzz"
    if "signal" in path:
        return "signal_fill"
    if "degradation" in path:
        return "degradation"
    return "actor"   # default bucket (covers "actor" paths too)
```

Capturing per-item outcomes through `pytest_runtest_logreport` is what makes the counts accurate: inferring pass counts from `session.testsfailed` alone would mark every item as passed whenever the session as a whole succeeded.


## Shell / CI Script Pattern

For shell-level CI (Prefect flows, bash scripts):

```bash
#!/bin/bash
source /home/dolphin/siloqy_env/bin/activate

# Run the suite
python -m pytest prod/tests/test_data_integrity.py -v --tb=short
EXIT=$?

# Push the result. Pass/total counts are not recoverable from the exit code,
# so they are left null; only the overall status is reported.
STATUS=$( [ $EXIT -eq 0 ] && echo "PASS" || echo "FAIL" )

python3 - <<EOF
import sys
sys.path.insert(0, "/mnt/dolphinng5_predict/Observability/TUI")
from dolphin_tui_v5 import write_test_results
write_test_results({
    "data_integrity": {"passed": None, "total": None, "status": "$STATUS"}
})
EOF
```

## Test Categories — Definitions

### data_integrity

Verifies structural correctness of data at system boundaries:

- Arrow IPC files match `SCAN_SCHEMA` (27 fields, `schema_version="5.0.0"`)
- HZ keys present and non-empty after scanner startup
- JSON payloads deserialise without error
- Scan number monotonically increases
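As an illustration of the last bullet, a monotonicity check over a sequence of scan records might look like the sketch below (the record layout is hypothetical; the real tests read Arrow IPC files):

```python
def assert_scan_numbers_monotonic(records: list) -> None:
    """Fail if scan_number ever repeats or decreases across consecutive records."""
    for prev, cur in zip(records, records[1:]):
        if cur["scan_number"] <= prev["scan_number"]:
            raise AssertionError(
                f"scan_number not monotonically increasing: "
                f"{prev['scan_number']} -> {cur['scan_number']}"
            )


# Gaps are fine; repeats and decreases are not.
assert_scan_numbers_monotonic([{"scan_number": n} for n in (1, 2, 3, 5)])
```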

### finance_fuzz

Financial edge-case property tests (Hypothesis or manual):

- `AlphaBetSizer` with `capital=0`, `capital<0`, `price=0`, `vel_div=NaN`
- ACB boost clamped to [0.5, 2.0] under all inputs
- Position sizing never produces a quantity < 0
- Fee model: slippage + commission never exceeds gross PnL on the minimum position
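The clamp invariant in the second bullet lends itself to property-style testing. A self-contained sketch, where `clamp_acb_boost` is a stand-in for the real ACB code and the NaN-to-1.0 fallback is an assumption:

```python
import math
import random

ACB_MIN, ACB_MAX = 0.5, 2.0


def clamp_acb_boost(raw: float) -> float:
    """Stand-in clamp: NaN degrades to the neutral boost 1.0 (assumed behavior)."""
    if math.isnan(raw):
        return 1.0
    return max(ACB_MIN, min(ACB_MAX, raw))


# Property check: the clamp holds for random, extreme, and degenerate inputs.
random.seed(0)
samples = [random.uniform(-1e6, 1e6) for _ in range(1000)]
samples += [float("nan"), float("inf"), float("-inf"), 0.0, 1.0]
assert all(ACB_MIN <= clamp_acb_boost(x) <= ACB_MAX for x in samples)
```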

### signal_fill

End-to-end signal path:

- `vel_div < -0.02` → posture becomes APEX → order generated
- `vel_div >= 0` → no new orders
- Signal flows correctly through NDAlphaEngine → DolphinActor → NautilusOrder
- Dedup: the same `scan_number` never generates two orders
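The dedup invariant in the last bullet is cheap to unit-test against a stand-in gate (the `OrderDeduper` class here is illustrative, not the real DolphinActor API):

```python
class OrderDeduper:
    """Illustrative gate: emit at most one order per scan_number."""

    def __init__(self) -> None:
        self._seen = set()

    def should_emit(self, scan_number: int) -> bool:
        """Return True exactly once per distinct scan_number."""
        if scan_number in self._seen:
            return False
        self._seen.add(scan_number)
        return True


# Replayed scan numbers are dropped; fresh ones pass through.
gate = OrderDeduper()
emitted = [n for n in (100, 100, 101, 100, 102) if gate.should_emit(n)]
assert emitted == [100, 101, 102]
```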

### degradation

Graceful degradation under missing/stale inputs:

- TUI renders without crashing when any HZ key is absent
- `mc_forewarner_latest` absent → "not deployed" rendered, no exception
- `ext_features_latest` fields that are `None` → `_exf_str()` substitutes `"?"`
- Scanner starts with no prior Arrow files (`scan_number` starts at 1)
- MHS missing a subsystem → RM_META excludes it gracefully

### actor

DolphinActor Nautilus integration:

- `on_bar()` steps incrementally (not batch)
- Threading lock on ACB prevents races
- `_GateSnap` stale-state detection fires within 1 bar
- Capital sync in `on_start()` matches the Nautilus portfolio balance
- MC-Forewarner wired up and returning the envelope gate signal

## File Location Contract

The file path is hardcoded relative to the TUI module:

```python
_RESULTS_FILE = Path(__file__).parent.parent.parent / "run_logs" / "test_results_latest.json"
# resolves to: /mnt/dolphinng5_predict/run_logs/test_results_latest.json
```

Do not move `run_logs/` or rename the file — the TUI footer will silently show stale data.
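Each `.parent` strips one directory (TUI, then Observability, then the repo root), which can be verified without touching the filesystem:

```python
from pathlib import PurePosixPath

# Mirror of the resolution in dolphin_tui_v5.py, using a pure path for illustration.
module_file = PurePosixPath("/mnt/dolphinng5_predict/Observability/TUI/dolphin_tui_v5.py")
results = module_file.parent.parent.parent / "run_logs" / "test_results_latest.json"
assert str(results) == "/mnt/dolphinng5_predict/run_logs/test_results_latest.json"
```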


The footer is read once, in `on_mount()`. To force a live reload:

- Press `t` — toggles footer visibility (the hide/show cycle re-reads the file)
- Press `r` — forces a full panel refresh

The footer does not auto-watch the file for changes (no inotify). Press `t` twice after a test run to see updated results without restarting the TUI.


## Bootstrap File

`run_logs/test_results_latest.json` ships with a bootstrap entry (results of a prior manual run) so the footer is never blank on first launch:

```json
{
  "_run_at": "2026-04-05T00:00:00",
  "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
  "finance_fuzz":   {"passed": null, "total": null, "status": "N/A"},
  "signal_fill":    {"passed": null, "total": null, "status": "N/A"},
  "degradation":    {"passed": 12, "total": 12, "status": "PASS"},
  "actor":          {"passed": null, "total": null, "status": "N/A"}
}
```

See also: `SYSTEM_BIBLE.md` §28.4 — TUI test footer architecture