**TUI v9**: `/mnt/dolphinng5_predict/Observability/TUI/dolphin_tui_v9.py` — live terminal with trades footer, AE shadow panel, and bucket performance panel.
**ClickHouse**: `http://localhost:8123/` (database: `dolphin`, user: `dolphin`, pass: `dolphin_ch_2026`). BLUE live writes go to `dolphin`. PRODGREEN live writes go to `dolphin_prodgreen`. Green-side readers must filter `strategy IN ('green','prodgreen')` and must not treat legacy BLUE rows in green-side tables as authoritative.
**Adaptive Exit Engine**: Shadow mode active. Per-bucket LR continuation model + SC threshold/gauge surfaces. Logs to `adaptive_exit_shadow` and `sc_*_shadow`. Zero impact on real exits.
**D_LIQ Gold Performance**: ROI=+189.48% | T=2155 | DD=21.31% (full backtest, post vel_div fix, post D_LIQ).
### What changed since v6.0 (2026-04-19 — THIS VERSION)
| Area | Change |
|---|---|
| **ALGO VERSION: v2_gold_fix_v50-v750** | `vel_div` corrected from `v50-v150` → `v50-v750` in `nautilus_event_trader.py`. Deployed 2026-04-10. See §29. |
| **Adaptive Exit Engine — NEW** | `adaptive_exit/` package — per-bucket LR continuation model plus SC threshold / SC gauge shadow surfaces. Shadow mode only (no real exits). Trained on 5yr 1m klines. Integrated into `prod/nautilus_event_trader.py`. See §31. |
| **Asset Bucket System — NEW** | KMeans k=7 buckets by market characteristics. B3 best (WR≈61%, net+$3,858), B1/B4 worst. Live panel in TUI. See §33. |
| **Full path audit** | All file references updated to absolute Linux paths. Windows paths removed from active sections. |
| **ExF NPZ Backfill** | 1658 daily NPZ files (2021-06-15 → 2026-01-12) confirmed on disk: `fng`, `fng_prev`, `funding_btc`, `dvol_btc`, `chg24_btc`. Used for AE training. |
### What changed since v5.0 (2026-04-05 — PREVIOUS)
| Area | Change |
|---|---|
| **NG8 Linux Scanner — NEW** | `- Dolphin NG8/ng8_scanner.py` — Linux-native eigenscan service replacing Windows NG7. Fixes double-output bug. Single `enhance()` call processes all 4 windows (w50/150/300/750) in one pass → exactly one Arrow file + one HZ write per scan_number. See §27. |
| **Arrow Writer Shim — NEW** | `- Dolphin NG8/arrow_writer.py` — thin re-export so `dolphin_correlation_arb512_with_eigen_tracking.py` imports correctly on Linux (Windows had this file natively). |
| **TUI v3 — NEW** | `Observability/TUI/dolphin_tui_v3.py` — full live observability terminal. All panels event-driven via HZ entry listeners. Zero origin-system load. Replaces mocked TUI v2. See §28. |
| **Test Footer CI Hook — NEW** | `run_logs/test_results_latest.json` + `write_test_results()` API in TUI v3. Test scripts push results; TUI footer displays live. See §28.4 and `TEST_REPORTING.md`. |
| **NG7 Double-Output — Root Cause Confirmed** | Windows NG7 ran two independent tracker cycles (fast w50/w150 + slow w300/w750) sharing the same scan_number counter → two Arrow files + two HZ writes per scan, second file arriving ~3 min late with stale prices. NG8 eliminates this by design. |
---
### What changed since v4.1 (2026-03-30 — PREVIOUS)
| Area | Change |
|---|---|
| **Process Manager: Systemd → Supervisord** | ALL dolphin services migrated exclusively to supervisord. No service is managed by both. `dolphin-supervisord.conf` is the single source of process truth. See §16, §26. |
| **"Random Killer" Root Cause Fixed** | `meta_health_daemon_v2.py` had been running under systemd for 4 days calling `systemctl restart` on supervisord-managed services every 5s. Dual-management race caused random service kills. Stopped + disabled. |
| **MHS v3 — Complete Rewrite** | `meta_health_service_v3.py` — product formula bug fixed (zero-collapse replaced by weighted sum), recovery via supervisorctl not systemctl, `RECOVERY_COOLDOWN_CRITICAL_S=10s` (was 600s), non-blocking daemon thread recovery. See §24.5. |
| **OBF Universe Service — NEW** | `obf_universe_service.py` — 540 USDT perp assets on 3 WebSocket connections, zero REST weight, 60s health snapshots → HZ `obf_universe_latest`. Supervisord `autostart=true`. See §26. |
| **OBF Retention Fix** | `obf_persistence.py`: `MAX_FILE_AGE_DAYS = 0` (was 7 — was deleting all backtesting data). Data now accumulates indefinitely for backtesting. |
| **Test Suite: MHS** | NEW `prod/tests/test_mhs_v3.py` — 111 tests: unit, live integration, E2E kill/revive, race conditions, 13 Hypothesis property tests. |
| **HZ Schema additions** | `DOLPHIN_FEATURES["obf_universe_latest"]`, `DOLPHIN_META_HEALTH["latest"]`. See §15. |
| **Multi-Speed Architecture** | NEW multi-layer frequency isolation: OBF (0.1s), Scan (5s), ExtF (varied), Health (5s), Daily batch. See §24. |
| **Event-Driven Nautilus** | NEW `nautilus_event_trader.py` — Hz entry listener for <1 ms scan-to-trade latency. Not a Prefect flow — long-running systemd daemon. See §24.2. |
| **MHS v2** | ENHANCED `meta_health_daemon_v2.py` — Full 5-sensor monitoring (M1-M5), per-subsystem data freshness tracking, automated recovery. See §24.3. |
| **Resource Safety** | NEW systemd resource limits: MemoryMax=2G, CPUQuota=200%, TasksMax=50 per service. Prevents process explosion. |
| **Scan Bridge Hardening** | Deployment concurrency limit=1, work pool concurrency=1, cgroups integration. See §24.1. |
| **Systemd Service Mesh** | NEW services: `dolphin-nautilus-trader.service`, updated `meta_health_daemon.service`. Systemd-managed, not Prefect-managed. |
| **Incident Response** | Post-mortem: 2026-03-24 kernel deadlock from 60+ uncontrolled Prefect processes. Fixed via concurrency controls. |
### What changed since v3 (2026-03-22)
| Area | Change |
|---|---|
| **Clean Architecture** | NEW hexagonal architecture in `prod/clean_arch/` — Ports, Adapters, Core separation. Adapter-agnostic business logic. |
| **Hazelcast DataFeed** | NEW `HazelcastDataFeed` adapter implementing `DataFeedPort` — reads from DolphinNG6 via Hazelcast (single source of truth). |
| **Scan Bridge Service** | NEW `scan_bridge_service.py` — Linux Arrow file watcher that pushes to Hazelcast. Uses file mtime (not scan #) to handle NG6 restarts. **Phase 2: Prefect daemon integration complete** — auto-restart, health monitoring, unified logging. **18 unit tests** in `tests/test_scan_bridge_prefect_daemon.py`. |
| **Paper Trading Engine** | NEW `paper_trade.py` — Clean architecture trading CLI with 23 round-trip trades executed in testing. |
| **Market Data** | Live data flowing: 50 assets, BTC @ $71,281.03, velocity divergence signals active. |
### What changed since v2 (2026-03-22)
| Area | Change |
|---|---|
| **Binance Futures** | Switched system focus from Spot to Perpetuals; updated API endpoints (`fapi.binance.com`); added `recvWindow` for signature stability. |
| **Friction Management** | **SP Bypass Logic**: Alpha engines now support disabling internal fees/slippage to allow Nautilus to handle costs natively. Prevents double-counting. |
| **Paper Trading** | NEW `launch_paper_portfolio.py` — uses Sandbox matching with live Binance data; includes realistic Tier 0 friction (0.02/0.05). |
| **Session Logging** | NEW `TradeLoggerActor` — independent CSV/JSON audit trails for every session. |
| Area | Change |
|---|---|
| **DolphinActor** | Refactored to step_bar() API (incremental, not batch); threading.Lock on ACB; _GateSnap stale-state detection; replay vs live mode; bar_idx tracking |
| **0.1s resolution** | Assessed: BLOCKED by 3 hard blockers (see §22) |
| **Capital Sync** | NEW — DolphinActor now syncs initial_capital with Nautilus Portfolio balance on_start. |
| **Verification** | NEW — `TODO_CHECK_SIGNAL_PATHS.md` systematic test spec for local agents. |
| **MC-Forewarner** | Now wired in `DolphinActor.on_start()` — both flows run full gold-performance stack; `_MC_BASE_CFG` + `_MC_MODELS_DIR_DEFAULT` as frozen module constants; empty-parquet early-return bug fixed in `on_bar` replay path |
32. [Asset Bucket System (v7.0)](#32-asset-bucket-system)
---
## 1. SYSTEM PHILOSOPHY
DOLPHIN-NAUTILUS is a **SHORT-only** (champion configuration) systematic trading engine targeting crypto perpetual futures on Binance.
**Core thesis**: When crypto market correlation matrices show accelerating eigenvalue-velocity divergence (`vel_div < -0.02`), the market is entering an instability regime. Shorting during early instability onset and exiting at fixed take-profit captures the mean-reversion from panic to normalization.
**Design constraints**:
- Zero signal re-implementation in the Nautilus layer. All alpha logic lives in `NDAlphaEngine`.
- 512-bit arithmetic for correlation matrix processing (separate NG3 pipeline; not in hot path of this engine).
- Champion parameters are FROZEN. They were validated via exhaustive VBT backtest on `dolphin_vbt_real.py`.
- The Nautilus actor is a thin wire, not a strategy. It routes parquet data → NDAlphaEngine → HZ result.
**Key invariant v2**: `DolphinActor.on_bar()` receives one synthetic bar per date in paper mode, which triggers `engine.begin_day()` then iterates through all parquet rows via `step_bar()`. In live mode, one real bar → one `step_bar()` call. The `_processed_dates` guard is replaced by date-boundary detection comparing `current_date` to the bar's timestamp date.
---
## 2a. CLEAN ARCHITECTURE LAYER (NEW v4)
### 2a.1 Overview
The Clean Architecture layer provides a **hexagonal** (ports & adapters) implementation for paper trading, ensuring core business logic is independent of infrastructure concerns.
**Dependency Rule**: Dependencies only point inward. Core knows nothing about Hazelcast, Arrow files, or Binance.
**Single Source of Truth**: All data comes from Hazelcast `DOLPHIN_FEATURES.latest_eigen_scan`, written atomically by DolphinNG6.
**File Timestamp vs Scan Number**: The Scan Bridge uses file modification time (mtime) instead of scan numbers because DolphinNG6 resets counters on restarts.
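
A minimal sketch of mtime-based detection, assuming scans land as `*.arrow` files in a single directory. The function name `newest_unseen_scan` and its return shape are hypothetical and are not taken from `scan_bridge_service.py`; the point is only that ordering by mtime survives a counter reset, where ordering by scan number would not.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def newest_unseen_scan(scan_dir: Path, last_mtime: float) -> Tuple[Optional[Path], float]:
    """Return the newest *.arrow file modified after last_mtime.

    If nothing newer exists, return (None, last_mtime) unchanged.
    The scan number embedded in the filename is deliberately ignored.
    """
    newest, newest_mtime = None, last_mtime
    for path in scan_dir.glob("*.arrow"):
        mtime = path.stat().st_mtime
        if mtime > newest_mtime:
            newest, newest_mtime = path, mtime
    return newest, newest_mtime
```

With this ordering, a file named `scan_000001_*.arrow` written after a restart still wins over an older `scan_000900_*.arrow`.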
### 2a.3 Components
| Component | File | Purpose |
|-----------|------|---------|
| `DataFeedPort` | `ports/data_feed.py` | Abstract interface for market data |
| `HazelcastDataFeed` | `adapters/hazelcast_feed.py` | Hz implementation of DataFeedPort |
| `TradingEngine` | `core/trading_engine.py` | Pure business logic |
IRP = **Impulse Response Profiling**. Ranks all available assets by historical behavior over the last 50 bars in the regime direction. Selects the asset with the highest ARS (Asset Ranking Score) that passes all filters.
Blended taker/maker fee rates based on historical SP fill statistics. **IMPORTANT**: In production/paper sessions using Nautilus friction, these MUST be disabled via `use_sp_fees=False`.
Also disabled when `use_sp_slippage=False` is passed to the engine. These were used to "re-approximate" fills in low-fidelity simulations. In paper/live trading, the matching engine provides the fill price directly.
The OB layer is wired in via `engine.set_ob_engine(ob_engine)` which propagates to signal_gen, asset_selector, and exit_manager. It is OPTIONAL — the engine degrades gracefully to legacy Monte Carlo when `ob_engine=None`.
```python
self._pending_acb = None  # atomic consume under lock
if pending is not None and self.engine is not None:
    boost = float(pending.get('boost', 1.0))
    beta = float(pending.get('beta', 0.0))
    self.engine.update_acb_boost(boost, beta)
```
**v2 vs v1**: v1 relied on GIL for safety (bare dict assignment). v2 uses explicit `threading.Lock` — correct even if GIL is removed in future Python versions. Lock hold time is minimized to a single pointer swap.
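
The lock-guarded swap can be illustrated in isolation. This is a sketch under assumptions: the class `PendingACB` and its `publish` / `consume` methods are hypothetical names, not the actual `DolphinActor` attributes. It shows only the pattern of taking the pending update and clearing the slot inside one short critical section.

```python
import threading
from typing import Optional


class PendingACB:
    """Hypothetical single-slot mailbox for the latest ACB update."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Optional[dict] = None

    def publish(self, update: dict) -> None:
        # Writer side (e.g. a listener thread): overwrite any unconsumed update.
        with self._lock:
            self._pending = update

    def consume(self) -> Optional[dict]:
        # Reader side: take the update and clear the slot in one critical section,
        # so the lock is held only for the reference swap.
        with self._lock:
            pending, self._pending = self._pending, None
        return pending
```

Any work on the consumed payload (parsing `boost` / `beta`, calling the engine) happens after the lock is released.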
### 14.3 _GateSnap — Stale-State Detection
New in v2. Detects when ACB boost, posture, or MC gate changes between the pre-step and post-step snapshot:
- **`actor.log` is read-only** (Rust-backed Cython property). Never try to assign `actor.log = MagicMock()` in tests — use the real Nautilus logger instead.
- **`actor.posture`** is a regular Python attribute (writable in tests).
- **`actor.engine`** is set in `on_start()`. Tests can set directly after `__init__`.
---
## 15. HAZELCAST — FULL IMAP SCHEMA
Hazelcast is the **system memory**. All subsystem state flows through it. Every consumer must treat HZ maps as authoritative real-time sources.
| `DOLPHIN_PNL_BLUE` | `"YYYY-MM-DD"` | JSON daily result `{pnl, capital, trades, boost, beta, mc_status, posture, stale_state?}` | `paper_trade_flow`, `DolphinActor._write_result_to_hz`, `nautilus_prefect_flow` | Analytics | stale_state=True means DO NOT use for live orders |
| `DOLPHIN_PNL_GREEN` | `"YYYY-MM-DD"` | JSON daily result | `paper_trade_flow` (green) | Analytics | GREEN config only |
| `DOLPHIN_STATE_BLUE` | `"latest"` | JSON `{strategy, capital, date, pnl, trades, peak_capital, drawdown, engine_state, updated_at}` | `paper_trade_flow` | `paper_trade_flow` (capital restore) | Full engine_state for position continuity |
```python
shard_idx = sum(ord(c) for c in symbol) % SHARD_COUNT
imap_name = f"DOLPHIN_FEATURES_SHARD_{shard_idx:02d}"  # ..._00 through ..._09
```
Routing is **stable** (sum-of-ord, not `hash()`) — deterministic across Python versions and process restarts. 400+ assets distribute evenly across 10 shards.
No static asset list required — adapts automatically as OBF flow adds/removes assets.
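
As a worked example of the routing rule, the sketch below wraps the two lines in a function. The wrapper name `shard_map_name` is hypothetical, and `SHARD_COUNT = 10` is inferred from the `_00` through `_09` naming.

```python
SHARD_COUNT = 10  # inferred from the ..._00 through ..._09 map names


def shard_map_name(symbol: str) -> str:
    """Map a symbol to its shard IMap name using the sum-of-ord rule."""
    shard_idx = sum(ord(c) for c in symbol) % SHARD_COUNT
    return f"DOLPHIN_FEATURES_SHARD_{shard_idx:02d}"


# "BTCUSDT": 66+84+67+85+83+68+84 = 537 -> 537 % 10 = 7
# "ETHUSDT": 69+84+72+85+83+68+84 = 545 -> 545 % 10 = 5
```

Because the rule depends only on character codes, the same symbol always lands on the same shard, whereas Python's built-in `hash()` on strings is randomized per process unless `PYTHONHASHSEED` is fixed.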
### 15.5 CP Subsystem (ACB Processor)
`acb_processor_service.py` uses `HZ CP FencedLock` to prevent simultaneous ACB writes from multiple instances. CP Subsystem must be enabled in `docker-compose.yml`. All writers must use the same CP lock name to get protection.
### 15.6 OBF Circuit Breaker (HZ Push)
After 5 consecutive HZ push failures, OBF flow opens a circuit breaker and switches to file-only mode (`ob_cache/latest_ob_features.json`). Consumers should prefer the JSON file during HZ outages.
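
A minimal sketch of the failure-counting part of such a breaker, assuming that any successful push resets the counter. The class name `PushCircuitBreaker` is hypothetical, and the sketch does not model a HALF-OPEN probe state or the file-only fallback itself.

```python
class PushCircuitBreaker:
    """Opens after `threshold` consecutive failures; a success closes it again."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        # Open means: skip the HZ push and rely on the file output only.
        return self.consecutive_failures >= self.threshold

    def record(self, success: bool) -> None:
        # Only *consecutive* failures count, so one success resets the streak.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1
```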
---
## 16. PRODUCTION DAEMON TOPOLOGY
> **v5.0 NOTE**: ALL services are managed exclusively by **supervisord**. No service is managed by systemd. The `meta_health_daemon.service`, `dolphin-nautilus-trader.service`, and `dolphin-scan-bridge.service` systemd units are stopped and disabled. Any attempt to re-enable them will create a dual-management race condition ("random killer" bug — see §26.1).
### 20.4 OBF Live Data Gap — KNOWN LIMITATION (2026-03-26)
> **CRITICAL DATA QUALITY CAVEAT**: `nautilus_event_trader.py` (live event trader) is currently wired to `MockOBProvider` with static per-asset imbalance biases (BTC=-0.086, ETH=-0.092, BNB=+0.05, SOL=+0.05). All four OBF functional dimensions compute and produce real outputs — but with frozen, market-unresponsive inputs. The OB cascade regime will always be CALM (no depth drain in mock data).
>
> `HZOBProvider` (`/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/hz_ob_provider.py`) exists and is format-compatible with `obf_prefect_flow.py`'s HZ output, but `OBFeatureEngine` has no live streaming path — only `preload_date()` (batch/backtest). A `step_live()` method must be added before the switch.
`tests/test_obf_unit.py` — ~120 unit tests covering all hardening items:
- Circuit breaker state machine (CLOSED → OPEN → HALF-OPEN)
- Crossed-book guard triggers on malformed data
- Dark streak threshold detection
- Warmup period gating
- Background thread non-blocking behavior
- Asset discovery via HZ key scan
---
## 21. KNOWN RESEARCH TODOs
| ID | Description | Priority |
|----|-------------|----------|
| TODO-1 | Calibrate `vd_enabled` adverse-turn exits (currently disabled). Requires analysis of trade vel_div distribution at entry vs. subsequent bars. True invalidation threshold likely ~+0.02 sustained for N=3 bars. | MEDIUM |
| TODO-2 | Validate SUBDAY_ACB force-exit threshold (`old_boost >= 1.25 and boost < 1.10`). Currently ARBITRARY — agent-chosen, not backtest-derived. | MEDIUM |
| TODO-3 | MIG8: Binance live adapter (real order execution). OUT OF SCOPE until after 30-day paper trading validation. | LOW |
| TODO-4 | 48-hour chaos test with all daemons running simultaneously. Watch for: KeyError, stale-read anomalies, concurrent HZ writer collisions. | HIGH (before live capital) |
| TODO-5 | Memory profiler with IRP enabled at 400 assets (current 71 MB measurement was without IRP). Projected ~600 MB — verify. | LOW |
| TODO-6 | TF-spread recovery exits (`tf_enabled=False`). Requires sweep of tf_exhaust_ratio and tf_flip_ratio vs. champion backtest. | LOW |
| TODO-7 | GREEN (LONG) posture paper validation. LONG thresholds (long_threshold=0.01, long_extreme=0.04) not yet production-validated. | MEDIUM |
| TODO-8 | ~~ML-MC Forewarner injection into `nautilus_prefect_flow.py`.~~ **DONE 2026-03-22** — wired in `DolphinActor.on_start()` for both flows. | CLOSED |
| TODO-9 | Live TradingNode integration (launcher.py exists; Binance adapter config incomplete). Requires 30-day clean paper run first. | LOW |
| TODO-10 | BingX futures private-WS `SNAPSHOT` burst absorption. On connect/reconnect, absorb the initial futures `SNAPSHOT` flood into account/config caches, gate `ws_primed` readiness on snapshot drain, and suppress false drift / excess REST polling during the burst. Treat as startup/reconnect performance work, not fill-truth logic. | MEDIUM |
| TODO-11 | Dual-shadow regime sampler for side selection. Run two ultra-light shadow engines in real time over recent sample trades: (A) basal SHORT Alpha Engine posture and (B) basal LONG posture. Use their relative WR / ROI-per-trade / drawdown asymmetry as a regime probe: SHORT down + LONG up → LONG-favorable; LONG down + SHORT up → SHORT-favorable; both up → permissive; both down → likely choppy / abstain. Treat this initially as a shadow-only market-sampling / regime-detection layer. Later, cross the shadow streams with market fingerprints so a learner can predict or simplify the switch logic. The first persistence pass on extant trades found only mild short-loss clustering, so the live switch should be hysteresis-gated, not a raw flip-on-first-loss rule. See `LONG_DETERMINISTIC_RULE_RESEARCH.md` for the measured flip-after-loss counterfactual. | MEDIUM |
| BUG-1 | **V7 `_max_hold_ref` decoupled from actual MAX_HOLD.** `alpha_exit_v7_engine.py:491` computes `_max_hold_ref = self._3m_bars * 3 = 48 bars` (from `bar_duration_sec=11.0`). The MAE-D time-pressure ramps from bar 29 and saturates at bar 48 — only ~9 minutes into a trade whose real MAX_HOLD is 125 bars (OB-halved). Effect: V7 is **over-eager on adverse-excursion trades** (mae > 0.3% after bar 29 gets time-pressure that should belong at bar 75+). No effect on winning trades (mae too low to trigger the gate). Fix: derive `_max_hold_ref` from the orchestrator's effective `max_hold_bars` (post OB-halving) rather than `_3m_bars * 3`. | MEDIUM |
| BUG-2 | **OB dynamic max_hold adjustments discard per-trade `max_hold_override`.** `alpha_exit_manager.py:135,147,152,157` all multiply `self.max_hold_bars` (global default) instead of the per-trade `dynamic_max_hold`. If a `max_hold_override` is set via `setup_position()`, the cascade/withdrawal/convexity adjustments silently replace it with the global-based computation. Currently latent (no overrides in use), but will bite if per-trade hold tuning is ever deployed. Fix: multiply from `dynamic_max_hold` (already resolved from override at line 120) instead of `self.max_hold_bars`. | LOW |
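
The BUG-1 arithmetic can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name `time_pressure`, the linear ramp shape, and the 60% ramp-start fraction (inferred from "bar 29 of 48" and "bar 75 of 125") are not taken from `alpha_exit_v7_engine.py`.

```python
def time_pressure(bar: int, max_hold_ref: float, ramp_start_frac: float = 0.6) -> float:
    """Hypothetical linear time-pressure ramp in [0, 1].

    Zero until ramp_start_frac * max_hold_ref, then rising linearly,
    saturating at 1.0 once bar reaches max_hold_ref.
    """
    start = ramp_start_frac * max_hold_ref
    if bar <= start:
        return 0.0
    if bar >= max_hold_ref:
        return 1.0
    return (bar - start) / (max_hold_ref - start)


# With the decoupled reference (48 bars), pressure starts at bar 29 and is full at bar 48.
# With the effective hold (125 bars), bar 48 still carries no pressure at all.
```

Under these assumptions, the same trade at bar 48 sees full pressure with the 48-bar reference and none with the 125-bar reference, which is the over-eagerness the bug entry describes.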
---
## 22. 0.1S RESOLUTION — READINESS ASSESSMENT
**Assessment date**: 2026-03-22. **Status: BLOCKED — 3 hard blockers.**
The current system processes 5s OHLCV bars. Upgrading to 0.1s tick resolution requires resolving all three blockers below before any code changes.
### 22.1 Blocker 1 — Async HZ Push
**Problem**: The OBF hot loop fires at ~100ms cadence. At 0.1s resolution, the per-bar HZ write latency (currently synchronous in feature compute path, despite fire-and-forget for the push itself) would exceed bar cadence, causing HZ write queue growth and eventual OOM.
**Required**: Full async HZ client (`hazelcast-python-client` async API or aiohazelcast). Currently all HZ operations are synchronous blocking calls. Estimated effort: 2–3 days of refactor + regression testing.
### 22.2 Blocker 2 — `get_depth` Timeout
**Problem**: `get_depth()` in `HZOBProvider` issues a synchronous HZ `IMap.get()` call with a 500ms timeout. At 0.1s resolution, each bar would wait up to 500ms for OB depth data — 5× the bar cadence. This makes 0.1s resolution impossible without an in-process depth cache.
**Required**: Pre-fetched depth cache (e.g., local dict refreshed by a background subscriber), making `get_depth()` a pure in-process read with <1 µs latency. Estimated effort: 1–2 days.
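
One possible shape for such a cache, sketched under assumptions: the class name `DepthCache` and the `on_update` callback are hypothetical, and wiring `on_update` to an actual HZ subscription is left out. The read path touches only a local dict and never waits on the network.

```python
import threading
from typing import Dict, Optional


class DepthCache:
    """Hypothetical in-process depth cache fed by a background subscriber."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._depth: Dict[str, dict] = {}

    def on_update(self, symbol: str, depth: dict) -> None:
        # Called from the subscriber thread whenever fresh depth arrives.
        with self._lock:
            self._depth[symbol] = depth

    def get_depth(self, symbol: str) -> Optional[dict]:
        # Hot-path read: local dict lookup only, returns None if never seen.
        with self._lock:
            return self._depth.get(symbol)
```

A production version would also need a staleness check so that a silent subscriber does not leave old depth in place indefinitely.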
### 22.3 Blocker 3 — Lookback Recalibration
**Problem**: All champion parameters that reference "bars" were validated against 5s bars:
- `lookback=100` (100 × 5s = 500s warmup)
- `max_hold_bars=120` (120 × 5s = 600s max hold)
- `dc_lookback_bars=7` (7 × 5s = 35s DC window)
At 0.1s resolution, the same bar counts would mean 10s warmup, 12s max hold, 0.7s DC window — **completely invalidating champion params**. All params must be re-validated from scratch via VBT backtest at 0.1s resolution.
**Required**: Full backtest sweep at 0.1s. Estimated effort: 1–2 weeks of compute + validation time. This is a research milestone, not an engineering task.
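
The wall-clock arithmetic behind this blocker can be checked with a few lines. The dictionary and function names are hypothetical; the bar counts are the three listed above.

```python
# Bar counts validated at 5s bars (from the list above).
CHAMPION_BAR_PARAMS = {"lookback": 100, "max_hold_bars": 120, "dc_lookback_bars": 7}


def wall_clock_seconds(bar_seconds: float) -> dict:
    """Convert each bar-count parameter into seconds for a given bar duration."""
    return {name: bars * bar_seconds for name, bars in CHAMPION_BAR_PARAMS.items()}


at_5s = wall_clock_seconds(5.0)    # 500s warmup, 600s max hold, 35s DC window
at_100ms = wall_clock_seconds(0.1)  # roughly 10s, 12s and 0.7s
```

The conversion shows only how far the time horizons shrink; it is not a recipe for rescaling the parameters, which still require the full backtest sweep.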
### 22.4 Assessment Summary
| Blocker | Effort | Dependency |
|---------|--------|------------|
| Async HZ push | 2–3 days engineering | None — can start now |
| `get_depth` cache | 1–2 days engineering | None — can start now |
| Lookback recalibration | 1–2 weeks research | Requires blockers 1+2 resolved first |
**Recommendation**: Do NOT attempt 0.1s resolution until after 30-day paper trading validation at 5s. The engineering blockers can be prototyped in parallel, but champion params cannot be certified until post-paper-run stability is confirmed.
## 23. SIGNAL PATH VERIFICATION SPECIFICATION
Testing the asynchronous, multi-scale signal path requires systematic validation of the data bridge and cross-layer trigger logic.
### 23.1 Verification Flow
A local agent (Prefect or standalone) should verify:
1. **Micro Ingestion**: 100ms OB features sharded across 10 HZ maps.
2. **Regime Bridge**: NG5 Arrow scan detection by `scan_hz_bridge.py` and push to `latest_eigen_scan`.
3. **Strategy Reactivity**: `DolphinActor.on_bar` (5s) pulling HZ data and verifying `scan_number` idempotency.
4. **Macro Safety**: Survival Stack Rm-computation pushing `APEX/STALKER/HIBERNATE` posture to `DOLPHIN_SAFETY`.
### 23.2 Reference Document
Full test instructions, triggers, and expected values are defined in:
`TODO_CHECK_SIGNAL_PATHS.md` (Project Root)
---
*End of DOLPHIN-NAUTILUS System Bible v3.0 — 2026-03-23*
*Champion: SHORT only (APEX posture, blue configuration)*
*Automation: Prefect-supervised paper trading active.*
*Status: Capital Sync enabled; Friction SP-bypass active; TradeLogger running.*
*Do NOT deploy real capital until 30-day paper run is clean.*
The DOLPHIN system has been re-architected from a **single-speed batch-oriented Prefect deployment** to a **multi-speed, event-driven, multi-worker architecture** with proper resource isolation and self-healing capabilities.
**Problem Solved**: 2026-03-24 system outage caused by uncontrolled Prefect process explosion (60+ `prefect.engine` zombies → resource exhaustion → kernel deadlock).
**Current Status**: Running directly (PID 158929) due to Prefect worker scheduling issues.
### 24.5 Meta Health Service v3 (MHS) — REWRITTEN v5.0
> **MHS v2 is retired.** `meta_health_daemon_v2.py` was calling `systemctl restart` on supervisord-managed processes — this was the "random killer" bug. v3 is the canonical implementation.
### 26.1 The "Random Killer" Bug — Root Cause & Fix
**Incident**: Services were being unexpectedly killed and restarted at seemingly random intervals. The system appeared healthy according to supervisord but processes would die without obvious cause.
**Root cause** (diagnosed 2026-03-30):
1. `meta_health_daemon_v2.py` had been running under `meta_health_daemon.service` (systemd) for 4+ days.
2. MHS v2's process patterns (`exf_prefect_final`, `esof_prefect_flow`) did not match any running process → M1=0 → `rm_meta = M1*M2*M3*M4*M5 = 0` always → status="DEAD".
3. MHS v2 recovery action: `systemctl restart <service>` — called every 5s.
4. But the services were supervisord-managed, not systemd-managed. `systemctl restart` on a supervisord process:
- Sends SIGTERM to the process (it dies)
- Supervisord detects the death and autostarts a new instance
- Creates brief duplicate processes, interleaved with MHS v2's next kill cycle
5. Additionally, `dolphin-nautilus-trader.service` (systemd) AND supervisord were both managing `nautilus_event_trader.py` simultaneously — two PIDs running at once.
**Permanent guard**: `test_mhs_v3.py::TestKillAndRevive::test_no_systemd_units_active_for_managed_services` asserts no conflicting systemd units are active.
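
The zero-collapse in step 2 is easy to reproduce. The sketch below contrasts the product formula with a weighted sum; the function names and the equal weights are assumptions for illustration only and are not the actual MHS v3 weights.

```python
from typing import Dict


def rm_meta_product(sensors: Dict[str, float]) -> float:
    """v2-style aggregate: any single zero sensor forces the result to zero."""
    result = 1.0
    for value in sensors.values():
        result *= value
    return result


def rm_meta_weighted(sensors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted-sum aggregate: one failed sensor lowers the score proportionally."""
    total = sum(weights[name] for name in sensors)
    return sum(weights[name] * value for name, value in sensors.items()) / total


sensors = {"M1": 0.0, "M2": 1.0, "M3": 1.0, "M4": 1.0, "M5": 1.0}
equal_weights = {name: 1.0 for name in sensors}  # hypothetical weights
```

With `M1 = 0` and all other sensors healthy, the product reports 0 (hence the permanent "DEAD" status), while the equal-weight sum reports 0.8.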
### 26.2 OBF Universe Service
**Purpose**: Lightweight L2 order book health monitor for ALL 540 active USDT perpetuals on Binance Futures.
**Why**: Asset Picker needs OB health scores for the full universe (540 assets) to make informed selection decisions, not just the 400 assets covered by the existing OBF shard store.
**Design**: Push streams (zero REST weight), no polling.
```
wss://fstream.binance.com/ws
Connection 1: 200 symbols ×@depth5@500ms
Connection 2: 200 symbols ×@depth5@500ms
Connection 3: 140 symbols ×@depth5@500ms
(total: 540, Binance limit: 300/conn)
```
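
The three-connection layout above amounts to chunking the symbol list. A minimal sketch, assuming 200 symbols per connection as in the layout; the function names are hypothetical and the WebSocket handling itself is omitted.

```python
from typing import List


def chunk_symbols(symbols: List[str], per_conn: int = 200) -> List[List[str]]:
    """Split the universe into per-connection groups of at most per_conn symbols."""
    return [symbols[i:i + per_conn] for i in range(0, len(symbols), per_conn)]


def stream_names(symbols: List[str]) -> List[str]:
    """Build lowercase partial-depth stream names, e.g. 'btcusdt@depth5@500ms'."""
    return [f"{s.lower()}@depth5@500ms" for s in symbols]


universe = [f"SYM{i}USDT" for i in range(540)]  # placeholder symbols
groups = chunk_symbols(universe)
```

For a 540-symbol universe this yields groups of 200, 200 and 140 symbols.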
**Computed metrics per asset** (every 60s snapshot):
| Field | Description |
|---|---|
| `spread_bps` | (ask - bid) / mid × 10000 |
| `depth_1pct_usd` | Total USD volume within 1% of mid on both sides |
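
The two metrics in the table can be computed from a book snapshot roughly as follows. The `(price, qty)` level format with best level first, and both function signatures, are assumptions rather than the service's actual interface.

```python
from typing import List, Tuple

Level = Tuple[float, float]  # (price, quantity), best level first (assumed format)


def spread_bps(bid: float, ask: float) -> float:
    """(ask - bid) / mid x 10000."""
    mid = (bid + ask) / 2.0
    return (ask - bid) / mid * 10000.0


def depth_1pct_usd(bids: List[Level], asks: List[Level]) -> float:
    """Total notional (price x qty) resting within 1% of mid on both sides."""
    mid = (bids[0][0] + asks[0][0]) / 2.0
    lo, hi = mid * 0.99, mid * 1.01
    bid_usd = sum(p * q for p, q in bids if p >= lo)
    ask_usd = sum(p * q for p, q in asks if p <= hi)
    return bid_usd + ask_usd
```

Note that a `@depth5` stream carries only five levels per side, so the depth figure is bounded by whatever those levels contain.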
**Bug (v4.1)**: `MAX_FILE_AGE_DAYS = 7` — every daily cleanup run deleted all OBF Parquet data older than 7 days, destroying the entire backtesting dataset.
**Fix (v5.0)**:
```python
MAX_FILE_AGE_DAYS = 0 # 0 = disabled — never prune, accumulate for backtesting
def _cleanup_old_partitions(self):
"""0 = disabled."""
if not MAX_FILE_AGE_DAYS or not self.base_dir.exists():
return
...
```
Data now accumulates indefinitely in `/mnt/ng6_data/ob_features/` (existing OBF) and `/mnt/ng6_data/ob_universe/` (new universe service).
---
## 27. NG8 LINUX EIGENSCAN SERVICE
**File**: `- Dolphin NG8/ng8_scanner.py`
**Status**: Built, smoke-tested. Replaces Windows NG7 eigenscan.
Windows NG7 maintained two independent tracker cycles:
- **Fast cycle** (w50, w150): completed ~11s after scan start → wrote Arrow file 1, HZ write 1
- **Slow cycle** (w300, w750): completed ~3 min later with **stale BTC price** → wrote Arrow file 2, HZ write 2
Both cycles shared the same `scan_number` counter. Result: two Arrow files per logical scan, the second containing stale prices from 3 minutes earlier. The scan bridge de-duplicated by file mtime (file 1 is always the useful one).
### 27.2 NG8 Fix: Single `enhance()` Pass
`DolphinCorrelationEnhancerArb512.enhance()` processes all four windows (50, 150, 300, 750) in a single sequential loop. NG8 calls this once per scan cycle:
```python
result = self.engine.enhance(price_data, PRIORITY_SYMBOLS, now)
# result.multi_window_results has all four windows populated
# Exactly one Arrow write + one HZ write follows
```
`use_arrow=False` is passed to the engine constructor so the engine does **not** perform its own internal Arrow write — `ng8_scanner.py` owns that write exclusively.
### 27.3 Schema Contract (Doctrinal NG5)
Arrow IPC schema is defined in `ng7_arrow_writer_original.py` → `SCAN_SCHEMA` (27 fields, `SCHEMA_VERSION="5.0.0"`). `arrow_writer.py` is a thin re-export shim:
**Critical**: pass `get_arrow_scans_path().parent` (= `/mnt/dolphinng6_data`) — NOT `get_arrow_scans_path()` — or the writer creates `arrow_scans/arrow_scans/` double-nesting.
### 27.5 Hazelcast Output
Map: `DOLPHIN_FEATURES` → key `latest_eigen_scan`
**NG8 flat payload** (written by NG8, differs from NG7 nested payload):
```python
{
"scan_number": int,
"timestamp": "ISO-8601",
"bridge_ts": float, # Unix epoch at HZ write
"vel_div": float,
"w50_velocity": float,
"w150_velocity": float,
"w300_velocity": float,
"w750_velocity": float,
"eigenvalue_gradients": {...},
"multi_window_results": {...}, # full per-window stats
}
```
TUI v3 `_eigen_from_scan()` normalises both NG7 nested and NG8 flat formats transparently.
### 27.6 Scan Number Continuity
On startup, `_load_last_scan_number(arrow_scans_dir)` scans all `scan_NNNNNN_*.arrow` filenames for the highest N and resumes from N+1. Prevents counter reset gaps after service restart.
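
A sketch of the resume logic, assuming all scan files sit directly in one directory and that an empty directory should start at 1. The function name `next_scan_number` is hypothetical and is not the signature of `_load_last_scan_number`.

```python
import re
from pathlib import Path

# Matches scan_NNNNNN_<anything>.arrow and captures the numeric part.
_SCAN_RE = re.compile(r"^scan_(\d+)_.*\.arrow$")


def next_scan_number(arrow_scans_dir: Path) -> int:
    """Return highest scan number found in filenames plus one (1 if none exist)."""
    highest = 0
    for path in arrow_scans_dir.glob("scan_*.arrow"):
        match = _SCAN_RE.match(path.name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1
```

If scans are stored in nested date directories, `rglob` would be needed in place of `glob`.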
### 27.7 Symbol List
50 symbols matching doctrinal NG3/NG5/NG7 `PRIORITY_SYMBOLS`. Do NOT change this list without a full schema migration — historical correlation matrices are computed on this exact universe.
Set `autostart=true` only after confirming Windows NG7 is shut down — dual-write to the same HZ key is safe (last-write-wins) but creates confusing Arrow audit trails.
`IMap.add_entry_listener(include_value=True, updated_func=fn, added_func=fn)` fires callbacks from the HZ internal thread pool on any map change. No polling of origin systems.
Prefect is the **only** polled source — 60s interval via `run_worker(prefect_poll_loop())`.
`write_test_results()` atomically writes `_run_at` (current UTC ISO timestamp) + the provided category dict. The TUI footer auto-refreshes on next mount or `t` keypress.
Full integration documentation: `prod/docs/TEST_REPORTING.md`.
### 28.6 NG7 / NG8 Dual Format Normalisation
`_eigen_from_scan(scan)` handles both live HZ formats:
MC-Forewarner writes to `DOLPHIN_FEATURES` key `mc_forewarner_latest`. The TUI entry listener fires on each write and populates the full MC footer panel: `catastrophic_prob` Digits + ProgressBar, `envelope_score` bar, prob sparkline history, `source` label (`REAL_MODEL` / `FALLBACK_NO_DATA` / `FALLBACK_ERROR`).
If the TUI starts between 4-hour runs and HZ has never been written to (e.g., fresh HZ instance), the footer shows `"awaiting HZ data (runs every 4h via Prefect)"` in yellow. This is a cold-start state only — once the first Prefect run completes the key persists in HZ indefinitely (no TTL).
**Thresholds**: GREEN `prob < 0.10` · ORANGE `0.10–0.30` · RED `≥ 0.30`
**Models path**: `nautilus_dolphin/mc_results/models/*.pkl` — if absent, falls back to `FALLBACK_NO_DATA` (ORANGE, prob=0.20, env=0.80) which is a safe conservative posture, never random.
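
The threshold bands map to a simple classifier. The function name `mc_status_colour` is hypothetical; the band edges are the ones listed above.

```python
def mc_status_colour(catastrophic_prob: float) -> str:
    """Classify catastrophic_prob: GREEN < 0.10, ORANGE 0.10-0.30, RED >= 0.30."""
    if catastrophic_prob < 0.10:
        return "GREEN"
    if catastrophic_prob < 0.30:
        return "ORANGE"
    return "RED"
```

The `FALLBACK_NO_DATA` value of `prob=0.20` falls in the ORANGE band, consistent with the description above.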
### 28.8 DOLPHIN_PNL_BLUE
`DOLPHIN_PNL_BLUE["session_perf"]` is now wired in TUI v9. Displays WR, PF, Sharpe, Calmar live.
---
*End of DOLPHIN-NAUTILUS System Bible v6.0 — 2026-04-05 (see v7.0 below for all updates)*
---
## 29. ALGO VERSIONING & LINEAGE TRACKING
### v2_gold_fix_v50-v750 (Deployed 2026-04-10)
**Context:** During the initial 3.5-day "shakedown cruise" (`v1_shakedown`), the system
executed ~179 trades matching the Gold Spec frequency (~50/day) but suffered a **-6%
drawdown** via MAX_HOLD fee-bleed.
**Root Cause:** In `nautilus_event_trader.py -> _normalize_ng7()`, the live `vel_div`
calculation was `v50 - v150`. The Gold Spec backtest (181% ROI) strictly used `v50 - v750`.
Subtracting medium-term `v150` instead of macro `v750` resulted in a high-noise signal
that triggered on micro-jitters rather than structural macro instability, eliminating the
mean-reversion snap-back required for the 95 bps FIXED_TP exit.
**Fix:** Corrected `_normalize_ng7()` to `'vel_div': v50 - v750`.
**Lineage Tag:** All trades executed after this fix are tagged in `nautilus_trader.log`
and `DOLPHIN_STATE_BLUE` with `[v2_gold_fix_v50-v750]`. Data science queries against
ClickHouse/PnL logs should split analysis at this tag to isolate true Gold Spec performance.
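
The corrected calculation is a one-line difference. In the sketch below the function names are hypothetical, and the `-0.02` threshold is the one quoted in §1; this is not the body of `_normalize_ng7()`.

```python
def vel_div(w50_velocity: float, w750_velocity: float) -> float:
    """Gold Spec definition: short-window velocity minus macro-window velocity."""
    return w50_velocity - w750_velocity


def is_instability_signal(value: float, threshold: float = -0.02) -> bool:
    """True when vel_div is below the entry threshold from the core thesis."""
    return value < threshold
```

The pre-fix code subtracted the w150 velocity instead, so the same scan could produce a different `vel_div` and a different entry decision.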
The Adaptive Exit Engine (AE) is a per-bucket logistic regression model that estimates `P(continuation)` — the probability that holding a trade further will yield positive outcome — given the current trade state features.
In shadow mode, it:
- **Evaluates** every active trade every bar
- **Logs** its decision to `dolphin.adaptive_exit_shadow` in ClickHouse
```python
# → triggers online_update() for live model refinement
# → inserts CLOSED row with actual_exit + p_cont at close
```
**Thread safety**: `evaluate()` uses internal `threading.Lock`. Daemon thread is fire-and-forget. Zero impact on main scan loop latency.
### 31.6 Online Learning
After each trade closes, `ContinuationModelBank.online_update()` feeds the outcome back via **SGD** (partial fit). Natural exits only — `HIBERNATE_HALT` and `SUBDAY_ACB_NORMALIZATION` are filtered to prevent regime artifacts from biasing the continuation distribution.
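A minimal sketch of the filter-then-update flow, assuming a hand-rolled logistic model in place of the real `ContinuationModelBank` (whose internals are not shown here). The class `TinyLogisticModel`, the learning rate, and the feature layout are hypothetical; only the two filtered exit reasons come from the text above.

```python
import math

# Exit reasons excluded from online learning, as listed above.
FILTERED_EXIT_REASONS = {"HIBERNATE_HALT", "SUBDAY_ACB_NORMALIZATION"}


class TinyLogisticModel:
    """Illustrative logistic regression trained one sample at a time (SGD)."""

    def __init__(self, n_features: int, lr: float = 0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # hypothetical learning rate

    def predict_proba(self, x) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y: int) -> None:
        # For log-loss, the per-sample gradient is (p - y) * x.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def online_update(model, features, continued: bool, exit_reason: str) -> bool:
    """Feed one closed trade back into the model; return True if it was used."""
    if exit_reason in FILTERED_EXIT_REASONS:
        return False  # regime artifact: skip so it cannot bias the model
    model.partial_fit(features, int(continued))
    return True


model = TinyLogisticModel(n_features=2)
print(online_update(model, [1.0, 0.5], True, "HIBERNATE_HALT"))
print(online_update(model, [1.0, 0.5], True, "FIXED_TP"))
```

Filtering before the gradient step means a forced exit never touches the weights, which is the property the paragraph above relies on.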
### 31.7 Promote to Live (Prerequisites)
AE remains shadow-only until:
1. 500+ closed trades per bucket (for statistical significance)
3. Explicit shadow vs real exit comparison query confirms AE would have improved net$
4. Code review of exit integration (requires new `exit_reason` codes: `AE_MAE_STOP`, `AE_GIVEBACK_LOW_CONT`, `AE_TIME`)
### 31.8 v2 Training / Replay Spec
The next revision is documented in [`AdaptiveExitManager_v2_SPEC.md`](AdaptiveExitManager_v2_SPEC.md).
Key requirements for v2:
- preserve the current shadow-only safety boundary
- replay against the NG7 scan tape / eigenfile prices, not just terminal trade rows
- penalize early winner clipping more heavily than loser-saving
- write all v2 artifacts to versioned paths under `adaptive_exit/models/v2/`
- keep `adaptive_exit/models/continuation_models.pkl` and `adaptive_exit/models/bucket_assignments.pkl` intact
The v2 spec is intentionally non-destructive and must not overwrite the current model artifacts in place.
### 31.9 EsoF Value Gate — Live Exposure-Only Haircut
The live engine now consumes the continuous EsoF advisory score as a **size-only** gate.
This is a conservative haircut on leverage / notional only. It does **not** alter:
- asset selection
- direction choice
- exit logic
- trade accounting
- HZ / CH observability paths
Implementation details:
- Live score source: `DOLPHIN_FEATURES['esof_latest']` with fallback to `esof_advisor_latest`
- Freshness rule: stale / missing payloads are neutral and leave sizing unchanged
- The live haircut is label-aligned: `NEUTRAL` receives the main haircut, `UNFAVORABLE` receives the deepest haircut
- `MILD_POSITIVE` and `MILD_NEGATIVE` remain mostly full-size apart from narrow transition shoulders around the label boundaries
- The gate is intentionally conservative to avoid overfit
Scope note:
- The live gate is allowed to be non-monotonic only around the `NEUTRAL` and `UNFAVORABLE` label boundaries.
- Positive `sc` and `MILD_NEGATIVE` remain outside the haircut experiment space and should stay at `1.0x` unless a new documented study explicitly changes that contract.
Amendment note:
- Earlier draft notes and helper comments temporarily kept BLUE neutral until the `UNFAVORABLE` boundary.
- The live helper was revised on 2026-05-06 to make the haircut label-aware with small transition shoulders around `NEUTRAL` and `UNFAVORABLE`.
Replay note, BLUE closed-trade history:
- Sample: `1217` BLUE closed trades from `2026-03-31` through `2026-04-29`
- Entry timestamp proxy used for replay: `entry_ts ≈ exit_ts - bars_held × 11s`
- Outcome with current gate: realized net PnL `+3191.91` → counterfactual `+4964.63` (`+1772.73` uplift)
- Normal exits only (`MAX_HOLD` / `FIXED_TP` / `STOP_LOSS`): `+8268.07` → `+8866.59` (`+598.53` uplift)
- The gate is exposure-only, so the trade sign / win-rate is unchanged by construction; the benefit comes from reducing notional in weaker `sc` regimes
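The replay arithmetic can be sketched as below. It assumes PnL scales linearly with notional (so an exposure-only multiplier simply rescales each trade's PnL) and uses hypothetical trade keys `exit_ts`, `bars_held`, and `pnl`; the toy gate is illustrative and is not the live EsoF haircut.

```python
from datetime import datetime, timedelta

BAR_SECONDS = 11  # bar length implied by the entry-timestamp proxy above


def entry_ts_proxy(exit_ts, bars_held):
    """Approximate entry time as exit_ts - bars_held * 11s."""
    return exit_ts - timedelta(seconds=bars_held * BAR_SECONDS)


def replay_gate(trades, multiplier_at):
    """Return (realized, counterfactual) net PnL for an exposure-only gate.

    trades: dicts with hypothetical keys exit_ts, bars_held, pnl.
    multiplier_at: callable mapping the proxy entry time to a size multiplier.
    """
    realized = sum(t["pnl"] for t in trades)
    counterfactual = sum(
        t["pnl"] * multiplier_at(entry_ts_proxy(t["exit_ts"], t["bars_held"]))
        for t in trades
    )
    return realized, counterfactual


trades = [
    {"exit_ts": datetime(2026, 4, 1, 12, 0, 0), "bars_held": 120, "pnl": -100.0},
    {"exit_ts": datetime(2026, 4, 1, 15, 0, 0), "bars_held": 30, "pnl": 50.0},
]


def half_before_noon(ts):
    # Toy gate: half size before noon, full size afterwards (illustrative only).
    return 0.5 if ts.hour < 12 else 1.0


print(replay_gate(trades, half_before_noon))
```

Because the multiplier never changes a trade's sign, the win rate of the counterfactual sequence matches the realized one, as the last bullet states.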
### 31.10 SC Threshold Advisor — Shadow ML Overlay
The `sc` gate now has an advisory-only learning overlay that observes live context and logs a recommended size-multiplier / implied threshold. It is a shadow artifact only and must **never** override the deterministic live gate.
The `sc` gate now also has a second advisory layer: a bucket-aware action-surface gauge. It is still shadow-only and must never alter live execution, but it learns a richer policy than the threshold advisor:
- deterministic EsoF / `sc` size gate remains the live source of truth
- learns from executed outcomes plus replayed price paths
- uses point-in-time OBF placement/signal/market/macro context and ExF context
- bucket-aware via `adaptive_exit/bucket_engine.py`
- logs every evaluation so replay, OOS benchmarking, and online-learning drift can be audited
- existing model artifacts are respected and not overwritten in place
Anti-degradation rules:
- online updates pause if replay quality regresses materially
- frozen-vs-online walk-forward benchmark is mandatory before promotion
- the replay harness must reconstruct paths only from data available at trade time or earlier
- OBF used for benchmarking must be point-in-time, not end-of-day aggregated
Benchmark outputs:
- actual vs policy PnL
- ROI, win rate, PF, Sharpe, Sortino, max drawdown
- average recommended size / TP / hold multipliers
- regret vs actual trade
- frozen vs online OOS comparison
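A few of these outputs can be computed from a per-trade PnL list as in the sketch below. The function names and the equity-based drawdown convention are assumptions for illustration; Sharpe, Sortino, and the multiplier averages are omitted, and "regret" is shown only as a plain policy-minus-actual difference since its exact definition is not given here.

```python
def benchmark_metrics(pnls, start_equity):
    """Summarize a sequence of per-trade net PnL values (hypothetical helper)."""
    wins = [p for p in pnls if p > 0]
    gross_loss = -sum(p for p in pnls if p < 0)
    equity = peak = start_equity
    max_dd = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        # Drawdown measured as a fraction of the running equity peak.
        max_dd = max(max_dd, (peak - equity) / peak)
    return {
        "total_pnl": sum(pnls),
        "roi": sum(pnls) / start_equity,
        "win_rate": len(wins) / len(pnls) if pnls else 0.0,
        "profit_factor": sum(wins) / gross_loss if gross_loss else float("inf"),
        "max_drawdown": max_dd,
    }


def policy_minus_actual(actual_pnls, policy_pnls):
    """Total PnL difference between the shadow policy and the actual trades."""
    return sum(policy_pnls) - sum(actual_pnls)


actual = [100.0, -50.0, 25.0, -25.0]
policy = [100.0, -25.0, 25.0, -25.0]
print(benchmark_metrics(actual, start_equity=1000.0))
print(policy_minus_actual(actual, policy))
```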
The gauge is complementary to the threshold advisor. The threshold advisor decides how much to scale the trade from `sc`; the gauge learns whether, within that regime, smaller or larger size/TP/hold actions are better on a per-bucket basis.
---
## 32. ALPHA EXIT ENGINE V7 — GREEN LIVE VALIDATION
### 32.3 Pressure Threshold Calibration — Live Research (2026-04-19/20)
**Dataset**: 24 completed GREEN trades, live eigen_scan data, V7 shadow evaluations at 100ms cadence via `NautilusCachePriceFeed` (live Binance WebSocket bid/ask).
**Methodology**: For each trade, V7 emitted shadow EXIT signals at various pressure levels. We replayed with different pressure thresholds to find the level where V7 cuts genuine losers while letting winners run.
**Results by threshold**:
| Threshold | Trades Cut | Total PnL | ROI | vs Base ($784) |
- the live BLUE orchestrator consults `AlphaExitEngineV7` before the base exit manager
- when v7 returns `EXIT`, the actual `trade_events.exit_reason` records the v7 reason string
- `v7_decision_events` remains the authoritative observability journal for v7 actions and is now populated by live BLUE decisions, not just shadow replays
- HIBERNATE remains a hard override and still forces `HIBERNATE_HALT`
- AE shadow logging stays unchanged and continues to be purely observational
This promotion is limited to BLUE exit control and logging. It does not promote the
separate AE shadow engine to live exit authority.
#### 32.4.3.1 V7 Exit Audit Snapshot (2026-05-08)
Post-activation audit of the live V7 journal and trade ledger (`2026-05-01` through
`2026-05-06`) showed:
- `V7_COMPOSITE_PRESSURE` was net positive in the live ledger:
  - `35` total exits across `blue` + `prodgreen`
  - total PnL `+734.30`
  - average `pnl_pct` `+0.000319`
  - `5` winners, `30` losers
- `V7_MAE_SL_VOL_NORM` was net negative:
  - `7` total exits across `blue` + `prodgreen`
  - total PnL `-2234.97`
  - average `pnl_pct` `-0.003306`
  - `0` winners, `7` losers
- `FIXED_TP` was not the main failure mode in this audit window.
- The primary live-exit concern is therefore the MAE/pressure branch calibration, not a blanket TP miscalibration.
V7 at 2.0 cut 9 winners early (missing +$868 collectively) while saving +$245 on 5 losers. The low threshold reacted to transient adverse pressure that reversed into profitable exits.
**Key false positives at low pressure** (prevented by raising to 2.69):
| LTCUSDT #2 | 3.00 | +$9 | +$276 | V7 cut at tiny profit; base rode to full TP |
**Key true positives at high pressure** (still caught at 2.69):
| Trade | Pressure | Base PnL | V7 PnL | Saved |
|-------|----------|----------|--------|-------|
| ENJUSDT #24 | 3.00 | -$342 | +$26 | +$368 |
| ENJUSDT #19 | 2.04 | -$375 | -$254 | +$121 |
| ENJUSDT #16 | 2.73 | -$36 | +$10 | +$45 |
| ONTUSDT #9 | 3.00 | +$273 | +$297 | +$24 |
**Chosen threshold: 2.69** — in the optimal plateau (2.35–2.70), with a small buffer above the data-optimal 2.35 to stay closer to base-engine "let winners run" behaviour while still catching high-pressure adverse exits.
### 32.3.1 Underwater Recovery Shadow Replay
After the above live-threshold calibration, we added a stricter replay question: if a trade has already been underwater, then later recovers into profit, should V7 exit on the first positive `RETRACT` rather than waiting for the later terminal `EXIT`?
- `strong_long`: best-asset MFE `>= 2.0%` and top-3 mean end return `>= 0.8%`
- `broad_long`: top-3 mean end return `>= 1.0%` and positive breadth `>= 50%`
Key result:
- `vel_div > 0.01` alone is **not** the main long edge.
- The stronger long regimes are **stressed unwind / squeeze** states:
  high `instability_50`, negative slower-window velocity (`v300 < 0`,
  `v750 < 0`), and often a previously negative `vel_div` state.
- Best recent-HQ rule (`2025-12-31` onward):
  `inst50_q90 & v300_neg & v750_neg`
  - support: `6305` rows (`3.55%`)
  - `strong_long`: `0.341` vs base `0.165` (`2.07x` lift)
  - `broad_long`: `0.354` vs base `0.137` (`2.59x` lift)
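The rule and its lift can be expressed as a small predicate. This sketch assumes `inst50_q90` means "`instability_50` at or above its 90th percentile" and takes that percentile value as a parameter; the row keys and the example threshold are hypothetical.

```python
def unwind_rule(row: dict, inst50_q90: float) -> bool:
    """inst50_q90 & v300_neg & v750_neg, evaluated on one scan row."""
    return (
        row["instability_50"] >= inst50_q90  # assumed meaning of inst50_q90
        and row["v300"] < 0
        and row["v750"] < 0
    )


def lift(rule_rate: float, base_rate: float) -> float:
    """How much more often a label occurs under the rule than overall."""
    return rule_rate / base_rate


row = {"instability_50": 0.92, "v300": -0.004, "v750": -0.001}
print(unwind_rule(row, inst50_q90=0.85))
print(round(lift(0.341, 0.165), 2))  # strong_long rates quoted above
```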
Implication:
- the long counterpart to the short dislocation thesis is **not** a generic
bullish breakout detector
- it is a **reversal / squeeze detector on the same instability manifold**
- market fingerprint should therefore query an asset-fingerprint layer for the
assets most likely to express that unwind, rather than assuming the market
signal alone identifies the tradeable long
- `BNBUSDT` `9bf88b81`: terminal `-12.469%` vs shadow `+0.789%`
**Interpretation**
- The replay strongly supports a momentum-aware profit-lock rule on rebound trades.
- The current terminal V7 logic is good at cutting some losers, but still gives back recoveries on a subset of trades.
- This is still shadow research. It should not be promoted to live execution without a separate ablation on the same day/window and a dedicated false-positive review.
**Important**: GREEN's RT/V7 exits are **observability-only** — they write to CH with live exit prices but do NOT close the engine position. The base `AlphaExitManager` remains authoritative for position lifecycle (FIXED_TP / MAX_HOLD / STOP_LOSS). The engine fires its own EXIT on the next scan cycle regardless.
1. Collect 100+ trades at 2.69 threshold before considering promotion to BLUE
2. Verify the optimal plateau (2.35–2.70) holds across different market regimes (trending vs ranging)
3. Evaluate whether V7 MAE_SL_VOL_NORM exits should use a lower threshold than COMPOSITE_PRESSURE (tiered thresholds)
4. Compare V7@2.69 against Adaptive Exit Engine (§31) shadow results on the same trade set
---
## 33. ASSET BUCKET SYSTEM
**Version**: v7.0 Addition — 2026-04-19
**Status**: Live. Bucket assignments in `adaptive_exit/models/bucket_assignments.pkl`.
### 33.1 Overview
Assets are clustered into **7 buckets** (KMeans k=7) using 5-year market characteristics. Buckets represent asset archetypes — how predictably an asset behaves under instability regimes.
**Purpose**: Trade selection (pick assets from high-ROI buckets) and AE model stratification (separate continuation distributions per bucket).
### 33.2 Clustering Features
| Feature | Description |
|---|---|
| `vol_daily_pct` | Daily return volatility |
| `corr_btc` | Pearson correlation with BTCUSDT |
| `log_price` | log(mean price) — proxy for market cap tier |
| `btc_relevance` | `corr_btc × log_price` — interaction term |
| `vov` | Volatility-of-volatility |
Features computed from 5yr 1m klines. Buckets are PRICE/VOLATILITY characteristics, **not** OBF features. OBF is an overlay-phase add-on, not a bucket driver.
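A minimal sketch of the five seed features for one asset. It works on daily closes rather than the 5yr 1m klines, and the definition of `vov` (standard deviation of a rolling return volatility) and the `vol_window` parameter are assumptions; only the feature names and the `btc_relevance` interaction come from the table above. The KMeans step itself is not shown.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def bucket_features(closes, btc_closes, vol_window=3):
    """Compute the bucket seed features from aligned daily closes."""
    rets = [b / a - 1 for a, b in zip(closes, closes[1:])]
    btc_rets = [b / a - 1 for a, b in zip(btc_closes, btc_closes[1:])]
    corr_btc = pearson(rets, btc_rets)
    log_price = math.log(statistics.fmean(closes))
    # Assumed vov definition: dispersion of a rolling return volatility.
    rolling_vol = [
        statistics.pstdev(rets[i:i + vol_window])
        for i in range(len(rets) - vol_window + 1)
    ]
    return {
        "vol_daily_pct": statistics.pstdev(rets) * 100,
        "corr_btc": corr_btc,
        "log_price": log_price,
        "btc_relevance": corr_btc * log_price,  # interaction term
        "vov": statistics.pstdev(rolling_vol),
    }


closes = [100.0, 102.0, 101.0, 105.0, 103.0, 108.0]
btc_closes = [2 * c for c in closes]  # toy series with identical returns
print(bucket_features(closes, btc_closes))
```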
### 33.2.1 Extended Asset-Feature Sweep
`adaptive_exit/asset_feature_sweep.py` builds a higher-dimensional per-asset feature vector from the 5 bucket seed features plus 100+ derived TA / path-shape features, using local kline caches first and public Binance futures klines as fallback. It is a research-only pipeline used to discover regime-discriminative asset-feature ranges for later vector retrieval. Default outputs are written under `/mnt/dolphin_training/asset_feature_sweep/` so the SMB repo mount is not used for large derived artifacts. See also [REGIME_ASSET_FINGERPRINT_WORKLOG.md](</mnt/dolphinng5_predict/prod/docs/REGIME_ASSET_FINGERPRINT_WORKLOG.md>).
### 33.2.2 Regime-to-Asset Prototype Retrieval
`adaptive_exit/regime_asset_retriever.py` is the next research layer on top of the expanded asset vectors. It builds a trade-time regime fingerprint from `vel_div`, `sc`, and the EXF snapshot fields, joins that with the sweep vectors, clusters the regime states, and stores the best-performing asset prototypes per regime cluster. Those prototypes are then used as the query object for a live asset-vector lookup. The retriever is research/offline only and writes its model/report under `/mnt/dolphin_training/regime_asset_prototypes/`. The trade-only baseline is explicitly stored as the **pure performed-trade** model at `/mnt/dolphin_training/regime_asset_prototypes/pure_performed_trade_regime_asset_model.pkl`.
### 33.2.3 Scan-Tape Backrunner
`adaptive_exit/scan_tape_backrunner.py` extends the pure performed-trade
baseline into a scan-tape backrunner. It walks the historical scan parquet tape
row by row, reconstructs market fingerprints from the tape itself, derives
candidate asset feature vectors from trailing scan windows, and binds forward
path labels from the tape as counterfactual short outcomes. It is the research
bridge from the trade-conditioned model to the enlarged market-regime model.
### 33.3 Bucket Performance (Live, 400+ BLUE trades, 2026-04-19)
> **B1/B4 leverage problem**: ACB assigns higher leverage to trades that happen to be in B1/B4 — anti-correlation between leverage and outcome. This is an ENTRY selection issue, not fixable by AE.
> **B3 dominance**: B3 assets have moderate BTC correlation + medium volatility → cleaner mean-reversion behavior.
### 33.4 Live TUI Panel
`#bucket_footer` in TUI v9 shows per-bucket n/WR%/avg-pnl% updated every 60s from ClickHouse.
Query: `adaptive_exit_shadow` CLOSED rows, `GROUP BY bucket_id`, all-time, excluding HIBERNATE/ACB exits.
### 33.5 Future: Active Asset Modulation
Modulating asset picking by live bucket performance is the intended next step once each bucket has 200+ trades. On current data, B1 (109 trades, -$1,787) is the clearest cut candidate.
**Do NOT modulate until**: per-bucket sample ≥ 200 trades AND bucket performance is stable over 30 days.
| Training data | `/mnt/dolphinng5_predict/adaptive_exit/models/training_data.parquet` |
---
## 34. CRITICAL OPERATIONAL WARNINGS
### 34.1 SMB Disk-Full Silent Truncation
**CRITICAL**: The SMB mount at `/mnt/dolphinng5_predict/` (96% full as of 2026-04-19, 42GB free) can silently truncate files to 0 bytes if disk space runs out during a write.
**The Edit tool opens files with `O_TRUNC` before writing. If ENOSPC is hit, the file becomes 0 bytes with no error message.**
**Rule**: Always write large files to `/tmp/` first, verify content, then `cp` to the SMB mount. Never use direct Edit/Write on SMB for files > 50KB when disk usage is > 95%.
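The stage-verify-copy rule can be sketched as a helper like the one below. `safe_write` is a hypothetical name, not an existing tool in the repo. The final copy still truncates the destination when it opens it, so the sketch re-checks the destination size afterwards to turn a silent 0-byte result into a loud error.

```python
import os
import shutil
import tempfile


def safe_write(dest_path: str, data: bytes, staging_dir: str = "/tmp") -> None:
    """Stage `data` on local disk, verify it, then copy it to `dest_path`."""
    fd, staged = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        # Verify the staged copy before touching the destination.
        with open(staged, "rb") as fh:
            if fh.read() != data:
                raise IOError("staged file does not match payload")
        shutil.copyfile(staged, dest_path)
        # Detect silent truncation on the destination mount.
        if os.path.getsize(dest_path) != len(data):
            raise IOError(f"{dest_path} truncated: expected {len(data)} bytes")
    finally:
        os.remove(staged)


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "out.bin")
safe_write(demo_path, b"hello" * 1000)
print(os.path.getsize(demo_path))
```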
HZ is **RAM-only**. Every restart wipes all state. If HZ restarts:
- `DOLPHIN_SAFETY` → reverts to `APEX` (engine will re-read on next bar)
- `acb_boost` → lost (engine uses yesterday's file-based ACB until next ACB push)
- `latest_eigen_scan` → empty (no trades until NG8 produces next scan)
- `capital_checkpoint` → lost (engine falls back to `initial_capital`)
Never restart HZ unless you know what you're losing.
### 34.3 Supervisord vs Systemd
All services are managed exclusively by supervisord. Never use `systemctl start/restart` on dolphin services: doing so creates a dual-management race (the "random killer" bug). See §26.1.
### 34.4 vel_div Formula
The canonical vel_div is `v50_lambda_max_velocity − v750_lambda_max_velocity` (50-window minus 750-window). **Never** compute as `v50 − v150`. The v150 formula was the v1 shakedown bug that caused -6% drawdown on 179 trades. See §29.
---
*End of DOLPHIN-NAUTILUS System Bible v7.0 — 2026-04-19*
*Champion: SHORT only (APEX posture, blue configuration). ALGO=v2_gold_fix_v50-v750.*
*Process manager: Supervisord exclusively (systemd units retired).*
*Do NOT deploy live capital without review of AE promote-to-live prerequisites (§31.7).*
> 2026-05-03 bucket update: BLUE now has 1,217 closed trades. Current live ranking is still B3 best and B4 worst; B5, B6, and B1 are net-positive on this larger sample. Keep the old bucket prior as a soft routing prior, not a universal blacklist.
> 2026-05-05 regime-fingerprint addendum: historical backfill artifact now exists on the Dolphin machine at `/mnt/dolphin_training/regime_fingerprint_backfill/regime_fingerprint_backfill.parquet` (416 rows, 161 cols, 556 KB) with report at `/mnt/dolphin_training/regime_fingerprint_backfill/regime_fingerprint_backfill_report.json`. It merges CH trades, recent live log trades, ExF/EsoF, price-path signatures, and matrix overlays.
>
> 2026-05-05 asset-fingerprint addendum: the candidate-asset spec now explicitly includes recency-biased exhaustion / continuation features (local overextension, short continuation quality, bounce susceptibility, OBF symmetry, path persistence / entropy, reversal pressure, vol-normalized stretch) as point-in-time asset features, not as a market-state substitute.
> 2026-05-05 recency-gating addendum: the asset fingerprint is now defined as a multi-window bank plus a query-time recency gate. This is intentional so the market fingerprint can tune short-vs-long lookback emphasis at inference time without retraining the raw asset history. The bank must be preserved in storage, along with the gate metadata, so alternative recency profiles can be replayed later.
> 2026-05-05 implementation-map addendum: `ASSET_FINGERPRINT_CANDIDATE_SYSTEM.md` now includes a concrete implementation map for dev agents. It defines query-side entry objects, asset-side builder responsibilities, window-bank storage, gate modes, retrieval outputs, suggested file boundaries, and acceptance tests. The map is designed to stay compatible with the market-fingerprint → asset-fingerprint query flow and future universe enlargement.
> 2026-05-05 scan-backrunner addendum: the pure performed-trade baseline is now named `/mnt/dolphin_training/regime_asset_prototypes/pure_performed_trade_regime_asset_model.pkl`, and the enlarged scan-tape layer lives in `adaptive_exit/scan_tape_backrunner.py`.
> 2026-05-05 full-sweep addendum: the canonical asset feature store is now `/mnt/dolphin_training/asset_feature_sweep/asset_feature_vectors.parquet` (948 rows, 37 assets, 109 features). The pure-performed-trade retriever was retrained on the full expanded feature table and now lives at `/mnt/dolphin_training/regime_asset_prototypes/pure_performed_trade_regime_asset_model.pkl` (421,500 merged rows, 4 clusters, 12 prototypes, ~434 MB). The scan-tape backrunner remains the larger regime-enlarged layer with OOS/OOD validation.
> 2026-05-05 market-state split addendum: the deterministic market-state statistics now live in `adaptive_exit/market_state_outputs.py` as fingerprint inputs (`market_fingerprint_*`). The learned output bundle has two families: (1) the exit-policy head emits `market_state_tp_pct` and `market_state_max_hold_bars` with a BLUE base policy of `0.95%` TP and `120` bars hold; (2) the asset-target head surfaces historically favorable asset fingerprints for the given market fingerprint / regime-state pairing. The original design also reserved `SIZE(x)` as a learned output, but that remains downstream work.
> 2026-05-05 market-state trainer addendum: the bundle trainer now defaults to full available scan history (`--days 0`) and emits a progress file at `/mnt/dolphin_training/market_state_bundle/market_state_bundle_progress.json` plus a learning log at `/mnt/dolphin_training/market_state_bundle/learning_log.jsonl`. This makes the retrain observable without changing the learned semantics.
> 2026-05-05 market-state asset-head addendum: the learned bundle now predicts a direct asset-fingerprint vector, then uses nearest-neighbor lookup to surface candidate assets. This is the correct shape for later universe enlargement because the fingerprint can be compared against new assets without retraining the exit policy head.
> 2026-05-06 market-state runtime addendum: the live trader now routes scan snapshots and natural trade closes through `adaptive_exit/market_state_runtime.py`, a thin runtime adapter that caches the latest market-state bundle, keeps the rolling scan window, and calls the bundle's online update path. This is the abstraction seam for future batch automation of data refresh, retraining, and post-trade learning.
> 2026-05-06 storage-format addendum: large market-state / asset-fingerprint tabular artifacts should use Arrow IPC / Feather first, Parquet second, and JSON only for small control metadata. The runtime reads Arrow IPC / Feather or Parquet transparently and writes the latest live bundle snapshot to `/mnt/dolphin_training/market_state_bundle/latest_market_state_bundle.feather`. The trainer accepts the same format family for backfill and asset-lookup inputs.
> 2026-05-06 model-storage addendum: the learned market-state bundle is persisted as a gzip-compressed pickle at `/mnt/dolphin_training/market_state_bundle/market_state_bundle_model.pkl` for space efficiency, with backward-compatible load support for the older plain-pickle form. The estimator state is still the canonical learned object; the surrounding tabular snapshots remain Arrow IPC / Feather or Parquet.
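One way to get a backward-compatible loader for this layout is to sniff the gzip magic bytes before choosing an opener. The sketch below assumes nothing about the real bundle class; `save_bundle` and `load_bundle` are hypothetical helper names and the dict payloads are placeholders.

```python
import gzip
import os
import pickle
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def save_bundle(obj, path: str) -> None:
    """Persist `obj` as a gzip-compressed pickle."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_bundle(path: str):
    """Load either a gzip-compressed pickle or an older plain pickle."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as fh:
        return pickle.load(fh)


# Round-trip both on-disk forms in a scratch directory.
scratch = tempfile.mkdtemp()
new_path = os.path.join(scratch, "bundle_gz.pkl")
old_path = os.path.join(scratch, "bundle_plain.pkl")
save_bundle({"version": 2}, new_path)
with open(old_path, "wb") as fh:
    pickle.dump({"version": 1}, fh)
print(load_bundle(new_path), load_bundle(old_path))
```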
> 2026-05-08 post-outlier-win side-selection addendum: BLUE trade/log replay found a narrow event-conditioned long probe after large 9x short wins. On the cleaned BLUE sequence (`1321` non-hibernate/non-ACB trades), flipping only the immediate next trade after `pnl_abs >= $400`, `leverage >= 8.5x`, and `pnl_pct >= +0.50%` improved estimated dollars and drawdown but did not improve WR. This is a one-trade post-exhaustion/cooldown candidate, not a broad long engine. Details are in `prod/docs/LONG_DETERMINISTIC_RULE_RESEARCH.md`.
> 2026-05-08 leverage-as-conviction sweep addendum: BLUE replay does **not** support a broad rule that turns trades LONG after ordinary high-leverage short wins. The correct criterion is marginal overlay value on the intervened subset, not replacement of the whole short engine. Even under that criterion, the literal `trigger_lev >= 0.70`, `trade_min_lev >= 0.69` thesis degraded the cleaned sequence badly (`WR ~37%`, negative compound return, negative estimated dollars, strongly negative overlay delta). The best swept long-switch variants still failed to add value over leaving the same triggered trades short. The useful signal is that leverage is a conviction / quality feature for filtering and sizing; it is not, by itself, a side-inversion trigger. Keep the narrower post-outlier one-trade long probe as research because it showed positive marginal overlay delta, but do not deploy the broad leverage-win LONG switch.
> 2026-05-08 lowered post-win-threshold addendum: the post-win overlay is stronger than the original narrow sample implied, but only when conditioned on realized exhaustion. Dollar-only thresholds below about `$300` are harmful. With a prior-return filter (`pnl_pct >= +0.75%`), lower thresholds become useful: e.g. `$100-$150` prior wins produced `63-67` immediate next-trade cases with about `+$2.4k` marginal delta and positive flipped-LONG PnL. High-leverage `$300-$500` wins support a next-`2`-trade rebound/cooldown signal (`+$2.7k` to `+$3.7k` marginal delta). The edge is payoff-asymmetry / loss-tail avoidance, not WR improvement, and should be researched as a guarded next-1/next-2 overlay or abstain gate.
> 2026-05-08 post-win EFSM implementation addendum: the candidate BLUE overlay is now the post-win **EFSM** (**Execution FSM**) at `adaptive_exit/post_win_long_overlay.py` with tests in `prod/tests/test_post_win_long_overlay.py`. Canonical class names are `PostWinExecutionFSM` and `PostWinExecutionFSMConfig` (`PostWinLongOverlay` names remain compatibility aliases). Codified rule: `pnl_abs > $397` arms next `1` FLIP_LONG slot; `pnl_abs > $397 and lev > 8.6` arms next `2`; `0 < pnl_abs < $250 and pnl_pct >= +0.75%` arms next `1`; consumed slots reset to SHORT. Active slots cannot re-arm and overlay-flipped LONG outcomes cannot re-arm. This reset invariant is mandatory: unsafe recursive re-arm replay turned `+$1.51k` marginal delta into `-$5.43k`. V7 is side-aware but SHORT-calibrated; validate LONG overlay exits in shadow or with conservative LONG-specific settings before live use.
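The codified arming rule and its reset invariant can be sketched as a small state machine. This is an illustrative stand-in, not the `PostWinExecutionFSM` implementation: the class name, method names, and the assumption that `pnl_pct` is a fraction (so `+0.75%` is `0.0075`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PostWinSketch:
    """Illustrative post-win overlay; thresholds taken from the rule above."""

    big_win_usd: float = 397.0
    high_lev: float = 8.6
    small_win_usd: float = 250.0
    min_pnl_pct: float = 0.0075  # +0.75%, assuming pnl_pct is a fraction
    slots: int = 0  # remaining FLIP_LONG slots

    def next_side(self) -> str:
        """Side for the next entry; each consumed slot moves back toward SHORT."""
        if self.slots > 0:
            self.slots -= 1
            return "LONG"
        return "SHORT"

    def on_close(self, pnl_abs: float, pnl_pct: float, lev: float,
                 was_flipped: bool) -> None:
        # Reset invariant: flipped outcomes and active slots never re-arm.
        if was_flipped or self.slots > 0:
            return
        if pnl_abs > self.big_win_usd:
            self.slots = 2 if lev > self.high_lev else 1
        elif 0 < pnl_abs < self.small_win_usd and pnl_pct >= self.min_pnl_pct:
            self.slots = 1


fsm = PostWinSketch()
fsm.on_close(pnl_abs=450.0, pnl_pct=0.006, lev=9.0, was_flipped=False)
print([fsm.next_side() for _ in range(3)])
```

Checking `was_flipped` and `slots > 0` before any arming logic is what prevents the recursive re-arm that the replay above found so costly.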
> 2026-05-08 AlphaExitEngineV7 LONG calibration addendum: V7 threshold/gate constants are now surfaced as `AlphaExitV7Config` in `nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py`. Default `AlphaExitEngineV7()` remains the deployed SHORT-calibrated surface: `exit_pressure_threshold=2.69`, `retract_pressure_threshold=1.0`, `extend_pressure_threshold=-0.5`, vol-normalized MAE tiers `max(floor, k * rv_comp)` with `k=(3.5,7.0,12.0)` and floors `(0.005,0.012,0.025)`, and bounce soft weights `(0.15,0.35)`. A separate LONG engine can now be initialized with a different `AlphaExitV7Config` without mutating BLUE SHORT defaults. Synthetic LONG replay over BLUE V7 journal paths (`97` paths, `6,812` rows, bounce disabled because the current bounce model is SHORT-trained) found natural LONG PnL `-$328.84`; deployed default V7 improved this to `+$1.43` (`+$330.26` delta); best tested LONG proxy was reducing MFE-risk contributions by half while keeping pressure threshold `2.69`, yielding `+$205.32` (`+$534.15` delta), `36/97` exits, and `1.69%` max DD. Do not deploy this live from proxy alone; first shadow it on actual EFSM-flipped LONG contexts. Detailed method/results: `prod/docs/LONG_DETERMINISTIC_RULE_RESEARCH.md`.
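The vol-normalized MAE tiers follow directly from the quoted constants. The sketch below only evaluates `max(floor, k * rv_comp)` per tier; the function name is hypothetical and `rv_comp` is taken as a given input, since its construction is not described here.

```python
MAE_K = (3.5, 7.0, 12.0)            # default k multipliers quoted above
MAE_FLOORS = (0.005, 0.012, 0.025)  # default per-tier floors quoted above


def mae_tiers(rv_comp: float) -> tuple:
    """Return the three MAE thresholds: max(floor, k * rv_comp) per tier."""
    return tuple(max(floor, k * rv_comp) for k, floor in zip(MAE_K, MAE_FLOORS))


# In quiet conditions the floors bind; in volatile conditions k * rv_comp does.
print(mae_tiers(0.001))
print(mae_tiers(0.004))
```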
> 2026-05-08 LONG-capability addendum: BLUE and PRODGREEN Alpha Engine code paths are now LONG-capable without changing the default deployed side. `short_only` remains the default everywhere. LONG is activated only by explicit config/env direction (`long`, `long_only`, `buy`, `1`, `+1`). The signal generator exposes configurable LONG thresholds (`long_vel_div_threshold=+0.01`, `long_vel_div_extreme=+0.04`) and keeps the canonical SHORT thresholds (`-0.02`, `-0.05`). `NDAlphaEngine.begin_day(..., direction=+1)` now propagates LONG semantics into signal gating, DC interpretation, IRP expected action, sizing, PnL, exit price slippage, and ACB meta-strength. The sizing trend multiplier is side-aware: negative `vel_div` trend remains favorable for SHORT; positive `vel_div` trend is favorable for LONG.
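A minimal sketch of the direction parsing and the side-specific entry thresholds. Representing SHORT as `-1`, falling back to SHORT for unrecognized values, and using inclusive comparisons are assumptions; the accepted LONG tokens and the `-0.02` / `+0.01` thresholds come from the addendum above.

```python
LONG_TOKENS = {"long", "long_only", "buy", "1", "+1"}

SHORT_VEL_DIV_THRESHOLD = -0.02  # canonical SHORT threshold
LONG_VEL_DIV_THRESHOLD = 0.01    # long_vel_div_threshold default


def parse_direction(raw) -> int:
    """Return +1 only for an explicit LONG token; otherwise default to SHORT (-1)."""
    if raw is None:
        return -1
    return 1 if str(raw).strip().lower() in LONG_TOKENS else -1


def entry_signal(vel_div: float, direction: int) -> bool:
    """Gate an entry on vel_div using the threshold for the active side."""
    if direction == 1:
        return vel_div >= LONG_VEL_DIV_THRESHOLD
    return vel_div <= SHORT_VEL_DIV_THRESHOLD


print(parse_direction("LONG_ONLY"), parse_direction("short_only"), parse_direction(None))
print(entry_signal(-0.03, parse_direction(None)))
```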
> 2026-05-08 ACBv6 side-awareness addendum: ACBv6 remains SHORT/risk-off by default and preserves the legacy cache key for default calls. LONG/risk-on ACB is opt-in via `direction=+1` and uses separate cache entries (`date|long`) so HZ prewarm cannot pollute SHORT BLUE state. SHORT signals are unchanged: bearish funding, high DVOL, fear, and taker selling. LONG signals are explicit and separate: positive funding, calm DVOL, greed/risk appetite, and taker buying. OB beta modulation is also side-aware: stress/cascade raises beta for SHORT and reduces it for LONG; calm/liquidity-building raises beta for LONG and reduces it for SHORT.
> 2026-05-08 ACB HZ keying addendum: `prod/acb_processor_service.py` now publishes `acb_boost` and `acb_boost_short` as the legacy SHORT payload and `acb_boost_long` as the LONG/risk-on payload. BLUE continues to use `acb_boost` unless explicitly run with `DOLPHIN_DIRECTION=long_only`, in which case its local prewarm calls ACB with `direction=+1`. PRODGREEN's Nautilus actor subscribes to `acb_boost_long` when its config direction is LONG, otherwise to legacy `acb_boost`.
> 2026-05-08 LONG exit-layer addendum: base TP/SL/max-hold exits were already direction-aware through signed PnL. Optional `AlphaExitManager` vel_div invalidation/exhaustion and TF-spread recovery exits are now side-aware too. They remain disabled by default unless explicitly enabled, but if enabled for a LONG invocation they now treat falling/negative `vel_div` and adverse TF-spread recovery as LONG invalidation rather than applying hidden SHORT semantics.
> 2026-05-08 validation addendum: targeted regression after LONG-capability wiring passed `prod/tests/test_long_capability_layers.py` (`9 passed`), existing ACB HZ + V7 + EFSM suites (`57 passed`), and ACB signal-threshold integrity (`11 passed`). Compile checks passed for modified Alpha/ACB/live runner files. These tests verify SHORT default preservation, explicit LONG entries, side-separated ACB caches, side-aware OB beta modulation, side-aware optional VD exits, and case-insensitive PRODGREEN direction parsing.