# DOLPHIN-NAUTILUS SYSTEM BIBLE

## Doctrinal Reference — As Running 2026-04-05

**Version**: v6.0 — NG8 Linux Scanner + TUI v3 Live Observability + Test Footer CI
**Previous version**: v5.0 — Supervisord-First Architecture + MHS v3 + OBF Universe (2026-03-30)
**Previous version**: v4.1 — Multi-Speed Event-Driven Architecture (2026-03-25)

**CI gate (Nautilus)**: 46/46 tests green
**CI gate (MHS)**: 111/111 tests green (unit + E2E + race + Hypothesis)
**CI gate (ACB)**: 118/118 tests green
**Execution**: Binance Futures (USDT-M) verified via `verify_testnet_creds.py` and `binance_test.py`
**Status**: Supervisord-managed. MHS v3 live. OBF universe 540 assets. RM_META=0.975–1.000 [GREEN].
**NG8**: Linux-native eigenscan service. Fixes NG7 double-output bug. Bit-for-bit schema-identical to doctrinal NG5.
**TUI v3**: Live event-driven observability terminal. All panels hooked to HZ entry listeners. Zero origin-system load.

### What changed since v5.0 (2026-04-05 — THIS VERSION)

| Area | Change |
|---|---|
| **NG8 Linux Scanner — NEW** | `- Dolphin NG8/ng8_scanner.py` — Linux-native eigenscan service replacing Windows NG7. Fixes the double-output bug. A single `enhance()` call processes all 4 windows (w50/150/300/750) in one pass → exactly one Arrow file + one HZ write per scan_number. See §27. |
| **Arrow Writer Shim — NEW** | `- Dolphin NG8/arrow_writer.py` — thin re-export so `dolphin_correlation_arb512_with_eigen_tracking.py` imports correctly on Linux (Windows had this file natively). |
| **TUI v3 — NEW** | `Observability/TUI/dolphin_tui_v3.py` — full live observability terminal. All panels event-driven via HZ entry listeners. Zero origin-system load. Replaces the mocked TUI v2. See §28. |
| **Test Footer CI Hook — NEW** | `run_logs/test_results_latest.json` + `write_test_results()` API in TUI v3. Test scripts push results; the TUI footer displays them live. See §28.4 and `TEST_REPORTING.md`. |
| **NG7 Double-Output — Root Cause Confirmed** | Windows NG7 ran two independent tracker cycles (fast w50/w150 + slow w300/w750) sharing the same scan_number counter → two Arrow files + two HZ writes per scan, the second file arriving ~3 min late with stale prices. NG8 eliminates this by design. |

---

### What changed since v4.1 (2026-03-30 — PREVIOUS)

| Area | Change |
|---|---|
| **Process Manager: Systemd → Supervisord** | ALL dolphin services migrated exclusively to supervisord. No service is managed by both. `dolphin-supervisord.conf` is the single source of process truth. See §16, §26. |
| **"Random Killer" Root Cause Fixed** | `meta_health_daemon_v2.py` had been running under systemd for 4 days, calling `systemctl restart` on supervisord-managed services every 5s. The dual-management race caused random service kills. Stopped + disabled. |
| **MHS v3 — Complete Rewrite** | `meta_health_service_v3.py` — product-formula bug fixed (zero-collapse replaced by weighted sum), recovery via supervisorctl not systemctl, `RECOVERY_COOLDOWN_CRITICAL_S=10s` (was 600s), non-blocking daemon-thread recovery. See §24.5. |
| **OBF Universe Service — NEW** | `obf_universe_service.py` — 540 USDT perp assets on 3 WebSocket connections, zero REST weight, 60s health snapshots → HZ `obf_universe_latest`. Supervisord `autostart=true`. See §26. |
| **OBF Retention Fix** | `obf_persistence.py` `MAX_FILE_AGE_DAYS = 0` (was 7 — was deleting all backtesting data). Data now accumulates indefinitely for backtesting. |
| **Test Suite: MHS** | NEW `prod/tests/test_mhs_v3.py` — 111 tests: unit, live integration, E2E kill/revive, race conditions, 13 Hypothesis property tests. |
| **HZ Schema additions** | `DOLPHIN_FEATURES["obf_universe_latest"]`, `DOLPHIN_META_HEALTH["latest"]`. See §15. |
| **Supervisord groups** | `dolphin_data` group: exf_fetcher, acb_processor, obf_universe, meta_health (all autostart=true). `dolphin` group: nautilus_trader, scan_bridge, clean_arch_trader (autostart=false). |

### What changed since v4 (2026-03-24)

| Area | Change |
|---|---|
| **Multi-Speed Architecture** | NEW multi-layer frequency isolation: OBF (0.1s), Scan (5s), ExtF (varied), Health (5s), Daily batch. See §24. |
| **Event-Driven Nautilus** | NEW `nautilus_event_trader.py` — HZ entry listener for <1ms scan-to-trade latency. Not a Prefect flow — a long-running systemd daemon. See §24.2. |
| **MHS v2** | ENHANCED `meta_health_daemon_v2.py` — full 5-sensor monitoring (M1–M5), per-subsystem data-freshness tracking, automated recovery. See §24.3. |
| **Resource Safety** | NEW systemd resource limits: MemoryMax=2G, CPUQuota=200%, TasksMax=50 per service. Prevents process explosion. |
| **Scan Bridge Hardening** | Deployment concurrency limit=1, work pool concurrency=1, cgroups integration. See §24.1. |
| **Systemd Service Mesh** | NEW services: `dolphin-nautilus-trader.service`, updated `meta_health_daemon.service`. Systemd-managed, not Prefect-managed. |
| **Incident Response** | Post-mortem: 2026-03-24 kernel deadlock from 60+ uncontrolled Prefect processes. Fixed via concurrency controls. |

### What changed since v3 (2026-03-22)

| Area | Change |
|---|---|
| **Clean Architecture** | NEW hexagonal architecture in `prod/clean_arch/` — Ports, Adapters, Core separation. Adapter-agnostic business logic. |
| **Hazelcast DataFeed** | NEW `HazelcastDataFeed` adapter implementing `DataFeedPort` — reads from DolphinNG6 via Hazelcast (single source of truth). |
| **Scan Bridge Service** | NEW `scan_bridge_service.py` — Linux Arrow file watcher that pushes to Hazelcast. Uses file mtime (not scan #) to handle NG6 restarts. **Phase 2: Prefect daemon integration complete** — auto-restart, health monitoring, unified logging. **18 unit tests** in `tests/test_scan_bridge_prefect_daemon.py`. |
| **Paper Trading Engine** | NEW `paper_trade.py` — Clean architecture trading CLI with 23 round-trip trades executed in testing. |
| **Market Data** | Live data flowing: 50 assets, BTC @ $71,281.03, velocity divergence signals active. |

### What changed since v2 (2026-03-22)

| Area | Change |
|---|---|
| **Binance Futures** | Switched system focus from Spot to Perpetuals; updated API endpoints (`fapi.binance.com`); added `recvWindow` for signature stability. |
| **Friction Management** | **SP Bypass Logic**: alpha engines now support disabling internal fees/slippage so that Nautilus can handle costs natively. Prevents double-counting. |
| **Paper Trading** | NEW `launch_paper_portfolio.py` — uses Sandbox matching with live Binance data; includes realistic Tier 0 friction (0.02/0.05). |
| **Session Logging** | NEW `TradeLoggerActor` — independent CSV/JSON audit trails for every session. |

| Area | Change |
|---|---|
| **DolphinActor** | Refactored to step_bar() API (incremental, not batch); threading.Lock on ACB; _GateSnap stale-state detection; replay vs live mode; bar_idx tracking |
| **OBF Subsystem** | Sprint 1 hardening complete: circuit breaker, stall watchdog, crossed-book guard, dark streak, first flush 60s, fire-and-forget HZ pushes, dynamic asset discovery |
| **nautilus_prefect_flow.py** | NEW — Prefect-supervised BacktestEngine daily flow; champion SHA256 hash check; HZ heartbeats; capital continuity; HIBERNATE guard |
| **Test suite** | +35 DolphinActor tests (test_dolphin_actor.py); total 46 Nautilus + ~120 OBF |
| **prod/docs/** | All prod .md files consolidated; SYSTEM_FILE_MAP.md; NAUTILUS_DOLPHIN_SPEC.md added |
| **0.1s resolution** | Assessed: BLOCKED by 3 hard blockers (see §22) |
| **Capital Sync** | NEW — DolphinActor now syncs initial_capital with the Nautilus Portfolio balance on_start. |
| **Verification** | NEW — `TODO_CHECK_SIGNAL_PATHS.md` systematic test spec for local agents. |
| **MC-Forewarner** | Now wired in `DolphinActor.on_start()` — both flows run the full gold-performance stack; `_MC_BASE_CFG` + `_MC_MODELS_DIR_DEFAULT` as frozen module constants; empty-parquet early-return bug fixed in the `on_bar` replay path |

---

## TABLE OF CONTENTS

1. [System Philosophy](#1-system-philosophy)
2. [Physical Architecture](#2-physical-architecture)
2a. [Clean Architecture Layer (NEW v4)](#2a-clean-architecture-layer)
3. [Data Layer](#3-data-layer)
4. [Signal Layer — vel_div & DC](#4-signal-layer)
5. [Asset Selection — IRP](#5-asset-selection-irp)
6. [Position Sizing — AlphaBetSizer](#6-position-sizing)
7. [Exit Management](#7-exit-management)
8. [Fee & Slippage Model](#8-fee--slippage-model)
9. [OB Intelligence Layer](#9-ob-intelligence-layer)
10. [ACB v6 — Adaptive Circuit Breaker](#10-acb-v6)
11. [Survival Stack — Posture Control](#11-survival-stack)
12. [MC-Forewarner Envelope Gate](#12-mc-forewarner-envelope-gate)
13. [NDAlphaEngine — Full Bar Loop](#13-ndalpha-engine-full-bar-loop)
14. [DolphinActor — Nautilus Integration](#14-dolphin-actor)
15. [Hazelcast — Full IMap Schema](#15-hazelcast-full-imap-schema)
16. [Production Daemon Topology & HZ Bridge](#16-production-daemon-topology)
17. [Prefect Orchestration Layer](#17-prefect-orchestration-layer)
18. [CI Test Suite](#18-ci-test-suite)
19. [Parameter Reference](#19-parameter-reference)
20. [OBF Sprint 1 Hardening](#20-obf-sprint-1-hardening)
21. [Known Research TODOs](#21-known-research-todos)
22. [0.1s Resolution — Readiness Assessment](#22-01s-resolution-readiness-assessment)
23. [Signal Path Verification Specification](#23-signal-path-verification)
24. [Multi-Speed Event-Driven Architecture (v4.1)](#24-multi-speed-event-driven-architecture)
25. [Numerical Precision Policy](#25-numerical-precision-policy)
26. [Supervisord Architecture & OBF Universe (v5.0)](#26-supervisord-architecture--obf-universe)
27. [NG8 Linux Eigenscan Service (v6.0)](#27-ng8-linux-eigenscan-service)
28. [TUI v3 — Live Observability Terminal (v6.0)](#28-tui-v3-live-observability-terminal)

---

## 1. SYSTEM PHILOSOPHY

DOLPHIN-NAUTILUS is a **SHORT-only** (champion configuration) systematic trading engine targeting crypto perpetual futures on Binance.

**Core thesis**: When crypto market correlation matrices show accelerating eigenvalue-velocity divergence (`vel_div < -0.02`), the market is entering an instability regime. Shorting during early instability onset and exiting at a fixed take-profit captures the mean-reversion from panic to normalization.

**Design constraints**:

- Zero signal re-implementation in the Nautilus layer. All alpha logic lives in `NDAlphaEngine`.
- 512-bit arithmetic for correlation matrix processing (separate NG3 pipeline; not in the hot path of this engine).
- Champion parameters are FROZEN. They were validated via exhaustive VBT backtest on `dolphin_vbt_real.py`.
- The Nautilus actor is a thin wire, not a strategy. It routes parquet data → NDAlphaEngine → HZ result.

**Champion performance** (ACBv6 + IRP + DC + OB, full-stack 55-day Dec31–Feb25):

- ROI: +54.67% | PF: 1.141 | Sharpe: 2.84 | Max DD: 15.80% | WR: 49.5% | Trades: 2145
- Log: `run_logs/summary_20260307_163401.json`

> **Data correction note (2026-03-07)**: An earlier reference showed ROI=+57.18%, PF=1.149,
> Sharpe=3.00. Those figures came from a stale `vbt_cache/2026-02-25.parquet` that was built
> mid-day — missing 435 scans and carrying corrupt vel_div on 492 rows for the final day of the
> window. ALGO-3 parity testing caught the mismatch (max_diff=1.22 vs tolerance 1e-10).
> The parquet was rebuilt from live NG3 JSON (`build_parquet_cache(dates=['2026-02-25'], force=True)`).
> The stale file is preserved as `2026-02-25.parquet.STALE_20260307` for replicability.
> The corrected numbers above are the canonical reference. The ~2.5pp ROI drop reflects real
> late-day trades on Feb 25 that the stale parquet had silently omitted.

---

## 2. PHYSICAL ARCHITECTURE

```
┌──────────────────────────────────────────────────────────────────────┐
│ DATA SOURCES                                                         │
│ NG3 Scanner (Win) → /mnt/ng6_data/eigenvalues/ (SMB DolphinNG6_Data) │
│ Binance WS → 5s OHLCV bars + live order book (48+ USDT perpetuals)   │
│ VBT Cache → vbt_cache_klines/*.parquet (DOLPHIN-local + /mnt/dolphin)│
└────────────────────────┬─────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────────┐
│ HAZELCAST IN-MEMORY GRID (localhost:5701, cluster: "dolphin")        │
│ *** SYSTEM MEMORY — primary real-time data bus ***                   │
│ DOLPHIN_SAFETY          → posture + Rm (CP AtomicRef / IMap)         │
│ DOLPHIN_FEATURES        → acb_boost {boost,beta}, latest_eigen_scan  │
│ DOLPHIN_PNL_BLUE/GREEN  → per-date trade results                     │
│ DOLPHIN_STATE_BLUE      → capital continuity (latest + per-run)      │
│ DOLPHIN_HEARTBEAT       → liveness pulses (nautilus_prefect_flow)    │
│ DOLPHIN_OB              → order book snapshots                       │
│ DOLPHIN_FEATURES_SHARD_00..09 → 400-asset OBF sharded store          │
└────────────────────────┬─────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────────┐
│ PREFECT ORCHESTRATION (localhost:4200, work-pool: dolphin)           │
│ paper_trade_flow.py       00:05 UTC — NDAlphaEngine direct           │
│ nautilus_prefect_flow.py  00:10 UTC — BacktestEngine + DolphinActor  │
│ obf_prefect_flow.py       Continuous ~500ms — OB ingestion           │
│ mc_forewarner_flow.py     Daily — MC gate prediction                 │
│ exf_fetcher_flow.py       Periodic — ExF macro data fetch            │
└────────────────────────┬─────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────────┐
│ SUPERVISORD (v5.0 — sole process manager)                            │
│ Config: prod/supervisor/dolphin-supervisord.conf                     │
│ Socket: /tmp/dolphin-supervisor.sock                                 │
│                                                                      │
│ dolphin_data group (autostart=true):                                 │
│ ├── exf_fetcher_flow.py       — ExF live daemon                      │
│ ├── acb_processor_service.py  — ACB boost + HZ write (CP lock)       │
│ ├── obf_universe_service.py   — 540-asset OBF universe (NEW v5.0)    │
│ └── meta_health_service_v3.py — MHS watchdog (NEW v5.0)              │
│                                                                      │
│ dolphin group (autostart=false):                                     │
│ ├── nautilus_event_trader.py  — HZ entry listener trader             │
│ ├── scan_bridge_service.py    — Arrow → HZ scan bridge               │
│ └── clean_arch/main.py        — Clean architecture trader            │
└────────────────────────┬─────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────────────┐
│ NAUTILUS TRADING ENGINE (siloqy-env, nautilus_trader 1.219.0)        │
│ BacktestEngine + DolphinActor(Strategy) → NDAlphaEngine              │
│ on_bar() fires per date tick; step_bar() iterates parquet rows       │
│ HZ ACB listener → pending-flag → applied at top of next on_bar()     │
│ TradingNode (launcher.py) → future live exchange connectivity        │
└──────────────────────────────────────────────────────────────────────┘
```

**Key invariant v2**: `DolphinActor.on_bar()` receives one synthetic bar per date in paper mode, which triggers `engine.begin_day()` and then iterates through all parquet rows via `step_bar()`. In live mode, one real bar → one `step_bar()` call. The `_processed_dates` guard is replaced by date-boundary detection comparing `current_date` to the bar's timestamp date.

---

## 2a. CLEAN ARCHITECTURE LAYER (NEW v4)

### 2a.1 Overview

The Clean Architecture layer provides a **hexagonal** (ports & adapters) implementation for paper trading, ensuring core business logic is independent of infrastructure concerns.
```
┌─────────────────────────────────────────────────────────────────────────┐
│ CLEAN ARCHITECTURE (prod/clean_arch/)                                   │
├─────────────────────────────────────────────────────────────────────────┤
│ PORTS (Interfaces)                                                      │
│ ├── DataFeedPort       → Abstract market data source                    │
│ └── TradingPort        → Abstract order execution                       │
├─────────────────────────────────────────────────────────────────────────┤
│ ADAPTERS (Infrastructure)                                               │
│ ├── HazelcastDataFeed  → Reads from DOLPHIN_FEATURES map                │
│ └── PaperTradeExecutor → Simulated execution (no real orders)           │
├─────────────────────────────────────────────────────────────────────────┤
│ CORE (Business Logic)                                                   │
│ ├── TradingEngine      → Position sizing, signal processing             │
│ ├── SignalProcessor    → Eigenvalue-based signal generation             │
│ └── PortfolioManager   → PnL tracking, position management              │
└─────────────────────────────────────────────────────────────────────────┘
```

### 2a.2 Key Design Principles

**Dependency Rule**: Dependencies only point inward. Core knows nothing about Hazelcast, Arrow files, or Binance.

**Single Source of Truth**: All data comes from Hazelcast `DOLPHIN_FEATURES.latest_eigen_scan`, written atomically by DolphinNG6.

**File Timestamp vs Scan Number**: The Scan Bridge uses file modification time (mtime) instead of scan numbers because DolphinNG6 resets counters on restarts.
### 2a.3 Components

| Component | File | Purpose |
|-----------|------|---------|
| `DataFeedPort` | `ports/data_feed.py` | Abstract interface for market data |
| `HazelcastDataFeed` | `adapters/hazelcast_feed.py` | Hz implementation of DataFeedPort |
| `TradingEngine` | `core/trading_engine.py` | Pure business logic |
| `Scan Bridge` | `../scan_bridge_service.py` | Arrow → Hazelcast bridge |
| `Paper Trader` | `paper_trade.py` | CLI trading session |

### 2a.4 Data Flow

```
DolphinNG6 → Arrow Files (/mnt/ng6_data/arrow_scans/) → Scan Bridge → Hazelcast → HazelcastDataFeed → TradingEngine
   (5s)                                                  (watchdog)     (SSOT)        (Adapter)           (Core)
                                                              ↑
                                               (Prefect daemon supervises)
```

**Management**: The scan bridge runs as a Prefect-supervised daemon (`scan_bridge_prefect_daemon.py`):

- Health checks every 30 seconds
- Automatic restart on crash or stale data (>60s)
- Centralized logging via Prefect UI
- Deployed to `dolphin-daemon-pool`

### 2a.5 MarketSnapshot Structure

```python
MarketSnapshot(
    timestamp=datetime,
    symbol="BTCUSDT",
    price=71281.03,              # From asset_prices[0]
    eigenvalues=[...],           # From asset_loadings (50 values)
    velocity_divergence=-0.0058, # vel_div field
    scan_number=7315
)
```

### 2a.6 Current Status

- **Assets Tracked**: 50 (BTC, ETH, BNB, etc.)
- **BTC Price**: $71,281.03
- **Test Trades**: 23 round-trip trades executed
- **Strategy**: Mean reversion on velocity divergence
- **Data Latency**: ~5 seconds (DolphinNG6 pulse)
- **Bridge Management**: Prefect daemon (auto-restart, health checks every 30s)

### 2a.7 Testing

**Unit Tests:** `prod/tests/test_scan_bridge_prefect_daemon.py` (18 tests)

| Test Category | Count | Description |
|--------------|-------|-------------|
| ScanBridgeProcess | 6 | Process lifecycle (start, stop, restart) |
| Hazelcast Freshness | 6 | Data age detection (fresh, stale, warning) |
| Health Check Task | 3 | Prefect task health validation |
| Integration | 3 | Real Hz connection, process lifecycle |

**Run Tests:**

```bash
cd /mnt/dolphinng5_predict/prod
source /home/dolphin/siloqy_env/bin/activate
pytest tests/test_scan_bridge_prefect_daemon.py -v
```

**Test Coverage:**

- ✅ Process start/stop/restart
- ✅ Graceful and force kill
- ✅ Fresh/stale/warning data detection
- ✅ Hazelcast connection error handling
- ✅ Health check state transitions

---

## 3. DATA LAYER

### 3.1 vbt_cache_klines Parquet Schema

Location: `C:\Users\Lenovo\Documents\- DOLPHIN NG HD HCM TSF Predict\vbt_cache_klines\YYYY-MM-DD.parquet`

| Column | Type | Description |
|--------|------|-------------|
| `vel_div` | float64 | Eigenvalue velocity divergence: `v50_vel − v750_vel` (primary signal) |
| `v50_lambda_max_velocity` | float64 | Short-window (50-bar) lambda_max rate of change |
| `v150_lambda_max_velocity` | float64 | 150-bar window lambda velocity |
| `v300_lambda_max_velocity` | float64 | 300-bar window lambda velocity |
| `v750_lambda_max_velocity` | float64 | Long-window (750-bar) macro eigenvalue velocity |
| `instability_50` | float64 | General market instability index (50-bar) |
| `instability_150` | float64 | General market instability index (150-bar) |
| `BTCUSDT` … `STXUSDT` | float64 | Per-asset close prices (48 assets in current dataset) |

Each file: 1,439 rows (one per 1-minute bar over 24h), 57 columns.

### 3.2 NG3 Eigenvalue Data

Location: `C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\`

```
eigenvalues/
  YYYY-MM-DD/
    scan_NNNNNN__Indicators.npz   ← ACBv6 external factors (funding, dvol, fng, taker)
    scan_NNNNNN__scan_global.npz  ← lambda_vel_w750 for dynamic beta
matrices/
  YYYY-MM-DD/
    scan_NNNNNN_w50_HHMMSS.arb512.pkl.zst ← 512-bit correlation matrix (unused in hot path)
```

NPZ files are loaded by `AdaptiveCircuitBreaker._load_external_factors()` (max 10 scans per date, median-aggregated).

---

## 4. SIGNAL LAYER

### 4.1 Primary Signal: vel_div Threshold Gate

**Source**: `alpha_signal_generator.py`, `AlphaSignalGenerator.generate()`

**SHORT signal condition**:

```
vel_div < VEL_DIV_THRESHOLD (-0.02)
```

**LONG signal condition** (green posture, not champion):

```
vel_div > LONG_THRESHOLD (0.01)
```

**Confidence calculation** (SHORT path):

```python
ratio = clamp((threshold - vel_div) / (threshold - extreme), 0, 1)
      = clamp((-0.02 - vel_div) / (-0.02 - (-0.05)), 0, 1)
      = clamp((-0.02 - vel_div) / 0.03, 0, 1)
confidence = 0.50 + ratio * 0.40   # range: [0.50, 0.90]
```

`is_extreme = (vel_div <= -0.05)`

### 4.2 Direction Confirmation (DC) — Layer 6

**Source**: `alpha_signal_generator.py`, `check_dc_nb()` (numba JIT)

```python
# Looks back dc_lookback_bars (default 7) bars on the selected asset price
p0 = price[n - lookback - 1]
p1 = price[n - 1]
chg_bps = (p1 - p0) / p0 * 10000.0
if chg_bps < -min_magnitude_bps (-0.75): return CONFIRM     # falling price → SHORT OK
if chg_bps > min_magnitude_bps (+0.75):  return CONTRADICT
else:                                    return NEUTRAL
```

**dc_skip_contradicts = True** (champion): CONTRADICT returns a null signal (skip entry).

**Effect on leverage**: DC has `dc_leverage_boost=1.0` (no boost in champion). CONTRADICT kills the entry.

### 4.3 OB Sub-2: Per-Asset Imbalance Confirmation

When `ob_engine` is wired in (`use_ob_edge=True`):

```python
eff_imb = -ob_signal.imbalance_ma5   # For SHORT: sell pressure = positive eff_imb
if eff_imb > 0.10:                   # OB confirms → confidence boost ≤ +15%
    ob_adj = 1 + min(0.15, eff_imb * persistence * 0.5)
    confidence *= ob_adj
elif eff_imb < -0.15 and persistence > 0.60:
    # Strong persistent OB contradiction → HARD SKIP
    return null_signal
elif eff_imb < -0.10:                # Moderate → soft dampen confidence
    ob_adj = max(0.85, 1 - |eff_imb| * persistence * 0.4)
    confidence *= ob_adj
```

---

## 5. ASSET SELECTION — IRP

### 5.1 Overview

**Source**: `alpha_asset_selector.py`, `AlphaAssetSelector.rank_assets()` + numba kernels

IRP = **Impulse Response Profiling**.
Ranks all available assets by historical behavior over the last 50 bars in the regime direction. Selects the asset with the highest ARS (Asset Ranking Score) that passes all filters.

**Enabled by**: `use_asset_selection=True` (production default).

### 5.2 Numba Kernel: compute_irp_nb

```python
# Input: price_segment (last 50 prices), direction (-1 or +1)
dir_returns[i] = (price[i+1] - price[i]) * direction   # directional returns
cumulative = cumsum(dir_returns)
mfe = max(cumulative)            # Maximum Favorable Excursion
mae = abs(min(cumulative, 0))    # Maximum Adverse Excursion
efficiency = mfe / (mae + 1e-6)
alignment = count(dir_returns > 0) / n_ret
noise = variance(dir_returns)
latency = bars_to_reach_10pct_of_mfe   # (default: 50 if mfe==0)
```

### 5.3 Numba Kernel: compute_ars_nb

```
ARS = 0.5 * log1p(efficiency) + 0.35 * alignment - 0.15 * noise * 1000
```

### 5.4 Numba Kernel: rank_assets_irp_nb

For each asset:

1. Compute IRP in the DIRECT direction (regime_direction)
2. Compute IRP in the INVERSE direction (-regime_direction)
3. Take whichever gives the higher ARS (allows inverse selection)
4. Apply filter gates:
   - `noise > 500` → skip
   - `latency > 20` → skip (must reach 10% of MFE within 20 bars)
   - `alignment < 0.20` → skip
5. Bubble-sort by ARS descending (numba nopython)

### 5.5 AlphaAssetSelector Python Wrapper

```python
# Build 2D array (max_len × n_assets), right-aligned
valid = rank_assets_irp_nb(prices_2d, idx=max_len, regime_direction, ...)
# Walk ranked list:
for r in rankings:
    if min_irp_alignment > 0 and r.metrics.alignment < min_irp_alignment:
        continue   # alignment gate (default 0.45)
    if r.action != expected_action:
        continue   # direction gate
    if ob_engine and ob_placement.depth_quality < 0.20:
        continue   # OB depth gate (try next asset)
    trade_asset = r.asset
    break
# No match → return None (no fallback to BTCUSDT when IRP enabled)
```

**OB Sub-1**: ARS adjusted ±5%/10% by per-asset OB depth quality before sorting.

---

## 6. POSITION SIZING

### 6.1 Signal Strength Score

**Source**: `alpha_bet_sizer.py`, `compute_sizing_nb()` (numba JIT)

```python
# SHORT path (vel_div < threshold):
if vel_div <= extreme (-0.05):
    strength_score = 1.0
else:
    strength_score = (threshold - vel_div) / (threshold - extreme)
                   # = (-0.02 - vel_div) / 0.03
strength_score = clamp(strength_score, 0.0, 1.0)
```

### 6.2 Dynamic Leverage (Cubic Convex)

```python
scaled_score = strength_score ** leverage_convexity (3.0)
eff_leverage = min_leverage + scaled_score * (max_leverage - min_leverage)
             = 0.5 + scaled_score³ * 4.5   # range: [0.5, 5.0] before ACB
```

### 6.3 Signal Bucket Classification

```python
# Numba: get_signal_bucket_nb(vel_div, threshold=-0.02, extreme=-0.05)
if vel_div <= extreme * 1.5 (-0.075):    bucket = 0   # "extreme"
elif vel_div <= extreme (-0.05):         bucket = 1   # "strong"
elif vel_div <= (threshold+extreme)/2:   bucket = 2   # "moderate" (-0.035)
else:                                    bucket = 3   # "weak"
```

### 6.4 Alpha Layers (Layer 7)

**Bucket Boost** — adaptive win-rate feedback:

```python
# get_bucket_boost_nb: per-bucket win rate → multiplier
wr > 0.60 → 1.3x | wr > 0.55 → 1.1x | wr < 0.40 → 0.7x | wr < 0.45 → 0.85x
```

**Streak Multiplier** — recent 5-trade loss streak:

```python
# get_streak_mult_nb
losses_in_last_5 >= 4 → 0.5x | >= 3 → 0.7x | <= 1 → 1.1x
```

**Trend Multiplier** — vel_div acceleration:

```python
# get_trend_mult_nb(vd_trend = vel_div_history[-1] - vel_div_history[-10])
vd_trend < -0.01 → 1.3x   (deepening instability)
vd_trend > 0.01  → 0.7x   (recovering)
```

**Effective Fraction computation**:

```python
confidence = 0.70 if is_extreme else 0.55
conf_mult = confidence / 0.95
extreme_boost = 2.0 if is_extreme else 1.0
base_frac = 0.02 + strength_score * (base_fraction - 0.02)
eff_fraction = base_frac * conf_mult * extreme_boost * trend_mult * bucket_boost * streak_mult
eff_fraction = clamp(eff_fraction, 0.02, base_fraction=0.20)
```

**Final notional**:

```python
notional = capital * eff_fraction * final_leverage
```

### 6.5 ACB + MC Size Multiplier

```python
# regime_size_mult is recomputed every bar via _update_regime_size_mult(vel_div)
if day_beta > 0:
    strength_cubic = clamp((threshold - vel_div) / (threshold - extreme), 0, 1) ** convexity
    regime_size_mult = day_base_boost * (1.0 + day_beta * strength_cubic) * day_mc_scale
else:
    regime_size_mult = day_base_boost * day_mc_scale

# Applied to leverage ceiling:
clamped_max_leverage = min(base_max_leverage * regime_size_mult * market_ob_mult, abs_max_leverage=6.0)
raw_leverage = size_result["leverage"] * dc_lev_mult * regime_size_mult * market_ob_mult

# STALKER posture hard cap:
if posture == 'STALKER':
    clamped_max_leverage = min(clamped_max_leverage, 2.0)

final_leverage = clamp(raw_leverage, min_leverage=0.5, clamped_max_leverage)
```

---

## 7. EXIT MANAGEMENT

### 7.1 Exit Priority Order (champion)

**Source**: `alpha_exit_manager.py`, `AlphaExitManager.evaluate()`

1. **FIXED_TP**: `pnl_pct >= 0.0095` (95 basis points)
2. **STOP_LOSS**: `pnl_pct <= -1.0` (DISABLED in practice — a 100% loss never triggers before TP/max_hold)
3. **OB DURESS exits** (when ob_engine != None):
   - Cascade Detection: `cascade_count > 0` → widen TP ×1.40, halve max_hold
   - Liquidity Withdrawal: `regime_signal == 1` → hard SL 10%, TP ×0.60
4. **vel_div adverse-turn exits** (`vd_enabled=False` by default — disabled pending calibration)
5. **MAX_HOLD**: `bars_held >= 120` (= 600 seconds)

### 7.2 OB Dynamic Exit Parameter Adjustment

```python
if cascade_count > 0:
    dynamic_tp_pct *= 1.40
    dynamic_max_hold = int(max_hold_bars * 0.50)   # take profit fast before snap-back
elif regime_signal == 1:                           # LIQUIDITY WITHDRAWAL STRESS
    dynamic_sl_pct = 0.10                          # hard 10% stop (tail protection)
    if pnl_pct > 0.0:
        dynamic_tp_pct *= 0.60                     # take profit sooner under stress
    if eff_imb < -0.10:                            # OB actively opposing
        dynamic_max_hold = int(max_hold_bars * 0.40)
elif regime_signal == -1 and eff_imb > 0.15:       # CALM + FAVORABLE
    dynamic_max_hold = int(max_hold_bars * 1.50)   # let winners run

# Per-asset withdrawal (micro-level):
if withdrawal_velocity < -0.20 and not in cascade/stress:
    dynamic_max_hold = min(dynamic_max_hold, int(max_hold_bars * 0.40))
    if pnl_pct > 0.0:
        dynamic_tp_pct *= 0.75
```

### 7.3 Sub-day ACB Force Exit

When the HZ listener fires an ACB update mid-day:

```python
# In update_acb_boost(boost, beta):
if old_boost >= 1.25 and boost < 1.10:
    evaluate_subday_exits()   # → _execute_exit("SUBDAY_ACB_NORMALIZATION", ...)
```

The threshold is ARBITRARY (not backtested). Marked as a research TODO. Safe under the pending-flag pattern (fires on the next bar, not mid-loop).

### 7.4 Slippage on Exit

```python
# SHORT position exit:
exit_price = current_price * (1.0 + slip)   # slippage against us when covering a short
# STOP_LOSS:  slip = 0.0005 (5 bps — market order fill)
# FIXED_TP:   slip = 0.0002 (2 bps — likely limit fill)
# All others: slip = 0.0002
```

---

## 8. FEE & SLIPPAGE MODEL

### 8.1 SmartPlacer Fee Model (Layer 3)

**Source**: `esf_alpha_orchestrator.py`, `_execute_exit()`

Blended taker/maker fee rates based on historical SP fill statistics. **IMPORTANT**: In production/paper sessions using Nautilus friction, these MUST be disabled via `use_sp_fees=False`.
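The blended maker/taker entry fee reduces to a one-line weighted average. This `sp_entry_fee` helper is a sketch, not the production function; it uses the 0.62 maker-fill rate and the 2 bps maker / 5 bps taker fees quoted in this section:

```python
def sp_entry_fee(notional: float, sp_maker_entry_rate: float = 0.62,
                 maker_fee: float = 0.0002, taker_fee: float = 0.0005) -> float:
    """Blended SP entry fee (only relevant when use_sp_fees=True)."""
    # Weight the maker rate by the historical maker-fill share,
    # the taker rate by the remainder.
    blended = maker_fee * sp_maker_entry_rate + taker_fee * (1 - sp_maker_entry_rate)
    return blended * notional
```

At the default 0.62 maker-fill share the blended rate is 0.000314, i.e. 3.14 bps of notional.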
```python
# Entry fee (ONLY applied if use_sp_fees=True):
entry_fee = (0.0002 * sp_maker_entry_rate + 0.0005 * (1 - sp_maker_entry_rate)) * notional
          = (0.0002 * 0.62 + 0.0005 * 0.38) * notional
          = (0.0001240 + 0.0001900) * notional
          = 0.000314 * notional   (3.14 bps)
```

### 8.2 SP Slippage Refund (Layer 3)

Also disabled when `use_sp_slippage=False` is passed to the engine. These were used to "re-approximate" fills in low-fidelity simulations. In paper/live trading, the matching engine provides the fill price directly.

### 8.3 Production-Grade Native Friction (Nautilus)

In `launch_paper_portfolio.py` and live production flows:

1. **Engine Bypass**: `use_sp_fees = False`, `use_sp_slippage = False`.
2. **Nautilus Node Side**: Commissions are applied by the kernel via `CommissionConfig`.
3. **Execution**: Slippage is realized via the spread in the Nautilus Sandbox (Paper) or on-chain (Live).

### 8.4 Independent Session Logging

Every high-fidelity session now deploys a `TradeLoggerActor` that independently captures:

- `logs/paper_trading/settings_.json`: Full configuration metadata.
- `logs/paper_trading/trades_.csv`: Every execution event.

### 8.5 OB Edge (Layer 4)

```python
# With real OB engine:
if ob_placement.depth_quality > 0.5:
    pnl_pct_raw += ob_placement.fill_probability * ob_edge_bps * 1e-4

# Without OB engine (legacy Monte Carlo fallback):
if rng.random() < ob_confirm_rate (0.40):
    pnl_pct_raw += ob_edge_bps * 1e-4   # default: +5 bps
```

**Net PnL**:

```python
gross_pnl = pnl_pct_raw * notional
net_pnl = gross_pnl - entry_fee - exit_fee
capital += net_pnl
```

---

## 9. OB INTELLIGENCE LAYER

**Source**: `ob_features.py`, `ob_provider.py`, `hz_ob_provider.py`

The OB layer is wired in via `engine.set_ob_engine(ob_engine)`, which propagates to signal_gen, asset_selector, and exit_manager. It is OPTIONAL — the engine degrades gracefully to legacy Monte Carlo when `ob_engine=None`.
### 9.1 OB Signals Per Asset ```python ob_signal = ob_engine.get_signal(asset, timestamp) # Fields: # imbalance_ma5 — 5-bar MA of bid/ask size imbalance ([-1, +1]) # imbalance_persistence — fraction of last N bars sustaining sign # withdrawal_velocity — rate of depth decay (negative = book thinning) ``` ### 9.2 OB Macro (Market-Wide) ```python ob_macro = ob_engine.get_macro() # Fields: # cascade_count — number of assets in liquidation cascade # regime_signal — (-1=calm/trending, 0=neutral, +1=withdrawal stress) ``` ### 9.3 OB Placement Quality ```python ob_placement = ob_engine.get_placement(asset, timestamp) # Fields: # depth_quality — book depth score ([0, 2+]; >1 = deep book) # fill_probability — probability of maker fill at entry price ``` ### 9.4 OB Sub-Systems Summary | Sub | Location | Effect | |-----|----------|--------| | OB-1 | AlphaAssetSelector | ARS adjusted ±5%/10% by depth quality | | OB-2 | AlphaSignalGenerator | Confidence boost/dampen by imbalance; hard skip if persistent contradiction | | OB-3 | NDAlphaEngine._try_entry | Market-wide imbalance multiplier on final leverage (±20%/15%) | | OB-4 | AdaptiveCircuitBreaker | Macro withdrawal stress modulates ACBv6 dynamic beta (+25% max) | | OB-5 | AlphaExitManager | Dynamic TP/SL/max_hold based on cascade/withdrawal/calm regime | --- ## 10. 
ACB v6 — ADAPTIVE CIRCUIT BREAKER ### 10.1 Architecture (3-Scale Confirmation) **Source**: `adaptive_circuit_breaker.py`, `AdaptiveCircuitBreaker` ``` Scale 1 (Daily): External macro factors → log_0.5 base_boost Scale 2 (Per-bar): vel_div signal strength → meta-boost multiplier Scale 3 (Macro): w750 eigenvalue velocity → dynamic beta switch ``` ### 10.2 Scale 1 — External Factor Signals Loaded from NG3 `scan_*__Indicators.npz` files (median of first 10 scans per date): | Factor | Source | Bearish Threshold | Signal Weight | |--------|--------|------------------|---------------| | `funding_btc` | Funding rate | < -0.0001 (very) / < 0.0 (mild) | 1.0 / 0.5 | | `dvol_btc` | BTC implied vol | > 80 (extreme) / > 55 (elevated) | 1.0 / 0.5 | | `fng` | Fear & Greed Index | < 25 (extreme fear) / < 40 (fear) | 1.0 / 0.5 (requires confirmation) | | `taker` | Taker buy ratio | < 0.80 (selling) / < 0.90 (mild) | 1.0 / 0.5 | **Signal counting**: ```python signals = sum(individual_signal_weights) # float, e.g. 
2.5 ``` **Base boost formula**: ```python if signals >= 1.0: base_boost = 1.0 + 0.5 * log1p(signals) # signals=1 → 1.347x | signals=2 → 1.549x | signals=3 → 1.693x else: base_boost = 1.0 ``` ### 10.3 Scale 3 — Dynamic Beta ```python # Preloaded: w750_threshold = np.percentile(all_w750_vels, 60) if w750_vel >= w750_threshold: beta = BETA_HIGH = 0.8 # aggressive meta-boost during macro acceleration else: beta = BETA_LOW = 0.2 # conservative during calm macro ``` ### 10.4 Scale 2 — Per-Bar Meta-Boost ```python # Computed every bar inside _update_regime_size_mult(vel_div): strength_cubic = clamp((threshold - vel_div) / (threshold - extreme), 0, 1) ** 3 # leverage_convexity = 3 → cubic if day_beta > 0: regime_size_mult = base_boost * (1.0 + beta * strength_cubic) * mc_scale else: regime_size_mult = base_boost * mc_scale ``` ### 10.5 Sub-Day ACB Update (HZ Listener) The `acb_processor_service.py` re-runs ACB computation mid-day when new NG3 scan data arrives and writes `{boost, beta}` to `DOLPHIN_FEATURES` IMap. `_on_acb_event()` in `DolphinActor` stores the payload in `self._pending_acb` (GIL-safe dict write). Applied at start of next `on_bar()` iteration: ```python # In on_bar() — BEFORE processing: if _pending_acb is not None and engine is not None: engine.update_acb_boost(pending_acb['boost'], pending_acb['beta']) _pending_acb = None ``` --- ## 11. SURVIVAL STACK — POSTURE CONTROL ### 11.1 Overview **Source**: `survival_stack.py`, `SurvivalStack` Computes a continuous Risk Multiplier `Rm ∈ [0, 1]` from 5 sensor categories. Maps to discrete posture {APEX, STALKER, TURTLE, HIBERNATE}. 
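The Rm → posture mapping summarized above can be sketched as a pure function using the §11.5 thresholds (illustrative — `posture_for` is not the SurvivalStack method name):

```python
def posture_for(rm: float) -> str:
    """Map the continuous risk multiplier Rm in [0, 1] to a discrete posture.

    Thresholds per section 11.5 (deliberately tighter than the math spec).
    """
    if rm >= 0.90:
        return "APEX"        # full trading, no constraints
    if rm >= 0.75:
        return "STALKER"     # max leverage capped at 2.0x
    if rm >= 0.50:
        return "TURTLE"      # regime_dd_halt: no new entries
    return "HIBERNATE"       # force-close open positions
```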
### 11.2 Five Sensor Categories **Cat1 — Binary Invariant** (kill switch): ```python if hz_nodes < 1 or heartbeat_age_s > 30: return 0.0 # Total system failure → HIBERNATE immediately return 1.0 ``` **Cat2 — Structural** (MC-Forewarner + data staleness): ```python base = {OK: 1.0, ORANGE: 0.5, RED: 0.1}[mc_status] decay = exp(-max(0, staleness_hours - 6) / 3) f_structural = base * decay # Exponential decay after 6h stale ``` **Cat3 — Microstructure** (OB depth/fill quality): ```python if ob_stale: return 0.5 score = min(depth_quality, fill_prob) return clamp(0.3 + 0.7 * score, 0.3, 1.0) ``` **Cat4 — Environmental** (DVOL spike impulse): ```python if dvol_spike and t_since_spike_min == 0: return 0.3 # Instant degradation at spike return 0.3 + 0.7 * (1 - exp(-t_since_spike_min / 60)) # 60-min recovery tau ``` **Cat5 — Capital** (sigmoid drawdown constraint): ```python # Rm5 ≈ 1.0 at DD<5%, ≈ 0.5 at DD=12%, ≈ 0.1 at DD=20% return 1 / (1 + exp(30 * (drawdown - 0.12))) ``` ### 11.3 Hierarchical Combination ```python f_environment = min(f_structural, f_ext) # worst of Cat2/Cat4 f_execution = f_micro # Cat3 r_target = Cat1 * Cat5 * f_environment * f_execution # Correlated sensor collapse penalty: degraded = count([f_structural < 0.8, f_micro < 0.8, f_ext < 0.8]) if degraded >= 2: r_target *= 0.5 ``` ### 11.4 Bounded Recovery Dynamics ```python # Fast attack (instant degradation), slow recovery (5%/minute max): if r_target < last_r_total: r_final = r_target # immediate drop else: alpha = min(1.0, 0.02 * dt_min) step = min(alpha * (r_target - last_r_total), 0.05 * dt_min) r_final = last_r_total + step ``` ### 11.5 Posture Mapping **NOTE: Thresholds are deliberately TIGHTER than mathematical spec (safety buffer).** ```python if Rm >= 0.90: APEX # Full trading, no constraints if Rm >= 0.75: STALKER # Max leverage capped at 2.0x if Rm >= 0.50: TURTLE # regime_dd_halt = True (no new entries) else: HIBERNATE # Force-close open positions, no new entries ``` ### 11.6 Hysteresis 
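The two-sided confirmation rule (down after 2 consecutive bars at a lower level, up after 5 at a higher one) can be sketched as a small state machine. The class name, internals, and the linear posture ordering are illustrative assumptions, not doctrinal code:

```python
class PostureHysteresis:
    """Illustrative sketch of the posture hysteresis (not the doctrinal class)."""

    ORDER = ["HIBERNATE", "TURTLE", "STALKER", "APEX"]  # assumed severity ordering

    def __init__(self, start="APEX", hysteresis_down=2, hysteresis_up=5):
        self.current = start
        self.hysteresis_down = hysteresis_down
        self.hysteresis_up = hysteresis_up
        self._candidate = None   # posture level being confirmed
        self._streak = 0         # consecutive bars at the candidate level

    def update(self, target: str) -> str:
        """Feed one bar's raw posture; return the confirmed posture."""
        if target == self.current:
            self._candidate, self._streak = None, 0
            return self.current
        if target != self._candidate:           # new candidate resets the streak
            self._candidate, self._streak = target, 0
        self._streak += 1
        moving_down = self.ORDER.index(target) < self.ORDER.index(self.current)
        needed = self.hysteresis_down if moving_down else self.hysteresis_up
        if self._streak >= needed:
            self.current = target
            self._candidate, self._streak = None, 0
        return self.current
```

With the default counts, a single TURTLE bar leaves the confirmed posture at APEX; the second consecutive TURTLE bar confirms the downgrade, while recovery back to APEX needs five consecutive APEX bars.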
```python # Down: requires hysteresis_down=2 consecutive bars at lower level # Up: requires hysteresis_up=5 consecutive bars at higher level # Prevents flip-flopping around thresholds ``` ### 11.7 Posture → Engine Effect | Posture | Engine Effect | |---------|--------------| | APEX | No constraint (max leverage = abs_max=6.0 × regime_size_mult) | | STALKER | `clamped_max_leverage = min(..., 2.0)` in `_try_entry` | | TURTLE | `regime_dd_halt = True` → `process_bar` skips entry block | | HIBERNATE | `_manage_position` forces EXIT("HIBERNATE_HALT"), `regime_dd_halt = True` | --- ## 12. MC-FOREWARNER ENVELOPE GATE **Source**: Called via `engine.set_mc_forewarner(forewarner, mc_base_cfg)` Runs daily at start of `process_day()`: ```python mc_cfg = {**mc_base_cfg, 'max_leverage': base_max_leverage * day_base_boost} mc_report = forewarner.assess_config_dict(mc_cfg) mc_red = mc_report.catastrophic_probability > 0.25 or mc_report.envelope_score < -1.0 mc_orange = (not mc_red) and (envelope_score < 0 or catastrophic_probability > 0.10) day_mc_status = 'RED' | 'ORANGE' | 'OK' day_mc_scale = 0.5 if ORANGE else 1.0 if mc_red or posture in ['TURTLE', 'HIBERNATE']: regime_dd_halt = True # No new entries for the entire day ``` --- ## 13. NDAlphaEngine — FULL BAR LOOP ### 13.1 process_day() Flow ```python def process_day(date_str, df, asset_columns, vol_regime_ok, direction, posture): # --- Per-day initialization --- regime_direction = direction or -1 # -1=SHORT champion regime_dd_halt = False day_posture = posture # --- ACB: Scale 1 + Scale 3 (once per day) --- if acb: info = acb.get_dynamic_boost_for_date(date_str, ob_engine) day_base_boost = info['boost'] # log_0.5 curve day_beta = info['beta'] # dynamic beta (w750 percentile switch) # --- MC-Forewarner gate (once per day) --- if forewarner: mc_report = forewarner.assess_config_dict(...) 
if mc_red or posture in [TURTLE, HIBERNATE]:
        regime_dd_halt = True  # kill new entries for day

    # === BAR LOOP ===
    for ri, row in df.iterrows():
        vd = row['vel_div']  # primary signal
        v50_vel = row['v50_lambda_max_velocity']
        v750_vel = row['v750_lambda_max_velocity']
        prices = {col: row[col] for col in asset_columns if row[col] > 0}

        # Append to price_histories (sliding window: trimmed back to the
        # last 200 entries once length exceeds 500)
        for ac, p in prices.items():
            price_histories[ac].append(p)
            if len(price_histories[ac]) > 500:
                price_histories[ac] = price_histories[ac][-200:]

        vol_ok = vol_regime_ok[ri] if vol_regime_ok else (ri >= 100)  # warmup fallback

        # ACB Scale 2: regime_size_mult updated every bar
        _update_regime_size_mult(vd)

        process_bar(global_bar_idx, vd, prices, vol_ok, price_histories, v50_vel, v750_vel)
        global_bar_idx += 1

    return {date, pnl, capital, boost, beta, mc_status, trades}
```

### 13.2 process_bar() Flow

```python
def process_bar(bar_idx, vel_div, prices, vol_regime_ok, price_histories, v50_vel, v750_vel):
    bar_count += 1
    vel_div_history.append(vel_div)  # trimmed to 200

    # === EXIT MANAGEMENT (always first) ===
    if position is not None:
        exit_info = _manage_position(bar_idx, prices, vel_div, v50_vel, v750_vel)
        # → AlphaExitManager.evaluate() → if EXIT: _execute_exit()

    # === ENTRY (only when no position) ===
    if position is None AND bar_idx > last_exit_bar AND NOT regime_dd_halt:
        if bar_count >= lookback (100) AND vol_regime_ok:
            entry_info = _try_entry(bar_idx, vel_div, prices, price_histories, v50_vel, v750_vel)
```

### 13.3 _try_entry() Flow

```python
def _try_entry(bar_idx, vel_div, prices, price_histories, v50_vel, v750_vel):
    if capital <= 0: return None
    # 1.
IRP Asset Selection (Layer 2)
    if use_asset_selection:
        market_data = {a: history[-50:] for a, history in price_histories.items() if len(history) >= 50}
        rankings = asset_selector.rank_assets(market_data, regime_direction)
        trade_asset = first_asset_passing_all_gates(rankings)
        if trade_asset is None: return None  # strict: no fallback
    else:
        trade_asset = "BTCUSDT"  # fallback when IRP disabled

    # 2. Signal Generation + DC (Layer 6)
    signal = signal_gen.generate(vel_div, vel_div_history, price_histories[trade_asset], regime_direction, trade_asset)
    if not signal.is_valid: return None  # vel_div or DC killed it

    # 3. Position Sizing (Layers 7-8)
    size = bet_sizer.calculate_size(capital, vel_div, signal.vel_div_trend, regime_direction)

    # 4. OB Sub-3: Cross-asset market multiplier
    market_ob_mult = ob_engine.get_market_multiplier(...)  # ±20%

    # 5. ACB leverage ceiling enforcement
    clamped_max = min(base_max_leverage * regime_size_mult * market_ob_mult, abs_max_leverage=6.0)
    if posture == STALKER: clamped_max = min(clamped_max, 2.0)
    final_leverage = clamp(size.leverage * regime_size_mult * market_ob_mult, min_lev, clamped_max)

    # 6. Notional and entry
    notional = capital * size.fraction * final_leverage
    entry_price = prices[trade_asset]

    # 7. Create position
    position = NDPosition(trade_asset, regime_direction, entry_price, notional, final_leverage, ...)
    exit_manager.setup_position(trade_id, entry_price, direction, bar_idx, v50_vel, v750_vel)
```

---

## 14. DOLPHIN ACTOR — NAUTILUS INTEGRATION

**Source**: `nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py`
**Base**: `nautilus_trader.trading.strategy.Strategy` (Rust/Cython core)
**Lines**: 338

### 14.1 Lifecycle (v2 — step_bar API)

```
__init__:
    dolphin_config, engine=None, hz_client=None
    current_date=None, posture='APEX', _processed_dates=set()
    _pending_acb: dict|None = None
    _acb_lock = threading.Lock()   ← v2: explicit lock (not GIL reliance)
    _stale_state_events = 0
    _day_data = None, _bar_idx_today = 0

on_start(): 1.
_connect_hz() → HazelcastClient(cluster="dolphin", members=["localhost:5701"]) 2. _read_posture() → DOLPHIN_SAFETY (CP AtomicRef, map fallback) 3. _setup_acb_listener() → add_entry_listener(DOLPHIN_FEATURES["acb_boost"]) 4. create_boost_engine(mode=boost_mode, **engine_kwargs) → NDAlphaEngine 5. MC-Forewarner injection (gold-performance stack — always active): mc_models_dir = config.get('mc_models_dir', _MC_MODELS_DIR_DEFAULT) if Path(mc_models_dir).exists(): forewarner = DolphinForewarner(models_dir=mc_models_dir) engine.set_mc_forewarner(forewarner, _MC_BASE_CFG) ← graceful degradation: logs warning + continues if models missing ← disable explicitly: set mc_models_dir=None/'' in config on_bar(bar): ① Drain ACB under _acb_lock: pending = _pending_acb; _pending_acb = None ← atomic swap if pending: engine.update_acb_boost(boost, beta) ② Date boundary: date_str = datetime.fromtimestamp(bar.ts_event/1e9, UTC).strftime('%Y-%m-%d') if current_date != date_str: if current_date: engine.end_day() current_date = date_str posture = _read_posture() _bar_idx_today = 0 engine.begin_day(date_str, posture=posture, direction=±1) if not live_mode: _load_parquet_data(date_str) → _day_data ③ HIBERNATE guard: if posture=='HIBERNATE': return ← hard skip, no step_bar ④ Feature extraction: live_mode=False → if _day_data empty: return ← early exit, no step_bar with zeros elif _bar_idx_today >= len(df): return ← end-of-day else: row = df.iloc[_bar_idx_today], vol_regime_ok = (idx>=100) live_mode=True → _get_latest_hz_scan(), staleness check (>10s → warning), dedup on scan_number ⑤ _GateSnap BEFORE: (acb_boost, acb_beta, posture, mc_gate_open) ⑥ engine.pre_bar_proxy_update(inst50, v750_vel) ← if ProxyBoostEngine ⑦ result = engine.step_bar(bar_idx, vel_div, prices, v50_vel, v750_vel, vol_regime_ok) _bar_idx_today += 1 ⑧ _GateSnap AFTER: compare → if changed: stale_state_events++, result['stale_state']=True ⑨ _write_result_to_hz(date_str, result) on_stop(): _processed_dates.clear() 
_stale_state_events = 0 if hz_client: hz_client.shutdown() ``` ### 14.2 Thread Safety: ACB Pending-Flag Pattern (v2) **CRITICAL**: HZ entry listeners run on HZ client pool threads, NOT the Nautilus event loop. ```python # HZ listener thread — parse outside lock, assign inside lock: def _on_acb_event(event): try: val = event.value if val: parsed = json.loads(val) # CPU work OUTSIDE lock with self._acb_lock: self._pending_acb = parsed # atomic write under lock except Exception as e: self.log.error(f"ACB event parse error: {e}") # Nautilus event loop — drain under lock, apply outside lock: def on_bar(bar): with self._acb_lock: pending = self._pending_acb self._pending_acb = None # atomic consume under lock if pending is not None and self.engine is not None: boost = float(pending.get('boost', 1.0)) beta = float(pending.get('beta', 0.0)) self.engine.update_acb_boost(boost, beta) ``` **v2 vs v1**: v1 relied on GIL for safety (bare dict assignment). v2 uses explicit `threading.Lock` — correct even if GIL is removed in future Python versions. Lock hold time is minimized to a single pointer swap. ### 14.3 _GateSnap — Stale-State Detection New in v2. Detects when ACB boost, posture, or MC gate changes between the pre-step and post-step snapshot: ```python _GateSnap = namedtuple('_GateSnap', ['acb_boost', 'acb_beta', 'posture', 'mc_gate_open']) before = _GateSnap(engine._day_base_boost, engine._day_beta, posture, engine._mc_gate_open) result = engine.step_bar(...) 
after = _GateSnap(engine._day_base_boost, engine._day_beta, _read_posture(), engine._mc_gate_open) if before != after: self._stale_state_events += 1 self.log.warning(f"[STALE_STATE] gate changed mid-eval: {changed_fields}") result['stale_state'] = True # flagged in HZ write — DO NOT use for live orders ``` ### 14.4 Replay vs Live Mode | | Replay Mode (live_mode=False) | Live Mode (live_mode=True) | |---|---|---| | Data source | `vbt_cache_klines/YYYY-MM-DD.parquet` | `DOLPHIN_FEATURES["latest_eigen_scan"]` (HZ) | | Per-bar iteration | `df.iloc[_bar_idx_today]` | One bar = one HZ scan fetch | | vol_regime_ok | `bar_idx >= 100` (warmup) | From scan dict | | Stale guard | — | `abs(now_ns - scan_ts_ns) > 10s` → warning | | Dedup | — | `scan_num == last_scan_number` → skip | ### 14.5 Data Loading (Replay) ```python def _load_parquet_data(date_str): path = HCM_DIR / "vbt_cache_klines" / f"{date_str}.parquet" df = pd.read_parquet(path) meta_cols = {vel_div, scan_number, v50_..., v750_..., instability_50, instability_150} asset_columns = [c for c in df.columns if c not in meta_cols] return df, asset_columns, None # vol_regime_ok deferred to on_bar warmup check ``` ### 14.6 Posture Reading Primary: `HZ CP Subsystem AtomicReference('DOLPHIN_SAFETY')` — linearizable. Fallback: `HZ IMap('DOLPHIN_SAFETY').get('latest')` — eventually consistent. Default when HZ unavailable: `'APEX'` (non-fatal degradation). ### 14.7 Result Writing ```python def _write_result_to_hz(date_str, result): if not self.hz_client: return # silent noop imap_pnl = hz_client.get_map('DOLPHIN_PNL_BLUE').blocking() imap_pnl.put(date_str, json.dumps(result)) if result.get('stale_state'): self.log.error("[STALE_STATE] DO NOT use for live order submission") # result: {date, pnl, capital, boost, beta, mc_status, trades, stale_state?} ``` ### 14.8 Important Notes for Callers - **`actor.log` is read-only** (Rust-backed Cython property). 
Never try to assign `actor.log = MagicMock()` in tests — use the real Nautilus logger instead. - **`actor.posture`** is a regular Python attribute (writable in tests). - **`actor.engine`** is set in `on_start()`. Tests can set directly after `__init__`. --- ## 15. HAZELCAST — FULL IMAP SCHEMA Hazelcast is the **system memory**. All subsystem state flows through it. Every consumer must treat HZ maps as authoritative real-time sources. **Infrastructure**: Hazelcast 5.3, Docker (`prod/docker-compose.yml`), `localhost:5701`, cluster `"dolphin"`. **CP Subsystem**: Enabled — required for ACB atomic operations. **Management Center**: `http://localhost:8080`. **Python client**: `hazelcast-python-client 5.6.0` (siloqy-env). ### 15.1 Complete IMap Reference | Map | Key | Value | Writer | Reader(s) | Notes | |---|---|---|---|---|---| | `DOLPHIN_SAFETY` | `"latest"` | JSON `{posture, Rm, sensors, ...}` | `system_watchdog_service.py` | `DolphinActor`, `paper_trade_flow`, `nautilus_prefect_flow` | CP AtomicRef preferred; IMap fallback | | `DOLPHIN_FEATURES` | `"acb_boost"` | JSON `{boost, beta}` | `acb_processor_service.py` | `DolphinActor` (HZ entry listener) | Triggers `_on_acb_event` | | `DOLPHIN_FEATURES` | `"latest_eigen_scan"` | JSON `{vel_div, scan_number, asset_prices, timestamp_ns, w50_velocity, w750_velocity, instability_50}` | Eigenvalue scanner bridge | `DolphinActor` (live mode) | Dedup on scan_number | | `DOLPHIN_PNL_BLUE` | `"YYYY-MM-DD"` | JSON daily result `{pnl, capital, trades, boost, beta, mc_status, posture, stale_state?}` | `paper_trade_flow`, `DolphinActor._write_result_to_hz`, `nautilus_prefect_flow` | Analytics | stale_state=True means DO NOT use for live orders | | `DOLPHIN_PNL_GREEN` | `"YYYY-MM-DD"` | JSON daily result | `paper_trade_flow` (green) | Analytics | GREEN config only | | `DOLPHIN_STATE_BLUE` | `"latest"` | JSON `{strategy, capital, date, pnl, trades, peak_capital, drawdown, engine_state, updated_at}` | `paper_trade_flow` | 
`paper_trade_flow` (capital restore) | Full engine_state for position continuity | | `DOLPHIN_STATE_BLUE` | `"latest_nautilus"` | JSON `{strategy, capital, date, pnl, trades, posture, param_hash, engine, updated_at}` | `nautilus_prefect_flow` | `nautilus_prefect_flow` (capital restore) | param_hash = champion SHA256[:16] | | `DOLPHIN_STATE_BLUE` | `"state_{strategy}_{date}"` | JSON per-run snapshot | `paper_trade_flow` | Recovery | Full historical per-run snapshots | | `DOLPHIN_HEARTBEAT` | `"nautilus_flow_heartbeat"` | JSON `{ts, iso, run_date, phase, flow}` | `nautilus_prefect_flow` (heartbeat_task) | External monitoring | Written at flow_start, engine_start, flow_end | | `DOLPHIN_HEARTBEAT` | `"probe_ts"` | Timestamp string | `nautilus_prefect_flow` (hz_probe_task) | Liveness check | Written at HZ probe time | | `DOLPHIN_OB` | per-asset key | JSON OB snapshot | `obf_prefect_flow` | `HZOBProvider` | Raw OB map | | `DOLPHIN_FEATURES_SHARD_00` | symbol | JSON OB feature dict `{imbalance, fill_probability, depth_quality, regime_signal, ...}` | `obf_prefect_flow` | `HZOBProvider` | shard routing (see §15.2) | | `DOLPHIN_FEATURES_SHARD_01..09` | symbol | Same schema | `obf_prefect_flow` | `HZOBProvider` | — | | `DOLPHIN_SIGNALS` | signal key | Signal distribution | `signal_bridge.py` | Strategy consumers | — | | `DOLPHIN_FEATURES` | `"obf_universe_latest"` | JSON `{_snapshot_utc, _n_assets, assets: {symbol: {spread_bps, depth_1pct_usd, depth_quality, fill_probability, imbalance, best_bid, best_ask, n_bid_levels, n_ask_levels}}}` | `obf_universe_service.py` | MHS v3 (M5 coherence), Asset Picker | 540 USDT perps; 60s push cadence. NEW v5.0 | | `DOLPHIN_META_HEALTH` | `"latest"` | JSON `{rm_meta, status, m4_control_plane, m1_data_infra, m1_trader, m2_heartbeat, m3_data_freshness, m5_coherence, service_status, hz_key_status, timestamp}` | `meta_health_service_v3.py` | External monitoring, MHS tests | GREEN/DEGRADED/CRITICAL/DEAD. 
NEW v5.0 |

### 15.2 OBF Shard Routing

```python
SHARD_COUNT = 10
shard_idx = sum(ord(c) for c in symbol) % SHARD_COUNT
imap_name = f"DOLPHIN_FEATURES_SHARD_{shard_idx:02d}"  # ..._00 through ..._09
```

Routing is **stable** (sum-of-ord, not `hash()`) — deterministic across Python versions and process restarts. 400+ assets distribute evenly across 10 shards.

### 15.3 ShardedFeatureStore API

**Source**: `hz_sharded_feature_store.py`, `ShardedFeatureStore`

```python
store = ShardedFeatureStore(hz_client)
store.put('BTCUSDT', 'vel_div', -0.03)  # routes via the stable sum-of-ord shard index
val = store.get('BTCUSDT', 'vel_div')
store.delete('BTCUSDT', 'vel_div')
# Internal key format: "vel_div_BTCUSDT"
```

Near cache config: TTL=300s, invalidate_on_change=True, LRU eviction, max_size=5000 per shard.

### 15.4 HZOBProvider — Dynamic Asset Discovery

```python
# On connect (lazy), discovers which assets are present in any shard:
for shard_idx in range(SHARD_COUNT):
    key_set = client.get_map(f"DOLPHIN_FEATURES_SHARD_{shard_idx:02d}").blocking().key_set()
    discovered_assets.update(key_set)
```

No static asset list required — adapts automatically as OBF flow adds/removes assets.

### 15.5 CP Subsystem (ACB Processor)

`acb_processor_service.py` uses `HZ CP FencedLock` to prevent simultaneous ACB writes from multiple instances. CP Subsystem must be enabled in `docker-compose.yml`. All writers must use the same CP lock name to get protection.

### 15.6 OBF Circuit Breaker (HZ Push)

After 5 consecutive HZ push failures, OBF flow opens a circuit breaker and switches to file-only mode (`ob_cache/latest_ob_features.json`). Consumers should prefer the JSON file during HZ outages.

---

## 16. PRODUCTION DAEMON TOPOLOGY

> **v5.0 NOTE**: ALL services are managed exclusively by **supervisord**. No service is managed by systemd. The `meta_health_daemon.service`, `dolphin-nautilus-trader.service`, and `dolphin-scan-bridge.service` systemd units are stopped and disabled.
Any attempt to re-enable them will create a dual-management race condition ("random killer" bug — see §26.1). ### 16.1 Supervisord Config **File**: `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf` **Socket**: `/tmp/dolphin-supervisor.sock` **PYTHONPATH** (dolphin_data group): `/mnt/dolphinng5_predict:/mnt/dolphinng5_predict/nautilus_dolphin:/mnt/dolphinng5_predict/prod` ```bash # Status check supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status # Restart a service supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf restart dolphin_data:meta_health ``` ### 16.2 dolphin_data Group (autostart=true — data pipeline) | Program | File | Purpose | startsecs | |---|---|---|---| | `exf_fetcher` | `exf_fetcher_flow.py --warmup 15` | ExF live daemon: funding/dvol/fng/taker → HZ `exf_latest` | 20 | | `acb_processor` | `acb_processor_service.py` | ACBv6 daily boost + dynamic beta → HZ `acb_boost` (CP FencedLock) | 10 | | `obf_universe` | `obf_universe_service.py` | 540-asset OBF universe L2 health → HZ `obf_universe_latest` | 15 | | `meta_health` | `meta_health_service_v3.py` | MHS v3 watchdog — monitors all data services, auto-restarts | 5 | ### 16.3 dolphin Group (autostart=false — trading, started manually) | Program | File | Purpose | Notes | |---|---|---|---| | `nautilus_trader` | `nautilus_event_trader.py` | HZ entry listener trader | Start only during trading hours | | `scan_bridge` | `scan_bridge_service.py` | Arrow → HZ scan bridge | Start when DolphinNG6 is active | | `clean_arch_trader` | `clean_arch/main.py` | Clean architecture trader | Experimental | ### 16.4 ACB Processor (`acb_processor_service.py`) **Purpose**: ACBv6 daily boost + dynamic beta from NG3 NPZ files → HZ `DOLPHIN_FEATURES["acb_boost"]`. **HZ**: CP FencedLock prevents simultaneous writes. 
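The FencedLock guard described above can be sketched as follows. This is a pure-Python sketch: the lock name `acb_processor_lock` and the helper `write_acb_boost` are assumptions — doctrine only fixes the map/key (`DOLPHIN_FEATURES["acb_boost"]`) and the rule that all writers share one CP lock name:

```python
import json

def write_acb_boost(client, boost: float, beta: float) -> str:
    """Write {boost, beta} to DOLPHIN_FEATURES["acb_boost"] under a CP FencedLock.

    `client` is a connected hazelcast.HazelcastClient. The lock name
    "acb_processor_lock" is an assumption -- what matters is that EVERY
    writer acquires the SAME CP lock name, otherwise there is no protection.
    """
    payload = json.dumps({"boost": boost, "beta": beta})
    lock = client.cp_subsystem.get_lock("acb_processor_lock").blocking()
    lock.lock()  # linearizable mutual exclusion via the CP subsystem
    try:
        client.get_map("DOLPHIN_FEATURES").blocking().put("acb_boost", payload)
    finally:
        lock.unlock()
    return payload
```

Because the lock lives in the CP subsystem (Raft-backed), a second `acb_processor_service.py` instance started by mistake blocks on `lock.lock()` instead of interleaving writes.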
### 16.5 OBF Universe (`obf_universe_service.py`) — NEW v5.0 **Purpose**: L2 health monitor for all 540 USDT perpetuals → HZ `DOLPHIN_FEATURES["obf_universe_latest"]`. **Coverage**: 540 active USDT perps, 3 WS connections (200/200/140 streams). **Stream**: `{symbol}@depth5@500ms` — zero REST weight. **Cadence**: 60s health snapshots; 300s Parquet flush. **Storage**: `/mnt/ng6_data/ob_universe/` (Hive partitioned; `MAX_FILE_AGE_DAYS=0` — never pruned). **See §26.2 for full schema.** ### 16.6 Meta Health Service v3 (`meta_health_service_v3.py`) — NEW v5.0 **Purpose**: 5-sensor weighted health monitor + auto-recovery for all data pipeline services. **Recovery**: `supervisorctl restart` via daemon thread. `RECOVERY_COOLDOWN_CRITICAL_S=10s`. **Output**: `DOLPHIN_META_HEALTH["latest"]` + `/mnt/dolphinng5_predict/run_logs/meta_health.json`. **See §26.3 for full specification.** ### 16.7 ExF Daemon (`exf_fetcher_flow.py`) **Purpose**: External factors — funding rate, DVOL, Fear&Greed, taker ratio → HZ `DOLPHIN_FEATURES["exf_latest"]`. **Field**: `_pushed_at` (Unix timestamp) is the canonical freshness field. ### 16.8 MC-Forewarner Flow (`mc_forewarner_flow.py`) **Purpose**: Prefect-orchestrated daily ML assessment. Outcome: OK / ORANGE / RED → HZ. **Effect**: ORANGE → `day_mc_scale=0.5`. RED → `regime_dd_halt=True`. ### 16.9 paper_trade_flow.py (Primary — 00:05 UTC) **Purpose**: Daily NDAlphaEngine run. Loads klines, wires ACB+OB+MC, runs `begin_day/step_bar/end_day`. **Direction**: `direction = -1` (SHORT, blue). ### 16.10 Daemon Start Sequence ``` 1. docker-compose up -d ← Hazelcast 5701, ManCenter 8080, Prefect 4200 2. supervisord (auto) ← starts dolphin_data group automatically on boot └── exf_fetcher, acb_processor, obf_universe, meta_health start in parallel 3. (Manual when needed): supervisorctl start dolphin:nautilus_trader ← HZ entry listener supervisorctl start dolphin:scan_bridge ← when DolphinNG6 active 4. 
Prefect deployments (daily, scheduled):
       paper_trade_flow.py        ← 00:05 UTC
       nautilus_prefect_flow.py   ← 00:10 UTC
       mc_forewarner_flow.py      ← daily
```

### 16.11 Monitoring Endpoints

| Service | URL / Command |
|---|---|
| Hazelcast Management Center | `http://localhost:8080` |
| Prefect UI | `http://localhost:4200` |
| Supervisord status | `supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status` |
| MHS health JSON | `cat /mnt/dolphinng5_predict/run_logs/meta_health.json` |
| Daily PnL | `HZ IMap DOLPHIN_PNL_BLUE[YYYY-MM-DD]` |
| ACB State | `HZ IMap DOLPHIN_FEATURES["acb_boost"]` |
| OBF Universe | `HZ IMap DOLPHIN_FEATURES["obf_universe_latest"]` |

---

## 17. PREFECT ORCHESTRATION LAYER

**Version**: Prefect 3.6.22 (siloqy-env)
**Server**: `http://localhost:4200/api`
**Work pool**: `dolphin` (process type)
**Worker command**: `prefect worker start --pool dolphin --type process`

### 17.1 Registered Deployments

| Deployment | Flow | Schedule | Config |
|---|---|---|---|
| `dolphin-paper-blue` | `paper_trade_flow.py` | `5 0 * * *` (00:05 UTC) | `configs/blue.yml` |
| `dolphin-paper-green` | `paper_trade_flow.py` | `5 0 * * *` (00:05 UTC) | `configs/green.yml` |
| `dolphin-nautilus-blue` | `nautilus_prefect_flow.py` | `10 0 * * *` (00:10 UTC) | `configs/blue.yml` |

### 17.2 nautilus_prefect_flow.py — Nautilus BacktestEngine Supervisor

New in v2. Tasks in execution order:

```
hz_probe_task              retries=3 timeout=30s  — verify HZ reachable; abort on failure
validate_champion_params   retries=0 timeout=10s  — SHA256 hash vs FROZEN params; ValueError on drift
load_bar_data_task         retries=2 timeout=120s — load vbt_cache_klines parquet; validate vel_div col
read_posture_task          retries=2 timeout=20s  — read DOLPHIN_SAFETY
restore_capital_task       retries=2 timeout=20s  — restore capital from DOLPHIN_STATE_BLUE → HIBERNATE?
skip engine, write result, heartbeat, return run_nautilus_backtest_task retries=0 timeout=600s — BacktestEngine + DolphinActor full cycle write_hz_result_task retries=3 timeout=30s — DOLPHIN_PNL_BLUE + DOLPHIN_STATE_BLUE write heartbeat_task retries=0 timeout=15s — phase=flow_end ``` **Champion integrity**: `_CHAMPION_HASH = sha256(json.dumps(_CHAMPION_PARAMS, sort_keys=True))[:16]`. Computed at import time. Any config drift triggers `ValueError` before engine starts. **Capital continuity**: Restores from `DOLPHIN_STATE_BLUE["latest_nautilus"]`. Falls back to `initial_capital` (25,000 USDT) if absent. ### 17.3 paper_trade_flow.py — Task Reference | Task | Retries | Purpose | |---|---|---| | `load_config` | 0 | YAML config load | | `load_day_scans` | 2 | Parquet (preferred) or JSON fallback; vel_div validation | | `run_engine_day` | 0 | begin_day/step_bar×N/end_day; returns daily stats | | `write_hz_state` | 3 | DOLPHIN_STATE_BLUE + DOLPHIN_PNL_BLUE persist | | `log_pnl` | 0 | Disk JSONL append (`paper_logs/{color}/`) | ### 17.4 Registration Commands ```bash source /home/dolphin/siloqy_env/bin/activate PREFECT_API_URL=http://localhost:4200/api python prod/paper_trade_flow.py --register # blue + green paper deployments python prod/nautilus_prefect_flow.py --register # nautilus blue deployment ``` ### 17.5 Manual Run ```bash # Paper trade: python prod/paper_trade_flow.py --config prod/configs/blue.yml --date 2026-03-21 # Nautilus supervisor: python prod/nautilus_prefect_flow.py --date 2026-03-21 # Dry-run (data + param validation, no engine): python prod/nautilus_prefect_flow.py --date 2026-03-21 --dry-run ``` --- ## 18. 
CI TEST SUITE ### 18.1 Test Suites Overview | Suite | Location | Runner | Gate | |-------|----------|--------|------| | Nautilus bootstrap | `nautilus_dolphin/tests/test_0_nautilus_bootstrap.py` | `pytest nautilus_dolphin/tests/test_0_nautilus_bootstrap.py -v` | 11/11 | | DolphinActor | `nautilus_dolphin/tests/test_dolphin_actor.py` | `pytest nautilus_dolphin/tests/test_dolphin_actor.py -v` | 35/35 | | OBF unit tests | `tests/test_obf_unit.py` | `pytest tests/test_obf_unit.py -v` | ~120/~120 | | Legacy CI | `ci/` directory | `pytest ci/ -v` | 14/14 | | ACB + HZ status | `prod/tests/test_acb_hz_status_integrity.py` | `pytest prod/tests/test_acb_hz_status_integrity.py -v` | 118/118 | | **MHS v3** | `prod/tests/test_mhs_v3.py` | `pytest prod/tests/test_mhs_v3.py -v` | **111/111** | **Total: 46 Nautilus + ~120 OBF + 14 legacy CI + 118 ACB/HZ + 111 MHS = ~409 tests green.** **Run all prod tests**: ```bash source /home/dolphin/siloqy_env/bin/activate cd /mnt/dolphinng5_predict python -m pytest prod/tests/ -v --tb=short ``` ### 18.2 Nautilus Bootstrap Tests (11 tests) `test_0_nautilus_bootstrap.py` — foundation sanity checks: - Nautilus import, catalog construction, Bar/BarType creation - DolphinActor instantiation without full kernel (uses `__new__` + `__init__` pattern) - Champion config loading from blue.yml - HZ connectivity probe (skip if HZ unavailable) - BacktestEngine construction with DolphinActor registered ### 18.3 DolphinActor Tests (35 tests, 8 classes) `test_dolphin_actor.py` — full behavioral coverage: | Class | Tests | What It Covers | |-------|-------|----------------| | `TestChampionParamInvariants` | 6 | Config loading, SHA256 hash stability, frozen param values, blue.yml parity | | `TestACBPendingFlagThreadSafety` | 5 | Lock acquisition, JSON parse outside lock, dict assign inside lock, concurrent event safety | | `TestHibernatePostureGuard` | 3 | HIBERNATE skips engine entirely, APEX/STALKER/TURTLE pass through, posture gate logic | | 
`TestDateChangeHandling` | 5 | Date rollover triggers end_day/begin_day, once-per-date guard, bar_idx reset | | `TestHZUnavailableDegradation` | 4 | HZ down → engine continues with stale OB features; heartbeat errors silenced; file fallback | | `TestReplayModeBarTracking` | 3 | bar_idx increments per step_bar call; total_bars_processed correct; replay vs live mode flag | | `TestOnStopCleanup` | 4 | on_stop writes final HZ result; HZ down on stop is non-fatal; engine state serialized | | `TestStaleStateGuard` | 5 | _GateSnap detects mid-eval posture/acb changes; snap mismatch triggers abort; re-eval on next bar | **Critical implementation note**: `actor.log` is a Cython/Rust-backed read-only property on `Actor`. Do NOT attempt `actor.log = MagicMock()` — raises `AttributeError: attribute 'log' of ... objects is not writable`. The real Nautilus logger is initialized by `super().__init__()` and works in test context. ### 18.4 Legacy CI Tests (14 tests) **Location**: `ci/` directory. Runner: `pytest ci/ -v` | File | Tests | What It Covers | |------|-------|----------------| | `test_13_nautilus_integration.py` | 6 | Actor import, instantiation, on_bar, HIBERNATE posture, once-per-day guard, ACB thread safety | | `test_14_long_system.py` | 3 | Multi-day run, capital persistence, trade count | | `test_15_acb_reactive.py` | 1 | ACB boost update applied correctly mid-day | | `test_16_scaling.py` | 4 | Memory footprint <4GB (50 assets), shard routing (400 symbols), 400-asset no-crash, 400-asset with IRP | ### 18.5 Key Test Patterns **ACB pending-flag pattern** (ThreadSafety test): ```python # JSON parse OUTSIDE lock, dict assign INSIDE lock with patch.object(actor.engine, 'update_acb_boost') as mock_update: actor._on_acb_event(event) assert actor._pending_acb['boost'] == 1.35 mock_update.assert_not_called() # engine NOT called from listener thread ``` **Date rollover pattern** (DateChange test): ```python # Fires 3 bars on same date → assert begin_day.call_count == 1 # Fires 
1 bar on next date → assert begin_day.call_count == 2, end_day.call_count == 1 ``` **_GateSnap stale-state detection**: ```python # Snap taken at start of step_bar; posture changes mid-eval → abort, retry next bar snap = actor._gate_snap # namedtuple(acb_boost, acb_beta, posture, mc_gate_open) ``` --- ## 19. PARAMETER REFERENCE ### 19.1 Champion Parameters (Frozen) | Parameter | Value | Layer | Notes | |-----------|-------|-------|-------| | `vel_div_threshold` | -0.02 | Signal gate | PRIMARY entry threshold | | `vel_div_extreme` | -0.05 | Signal/sizing | Extreme regime: full size | | `min_leverage` | 0.5 | Sizing | Floor leverage | | `max_leverage` | 5.0 | Sizing | Base ceiling (before ACB) | | `abs_max_leverage` | 6.0 | Sizing | Hard geometric ceiling | | `leverage_convexity` | 3.0 | Sizing | Cubic convex curve | | `fraction` | 0.20 | Sizing | Max capital fraction per trade | | `fixed_tp_pct` | 0.0095 | Exit | 95 bps take-profit | | `stop_pct` | 1.0 | Exit | Effectively disabled | | `max_hold_bars` | 120 | Exit | 600 seconds | | `dc_lookback_bars` | 7 | DC | 35 seconds price momentum | | `dc_min_magnitude_bps` | 0.75 | DC | Minimum BTC momentum | | `dc_skip_contradicts` | True | DC | Hard skip on contradiction | | `min_irp_alignment` | 0.45 | IRP | Alignment gate | | `sp_maker_entry_rate` | 0.62 | Fees | 62% maker fill at entry | | `sp_maker_exit_rate` | 0.50 | Fees | 50% maker fill at exit | | `ob_edge_bps` | 5.0 | OB | Legacy MC OB edge | | `ob_confirm_rate` | 0.40 | OB | Legacy MC confirmation rate | | `lookback` | 100 | Warmup | Bars before first entry allowed | | `seed` | 42 | RNG | Deterministic numpy RandomState | ### 19.2 ACBv6 Parameters (Frozen — Validated) | Parameter | Value | Notes | |-----------|-------|-------| | `BETA_HIGH` | 0.8 | w750 above p60 threshold | | `BETA_LOW` | 0.2 | w750 below p60 threshold | | `W750_THRESHOLD_PCT` | 60 | Percentile switch point | | `FUNDING_VERY_BEARISH` | -0.0001 | 1.0 signal | | `DVOL_EXTREME` | 80 | 1.0 signal | | 
`FNG_EXTREME_FEAR` | 25 | 1.0 signal (needs confirmation) | | `TAKER_SELLING` | 0.8 | 1.0 signal | ### 19.3 Survival Stack Thresholds (Deliberately Tight) | Posture | Rm Threshold | vs. Math Spec | |---------|-------------|---------------| | APEX | ≥ 0.90 | Tighter — spec was 0.85 | | STALKER | ≥ 0.75 | Tighter — spec was 0.70 | | TURTLE | ≥ 0.50 | Tighter — spec was 0.45 | | HIBERNATE | < 0.50 | — | **Do NOT loosen these without quantitative justification.** --- ## 20. OBF SPRINT 1 HARDENING **Completed**: 2026-03-22. All 25 items in `AGENT_TODO_PRIORITY_FIXES_AND_TODOS.md` addressed. ### 20.1 P0/P1/P2 Hardening (Production Safety) | Item | Change | Severity | |------|--------|----------| | Circuit breaker | 5 consecutive HZ push failures → exponential backoff + file-only fallback | P0 | | Crossed-book guard | Ask ≤ bid on incoming feed → discard snapshot, log warning, continue | P0 | | Dark streak detector | N consecutive zero-volume bars → emit STALE_DATA warning | P1 | | First flush delay | No OB features published until 60s after startup (warmup) | P1 | | Stall watchdog | No new bar for `STALL_TIMEOUT` seconds → alert + optional restart | P1 | | Fire-and-forget HZ push | HZ write moved to background thread; hot loop never blocks on HZ | P2 | | Dynamic asset discovery | `hzobprovider` discovers active symbols from HZ at runtime; no hardcoded list | P2 | | Per-timestamp macro map | `latest_macro_at_ts` keyed by bar timestamp; resolves stale-read race on fast replays | P2 | ### 20.2 P3 Infrastructure Items | Item | Status | |------|--------| | `scripts/verify_parquet_archive.py` — validates all daily parquet files for schema and row count | DONE | | `ob_cache/SCHEMA.md` — authoritative JSON schema for `latest_ob_features.json` | DONE | | P3-1 / P3-5 / P3-6 — out of scope for sprint 1, deferred | SKIPPED | ### 20.3 OBF Architecture Post-Sprint ``` Binance WS feed ↓ obf_prefect_flow.py (hot loop, ~100ms cadence) ├── Crossed-book guard → discard if ask ≤ bid ├── 
Dark streak detector → N zero-vol bars ├── First flush delay → 60s warmup ├── Feature compute (depth imbalance, spread, vwap, pressure ratio) ├── Per-timestamp macro map update ├── Fire-and-forget HZ push (background thread) │ └── Circuit breaker (5 failures → file-only) └── ob_cache/latest_ob_features.json (local fallback) ``` ### 20.4 OBF Live Data Gap — KNOWN LIMITATION (2026-03-26) > **CRITICAL DATA QUALITY CAVEAT**: `nautilus_event_trader.py` (live event trader) is currently wired to `MockOBProvider` with static per-asset imbalance biases (BTC=-0.086, ETH=-0.092, BNB=+0.05, SOL=+0.05). All four OBF functional dimensions compute and produce real outputs — but with frozen, market-unresponsive inputs. The OB cascade regime will always be CALM (no depth drain in mock data). > > `HZOBProvider` (`/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/hz_ob_provider.py`) exists and is format-compatible with `obf_prefect_flow.py`'s HZ output, but `OBFeatureEngine` has no live streaming path — only `preload_date()` (batch/backtest). A `step_live()` method must be added before the switch. > > **Acceptable for**: paper trading > **NOT acceptable for**: live capital deployment > > **Full spec**: `/mnt/dolphinng5_predict/prod/docs/AGENT_SPEC_OBF_LIVE_SWITCHOVER.md` ### 20.5 Test Coverage `tests/test_obf_unit.py` — ~120 unit tests covering all hardening items: - Circuit breaker state machine (CLOSED → OPEN → HALF-OPEN) - Crossed-book guard triggers on malformed data - Dark streak threshold detection - Warmup period gating - Background thread non-blocking behavior - Asset discovery via HZ key scan --- ## 21. KNOWN RESEARCH TODOs | ID | Description | Priority | |----|-------------|----------| | TODO-1 | Calibrate `vd_enabled` adverse-turn exits (currently disabled). Requires analysis of trade vel_div distribution at entry vs. subsequent bars. True invalidation threshold likely ~+0.02 sustained for N=3 bars. 
| MEDIUM | | TODO-2 | Validate SUBDAY_ACB force-exit threshold (`old_boost >= 1.25 and boost < 1.10`). Currently ARBITRARY — agent-chosen, not backtest-derived. | MEDIUM | | TODO-3 | MIG8: Binance live adapter (real order execution). OUT OF SCOPE until after 30-day paper trading validation. | LOW | | TODO-4 | 48-hour chaos test with all daemons running simultaneously. Watch for: KeyError, stale-read anomalies, concurrent HZ writer collisions. | HIGH (before live capital) | | TODO-5 | Memory profiler with IRP enabled at 400 assets (current 71 MB measurement was without IRP). Projected ~600 MB — verify. | LOW | | TODO-6 | TF-spread recovery exits (`tf_enabled=False`). Requires sweep of tf_exhaust_ratio and tf_flip_ratio vs. champion backtest. | LOW | | TODO-7 | GREEN (LONG) posture paper validation. LONG thresholds (long_threshold=0.01, long_extreme=0.04) not yet production-validated. | MEDIUM | | TODO-8 | ~~ML-MC Forewarner injection into `nautilus_prefect_flow.py`.~~ **DONE 2026-03-22** — wired in `DolphinActor.on_start()` for both flows. | CLOSED | | TODO-9 | Live TradingNode integration (launcher.py exists; Binance adapter config incomplete). Requires 30-day clean paper run first. | LOW | --- ## 22. 0.1S RESOLUTION — READINESS ASSESSMENT **Assessment date**: 2026-03-22. **Status: BLOCKED — 3 hard blockers.** The current system processes 5s OHLCV bars. Upgrading to 0.1s tick resolution requires resolving all three blockers below before any code changes. ### 22.1 Blocker 1 — Async HZ Push **Problem**: The OBF hot loop fires at ~100ms cadence. At 0.1s resolution, the per-bar HZ write latency (currently synchronous in feature compute path, despite fire-and-forget for the push itself) would exceed bar cadence, causing HZ write queue growth and eventual OOM. **Required**: Full async HZ client (`hazelcast-python-client` async API or aiohazelcast). Currently all HZ operations are synchronous blocking calls. Estimated effort: 2–3 days of refactor + regression testing. 
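Blocker 1's required fix is a fully async HZ client, but the failure mode and the interim mitigation can be illustrated with a bounded fire-and-forget queue: the hot loop enqueues and returns immediately, a worker thread drains, and a drop-oldest policy bounds memory so the write queue cannot grow toward OOM when write latency exceeds bar cadence. This is a minimal sketch under stated assumptions; `AsyncHzPusher` and `push_fn` are illustrative names, not part of the codebase.

```python
import queue
import threading

class AsyncHzPusher:
    """Bounded fire-and-forget writer: hot loop enqueues, worker thread pushes.

    push_fn stands in for the synchronous HZ map.put call. Drop-oldest
    back-pressure keeps memory bounded when pushes fall behind bar cadence."""

    def __init__(self, push_fn, maxsize=256):
        self._q = queue.Queue(maxsize=maxsize)
        self._push = push_fn
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, key, value):
        """Called from the hot loop; never blocks."""
        try:
            self._q.put_nowait((key, value))
        except queue.Full:
            try:
                self._q.get_nowait()   # drop the oldest, stalest snapshot
                self._q.task_done()    # account for the dropped item
                self.dropped += 1
            except queue.Empty:
                pass
            try:
                self._q.put_nowait((key, value))
            except queue.Full:
                self.dropped += 1      # still full under contention: drop the new value

    def _drain(self):
        while True:
            key, value = self._q.get()
            self._push(key, value)     # the only place a (slow) HZ write happens
            self._q.task_done()
```

The point of the sketch is the back-pressure policy, not the client: even with a bounded queue, sustained overload silently discards stale snapshots, which is why the blocker calls for a genuinely async client rather than more buffering.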
### 22.2 Blocker 2 — `get_depth` Timeout **Problem**: `get_depth()` in `HZOBProvider` issues a synchronous HZ `IMap.get()` call with a 500ms timeout. At 0.1s resolution, each bar would wait up to 500ms for OB depth data — 5× the bar cadence. This makes 0.1s resolution impossible without an in-process depth cache. **Required**: Pre-fetched depth cache (e.g., local dict refreshed by a background subscriber), making `get_depth()` a pure in-process read with <1µs latency. Estimated effort: 1–2 days. ### 22.3 Blocker 3 — Lookback Recalibration **Problem**: All champion parameters that reference "bars" were validated against 5s bars: - `lookback=100` (100 × 5s = 500s warmup) - `max_hold_bars=120` (120 × 5s = 600s max hold) - `dc_lookback_bars=7` (7 × 5s = 35s DC window) At 0.1s resolution, the same bar counts would mean 10s warmup, 12s max hold, 0.7s DC window — **completely invalidating champion params**. All params must be re-validated from scratch via VBT backtest at 0.1s resolution. **Required**: Full backtest sweep at 0.1s. Estimated effort: 1–2 weeks of compute + validation time. This is a research milestone, not an engineering task. ### 22.4 Assessment Summary | Blocker | Effort | Dependency | |---------|--------|------------| | Async HZ push | 2–3 days engineering | None — can start now | | `get_depth` cache | 1–2 days engineering | None — can start now | | Lookback recalibration | 1–2 weeks research | Requires blockers 1+2 resolved first | **Recommendation**: Do NOT attempt 0.1s resolution until after 30-day paper trading validation at 5s. The engineering blockers can be prototyped in parallel, but champion params cannot be certified until post-paper-run stability is confirmed. ## 23. SIGNAL PATH VERIFICATION SPECIFICATION Testing the asynchronous, multi-scale signal path requires systematic validation of the data bridge and cross-layer trigger logic. ### 23.1 Verification Flow A local agent (Prefect or standalone) should verify: 1. 
**Micro Ingestion**: 100ms OB features sharded across 10 HZ maps. 2. **Regime Bridge**: NG5 Arrow scan detection by `scan_hz_bridge.py` and push to `latest_eigen_scan`. 3. **Strategy Reactivity**: `DolphinActor.on_bar` (5s) pulling HZ data and verifying `scan_number` idempotency. 4. **Macro Safety**: Survival Stack Rm-computation pushing `APEX/STALKER/HIBERNATE` posture to `DOLPHIN_SAFETY`.

### 23.2 Reference Document

Full test instructions, triggers, and expected values are defined in: `TODO_CHECK_SIGNAL_PATHS.md` (Project Root)

---

*End of DOLPHIN-NAUTILUS System Bible v3.0 core doctrine — 2026-03-23. Sections 24+ below are later additions (v4.1–v6.0).*
*Champion: SHORT only (APEX posture, blue configuration)*
*Automation: Prefect-supervised paper trading active.*
*Status: Capital Sync enabled; Friction SP-bypass active; TradeLogger running.*
*Do NOT deploy real capital until 30-day paper run is clean.*

## 24. MULTI-SPEED EVENT-DRIVEN ARCHITECTURE

**Version**: v4.1 Addition — 2026-03-25
**Status**: DEPLOYED (Production)
**Author**: Kimi Code CLI Agent
**Related**: `AGENT_READ_ARCHITECTURAL_CHANGES_SPEC.md` (detailed specification)

### 24.1 Overview

The DOLPHIN system has been re-architected from a **single-speed batch-oriented Prefect deployment** to a **multi-speed, event-driven, multi-worker architecture** with proper resource isolation and self-healing capabilities.

**Problem Solved**: 2026-03-24 system outage caused by uncontrolled Prefect process explosion (60+ `prefect.engine` zombies → resource exhaustion → kernel deadlock).

**Solution**: Frequency isolation + concurrency limits + systemd resource constraints + event-driven architecture.
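The systemd resource constraints referenced above can be captured in a unit drop-in. An illustrative fragment only, using the limits quoted in §24.3/§24.6.1; the path and unit name are placeholders, and as of v5.0 the dolphin services themselves run under supervisord (§26), so these limits belong to the v4.1 systemd era:

```ini
# Illustrative drop-in, e.g. /etc/systemd/system/<unit>.service.d/limits.conf
# (path and unit name are placeholders; values from §24.3 / §24.6.1)
[Service]
MemoryMax=2G
CPUQuota=200%
TasksMax=50
```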
### 24.2 Architecture Layers

| Layer | Frequency | Component | Pattern | Status |
|-------|-----------|-----------|---------|--------|
| L1 | <1ms | Nautilus Event Trader | Hz Entry Listener | ✅ Active (PID 159402) |
| L2 | 1-10s | Scan Bridge | File watcher → Hz | ✅ Active (PID 158929) |
| L3 | Varied | ExtF Indicators | Scheduled per-indicator | ⚠️ Not running (NG6 down) |
| L4 | ~5s | Meta Health Service | 5-sensor monitoring | ✅ Active (PID 160052) |
| L5 | Daily | Paper/Nautilus Flows | Prefect scheduled | ✅ Scheduled |

### 24.3 Nautilus Event-Driven Trader

**Purpose**: Millisecond-latency trading via Hazelcast event listener (not polling).

**Implementation**:

```python
# Hz Entry Listener Pattern
def on_scan_update(event):
    scan = json.loads(event.value)
    signal = compute_signal(scan, ob_data, extf_data)
    if signal.valid:
        execute_trade(signal)  # <10ms total latency

features_map.add_entry_listener(
    key='latest_eigen_scan',
    include_value=True,      # required so event.value carries the scan payload
    updated=on_scan_update,  # fired once per scan write
)
```

**Service**: `dolphin-nautilus-trader.service`
**Resource Limits**: MemoryMax=2G, CPUQuota=200%, TasksMax=50
**Hz Input**: `DOLPHIN_FEATURES["latest_eigen_scan"]`
**Hz Output**: `DOLPHIN_PNL_BLUE[YYYY-MM-DD]`, `DOLPHIN_STATE_BLUE`

### 24.4 Scan Bridge Service

**Purpose**: Detect Arrow scan files from DolphinNG6, push to Hz.
**Deployment**: `scan-bridge-flow/scan-bridge` (Prefect)
**Concurrency**: Strictly limited to 1
**Safety Mechanisms**:
- Work pool concurrency limit: 1
- Deployment concurrency limit: 1
- File mtime-based detection (handles NG6 restarts)

**Current Status**: Running directly (PID 158929) due to Prefect worker scheduling issues.

### 24.5 Meta Health Service v3 (MHS) — REWRITTEN v5.0

> **MHS v2 is retired.** `meta_health_daemon_v2.py` was calling `systemctl restart` on supervisord-managed processes — this was the "random killer" bug. v3 is the canonical implementation.
**File**: `meta_health_service_v3.py` **Supervisord**: `dolphin_data:meta_health` (`autostart=true`) #### 24.5.1 Five-Sensor Model (Weighted Sum — NOT product) | Sensor | Weight | Metric | Thresholds | |--------|--------|--------|------------| | M4 | 0.35 | Control Plane (HZ port 5701 + Prefect 4200) | HZ=0.8w, Prefect=0.2w | | M1 | 0.35 | Process Integrity (supervisord status) | data services scored separately from trader | | M3 | 0.20 | Data Freshness (HZ key timestamps) | >30s=stale(0.5), >120s=dead(0.0) | | M5 | 0.10 | Data Coherence (boost range, OBF coverage) | OBF<200 assets=0.5 | | M2 | — | Heartbeat (informational only) | Not in rm_meta | | M1_trader | — | Trader process (informational only) | Not in rm_meta (may be intentionally stopped) | #### 24.5.2 Rm_meta Formula ```python # FIX-1: Weighted sum — no single sensor can zero rm_meta (v2 bug fixed) rm_meta = (0.35*m4 + 0.35*m1_data + 0.20*m3 + 0.10*m5) / 1.0 # Thresholds rm > 0.85: GREEN rm > 0.60: DEGRADED rm > 0.30: CRITICAL rm ≤ 0.30: DEAD → Recovery triggered (only for STOPPED critical_data services) ``` #### 24.5.3 Recovery Policy ```python # FIX-2: supervisorctl restart, NOT systemctl (v2 bug fixed) # FIX-3: 10s cooldown for critical services (was 600s) # FIX-4: Non-blocking daemon thread (hung subprocess won't block check loop) # FIX-5: Per-service cooldown (independent buckets per program) # FIX-6: Only STOPPED critical_data services are restarted. Trader never auto-restarted. 
RECOVERY_COOLDOWN_CRITICAL_S = 10.0   # exf, acb, obf_universe
RECOVERY_COOLDOWN_DEFAULT_S = 300.0   # nautilus_trader, scan_bridge (informational only)
CHECK_INTERVAL_S = 10.0
```

#### 24.5.4 Monitored Services

| supervisord program | critical_data | Auto-restarted by MHS |
|---|---|---|
| `dolphin_data:exf_fetcher` | ✅ | ✅ (10s cooldown) |
| `dolphin_data:acb_processor` | ✅ | ✅ (10s cooldown) |
| `dolphin_data:obf_universe` | ✅ | ✅ (10s cooldown) |
| `dolphin:nautilus_trader` | ❌ | ❌ (informational) |
| `dolphin:scan_bridge` | ❌ | ❌ (informational) |

#### 24.5.5 Monitored HZ Sources

| Key | Map | Timestamp Field | Notes |
|---|---|---|---|
| `exf_latest` | `DOLPHIN_FEATURES` | `_pushed_at` | Unix float |
| `acb_boost` | `DOLPHIN_FEATURES` | (none — presence only) | — |
| `latest_eigen_scan` | `DOLPHIN_FEATURES` | `timestamp` | ISO string |
| `obf_universe_latest` | `DOLPHIN_FEATURES` | `_snapshot_utc` | Unix float |

**Output**: `DOLPHIN_META_HEALTH["latest"]` — JSON health report, also written to `run_logs/meta_health.json`

### 24.6 Safety Mechanisms

#### 24.6.1 Concurrency Controls (Root Cause Fix)

| Level | Mechanism | Value | Prevents |
|-------|-----------|-------|----------|
| Work Pool | `concurrency_limit` | 1 | Multiple simultaneous runs |
| Deployment | `prefect concurrency-limit` | 1 (tag-based) | Tag-based overflow |
| Systemd | `TasksMax` | 50 | Process fork bombs |
| Systemd | `MemoryMax` | 2G | OOM conditions |
| Systemd | `CPUQuota` | 200% | CPU starvation |

#### 24.6.2 Recovery Procedures

| Scenario | Trigger | Action |
|----------|---------|--------|
| Critical data service STOPPED | rm CRITICAL/DEAD + service STOPPED | `supervisorctl restart <service>` (async, 10s cooldown) |
| Data staleness | M3 < 0.5 | Alert only (external data dependency) |
| Control plane down | M4 < 0.5 | Alert (MHS can't self-heal HZ) |
| Trader stopped | m1_trader < 1.0 | Informational only — NEVER auto-restarted |

### 24.7 Data Flow: Scan-to-Trade

```
DolphinNG6 → Arrow File → Scan
Bridge → Hz → Entry Listener → Nautilus → Trade (Win) (SMB) (5s poll) (μs) (<1ms) (<1ms) (<10ms) Target: <10ms from NG6 scan to trade execution Current: Waiting for NG6 restart to validate ``` ### 24.8 Service Status (v5.0 — As Running 2026-03-30) | supervisord program | Status | Notes | |---|---|---| | `dolphin_data:exf_fetcher` | ✅ RUNNING | Pushes exf_latest every ~60s | | `dolphin_data:acb_processor` | ✅ RUNNING | Pushes acb_boost on NG3 data | | `dolphin_data:obf_universe` | ✅ RUNNING | 512/540 assets healthy at launch | | `dolphin_data:meta_health` | ✅ RUNNING | RM_META≈0.975 [GREEN] | | `dolphin:nautilus_trader` | ⚙️ STOPPED (manual) | Start when trading | | `dolphin:scan_bridge` | ⚙️ STOPPED (manual) | Start when DolphinNG6 active | | hazelcast | ✅ (docker) | Port 5701 | | prefect-server | ✅ (docker) | Port 4200 | **RETIRED (stopped + disabled)**: - `dolphin-nautilus-trader.service` (systemd) — was causing dual-management - `dolphin-scan-bridge.service` (systemd) — was causing dual-management - `meta_health_daemon.service` (systemd) — was calling `systemctl restart` on supervisord processes (root cause of random killer bug) ### 24.9 Known Issues (v5.0) | Issue | Status | Notes | |-------|--------|-------| | NG6 down (no scan data) | External dependency | `latest_eigen_scan` key absent; MHS reports this cleanly | | OBF shard store (400 assets) vs universe (540) | Architecture gap | Shard store is used by trading engine; universe is health-only | ### 24.10 Operational Commands ```bash CONF=/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf # Status supervisorctl -c $CONF status # Restart a service supervisorctl -c $CONF restart dolphin_data:exf_fetcher # Start the trader supervisorctl -c $CONF start dolphin:nautilus_trader # View MHS health cat /mnt/dolphinng5_predict/run_logs/meta_health.json # View supervisord logs tail -f /mnt/dolphinng5_predict/prod/supervisor/logs/meta_health.log ``` ### 24.11 File Locations | Component | Path | 
|-----------|------| | Nautilus Trader | `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py` | | MHS v3 | `/mnt/dolphinng5_predict/prod/meta_health_service_v3.py` | | MHS v2 (retired) | `/mnt/dolphinng5_predict/prod/meta_health_daemon_v2.py` | | OBF Universe | `/mnt/dolphinng5_predict/prod/obf_universe_service.py` | | Scan Bridge | `/mnt/dolphinng5_predict/prod/scan_bridge_service.py` | | Supervisord Conf | `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf` | | MHS Logs | `/mnt/dolphinng5_predict/prod/supervisor/logs/meta_health.log` | | MHS Health JSON | `/mnt/dolphinng5_predict/run_logs/meta_health.json` | | OBF Universe Data | `/mnt/ng6_data/ob_universe/` (Hive partitioned Parquet) | --- ## §25 Numerical Precision Policy ### 25.1 Principle **Must use 512-bit native precision when possible, if performance allows.** This system has `python-flint` v0.8.0 installed on all compute nodes. All critical financial math (vol computation, returns, rolling std) MUST use `flint.arb` at 512-bit unless performance constraints make it prohibitive. ### 25.2 Configuration ```python from flint import arb, ctx ctx.prec = 512 # 512-bit mantissa; set once per process ``` ### 25.3 Performance Benchmark (2026-03-28) | Mode | dvol/day | Total 56d | Overhead | |--------------|-----------|-----------|---------- | | numpy float64 | ~94 ms | ~5.3 s | baseline | | arb 512-bit | ~295 ms | ~16.5 s | +11 s | +11 seconds on a ~718 second total run (1.5% overhead) — acceptable. ### 25.4 Validated Equivalence Benchmark result against full 56-day window (6154 rows/day sampled): - NaNs in arb result: **0** - vol_ok=True bars match float64: **MATCH=True** - Divergent bars: **0** float64 and 512-bit produce identical `vol_ok` decisions for this signal at current BTC price magnitudes. The 512-bit path is used as the primary path to prevent precision erosion from future edge cases (extreme micro-volatility, very large or very small price moves). 
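The §25.4 equivalence check can be made concrete with a self-contained comparison of the two precision paths. A minimal sketch assuming only the stdlib plus an optional `python-flint`; `rolling_std_512` and `rolling_std_f64` are illustrative helpers, not the production `_compute_dvol_arb512`:

```python
import statistics

def rolling_std_f64(prices, window=50):
    """float64 reference: rolling population std over a trailing window."""
    return [statistics.pstdev(prices[i - window:i])
            for i in range(window, len(prices) + 1)]

def rolling_std_512(prices, window=50):
    """512-bit primary path. Returns None if python-flint is unavailable,
    signalling the caller to fall back to float64 (mirrors the §25.5 pattern)."""
    try:
        from flint import arb, ctx
    except ImportError:
        return None
    ctx.prec = 512  # 512-bit mantissa, set once per process (§25.2)
    out = []
    for i in range(window, len(prices) + 1):
        w = [arb(p) for p in prices[i - window:i]]
        mean = sum(w, arb(0)) / len(w)
        var = sum(((x - mean) ** 2 for x in w), arb(0)) / len(w)
        out.append(float(var.sqrt()))
    return out
```

Running both paths over the same window and asserting element-wise agreement is exactly the shape of the 56-day equivalence benchmark above, just at toy scale.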
### 25.5 Implementation Pattern

```python
def _compute_dvol_arb512(prices, n_rows, threshold):
    """Primary: 512-bit arb. Returns None if flint unavailable (fall back to float64)."""
    try:
        from flint import arb, ctx
        ctx.prec = 512
    except ImportError:
        return None
    # ... arb rolling std ...

# Call site:
vol_ok_mask = _compute_dvol_arb512(btc, n_rows, VOL_P60_THRESHOLD)
if vol_ok_mask is None:
    # float64 fallback — guards only; should not be reached on production nodes
    ...
```

### 25.6 Scope

| Computation | Precision | File |
|-------------|-----------|------|
| Rolling 50-bar dvol (vol_ok) | arb 512-bit | `nautilus_native_continuous.py` |
| All other paths | numpy float64 | — |

Future additions (returns, leverage math, position sizing) should follow the same pattern: 512-bit primary, float64 last-resort guard.

---

## 26. SUPERVISORD ARCHITECTURE & OBF UNIVERSE (v5.0)

### 26.1 The "Random Killer" Bug — Root Cause & Fix

**Incident**: Services were being unexpectedly killed and restarted at seemingly random intervals. The system appeared healthy according to supervisord but processes would die without obvious cause.

**Root cause** (diagnosed 2026-03-30):
1. `meta_health_daemon_v2.py` had been running under `meta_health_daemon.service` (systemd) for 4+ days.
2. MHS v2's process patterns (`exf_prefect_final`, `esof_prefect_flow`) did not match any running process → M1=0 → `rm_meta = M1*M2*M3*M4*M5 = 0` always → status="DEAD".
3. MHS v2 recovery action: `systemctl restart <service>` — called every 5s.
4. But the services were supervisord-managed, not systemd-managed. `systemctl restart` on a supervisord process:
   - Sends SIGTERM to the process (it dies)
   - Supervisord detects the death and autostarts a new instance
   - Creates brief duplicate processes, interleaved with MHS v2's next kill cycle
5. Additionally, `dolphin-nautilus-trader.service` (systemd) AND supervisord were both managing `nautilus_event_trader.py` simultaneously — two PIDs running at once.
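The dual-management hazard is mechanically checkable. A standalone sketch of such a guard, assuming `systemctl is-active` semantics (prints `active` or `inactive`); the unit names are the retired ones listed in this section, and the injectable `run` parameter is an illustrative testing hook, not production code:

```python
import subprocess

# Units retired in v5.0; none of these should ever be active again.
RETIRED_UNITS = [
    "meta_health_daemon.service",
    "dolphin-nautilus-trader.service",
    "dolphin-scan-bridge.service",
]

def conflicting_units(run=subprocess.run):
    """Return retired systemd units that are still active (should be empty)."""
    bad = []
    for unit in RETIRED_UNITS:
        try:
            r = run(["systemctl", "is-active", unit],
                    capture_output=True, text=True)
        except FileNotFoundError:
            return []  # no systemctl on this host: nothing can dual-manage
        if r.stdout.strip() == "active":
            bad.append(unit)
    return bad
```

The production version of this guard is the `test_no_systemd_units_active_for_managed_services` assertion in `test_mhs_v3.py`, not this sketch.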
**Fix applied**: ```bash systemctl stop meta_health_daemon.service && systemctl disable meta_health_daemon.service systemctl stop dolphin-nautilus-trader.service && systemctl disable dolphin-nautilus-trader.service systemctl stop dolphin-scan-bridge.service && systemctl disable dolphin-scan-bridge.service ``` **Permanent guard**: `test_mhs_v3.py::TestKillAndRevive::test_no_systemd_units_active_for_managed_services` asserts no conflicting systemd units are active. ### 26.2 OBF Universe Service **Purpose**: Lightweight L2 order book health monitor for ALL 540 active USDT perpetuals on Binance Futures. **Why**: Asset Picker needs OB health scores for the full universe (540 assets) to make informed selection decisions, not just the 400 assets covered by the existing OBF shard store. **Design**: Push streams (zero REST weight), no polling. ``` wss://fstream.binance.com/ws Connection 1: 200 symbols × @depth5@500ms Connection 2: 200 symbols × @depth5@500ms Connection 3: 140 symbols × @depth5@500ms (total: 540, Binance limit: 300/conn) ``` **Computed metrics per asset** (every 60s snapshot): | Field | Description | |---|---| | `spread_bps` | (ask - bid) / mid × 10000 | | `depth_1pct_usd` | Total USD volume within 1% of mid on both sides | | `depth_quality` | Normalized depth score [0,1] | | `fill_probability` | Estimated probability of fill at mid | | `imbalance` | (bid_vol - ask_vol) / (bid_vol + ask_vol) | | `best_bid`, `best_ask` | L1 prices | | `n_bid_levels`, `n_ask_levels` | Depth5 levels received | **HZ output** (`DOLPHIN_FEATURES["obf_universe_latest"]`): ```json { "_snapshot_utc": 1743350400.0, "_n_assets": 512, "assets": { "BTCUSDT": {"spread_bps": 0.42, "depth_quality": 0.91, ...}, "ETHUSDT": {...}, ... 
}
}
```

**Parquet storage**: `/mnt/ng6_data/ob_universe/` (Hive: `date=YYYY-MM-DD/part-NNN.parquet`)
- `MAX_FILE_AGE_DAYS = 0` — never pruned, accumulates for backtesting
- Flush cadence: every 300s

**Key constants**:
```python
SNAPSHOT_INTERVAL_S = 60     # HZ push cadence
MAX_STREAMS_PER_CONN = 200   # Binance limit respected
FLUSH_INTERVAL_S = 300       # Parquet write cadence
```

### 26.3 MHS v3 — Full Architecture Reference

**File**: `prod/meta_health_service_v3.py`
**Tests**: `prod/tests/test_mhs_v3.py` (111 tests, including Hypothesis property tests)

#### 26.3.1 Constants

```python
CHECK_INTERVAL_S = 10.0               # main loop cadence
DATA_STALE_S = 30.0                   # age threshold for stale (score=0.5)
DATA_DEAD_S = 120.0                   # age threshold for dead (score=0.0)
RECOVERY_COOLDOWN_CRITICAL_S = 10.0   # critical data infra restart cooldown
RECOVERY_COOLDOWN_DEFAULT_S = 300.0   # informational services (never restarted)
```

#### 26.3.2 Weighted Sensor Formula

```python
SENSOR_WEIGHTS = {
    "m4_control_plane": 0.35,   # HZ port 5701 (×0.8) + Prefect 4200 (×0.2)
    "m1_data_infra": 0.35,      # fraction of critical_data services RUNNING
    "m3_data_freshness": 0.20,  # average freshness score across HZ keys
    "m5_coherence": 0.10,       # ACB boost range validity + OBF coverage
}
# m1_trader and m2_heartbeat: emitted but NOT in rm_meta (may be intentionally stopped)
# rm_meta = sum(weight * sensor) / sum(weights)
```

#### 26.3.3 Recovery Logic

```python
def _restart_via_supervisorctl(self, program: str):
    """
    - Checks per-service cooldown (10s critical, 300s default)
    - Commits timestamp BEFORE spawning thread (prevents double-fire)
    - Runs in daemon thread — never blocks the check loop
    - Uses: supervisorctl -c <conf> restart <program>
    - NEVER calls systemctl
    """
```

#### 26.3.4 Test Suite Summary

| Class | Tests | Coverage |
|---|---|---|
| `TestSupervisordStatusParsing` | 7 | parses all supervisorctl output variants |
| `TestM1ProcessIntegrity` | 7 | scoring with mocked sv_status, psutil fallback |
| `TestM3DataFreshnessScoring` | 7 | stale/dead thresholds, ISO timestamps |
| `TestRmMetaFormula` | 10 | weighted sum, product-formula regression guard |
| `TestRecoveryGating` | 5 | cooldown, thread isolation |
| `TestRecoveryNeverKillsRunning` | 6 | running services never restarted |
| `TestM4ControlPlane` | 4 | port checks with mocked socket |
| `TestM5Coherence` | 7 | boost range, OBF coverage thresholds |
| `TestLiveIntegration` | 10 | live HZ + supervisord (skip if unavailable) |
| `TestKillAndRevive` | 9 | E2E: stop service → MHS detects → restarts within 30s |
| `TestServiceRegistry` | 7 | invariants: cooldown ≤ 10s, check interval ≤ 15s |
| `TestRaceConditions` | 5 | 10 concurrent restarts same service → only 1 fires |
| `TestEdgeCases` | 14 | garbage JSON, future timestamps, NaN sensors |
| `TestHypothesisProperties` | 13 | 300–500 examples each: rm∈[0,1], monotone sensors, status valid |

**Run**:
```bash
source /home/dolphin/siloqy_env/bin/activate
cd /mnt/dolphinng5_predict
python -m pytest prod/tests/test_mhs_v3.py -v --tb=short  # ~5 minutes (E2E tests)
```

### 26.4 OBF Persistence Fix

**File**: `prod/obf_persistence.py`

**Bug (v4.1)**: `MAX_FILE_AGE_DAYS = 7` — every daily cleanup run deleted all OBF Parquet data older than 7 days, destroying the entire backtesting dataset.

**Fix (v5.0)**:
```python
MAX_FILE_AGE_DAYS = 0  # 0 = disabled — never prune, accumulate for backtesting

def _cleanup_old_partitions(self):
    """0 = disabled."""
    if not MAX_FILE_AGE_DAYS or not self.base_dir.exists():
        return
    ...
```

Data now accumulates indefinitely in `/mnt/ng6_data/ob_features/` (existing OBF) and `/mnt/ng6_data/ob_universe/` (new universe service).

---

## 27. NG8 LINUX EIGENSCAN SERVICE

**File**: `- Dolphin NG8/ng8_scanner.py`
**Status**: Built, smoke-tested. Replaces Windows NG7 eigenscan.
**Run**: `source /home/dolphin/siloqy_env/bin/activate && cd "/mnt/dolphinng5_predict/- Dolphin NG8" && python3 ng8_scanner.py` ### 27.1 Root Cause: NG7 Double-Output Bug Windows NG7 maintained two independent tracker cycles: - **Fast cycle** (w50, w150): completed ~11s after scan start → wrote Arrow file 1, HZ write 1 - **Slow cycle** (w300, w750): completed ~3 min later with **stale BTC price** → wrote Arrow file 2, HZ write 2 Both cycles shared the same `scan_number` counter. Result: two Arrow files per logical scan, the second containing stale prices from 3 minutes earlier. The scan bridge de-duplicated by file mtime (file 1 is always the useful one). ### 27.2 NG8 Fix: Single `enhance()` Pass `DolphinCorrelationEnhancerArb512.enhance()` processes all four windows (50, 150, 300, 750) in a single sequential loop. NG8 calls this once per scan cycle: ```python result = self.engine.enhance(price_data, PRIORITY_SYMBOLS, now) # result.multi_window_results has all four windows populated # Exactly one Arrow write + one HZ write follows ``` `use_arrow=False` is passed to the engine constructor so the engine does **not** perform its own internal Arrow write — `ng8_scanner.py` owns that write exclusively. ### 27.3 Schema Contract (Doctrinal NG5) Arrow IPC schema is defined in `ng7_arrow_writer_original.py` → `SCAN_SCHEMA` (27 fields, `SCHEMA_VERSION="5.0.0"`). `arrow_writer.py` is a thin re-export shim: ```python # arrow_writer.py from ng7_arrow_writer_original import ( ArrowEigenvalueWriter, ArrowScanReader, write_scan_arrow, read_scan_arrow, ) ``` **NEVER** modify `arrow_writer.py` schema — edit `ng7_arrow_writer_original.py`. 
Key schema fields: | Field | Type | Description | |---|---|---| | `scan_number` | int64 | monotonic counter, resumes from last Arrow file on restart | | `timestamp_ns` | int64 | Unix nanoseconds at scan start | | `w50_lambda_max` … `w750_instability` | float64 × 16 | per-window eigenstats | | `vel_div` | float64 | velocity divergence (cross-window signal) | | `regime_signal` | float64 | -1 / 0 / +1 | | `instability_composite` | float64 | composite of w50…w750 instability | | `assets` / `prices` / `loadings` | utf8 | JSON-serialised | | `schema_version` | utf8 | "5.0.0" | ### 27.4 Storage ``` Arrow files : /mnt/dolphinng6_data/arrow_scans/YYYY-MM-DD/scan_NNNNNN_HHMMSS.arrow ArrowEigenvalueWriter storage_root = /mnt/dolphinng6_data # writer appends arrow_scans/ internally ``` **Critical**: pass `get_arrow_scans_path().parent` (= `/mnt/dolphinng6_data`) — NOT `get_arrow_scans_path()` — or the writer creates `arrow_scans/arrow_scans/` double-nesting. ### 27.5 Hazelcast Output Map: `DOLPHIN_FEATURES` → key `latest_eigen_scan` **NG8 flat payload** (written by NG8, differs from NG7 nested payload): ```python { "scan_number": int, "timestamp": "ISO-8601", "bridge_ts": float, # Unix epoch at HZ write "vel_div": float, "w50_velocity": float, "w150_velocity": float, "w300_velocity": float, "w750_velocity": float, "eigenvalue_gradients": {...}, "multi_window_results": {...}, # full per-window stats } ``` TUI v3 `_eigen_from_scan()` normalises both NG7 nested and NG8 flat formats transparently. ### 27.6 Scan Number Continuity On startup, `_load_last_scan_number(arrow_scans_dir)` scans all `scan_NNNNNN_*.arrow` filenames for the highest N and resumes from N+1. Prevents counter reset gaps after service restart. ### 27.7 Symbol List 50 symbols matching doctrinal NG3/NG5/NG7 `PRIORITY_SYMBOLS`. Do NOT change this list without a full schema migration — historical correlation matrices are computed on this exact universe. 
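The §27.6 continuity logic can be sketched as follows. This is an illustrative reimplementation of what `_load_last_scan_number` does, assuming the date-partitioned `scan_NNNNNN_HHMMSS.arrow` layout from §27.4; it is not the production function:

```python
import re
from pathlib import Path

# Matches scan_NNNNNN_HHMMSS.arrow (six-digit scan number, six-digit time).
SCAN_RE = re.compile(r"^scan_(\d{6})_\d{6}\.arrow$")

def load_last_scan_number(arrow_scans_dir):
    """Highest scan number across all date partitions, or 0 if none exist."""
    best = 0
    for p in Path(arrow_scans_dir).rglob("scan_*.arrow"):
        m = SCAN_RE.match(p.name)
        if m:
            best = max(best, int(m.group(1)))
    return best

# NG8 would then resume its counter at load_last_scan_number(root) + 1.
```

Deriving the counter from filenames rather than a side file means a crash between Arrow write and counter persistence cannot produce duplicate scan numbers.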
### 27.8 Supervisord Integration (Pending) Add to `dolphin-supervisord.conf`: ```ini [program:ng8_scanner] command=/home/dolphin/siloqy_env/bin/python3 ng8_scanner.py directory=/mnt/dolphinng5_predict/- Dolphin NG8 autostart=false ; manual start until NG7 Windows is formally retired autorestart=true stderr_logfile=/var/log/dolphin/ng8_scanner.err.log stdout_logfile=/var/log/dolphin/ng8_scanner.out.log ``` Set `autostart=true` only after confirming Windows NG7 is shut down — dual-write to the same HZ key is safe (last-write-wins) but creates confusing Arrow audit trails. --- ## 28. TUI v3 — LIVE OBSERVABILITY TERMINAL **File**: `Observability/TUI/dolphin_tui_v3.py` **Run**: `source /home/dolphin/siloqy_env/bin/activate && cd /mnt/dolphinng5_predict/Observability/TUI && python3 dolphin_tui_v3.py` **Framework**: Textual 8.1.1 (siloqy_env) **Bindings**: `q` quit · `r` force-refresh · `l` log panel · `t` toggle test footer ### 28.1 Architecture: Zero Load on Origin System All data flows via **Hazelcast entry listeners** (push model): ``` HZ maps ──push──► _State (thread-safe dict) ──call_from_thread──► Textual asyncio loop │ set_interval(1s) ────────┘ ``` `IMap.add_entry_listener(include_value=True, updated=fn, added=fn)` fires callbacks from the HZ internal thread pool on any map change. No polling of origin systems. Prefect is the **only** polled source — 60s interval via `run_worker(prefect_poll_loop())`. 
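The push path above reduces to: a listener thread parses the payload, writes it into a lock-guarded store, then marshals a refresh onto the UI loop. A minimal framework-free sketch; `State`, `make_listener`, and `schedule_on_ui` are illustrative stand-ins (the real TUI uses Textual's `App.call_from_thread` for the marshalling step):

```python
import json
import threading

class State:
    """Thread-safe snapshot store: written by HZ listener threads, read by the UI loop."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

def make_listener(state, key, schedule_on_ui):
    """Build an entry-listener callback: parse, store, then request a UI refresh.

    schedule_on_ui stands in for marshalling onto the UI event loop; the
    callback itself runs on the HZ client's internal thread pool."""
    def on_entry(event):
        state.put(key, json.loads(event.value))
        schedule_on_ui(key)
    return on_entry
```

The key property is that the listener thread never touches widgets directly: it only mutates the store and schedules work, which is what keeps the origin systems and the UI loop decoupled.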
### 28.2 Panel Map

| Panel | HZ Source | Update Trigger |
|---|---|---|
| **Header** | `DOLPHIN_HEARTBEAT` | HZ listener |
| **Trader** | `DOLPHIN_STATE_BLUE`, `DOLPHIN_FEATURES/latest_eigen_scan`, `DOLPHIN_HEARTBEAT` | HZ listener |
| **SysHealth (M1–M5)** | `DOLPHIN_META_HEALTH/latest` | HZ listener |
| **AlphaEngine** | `DOLPHIN_FEATURES/latest_eigen_scan` | HZ listener (eigenscan) |
| **Scan** | `DOLPHIN_FEATURES/latest_eigen_scan` | HZ listener (eigenscan) |
| **ExtF** | `DOLPHIN_FEATURES/ext_features_latest` | HZ listener |
| **OBF** | `DOLPHIN_FEATURES/obf_features_latest` | HZ listener |
| **Capital** | `DOLPHIN_STATE_BLUE`, `DOLPHIN_SAFETY` | HZ listener |
| **Prefect** | Prefect SDK | 60s poll |
| **ACB** | `DOLPHIN_FEATURES/acb_state_latest` | HZ listener |
| **MC-Forewarner** | `DOLPHIN_FEATURES/mc_forewarner_latest` | HZ listener (or "not deployed") |
| **Test Footer** | `run_logs/test_results_latest.json` | File read on mount + `t` toggle |

### 28.3 HZ Maps Listened

```python
DOLPHIN_FEATURES:    latest_eigen_scan, ext_features_latest, obf_features_latest,
                     acb_state_latest, mc_forewarner_latest
DOLPHIN_META_HEALTH: latest
DOLPHIN_SAFETY:      latest
DOLPHIN_STATE_BLUE:  latest
DOLPHIN_HEARTBEAT:   latest
```

### 28.4 Test Results Footer

The footer reads `run_logs/test_results_latest.json` (relative to `dolphin_tui_v3.py`'s working directory, i.e., `/mnt/dolphinng5_predict/run_logs/test_results_latest.json`).
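How a consumer might load and summarise that file into a one-line footer string can be sketched as below. The helper name `read_test_footer` and the output format are hypothetical — the TUI's actual footer rendering differs:

```python
import json
from pathlib import Path

def read_test_footer(run_logs_dir: Path) -> str:
    """Summarise test_results_latest.json as a single footer line,
    e.g. 'data_integrity 15/15 PASS | finance_fuzz N/A'."""
    path = run_logs_dir / "test_results_latest.json"
    if not path.exists():
        return "no test results yet"
    data = json.loads(path.read_text())
    parts = []
    for name, result in data.items():
        if name.startswith("_"):   # skip _run_at metadata key
            continue
        if result.get("status") == "N/A":
            parts.append(f"{name} N/A")
        else:
            parts.append(f"{name} {result['passed']}/{result['total']} {result['status']}")
    return " | ".join(parts)
```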
**Schema**:

```json
{
  "_run_at": "2026-04-05T12:00:00",
  "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
  "finance_fuzz": {"passed": null, "total": null, "status": "N/A"},
  "signal_fill": {"passed": null, "total": null, "status": "N/A"},
  "degradation": {"passed": 12, "total": 12, "status": "PASS"},
  "actor": {"passed": null, "total": null, "status": "N/A"}
}
```

**Write API** (exported from `dolphin_tui_v3.py`):

```python
from dolphin_tui_v3 import write_test_results

write_test_results({
    "data_integrity": {"passed": 15, "total": 15, "status": "PASS"},
    "finance_fuzz": {"passed": 8, "total": 8, "status": "PASS"},
    ...
})
```

`write_test_results()` atomically writes `_run_at` (current UTC ISO timestamp) + the provided category dict. The TUI footer auto-refreshes on next mount or `t` keypress.

Full integration documentation: `prod/docs/TEST_REPORTING.md`.

### 28.5 NG7 / NG8 Dual Format Normalisation

`_eigen_from_scan(scan)` handles both live HZ formats:

```python
def _eigen_from_scan(scan):
    # NG7 nested: scan["result"]["multi_window_results"]["50"]["velocity"]
    # NG8 flat:   scan["multi_window_results"]["50"]["velocity"]
    result = scan.get("result", scan)
    mwr = result.get("multi_window_results", {})
    for w in (50, 150, 300, 750):
        row = mwr.get(w) or mwr.get(str(w)) or {}
        ...
```

### 28.6 MC-Forewarner Integration

**Status: DEPLOYED AND RUNNING** — `prod/mc_forewarner_flow.py`, Prefect schedule `0 */4 * * *` (every 4 hours UTC).

MC-Forewarner writes to `DOLPHIN_FEATURES` key `mc_forewarner_latest`. The TUI entry listener fires on each write and populates the full MC footer panel: `catastrophic_prob` Digits + ProgressBar, `envelope_score` bar, prob sparkline history, `source` label (`REAL_MODEL` / `FALLBACK_NO_DATA` / `FALLBACK_ERROR`).

If the TUI starts between 4-hour runs and HZ has never been written to (e.g., fresh HZ instance), the footer shows `"awaiting HZ data (runs every 4h via Prefect)"` in yellow.
This is a cold-start state only — once the first Prefect run completes, the key persists in HZ indefinitely (no TTL).

**MC payload schema**:

```json
{
  "status": "GREEN | ORANGE | RED",
  "catastrophic_prob": 0.07,
  "envelope_score": 0.91,
  "source": "REAL_MODEL | FALLBACK_NO_DATA | FALLBACK_ERROR",
  "timestamp": "2026-04-05T14:00:00+00:00"
}
```

**Thresholds**: GREEN `prob < 0.10` · ORANGE `0.10–0.30` · RED `≥ 0.30`

**Models path**: `nautilus_dolphin/mc_results/models/*.pkl` — if absent, falls back to `FALLBACK_NO_DATA` (ORANGE, prob=0.20, env=0.80), which is a safe conservative posture, never random.

### 28.7 Pending: DOLPHIN_PNL_BLUE

The Trader panel contains placeholder text `"read DOLPHIN_PNL_BLUE (not yet wired)"`. Open positions and session PnL data should be sourced from this map when Nautilus live trading is active.

---

*End of DOLPHIN-NAUTILUS System Bible v6.0 — 2026-04-05*
*Champion: SHORT only (APEX posture, blue configuration)*
*Process manager: Supervisord exclusively (systemd units retired).*
*MHS v3: Active, RM_META≈0.975 [GREEN], 10s critical recovery cooldown.*
*OBF Universe: 540 assets live, zero REST weight WS push streams.*
*NG8 Scanner: Built, smoke-tested. Awaiting NG7 Windows retirement before autostart.*
*TUI v3: Live event-driven observability. All HZ panels hot. MC-Forewarner footer live (4h Prefect cadence). Test footer CI-ready.*
*Test gates: 409+ tests green across all suites.*
*Do NOT deploy real capital until 30-day paper run is clean.*