# DOLPHIN-NAUTILUS — Production Bringup Master Plan
# "From Batch Paper Trading to Hyper-Reactive Memory/Compute Layer Live Algo"

**Authored**: 2026-03-06
**Authority**: Synthesizes NAUTILUS-DOLPHIN Prod System Spec (17 pages), LAYER_BRINGUP_PLAN.md, BRINGUP_GUIDE.md, and full champion research state.
**Champion baseline** (supersedes spec targets): ROI=+44.89%, PF=1.123, DD=14.95%, Sharpe=2.50, WR=49.3% (55-day, abs_max_lev=6.0).
**Spec note**: PDF spec targets (ROI>35%, Sharpe>2.0) predate the latest research. The current champion is already superior; those floors hold as CI regression gates only.
**Principle**: The system must be FUNCTIONAL at every phase boundary. Never leave a partially broken state. Each phase ends with a green CI gate.
**Deferred (later MIG steps)**: Linux RT kernel, DPDK kernel bypass, TLA+ formal spec, Rocq/Coq proofs. These are asymptotic-perfection items, not blockers for live trading.

---

## DAL Reliability Mapping (DO-178C adaptation)

| DAL | Component | Failure consequence | Required gate |
|-----|-----------|---------------------|---------------|
| A | Kill-switch, capital ledger | Total loss, uncontrolled exposure | Hardware + software interlock |
| B | MC-Forewarner, ACB v6 | Excessive drawdown (>20%) | CI regression + integration test |
| C | Alpha signal (vel_div, IRP) | Missed trades or false signals | Unit + smoke test |
| D | EsoF, DVOL environmental | Suboptimal sizing | Integration test (optional) |
| E | Backfill, dashboards | Observability loss only | Best-effort |

---

## Architecture Target (End State — Post MIG7)

```
[ARB512 Scanner] ──► eigenvalues/YYYY-MM-DD/*.json
        │
[Prefect SITARA — orchestration layer]
  ├── ExF fetcher flow (macro data, daily)
  ├── EsoF calculator flow (daily)
  ├── MC-Forewarner flow (4-hourly)
  └── Watchdog flow (10s heartbeat)
        │
[Hazelcast IMDG — hot feature store]
  ├── DOLPHIN_FEATURES IMap (per-asset, Near Cache)
  ├── DOLPHIN_STATE_BLUE/GREEN IMap (capital, drawdown)
  ├── DOLPHIN_SAFETY AtomicReference (posture, kill switch)
  └── ACB EntryProcessor (atomic boost update)
        │
[Nautilus-Trader — execution core (Rust)]
  ├── NautilusActor ←→ NDAlphaEngine
  ├── AsyncDataEngine (bar subscription)
  └── Binance Futures adapter (live orders)
        │
[Survival Stack — 5 categories × 4 postures]
  Cat1:Invariants → Cat2:Structural → Cat3:Micro → Cat4:Environmental → Cat5:CapitalStress
  → Rm multiplier → APEX/STALKER/TURTLE/HIBERNATE
```

---

## MIG0 — Current State Verification (Baseline Gate)

**Goal**: Confirm the existing batch paper trading system is fully operational and CI-clean before any migration work begins. Never build on a broken foundation.

**Current state**:
- Docker stack: Hazelcast 5.3 (port 5701), HZ-MC (port 8080), Prefect Server (port 4200)
- Prefect worker running on `dolphin` pool, deployment `dolphin-paper-blue` scheduled daily 00:05 UTC
- `paper_trade_flow.py` loads JSON scan files, computes vel_div, runs NDAlphaEngine SHORT-only
- Capital NOT persisted (restarts at 25k each day — KNOWN LIMITATION)
- OB = MockOBProvider (static 62% fill, -0.09 imbalance bias)
- No graceful degradation, no posture management
- CI: 5 layers, 24/24 tests passing

### MIG0 Verification Steps

**Step MIG0.1 — CI green**

```bash
cd "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict"
source "/c/Users/Lenovo/Documents/- Siloqy/Scripts/activate"
bash ci/run_ci.sh
```

PASS criteria:
- All 24 tests pass (layers 1-5)
- Exit code 0
- Layer 3 regression: PF >= 1.08, WR >= 42%, trades >= 5 on 10-day VBT window

**Step MIG0.2 — Infrastructure health**

```bash
docker compose -f prod/docker-compose.yml ps
# ASSERT: hazelcast, hz-mc, prefect-server all "running" (not "restarting")

curl -sf http://localhost:4200/api/health > /dev/null && echo "PREFECT OK"
# ASSERT: prints "PREFECT OK" — Prefect API healthy

python -c "import hazelcast; c=hazelcast.HazelcastClient(); c.shutdown(); print('HZ OK')"
# ASSERT: prints "HZ OK" with no exception
```

**Step MIG0.3 — Manual paper run**

```bash
source "/c/Users/Lenovo/Documents/- Siloqy/Scripts/activate"
PREFECT_API_URL=http://localhost:4200/api \
python prod/paper_trade_flow.py --date $(date +%Y-%m-%d) --config prod/configs/blue.yml
# ASSERT: prints "vel_div range=[...]", prints "total_trades=N" where N > 0
# ASSERT: HZ IMap DOLPHIN_PNL_BLUE contains today's entry
```

FAIL criteria for MIG0: Any CI failure, any container in a restart loop, zero trades on a valid scan date.

**MIG0 GATE**: CI 24/24 + all 3 infra checks green. Only then proceed to MIG1.

---

## MIG1 — Prefect SITARA: All Subsystems as Flows + State Persistence

**Goal**: Separate the "slow-thinking" (macro, orchestration) from the "fast-doing" (engine, execution). All support subsystems run as independent Prefect flows with retry logic. Capital persists across daily runs.

**Spec reference**: Sec IV (Prefect SITARA), "slow-thinking / fast-doing separation."

**Why now**: State persistence eliminates the #1 known limitation (restarts at 25k daily). Subsystem flows give observability + retry without coupling to the trading flow.

### MIG1.1 — Capital State Persistence (DAL-A)

**What to build**: At flow start, restore capital from HZ. At flow end, write capital + drawdown + session summary back to HZ. If HZ is unavailable, fall back to a local JSON ledger.
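The read side of that JSONL ledger fallback (restoring capital when HZ is down) could look like the minimal sketch below. The `restore_from_ledger` helper name is illustrative, not existing code; it only assumes the append-only one-JSON-object-per-line ledger format described in this section.

```python
import json
from pathlib import Path


def restore_from_ledger(ledger_path: Path, strategy_name: str) -> dict:
    """Return the most recent ledger entry for this strategy, or {} if none.

    The ledger is append-only JSONL, so scanning from the end makes the
    latest entry win — mirroring the 'latest' key in the HZ IMap.
    """
    if not ledger_path.exists():
        return {}
    for line in reversed(ledger_path.read_text().splitlines()):
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get('strategy') == strategy_name:
            return entry
    return {}
```

This would slot into the restore step's `except` branch: try HZ first, then the ledger, then the config default.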
**File to modify**: `prod/paper_trade_flow.py`

Implementation pattern (add to flow body):

```python
# ---- Restore capital ----
STATE_KEY = f"state_{strategy_name}_{date_str}"
state = {}  # default so the persist step is safe even if HZ restore fails
try:
    raw = imap_state.get(STATE_KEY) or imap_state.get('latest') or '{}'
    state = json.loads(raw)
    if state.get('strategy') == strategy_name and state.get('capital', 0) > 0:
        engine.capital = float(state['capital'])
        engine.initial_capital = float(state['capital'])
        logger.info(f"[STATE] Restored capital={engine.capital:.2f} from HZ")
except Exception as e:
    logger.warning(f"[STATE] HZ restore failed: {e} — using config capital")

# ---- Persist capital at end ----
peak = max(engine.capital, state.get('peak_capital', engine.capital))
new_state = {
    'strategy': strategy_name,
    'capital': engine.capital,
    'date': date_str,
    'pnl': day_result['pnl'],
    'trades': day_result['trades'],
    'peak_capital': peak,
    'drawdown': 1.0 - engine.capital / peak,
}
try:
    imap_state.put('latest', json.dumps(new_state))
    imap_state.put(STATE_KEY, json.dumps(new_state))
except Exception as e:
    logger.error(f"[STATE] HZ persist failed: {e}")
    # Fallback: write to local JSON ledger
    ledger_path = Path(LOG_DIR) / f"state_ledger_{strategy_name}.jsonl"
    with open(ledger_path, 'a') as f:
        f.write(json.dumps(new_state) + '\n')
```

Test assertions (add to `ci/test_06_state_persistence.py`):

```python
def test_hz_state_roundtrip():
    """Capital persists to HZ and is readable back."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_STATE_BLUE').blocking()
    test_state = {'strategy': 'blue', 'capital': 27500.0, 'date': '2026-01-15', 'trades': 42}
    m.put('test_roundtrip', json.dumps(test_state))
    read_back = json.loads(m.get('test_roundtrip'))
    assert read_back['capital'] == 27500.0
    assert read_back['trades'] == 42
    m.remove('test_roundtrip')
    c.shutdown()


def test_capital_restoration_on_flow_start():
    """If HZ has prior state, engine.capital is set correctly."""
    # Tests the restore logic in isolation (mock HZ IMap)
    from unittest.mock import MagicMock
    import json
    stored = {'strategy': 'blue', 'capital': 28000.0}
    imap = MagicMock()
    imap.get = MagicMock(return_value=json.dumps(stored))
    # ... instantiate engine, run restore logic, assert engine.capital == 28000.0
    # (see ci/test_06_state_persistence.py for full implementation)
```

PASS criteria: Capital from day N is used as starting capital for day N+1. If HZ is unavailable, the local ledger file is written. No crash if the ledger file is missing.

### MIG1.2 — ExF Fetcher Flow

**What to build**: A standalone Prefect flow `exf_fetcher_flow.py` that fetches all 14 ExF indicators (FRED, Deribit, F&G, etc.) and writes results to HZ IMap `DOLPHIN_FEATURES` under key `exf_latest`.

**File to create**: `prod/exf_fetcher_flow.py`

Key design points:
- Runs daily at 23:00 UTC (before paper trade at 00:05 UTC next day)
- Uses existing `external_factors/` modules
- Writes `{indicator_name: value, 'timestamp': iso_str, 'date': YYYY-MM-DD}` to HZ
- If fetch fails for any indicator: log warning, write `None` for that indicator, do NOT crash
- Separate task per indicator family (FRED, Deribit, F&G) for retry isolation

```python
@flow(name="exf-fetcher")
def exf_fetcher_flow(date_str: str = None):
    date_str = date_str or datetime.now(timezone.utc).strftime('%Y-%m-%d')
    results = {}
    results.update(fetch_fred_indicators(date_str))   # task
    results.update(fetch_deribit_funding(date_str))   # task
    results.update(fetch_fear_and_greed(date_str))    # task
    write_exf_to_hz(date_str, results)                # task
    return results
```

Test assertions (add to `ci/test_07_exf_flow.py`):

```python
def test_exf_flow_runs_without_crash():
    """ExF flow completes even if some APIs fail (returns partial results)."""
    result = exf_fetcher_flow(date_str='2026-01-15')
    assert isinstance(result, dict)
    # Core FRED indicators that were working:
    #   claims, us10y, ycurve, stables, m2, hashrate, usdc, vol24
    # At least half should be present (some APIs may be down)
    non_none = sum(1 for v in result.values() if v is not None)
    assert non_none >= 4, f"Too many ExF indicators failed: {result}"


def test_exf_hz_write():
    """ExF results are readable from HZ after the flow runs."""
    import hazelcast, json, pytest
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    val = m.get('exf_latest')
    if val is None:
        pytest.skip("ExF flow has not run yet")
    data = json.loads(val)
    assert 'timestamp' in data
    assert 'date' in data
    c.shutdown()
```

PASS criteria: Flow completes (exit 0) even with partial API failures. Results written to HZ. `paper_trade_flow.py` reads ExF from HZ (not from the disk NPZ fallback) on the next run.

### MIG1.3 — MC-Forewarner as Prefect Flow

**What to build**: Wrap the `mc_forewarning_service.py` daemon as a Prefect flow that runs every 4 hours and writes its state to HZ IMap `DOLPHIN_FEATURES` key `mc_forewarner_latest`.

**File to create**: `prod/mc_forewarner_flow.py`

Key design:
- Schedule: `Cron("0 */4 * * *")` (every 4 hours)
- Runs DolphinForewarner with current champion params
- Writes `{'status': 'GREEN'|'ORANGE'|'RED', 'catastrophic_prob': float, 'envelope_score': float, 'timestamp': iso}` to HZ
- `paper_trade_flow.py` reads MC state from HZ (already does this via staleness check)
- Add staleness gate: if the MC timestamp is > 6 hours old, treat as ORANGE (structural degradation, Cat 2)

Test assertions (add to `ci/test_08_mc_flow.py`):

```python
def test_mc_forewarner_flow_runs():
    """MC-Forewarner flow produces a valid status."""
    result = mc_forewarner_flow()
    assert result['status'] in ('GREEN', 'ORANGE', 'RED')
    assert 0.0 <= result['catastrophic_prob'] <= 1.0
    assert 'timestamp' in result


def test_mc_staleness_gate():
    """MC state older than 6 hours is treated as ORANGE, not GREEN."""
    from datetime import datetime, timedelta, timezone
    stale_ts = (datetime.now(timezone.utc) - timedelta(hours=7)).isoformat()
    stale_state = {'status': 'GREEN', 'timestamp': stale_ts}
    effective = get_effective_mc_status(stale_state)
    assert effective == 'ORANGE', "Stale MC should degrade to ORANGE"
```

PASS criteria: MC flow runs on schedule, writes to HZ, and `paper_trade_flow.py` correctly reads MC status. Staleness is detected and the status degraded to ORANGE after 6h.

### MIG1.4 — Watchdog Flow

**What to build**: A Prefect flow `watchdog_flow.py` that runs every 10 minutes (not 10 seconds — Windows Prefect scheduling granularity), checks all system components, and writes `DOLPHIN_SYSTEM_HEALTH` to HZ.

Checks performed:
- HZ cluster quorum (>= 1 node alive)
- Prefect worker responsive
- Scan data freshness (latest scan date <= 2 days ago)
- Paper log freshness (last JSONL entry <= 2 days old)
- Docker containers running

```python
@flow(name="watchdog")
def watchdog_flow():
    health = {
        'hz': check_hz_quorum(),          # task
        'prefect': check_prefect_api(),   # task
        'scans': check_scan_freshness(),  # task
        'logs': check_log_freshness(),    # task
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }
    overall = 'GREEN' if all(v == 'OK' for k, v in health.items() if k != 'timestamp') else 'DEGRADED'
    health['overall'] = overall
    write_hz_health(health)  # task
    if overall == 'DEGRADED':
        logger.warning(f"[WATCHDOG] System degraded: {health}")
    return health
```

Test assertions (`ci/test_09_watchdog.py`):

```python
def test_watchdog_detects_all_ok():
    result = watchdog_flow()
    assert result['overall'] in ('GREEN', 'DEGRADED')
    assert 'timestamp' in result
    # At minimum, HZ and Prefect should be OK in the test environment
    assert result['hz'] == 'OK'
    assert result['prefect'] == 'OK'


def test_watchdog_writes_to_hz():
    import hazelcast, json
    watchdog_flow()
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_SYSTEM_HEALTH').blocking()
    h = json.loads(m.get('latest'))
    assert h['overall'] in ('GREEN', 'DEGRADED')
    c.shutdown()
```

PASS criteria: Watchdog runs on schedule, writes health to HZ, and the operator can see system status in the HZ-MC UI without reading logs.
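One of the watchdog checks listed above — scan-data freshness — can be sketched as follows. This is a minimal sketch: the `check_scan_freshness` task body is illustrative, assuming only the `eigenvalues/YYYY-MM-DD/` directory layout shown in the architecture diagram and the <= 2 days freshness rule stated above.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def check_scan_freshness(scan_root: Path, max_age_days: int = 2) -> str:
    """Return 'OK' if the newest eigenvalues/YYYY-MM-DD/ directory is recent enough."""
    dates = []
    for d in (scan_root.iterdir() if scan_root.exists() else []):
        try:
            dates.append(datetime.strptime(d.name, '%Y-%m-%d').replace(tzinfo=timezone.utc))
        except ValueError:
            continue  # ignore non-date directories
    if not dates:
        return 'STALE: no scan directories found'
    age = datetime.now(timezone.utc) - max(dates)
    return 'OK' if age <= timedelta(days=max_age_days) else f'STALE: {age.days}d old'
```

Returning a string (`'OK'` vs. a descriptive `'STALE: ...'`) matches the watchdog's `v == 'OK'` aggregation while keeping the failure reason visible in the HZ health record.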
### MIG1 GATE

All of the following must pass before MIG2:

```bash
bash ci/run_ci.sh   # original 24 tests
pytest ci/test_06_state_persistence.py ci/test_07_exf_flow.py ci/test_08_mc_flow.py ci/test_09_watchdog.py -v
```

PASS criteria: 24 + 8 (new) = 32 tests green. Capital from the prior day visible in HZ after a manual paper run. MC-Forewarner status readable from HZ. Watchdog health GREEN.

---

## MIG2 — Hazelcast IMDG: Feature Store + Live OB + Entry Processors

**Goal**: Replace file-based feature passing with a sub-millisecond in-memory feature store. Enable atomic ACB state updates. Replace MockOBProvider with live Binance WebSocket OB data.

**Spec reference**: Sec III (Hazelcast IMDG — DOLPHIN_FEATURES, Near Cache, Jet, Entry Processors).

**Architecture**: "Engine Room" — hot feature state that the trading engine reads without network overhead via Near Cache. The engine reads features, not files.

### MIG2.1 — DOLPHIN_FEATURES IMap + Near Cache

**What to build**: Schema for the HZ feature store and Near Cache configuration.
IMap key schema:

```
DOLPHIN_FEATURES:
  "exf_latest"            → JSON dict: {indicator_name: value, timestamp, date}
  "mc_forewarner_latest"  → JSON dict: {status, catastrophic_prob, envelope_score, timestamp}
  "acb_state"             → JSON dict: {boost, beta, w750_threshold, p60, last_date}
  "vol_regime"            → JSON dict: {vol_p60: float, current_vol: float, regime_ok: bool, timestamp}
  "asset_{SYMBOL}_ob"     → JSON dict: {imbalance, fill_prob, depth_quality, agreement, timestamp}
  "scan_latest"           → JSON dict: {date, vel_div_mean, vel_div_min, asset_count, timestamp}
```

Near Cache configuration (add to HZ client init in `paper_trade_flow.py`):

```python
client = hazelcast.HazelcastClient(
    cluster_members=["localhost:5701"],
    near_caches={
        "DOLPHIN_FEATURES": {
            "invalidate_on_change": True,
            "time_to_live": 300,        # 5 min TTL (seconds)
            "max_idle": 60,             # seconds
            "eviction_policy": "LRU",
            "eviction_max_size": 5000,
        }
    },
)
```

Test assertions (`ci/test_10_hz_feature_store.py`):

```python
def test_near_cache_read_latency():
    """Near Cache reads complete in <1ms after first warm read."""
    import time, hazelcast, json
    c = hazelcast.HazelcastClient(near_caches={"DOLPHIN_FEATURES": {...}})
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    m.put('test_nc', json.dumps({'val': 42}))
    m.get('test_nc')  # warm the cache
    t0 = time.perf_counter()
    for _ in range(100):
        m.get('test_nc')
    elapsed_per_call = (time.perf_counter() - t0) / 100
    assert elapsed_per_call < 0.001, f"Near Cache too slow: {elapsed_per_call*1000:.2f}ms"
    c.shutdown()


def test_feature_store_schema():
    """All required keys are writable and readable in correct schema."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    for key in ['exf_latest', 'mc_forewarner_latest', 'acb_state', 'vol_regime']:
        m.put(f'test_{key}', json.dumps({'test': True, 'timestamp': '2026-01-01T00:00:00Z'}))
        val = m.get(f'test_{key}')
        assert val is not None
        m.remove(f'test_{key}')
    c.shutdown()
```

### MIG2.2 — ACB Entry Processor (Atomic State Update)

**What to build**: An Entry Processor that updates the ACB boost atomically in HZ without a full read-modify-write round trip. Critical for sub-day ACB updates when new scan bars arrive.

```python
# prod/hz_entry_processors.py
import hazelcast


class ACBBoostUpdateProcessor(hazelcast.serialization.api.IdentifiedDataSerializable):
    """Atomically update ACB boost + beta in DOLPHIN_FEATURES without a read-write round trip.

    NOTE: with the Python client, the member-side update logic requires a Java
    counterpart registered on the cluster with the same FACTORY_ID/CLASS_ID;
    this class supplies the client-side serialization contract.
    """
    FACTORY_ID = 1
    CLASS_ID = 1

    def __init__(self, new_boost=None, new_beta=None, date_str=None):
        self.new_boost = new_boost
        self.new_beta = new_beta
        self.date_str = date_str

    def process(self, entry):
        import json
        current = json.loads(entry.value or '{}')
        if self.new_boost is not None:
            current['boost'] = self.new_boost
        if self.new_beta is not None:
            current['beta'] = self.new_beta
        current['last_updated'] = self.date_str
        entry.set_value(json.dumps(current))

    def write_data(self, object_data_output):
        object_data_output.write_float(self.new_boost or 0.0)
        object_data_output.write_float(self.new_beta or 0.0)
        object_data_output.write_string(self.date_str or '')

    def read_data(self, object_data_input):
        self.new_boost = object_data_input.read_float()
        self.new_beta = object_data_input.read_float()
        self.date_str = object_data_input.read_string()

    def get_factory_id(self):
        return self.FACTORY_ID

    def get_class_id(self):
        return self.CLASS_ID
```

Test assertions:

```python
def test_acb_entry_processor_atomic():
    """Entry processor updates ACB state without a race condition."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    m.put('acb_state', json.dumps({'boost': 1.0, 'beta': 0.5}))
    processor = ACBBoostUpdateProcessor(new_boost=1.35, new_beta=0.7, date_str='2026-01-15')
    m.execute_on_key('acb_state', processor)
    result = json.loads(m.get('acb_state'))
    assert result['boost'] == 1.35
    assert result['beta'] == 0.7
    c.shutdown()
```

### MIG2.3 — Live OB: Replace MockOBProvider

**What to build**: Wire the `ob_stream_service.py` WebSocket feed into `paper_trade_flow.py` to replace MockOBProvider. OB features written to HZ per-asset under `asset_{SYMBOL}_ob`.

**Implementation**:
1. `ob_stream_service.py` already verified live on the Binance Futures WebSocket
2. Start the OB service as a background thread or separate Prefect flow at run start
3. OB service writes a per-asset OB snapshot to HZ every 5 seconds
4. The `run_engine_day` task reads OB from the HZ Near Cache instead of MockOBProvider
5. Graceful fallback: if asset OB data is missing or stale (>30s), use neutral values (imbalance=0, fill_prob=0.5)

OB data schema in HZ:

```python
ob_snapshot = {
    'imbalance': float,      # (bid_vol - ask_vol) / (bid_vol + ask_vol), range [-1, 1]
    'fill_prob': float,      # maker fill probability, range [0, 1]
    'depth_quality': float,  # normalized depth, range [0, 1]
    'agreement': float,      # OB trend agreement, range [-1, 1]
    'timestamp': iso_str,    # when this snapshot was taken
    'stale': bool,           # True if >30s since last update
}
```

Test assertions (`ci/test_11_live_ob.py`):

```python
def test_ob_stream_connects():
    """OB stream service connects to the Binance Futures WebSocket without error."""
    # Start OB service for BTCUSDT only, run for 10 seconds, check HZ for data
    import threading, time, hazelcast, json
    from ob_stream_service import OBStreamService
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    svc = OBStreamService(symbols=['BTCUSDT'], hz_map=m)
    t = threading.Thread(target=svc.start, daemon=True)
    t.start()
    time.sleep(10)
    svc.stop()
    val = m.get('asset_BTCUSDT_ob')
    assert val is not None, "OB data not written to HZ after 10s"
    data = json.loads(val)
    assert -1.0 <= data['imbalance'] <= 1.0
    assert 0.0 <= data['fill_prob'] <= 1.0
    c.shutdown()


def test_ob_stale_fallback():
    """Engine uses neutral OB values when OB data is stale."""
    # Inject stale OB snapshot, verify engine uses fallback (imbalance=0, fill_prob=0.5)
    ...
    assert ob_features.imbalance == 0.0
    assert ob_features.fill_prob == 0.5
```

PASS criteria: `|imbalance| < 0.3` under typical market conditions (confirmed in spec). Live OB replaces Mock. Expected result: 10-15% reduction in daily P&L variance (per 55-day OB validation: σ² reduced 15.35%).

### MIG2.4 — paper_trade_flow.py reads all features from HZ

**What to build**: Refactor `paper_trade_flow.py` so `run_engine_day` reads ExF, MC state, OB, and vol regime from the `DOLPHIN_FEATURES` HZ IMap (via Near Cache) instead of computing them or loading from disk.

```python
@task(persist_result=False)
def run_engine_day(date_str, scan_df, pt_cfg, strategy_name):
    client = hazelcast.HazelcastClient(near_caches={"DOLPHIN_FEATURES": {...}})
    features = client.get_map('DOLPHIN_FEATURES').blocking()

    # Read from HZ instead of computing inline
    mc_raw = features.get('mc_forewarner_latest')
    mc_status = get_effective_mc_status(json.loads(mc_raw)) if mc_raw else 'GREEN'  # staleness-aware

    vol_raw = features.get('vol_regime')
    vol_ok = json.loads(vol_raw)['regime_ok'] if vol_raw else True

    # Pass OB provider backed by HZ
    ob_provider = HZOBProvider(features, staleness_threshold_sec=30)
    engine.set_ob_provider(ob_provider)
    ...
```

### MIG2 GATE

```bash
bash ci/run_ci.sh
pytest ci/test_10_hz_feature_store.py ci/test_11_live_ob.py -v
```

PASS criteria: 32 + 4 (new) = 36 tests green. Live OB data flowing to HZ. Engine reads all features from HZ. No MockOBProvider in `paper_trade_flow.py`. Capital persisted day-over-day (verify manually over 3 consecutive days).

---

## MIG3 — Survival Stack: Graceful Degradation (Control Theory)

**Goal**: Replace binary "up/down" thinking with a continuous, multiplicative risk controller. The system degrades gracefully under component failure rather than stopping or operating at full risk.

**Spec reference**: Sec VI (Control Theory Survival Stack), 5 categories, 4 postures, hysteresis.
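As a toy illustration of the multiplicative controller before the full design below: each category emits a multiplier in [0, 1], the product is Rm, and Rm maps to a posture. The function names are illustrative; the thresholds and the Cat5 sigmoid are taken from the MIG3.1/MIG3.2 specifications (hysteresis omitted here for brevity).

```python
import math


def compose_rm(cat_multipliers: dict) -> float:
    """Multiply the five category multipliers into the final risk multiplier Rm."""
    rm = 1.0
    for value in cat_multipliers.values():
        rm *= value
    return rm


def posture_for(rm: float) -> str:
    """Map Rm to an operational posture (thresholds from MIG3.2, no hysteresis)."""
    if rm >= 0.85:
        return 'APEX'
    if rm >= 0.40:
        return 'STALKER'
    if rm >= 0.10:
        return 'TURTLE'
    return 'HIBERNATE'


# Example: MC-Forewarner ORANGE (Cat2=0.5), everything else healthy
cats = {'Cat1': 1.0, 'Cat2': 0.5, 'Cat3': 1.0, 'Cat4': 1.0,
        'Cat5': 1 / (1 + math.exp(20 * (0.03 - 0.12)))}  # Cat5 sigmoid at DD=3%
rm = compose_rm(cats)
# rm ≈ 0.5 × 0.86 ≈ 0.43 → STALKER
```

The key property: a single degraded category (ORANGE MC) drops the whole system into STALKER even while every other input is healthy, without any binary on/off cliff.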
**Design**: All 5 category multipliers multiply together to produce a final `Rm` (risk multiplier). Rm then maps to one of 4 operational postures. Hysteresis prevents rapid posture oscillation.

### MIG3.1 — 5-Category Risk Multiplier (Rm)

**File to create**: `nautilus_dolphin/nautilus_dolphin/nautilus/survival_stack.py`

```
Rm = Cat1 × Cat2 × Cat3 × Cat4 × Cat5

Cat1 — Invariants (binary kill, <10ms response):
  Input: HZ quorum status + Nautilus heartbeat
  Rule: if HZ_nodes < 1 OR heartbeat_age > 30s → Rm1 = 0.0 (HIBERNATE)
        else Rm1 = 1.0

Cat2 — Structural (MC-Forewarner staleness + status):
  Input: MC status (GREEN/ORANGE/RED) + timestamp age
  GREEN, fresh   → base_rm2 = 1.0
  ORANGE, fresh  → base_rm2 = 0.5
  RED, fresh     → base_rm2 = 0.1 (exits only)
  Stale (>6h)    → exponential decay on top of the base
  Rule: Rm2 = base_rm2 × exp(-max(0, staleness_hours - 6) / 3.0)

Cat3 — Microstructure (OB jitter/depth):
  Input: OB depth_quality + fill_prob + imbalance stability
  OB healthy (depth_quality > 0.7, fill_prob > 0.5)         → Rm3 = 1.0
  OB jittery (fill_prob < 0.3 or depth_quality < 0.3)       → Rm3 = 0.3 (passive quoting only)
  OB stale (>30s)                                           → Rm3 = 0.5
  Rule: Rm3 = clip(0.3 + 0.7 * min(depth_quality, fill_prob), 0.3, 1.0)

Cat4 — Environmental (DVOL spike):
  Input: DVOL (Deribit BTC implied vol, 30-day)
  Baseline DVOL (no spike)                      → Rm4 = 1.0
  DVOL spike detected (>2σ above 30-day mean)   → Rm4 drops to 0.3 immediately (fast attack)
  Recovery: Rm4 recovers toward 1.0 over ~60 minutes (slow, exponential)
  Rule: impulse-decay — Rm4 = 1.0 - 0.7 * exp(-t_since_spike_min / 20)
        (0.3 at the spike, ≥0.9 after 60 min; a 60-min time constant would
        recover only ~63% in an hour, contradicting the Cat4 test)

Cat5 — Capital Stress (sigmoid on drawdown):
  Input: current_drawdown = 1 - capital / peak_capital
  Rule: Rm5 = 1 / (1 + exp(20 * (drawdown - 0.12)))
  Effect: Rm5 ≈ 1.0 at DD<5%, ≈ 0.5 at DD=12%, ≈ 0.1 at DD=20%
  No cliff — continuous degradation as DD increases

Final: Rm = Rm1 × Rm2 × Rm3 × Rm4 × Rm5
```

### MIG3.2 — 4 Operational Postures + Hysteresis

```
Rm → Posture mapping (with hysteresis deadband):
  Rm >= 0.85 → APEX      (abs_max_lev=6x, aggressive, full signal)
  Rm >= 0.40 → STALKER   (abs_max_lev=2x, limit orders only)
  Rm >= 0.10 → TURTLE    (passive only, existing positions exit, no new entries)
  Rm <  0.10 → HIBERNATE (all-stop: close all positions, no new signals)

Hysteresis (Schmitt trigger):
  To DOWNGRADE (e.g., APEX → STALKER): threshold crossed + sustained for 2 consecutive checks
  To UPGRADE (e.g., STALKER → APEX):   threshold exceeded + sustained for 5 consecutive checks
  Purpose: prevent rapid posture oscillation on a noisy Rm boundary

Rm written to HZ DOLPHIN_SAFETY AtomicReference:
  {'posture': 'APEX'|'STALKER'|'TURTLE'|'HIBERNATE', 'Rm': float, 'timestamp': iso,
   'breakdown': {'Cat1': float, 'Cat2': float, 'Cat3': float, 'Cat4': float, 'Cat5': float}}
```

### MIG3.3 — Integration into paper_trade_flow.py

`run_engine_day` reads the posture from HZ before any engine action:

```python
safety_ref = client.cp_subsystem.get_atomic_reference('DOLPHIN_SAFETY').blocking()
safety_state = json.loads(safety_ref.get() or '{}')
posture = safety_state.get('posture', 'APEX')
Rm = safety_state.get('Rm', 1.0)

if posture == 'HIBERNATE':
    logger.critical("[POSTURE] HIBERNATE — no trades today")
    return {'pnl': 0.0, 'trades': 0, 'posture': 'HIBERNATE'}

# Apply Rm to abs_max_leverage
effective_max_lev = pt_cfg['abs_max_leverage'] * Rm
engine.abs_max_leverage = max(1.0, effective_max_lev)

if posture == 'STALKER':
    engine.abs_max_leverage = min(engine.abs_max_leverage, 2.0)
elif posture == 'TURTLE':
    # No new entries — only manage existing positions
    engine.accept_new_entries = False
```

Test assertions (`ci/test_12_survival_stack.py`):

```python
import pytest


def test_rm_calculation_all_green():
    """All-green conditions → Rm = 1.0, posture = APEX."""
    ss = SurvivalStack(...)
    Rm, breakdown = ss.compute_rm(
        hz_nodes=1, heartbeat_age_s=1.0,
        mc_status='GREEN', mc_staleness_hours=0.5,
        ob_depth_quality=0.9, ob_fill_prob=0.8, ob_stale=False,
        dvol_spike=False, t_since_spike_min=999,
        drawdown=0.03,
    )
    assert Rm >= 0.95, f"Expected ~1.0, got {Rm}"
    assert breakdown['Cat1'] == 1.0
    assert breakdown['Cat5'] >= 0.95


def test_rm_hz_down_triggers_hibernate():
    """HZ quorum=0 → Cat1=0 → Rm=0 → HIBERNATE."""
    ss = SurvivalStack(...)
    Rm, _ = ss.compute_rm(hz_nodes=0, ...)
    assert Rm == 0.0
    assert ss.get_posture(Rm) == 'HIBERNATE'


def test_rm_drawdown_sigmoid():
    """Drawdown 12% → Rm5 ≈ 0.5."""
    ss = SurvivalStack(...)
    Rm5 = ss._cat5_capital_stress(drawdown=0.12)
    assert 0.4 <= Rm5 <= 0.6, f"Sigmoid expected ~0.5 at DD=12%, got {Rm5}"


def test_rm_dvol_spike_impulse_decay():
    """DVOL spike → Cat4=0.3. After 60 min → Cat4 ≈ 1.0."""
    ss = SurvivalStack(...)
    assert ss._cat4_dvol(dvol_spike=True, t_since_spike_min=0) == pytest.approx(0.3, abs=0.05)
    assert ss._cat4_dvol(dvol_spike=True, t_since_spike_min=60) >= 0.9


def test_hysteresis_prevents_oscillation():
    """Rm oscillating at a boundary does not cause rapid posture flips."""
    ss = SurvivalStack(hysteresis_down=2, hysteresis_up=5)
    postures = []
    for Rm in [0.84, 0.86, 0.84, 0.86, 0.84]:  # oscillating around the APEX/STALKER boundary
        postures.append(ss.update_posture(Rm))
    # Should NOT oscillate — hysteresis holds the prior posture
    assert len(set(postures)) == 1, f"Hysteresis failed — postures: {postures}"


def test_posture_written_to_hz():
    """Posture and Rm are written to the HZ DOLPHIN_SAFETY AtomicReference."""
    import hazelcast, json
    ss = SurvivalStack(...)
    Rm, _ = ss.compute_rm(...)
    ss.write_to_hz(Rm)
    c = hazelcast.HazelcastClient()
    ref = c.cp_subsystem.get_atomic_reference('DOLPHIN_SAFETY').blocking()
    state = json.loads(ref.get())
    assert state['posture'] in ('APEX', 'STALKER', 'TURTLE', 'HIBERNATE')
    assert 0.0 <= state['Rm'] <= 1.0
    c.shutdown()
```

PASS criteria: 36 + 6 = 42 tests green. Survival stack integrated into `paper_trade_flow.py`. Manual test: kill the Hazelcast container → HIBERNATE triggers → restart HZ → system recovers to APEX within 5 check cycles (the upgrade hysteresis threshold).

### MIG3 GATE

```bash
bash ci/run_ci.sh
pytest ci/test_12_survival_stack.py -v
```

Also verify manually:
- Simulate MC-Forewarner returning ORANGE → STALKER posture, max_lev=2x (RED alone drives Rm to ~0.1 → TURTLE, exits only)
- Simulate a 15% drawdown in the ledger → Rm5 ≈ 0.35, posture degrades
- System recovers gracefully when conditions improve (hysteresis up threshold met)

---

## MIG4 — Nautilus-Trader Integration: Rust Execution Core

**Goal**: Replace the Python paper trading loop with Nautilus-Trader as the execution engine. NDAlphaEngine becomes a Nautilus Actor. Binance Futures orders are routed through the Nautilus adapter. This achieves true event-driven, sub-millisecond execution.

**Spec reference**: Sec V (Nautilus-Trader — Actor model, AsyncDataEngine, Rust networking, zero-copy Arrow).

**Why Nautilus**: Rust core, zero-copy Arrow data transport, proper Actor isolation, production-grade risk management. The Python engine (`paper_trade_flow.py`) was always a stepping stone.

### MIG4.1 — NautilusActor Wrapper

**Prereq**: `pip install nautilus_trader>=1.224` in the Siloqy venv.
**File to create**: `nautilus_dolphin/nautilus_dolphin/nautilus/nautilus_actor.py`

Key design:
- NautilusActor wraps NDAlphaEngine
- Subscribes to bar data (5-second OHLCV bars for all 50 assets)
- On each bar: updates eigenvalue features from the HZ Near Cache
- On each scan completion (5-minute window): calls `engine.process_bar()`
- Orders submitted via Nautilus OrderFactory → Binance Futures adapter
- Actor reads the posture from HZ DOLPHIN_SAFETY before each order submission

```python
import json

from nautilus_trader.trading.actor import Actor
from nautilus_trader.model.data import Bar, BarType


class DolphinActor(Actor):
    def __init__(self, engine: NDAlphaEngine, hz_features_map, config):
        super().__init__(config)
        self.engine = engine
        self.hz = hz_features_map
        self._bar_buffer = {}  # symbol → list of bars

    def on_start(self):
        # Subscribe to 5s bars for all assets
        for symbol in self.engine.asset_columns:
            bar_type = BarType.from_str(f"{symbol}.BINANCE-5-SECOND-LAST-EXTERNAL")
            self.subscribe_bars(bar_type)

    def on_bar(self, bar: Bar):
        symbol = bar.bar_type.instrument_id.symbol.value
        self._bar_buffer.setdefault(symbol, []).append(bar)
        if self._should_process(bar):
            self._run_engine_on_bar_batch()

    def _run_engine_on_bar_batch(self):
        # Posture/Rm from the HZ safety state (assumes it is mirrored
        # under a 'DOLPHIN_SAFETY' key in the features map)
        safety_raw = self.hz.get('DOLPHIN_SAFETY')
        safety = json.loads(safety_raw) if safety_raw else {}
        posture = safety.get('posture', 'APEX')
        if posture == 'HIBERNATE':
            return
        Rm = safety.get('Rm', 1.0)
        signals = self.engine.process_bar_batch(self._bar_buffer, Rm=Rm)
        for signal in signals:
            self._submit_order(signal, posture)

    def _submit_order(self, signal, posture):
        if posture == 'TURTLE':
            return  # No new entries in TURTLE
        if posture == 'STALKER':
            # Limit orders only in STALKER
            order = self.order_factory.limit(
                instrument_id=signal.instrument_id,
                order_side=signal.side,
                quantity=signal.quantity,
                price=signal.limit_price,
            )
        else:
            order = self.order_factory.market(
                instrument_id=signal.instrument_id,
                order_side=signal.side,
                quantity=signal.quantity,
            )
        self.submit_order(order)
```

### MIG4.2 — Docker: Add Nautilus Container

**File to modify**: `prod/docker-compose.yml`

Add a Nautilus-Trader container (or run as a sidecar process):

```yaml
services:
  dolphin-actor:
    image: nautechsystems/nautilus_trader:latest
    volumes:
      - ../nautilus_dolphin:/app/nautilus_dolphin:ro
      - ../vbt_cache:/app/vbt_cache:ro
    environment:
      - HZ_CLUSTER=hazelcast:5701
      - BINANCE_API_KEY=${BINANCE_API_KEY}
      - BINANCE_API_SECRET=${BINANCE_API_SECRET}
      - TRADING_MODE=paper  # paper = no real orders
    depends_on:
      - hazelcast
    restart: unless-stopped
```

For paper trading: use the Nautilus Backtest Engine or SimulatedExchange (no real orders). For live: swap to BinanceFuturesDataClient + BinanceFuturesExecutionClient.

### MIG4.3 — Zero-copy Arrow: HZ → Nautilus

**What to build**: Eigenvalue scan DataFrames passed from the Prefect scanner flow → HZ → Nautilus Actor using Apache Arrow IPC (zero-copy).
```python # Scanner writes Arrow record batch to HZ import pyarrow as pa import hazelcast schema = pa.schema([ ('symbol', pa.string()), ('vel_div', pa.float64()), ('lambda_max_w50', pa.float64()), ('lambda_max_w150', pa.float64()), ('instability', pa.float64()), ('timestamp', pa.int64()), ]) def write_scan_to_hz(df: pd.DataFrame, hz_map): table = pa.Table.from_pandas(df, schema=schema) sink = pa.BufferOutputStream() writer = pa.ipc.new_file(sink, table.schema) writer.write_table(table) writer.close() arrow_bytes = sink.getvalue().to_pybytes() hz_map.put('scan_arrow_latest', arrow_bytes) # Nautilus Actor reads Arrow from HZ def read_scan_from_hz(hz_map) -> pd.DataFrame: raw = hz_map.get('scan_arrow_latest') if raw is None: return None reader = pa.ipc.open_file(pa.py_buffer(raw)) return reader.read_all().to_pandas() ``` Test assertions (`ci/test_13_nautilus_integration.py`): ```python def test_dolphin_actor_initializes(): """DolphinActor can be constructed with NDAlphaEngine and HZ map.""" from nautilus_dolphin.nautilus.nautilus_actor import DolphinActor engine = build_test_engine() actor = DolphinActor(engine=engine, hz_features_map=MockHZMap(), config={}) assert actor is not None assert actor.engine is engine def test_arrow_hz_roundtrip(): """Scan DataFrame → Arrow IPC → HZ → Arrow IPC → DataFrame is lossless.""" import pandas as pd, numpy as np df = pd.DataFrame({ 'symbol': ['BTCUSDT', 'ETHUSDT'], 'vel_div': [-0.03, -0.01], 'lambda_max_w50': [1.2, 0.9], 'lambda_max_w150': [1.5, 1.0], }) hz = MockHZMap() write_scan_to_hz(df, hz) df2 = read_scan_from_hz(hz) pd.testing.assert_frame_equal(df, df2) def test_actor_respects_hibernate_posture(): """DolphinActor does not submit orders when posture=HIBERNATE.""" actor = DolphinActor(...) 
    actor._posture_override = 'HIBERNATE'
    signals = actor._run_engine_on_bar_batch()
    assert signals == [] or signals is None

def test_nautilus_paper_run_no_crash():
    """NautilusTrader BacktestEngine with DolphinActor runs 1 day without crash."""
    import pandas as pd
    from nautilus_trader.backtest.engine import BacktestEngine
    from nautilus_trader.config import BacktestEngineConfig
    engine = BacktestEngine(config=BacktestEngineConfig(trader_id="DOLPHIN-001"))
    actor = DolphinActor(...)
    engine.add_actor(actor)
    engine.run(start=pd.Timestamp('2026-01-15'), end=pd.Timestamp('2026-01-16'))
    # ASSERT: runs without exception
```

PASS criteria: 42 + 4 = 46 tests green. DolphinActor processes one backtest day without crash. Arrow IPC roundtrip is lossless. HIBERNATE posture prevents order submission.

### MIG4 GATE

Manual integration test:

```bash
# Start Nautilus actor in paper mode for one day
python -m nautilus_dolphin.nautilus.run_papertrade --date 2026-01-15 --posture APEX
# ASSERT: trades > 0 logged, no crashes, capital > 0 at end
# ASSERT: orders visible in Nautilus portfolio summary
```

Full CI gate:

```bash
bash ci/run_ci.sh
pytest ci/test_13_nautilus_integration.py -v
```

---

## MIG5 — LONG System Activation: Green Deployment

**Goal**: Activate bidirectional trading (SHORT + LONG) on the green deployment. Requires the LONG validation result from b79rt78uv to confirm PF > 1.05 on 795-day klines.

**Spec reference**: LAYER_BRINGUP_PLAN.md Layer 7, green.yml config.
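Before touching green.yml, the activation condition can be encoded as a pre-flight guard so the gate cannot be bypassed by hand-editing config. A minimal sketch, assuming the stated floors (PF > 1.05, WR > 42%, DD < 30%, trades > 100); the function name `check_long_activation` and the result-dict keys are illustrative, not an existing API:

```python
from typing import Optional

def check_long_activation(green_cfg: dict, long_result: Optional[dict]) -> bool:
    """Return True iff green_cfg may run with direction=bidirectional.

    green_cfg:   parsed green.yml
    long_result: recorded b79rt78uv validation metrics, or None if pending
    """
    if green_cfg.get('direction') != 'bidirectional':
        return True   # SHORT-only config needs no LONG validation
    if long_result is None:
        return False  # validation pending: stay SHORT-only
    return (long_result.get('pf', 0.0) > 1.05
            and long_result.get('wr', 0.0) > 0.42
            and long_result.get('max_dd', 1.0) < 0.30
            and long_result.get('trades', 0) > 100)

if __name__ == '__main__':
    assert check_long_activation({'direction': 'short_only'}, None)
    assert not check_long_activation({'direction': 'bidirectional'}, None)
    assert check_long_activation(
        {'direction': 'bidirectional'},
        {'pf': 1.11, 'wr': 0.44, 'max_dd': 0.22, 'trades': 250})
```

A guard like this could run at flow startup, forcing the deployment back to SHORT-only if validation evidence is missing.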
**Prerequisites**:
- [ ] b79rt78uv result: LONG PF > 1.05 on 795-day klines, WR > 42%
- [ ] Regime detector built: identifies when LONG conditions are active
- [ ] Capital arbiter: assigns SHORT_weight + LONG_weight per day (sum = 1.0)

### MIG5.1 — Validate LONG Result

When b79rt78uv completes, verify:

```python
# Expected assertions from test_pf_klines_2y_long.py:
assert long_pf > 1.05       # Minimum viable LONG
assert long_wr > 0.42       # 42% win rate minimum (matches prerequisite floor)
assert long_roi > 0.0       # Net positive over 795 days
assert long_max_dd < 0.30   # Drawdown bounded
assert long_trades > 100    # Sufficient sample size
```

If LONG fails (PF < 1.05): green.yml stays SHORT-only. Do not activate LONG. Research continues.

### MIG5.2 — Regime Arbiter

**What to build**: `capital_arbiter.py` — determines SHORT_weight vs LONG_weight each day based on regime state.

```python
class CapitalArbiter:
    def get_weights(self, date_str, features) -> dict:
        """
        Returns {'short': float, 'long': float} summing to 1.0.
        Based on: vel_div direction, BTC trend, ExF signals.
""" vel_div_mean = features.get('vel_div_mean', 0.0) btc_7bar_return = features.get('btc_7bar_return', 0.0) if vel_div_mean < -0.02 and btc_7bar_return < 0: # Strong structural breakdown — favor SHORT return {'short': 0.7, 'long': 0.3} elif vel_div_mean > 0.02 and btc_7bar_return > 0: # Strong recovery — favor LONG return {'short': 0.3, 'long': 0.7} else: # Neutral — equal weight return {'short': 0.5, 'long': 0.5} ``` ### MIG5.3 — green.yml and green deployment Update `prod/configs/green.yml`: ```yaml direction: bidirectional # was: short_only long_vel_div_threshold: 0.02 long_extreme_threshold: 0.05 capital_arbiter: equal_weight # or: regime_weighted ``` Register green deployment in Prefect: ```bash PREFECT_API_URL=http://localhost:4200/api \ python -c " from prod.paper_trade_flow import dolphin_paper_trade dolphin_paper_trade.to_deployment( name='dolphin-paper-green', cron='10 0 * * *', # 00:10 UTC (5 min after blue) parameters={'config': 'prod/configs/green.yml'}, ).apply() " ``` Test assertions (`ci/test_14_long_system.py`): ```python def test_long_system_requires_validation(): """green.yml direction=bidirectional is only set after LONG PF > 1.05.""" import yaml with open('prod/configs/green.yml') as f: cfg = yaml.safe_load(f) if cfg.get('direction') == 'bidirectional': # If bidirectional is set, LONG validation must have passed assert cfg.get('long_vel_div_threshold', 0) > 0, "LONG threshold not set" assert cfg.get('long_extreme_threshold', 0) > 0, "LONG extreme threshold not set" def test_capital_arbiter_weights_sum_to_one(): arb = CapitalArbiter() for scenario in [ {'vel_div_mean': -0.05, 'btc_7bar_return': -0.01}, {'vel_div_mean': +0.05, 'btc_7bar_return': +0.01}, {'vel_div_mean': 0.0, 'btc_7bar_return': 0.0}, ]: w = arb.get_weights('2026-01-15', scenario) assert abs(w['short'] + w['long'] - 1.0) < 1e-6 assert w['short'] > 0 and w['long'] > 0 def test_green_engine_fires_long_trades(): """Green deployment engine fires LONG trades on LONG signal days.""" # 
Use a scan date where vel_div > 0.02 (LONG signal)
    # ASSERT: engine produces trades with direction=+1
    ...
```

PASS criteria: 46 + 3 = 49 tests green. Green deployment running alongside blue. Capital arbiter weights sum to 1.0. Both SHORT and LONG trades logged.

---

## MIG6 — Hazelcast Jet: Reactive ACB Stream Processing

**Goal**: Replace the once-daily batch ACB preload with a reactive sub-day ACB that updates on each new scan bar. An HZ Jet pipeline processes the eigenvalue stream and updates ACB state atomically via an Entry Processor. Sub-day ACB enables adverse-turn exits within the trading day.

**Spec reference**: Sec III (Hazelcast Jet stream processing), Phase MIG6.

**Impact**: Per 55-day research, sub-day ACBv6 has +3-4% ROI potential. Currently not implemented in the ND engine path.

### MIG6.1 — Jet Pipeline Design

```
[ARB512 Scanner writes JSON]
  → [File watcher (Prefect sensor flow)]
  → [Publishes scan to HZ Jet Topic "dolphin.scan.bars"]
  → [Jet pipeline: eigenvalue processor]
  → [Computes vel_div, updates volatility, updates ACB boost]
  → [ACBBoostUpdateProcessor (Entry Processor) → DOLPHIN_FEATURES "acb_state"]
  → [Nautilus Actor reads updated ACB state via Near Cache]
```

### MIG6.2 — Scan File Watcher Prefect Flow

**File to create**: `prod/scan_watcher_flow.py`

```python
import glob
import time

from prefect import flow

@flow(name="scan-watcher")
def scan_watcher_flow():
    """Polls the eigenvalues dir for new scan files. Publishes to HZ Jet topic."""
    last_seen = set()
    while True:
        current = set(glob.glob(f"{SCANS_DIR}/*/*.json"))
        new_files = current - last_seen
        for f in sorted(new_files):
            publish_scan_to_jet(f)  # task
        last_seen = current
        time.sleep(5)
```

### MIG6.3 — Sub-day ACB Adverse-Turn Exits

When the ACB boost drops significantly (>0.2x reduction) within a day → signal a potential regime adverse turn. The engine checks for open SHORT positions and triggers an early exit (subject to OB).
```python
def on_acb_state_update(old_acb_state, new_acb_state, engine):
    """Called by Jet processor when ACB state updates."""
    boost_drop = old_acb_state['boost'] - new_acb_state['boost']
    if boost_drop > 0.2 and engine.has_open_positions():
        # Adverse turn signal: boost dropped significantly
        ob_quality = get_ob_quality()
        if ob_quality > 0.5:
            engine.request_orderly_exit()  # maker fill preferred
        else:
            engine.request_duress_exit()   # bypass OB wait, market order
```

Test assertions (`ci/test_15_jet_pipeline.py`):

```python
def test_jet_topic_publish():
    """Scan event published to the HZ topic is received by a subscriber."""
    import hazelcast, time
    c = hazelcast.HazelcastClient()
    topic = c.get_topic('dolphin.scan.bars').blocking()
    received = []
    # Python client API: Topic.add_listener(on_message); payload is .message
    topic.add_listener(lambda msg: received.append(msg.message))
    topic.publish({'vel_div': -0.03, 'timestamp': time.time()})
    time.sleep(0.1)
    assert len(received) == 1
    assert received[0]['vel_div'] == -0.03
    c.shutdown()

def test_acb_entry_processor_subday():
    """ACB Entry Processor updates boost atomically from the Jet pipeline."""
    # Simulate mid-day ACB update: boost drops from 1.3 to 0.9
    processor = ACBBoostUpdateProcessor(new_boost=0.9, date_str='2026-01-15')
    hz_map.execute_on_key('acb_state', processor)
    updated = json.loads(hz_map.get('acb_state'))
    assert updated['boost'] == 0.9

def test_adverse_turn_triggers_exit():
    """Boost drop >0.2x with open positions triggers an exit request."""
    engine = build_test_engine_with_open_position()
    old_state = {'boost': 1.3, 'beta': 0.7}
    new_state = {'boost': 1.0, 'beta': 0.5}
    on_acb_state_update(old_state, new_state, engine)
    assert engine.exit_requested, "Adverse turn should trigger exit"
```

PASS criteria: 49 + 3 = 52 tests green. Sub-day ACB updates on new scan files. Adverse-turn exit fires on a simulated boost drop. Jet pipeline end-to-end test with mock scanner.
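Note that `ACBBoostUpdateProcessor` has to be a Java-side Entry Processor deployed on the HZ members. Until that class exists, a client-side fallback can achieve the same atomic read-modify-write with the IMap compare-and-swap primitive (`replace_if_same` in the hazelcast-python-client). A minimal sketch against an in-memory stand-in; `DictMap` and `update_boost_cas` are illustrative names, not part of any existing module:

```python
import json

class DictMap:
    """In-memory stand-in exposing the three IMap calls the sketch needs."""
    def __init__(self):
        self._d = {}
    def get(self, k):
        return self._d.get(k)
    def put(self, k, v):
        self._d[k] = v
    def replace_if_same(self, k, old, new):
        if self._d.get(k) == old:
            self._d[k] = new
            return True
        return False

def update_boost_cas(hz_map, key: str, new_boost: float, retries: int = 10) -> bool:
    """Atomically set 'boost' in the JSON state under key via optimistic CAS."""
    for _ in range(retries):
        raw = hz_map.get(key)
        if raw is None:
            # First write; the real client would use put_if_absent for atomicity
            hz_map.put(key, json.dumps({'boost': new_boost}))
            return True
        state = json.loads(raw)
        state['boost'] = new_boost
        if hz_map.replace_if_same(key, raw, json.dumps(state)):
            return True  # no concurrent writer interfered
    return False  # lost the race `retries` times

m = DictMap()
m.put('acb_state', json.dumps({'boost': 1.3, 'beta': 0.7}))
update_boost_cas(m, 'acb_state', 0.9)
print(json.loads(m.get('acb_state')))  # boost now 0.9, beta preserved
```

The retry loop loses to the Entry Processor on hot keys (extra round-trips under contention), so it is a bridge, not the MIG6 end state.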
### MIG6 GATE

```bash
bash ci/run_ci.sh
pytest ci/test_15_jet_pipeline.py -v
# ASSERT: 52 tests green
```

Operational check: start the scanner, watch the HZ-MC topic dashboard, and verify scan events appear in the `dolphin.scan.bars` topic within 10s of each new JSON file.

---

## MIG7 — Multi-Asset Scaling: 50 → 400 Assets

**Goal**: Scale from 50 to 400 assets while maintaining performance. The current memory footprint limits scaling. Distribute the feature store across sharded HZ IMaps. Multi-market capability.

**Spec reference**: Phase MIG7, MEMORY.md ("PROVEN better: higher returns + signal fidelity in tests. Blocked by RAM — optimize memory footprint FIRST, then scale").

**Prerequisite (HARD)**: RAM optimization before scaling. Profile the current 50-asset memory footprint first.

### MIG7.1 — Memory Footprint Analysis

```bash
# Profile current memory usage
python -c "
import tracemalloc, sys
tracemalloc.start()
# ... run engine on 50 assets for 1 day ...
snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics('lineno')
for s in stats[:20]:
    print(s)
"
# ASSERT: identify top memory consumers
# TARGET: < 4GB for 50 assets (< 32GB for 400 assets)
```

Known memory hotspots (probable):
- `_price_histories`: rolling price buffer per asset × bar count
- VBT parquet cache: 55 days × 50 assets × ~5k bars each
- ACB: p60 threshold storage (per day, per asset)

### MIG7.2 — Sharded IMap for 400-Asset Feature Store

```python
import zlib

# Shard by asset group (10 shards × 40 assets each).
# Use a stable digest: Python's built-in hash() is salted per process
# (PYTHONHASHSEED), so it would map the same symbol to different shards
# in different workers.
def get_shard_map_name(symbol: str) -> str:
    shard = zlib.crc32(symbol.encode()) % 10
    return f"DOLPHIN_FEATURES_SHARD_{shard:02d}"

# Each shard has its own Near Cache
near_cache_config = {f"DOLPHIN_FEATURES_SHARD_{i:02d}": {...} for i in range(10)}
```

### MIG7.3 — Distributed Worker Pool

HZ IMDG + Prefect external workers on multiple machines (or Docker replicas):
- Worker 1: assets 0-99 (BTCUSDT group)
- Worker 2: assets 100-199
- Worker 3: assets 200-299
- Worker 4: assets 300-399

Capital arbiter aggregates signals from all workers
before order submission. Test assertions (`ci/test_16_scaling.py`): ```python def test_memory_footprint_50_assets(): """50-asset engine uses < 4GB RAM.""" import tracemalloc tracemalloc.start() run_engine_50_assets_1_day() _, peak = tracemalloc.get_traced_memory() assert peak < 4 * 1024**3, f"Memory too high: {peak/1024**3:.1f}GB" def test_sharded_imap_read_write(): """Feature store sharding: all 400 symbols writable and readable.""" c = hazelcast.HazelcastClient() for i, symbol in enumerate(all_400_symbols): map_name = get_shard_map_name(symbol) m = c.get_map(map_name).blocking() m.put(f"vel_div_{symbol}", -0.03) assert m.get(f"vel_div_{symbol}") == -0.03 c.shutdown() def test_400_asset_engine_no_crash(): """Engine processes 1 day with 400 assets without crash or OOM.""" engine = build_400_asset_engine() result = engine.process_day('2026-01-15', df_400_assets, ...) assert result['trades'] > 0 assert result['capital'] > 0 ``` PASS criteria: 52 + 3 = 55 tests green. 400-asset engine processes one day. Memory < 32GB (if available). Sharded IMap round-trip working. 
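One concrete lever on the `_price_histories` hotspot: preallocated float32 buffers instead of growing float64 structures halve the per-asset history cost while keeping ~7 significant digits, which is ample for price ratios. A back-of-envelope sketch; the 400 × 5000 shape is an assumption matching the scaling target and ~5k bars of history:

```python
import numpy as np

N_ASSETS, N_BARS = 400, 5000  # assumed scaling target and history depth

# Naive layout: one float64 row per asset
f64 = np.zeros((N_ASSETS, N_BARS), dtype=np.float64)
# Downcast layout: float32 halves the footprint
f32 = np.zeros((N_ASSETS, N_BARS), dtype=np.float32)

print(f"float64: {f64.nbytes / 1024**2:.0f} MiB")  # prints "float64: 15 MiB"
print(f"float32: {f32.nbytes / 1024**2:.0f} MiB")  # prints "float32: 8 MiB"
assert f32.nbytes == f64.nbytes // 2
```

Raw price buffers are a small slice of the 4GB budget, so most of the win has to come from the VBT parquet cache and per-day ACB storage; profiling (MIG7.1) decides where the tracemalloc top-20 actually points.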
--- ## CI Test Suite — Cumulative Summary | MIG Phase | New Tests | Cumulative Total | Key Assertion | |-----------|-----------|-----------------|---------------| | MIG0 (baseline) | 24 | 24 | CI gate green, infra healthy | | MIG1 (SITARA flows) | 8 | 32 | Capital persists, MC/ExF flows running | | MIG2 (HZ feature store) | 4 | 36 | Near Cache <1ms, live OB flowing | | MIG3 (survival stack) | 6 | 42 | Rm correct, postures fire, hysteresis holds | | MIG4 (Nautilus) | 4 | 46 | Actor initializes, HIBERNATE blocks orders | | MIG5 (LONG system) | 3 | 49 | LONG PF>1.05, arbiter weights sum=1 | | MIG6 (Jet reactive) | 3 | 52 | Jet topic live, Entry Processor atomic, adverse-turn fires | | MIG7 (scaling) | 3 | 55 | Memory <4GB/50-asset, shard read-write, 400-asset no crash | Full CI gate at each phase boundary: ```bash bash ci/run_ci.sh # original 24 always must pass pytest ci/ -v --ignore=ci/test_03_regression.py # fast suite pytest ci/test_03_regression.py # regression (slower, run before prod push only) ``` --- ## Regression Floors (Phase Gate Minima) These floors apply at EVERY phase gate. If a phase change causes any floor to be breached, STOP and investigate before proceeding. | Metric | Floor | Champion (current best) | Notes | |--------|-------|------------------------|-------| | PF (10-day VBT) | >= 1.08 | 1.123 | 55-day window: 1.123 | | WR (10-day VBT) | >= 42% | 49.3% | Champion WR | | ROI (10-day) | >= -5% | +44.89% (55d) | Any 10-day window >= -5% | | Trades (10-day) | >= 5 | ~380 (55d avg 7/day) | Not a dead system | | Max DD (55d) | < 20% | 14.95% | Don't exceed DD spec target | | Sharpe (55d) | > 1.5 | 2.50 | Don't regress below spec target | --- ## Open Items (Research Queue, Not Blocking MIG1-3) These are noted here so they don't fall through the cracks, but they MUST NOT block forward migration: 1. **TP sweep**: Apply 95bps to `test_pf_dynamic_beta_validate.py` ENGINE_KWARGS (still uses 0.0099). Low-risk, 10-min change. Do before next benchmark run. 2. 
**VOL gate EWMA**: 5-bar EWMA before p60 gate (smooths noisy vol_ok). Minor improvement, not a blocker. 3. **Sub-day ACB adverse-turn exits** (full implementation): Architecture documented in MEMORY.md Dynamic Exit Manager section. Prototype search in legacy standalone engine tests before building. 4. **Regime fragility sensing (Feb06-08 problem)**: HD Disentangled VAE on eigenvalue data + ExF conditioning. Long-term research. Does not block MIG1-4. 5. **MC-Forewarner live wiring verification**: Mechanical exit/reduce execution on RED/ORANGE (currently only affects sizing, not execution). Must verify real-money path before live trading. 6. **1m calibration sweep (b1ahez7tq)**: max_hold × abs_max_lev grid. When complete, update blue.yml if improvement found. 7. **EsoF multi-year backfill**: Needed for N>6 tail events. N=6 currently insufficient for production. Backfiller script exists but needs multi-year klines data. --- ## Operational Runbook — Standing Procedure ### Daily check (takes 2 min) ```bash # 1. Check Prefect UI for last run result open http://localhost:4200 # check DOLPHIN-PAPER-BLUE last run status # 2. Check HZ for today's P&L python -c " import hazelcast, json c = hazelcast.HazelcastClient() m = c.get_map('DOLPHIN_PNL_BLUE').blocking() keys = sorted(m.key_set()) if keys: print(json.loads(m.get(keys[-1]))) c.shutdown() " # 3. Check survival stack posture python -c " import hazelcast, json c = hazelcast.HazelcastClient() ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking() print(json.loads(ref.get() or '{}')) c.shutdown() " ``` ### Before any push to prod/blue or prod/green ```bash bash ci/run_ci.sh --fast # <60s, blocks push if fails (pre-push hook does this automatically) ``` ### Recovery from HIBERNATE posture ```bash # 1. Diagnose: which Cat is failing? python -c "from survival_stack import SurvivalStack; print(SurvivalStack().diagnose())" # 2. Fix the underlying issue (restart HZ if Cat1, wait for MC if Cat2, etc.) # 3. 
Survival stack auto-recovers after 5 consecutive checks above threshold # Or manual override (EMERGENCY ONLY): python -c " import hazelcast, json c = hazelcast.HazelcastClient() ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking() ref.set(json.dumps({'posture': 'APEX', 'Rm': 1.0, 'override': True})) c.shutdown() print('Manual override set to APEX') " ``` --- ## Quick-Reference Phase Summary | Phase | Deliverable | Duration est. | Functional system? | |-------|-------------|---------------|--------------------| | MIG0 | CI 24/24 green, infra verified | Done | YES (batch paper trading) | | MIG1 | State persistence + subsystem flows | 2-3 sessions | YES + capital compounds | | MIG2 | HZ feature store + live OB | 3-4 sessions | YES + real OB signal | | MIG3 | Survival stack + postures | 2-3 sessions | YES + graceful degradation | | MIG4 | Nautilus-Trader execution | 4-6 sessions | YES + Rust core | | MIG5 | LONG system (GREEN deployment) | 1-2 sessions | YES + bidirectional | | MIG6 | HZ Jet reactive ACB | 3-4 sessions | YES + sub-day ACB | | MIG7 | 400-asset scaling | 4-6 sessions | YES + full scale | **The system is always functional.** Every phase boundary = working system + passing CI. No dark periods.