DOLPHIN/prod/docs/PRODUCTION_BRINGUP_MASTER_PLAN.md

# DOLPHIN-NAUTILUS — Production Bringup Master Plan
# "From Batch Paper Trading to Hyper-Reactive Memory/Compute Layer Live Algo"
**Authored**: 2026-03-06
**Authority**: Synthesizes NAUTILUS-DOLPHIN Prod System Spec (17 pages), LAYER_BRINGUP_PLAN.md, BRINGUP_GUIDE.md, and full champion research state.
**Champion baseline** (supersedes spec targets): ROI=+44.89%, PF=1.123, DD=14.95%, Sharpe=2.50, WR=49.3% (55-day, abs_max_lev=6.0).
**Spec note**: PDF spec targets (ROI>35%, Sharpe>2.0) are PRE-latest research. Current champion is already superior. Those floors hold as CI regression gates only.
**Principle**: The system must be FUNCTIONAL at every phase boundary. Never leave a partially broken state. Each phase ends with a green CI gate.
**Deferred (later MIG steps)**: Linux RT kernel, DPDK kernel bypass, TLA+ formal spec, Rocq/Coq proofs. These are asymptotic perfection items, not blockers for live trading.
---
## DAL Reliability Mapping (DO-178C adaptation)
| DAL | Component | Failure consequence | Required gate |
|-----|-----------|--------------------|--------------------|
| A | Kill-switch, capital ledger | Total loss, uncontrolled exposure | Hardware + software interlock |
| B | MC-Forewarner, ACB v6 | Excessive drawdown (>20%) | CI regression + integration test |
| C | Alpha signal (vel_div, IRP) | Missed trades or false signals | Unit + smoke test |
| D | EsoF, DVOL environmental | Suboptimal sizing | Integration test (optional) |
| E | Backfill, dashboards | Observability loss only | Best-effort |
---
## Architecture Target (End State — Post MIG7)
```
[ARB512 Scanner] ──► eigenvalues/YYYY-MM-DD/*.json

[Prefect SITARA — orchestration layer]
  ├── ExF fetcher flow (macro data, daily)
  ├── EsoF calculator flow (daily)
  ├── MC-Forewarner flow (4-hourly)
  └── Watchdog flow (10s heartbeat)

[Hazelcast IMDG — hot feature store]
  ├── DOLPHIN_FEATURES IMap (per-asset, Near Cache)
  ├── DOLPHIN_STATE_BLUE/GREEN IMap (capital, drawdown)
  ├── DOLPHIN_SAFETY AtomicReference (posture, kill switch)
  └── ACB EntryProcessor (atomic boost update)

[Nautilus-Trader — execution core (Rust)]
  ├── NautilusActor ←→ NDAlphaEngine
  ├── AsyncDataEngine (bar subscription)
  └── Binance Futures adapter (live orders)

[Survival Stack — 5 categories × 4 postures]
  Cat1:Invariants → Cat2:Structural → Cat3:Micro
    → Cat4:Environmental → Cat5:CapitalStress
    → Rm multiplier → APEX/STALKER/TURTLE/HIBERNATE
```
---
## MIG0 — Current State Verification (Baseline Gate)
**Goal**: Confirm the existing batch paper trading system is fully operational and CI-clean before any migration work begins. Never build on a broken foundation.
**Current state**:
- Docker stack: Hazelcast 5.3 (port 5701), HZ-MC (port 8080), Prefect Server (port 4200)
- Prefect worker running on `dolphin` pool, deployment `dolphin-paper-blue` scheduled daily 00:05 UTC
- `paper_trade_flow.py` loads JSON scan files, computes vel_div, runs NDAlphaEngine SHORT-only
- Capital NOT persisted (restarts at 25k each day — KNOWN LIMITATION)
- OB = MockOBProvider (static 62% fill, -0.09 imbalance bias)
- No graceful degradation, no posture management
- CI: 5 layers, 24/24 tests passing
### MIG0 Verification Steps
**Step MIG0.1 — CI green**
```bash
cd "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict"
source "/c/Users/Lenovo/Documents/- Siloqy/Scripts/activate"
bash ci/run_ci.sh
```
PASS criteria:
- All 24 tests pass (layers 1-5)
- Exit code 0
- Layer 3 regression: PF >= 1.08, WR >= 42%, trades >= 5 on 10-day VBT window
**Step MIG0.2 — Infrastructure health**
```bash
docker compose -f prod/docker-compose.yml ps
# ASSERT: hazelcast, hz-mc, prefect-server all "running" (not "restarting")
curl -s http://localhost:4200/api/health | python -c "import sys,json; d=json.load(sys.stdin); assert d['status']=='ok', d"
# ASSERT: Prefect API healthy
python -c "import hazelcast; c=hazelcast.HazelcastClient(); c.shutdown(); print('HZ OK')"
# ASSERT: prints "HZ OK" with no exception
```
**Step MIG0.3 — Manual paper run**
```bash
source "/c/Users/Lenovo/Documents/- Siloqy/Scripts/activate"
PREFECT_API_URL=http://localhost:4200/api \
python prod/paper_trade_flow.py --date $(date +%Y-%m-%d) --config prod/configs/blue.yml
# ASSERT: prints "vel_div range=[...]", prints "total_trades=N" where N > 0
# ASSERT: HZ IMap DOLPHIN_PNL_BLUE contains today's entry
```
FAIL criteria for MIG0: Any CI failure, any container in restart loop, zero trades on valid scan date.
**MIG0 GATE**: CI 24/24 + all 3 infra checks green. Only then proceed to MIG1.
---
## MIG1 — Prefect SITARA: All Subsystems as Flows + State Persistence
**Goal**: Separate the "slow-thinking" (macro, orchestration) from the "fast-doing" (engine, execution). All support subsystems run as independent Prefect flows with retry logic. Capital persists across daily runs.
**Spec reference**: Sec IV (Prefect SITARA), "slow-thinking / fast-doing separation."
**Why now**: State persistence eliminates the #1 known limitation (restarts at 25k daily). Subsystem flows give observability + retry without coupling to the trading flow.
### MIG1.1 — Capital State Persistence (DAL-A)
**What to build**: At flow start, restore capital from HZ. At flow end, write capital + drawdown + session summary back to HZ. If HZ unavailable, fall back to local JSON ledger.
**File to modify**: `prod/paper_trade_flow.py`
Implementation pattern (add to flow body):
```python
# ---- Restore capital ----
STATE_KEY = f"state_{strategy_name}_{date_str}"
try:
raw = imap_state.get(STATE_KEY) or imap_state.get('latest') or '{}'
state = json.loads(raw)
if state.get('strategy') == strategy_name and state.get('capital', 0) > 0:
engine.capital = float(state['capital'])
engine.initial_capital = float(state['capital'])
logger.info(f"[STATE] Restored capital={engine.capital:.2f} from HZ")
except Exception as e:
logger.warning(f"[STATE] HZ restore failed: {e} — using config capital")
# ---- Persist capital at end ----
try:
new_state = {
'strategy': strategy_name, 'capital': engine.capital,
'date': date_str, 'pnl': day_result['pnl'], 'trades': day_result['trades'],
'peak_capital': max(engine.capital, state.get('peak_capital', engine.capital)),
'drawdown': 1.0 - engine.capital / max(engine.capital, state.get('peak_capital', engine.capital)),
}
imap_state.put('latest', json.dumps(new_state))
imap_state.put(STATE_KEY, json.dumps(new_state))
except Exception as e:
logger.error(f"[STATE] HZ persist failed: {e}")
# Fallback: write to local JSON ledger
ledger_path = Path(LOG_DIR) / f"state_ledger_{strategy_name}.jsonl"
with open(ledger_path, 'a') as f:
f.write(json.dumps(new_state) + '\n')
```
Test assertions (add to `ci/test_06_state_persistence.py`):
```python
def test_hz_state_roundtrip():
    """Capital persists to HZ and is readable back."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_STATE_BLUE').blocking()
    test_state = {'strategy': 'blue', 'capital': 27500.0, 'date': '2026-01-15', 'trades': 42}
    m.put('test_roundtrip', json.dumps(test_state))
    read_back = json.loads(m.get('test_roundtrip'))
    assert read_back['capital'] == 27500.0
    assert read_back['trades'] == 42
    m.remove('test_roundtrip')
    c.shutdown()

def test_capital_restoration_on_flow_start():
    """If HZ has prior state, engine.capital is set correctly."""
    # This tests the restore logic in isolation (mock HZ IMap)
    from unittest.mock import MagicMock
    import json
    stored = {'strategy': 'blue', 'capital': 28000.0}
    imap = MagicMock()
    imap.get = MagicMock(return_value=json.dumps(stored))
    # ... instantiate engine, run restore logic, assert engine.capital == 28000.0
    # (see ci/test_06_state_persistence.py for full implementation)
```
PASS criteria: Capital from day N is used as starting capital for day N+1. If HZ unavailable, local ledger file written. No crash if ledger file missing.
### MIG1.2 — ExF Fetcher Flow
**What to build**: A standalone Prefect flow `exf_fetcher_flow.py` that fetches all 14 ExF indicators (FRED, Deribit, F&G, etc.) and writes results to HZ IMap `DOLPHIN_FEATURES` under key `exf_latest`.
**File to create**: `prod/exf_fetcher_flow.py`
Key design points:
- Runs daily at 23:00 UTC (before paper trade at 00:05 UTC next day)
- Uses existing `external_factors/` modules
- Writes `{indicator_name: value, 'timestamp': iso_str, 'date': YYYY-MM-DD}` to HZ
- If fetch fails for any indicator: log warning, write `None` for that indicator, do NOT crash
- Separate task per indicator family (FRED, Deribit, F&G) for retry isolation
```python
from datetime import datetime, timezone

from prefect import flow

@flow(name="exf-fetcher")
def exf_fetcher_flow(date_str: str = None):
    date_str = date_str or datetime.now(timezone.utc).strftime('%Y-%m-%d')
    results = {}
    results.update(fetch_fred_indicators(date_str))  # task
    results.update(fetch_deribit_funding(date_str))  # task
    results.update(fetch_fear_and_greed(date_str))   # task
    write_exf_to_hz(date_str, results)               # task
    return results
```
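The failure-isolation rule above (log a warning, write `None`, never crash) can be sketched independently of Prefect; `safe_fetch` is a hypothetical helper, shown here only to pin down the intended semantics per indicator.

```python
import logging

logger = logging.getLogger("exf")

def safe_fetch(fetchers: dict) -> dict:
    """Run each indicator fetcher independently; a failure yields None
    for that indicator instead of aborting the whole flow."""
    results = {}
    for name, fn in fetchers.items():
        try:
            results[name] = fn()
        except Exception as e:
            logger.warning("[ExF] %s fetch failed: %s", name, e)
            results[name] = None
    return results
```

In the real flow each indicator family would additionally be a Prefect task with its own retries, so one flaky API retries without re-fetching the others.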
Test assertions (add to `ci/test_07_exf_flow.py`):
```python
import pytest

def test_exf_flow_runs_without_crash():
    """ExF flow completes even if some APIs fail (returns partial results)."""
    result = exf_fetcher_flow(date_str='2026-01-15')
    assert isinstance(result, dict)
    # Core FRED indicators that were working:
    #   claims, us10y, ycurve, stables, m2, hashrate, usdc, vol24
    # At least half should be present (some APIs may be down)
    non_none = sum(1 for v in result.values() if v is not None)
    assert non_none >= 4, f"Too many ExF indicators failed: {result}"

def test_exf_hz_write():
    """ExF results are readable from HZ after flow runs."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    val = m.get('exf_latest')
    if val is None:
        pytest.skip("ExF flow has not run yet")
    data = json.loads(val)
    assert 'timestamp' in data
    assert 'date' in data
    c.shutdown()
```
PASS criteria: Flow completes (exit 0) even with partial API failures. Results written to HZ. paper_trade_flow.py reads ExF from HZ (not from disk NPZ fallback) on next run.
### MIG1.3 — MC-Forewarner as Prefect Flow
**What to build**: Wrap `mc_forewarning_service.py` daemon as a Prefect flow that runs every 4 hours and writes its state to HZ IMap `DOLPHIN_FEATURES` key `mc_forewarner_latest`.
**File to create**: `prod/mc_forewarner_flow.py`
Key design:
- Schedule: `Cron("0 */4 * * *")` (every 4 hours)
- Runs DolphinForewarner with current champion params
- Writes `{'status': 'GREEN'|'ORANGE'|'RED', 'catastrophic_prob': float, 'envelope_score': float, 'timestamp': iso}` to HZ
- paper_trade_flow.py reads MC state from HZ (already does this via staleness check)
- Add staleness gate: if MC timestamp > 6 hours old, treat as ORANGE (structural degradation Cat 2)
Test assertions (add to `ci/test_08_mc_flow.py`):
```python
from datetime import datetime, timedelta, timezone

def test_mc_forewarner_flow_runs():
    """MC-Forewarner flow produces a valid status."""
    result = mc_forewarner_flow()
    assert result['status'] in ('GREEN', 'ORANGE', 'RED')
    assert 0.0 <= result['catastrophic_prob'] <= 1.0
    assert 'timestamp' in result

def test_mc_staleness_gate():
    """MC state older than 6 hours is treated as ORANGE, not GREEN."""
    stale_ts = (datetime.now(timezone.utc) - timedelta(hours=7)).isoformat()
    stale_state = {'status': 'GREEN', 'timestamp': stale_ts}
    effective = get_effective_mc_status(stale_state)
    assert effective == 'ORANGE', "Stale MC should degrade to ORANGE"
```
PASS criteria: MC flow runs on schedule, writes to HZ, paper_trade_flow.py correctly reads MC status. Staleness detected and status degraded to ORANGE after 6h.
### MIG1.4 — Watchdog Flow
**What to build**: A Prefect flow `watchdog_flow.py` that runs every 10 minutes (not 10 seconds — Windows Prefect scheduling granularity), checks all system components, and writes `DOLPHIN_SYSTEM_HEALTH` to HZ.
Checks performed:
- HZ cluster quorum (>= 1 node alive)
- Prefect worker responsive
- Scan data freshness (latest scan date <= 2 days ago)
- Paper log freshness (last JSONL entry <= 2 days old)
- Docker containers running
```python
@flow(name="watchdog")
def watchdog_flow():
    health = {
        'hz': check_hz_quorum(),          # task
        'prefect': check_prefect_api(),   # task
        'scans': check_scan_freshness(),  # task
        'logs': check_log_freshness(),    # task
    }
    # Compute overall before adding non-check metadata keys
    overall = 'GREEN' if all(v == 'OK' for v in health.values()) else 'DEGRADED'
    health['timestamp'] = datetime.now(timezone.utc).isoformat()
    health['overall'] = overall
    write_hz_health(health)  # task
    if overall == 'DEGRADED':
        logger.warning(f"[WATCHDOG] System degraded: {health}")
    return health
```
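One of the watchdog checks can be written as a pure function; the "<= 2 days" rule comes from the checklist above, while the function signature (a `YYYY-MM-DD` string plus an injectable `today` for testing) is an assumption.

```python
from datetime import date, datetime, timezone

def check_scan_freshness(latest_scan_date: str, today: date = None,
                         max_age_days: int = 2) -> str:
    """'OK' if the latest eigenvalue scan is at most max_age_days old,
    else 'STALE'. latest_scan_date is a YYYY-MM-DD directory name."""
    today = today or datetime.now(timezone.utc).date()
    scan = datetime.strptime(latest_scan_date, '%Y-%m-%d').date()
    return 'OK' if (today - scan).days <= max_age_days else 'STALE'
```

Keeping each check a string-returning pure function is what makes the `all(v == 'OK' ...)` aggregation in the flow body trivially correct.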
Test assertions (`ci/test_09_watchdog.py`):
```python
def test_watchdog_detects_all_ok():
    result = watchdog_flow()
    assert result['overall'] in ('GREEN', 'DEGRADED')
    assert 'timestamp' in result
    # At minimum, HZ and Prefect should be OK in test environment
    assert result['hz'] == 'OK'
    assert result['prefect'] == 'OK'

def test_watchdog_writes_to_hz():
    import hazelcast, json
    watchdog_flow()
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_SYSTEM_HEALTH').blocking()
    h = json.loads(m.get('latest'))
    assert h['overall'] in ('GREEN', 'DEGRADED')
    c.shutdown()
```
PASS criteria: Watchdog runs on schedule, writes health to HZ, operator can see system status at HZ-MC UI without reading logs.
### MIG1 GATE
All of the following must pass before MIG2:
```bash
bash ci/run_ci.sh # original 24 tests
pytest ci/test_06_state_persistence.py ci/test_07_exf_flow.py ci/test_08_mc_flow.py ci/test_09_watchdog.py -v
```
PASS criteria: 24 + 8 (new) = 32 tests green. Capital from prior day visible in HZ after manual paper run. MC-Forewarner status readable from HZ. Watchdog health GREEN.
---
## MIG2 — Hazelcast IMDG: Feature Store + Live OB + Entry Processors
**Goal**: Replace file-based feature passing with a sub-millisecond in-memory feature store. Enable atomic ACB state updates. Replace MockOBProvider with live Binance WebSocket OB data.
**Spec reference**: Sec III (Hazelcast IMDG — DOLPHIN_FEATURES, Near Cache, Jet, Entry Processors).
**Architecture**: "Engine Room" — hot feature state that the trading engine reads without network overhead via Near Cache. The engine reads features, not files.
### MIG2.1 — DOLPHIN_FEATURES IMap + Near Cache
**What to build**: Schema for the HZ feature store and Near Cache configuration.
IMap key schema:
```
DOLPHIN_FEATURES:
"exf_latest" → JSON dict: {indicator_name: value, timestamp, date}
"mc_forewarner_latest" → JSON dict: {status, catastrophic_prob, envelope_score, timestamp}
"acb_state" → JSON dict: {boost, beta, w750_threshold, p60, last_date}
"vol_regime" → JSON dict: {vol_p60: float, current_vol: float, regime_ok: bool, timestamp}
"asset_{SYMBOL}_ob" → JSON dict: {imbalance, fill_prob, depth_quality, agreement, timestamp}
"scan_latest" → JSON dict: {date, vel_div_mean, vel_div_min, asset_count, timestamp}
```
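Readers of these keys could guard against malformed payloads with a small validator before trusting them. This is a sketch: the per-key required-field lists are inferred from the schema above, and the function name is hypothetical.

```python
import json
from datetime import datetime

REQUIRED_BY_KEY = {
    'exf_latest': ('timestamp', 'date'),
    'mc_forewarner_latest': ('status', 'catastrophic_prob', 'timestamp'),
    'acb_state': ('boost', 'beta'),
    'vol_regime': ('vol_p60', 'current_vol', 'regime_ok', 'timestamp'),
}

def validate_feature(key: str, raw: str) -> bool:
    """True when the JSON payload for a DOLPHIN_FEATURES key carries the
    fields its readers expect; False on bad JSON, missing fields, or an
    unparsable timestamp."""
    try:
        data = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return False
    if any(f not in data for f in REQUIRED_BY_KEY.get(key, ())):
        return False
    if 'timestamp' in data:
        try:
            datetime.fromisoformat(str(data['timestamp']).replace('Z', '+00:00'))
        except ValueError:
            return False
    return True
```

A validator like this keeps "feature missing" and "feature garbled" on the same graceful-degradation path (fall back to neutral/default values) rather than crashing the engine read side.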
Near Cache configuration (add to HZ client init in paper_trade_flow.py):
```python
client = hazelcast.HazelcastClient(
    cluster_members=["localhost:5701"],
    near_caches={
        "DOLPHIN_FEATURES": {
            "invalidate_on_change": True,
            "time_to_live_seconds": 300,  # 5 min TTL
            "max_idle_seconds": 60,
            "eviction_policy": "LRU",
            "max_size": 5000,
        }
    },
)
```
Test assertions (`ci/test_10_hz_feature_store.py`):
```python
def test_near_cache_read_latency():
    """Near Cache reads complete in <1ms after first warm read."""
    import time, hazelcast, json
    c = hazelcast.HazelcastClient(near_caches={"DOLPHIN_FEATURES": {...}})
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    m.put('test_nc', json.dumps({'val': 42}))
    m.get('test_nc')  # warm the cache
    t0 = time.perf_counter()
    for _ in range(100):
        m.get('test_nc')
    elapsed_per_call = (time.perf_counter() - t0) / 100
    assert elapsed_per_call < 0.001, f"Near Cache too slow: {elapsed_per_call*1000:.2f}ms"
    c.shutdown()

def test_feature_store_schema():
    """All required keys are writable and readable in correct schema."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    for key in ['exf_latest', 'mc_forewarner_latest', 'acb_state', 'vol_regime']:
        m.put(f'test_{key}', json.dumps({'test': True, 'timestamp': '2026-01-01T00:00:00Z'}))
        val = m.get(f'test_{key}')
        assert val is not None
        m.remove(f'test_{key}')
    c.shutdown()
```
### MIG2.2 — ACB Entry Processor (Atomic State Update)
**What to build**: An Entry Processor that updates the ACB boost atomically in HZ without a full read-modify-write round trip. Critical for sub-day ACB updates when new scan bars arrive.
```python
# prod/hz_entry_processors.py
import json

import hazelcast

class ACBBoostUpdateProcessor(hazelcast.serialization.api.IdentifiedDataSerializable):
    """Atomically update ACB boost + beta in DOLPHIN_FEATURES without read-write round trip."""
    FACTORY_ID = 1
    CLASS_ID = 1

    def __init__(self, new_boost=None, new_beta=None, date_str=None):
        self.new_boost = new_boost
        self.new_beta = new_beta
        self.date_str = date_str

    def process(self, entry):
        current = json.loads(entry.value or '{}')
        if self.new_boost is not None:
            current['boost'] = self.new_boost
        if self.new_beta is not None:
            current['beta'] = self.new_beta
        current['last_updated'] = self.date_str
        entry.set_value(json.dumps(current))

    def write_data(self, object_data_output):
        object_data_output.write_float(self.new_boost or 0.0)
        object_data_output.write_float(self.new_beta or 0.0)
        object_data_output.write_string(self.date_str or '')

    def read_data(self, object_data_input):
        self.new_boost = object_data_input.read_float()
        self.new_beta = object_data_input.read_float()
        self.date_str = object_data_input.read_string()

    def get_factory_id(self):
        return self.FACTORY_ID

    def get_class_id(self):
        return self.CLASS_ID
```
Test assertions:
```python
def test_acb_entry_processor_atomic():
    """Entry processor updates ACB state without race condition."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    m.put('acb_state', json.dumps({'boost': 1.0, 'beta': 0.5}))
    processor = ACBBoostUpdateProcessor(new_boost=1.35, new_beta=0.7, date_str='2026-01-15')
    m.execute_on_key('acb_state', processor)
    result = json.loads(m.get('acb_state'))
    assert result['boost'] == 1.35
    assert result['beta'] == 0.7
    c.shutdown()
```
### MIG2.3 — Live OB: Replace MockOBProvider
**What to build**: Wire `ob_stream_service.py` WebSocket feed into paper_trade_flow.py to replace MockOBProvider. OB features written to HZ per-asset under `asset_{SYMBOL}_ob`.
**Implementation**:
1. `ob_stream_service.py` already verified live on Binance Futures WebSocket
2. Start OB service as a background thread or separate Prefect flow at run start
3. OB service writes per-asset OB snapshot to HZ every 5 seconds
4. `run_engine_day` task reads OB from HZ Near Cache instead of MockOBProvider
5. Graceful fallback: if asset OB data missing or stale (>30s), use neutral values (imbalance=0, fill_prob=0.5)
OB data schema in HZ:
```python
ob_snapshot = {
    'imbalance': float,      # (bid_vol - ask_vol) / (bid_vol + ask_vol), range [-1, 1]
    'fill_prob': float,      # maker fill probability, range [0, 1]
    'depth_quality': float,  # normalized depth, range [0, 1]
    'agreement': float,      # OB trend agreement, range [-1, 1]
    'timestamp': iso_str,    # when this snapshot was taken
    'stale': bool,           # True if >30s since last update
}
```
Test assertions (`ci/test_11_live_ob.py`):
```python
def test_ob_stream_connects():
    """OB stream service connects to Binance Futures WebSocket without error."""
    # Start OB service for BTCUSDT only, run for 10 seconds, check HZ for data
    import threading, time, hazelcast, json
    from ob_stream_service import OBStreamService
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    svc = OBStreamService(symbols=['BTCUSDT'], hz_map=m)
    t = threading.Thread(target=svc.start, daemon=True)
    t.start()
    time.sleep(10)
    svc.stop()
    val = m.get('asset_BTCUSDT_ob')
    assert val is not None, "OB data not written to HZ after 10s"
    data = json.loads(val)
    assert -1.0 <= data['imbalance'] <= 1.0
    assert 0.0 <= data['fill_prob'] <= 1.0
    c.shutdown()

def test_ob_stale_fallback():
    """Engine uses neutral OB values when OB data is stale."""
    # Inject stale OB snapshot, verify engine uses fallback (imbalance=0, fill_prob=0.5)
    ...
    assert ob_features.imbalance == 0.0
    assert ob_features.fill_prob == 0.5
```
PASS criteria: `|imbalance| < 0.3` on typical market conditions (confirmed in spec). Live OB replaces Mock. Expected result: 10-15% reduction in daily P&L variance (per 55-day OB validation: σ² reduced 15.35%).
### MIG2.4 — paper_trade_flow.py reads all features from HZ
**What to build**: Refactor paper_trade_flow.py so `run_engine_day` reads ExF, MC state, OB, vol regime all from `DOLPHIN_FEATURES` HZ IMap (via Near Cache) instead of computing or loading from disk.
```python
@task(persist_result=False)
def run_engine_day(date_str, scan_df, pt_cfg, strategy_name):
    client = hazelcast.HazelcastClient(near_caches={"DOLPHIN_FEATURES": {...}})
    features = client.get_map('DOLPHIN_FEATURES').blocking()
    # Read from HZ instead of computing inline
    mc_raw = features.get('mc_forewarner_latest')
    mc_status = json.loads(mc_raw)['status'] if mc_raw else 'GREEN'
    mc_status = get_effective_mc_status(mc_status, mc_raw)  # staleness check
    vol_raw = features.get('vol_regime')
    vol_ok = json.loads(vol_raw)['regime_ok'] if vol_raw else True
    # Pass OB provider backed by HZ
    ob_provider = HZOBProvider(features, staleness_threshold_sec=30)
    engine.set_ob_provider(ob_provider)
    ...
```
### MIG2 GATE
```bash
bash ci/run_ci.sh
pytest ci/test_10_hz_feature_store.py ci/test_11_live_ob.py -v
```
PASS criteria: 32 + 4 (new) = 36 tests green. Live OB data flowing to HZ. Engine reads all features from HZ. No MockOBProvider in paper_trade_flow.py. Capital persisted day-over-day (verify manually over 3 consecutive days).
---
## MIG3 — Survival Stack: Graceful Degradation (Control Theory)
**Goal**: Replace binary "up/down" thinking with a continuous, multiplicative risk controller. The system degrades gracefully under component failure rather than stopping or operating at full risk.
**Spec reference**: Sec VI (Control Theory Survival Stack), 5 categories, 4 postures, hysteresis.
**Design**: All 5 category multipliers multiply together to produce a final `Rm` (risk multiplier). Rm then maps to one of 4 operational postures. Hysteresis prevents rapid posture oscillation.
### MIG3.1 — 5-Category Risk Multiplier (Rm)
**File to create**: `nautilus_dolphin/nautilus_dolphin/nautilus/survival_stack.py`
```
Rm = Cat1 × Cat2 × Cat3 × Cat4 × Cat5

Cat1 — Invariants (binary kill, <10ms response):
  Input: HZ quorum status + Nautilus heartbeat
  Rule: if HZ_nodes < 1 OR heartbeat_age > 30s → Rm1 = 0.0 (HIBERNATE)
        else Rm1 = 1.0

Cat2 — Structural (MC-Forewarner staleness + status):
  Input: MC status (GREEN/ORANGE/RED) + timestamp age
  GREEN, fresh  → Rm2 = 1.0
  ORANGE, fresh → Rm2 = 0.5
  RED, fresh    → Rm2 = 0.1 (exits only)
  Any status, stale (>6h) → exponential decay applied on top of base
  Rule: Rm2 = base_rm2 × exp(-max(0, staleness_hours - 6) / 3.0)

Cat3 — Microstructure (OB jitter/depth):
  Input: OB depth_quality + fill_prob + imbalance stability
  OB stale (>30s)                                     → Rm3 = 0.5
  OB healthy (depth_quality > 0.7, fill_prob > 0.5)   → Rm3 = 1.0
  OB jittery (fill_prob < 0.3 or depth_quality < 0.3) → Rm3 = 0.3 (passive quoting only)
  In between: Rm3 = clip(0.3 + 0.7 × min(depth_quality, fill_prob), 0.3, 1.0)

Cat4 — Environmental (DVOL spike):
  Input: DVOL (Deribit BTC implied vol 30-day)
  Baseline DVOL (no spike) → Rm4 = 1.0
  DVOL spike detected (>2σ above 30-day mean) → Rm4 drops to 0.3 immediately (fast attack)
  Recovery: Rm4 recovers toward 1.0 over ~60 minutes (slow exponential, time constant ≈20 min)
  Rule: impulse-decay — Rm4 = 1.0 - 0.7 × exp(-t_since_spike_min / 20)
        (0.3 at spike, ≈0.97 after 60 min)

Cat5 — Capital Stress (sigmoid on drawdown):
  Input: current_drawdown = 1 - capital / peak_capital
  Rule: Rm5 = 1 / (1 + exp(20 × (drawdown - 0.12)))
  Effect: Rm5 ≈ 0.92 at DD=0, ≈0.80 at DD=5%, 0.50 at DD=12%, ≈0.17 at DD=20%
  No cliff — continuous degradation as DD increases

Final: Rm = Rm1 × Rm2 × Rm3 × Rm4 × Rm5
```
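The five category rules can be collapsed into one pure function. This is a sketch, not the production `SurvivalStack`: the signature mirrors the CI tests below, Cat3 is the piecewise rule (healthy book maps to 1.0), and Cat4 assumes a ~20-minute recovery time constant so it is back above 0.9 an hour after a spike.

```python
import math

def compute_rm(hz_nodes, heartbeat_age_s,
               mc_status, mc_staleness_hours,
               ob_depth_quality, ob_fill_prob, ob_stale,
               dvol_spike, t_since_spike_min,
               drawdown):
    """Multiplicative risk controller: Rm = Cat1*Cat2*Cat3*Cat4*Cat5."""
    # Cat1: invariants, binary kill on lost quorum or dead heartbeat
    cat1 = 0.0 if (hz_nodes < 1 or heartbeat_age_s > 30.0) else 1.0
    # Cat2: MC status base, exponential decay past 6h staleness
    base = {'GREEN': 1.0, 'ORANGE': 0.5, 'RED': 0.1}.get(mc_status, 0.5)
    cat2 = base * math.exp(-max(0.0, mc_staleness_hours - 6.0) / 3.0)
    # Cat3: microstructure, piecewise on OB quality, floored at 0.3
    if ob_stale:
        cat3 = 0.5
    elif ob_depth_quality > 0.7 and ob_fill_prob > 0.5:
        cat3 = 1.0
    elif ob_fill_prob < 0.3 or ob_depth_quality < 0.3:
        cat3 = 0.3
    else:
        cat3 = min(1.0, 0.3 + 0.7 * min(ob_depth_quality, ob_fill_prob))
    # Cat4: DVOL impulse-decay, drop to 0.3 then recover (tau ~= 20 min)
    cat4 = 1.0 - 0.7 * math.exp(-t_since_spike_min / 20.0) if dvol_spike else 1.0
    # Cat5: sigmoid on drawdown, midpoint at 12% DD
    cat5 = 1.0 / (1.0 + math.exp(20.0 * (drawdown - 0.12)))
    breakdown = {'Cat1': cat1, 'Cat2': cat2, 'Cat3': cat3, 'Cat4': cat4, 'Cat5': cat5}
    return cat1 * cat2 * cat3 * cat4 * cat5, breakdown
```

Note one consequence of the multiplicative design: even with every input healthy, the Cat5 sigmoid contributes ≈0.86 at 3% drawdown, so an all-green system sits just above the 0.85 APEX threshold rather than at exactly 1.0.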
### MIG3.2 — 4 Operational Postures + Hysteresis
```
Rm → Posture mapping (with hysteresis deadband):
  Rm >= 0.85 → APEX      (abs_max_lev=6x, aggressive, full signal)
  Rm >= 0.40 → STALKER   (abs_max_lev=2x, limit orders only)
  Rm >= 0.10 → TURTLE    (passive only: manage/exit existing positions, no new entries)
  Rm <  0.10 → HIBERNATE (all-stop: close all positions, no new signals)

Hysteresis (Schmitt trigger):
  To DOWNGRADE (e.g., APEX → STALKER): threshold crossed + sustained for 2 consecutive checks
  To UPGRADE   (e.g., STALKER → APEX): threshold exceeded + sustained for 5 consecutive checks
  Purpose: prevent rapid posture oscillation on a noisy Rm boundary

Rm written to HZ DOLPHIN_SAFETY AtomicReference:
  {'posture': 'APEX'|'STALKER'|'TURTLE'|'HIBERNATE', 'Rm': float, 'timestamp': iso,
   'breakdown': {'Cat1': float, 'Cat2': float, 'Cat3': float, 'Cat4': float, 'Cat5': float}}
```
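The Schmitt-trigger behavior can be prototyped as a small state machine; class name and counter mechanics here are assumptions (the production version might also bypass hysteresis entirely for the Rm=0 kill path).

```python
class PostureController:
    """A threshold crossing must persist for N consecutive checks before
    the posture changes: 2 checks to downgrade, 5 to upgrade."""
    THRESHOLDS = ((0.85, 'APEX'), (0.40, 'STALKER'), (0.10, 'TURTLE'))
    RANK = {'HIBERNATE': 0, 'TURTLE': 1, 'STALKER': 2, 'APEX': 3}

    def __init__(self, hysteresis_down=2, hysteresis_up=5, start='APEX'):
        self.posture = start
        self.down_n = hysteresis_down
        self.up_n = hysteresis_up
        self._pending = None
        self._count = 0

    def _raw(self, rm):
        for threshold, name in self.THRESHOLDS:
            if rm >= threshold:
                return name
        return 'HIBERNATE'

    def update(self, rm):
        raw = self._raw(rm)
        if raw == self.posture:
            # Back in band: cancel any pending transition
            self._pending, self._count = None, 0
            return self.posture
        if raw != self._pending:
            self._pending, self._count = raw, 0  # new candidate, restart count
        self._count += 1
        needed = self.down_n if self.RANK[raw] < self.RANK[self.posture] else self.up_n
        if self._count >= needed:
            self.posture = raw
            self._pending, self._count = None, 0
        return self.posture
```

Because an Rm that dips below a boundary for a single check resets on the next in-band reading, boundary noise never produces a posture flip; only a sustained move does.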
### MIG3.3 — Integration into paper_trade_flow.py
`run_engine_day` reads posture from HZ before any engine action:
```python
safety_ref = client.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking()
safety_state = json.loads(safety_ref.get() or '{}')
posture = safety_state.get('posture', 'APEX')
Rm = safety_state.get('Rm', 1.0)

if posture == 'HIBERNATE':
    logger.critical("[POSTURE] HIBERNATE — no trades today")
    return {'pnl': 0.0, 'trades': 0, 'posture': 'HIBERNATE'}

# Apply Rm to abs_max_leverage
effective_max_lev = pt_cfg['abs_max_leverage'] * Rm
engine.abs_max_leverage = max(1.0, effective_max_lev)

if posture == 'STALKER':
    engine.abs_max_leverage = min(engine.abs_max_leverage, 2.0)
elif posture == 'TURTLE':
    # No new entries — only manage existing positions
    engine.accept_new_entries = False
```
Test assertions (`ci/test_12_survival_stack.py`):
```python
def test_rm_calculation_all_green():
    """All-green conditions → Rm ≈ 0.86 (Cat5 sigmoid at DD=3%), posture = APEX."""
    ss = SurvivalStack(...)
    Rm, breakdown = ss.compute_rm(
        hz_nodes=1, heartbeat_age_s=1.0,
        mc_status='GREEN', mc_staleness_hours=0.5,
        ob_depth_quality=0.9, ob_fill_prob=0.8, ob_stale=False,
        dvol_spike=False, t_since_spike_min=999,
        drawdown=0.03,
    )
    assert Rm >= 0.85, f"Expected APEX-range Rm, got {Rm}"
    assert breakdown['Cat1'] == 1.0
    assert breakdown['Cat5'] >= 0.85

def test_rm_hz_down_triggers_hibernate():
    """HZ quorum=0 → Cat1=0 → Rm=0 → HIBERNATE."""
    ss = SurvivalStack(...)
    Rm, _ = ss.compute_rm(hz_nodes=0, ...)
    assert Rm == 0.0
    assert ss.get_posture(Rm) == 'HIBERNATE'

def test_rm_drawdown_sigmoid():
    """Drawdown 12% → Rm5 ≈ 0.5."""
    ss = SurvivalStack(...)
    Rm5 = ss._cat5_capital_stress(drawdown=0.12)
    assert 0.4 <= Rm5 <= 0.6, f"Sigmoid expected ~0.5 at DD=12%, got {Rm5}"

def test_rm_dvol_spike_impulse_decay():
    """DVOL spike → Cat4=0.3. After 60min → Cat4≈1.0."""
    ss = SurvivalStack(...)
    assert ss._cat4_dvol(dvol_spike=True, t_since_spike_min=0) == pytest.approx(0.3, abs=0.05)
    assert ss._cat4_dvol(dvol_spike=True, t_since_spike_min=60) >= 0.9

def test_hysteresis_prevents_oscillation():
    """Rm oscillating at boundary does not cause rapid posture flips."""
    ss = SurvivalStack(hysteresis_down=2, hysteresis_up=5)
    postures = []
    for Rm in [0.84, 0.86, 0.84, 0.86, 0.84]:  # oscillating around APEX/STALKER boundary
        postures.append(ss.update_posture(Rm))
    # Should NOT oscillate — hysteresis holds the prior posture
    assert len(set(postures)) == 1, f"Hysteresis failed — postures: {postures}"

def test_posture_written_to_hz():
    """Posture and Rm are written to HZ DOLPHIN_SAFETY AtomicReference."""
    import hazelcast, json
    ss = SurvivalStack(...)
    Rm, _ = ss.compute_rm(...)
    ss.write_to_hz(Rm)
    c = hazelcast.HazelcastClient()
    ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking()
    state = json.loads(ref.get())
    assert state['posture'] in ('APEX', 'STALKER', 'TURTLE', 'HIBERNATE')
    assert 0.0 <= state['Rm'] <= 1.0
    c.shutdown()
```
PASS criteria: 36 + 6 = 42 tests green. Survival stack integrates into paper_trade_flow.py. Manual test: kill Hazelcast container → HIBERNATE triggers → restart HZ → system recovers to APEX once the hysteresis-up threshold is met (5 consecutive green checks).
### MIG3 GATE
```bash
bash ci/run_ci.sh
pytest ci/test_12_survival_stack.py -v
```
Also verify manually:
- Simulate MC-Forewarner returning RED → STALKER posture, max_lev=2x
- Simulate drawdown 15% in ledger → Rm5 ≈ 0.35, posture degrades
- System recovers gracefully when conditions improve (hysteresis up threshold met)
---
## MIG4 — Nautilus-Trader Integration: Rust Execution Core
**Goal**: Replace the Python paper trading loop with Nautilus-Trader as the execution engine. NDAlphaEngine becomes a Nautilus Actor. Binance Futures orders routed through Nautilus adapter. This achieves true event-driven, sub-millisecond execution.
**Spec reference**: Sec V (Nautilus-Trader — Actor model, AsyncDataEngine, Rust networking, zero-copy Arrow).
**Why Nautilus**: Rust core, zero-copy Arrow data transport, proper Actor isolation, production-grade risk management. The Python engine (paper_trade_flow.py) was always a stepping stone.
### MIG4.1 — NautilusActor Wrapper
**Prereq**: `pip install nautilus_trader>=1.224` in Siloqy venv.
**File to create**: `nautilus_dolphin/nautilus_dolphin/nautilus/nautilus_actor.py`
Key design:
- NautilusActor wraps NDAlphaEngine
- Subscribes to bar data (5-second OHLCV bars for all 50 assets)
- On each bar: updates eigenvalue features from HZ Near Cache
- On each scan completion (5-minute window): calls engine.process_bar()
- Orders submitted via Nautilus OrderFactory → Binance Futures adapter
- Actor reads posture from HZ DOLPHIN_SAFETY before each order submission
```python
import json

from nautilus_trader.trading.actor import Actor
from nautilus_trader.model.data import Bar, BarType

class DolphinActor(Actor):
    def __init__(self, engine: NDAlphaEngine, hz_features_map, config):
        super().__init__(config)
        self.engine = engine
        self.hz = hz_features_map
        self._bar_buffer = {}  # symbol → list of bars

    def on_start(self):
        # Subscribe to 5s bars for all assets
        for symbol in self.engine.asset_columns:
            bar_type = BarType.from_str(f"{symbol}.BINANCE-5-SECOND-LAST-EXTERNAL")
            self.subscribe_bars(bar_type)

    def on_bar(self, bar: Bar):
        symbol = bar.bar_type.instrument_id.symbol.value
        self._bar_buffer.setdefault(symbol, []).append(bar)
        if self._should_process(bar):
            self._run_engine_on_bar_batch()

    def _run_engine_on_bar_batch(self):
        # Posture comes from HZ DOLPHIN_SAFETY, checked before every batch
        safety_raw = self.hz.get('DOLPHIN_SAFETY')
        safety = json.loads(safety_raw) if safety_raw else {}
        posture = safety.get('posture', 'APEX')
        if posture == 'HIBERNATE':
            return
        Rm = safety.get('Rm', 1.0)
        signals = self.engine.process_bar_batch(self._bar_buffer, Rm=Rm)
        for signal in signals:
            self._submit_order(signal, posture)

    def _submit_order(self, signal, posture):
        if posture == 'TURTLE':
            return  # No new entries in TURTLE
        if posture == 'STALKER':
            order = self.order_factory.limit(
                instrument_id=signal.instrument_id,
                order_side=signal.side,
                quantity=signal.quantity,
                price=signal.limit_price,
            )
        else:
            order = self.order_factory.market(
                instrument_id=signal.instrument_id,
                order_side=signal.side,
                quantity=signal.quantity,
            )
        self.submit_order(order)
```
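`_should_process` is referenced above but never defined. One plausible implementation gates on the 5-minute scan window using bar timestamps; this sketch assumes nanosecond `ts_event` values (as Nautilus bars carry) and fires exactly once when a bar rolls into a new window.

```python
class ScanWindowGate:
    """Track which 5-minute window incoming bar timestamps fall into and
    signal True exactly once per window rollover."""
    WINDOW_NS = 5 * 60 * 1_000_000_000  # 5 minutes in nanoseconds

    def __init__(self):
        self._current_window = None

    def should_process(self, ts_event_ns: int) -> bool:
        window = ts_event_ns // self.WINDOW_NS
        if self._current_window is None:
            self._current_window = window  # first bar only seeds the window
            return False
        if window != self._current_window:
            self._current_window = window  # rollover: process the closed batch
            return True
        return False
```

Inside the actor, `_should_process(bar)` would delegate to `gate.should_process(bar.ts_event)` and the batch handler would clear `_bar_buffer` after each engine run, so every scan window is processed exactly once.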
### MIG4.2 — Docker: Add Nautilus Container
**File to modify**: `prod/docker-compose.yml`
Add Nautilus-Trader container (or run as sidecar process):
```yaml
services:
  dolphin-actor:
    image: nautechsystems/nautilus_trader:latest
    volumes:
      - ../nautilus_dolphin:/app/nautilus_dolphin:ro
      - ../vbt_cache:/app/vbt_cache:ro
    environment:
      - HZ_CLUSTER=hazelcast:5701
      - BINANCE_API_KEY=${BINANCE_API_KEY}
      - BINANCE_API_SECRET=${BINANCE_API_SECRET}
      - TRADING_MODE=paper  # paper = no real orders
    depends_on:
      - hazelcast
    restart: unless-stopped
```
For paper trading: use Nautilus Backtest Engine or SimulatedExchange (no real orders). For live: swap to BinanceFuturesDataClient + BinanceFuturesExecutionClient.
### MIG4.3 — Zero-copy Arrow: HZ → Nautilus
**What to build**: Eigenvalue scan DataFrames passed from Prefect scanner flow → HZ → Nautilus Actor using Apache Arrow IPC (zero-copy).
```python
# Scanner writes Arrow record batch to HZ
import pyarrow as pa
import hazelcast
schema = pa.schema([
('symbol', pa.string()),
('vel_div', pa.float64()),
('lambda_max_w50', pa.float64()),
('lambda_max_w150', pa.float64()),
('instability', pa.float64()),
('timestamp', pa.int64()),
])
def write_scan_to_hz(df: pd.DataFrame, hz_map):
    table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
sink = pa.BufferOutputStream()
writer = pa.ipc.new_file(sink, table.schema)
writer.write_table(table)
writer.close()
arrow_bytes = sink.getvalue().to_pybytes()
hz_map.put('scan_arrow_latest', arrow_bytes)
# Nautilus Actor reads Arrow from HZ
def read_scan_from_hz(hz_map) -> pd.DataFrame:
raw = hz_map.get('scan_arrow_latest')
if raw is None:
return None
reader = pa.ipc.open_file(pa.py_buffer(raw))
return reader.read_all().to_pandas()
```
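The tests in this phase reference a `MockHZMap` that is not defined in this plan; a dict-backed stand-in covering the subset of the blocking IMap API the tests touch could look like:

```python
class MockHZMap:
    """In-memory stand-in for a Hazelcast IMap (blocking API subset used in tests)."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

    def key_set(self):
        return set(self._store)
```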
Test assertions (`ci/test_13_nautilus_integration.py`):
```python
def test_dolphin_actor_initializes():
"""DolphinActor can be constructed with NDAlphaEngine and HZ map."""
from nautilus_dolphin.nautilus.nautilus_actor import DolphinActor
engine = build_test_engine()
actor = DolphinActor(engine=engine, hz_features_map=MockHZMap(), config={})
assert actor is not None
assert actor.engine is engine
def test_arrow_hz_roundtrip():
"""Scan DataFrame → Arrow IPC → HZ → Arrow IPC → DataFrame is lossless."""
    import pandas as pd
    # df must carry every column declared in the Arrow schema
    df = pd.DataFrame({
        'symbol': ['BTCUSDT', 'ETHUSDT'],
        'vel_div': [-0.03, -0.01],
        'lambda_max_w50': [1.2, 0.9],
        'lambda_max_w150': [1.5, 1.0],
        'instability': [0.4, 0.2],
        'timestamp': [1700000000, 1700000005],
    })
hz = MockHZMap()
write_scan_to_hz(df, hz)
df2 = read_scan_from_hz(hz)
pd.testing.assert_frame_equal(df, df2)
def test_actor_respects_hibernate_posture():
"""DolphinActor does not submit orders when posture=HIBERNATE."""
actor = DolphinActor(...)
actor._posture_override = 'HIBERNATE'
signals = actor._run_engine_on_bar_batch()
assert signals == [] or signals is None
def test_nautilus_paper_run_no_crash():
"""NautilusTrader BacktestEngine with DolphinActor runs 1 day without crash."""
    from nautilus_trader.backtest.engine import BacktestEngine
    from nautilus_trader.config import BacktestEngineConfig
    engine = BacktestEngine(config=BacktestEngineConfig(trader_id="DOLPHIN-001"))
actor = DolphinActor(...)
engine.add_actor(actor)
engine.run(start=pd.Timestamp('2026-01-15'), end=pd.Timestamp('2026-01-16'))
# ASSERT: runs without exception
```
PASS criteria: 42 + 4 = 46 tests green. DolphinActor processes one backtest day without crash. Arrow IPC roundtrip lossless. HIBERNATE posture prevents order submission.
### MIG4 GATE
Manual integration test:
```bash
# Start Nautilus actor in paper mode for one day
python -m nautilus_dolphin.nautilus.run_papertrade --date 2026-01-15 --posture APEX
# ASSERT: trades > 0 logged, no crashes, capital > 0 at end
# ASSERT: orders visible in Nautilus portfolio summary
```
Full CI gate:
```bash
bash ci/run_ci.sh
pytest ci/test_13_nautilus_integration.py -v
```
---
## MIG5 — LONG System Activation: Green Deployment
**Goal**: Activate bidirectional trading (SHORT + LONG) on the green deployment. Requires LONG validation result from b79rt78uv to confirm PF > 1.05 on 795-day klines.
**Spec reference**: LAYER_BRINGUP_PLAN.md Layer 7, green.yml config.
**Prerequisites**:
- [ ] b79rt78uv result: LONG PF > 1.05 on 795-day klines, WR > 42%
- [ ] Regime detector built: identifies when LONG conditions are active
- [ ] Capital arbiter: assigns SHORT_weight + LONG_weight per day (sum = 1.0)
### MIG5.1 — Validate LONG Result
When b79rt78uv completes, verify:
```python
# Expected assertions from test_pf_klines_2y_long.py:
assert long_pf > 1.05 # Minimum viable LONG
assert long_wr > 0.40 # 40% win rate minimum
assert long_roi > 0.0 # Net positive over 795 days
assert long_max_dd < 0.30 # Drawdown bounded
assert long_trades > 100 # Sufficient sample size
```
If LONG fails (PF < 1.05): green.yml stays SHORT-only. Do not activate LONG. Research continues.
### MIG5.2 — Regime Arbiter
**What to build**: `capital_arbiter.py` — determines SHORT_weight vs LONG_weight each day based on regime state.
```python
class CapitalArbiter:
def get_weights(self, date_str, features) -> dict:
"""
Returns {'short': float, 'long': float} summing to 1.0.
Based on: vel_div direction, BTC trend, ExF signals.
"""
vel_div_mean = features.get('vel_div_mean', 0.0)
btc_7bar_return = features.get('btc_7bar_return', 0.0)
if vel_div_mean < -0.02 and btc_7bar_return < 0:
# Strong structural breakdown — favor SHORT
return {'short': 0.7, 'long': 0.3}
elif vel_div_mean > 0.02 and btc_7bar_return > 0:
# Strong recovery — favor LONG
return {'short': 0.3, 'long': 0.7}
else:
# Neutral — equal weight
return {'short': 0.5, 'long': 0.5}
```
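Downstream, the arbiter's weights translate directly into per-deployment capital. A sketch (the `split_capital` helper is hypothetical; the real allocation lives wherever the blue/green books are funded):

```python
def split_capital(total_capital: float, weights: dict) -> dict:
    """Allocate the day's capital between SHORT (blue) and LONG (green) books."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "arbiter weights must sum to 1.0"
    return {side: total_capital * w for side, w in weights.items()}

alloc = split_capital(10_000.0, {'short': 0.5, 'long': 0.5})
# alloc == {'short': 5000.0, 'long': 5000.0}
```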
### MIG5.3 — green.yml and green deployment
Update `prod/configs/green.yml`:
```yaml
direction: bidirectional # was: short_only
long_vel_div_threshold: 0.02
long_extreme_threshold: 0.05
capital_arbiter: equal_weight # or: regime_weighted
```
Register green deployment in Prefect:
```bash
PREFECT_API_URL=http://localhost:4200/api \
python -c "
from prod.paper_trade_flow import dolphin_paper_trade
dolphin_paper_trade.to_deployment(
name='dolphin-paper-green',
cron='10 0 * * *', # 00:10 UTC (5 min after blue)
parameters={'config': 'prod/configs/green.yml'},
).apply()
"
```
Test assertions (`ci/test_14_long_system.py`):
```python
def test_long_system_requires_validation():
"""green.yml direction=bidirectional is only set after LONG PF > 1.05."""
import yaml
with open('prod/configs/green.yml') as f:
cfg = yaml.safe_load(f)
if cfg.get('direction') == 'bidirectional':
# If bidirectional is set, LONG validation must have passed
assert cfg.get('long_vel_div_threshold', 0) > 0, "LONG threshold not set"
assert cfg.get('long_extreme_threshold', 0) > 0, "LONG extreme threshold not set"
def test_capital_arbiter_weights_sum_to_one():
arb = CapitalArbiter()
for scenario in [
{'vel_div_mean': -0.05, 'btc_7bar_return': -0.01},
{'vel_div_mean': +0.05, 'btc_7bar_return': +0.01},
{'vel_div_mean': 0.0, 'btc_7bar_return': 0.0},
]:
w = arb.get_weights('2026-01-15', scenario)
assert abs(w['short'] + w['long'] - 1.0) < 1e-6
assert w['short'] > 0 and w['long'] > 0
def test_green_engine_fires_long_trades():
"""Green deployment engine fires LONG trades on LONG signal days."""
# Use a scan date where vel_div > 0.02 (LONG signal)
# ASSERT: engine produces trades with direction=+1
...
```
PASS criteria: 46 + 3 = 49 tests green. Green deployment running alongside blue. Capital arbiter weights sum to 1.0. Both SHORT and LONG trades logged.
---
## MIG6 — Hazelcast Jet: Reactive ACB Stream Processing
**Goal**: Replace batch ACB preload (once daily) with reactive sub-day ACB that updates on each new scan bar. HZ Jet pipeline processes eigenvalue stream, updates ACB state atomically via Entry Processor. Sub-day ACB enables adverse-turn exits within the trading day.
**Spec reference**: Sec III (Hazelcast Jet stream processing), Phase MIG6.
**Impact**: Per 55-day research, sub-day ACBv6 has +3-4% ROI potential. Currently not implemented in ND engine path.
### MIG6.1 — Jet Pipeline Design
```
[ARB512 Scanner writes JSON]
→ [File watcher (Prefect sensor flow)]
→ [Publishes scan to HZ Jet Topic "dolphin.scan.bars"]
→ [Jet pipeline: eigenvalue processor]
→ [Computes vel_div, update volatility, update ACB boost]
→ [ACBBoostUpdateProcessor (Entry Processor) → DOLPHIN_FEATURES "acb_state"]
→ [Nautilus Actor reads updated ACB state via Near Cache]
```
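The Entry Processor in the diagram runs server-side. From the Python client, the same read-modify-write atomicity on `acb_state` can be approximated with an optimistic CAS loop over `replace_if_same` / `put_if_absent` (both exist on the client IMap). A sketch, with a minimal in-memory map standing in for the IMap (`_FakeMap` and `update_acb_boost_atomic` are illustrative names):

```python
import json

class _FakeMap:
    """Demo stand-in; a real deployment uses the Hazelcast IMap with the same contract."""

    def __init__(self):
        self._d = {}

    def get(self, key):
        return self._d.get(key)

    def put_if_absent(self, key, value):
        if key in self._d:
            return self._d[key]  # existing value -> insert failed
        self._d[key] = value
        return None  # None -> insert succeeded

    def replace_if_same(self, key, old, new):
        if self._d.get(key) == old:
            self._d[key] = new
            return True
        return False

def update_acb_boost_atomic(hz_map, key, new_boost, retries=10):
    """CAS loop: read -> modify 'boost' -> replace_if_same; retry on contention."""
    for _ in range(retries):
        raw = hz_map.get(key)
        state = json.loads(raw) if raw else {}
        new_raw = json.dumps(dict(state, boost=new_boost))
        if raw is None:
            if hz_map.put_if_absent(key, new_raw) is None:
                return True
        elif hz_map.replace_if_same(key, raw, new_raw):
            return True
    return False

m = _FakeMap()
m.put_if_absent('acb_state', json.dumps({'boost': 1.3, 'beta': 0.7}))
update_acb_boost_atomic(m, 'acb_state', 0.9)
# json.loads(m.get('acb_state'))['boost'] is now 0.9; 'beta' is untouched
```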
### MIG6.2 — Scan File Watcher Prefect Flow
**File to create**: `prod/scan_watcher_flow.py`
```python
@flow(name="scan-watcher")
def scan_watcher_flow():
"""Watches eigenvalues dir for new scan files. Publishes to HZ Jet topic."""
    import glob, time  # simple polling; a watchdog/inotify observer is an alternative
last_seen = set()
while True:
current = set(glob.glob(f"{SCANS_DIR}/*/*.json"))
new_files = current - last_seen
for f in sorted(new_files):
publish_scan_to_jet(f) # task
last_seen = current
time.sleep(5)
```
### MIG6.3 — Sub-day ACB Adverse-Turn Exits
When the ACB boost drops significantly (by more than 0.2) within a day, that signals a potential adverse regime turn. The engine checks for open SHORT positions and triggers an early exit (subject to OB quality).
```python
def on_acb_state_update(old_acb_state, new_acb_state, engine):
"""Called by Jet processor when ACB state updates."""
boost_drop = old_acb_state['boost'] - new_acb_state['boost']
if boost_drop > 0.2 and engine.has_open_positions():
# Adverse turn signal: boost dropped significantly
ob_quality = get_ob_quality()
if ob_quality > 0.5:
engine.request_orderly_exit() # maker fill preferred
else:
engine.request_duress_exit() # bypass OB wait, market order
```
Test assertions (`ci/test_15_jet_pipeline.py`):
```python
def test_jet_topic_publish():
"""Scan file published to HZ Jet topic is received by subscriber."""
import hazelcast, time
c = hazelcast.HazelcastClient()
topic = c.get_topic('dolphin.scan.bars').blocking()
received = []
topic.add_message_listener(lambda msg: received.append(msg.message_object))
topic.publish({'vel_div': -0.03, 'timestamp': time.time()})
time.sleep(0.1)
assert len(received) == 1
assert received[0]['vel_div'] == -0.03
c.shutdown()
def test_acb_entry_processor_subday():
"""ACB Entry Processor updates boost atomically from Jet pipeline."""
# Simulate mid-day ACB update: boost drops from 1.3 to 0.9
processor = ACBBoostUpdateProcessor(new_boost=0.9, date_str='2026-01-15')
hz_map.execute_on_key('acb_state', processor)
updated = json.loads(hz_map.get('acb_state'))
assert updated['boost'] == 0.9
def test_adverse_turn_triggers_exit():
"""Boost drop >0.2x with open positions triggers exit request."""
engine = build_test_engine_with_open_position()
old_state = {'boost': 1.3, 'beta': 0.7}
new_state = {'boost': 1.0, 'beta': 0.5}
on_acb_state_update(old_state, new_state, engine)
assert engine.exit_requested, "Adverse turn should trigger exit"
```
PASS criteria: 49 + 3 = 52 tests green. Sub-day ACB updating on new scan files. Adverse-turn exit fires on simulated boost drop. Jet pipeline end-to-end test with mock scanner.
### MIG6 GATE
```bash
bash ci/run_ci.sh
pytest ci/test_15_jet_pipeline.py -v
# ASSERT: 52 tests green
```
Operational check: Start scanner, watch HZ-MC topic dashboard, verify scan events appear in `dolphin.scan.bars` topic within 10s of each new JSON file.
---
## MIG7 — Multi-Asset Scaling: 50 → 400 Assets
**Goal**: Scale from 50 to 400 assets while maintaining performance. Current memory footprint limits scaling. Distribute feature store across sharded HZ IMap. Multi-market capability.
**Spec reference**: Phase MIG7, MEMORY.md ("PROVEN better: higher returns + signal fidelity in tests. Blocked by RAM — optimize memory footprint FIRST, then scale").
**Prerequisite (HARD)**: RAM optimization before scaling. Profile current 50-asset memory footprint first.
### MIG7.1 — Memory Footprint Analysis
```bash
# Profile current memory usage
python -c "
import tracemalloc, sys
tracemalloc.start()
# ... run engine on 50 assets for 1 day ...
snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics('lineno')
for s in stats[:20]:
print(s)
"
# ASSERT: identify top memory consumers
# TARGET: < 4GB for 50 assets (< 32GB for 400 assets)
```
Known memory hotspots (probable):
- `_price_histories`: rolling price buffer per asset × bar count
- VBT parquet cache: 55 days × 50 assets × ~5k bars each
- ACB: p60 threshold storage (per day, per asset)
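For the `_price_histories` hotspot, one low-risk mitigation is a hard cap on the rolling buffer, since no feature window needs unbounded history. A sketch (the class name and the `MAX_BARS` value are assumptions; the real buffer lives in the ND engine):

```python
from collections import deque

MAX_BARS = 300  # assumption: longest eigenvalue window (w150) needs well under this

class BoundedPriceHistories:
    """Per-asset rolling price buffer with a fixed memory ceiling."""

    def __init__(self, max_bars: int = MAX_BARS):
        self._hist = {}
        self._max_bars = max_bars

    def append(self, symbol: str, price: float) -> None:
        # deque(maxlen=N) silently evicts the oldest bar at capacity
        self._hist.setdefault(symbol, deque(maxlen=self._max_bars)).append(price)

    def window(self, symbol: str, n: int) -> list:
        return list(self._hist.get(symbol, ()))[-n:]
```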
### MIG7.2 — Sharded IMap for 400-Asset Feature Store
```python
# Shard by asset group (10 shards × 40 assets each)
import zlib

def get_shard_map_name(symbol: str) -> str:
    # crc32, not builtin hash(): hash() is salted per process (PYTHONHASHSEED),
    # so it would assign the same symbol to different shards on different workers
    shard = zlib.crc32(symbol.encode()) % 10
    return f"DOLPHIN_FEATURES_SHARD_{shard:02d}"
# Each shard has its own Near Cache
near_cache_config = {f"DOLPHIN_FEATURES_SHARD_{i:02d}": {...} for i in range(10)}
```
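Whatever hash is used, it is worth checking shard balance up front, since a skewed shard concentrates Near Cache and partition pressure. A quick check (the symbol list is synthetic; `crc32` is shown because the builtin `hash()` is salted per process):

```python
import zlib
from collections import Counter

def shard_of(symbol: str, n_shards: int = 10) -> int:
    # crc32 is deterministic across processes, unlike builtin hash()
    return zlib.crc32(symbol.encode()) % n_shards

symbols = [f"ASSET{i:03d}USDT" for i in range(400)]  # synthetic universe
counts = Counter(shard_of(s) for s in symbols)
# Expect roughly 40 symbols per shard; large skew means a different hash is needed
```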
### MIG7.3 — Distributed Worker Pool
HZ IMDG + Prefect external workers on multiple machines (or Docker replicas):
- Worker 1: assets 0-99 (BTCUSDT group)
- Worker 2: assets 100-199
- Worker 3: assets 200-299
- Worker 4: assets 300-399
Capital arbiter aggregates signals from all workers before order submission.
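The aggregation step can be a pure merge-and-rank ahead of the order path. A sketch (the signal-dict shape and the `top_k` cap are assumptions):

```python
def aggregate_worker_signals(per_worker: list, top_k: int = 20) -> list:
    """Merge per-worker signal lists and keep the strongest |vel_div| entries."""
    merged = [sig for worker in per_worker for sig in worker]
    merged.sort(key=lambda s: abs(s.get('vel_div', 0.0)), reverse=True)
    return merged[:top_k]

best = aggregate_worker_signals(
    [[{'symbol': 'AAA', 'vel_div': -0.05}], [{'symbol': 'BBB', 'vel_div': -0.01}]],
    top_k=1,
)
# best == [{'symbol': 'AAA', 'vel_div': -0.05}]
```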
Test assertions (`ci/test_16_scaling.py`):
```python
def test_memory_footprint_50_assets():
"""50-asset engine uses < 4GB RAM."""
import tracemalloc
tracemalloc.start()
run_engine_50_assets_1_day()
_, peak = tracemalloc.get_traced_memory()
assert peak < 4 * 1024**3, f"Memory too high: {peak/1024**3:.1f}GB"
def test_sharded_imap_read_write():
"""Feature store sharding: all 400 symbols writable and readable."""
c = hazelcast.HazelcastClient()
for i, symbol in enumerate(all_400_symbols):
map_name = get_shard_map_name(symbol)
m = c.get_map(map_name).blocking()
m.put(f"vel_div_{symbol}", -0.03)
assert m.get(f"vel_div_{symbol}") == -0.03
c.shutdown()
def test_400_asset_engine_no_crash():
"""Engine processes 1 day with 400 assets without crash or OOM."""
engine = build_400_asset_engine()
result = engine.process_day('2026-01-15', df_400_assets, ...)
assert result['trades'] > 0
assert result['capital'] > 0
```
PASS criteria: 52 + 3 = 55 tests green. 400-asset engine processes one day. Memory < 32GB (if available). Sharded IMap round-trip working.
---
## CI Test Suite — Cumulative Summary
| MIG Phase | New Tests | Cumulative Total | Key Assertion |
|-----------|-----------|-----------------|---------------|
| MIG0 (baseline) | 24 | 24 | CI gate green, infra healthy |
| MIG1 (SITARA flows) | 8 | 32 | Capital persists, MC/ExF flows running |
| MIG2 (HZ feature store) | 4 | 36 | Near Cache <1ms, live OB flowing |
| MIG3 (survival stack) | 6 | 42 | Rm correct, postures fire, hysteresis holds |
| MIG4 (Nautilus) | 4 | 46 | Actor initializes, HIBERNATE blocks orders |
| MIG5 (LONG system) | 3 | 49 | LONG PF>1.05, arbiter weights sum=1 |
| MIG6 (Jet reactive) | 3 | 52 | Jet topic live, Entry Processor atomic, adverse-turn fires |
| MIG7 (scaling) | 3 | 55 | Memory <4GB/50-asset, shard read-write, 400-asset no crash |
Full CI gate at each phase boundary:
```bash
bash ci/run_ci.sh # original 24 always must pass
pytest ci/ -v --ignore=ci/test_03_regression.py # fast suite
pytest ci/test_03_regression.py # regression (slower, run before prod push only)
```
---
## Regression Floors (Phase Gate Minima)
These floors apply at EVERY phase gate. If a phase change causes any floor to be breached, STOP and investigate before proceeding.
| Metric | Floor | Champion (current best) | Notes |
|--------|-------|------------------------|-------|
| PF (10-day VBT) | >= 1.08 | 1.123 | 55-day window: 1.123 |
| WR (10-day VBT) | >= 42% | 49.3% | Champion WR |
| ROI (10-day) | >= -5% | +44.89% (55d) | Any 10-day window >= -5% |
| Trades (10-day) | >= 5 | ~380 (55d avg 7/day) | Not a dead system |
| Max DD (55d) | < 20% | 14.95% | Don't exceed DD spec target |
| Sharpe (55d) | > 1.5 | 2.50 | Don't regress below spec target |
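The table translates mechanically into a CI guard. A sketch (the metric key names are assumptions about the benchmark result dict):

```python
# Floors from the table above; (op, value) pairs keyed by assumed metric names.
FLOORS = {
    'pf_10d':     ('>=', 1.08),
    'wr_10d':     ('>=', 0.42),
    'roi_10d':    ('>=', -0.05),
    'trades_10d': ('>=', 5),
    'max_dd_55d': ('<', 0.20),
    'sharpe_55d': ('>', 1.5),
}

def breached_floors(metrics: dict) -> list:
    """Return the names of any floors the given metrics breach (empty = gate passes)."""
    ops = {'>=': lambda v, f: v >= f, '<': lambda v, f: v < f, '>': lambda v, f: v > f}
    return [name for name, (op, floor) in FLOORS.items()
            if not ops[op](metrics[name], floor)]
```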
---
## Open Items (Research Queue, Not Blocking MIG1-3)
These are noted here so they don't fall through the cracks, but they MUST NOT block forward migration:
1. **TP sweep**: Apply 95bps to `test_pf_dynamic_beta_validate.py` ENGINE_KWARGS (still uses 0.0099). Low-risk, 10-min change. Do before next benchmark run.
2. **VOL gate EWMA**: 5-bar EWMA before p60 gate (smooths noisy vol_ok). Minor improvement, not a blocker.
3. **Sub-day ACB adverse-turn exits** (full implementation): Architecture documented in MEMORY.md Dynamic Exit Manager section. Prototype search in legacy standalone engine tests before building.
4. **Regime fragility sensing (Feb06-08 problem)**: HD Disentangled VAE on eigenvalue data + ExF conditioning. Long-term research. Does not block MIG1-4.
5. **MC-Forewarner live wiring verification**: Mechanical exit/reduce execution on RED/ORANGE (currently only affects sizing, not execution). Must verify real-money path before live trading.
6. **1m calibration sweep (b1ahez7tq)**: max_hold × abs_max_lev grid. When complete, update blue.yml if improvement found.
7. **EsoF multi-year backfill**: Needed for N>6 tail events. N=6 currently insufficient for production. Backfiller script exists but needs multi-year klines data.
---
## Operational Runbook — Standing Procedure
### Daily check (takes 2 min)
```bash
# 1. Check Prefect UI for last run result
open http://localhost:4200 # check DOLPHIN-PAPER-BLUE last run status
# 2. Check HZ for today's P&L
python -c "
import hazelcast, json
c = hazelcast.HazelcastClient()
m = c.get_map('DOLPHIN_PNL_BLUE').blocking()
keys = sorted(m.key_set())
if keys:
print(json.loads(m.get(keys[-1])))
c.shutdown()
"
# 3. Check survival stack posture
python -c "
import hazelcast, json
c = hazelcast.HazelcastClient()
ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking()
print(json.loads(ref.get() or '{}'))
c.shutdown()
"
```
### Before any push to prod/blue or prod/green
```bash
bash ci/run_ci.sh --fast # <60s, blocks push if fails (pre-push hook does this automatically)
```
### Recovery from HIBERNATE posture
```bash
# 1. Diagnose: which Cat is failing?
python -c "from survival_stack import SurvivalStack; print(SurvivalStack().diagnose())"
# 2. Fix the underlying issue (restart HZ if Cat1, wait for MC if Cat2, etc.)
# 3. Survival stack auto-recovers after 5 consecutive checks above threshold
# Or manual override (EMERGENCY ONLY):
python -c "
import hazelcast, json
c = hazelcast.HazelcastClient()
ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking()
ref.set(json.dumps({'posture': 'APEX', 'Rm': 1.0, 'override': True}))
c.shutdown()
print('Manual override set to APEX')
"
```
---
## Quick-Reference Phase Summary
| Phase | Deliverable | Duration est. | Functional system? |
|-------|-------------|---------------|--------------------|
| MIG0 | CI 24/24 green, infra verified | Done | YES (batch paper trading) |
| MIG1 | State persistence + subsystem flows | 2-3 sessions | YES + capital compounds |
| MIG2 | HZ feature store + live OB | 3-4 sessions | YES + real OB signal |
| MIG3 | Survival stack + postures | 2-3 sessions | YES + graceful degradation |
| MIG4 | Nautilus-Trader execution | 4-6 sessions | YES + Rust core |
| MIG5 | LONG system (GREEN deployment) | 1-2 sessions | YES + bidirectional |
| MIG6 | HZ Jet reactive ACB | 3-4 sessions | YES + sub-day ACB |
| MIG7 | 400-asset scaling | 4-6 sessions | YES + full scale |
**The system is always functional.** Every phase boundary = working system + passing CI. No dark periods.