# DOLPHIN-NAUTILUS — Production Bringup Master Plan

# "From Batch Paper Trading to Hyper-Reactive Memory/Compute Layer Live Algo"

**Authored**: 2026-03-06

**Authority**: Synthesizes the NAUTILUS-DOLPHIN Prod System Spec (17 pages), LAYER_BRINGUP_PLAN.md, BRINGUP_GUIDE.md, and the full champion research state.

**Champion baseline** (supersedes spec targets): ROI=+44.89%, PF=1.123, DD=14.95%, Sharpe=2.50, WR=49.3% (55-day, abs_max_lev=6.0).

**Spec note**: The PDF spec targets (ROI>35%, Sharpe>2.0) predate the latest research; the current champion already exceeds them. Those floors are retained only as CI regression gates.

**Principle**: The system must be FUNCTIONAL at every phase boundary. Never leave a partially broken state. Each phase ends with a green CI gate.

**Deferred (later MIG steps)**: Linux RT kernel, DPDK kernel bypass, TLA+ formal spec, Rocq/Coq proofs. These are asymptotic-perfection items, not blockers for live trading.

---

## DAL Reliability Mapping (DO-178C adaptation)

| DAL | Component | Failure consequence | Required gate |
|-----|-----------|---------------------|---------------|
| A | Kill-switch, capital ledger | Total loss, uncontrolled exposure | Hardware + software interlock |
| B | MC-Forewarner, ACB v6 | Excessive drawdown (>20%) | CI regression + integration test |
| C | Alpha signal (vel_div, IRP) | Missed trades or false signals | Unit + smoke test |
| D | EsoF, DVOL environmental | Suboptimal sizing | Integration test (optional) |
| E | Backfill, dashboards | Observability loss only | Best-effort |

---
## Architecture Target (End State — Post MIG7)

```
[ARB512 Scanner] ──► eigenvalues/YYYY-MM-DD/*.json
        │
[Prefect SITARA — orchestration layer]
        ├── ExF fetcher flow (macro data, daily)
        ├── EsoF calculator flow (daily)
        ├── MC-Forewarner flow (4-hourly)
        └── Watchdog flow (10s heartbeat)
        │
[Hazelcast IMDG — hot feature store]
        ├── DOLPHIN_FEATURES IMap (per-asset, Near Cache)
        ├── DOLPHIN_STATE_BLUE/GREEN IMap (capital, drawdown)
        ├── DOLPHIN_SAFETY AtomicReference (posture, kill switch)
        └── ACB EntryProcessor (atomic boost update)
        │
[Nautilus-Trader — execution core (Rust)]
        ├── NautilusActor ←→ NDAlphaEngine
        ├── AsyncDataEngine (bar subscription)
        └── Binance Futures adapter (live orders)
        │
[Survival Stack — 5 categories × 4 postures]
        Cat1:Invariants → Cat2:Structural → Cat3:Micro
        → Cat4:Environmental → Cat5:CapitalStress
        → Rm multiplier → APEX/STALKER/TURTLE/HIBERNATE
```

---
## MIG0 — Current State Verification (Baseline Gate)

**Goal**: Confirm the existing batch paper trading system is fully operational and CI-clean before any migration work begins. Never build on a broken foundation.

**Current state**:

- Docker stack: Hazelcast 5.3 (port 5701), HZ-MC (port 8080), Prefect Server (port 4200)
- Prefect worker running on `dolphin` pool, deployment `dolphin-paper-blue` scheduled daily 00:05 UTC
- `paper_trade_flow.py` loads JSON scan files, computes vel_div, runs NDAlphaEngine SHORT-only
- Capital NOT persisted (restarts at 25k each day — KNOWN LIMITATION)
- OB = MockOBProvider (static 62% fill, -0.09 imbalance bias)
- No graceful degradation, no posture management
- CI: 5 layers, 24/24 tests passing

### MIG0 Verification Steps
**Step MIG0.1 — CI green**

```bash
cd "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict"
source "/c/Users/Lenovo/Documents/- Siloqy/Scripts/activate"
bash ci/run_ci.sh
```

PASS criteria:

- All 24 tests pass (layers 1-5)
- Exit code 0
- Layer 3 regression: PF >= 1.08, WR >= 42%, trades >= 5 on 10-day VBT window

**Step MIG0.2 — Infrastructure health**

```bash
docker compose -f prod/docker-compose.yml ps
# ASSERT: hazelcast, hz-mc, prefect-server all "running" (not "restarting")

curl -s http://localhost:4200/api/health | python -c "import sys,json; d=json.load(sys.stdin); assert d['status']=='ok', d"
# ASSERT: Prefect API healthy

python -c "import hazelcast; c=hazelcast.HazelcastClient(); c.shutdown(); print('HZ OK')"
# ASSERT: prints "HZ OK" with no exception
```

**Step MIG0.3 — Manual paper run**

```bash
source "/c/Users/Lenovo/Documents/- Siloqy/Scripts/activate"
PREFECT_API_URL=http://localhost:4200/api \
python prod/paper_trade_flow.py --date $(date +%Y-%m-%d) --config prod/configs/blue.yml
# ASSERT: prints "vel_div range=[...]", prints "total_trades=N" where N > 0
# ASSERT: HZ IMap DOLPHIN_PNL_BLUE contains today's entry
```

FAIL criteria for MIG0: Any CI failure, any container in a restart loop, zero trades on a valid scan date.

**MIG0 GATE**: CI 24/24 + all 3 infra checks green. Only then proceed to MIG1.

---
## MIG1 — Prefect SITARA: All Subsystems as Flows + State Persistence

**Goal**: Separate the "slow-thinking" (macro, orchestration) from the "fast-doing" (engine, execution). All support subsystems run as independent Prefect flows with retry logic. Capital persists across daily runs.

**Spec reference**: Sec IV (Prefect SITARA), "slow-thinking / fast-doing separation."

**Why now**: State persistence eliminates the #1 known limitation (capital restarts at 25k daily). Subsystem flows give observability + retry without coupling to the trading flow.

### MIG1.1 — Capital State Persistence (DAL-A)

**What to build**: At flow start, restore capital from HZ. At flow end, write capital + drawdown + session summary back to HZ. If HZ is unavailable, fall back to a local JSON ledger.

**File to modify**: `prod/paper_trade_flow.py`

Implementation pattern (add to flow body):
```python
# ---- Restore capital ----
STATE_KEY = f"state_{strategy_name}_{date_str}"
state = {}  # default if HZ restore fails, so the persist step below never NameErrors
try:
    raw = imap_state.get(STATE_KEY) or imap_state.get('latest') or '{}'
    state = json.loads(raw)
    if state.get('strategy') == strategy_name and state.get('capital', 0) > 0:
        engine.capital = float(state['capital'])
        engine.initial_capital = float(state['capital'])
        logger.info(f"[STATE] Restored capital={engine.capital:.2f} from HZ")
except Exception as e:
    logger.warning(f"[STATE] HZ restore failed: {e} — using config capital")

# ---- Persist capital at end ----
peak = max(engine.capital, state.get('peak_capital', engine.capital))
new_state = {
    'strategy': strategy_name, 'capital': engine.capital,
    'date': date_str, 'pnl': day_result['pnl'], 'trades': day_result['trades'],
    'peak_capital': peak,
    'drawdown': 1.0 - engine.capital / peak,
}
try:
    imap_state.put('latest', json.dumps(new_state))
    imap_state.put(STATE_KEY, json.dumps(new_state))
except Exception as e:
    logger.error(f"[STATE] HZ persist failed: {e}")
    # Fallback: write to local JSON ledger
    ledger_path = Path(LOG_DIR) / f"state_ledger_{strategy_name}.jsonl"
    with open(ledger_path, 'a') as f:
        f.write(json.dumps(new_state) + '\n')
```
Test assertions (add to `ci/test_06_state_persistence.py`):

```python
def test_hz_state_roundtrip():
    """Capital persists to HZ and is readable back."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_STATE_BLUE').blocking()
    test_state = {'strategy': 'blue', 'capital': 27500.0, 'date': '2026-01-15', 'trades': 42}
    m.put('test_roundtrip', json.dumps(test_state))
    read_back = json.loads(m.get('test_roundtrip'))
    assert read_back['capital'] == 27500.0
    assert read_back['trades'] == 42
    m.remove('test_roundtrip')
    c.shutdown()

def test_capital_restoration_on_flow_start():
    """If HZ has prior state, engine.capital is set correctly."""
    # This tests the restore logic in isolation (mock HZ IMap)
    from unittest.mock import MagicMock
    import json
    stored = {'strategy': 'blue', 'capital': 28000.0}
    imap = MagicMock()
    imap.get = MagicMock(return_value=json.dumps(stored))
    # ... instantiate engine, run restore logic, assert engine.capital == 28000.0
    # (see ci/test_06_state_persistence.py for full implementation)
```

PASS criteria: Capital from day N is used as the starting capital for day N+1. If HZ is unavailable, the local ledger file is written. No crash if the ledger file is missing.
### MIG1.2 — ExF Fetcher Flow

**What to build**: A standalone Prefect flow `exf_fetcher_flow.py` that fetches all 14 ExF indicators (FRED, Deribit, F&G, etc.) and writes the results to HZ IMap `DOLPHIN_FEATURES` under key `exf_latest`.

**File to create**: `prod/exf_fetcher_flow.py`

Key design points:

- Runs daily at 23:00 UTC (before paper trade at 00:05 UTC next day)
- Uses existing `external_factors/` modules
- Writes `{indicator_name: value, 'timestamp': iso_str, 'date': YYYY-MM-DD}` to HZ
- If a fetch fails for any indicator: log a warning, write `None` for that indicator, do NOT crash
- Separate task per indicator family (FRED, Deribit, F&G) for retry isolation

```python
@flow(name="exf-fetcher")
def exf_fetcher_flow(date_str: str = None):
    date_str = date_str or datetime.now(timezone.utc).strftime('%Y-%m-%d')
    results = {}
    results.update(fetch_fred_indicators(date_str))   # task
    results.update(fetch_deribit_funding(date_str))   # task
    results.update(fetch_fear_and_greed(date_str))    # task
    write_exf_to_hz(date_str, results)                # task
    return results
```
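
The per-indicator error isolation inside each family task can be sketched as plain Python (the `fetch_family` helper and the fetcher-dict shape are illustrative, not the actual `external_factors/` API; in the real flow this loop would live inside a `@task(retries=...)`-decorated function):

```python
def fetch_family(fetchers: dict, date_str: str) -> dict:
    """Fetch one indicator family; a single indicator failure yields None
    for that key instead of crashing the whole family task."""
    out = {}
    for name, fn in fetchers.items():
        try:
            out[name] = fn(date_str)
        except Exception:
            out[name] = None  # the real task would also log a warning here
    return out
```

Because each family is its own task, a FRED outage retries only the FRED task; Deribit and F&G results are unaffected.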
Test assertions (add to `ci/test_07_exf_flow.py`):

```python
def test_exf_flow_runs_without_crash():
    """ExF flow completes even if some APIs fail (returns partial results)."""
    result = exf_fetcher_flow(date_str='2026-01-15')
    assert isinstance(result, dict)
    # Core FRED indicators that were working:
    # claims, us10y, ycurve, stables, m2, hashrate, usdc, vol24
    # At least half should be present (some APIs may be down)
    non_none = sum(1 for v in result.values() if v is not None)
    assert non_none >= 4, f"Too many ExF indicators failed: {result}"

def test_exf_hz_write():
    """ExF results are readable from HZ after the flow runs."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    val = m.get('exf_latest')
    if val is None:
        pytest.skip("ExF flow has not run yet")
    data = json.loads(val)
    assert 'timestamp' in data
    assert 'date' in data
    c.shutdown()
```

PASS criteria: Flow completes (exit 0) even with partial API failures. Results written to HZ. paper_trade_flow.py reads ExF from HZ (not from the disk NPZ fallback) on the next run.
### MIG1.3 — MC-Forewarner as Prefect Flow
|
|||
|
|
|
|||
|
|
**What to build**: Wrap `mc_forewarning_service.py` daemon as a Prefect flow that runs every 4 hours and writes its state to HZ IMap `DOLPHIN_FEATURES` key `mc_forewarner_latest`.
|
|||
|
|
|
|||
|
|
**File to create**: `prod/mc_forewarner_flow.py`
|
|||
|
|
|
|||
|
|
Key design:
|
|||
|
|
- Schedule: `Cron("0 */4 * * *")` (every 4 hours)
|
|||
|
|
- Runs DolphinForewarner with current champion params
|
|||
|
|
- Writes `{'status': 'GREEN'|'ORANGE'|'RED', 'catastrophic_prob': float, 'envelope_score': float, 'timestamp': iso}` to HZ
|
|||
|
|
- paper_trade_flow.py reads MC state from HZ (already does this via staleness check)
|
|||
|
|
- Add staleness gate: if MC timestamp > 6 hours old, treat as ORANGE (structural degradation Cat 2)
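
A minimal sketch of the staleness gate (the name `get_effective_mc_status` and the 6-hour limit are taken from this plan; the fail-toward-ORANGE behavior for missing/unparsable state is an assumption):

```python
from datetime import datetime, timezone

MC_STALENESS_LIMIT_HOURS = 6.0  # matches the 6-hour gate above

def get_effective_mc_status(state: dict) -> str:
    """Degrade the MC-Forewarner status when its timestamp is stale.

    A GREEN older than the limit is reported as ORANGE; missing or
    unparsable state is also ORANGE (fail toward caution). A stale RED
    stays RED — staleness never improves the status.
    """
    if not state or 'timestamp' not in state:
        return 'ORANGE'
    try:
        # Accept both '+00:00' and trailing-'Z' ISO timestamps
        ts = datetime.fromisoformat(state['timestamp'].replace('Z', '+00:00'))
    except ValueError:
        return 'ORANGE'
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    age_h = (datetime.now(timezone.utc) - ts).total_seconds() / 3600.0
    if age_h > MC_STALENESS_LIMIT_HOURS:
        return 'RED' if state.get('status') == 'RED' else 'ORANGE'
    return state.get('status', 'ORANGE')
```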
Test assertions (add to `ci/test_08_mc_flow.py`):

```python
def test_mc_forewarner_flow_runs():
    """MC-Forewarner flow produces a valid status."""
    result = mc_forewarner_flow()
    assert result['status'] in ('GREEN', 'ORANGE', 'RED')
    assert 0.0 <= result['catastrophic_prob'] <= 1.0
    assert 'timestamp' in result

def test_mc_staleness_gate():
    """MC state older than 6 hours is treated as ORANGE, not GREEN."""
    from datetime import datetime, timezone, timedelta
    stale_ts = (datetime.now(timezone.utc) - timedelta(hours=7)).isoformat()
    stale_state = {'status': 'GREEN', 'timestamp': stale_ts}
    effective = get_effective_mc_status(stale_state)
    assert effective == 'ORANGE', "Stale MC should degrade to ORANGE"
```

PASS criteria: The MC flow runs on schedule, writes to HZ, and paper_trade_flow.py correctly reads the MC status. Staleness is detected and the status degrades to ORANGE after 6h.
### MIG1.4 — Watchdog Flow

**What to build**: A Prefect flow `watchdog_flow.py` that runs every 10 minutes (not 10 seconds — Windows Prefect scheduling granularity), checks all system components, and writes `DOLPHIN_SYSTEM_HEALTH` to HZ.

Checks performed:

- HZ cluster quorum (>= 1 node alive)
- Prefect worker responsive
- Scan data freshness (latest scan date <= 2 days ago)
- Paper log freshness (last JSONL entry <= 2 days old)
- Docker containers running

```python
@flow(name="watchdog")
def watchdog_flow():
    health = {
        'hz': check_hz_quorum(),          # task
        'prefect': check_prefect_api(),   # task
        'scans': check_scan_freshness(),  # task
        'logs': check_log_freshness(),    # task
        'timestamp': datetime.now(timezone.utc).isoformat(),
    }
    checks = {k: v for k, v in health.items() if k != 'timestamp'}
    overall = 'GREEN' if all(v == 'OK' for v in checks.values()) else 'DEGRADED'
    health['overall'] = overall
    write_hz_health(health)               # task
    if overall == 'DEGRADED':
        logger.warning(f"[WATCHDOG] System degraded: {health}")
    return health
```
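
One of the check tasks, sketched as plain Python (the `eigenvalues/YYYY-MM-DD` layout is from the architecture diagram; the `'OK'` / `'STALE:...'` return convention matches the `all(v == 'OK' ...)` aggregation above; the rest is illustrative):

```python
from datetime import datetime, timezone
from pathlib import Path

def check_scan_freshness(scan_root: str, max_age_days: int = 2) -> str:
    """Return 'OK' if the newest eigenvalues/YYYY-MM-DD directory is at most
    max_age_days old, else a 'STALE:...' string the watchdog flags as DEGRADED."""
    dates = []
    for p in Path(scan_root).iterdir():
        try:
            dates.append(datetime.strptime(p.name, '%Y-%m-%d').date())
        except ValueError:
            continue  # ignore entries that are not date-named directories
    if not dates:
        return 'STALE:no-scan-dirs'
    age = (datetime.now(timezone.utc).date() - max(dates)).days
    return 'OK' if age <= max_age_days else f'STALE:{age}d'
```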
Test assertions (`ci/test_09_watchdog.py`):

```python
def test_watchdog_detects_all_ok():
    result = watchdog_flow()
    assert result['overall'] in ('GREEN', 'DEGRADED')
    assert 'timestamp' in result
    # At minimum, HZ and Prefect should be OK in the test environment
    assert result['hz'] == 'OK'
    assert result['prefect'] == 'OK'

def test_watchdog_writes_to_hz():
    import hazelcast, json
    watchdog_flow()
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_SYSTEM_HEALTH').blocking()
    h = json.loads(m.get('latest'))
    assert h['overall'] in ('GREEN', 'DEGRADED')
    c.shutdown()
```

PASS criteria: Watchdog runs on schedule, writes health to HZ, and the operator can see system status in the HZ-MC UI without reading logs.

### MIG1 GATE

All of the following must pass before MIG2:

```bash
bash ci/run_ci.sh   # original 24 tests
pytest ci/test_06_state_persistence.py ci/test_07_exf_flow.py ci/test_08_mc_flow.py ci/test_09_watchdog.py -v
```

PASS criteria: 24 + 8 (new) = 32 tests green. Capital from the prior day visible in HZ after a manual paper run. MC-Forewarner status readable from HZ. Watchdog health GREEN.

---
## MIG2 — Hazelcast IMDG: Feature Store + Live OB + Entry Processors

**Goal**: Replace file-based feature passing with a sub-millisecond in-memory feature store. Enable atomic ACB state updates. Replace MockOBProvider with live Binance WebSocket OB data.

**Spec reference**: Sec III (Hazelcast IMDG — DOLPHIN_FEATURES, Near Cache, Jet, Entry Processors).

**Architecture**: "Engine Room" — hot feature state that the trading engine reads without network overhead via Near Cache. The engine reads features, not files.

### MIG2.1 — DOLPHIN_FEATURES IMap + Near Cache

**What to build**: The schema for the HZ feature store and the Near Cache configuration.

IMap key schema:

```
DOLPHIN_FEATURES:
  "exf_latest"            → JSON dict: {indicator_name: value, timestamp, date}
  "mc_forewarner_latest"  → JSON dict: {status, catastrophic_prob, envelope_score, timestamp}
  "acb_state"             → JSON dict: {boost, beta, w750_threshold, p60, last_date}
  "vol_regime"            → JSON dict: {vol_p60: float, current_vol: float, regime_ok: bool, timestamp}
  "asset_{SYMBOL}_ob"     → JSON dict: {imbalance, fill_prob, depth_quality, agreement, timestamp}
  "scan_latest"           → JSON dict: {date, vel_div_mean, vel_div_min, asset_count, timestamp}
```

Near Cache configuration (add to the HZ client init in paper_trade_flow.py):

```python
client = hazelcast.HazelcastClient(
    cluster_members=["localhost:5701"],
    near_caches={
        "DOLPHIN_FEATURES": {
            "invalidate_on_change": True,
            "time_to_live_seconds": 300,  # 5 min TTL
            "max_idle_seconds": 60,
            "eviction_policy": "LRU",
            "max_size": 5000,
        }
    }
)
```
Test assertions (`ci/test_10_hz_feature_store.py`):

```python
def test_near_cache_read_latency():
    """Near Cache reads complete in <1ms after the first warm read."""
    import time, hazelcast, json
    c = hazelcast.HazelcastClient(near_caches={"DOLPHIN_FEATURES": {...}})
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    m.put('test_nc', json.dumps({'val': 42}))
    m.get('test_nc')  # warm the cache
    t0 = time.perf_counter()
    for _ in range(100):
        m.get('test_nc')
    elapsed_per_call = (time.perf_counter() - t0) / 100
    assert elapsed_per_call < 0.001, f"Near Cache too slow: {elapsed_per_call*1000:.2f}ms"
    c.shutdown()

def test_feature_store_schema():
    """All required keys are writable and readable in the correct schema."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    for key in ['exf_latest', 'mc_forewarner_latest', 'acb_state', 'vol_regime']:
        m.put(f'test_{key}', json.dumps({'test': True, 'timestamp': '2026-01-01T00:00:00Z'}))
        val = m.get(f'test_{key}')
        assert val is not None
        m.remove(f'test_{key}')
    c.shutdown()
```
### MIG2.2 — ACB Entry Processor (Atomic State Update)

**What to build**: An Entry Processor that updates the ACB boost atomically in HZ without a full read-modify-write round trip. Critical for sub-day ACB updates when new scan bars arrive.

```python
# prod/hz_entry_processors.py
import hazelcast

class ACBBoostUpdateProcessor(hazelcast.serialization.api.IdentifiedDataSerializable):
    """Atomically update ACB boost + beta in DOLPHIN_FEATURES without a read-write round trip."""
    FACTORY_ID = 1
    CLASS_ID = 1

    def __init__(self, new_boost=None, new_beta=None, date_str=None):
        self.new_boost = new_boost
        self.new_beta = new_beta
        self.date_str = date_str

    def process(self, entry):
        import json
        current = json.loads(entry.value or '{}')
        if self.new_boost is not None:
            current['boost'] = self.new_boost
        if self.new_beta is not None:
            current['beta'] = self.new_beta
        current['last_updated'] = self.date_str
        entry.set_value(json.dumps(current))

    def write_data(self, object_data_output):
        object_data_output.write_float(self.new_boost or 0.0)
        object_data_output.write_float(self.new_beta or 0.0)
        object_data_output.write_utf(self.date_str or '')

    def read_data(self, object_data_input):
        self.new_boost = object_data_input.read_float()
        self.new_beta = object_data_input.read_float()
        self.date_str = object_data_input.read_utf()

    def get_factory_id(self): return self.FACTORY_ID
    def get_class_id(self): return self.CLASS_ID
```
Test assertions:

```python
def test_acb_entry_processor_atomic():
    """Entry processor updates ACB state without a race condition."""
    import hazelcast, json
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    m.put('acb_state', json.dumps({'boost': 1.0, 'beta': 0.5}))
    processor = ACBBoostUpdateProcessor(new_boost=1.35, new_beta=0.7, date_str='2026-01-15')
    m.execute_on_key('acb_state', processor)
    result = json.loads(m.get('acb_state'))
    assert result['boost'] == 1.35
    assert result['beta'] == 0.7
    c.shutdown()
```
### MIG2.3 — Live OB: Replace MockOBProvider

**What to build**: Wire the `ob_stream_service.py` WebSocket feed into paper_trade_flow.py to replace MockOBProvider. OB features are written to HZ per-asset under `asset_{SYMBOL}_ob`.

**Implementation**:

1. `ob_stream_service.py` is already verified live on the Binance Futures WebSocket
2. Start the OB service as a background thread or a separate Prefect flow at run start
3. The OB service writes a per-asset OB snapshot to HZ every 5 seconds
4. The `run_engine_day` task reads OB from the HZ Near Cache instead of MockOBProvider
5. Graceful fallback: if asset OB data is missing or stale (>30s), use neutral values (imbalance=0, fill_prob=0.5)

OB data schema in HZ:

```python
ob_snapshot = {
    'imbalance': float,      # (bid_vol - ask_vol) / (bid_vol + ask_vol), range [-1, 1]
    'fill_prob': float,      # maker fill probability, range [0, 1]
    'depth_quality': float,  # normalized depth, range [0, 1]
    'agreement': float,      # OB trend agreement, range [-1, 1]
    'timestamp': iso_str,    # when this snapshot was taken
    'stale': bool,           # True if >30s since last update
}
```
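
A sketch of the HZ-backed provider implementing the stale-fallback rule (the name `HZOBProvider` appears in MIG2.4 below; the neutral `depth_quality`/`agreement` values are assumptions beyond the stated imbalance=0, fill_prob=0.5, and the real provider may differ):

```python
import json
from datetime import datetime, timezone

class HZOBProvider:
    """OB features backed by the DOLPHIN_FEATURES IMap; falls back to
    neutral values when a snapshot is missing or stale."""
    NEUTRAL = {'imbalance': 0.0, 'fill_prob': 0.5, 'depth_quality': 0.5,
               'agreement': 0.0, 'stale': True}

    def __init__(self, features_map, staleness_threshold_sec=30.0):
        self._map = features_map          # HZ IMap blocking proxy, or anything with .get()
        self._max_age = staleness_threshold_sec

    def get(self, symbol: str) -> dict:
        raw = self._map.get(f'asset_{symbol}_ob')
        if raw is None:
            return dict(self.NEUTRAL)     # missing → neutral
        snap = json.loads(raw)
        age = (datetime.now(timezone.utc)
               - datetime.fromisoformat(snap['timestamp'])).total_seconds()
        if age > self._max_age:
            return dict(self.NEUTRAL)     # stale → neutral, engine stays safe
        snap['stale'] = False
        return snap
```

A plain dict works as a stand-in for the IMap in unit tests, since only `.get()` is used.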
Test assertions (`ci/test_11_live_ob.py`):

```python
def test_ob_stream_connects():
    """OB stream service connects to the Binance Futures WebSocket without error."""
    # Start the OB service for BTCUSDT only, run for 10 seconds, check HZ for data
    import threading, time, hazelcast, json
    from ob_stream_service import OBStreamService
    c = hazelcast.HazelcastClient()
    m = c.get_map('DOLPHIN_FEATURES').blocking()
    svc = OBStreamService(symbols=['BTCUSDT'], hz_map=m)
    t = threading.Thread(target=svc.start, daemon=True)
    t.start()
    time.sleep(10)
    svc.stop()
    val = m.get('asset_BTCUSDT_ob')
    assert val is not None, "OB data not written to HZ after 10s"
    data = json.loads(val)
    assert -1.0 <= data['imbalance'] <= 1.0
    assert 0.0 <= data['fill_prob'] <= 1.0
    c.shutdown()

def test_ob_stale_fallback():
    """Engine uses neutral OB values when OB data is stale."""
    # Inject a stale OB snapshot, verify the engine uses the fallback (imbalance=0, fill_prob=0.5)
    ...
    assert ob_features.imbalance == 0.0
    assert ob_features.fill_prob == 0.5
```

PASS criteria: `|imbalance| < 0.3` under typical market conditions (confirmed in the spec). Live OB replaces Mock. Expected result: a 10-15% reduction in daily P&L variance (per the 55-day OB validation: σ² reduced 15.35%).
### MIG2.4 — paper_trade_flow.py reads all features from HZ

**What to build**: Refactor paper_trade_flow.py so that `run_engine_day` reads ExF, MC state, OB, and vol regime all from the `DOLPHIN_FEATURES` HZ IMap (via Near Cache) instead of computing them or loading them from disk.

```python
@task(persist_result=False)
def run_engine_day(date_str, scan_df, pt_cfg, strategy_name):
    client = hazelcast.HazelcastClient(near_caches={"DOLPHIN_FEATURES": {...}})
    features = client.get_map('DOLPHIN_FEATURES').blocking()

    # Read from HZ instead of computing inline
    mc_raw = features.get('mc_forewarner_latest')
    mc_state = json.loads(mc_raw) if mc_raw else None
    mc_status = get_effective_mc_status(mc_state) if mc_state else 'GREEN'  # staleness check

    vol_raw = features.get('vol_regime')
    vol_ok = json.loads(vol_raw)['regime_ok'] if vol_raw else True

    # Pass an OB provider backed by HZ
    ob_provider = HZOBProvider(features, staleness_threshold_sec=30)
    engine.set_ob_provider(ob_provider)
    ...
```
### MIG2 GATE

```bash
bash ci/run_ci.sh
pytest ci/test_10_hz_feature_store.py ci/test_11_live_ob.py -v
```

PASS criteria: 32 + 4 (new) = 36 tests green. Live OB data flowing to HZ. Engine reads all features from HZ. No MockOBProvider left in paper_trade_flow.py. Capital persisted day-over-day (verify manually over 3 consecutive days).

---

## MIG3 — Survival Stack: Graceful Degradation (Control Theory)

**Goal**: Replace binary "up/down" thinking with a continuous, multiplicative risk controller. The system degrades gracefully under component failure rather than stopping or operating at full risk.

**Spec reference**: Sec VI (Control Theory Survival Stack), 5 categories, 4 postures, hysteresis.

**Design**: All 5 category multipliers multiply together to produce a final `Rm` (risk multiplier). Rm then maps to one of 4 operational postures. Hysteresis prevents rapid posture oscillation.

### MIG3.1 — 5-Category Risk Multiplier (Rm)

**File to create**: `nautilus_dolphin/nautilus_dolphin/nautilus/survival_stack.py`
```
Rm = Cat1 × Cat2 × Cat3 × Cat4 × Cat5

Cat1 — Invariants (binary kill, <10ms response):
    Input: HZ quorum status + Nautilus heartbeat
    Rule: if HZ_nodes < 1 OR heartbeat_age > 30s → Rm1 = 0.0 (HIBERNATE)
    Else: Rm1 = 1.0

Cat2 — Structural (MC-Forewarner staleness + status):
    Input: MC status (GREEN/ORANGE/RED) + timestamp age
    GREEN, fresh  → Rm2 = 1.0
    ORANGE, fresh → Rm2 = 0.5
    RED, fresh    → Rm2 = 0.1 (exits only)
    Any status, stale (>6h) → exponential decay on top of the base value
    Rule: Rm2 = base_rm2 × exp(-max(0, staleness_hours - 6) / 3.0)

Cat3 — Microstructure (OB jitter/depth):
    Input: OB depth_quality + fill_prob + imbalance stability
    OB healthy (depth_quality > 0.7, fill_prob > 0.7)  → Rm3 = 1.0
    OB jittery (fill_prob < 0.3 or depth_quality < 0.3) → Rm3 ≈ 0.3 (passive quoting only)
    OB stale (>30s) → Rm3 = 0.5
    Rule: Rm3 = clip(min(depth_quality, fill_prob) / 0.7, 0.3, 1.0)

Cat4 — Environmental (DVOL spike):
    Input: DVOL (Deribit BTC implied vol 30-day)
    Baseline DVOL (no spike) → Rm4 = 1.0
    DVOL spike detected (>2σ above 30-day mean) → Rm4 drops to 0.3 immediately (fast attack)
    Recovery: Rm4 recovers to ~1.0 over 60 minutes (slow exponential recovery, τ = 20 min)
    Rule: impulse-decay — Rm4 = 1.0 - 0.7 × exp(-t_since_spike_min / 20)

Cat5 — Capital Stress (sigmoid on drawdown):
    Input: current_drawdown = 1 - capital / peak_capital
    Rule: Rm5 = 1 / (1 + exp(35 × (drawdown - 0.12)))
    Effect: Rm5 ≈ 0.96 at DD=3%, ≈ 0.92 at DD=5%, = 0.5 at DD=12%, ≈ 0.06 at DD=20%
    No cliff — continuous degradation as DD increases

Final: Rm = Rm1 × Rm2 × Rm3 × Rm4 × Rm5
```
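
The five category rules can be sketched as free functions (the eventual `survival_stack.py` would wrap these in the SurvivalStack class; the sigmoid steepness of 35 and the 20-minute DVOL recovery time constant are tuned here so the `ci/test_12_survival_stack.py` assertions — all-green Rm ≥ 0.95, Cat4 ≥ 0.9 at 60 min — hold):

```python
import math

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def cat1_invariants(hz_nodes: int, heartbeat_age_s: float) -> float:
    # Binary kill: any invariant breach zeroes Rm outright.
    return 0.0 if (hz_nodes < 1 or heartbeat_age_s > 30.0) else 1.0

def cat2_structural(mc_status: str, staleness_hours: float) -> float:
    base = {'GREEN': 1.0, 'ORANGE': 0.5, 'RED': 0.1}.get(mc_status, 0.5)
    # Exponential decay once past the 6h staleness gate.
    return base * math.exp(-max(0.0, staleness_hours - 6.0) / 3.0)

def cat3_micro(depth_quality: float, fill_prob: float, ob_stale: bool) -> float:
    if ob_stale:
        return 0.5
    # Saturates at 1.0 for a healthy book, floors at 0.3 for a jittery one.
    return clip(min(depth_quality, fill_prob) / 0.7, 0.3, 1.0)

def cat4_dvol(dvol_spike: bool, t_since_spike_min: float) -> float:
    if not dvol_spike:
        return 1.0
    # Fast attack to 0.3, slow exponential recovery (~3 time constants in 60 min).
    return 1.0 - 0.7 * math.exp(-t_since_spike_min / 20.0)

def cat5_capital(drawdown: float) -> float:
    # Sigmoid centered at 12% drawdown; no hard cliff.
    return 1.0 / (1.0 + math.exp(35.0 * (drawdown - 0.12)))

def compute_rm(**k) -> float:
    return (cat1_invariants(k['hz_nodes'], k['heartbeat_age_s'])
            * cat2_structural(k['mc_status'], k['mc_staleness_hours'])
            * cat3_micro(k['ob_depth_quality'], k['ob_fill_prob'], k['ob_stale'])
            * cat4_dvol(k['dvol_spike'], k['t_since_spike_min'])
            * cat5_capital(k['drawdown']))
```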
### MIG3.2 — 4 Operational Postures + Hysteresis

```
Rm → Posture mapping (with hysteresis deadband):
  Rm >= 0.85 → APEX      (abs_max_lev=6x, aggressive, full signal)
  Rm >= 0.40 → STALKER   (abs_max_lev=2x, limit orders only)
  Rm >= 0.10 → TURTLE    (passive only, existing positions exit, no new entries)
  Rm <  0.10 → HIBERNATE (all-stop: close all positions, no new signals)

Hysteresis (Schmitt trigger):
  To DOWNGRADE (e.g., APEX → STALKER): threshold crossed + sustained for 2 consecutive checks
  To UPGRADE (e.g., STALKER → APEX): threshold exceeded + sustained for 5 consecutive checks
  Purpose: prevent rapid posture oscillation on a noisy Rm boundary

Rm written to the HZ DOLPHIN_SAFETY AtomicReference:
  {'posture': 'APEX'|'STALKER'|'TURTLE'|'HIBERNATE', 'Rm': float, 'timestamp': iso,
   'breakdown': {'Cat1': float, 'Cat2': float, 'Cat3': float, 'Cat4': float, 'Cat5': float}}
```
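
The Schmitt-trigger behavior can be sketched as follows (the 2-down / 5-up counts are from the mapping above; the class name `PostureController` is illustrative — in `survival_stack.py` this logic would live on SurvivalStack itself — and the immediate-HIBERNATE shortcut is an assumption motivated by Cat1's <10ms binary kill):

```python
POSTURES = ['HIBERNATE', 'TURTLE', 'STALKER', 'APEX']  # worst → best

def rm_to_posture(rm: float) -> str:
    if rm >= 0.85: return 'APEX'
    if rm >= 0.40: return 'STALKER'
    if rm >= 0.10: return 'TURTLE'
    return 'HIBERNATE'

class PostureController:
    """Holds the current posture; a posture change must be sustained for
    N consecutive checks before it takes effect (N differs for up vs down)."""

    def __init__(self, hysteresis_down=2, hysteresis_up=5, initial='APEX'):
        self.posture = initial
        self.down_n, self.up_n = hysteresis_down, hysteresis_up
        self._pending, self._count = None, 0

    def update_posture(self, rm: float) -> str:
        target = rm_to_posture(rm)
        if target == 'HIBERNATE':               # assumed: binary kill bypasses hysteresis
            self.posture, self._pending, self._count = 'HIBERNATE', None, 0
            return self.posture
        if target == self.posture:              # back inside the current band → reset counter
            self._pending, self._count = None, 0
            return self.posture
        if target != self._pending:             # new candidate posture → restart counting
            self._pending, self._count = target, 0
        self._count += 1
        downgrade = POSTURES.index(target) < POSTURES.index(self.posture)
        if self._count >= (self.down_n if downgrade else self.up_n):
            self.posture, self._pending, self._count = target, None, 0
        return self.posture
```

An Rm sequence oscillating across the 0.85 boundary (0.84, 0.86, 0.84, ...) never accumulates 2 consecutive sub-threshold checks, so the posture holds at APEX.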
### MIG3.3 — Integration into paper_trade_flow.py

`run_engine_day` reads the posture from HZ before any engine action:

```python
safety_ref = client.cp_subsystem.get_atomic_reference('DOLPHIN_SAFETY').blocking()
safety_state = json.loads(safety_ref.get() or '{}')
posture = safety_state.get('posture', 'APEX')
Rm = safety_state.get('Rm', 1.0)

if posture == 'HIBERNATE':
    logger.critical("[POSTURE] HIBERNATE — no trades today")
    return {'pnl': 0.0, 'trades': 0, 'posture': 'HIBERNATE'}

# Apply Rm to abs_max_leverage
effective_max_lev = pt_cfg['abs_max_leverage'] * Rm
engine.abs_max_leverage = max(1.0, effective_max_lev)

if posture == 'STALKER':
    engine.abs_max_leverage = min(engine.abs_max_leverage, 2.0)
elif posture == 'TURTLE':
    # No new entries — only manage existing positions
    engine.accept_new_entries = False
```

Test assertions (`ci/test_12_survival_stack.py`):

```python
def test_rm_calculation_all_green():
    """All-green conditions → Rm = 1.0, posture = APEX."""
    ss = SurvivalStack(...)
    Rm, breakdown = ss.compute_rm(
        hz_nodes=1, heartbeat_age_s=1.0,
        mc_status='GREEN', mc_staleness_hours=0.5,
        ob_depth_quality=0.9, ob_fill_prob=0.8, ob_stale=False,
        dvol_spike=False, t_since_spike_min=999,
        drawdown=0.03,
    )
    assert Rm >= 0.95, f"Expected ~1.0, got {Rm}"
    assert breakdown['Cat1'] == 1.0
    assert breakdown['Cat5'] >= 0.95

def test_rm_hz_down_triggers_hibernate():
    """HZ quorum=0 → Cat1=0 → Rm=0 → HIBERNATE."""
    ss = SurvivalStack(...)
    Rm, _ = ss.compute_rm(hz_nodes=0, ...)
    assert Rm == 0.0
    assert ss.get_posture(Rm) == 'HIBERNATE'

def test_rm_drawdown_sigmoid():
    """Drawdown 12% → Rm5 ≈ 0.5."""
    ss = SurvivalStack(...)
    Rm5 = ss._cat5_capital_stress(drawdown=0.12)
    assert 0.4 <= Rm5 <= 0.6, f"Sigmoid expected ~0.5 at DD=12%, got {Rm5}"

def test_rm_dvol_spike_impulse_decay():
    """DVOL spike → Cat4=0.3. After 60min → Cat4≈1.0."""
    ss = SurvivalStack(...)
    assert ss._cat4_dvol(dvol_spike=True, t_since_spike_min=0) == pytest.approx(0.3, abs=0.05)
    assert ss._cat4_dvol(dvol_spike=True, t_since_spike_min=60) >= 0.9

def test_hysteresis_prevents_oscillation():
    """Rm oscillating at boundary does not cause rapid posture flips."""
    ss = SurvivalStack(hysteresis_down=2, hysteresis_up=5)
    postures = []
    for Rm in [0.84, 0.86, 0.84, 0.86, 0.84]:  # oscillating around APEX/STALKER boundary
        postures.append(ss.update_posture(Rm))
    # Should NOT oscillate — hysteresis holds the prior posture
    assert len(set(postures)) == 1, f"Hysteresis failed — postures: {postures}"

def test_posture_written_to_hz():
    """Posture and Rm are written to HZ DOLPHIN_SAFETY AtomicReference."""
    import hazelcast, json
    ss = SurvivalStack(...)
    Rm, _ = ss.compute_rm(...)
    ss.write_to_hz(Rm)
    c = hazelcast.HazelcastClient()
    ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking()
    state = json.loads(ref.get())
    assert state['posture'] in ('APEX', 'STALKER', 'TURTLE', 'HIBERNATE')
    assert 0.0 <= state['Rm'] <= 1.0
    c.shutdown()
```

PASS criteria: 36 + 6 = 42 tests green. Survival stack integrates into paper_trade_flow.py. Manual test: kill the Hazelcast container → HIBERNATE triggers → restart HZ → system recovers to APEX once the upgrade hysteresis is satisfied (5 consecutive green checks).

### MIG3 GATE

```bash
bash ci/run_ci.sh
pytest ci/test_12_survival_stack.py -v
```

Also verify manually:
- Simulate MC-Forewarner returning RED → STALKER posture, max_lev=2x
- Simulate a 15% drawdown in the ledger → Rm5 ≈ 0.35, posture degrades
- System recovers gracefully when conditions improve (upgrade hysteresis threshold met)

---

## MIG4 — Nautilus-Trader Integration: Rust Execution Core

**Goal**: Replace the Python paper-trading loop with Nautilus-Trader as the execution engine. NDAlphaEngine becomes a Nautilus Actor. Binance Futures orders are routed through the Nautilus adapter. This achieves true event-driven, sub-millisecond execution.

**Spec reference**: Sec V (Nautilus-Trader — Actor model, AsyncDataEngine, Rust networking, zero-copy Arrow).

**Why Nautilus**: Rust core, zero-copy Arrow data transport, proper Actor isolation, production-grade risk management. The Python engine (paper_trade_flow.py) was always a stepping stone.

### MIG4.1 — NautilusActor Wrapper

**Prereq**: `pip install nautilus_trader>=1.224` in the Siloqy venv.

**File to create**: `nautilus_dolphin/nautilus_dolphin/nautilus/nautilus_actor.py`

Key design:
- NautilusActor wraps NDAlphaEngine
- Subscribes to bar data (5-second OHLCV bars for all 50 assets)
- On each bar: updates eigenvalue features from the HZ Near Cache
- On each scan completion (5-minute window): calls engine.process_bar()
- Orders submitted via the Nautilus OrderFactory → Binance Futures adapter
- Actor reads posture from HZ DOLPHIN_SAFETY before each order submission

```python
import json

from nautilus_trader.trading.actor import Actor
from nautilus_trader.model.data import Bar, BarType
from nautilus_trader.model.orders import MarketOrder, LimitOrder


class DolphinActor(Actor):
    def __init__(self, engine: NDAlphaEngine, hz_features_map, config):
        super().__init__(config)
        self.engine = engine
        self.hz = hz_features_map
        self._bar_buffer = {}  # symbol → list of bars

    def on_start(self):
        # Subscribe to 5s bars for all assets
        for symbol in self.engine.asset_columns:
            bar_type = BarType.from_str(f"{symbol}.BINANCE-5-SECOND-LAST-EXTERNAL")
            self.subscribe_bars(bar_type)

    def on_bar(self, bar: Bar):
        symbol = bar.bar_type.instrument_id.symbol.value
        self._bar_buffer.setdefault(symbol, []).append(bar)
        if self._should_process(bar):
            self._run_engine_on_bar_batch()

    def _run_engine_on_bar_batch(self):
        # Parse the safety state once; default to APEX / Rm=1.0 if absent.
        posture_raw = self.cache.get('DOLPHIN_SAFETY')
        safety = json.loads(posture_raw) if posture_raw else {}
        posture = safety.get('posture', 'APEX')
        if posture == 'HIBERNATE':
            return
        Rm = safety.get('Rm', 1.0)
        signals = self.engine.process_bar_batch(self._bar_buffer, Rm=Rm)
        for signal in signals:
            self._submit_order(signal, posture)

    def _submit_order(self, signal, posture):
        if posture == 'TURTLE':
            return  # No new entries in TURTLE
        order_type = LimitOrder if posture == 'STALKER' else MarketOrder
        order = self.order_factory.create(
            instrument_id=signal.instrument_id,
            order_side=signal.side,
            quantity=signal.quantity,
            price=signal.limit_price if posture == 'STALKER' else None,
            order_type=order_type,
        )
        self.submit_order(order)
```
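`_should_process` is referenced above but not shown. One plausible implementation is a wall-clock window gate that fires when a bar crosses into a new 5-minute window (a sketch; it assumes bar timestamps are epoch nanoseconds, as Nautilus `ts_event` fields are):

```python
WINDOW_NS = 5 * 60 * 1_000_000_000  # 5-minute processing window, in nanoseconds


class WindowGate:
    """Returns True when a bar timestamp crosses into a new 5-minute window."""

    def __init__(self, window_ns: int = WINDOW_NS):
        self.window_ns = window_ns
        self._last_window = None

    def should_process(self, ts_event_ns: int) -> bool:
        window = ts_event_ns // self.window_ns
        if self._last_window is None:
            self._last_window = window  # first bar only initializes the gate
            return False
        if window != self._last_window:
            self._last_window = window
            return True
        return False
```

The batch then covers exactly the bars buffered since the previous window boundary, which matches the "on each scan completion (5-minute window)" design point above.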

### MIG4.2 — Docker: Add Nautilus Container

**File to modify**: `prod/docker-compose.yml`

Add the Nautilus-Trader container (or run it as a sidecar process):
```yaml
services:
  dolphin-actor:
    image: nautechsystems/nautilus_trader:latest
    volumes:
      - ../nautilus_dolphin:/app/nautilus_dolphin:ro
      - ../vbt_cache:/app/vbt_cache:ro
    environment:
      - HZ_CLUSTER=hazelcast:5701
      - BINANCE_API_KEY=${BINANCE_API_KEY}
      - BINANCE_API_SECRET=${BINANCE_API_SECRET}
      - TRADING_MODE=paper  # paper = no real orders
    depends_on:
      - hazelcast
    restart: unless-stopped
```

For paper trading: use the Nautilus BacktestEngine or SimulatedExchange (no real orders). For live: swap in BinanceFuturesDataClient + BinanceFuturesExecutionClient.

### MIG4.3 — Zero-copy Arrow: HZ → Nautilus

**What to build**: Eigenvalue scan DataFrames passed from the Prefect scanner flow → HZ → Nautilus Actor using Apache Arrow IPC (zero-copy).

```python
# Scanner writes an Arrow record batch to HZ
import pandas as pd
import pyarrow as pa

schema = pa.schema([
    ('symbol', pa.string()),
    ('vel_div', pa.float64()),
    ('lambda_max_w50', pa.float64()),
    ('lambda_max_w150', pa.float64()),
    ('instability', pa.float64()),
    ('timestamp', pa.int64()),
])

def write_scan_to_hz(df: pd.DataFrame, hz_map):
    table = pa.Table.from_pandas(df, schema=schema)
    sink = pa.BufferOutputStream()
    writer = pa.ipc.new_file(sink, table.schema)
    writer.write_table(table)
    writer.close()
    arrow_bytes = sink.getvalue().to_pybytes()
    hz_map.put('scan_arrow_latest', arrow_bytes)

# Nautilus Actor reads Arrow from HZ
def read_scan_from_hz(hz_map) -> pd.DataFrame:
    raw = hz_map.get('scan_arrow_latest')
    if raw is None:
        return None
    reader = pa.ipc.open_file(pa.py_buffer(raw))
    return reader.read_all().to_pandas()
```

Test assertions (`ci/test_13_nautilus_integration.py`):
```python
def test_dolphin_actor_initializes():
    """DolphinActor can be constructed with NDAlphaEngine and HZ map."""
    from nautilus_dolphin.nautilus.nautilus_actor import DolphinActor
    engine = build_test_engine()
    actor = DolphinActor(engine=engine, hz_features_map=MockHZMap(), config={})
    assert actor is not None
    assert actor.engine is engine

def test_arrow_hz_roundtrip():
    """Scan DataFrame → Arrow IPC → HZ → Arrow IPC → DataFrame is lossless."""
    import pandas as pd
    df = pd.DataFrame({
        'symbol': ['BTCUSDT', 'ETHUSDT'],
        'vel_div': [-0.03, -0.01],
        'lambda_max_w50': [1.2, 0.9],
        'lambda_max_w150': [1.5, 1.0],
    })
    hz = MockHZMap()
    write_scan_to_hz(df, hz)
    df2 = read_scan_from_hz(hz)
    pd.testing.assert_frame_equal(df, df2)

def test_actor_respects_hibernate_posture():
    """DolphinActor does not submit orders when posture=HIBERNATE."""
    actor = DolphinActor(...)
    actor._posture_override = 'HIBERNATE'
    signals = actor._run_engine_on_bar_batch()
    assert signals == [] or signals is None

def test_nautilus_paper_run_no_crash():
    """NautilusTrader BacktestEngine with DolphinActor runs 1 day without crash."""
    import pandas as pd
    from nautilus_trader.backtest.engine import BacktestEngine
    from nautilus_trader.config import BacktestEngineConfig
    engine = BacktestEngine(config=BacktestEngineConfig(trader_id="DOLPHIN-001"))
    actor = DolphinActor(...)
    engine.add_actor(actor)
    engine.run(start=pd.Timestamp('2026-01-15'), end=pd.Timestamp('2026-01-16'))
    # ASSERT: runs without exception
```

PASS criteria: 42 + 4 = 46 tests green. DolphinActor processes one backtest day without crash. Arrow IPC roundtrip lossless. HIBERNATE posture prevents order submission.

### MIG4 GATE

Manual integration test:
```bash
# Start the Nautilus actor in paper mode for one day
python -m nautilus_dolphin.nautilus.run_papertrade --date 2026-01-15 --posture APEX
# ASSERT: trades > 0 logged, no crashes, capital > 0 at end
# ASSERT: orders visible in Nautilus portfolio summary
```

Full CI gate:
```bash
bash ci/run_ci.sh
pytest ci/test_13_nautilus_integration.py -v
```

---

## MIG5 — LONG System Activation: Green Deployment

**Goal**: Activate bidirectional trading (SHORT + LONG) on the green deployment. Requires the LONG validation result from b79rt78uv to confirm PF > 1.05 on 795-day klines.

**Spec reference**: LAYER_BRINGUP_PLAN.md Layer 7, green.yml config.

**Prerequisites**:
- [ ] b79rt78uv result: LONG PF > 1.05 on 795-day klines, WR > 42%
- [ ] Regime detector built: identifies when LONG conditions are active
- [ ] Capital arbiter: assigns SHORT_weight + LONG_weight per day (sum = 1.0)

### MIG5.1 — Validate LONG Result

When b79rt78uv completes, verify:
```python
# Expected assertions from test_pf_klines_2y_long.py:
assert long_pf > 1.05      # Minimum viable LONG
assert long_wr > 0.40      # 40% win rate minimum
assert long_roi > 0.0      # Net positive over 795 days
assert long_max_dd < 0.30  # Drawdown bounded
assert long_trades > 100   # Sufficient sample size
```

If LONG fails (PF < 1.05): green.yml stays SHORT-only. Do not activate LONG. Research continues.
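This gate can be encoded so that CI, not a human, flips the switch. A sketch mirroring the assertions above; `decide_green_direction` and the metric keys are illustrative, to be wired to the real validation output:

```python
def decide_green_direction(metrics: dict) -> str:
    """Return 'bidirectional' only if every LONG validation floor passes;
    otherwise green.yml stays SHORT-only."""
    floors_ok = (
        metrics['long_pf'] > 1.05        # minimum viable LONG
        and metrics['long_wr'] > 0.40    # 40% win rate minimum
        and metrics['long_roi'] > 0.0    # net positive over the window
        and metrics['long_max_dd'] < 0.30
        and metrics['long_trades'] > 100  # sufficient sample size
    )
    return 'bidirectional' if floors_ok else 'short_only'


good = {'long_pf': 1.10, 'long_wr': 0.45, 'long_roi': 0.12,
        'long_max_dd': 0.18, 'long_trades': 250}
# decide_green_direction(good) → 'bidirectional'
# decide_green_direction(dict(good, long_pf=1.02)) → 'short_only'
```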

### MIG5.2 — Regime Arbiter

**What to build**: `capital_arbiter.py` — determines SHORT_weight vs LONG_weight each day based on regime state.

```python
class CapitalArbiter:
    def get_weights(self, date_str, features) -> dict:
        """
        Returns {'short': float, 'long': float} summing to 1.0.
        Based on: vel_div direction, BTC trend, ExF signals.
        """
        vel_div_mean = features.get('vel_div_mean', 0.0)
        btc_7bar_return = features.get('btc_7bar_return', 0.0)

        if vel_div_mean < -0.02 and btc_7bar_return < 0:
            # Strong structural breakdown — favor SHORT
            return {'short': 0.7, 'long': 0.3}
        elif vel_div_mean > 0.02 and btc_7bar_return > 0:
            # Strong recovery — favor LONG
            return {'short': 0.3, 'long': 0.7}
        else:
            # Neutral — equal weight
            return {'short': 0.5, 'long': 0.5}
```
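Downstream, the weights split the day's equity between the two books. A usage sketch (`split_capital` is illustrative, not an existing helper; rounding policy is an assumption):

```python
def split_capital(equity: float, weights: dict) -> dict:
    """Allocate total equity to the SHORT and LONG books per the arbiter weights."""
    assert abs(weights['short'] + weights['long'] - 1.0) < 1e-9, "weights must sum to 1.0"
    return {
        'short_capital': round(equity * weights['short'], 2),
        'long_capital': round(equity * weights['long'], 2),
    }


alloc = split_capital(10_000.0, {'short': 0.7, 'long': 0.3})
# alloc == {'short_capital': 7000.0, 'long_capital': 3000.0}
```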

### MIG5.3 — green.yml and green deployment

Update `prod/configs/green.yml`:
```yaml
direction: bidirectional        # was: short_only
long_vel_div_threshold: 0.02
long_extreme_threshold: 0.05
capital_arbiter: equal_weight   # or: regime_weighted
```

Register the green deployment in Prefect:
```bash
PREFECT_API_URL=http://localhost:4200/api \
python -c "
from prod.paper_trade_flow import dolphin_paper_trade
dolphin_paper_trade.to_deployment(
    name='dolphin-paper-green',
    cron='10 0 * * *',  # 00:10 UTC (5 min after blue)
    parameters={'config': 'prod/configs/green.yml'},
).apply()
"
```

Test assertions (`ci/test_14_long_system.py`):
```python
def test_long_system_requires_validation():
    """green.yml direction=bidirectional is only set after LONG PF > 1.05."""
    import yaml
    with open('prod/configs/green.yml') as f:
        cfg = yaml.safe_load(f)
    if cfg.get('direction') == 'bidirectional':
        # If bidirectional is set, LONG validation must have passed
        assert cfg.get('long_vel_div_threshold', 0) > 0, "LONG threshold not set"
        assert cfg.get('long_extreme_threshold', 0) > 0, "LONG extreme threshold not set"

def test_capital_arbiter_weights_sum_to_one():
    arb = CapitalArbiter()
    for scenario in [
        {'vel_div_mean': -0.05, 'btc_7bar_return': -0.01},
        {'vel_div_mean': +0.05, 'btc_7bar_return': +0.01},
        {'vel_div_mean': 0.0, 'btc_7bar_return': 0.0},
    ]:
        w = arb.get_weights('2026-01-15', scenario)
        assert abs(w['short'] + w['long'] - 1.0) < 1e-6
        assert w['short'] > 0 and w['long'] > 0

def test_green_engine_fires_long_trades():
    """Green deployment engine fires LONG trades on LONG signal days."""
    # Use a scan date where vel_div > 0.02 (LONG signal)
    # ASSERT: engine produces trades with direction=+1
    ...
```

PASS criteria: 46 + 3 = 49 tests green. Green deployment running alongside blue. Capital arbiter weights sum to 1.0. Both SHORT and LONG trades logged.

---

## MIG6 — Hazelcast Jet: Reactive ACB Stream Processing

**Goal**: Replace the batch ACB preload (once daily) with a reactive sub-day ACB that updates on each new scan bar. An HZ Jet pipeline processes the eigenvalue stream and updates ACB state atomically via an Entry Processor. Sub-day ACB enables adverse-turn exits within the trading day.

**Spec reference**: Sec III (Hazelcast Jet stream processing), Phase MIG6.

**Impact**: Per the 55-day research, sub-day ACBv6 has +3-4% ROI potential. Currently not implemented in the ND engine path.

### MIG6.1 — Jet Pipeline Design

```
[ARB512 Scanner writes JSON]
  → [File watcher (Prefect sensor flow)]
  → [Publishes scan to HZ Jet Topic "dolphin.scan.bars"]
  → [Jet pipeline: eigenvalue processor]
  → [Computes vel_div, updates volatility, updates ACB boost]
  → [ACBBoostUpdateProcessor (Entry Processor) → DOLPHIN_FEATURES "acb_state"]
  → [Nautilus Actor reads updated ACB state via Near Cache]
```

### MIG6.2 — Scan File Watcher Prefect Flow

**File to create**: `prod/scan_watcher_flow.py`

```python
import glob
import time

from prefect import flow


@flow(name="scan-watcher")
def scan_watcher_flow():
    """Watches the eigenvalues dir for new scan files. Publishes to the HZ Jet topic."""
    last_seen = set()
    while True:
        current = set(glob.glob(f"{SCANS_DIR}/*/*.json"))
        new_files = current - last_seen
        for f in sorted(new_files):
            publish_scan_to_jet(f)  # task
        last_seen = current
        time.sleep(5)
```

### MIG6.3 — Sub-day ACB Adverse-Turn Exits

When the ACB boost drops significantly (>0.2x reduction) within a day, that signals a potential regime adverse turn. The engine checks for open SHORT positions and triggers an early exit (subject to OB).

```python
def on_acb_state_update(old_acb_state, new_acb_state, engine):
    """Called by the Jet processor when ACB state updates."""
    boost_drop = old_acb_state['boost'] - new_acb_state['boost']
    if boost_drop > 0.2 and engine.has_open_positions():
        # Adverse turn signal: boost dropped significantly
        ob_quality = get_ob_quality()
        if ob_quality > 0.5:
            engine.request_orderly_exit()  # maker fill preferred
        else:
            engine.request_duress_exit()   # bypass OB wait, market order
```

Test assertions (`ci/test_15_jet_pipeline.py`):
```python
def test_jet_topic_publish():
    """Scan file published to HZ Jet topic is received by subscriber."""
    import hazelcast, time
    c = hazelcast.HazelcastClient()
    topic = c.get_topic('dolphin.scan.bars').blocking()
    received = []
    topic.add_message_listener(lambda msg: received.append(msg.message_object))
    topic.publish({'vel_div': -0.03, 'timestamp': time.time()})
    time.sleep(0.1)
    assert len(received) == 1
    assert received[0]['vel_div'] == -0.03
    c.shutdown()

def test_acb_entry_processor_subday():
    """ACB Entry Processor updates boost atomically from the Jet pipeline."""
    # Simulate a mid-day ACB update: boost drops from 1.3 to 0.9
    processor = ACBBoostUpdateProcessor(new_boost=0.9, date_str='2026-01-15')
    hz_map.execute_on_key('acb_state', processor)
    updated = json.loads(hz_map.get('acb_state'))
    assert updated['boost'] == 0.9

def test_adverse_turn_triggers_exit():
    """Boost drop >0.2x with open positions triggers an exit request."""
    engine = build_test_engine_with_open_position()
    old_state = {'boost': 1.3, 'beta': 0.7}
    new_state = {'boost': 1.0, 'beta': 0.5}
    on_acb_state_update(old_state, new_state, engine)
    assert engine.exit_requested, "Adverse turn should trigger exit"
```

PASS criteria: 49 + 3 = 52 tests green. Sub-day ACB updates on new scan files. Adverse-turn exit fires on a simulated boost drop. Jet pipeline end-to-end test passes with a mock scanner.

### MIG6 GATE

```bash
bash ci/run_ci.sh
pytest ci/test_15_jet_pipeline.py -v
# ASSERT: 52 tests green
```

Operational check: start the scanner, watch the HZ-MC topic dashboard, and verify scan events appear in the `dolphin.scan.bars` topic within 10s of each new JSON file.

---

## MIG7 — Multi-Asset Scaling: 50 → 400 Assets

**Goal**: Scale from 50 to 400 assets while maintaining performance. The current memory footprint limits scaling. Distribute the feature store across a sharded HZ IMap. Multi-market capability.

**Spec reference**: Phase MIG7, MEMORY.md ("PROVEN better: higher returns + signal fidelity in tests. Blocked by RAM — optimize memory footprint FIRST, then scale").

**Prerequisite (HARD)**: RAM optimization before scaling. Profile the current 50-asset memory footprint first.

### MIG7.1 — Memory Footprint Analysis

```bash
# Profile current memory usage
python -c "
import tracemalloc
tracemalloc.start()
# ... run engine on 50 assets for 1 day ...
snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics('lineno')
for s in stats[:20]:
    print(s)
"
# ASSERT: identify top memory consumers
# TARGET: < 4GB for 50 assets (< 32GB for 400 assets)
```

Known memory hotspots (probable):
- `_price_histories`: rolling price buffer per asset × bar count
- VBT parquet cache: 55 days × 50 assets × ~5k bars each
- ACB: p60 threshold storage (per day, per asset)
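One way to cap the `_price_histories` hotspot is to replace unbounded per-asset lists with fixed-capacity float32 ring buffers. A sketch; the capacity and dtype are illustrative assumptions, not profiled values:

```python
import numpy as np


class PriceRing:
    """Fixed-capacity float32 ring buffer: O(1) append, bounded memory."""

    def __init__(self, capacity: int = 4096):
        self._buf = np.zeros(capacity, dtype=np.float32)
        self._cap = capacity
        self._n = 0  # total prices ever appended

    def append(self, price: float):
        self._buf[self._n % self._cap] = price
        self._n += 1

    def view(self) -> np.ndarray:
        """Return prices in chronological order (oldest first)."""
        if self._n < self._cap:
            return self._buf[:self._n].copy()
        i = self._n % self._cap
        return np.concatenate([self._buf[i:], self._buf[:i]])


ring = PriceRing(capacity=3)
for p in [1.0, 2.0, 3.0, 4.0]:
    ring.append(p)
# ring.view() → array([2., 3., 4.], dtype=float32)
```

At this sizing, 400 assets × 4096 slots × 4 bytes is a fixed ~6.6 MB of price data, independent of how many bars arrive during the day.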

### MIG7.2 — Sharded IMap for 400-Asset Feature Store

```python
import zlib

# Shard by asset group (10 shards × 40 assets each).
# NOTE: use a stable hash (crc32), not the built-in hash() — Python salts
# hash() per process (PYTHONHASHSEED), so shard assignment would not be
# reproducible across workers.
def get_shard_map_name(symbol: str) -> str:
    shard = zlib.crc32(symbol.encode()) % 10
    return f"DOLPHIN_FEATURES_SHARD_{shard:02d}"

# Each shard has its own Near Cache
near_cache_config = {f"DOLPHIN_FEATURES_SHARD_{i:02d}": {...} for i in range(10)}
```

### MIG7.3 — Distributed Worker Pool

HZ IMDG + Prefect external workers on multiple machines (or Docker replicas):
- Worker 1: assets 0-99 (BTCUSDT group)
- Worker 2: assets 100-199
- Worker 3: assets 200-299
- Worker 4: assets 300-399

The capital arbiter aggregates signals from all workers before order submission.
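That aggregation step can be sketched as follows. Assumptions (not from the spec): each worker publishes its signal list under a `signals_<worker_id>` key in a shared map, each signal dict carries a `score`, and the arbiter keeps only the top-k across all workers:

```python
def aggregate_worker_signals(shared_map, worker_ids, top_k: int = 10) -> list:
    """Collect per-worker signal lists from the shared map, keep the top_k by score."""
    merged = []
    for wid in worker_ids:
        merged.extend(shared_map.get(f"signals_{wid}") or [])  # tolerate absent workers
    merged.sort(key=lambda s: s['score'], reverse=True)
    return merged[:top_k]


# Usage with a plain dict standing in for the HZ IMap:
m = {'signals_w1': [{'symbol': 'BTCUSDT', 'score': 0.9}],
     'signals_w2': [{'symbol': 'ETHUSDT', 'score': 0.7}],
     'signals_w3': None}
top = aggregate_worker_signals(m, ['w1', 'w2', 'w3'], top_k=1)
# top == [{'symbol': 'BTCUSDT', 'score': 0.9}]
```

Because a worker that produced nothing (or died) simply contributes an empty list, the arbiter degrades gracefully rather than blocking on stragglers.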

Test assertions (`ci/test_16_scaling.py`):
```python
def test_memory_footprint_50_assets():
    """50-asset engine uses < 4GB RAM."""
    import tracemalloc
    tracemalloc.start()
    run_engine_50_assets_1_day()
    _, peak = tracemalloc.get_traced_memory()
    assert peak < 4 * 1024**3, f"Memory too high: {peak/1024**3:.1f}GB"

def test_sharded_imap_read_write():
    """Feature store sharding: all 400 symbols writable and readable."""
    c = hazelcast.HazelcastClient()
    for symbol in all_400_symbols:
        map_name = get_shard_map_name(symbol)
        m = c.get_map(map_name).blocking()
        m.put(f"vel_div_{symbol}", -0.03)
        assert m.get(f"vel_div_{symbol}") == -0.03
    c.shutdown()

def test_400_asset_engine_no_crash():
    """Engine processes 1 day with 400 assets without crash or OOM."""
    engine = build_400_asset_engine()
    result = engine.process_day('2026-01-15', df_400_assets, ...)
    assert result['trades'] > 0
    assert result['capital'] > 0
```

PASS criteria: 52 + 3 = 55 tests green. The 400-asset engine processes one day. Memory < 32GB (if available). Sharded IMap round-trip working.

---

## CI Test Suite — Cumulative Summary

| MIG Phase | New Tests | Cumulative Total | Key Assertion |
|-----------|-----------|------------------|---------------|
| MIG0 (baseline) | 24 | 24 | CI gate green, infra healthy |
| MIG1 (SITARA flows) | 8 | 32 | Capital persists, MC/ExF flows running |
| MIG2 (HZ feature store) | 4 | 36 | Near Cache <1ms, live OB flowing |
| MIG3 (survival stack) | 6 | 42 | Rm correct, postures fire, hysteresis holds |
| MIG4 (Nautilus) | 4 | 46 | Actor initializes, HIBERNATE blocks orders |
| MIG5 (LONG system) | 3 | 49 | LONG PF>1.05, arbiter weights sum=1 |
| MIG6 (Jet reactive) | 3 | 52 | Jet topic live, Entry Processor atomic, adverse-turn fires |
| MIG7 (scaling) | 3 | 55 | Memory <4GB/50-asset, shard read-write, 400-asset no crash |

Full CI gate at each phase boundary:
```bash
bash ci/run_ci.sh                                # original 24 must always pass
pytest ci/ -v --ignore=ci/test_03_regression.py  # fast suite
pytest ci/test_03_regression.py                  # regression (slower, run before prod push only)
```

---

## Regression Floors (Phase Gate Minima)

These floors apply at EVERY phase gate. If a phase change causes any floor to be breached, STOP and investigate before proceeding.

| Metric | Floor | Champion (current best) | Notes |
|--------|-------|-------------------------|-------|
| PF (10-day VBT) | >= 1.08 | 1.123 | 55-day window: 1.123 |
| WR (10-day VBT) | >= 42% | 49.3% | Champion WR |
| ROI (10-day) | >= -5% | +44.89% (55d) | Any 10-day window >= -5% |
| Trades (10-day) | >= 5 | ~380 (55d, avg 7/day) | Not a dead system |
| Max DD (55d) | < 20% | 14.95% | Don't exceed DD spec target |
| Sharpe (55d) | > 1.5 | 2.50 | Don't regress below spec target |
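The floor table is mechanically checkable. A sketch of a gate helper; the `metrics` keys are illustrative and would be wired to the real benchmark output:

```python
# Floor predicates transcribed from the regression table above.
FLOORS = {
    'pf_10d':     lambda v: v >= 1.08,
    'wr_10d':     lambda v: v >= 0.42,
    'roi_10d':    lambda v: v >= -0.05,
    'trades_10d': lambda v: v >= 5,
    'max_dd_55d': lambda v: v < 0.20,
    'sharpe_55d': lambda v: v > 1.5,
}


def check_regression_floors(metrics: dict) -> list:
    """Return the list of breached floor names (empty list = gate passes)."""
    return [name for name, ok in FLOORS.items() if not ok(metrics[name])]


champion = {'pf_10d': 1.123, 'wr_10d': 0.493, 'roi_10d': 0.4489,
            'trades_10d': 380, 'max_dd_55d': 0.1495, 'sharpe_55d': 2.50}
# check_regression_floors(champion) → []
```

A CI wrapper would fail the phase gate whenever the returned list is non-empty, naming exactly which floors were breached.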

---

## Open Items (Research Queue, Not Blocking MIG1-3)

These are noted here so they don't fall through the cracks, but they MUST NOT block forward migration:

1. **TP sweep**: Apply 95bps to `test_pf_dynamic_beta_validate.py` ENGINE_KWARGS (still uses 0.0099). Low-risk, 10-min change. Do before the next benchmark run.

2. **VOL gate EWMA**: 5-bar EWMA before the p60 gate (smooths noisy vol_ok). Minor improvement, not a blocker.

3. **Sub-day ACB adverse-turn exits** (full implementation): Architecture documented in the MEMORY.md Dynamic Exit Manager section. Prototype in the legacy standalone engine tests before building.

4. **Regime fragility sensing (Feb06-08 problem)**: HD Disentangled VAE on eigenvalue data + ExF conditioning. Long-term research. Does not block MIG1-4.

5. **MC-Forewarner live wiring verification**: Mechanical exit/reduce execution on RED/ORANGE (currently only affects sizing, not execution). Must verify the real-money path before live trading.

6. **1m calibration sweep (b1ahez7tq)**: max_hold × abs_max_lev grid. When complete, update blue.yml if an improvement is found.

7. **EsoF multi-year backfill**: Needed for N>6 tail events. N=6 is currently insufficient for production. The backfiller script exists but needs multi-year klines data.

---

## Operational Runbook — Standing Procedure

### Daily check (takes 2 min)
```bash
# 1. Check Prefect UI for the last run result
open http://localhost:4200  # check DOLPHIN-PAPER-BLUE last run status

# 2. Check HZ for today's P&L
python -c "
import hazelcast, json
c = hazelcast.HazelcastClient()
m = c.get_map('DOLPHIN_PNL_BLUE').blocking()
keys = sorted(m.key_set())
if keys:
    print(json.loads(m.get(keys[-1])))
c.shutdown()
"

# 3. Check survival stack posture
python -c "
import hazelcast, json
c = hazelcast.HazelcastClient()
ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking()
print(json.loads(ref.get() or '{}'))
c.shutdown()
"
```

### Before any push to prod/blue or prod/green
```bash
bash ci/run_ci.sh --fast  # <60s, blocks the push if it fails (the pre-push hook does this automatically)
```

### Recovery from HIBERNATE posture
```bash
# 1. Diagnose: which Cat is failing?
python -c "from survival_stack import SurvivalStack; print(SurvivalStack().diagnose())"

# 2. Fix the underlying issue (restart HZ if Cat1, wait for MC if Cat2, etc.)

# 3. The survival stack auto-recovers after 5 consecutive checks above threshold.
# Or manual override (EMERGENCY ONLY):
python -c "
import hazelcast, json
c = hazelcast.HazelcastClient()
ref = c.get_cp_subsystem().get_atomic_reference('DOLPHIN_SAFETY').blocking()
ref.set(json.dumps({'posture': 'APEX', 'Rm': 1.0, 'override': True}))
c.shutdown()
print('Manual override set to APEX')
"
```

---

## Quick-Reference Phase Summary

| Phase | Deliverable | Duration est. | Functional system? |
|-------|-------------|---------------|--------------------|
| MIG0 | CI 24/24 green, infra verified | Done | YES (batch paper trading) |
| MIG1 | State persistence + subsystem flows | 2-3 sessions | YES + capital compounds |
| MIG2 | HZ feature store + live OB | 3-4 sessions | YES + real OB signal |
| MIG3 | Survival stack + postures | 2-3 sessions | YES + graceful degradation |
| MIG4 | Nautilus-Trader execution | 4-6 sessions | YES + Rust core |
| MIG5 | LONG system (GREEN deployment) | 1-2 sessions | YES + bidirectional |
| MIG6 | HZ Jet reactive ACB | 3-4 sessions | YES + sub-day ACB |
| MIG7 | 400-asset scaling | 4-6 sessions | YES + full scale |

**The system is always functional.** Every phase boundary = working system + passing CI. No dark periods.