# PINK - BLUE Capital Handling: Complete Map

Traced from `prod/nautilus_event_trader.py` (4405 lines). Every store, every write path, every restore priority, every consistency property.

---

## 1. Capital Stores

### 1.1 HZ `DOLPHIN_STATE_BLUE`: primary runtime authority

| Key | Schema | Written by | Restore rank |
|---|---|---|---|
| `capital_update_ledger` | `[{"capital_before", "capital_after", "capital", "capital_delta", "ts", "reason", "source", "trade_id", "asset", "mode", ...}]`; JSON array, capped at 1000 entries | `_record_capital_ledger_event()` on trade close, retract, internal update, corrective replay | **65** (highest) |
| `latest_nautilus` | Full engine snapshot dict including `capital`, `open_positions`, `algo_version`, `posture`, timestamps, leverage envelope | `_commit_capital_state()`: trade close, retract, replay, internal update, and periodic `_save_capital()` every scan | 40 |
| `engine_snapshot` | Same payload as `latest_nautilus`. ALSO written by `_push_state()` on EVERY scan (async put) | `_commit_capital_state()` + `_push_state()` per scan cycle | 30 |
| `capital_checkpoint` | `{"capital": X, "ts": Y}`; scalar, legacy | `_commit_capital_state()` | **5** (requires `DOLPHIN_ALLOW_LEGACY_CAPITAL_CHECKPOINT=1`) |
| `capital_correction_replay` | Full state payload | `_commit_capital_state()` with `update_replay_key=True` | 10 |

### 1.2 HZ `DOLPHIN_PNL_BLUE`

Key pattern: `YYYY-MM-DD` → same full state payload as `latest_nautilus`. Written by `_commit_capital_state()` on every capital state change. Restore rank 25.
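To make the ledger shape from 1.1 concrete, here is a minimal sketch of an append-and-cap step. It is an illustration under stated assumptions, not the trader's code: the helper name `append_ledger_entry`, the `LEDGER_CAP` constant, the epoch-seconds `ts`, and the choice to set `capital` equal to `capital_after` are all hypothetical. Only the field names and the 1000-entry cap come from the trace; the real logic lives in `_record_capital_ledger_event()` and may differ.

```python
import time

LEDGER_CAP = 1000  # assumed constant name; the trace only states "capped at 1000 entries"


def append_ledger_entry(ledger, capital_before, capital_after, reason, source,
                        trade_id=None, asset=None, mode=None, ts=None):
    """Return a new list with one entry appended, keeping only the newest LEDGER_CAP entries."""
    entry = {
        "capital_before": capital_before,
        "capital_after": capital_after,
        "capital": capital_after,  # assumption: mirrors capital_after
        "capital_delta": capital_after - capital_before,
        "ts": time.time() if ts is None else ts,  # assumption: epoch seconds
        "reason": reason,
        "source": source,
        "trade_id": trade_id,
        "asset": asset,
        "mode": mode,
    }
    # Slicing from the end drops the oldest entries once the cap is exceeded.
    return (list(ledger) + [entry])[-LEDGER_CAP:]


if __name__ == "__main__":
    ledger = []
    for i in range(1005):
        ledger = append_ledger_entry(
            ledger, 1000.0 + i, 1001.0 + i, "trade_close", "demo", ts=float(i)
        )
    # The five oldest entries have been dropped silently.
    print(len(ledger), ledger[0]["ts"], ledger[-1]["capital"])
```

The cap is why long-range reconstruction cannot rely on the ledger alone: once more than 1000 events have been recorded, the oldest ones are no longer present in either the HZ key or the local file.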
### 1.3 HZ `blue_control_plane`

| Key | What | Written by |
|---|---|---|
| `blue_capital_update_latest` | Mirror of every `_commit_capital_state()` call (if `mirror_control_plane=True`, the default) | `_commit_capital_state()` |
| `blue_capital_update_ledger_latest` | Last ledger entry, as JSON | `_record_capital_ledger_event()` |
| `blue_runtime_commands` | Queue for external `SET_CAPITAL` commands | External callers via `request_capital_update()` |

### 1.4 Disk `/tmp/`: 4 files, startup survival layer

| File | Schema | Written by | Restore rank |
|---|---|---|---|
| `/tmp/dolphin_capital_update_ledger.json` | Same JSON array as HZ ledger | `_record_capital_ledger_event()` | **65** via `capital_update_ledger_local`; this is the FIRST restore source checked |
| `/tmp/dolphin_latest_nautilus_replay.json` | Full state payload | `_commit_capital_state()` when `update_replay_key=True` | 20 |
| `/tmp/dolphin_capital_checkpoint.json` | `{"capital": X, "ts": Y}` | `_commit_capital_state()` | **5** (legacy, env var gated) |
| `/tmp/dolphin_capital_correction_replay.json` | Same file as replay (different PATH constant) | Same | 10 |

### 1.5 ClickHouse `dolphin.trade_events`

Written via `ch_put()` on every trade close. Columns include `capital_before`, `capital_after`, `pnl`. Restore rank 5, the lowest (shared with the gated legacy checkpoint). A row must pass validation before it is used: `|capital_after - (capital_before + pnl)| <= max(1.0, expected * 0.002)`, where `expected` presumably denotes `capital_before + pnl`.

### 1.6 ClickHouse `dolphin.status_snapshots`

Written by `ch_state_listener` (separate supervisord process, not the trader). The trader reads it on startup. Restore rank 50, the second-highest rank after the ledger (65).

---

## 2. Restore Order (startup path)

```
run()
  └─ _restore_capital()
       └─ _restore_capital_from_state()
            ├─ 1. Read local /tmp/dolphin_capital_update_ledger.json
            │      → parsed_state["capital_update_ledger_local"]  (rank 65)
            ├─ 2. Read HZ capital_update_ledger
            │      → parsed_state["capital_update_ledger"]  (rank 65)
            ├─ 3. CH status_snapshots  (rank 50)
            ├─ 4. HZ latest_nautilus  (rank 40)
            ├─ 5. HZ engine_snapshot  (rank 30)
            ├─ 6. HZ pnl_day  (rank 25)
            ├─ 7. Read local /tmp/dolphin_latest_nautilus_replay.json
            │      → parsed_state["correction_replay_local"]  (rank 20)
            ├─ 8. HZ capital_correction_replay  (rank 10)
            ├─ 9. CH trade_events  (rank 5)
            └─ _select_restore_candidate()
                 │
                 │  SHORTCUT: if capital_update_ledger_local exists → return immediately
                 │  (lines 1416-1420)
                 │
                 └─ Sort candidates by (ts DESC, rank DESC) → pick top
       └─ _restore_capital_from_legacy_checkpoint()  [ENV GATE]
            └─ HZ capital_checkpoint → disk /tmp/dolphin_capital_checkpoint.json
```

**Critical**: The local disk ledger (`capital_update_ledger_local`) has a **hardcoded shortcut**: if it exists, `_select_restore_candidate()` returns it immediately without considering any other source or its timestamp. A stale `/tmp/dolphin_capital_update_ledger.json` from a prior session therefore **unconditionally** overrides HZ, CH, and everything else on restart.

---

## 3. Write Triggers (every path that touches capital)

| Trigger | Code path | What gets written |
|---|---|---|
| **Trade close** | `_process_exit` → `_apply_trade_capital_update()` → `_commit_capital_state()` + `_record_capital_ledger_event()` | HZ: all 5 state keys + ledger + PNL map. Disk: ledger + checkpoint. CH: trade_events + position_state + trade_reconstruction + execution_quality. Control plane mirror. |
| **Retract** (V7, ASL, SC haircut) | `_process_exit` → same as trade close | Same as trade close (minus CH trade_events) |
| **Every scan** | `_push_state()` → `_save_capital()` → `_commit_capital_state()` | HZ: latest_nautilus + engine_snapshot + capital_checkpoint + PNL map. Disk: checkpoint. **No ledger write.** |
| **Startup seed push** | `run()` → `_push_state()` once after restore | Same as scan path |
| **Internal capital update** (control plane `SET_CAPITAL`) | `_apply_internal_capital_update()` → `_commit_capital_state()` + `_record_capital_ledger_event()` | Full write + replay key + ledger entry |
| **Corrective replay** | `_publish_corrective_replay()` → `_commit_capital_state()` | Full write with `update_replay_key=True` |

---

## 4. `_commit_capital_state()`: the central write fan-out

Called by: `_apply_trade_capital_update()`, `_apply_internal_capital_update()`, `_save_capital()`, `_publish_corrective_replay()`.

```python
# Condensed outline of the write fan-out; disk writes are shown as comments.
def _commit_capital_state(self, capital, reason, source, trade_id, asset,
                          replay_blob, update_replay_key, mirror_control_plane):
    payload = _capital_state_payload(...)  # {"capital", "ts", "updated_at", "reason", ...}

    # Write 6 HZ keys
    state_map.put("capital_checkpoint", checkpoint_payload)  # {"capital", "ts"}
    state_map.put("latest_nautilus", state_payload)
    state_map.put("engine_snapshot", state_payload)
    state_map.put("pnl_day:YYYY-MM-DD", state_payload)  # via pnl_map
    if update_replay_key:
        state_map.put("capital_correction_replay", state_payload)
        # disk: /tmp/dolphin_latest_nautilus_replay.json

    # Write 1 disk file
    # disk: /tmp/dolphin_capital_checkpoint.json

    # Mirror to control plane
    if mirror_control_plane:
        control_map.put("blue_capital_update_latest", state_payload)

    # Set in-memory
    self.eng.capital = capital
```

---

## 5. Capital resolution for trade PnL application

`_apply_trade_capital_update()` does a multi-source merge before applying a PnL delta:

```python
# Called as _resolved_capital_state_value(fallback=self.eng.capital)
def _resolved_capital_state_value(self, fallback):
    # Same logic as restore but simpler: reads local first
    # Returns (capital, source_label, timestamp)
    # Sources checked: local corrective replay, HZ ledger, HZ latest_nautilus,
    #   HZ engine_snapshot, HZ pnl_day, disk capital_checkpoint, local disk ledger
    # Sort by (ts DESC, rank DESC) → pick top
    ...
```

This means even during live trading, the capital used as the base for the next PnL application is resolved from the same multi-source hierarchy, not just the in-memory value.

---

## 6. Consistency Properties

| Property | Detail |
|---|---|
| **Dual-write HZ then disk** | `_commit_capital_state()` writes HZ keys first, then disk. If HZ succeeds but disk fails (ENOSPC), restart gets the HZ value via rank 40. If HZ is down, the local disk ledger at rank 65 becomes the sole source. |
| **Scan-cycle overwrite** | `_push_state()` calls `_save_capital()` every ~10 seconds, writing `self.eng.capital` to HZ. Manually fixing HZ while the trader runs is futile: the next scan writes the trader's in-memory value back. Restart is required. |
| **No CH on _commit_capital_state** | ClickHouse only gets capital data via the explicit `ch_put("trade_events", ...)` call at trade close time, not from the capital state commit path. |
| **CH status_snapshots are external** | Written by `ch_state_listener` (a separate supervisord process), not the trader. The trader reads them on startup as a restore candidate but never writes them. |
| **Ledger is append-only, capped at 1000** | `_record_capital_ledger_event()` truncates to `ledger[-1000:]`. Old entries are silently dropped. Reconstructing capital from 3 months ago would require a CH `trade_events` replay. |
| **Local disk ledger is the single source of truth on restart** | The hardcoded shortcut in `_select_restore_candidate()` (lines 1416-1420) returns `capital_update_ledger_local` unconditionally. Fixing `/tmp/dolphin_capital_update_ledger.json` is **mandatory** for a correct restart. |

---

## 7. Operational Hazards

1. **Stale local ledger beats HZ**: The file at `/tmp/dolphin_capital_update_ledger.json` has unconditional priority on restart. If you fix HZ but not this file, the trader restores the stale value anyway. This is exactly what happened in the 2026-05-27 BNB spurious trade recovery.
2. **ENOSPC silent truncation**: If `/tmp/dolphin_capital_update_ledger.json` is on a full SMB mount, the `write_text()` call can produce a 0-byte file. On restart, `json.loads("")` raises `json.JSONDecodeError` (it does not return `None`), the local ledger candidate is rejected, and the next-best source is used. A file truncated mid-write to a *partial* JSON array fails the same way: `json.loads()` raises, the file is not retried, and the next source wins.
3. **Multiple competing restore sources**: With 4 HZ keys, 4 disk files, and 2 CH tables all carrying capital data, a mismatch between any two can cause silent capital corruption on restart. There is no consistency check across sources: the sort-based `_select_restore_candidate()` just picks the one with the highest (timestamp, rank) tuple.
4. **HZ write vs async put**: `engine_snapshot` is written by `_push_state()` via an **async** `future = state_map.put(...)`. The subsequent `_save_capital()` is sync but only writes to `latest_nautilus` + `capital_checkpoint` + PNL map, NOT to `engine_snapshot`. So if the async put fails silently, the `engine_snapshot` in HZ is stale and will be used as a restore candidate (rank 30) on next restart.
5. **No ledger entry on periodic save**: `_save_capital()` (called every scan) writes to all HZ state keys but does NOT append to the ledger. The periodically saved capital values are therefore invisible to the ledger-based restore path: they only appear in `latest_nautilus`, `engine_snapshot`, and `pnl_day`, which have lower restore ranks.

---

## 8. Summary Diagram

```
      TRADER (in-memory self.eng.capital)
                       │
                       │
       ┌───────────────┼──────────────────────────────┐
       │               │                              │
       ▼               ▼                              ▼
  TRADE CLOSE  SCAN (every ~10s)                CONTROL PLANE
 (retract too)         │                       (external cmd)
       │               │                              │
       ▼               ▼                              ▼
 _apply_trade_   _push_state()                _apply_internal_
capital_update()       │                      capital_update()
       │               │                              │
       └───────┬───────┘                              │
               │                                      │
               ▼                                      ▼
┌─────────────────────────────┐           ┌──────────────────────┐
│   _commit_capital_state()   │           │   _commit_capital_   │
│              +              │           │       state()        │
│   _record_capital_ledger_   │           │          +           │
│           event()           │           │   _record_capital_   │
└──────────┬──────────────────┘           │    ledger_event()    │
           │                              └──────────┬───────────┘
           │                                         │
           └─────────────────┬───────────────────────┘
                             │
              ┌──────────────┼──────────────────┐
              ▼              ▼                  ▼
        ┌──────────┐  ┌─────────────┐   ┌──────────────┐
        │ HZ STATE │  │ DISK /tmp/  │   │  CH (close   │
        │ (6 keys) │  │  (4 files)  │   │    only)     │
        └──────────┘  └─────────────┘   └──────────────┘

RESTART:  disk ledger (rank 65) ─── immediate win
          CH status_snapshots (50)
          HZ latest_nautilus (40)
          HZ engine_snapshot (30)
          HZ pnl_day (25)
          disk corrective replay (20)
          HZ corrective replay (10)
          CH trade_events (5)
          legacy checkpoint (5, gated)
```
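
The restart selection summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the trader's implementation: the `Candidate` dataclass, the `select_restore_candidate` function, the epoch-seconds `ts`, and the sample capital values are hypothetical. Only the source labels, the ranks, the local-ledger shortcut, and the (ts DESC, rank DESC) ordering are taken from the trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    source: str     # source label, e.g. "latest_nautilus"
    capital: float
    ts: float       # assumption: epoch seconds of the stored state
    rank: int       # restore rank as listed in the trace


def select_restore_candidate(candidates: list[Candidate]) -> Optional[Candidate]:
    """Local ledger shortcut first; otherwise newest timestamp, then highest rank."""
    for c in candidates:
        if c.source == "capital_update_ledger_local":
            return c  # shortcut: returned regardless of its timestamp
    if not candidates:
        return None
    # max over (ts, rank) is equivalent to sorting by (ts DESC, rank DESC) and taking the top.
    return max(candidates, key=lambda c: (c.ts, c.rank))


if __name__ == "__main__":
    stale_local = Candidate("capital_update_ledger_local", 25_000.0, ts=1_000.0, rank=65)
    fresh_hz = Candidate("latest_nautilus", 24_100.0, ts=2_000.0, rank=40)
    fresh_ch = Candidate("status_snapshots", 24_100.0, ts=2_000.0, rank=50)

    # Stale local file present: it wins even though both other sources are newer.
    print(select_restore_candidate([fresh_hz, fresh_ch, stale_local]).source)
    # Local file absent: equal timestamps, so the higher rank decides.
    print(select_restore_candidate([fresh_hz, fresh_ch]).source)
```

The first call reproduces hazard 1 from section 7: a stale local ledger overrides newer HZ and CH state. The second shows that rank only acts as a tie-breaker once timestamps are equal, so a lower-ranked source with a newer timestamp would still win the sort.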