siloqy/pink/CAPITAL_HANDLING_NOTES.md
Codex d4b73b236a PINK DITAv2 Sprint 2-3: accounting parity + multi-leg groundwork
Sprint 2 (accounting + observability parity, PINK scope):
- Verified pink_clickhouse.py writes the 8 BLUE-legacy row families at
  matching schema and that capital authority in pink_direct.step() is
  solely kernel.account (no balance-poll overwrite in the hot loop).
- Report: prod/clean_arch/dita_v2/SPRINT2_ACCOUNTING_PARITY.md.

Sprint 3 offline groundwork (no exchange contact):
- Add _write_trade_exit_leg to pink_clickhouse.py: one BLUE-schema-faithful
  trade_exit_legs row per exit leg, with isolated (non-cumulative) per-leg
  deltas tracked via _leg_state (reset on ENTER). Closes the docstring gap.
- New offline suite test_pink_multi_exit_groundwork.py (3 passed):
  * Flaw 4 — two-leg exit closes once, realized accrues per leg, closed
    slot rejects further EXIT (no double-close).
  * Overshoot invariant — a final EXIT requesting more than the remaining
    size CLAMPS (size to 0, no oversell), retiring the Sprint 0 cumulative-
    ratio risk empirically.
  * trade_exit_legs delta + full BLUE column-set assertions.
- Persistence regression after edits: 10 passed.

BLUE untouched: no changes to dolphin.* / DOLPHIN_*_BLUE / nautilus_event_trader.py.
Live VST multi-leg run remains deferred pending explicit authorization.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:21:45 +02:00

PINK — BLUE Capital Handling: Complete Map

Traced from prod/nautilus_event_trader.py (4405 lines). Every store, every write path, every restore priority, every consistency property.


1. Capital Stores

1.1 HZ DOLPHIN_STATE_BLUE — primary runtime authority

| Key | Schema | Written by | Restore rank |
|---|---|---|---|
| capital_update_ledger | [{"capital_before", "capital_after", "capital", "capital_delta", "ts", "reason", "source", "trade_id", "asset", "mode", ...}] (JSON array, capped at 1000 entries) | _record_capital_ledger_event() on trade close, retract, internal update, corrective replay | 65 (highest) |
| latest_nautilus | Full engine snapshot dict incl. capital, open_positions, algo_version, posture, timestamps, leverage envelope | _commit_capital_state() on trade close, retract, replay, internal update, and periodic _save_capital() every scan | 40 |
| engine_snapshot | Same payload as latest_nautilus. Also written by _push_state() on every scan (async put) | _commit_capital_state() + _push_state() per scan cycle | 30 |
| capital_checkpoint | {"capital": X, "ts": Y} (scalar, legacy) | _commit_capital_state() | 5 (requires DOLPHIN_ALLOW_LEGACY_CAPITAL_CHECKPOINT=1) |
| capital_correction_replay | Full state payload | _commit_capital_state() with update_replay_key=True | 10 |

1.2 HZ DOLPHIN_PNL_BLUE

Key pattern: YYYY-MM-DD → same full state payload as latest_nautilus.

Written by _commit_capital_state() on every capital state change. Restore rank 25.

1.3 HZ blue_control_plane

| Key | What | Written by |
|---|---|---|
| blue_capital_update_latest | Mirror of every _commit_capital_state() call (if mirror_control_plane=True, the default) | _commit_capital_state() |
| blue_capital_update_ledger_latest | Last ledger entry, as JSON | _record_capital_ledger_event() |
| blue_runtime_commands | Queue for external SET_CAPITAL commands | External callers via request_capital_update() |

1.4 Disk /tmp/ — 4 files, startup survival layer

| File | Schema | Written by | Restore rank |
|---|---|---|---|
| /tmp/dolphin_capital_update_ledger.json | Same JSON array as HZ ledger | _record_capital_ledger_event() | 65 via capital_update_ledger_local; this is the FIRST restore source checked |
| /tmp/dolphin_latest_nautilus_replay.json | Full state payload | _commit_capital_state() when update_replay_key=True | 20 |
| /tmp/dolphin_capital_checkpoint.json | {"capital": X, "ts": Y} | _commit_capital_state() | 5 (legacy, env var gated) |
| /tmp/dolphin_capital_correction_replay.json | Same file as replay (different PATH constant) | Same as replay | 10 |

1.5 ClickHouse dolphin.trade_events

Written via ch_put() on every trade close. Columns include capital_before, capital_after, pnl.

Restore rank 5 — lowest. Must pass validation: |capital_after - (capital_before + pnl)| <= max(1.0, expected * 0.002).
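
The validation rule above can be sketched as a small predicate. This is a minimal illustration, not the trader's code: the function name is hypothetical, and it assumes that "expected" means capital_before + pnl, which the formula implies but does not state.

```python
def trade_event_capital_is_consistent(capital_before: float,
                                      capital_after: float,
                                      pnl: float) -> bool:
    """Return True when a trade_events row passes the restore validation."""
    # Assumption: "expected" is the capital implied by the row itself.
    expected = capital_before + pnl
    # Tolerance is the larger of 1.0 (absolute) and 0.2% of expected.
    tolerance = max(1.0, expected * 0.002)
    return abs(capital_after - expected) <= tolerance


# expected = 25,000, so the tolerance is 50.0 (0.2% of expected).
exact_match = trade_event_capital_is_consistent(24_900.0, 25_000.0, 100.0)
small_drift = trade_event_capital_is_consistent(24_900.0, 25_040.0, 100.0)
large_drift = trade_event_capital_is_consistent(24_900.0, 25_060.0, 100.0)
print(exact_match, small_drift, large_drift)
```

For small accounts the absolute floor of 1.0 dominates: at an expected capital of 105, a row may be off by up to 1.0 rather than 0.21.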

1.6 ClickHouse dolphin.status_snapshots

Written by ch_state_listener (separate supervisord process, not the trader). The trader reads it on startup.

Restore rank 50 — second highest source.


2. Restore Order (startup path)

run()
  └─ _restore_capital()
       └─ _restore_capital_from_state()
            ├─ 1. Read local /tmp/dolphin_capital_update_ledger.json
            │     → parsed_state["capital_update_ledger_local"] (rank 65)
            ├─ 2. Read HZ capital_update_ledger
            │     → parsed_state["capital_update_ledger"] (rank 65)
            ├─ 3. CH status_snapshots (rank 50)
            ├─ 4. HZ latest_nautilus (rank 40)
            ├─ 5. HZ engine_snapshot (rank 30)
            ├─ 6. HZ pnl_day (rank 25)
            ├─ 7. Read local /tmp/dolphin_latest_nautilus_replay.json
            │     → parsed_state["correction_replay_local"] (rank 20)
            ├─ 8. HZ capital_correction_replay (rank 10)
            ├─ 9. CH trade_events (rank 5)
            └─ _select_restore_candidate()
                 │
                 │ SHORTCUT: if capital_update_ledger_local exists → return immediately
                 │ (lines 1416-1420)
                 │
                 └─ Sort candidates by (ts DESC, rank DESC) → pick top
       └─ _restore_capital_from_legacy_checkpoint() [ENV GATE]
            └─ HZ capital_checkpoint → disk /tmp/dolphin_capital_checkpoint.json

Critical: The local disk ledger (capital_update_ledger_local) has a hardcoded shortcut — if it exists, _select_restore_candidate() returns it immediately without considering any other source or its timestamp. This means a stale /tmp/dolphin_capital_update_ledger.json from a prior session unconditionally overrides HZ, CH, and everything else on restart.
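
A minimal sketch of the selection behavior described above, to make the shortcut concrete. The RestoreCandidate type and the function body are illustrative assumptions, not a copy of _select_restore_candidate(); only the source names, the ranks, and the (ts DESC, rank DESC) ordering come from this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestoreCandidate:
    source: str     # e.g. "latest_nautilus", "capital_update_ledger_local"
    capital: float
    ts: float       # epoch seconds of the stored state
    rank: int       # restore rank from the tables above


def select_restore_candidate(
    candidates: list[RestoreCandidate],
) -> Optional[RestoreCandidate]:
    # Shortcut: the local disk ledger wins outright; its timestamp is ignored.
    for cand in candidates:
        if cand.source == "capital_update_ledger_local":
            return cand
    if not candidates:
        return None
    # Otherwise: newest timestamp first, rank breaks ties.
    return max(candidates, key=lambda c: (c.ts, c.rank))


candidates = [
    RestoreCandidate("capital_update_ledger_local", 24_000.0, ts=1_000.0, rank=65),
    RestoreCandidate("latest_nautilus", 25_500.0, ts=2_000.0, rank=40),
    RestoreCandidate("engine_snapshot", 25_500.0, ts=2_000.0, rank=30),
]
chosen = select_restore_candidate(candidates)
without_local = select_restore_candidate(candidates[1:])
print(chosen.source, chosen.capital)
print(without_local.source, without_local.capital)
```

In this example the local ledger entry is older than both HZ entries yet still wins. With the local file absent, the two HZ entries tie on timestamp and the higher rank (latest_nautilus) is picked.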


3. Write Triggers (every path that touches capital)

| Trigger | Code path | What gets written |
|---|---|---|
| Trade close | _process_exit → _apply_trade_capital_update() → _commit_capital_state() + _record_capital_ledger_event() | HZ: all 5 state keys + ledger + PNL map. Disk: ledger + checkpoint. CH: trade_events + position_state + trade_reconstruction + execution_quality. Control plane mirror. |
| Retract (V7, ASL, SC haircut) | _process_exit → same as trade close | Same as trade close (minus CH trade_events) |
| Every scan | _push_state() → _save_capital() → _commit_capital_state() | HZ: latest_nautilus + engine_snapshot + capital_checkpoint + PNL map. Disk: checkpoint. No ledger write. |
| Startup seed push | run() → _push_state() once after restore | Same as scan path |
| Internal capital update (control plane SET_CAPITAL) | _apply_internal_capital_update() → _commit_capital_state() + _record_capital_ledger_event() | Full write + replay key + ledger entry |
| Corrective replay | _publish_corrective_replay() → _commit_capital_state() | Full write with update_replay_key=True |

4. _commit_capital_state() — the central write fan-out

Called by: _apply_trade_capital_update(), _apply_internal_capital_update(), _save_capital(), _publish_corrective_replay().

_commit_capital_state(capital, reason, source, trade_id, asset, replay_blob,
                       update_replay_key, mirror_control_plane):
    payload = _capital_state_payload(...)  # {"capital", "ts", "updated_at", "reason", ...}

    # Write 6 HZ keys
    state_map.put("capital_checkpoint",     checkpoint_payload)  # {"capital", "ts"}
    state_map.put("latest_nautilus",         state_payload)
    state_map.put("engine_snapshot",         state_payload)
    state_map.put("pnl_day:YYYY-MM-DD",      state_payload)      # via pnl_map
    if update_replay_key:
        state_map.put("capital_correction_replay", state_payload)
        disk: /tmp/dolphin_latest_nautilus_replay.json

    # Write 1 disk file
    disk: /tmp/dolphin_capital_checkpoint.json

    # Mirror to control plane
    if mirror_control_plane:
        control_map.put("blue_capital_update_latest", state_payload)

    # Set in-memory
    self.eng.capital = capital

5. Capital resolution for trade PnL application

_apply_trade_capital_update() does a multi-source merge before applying a PnL delta:

_resolved_capital_state_value(fallback=self.eng.capital):
    # Same logic as restore but simpler — reads local first
    # Returns (capital, source_label, timestamp)
    # Sources checked: local corrective replay, HZ ledger, HZ latest_nautilus,
    # HZ engine_snapshot, HZ pnl_day, disk capital_checkpoint, local disk ledger

    # Sort by (ts DESC, rank DESC) → pick top

This means even during live trading, the capital used as the base for the next PnL application is resolved from the same multi-source hierarchy, not just the in-memory value.


6. Consistency Properties

| Property | Detail |
|---|---|
| Dual-write HZ then disk | _commit_capital_state() writes HZ keys first, then disk. If HZ succeeds but disk fails (ENOSPC), restart gets the HZ value via rank 40. If HZ is down, the local disk ledger at rank 65 becomes the sole source. |
| Scan-cycle overwrite | _push_state() calls _save_capital() every ~10 seconds, writing self.eng.capital to HZ. Manually fixing HZ while the trader runs is futile: the next scan writes the trader's in-memory value back. A restart is required. |
| No CH on _commit_capital_state | ClickHouse only gets capital data via the explicit ch_put("trade_events", ...) call at trade close time, not from the capital state commit path. |
| CH status_snapshots are external | Written by ch_state_listener (a separate supervisord process), not the trader. The trader reads them on startup as a restore candidate but never writes them. |
| Ledger is append-only, capped at 1000 | _record_capital_ledger_event() truncates to ledger[-1000:]. Old entries are silently dropped. Reconstructing capital from 3 months ago requires a CH trade_events replay. |
| Local disk ledger is the single source of truth on restart | The hardcoded shortcut in _select_restore_candidate() (lines 1416-1420) returns capital_update_ledger_local unconditionally. Fixing /tmp/dolphin_capital_update_ledger.json is mandatory for a correct restart. |
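
The ledger cap can be illustrated with a short sketch. The helper name and the entry fields used here are hypothetical; only the ledger[-1000:] truncation is taken from the description above.

```python
LEDGER_CAP = 1000  # matches the ledger[-1000:] truncation described above


def append_ledger_entry(ledger: list[dict], entry: dict,
                        cap: int = LEDGER_CAP) -> list[dict]:
    """Append one entry and keep only the newest `cap` entries."""
    return (ledger + [entry])[-cap:]


ledger: list[dict] = []
for i in range(1005):
    ledger = append_ledger_entry(
        ledger, {"trade_id": i, "capital_after": 25_000.0 + i}
    )

# The five oldest entries (trade_id 0..4) are gone without any error or log.
print(len(ledger), ledger[0]["trade_id"], ledger[-1]["trade_id"])
```

Nothing signals the drop, which is why older history has to come from ClickHouse rather than from the ledger.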

7. Operational Hazards

  1. Stale local ledger beats HZ: The file at /tmp/dolphin_capital_update_ledger.json has unconditional priority on restart. If you fix HZ but not this file, the trader restores the stale value anyway. This is exactly what happened in the 2026-05-27 BNB spurious trade recovery.

  2. ENOSPC silent truncation: If /tmp/dolphin_capital_update_ledger.json is on a full SMB mount, the write_text() call can produce a 0-byte file. On restart, json.loads("") raises json.JSONDecodeError, so the local ledger candidate is rejected and the next-best source is used. A file truncated mid-write to a partial JSON array fails the same way: json.loads() raises, the file is not retried, and the next source wins.

  3. Multiple competing restore sources: With 4 HZ keys, 4 disk files, and 2 CH tables all carrying capital data, a mismatch between any two can cause silent capital corruption on restart. There is no consistency check across sources — the sort-based _select_restore_candidate() just picks the one with the highest (timestamp, rank) tuple.

  4. HZ write vs async put: engine_snapshot is written by _push_state() via an async future = state_map.put(...). The subsequent _save_capital() is sync but only writes to latest_nautilus + capital_checkpoint + PNL map, NOT to engine_snapshot. So if the async put fails silently, the engine_snapshot in HZ is stale and will be used as a restore candidate (rank 30) on next restart.

  5. No ledger entry on periodic save: _save_capital() (called every scan) writes to all HZ state keys but does NOT append to the ledger. This means the periodically-saved capital values are invisible to the ledger-based restore path — they only appear in latest_nautilus, engine_snapshot, and pnl_day, which have lower restore ranks.
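
Hazards 1 and 2 both come down to what is actually in the local ledger file before a restart. The sketch below is a hypothetical read-only inspection helper, not part of the trader; it assumes only that the file holds a JSON array of entry dicts, as described in section 1.4. An empty file and a partially written array both fail to parse and are reported as unusable.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def read_last_ledger_entry(path: Path) -> Optional[dict]:
    """Return the newest ledger entry, or None if the file is unusable."""
    try:
        text = path.read_text()
    except OSError:
        return None  # missing or unreadable file
    try:
        ledger = json.loads(text)
    except json.JSONDecodeError:
        return None  # 0-byte file or partial JSON array
    if not isinstance(ledger, list) or not ledger:
        return None
    last = ledger[-1]
    return last if isinstance(last, dict) else None


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    good.write_text('[{"capital": 24000.0, "ts": 1.0}, {"capital": 25000.0, "ts": 2.0}]')
    empty = Path(tmp) / "empty.json"
    empty.write_text("")                      # ENOSPC-style 0-byte file
    partial = Path(tmp) / "partial.json"
    partial.write_text('[{"capital": 24000.0, "ts": 1.0}, {"capi')
    results = {
        "good": read_last_ledger_entry(good),
        "empty": read_last_ledger_entry(empty),
        "partial": read_last_ledger_entry(partial),
        "missing": read_last_ledger_entry(Path(tmp) / "missing.json"),
    }

print(results)
```

Pointing the helper at /tmp/dolphin_capital_update_ledger.json before a restart shows the capital and timestamp the shortcut would restore, or None if the restore would fall through to the next source.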


8. Summary Diagram

                   TRADER (in-memory self.eng.capital)
                   │
                   │
    ┌──────────────┼──────────────────────────────┐
    │              │                              │
    ▼              ▼                              ▼
  TRADE CLOSE    SCAN (every ~10s)         CONTROL PLANE
  (retract too)   │                         (external cmd)
  │               │                              │
  ▼               ▼                              ▼
  _apply_trade_   _push_state()              _apply_internal_
  capital_update() │                          capital_update()
  │               │                              │
  └───────┬───────┘                              │
          │                                      │
          ▼                                      ▼
  ┌─────────────────────────────┐    ┌──────────────────────┐
  │  _commit_capital_state()    │    │  _commit_capital_    │
  │  +                          │    │  state()             │
  │  _record_capital_ledger_    │    │  +                   │
  │  event()                    │    │  _record_capital_    │
  └──────────┬──────────────────┘    │  ledger_event()      │
             │                       └──────────┬───────────┘
             │                                  │
             └─────────────────┬────────────────┘
                               │
                ┌──────────────┼──────────────────┐
                ▼              ▼                  ▼
          ┌──────────┐  ┌─────────────┐   ┌──────────────┐
          │ HZ STATE │  │ DISK /tmp/  │   │ CH (close    │
          │ (6 keys) │  │ (4 files)   │   │ only)        │
          └──────────┘  └─────────────┘   └──────────────┘

         RESTART:
         disk ledger (rank 65) ─── immediate win
         CH status_snapshots (50)
         HZ latest_nautilus (40)
         HZ engine_snapshot (30)
         HZ pnl_day (25)
         disk corrective replay (20)
         HZ corrective replay (10)
         CH trade_events (5)
         legacy checkpoint (5, gated)