siloqy/prod/docs/PINK_ACCOUNTING_EXEC_FIX.md
Codex c3a18f693a docs: VIBRISS spec (+ §10.6 cascade/adaptive-TP paramsets), PINK accounting fix spec, BLUE incident docs
VIBRISS_PARAMETER_GOVERNANCE_SPEC §10.6: ob_cascade.count_threshold
(currently cascade_count>0 = ONE asset widens every TP x1.40),
tp_widen_factor, withdrawal_velocity_threshold as governance candidates;
adaptive/Dynamic-TP threshold marked fit for VIBRISS governance; TP_FLOOR
joint-policy reward requirement.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 15:04:15 +02:00


PINK / DITAv2 Accounting & Execution Fix — Spec and Dev Guide

Status: SPEC, ready for implementation agent
Date: 2026-06-11
Branch: exp/pink-ditav2-sprint0-20260530 (continue on it or fork fix/pink-accounting-consolidation)
Author of spec: forensic session 2026-06-11 (FET $5,990.90 mis-book replay)
Prerequisite for: VIOLET rebuild (violet_subsecond_rebuild_plan memory / future plan session)


0. Why this exists — the incident in one paragraph

On 2026-06-11 PINK closed a FET-USDT short that the exchange settled at ≈ +$164 net (entry VWAP 0.1878, exit 0.1866, ~202K FET) but the kernel booked $5,990.90 and capital diverged $6,154 from the exchange wallet. Replay against dolphin_pink.trade_reconstruction slot images identified three stacked defects, all in derivation code (none in exchange facts): (1) fill events carried BingX's MARKET protective bound price (0.229, +22% off tape) instead of the true fill price; (2) realized_pnl() and mark_price() multiplied PnL by slot.leverage (exchange leverage), but slot.size is already exchange quantity, so every leg was 3× inflated; (3) the Python settle baseline _last_settled_pnl resets empty on every restart, so reconcile-adopted slots re-settle carried PnL. Exact replay of leg 1: 26,007 × (0.229 − 0.1878)/0.1878 × 0.1878 × 3 = 3,214.4652 ✓ matches the booked increment to the cent.
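The leg-1 arithmetic above can be reproduced in a few lines. This is a minimal sketch: `buggy_leg_pnl` and `leverage_free_leg_pnl` are hypothetical names, and the buggy shape is inferred from the replay formula in the paragraph, not copied from the kernel source.

```python
import math


def buggy_leg_pnl(qty: float, entry: float, price: float, leverage: float) -> float:
    # Shape inferred from the replay arithmetic (assumption):
    # fractional move x entry notional per unit x exchange leverage.
    return qty * (price - entry) / entry * entry * leverage


def leverage_free_leg_pnl(qty: float, entry: float, price: float, side: int) -> float:
    # side = +1 for long, -1 for short; no leverage factor, because qty is
    # already the exchange quantity.
    return side * qty * (price - entry)


# Poisoned bound price 0.229 with the 3x leverage factor: the booked increment.
booked = buggy_leg_pnl(26_007, 0.1878, 0.229, 3)
# Same leg with the exit price from the incident summary (0.1866), no leverage.
clean = leverage_free_leg_pnl(26_007, 0.1878, 0.1866, side=-1)

print(round(booked, 4), round(clean, 4))
```

The gap between the two figures comes from two independent factors (wrong price, extra leverage multiplier), which is why both the adapter and the kernel need fixing.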

A fourth structural finding: there are three parallel ledgers (Rust AccountState K/E, Python AccountProjection — the one persistence reads, fee-blind — and AccountProjectionV2, dead in the live path). This spec consolidates to E-facts as ledger of record + K as integrity checksum + one atomic published snapshot.


1. Scope and non-goals

IN SCOPE

  1. Commit + activate the Phase-0 fixes already in the working tree.
  2. E-anchored published capital; single atomic account snapshot.
  3. Per-trade PnL provenance (exchange | kernel_estimate) end-to-end.
  4. Sizer feedback off trade-realized PnL (not capital deltas).
  5. Persistence hygiene: duplicate row emission, silent async-insert loss, event_seq stamping, bars_held clamp, naive-UTC timestamps.
  6. Kernel hardening leftovers: resolve_slot no-match sentinel, FILL_SETTLED realized override of flagged estimate legs.

OUT OF SCOPE (separate tickets)

  • BLUE's exit-path masking bug (LINK $1,248, TODO_TP_SCAN_CADENCE_BUGFIX.md) — BLUE stack, not DITAv2.
  • VIOLET fork, sub-second clock, venue price-feed port, cadence quantizer.
  • ch_writer head-of-line poison-row parking redesign (mitigations land here; the full parking-lane design is its own task).
  • prefect.db / ClickHouse TTL disk remediation.

HARD INVARIANTS — MUST NOT CHANGE

  • Dual leverage: slot.size = exchange quantity; slot.leverage = exchange leverage (1–3x cap, set at BingX API); our-leverage (conviction) = size × entry_price / capital, computed only at pink_direct._hz_publish (line ~911). PnL is therefore leverage-free: qty × Δprice, side-signed. Do not touch the conviction→exchange mapping (round_half_even_linear_0.5_to_9.0_to_1_to_exchange_cap) or target_size computation.
  • Exits are never skipped (exec-router invariant set, §16 kernel ref).
  • BLUE-parity policy contract: DecisionEngine/IntentEngine inputs (MarketSnapshot + capital + slot state) unchanged in shape.
  • Namespace isolation: zero writes to dolphin.* / dolphin_prodgreen.* or BLUE/PRODGREEN HZ maps. Re-verify with pink_ctl.py mode-verify.
  • Data cadences are sacred (operator rule 2026-06-10): never reduce a data cadence for throughput.

2. Phase 0 — Commit and activate the already-applied fixes

These changes exist UNCOMMITTED in the working tree as of 2026-06-11 ~16:30. Verify each hunk, commit as one reviewed unit, then restart dolphin_pink.

0.1 prod/clean_arch/dita_v2/_rust_kernel/src/lib.rs

| Function | Change (already applied) |
|---|---|
| KernelCore::realized_pnl (~line 1153) | PnL = side-signed qty × (exit − entry); no leverage factor; returns 0 when entry<=0, exit_size<=0, exit_price<=0, or any input is !finite |
| TradeSlot::mark_price (~line 394) | no × leverage in unrealized; a mark NEVER becomes entry basis; missing basis flags metadata.entry_basis_missing=true, unrealized stays 0 |
| KernelCore::fill_matches_order (new) | identity match on venue_order_id / venue_client_id |
| KernelCore::apply_fill | entry/exit routing by ORDER IDENTITY first, FSM state second (!id_matches_exit / !id_matches_entry guards); entry basis = VWAP across entry fills ((prev_basis×prev_filled + price×fill)/accumulated); price-less exit fill reduces size, books 0 PnL, flags metadata.realized_skipped_no_price=true |
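The two arithmetic rules in the table (VWAP entry basis and leverage-free realized PnL with its zero-return guards) can be sketched in Python for reference. The real implementation is Rust in lib.rs; the function names and signatures below are illustrative assumptions only.

```python
import math


def vwap_basis(prev_basis: float, prev_filled: float, price: float, fill: float) -> float:
    # (prev_basis x prev_filled + price x fill) / accumulated, as in the table.
    accumulated = prev_filled + fill
    if accumulated <= 0:
        return 0.0
    return (prev_basis * prev_filled + price * fill) / accumulated


def realized_pnl(side: int, entry: float, exit_price: float, exit_size: float) -> float:
    # Guard: any non-positive or non-finite input books 0 rather than a guess.
    inputs = (entry, exit_price, exit_size)
    if any((not math.isfinite(v)) or v <= 0 for v in inputs):
        return 0.0
    # side = +1 long, -1 short; no leverage factor anywhere.
    return side * exit_size * (exit_price - entry)


# Two entry fills at different prices give a quantity-weighted basis.
basis = vwap_basis(0.0, 0.0, 0.1870, 100.0)
basis = vwap_basis(basis, 100.0, 0.1890, 300.0)
print(basis, realized_pnl(-1, basis, 0.1866, 400.0))
```

Returning 0 on a bad input pairs with the metadata flags in the table: the leg is visibly marked instead of silently booking a wrong number.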

Rebuild required: cargo build --release in _rust_kernel/ (the .so is only auto-built when missing — source/binary drift is a known hazard; add the build to the commit checklist). cargo test: 32/32 green as of spec.

0.2 prod/clean_arch/dita_v2/bingx_venue.py

Fill events must carry a TRUE fill price or 0.0, never the order's nominal price / submit receipt.price (BingX MARKET bound price, ±20–25%):

  • _events_from_submit fill event (~line 585): _row_float(ack_row, "avgPrice","ap","lastFillPrice","L", default=0.0)
  • _event_from_row (~line 697): fills use the same true-price chain; non-fill events (ACK/CANCEL/REJECT) may keep nominal price as info
  • _fill_event_from_row (~line 736): "lastFillPrice","L","avgPrice","ap"
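A minimal sketch of the true-price chain listed above. `row_float` here is a stand-in written for illustration (the real `_row_float` helper in bingx_venue.py may differ), and treating a zero or non-finite field as "absent" is an assumption.

```python
import math
from typing import Any, Mapping


def row_float(row: Mapping[str, Any], *keys: str, default: float = 0.0) -> float:
    # Return the first key that parses to a finite, positive float.
    for key in keys:
        raw = row.get(key)
        if raw is None or raw == "":
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue
        if math.isfinite(value) and value > 0:
            return value
    return default


def true_fill_price(row: Mapping[str, Any]) -> float:
    # Deliberately never consults the nominal "price" field, which for a
    # BingX MARKET order holds the protective bound, not the fill.
    return row_float(row, "avgPrice", "ap", "lastFillPrice", "L", default=0.0)


poisoned = {"price": "0.229", "avgPrice": "0"}
clean = {"price": "0.229", "avgPrice": "0.1866"}
print(true_fill_price(poisoned), true_fill_price(clean))
```

With this shape a poisoned row degrades to 0.0, which the kernel then handles through the price-less exit path instead of booking against the bound price.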

0.3 prod/clean_arch/dita_v2/rust_backend.py

  • reconcile_from_slots: seeds _last_settled_pnl[slot_id] = slot.realized_pnl and _slot_was_closed[slot_id] = slot.closed for every adopted slot.
  • restore_state: same re-anchoring after successful restore.
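Why the re-anchoring matters can be shown with a toy baseline tracker. This is a sketch under assumptions: `SettleBaseline`, `adopt` and `increment` are hypothetical names, and only the `_last_settled_pnl` attribute comes from the spec.

```python
class SettleBaseline:
    """Tracks how much of each slot's realized PnL has already been settled."""

    def __init__(self) -> None:
        self._last_settled_pnl: dict[int, float] = {}

    def adopt(self, slot_id: int, realized_pnl: float) -> None:
        # Re-anchor on reconcile/restore: carried PnL counts as settled.
        self._last_settled_pnl[slot_id] = realized_pnl

    def increment(self, slot_id: int, realized_pnl: float) -> float:
        # Settle only the delta since the last settled figure.
        delta = realized_pnl - self._last_settled_pnl.get(slot_id, 0.0)
        self._last_settled_pnl[slot_id] = realized_pnl
        return delta


# After a restart the baseline is empty. Without adopt(), the first venue
# event re-settles the whole carried 100.0 on top of the new 30.0.
unanchored = SettleBaseline()
resettled = unanchored.increment(slot_id=0, realized_pnl=130.0)

anchored = SettleBaseline()
anchored.adopt(slot_id=0, realized_pnl=100.0)
incremental = anchored.increment(slot_id=0, realized_pnl=130.0)
print(resettled, incremental)
```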

0.4 Adjacent fixes riding the same commit

  • prod/ch_writer.py: insert URLs append &date_time_input_format=best_effort; flush errors log at WARNING (first 10 + every 100th), counter _flush_errors.
  • prod/clean_arch/dita_v2/blue_parity.py price_of: hyphen-tolerant fallback (FET-USDT → FETUSDT); fixes the unmanaged-position block.
  • prod/clickhouse/users.xml: date_time_input_format=best_effort for the dolphin user (NOTE: running CH container did not honor it even after restart — the container does not mount compose configs; effective on next compose recreation. The client-side URL param is the operative fix.)
  • prod/tests/test_dita_v2_kernel.py: partial→full fill test updated to incremental filled_size semantics (BingX WS lastFilledQty).
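The flush-error throttle in the ch_writer.py item (first 10, then every 100th) reduces to a one-line predicate. The sketch below uses a hypothetical function name; only the `_flush_errors` counter name comes from the spec.

```python
def should_log_flush_error(flush_errors: int) -> bool:
    # flush_errors is the running count AFTER incrementing for this failure.
    return flush_errors <= 10 or flush_errors % 100 == 0


# Which of the first 300 failures would reach the log at WARNING.
logged = [n for n in range(1, 301) if should_log_flush_error(n)]
print(logged)
```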

0.5 Phase 0 gates

  1. cargo test in _rust_kernel: 32/32.
  2. pytest prod/tests/test_dita_v2_kernel.py: 7/7.
  3. pytest prod/clean_arch/dita_v2/test_exec_router_runtime.py test_venue_reconcile.py test_orphan_prevention.py prod/tests/test_pink_async_fill_pump.py prod/clean_arch/dita_v2/test_account_core_v2.py test_bingx_bugs.py: 134/134.
  4. KNOWN pre-existing failures (NOT introduced by this work — verified by hunk-revert): 4 tests in prod/tests/test_dita_v2_bingx_adapter.py (snapshot-fill emission broke when sync submit() started passing None snapshots on 2026-06-10). Fix or quarantine them explicitly in this phase — do not let them mask new regressions.
  5. Restart dolphin_pink at a FLAT moment; verify in logs: no realized_skipped_no_price storms, no entry_basis_missing on fresh entries, first round-trip books PnL within ±(fees+slippage) of GET /openApi/swap/v2/user/income for the same trade.

3. Phase 1 — E-anchored published capital

Goal: the capital that persistence/HZ/sizer see is exchange-anchored; K never publishes.

3.1 prod/clean_arch/dita_v2/account.py

  • Add to AccountSnapshot: capital_source: str ("e_anchored" | "k_bridged" | "seed"), e_wallet_balance: float, event_seq: int.
  • New method AccountProjection.anchor_to_exchange(wallet_balance: float, available_margin: float, event_seq: int): sets capital = wallet_balance (guard >0 and finite — the zero-wb frame lesson), capital_source = "e_anchored", recomputes equity. settle() remains for the BRIDGE case only: between anchors, capital += realized (capital_source="k_bridged").
  • settle(realized_pnl, fees): stop ignoring fees; book capital += realized_pnl − fees (today fees only accumulate in fees_paid; published capital ignores them between reseeds).
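The anchor/bridge behaviour described in this list could look roughly like the following. It is a sketch, not the AccountProjection code: `AccountSketch` is a hypothetical stand-in, the boolean return value is an assumption, and equity recomputation is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class AccountSketch:
    capital: float = 0.0
    fees_paid: float = 0.0
    capital_source: str = "seed"
    e_wallet_balance: float = 0.0
    available_margin: float = 0.0
    event_seq: int = 0

    def anchor_to_exchange(self, wallet_balance: float,
                           available_margin: float, event_seq: int) -> bool:
        # Guard: reject zero, negative, NaN and infinite balances
        # (the zero-wb frame lesson) and leave state untouched.
        if not math.isfinite(wallet_balance) or wallet_balance <= 0:
            return False
        self.capital = wallet_balance
        self.e_wallet_balance = wallet_balance
        self.available_margin = available_margin
        self.event_seq = event_seq
        self.capital_source = "e_anchored"
        return True

    def settle(self, realized_pnl: float, fees: float) -> None:
        # BRIDGE case only: between anchors, capital moves by PnL net of fees.
        self.fees_paid += fees
        self.capital += realized_pnl - fees
        self.capital_source = "k_bridged"


acct = AccountSketch()
acct.anchor_to_exchange(1000.0, 900.0, event_seq=1)
acct.settle(realized_pnl=10.0, fees=0.5)
print(acct.capital, acct.capital_source)
```

The next balance-bearing update simply overwrites the bridged figure, so any estimate error is bounded by the interval between anchors.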

3.2 prod/clean_arch/runtime/pink_direct.py

  • The existing reseed path (balance-bearing ACCOUNT_UPDATE → kernel.reset_and_seed(wb)) additionally calls kernel.account.anchor_to_exchange(...) — one anchoring action, two ledgers consistent.
  • Boot seed (launcher exchange_balance_capital block, pink_direct ~line 262) goes through anchor_to_exchange instead of direct attribute writes.

3.3 Gates

  • New unit tests (prod/tests/test_pink_account_anchor.py): anchor sets capital/source; zero/negative/NaN wb rejected; settle bridges with fees; anchor after bridge snaps to wb exactly.
  • Shadow check (live, 24 h on VST): published capital vs GET /openApi/swap/v2/user/balance polled 1/min — max |Δ| outside a trade-settlement window ≤ $0.01; during settlement ≤ pending-fee bound.

4. Phase 2 — Single atomic snapshot, ledger consolidation

Goal: one immutable, versioned account snapshot; the two redundant ledgers demoted/removed.

4.1 prod/clean_arch/dita_v2/account.py

  • Make the published snapshot immutable-replace: AccountProjection builds a new frozen AccountSnapshot (carry event_seq) on every mutation and swaps a single reference (GIL-atomic). Readers must take snap = kernel.account.snapshot once per use (audit call sites: pink_clickhouse.py, hazelcast_projection.py HZ writer, pink_direct).
  • AccountProjectionV2: DELETE, or move to prod/clean_arch/dita_v2/_attic/ with a module docstring pointing here. Its only live-path import is exchange_event.py; migrate that import or the dataclasses it uses (EPosition is genuinely useful; keep it in account.py).
  • The Rust AccountState K-ledger STAYS — demoted by documentation and by Phase 1 (it no longer feeds published capital): its jobs are reconcile classification (R1-style), capital_frozen, and E-dark bridging. Update the module docstring to say exactly this.
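The immutable-replace pattern from the first item of this list can be sketched with a frozen dataclass and a single reference swap. Class and field names other than `event_seq`, `capital_source` and `snapshot` are assumptions; the real AccountSnapshot carries more fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SnapshotSketch:
    capital: float
    capital_source: str
    event_seq: int


class ProjectionSketch:
    def __init__(self, capital: float) -> None:
        self._snapshot = SnapshotSketch(capital, "seed", 0)

    @property
    def snapshot(self) -> SnapshotSketch:
        # Readers take this reference ONCE per use and read every field
        # from the same object, so they never observe a half-applied update.
        return self._snapshot

    def _publish(self, **changes) -> None:
        current = self._snapshot
        # Build a complete new frozen object, then swap one reference.
        self._snapshot = replace(current, event_seq=current.event_seq + 1, **changes)


proj = ProjectionSketch(1000.0)
snap = proj.snapshot                      # reader takes the reference once
proj._publish(capital=1009.5, capital_source="k_bridged")
print(snap.capital, proj.snapshot.capital, proj.snapshot.event_seq)
```

A reader holding `snap` keeps a consistent (capital, source, seq) triple even though the projection has moved on.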

4.2 prod/clean_arch/persistence/pink_clickhouse.py

  • Read capital/equity/peak/trade_seq from the single snapshot reference; no recomputation.
  • Add columns to emitted rows (and the matching ALTER TABLE DDLs under prod/clickhouse/pink/08_provenance.sql; apply DDLs to CH BEFORE deploying code that emits them; the missing-table head-of-line jam of 2026-06-11 is the cautionary tale):
    • account_events, status_snapshots: capital_source LowCardinality(String) DEFAULT '', account_event_seq UInt64 DEFAULT 0
    • trade_events, trade_exit_legs: pnl_source LowCardinality(String) DEFAULT '' (exchange | kernel_estimate)
  • bars_held: clamp to max(0, …) at row-build time (UInt16 column; negative values currently 400 on trade_events / silently vanish on async tables).
  • Timestamps: route every ts through one helper emitting naive-UTC microsecond ISO (no +00:00) — best_effort already tolerates both, but rows must stop depending on a parser setting.
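The bars_held clamp and the single timestamp helper might look like this. Both function names are hypothetical; the space separator and the choice to treat naive inputs as already-UTC are assumptions to confirm against the existing row builders.

```python
from datetime import datetime, timezone


def clamp_bars_held(bars_held: int) -> int:
    # UInt16 column: a negative value must never reach the row.
    return max(0, int(bars_held))


def ch_ts(ts: datetime) -> str:
    # Convert aware datetimes to UTC and drop tzinfo, so the rendered
    # string never carries a "+00:00" suffix. Naive datetimes are assumed
    # to be UTC already.
    if ts.tzinfo is not None:
        ts = ts.astimezone(timezone.utc).replace(tzinfo=None)
    return ts.isoformat(sep=" ", timespec="microseconds")


aware = datetime(2026, 6, 11, 14, 30, 0, 123456, tzinfo=timezone.utc)
print(clamp_bars_held(-3), ch_ts(aware))
```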

4.3 Duplicate-emission fix (same file)

Every CH row is currently emitted twice (visible in any query). Hunt the double call: instrument _sink() with a per-(table, content-hash) debug counter in a test, then trace the two call paths (suspect: persist_result invoked both from the runtime step and from the fill pump for the same event). Fix at the caller level; do NOT dedupe by content in the sink (masks real double-events). Regression test: one simulated round trip → exactly one row per logical event per table.
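One way to build the per-(table, content-hash) counter for the regression test is a drop-in sink double. This is a test-only sketch: `CountingSink` is a hypothetical name and the `(table, row)` call signature of `_sink()` is assumed.

```python
import hashlib
import json
from collections import Counter


class CountingSink:
    """Test double: counts emissions per (table, content hash)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, table: str, row: dict) -> None:
        payload = json.dumps(row, sort_keys=True, default=str).encode()
        digest = hashlib.sha256(payload).hexdigest()
        self.counts[(table, digest)] += 1

    def duplicates(self) -> dict:
        # Any entry here means the same logical row was emitted twice.
        return {key: n for key, n in self.counts.items() if n > 1}


sink = CountingSink()
row = {"trade_id": "t1", "pnl": 164.0}
sink("trade_events", row)
sink("trade_events", dict(row))   # second call path emitting the same event
sink("account_events", {"capital": 1000.0})
print(len(sink.duplicates()))
```

The counter lives in the test, not in production: it locates the second caller, while the fix itself stays at the caller level as required above.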

4.4 prod/ch_writer.py

  • wait_for_async_insert: "1" for ALL dolphin_pink tables (accounting rows must never be silently lost; the spool absorbs latency). Keep 0 acceptable only for high-volume shadow tables if measured necessary — document any exception inline.
  • Mitigation for head-of-line (full redesign out of scope): after attempts > 1000 on a row, log ERROR with the CH response body once per 100 attempts (today the reject reason is invisible without manual replay).

4.5 Gates

  • Full offline suite (the 533+ DITAv2/PINK set) green, minus the Phase-0 quarantined adapter tests if still open.
  • One live VST round trip: every table gets exactly one row per event; pnl_source/capital_source populated; CH system.text_log shows zero parse rejections for dolphin_pink.

5. Phase 3 — Sizer feedback off trade-realized PnL

THE one seam where this refactor can silently change alpha behavior.

5.1 prod/clean_arch/runtime/pink_direct.py: _sizer_trade_feedback (~line 1453)

Today: pnl = acc.capital - self._sizer_entry_capital (capital delta). Under E-anchored capital this absorbs funding, fees of other activity, and foreign fills from the shared VST account (PRODGREEN collision class). Change to:

pnl = slot_realized_for_trade(trade_id)   # Σ slot.realized_pnl legs, i.e.
                                          # kernel estimate, overridden by
                                          # exchange rp when settled (5.2)

Source: the closing slot dict already carries realized_pnl; use it (minus the fees recorded for the trade when available) instead of the capital delta. Keep the magnitude semantics the sizer expects (sign + rough size — per the existing comment, bucket/streak multipliers only need that).
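A small sketch of the difference between the two feedback sources. `trade_feedback_pnl` and the shape of the leg dicts are assumptions for illustration; only the `realized_pnl` key comes from the spec.

```python
from typing import Iterable, Mapping, Optional


def trade_feedback_pnl(legs: Iterable[Mapping[str, float]],
                       fees: Optional[float] = None) -> float:
    # Sum of the trade's own realized legs, minus its recorded fees when known.
    realized = sum(leg["realized_pnl"] for leg in legs)
    return realized - (fees or 0.0)


legs = [{"realized_pnl": 100.0}, {"realized_pnl": 64.0}]

# Capital-delta feedback is contaminated by a foreign fill that lands on the
# shared account while the trade is open (500.0 here, purely illustrative).
entry_capital = 10_000.0
capital_after = entry_capital + 164.0 + 500.0
capital_delta = capital_after - entry_capital

print(capital_delta, trade_feedback_pnl(legs), trade_feedback_pnl(legs, fees=2.5))
```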

5.2 Exchange override (E-led repair) — bingx_user_stream.py + rust_backend.py

  • The WS FILL_SETTLED path already carries the exchange's realized (rp) and fee (n, sign-flipped at boundary per BingX quirks memory). Extend the kernel account-event payload with trade_id, and on receipt:
    • if the matching slot leg was flagged realized_skipped_no_price, ADD the exchange realized to slot.realized_pnl (repair) and clear the flag; settle the increment through the normal baseline mechanism;
    • else record pnl_source="exchange" for the trade-event row (the estimate stays as the booked figure unless |estimate − rp| exceeds a tolerance; then log ERROR + emit an anomaly_events row; do NOT silently re-book).
  • Rust: add dita_kernel_repair_realized(slot_id, amount) FFI (or fold the repair into on_account_event with slot_id in payload). Keep it idempotent via the existing account-event dedup.
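The two branches of the FILL_SETTLED handling can be sketched as a pure function. `LegSketch`, `apply_fill_settled` and the `(increment, anomaly)` return shape are hypothetical; real idempotency is meant to come from the existing account-event dedup, not from this function.

```python
from dataclasses import dataclass


@dataclass
class LegSketch:
    realized_pnl: float = 0.0
    realized_skipped_no_price: bool = False
    pnl_source: str = "kernel_estimate"


def apply_fill_settled(leg: LegSketch, rp: float, tolerance: float) -> tuple[float, bool]:
    """Return (increment to settle, anomaly flag)."""
    if leg.realized_skipped_no_price:
        # Repair: the leg booked 0 earlier, so add the exchange figure now.
        leg.realized_pnl += rp
        leg.realized_skipped_no_price = False
        leg.pnl_source = "exchange"
        return rp, False
    # Estimate stays booked; a large gap is flagged, never silently re-booked.
    leg.pnl_source = "exchange"
    return 0.0, abs(leg.realized_pnl - rp) > tolerance


flagged = LegSketch(realized_pnl=0.0, realized_skipped_no_price=True)
first = apply_fill_settled(flagged, rp=164.0, tolerance=1.0)
second = apply_fill_settled(flagged, rp=164.0, tolerance=1.0)  # replayed event

drifted = LegSketch(realized_pnl=242.7)
mismatch = apply_fill_settled(drifted, rp=164.0, tolerance=1.0)
print(first, second, mismatch, drifted.realized_pnl)
```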

5.3 Gates

  • Unit: feedback receives trade-realized, not capital delta (simulate a foreign-fill capital jump mid-trade → feedback unaffected).
  • Unit: price-less exit leg + later FILL_SETTLED repair → slot realized equals exchange rp; settle baseline consistent (no double-settle).
  • Parity: test_blue_parity.py, test_alpha_blue_untouched_g7.py green (sizer behavior unchanged for normal fills).

6. Phase 4 — Kernel hardening leftovers

6.1 lib.rs: resolve_slot (~line 1099)

Falls back to slot 0 when nothing matches. Change: return Option<usize>; on None, on_venue_event returns UNRESOLVED_SLOT (diagnostic exists already) without mutating any slot, severity WARNING, event recorded in outcome details. Python callers: the runtime treats UNRESOLVED_SLOT as a logged no-op (the _fill_is_ours filter remains first-line defense; this is kernel-side defense for venue-agnostic reuse). NOTE: several tests construct events with slot_id=-1 expecting slot-0 fallback — update them to pass explicit slot_id=0 (behavioral test change; list each in the PR description).
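The no-match sentinel is a Rust change (Option<usize>), but its contract can be illustrated in Python with `Optional[int]`. The matching rules, slot dict shape and outcome dict below are assumptions; only the UNRESOLVED_SLOT diagnostic and the WARNING severity come from the spec.

```python
import copy
from typing import Optional

UNRESOLVED_SLOT = "UNRESOLVED_SLOT"


def resolve_slot(slots: list, slot_id: int, venue_order_id: str) -> Optional[int]:
    # Explicit slot id wins; otherwise match by order identity; otherwise None.
    if 0 <= slot_id < len(slots):
        return slot_id
    for idx, slot in enumerate(slots):
        if venue_order_id and venue_order_id in slot["order_ids"]:
            return idx
    return None  # no silent fallback to slot 0


def on_venue_event(slots: list, event: dict) -> dict:
    idx = resolve_slot(slots, event.get("slot_id", -1), event.get("venue_order_id", ""))
    if idx is None:
        # Logged no-op: nothing is mutated, the event is kept for diagnosis.
        return {"diagnostic": UNRESOLVED_SLOT, "severity": "WARNING", "event": event}
    slots[idx]["events_seen"] += 1
    return {"diagnostic": "OK", "slot": idx}


slots = [{"order_ids": {"A1"}, "events_seen": 0},
         {"order_ids": {"B7"}, "events_seen": 0}]
before = copy.deepcopy(slots)
foreign = on_venue_event(slots, {"slot_id": -1, "venue_order_id": "ZZ"})
ours = on_venue_event(copy.deepcopy(slots), {"slot_id": -1, "venue_order_id": "B7"})
print(foreign["diagnostic"], ours)
```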

6.2 ID-less fill routing (documentation + metric, not code)

BingX WS omits clientOrderId, so identity routing can't always engage. Add a counter metric (fills_routed_by_state_total) via an anomaly_events row per occurrence, severity INFO — gives VIOLET the data to justify per-venue synthetic ids later. No FSM behavior change.

6.3 Gates

  • New Rust tests: unresolved event mutates nothing; entry-id fill during EXIT_WORKING routes to entry (already covered by Phase-0 routing — add the explicit case); price-less exit leg books 0 + flag.

7. Test matrix (run-order for the implementing agent)

| Stage | Command (env: PYTHONPATH=/mnt/dolphinng5_predict:/mnt/dolphinng5_predict/nautilus_dolphin, venv /home/dolphin/siloqy_env/bin/python3) | Pass bar |
|---|---|---|
| Rust unit | cargo test --release in _rust_kernel/ | 100% |
| Kernel FSM | pytest prod/tests/test_dita_v2_kernel.py | 100% |
| Bridge/accounting | pytest prod/tests/test_pink_ditav2_kernel_bridge.py test_pink_ditav2_accounting_invariants.py prod/clean_arch/dita_v2/test_account_core_v2.py | 100% |
| Runtime/reconcile | pytest prod/clean_arch/dita_v2/test_venue_reconcile.py test_orphan_prevention.py test_exec_router_runtime.py prod/tests/test_pink_async_fill_pump.py test_pink_direct_runtime.py | 100% |
| Chaos | pytest prod/tests/test_pink_ditav2_chaos_harness.py + test_dita_v2_e2e_functional.py | 100% |
| Parity | pytest prod/clean_arch/dita_v2/test_blue_parity.py test_alpha_blue_untouched_g7.py | 100% |
| Adapter | pytest prod/tests/test_dita_v2_bingx_adapter.py | 100% after Phase-0 item 4 resolution |
| LIVE VST E2E | python prod/ops/dita_v2_live_bingx_smoke.py --pink --symbol TRXUSDT | suite green |
| Golden replays (NEW: write these) | prod/tests/test_pink_accounting_golden.py | see below |
| Shadow soak | 24–48 h on VST | capital vs balance ≤ $0.01 idle |

Golden replay tests (the heart of the acceptance)

Feed the kernel the recorded FET event sequence (entry fills 195,259 + 7,017 @ 0.1878; exit fills 26,007 + remainder; the poisoned variant with price=0.229 and the clean variant with 0.1866):

  1. Clean prices → realized = (0.1878 − 0.1866) × 202,276 ≈ +242.7 gross.
  2. Poisoned price (0.229) reaching the kernel anyway → with the adapter fix it must arrive as 0.0 → leg books 0 + realized_skipped_no_price; after synthetic FILL_SETTLED rp=+164 → slot realized = +164, pnl_source=exchange.
  3. Restart mid-position (save_state/restore_state + reconcile_from_slots) → next venue event settles ONLY the incremental PnL.
  4. VWAP: two entry fills at different prices → basis = weighted average.
  5. Dual-leverage invariant: same fills at exchange-leverage 1 vs 3 → identical realized PnL; only margin fields differ.
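Cases 1 and 5 can be prototyped against a pure-Python stand-in before wiring them to the real kernel. This is a sketch only: `replay_realized` is a hypothetical helper, the margin formula (notional / leverage) is an assumption, and the exit remainder is taken as 202,276 − 26,007 = 176,269.

```python
import math


def replay_realized(side: int, entry_fills, exit_fills, leverage: float):
    """Return (realized PnL, initial margin) for a list of (price, qty) fills."""
    basis, filled = 0.0, 0.0
    for price, qty in entry_fills:
        basis = (basis * filled + price * qty) / (filled + qty)  # VWAP basis
        filled += qty
    # Leverage is deliberately absent from the PnL line.
    realized = sum(side * qty * (price - basis) for price, qty in exit_fills)
    margin = basis * filled / leverage  # only margin depends on leverage
    return realized, margin


entries = [(0.1878, 195_259), (0.1878, 7_017)]
exits = [(0.1866, 26_007), (0.1866, 176_269)]

pnl_1x, margin_1x = replay_realized(-1, entries, exits, leverage=1)
pnl_3x, margin_3x = replay_realized(-1, entries, exits, leverage=3)
print(round(pnl_1x, 4), round(pnl_3x, 4), margin_1x > margin_3x)
```

In the real golden file the same assertions would run against kernel output, with this arithmetic serving as the independent expected value.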

8. Rollout & rollback

  1. Each phase = one PR-sized commit, gates green before the next.
  2. Activation requires supervisorctl restart dolphin_pink — restart at a FLAT moment (check DOLPHIN_STATE_PINK + exchange positions). The restart-reconcile path is itself under test here; first restart after Phase 0 should be watched live.
  3. Rollback = git revert of the phase commit + rebuild .so + restart. The Rust .so MUST be rebuilt on both apply and revert — stale-binary drift is how the incremental-fill change sat uncompiled until 2026-06-11.
  4. CH DDLs are additive (ADD COLUMN ... DEFAULT) — no destructive migrations anywhere in this spec; rollback leaves unused columns, which is fine.
  5. PINK is VST (virtual funds) — it is the canary by construction. Nothing in this spec touches BLUE files (verify with git diff --name-only against the §38.7 checklist).

9. Done criteria (the whole spec)

  • All phases merged; full matrix green; golden replays green.
  • 48 h VST soak: zero UNEXPLAINED reconcile errors; published capital tracks exchange balance; every closed trade's trade_events.pnl within fees+slippage of the exchange income record, with pnl_source populated.
  • pink_ctl.py mode-verify passes (namespace isolation intact).
  • SYSTEM BIBLE §38 addendum updated (one paragraph: E-led ledger, K as checksum, provenance fields) + DITA_V2_KERNEL_REFERENCE.md §"Capital simplification" rewritten to match reality.