docs: VIBRISS spec (+ §10.6 cascade/adaptive-TP paramsets), PINK accounting fix spec, BLUE incident docs

VIBRISS_PARAMETER_GOVERNANCE_SPEC §10.6: ob_cascade.count_threshold
(currently cascade_count>0 = ONE asset widens every TP x1.40),
tp_widen_factor, withdrawal_velocity_threshold as governance candidates;
adaptive/Dynamic-TP threshold marked fit for VIBRISS governance; TP_FLOOR
joint-policy reward requirement.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Codex
2026-06-12 15:04:15 +02:00
parent f4ff1cd9b7
commit c3a18f693a
4 changed files with 3653 additions and 0 deletions


@@ -0,0 +1,182 @@
# Critical Violet Design: BLUE hydration bug
Date: 2026-06-11
## Summary
This incident is a BLUE hydration / restore bug on the XTZUSDT short trade `863c21da`.
The important facts are:
1. The XTZ trade was real and opened at `2026-06-11 17:22:12.678265+00:00`.
2. The trade did **not** close via TP, SL, or MAX_HOLD before hydration.
3. The restore path later rebuilt the open slot from `position_state` and `trade_reconstruction`.
4. The restored state had a chain-token mismatch, but the engine continued with the derived token instead of hard-failing.
5. A later hydrate-time stop was recorded at `2026-06-11 18:35:52.789008+00:00` with `STOP_LOSS`.
6. The ledger shows the next trade was admitted while XTZ was still officially open, which violates the single-slot invariant.
## Trade identity
- trade_id: `863c21da`
- asset: `XTZUSDT`
- side: `SHORT`
- entry price: `0.2276`
- entry notional: `56484.4305702418`
- leverage: `6.374647927191287`
- entry bar: `238`
- tp_base_pct: `0.002`
- tp_effective_pct: `0.0019999655500463724`
## Ledger evidence
### Open record
`dolphin.trade_reconstruction` contains the canonical open record:
- ts: `2026-06-11 17:22:12.911989`
- event_type: `OPEN`
- event_id: `863c21da:open`
- chain_token: `26852fa25fb5cdaa3b4c354d5e3eea93e27bce0ebdcd0da896d4f981642eeeb2`
The payload confirms:
- `entry_ts = 1781198532678265`
- `entry_bar = 238`
- `retraction_legs = 0`
- `realized_pnl_legs_total = 0.0`
- `chain_mode = LIVE`
- `chain_kind = ROOT`
### No close before hydrate
`dolphin.trade_exit_legs` has no rows for `863c21da`.
`dolphin.trade_events` also has no close row for `863c21da`.
So there is no official TP, SL, or MAX_HOLD exit recorded before the restore/hydration event.
### Decision tape before hydrate
`dolphin.v7_decision_events` shows the trade was live and being evaluated:
- `2026-06-11 17:22:13.274556` `HOLD`
- `2026-06-11 17:22:23.124863` `HOLD`
- `2026-06-11 17:22:45.232894` `HOLD`
- `2026-06-11 17:23:28.274004` `HOLD`
- `2026-06-11 17:24:43.182413` `RETRACT / V7_RISK_DOMINANT`
The best favorable excursion in the pre-hydrate tape was only about `+0.065905094%`, far below the fixed TP threshold of about `0.2%` (`tp_effective_pct ≈ 0.002` as a fraction).
## Restore / hydration behavior
At restore time the engine logged:
- `chain token mismatch on restore: trade=863c21da stored=26852fa25fb5 derived=98875e225e9e — continuing with derived token`
- `position_state RESTORED: XTZUSDT SHORT entry=0.2276 notional=56484 bars_held≈0 trade=863c21da`
The restore path in [`prod/nautilus_event_trader.py`](../nautilus_event_trader.py) does the following:
- reads `position_state`
- reconstructs `restored_entry_bar = max(0, self.bar_idx - stored_bars)`
- loads reconstruction data from `dolphin.trade_reconstruction`
- rebuilds chain state from the persisted payload
- if the stored chain token differs from the derived token, it logs the mismatch and continues with the derived token
Relevant code:
- `_chain_state_from_reconstruction(...)` around lines `3315-3348`
- restore from `position_state` around lines `1944-2058`
The token comparison is therefore a warning-only validator, not a hard guardrail: a mismatch is logged and the restore proceeds.
## Single-slot violation
The next distinct open trade in the reconstruction ledger is:
- ts: `2026-06-11 17:50:50.420620`
- trade_id: `43494ade`
- asset: `TRXUSDT`
- side: `SHORT`
That means the system admitted a new trade while XTZ was still officially open in the ledger.
On a single-slot engine, that should not happen.
## What would have happened without hydration
This is the conservative conclusion from the tape:
- The trade did not hit TP on the observed pre-hydrate tape.
- The trade did not have an official close row before hydration.
- The tape does not contain a clean uninterrupted decision path beyond the first pre-hydrate window.
The best-supported natural outcome from the observed tape is the live `RETRACT` state at `2026-06-11 17:24:43.182413`, where the engine still considered the slot active and the trade had only reached `bars_held = 14`.
At that point:
- `current_price = 0.22765000000000002`
- `pnl_pct = -0.021968365`
- `reason = V7_RISK_DOMINANT`
If that retract state had been executed immediately, the estimated trade PnL would have been:
- `-12.4087058758423` USDT on the recorded notional
- trade ROI: `-0.021968365%`
The max-hold clock also would have forced a decision long before the 18:35 restore:
- trade-specific `market_state_max_hold_bars = 102`
- live tape reached `bars_held = 14` by `17:24:43`
- at an ~11 second cadence, the max-hold boundary would have arrived around `17:40-17:41`
So the 18:35 stop-loss is not the natural continuation of the original entry. It is a restore-time artifact on top of a stale open slot.
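Both estimates above are plain arithmetic on the tape values. The minimal sketch below reproduces them. The helper names are hypothetical and do not come from `nautilus_event_trader.py`. The fixed 11-second cadence is the same approximation used above, and the sketch also derives the cadence the tape actually implies.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the XTZUSDT trade 863c21da tape in this document.
ENTRY_PRICE = 0.2276
NOTIONAL_USDT = 56484.4305702418
ENTRY_TS = datetime(2026, 6, 11, 17, 22, 12, 678265, tzinfo=timezone.utc)
RETRACT_TS = datetime(2026, 6, 11, 17, 24, 43, 182413, tzinfo=timezone.utc)


def short_pnl_pct(entry: float, price: float) -> float:
    """Short-side PnL in percent units (-0.02 means -0.02%, not -2%)."""
    return (entry - price) / entry * 100.0


def pnl_usdt(pnl_pct: float, notional: float) -> float:
    """Convert a percent-unit PnL into USDT on the recorded notional."""
    return notional * pnl_pct / 100.0


def observed_bar_seconds(start: datetime, end: datetime, bars: int) -> float:
    """Average bar cadence implied by the decision tape."""
    return (end - start).total_seconds() / bars


def max_hold_deadline(entry_ts: datetime, max_hold_bars: int, bar_seconds: float) -> datetime:
    """Wall-clock time at which max_hold_bars elapse at a fixed cadence."""
    return entry_ts + timedelta(seconds=max_hold_bars * bar_seconds)


if __name__ == "__main__":
    pct = short_pnl_pct(ENTRY_PRICE, 0.22765000000000002)
    print(f"retract pnl: {pct:.9f}% = {pnl_usdt(pct, NOTIONAL_USDT):.4f} USDT")
    cadence = observed_bar_seconds(ENTRY_TS, RETRACT_TS, 14)
    deadline = max_hold_deadline(ENTRY_TS, 102, 11.0)
    print(f"cadence ~{cadence:.2f}s, max-hold at 11s/bar: {deadline:%H:%M:%S}")
```

The cadence the tape implies (about 10.75 s per bar) puts the boundary slightly earlier than the 11 s figure does. Both land in the 17:40 minute.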
What is observable is the hydrated-path close that actually got booked:
- exit ts: `2026-06-11 18:35:52.789008+00:00`
- exit reason: `STOP_LOSS`
- exit price: `0.23526757499999998`
- realized pnl_pct: `-0.033056485743551446`
- realized net_pnl: `-1913.155101369921`
That realized stop corresponds to:
- price move against the short of about `3.3056%`
- account-level ROI of about `-2.726636%` using capital before exit (`70165.39`)
## Root cause
The bug is the restore path itself:
1. The open trade state was preserved in `trade_reconstruction`.
2. The current `position_state` snapshot was lossy or stale enough to rehydrate with `bars_held≈0`.
3. The chain token mismatch was detected, but the code explicitly continues with the derived token.
4. The engine therefore recovered continuity without enforcing strict equality between the live open chain and the reconstructed state.
That combination makes orphaned trades possible after a bad hydrate.
## Operational impact
- The XTZ short remained open in the ledger with no formal close.
- The engine later allowed a new trade while the slot should still have been occupied.
- Capital accounting diverged from the true live slot history.
- The restore path masked the inconsistency instead of stopping the recovery.
## Recommended fix direction
1. Treat a chain-token mismatch on restore as a hard failure for BLUE when a live open slot exists.
2. Preserve the original `entry_bar` and bar counter from the open-chain payload instead of reconstructing them from the current `position_state` row when the two disagree materially.
3. Refuse to admit a new trade until the single-slot invariant is proven flat.
4. Add a regression test for:
- open XTZ trade
- stale `position_state`
- chain-token mismatch
- no new trade admission while the open slot remains unresolved
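Fix directions 1 and 3 can be sketched as a strict restore check plus a single-slot admission gate. This is an illustration under assumptions, not the trader's code. `RestoreHalt`, `OpenChain`, `restore_open_slot` and `SingleSlotGate` are hypothetical names. The non-live branch keeps today's continue-with-derived-token behavior.

```python
from dataclasses import dataclass, replace
from typing import Optional


class RestoreHalt(RuntimeError):
    """Raised when a restored open slot cannot be proven consistent with its chain."""


@dataclass(frozen=True)
class OpenChain:
    trade_id: str
    chain_token: str
    entry_bar: int  # taken from the open-chain payload, never re-derived from bars_held


def restore_open_slot(stored: OpenChain, derived_token: str, live_open_slot: bool) -> OpenChain:
    """Strict restore: a token mismatch on a live open slot stops the recovery."""
    if stored.chain_token == derived_token:
        return stored
    if live_open_slot:
        raise RestoreHalt(
            f"chain token mismatch on restore: trade={stored.trade_id} "
            f"stored={stored.chain_token[:12]} derived={derived_token[:12]}"
        )
    # No live slot at stake: keep the existing behavior and adopt the derived token.
    return replace(stored, chain_token=derived_token)


class SingleSlotGate:
    """Refuses new admissions until the one slot is proven flat."""

    def __init__(self) -> None:
        self._open_trade: Optional[str] = None

    def may_admit(self) -> bool:
        return self._open_trade is None

    def mark_open(self, trade_id: str) -> None:
        if self._open_trade is not None:
            raise RuntimeError(
                f"single-slot violation: {self._open_trade} still open, refusing {trade_id}"
            )
        self._open_trade = trade_id

    def mark_closed(self, trade_id: str) -> None:
        if self._open_trade == trade_id:
            self._open_trade = None
```

In the XTZ incident, a gate like this would have rejected `43494ade` at 17:50 because `863c21da` had no close row.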
## Bottom line
XTZ was a real open trade.
It never got a clean pre-hydrate exit.
The restore path tolerated chain drift and rebuilt a misleading open state.
The best-supported outcome without the bad hydration is the 17:24 retract, roughly flat to slightly negative (about `-12.41` USDT).
The realized hydrated-path loss was `-3.3056485743551446%` on the position and `-2.726636%` of capital before exit, but that is a restore artifact, not the natural end of the original trade.


@@ -0,0 +1,131 @@
# MALFORMED_OPEN_RESTORE_BUG
## Summary
BLUE was repeatedly rehydrating after startup because `dolphin.position_state` contained stale `OPEN` rows with zero effective size.
The restore path treated those rows as fatal:
- it selected the latest `OPEN` row per `trade_id`
- it accepted that row even when `quantity` or `notional` had been driven to `0`
- it hard-stopped on `position_state row invalid quantity ...`
- `supervisord` then restarted the trader
- the next startup read the same bad row again
That created a restart loop.
This was observed most clearly on the `2026-06-11` BLUE window. The recurring bad row was the legacy `ATOMUSDT` leg `1a3d2f9c`, which was persisted as:
- `status = OPEN`
- `quantity = 0`
- `notional = 0`
- `bars_held = 34`
That row is not a live position. It is a stale snapshot that should have been treated as tombstoned history.
## Root Cause
The bad rows were self-inflicted by the partial-retract path in `nautilus_event_trader.py`.
Before the fix:
1. `_apply_internal_retract()` shrank the live position.
2. It wrote a new `position_state` row with `status="OPEN"` for the remaining leg.
3. If the remaining size rounded to zero, the row still existed as an `OPEN` snapshot.
4. A later startup restore could pick that row and treat it as authoritative.
That is enough to leave behind `OPEN` rows with:
- `quantity = 0`
- `notional = 0`
These are not valid live positions, but they looked like one to the old restore logic.
There is a second contributing factor in the restore path:
- the restore code historically trusted the latest `OPEN` candidate too early
- zero-sized `OPEN` rows were only rejected after the row had already been chosen as the best candidate
- rejection used a hard failure path, which made the process exit instead of trying the next sane source
That means the persistence bug and the restore policy bug reinforced each other.
## Observable Symptoms
- repeated `restore candidate parse failed from capital_update_ledger: 'list' object has no attribute 'get'`
- repeated `position_state row invalid quantity for trade ...: 0.0`
- `RESTORE HALT`
- immediate restart by `supervisord`
The chain-token mismatch logs were a separate warning. They were not the restart trigger.
The capital-ledger parse warning is also distinct:
- it indicates the ledger file is list-shaped, not a dict
- it forces restore to rely more heavily on the other state surfaces
- it is noisy, but it is not what actually killed the process in this incident
## Fix Applied
Two changes were made.
### 1. Stop writing zero-sized `OPEN` rows
In `_apply_internal_retract()`:
- compute `remaining_qty`
- if the remaining size is effectively zero, treat the retract as a full close
- return the forced exit without emitting a new `position_state` row with `status="OPEN"`
This prevents the bad row from being created in the first place.
### 2. Make restore skip legacy bad `OPEN` rows
In `_restore_position_state()`:
- the ClickHouse restore query now filters `OPEN` rows with `quantity > 0 AND notional > 0`
- if an invalid candidate still appears, restore logs and rejects it instead of hard-halting the process
- restore falls back to HZ state or flat continuation rather than turning a stale row into a restart loop
This is important because the ClickHouse warehouse already contains stale history. The fix is not only to stop producing new malformed rows; it also has to prevent old rows from re-triggering the same failure path on the next reboot.
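The restore-side policy can be sketched as "pick the latest row per `trade_id`, then log and skip anything that is not a sane `OPEN` snapshot". This is a hypothetical Python illustration. `PositionRow`, its `written_at` ordering field and `pick_restore_candidate` are assumptions, not the real `_restore_position_state()`.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

log = logging.getLogger("restore_sketch")


@dataclass(frozen=True)
class PositionRow:
    trade_id: str
    status: str
    quantity: float
    notional: float
    written_at: float  # hypothetical write timestamp used for ordering


def latest_per_trade(rows: Iterable[PositionRow]) -> Dict[str, PositionRow]:
    latest: Dict[str, PositionRow] = {}
    for row in rows:
        current = latest.get(row.trade_id)
        if current is None or row.written_at > current.written_at:
            latest[row.trade_id] = row
    return latest


def pick_restore_candidate(rows: Iterable[PositionRow]) -> Optional[PositionRow]:
    """Newest sane OPEN snapshot, or None so the caller falls back to HZ or flat."""
    candidates = []
    for row in latest_per_trade(rows).values():
        if row.status != "OPEN":
            continue
        if row.quantity <= 0 or row.notional <= 0:
            # Legacy zero-size OPEN row: treat as tombstoned history, never as fatal.
            log.warning("skipping malformed OPEN row trade=%s qty=%s notional=%s",
                        row.trade_id, row.quantity, row.notional)
            continue
        candidates.append(row)
    return max(candidates, key=lambda r: r.written_at, default=None)
```

One design choice is worth noting. This sketch applies the sanity check after it picks the latest row per `trade_id`, so an older, pre-retract `OPEN` row for the same trade cannot be resurrected.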
### 3. Keep the full-close path coherent
The retract path now computes `remaining_qty` explicitly and treats `remaining_notional <= 1e-9` or `remaining_qty <= 0.0` as a full close.
That means:
- a full retract does not leave a zero-size `OPEN` snapshot behind
- the exit is finalized as a close, not as a pseudo-open partial state
- the runtime slot is removed cleanly instead of being left in a half-closed limbo
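The full-close rule can be sketched in a few lines. The thresholds come from the text above. `LivePosition`, `RetractOutcome`, the row dict shape and the use of a mark price for `remaining_notional` are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

NOTIONAL_EPS = 1e-9  # full-close threshold quoted above


@dataclass
class LivePosition:
    trade_id: str
    quantity: float


@dataclass
class RetractOutcome:
    full_close: bool
    remaining_qty: float
    open_row: Optional[dict]  # position_state row to persist; None on a full close


def apply_internal_retract(pos: LivePosition, retract_qty: float, mark_price: float) -> RetractOutcome:
    remaining_qty = max(0.0, pos.quantity - retract_qty)
    remaining_notional = remaining_qty * mark_price
    if remaining_qty <= 0.0 or remaining_notional <= NOTIONAL_EPS:
        # Finalize as a close: never persist an OPEN snapshot for a zero-size leg.
        pos.quantity = 0.0
        return RetractOutcome(True, 0.0, None)
    pos.quantity = remaining_qty
    row = {"trade_id": pos.trade_id, "status": "OPEN",
           "quantity": remaining_qty, "notional": remaining_notional}
    return RetractOutcome(False, remaining_qty, row)
```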
## Verification Added
Regression tests were added for both sides:
- full-close retracts no longer emit zero-sized `OPEN` rows
- restore skips zero-sized `OPEN` candidates without setting `restore_failed`
The tests use the existing retract and restore harnesses:
- one test seeds a tiny short leg that collapses to zero on retract and asserts no `OPEN` zero-size row is written
- one test feeds a zero-sized `OPEN` `position_state` row into restore and asserts restore does not hard-halt
## Operational Impact
After this fix:
- stale zero-sized `OPEN` rows no longer restart BLUE
- malformed open snapshots are quarantined as legacy garbage
- the live runtime can continue from a sane source instead of bouncing on the same bad record
## What This Does Not Fix
This change does not rewrite historical ClickHouse rows already present in the warehouse.
It only changes:
- new retract writes
- restore selection and rejection policy
- restart behavior when the old garbage is encountered
If you want the historical ledger cleaned up, that is a separate reconciliation task. The current patch is intentionally conservative and only stops the bad row from causing further damage.


@@ -0,0 +1,362 @@
# PINK / DITAv2 Accounting & Execution Fix — Spec and Dev Guide
**Status**: SPEC — ready for implementation agent
**Date**: 2026-06-11
**Branch**: `exp/pink-ditav2-sprint0-20260530` (continue on it or fork `fix/pink-accounting-consolidation`)
**Author of spec**: forensic session 2026-06-11 (FET $5,990.90 mis-book replay)
**Prerequisite for**: VIOLET rebuild (`violet_subsecond_rebuild_plan` memory / future plan session)
---
## 0. Why this exists — the incident in one paragraph
On 2026-06-11 PINK closed a FET-USDT short that the exchange settled at
**+$164 net** (entry VWAP 0.1878, exit 0.1866, ~202K FET) but the kernel
booked **$5,990.90** and capital diverged $6,154 from the exchange wallet.
Replay against `dolphin_pink.trade_reconstruction` slot images identified
three stacked defects, all in *derivation* code (none in exchange facts):
(1) fill events carried BingX's MARKET **protective bound price** (0.229,
+22% off tape) instead of the true fill price; (2) `realized_pnl()` and
`mark_price()` multiplied PnL by `slot.leverage` (exchange leverage — but
`slot.size` is exchange *quantity*, so every leg was 3× inflated); (3) the
Python settle baseline `_last_settled_pnl` resets empty on every restart,
so reconcile-adopted slots re-settle carried PnL. Exact replay of leg 1:
`26,007 × (0.229 - 0.1878)/0.1878 × 0.1878 × 3 = 3,214.4652` ✓ matches the
booked increment to the cent.
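The leg-1 replay can be checked directly. This sketch only reproduces the arithmetic, and the function names are illustrative rather than the kernel's. The leverage-free figure still uses the poisoned `0.229` price, so defect (1) is deliberately left in. The expression is also left unsigned, exactly as in the replay above.

```python
def booked_leg_pnl_pre_fix(qty: float, entry: float, exit_price: float, exchange_leverage: float) -> float:
    """The mis-booked formula as replayed: return on entry, rescaled by entry, times leverage."""
    return qty * (exit_price - entry) / entry * entry * exchange_leverage


def leg_pnl_leverage_free(qty: float, entry: float, exit_price: float) -> float:
    """The same price move without the leverage factor: qty x price delta."""
    return qty * (exit_price - entry)


# Leg 1 of the FET replay: 26,007 FET, entry basis 0.1878, poisoned bound price 0.229, leverage 3.
booked = booked_leg_pnl_pre_fix(26_007, 0.1878, 0.229, 3)
fixed = leg_pnl_leverage_free(26_007, 0.1878, 0.229)
print(round(booked, 4), round(fixed, 4), round(booked / fixed, 6))
```

The ratio between the two figures is exactly the exchange leverage, which is the 3× inflation described in defect (2).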
A fourth structural finding: there are **three parallel ledgers** (Rust
`AccountState` K/E, Python `AccountProjection` — the one persistence reads,
fee-blind — and `AccountProjectionV2`, dead in the live path). This spec
consolidates to **E-facts as ledger of record + K as integrity checksum +
one atomic published snapshot**.
---
## 1. Scope and non-goals
IN SCOPE
1. Commit + activate the Phase-0 fixes already in the working tree.
2. E-anchored published capital; single atomic account snapshot.
3. Per-trade PnL provenance (`exchange | kernel_estimate`) end-to-end.
4. Sizer feedback off trade-realized PnL (not capital deltas).
5. Persistence hygiene: duplicate row emission, silent async-insert loss,
`event_seq` stamping, `bars_held` clamp, naive-UTC timestamps.
6. Kernel hardening leftovers: `resolve_slot` no-match sentinel,
FILL_SETTLED realized override of flagged estimate legs.
OUT OF SCOPE (separate tickets)
- BLUE's exit-path masking bug (LINK $1,248, `TODO_TP_SCAN_CADENCE_BUGFIX.md`) — BLUE stack, not DITAv2.
- VIOLET fork, sub-second clock, venue price-feed port, cadence quantizer.
- ch_writer head-of-line poison-row parking redesign (mitigations land here;
the full parking-lane design is its own task).
- prefect.db / ClickHouse TTL disk remediation.
HARD INVARIANTS — MUST NOT CHANGE
- **Dual leverage**: `slot.size` = exchange quantity; `slot.leverage` =
exchange leverage (13x cap, set at BingX API); *our*-leverage
(conviction) = `size × entry_price / capital`, computed only at
`pink_direct._hz_publish` (line ~911). PnL is therefore **leverage-free**:
`qty × Δprice`, side-signed. Do not touch the conviction→exchange mapping
(`round_half_even_linear_0.5_to_9.0_to_1_to_exchange_cap`) or
`target_size` computation.
- **Exits are never skipped** (exec-router invariant set, §16 kernel ref).
- **BLUE-parity policy contract**: `DecisionEngine`/`IntentEngine` inputs
(MarketSnapshot + capital + slot state) unchanged in shape.
- **Namespace isolation**: zero writes to `dolphin.*` / `dolphin_prodgreen.*`
or BLUE/PRODGREEN HZ maps. Re-verify with `pink_ctl.py mode-verify`.
- **Data cadences are sacred** (operator rule 2026-06-10): never reduce a
data cadence for throughput.
---
## 2. Phase 0 — Commit and activate the already-applied fixes
These changes exist UNCOMMITTED in the working tree as of 2026-06-11 ~16:30.
Verify each hunk, commit as one reviewed unit, then restart `dolphin_pink`.
### 0.1 `prod/clean_arch/dita_v2/_rust_kernel/src/lib.rs`
| Function | Change (already applied) |
|---|---|
| `KernelCore::realized_pnl` (~line 1153) | PnL = side-signed `qty × (exit - entry)`; **no leverage factor**; returns 0 when `entry<=0`, `exit_size<=0`, `exit_price<=0`, or any input is non-finite |
| `TradeSlot::mark_price` (~line 394) | no `× leverage` in unrealized; a mark NEVER becomes entry basis — missing basis flags `metadata.entry_basis_missing=true`, unrealized stays 0 |
| `KernelCore::fill_matches_order` (new) | identity match on `venue_order_id` / `venue_client_id` |
| `KernelCore::apply_fill` | entry/exit routing by ORDER IDENTITY first, FSM state second (`!id_matches_exit` / `!id_matches_entry` guards); entry basis = **VWAP across entry fills** (`(prev_basis×prev_filled + price×fill)/accumulated`); price-less exit fill reduces size, books 0 PnL, flags `metadata.realized_skipped_no_price=true` |
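For reviewers who want to reason about the first and last rows outside Rust, here is a Python mirror of the described `realized_pnl` guards and the VWAP entry basis. It is not the kernel code. The function names and the `"LONG"`/`"SHORT"` side encoding are assumptions.

```python
import math


def realized_pnl(side: str, entry: float, exit_size: float, exit_price: float) -> float:
    """Leverage-free, side-signed realized PnL; 0.0 on any invalid input."""
    values = (entry, exit_size, exit_price)
    if not all(math.isfinite(v) for v in values):
        return 0.0
    if entry <= 0 or exit_size <= 0 or exit_price <= 0:
        return 0.0  # e.g. a price-less exit fill books 0 (and is flagged elsewhere)
    direction = {"LONG": 1.0, "SHORT": -1.0}[side]  # a SHORT profits when price falls
    return direction * exit_size * (exit_price - entry)


def vwap_entry_basis(prev_basis: float, prev_filled: float, price: float, fill: float) -> float:
    """Entry basis as VWAP across entry fills: (prev_basis*prev_filled + price*fill) / accumulated."""
    accumulated = prev_filled + fill
    if accumulated <= 0:
        return prev_basis
    return (prev_basis * prev_filled + price * fill) / accumulated
```

Exchange leverage never appears in either function. That is the dual-leverage invariant: the same fills produce the same realized PnL at any exchange leverage.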
Rebuild required: `cargo build --release` in `_rust_kernel/` (the `.so` is
only auto-built when missing — **source/binary drift is a known hazard**;
add the build to the commit checklist). `cargo test`: 32/32 green as of spec.
### 0.2 `prod/clean_arch/dita_v2/bingx_venue.py`
Fill events must carry a TRUE fill price or 0.0 — never the order's nominal
`price` / submit `receipt.price` (BingX MARKET bound price, ±20-25% off tape):
- `_events_from_submit` fill event (~line 585): `_row_float(ack_row,
"avgPrice","ap","lastFillPrice","L", default=0.0)`
- `_event_from_row` (~line 697): fills use the same true-price chain;
non-fill events (ACK/CANCEL/REJECT) may keep nominal `price` as info
- `_fill_event_from_row` (~line 736): `"lastFillPrice","L","avgPrice","ap"`
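The true-price chain amounts to "first usable price among these keys, else 0.0, and never the nominal `price`". Below is a hypothetical stand-in for `_row_float`. Its real semantics are not shown here; in particular, treating `"0"` as unusable and moving to the next key is an assumption.

```python
import math
from typing import Any, Mapping


def true_fill_price(row: Mapping[str, Any], *keys: str, default: float = 0.0) -> float:
    """Return the first finite, positive price among keys; never reads the nominal 'price'."""
    for key in keys:
        raw = row.get(key)
        if raw is None or raw == "":
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue
        if math.isfinite(value) and value > 0:
            return value
    return default
```

Usage: `true_fill_price(ack_row, "avgPrice", "ap", "lastFillPrice", "L")` returns 0.0 for a row that only carries the MARKET bound `price`. Downstream, that 0.0 makes the kernel book 0 and set `realized_skipped_no_price`.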
### 0.3 `prod/clean_arch/dita_v2/rust_backend.py`
- `reconcile_from_slots`: seeds `_last_settled_pnl[slot_id] = slot.realized_pnl`
and `_slot_was_closed[slot_id] = slot.closed` for every adopted slot.
- `restore_state`: same re-anchoring after successful restore.
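Why the re-anchoring matters can be shown with a per-slot settle baseline. This is a minimal sketch. The dictionary names follow the text, but the `SettleBaseline` class and its methods are hypothetical, not `rust_backend.py`.

```python
from typing import Dict


class SettleBaseline:
    """Tracks the last settled realized PnL per slot so only increments are settled."""

    def __init__(self) -> None:
        self._last_settled_pnl: Dict[int, float] = {}
        self._slot_was_closed: Dict[int, bool] = {}

    def reseed(self, slot_id: int, realized_pnl: float, closed: bool) -> None:
        """What reconcile_from_slots / restore_state now do for every adopted slot."""
        self._last_settled_pnl[slot_id] = realized_pnl
        self._slot_was_closed[slot_id] = closed

    def settle_increment(self, slot_id: int, realized_pnl: float) -> float:
        """Return the not-yet-settled part of the slot's cumulative realized PnL."""
        previous = self._last_settled_pnl.get(slot_id, 0.0)
        self._last_settled_pnl[slot_id] = realized_pnl
        return realized_pnl - previous
```

Without `reseed`, a restarted process sees an empty baseline and re-settles the slot's whole carried PnL. With it, only the new increment is settled.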
### 0.4 Adjacent fixes riding the same commit
- `prod/ch_writer.py`: insert URLs append `&date_time_input_format=best_effort`;
flush errors log at WARNING (first 10 + every 100th), counter `_flush_errors`.
- `prod/clean_arch/dita_v2/blue_parity.py` `price_of`: hyphen-tolerant
fallback (`FET-USDT` → `FETUSDT`) — fixes the unmanaged-position block.
- `prod/clickhouse/users.xml`: `date_time_input_format=best_effort` for the
`dolphin` user (NOTE: running CH container did not honor it even after
restart — the container does not mount compose configs; effective on next
compose recreation. The client-side URL param is the operative fix.)
- `prod/tests/test_dita_v2_kernel.py`: partial→full fill test updated to
incremental `filled_size` semantics (BingX WS `lastFilledQty`).
### 0.5 Phase 0 gates
1. `cargo test` in `_rust_kernel`: 32/32.
2. `pytest prod/tests/test_dita_v2_kernel.py`: 7/7.
3. `pytest prod/clean_arch/dita_v2/test_exec_router_runtime.py
test_venue_reconcile.py test_orphan_prevention.py
prod/tests/test_pink_async_fill_pump.py
prod/clean_arch/dita_v2/test_account_core_v2.py test_bingx_bugs.py`: 134/134.
4. KNOWN pre-existing failures (NOT introduced by this work — verified by
hunk-revert): 4 tests in `prod/tests/test_dita_v2_bingx_adapter.py`
(snapshot-fill emission broke when sync `submit()` started passing None
snapshots on 2026-06-10). Fix or quarantine them explicitly in this phase
— do not let them mask new regressions.
5. Restart `dolphin_pink` at a FLAT moment; verify in logs: no
`realized_skipped_no_price` storms, no `entry_basis_missing` on fresh
entries, first round-trip books PnL within ±(fees+slippage) of
`GET /openApi/swap/v2/user/income` for the same trade.
---
## 3. Phase 1 — E-anchored published capital
**Goal**: the capital that persistence/HZ/sizer see is exchange-anchored;
K never publishes.
### 3.1 `prod/clean_arch/dita_v2/account.py`
- Add to `AccountSnapshot`: `capital_source: str` (`"e_anchored" |
"k_bridged" | "seed"`), `e_wallet_balance: float`, `event_seq: int`.
- New method `AccountProjection.anchor_to_exchange(wallet_balance: float,
available_margin: float, event_seq: int)`: sets `capital = wallet_balance`
(guard `>0` and finite — the zero-wb frame lesson), `capital_source =
"e_anchored"`, recomputes equity. `settle()` remains for the BRIDGE case
only: between anchors, capital += realized (`capital_source="k_bridged"`).
- `settle(realized_pnl, fees)`: **stop ignoring fees**; use
`capital += realized_pnl - fees` (today fees only accumulate in `fees_paid`;
published capital ignores them between reseeds).
### 3.2 `prod/clean_arch/runtime/pink_direct.py`
- The existing reseed path (balance-bearing ACCOUNT_UPDATE →
`kernel.reset_and_seed(wb)`) additionally calls
`kernel.account.anchor_to_exchange(...)` — one anchoring action, two
ledgers consistent.
- Boot seed (launcher `exchange_balance_capital` block, pink_direct ~line
262) goes through `anchor_to_exchange` instead of direct attribute writes.
### 3.3 Gates
- New unit tests (`prod/tests/test_pink_account_anchor.py`):
anchor sets capital/source; zero/negative/NaN wb rejected; settle bridges
with fees; anchor after bridge snaps to wb exactly.
- Shadow check (live, 24 h on VST): published capital vs
`GET /openApi/swap/v2/user/balance` polled 1/min — max |Δ| outside a
trade-settlement window ≤ $0.01; during settlement ≤ pending-fee bound.
---
## 4. Phase 2 — Single atomic snapshot, ledger consolidation
**Goal**: one immutable, versioned account snapshot; the two redundant
ledgers demoted/removed.
### 4.1 `prod/clean_arch/dita_v2/account.py`
- Make the published snapshot **immutable-replace**: `AccountProjection`
builds a new frozen `AccountSnapshot` (carry `event_seq`) on every
mutation and swaps a single reference (GIL-atomic). Readers must take
`snap = kernel.account.snapshot` once per use (audit call sites:
`pink_clickhouse.py`, `hazelcast_projection.py` HZ writer, `pink_direct`).
- `AccountProjectionV2`: DELETE, or move to `prod/clean_arch/dita_v2/
_attic/` with a module docstring pointing here. Its only live-path import
is `exchange_event.py` — migrate that import or the dataclasses it uses
(`EPosition` is genuinely useful; keep it in `account.py`).
- The Rust `AccountState` K-ledger STAYS — demoted by documentation and by
Phase 1 (it no longer feeds published capital): its jobs are reconcile
classification (R1-style), `capital_frozen`, and E-dark bridging. Update
the module docstring to say exactly this.
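The immutable-replace pattern can be sketched with a frozen dataclass and `dataclasses.replace`. Under these assumptions, `ProjectionSketch` is hypothetical, and equity is simply set equal to capital (no open positions) for brevity. Each mutation rebinds one attribute, so a reader that takes `snap = projection.snapshot` once always sees a consistent set of fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AccountSnapshot:
    capital: float
    equity: float
    capital_source: str
    event_seq: int


class ProjectionSketch:
    """Every mutation builds a new frozen snapshot and swaps one reference."""

    def __init__(self, seed_capital: float) -> None:
        self.snapshot = AccountSnapshot(seed_capital, seed_capital, "seed", 0)

    def _publish(self, **changes) -> None:
        current = self.snapshot
        # Single attribute rebinding: readers see either the old or the new snapshot.
        self.snapshot = replace(current, event_seq=current.event_seq + 1, **changes)

    def settle(self, realized_pnl: float, fees: float) -> None:
        capital = self.snapshot.capital + realized_pnl - fees
        self._publish(capital=capital, equity=capital, capital_source="k_bridged")
```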
### 4.2 `prod/clean_arch/persistence/pink_clickhouse.py`
- Read capital/equity/peak/trade_seq from the single snapshot reference;
no recomputation.
- Add columns to emitted rows (and the matching `ALTER TABLE` DDLs under
`prod/clickhouse/pink/08_provenance.sql` — **apply DDLs to CH BEFORE
deploying code that emits them**; the missing-table head-of-line jam of
2026-06-11 is the cautionary tale):
- `account_events`, `status_snapshots`: `capital_source LowCardinality(String) DEFAULT ''`,
`account_event_seq UInt64 DEFAULT 0`
- `trade_events`, `trade_exit_legs`: `pnl_source LowCardinality(String) DEFAULT ''`
(`exchange` | `kernel_estimate`)
- `bars_held`: clamp to `max(0, …)` at row-build time (UInt16 column;
negative values are currently rejected with HTTP 400 on `trade_events` and
silently vanish on async-insert tables).
- Timestamps: route every `ts` through one helper emitting **naive-UTC
microsecond ISO** (no `+00:00`) — best_effort already tolerates both, but
rows must stop depending on a parser setting.
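Both row-build rules fit in two small helpers. The names are hypothetical. The space separator matches the ledger timestamps quoted elsewhere in these docs, and rejecting naive inputs (rather than assuming they are UTC) is a design assumption.

```python
from datetime import datetime, timezone


def clamp_bars_held(value: float) -> int:
    """bars_held lands in a UInt16 column; negatives must never reach ClickHouse."""
    return max(0, int(value))


def ch_naive_utc(ts: datetime) -> str:
    """Naive-UTC ISO timestamp with microseconds and no offset suffix."""
    if ts.tzinfo is None:
        raise ValueError("naive datetime is ambiguous; pass a timezone-aware value")
    utc = ts.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat(sep=" ", timespec="microseconds")
```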
### 4.3 Duplicate-emission fix (same file)
Every CH row is currently emitted twice (visible in any query). Hunt the
double call: instrument `_sink()` with a per-(table, content-hash) debug
counter in a test, then trace the two call paths (suspect: `persist_result`
invoked both from the runtime step and from the fill pump for the same
event). Fix at the caller level; do NOT dedupe by content in the sink
(masks real double-events). Regression test: one simulated round trip →
exactly one row per logical event per table.
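A test-only counting sink gives the debug counter described above. It counts and never drops rows, so it does not dedupe. `CountingSink` is a hypothetical name, and injecting it in place of `_sink()` assumes the persistence layer allows that substitution in tests.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, Tuple


class CountingSink:
    """Test-only stand-in for _sink(): counts rows per (table, content hash)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, table: str, row: Dict[str, Any]) -> None:
        payload = json.dumps(row, sort_keys=True, default=str).encode()
        self.counts[(table, hashlib.sha256(payload).hexdigest())] += 1

    def duplicates(self) -> Dict[Tuple[str, str], int]:
        return {key: n for key, n in self.counts.items() if n > 1}
```

After a simulated round trip, `assert not sink.duplicates()` is the regression check. Any hit points at a double caller to trace.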
### 4.4 `prod/ch_writer.py`
- `wait_for_async_insert`: `"1"` for ALL `dolphin_pink` tables (accounting
rows must never be silently lost; the spool absorbs latency). Keep `0`
acceptable only for high-volume shadow tables if measured necessary —
document any exception inline.
- Mitigation for head-of-line (full redesign out of scope): after
`attempts > 1000` on a row, log ERROR with the CH response body once per
100 attempts (today the reject reason is invisible without manual replay).
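The two logging throttles in `ch_writer.py` (flush errors from §0.4, row rejects here) reduce to simple predicates. This sketch assumes 1-indexed counters and that the first log fires on attempt 1001. Both choices are assumptions.

```python
def should_log_flush_error(n_errors: int) -> bool:
    """Flush errors: log the first 10, then every 100th."""
    return n_errors <= 10 or n_errors % 100 == 0


def should_log_row_reject(attempts: int, threshold: int = 1000, every: int = 100) -> bool:
    """Past `threshold` attempts on one row, log the CH response once per `every` attempts."""
    if attempts <= threshold:
        return False
    return (attempts - threshold - 1) % every == 0
```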
### 4.5 Gates
- Full offline suite (the 533+ DITAv2/PINK set) green, minus the Phase-0
quarantined adapter tests if still open.
- One live VST round trip: every table gets exactly one row per event;
`pnl_source`/`capital_source` populated; CH `system.text_log` shows zero
parse rejections for `dolphin_pink`.
---
## 5. Phase 3 — Sizer feedback off trade-realized PnL
**THE one seam where this refactor can silently change alpha behavior.**
### 5.1 `prod/clean_arch/runtime/pink_direct.py` — `_sizer_trade_feedback` (~line 1453)
Today: `pnl = acc.capital - self._sizer_entry_capital` (capital delta).
Under E-anchored capital this absorbs funding, fees of other activity, and
**foreign fills from the shared VST account** (PRODGREEN collision class).
Change to:
```
pnl = slot_realized_for_trade(trade_id) # Σ slot.realized_pnl legs, i.e.
# kernel estimate, overridden by
# exchange rp when settled (5.2)
```
Source: the closing slot dict already carries `realized_pnl`; use it (minus
the fees recorded for the trade when available) instead of the capital
delta. Keep the magnitude semantics the sizer expects (sign + rough size —
per the existing comment, bucket/streak multipliers only need that).
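A minimal sketch of the replacement feedback value follows. It assumes the closing slot's legs are available as dicts with a `realized_pnl` key, and `trade_realized_feedback` is a hypothetical name. A foreign fill that moves account capital mid-trade has no path into this number.

```python
from typing import Iterable, Mapping, Optional


def trade_realized_feedback(legs: Iterable[Mapping[str, float]], trade_fees: Optional[float] = None) -> float:
    """PnL handed to the sizer: sum of the trade's realized legs, net of fees when known."""
    realized = sum(float(leg.get("realized_pnl", 0.0)) for leg in legs)
    if trade_fees is not None:
        realized -= trade_fees
    return realized
```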
### 5.2 Exchange override (E-led repair) — `bingx_user_stream.py` + `rust_backend.py`
- The WS `FILL_SETTLED` path already carries the exchange's realized (`rp`)
and fee (`n`, sign-flipped at boundary per BingX quirks memory). Extend
the kernel account-event payload with `trade_id`, and on receipt:
- if the matching slot leg was flagged `realized_skipped_no_price`,
ADD the exchange realized to `slot.realized_pnl` (repair) and clear
the flag; settle the increment through the normal baseline mechanism;
- else record `pnl_source="exchange"` for the trade-event row (the
estimate stays as the booked figure unless `|estimate - rp|` exceeds a
tolerance — then log ERROR + emit an `anomaly_events` row; do NOT
silently re-book).
- Rust: add `dita_kernel_repair_realized(slot_id, amount)` FFI (or fold the
repair into `on_account_event` with `slot_id` in payload). Keep it
idempotent via the existing account-event dedup.
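The repair rules can be sketched as one handler. `SlotLeg`, `FillSettledHandler` and the 1.0 USDT tolerance are hypothetical; the spec leaves the tolerance value open. The event-id set stands in for the existing account-event dedup.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SlotLeg:
    realized_pnl: float = 0.0
    realized_skipped_no_price: bool = False
    pnl_source: str = "kernel_estimate"


@dataclass
class FillSettledHandler:
    tolerance_usdt: float = 1.0  # hypothetical tolerance value
    seen_event_ids: Set[str] = field(default_factory=set)
    anomalies: List[dict] = field(default_factory=list)

    def on_fill_settled(self, event_id: str, leg: SlotLeg, rp: float) -> float:
        """Apply one FILL_SETTLED; return the realized increment to settle (0.0 if none)."""
        if event_id in self.seen_event_ids:
            return 0.0  # idempotent: a replayed account event changes nothing
        self.seen_event_ids.add(event_id)
        leg.pnl_source = "exchange"
        if leg.realized_skipped_no_price:
            # E-led repair: the price-less leg booked 0, so the exchange figure is the increment.
            leg.realized_pnl += rp
            leg.realized_skipped_no_price = False
            return rp
        if abs(leg.realized_pnl - rp) > self.tolerance_usdt:
            # Estimate stays booked; surface the disagreement instead of re-booking.
            self.anomalies.append({"event_id": event_id, "estimate": leg.realized_pnl, "rp": rp})
        return 0.0
```

The returned increment is what would flow through the normal settle baseline. That is why a duplicate event must return 0.0 rather than `rp`.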
### 5.3 Gates
- Unit: feedback receives trade-realized, not capital delta (simulate a
foreign-fill capital jump mid-trade → feedback unaffected).
- Unit: price-less exit leg + later FILL_SETTLED repair → slot realized
equals exchange `rp`; settle baseline consistent (no double-settle).
- Parity: `test_blue_parity.py`, `test_alpha_blue_untouched_g7.py` green
(sizer behavior unchanged for normal fills).
---
## 6. Phase 4 — Kernel hardening leftovers
### 6.1 `lib.rs` — `resolve_slot` (~line 1099)
Falls back to **slot 0** when nothing matches. Change: return
`Option<usize>`; on `None`, `on_venue_event` returns
`UNRESOLVED_SLOT` (diagnostic exists already) without mutating any slot,
severity WARNING, event recorded in outcome details. Python callers: the
runtime treats UNRESOLVED_SLOT as a logged no-op (the `_fill_is_ours`
filter remains first-line defense; this is kernel-side defense for
venue-agnostic reuse).
NOTE: several tests construct events with `slot_id=-1` expecting slot-0
fallback — update them to pass explicit `slot_id=0` (behavioral test
change; list each in the PR description).
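Python's `Optional[int]` is the closest analog of the Rust `Option<usize>` change. The sketch below shows the intended control flow, not the kernel. The slot and event dict shapes (`entry_order_id`, `exit_order_id`, `venue_order_id`) are hypothetical.

```python
from typing import Any, Dict, List, Optional

UNRESOLVED_SLOT = "UNRESOLVED_SLOT"


def resolve_slot(slots: List[Dict[str, Any]], event: Dict[str, Any]) -> Optional[int]:
    """Explicit slot_id first, then order identity; None instead of a slot-0 fallback."""
    slot_id = event.get("slot_id", -1)
    if isinstance(slot_id, int) and 0 <= slot_id < len(slots):
        return slot_id
    order_id = event.get("venue_order_id")
    if order_id:
        for idx, slot in enumerate(slots):
            if order_id in (slot.get("entry_order_id"), slot.get("exit_order_id")):
                return idx
    return None


def on_venue_event(slots: List[Dict[str, Any]], event: Dict[str, Any]) -> Dict[str, Any]:
    idx = resolve_slot(slots, event)
    if idx is None:
        # Logged no-op: nothing is mutated, the event travels in the outcome details.
        return {"outcome": UNRESOLVED_SLOT, "severity": "WARNING", "details": dict(event)}
    slots[idx].setdefault("events", []).append(event)
    return {"outcome": "APPLIED", "slot": idx}
```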
### 6.2 ID-less fill routing (documentation + metric, not code)
BingX WS omits clientOrderId, so identity routing can't always engage.
Add a counter metric (`fills_routed_by_state_total`) via an
`anomaly_events` row per occurrence, severity INFO — gives VIOLET the data
to justify per-venue synthetic ids later. No FSM behavior change.
### 6.3 Gates
- New Rust tests: unresolved event mutates nothing; entry-id fill during
EXIT_WORKING routes to entry (already covered by Phase-0 routing — add
the explicit case); price-less exit leg books 0 + flag.
---
## 7. Test matrix (run-order for the implementing agent)
| Stage | Command (env: `PYTHONPATH=/mnt/dolphinng5_predict:/mnt/dolphinng5_predict/nautilus_dolphin`, venv `/home/dolphin/siloqy_env/bin/python3`) | Pass bar |
|---|---|---|
| Rust unit | `cargo test --release` in `_rust_kernel/` | 100% |
| Kernel FSM | `pytest prod/tests/test_dita_v2_kernel.py` | 100% |
| Bridge/accounting | `pytest prod/tests/test_pink_ditav2_kernel_bridge.py test_pink_ditav2_accounting_invariants.py prod/clean_arch/dita_v2/test_account_core_v2.py` | 100% |
| Runtime/reconcile | `pytest prod/clean_arch/dita_v2/test_venue_reconcile.py test_orphan_prevention.py test_exec_router_runtime.py prod/tests/test_pink_async_fill_pump.py test_pink_direct_runtime.py` | 100% |
| Chaos | `pytest prod/tests/test_pink_ditav2_chaos_harness.py` + `test_dita_v2_e2e_functional.py` | 100% |
| Parity | `pytest prod/clean_arch/dita_v2/test_blue_parity.py test_alpha_blue_untouched_g7.py` | 100% |
| Adapter | `pytest prod/tests/test_dita_v2_bingx_adapter.py` | 100% after Phase-0 item 4 resolution |
| LIVE VST E2E | `python prod/ops/dita_v2_live_bingx_smoke.py --pink --symbol TRXUSDT` | suite green |
| **Golden replays (NEW — write these)** | `prod/tests/test_pink_accounting_golden.py` | see below |
| Shadow soak | 24-48 h on VST | capital vs balance ≤ $0.01 idle |
### Golden replay tests (the heart of the acceptance)
Feed the kernel the recorded FET event sequence (entry fills 195,259 +
7,017 @ 0.1878; exit fills 26,007 + remainder; the poisoned variant with
price=0.229 and the clean variant with 0.1866):
1. Clean prices → realized = `(0.1878 - 0.1866) × 202,276 ≈ +242.7` gross.
2. Poisoned price (0.229) reaching the kernel anyway → with the adapter fix
it must arrive as 0.0 → leg books 0 + `realized_skipped_no_price`; after
synthetic FILL_SETTLED rp=+164 → slot realized = +164, `pnl_source=exchange`.
3. Restart mid-position (save_state/restore_state + reconcile_from_slots)
→ next venue event settles ONLY the incremental PnL.
4. VWAP: two entry fills at different prices → basis = weighted average.
5. Dual-leverage invariant: same fills at exchange-leverage 1 vs 3 →
**identical realized PnL**; only margin fields differ.
---
## 8. Rollout & rollback
1. Each phase = one PR-sized commit, gates green before the next.
2. Activation requires `supervisorctl restart dolphin_pink` — restart at a
FLAT moment (check `DOLPHIN_STATE_PINK` + exchange positions). The
restart-reconcile path is itself under test here; first restart after
Phase 0 should be watched live.
3. Rollback = `git revert` of the phase commit + rebuild `.so` + restart.
The Rust `.so` MUST be rebuilt on both apply and revert — stale-binary
drift is how the incremental-fill change sat uncompiled until 2026-06-11.
4. CH DDLs are additive (`ADD COLUMN ... DEFAULT`) — no destructive
migrations anywhere in this spec; rollback leaves unused columns, which
is fine.
5. PINK is VST (virtual funds) — it is the canary by construction. Nothing
in this spec touches BLUE files (verify with `git diff --name-only`
against the §38.7 checklist).
## 9. Done criteria (the whole spec)
- All phases merged; full matrix green; golden replays green.
- 48 h VST soak: zero UNEXPLAINED reconcile errors; published capital
tracks exchange balance; every closed trade's `trade_events.pnl` within
fees+slippage of the exchange income record, with `pnl_source` populated.
- `pink_ctl.py mode-verify` passes (namespace isolation intact).
- SYSTEM BIBLE §38 addendum updated (one paragraph: E-led ledger, K as
checksum, provenance fields) + `DITA_V2_KERNEL_REFERENCE.md` §"Capital
simplification" rewritten to match reality.

File diff suppressed because it is too large.