docs: VIBRISS spec (+ §10.6 cascade/adaptive-TP paramsets), PINK accounting fix spec, BLUE incident docs
VIBRISS_PARAMETER_GOVERNANCE_SPEC §10.6: ob_cascade.count_threshold (currently cascade_count>0 = ONE asset widens every TP x1.40), tp_widen_factor, withdrawal_velocity_threshold as governance candidates; adaptive/Dynamic-TP threshold marked fit for VIBRISS governance; TP_FLOOR joint-policy reward requirement.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
182
prod/docs/CRITICAL_VIOLET_DESIGN__BLUE_HYDRATION_BUG.md
Normal file
@@ -0,0 +1,182 @@
# Critical Violet Design: BLUE hydration bug

Date: 2026-06-11

## Summary

This incident is a BLUE hydration / restore bug on the XTZUSDT short trade `863c21da`.

The important facts are:

1. The XTZ trade was real and opened at `2026-06-11 17:22:12.678265+00:00`.
2. The trade did **not** close via TP, SL, or MAX_HOLD before hydration.
3. The restore path later rebuilt the open slot from `position_state` and `trade_reconstruction`.
4. The restored state had a chain-token mismatch, but the engine continued with the derived token instead of hard-failing.
5. A later hydrate-time stop was recorded at `2026-06-11 18:35:52.789008+00:00` with `STOP_LOSS`.
6. The ledger shows the next trade was admitted while XTZ was still officially open, which violates the single-slot invariant.

## Trade identity

- trade_id: `863c21da`
- asset: `XTZUSDT`
- side: `SHORT`
- entry price: `0.2276`
- entry notional: `56484.4305702418`
- leverage: `6.374647927191287`
- entry bar: `238`
- tp_base_pct: `0.002`
- tp_effective_pct: `0.0019999655500463724`

## Ledger evidence

### Open record

`dolphin.trade_reconstruction` contains the canonical open record:

- ts: `2026-06-11 17:22:12.911989`
- event_type: `OPEN`
- event_id: `863c21da:open`
- chain_token: `26852fa25fb5cdaa3b4c354d5e3eea93e27bce0ebdcd0da896d4f981642eeeb2`

The payload confirms:

- `entry_ts = 1781198532678265`
- `entry_bar = 238`
- `retraction_legs = 0`
- `realized_pnl_legs_total = 0.0`
- `chain_mode = LIVE`
- `chain_kind = ROOT`

### No close before hydrate

`dolphin.trade_exit_legs` has no rows for `863c21da`.

`dolphin.trade_events` also has no close row for `863c21da`.

So there is no official TP, SL, or MAX_HOLD exit recorded before the restore/hydration event.

### Decision tape before hydrate

`dolphin.v7_decision_events` shows the trade was live and being evaluated:

- `2026-06-11 17:22:13.274556` `HOLD`
- `2026-06-11 17:22:23.124863` `HOLD`
- `2026-06-11 17:22:45.232894` `HOLD`
- `2026-06-11 17:23:28.274004` `HOLD`
- `2026-06-11 17:24:43.182413` `RETRACT / V7_RISK_DOMINANT`

The best favorable excursion in the pre-hydrate tape was only about `+0.065905094%`, which is far below the fixed TP threshold.

## Restore / hydration behavior

At restore time the engine logged:

- `chain token mismatch on restore: trade=863c21da stored=26852fa25fb5 derived=98875e225e9e — continuing with derived token`
- `position_state RESTORED: XTZUSDT SHORT entry=0.2276 notional=56484 bars_held≈0 trade=863c21da`

The restore path in [`prod/nautilus_event_trader.py`](../nautilus_event_trader.py) does the following:

- reads `position_state`
- reconstructs `restored_entry_bar = max(0, self.bar_idx - stored_bars)`
- loads reconstruction data from `dolphin.trade_reconstruction`
- rebuilds chain state from the persisted payload
- if the stored chain token differs from the derived token, it logs the mismatch and continues with the derived token

Relevant code:

- `_chain_state_from_reconstruction(...)` around lines `3315-3348`
- restore from `position_state` around lines `1944-2058`

This is a validator, not a hard guardrail.

## Single-slot violation

The next distinct open trade in the reconstruction ledger is:

- ts: `2026-06-11 17:50:50.420620`
- trade_id: `43494ade`
- asset: `TRXUSDT`
- side: `SHORT`

That means the system admitted a new trade while XTZ was still officially open in the ledger.

On a single-slot engine, that should not happen.

## What would have happened without hydration

This is the conservative conclusion from the tape:

- The trade did not hit TP on the observed pre-hydrate tape.
- The trade did not have an official close row before hydration.
- The tape does not contain a clean uninterrupted decision path beyond the first pre-hydrate window.

The best-supported natural outcome from the observed tape is the live `RETRACT` state at `2026-06-11 17:24:43.182413`, where the engine still considered the slot active and the trade had only reached `bars_held = 14`.

At that point:

- `current_price = 0.22765000000000002`
- `pnl_pct = -0.021968365` (expressed in percent, not as a fraction)
- `reason = V7_RISK_DOMINANT`

If that retract state had been executed immediately, the estimated trade PnL would have been:

- `-12.4087058758423` USDT on the recorded notional
- trade ROI: `-0.021968365%`
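
The estimate above can be reproduced from the entry price, the mark at the retract decision, and the recorded notional. This is a minimal sketch, assuming a short's return is `(entry - current) / entry` and ignoring fees and slippage; the function name is illustrative and not taken from the engine.

```python
def short_pnl(entry_price: float, current_price: float, notional: float) -> tuple[float, float]:
    """Return (pnl_fraction, pnl_usdt) for a short position, before fees."""
    pnl_fraction = (entry_price - current_price) / entry_price
    return pnl_fraction, pnl_fraction * notional


# Values copied from the trade identity and the 17:24:43 RETRACT row.
frac, usdt = short_pnl(0.2276, 0.22765000000000002, 56484.4305702418)
print(f"{frac * 100:.6f}%  {usdt:.4f} USDT")
```

Under these assumptions the result agrees with the ledger figures: roughly `-0.021968%` on the position and about `-12.41` USDT.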

The max-hold clock also would have forced a decision long before the 18:35 restore:

- trade-specific `market_state_max_hold_bars = 102`
- live tape reached `bars_held = 14` by `17:24:43`
- at an ~11 second cadence, the max-hold boundary would have arrived around `17:40-17:41`
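
The boundary estimate can be checked with a few lines of arithmetic. The sketch below assumes bars are evenly spaced and counted from the entry timestamp, which the tape only supports approximately; the variable names are illustrative.

```python
from datetime import datetime, timedelta

entry = datetime(2026, 6, 11, 17, 22, 12, 678265)      # trade open
last_obs = datetime(2026, 6, 11, 17, 24, 43, 182413)   # RETRACT row
bars_at_obs = 14
max_hold_bars = 102

# Average bar length implied by the observed tape (about 10.75 s).
secs_per_bar = (last_obs - entry).total_seconds() / bars_at_obs

# Projected time at which bars_held would reach the max-hold limit.
boundary = entry + timedelta(seconds=secs_per_bar * max_hold_bars)
print(f"{secs_per_bar:.2f} s/bar, boundary at {boundary:%H:%M:%S}")
```

With the implied cadence the boundary lands shortly after 17:40, which is consistent with the `17:40-17:41` window quoted above and almost an hour before the 18:35 stop.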

So the 18:35 stop-loss is not the natural continuation of the original entry. It is a restore-time artifact on top of a stale open slot.

What is observable is the hydrated-path close that actually got booked:

- exit ts: `2026-06-11 18:35:52.789008+00:00`
- exit reason: `STOP_LOSS`
- exit price: `0.23526757499999998`
- realized pnl_pct: `-0.033056485743551446`
- realized net_pnl: `-1913.155101369921`

That realized stop corresponds to:

- price move against the short of about `3.3056%`
- account-level ROI of about `-2.726636%` using capital before exit (`70165.39`)

## Root cause

The bug is the restore path itself:

1. The open trade state was preserved in `trade_reconstruction`.
2. The current `position_state` snapshot was lossy or stale enough to rehydrate with `bars_held≈0`.
3. The chain token mismatch was detected, but the code explicitly continues with the derived token.
4. The engine therefore recovered continuity without enforcing strict equality between the live open chain and the reconstructed state.

That combination makes orphaned trades possible after a bad hydrate.

## Operational impact

- The XTZ short remained open in the ledger with no formal close.
- The engine later allowed a new trade while the slot should still have been occupied.
- Capital accounting diverged from the true live slot history.
- The restore path masked the inconsistency instead of stopping the recovery.

## Recommended fix direction

1. Treat a chain-token mismatch on restore as a hard failure for BLUE when a live open slot exists.
2. Preserve the original `entry_bar` and bar counter from the open-chain payload instead of reconstructing them from the current `position_state` row when the two disagree materially.
3. Refuse to admit a new trade until the single-slot invariant is proven flat.
4. Add a regression test for:
   - open XTZ trade
   - stale `position_state`
   - chain-token mismatch
   - no new trade admission while the open slot remains unresolved
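
One possible shape for fixes 1 to 3 is sketched below. Every name here (`RestoreHalt`, `RestoredSlot`, `validate_restore`, `may_admit_new_trade`) is hypothetical and chosen for illustration; the real restore path in `nautilus_event_trader.py` is structured differently and this is not its code.

```python
from dataclasses import dataclass


class RestoreHalt(Exception):
    """Raised when restored state cannot be trusted (hypothetical name)."""


@dataclass
class RestoredSlot:
    trade_id: str
    stored_token: str
    derived_token: str
    entry_bar: int


def validate_restore(slot: RestoredSlot, payload_entry_bar: int) -> RestoredSlot:
    # Fix 1: a token mismatch on a live open slot is fatal, not a warning.
    if slot.stored_token != slot.derived_token:
        raise RestoreHalt(f"chain token mismatch on restore: trade={slot.trade_id}")
    # Fix 2: keep the entry bar recorded in the open-chain payload.
    slot.entry_bar = payload_entry_bar
    return slot


def may_admit_new_trade(open_slots: list[RestoredSlot]) -> bool:
    # Fix 3: single-slot invariant, no admission unless the engine is flat.
    return len(open_slots) == 0
```

A regression test along the lines of item 4 would build a slot with the XTZ tokens from the log (`26852fa25fb5` stored, `98875e225e9e` derived), assert that `validate_restore` raises, and assert that `may_admit_new_trade` stays `False` while that slot is unresolved.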

## Bottom line

XTZ was a real open trade.
It never got a clean pre-hydrate exit.
The restore path tolerated chain drift and rebuilt a misleading open state.
The best-supported no-freeze outcome is the 17:24 retract, roughly flat to slightly negative.
The realized hydrated-path loss was `-3.3056485743551446%` on the position and `-2.726636%` of capital before exit, but that is a restore artifact, not the natural end of the original trade.
131
prod/docs/MALFORMED_OPEN_RESTORE_BUG.md
Normal file
@@ -0,0 +1,131 @@
# MALFORMED_OPEN_RESTORE_BUG

## Summary

BLUE was repeatedly rehydrating after startup because `dolphin.position_state` contained stale `OPEN` rows with zero effective size.

The restore path treated those rows as fatal:

- it selected the latest `OPEN` row per `trade_id`
- it accepted that row even when `quantity` or `notional` had been driven to `0`
- it hard-stopped on `position_state row invalid quantity ...`
- `supervisord` then restarted the trader
- the next startup read the same bad row again

That created a restart loop.

This was observed most clearly on the `2026-06-11` BLUE window. The recurring bad row was the legacy `ATOMUSDT` leg `1a3d2f9c`, which was persisted as:

- `status = OPEN`
- `quantity = 0`
- `notional = 0`
- `bars_held = 34`

That row is not a live position. It is a stale snapshot that should have been treated as tombstoned history.

## Root Cause

The bad rows were self-inflicted by the partial-retract path in `nautilus_event_trader.py`.

Before the fix:

1. `_apply_internal_retract()` shrank the live position.
2. It wrote a new `position_state` row with `status="OPEN"` for the remaining leg.
3. If the remaining size rounded to zero, the row still existed as an `OPEN` snapshot.
4. A later startup restore could pick that row and treat it as authoritative.

That is enough to leave behind `OPEN` rows with:

- `quantity = 0`
- `notional = 0`

These are not valid live positions, but they looked like live positions to the old restore logic.

There is a second contributing factor in the restore path:

- the restore code historically trusted the latest `OPEN` candidate too early
- zero-sized `OPEN` rows were only rejected after the row had already been chosen as the best candidate
- rejection used a hard failure path, which made the process exit instead of trying the next sane source

That means the persistence bug and the restore policy bug reinforced each other.

## Observable Symptoms

- repeated `restore candidate parse failed from capital_update_ledger: 'list' object has no attribute 'get'`
- repeated `position_state row invalid quantity for trade ...: 0.0`
- `RESTORE HALT`
- immediate restart by `supervisord`

The chain-token mismatch logs were a separate warning. They were not the restart trigger.

The capital-ledger parse warning is also distinct:

- it indicates the ledger file is list-shaped, not a dict
- it forces restore to rely more heavily on the other state surfaces
- it is noisy, but it is not what actually killed the process in this incident

## Fix Applied

Three changes were made.

### 1. Stop writing zero-sized `OPEN` rows

In `_apply_internal_retract()`:

- compute `remaining_qty`
- if the remaining size is effectively zero, treat the retract as a full close
- return the forced exit without emitting a new `position_state` row with `status="OPEN"`

This prevents the bad row from being created in the first place.

### 2. Make restore skip legacy bad `OPEN` rows

In `_restore_position_state()`:

- the ClickHouse restore query now filters `OPEN` rows with `quantity > 0 AND notional > 0`
- if an invalid candidate still appears, restore logs and rejects it instead of hard-halting the process
- restore falls back to HZ state or flat continuation rather than turning a stale row into a restart loop

This is important because the repository already contains stale history. The fix is not only to stop producing new malformed rows; it also has to prevent old rows from re-triggering the same failure path on the next reboot.
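
The reject-and-fall-back policy for restore candidates can be sketched as follows. This is an illustration under stated assumptions, not the code of `_restore_position_state()`: the row shape (a dict with `status`, `quantity`, `notional`, `ts`, `trade_id`) and both function names are hypothetical, and `rows` is assumed to already hold only the latest row per `trade_id`.

```python
import logging
import math
from typing import Optional

log = logging.getLogger("restore_sketch")


def is_live_open_row(row: dict) -> bool:
    """True only for OPEN rows with strictly positive, finite size."""
    if row.get("status") != "OPEN":
        return False
    qty = float(row.get("quantity") or 0.0)
    notional = float(row.get("notional") or 0.0)
    return math.isfinite(qty) and math.isfinite(notional) and qty > 0 and notional > 0


def pick_restore_candidate(rows: list[dict]) -> Optional[dict]:
    """Return the newest valid OPEN row, or None to signal fallback."""
    for row in sorted(rows, key=lambda r: r["ts"], reverse=True):
        if is_live_open_row(row):
            return row
        # Reject and keep going instead of halting the process.
        log.warning("rejecting restore candidate trade=%s", row.get("trade_id"))
    return None
```

Returning `None` rather than raising is the point of the design: the caller can then fall back to HZ state or continue flat, so a stale row such as the `ATOMUSDT` leg can no longer drive a `supervisord` restart loop.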

### 3. Keep the full-close path coherent

The retract path now computes `remaining_qty` explicitly and treats `remaining_notional <= 1e-9` or `remaining_qty <= 0.0` as a full close.

That means:

- a full retract does not leave a zero-size `OPEN` snapshot behind
- the exit is finalized as a close, not as a pseudo-open partial state
- the runtime slot is removed cleanly instead of being left in a half-closed limbo
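
The full-close rule reduces to a small predicate. The sketch below uses the two thresholds quoted above; the function names and the returned dict are illustrative stand-ins for whatever `_apply_internal_retract()` actually returns.

```python
def is_full_close(remaining_qty: float, remaining_notional: float) -> bool:
    """Thresholds as stated above: tiny notional or non-positive quantity."""
    return remaining_notional <= 1e-9 or remaining_qty <= 0.0


def apply_retract(qty: float, price: float, retract_qty: float) -> dict:
    """Shrink a position; collapse to CLOSED instead of a zero-size OPEN row."""
    remaining_qty = max(0.0, qty - retract_qty)
    remaining_notional = remaining_qty * price
    if is_full_close(remaining_qty, remaining_notional):
        # No OPEN snapshot is produced for an effectively empty position.
        return {"status": "CLOSED", "quantity": 0.0, "notional": 0.0}
    return {"status": "OPEN", "quantity": remaining_qty, "notional": remaining_notional}
```

For example, `apply_retract(100.0, 0.5, 100.0)` yields a `CLOSED` result, while `apply_retract(100.0, 0.5, 40.0)` keeps an `OPEN` leg of 60 units.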

## Verification Added

Regression tests were added for both sides:

- full-close retracts no longer emit zero-sized `OPEN` rows
- restore skips zero-sized `OPEN` candidates without setting `restore_failed`

The tests use the existing retract and restore harnesses:

- one test seeds a tiny short leg that collapses to zero on retract and asserts no `OPEN` zero-size row is written
- one test feeds a zero-sized `OPEN` `position_state` row into restore and asserts restore does not hard-halt

## Operational Impact

After this fix:

- stale zero-sized `OPEN` rows no longer restart BLUE
- malformed open snapshots are quarantined as legacy garbage
- the live runtime can continue from a sane source instead of bouncing on the same bad record

## What This Does Not Fix

This change does not rewrite historical ClickHouse rows already present in the warehouse.

It only changes:

- new retract writes
- restore selection and rejection policy
- restart behavior when the old garbage is encountered

If you want the historical ledger cleaned up, that is a separate reconciliation task. The current patch is intentionally conservative and only stops the bad row from causing further damage.
362
prod/docs/PINK_ACCOUNTING_EXEC_FIX.md
Normal file
@@ -0,0 +1,362 @@
# PINK / DITAv2 Accounting & Execution Fix — Spec and Dev Guide

**Status**: SPEC — ready for implementation agent
**Date**: 2026-06-11
**Branch**: `exp/pink-ditav2-sprint0-20260530` (continue on it or fork `fix/pink-accounting-consolidation`)
**Author of spec**: forensic session 2026-06-11 (FET −$5,990.90 mis-book replay)
**Prerequisite for**: VIOLET rebuild (`violet_subsecond_rebuild_plan` memory / future plan session)

---

## 0. Why this exists — the incident in one paragraph

On 2026-06-11 PINK closed a FET-USDT short that the exchange settled at
≈ **+$164 net** (entry VWAP 0.1878, exit 0.1866, ~202K FET) but the kernel
booked **−$5,990.90** and capital diverged −$6,154 from the exchange wallet.
Replay against `dolphin_pink.trade_reconstruction` slot images identified
three stacked defects, all in *derivation* code (none in exchange facts):
(1) fill events carried BingX's MARKET **protective bound price** (0.229,
+22% off tape) instead of the true fill price; (2) `realized_pnl()` and
`mark_price()` multiplied PnL by `slot.leverage` (exchange leverage — but
`slot.size` is exchange *quantity*, so every leg was 3× inflated); (3) the
Python settle baseline `_last_settled_pnl` resets empty on every restart,
so reconcile-adopted slots re-settle carried PnL. Exact replay of leg 1:
`26,007 × (0.229−0.1878)/0.1878 × 0.1878 × 3 = −3,214.4652` ✓ matches the
booked increment to the cent.
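
The leg-1 replay can be checked numerically. In this sketch the `/0.1878 × 0.1878` pair from the formula above is cancelled, so the expression is simply quantity times price difference times leverage; the helper name is illustrative and the sign convention (short, so a higher exit is a loss) is an assumption consistent with the booked figure.

```python
def short_leg_pnl(qty: float, entry: float, exit_price: float, leverage: float = 1.0) -> float:
    """Side-signed PnL of a short exit leg; leverage=1.0 is the leverage-free form."""
    return qty * (entry - exit_price) * leverage


# Defective booking: bound price 0.229 used as the exit, times exchange leverage 3.
booked_leg_1 = short_leg_pnl(26_007, 0.1878, 0.229, leverage=3.0)

# Same inputs without the leverage factor. Still wrong, because the bound
# price is not a fill price, but it isolates the 3x inflation of defect (2).
no_leverage = short_leg_pnl(26_007, 0.1878, 0.229)

print(round(booked_leg_1, 4), round(no_leverage, 4))
```

The first value reproduces the booked `−3,214.4652` increment; the second is exactly one third of it, which separates the leverage defect from the bound-price defect.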

A fourth structural finding: there are **three parallel ledgers** (Rust
`AccountState` K/E, Python `AccountProjection` — the one persistence reads,
fee-blind — and `AccountProjectionV2`, dead in the live path). This spec
consolidates to **E-facts as ledger of record + K as integrity checksum +
one atomic published snapshot**.

---

## 1. Scope and non-goals

IN SCOPE
1. Commit + activate the Phase-0 fixes already in the working tree.
2. E-anchored published capital; single atomic account snapshot.
3. Per-trade PnL provenance (`exchange | kernel_estimate`) end-to-end.
4. Sizer feedback off trade-realized PnL (not capital deltas).
5. Persistence hygiene: duplicate row emission, silent async-insert loss,
   `event_seq` stamping, `bars_held` clamp, naive-UTC timestamps.
6. Kernel hardening leftovers: `resolve_slot` no-match sentinel,
   FILL_SETTLED realized override of flagged estimate legs.

OUT OF SCOPE (separate tickets)
- BLUE's exit-path masking bug (LINK −$1,248, `TODO_TP_SCAN_CADENCE_BUGFIX.md`) — BLUE stack, not DITAv2.
- VIOLET fork, sub-second clock, venue price-feed port, cadence quantizer.
- ch_writer head-of-line poison-row parking redesign (mitigations land here;
  the full parking-lane design is its own task).
- prefect.db / ClickHouse TTL disk remediation.

HARD INVARIANTS — MUST NOT CHANGE
- **Dual leverage**: `slot.size` = exchange quantity; `slot.leverage` =
  exchange leverage (1–3x cap, set at BingX API); *our*-leverage
  (conviction) = `size × entry_price / capital`, computed only at
  `pink_direct._hz_publish` (line ~911). PnL is therefore **leverage-free**:
  `qty × Δprice`, side-signed. Do not touch the conviction→exchange mapping
  (`round_half_even_linear_0.5_to_9.0_to_1_to_exchange_cap`) or
  `target_size` computation.
- **Exits are never skipped** (exec-router invariant set, §16 kernel ref).
- **BLUE-parity policy contract**: `DecisionEngine`/`IntentEngine` inputs
  (MarketSnapshot + capital + slot state) unchanged in shape.
- **Namespace isolation**: zero writes to `dolphin.*` / `dolphin_prodgreen.*`
  or BLUE/PRODGREEN HZ maps. Re-verify with `pink_ctl.py mode-verify`.
- **Data cadences are sacred** (operator rule 2026-06-10): never reduce a
  data cadence for throughput.

---

## 2. Phase 0 — Commit and activate the already-applied fixes

These changes exist UNCOMMITTED in the working tree as of 2026-06-11 ~16:30.
Verify each hunk, commit as one reviewed unit, then restart `dolphin_pink`.

### 0.1 `prod/clean_arch/dita_v2/_rust_kernel/src/lib.rs`
| Function | Change (already applied) |
|---|---|
| `KernelCore::realized_pnl` (~line 1153) | PnL = side-signed `qty × (exit − entry)`; **no leverage factor**; returns 0 when `entry<=0 ∨ exit_size<=0 ∨ exit_price<=0 ∨ !finite` |
| `TradeSlot::mark_price` (~line 394) | no `× leverage` in unrealized; a mark NEVER becomes entry basis — missing basis flags `metadata.entry_basis_missing=true`, unrealized stays 0 |
| `KernelCore::fill_matches_order` (new) | identity match on `venue_order_id` / `venue_client_id` |
| `KernelCore::apply_fill` | entry/exit routing by ORDER IDENTITY first, FSM state second (`!id_matches_exit` / `!id_matches_entry` guards); entry basis = **VWAP across entry fills** (`(prev_basis×prev_filled + price×fill)/accumulated`); price-less exit fill reduces size, books 0 PnL, flags `metadata.realized_skipped_no_price=true` |

Rebuild required: `cargo build --release` in `_rust_kernel/` (the `.so` is
only auto-built when missing — **source/binary drift is a known hazard**;
add the build to the commit checklist). `cargo test`: 32/32 green as of spec.

### 0.2 `prod/clean_arch/dita_v2/bingx_venue.py`
Fill events must carry a TRUE fill price or 0.0 — never the order's nominal
`price` / submit `receipt.price` (BingX MARKET bound price, ±20–25%):
- `_events_from_submit` fill event (~line 585): `_row_float(ack_row,
  "avgPrice","ap","lastFillPrice","L", default=0.0)`
- `_event_from_row` (~line 697): fills use the same true-price chain;
  non-fill events (ACK/CANCEL/REJECT) may keep nominal `price` as info
- `_fill_event_from_row` (~line 736): `"lastFillPrice","L","avgPrice","ap"`
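
The true-price chain amounts to "first usable value among the listed keys, else 0.0". The real `_row_float` is only known here by its call signature, so the body below is an assumption: in particular, treating zero, empty, and unparsable values as "not present" is a guess at the intended semantics.

```python
import math


def row_float(row: dict, *keys: str, default: float = 0.0) -> float:
    """Return the first positive, finite float found under `keys`, else `default`."""
    for key in keys:
        raw = row.get(key)
        if raw is None or raw == "":
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue
        if math.isfinite(value) and value > 0:
            return value
    return default


# The nominal `price` key is deliberately absent from the chain, so a MARKET
# bound price can never leak into a fill event.
ack_row = {"price": "0.229", "avgPrice": "0", "ap": "0.1866"}
fill_price = row_float(ack_row, "avgPrice", "ap", "lastFillPrice", "L", default=0.0)
```

A row that carries only the nominal `price` falls through to `0.0`, which downstream code treats as a price-less fill rather than as a real execution price.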

### 0.3 `prod/clean_arch/dita_v2/rust_backend.py`
- `reconcile_from_slots`: seeds `_last_settled_pnl[slot_id] = slot.realized_pnl`
  and `_slot_was_closed[slot_id] = slot.closed` for every adopted slot.
- `restore_state`: same re-anchoring after successful restore.

### 0.4 Adjacent fixes riding the same commit
- `prod/ch_writer.py`: insert URLs append `&date_time_input_format=best_effort`;
  flush errors log at WARNING (first 10 + every 100th), counter `_flush_errors`.
- `prod/clean_arch/dita_v2/blue_parity.py` `price_of`: hyphen-tolerant
  fallback (`FET-USDT` → `FETUSDT`) — fixes the unmanaged-position block.
- `prod/clickhouse/users.xml`: `date_time_input_format=best_effort` for the
  `dolphin` user (NOTE: running CH container did not honor it even after
  restart — the container does not mount compose configs; effective on next
  compose recreation. The client-side URL param is the operative fix.)
- `prod/tests/test_dita_v2_kernel.py`: partial→full fill test updated to
  incremental `filled_size` semantics (BingX WS `lastFilledQty`).

### 0.5 Phase 0 gates
1. `cargo test` in `_rust_kernel`: 32/32.
2. `pytest prod/tests/test_dita_v2_kernel.py`: 7/7.
3. `pytest prod/clean_arch/dita_v2/test_exec_router_runtime.py
   test_venue_reconcile.py test_orphan_prevention.py
   prod/tests/test_pink_async_fill_pump.py
   prod/clean_arch/dita_v2/test_account_core_v2.py test_bingx_bugs.py`: 134/134.
4. KNOWN pre-existing failures (NOT introduced by this work — verified by
   hunk-revert): 4 tests in `prod/tests/test_dita_v2_bingx_adapter.py`
   (snapshot-fill emission broke when sync `submit()` started passing None
   snapshots on 2026-06-10). Fix or quarantine them explicitly in this phase
   — do not let them mask new regressions.
5. Restart `dolphin_pink` at a FLAT moment; verify in logs: no
   `realized_skipped_no_price` storms, no `entry_basis_missing` on fresh
   entries, first round-trip books PnL within ±(fees+slippage) of
   `GET /openApi/swap/v2/user/income` for the same trade.

---

## 3. Phase 1 — E-anchored published capital

**Goal**: the capital that persistence/HZ/sizer see is exchange-anchored;
K never publishes.

### 3.1 `prod/clean_arch/dita_v2/account.py`
- Add to `AccountSnapshot`: `capital_source: str` (`"e_anchored" |
  "k_bridged" | "seed"`), `e_wallet_balance: float`, `event_seq: int`.
- New method `AccountProjection.anchor_to_exchange(wallet_balance: float,
  available_margin: float, event_seq: int)`: sets `capital = wallet_balance`
  (guard `>0` and finite — the zero-wb frame lesson), `capital_source =
  "e_anchored"`, recomputes equity. `settle()` remains for the BRIDGE case
  only: between anchors, capital += realized (`capital_source="k_bridged"`).
- `settle(realized_pnl, fees)`: **stop ignoring fees** — `capital +=
  realized_pnl − fees` (today fees only accumulate in `fees_paid`; published
  capital ignores them between reseeds).
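
The anchor-and-bridge behavior described in 3.1 can be sketched as a small class. This is a simplified stand-in, not the real `AccountProjection`: the class name is hypothetical, equity recomputation is omitted (so `available_margin` is accepted but unused), and returning a bool on rejection is an assumption since the spec does not say how a rejected anchor is reported.

```python
import math
from dataclasses import dataclass


@dataclass
class AccountProjectionSketch:
    capital: float = 0.0
    fees_paid: float = 0.0
    capital_source: str = "seed"
    e_wallet_balance: float = 0.0
    event_seq: int = 0

    def anchor_to_exchange(self, wallet_balance: float,
                           available_margin: float, event_seq: int) -> bool:
        # Guard: a zero, negative, or non-finite wallet frame must not anchor.
        if not (math.isfinite(wallet_balance) and wallet_balance > 0):
            return False
        self.capital = wallet_balance
        self.e_wallet_balance = wallet_balance
        self.capital_source = "e_anchored"
        self.event_seq = event_seq
        return True

    def settle(self, realized_pnl: float, fees: float) -> None:
        # Bridge case only: between anchors, capital moves by PnL net of fees.
        self.fees_paid += fees
        self.capital += realized_pnl - fees
        self.capital_source = "k_bridged"
```

In use, an anchor at `70000.0` followed by `settle(100.0, 5.0)` leaves capital at `70095.0` with source `k_bridged`, and the next valid anchor snaps capital back to the exchange wallet balance exactly.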

### 3.2 `prod/clean_arch/runtime/pink_direct.py`
- The existing reseed path (balance-bearing ACCOUNT_UPDATE →
  `kernel.reset_and_seed(wb)`) additionally calls
  `kernel.account.anchor_to_exchange(...)` — one anchoring action, two
  ledgers consistent.
- Boot seed (launcher `exchange_balance_capital` block, pink_direct ~line
  262) goes through `anchor_to_exchange` instead of direct attribute writes.

### 3.3 Gates
- New unit tests (`prod/tests/test_pink_account_anchor.py`):
  anchor sets capital/source; zero/negative/NaN wb rejected; settle bridges
  with fees; anchor after bridge snaps to wb exactly.
- Shadow check (live, 24 h on VST): published capital vs
  `GET /openApi/swap/v2/user/balance` polled 1/min — max |Δ| outside a
  trade-settlement window ≤ $0.01; during settlement ≤ pending-fee bound.

---

## 4. Phase 2 — Single atomic snapshot, ledger consolidation

**Goal**: one immutable, versioned account snapshot; the two redundant
ledgers demoted/removed.

### 4.1 `prod/clean_arch/dita_v2/account.py`
- Make the published snapshot **immutable-replace**: `AccountProjection`
  builds a new frozen `AccountSnapshot` (carry `event_seq`) on every
  mutation and swaps a single reference (GIL-atomic). Readers must take
  `snap = kernel.account.snapshot` once per use (audit call sites:
  `pink_clickhouse.py`, `hazelcast_projection.py` HZ writer, `pink_direct`).
- `AccountProjectionV2`: DELETE, or move to `prod/clean_arch/dita_v2/
  _attic/` with a module docstring pointing here. Its only live-path import
  is `exchange_event.py` — migrate that import or the dataclasses it uses
  (`EPosition` is genuinely useful; keep it in `account.py`).
- The Rust `AccountState` K-ledger STAYS — demoted by documentation and by
  Phase 1 (it no longer feeds published capital): its jobs are reconcile
  classification (R1-style), `capital_frozen`, and E-dark bridging. Update
  the module docstring to say exactly this.

### 4.2 `prod/clean_arch/persistence/pink_clickhouse.py`
- Read capital/equity/peak/trade_seq from the single snapshot reference;
  no recomputation.
- Add columns to emitted rows (and the matching `ALTER TABLE` DDLs under
  `prod/clickhouse/pink/08_provenance.sql` — **apply DDLs to CH BEFORE
  deploying code that emits them**; the missing-table head-of-line jam of
  2026-06-11 is the cautionary tale):
  - `account_events`, `status_snapshots`: `capital_source LowCardinality(String) DEFAULT ''`,
    `account_event_seq UInt64 DEFAULT 0`
  - `trade_events`, `trade_exit_legs`: `pnl_source LowCardinality(String) DEFAULT ''`
    (`exchange` | `kernel_estimate`)
- `bars_held`: clamp to `max(0, …)` at row-build time (UInt16 column;
  negative values currently 400 on trade_events / silently vanish on
  async tables).
- Timestamps: route every `ts` through one helper emitting **naive-UTC
  microsecond ISO** (no `+00:00`) — best_effort already tolerates both, but
  rows must stop depending on a parser setting.
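
The two row-build helpers called for above could look like this. Both names are hypothetical. The space separator between date and time is an assumption chosen to match the timestamps shown in the ledger excerpts, and naive inputs are assumed to already be in UTC. The UInt16 upper bound is not handled here because the spec only asks for the lower clamp.

```python
from datetime import datetime, timezone


def ch_ts(ts: datetime) -> str:
    """Format a timestamp as naive-UTC with microseconds and no offset suffix."""
    if ts.tzinfo is not None:
        # Convert aware datetimes to UTC, then drop the tzinfo.
        ts = ts.astimezone(timezone.utc).replace(tzinfo=None)
    return ts.isoformat(sep=" ", timespec="microseconds")


def clamp_bars_held(bars_held: int) -> int:
    """Clamp negative bar counts to zero before the row is built."""
    return max(0, int(bars_held))


row = {
    "ts": ch_ts(datetime(2026, 6, 11, 17, 22, 12, 678265, tzinfo=timezone.utc)),
    "bars_held": clamp_bars_held(-3),
}
```

Routing every emitter through a single formatter like this means a row never carries `+00:00`, so inserts no longer depend on `date_time_input_format=best_effort` being active on the server or in the URL.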

### 4.3 Duplicate-emission fix (same file)
Every CH row is currently emitted twice (visible in any query). Hunt the
double call: instrument `_sink()` with a per-(table, content-hash) debug
counter in a test, then trace the two call paths (suspect: `persist_result`
invoked both from the runtime step and from the fill pump for the same
event). Fix at the caller level; do NOT dedupe by content in the sink
(masks real double-events). Regression test: one simulated round trip →
exactly one row per logical event per table.

### 4.4 `prod/ch_writer.py`
- `wait_for_async_insert`: `"1"` for ALL `dolphin_pink` tables (accounting
  rows must never be silently lost; the spool absorbs latency). Keep `0`
  acceptable only for high-volume shadow tables if measured necessary —
  document any exception inline.
- Mitigation for head-of-line (full redesign out of scope): after
  `attempts > 1000` on a row, log ERROR with the CH response body once per
  100 attempts (today the reject reason is invisible without manual replay).

### 4.5 Gates
- Full offline suite (the 533+ DITAv2/PINK set) green, minus the Phase-0
  quarantined adapter tests if still open.
- One live VST round trip: every table gets exactly one row per event;
  `pnl_source`/`capital_source` populated; CH `system.text_log` shows zero
  parse rejections for `dolphin_pink`.

---

## 5. Phase 3 — Sizer feedback off trade-realized PnL

**THE one seam where this refactor can silently change alpha behavior.**

### 5.1 `prod/clean_arch/runtime/pink_direct.py` — `_sizer_trade_feedback` (~line 1453)
Today: `pnl = acc.capital − self._sizer_entry_capital` (capital delta).
Under E-anchored capital this absorbs funding, fees of other activity, and
**foreign fills from the shared VST account** (PRODGREEN collision class).
Change to:
```
pnl = slot_realized_for_trade(trade_id)   # Σ slot.realized_pnl legs, i.e.
                                          # kernel estimate, overridden by
                                          # exchange rp when settled (5.2)
```
Source: the closing slot dict already carries `realized_pnl`; use it (minus
the fees recorded for the trade when available) instead of the capital
delta. Keep the magnitude semantics the sizer expects (sign + rough size —
per the existing comment, bucket/streak multipliers only need that).
|
||||||
|
|
||||||
|
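A minimal sketch of the trade-realized feedback, under stated assumptions: the spec's call takes only `trade_id`, while this illustration passes the legs and fees explicitly so it is self-contained, and the leg dict keys (`trade_id`, `realized_pnl`) are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def slot_realized_for_trade(trade_id: str,
                            legs: Iterable[Mapping],
                            fees: Optional[float] = None) -> float:
    """Sum realized PnL over the legs of one trade, net of fees when known."""
    gross = sum(leg["realized_pnl"] for leg in legs
                if leg["trade_id"] == trade_id)
    return gross - fees if fees is not None else gross


legs = [
    {"trade_id": "t1", "realized_pnl": 10.0},
    {"trade_id": "t1", "realized_pnl": 2.5},
    {"trade_id": "foreign", "realized_pnl": 400.0},  # not our trade
]

# Capital delta would include the foreign 400.0; trade-realized does not.
capital_delta = sum(leg["realized_pnl"] for leg in legs)
feedback = slot_realized_for_trade("t1", legs, fees=0.5)
```

The point of the example is the gap between `capital_delta` and `feedback`: a foreign fill moves the former but leaves the sizer input untouched.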
### 5.2 Exchange override (E-led repair) — `bingx_user_stream.py` + `rust_backend.py`

- The WS `FILL_SETTLED` path already carries the exchange's realized PnL (`rp`)
  and fee (`n`, sign-flipped at boundary per BingX quirks memory). Extend
  the kernel account-event payload with `trade_id`, and on receipt:
  - if the matching slot leg was flagged `realized_skipped_no_price`,
    ADD the exchange realized to `slot.realized_pnl` (repair) and clear
    the flag; settle the increment through the normal baseline mechanism;
  - else record `pnl_source="exchange"` for the trade-event row (the
    estimate stays as the booked figure unless `|estimate - rp|` exceeds a
    tolerance; in that case log ERROR + emit an `anomaly_events` row; do NOT
    silently re-book).
- Rust: add `dita_kernel_repair_realized(slot_id, amount)` FFI (or fold the
  repair into `on_account_event` with `slot_id` in payload). Keep it
  idempotent via the existing account-event dedup.

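The receipt logic above might look roughly like this in Python. Everything here is a sketch: `SlotLeg`, `apply_fill_settled`, the `seen` set standing in for the existing account-event dedup, and the tolerance value of `1.0` are assumptions, since the spec does not fix a tolerance or a data model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SlotLeg:
    realized_pnl: float = 0.0
    realized_skipped_no_price: bool = False
    pnl_source: str = "kernel"
    anomalies: List[str] = field(default_factory=list)


def apply_fill_settled(leg: SlotLeg, event_id: str, rp: float,
                       seen: Set[str], tolerance: float = 1.0) -> float:
    """Apply an exchange-settled fill; return the increment to settle."""
    if event_id in seen:          # dedup keeps the repair idempotent
        return 0.0
    seen.add(event_id)
    leg.pnl_source = "exchange"
    if leg.realized_skipped_no_price:
        leg.realized_pnl += rp    # repair: the leg had booked 0
        leg.realized_skipped_no_price = False
        return rp
    if abs(leg.realized_pnl - rp) > tolerance:
        # Flag the disagreement; the booked estimate is NOT changed.
        leg.anomalies.append(f"estimate={leg.realized_pnl} rp={rp}")
    return 0.0


seen: Set[str] = set()
repaired = SlotLeg(realized_skipped_no_price=True)
first = apply_fill_settled(repaired, "evt-1", 164.0, seen)
replay = apply_fill_settled(repaired, "evt-1", 164.0, seen)  # duplicate

priced = SlotLeg(realized_pnl=242.7)
apply_fill_settled(priced, "evt-2", 164.0, seen)
```

Returning the increment separately from mutating the leg mirrors the requirement that the repair settles through the normal baseline mechanism rather than re-booking capital directly.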
### 5.3 Gates

- Unit: feedback receives trade-realized, not capital delta (simulate a
  foreign-fill capital jump mid-trade → feedback unaffected).
- Unit: price-less exit leg + later `FILL_SETTLED` repair → slot realized
  equals exchange `rp`; settle baseline consistent (no double-settle).
- Parity: `test_blue_parity.py`, `test_alpha_blue_untouched_g7.py` green
  (sizer behavior unchanged for normal fills).

---

## 6. Phase 4 — Kernel hardening leftovers

### 6.1 `lib.rs` — `resolve_slot` (~line 1099)

Today `resolve_slot` falls back to **slot 0** when nothing matches. Change:
return `Option<usize>`; on `None`, `on_venue_event` returns
`UNRESOLVED_SLOT` (diagnostic exists already) without mutating any slot,
severity WARNING, event recorded in outcome details. Python callers: the
runtime treats `UNRESOLVED_SLOT` as a logged no-op (the `_fill_is_ours`
filter remains first-line defense; this is kernel-side defense for
venue-agnostic reuse).
NOTE: several tests construct events with `slot_id=-1` expecting slot-0
fallback; update them to pass explicit `slot_id=0` (behavioral test
change; list each in the PR description).

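The intended semantics can be modelled in Python, with `Optional[int]` standing in for Rust's `Option<usize>`. This is a model of the behavior only, not the kernel code: the slot dict layout, the order-id matching rule, and the outcome dict are all assumptions.

```python
import copy
from typing import Dict, List, Optional

UNRESOLVED_SLOT = "UNRESOLVED_SLOT"


def resolve_slot(slots: List[Dict], slot_id: int,
                 order_id: Optional[str]) -> Optional[int]:
    """Return the matching slot index, or None instead of defaulting to 0."""
    if 0 <= slot_id < len(slots):
        return slot_id
    for index, slot in enumerate(slots):
        if order_id is not None and order_id in slot["order_ids"]:
            return index
    return None


def on_venue_event(slots: List[Dict], event: Dict) -> Dict:
    index = resolve_slot(slots, event["slot_id"], event.get("order_id"))
    if index is None:
        # Logged no-op: report the event, mutate nothing.
        return {"code": UNRESOLVED_SLOT, "severity": "WARNING",
                "details": dict(event)}
    slots[index]["filled_qty"] += event["qty"]
    return {"code": "OK", "slot": index}


slots = [{"order_ids": {"entry-1"}, "filled_qty": 0.0}]
before = copy.deepcopy(slots)
unresolved = on_venue_event(
    slots, {"slot_id": -1, "order_id": "foreign", "qty": 5.0})
unchanged = slots == before
explicit = on_venue_event(slots, {"slot_id": 0, "qty": 5.0})
```

The last call shows why the affected tests need an explicit `slot_id=0`: with the fallback gone, `slot_id=-1` and no matching id no longer reaches slot 0.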
### 6.2 ID-less fill routing (documentation + metric, not code)

BingX WS omits `clientOrderId`, so identity routing can't always engage.
Add a counter metric (`fills_routed_by_state_total`) via an
`anomaly_events` row per occurrence, severity INFO. This gives VIOLET the
data to justify per-venue synthetic ids later. No FSM behavior change.

### 6.3 Gates

- New Rust tests: unresolved event mutates nothing; entry-id fill during
  EXIT_WORKING routes to entry (already covered by Phase-0 routing; add
  the explicit case); price-less exit leg books 0 + flag.

---

## 7. Test matrix (run-order for the implementing agent)

Environment for all commands: `PYTHONPATH=/mnt/dolphinng5_predict:/mnt/dolphinng5_predict/nautilus_dolphin`, venv `/home/dolphin/siloqy_env/bin/python3`.

| Stage | Command | Pass bar |
|---|---|---|
| Rust unit | `cargo test --release` in `_rust_kernel/` | 100% |
| Kernel FSM | `pytest prod/tests/test_dita_v2_kernel.py` | 100% |
| Bridge/accounting | `pytest prod/tests/test_pink_ditav2_kernel_bridge.py test_pink_ditav2_accounting_invariants.py prod/clean_arch/dita_v2/test_account_core_v2.py` | 100% |
| Runtime/reconcile | `pytest prod/clean_arch/dita_v2/test_venue_reconcile.py test_orphan_prevention.py test_exec_router_runtime.py prod/tests/test_pink_async_fill_pump.py test_pink_direct_runtime.py` | 100% |
| Chaos | `pytest prod/tests/test_pink_ditav2_chaos_harness.py` + `test_dita_v2_e2e_functional.py` | 100% |
| Parity | `pytest prod/clean_arch/dita_v2/test_blue_parity.py test_alpha_blue_untouched_g7.py` | 100% |
| Adapter | `pytest prod/tests/test_dita_v2_bingx_adapter.py` | 100% after Phase-0 item 4 resolution |
| LIVE VST E2E | `python prod/ops/dita_v2_live_bingx_smoke.py --pink --symbol TRXUSDT` | suite green |
| **Golden replays (NEW: write these)** | `prod/tests/test_pink_accounting_golden.py` | see below |
| Shadow soak | 24–48 h on VST | capital vs balance ≤ $0.01 idle |

### Golden replay tests (the heart of the acceptance)

Feed the kernel the recorded FET event sequence (entry fills 195,259 +
7,017 @ 0.1878; exit fills 26,007 + remainder; the poisoned variant with
price=0.229 and the clean variant with 0.1866):

1. Clean prices → realized = `(0.1878−0.1866) × 202,276 ≈ +242.7` gross.
2. Poisoned price (0.229) reaching the kernel anyway → with the adapter fix
   it must arrive as 0.0 → leg books 0 + `realized_skipped_no_price`; after
   synthetic `FILL_SETTLED` rp=+164 → slot realized = +164, `pnl_source=exchange`.
3. Restart mid-position (`save_state`/`restore_state` + `reconcile_from_slots`)
   → next venue event settles ONLY the incremental PnL.
4. VWAP: two entry fills at different prices → basis = weighted average.
5. Dual-leverage invariant: same fills at exchange-leverage 1 vs 3 →
   **identical realized PnL**; only margin fields differ.

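The arithmetic behind cases 1, 4 and 5 can be checked with a short sketch. The helper names (`vwap`, `short_realized`, `margin`) are hypothetical, the trade is treated as a short because the spec's formula is entry minus exit, and the 100 @ 1.00 / 300 @ 2.00 fills are made-up numbers used only to show a weighted basis. `Decimal` is used so the expected figures compare exactly; the kernel's own numeric type is not implied.

```python
from decimal import Decimal


def vwap(fills):
    """Volume-weighted average price over (qty, price) fills."""
    total_qty = sum(qty for qty, _ in fills)
    return sum(qty * price for qty, price in fills) / total_qty


def short_realized(entry_price, exit_price, qty):
    """Gross realized PnL of a short: leverage does not appear at all."""
    return (entry_price - exit_price) * qty


def margin(notional, leverage):
    """Only the margin depends on exchange leverage."""
    return notional / leverage


# Case 1: the recorded entry fills, both at 0.1878.
entry_fills = [(Decimal("195259"), Decimal("0.1878")),
               (Decimal("7017"), Decimal("0.1878"))]
qty = sum(q for q, _ in entry_fills)
basis = vwap(entry_fills)
clean = short_realized(basis, Decimal("0.1866"), qty)

# Case 4: illustrative fills at different prices give a weighted basis.
mixed_basis = vwap([(Decimal("100"), Decimal("1.00")),
                    (Decimal("300"), Decimal("2.00"))])

# Case 5: same fills, leverage 1 vs 3: margins differ, realized does not.
notional = basis * qty
margin_1x = margin(notional, Decimal("1"))
margin_3x = margin(notional, Decimal("3"))
```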
---

## 8. Rollout & rollback

1. Each phase = one PR-sized commit, gates green before the next.
2. Activation requires `supervisorctl restart dolphin_pink`; restart at a
   FLAT moment (check `DOLPHIN_STATE_PINK` + exchange positions). The
   restart-reconcile path is itself under test here; first restart after
   Phase 0 should be watched live.
3. Rollback = `git revert` of the phase commit + rebuild `.so` + restart.
   The Rust `.so` MUST be rebuilt on both apply and revert: stale-binary
   drift is how the incremental-fill change sat uncompiled until 2026-06-11.
4. CH DDLs are additive (`ADD COLUMN ... DEFAULT`); there are no destructive
   migrations anywhere in this spec. Rollback leaves unused columns, which
   is fine.
5. PINK is VST (virtual funds), so it is the canary by construction. Nothing
   in this spec touches BLUE files (verify with `git diff --name-only`
   against the §38.7 checklist).

## 9. Done criteria (the whole spec)

- All phases merged; full matrix green; golden replays green.
- 48 h VST soak: zero UNEXPLAINED reconcile errors; published capital
  tracks exchange balance; every closed trade's `trade_events.pnl` within
  fees+slippage of the exchange income record, with `pnl_source` populated.
- `pink_ctl.py mode-verify` passes (namespace isolation intact).
- SYSTEM BIBLE §38 addendum updated (one paragraph: E-led ledger, K as
  checksum, provenance fields) + `DITA_V2_KERNEL_REFERENCE.md` §"Capital
  simplification" rewritten to match reality.
2978
prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md
Normal file
File diff suppressed because it is too large