From c3a18f693a031e981d5b5e87408c2c2632b4b34f Mon Sep 17 00:00:00 2001 From: Codex Date: Fri, 12 Jun 2026 15:04:15 +0200 Subject: [PATCH] docs: VIBRISS spec (+ §10.6 cascade/adaptive-TP paramsets), PINK accounting fix spec, BLUE incident docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit VIBRISS_PARAMETER_GOVERNANCE_SPEC §10.6: ob_cascade.count_threshold (currently cascade_count>0 = ONE asset widens every TP x1.40), tp_widen_factor, withdrawal_velocity_threshold as governance candidates; adaptive/Dynamic-TP threshold marked fit for VIBRISS governance; TP_FLOOR joint-policy reward requirement. Co-Authored-By: Claude Fable 5 --- ...TICAL_VIOLET_DESIGN__BLUE_HYDRATION_BUG.md | 182 + prod/docs/MALFORMED_OPEN_RESTORE_BUG.md | 131 + prod/docs/PINK_ACCOUNTING_EXEC_FIX.md | 362 ++ .../docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md | 2978 +++++++++++++++++ 4 files changed, 3653 insertions(+) create mode 100644 prod/docs/CRITICAL_VIOLET_DESIGN__BLUE_HYDRATION_BUG.md create mode 100644 prod/docs/MALFORMED_OPEN_RESTORE_BUG.md create mode 100644 prod/docs/PINK_ACCOUNTING_EXEC_FIX.md create mode 100644 prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md diff --git a/prod/docs/CRITICAL_VIOLET_DESIGN__BLUE_HYDRATION_BUG.md b/prod/docs/CRITICAL_VIOLET_DESIGN__BLUE_HYDRATION_BUG.md new file mode 100644 index 0000000..1513851 --- /dev/null +++ b/prod/docs/CRITICAL_VIOLET_DESIGN__BLUE_HYDRATION_BUG.md @@ -0,0 +1,182 @@ +# Critical Violet Design: BLUE hydration bug + +Date: 2026-06-11 + +## Summary + +This incident is a BLUE hydration / restore bug on the XTZUSDT short trade `863c21da`. + +The important facts are: + +1. The XTZ trade was real and opened at `2026-06-11 17:22:12.678265+00:00`. +2. The trade did **not** close via TP, SL, or MAX_HOLD before hydration. +3. The restore path later rebuilt the open slot from `position_state` and `trade_reconstruction`. +4.
The restored state had a chain-token mismatch, but the engine continued with the derived token instead of hard-failing. +5. A later hydrate-time stop was recorded at `2026-06-11 18:35:52.789008+00:00` with `STOP_LOSS`. +6. The ledger shows the next trade was admitted while XTZ was still officially open, which violates the single-slot invariant. + +## Trade identity + +- trade_id: `863c21da` +- asset: `XTZUSDT` +- side: `SHORT` +- entry price: `0.2276` +- entry notional: `56484.4305702418` +- leverage: `6.374647927191287` +- entry bar: `238` +- tp_base_pct: `0.002` +- tp_effective_pct: `0.0019999655500463724` + +## Ledger evidence + +### Open record + +`dolphin.trade_reconstruction` contains the canonical open record: + +- ts: `2026-06-11 17:22:12.911989` +- event_type: `OPEN` +- event_id: `863c21da:open` +- chain_token: `26852fa25fb5cdaa3b4c354d5e3eea93e27bce0ebdcd0da896d4f981642eeeb2` + +The payload confirms: + +- `entry_ts = 1781198532678265` +- `entry_bar = 238` +- `retraction_legs = 0` +- `realized_pnl_legs_total = 0.0` +- `chain_mode = LIVE` +- `chain_kind = ROOT` + +### No close before hydrate + +`dolphin.trade_exit_legs` has no rows for `863c21da`. + +`dolphin.trade_events` also has no close row for `863c21da`. + +So there is no official TP, SL, or MAX_HOLD exit recorded before the restore/hydration event. + +### Decision tape before hydrate + +`dolphin.v7_decision_events` shows the trade was live and being evaluated: + +- `2026-06-11 17:22:13.274556` `HOLD` +- `2026-06-11 17:22:23.124863` `HOLD` +- `2026-06-11 17:22:45.232894` `HOLD` +- `2026-06-11 17:23:28.274004` `HOLD` +- `2026-06-11 17:24:43.182413` `RETRACT / V7_RISK_DOMINANT` + +The best favorable excursion in the pre-hydrate tape was only about `+0.065905094%`, roughly one third of the fixed TP threshold (`tp_base_pct = 0.002`, i.e. `0.2%`).
+ +## Restore / hydration behavior + +At restore time the engine logged: + +- `chain token mismatch on restore: trade=863c21da stored=26852fa25fb5 derived=98875e225e9e — continuing with derived token` +- `position_state RESTORED: XTZUSDT SHORT entry=0.2276 notional=56484 bars_held≈0 trade=863c21da` + +The restore path in [`prod/nautilus_event_trader.py`](../nautilus_event_trader.py) does the following: + +- reads `position_state` +- reconstructs `restored_entry_bar = max(0, self.bar_idx - stored_bars)` +- loads reconstruction data from `dolphin.trade_reconstruction` +- rebuilds chain state from the persisted payload +- if the stored chain token differs from the derived token, it logs the mismatch and continues with the derived token + +Relevant code: + +- `_chain_state_from_reconstruction(...)` around lines `3315-3348` +- restore from `position_state` around lines `1944-2058` + +This is a validator, not a hard guardrail. + +## Single-slot violation + +The next distinct open trade in the reconstruction ledger is: + +- ts: `2026-06-11 17:50:50.420620` +- trade_id: `43494ade` +- asset: `TRXUSDT` +- side: `SHORT` + +That means the system admitted a new trade while XTZ was still officially open in the ledger. + +On a single-slot engine, that should not happen. + +## What would have happened without hydration + +This is the conservative conclusion from the tape: + +- The trade did not hit TP on the observed pre-hydrate tape. +- The trade did not have an official close row before hydration. +- The tape does not contain a clean uninterrupted decision path beyond the first pre-hydrate window. + +The best-supported natural outcome from the observed tape is the live `RETRACT` state at `2026-06-11 17:24:43.182413`, where the engine still considered the slot active and the trade had only reached `bars_held = 14`. 
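The retract-state PnL and the max-hold timing quoted next can be cross-checked from the recorded entry price, notional, and timestamps. The following is a minimal Python sketch. It assumes a constant bar cadence and infers the bar interval from the 14 bars elapsed between entry and the 17:24:43 retract. It does not read the engine's actual bar clock.

```python
from datetime import datetime, timedelta

# Recorded figures for trade 863c21da (XTZUSDT SHORT).
ENTRY_PRICE = 0.2276
ENTRY_NOTIONAL = 56484.4305702418
RETRACT_PRICE = 0.22765000000000002
MAX_HOLD_BARS = 102  # trade-specific market_state_max_hold_bars

entry_ts = datetime(2026, 6, 11, 17, 22, 12, 678265)
retract_ts = datetime(2026, 6, 11, 17, 24, 43, 182413)
bars_at_retract = 14


def short_pnl_pct(entry: float, price: float) -> float:
    """Return of a SHORT in percent: positive when price falls below entry."""
    return (entry - price) / entry * 100.0


pnl_pct = short_pnl_pct(ENTRY_PRICE, RETRACT_PRICE)
pnl_usdt = ENTRY_NOTIONAL * pnl_pct / 100.0

# Assumption: constant bar cadence, estimated from the live decision tape.
bar_seconds = (retract_ts - entry_ts).total_seconds() / bars_at_retract
max_hold_ts = entry_ts + timedelta(seconds=MAX_HOLD_BARS * bar_seconds)

print(f"pnl_pct={pnl_pct:.9f}% pnl_usdt={pnl_usdt:.4f}")
print(f"bar={bar_seconds:.2f}s max_hold~{max_hold_ts:%H:%M:%S}")
```

With the inferred cadence of about 10.75 s per bar, the 102-bar limit lands near 17:40:29. A rounder 11 s cadence gives about 17:40:54. Both fall inside the 17:40-17:41 window.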
+ +At that point: + +- `current_price = 0.22765000000000002` +- `pnl_pct = -0.021968365` (percent units: about -0.022% of notional) +- `reason = V7_RISK_DOMINANT` + +If that retract state had been executed immediately, the estimated trade PnL would have been: + +- `-12.4087058758423` USDT on the recorded notional +- trade ROI: `-0.021968365%` + +The max-hold clock would also have forced a decision long before the 18:35 restore: + +- trade-specific `market_state_max_hold_bars = 102` +- live tape reached `bars_held = 14` by `17:24:43` +- at the observed ~11 second cadence (14 bars in about 150.5 s since entry), the max-hold boundary would have arrived around `17:40-17:41` + +So the 18:35 stop-loss is not the natural continuation of the original entry. It is a restore-time artifact on top of a stale open slot. + +What is observable is the hydrated-path close that actually got booked: + +- exit ts: `2026-06-11 18:35:52.789008+00:00` +- exit reason: `STOP_LOSS` +- exit price: `0.23526757499999998` +- realized pnl_pct: `-0.033056485743551446` (a fraction: -3.3056%) +- realized net_pnl: `-1913.155101369921` + +That realized stop corresponds to: + +- a booked position loss (`pnl_pct`) of about `3.3056%`; the raw price move against the short, from `0.2276` to about `0.23527`, is about `3.37%` +- account-level ROI of about `-2.726636%` using capital before exit (`70165.39`) + +## Root cause + +The bug is the restore path itself: + +1. The open trade state was preserved in `trade_reconstruction`. +2. The current `position_state` snapshot was lossy or stale enough to rehydrate with `bars_held≈0`. +3. The chain token mismatch was detected, but the code explicitly continues with the derived token. +4. The engine therefore recovered continuity without enforcing strict equality between the live open chain and the reconstructed state. + +That combination makes orphaned trades possible after a bad hydrate. 
+ +## Operational impact + +- The XTZ short remained open in the ledger with no formal close. +- The engine later allowed a new trade while the slot should still have been occupied. +- Capital accounting diverged from the true live slot history.
+- The restore path masked the inconsistency instead of stopping the recovery. + +## Recommended fix direction + +1. Treat a chain-token mismatch on restore as a hard failure for BLUE when a live open slot exists. +2. Preserve the original `entry_bar` and bar counter from the open-chain payload instead of reconstructing them from the current `position_state` row when the two disagree materially. +3. Refuse to admit a new trade until the single-slot invariant is proven flat. +4. Add a regression test for: + - open XTZ trade + - stale `position_state` + - chain-token mismatch + - no new trade admission while the open slot remains unresolved + +## Bottom line + +XTZ was a real open trade. +It never got a clean pre-hydrate exit. +The restore path tolerated chain drift and rebuilt a misleading open state. +The best-supported no-freeze outcome is the 17:24 retract, roughly flat to slightly negative. +The realized hydrated-path loss was `-3.3056485743551446%` on the position and `-2.726636%` of capital before exit, but that is a restore artifact, not the natural end of the original trade. diff --git a/prod/docs/MALFORMED_OPEN_RESTORE_BUG.md b/prod/docs/MALFORMED_OPEN_RESTORE_BUG.md new file mode 100644 index 0000000..a9430d1 --- /dev/null +++ b/prod/docs/MALFORMED_OPEN_RESTORE_BUG.md @@ -0,0 +1,131 @@ +# MALFORMED_OPEN_RESTORE_BUG + +## Summary + +BLUE was repeatedly rehydrating after startup because `dolphin.position_state` contained stale `OPEN` rows with zero effective size. + +The restore path treated those rows as fatal: + +- it selected the latest `OPEN` row per `trade_id` +- it accepted that row even when `quantity` or `notional` had been driven to `0` +- it hard-stopped on `position_state row invalid quantity ...` +- `supervisord` then restarted the trader +- the next startup read the same bad row again + +That created a restart loop. + +This was observed most clearly on the `2026-06-11` BLUE window. 
The recurring bad row was the legacy `ATOMUSDT` leg `1a3d2f9c`, which was persisted as: + +- `status = OPEN` +- `quantity = 0` +- `notional = 0` +- `bars_held = 34` + +That row is not a live position. It is a stale snapshot that should have been treated as tombstoned history. + +## Root Cause + +The bad rows were self-inflicted by the partial-retract path in `nautilus_event_trader.py`. + +Before the fix: + +1. `_apply_internal_retract()` shrank the live position. +2. It wrote a new `position_state` row with `status="OPEN"` for the remaining leg. +3. If the remaining size rounded to zero, the row still existed as an `OPEN` snapshot. +4. A later startup restore could pick that row and treat it as authoritative. + +That is enough to leave behind `OPEN` rows with: + +- `quantity = 0` +- `notional = 0` + +These are not valid live positions, but they looked like one to the old restore logic. + +There is a second contributing factor in the restore path: + +- the restore code historically trusted the latest `OPEN` candidate too early +- zero-sized `OPEN` rows were only rejected after the row had already been chosen as the best candidate +- rejection used a hard failure path, which made the process exit instead of trying the next sane source + +That means the persistence bug and the restore policy bug reinforced each other. + +## Observable Symptoms + +- repeated `restore candidate parse failed from capital_update_ledger: 'list' object has no attribute 'get'` +- repeated `position_state row invalid quantity for trade ...: 0.0` +- `RESTORE HALT` +- immediate restart by `supervisord` + +The chain-token mismatch logs were a separate warning. They were not the restart trigger. 
+ +The capital-ledger parse warning is also distinct: + +- it indicates the ledger file is list-shaped, not a dict +- it forces restore to rely more heavily on the other state surfaces +- it is noisy, but it is not what actually killed the process in this incident + +## Fix Applied + +The fix has three parts: two changes to the retract path (parts 1 and 3) and one to restore (part 2). + +### 1. Stop writing zero-sized `OPEN` rows + +In `_apply_internal_retract()`: + +- compute `remaining_qty` +- if the remaining size is effectively zero, treat the retract as a full close +- return the forced exit without emitting a new `position_state` row with `status="OPEN"` + +This prevents the bad row from being created in the first place. + +### 2. Make restore skip legacy bad `OPEN` rows + +In `_restore_position_state()`: + +- the ClickHouse restore query now filters `OPEN` rows with `quantity > 0 AND notional > 0` +- if an invalid candidate still appears, restore logs and rejects it instead of hard-halting the process +- restore falls back to HZ state or flat continuation rather than turning a stale row into a restart loop + +This matters because the ClickHouse warehouse already contains stale history. The fix is not only to stop producing new malformed rows; it also has to prevent old rows from re-triggering the same failure path on the next reboot. + +### 3. Keep the full-close path coherent + +The retract path now computes `remaining_qty` explicitly and treats `remaining_notional <= 1e-9` or `remaining_qty <= 0.0` as a full close.
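The following minimal Python sketch shows the retract-side and restore-side guards. The names `is_full_close`, `PositionRow`, and `pick_restore_candidate` are hypothetical and do not come from `nautilus_event_trader.py`. The document puts the first restore filter in the ClickHouse query; the sketch mirrors that filter in Python to keep the example self-contained.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, Optional

log = logging.getLogger("restore_sketch")

DUST_NOTIONAL = 1e-9  # full-close threshold quoted in the fix description


def is_full_close(remaining_qty: float, remaining_notional: float) -> bool:
    """A retract that leaves no effective size is finalized as a close."""
    return remaining_qty <= 0.0 or remaining_notional <= DUST_NOTIONAL


@dataclass(frozen=True)
class PositionRow:
    trade_id: str
    status: str
    quantity: float
    notional: float
    ts: int  # write time; larger means newer


def pick_restore_candidate(rows: Iterable[PositionRow]) -> Optional[PositionRow]:
    """Newest sane OPEN row, or None so the caller can fall back (HZ / flat)."""
    latest: dict = {}
    for row in rows:  # keep only the newest row per trade_id
        prev = latest.get(row.trade_id)
        if prev is None or row.ts > prev.ts:
            latest[row.trade_id] = row
    sane = []
    for row in latest.values():
        if row.status != "OPEN":
            continue
        if row.quantity > 0 and row.notional > 0:
            sane.append(row)
        else:
            # Reject and continue instead of hard-halting the process.
            log.warning("skipping zero-sized OPEN row for trade %s", row.trade_id)
    return max(sane, key=lambda r: r.ts) if sane else None
```

The sketch selects the newest row per `trade_id` before filtering. Filtering first would resurrect an older `OPEN` row for a trade whose newest row is already closed. On the retract side, `is_full_close` is the single switch between finalizing a close and writing a new `OPEN` row.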
+ +That means: + +- a full retract does not leave a zero-size `OPEN` snapshot behind +- the exit is finalized as a close, not as a pseudo-open partial state +- the runtime slot is removed cleanly instead of being left in a half-closed limbo + +## Verification Added + +Regression tests were added for both sides: + +- full-close retracts no longer emit zero-sized `OPEN` rows +- restore skips zero-sized `OPEN` candidates without setting `restore_failed` + +The tests use the existing retract and restore harnesses: + +- one test seeds a tiny short leg that collapses to zero on retract and asserts no `OPEN` zero-size row is written +- one test feeds a zero-sized `OPEN` `position_state` row into restore and asserts restore does not hard-halt + +## Operational Impact + +After this fix: + +- stale zero-sized `OPEN` rows no longer restart BLUE +- malformed open snapshots are quarantined as legacy garbage +- the live runtime can continue from a sane source instead of bouncing on the same bad record + +## What This Does Not Fix + +This change does not rewrite historical ClickHouse rows already present in the warehouse. + +It only changes: + +- new retract writes +- restore selection and rejection policy +- restart behavior when the old garbage is encountered + +If you want the historical ledger cleaned up, that is a separate reconciliation task. The current patch is intentionally conservative and only stops the bad row from causing further damage. 
diff --git a/prod/docs/PINK_ACCOUNTING_EXEC_FIX.md b/prod/docs/PINK_ACCOUNTING_EXEC_FIX.md new file mode 100644 index 0000000..5b94709 --- /dev/null +++ b/prod/docs/PINK_ACCOUNTING_EXEC_FIX.md @@ -0,0 +1,362 @@ +# PINK / DITAv2 Accounting & Execution Fix — Spec and Dev Guide + +**Status**: SPEC — ready for implementation agent +**Date**: 2026-06-11 +**Branch**: `exp/pink-ditav2-sprint0-20260530` (continue on it or fork `fix/pink-accounting-consolidation`) +**Author of spec**: forensic session 2026-06-11 (FET −$5,990.90 mis-book replay) +**Prerequisite for**: VIOLET rebuild (`violet_subsecond_rebuild_plan` memory / future plan session) + +--- + +## 0. Why this exists — the incident in one paragraph + +On 2026-06-11 PINK closed a FET-USDT short that the exchange settled at +≈ **+$164 net** (entry VWAP 0.1878, exit 0.1866, ~202K FET) but the kernel +booked **−$5,990.90** and capital diverged −$6,154 from the exchange wallet. +Replay against `dolphin_pink.trade_reconstruction` slot images identified +three stacked defects, all in *derivation* code (none in exchange facts): +(1) fill events carried BingX's MARKET **protective bound price** (0.229, ++22% off tape) instead of the true fill price; (2) `realized_pnl()` and +`mark_price()` multiplied PnL by `slot.leverage` (exchange leverage — but +`slot.size` is exchange *quantity*, so every leg was 3× inflated); (3) the +Python settle baseline `_last_settled_pnl` resets empty on every restart, +so reconcile-adopted slots re-settle carried PnL. Exact replay of leg 1: +`26,007 × (0.229−0.1878)/0.1878 × 0.1878 × 3 = −3,214.4652` ✓ matches the +booked increment to the cent. + +A fourth structural finding: there are **three parallel ledgers** (Rust +`AccountState` K/E, Python `AccountProjection` — the one persistence reads, +fee-blind — and `AccountProjectionV2`, dead in the live path). This spec +consolidates to **E-facts as ledger of record + K as integrity checksum + +one atomic published snapshot**. + +--- + +## 1. 
Scope and non-goals + +IN SCOPE +1. Commit + activate the Phase-0 fixes already in the working tree. +2. E-anchored published capital; single atomic account snapshot. +3. Per-trade PnL provenance (`exchange | kernel_estimate`) end-to-end. +4. Sizer feedback off trade-realized PnL (not capital deltas). +5. Persistence hygiene: duplicate row emission, silent async-insert loss, + `event_seq` stamping, `bars_held` clamp, naive-UTC timestamps. +6. Kernel hardening leftovers: `resolve_slot` no-match sentinel, + FILL_SETTLED realized override of flagged estimate legs. + +OUT OF SCOPE (separate tickets) +- BLUE's exit-path masking bug (LINK −$1,248, `TODO_TP_SCAN_CADENCE_BUGFIX.md`) — BLUE stack, not DITAv2. +- VIOLET fork, sub-second clock, venue price-feed port, cadence quantizer. +- ch_writer head-of-line poison-row parking redesign (mitigations land here; + the full parking-lane design is its own task). +- prefect.db / ClickHouse TTL disk remediation. + +HARD INVARIANTS — MUST NOT CHANGE +- **Dual leverage**: `slot.size` = exchange quantity; `slot.leverage` = + exchange leverage (1–3x cap, set at BingX API); *our*-leverage + (conviction) = `size × entry_price / capital`, computed only at + `pink_direct._hz_publish` (line ~911). PnL is therefore **leverage-free**: + `qty × Δprice`, side-signed. Do not touch the conviction→exchange mapping + (`round_half_even_linear_0.5_to_9.0_to_1_to_exchange_cap`) or + `target_size` computation. +- **Exits are never skipped** (exec-router invariant set, §16 kernel ref). +- **BLUE-parity policy contract**: `DecisionEngine`/`IntentEngine` inputs + (MarketSnapshot + capital + slot state) unchanged in shape. +- **Namespace isolation**: zero writes to `dolphin.*` / `dolphin_prodgreen.*` + or BLUE/PRODGREEN HZ maps. Re-verify with `pink_ctl.py mode-verify`. +- **Data cadences are sacred** (operator rule 2026-06-10): never reduce a + data cadence for throughput. + +--- + +## 2. 
Phase 0 — Commit and activate the already-applied fixes + +These changes exist UNCOMMITTED in the working tree as of 2026-06-11 ~16:30. +Verify each hunk, commit as one reviewed unit, then restart `dolphin_pink`. + +### 0.1 `prod/clean_arch/dita_v2/_rust_kernel/src/lib.rs` +| Function | Change (already applied) | +|---|---| +| `KernelCore::realized_pnl` (~line 1153) | PnL = side-signed `qty × (exit − entry)`; **no leverage factor**; returns 0 when `entry<=0 ∨ exit_size<=0 ∨ exit_price<=0 ∨ !finite` | +| `TradeSlot::mark_price` (~line 394) | no `× leverage` in unrealized; a mark NEVER becomes entry basis — missing basis flags `metadata.entry_basis_missing=true`, unrealized stays 0 | +| `KernelCore::fill_matches_order` (new) | identity match on `venue_order_id` / `venue_client_id` | +| `KernelCore::apply_fill` | entry/exit routing by ORDER IDENTITY first, FSM state second (`!id_matches_exit` / `!id_matches_entry` guards); entry basis = **VWAP across entry fills** (`(prev_basis×prev_filled + price×fill)/accumulated`); price-less exit fill reduces size, books 0 PnL, flags `metadata.realized_skipped_no_price=true` | + +Rebuild required: `cargo build --release` in `_rust_kernel/` (the `.so` is +only auto-built when missing — **source/binary drift is a known hazard**; +add the build to the commit checklist). `cargo test`: 32/32 green as of spec. 
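The PnL and entry-basis semantics in the 0.1 table can be modelled in a few lines of Python. This is a hypothetical sketch, not the Rust kernel code. `realized_pnl` and `vwap_basis` are illustrative names, and the side is passed as a string for readability. The sketch also replays the leg-1 figure from §0 with the short-side sign made explicit: −26,007 × (0.229 − 0.1878) × 3 = −3,214.4652.

```python
import math


def realized_pnl(side: str, entry: float, exit_price: float, exit_size: float) -> float:
    """Leverage-free, side-signed PnL of an exit leg: qty * (exit - entry).

    Returns 0.0 when any input is non-finite or non-positive, mirroring the
    guard described for KernelCore::realized_pnl.
    """
    values = (entry, exit_price, exit_size)
    if not all(math.isfinite(v) for v in values) or min(values) <= 0.0:
        return 0.0
    sign = 1.0 if side == "LONG" else -1.0
    return sign * exit_size * (exit_price - entry)


def vwap_basis(prev_basis: float, prev_filled: float, price: float, fill: float) -> float:
    """Entry basis after one more entry fill: volume-weighted average price."""
    accumulated = prev_filled + fill
    if accumulated <= 0.0:
        return prev_basis
    return (prev_basis * prev_filled + price * fill) / accumulated


# FET short: entry fills 195,259 + 7,017 at 0.1878, clean exit at 0.1866.
basis = vwap_basis(0.0, 0.0, 0.1878, 195_259)
basis = vwap_basis(basis, 195_259, 0.1878, 7_017)
clean = realized_pnl("SHORT", basis, 0.1866, 202_276)

# Leg 1 replayed with the poisoned bound price and the old "* leverage" factor.
poisoned_leg1 = realized_pnl("SHORT", 0.1878, 0.229, 26_007) * 3
```

Leverage appears nowhere in `realized_pnl`. This is the dual-leverage invariant in executable form: the same fills give the same PnL at exchange leverage 1 or 3. The poisoned replay reproduces the inflated figure only because the factor of 3 is applied outside the function.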
+ +### 0.2 `prod/clean_arch/dita_v2/bingx_venue.py` +Fill events must carry a TRUE fill price or 0.0 — never the order's nominal +`price` / submit `receipt.price` (BingX MARKET bound price, ±20–25%): +- `_events_from_submit` fill event (~line 585): `_row_float(ack_row, + "avgPrice","ap","lastFillPrice","L", default=0.0)` +- `_event_from_row` (~line 697): fills use the same true-price chain; + non-fill events (ACK/CANCEL/REJECT) may keep nominal `price` as info +- `_fill_event_from_row` (~line 736): `"lastFillPrice","L","avgPrice","ap"` + +### 0.3 `prod/clean_arch/dita_v2/rust_backend.py` +- `reconcile_from_slots`: seeds `_last_settled_pnl[slot_id] = slot.realized_pnl` + and `_slot_was_closed[slot_id] = slot.closed` for every adopted slot. +- `restore_state`: same re-anchoring after successful restore. + +### 0.4 Adjacent fixes riding the same commit +- `prod/ch_writer.py`: insert URLs append `&date_time_input_format=best_effort`; + flush errors log at WARNING (first 10 + every 100th), counter `_flush_errors`. +- `prod/clean_arch/dita_v2/blue_parity.py` `price_of`: hyphen-tolerant + fallback (`FET-USDT` → `FETUSDT`) — fixes the unmanaged-position block. +- `prod/clickhouse/users.xml`: `date_time_input_format=best_effort` for the + `dolphin` user (NOTE: running CH container did not honor it even after + restart — the container does not mount compose configs; effective on next + compose recreation. The client-side URL param is the operative fix.) +- `prod/tests/test_dita_v2_kernel.py`: partial→full fill test updated to + incremental `filled_size` semantics (BingX WS `lastFilledQty`). + +### 0.5 Phase 0 gates +1. `cargo test` in `_rust_kernel`: 32/32. +2. `pytest prod/tests/test_dita_v2_kernel.py`: 7/7. +3. `pytest prod/clean_arch/dita_v2/test_exec_router_runtime.py + test_venue_reconcile.py test_orphan_prevention.py + prod/tests/test_pink_async_fill_pump.py + prod/clean_arch/dita_v2/test_account_core_v2.py test_bingx_bugs.py`: 134/134. +4. 
KNOWN pre-existing failures (NOT introduced by this work — verified by + hunk-revert): 4 tests in `prod/tests/test_dita_v2_bingx_adapter.py` + (snapshot-fill emission broke when sync `submit()` started passing None + snapshots on 2026-06-10). Fix or quarantine them explicitly in this phase + — do not let them mask new regressions. +5. Restart `dolphin_pink` at a FLAT moment; verify in logs: no + `realized_skipped_no_price` storms, no `entry_basis_missing` on fresh + entries, first round-trip books PnL within ±(fees+slippage) of + `GET /openApi/swap/v2/user/income` for the same trade. + +--- + +## 3. Phase 1 — E-anchored published capital + +**Goal**: the capital that persistence/HZ/sizer see is exchange-anchored; +K never publishes. + +### 3.1 `prod/clean_arch/dita_v2/account.py` +- Add to `AccountSnapshot`: `capital_source: str` (`"e_anchored" | + "k_bridged" | "seed"`), `e_wallet_balance: float`, `event_seq: int`. +- New method `AccountProjection.anchor_to_exchange(wallet_balance: float, + available_margin: float, event_seq: int)`: sets `capital = wallet_balance` + (guard `>0` and finite — the zero-wb frame lesson), `capital_source = + "e_anchored"`, recomputes equity. `settle()` remains for the BRIDGE case + only: between anchors, capital += realized (`capital_source="k_bridged"`). +- `settle(realized_pnl, fees)`: **stop ignoring fees** — `capital += + realized_pnl − fees` (today fees only accumulate in `fees_paid`; published + capital ignores them between reseeds). + +### 3.2 `prod/clean_arch/runtime/pink_direct.py` +- The existing reseed path (balance-bearing ACCOUNT_UPDATE → + `kernel.reset_and_seed(wb)`) additionally calls + `kernel.account.anchor_to_exchange(...)` — one anchoring action, two + ledgers consistent. +- Boot seed (launcher `exchange_balance_capital` block, pink_direct ~line + 262) goes through `anchor_to_exchange` instead of direct attribute writes. 
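The following minimal sketch covers the two accounting rules in 3.1: anchoring to the exchange wallet, and fee-aware bridging between anchors. A plain dataclass stands in for `AccountProjection`. Equity recomputation and snapshot publishing are omitted. The spec does not say what the real method does with `available_margin`, so the sketch only stores it.

```python
import math
from dataclasses import dataclass


@dataclass
class AccountSketch:
    """Hypothetical stand-in for AccountProjection's capital fields."""
    capital: float = 0.0
    fees_paid: float = 0.0
    capital_source: str = "seed"
    e_wallet_balance: float = 0.0
    available_margin: float = 0.0
    event_seq: int = 0

    def anchor_to_exchange(self, wallet_balance: float, available_margin: float,
                           event_seq: int) -> bool:
        """Snap capital to the exchange wallet; reject zero, negative, or NaN."""
        if not math.isfinite(wallet_balance) or wallet_balance <= 0.0:
            return False  # keep the last good anchor
        self.capital = wallet_balance
        self.e_wallet_balance = wallet_balance
        self.available_margin = available_margin
        self.event_seq = event_seq
        self.capital_source = "e_anchored"
        return True

    def settle(self, realized_pnl: float, fees: float) -> None:
        """Bridge between anchors: fees now reduce published capital too."""
        self.capital += realized_pnl - fees
        self.fees_paid += fees
        self.capital_source = "k_bridged"
```

The sketch rejects a bad wallet balance rather than clamping it. That choice preserves the previous anchor, which is the intent of the zero-wb guard. A subsequent valid anchor then overwrites any bridged drift exactly.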
+ +### 3.3 Gates +- New unit tests (`prod/tests/test_pink_account_anchor.py`): + anchor sets capital/source; zero/negative/NaN wb rejected; settle bridges + with fees; anchor after bridge snaps to wb exactly. +- Shadow check (live, 24 h on VST): published capital vs + `GET /openApi/swap/v2/user/balance` polled 1/min — max |Δ| outside a + trade-settlement window ≤ $0.01; during settlement ≤ pending-fee bound. + +--- + +## 4. Phase 2 — Single atomic snapshot, ledger consolidation + +**Goal**: one immutable, versioned account snapshot; the two redundant +ledgers demoted/removed. + +### 4.1 `prod/clean_arch/dita_v2/account.py` +- Make the published snapshot **immutable-replace**: `AccountProjection` + builds a new frozen `AccountSnapshot` (carry `event_seq`) on every + mutation and swaps a single reference (GIL-atomic). Readers must take + `snap = kernel.account.snapshot` once per use (audit call sites: + `pink_clickhouse.py`, `hazelcast_projection.py` HZ writer, `pink_direct`). +- `AccountProjectionV2`: DELETE, or move to `prod/clean_arch/dita_v2/ + _attic/` with a module docstring pointing here. Its only live-path import + is `exchange_event.py` — migrate that import or the dataclasses it uses + (`EPosition` is genuinely useful; keep it in `account.py`). +- The Rust `AccountState` K-ledger STAYS — demoted by documentation and by + Phase 1 (it no longer feeds published capital): its jobs are reconcile + classification (R1-style), `capital_frozen`, and E-dark bridging. Update + the module docstring to say exactly this. + +### 4.2 `prod/clean_arch/persistence/pink_clickhouse.py` +- Read capital/equity/peak/trade_seq from the single snapshot reference; + no recomputation. 
+- Add columns to emitted rows (and the matching `ALTER TABLE` DDLs under + `prod/clickhouse/pink/08_provenance.sql` — **apply DDLs to CH BEFORE + deploying code that emits them**; the missing-table head-of-line jam of + 2026-06-11 is the cautionary tale): + - `account_events`, `status_snapshots`: `capital_source LowCardinality(String) DEFAULT ''`, + `account_event_seq UInt64 DEFAULT 0` + - `trade_events`, `trade_exit_legs`: `pnl_source LowCardinality(String) DEFAULT ''` + (`exchange` | `kernel_estimate`) +- `bars_held`: clamp to `max(0, …)` at row-build time (UInt16 column; + negative values are currently rejected with HTTP 400 on trade_events and silently vanish on + async tables). +- Timestamps: route every `ts` through one helper emitting **naive-UTC + microsecond ISO** (no `+00:00`) — best_effort already tolerates both, but + rows must stop depending on a parser setting. + +### 4.3 Duplicate-emission fix (same file) +Every CH row is currently emitted twice (visible in any query). Hunt the +double call: instrument `_sink()` with a per-(table, content-hash) debug +counter in a test, then trace the two call paths (suspect: `persist_result` +invoked both from the runtime step and from the fill pump for the same +event). Fix at the caller level; do NOT dedupe by content in the sink +(masks real double-events). Regression test: one simulated round trip → +exactly one row per logical event per table. + +### 4.4 `prod/ch_writer.py` +- `wait_for_async_insert`: `"1"` for ALL `dolphin_pink` tables (accounting + rows must never be silently lost; the spool absorbs latency). Keep `0` + acceptable only for high-volume shadow tables if measured necessary — + document any exception inline. +- Mitigation for head-of-line (full redesign out of scope): after + `attempts > 1000` on a row, log ERROR with the CH response body once per + 100 attempts (today the reject reason is invisible without manual replay).
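The row-hygiene rules from 4.2 and the two log throttles from 0.4 and 4.4 are small enough to pin down as pure helpers. The following sketch assumes hypothetical names, 1-based error and attempt counters, and that naive datetimes are already UTC. The timestamp uses a space separator, which is ClickHouse's native DateTime text form.

```python
from datetime import datetime, timezone


def ch_ts(dt: datetime) -> str:
    """Naive-UTC, microsecond ISO timestamp with no '+00:00' suffix."""
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    # Assumption: naive inputs are already UTC and are passed through as-is.
    return dt.isoformat(sep=" ", timespec="microseconds")


def clamp_bars_held(value: int) -> int:
    """UInt16 column: negative bar counts are clamped to 0 at row-build time."""
    return max(0, int(value))


def should_log_flush_error(n: int) -> bool:
    """0.4 throttle on the n-th flush error: first 10, then every 100th."""
    return n <= 10 or n % 100 == 0


def should_log_stuck_row(attempts: int) -> bool:
    """4.4 mitigation: past 1000 attempts, one ERROR per 100 attempts."""
    return attempts > 1000 and attempts % 100 == 0
```

These helpers clamp only the lower bound, because the spec states only `max(0, …)`. Any upper UInt16 bound would be a separate decision.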
+ +### 4.5 Gates +- Full offline suite (the 533+ DITAv2/PINK set) green, minus the Phase-0 + quarantined adapter tests if still open. +- One live VST round trip: every table gets exactly one row per event; + `pnl_source`/`capital_source` populated; CH `system.text_log` shows zero + parse rejections for `dolphin_pink`. + +--- + +## 5. Phase 3 — Sizer feedback off trade-realized PnL + +**THE one seam where this refactor can silently change alpha behavior.** + +### 5.1 `prod/clean_arch/runtime/pink_direct.py` — `_sizer_trade_feedback` (~line 1453) +Today: `pnl = acc.capital − self._sizer_entry_capital` (capital delta). +Under E-anchored capital this absorbs funding, fees of other activity, and +**foreign fills from the shared VST account** (PRODGREEN collision class). +Change to: +``` +pnl = slot_realized_for_trade(trade_id) # Σ slot.realized_pnl legs, i.e. + # kernel estimate, overridden by + # exchange rp when settled (5.2) +``` +Source: the closing slot dict already carries `realized_pnl`; use it (minus +the fees recorded for the trade when available) instead of the capital +delta. Keep the magnitude semantics the sizer expects (sign + rough size — +per the existing comment, bucket/streak multipliers only need that). + +### 5.2 Exchange override (E-led repair) — `bingx_user_stream.py` + `rust_backend.py` +- The WS `FILL_SETTLED` path already carries the exchange's realized (`rp`) + and fee (`n`, sign-flipped at boundary per BingX quirks memory). Extend + the kernel account-event payload with `trade_id`, and on receipt: + - if the matching slot leg was flagged `realized_skipped_no_price`, + ADD the exchange realized to `slot.realized_pnl` (repair) and clear + the flag; settle the increment through the normal baseline mechanism; + - else record `pnl_source="exchange"` for the trade-event row (the + estimate stays as the booked figure unless |estimate−rp| exceeds a + tolerance — then log ERROR + emit an `anomaly_events` row; do NOT + silently re-book). 
+- Rust: add `dita_kernel_repair_realized(slot_id, amount)` FFI (or fold the + repair into `on_account_event` with `slot_id` in payload). Keep it + idempotent via the existing account-event dedup. + +### 5.3 Gates +- Unit: feedback receives trade-realized, not capital delta (simulate a + foreign-fill capital jump mid-trade → feedback unaffected). +- Unit: price-less exit leg + later FILL_SETTLED repair → slot realized + equals exchange `rp`; settle baseline consistent (no double-settle). +- Parity: `test_blue_parity.py`, `test_alpha_blue_untouched_g7.py` green + (sizer behavior unchanged for normal fills). + +--- + +## 6. Phase 4 — Kernel hardening leftovers + +### 6.1 `lib.rs` — `resolve_slot` (~line 1099) +Falls back to **slot 0** when nothing matches. Change: return +`Option`; on `None`, `on_venue_event` returns +`UNRESOLVED_SLOT` (diagnostic exists already) without mutating any slot, +severity WARNING, event recorded in outcome details. Python callers: the +runtime treats UNRESOLVED_SLOT as a logged no-op (the `_fill_is_ours` +filter remains first-line defense; this is kernel-side defense for +venue-agnostic reuse). +NOTE: several tests construct events with `slot_id=-1` expecting slot-0 +fallback — update them to pass explicit `slot_id=0` (behavioral test +change; list each in the PR description). + +### 6.2 ID-less fill routing (documentation + metric, not code) +BingX WS omits clientOrderId, so identity routing can't always engage. +Add a counter metric (`fills_routed_by_state_total`) via an +`anomaly_events` row per occurrence, severity INFO — gives VIOLET the data +to justify per-venue synthetic ids later. No FSM behavior change. + +### 6.3 Gates +- New Rust tests: unresolved event mutates nothing; entry-id fill during + EXIT_WORKING routes to entry (already covered by Phase-0 routing — add + the explicit case); price-less exit leg books 0 + flag. + +--- + +## 7. 
Test matrix (run-order for the implementing agent) + +| Stage | Command (env: `PYTHONPATH=/mnt/dolphinng5_predict:/mnt/dolphinng5_predict/nautilus_dolphin`, venv `/home/dolphin/siloqy_env/bin/python3`) | Pass bar | +|---|---|---| +| Rust unit | `cargo test --release` in `_rust_kernel/` | 100% | +| Kernel FSM | `pytest prod/tests/test_dita_v2_kernel.py` | 100% | +| Bridge/accounting | `pytest prod/tests/test_pink_ditav2_kernel_bridge.py test_pink_ditav2_accounting_invariants.py prod/clean_arch/dita_v2/test_account_core_v2.py` | 100% | +| Runtime/reconcile | `pytest prod/clean_arch/dita_v2/test_venue_reconcile.py test_orphan_prevention.py test_exec_router_runtime.py prod/tests/test_pink_async_fill_pump.py test_pink_direct_runtime.py` | 100% | +| Chaos | `pytest prod/tests/test_pink_ditav2_chaos_harness.py` + `test_dita_v2_e2e_functional.py` | 100% | +| Parity | `pytest prod/clean_arch/dita_v2/test_blue_parity.py test_alpha_blue_untouched_g7.py` | 100% | +| Adapter | `pytest prod/tests/test_dita_v2_bingx_adapter.py` | 100% after Phase-0 item 4 resolution | +| LIVE VST E2E | `python prod/ops/dita_v2_live_bingx_smoke.py --pink --symbol TRXUSDT` | suite green | +| **Golden replays (NEW — write these)** | `prod/tests/test_pink_accounting_golden.py` | see below | +| Shadow soak | 24–48 h on VST | capital vs balance ≤ $0.01 idle | + +### Golden replay tests (the heart of the acceptance) +Feed the kernel the recorded FET event sequence (entry fills 195,259 + +7,017 @ 0.1878; exit fills 26,007 + remainder; the poisoned variant with +price=0.229 and the clean variant with 0.1866): +1. Clean prices → realized = `(0.1878−0.1866) × 202,276 ≈ +242.7` gross. +2. Poisoned price (0.229) reaching the kernel anyway → with the adapter fix + it must arrive as 0.0 → leg books 0 + `realized_skipped_no_price`; after + synthetic FILL_SETTLED rp=+164 → slot realized = +164, `pnl_source=exchange`. +3. 
Restart mid-position (save_state/restore_state + reconcile_from_slots) + → next venue event settles ONLY the incremental PnL. +4. VWAP: two entry fills at different prices → basis = weighted average. +5. Dual-leverage invariant: same fills at exchange-leverage 1 vs 3 → + **identical realized PnL**; only margin fields differ. + +--- + +## 8. Rollout & rollback + +1. Each phase = one PR-sized commit, gates green before the next. +2. Activation requires `supervisorctl restart dolphin_pink` — restart at a + FLAT moment (check `DOLPHIN_STATE_PINK` + exchange positions). The + restart-reconcile path is itself under test here; first restart after + Phase 0 should be watched live. +3. Rollback = `git revert` of the phase commit + rebuild `.so` + restart. + The Rust `.so` MUST be rebuilt on both apply and revert — stale-binary + drift is how the incremental-fill change sat uncompiled until 2026-06-11. +4. CH DDLs are additive (`ADD COLUMN ... DEFAULT`) — no destructive + migrations anywhere in this spec; rollback leaves unused columns, which + is fine. +5. PINK is VST (virtual funds) — it is the canary by construction. Nothing + in this spec touches BLUE files (verify with `git diff --name-only` + against the §38.7 checklist). + +## 9. Done criteria (the whole spec) + +- All phases merged; full matrix green; golden replays green. +- 48 h VST soak: zero UNEXPLAINED reconcile errors; published capital + tracks exchange balance; every closed trade's `trade_events.pnl` within + fees+slippage of the exchange income record, with `pnl_source` populated. +- `pink_ctl.py mode-verify` passes (namespace isolation intact). +- SYSTEM BIBLE §38 addendum updated (one paragraph: E-led ledger, K as + checksum, provenance fields) + `DITA_V2_KERNEL_REFERENCE.md` §"Capital + simplification" rewritten to match reality. 
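The expected numbers in the §7 golden replays can be cross-checked with a small standalone sketch that does not depend on the kernel. It is not the Rust kernel or the adapter. `Fill`, `vwap`, `short_realized_gross`, `margin_for`, and `SettleBaseline` are hypothetical names. Fees and slippage are ignored, because the spec's +242.7 is gross. The entry-minus-exit form assumes the FET position was short, which is what the §7 formula implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    qty: float    # base-asset quantity, always positive
    price: float  # fill price in quote currency


def vwap(fills):
    """Volume-weighted average price of a set of fills (the position's basis)."""
    total_qty = sum(f.qty for f in fills)
    if total_qty <= 0:
        raise ValueError("no filled quantity")
    return sum(f.qty * f.price for f in fills) / total_qty


def short_realized_gross(entry_fills, exit_price):
    """Gross realized PnL of a fully closed short: (basis - exit) * qty.

    Exchange leverage is intentionally not a parameter: it moves margin, not PnL.
    """
    qty = sum(f.qty for f in entry_fills)
    return (vwap(entry_fills) - exit_price) * qty


def margin_for(entry_fills, exchange_leverage):
    """Initial margin, the only leverage-dependent quantity here."""
    notional = sum(f.qty * f.price for f in entry_fills)
    return notional / exchange_leverage


class SettleBaseline:
    """Tracks realized PnL already settled so each event settles only the increment."""

    def __init__(self, already_settled=0.0):
        self.settled = already_settled  # would be restored from saved state

    def settle(self, realized_total):
        delta = realized_total - self.settled
        self.settled = realized_total
        return delta


# Golden case 1: the recorded FET entry fills with the clean exit price.
fet_entry = [Fill(195_259, 0.1878), Fill(7_017, 0.1878)]
clean_pnl = short_realized_gross(fet_entry, 0.1866)  # about +242.73 gross

# Golden case 3 (hypothetical amounts): after a restart that restored 100.0 as
# already settled, a venue event reporting 150.0 total settles only 50.0.
restored = SettleBaseline(already_settled=100.0)
increment = restored.settle(150.0)
```

Leverage is deliberately absent from `short_realized_gross`. That absence is the dual-leverage invariant expressed in code: between exchange-leverage 1 and 3, only `margin_for` changes.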
diff --git a/prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md b/prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md new file mode 100644 index 0000000..d92798c --- /dev/null +++ b/prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md @@ -0,0 +1,2978 @@ +# VIBRISS Parameter Governance Spec + +**Name**: VIBRISS — Variational Input-driven Bandit-Reactive Intelligent Sensing System +**Status**: Design doctrine / implementation target +**Scope**: BLUE/PINK parameter governance, initially shadow/advisory only +**Canonical dependency**: `SYSTEM_BIBLE_v7.md` +**Operational stance**: shadow-first, replay-first, guardrail-first. VIBRISS +must be useful even when it never gets permission to actuate live. + +## 1. Purpose + +VIBRISS is the engine's active parameter-sensing and adaptive execution layer. +Its job is to replace brittle hardcoded execution constants with bounded, +auditable, continuously re-evaluated parameter recommendations. + +VIBRISS is not a new alpha model and not a full RL layer. It is an online +statistical parameter-governance system: observe outcomes, test safe candidate +values, score the realized response, retire weak settings, and keep enough +controlled exploration alive to detect drift. + +The first intended target is exit-parameter governance, especially ADVSL and +fast/cubic TP parameters such as hold-bar limits, floor thresholds, pressure +thresholds, and TP posture. Later targets can include sizing haircuts, urgency, +asset-selection posture, and venue-specific execution parameters. + +## 2. Design Stance + +VIBRISS must be modular, spec-driven, replayable, and safety bounded. + +Key doctrine: + +- One learner per parameter spec by default. +- Bundle/slate learning only after interaction effects are repeatedly material. +- Contextual bandits first; full RL only later if decisions are truly sequential + and materially coupled across multiple execution steps. +- Discrete and bucketed parameters use Thompson Sampling, UCB, LinTS, or LinUCB. 
+- Continuous bounded scalars are discretized into safe buckets first. +- Nonstationary behavior uses discounted or sliding-window evidence plus drift + detection. +- Safety-critical parameters require baseline-safe exploration, confidence + thresholds, step limits, cooldowns, and hard guardrails. +- Passive fill and time-to-fill decisions should use survival-analysis modules + where censoring matters. + +## 3. System Boundary + +VIBRISS must not silently mutate engine internals. + +The correct production shape is: + +```text +context ingestion + -> admissible candidate generation + -> learner scoring + -> guardrail filter + -> action selection + -> advice publication + -> allowed engine consumption point + -> delayed outcome capture + -> reward mapping + -> online update +``` + +The hot execution path consumes advice only at documented decision points. The +learner/update path is separate and may lag. If advice is stale, low-confidence, +or invalid, the engine falls back to the baseline parameter. + +BLUE is in-memory/paper and not BingX-enabled. PINK is the BingX venue-facing +world. VIBRISS may govern both, but its output contract must be namespace-aware +and must not assume that BLUE has exchange state. + +Non-goals: + +- VIBRISS does not pick assets. +- VIBRISS does not replace MARAS, OBF, V7, ACB, EFSM, or SurvivalStack. +- VIBRISS does not own exchange reconciliation. +- VIBRISS does not rewrite frozen champion configs. +- VIBRISS does not turn offline backtest winners into live settings without + a shadow/OPE/promotion path. + +Its only authority is to publish bounded, versioned parameter advice and to +learn from the outcome trail. + +## 4. Terminology + +| Term | Meaning | +|---|---| +| `vibrissa` | One probe-trade, parameter test, or market feeler. | +| `vibrissae` | The active parameter-probe array. | +| `parameter spec` | Loadable contract defining one tunable parameter. | +| `arm` | One candidate value or execution configuration. 
| +| `reward` | Bounded realized execution-quality score. | +| `posture` | Current preferred parameter set plus confidence and fallback metadata. | +| `baseline` | The currently trusted hardcoded or documented production value. | + +## 4.1 Control-Plane Elegance Constraints + +VIBRISS must remain a disciplined parameter-governance control plane, not an +unbounded mesh of subsystems mutating each other. Adaptive behavior is allowed +only when it preserves ownership, auditability, and bounded actuation. + +Hard architecture rules: + +1. One writer per parameter. + - A live parameter may have many sensors and many context inputs, but only + one ParamSet is allowed to publish the effective value for that parameter + in a given namespace. + +2. ParamSpecs and ParamSetSpecs own promotion rules. + - Promotion cadence, evidence gates, rollback rules, manual-approval + requirements, and replacement rhythm are part of the spec. The runner must + execute declared policy, not invent policy. + +3. Meta-cadence is itself a parameter, but only at a slower cadence. + - VIBRISS may tune replay cadence, promotion-review cadence, checkpoint + cadence, or reward-join cadence, but those meta-parameters must move more + slowly than the governed trading/execution parameter and must have + stronger guardrails. + +4. EsoF, ExoF, MARAS, OBF, V7, MHS, and drawdown state are context inputs, not + arbitrary controllers. + - They may influence candidate scoring, confidence, demotion, or fallback, + but they must not directly mutate live parameters outside the owning + ParamSet. + +5. Every live change must be reproducible. + - Log candidate set, chosen action, action probability or confidence, + context hash, reward mapping, model version, compiled config hash, + fallback reason, promotion state, and rollback path. + +6. No hidden cross-subsystem mutation. 
+ - If one subsystem changes another subsystem's effective behavior, the change + must appear as a typed ParamSet advice event and an audited engine-consumed + posture update. + +7. Shadow first, replay/OPE second, canary third, live last. + - No safety-critical parameter may skip directly from idea or in-sample + replay to live actuation. Live promotion requires held-out evidence, + shadow logging, explicit approval when required, and automatic demotion + conditions. + +These constraints are mandatory for all future ADVSL, TP, DVOL/VOL, IRP, +asset-picker, EFSM/overlay, and meta-cadence ParamSets. If a design violates +them, the design is considered tangled and must be simplified before +implementation. + +## 5. Parameter Spec Contract + +Each adaptive parameter must be declared by a loadable spec. VIBRISS should not +hardcode knowledge of individual parameters. + +Important terminology: + +- `ParamSetSpec`: the loadable contract for a family of related parameters. +- `paramset_config`: configuration that applies to the ParamSet as a whole. +- `params`: the parameter declarations contained by the ParamSet. +- `param_defaults`: defaults inherited by every parameter in `params`. +- per-param override: a field inside one `params.` entry that + overrides `param_defaults` for that parameter only. + +The live runner must not perform complex inheritance during scoring. Specs are +authored in a rich hierarchical form, validated, compiled, and hash-stamped into +a flat canonical policy document before the runner consumes them. 
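Here is a minimal sketch of the compile step described above. It assumes the specs are already plain dicts after YAML parsing. It applies the merge precedence given in §5.1: schema defaults first, then `paramset_config`, then `param_defaults`, then the parameter's own entry. It then stamps both hashes onto the result. The function names and the built-in schema default are hypothetical. The canonical-JSON form is a simplification that is close to, but not identical with, RFC 8785 (JCS). The check that overrides may not weaken guardrails is omitted.

```python
import copy
import hashlib
import json


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied; nested dicts merge per key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def canonical_json(doc):
    """Deterministic JSON text: sorted keys, no insignificant whitespace."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compile_param_set(source, schema_defaults):
    """Flatten a hierarchical ParamSet into one fully merged dict per parameter.

    Precedence, lowest first: schema defaults < paramset_config
    < param_defaults < the parameter's own entry under `params`.
    """
    compiled_params = {}
    for name, own in source["params"].items():
        merged = schema_defaults
        for layer in (source.get("paramset_config", {}),
                      source.get("param_defaults", {}), own):
            merged = deep_merge(merged, layer)  # deep_merge copies, inputs untouched
        compiled_params[name] = merged
    compiled = {"param_set": source["param_set"], "params": compiled_params}
    return {
        "spec_hash": sha256_hex(canonical_json(source)),
        "compiled_config_hash": sha256_hex(canonical_json(compiled)),
        "compiled": compiled,
    }


# A cut-down version of the advsl.hold_substitute.v1 example, with a
# hypothetical built-in schema default.
SCHEMA_DEFAULTS = {"safety": {"max_exploration_rate": 0.0}}
spec = {
    "param_set": {"id": "advsl.hold_substitute.v1", "version": "1.0.0"},
    "paramset_config": {"consumer": "advanced_sl"},
    "param_defaults": {"safety": {"min_shadow_samples": 200,
                                  "min_live_confidence": 0.80}},
    "params": {
        "advsl.min_hold_bars_before_floor_arm": {"type": "integer", "default": 12},
        "advsl.recovery_extension_max_bars": {
            "type": "integer",
            "default": 0,
            "safety": {"min_shadow_samples": 500, "min_live_confidence": 0.90},
        },
    },
}
out = compile_param_set(spec, SCHEMA_DEFAULTS)
```

Keys are sorted before hashing. As a result, reordering keys in the YAML source changes neither hash, while changing any value changes both.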
+ +Required fields: + +```yaml +identity: + name: advsl.overlay_min_hold_bars + type: integer + units: bars + default: 6 + +domain: + candidates: [4, 6, 8, 10, 12, 16, 20] + hard_min: 0 + hard_max: 40 + +safety: + fallback_baseline: 6 + max_step_change: 4 + cooldown_trades: 5 + min_shadow_samples: 100 + min_live_confidence: 0.80 + max_exploration_rate: 0.05 + +placement: + consumer: advanced_sl + decision_point: open_trade_exit_evaluation + namespace: blue + +live_change_policy: + mode: between_trades + allow_intratrade_change: false + +candidate_policy: + learner: linucb + nonstationarity: sliding_window + window_trades: 300 + +success: + primary_metric: capital_curve_delta_after_cost + secondary_metrics: + - clipped_winner_cost + - saved_loss + - drawdown_delta + - recovery_lag + +inputs: + - maras_latest + - v7_decision_events + - advanced_sl_monitor_latest + - obf_universe_latest + - eigen_scan + - trade_path + +reward_mapping: + bounded_range: [-1.0, 1.0] + delayed_until: trade_close_or_counterfactual_terminal + components: + saved_loss: +1.0 + missed_profit: -1.5 + drawdown_reduction: +0.5 + tail_loss: -2.0 + +promotion_policy: + owner: param_set + technique: replay_shadow_canary + review_cadence_s: 900 + min_replay_trades: 300 + min_shadow_decisions: 200 + min_realized_rewards: 50 + min_contiguous_regions: 4 + required_evidence: + recursive_capital_curve_delta_after_cost: "> 0" + worst_region_delta: ">= configured_floor" + clipped_winner_cost: "<= configured_budget" + drawdown_delta: "<= 0" + allowed_transitions: + - disabled_to_shadow + - shadow_to_advisory + - advisory_to_canary_live + - canary_live_to_controlled_live + manual_approval_required: + - advisory_to_canary_live + - canary_live_to_controlled_live + automatic_demotion_on: + - stale_required_sensor + - reward_drift + - drawdown_alarm + - invalid_checkpoint + +meta_cadence_policy: + owner: param_set + status: shadow_first + tunable_cadences: + calibration_interval_s: [300, 900, 1800, 3600] + 
promotion_review_interval_s: [900, 1800, 3600, 7200] + checkpoint_interval_s: [30, 60, 120, 300] + shadow_to_canary_cooldown_trades: [25, 50, 100, 200] + context_inputs: + - maras_latest + - exof_latest + - esof_latest + - mhs_latest + - reward_backlog + - drawdown_state + success: + primary_metric: policy_stability_adjusted_reward + secondary_metrics: + - stale_advice_rate + - promotion_false_positive_rate + - missed_adaptation_cost + - operator_churn + - compute_cost + live_change_policy: + calibration_cadence: controlled_after_shadow + promotion_cadence: advisory_only_until_explicit_approval + +outputs: + hz_key: DOLPHIN_FEATURES.vibriss_param_advice + clickhouse_table: dolphin.vibriss_decisions + state_table: dolphin.vibriss_policy_state +``` + +### 5.1 ParamSet Config and Per-Parameter Overrides + +The canonical authoring shape is: + +```yaml +param_set: + id: advsl.hold_substitute.v1 + version: 1.0.0 + namespace_default: blue + status: shadow_first + +paramset_config: + consumer: advanced_sl + decision_family: exit_risk_timing + placement: + decision_point: trade_entry + live_replacement_rhythm: capture_on_entry + promotion_policy: + technique: replay_shadow_canary + review_cadence_s: 1800 + meta_cadence_policy: + status: shadow_first + outputs: + hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice + decision_table: dolphin.vibriss_decisions + reward_table: dolphin.vibriss_rewards + +param_defaults: + learner: + type: discounted_ucb + nonstationarity: sliding_window + window_trades: 300 + safety: + fallback_baseline: 12 + min_shadow_samples: 200 + min_live_confidence: 0.80 + max_exploration_rate: 0.0 + reward_mapping: + bounded_range: [-1.0, 1.0] + primary_metric: recursive_capital_curve_delta_after_cost + guardrails: + stale_sensor_policy: shrink_to_baseline + drawdown_alarm_policy: freeze_to_baseline + +params: + advsl.min_hold_bars_before_floor_arm: + type: integer + units: bars + domain: + candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40] + 
hard_min: 0 + hard_max: 48 + default: 12 + baseline_reference: 20 + + advsl.recovery_extension_max_bars: + type: integer + units: bars + domain: + candidates: [0, 4, 8, 12, 20, 34] + hard_min: 0 + hard_max: 40 + default: 0 + learner: + type: shadow_only_discounted_ucb + safety: + min_shadow_samples: 500 + min_live_confidence: 0.90 +``` + +Merge precedence: + +```text +compiled_param = + built_in_schema_defaults + < paramset_config + < param_defaults + < params. + < namespace/runtime override if explicitly allowed by spec +``` + +Rules: + +- ParamSet-wide promotion and meta-cadence policy live in `paramset_config` + unless a parameter explicitly overrides a narrower field. +- Per-param overrides may tighten safety, narrow domains, increase sample + requirements, or change learner type only if the ParamSet allows it. +- Per-param overrides may not weaken global catastrophic guardrails. +- The compiler must emit both the original source spec hash and the compiled + canonical hash. +- The runner consumes only the compiled canonical form. + +### 5.2 Spec Compiler and Validation Library + +Use an existing platform-agnostic schema/config tool for the authoring layer. +Do not invent a bespoke inheritance language. + +Recommended stance: + +| Need | Recommended tool | Runtime placement | +|---|---|---| +| Cross-language schema contract | JSON Schema | CI, compiler, runner validation. | +| Rich defaults, constraints, unification, inheritance-like config | CUE | Spec compiler / CI, not hot path. | +| Human-friendly authoring | YAML | Source only; compiled immediately. | +| Runner consumption | canonical JSON | Hot path. | +| Fast internal representation | dataclass / Pydantic / msgspec-style object | Runner load time only. | + +VIBRISS should prefer: + +```text +YAML authoring -> CUE/JSON-Schema validation -> canonical JSON -> runner cache +``` + +The live runner should never parse CUE, run template expansion, or resolve a +large inheritance tree during an advice decision. 
It should load a precompiled +canonical JSON document, verify hashes and schema version, then use direct field +access. + +Performance requirements: + +- spec compile can be slower because it is CI/worker time; +- runner spec load should be bounded and rare; +- advice scoring must use already-merged values; +- every compiled ParamSet must include a deterministic `compiled_config_hash`; +- all advice/audit rows must log `spec_hash` and `compiled_config_hash`. + +## 6. Candidate Algorithms + +V1 should support a small set of algorithms well, rather than a broad library +surface poorly. + +Recommended V1 learners: + +| Parameter type | Default learner | Notes | +|---|---|---| +| Small categorical | Thompson Sampling | Useful for urgency, route, retry, fixed mode selection. | +| Ordered discrete scalar | UCB or discounted UCB | Good for hold bars, TP buckets, pressure thresholds. | +| Contextual finite arms | LinUCB or LinTS | First choice for MARAS/OBF/V7-conditioned advice. | +| Continuous scalar | Adaptive discretization | Start bucketed; upgrade only if buckets are too coarse. | +| Passive fill/delay | Survival model | Explicitly handle censored fill and recovery windows. | + +Useful libraries to inspect: + +- Vowpal Wabbit for contextual bandits, logged propensities, and OPE. +- River for streaming statistics, online GLMs, and drift detection. +- Open Bandit Pipeline for offline policy evaluation. +- MABWiser for fast Python prototype comparison. +- lifelines or statsmodels for survival analysis. +- NumPyro/Pyro only when hierarchical Bayesian pooling is justified. + +### 6.1 Dependency Placement and Reliability Policy + +VIBRISS must distinguish algorithm research from live parameter governance. +Performance and reliability are more important than using the most general +library in the first live version. + +Dependency rule: + +- The live runner should have a small deterministic dependency surface. 
+- Heavy learning, OPE, simulation, Bayesian inference, and broad model + comparison belong in `vibriss_worker` or offline jobs. +- The engine consumes compact checkpointed policy state and advice payloads. It + must not shell out to a learner or wait on an offline library. +- ClickHouse writes, model updates, and replay jobs must never block the hot + advice publication loop. +- If a dependency is not needed to score the current checkpointed policy, it is + not a live-runner dependency. + +Recommended V1 split: + +| Layer | Allowed dependency posture | Reason | +|---|---|---| +| Engine hot path | no VIBRISS learner dependency | Engine reads validated advice only. | +| `vibriss_runner` | stdlib + NumPy/Pandas only if needed; optional River subset for drift/stats | Keep startup, memory, and failure modes bounded. | +| `vibriss_worker` | VW, River, OBP, MABWiser, lifelines, statsmodels, contextual libraries | Calibration, OPE, replay, walk-forward, and report generation. | +| Research/simulation | ABIDES, Pyro/NumPyro, CATX, experimental packages | Valuable, but not part of the live critical path. | + +### 6.2 Library Decision Matrix + +| Library / stack | VIBRISS use | Placement | Decision | +|---|---|---|---| +| Internal UCB/TS/LinUCB | First production learners for bounded discrete arms. | runner + worker | Use first; easiest to audit and checkpoint. | +| Vowpal Wabbit | Contextual bandit benchmark, action-dependent features, OPE workflows, possible future compact policy generator. | worker/offline | Approved for evaluation; not a V1 hot-path dependency. | +| River | Streaming stats, reward normalization, ADWIN/Page-Hinkley/KSWIN-style drift detection, progressive validation. | runner optional; worker default | Approved, but keep live usage narrow. | +| Open Bandit Pipeline | OPE estimator benchmarking and logged-bandit evaluation. | offline/worker | Approved for reports; not live. | +| MABWiser | Fast Python comparison of TS/UCB/LinTS/LinUCB policies. 
| offline/worker | Approved for prototyping; not live. | +| lifelines / statsmodels | Survival models, recursive diagnostics, stability checks. | worker/offline | Approved for passive fill/recovery modeling. | +| contextualbandits | Alternative contextual-bandit benchmark implementations. | offline/worker | Research benchmark only. | +| SMPyBandits / BanditPylib / PyBandits | Algorithm comparison and stochastic-bandit sandboxing. | offline/research | Optional; do not add to live image. | +| NumPyro / Pyro | Hierarchical Bayesian pooling for sparse per-symbol/per-hash modules. | research/worker | Defer until sparse-data pooling is clearly needed. | +| CATX | Continuous-action contextual bandit research. | research | Defer; bucketed actions first. | +| ABIDES / ABIDES-Gym | Market-interactive simulation and stress rehearsal. | research/simulation | Useful later; too heavy for V1 runner. | +| Kafka / Flink | Durable event-stream backbone and stateful stream processing. | future infra | Defer; Dolphin already has Hazelcast + ClickHouse + supervisord. | +| scikit-multiflow | Historical stream-learning reference. | none | Do not use for net-new code; prefer River. | +| banditml | Architectural reference for production bandit services. | research only | Do not depend on it without a fresh maintenance review. | + +### 6.3 Performance Budgets + +Initial budgets for the live runner: + +| Operation | Target | Hard behavior on miss | +|---|---:|---| +| Score one ParamSet advice snapshot | `p95 <= 10 ms` | publish fallback or previous checkpoint. | +| Full live advice loop over enabled ParamSets | `p95 <= 50 ms` | skip noncritical ParamSets first. | +| Hazelcast publish | nonblocking best effort | mark advice degraded if publish fails. | +| ClickHouse audit write | never blocks advice | spool locally and expose backlog. | +| Runner startup with warm checkpoint | `<5 s` target | publish no advice until checkpoint valid. 
| +| Memory footprint | bounded and observable | disable worker-style models in runner. | + +Candidate sets must stay small. For `advsl.hold_substitute.v1`, a dozen finite +hold-bar arms is acceptable; hundreds of arms are not. Continuous-action +learners are disallowed in live V1 because they make bounded behavior harder to +audit and harder to replay exactly. + +### 6.4 Algorithm Defaults by Parameter Class + +Concrete defaults: + +| Parameter situation | Default | Upgrade path | Notes | +|---|---|---|---| +| Small finite categorical, weak context | Thompson Sampling or UCB1 | discounted UCB if drift appears | Use for mode, urgency, route, retry-like knobs. | +| Ordered discrete scalar | discounted UCB with monotone/smoothness diagnostics | contextual finite-arm learner | Good first fit for hold bars and TP buckets. | +| Finite arms with rich context | LinUCB or LinTS | GLM-UCB/GLM-TS if reward shape demands it | Use MARAS/OBF/V7/EFSM context. | +| Continuous bounded scalar | adaptive discretization | continuous-action contextual bandit only after bucket failure | Prefer auditability over fine resolution. | +| Coupled parameter bundle | small safe bundle catalog | slate/combinatorial learner only if interaction is proven | Avoid action-space explosion. | +| Nonstationary regime | discounted/sliding-window learner + drift detector | replay-reset logic | Freeze or shrink on drift; do not blindly chase. | +| Safety/budget constrained parameter | baseline-safe gating around the learner | conservative contextual bandit / budgeted bandit | Guardrails must dominate learner output. | +| Passive fill or recovery delay | survival model | richer survival only after classical model stability | Treat censoring explicitly. 
| + +### 6.5 Explicit Deferrals + +VIBRISS V1 should not attempt: + +- full RL; +- continuous-action live control; +- live probe trades by default; +- Kafka/Flink migration; +- ABIDES-in-the-loop production scoring; +- hierarchical Bayesian pooling in the runner; +- joint optimization of many parameters before single-ParamSet evidence exists. + +These are not rejected ideas. They are deferred because the current bottleneck is +reliable evidence collection, replay/OPE discipline, and safe advice +publication. + +## 7. Reward Design + +Rewards must be decomposed, bounded, and auditable. Store both raw components +and normalized reward. + +Typical reward components: + +- positive: saved loss, lower drawdown, better realized terminal PnL, better + capital compounding trajectory, successful recovery without excess hold. +- negative: clipped winner, missed TP, extra adverse selection, slippage, timeout, + excessive hold, larger tail loss, oscillation, stale-data actuation. + +For ADVSL/TP research, the primary reward should be capital-curve delta after +opportunity cost, not terminal trade PnL alone. A rule that saves losses but +systematically clips larger winners must be penalized accordingly. + +## 8. Required Audit Logging + +Every VIBRISS decision must be replayable. + +Minimum decision log fields: + +- timestamp and scan number +- namespace: blue, pink, prodgreen, research +- parameter spec id and version +- context snapshot hash +- MARAS regime, scalar hash, composite hash when available +- candidate set +- chosen arm +- action probability or confidence +- baseline value +- guardrail decisions and fallback reason +- model version +- advice publication timestamp +- engine consumption timestamp, if consumed +- delayed reward components +- terminal reward +- policy update version + +## 9. Control-Plane Output + +VIBRISS publishes advice, not imperative mutations. 
+
+Recommended HZ shape:
+
+```json
+{
+  "schema": "vibriss.param_advice.v1",
+  "namespace": "blue",
+  "ts": "2026-06-03T00:00:00Z",
+  "spec_id": "advsl.overlay_min_hold_bars",
+  "spec_version": "1.0.0",
+  "baseline_value": 6,
+  "recommended_value": 12,
+  "confidence": 0.82,
+  "candidate_set": [4, 6, 8, 10, 12, 16, 20],
+  "context_hash": "maras:57957|asset:XLMUSDT|side:LONG",
+  "learner": "linucb",
+  "guardrail_status": "PASS",
+  "fallback_reason": null,
+  "expires_at": "2026-06-03T00:05:00Z"
+}
+```
+
+Consumption rule: the engine may consume this only if the parameter spec says
+the current state is an allowed change point and all guardrails pass. Otherwise
+the baseline remains in force.
+
+## 10. Initial VIBRISS Targets
+
+### 10.1 Conditional Fast TP
+
+First replay-backed target:
+
+- `fast_tp.tp_pct`
+- `fast_tp.bars_held_min`
+- `fast_tp.exit_pressure_min`
+- `fast_tp.mfe_decay_min`
+- `fast_tp.pnl_mfe_frac_max`
+
+Current evidence says a blanket first-touch `0.20%` TP clips too many winners,
+but conditional fast TP is net positive on both the full corpus and the
+capital-known BLUE subset. The first VIBRISS job is to turn those calibrated
+constants into a shadow policy with logged propensities and OOS replay.
+
+This TP percentage is a prime VIBRISS assistance target. Treat it as a
+first-class tunable rather than a frozen constant once replay coverage is
+sufficient.
+
+Open research note:
+
+- investigate whether the `0.20%` TP should be risk-normalized by notional
+  risked, using a monotone nonlinearity such as a cubic retract/expansion curve;
+- the candidate question is whether high-notional or high-leverage trades should
+  have a proportionally different TP posture, while keeping the first-touch
+  semantics intact for replay accounting;
+- if tested, this must be evaluated with full capital-curve compounding and
+  opportunity cost, not just raw win-rate or per-trade PnL.
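The open research note above can be made concrete as a candidate curve. This sketch shows one hypothetical shape for a "monotone cubic retract/expansion" TP. It is not a calibrated proposal. `reference_notional`, clipping relative notional to [-1, 1], and the `max_abs_coeff` safety band are all assumptions. `base_tp_pct` is a fraction, so `0.20%` is `0.002`.

```python
def risk_normalized_tp_pct(base_tp_pct, notional, reference_notional,
                           cubic_coeff, max_abs_coeff=0.5):
    """Scale a base TP by a monotone cubic of relative notional risked.

    x = notional / reference_notional - 1, clipped to [-1, 1];
    multiplier = 1 + cubic_coeff * x**3. Consequences:
      * at the reference notional the base TP is returned unchanged;
      * cubic_coeff > 0 widens TP for larger positions (expansion),
        cubic_coeff < 0 tightens it (retraction);
      * the curve is flat near the reference and steepest at the clip edges.
    """
    if reference_notional <= 0:
        raise ValueError("reference_notional must be positive")
    if abs(cubic_coeff) > max_abs_coeff:
        raise ValueError("cubic_coeff outside the safe band")
    x = max(-1.0, min(1.0, notional / reference_notional - 1.0))
    return base_tp_pct * (1.0 + cubic_coeff * x ** 3)


# Hypothetical discrete arms a bucketed learner could score in shadow replay.
CUBIC_COEFF_ARMS = (-0.3, -0.15, 0.0, 0.15, 0.3)
```

Including `0.0` among the arms keeps the current flat TP available as the baseline-safe choice. Only the threshold moves. How a first touch is detected and labeled in replay stays the same.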
+
+#### 10.1.1 Re-entry-Conditioned Fast TP
+
+Same-asset reentries after a profitable exit are a separate research bucket.
+They should not inherit the exact same fast-TP posture as a first-entry trade
+without evidence. In current BLUE history, same-asset reentries after wins are
+usually profitable, but the average second-leg move is smaller than the initial
+leg, which means a lower TP multiplier may preserve geometry better than a blunt
+`2.0x` repeat.
+
+Recommended candidate arms:
+
+- `fast_tp.reentry_tp_multiplier = 1.2`
+- `fast_tp.reentry_tp_multiplier = 1.5`
+- `fast_tp.reentry_tp_multiplier = 2.0`
+
+Interpretation:
+
+- first-entry trades keep the baseline conditional fast TP
+- re-entry-after-win trades may use a smaller multiplier band
+- re-entry-after-loss trades should remain a separate bucket and may need a
+  slower TP or stronger confirmation, not just a smaller multiplier
+- a mild nonlinear / cubic trim on re-entry is a valid shadow-only follow-up
+  candidate, but only after the flat multiplier band has been replayed first
+
+Ownership rule:
+
+- VIBRISS should learn and score the candidate multiplier in shadow replay
+- EFSM should own live application if the runtime ever consumes the bucket
+- do not flatten the geometric ROI curve by forcing a single multiplier on all
+  reentries
+
+#### 10.1.2 TP Near-Miss Replay
+
+The TP research set must include a distinct near-miss population:
+
+- trades that came within a small epsilon of the candidate TP but did not
+  satisfy the live trigger on the observed cadence
+- trades that briefly exceeded the candidate TP and then reversed before the
+  engine observed the touch
+- trades that later stopped out after first-touch proximity, because those are
+  the exact counterexamples needed to learn whether a lower TP bucket would
+  have been better
+
+This bucket is mandatory because a corpus dominated by profitable TP closes is
+survivorship-biased.
A learner trained only on winners can learn that the +current TP is "usually profitable" while remaining blind to the trades where a +slightly lower TP would have caught the move and prevented a later stop-loss. + +Required replay semantics: + +- use first-touch TP labels, not close-only labels +- keep near-miss candidates separate from clean TP hits +- score each candidate by recursive capital-curve delta after opportunity cost +- preserve scan-cadence effects when the live engine is scan-driven + +Primary use: + +- learn whether a tighter TP bucket is justified for specific regimes, assets, + or reentry conditions +- quantify the opportunity cost of the missed touch itself, not just the later + realized close +- explain repeated "why did this one not TP?" incidents without overfitting to + already-winning trades + +### 10.2 ADVSL Hold/Floor + +Second target: + +- `advsl.base_catastrophic_floor_pct` +- `advsl.overlay_catastrophic_floor_pct` +- `advsl.overlay_max_loss_usd` +- `advsl.overlay_min_hold_bars` +- `advsl.overlay_pressure_min` +- `advsl.overlay_mae_risk_min` + +This is safety-critical. VIBRISS may advise, but live application requires +strong guardrails, bounded step changes, and explicit fallback to the current +documented ADVSL values. + +Floor percentage is also a prime VIBRISS assistance target, but it must stay +outside the learner’s ability to disable the catastrophic floor entirely. 
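Here is a minimal sketch of how a guardrail layer could keep floor advice inside bounds. It assumes floors are expressed as positive loss distances (`0.02` = 2%). It also assumes `hard_min`, `hard_max`, and `max_step` come from the ParamSpec `domain` and `safety` fields. The function name and the reason strings are hypothetical. The key property is that `hard_min` is strictly positive and applied last. No recommendation, however extreme, can therefore switch the floor off.

```python
import math


def guard_floor_pct(recommended, baseline, previous, hard_min, hard_max, max_step):
    """Return (effective_floor_pct, reason) for a learner-recommended floor.

    Floors are positive loss distances (0.02 = a 2% catastrophic floor).
    hard_min > 0 means no recommendation can disable the floor.
    """
    if not 0.0 < hard_min <= baseline <= hard_max:
        raise ValueError("need 0 < hard_min <= baseline <= hard_max")
    # Missing, non-finite, or non-positive advice falls back to the baseline.
    if recommended is None or not math.isfinite(recommended) or recommended <= 0.0:
        return baseline, "fallback_invalid_advice"
    # Bounded step: move at most max_step away from the previous effective floor.
    stepped = min(max(recommended, previous - max_step), previous + max_step)
    # Hard bounds are applied last so they always win over the step limit.
    effective = min(max(stepped, hard_min), hard_max)
    return effective, ("pass" if effective == recommended else "clamped")
```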
+ +Hard safety ceiling: + +- the operator may define a non-negotiable max-loss ceiling per trade, per leg, + or per session +- this ceiling is distinct from the replay optimum and distinct from the + learner’s preferred floor/TP/hold posture +- if a candidate policy exceeds the ceiling, the ceiling wins even when the + replayed recursive capital curve would otherwise look better +- VIBRISS may tune inside the ceiling, but it must not optimize the ceiling + away, relax it implicitly, or treat operator pain tolerance as a soft signal + +### 10.3 MARAS-Conditioned Hold Bars + +Third target: + +- per-hash or per-regime hold-bar posture +- per-label bias around known hash medians +- OBF-conditioned hold extension or contraction + +Do not use MARAS labels as hard filters. Labels such as CHOPPY can contain both +many wins and severe losses. Use the composite hash, raw signature dimensions, +confidence, conflict, and nearest-neighbor regime evidence as context features. + +### 10.4 DVOL/VOL Gate and Trade-Pause Posture + +Candidate carefulness-critical target: + +- `entry_gate.dvol_threshold` +- `entry_gate.vol_open_persistence_bars` +- `entry_gate.min_qualified_cross_rate` +- `entry_gate.pick_latency_pause_s` +- `entry_gate.open_gate_no_pick_pause_score` + +This target exists because a VOL/DVOL gate can be technically open while the +engine still sees low-quality entry conditions: few accepted threshold crosses, +weak asset-pick evidence, or no fresh accepted pick after a normally sufficient +latency window. + +The first useful derived sensor is: + +```text +open_gate_no_pick_pause_score = + VOL/DVOL gate open + + low recent vel_div threshold-cross density + + no accepted entry for expected_pick_latency_s + + neutral/hostile EsoF/ExoF/MARAS context + + no evidence of stale scans or halted runtime +``` + +This must not be treated as an urgent kill switch by default. 
It is a +carefulness parameter: VIBRISS should first log it, correlate it with later +trade quality, and test whether it predicts profitable trade pauses or smaller +position sizing. The baseline is no pause beyond current gate logic. + +Related empirical TODOs: + +- Reconsider `min_irp_alignment=0.0` empirically. The live gold config disables + the IRP alignment filter, but the larger current corpus may now be sufficient + to retest whether a nonzero IRP alignment floor improves asset-pick quality. +- Examine whether the apparent `VOL open / no immediate pick` condition is a + useful trade-pause state or simply the expected effect of the stricter + effective signal-strength gate (`vel_div < about -0.03`). +- Initial live observation: recent quiet after the last known good picks appears + protective rather than broken. This must be tested with opportunity cost: + measure what the system avoided during quiet periods and what it missed by not + entering. +- Examine whether MARAS composite hashes need more granularity: more distinct + market-descriptive buckets while preserving the sortable scalar hash and + nearest-neighbor/similarity behavior. + +### 10.5 Capital-Protect / Profit-Lock + +Fourth target: + +- `capital.protect_arm_threshold_pct` +- `capital.protect_full_threshold_pct` +- `capital.protect_tp_min_multiplier` +- `capital.protect_cubic_coeff` +- `capital.protect_reset_drawdown_pct` +- `capital.protect_hysteresis_bars` +- reset family selector: `capital.protect_reset_mode` +- time-based reset controls: `capital.protect_reset_time_trades`, `capital.protect_reset_time_seconds` +- regime/hash reset controls: `capital.protect_reset_regime_whitelist`, `capital.protect_reset_fingerprint_whitelist` +- sc-EsoF reset controls: `capital.protect_reset_sc_floor`, `capital.protect_reset_sc_neutral_floor`, `capital.protect_reset_sc_positive_floor` + +This is the profit-protect / peak-lock family. 
The idea is not to mute risk +management, but to preserve capital once the day/session has already become +meaningfully profitable. The study must test whether a gain threshold such as +`1.2%`, `2.3%`, `3.3%`, ... should arm a more conservative TP posture for +subsequent trades, and whether a cubic trim on the TP multiplier is better than +an abrupt step change. + +Required policy questions: + +- what profit threshold should arm the protect state +- how quickly TP should tighten once the threshold is crossed +- whether the tighten curve should be cubic, stepped, or mixed +- when the protect state must reset +- how much drawdown from the protected peak is required to disarm +- how many bars/trades of hysteresis are needed before a reset is valid +- whether reset should be keyed to time, regime, known fingerprint, sc-EsoF, or mixed logic +- whether reset should use a whitelist gate or a change-detection gate for regime/fingerprint families + +The baseline reset rule should be conservative: + +- arm only after the gain threshold is crossed on the recursive capital curve +- keep the lock until a real drawdown-from-peak or day/session reset occurs +- do not reset on a single noisy bar if the protected peak is still intact + +This target must be evaluated against: + +- recursive capital-curve delta after opportunity cost +- clipped-winner cost from over-tightening +- saved-loss from avoiding giveback after the day is already up +- win-return statistics after the arm event +- ceiling-violation count, because the profit protect should never create an + implicit max-loss escape hatch + +It is especially important to compare: + +- flat threshold steps vs cubic tightening +- no hysteresis vs bar-count hysteresis +- immediate reset vs drawdown-based reset +- day-reset vs rolling-session reset + +The tape should be replayed on the same capital curve used by the live engine, +so the protect state is evaluated recursively, not from a fixed post-hoc label. 
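

A minimal sketch of the arm / tighten / reset cycle described above, driven bar by bar from the recursive capital curve. The formula is an assumption, not the spec's own:

- `cubic_coeff` blends a cubic ramp and a linear ramp between the arm and full thresholds.
- Tightening keys on the protected peak gain, so giveback does not relax TP.
- A drawdown reset rebases the session reference, so re-arming requires a fresh gain.

The class and field names are hypothetical, and the numeric defaults are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ProtectParams:
    # Hypothetical values; field names mirror the capital.protect_* keys.
    arm_threshold_pct: float = 0.012   # arm once session gain >= 1.2%
    full_threshold_pct: float = 0.033  # maximum tightening at >= 3.3%
    tp_min_multiplier: float = 0.60    # TP multiplier at maximum tightening
    cubic_coeff: float = 1.0           # 1.0 = pure cubic, 0.0 = linear ramp
    reset_drawdown_pct: float = 0.010  # drawdown from protected peak that disarms
    hysteresis_bars: int = 3           # consecutive bars the drawdown must persist


class ProfitProtect:
    """Profit-protect state fed one bar of recursive capital at a time."""

    def __init__(self, start_capital: float, params: ProtectParams) -> None:
        self.p = params
        self.start = start_capital  # session / reset reference capital
        self.peak = start_capital   # protected peak
        self.armed = False
        self.bars_in_drawdown = 0

    def tp_multiplier(self) -> float:
        """1.0 while disarmed; once armed, tightens with the peak gain."""
        if not self.armed:
            return 1.0
        peak_gain = self.peak / self.start - 1.0
        span = self.p.full_threshold_pct - self.p.arm_threshold_pct
        x = min(max((peak_gain - self.p.arm_threshold_pct) / span, 0.0), 1.0)
        c = self.p.cubic_coeff
        shape = c * x**3 + (1.0 - c) * x  # cubic/linear blend on [0, 1]
        return 1.0 - (1.0 - self.p.tp_min_multiplier) * shape

    def update(self, capital: float) -> float:
        """Feed one bar of capital; return the TP multiplier for new entries."""
        self.peak = max(self.peak, capital)
        if not self.armed:
            if capital / self.start - 1.0 >= self.p.arm_threshold_pct:
                self.armed = True
                self.bars_in_drawdown = 0
        else:
            drawdown = 1.0 - capital / self.peak
            if drawdown >= self.p.reset_drawdown_pct:
                self.bars_in_drawdown += 1
            else:
                self.bars_in_drawdown = 0  # one noisy bar does not disarm
            if self.bars_in_drawdown >= self.p.hysteresis_bars:
                # Assumed reset semantics: disarm and rebase, so re-arming
                # needs a fresh gain measured from the current capital.
                self.armed = False
                self.bars_in_drawdown = 0
                self.start = self.peak = capital
        return self.tp_multiplier()
```

With these defaults, the cubic ramp leaves TP at `0.95x` halfway between the arm and full thresholds. It reaches `0.60x` only at the full threshold. That is the behavioral difference from an abrupt step that the flat-vs-cubic comparison is meant to price.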
+ +### 10.6 OB Cascade TP-Modulation (added 2026-06-12, LINK 5e05eeeb post-mortem) + +Candidate carefulness-critical target — the parameters of the OB +tail-avoidance layer in `alpha_exit_manager.evaluate()` that silently +modulate the "fixed" TP: + +- `ob_cascade.count_threshold` — number of assets withdrawing liquidity + (depth withdrawal velocity < CASCADE_THRESHOLD) required to enter cascade + mode. **Currently hardcoded as `cascade_count > 0`, i.e. a SINGLE asset + anywhere in the tracked set widens every open trade's TP by x1.40.** The + LINK 5e05eeeb diagnosis (2026-06-11, -$1,248.71) showed this trigger is + active on a large fraction of trades because entries occur during panics + by construction. Domain candidates: {1, 2, 3, n_assets//4, n_assets//2}; + fallback_baseline: 1 (current behavior). +- `ob_cascade.tp_widen_factor` — currently hardcoded 1.40. Population + evidence (post-2026-05-11 cohort): widening earned ~+$84.7K on + continuation trades vs ~-$16.9K given back on reversals, so the factor is + net-positive but fat-left-tailed. Domain: [1.0 .. 1.6]; 1.0 = modulation + off. +- `ob_cascade.withdrawal_velocity_threshold` — `CASCADE_THRESHOLD` in + `ob_features.py`, currently -0.10 (10% depth pulled over lookback). + +Required sensors already exist since 2026-06-12: `dynamic_tp_pct`, +`tp_mod_factor`, `cascade_count`, `ob_regime_signal`, `tp_floor_armed` are +logged on every `dolphin.v7_decision_events` row, so reward attribution can +be computed offline from the live tape with no new instrumentation. 
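

A minimal sketch of how the three candidate parameters could combine into the logged `cascade_count` and `tp_mod_factor` sensors. The function names are hypothetical and do not mirror the code in `alpha_exit_manager.evaluate()` or `ob_features.py`.

- The defaults reproduce the current hardcoded behavior, because `cascade_count > 0` is the same test as `count_threshold = 1`.
- `effective_tp_pct` applies only the cascade term of the governed TP surface.
- `candidate_thresholds` builds the count domain listed above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CascadeParams:
    # Defaults reproduce the current hardcoded behavior.
    count_threshold: int = 1                      # cascade_count > 0 == (>= 1)
    tp_widen_factor: float = 1.40                 # 1.0 turns modulation off
    withdrawal_velocity_threshold: float = -0.10  # CASCADE_THRESHOLD


def cascade_count(withdrawal_velocity: Mapping[str, float], p: CascadeParams) -> int:
    """Number of tracked assets pulling depth faster than the threshold."""
    return sum(1 for v in withdrawal_velocity.values() if v < p.withdrawal_velocity_threshold)


def tp_mod_factor(withdrawal_velocity: Mapping[str, float], p: CascadeParams) -> float:
    """Multiplier applied to every open trade's TP while in cascade mode."""
    in_cascade = cascade_count(withdrawal_velocity, p) >= p.count_threshold
    return p.tp_widen_factor if in_cascade else 1.0


def effective_tp_pct(
    base_tp_pct: float, withdrawal_velocity: Mapping[str, float], p: CascadeParams
) -> float:
    """Base TP times cascade modulation only; other surface terms are omitted."""
    return base_tp_pct * tp_mod_factor(withdrawal_velocity, p)


def candidate_thresholds(n_assets: int) -> list[int]:
    """Domain {1, 2, 3, n_assets//4, n_assets//2}, deduplicated and >= 1."""
    raw = {1, 2, 3, n_assets // 4, n_assets // 2}
    return sorted(c for c in raw if c >= 1)
```

In a three-asset example where only one asset is withdrawing depth, `count_threshold=1` widens a `0.20%` TP to `0.28%`. Raising the threshold to `2` removes the widening. Reward for either choice would still have to be computed on the joint widen + TP_FLOOR policy.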
+ +INTERPLAY (REQUIRED reading for the paramset author): these parameters +interact with (a) the TP_FLOOR profit-floor ratchet (2026-06-12, +`DOLPHIN_TP_FLOOR`) which caps the left tail of the widening — reward must +be computed on the JOINT policy (widen + floor), not the widen alone; and +(b) §10.1 Conditional Fast TP / the future ADAPTIVE TP THRESHOLD ("Dynamic +TP"): the adaptive TP threshold itself is hereby marked FIT FOR VIBRISS +GOVERNANCE — the effective TP should ultimately be one governed surface +(base x leverage-curve x market-state x cascade modulation), with VIBRISS +owning the modulation terms and the champion base (0.20%) remaining frozen +outside governance. A VIOLET-era sub-second exit guard changes the +actuation latency of both TP and floor; cadence is therefore a context +feature, not a governed parameter, per the data-cadence operator rule. + +## 11. First Concrete ParamSet: ADVSL Hold Substitute + +### 11.1 Objective + +This is the first concrete VIBRISS use case. + +The parameter set replaces a static ADVSL no-arm / min-hold rule with a bounded, +evidence-scored hold target. The original research problem was the legacy +`20`-bar hold window: it protects winners from premature ADVSL exits, but it can +also let fast adverse trades slip through before the floor arms. Replay work +found that shorter centers, especially around `12` bars, can protect capital in +tail events, while longer holds can be correct in snapback/recovery pockets. + +The VIBRISS answer is not "always use 12" and not "always use 20." It is: + +- choose a hold target from a bounded set, +- condition the choice on current trade/path/regime sensors, +- score it by recursive capital-curve impact after opportunity cost, +- keep catastrophic loss floors outside the learner as non-negotiable safety. + +The sweep geometry itself is also a VIBRISS parameter. The ParamSet may carry a +global sweep window plus per-regime/per-hash sweep windows in `sweep_policy`. 
+When the derived best band touches the search window boundary, treat that as a +signal that the search is still censored by the current bounds, not as proof +that the optimum is "wide open." In that case, expand the admissible sweep +window and re-evaluate before promoting the range. + +### 11.2 ParamSet Identity + +```yaml +param_set: + id: advsl.hold_substitute.v1 + name: ADVSL Hold Substitute + status: shadow_first + namespace_default: blue + consumer: advanced_sl + decision_family: exit_risk_timing + replaces: + - legacy_advsl_min_hold_bars_20 + related_live_controls: + - advsl.base_catastrophic_floor_pct + - advsl.overlay_catastrophic_floor_pct + - advsl.overlay_max_loss_usd + - advsl.overlay_pressure_min + - advsl.overlay_mae_risk_min +``` + +This spec governs the hold/arming decision only. It may recommend when ADVSL +is allowed to arm, but it must not remove the catastrophic floor. + +### 11.3 ParamSet Config and Parameters + +Shared ParamSet config: + +```yaml +paramset_config: + consumer: advanced_sl + decision_family: exit_risk_timing + placement: + decision_point: trade_entry + live_replacement_rhythm: capture_on_entry + intratrade_change_policy: shadow_only + outputs: + hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice + decision_table: dolphin.vibriss_decisions + reward_table: dolphin.vibriss_rewards + +param_defaults: + learner: + type: discounted_ucb + contextual_shadow_branch: linucb + nonstationarity: sliding_window + window_trades: 300 + safety: + fallback_baseline: 12 + max_exploration_rate: 0.0 + min_shadow_samples: 200 + min_live_confidence: 0.80 + reward_mapping: + primary_metric: recursive_capital_curve_delta_after_opportunity_cost + bounded_range: [-1.0, 1.0] + guardrails: + stale_obf_policy: ignore_obf_features + low_maras_confidence_policy: shrink_to_global_prior + drawdown_alarm_policy: freeze_to_safe_baseline +``` + +Primary learned parameter: + +```yaml +params: + advsl.min_hold_bars_before_floor_arm: + type: integer + units: 
bars + baseline_reference: 20 + starting_center: 12 + current_live_overlay_reference: 6 + default: 12 + domain: + candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40] + hard_min: 0 + hard_max: 48 +``` + +Companion deterministic guardrails: + +```yaml +params: + advsl.max_loss_usd_floor: + type: float + units: usd + default_overlay: 500.0 + research_candidate: 400.0 + learner_controlled: false + + advsl.catastrophic_floor_pct: + type: float + units: pct + default_base: 0.0120 + default_overlay: 0.0050 + learner_controlled: false + + advsl.recovery_extension_max_bars: + type: integer + units: bars + default: 0 + domain: + candidates: [0, 4, 8, 12, 20, 34] + hard_min: 0 + hard_max: 40 + learner_controlled: shadow_only_until_validated + safety: + min_shadow_samples: 500 + min_live_confidence: 0.90 +``` + +Interpretation: + +- `baseline_reference=20` preserves the historical question. +- `starting_center=12` is the current replay-derived center. +- `current_live_overlay_reference=6` records the tightened overlay state and + must be reported separately from the legacy 20-bar research baseline. +- `34` and `40` remain candidates because contiguous-region medians observed + during replay included materially longer optima. + +### 11.4 Required Sensors + +The hold substitute must use point-in-time sensors only. End-of-trade labels may +be used for reward calculation, not for action selection. + +Core context sensors: + +| Sensor | Source | Use | +|---|---|---| +| `asset` | live trade state | Asset-level prior and OBF join key. | +| `side` | live trade state / EFSM | Separate SHORT base from EFSM-flipped LONG contexts. | +| `bars_held` | live trade state | Determines current arming progress. | +| `entry_price` / `current_price` | live trade state | Signed path and current PnL. | +| `post_gross_path_pct` | trade path replay/live path state | Measures post-entry excursion shape. | +| `mae_pct` | live path state | Adverse excursion severity. 
 |
| `mfe_pct` | live path state | Favorable excursion and recovery potential. |
| `mfe_decay` | derived from MFE/current PnL | Detects giveback and weakening recovery. |
| `current_pnl_mfe_frac` | derived from current PnL / MFE | Indicates whether recovery is intact or mostly lost. |
| `v7_exit_pressure` | `v7_decision_events` / live V7 snapshot | Pressure/continuation signal for cases where recovery is unlikely. |
| `v7_mae_risk` | V7 snapshot | Separates ordinary drawdown from risk-tier drawdown. |
| `v7_action` | V7 snapshot | EXIT/RETRACT/EXTEND/HOLD context. |
| `state_confidence` | market-state / MARAS / bundle confidence | Low confidence forces conservative fallback. |

OBF sensors:

| Sensor | Source | Use |
|---|---|---|
| `obf_depth_1pct_usd` | `obf_universe_latest` / OBF CH | Recovery-capacity and liquidity depth. |
| `obf_depth_quality` | OBF derived quality | Distinguishes deep snapback pockets from weak-book grinds. |
| `obf_spread_bps` | OBF | Penalizes bad microstructure. |
| `obf_imbalance` | OBF | Directional liquidity pressure. |
| `obf_imbalance_ma5` / `obf_imbalance_ma10` | OBF derived path | Smooths raw book pressure for in-trade TP/SL context. |
| `obf_imbalance_slope` | OBF derived path | Detects whether pressure is strengthening or fading. |
| `obf_imbalance_persistence` | OBF derived path | Measures sign stability rather than one-tick noise. |
| `obf_imbalance_reaccel` | OBF derived path | Detects renewed pressure after a mid-trade weakening/plateau. |
| `obf_staleness_s` | OBF timestamp | Guardrail; stale OBF cannot steer hold. |

Regime sensors:

| Sensor | Source | Use |
|---|---|---|
| `maras_regime` | `maras_latest` / `maras_fingerprint` | Label-level bias only, never hard filter. |
| `maras_composite_hash` | MARAS Scope B | Exact historical hash prior when sample size is enough. |
| `maras_scalar_hash` | MARAS Scope A | Coarse sortable regime prior.
 |
| `maras_confidence` | MARAS | Low confidence reduces live trust. |
| `maras_conflict_level` | MARAS | High conflict increases uncertainty/exploration penalty. |
| `s_eigen_vd`, `s_eigen_w50`, `s_eigen_w750` | MARAS raw signature | Eigen-state context. |
| `s_btc_dev_pct`, `raw_btc_ma99` | MARAS BTC tier | Trend/uptrend/downtrend pressure context. |
| `s_acb_boost`, `s_acb_beta` | MARAS/ACB | Protective/risk-on context. |

Outcome-only reward sensors:

| Sensor | Source | Use |
|---|---|---|
| `actual_exit_pnl` | `trade_events` | Realized baseline outcome. |
| `counterfactual_exit_pnl_by_hold` | tape replay | Arm-level reward. |
| `recovery_lag_s` | tape replay | Time to recover after floor/cut. |
| `extra_bars_to_recovery` | tape replay | Cost of too-short hold. |
| `clipped_winner_delta` | tape replay | Opportunity cost of premature exit. |
| `saved_loss_delta` | tape replay | Loss avoided by earlier floor arm. |
| `capital_curve_delta` | recursive replay | Primary reward accounting. |

### 11.5 Feature Construction

VIBRISS should compute a compact feature vector from the sensors:

```text
path_speed = abs(post_gross_path_pct) / max(1, bars_held)
mae_velocity = mae_pct / max(1, bars_held)
mfe_velocity = mfe_pct / max(1, bars_held)
recovery_ratio = current_pnl_mfe_frac
giveback_ratio = 1.0 - current_pnl_mfe_frac
liquidity_score = f(obf_depth_1pct_usd, obf_depth_quality, obf_spread_bps)
signed_obf_imbalance = side_sign * obf_imbalance
imbalance_confirmation = f(signed_obf_imbalance_ma5, persistence, slope)
imbalance_reacceleration = f(prior_weakening, current_signed_slope, persistence)
pressure_score = f(v7_exit_pressure, v7_mae_risk, v7_action)
regime_key = maras_composite_hash if sample_count(hash) >= min_hash_n else maras_regime
confidence_weight = min(state_confidence, maras_confidence) * (1.0 - maras_conflict_level)
```

Feature requirements:

- All features must be point-in-time.
+- Missing OBF must not become zero-depth unless zero-depth is the actual + observation. Missing OBF is its own mask feature. +- MARAS labels are context, not filters. Use hash/sample priors and raw + signature dimensions where possible. +- Side must be explicit. EFSM-flipped LONG trades cannot share a blind SHORT + prior. +- OBF imbalance must be side-normalized. For a SHORT, negative raw imbalance is + confirming; for a LONG, positive raw imbalance is confirming. +- Raw imbalance is not enough. Use moving averages, persistence, slope, and + re-acceleration after weakening so a single noisy tick cannot steer ADVSL. + +### 11.5.1 OBF Imbalance Assistance Research + +Live ENJUSDT observation on `2026-06-04` motivates an explicit research feature +family for ADVSL/TP assistance. The trade entered SHORT near `10:06:14 UTC` and +closed `FIXED_TP` near `10:10:11 UTC` for `+$118.53`. + +Observed OBF path: + +- entry imbalance was near neutral (`~ -0.015` to `+0.001`); +- within seconds it snapped SHORT-confirming (`~ -0.18` to `-0.21`); +- mid-trade it weakened and oscillated around neutral in 30s buckets; +- into TP it re-strengthened materially (`~ -0.30` to `-0.35`). + +Conclusion: + +- Imbalance did not monotonically increase from entry to exit. +- It behaved as a confirmation/re-acceleration signal: neutral -> confirming + pressure -> weakening/plateau -> renewed confirming pressure into TP. +- Therefore VIBRISS should not use raw imbalance as a simple exit trigger. + +Candidate uses: + +| Use | Candidate rule | +|---|---| +| TP assist | If price is near TP and side-normalized imbalance re-accelerates in favor, avoid premature ADVSL/retract exits. | +| SL/ADVSL assist | If adverse PnL appears and side-normalized imbalance persistently contradicts the trade, recovery probability should shrink. | +| Hold assist | If imbalance is neutral/choppy but not contradictory, do not force an exit from imbalance alone. 
| +| Floor timing | Combine `price_progress_to_tp * imbalance_confirmation` with MAE/MFE path shape to decide whether the floor should wait or arm. | + +Candidate feature names: + +```text +imbalance_signed_for_trade +imbalance_ma5_signed +imbalance_ma10_signed +imbalance_slope_signed +imbalance_persistence_signed +imbalance_reacceleration_after_weakening +price_progress_to_tp_x_imbalance_confirmation +adverse_pnl_x_imbalance_contradiction +``` + +Research requirement: replay this across completed trades before live use. Score +it by recursive capital delta after opportunity cost, not by whether it explains +one ENJ winner. + +### 11.5.2 Macro-Thesis Persistence vs Local Danger Research + +Live XLMUSDT observation on `2026-06-04` motivates a mandatory ADVSL/VIBRISS +research direction. The trade suffered a large adverse excursion before closing +at `FIXED_TP`. Local OBF imbalance and V7 pressure were frightening during the +worst MAE; they did not cleanly foresee the recovery. The higher-level +eigen/MARAS context, however, stayed coherent with the trade thesis: bearish or +choppy-bearish posture, low conflict, active dislocation, and bearish BTC +context. + +Actionable lesson to test to exhaustion: + +```text +ADVSL/V7 local danger should be overruled only when macro thesis persistence +remains strong, MARAS conflict/novelty remains low, and OBF contradiction is not +persistent/deep enough to invalidate the thesis. +``` + +This is not a live rule yet. It is a research requirement for the first +VIBRISS-governed ADVSL/bar-hold policy. The learner must explicitly measure +when local pain is a true invalidation signal versus when it is survivable +excursion inside a still-valid macro/eigen thesis. + +The required research output is a weighting model, not a binary exception. The +policy must estimate how much authority belongs to local danger signals versus +macro-thesis persistence under the current context. 
Those weights are themselves +VIBRISS-tunable parameters and must be represented in the ParamSet spec with +safe defaults, bounded candidate ranges, promotion rules, and audit logging. + +Candidate feature names: + +```text +macro_thesis_persistence +maras_conflict_low_during_mae +maras_hash_knownness_during_mae +eigen_dislocation_persistence_during_mae +btc_context_alignment_during_mae +local_obf_contradiction_persistence +local_obf_contradiction_depth_weighted +v7_pressure_without_macro_invalidation +adverse_move_vs_macro_persistence +late_recovery_obf_reacceleration +``` + +Candidate tunable parameters: + +```text +local_danger_weight +macro_thesis_weight +obf_contradiction_weight +maras_conflict_weight +eigen_persistence_weight +btc_context_weight +v7_pressure_weight +macro_override_min_confidence +local_invalidation_min_persistence_bars +``` + +The initial decision form should be simple and auditable: + +```text +local_danger_score = + local_danger_weight * v7_pressure + + obf_contradiction_weight * local_obf_contradiction_persistence + + maras_conflict_weight * maras_conflict_or_novelty + +macro_thesis_score = + macro_thesis_weight * macro_thesis_persistence + + eigen_persistence_weight * eigen_dislocation_persistence_during_mae + + btc_context_weight * btc_context_alignment_during_mae + +hold_or_cut_bias = macro_thesis_score - local_danger_score +``` + +VIBRISS may tune the weights, but guardrails must prevent pathological behavior: +local danger cannot be ignored at extreme MAE, and macro thesis cannot override +persistent high-depth OBF contradiction plus MARAS conflict/novelty. 
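

A minimal sketch of the decision form above, with the two guardrails applied as clamps. Assumptions:

- Every score input is pre-normalized to `[0, 1]`, and `mae_pct` is in percent.
- The guardrail thresholds and the `obf_contradiction_bars` count are hypothetical placeholders.
- A guardrail can only pull the bias down to `0`, meaning no macro-driven hold. It does not invent a cut signal of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThesisWeights:
    # Placeholder values; VIBRISS would tune these inside bounded ranges.
    local_danger_weight: float = 1.0
    obf_contradiction_weight: float = 1.0
    maras_conflict_weight: float = 1.0
    macro_thesis_weight: float = 1.0
    eigen_persistence_weight: float = 1.0
    btc_context_weight: float = 1.0


@dataclass(frozen=True)
class ThesisGuardrails:
    # Hypothetical thresholds, not spec values.
    extreme_mae_pct: float = 1.0
    high_obf_contradiction_depth: float = 0.7
    local_invalidation_min_persistence_bars: int = 3
    high_maras_conflict: float = 0.6


@dataclass(frozen=True)
class ThesisFeatures:
    v7_pressure: float
    local_obf_contradiction_persistence: float
    local_obf_contradiction_depth_weighted: float
    obf_contradiction_bars: int  # hypothetical helper count
    maras_conflict_or_novelty: float
    macro_thesis_persistence: float
    eigen_dislocation_persistence_during_mae: float
    btc_context_alignment_during_mae: float
    mae_pct: float  # adverse excursion, percent


def hold_or_cut_bias(f: ThesisFeatures, w: ThesisWeights, g: ThesisGuardrails) -> float:
    """Positive leans hold, negative leans cut."""
    local_danger_score = (
        w.local_danger_weight * f.v7_pressure
        + w.obf_contradiction_weight * f.local_obf_contradiction_persistence
        + w.maras_conflict_weight * f.maras_conflict_or_novelty
    )
    macro_thesis_score = (
        w.macro_thesis_weight * f.macro_thesis_persistence
        + w.eigen_persistence_weight * f.eigen_dislocation_persistence_during_mae
        + w.btc_context_weight * f.btc_context_alignment_during_mae
    )
    bias = macro_thesis_score - local_danger_score

    # Guardrail 1: at extreme MAE, local danger cannot be outvoted.
    if f.mae_pct >= g.extreme_mae_pct:
        return min(bias, 0.0)
    # Guardrail 2: persistent deep OBF contradiction plus MARAS
    # conflict/novelty blocks any macro override.
    if (
        f.local_obf_contradiction_depth_weighted >= g.high_obf_contradiction_depth
        and f.obf_contradiction_bars >= g.local_invalidation_min_persistence_bars
        and f.maras_conflict_or_novelty >= g.high_maras_conflict
    ):
        return min(bias, 0.0)
    return bias
```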
+ +Required tests: + +- replay all completed trades with this feature family available point-in-time; +- isolate high-MAE trades that later TP'd from high-MAE trades that continued + into real loss; +- charge every delayed cut for worst-case tail loss and every early cut for + missed recovery/opportunity cost; +- evaluate separately for base SHORTs and EFSM/overlay-flipped LONGs; +- report per-MARAS-hash, per-label, and nearest-neighbor raw-signature results; +- report learned/suggested weights and their stability by contiguous region, + MARAS hash, side, and asset-liquidity bucket; +- promote only if held-out contiguous regions improve recursive capital delta + without hiding clipped winners or worse tail events. + +### 11.5.3 Macro/OBF Evidence Hierarchy Research + +Live DASHUSDT observations on `2026-06-04` add a third case study to the XLM +and ETC findings. DASH produced two fast SHORT `FIXED_TP` trades, including +`efcc6dce`, which entered near `11:00:15 UTC` and closed near `11:00:38 UTC` +after only `2` bars for `+$367.92`. + +The large DASH trade was not a scary hold-through-MAE case: + +- V7 recorded `mae = 0` for the trade path; +- entry `vel_div` was extreme (`~ -0.2463`); +- MARAS at entry was `BEARISH`, low conflict, composite hash `58981`; +- BTC context remained bearish (`s_btc_above_ma99 = 0`); +- OBF imbalance initially leaned against the SHORT, then flipped materially + SHORT-confirming during the price break. + +This suggests an evidence hierarchy that must be tested explicitly: + +```text +macro/eigen OK + OBF confirms + > macro/eigen OK + OBF neutral/choppy + > macro/eigen OK + OBF counters transiently but then flips confirming + > macro/eigen OK + OBF persistently counters with depth + > macro/eigen weak/conflicted regardless of OBF +``` + +The hierarchy is not a live rule. DASH shows that a very strong macro/eigen +impulse can overcome early OBF contradiction when the contradiction is shallow +or transient. 
ETC shows the stronger case, where OBF remained SHORT-confirming +through adverse price movement. XLM shows the weaker/riskier case, where macro +thesis persistence carried the trade while OBF was ugly at the worst point. + +Candidate features: + +```text +macro_obf_alignment_class +macro_extreme_impulse_score +obf_counter_transience_bars +obf_counter_depth_weighted +obf_flip_to_confirmation_latency_s +obf_confirmation_after_macro_impulse +macro_ok_obf_confirm_weight +macro_ok_obf_counter_weight +macro_extreme_overrides_obf_counter_weight +``` + +Required tests: + +- rank outcomes by `macro_obf_alignment_class`; +- compare `macro OK + OBF confirm` against `macro OK + OBF counter`; +- split OBF counter cases into transient, shallow, persistent, and + depth-weighted contradiction; +- measure whether OBF flip-to-confirmation latency predicts TP speed; +- report whether extreme `vel_div` can safely receive more weight than early + OBF contradiction, and where that becomes unsafe; +- expose the learned hierarchy weights as VIBRISS-tunable parameters, not + hardcoded doctrine. + +### 11.5.4 Falling-Knife / Missing-Bounce-Sensor Case Study + +Live LTCUSDT observation on `2026-06-04` (`c0139cea`) adds an open/pending case +study for the opposite side of the DASH impulse capture. The trade entered SHORT +near `11:15:12 UTC` with extreme entry `vel_div` (`~ -0.1942`) and high notional, +but subsequently showed severe adverse excursion and no meaningful favorable +excursion at the time of review. V7 also emitted repeated `RETRACT` +recommendations, but V7 pressure is not treated as truth by itself; XLM showed +that V7 can scream during a trade that later recovers profitably. 
+ +Observed at review time: + +- `inverse_ars_bounce_shadow` was stale; latest row was `2026-06-03 18:42:26 + UTC`, so the bounce detector was not assisting live; +- V7 repeatedly emitted `RETRACT / V7_RISK_DOMINANT`, which is local-pain + evidence only; +- V7 observed `mae ~ 0.854%`, `mfe = 0`, and `exit_pressure = 3`; +- OBF was mostly neutral/choppy with weak, oscillating side-normalized evidence, + not a strong rescue signal; +- MARAS/BTC remained broadly bearish/low-conflict, but recent eigen values were + intermittent rather than steadily thesis-confirming. + +Research meaning: + +```text +macro/eigen entry impulse alone is insufficient when local danger is extreme, +MFE remains zero, OBF does not confirm, and the bounce/inverse-risk sensor is +missing or stale. +``` + +V7 pressure must be weighted conditionally: + +```text +V7 pressure is discounted when macro thesis remains strong, OBF confirms, and +MFE exists. + +V7 pressure receives more weight only when independent local invalidation +features agree: zero MFE, rising MAE, neutral/counter OBF, stale/missing bounce +sensor, macro impulse decay, or MARAS conflict/novelty. +``` + +Candidate features: + +```text +bounce_sensor_freshness_s +bounce_sensor_missing_mask +extreme_macro_without_mfe +v7_retract_persistence_bars +zero_mfe_high_mae_flag +obf_neutral_or_counter_during_mae +macro_impulse_decay_after_entry +``` + +Required replay treatment: + +- stale/missing bounce data must be an explicit mask feature, not an assumed + neutral score; +- compare extreme-entry trades that get early MFE against extreme-entry trades + with zero MFE and rising MAE; +- treat persistent V7 `RETRACT` as a local-danger amplifier only when confirmed + by independent invalidation sensors such as stale bounce, zero MFE, rising + MAE, neutral/counter OBF, or macro impulse decay; +- only promote a macro override if it survives this LTC-style case family after + opportunity-cost and tail-loss accounting. 
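

A minimal sketch of the conditional V7 weighting and the stale-bounce mask. The following are all hypothetical:

- the discount and amplify multipliers;
- the `min_agreeing` vote count;
- the `max_age_s` freshness limit;
- all type and function names.

Inputs are assumed to be point-in-time. Missing or stale bounce data sets an explicit mask and counts as one invalidation vote. It is never replaced by a neutral score.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LocalEvidence:
    mfe_pct: float
    mae_rising: bool
    obf_signed_confirmation: float        # side-normalized; > 0 confirms the trade
    bounce_sensor_age_s: Optional[float]  # None when no bounce row exists
    macro_thesis_strong: bool
    macro_impulse_decayed: bool
    maras_conflict_or_novelty: bool


def bounce_sensor_features(
    age_s: Optional[float], max_age_s: float = 120.0
) -> Tuple[float, bool]:
    """(bounce_sensor_freshness_s, bounce_sensor_missing_mask); no neutral fill."""
    freshness = math.inf if age_s is None else age_s
    return freshness, freshness > max_age_s


def v7_pressure_weight(
    e: LocalEvidence, discount: float = 0.5, amplify: float = 1.5, min_agreeing: int = 2
) -> float:
    """Multiplier on V7 exit pressure; 1.0 takes V7 at face value."""
    _, bounce_missing = bounce_sensor_features(e.bounce_sensor_age_s)
    # Macro thesis strong, OBF confirms, and MFE exists: discount V7.
    if e.macro_thesis_strong and e.obf_signed_confirmation > 0 and e.mfe_pct > 0:
        return discount
    # Otherwise count independent local-invalidation votes.
    votes = sum([
        e.mfe_pct <= 0,
        e.mae_rising,
        e.obf_signed_confirmation <= 0,
        bounce_missing,
        e.macro_impulse_decayed,
        e.maras_conflict_or_novelty,
    ])
    return amplify if votes >= min_agreeing else 1.0
```

An LTC-style path (zero MFE, rising MAE, neutral OBF, a bounce row hours old) collects four agreeing votes. It therefore gets the amplified weight even though the macro thesis is still flagged as strong.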
+ +### 11.6 Learning / Computing Model + +V1 should use a two-layer policy: + +1. Prior/posture estimator: + - computes candidate priors from historical replay by MARAS composite hash, + MARAS label, asset, side, and contiguous time region. + - uses shrinkage: hash prior -> label prior -> global prior. + - initializes the hold target near `12` bars unless the context prior has + enough evidence to move it. + +2. Online contextual bandit: + - learner: discounted LinUCB or LinTS over finite hold-bar arms. + - arms: `[4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]`. + - reward: delayed until trade close or replay terminal. + - discount/window: sliding 300 closed trades, plus faster decay when drift is + detected. + - exploration: shadow-only by default; live exploration cap starts at `0`. + +Recommended fallback if contextual coverage is sparse: + +```text +if hash_sample_n >= 30: + prior = median_best_hold_for_hash +elif label_side_sample_n >= 100: + prior = median_best_hold_for_label_side + label_bias +else: + prior = 12 + +advice = guardrail_filter(contextual_bandit(prior, candidates)) +``` + +Optional recovery model: + +- Train a survival model for `extra_bars_to_recovery`. +- Use it only as a veto/adjuster until validated. +- It may increase hold only when recovery probability is high and expected + extra hold is short. + +### 11.7 Success Definition + +Primary success metric: + +```text +recursive_capital_curve_delta_after_opportunity_cost +``` + +This means the replay must account for saved capital compounding forward, and +must subtract the opportunity cost of trades that would have recovered or won +after a premature floor/ADVSL action. 
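

A minimal sketch of the primary metric. It assumes each replayed trade can be summarized by three numbers:

- a baseline return on notional;
- a counterfactual return on notional under the candidate policy;
- a constant exposure (notional / capital at entry).

Names are hypothetical. Opportunity cost enters automatically: a premature cut that misses a later recovery lowers the policy return, and that shortfall compounds forward exactly as a saved loss does.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class TradeOutcome:
    baseline_return: float  # realized return on notional under the baseline policy
    policy_return: float    # counterfactual return on notional under the candidate
    exposure: float         # notional / capital at entry (assumed constant)


def recursive_capital_replay(
    start_capital: float, trades: Sequence[TradeOutcome]
) -> Dict[str, float]:
    """Replay both capital paths; each trade is sized off its own path's capital."""
    base = policy = start_capital
    gross_saved = gross_missed = 0.0
    for t in trades:
        # Per-trade difference, measured on the candidate's capital path.
        diff = policy * t.exposure * (t.policy_return - t.baseline_return)
        if diff >= 0:
            gross_saved += diff    # loss avoided or extra capture
        else:
            gross_missed += -diff  # clipped winner / missed recovery
        base += base * t.exposure * t.baseline_return
        policy += policy * t.exposure * t.policy_return
    return {
        "baseline_end": base,
        "policy_end": policy,
        "capital_curve_delta": policy - base,
        "gross_saved": gross_saved,
        "gross_missed": gross_missed,
    }
```

Example: two trades, where the candidate cuts a `-2%` loser at `-0.5%` and also cuts a `+1%` winner at `-0.5%`. The per-trade buckets net to `+$7.50` (`$1,500` saved, `$1,492.50` missed). The recursive capital delta, however, is `+$22.50`. The gap is the compounding difference between the two capital paths, which is why the gross buckets are secondary metrics and the recursive delta is primary.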
+ +Secondary metrics: + +- net PnL delta +- ROI delta +- max drawdown delta +- tail-loss count and severity +- number of hard/floor cuts +- number of clipped winners +- gross saved loss +- gross missed upside +- average and median recovery lag +- average and median extra bars to recovery +- TP near-miss count, TP near-miss recovery lag, and first-touch TP hit rate +- per-hash and per-label stability +- OOD region performance +- worst contiguous-region degradation +- explicit ceiling-violation count and worst single-loss size under the tested + policy, because a "best" replay result is not acceptable if it breaches the + operator's declared loss ceiling + +Promotion requires: + +- positive recursive capital-curve delta on held-out contiguous regions, +- no unacceptable increase in clipped-winner opportunity cost, +- no hidden dependence on a single asset or single MARAS hash, +- improvement or neutral behavior on EFSM-flipped LONG subset, +- deterministic replay reproducibility, +- shadow logging coverage sufficient for OPE. + +### 11.8 Calibration Protocol + +Calibration must run in this order: + +1. Full-tape replay: + - evaluate every candidate hold arm on every eligible historical trade path. + - include all available BLUE/PINK/PRODGREEN executed trade history only when + namespace semantics are kept separate. + +2. Capital-aware replay: + - recursively recompute capital after each counterfactual exit. + - preserve position sizing geometry when the saved/lost capital changes the + subsequent notional. + +3. Opportunity-cost audit: + - for every floor/ADVSL cut, measure whether the trade later recovered. + - record recovery lag, extra bars, and missed PnL. + +4. Region validation: + - split into contiguous time regions with enough trades. + - repeat with moving/randomized boundaries. + - report median/best hold per region. + +5. MARAS proximity validation: + - group by composite hash when sample size is enough. 

   - otherwise use nearest-neighbor distance over MARAS raw signature fields.
   - report whether per-hash/per-neighbor priors outperform global 12-bar center.

6. OBF validation:
   - bind optimum hold to `obf_depth_1pct_usd`, `obf_depth_quality`, spread, and
     imbalance.
   - test on OOD time slices; do not promote an OBF rule from in-sample fit only.

7. TP near-miss validation:
   - include trades that nearly touched candidate TP but missed on the observed
     cadence.
   - compute first-touch labels from the highest-resolution available path.
   - isolate the opportunity cost of late reversal after near-touch.
   - compare the resulting TP bucket against the profitable-close-only sample.

8. Walk-forward:
   - train on region N, validate on N+1.
   - repeat across the full history.
   - freeze the learner if the current best policy degrades versus baseline.

### 11.9 Advice Payload

Example advice:

```json
{
  "schema": "vibriss.param_set_advice.v1",
  "namespace": "blue",
  "param_set_id": "advsl.hold_substitute.v1",
  "spec_version": "1.0.0",
  "trade_scope": "on_entry",
  "baseline_reference": 20,
  "current_live_overlay_reference": 6,
  "recommended": {
    "advsl.min_hold_bars_before_floor_arm": 12,
    "advsl.recovery_extension_max_bars": 0
  },
  "candidate_set": [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40],
  "confidence": 0.74,
  "context": {
    "asset": "XLMUSDT",
    "side": "LONG",
    "maras_composite_hash": 57957,
    "maras_regime": "CHOPPY_BEARISH",
    "obf_depth_quality_bucket": "weak",
    "v7_pressure_bucket": "high"
  },
  "guardrail_status": "SHADOW_ONLY",
  "fallback_value": 12,
  "expires_at": "2026-06-03T00:05:00Z"
}
```

### 11.10 Guardrails

Mandatory guardrails:

- Shadow-only until walk-forward validation is positive.
- No live exploration by default.
- Do not allow the learner to disable catastrophic floors.
- If OBF is stale, ignore OBF-derived hold extension.
+- If MARAS confidence is low or conflict is high, shrink toward global prior. +- If context is EFSM-flipped LONG and LONG sample count is sparse, use the + tighter safe prior, not a broad SHORT-derived prior. +- If the recommended hold would increase worst-case open loss beyond the active + floor/cap, the floor/cap wins. +- If capital drawdown alarm is active, freeze to deterministic safe baseline. + +### 11.11 Starting Priors From Current Research + +Current replay-derived starting posture: + +| Context | Starting prior | Rationale | +|---|---:|---| +| Global ADVSL hold substitute | `12` bars | Best current center for reducing 20-bar tail slips without assuming all contexts need long waits. | +| Legacy baseline comparison | `20` bars | Historical no-arm/min-hold reference. | +| Tight overlay reference | `6` bars | Current live overlay guardrail reference, not the general learned policy. | +| Recovery/snapback pockets | `24` to `40` bars | Some contiguous-region medians were materially longer; keep as candidates, not defaults. | +| Sparse/unknown context | `12` bars | Conservative research center with shrinkage. | +| EFSM-flipped LONG sparse context | `6` to `12` bars | Do not borrow broad SHORT recovery priors blindly. | + +Known caution: + +- A `$400` hard cap improved one capital-aware slice by about `+$592.83` versus + the 12-bar-only replay, but generated a gross forgone-upside bucket around + `+$6,617.30` on hard-cap hits. Therefore max-loss floors must be evaluated + with opportunity cost and recovery lag, not judged by saved-loss totals alone. + +### 11.12 Promotion Policy + +Promotion is part of this ParamSet, not a global runner decision. 
+ +```yaml +promotion_policy: + owner: advsl.hold_substitute.v1 + technique: replay_shadow_canary + baseline_policy: + legacy_reference: 20 + current_overlay_reference: 6 + fallback_value: 12 + cadence: + replay_calibration: every_6h_or_50_new_rewards + promotion_review: every_30m + checkpoint_review: every_60s + live_replacement_rhythm: at_trade_entry_only + evidence_gates: + shadow_to_advisory: + min_replay_trades: 300 + min_contiguous_regions: 4 + recursive_capital_curve_delta_after_cost: "> 0" + worst_region_delta: ">= -0.10 * positive_total_delta" + clipped_winner_cost_budget: "documented_and_bounded" + advisory_to_canary_live: + min_shadow_decisions: 200 + min_closed_trade_rewards: 50 + min_days_observed: 3 + no_unexplained_tail_loss_cluster: true + manual_approval_required: true + canary_live_to_controlled_live: + min_live_consumed_trades: 50 + live_vs_shadow_regret: "<= 0" + no_guardrail_violation: true + manual_approval_required: true + canary_scope: + namespaces: [blue] + max_paramsets_live: 1 + max_live_exploration_rate: 0.0 + allow_only_capture_on_entry: true + automatic_demotion: + - stale_obf_or_maras_required_context + - reward_backlog_critical + - drawdown_alarm + - candidate_underperforms_baseline_in_shadow + - checkpoint_hash_mismatch +``` + +Interpretation: + +- `replay_calibration` answers how often the ParamSet re-estimates candidate + quality from historical/newly closed data. +- `promotion_review` answers how often the ParamSet is checked for stronger + mode eligibility. +- `live_replacement_rhythm` answers when the engine may replace the old + parameter with the VIBRISS value. For this ParamSet it is only at trade entry. +- The runner executes this contract. It does not invent promotion thresholds. + +### 11.13 Meta-Cadence Policy + +The cadence parameters are themselves governed by this ParamSet. They are not +free-floating daemon settings. 
+ +```yaml +meta_cadence_policy: + owner: advsl.hold_substitute.v1 + status: shadow_first + learner: discounted_ucb_then_linucb + tunable_cadences: + replay_calibration_interval_s: + baseline: 21600 + candidates: [1800, 3600, 10800, 21600, 43200] + promotion_review_interval_s: + baseline: 1800 + candidates: [900, 1800, 3600, 7200] + checkpoint_interval_s: + baseline: 60 + candidates: [30, 60, 120, 300] + min_new_rewards_before_recalibration: + baseline: 50 + candidates: [10, 25, 50, 100] + shadow_to_canary_cooldown_trades: + baseline: 100 + candidates: [25, 50, 100, 200] + context_inputs: + maras: + - maras_composite_hash + - maras_confidence + - maras_conflict_level + - maras_nearest_distance + exof: + - exf_latest + - btc_regime_features + - market_volatility_context + esof: + - session_bucket + - day_of_week + - calendar_event_flags + ops: + - reward_backlog_age_s + - ch_write_failure_rate + - artifact_disk_free_gb + - drawdown_state + reward_mapping: + positive: + - faster_detection_of_degraded_hold_policy + - lower_stale_advice_rate + - lower_missed_adaptation_cost + negative: + - promotion_false_positive + - noisy_recalibration_churn + - excessive_compute_or_backlog + - operator_churn + live_change_policy: + replay_calibration_interval_s: controlled_after_shadow + promotion_review_interval_s: advisory_only_until_manual_approval + checkpoint_interval_s: fixed_by_ops_until_runner_load_tested + shadow_to_canary_cooldown_trades: advisory_only +``` + +This makes MARAS, ExoF, and EsoF eligible context for cadence advice. For +example, VIBRISS may learn that high MARAS novelty plus hostile ExoF context +requires faster recalibration review, while ordinary stable regimes can use a +slower cadence to avoid overreacting. 
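+
+The `discounted_ucb_then_linucb` learner named above could start as a plain
+discounted UCB (D-UCB) over each cadence's candidate list. Discounting lets
+stale evidence fade, which suits cadences whose best value drifts with
+MARAS/ExoF/EsoF context. The sketch below is hypothetical: the class name,
+`gamma`, `c`, and rewards scaled to `[0, 1]` are assumptions, not part of this
+spec. The later contextual (LinUCB) stage is not shown.
+
+```python
+import math
+from dataclasses import dataclass, field
+
+
+@dataclass
+class DiscountedUCB:
+    """Hypothetical D-UCB over one cadence's candidate values."""
+
+    candidates: list[int]
+    gamma: float = 0.95  # per-step decay applied to all past evidence
+    c: float = 0.5       # exploration strength (illustrative value)
+    counts: list[float] = field(init=False)
+    sums: list[float] = field(init=False)
+
+    def __post_init__(self) -> None:
+        self.counts = [0.0] * len(self.candidates)
+        self.sums = [0.0] * len(self.candidates)
+
+    def select(self) -> int:
+        """Return the index of the candidate to try next."""
+        # Every candidate is tried once before confidence bounds are used.
+        for i, n in enumerate(self.counts):
+            if n == 0.0:
+                return i
+        total = sum(self.counts)  # >= 1 because the last update added 1.0
+
+        def upper_bound(i: int) -> float:
+            mean = self.sums[i] / self.counts[i]
+            return mean + self.c * math.sqrt(math.log(total) / self.counts[i])
+
+        return max(range(len(self.candidates)), key=upper_bound)
+
+    def update(self, arm: int, reward: float) -> None:
+        """Decay all evidence, then credit the reward to the chosen arm."""
+        self.counts = [self.gamma * n for n in self.counts]
+        self.sums = [self.gamma * s for s in self.sums]
+        self.counts[arm] += 1.0
+        self.sums[arm] += reward
+```
+
+In shadow mode the runner would log `bandit.candidates[bandit.select()]` as
+the would-have-chosen `replay_calibration_interval_s`. It would call `update`
+only after that choice's delayed reward has been scored against the fixed
+cadence baseline.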
+ +Cadence testing is permitted, but first in shadow: + +- log what cadence would have been chosen; +- replay whether that cadence would have detected degradation sooner; +- charge compute/backlog cost; +- charge false-promotion cost; +- compare against fixed-cadence baseline. + +Only after the meta-cadence policy beats fixed cadence in walk-forward replay +and shadow operation may it control any real scheduler interval. + +### 11.14 Catastrophic Floor Derivation Study + +The floor percentage is now a dedicated shadow-only VIBRISS research target. + +```yaml +param_set: + id: advsl.catastrophic_floor_derivation.v1 + name: ADVSL Catastrophic Floor Derivation + status: shadow_first + success: + primary_metric: recursive_capital_curve_delta_after_opportunity_cost + artifact_kinds: [code, test, spec] + artifact_refs: + - prod/vibriss/floor_derivation.py + - prod/vibriss/test_floor_derivation.py + - prod/docs/ADVSL_CATASTROPHIC_FLOOR_DERIVATION_STUDY.md + - prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md +``` + +Current full-tape replay on the blue trade tape: + +- replayable trades: `802` +- actual end capital: `$51,937.21` +- floor-only best aggregate candidate: `1.50%` +- floor-only per-regime averages: still centered at `0.50%` + +Interpretation: + +- this study does **not** validate `1.20%` as a universal standalone floor; +- it validates the need for a derivation path and the ability to bind the + floor to code/test/spec evidence; +- `1.20%` remains a coupled-policy prior for the broader ADVSL/TP/hold stack, + not a floor-only truth. + +The floor-only study must remain shadow-only. Live use may only follow a +coupled policy that demonstrates positive recursive capital curve delta on +held-out contiguous regions. + +### 11.15 Acceptance Tests + +Minimum tests before implementation can be called complete: + +- Given a fixed replay window, the same hold recommendation and reward are + reproduced bit-for-bit or within declared float tolerance. 
+- Candidate arms outside the hard range are rejected. +- Stale OBF creates a masked feature, not a fake zero-depth observation. +- Low MARAS confidence or high conflict shrinks advice toward the global prior. +- EFSM-flipped LONG contexts do not use unqualified SHORT-only priors. +- Capital-aware replay compounds saved/lost capital forward. +- Opportunity cost is charged when a cut trade later recovers. +- The shadow advice payload contains candidate set, chosen arm, confidence, + baseline, guardrail result, and reproducibility keys. +- Promotion decisions are rejected when the ParamSet omits `promotion_policy`. +- Meta-cadence advice is logged as a ParamSet decision, not a runner-local + heuristic. + +## 12. VIBRISS Ops / Runner System + +### 12.1 Operational Objective + +VIBRISS must run as an observable production subsystem, not as an ad hoc +notebook or one-off replay script. + +The runner is responsible for: + +- loading parameter specs and ParamSet specs, +- ingesting live context from Hazelcast and historical context from ClickHouse, +- publishing shadow/advisory parameter postures, +- scheduling replay/calibration subtasks, +- writing full audit logs, +- exposing health sensors to MHS, +- feeding TUI/observability surfaces, +- checkpointing learner state so recommendations are reproducible after restart. + +The runner must reuse the existing infrastructure pattern: + +- supervisord is the process authority; +- Hazelcast is the live bus; +- ClickHouse is the audit/event store; +- NATS is the optional event transport for replay, reward, and policy-state + fanout when decoupled workers or durable queues are useful; +- MHS reads composite health from HZ and reports it in `DOLPHIN_META_HEALTH`; +- TUI observes primarily through HZ listeners and polls CH only for heavier + historical panels; +- Prefect is optional for scheduled offline jobs, not required for the hot + VIBRISS daemon. 
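+
+The checkpointing duty in the responsibility list above can be met with an
+atomic write plus a content hash. A restart then either loads exactly the state
+that produced earlier advice or detects a mismatch, which the promotion policy
+already treats as `checkpoint_hash_mismatch`. This is a minimal stdlib-only
+sketch that assumes JSON-serializable learner state. The function names and the
+manifest schema id are hypothetical; the `state.json`/`manifest.json` pair
+mirrors the artifact layout in §15.2.
+
+```python
+import hashlib
+import json
+import os
+import tempfile
+from pathlib import Path
+
+
+def _atomic_write(path: Path, data: bytes) -> None:
+    """Write to a temp file in the same directory, fsync, then rename over path."""
+    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
+    try:
+        with os.fdopen(fd, "wb") as fh:
+            fh.write(data)
+            fh.flush()
+            os.fsync(fh.fileno())
+        os.replace(tmp, path)  # atomic rename within one POSIX filesystem
+    except BaseException:
+        Path(tmp).unlink(missing_ok=True)
+        raise
+
+
+def write_checkpoint(state: dict, directory: Path) -> str:
+    """Persist learner state and return its content hash (hypothetical helper)."""
+    directory.mkdir(parents=True, exist_ok=True)
+    blob = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
+    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
+    _atomic_write(directory / "state.json", blob)
+    # Hypothetical schema id; written after the state so it never runs ahead.
+    manifest = {"schema": "vibriss.checkpoint_manifest.v1", "checkpoint_hash": digest}
+    _atomic_write(directory / "manifest.json", json.dumps(manifest).encode())
+    return digest
+
+
+def load_checkpoint(directory: Path) -> dict:
+    """Load state only if it still matches the hash recorded in the manifest."""
+    blob = (directory / "state.json").read_bytes()
+    manifest = json.loads((directory / "manifest.json").read_text())
+    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
+    if digest != manifest["checkpoint_hash"]:
+        raise ValueError("checkpoint_hash_mismatch")
+    return json.loads(blob)
+```
+
+If the process dies between the two writes, the manifest still describes the
+previous state, so `load_checkpoint` raises and the runner can revert to the
+last good checkpoint or baseline prior, as the failure table in §12.12 requires.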
+ +### 12.2 Process Topology + +VIBRISS should be containerized, but still owned by supervisord. +In the current production layout, the host supervisord owns only the +container bootstrap wrapper; the container itself runs its own supervisord +instance, which owns the live runner process. That makes later full-system +containerization easier without changing the runner contract. + +If sandboxing is enabled, gVisor is the outer runtime boundary for the +container or worker container. VIBRISS does not instantiate or manage gVisor +from inside the container; the host/container runtime selects that boundary at +launch time. The containerized runner must still reach host Hazelcast and +ClickHouse over the configured backplane. If NATS is enabled, it runs as a +sibling stack service on the host backplane and the container talks to it over +`nats://localhost:4222`. + +Recommended process shape: + +```text +supervisord + -> vibriss_runner container + -> live advice loop + -> spec loader + -> health publisher + -> lightweight replay scheduler + -> learner checkpoint writer + + -> optional vibriss_worker container(s) + -> full-tape replay + -> walk-forward validation + -> OBF/MARAS proximity calibration + -> offline policy evaluation +``` + +The live runner is a long-lived daemon. Heavy replay/calibration jobs are +separate subtasks so the live advice loop cannot be blocked by ML work. + +The experiment-side harness that replays trade episodes, sweep ranges, and +walk-forward windows is specified separately in +[`VIBRASS_EXPERIMENT_RUNNER_SPEC.md`](VIBRASS_EXPERIMENT_RUNNER_SPEC.md). + +Container runtime: + +- Docker or Podman is acceptable. +- Prefer Podman if rootless isolation becomes important. +- Optional sandbox runtime: gVisor may wrap the launched container or worker + container, but it is selected outside VIBRISS by the host/container runtime. + VIBRISS must not attempt to manage the sandbox boundary from inside the + container. 
+- Do not put Hazelcast in the VIBRISS container. +- Do not restart Hazelcast as part of VIBRISS recovery. +- Mount large replay outputs to `/mnt/dolphin_training/vibriss/`, not the SMB + repo path. +- Write only small docs/specs to `/mnt/dolphinng5_predict/prod/docs/`. + +### 12.3 Supervisor Contract + +Recommended supervisord entries: + +```ini +[program:vibriss_runner] +command=/usr/bin/podman run --rm --name dolphin-vibriss-runner + --network host + -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro + -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw + -v /mnt/ng6_data:/mnt/ng6_data:ro + -e HZ_HOST=localhost:5701 + -e CH_URL=http://localhost:8123/ + -e CH_DB=dolphin + dolphin-vibriss:latest + python -m vibriss.runner --mode shadow +directory=/mnt/dolphinng5_predict/prod +autostart=true +autorestart=true +startsecs=10 +startretries=5 +stopwaitsecs=20 +stopasgroup=true +killasgroup=true +stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner.log +stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner-error.log + +[program:vibriss_worker] +command=/usr/bin/podman run --rm --name dolphin-vibriss-worker + --network host + -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro + -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw + -v /mnt/ng6_data:/mnt/ng6_data:ro + dolphin-vibriss:latest + python -m vibriss.worker --idle +directory=/mnt/dolphinng5_predict/prod +autostart=false +autorestart=false +startsecs=0 +stopwaitsecs=30 +stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker.log +stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker-error.log +``` + +Group placement: + +```ini +[group:dolphin_data] +programs=exf_fetcher,acb_processor,obf_universe,meta_health,system_stats, + esof_advisor,maras_service,vibriss_runner +``` + +Rationale: + +- VIBRISS is data/control-plane infrastructure, not the trader itself. 
+- The runner can be autostarted because it begins shadow-only.
+- Workers remain manual or scheduler-launched because full replay can be heavy.
+- MHS must observe VIBRISS health, but must not fight the container runtime
+  through systemd.
+
+### 12.4 Container Interface
+
+Environment variables (required unless the description marks them optional or
+gives a default):
+
+| Env | Meaning |
+|---|---|
+| `HZ_HOST` | Hazelcast host/port, default `localhost:5701`. |
+| `CH_URL` | ClickHouse HTTP URL. |
+| `CH_DB` | Namespace DB: `dolphin`, `dolphin_prodgreen`, or PINK-specific DB. |
+| `CH_USER` / `CH_PASS` | ClickHouse credentials. |
+| `NATS_URL` | Optional NATS server URL, default `nats://localhost:4222`. |
+| `VIBRISS_ENABLE_NATS_TRANSPORT` | Optional flag; enables best-effort NATS publication. |
+| `VIBRISS_NATS_SUBJECT_PREFIX` | Subject prefix, default `vibriss`. |
+| `VIBRISS_MODE` | `shadow`, `advisory`, `canary`, or `disabled`. |
+| `VIBRISS_NAMESPACE` | `blue`, `pink`, `prodgreen`, or `research`. |
+| `VIBRISS_SPEC_DIR` | Param spec directory. |
+| `VIBRISS_STATE_DIR` | Checkpoint/output directory. |
+| `VIBRISS_ENABLE_LIVE_ACTUATION` | Must default to `0`. |
+| `VIBRISS_CALIBRATION_INTERVAL_S` | Default replay/calibration scheduler interval, in seconds. |
+| `VIBRISS_PROMOTION_REVIEW_INTERVAL_S` | Default promotion-gate review interval, in seconds. |
+| `VIBRISS_META_CADENCE_MODE` | `fixed`, `shadow`, or `controlled`; defaults to `fixed`. |
+| `VIBRISS_MHS_SENSOR_KEY` | MHS sensor payload key, default `vibriss_sensors_blue`. |
+| `VIBRISS_HEALTH_INTERVAL_S` | Health publication interval in seconds, default `5`. |
+
+Filesystem contract:
+
+| Path | Mode | Use |
+|---|---|---|
+| `/mnt/dolphinng5_predict` | read-only in container | Code/spec/doc access. |
+| `/mnt/dolphin_training/vibriss` | read-write | Learner state, replay artifacts, reports. |
+| `/mnt/ng6_data` | read-only | Tape, OBF, scan data. |
+| `/tmp` inside container | read-write ephemeral | Small temporary files only.
|
+
+### 12.5 Internal Runner Loops
+
+The runner should have separate loops with independent health status:
+
+| Loop | Cadence | Responsibility |
+|---|---:|---|
+| `spec_loader` | startup + 60s | Load/validate ParamSpec and ParamSetSpec files. |
+| `context_ingestor` | 0.5s to 5s | Read HZ live context and keep a point-in-time snapshot. |
+| `advice_loop` | on context/trade event | Score candidates and publish shadow/advisory advice. |
+| `reward_collector` | 10s to 60s | Join closed trades to advice and write delayed rewards. |
+| `checkpoint_loop` | 60s | Persist learner state and model metadata. |
+| `calibration_scheduler` | 5m+ | Queue replay/validation subtasks when new data warrants it. |
+| `promotion_evaluator` | 15m+ | Evaluate whether a ParamSet may move to a stronger mode. |
+| `meta_cadence_evaluator` | 15m+ | Shadow-test cadence settings for calibration/promotion/update loops. |
+| `health_publisher` | 5s | Publish MHS-compatible sensor payload. |
+
+The advice loop must never wait on full replay, model training, or ClickHouse
+backfill. If ClickHouse is slow, advice may continue from the latest checkpoint
+while the runner marks reward collection as degraded.
+
+### 12.6 Hazelcast Surfaces
+
+Recommended HZ maps/keys:
+
+| Map | Key | Producer | Consumer | Purpose |
+|---|---|---|---|---|
+| `DOLPHIN_FEATURES` | `vibriss_param_advice` | runner | BLUE/PINK/TUI | Latest general parameter advice. |
+| `DOLPHIN_FEATURES` | `vibriss_hold_substitute_advice` | runner | ADVSL/TUI | Latest ADVSL hold-substitute advice. |
+| `DOLPHIN_FEATURES` | `vibriss_latest` | runner | TUI/MHS/manual ops | Compact subsystem summary. |
+| `DOLPHIN_META_HEALTH` | `vibriss_sensors_blue` | runner | MHS | BLUE VIBRISS sensor payload. |
+| `DOLPHIN_META_HEALTH` | `vibriss_sensors_pink` | runner | MHS | PINK VIBRISS sensor payload. |
+| `DOLPHIN_HEARTBEAT` | `vibriss_runner_heartbeat` | runner | MHS/TUI | Liveness heartbeat.
| +| `DOLPHIN_CONTROL_PLANE` | `vibriss_commands` | ops/TUI | runner | Freeze, unfreeze, replay, reload specs. | + +Advice remains separate from commands. An advice key tells the engine what +VIBRISS recommends; a command key tells VIBRISS what operators want it to do. + +### 12.7 ClickHouse Tables + +VIBRISS needs durable audit tables. Recommended tables: + +| Table | Purpose | +|---|---| +| `dolphin.vibriss_decisions` | One row per candidate-scoring decision. | +| `dolphin.vibriss_rewards` | Delayed realized/counterfactual reward rows. | +| `dolphin.vibriss_policy_state` | Checkpoint metadata and active posture versions. | +| `dolphin.vibriss_paramset_status` | Per-ParamSet health/performance summary. | +| `dolphin.vibriss_subtasks` | Replay/calibration/ML subtask lifecycle. | + +Minimum `vibriss_decisions` fields: + +```sql +ts DateTime64(6, 'UTC'), +namespace LowCardinality(String), +mode LowCardinality(String), +param_set_id LowCardinality(String), +spec_version String, +decision_id String, +trade_id String, +asset LowCardinality(String), +side LowCardinality(String), +scan_number UInt64, +context_hash String, +maras_composite_hash UInt16, +maras_regime LowCardinality(String), +candidate_set_json String, +chosen_arm String, +baseline_value String, +recommended_value String, +confidence Float32, +propensity Float32, +guardrail_status LowCardinality(String), +fallback_reason String, +model_version String, +payload_json String +``` + +Minimum `vibriss_rewards` fields: + +```sql +ts DateTime64(6, 'UTC'), +decision_id String, +trade_id String, +reward_status LowCardinality(String), +raw_actual_pnl Float64, +raw_counterfactual_pnl Float64, +saved_loss_delta Float64, +clipped_winner_delta Float64, +capital_curve_delta Float64, +drawdown_delta Float64, +recovery_lag_s Float32, +extra_bars_to_recovery Float32, +normalized_reward Float32, +reward_components_json String +``` + +Subtask rows must include `subtask_id`, `param_set_id`, `kind`, `status`, +`started_at`, 
`finished_at`, `input_window`, `artifact_path`, `n_trades`, +`primary_metric`, `failure_reason`, and `parent_decision_id` when applicable. + +### 12.8 MHS Sensor Contract + +VIBRISS should expose an MHS-compatible composite payload, modeled after the +existing optional DITA sensor pattern. + +Recommended HZ key: + +```text +DOLPHIN_META_HEALTH["vibriss_sensors_blue"] +``` + +Payload: + +```json +{ + "schema": "vibriss.mhs_sensors.v1", + "namespace": "blue", + "ts": "2026-06-03T00:00:00Z", + "rm_meta": 0.93, + "status": "GREEN", + "m14_vibriss_runner_liveness": 1.0, + "m15_vibriss_spec_integrity": 1.0, + "m16_vibriss_data_freshness": 0.9, + "m17_vibriss_advice_integrity": 1.0, + "m18_vibriss_reward_backlog": 0.85, + "m19_vibriss_paramset_health": 0.95, + "param_sets": { + "advsl.hold_substitute.v1": { + "score": 0.94, + "status": "GREEN", + "mode": "shadow", + "last_advice_age_s": 2.4, + "last_reward_age_s": 31.0, + "open_decisions": 1, + "reward_backlog": 3, + "shadow_samples": 240, + "walk_forward_status": "pending", + "latest_recommended_hold": 12 + } + }, + "subtasks": { + "full_tape_replay": {"score": 1.0, "status": "IDLE"}, + "walk_forward": {"score": 0.8, "status": "STALE"}, + "obf_binding": {"score": 1.0, "status": "IDLE"} + } +} +``` + +Sensor scoring: + +| Sensor | Score rule | +|---|---| +| `m14_vibriss_runner_liveness` | 1 if heartbeat age < 15s, 0.5 if < 60s, else 0. | +| `m15_vibriss_spec_integrity` | Fraction of loaded specs passing validation. | +| `m16_vibriss_data_freshness` | Freshness of HZ context, CH close rows, OBF/MARAS context. | +| `m17_vibriss_advice_integrity` | 1 when latest advice is schema-valid and guardrailed. | +| `m18_vibriss_reward_backlog` | Penalizes unjoined decisions awaiting reward too long. | +| `m19_vibriss_paramset_health` | Mean score of all enabled ParamSets. | + +MHS integration rule: + +- VIBRISS starts with weight `0.0` in RM_META until stable. +- Then enable a small optional weight, analogous to DITA sensors. 
+- Suggested initial weight: `0.02`. +- Maximum allowed weight: `0.10` until the subsystem is live-actuating. +- If VIBRISS is disabled, MHS score must be neutral and must not degrade BLUE. + +Suggested MHS env shape: + +```text +DOLPHIN_MHS_USE_VIBRISS_SENSORS=1 +DOLPHIN_MHS_VIBRISS_SENSOR_WEIGHT=0.02 +DOLPHIN_VIBRISS_SENSOR_KEY=vibriss_sensors_blue +DOLPHIN_MHS_VIBRISS_SENSOR_MAPS=DOLPHIN_META_HEALTH,DOLPHIN_FEATURES +``` + +### 12.9 Observability / TUI Integration + +TUI integration should follow the existing v9 pattern: + +- use HZ listeners for latest VIBRISS state; +- add CH polling only for historical/replay-heavy summaries; +- never poll origin subsystems directly from the TUI. + +Recommended panels: + +| Panel | Source | Cadence | Content | +|---|---|---:|---| +| `VIBRISS` main panel | `DOLPHIN_FEATURES/vibriss_latest` | HZ listener | mode, status, latest ParamSet advice, confidence, MHS score. | +| `VIBRISS Hold` footer | `vibriss_hold_substitute_advice` + CH rewards | HZ + 60s CH | recommended hold, baseline, prior, reward backlog, recent net delta. | +| `VIBRISS Tasks` footer | `vibriss_subtasks` | 60s CH | replay/walk-forward/OBF binding status. | +| `MHS` existing panel | `DOLPHIN_META_HEALTH/latest` | HZ listener | include VIBRISS sensor details if enabled. | + +Display fields for `advsl.hold_substitute.v1`: + +```text +VIBRISS HOLD mode=shadow rec=12b base=20b live_ref=6b +conf=74% guard=PASS hash=57957 obf=weak pressure=high +reward_backlog=3 wf=pending samples=240 +``` + +The TUI must clearly distinguish: + +- baseline reference, +- current live reference, +- VIBRISS recommendation, +- whether recommendation is shadow-only or live-consumed. + +Implementation note: + +- `prod/vibriss/vibriss_tui.py` now provides the Textual dashboard, and + `python -m vibriss.vibriss_runner tui` launches it in read-only shadow mode. +- The UI is panel-registry based so additional metrics can be added without + rewriting the dashboard shell. 
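+
+The panel-registry idea can be illustrated with a framework-agnostic sketch:
+renderers register under a name, and the shell iterates the registry instead
+of hard-coding panels. Everything here is hypothetical, including `PANELS`,
+`register_panel`, and the advice field names; it does not use the Textual API,
+and the real `vibriss_tui.py` may be structured differently. The renderer
+reproduces the hold footer format shown above and always prints `mode`, so
+shadow-only advice cannot be mistaken for live-consumed advice.
+
+```python
+from typing import Callable
+
+Renderer = Callable[[dict], str]
+
+# Hypothetical registry: panel name -> renderer(advice payload) -> text.
+PANELS: dict[str, Renderer] = {}
+
+
+def register_panel(name: str) -> Callable[[Renderer], Renderer]:
+    """Decorator that adds a renderer to the registry under `name`."""
+    def wrap(fn: Renderer) -> Renderer:
+        if name in PANELS:
+            raise ValueError(f"duplicate panel {name!r}")
+        PANELS[name] = fn
+        return fn
+    return wrap
+
+
+@register_panel("vibriss_hold")
+def render_hold(a: dict) -> str:
+    """Render the three-line VIBRISS HOLD footer (field names are assumptions)."""
+    return "\n".join([
+        f"VIBRISS HOLD mode={a['mode']} rec={a['recommended_hold']}b "
+        f"base={a['baseline_hold']}b live_ref={a['live_ref_hold']}b",
+        f"conf={round(a['confidence'] * 100)}% guard={a['guardrail_status']} "
+        f"hash={a['maras_composite_hash']} obf={a['obf_bucket']} "
+        f"pressure={a['pressure_bucket']}",
+        f"reward_backlog={a['reward_backlog']} wf={a['walk_forward_status']} "
+        f"samples={a['shadow_samples']}",
+    ])
+
+
+def render_all(payloads: dict[str, dict]) -> dict[str, str]:
+    """Render every registered panel; a missing payload shows a placeholder."""
+    return {
+        name: fn(payloads[name]) if name in payloads else f"{name}: no data"
+        for name, fn in PANELS.items()
+    }
+```
+
+Adding a metric then means registering one more renderer; the shell that calls
+`render_all` does not change.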
+ +### 12.10 Control Commands + +Commands should be written to `DOLPHIN_CONTROL_PLANE["vibriss_commands"]`. + +Allowed commands: + +| Command | Effect | +|---|---| +| `RELOAD_SPECS` | Reload ParamSpec/ParamSetSpec files and validate. | +| `FREEZE_PARAMSET` | Stop updating and publish fallback for one ParamSet. | +| `UNFREEZE_PARAMSET` | Resume shadow/advisory scoring. | +| `RUN_REPLAY` | Queue replay subtask for a parameter set/window. | +| `RUN_WALK_FORWARD` | Queue walk-forward validation. | +| `SET_MODE` | Move `disabled -> shadow -> advisory`; live/canary requires explicit code/config gate. | +| `CHECKPOINT_NOW` | Persist learner state immediately. | + +Commands must be acknowledged to: + +```text +DOLPHIN_CONTROL_PLANE["vibriss_command_ack"] +``` + +Ack payloads must include command id, acceptance/rejection, reason, and current +mode. Queue consumption alone is not success. + +### 12.11 Prefect Role + +Prefect is optional for VIBRISS. It should not be required for live advice. + +Acceptable Prefect use: + +- daily full-tape replay, +- scheduled walk-forward validation, +- artifact publication, +- long offline calibration runs. + +Not acceptable: + +- live advice loop, +- hot-path reward joining, +- health publication, +- operator freeze/unfreeze commands. + +If Prefect is unavailable, the VIBRISS runner should continue shadow/advisory +operation from the last checkpoint and mark scheduled calibration stale. + +### 12.12 Failure Modes and Fallback + +| Failure | Required behavior | +|---|---| +| HZ unavailable | Runner logs degraded, cannot publish advice, MHS score <= 0.5. | +| CH unavailable | Advice may continue from checkpoint; reward collector degrades. | +| OBF stale | Mask OBF features; do not use OBF hold extension. | +| MARAS stale | Shrink to global/label-free prior. | +| Spec validation failure | Disable affected ParamSet, publish fallback. | +| Learner checkpoint corrupt | Revert to last good checkpoint or baseline prior. 
|
+| Replay worker OOM/fails | Mark subtask failed; live runner continues. |
+| Advice schema invalid | Do not publish; MHS advice integrity drops. |
+| Drawdown alarm | Freeze to deterministic safe baseline. |
+
+### 12.13 Promotion Gates
+
+Before any engine consumes VIBRISS hold advice live:
+
+1. Runner has been stable for at least 7 calendar days.
+2. MHS VIBRISS sensors are GREEN or neutral for at least 95% of runner uptime.
+3. `advsl.hold_substitute.v1` has completed full-tape replay.
+4. Walk-forward is positive versus baseline on capital-curve delta after
+   opportunity cost.
+5. OOD region performance has no catastrophic degradation.
+6. TUI displays baseline/current/recommended state correctly.
+7. Command ack path is verified.
+8. Safe fallback is tested by intentionally freezing the ParamSet.
+9. Engine consumption is limited to one ParamSet and one namespace.
+10. `VIBRISS_ENABLE_LIVE_ACTUATION=1` is explicitly set and reviewed.
+
+## 13. V1 Rollout Plan
+
+1. Offline replay only:
+   - replay historical decisions from ClickHouse and tape.
+   - benchmark against baseline constants.
+   - compute OPE where logged propensities exist.
+   - report by asset, side, MARAS hash, regime label, V7 reason, OBF bucket,
+     and contiguous time region.
+
+2. Shadow mode:
+   - publish advice to HZ.
+   - do not allow engine consumption.
+   - write `vibriss_decisions`, `vibriss_rewards`, and `vibriss_policy_state`.
+
+3. Guarded advisory:
+   - engine reads advice and surfaces what it would have used.
+   - still no actuation.
+
+4. Canary live:
+   - one parameter only.
+   - no simultaneous bundle changes.
+   - low exploration cap.
+   - hard fallback on stale data, drawdown alarm, or drift alarm.
+
+5. Controlled live comparison:
+   - compare baseline-vs-advised on matched contexts.
+   - freeze policy if replay quality deteriorates.
+
+## 14. Safety Rules
+
+Mandatory:
+
+- no direct mutation of `blue.yml` or frozen champion config from VIBRISS.
+- no live promotion without replay, shadow, and documented approval. +- no advice consumption when data is stale. +- no advice consumption inside disallowed live-change windows. +- no multi-parameter bundle learning until single-parameter learners prove that + independent adaptation is insufficient. +- every live-consumed recommendation must be reconstructable from logs. +- every safety-critical parameter must preserve a catastrophic fallback floor. + +## 15. Concrete Storage and Schema + +VIBRISS must be event-sourced. Current policy state is a cache; decisions and +rewards are the durable truth. + +### 15.1 ClickHouse DDL + +Recommended DDL: + +```sql +CREATE TABLE IF NOT EXISTS dolphin.vibriss_decisions +( + ts DateTime64(6, 'UTC'), + namespace LowCardinality(String), + mode LowCardinality(String), + param_set_id LowCardinality(String), + spec_version String, + decision_id String, + parent_decision_id String, + trade_id String, + asset LowCardinality(String), + side LowCardinality(String), + scan_number UInt64, + bars_held UInt32, + context_hash String, + context_schema String, + maras_composite_hash UInt32, + maras_scalar_hash UInt32, + maras_regime LowCardinality(String), + maras_confidence Float32, + maras_conflict Float32, + obf_stale UInt8, + obf_depth_1pct_usd Float64, + obf_depth_quality Float32, + v7_pressure Float32, + v7_mae_risk Float32, + candidate_set_json String, + chosen_arm String, + baseline_value String, + recommended_value String, + confidence Float32, + propensity Float32, + guardrail_status LowCardinality(String), + fallback_reason String, + model_version String, + policy_version String, + compiled_config_hash String, + consumed UInt8, + consumed_ts Nullable(DateTime64(6, 'UTC')), + payload_json String +) +ENGINE = MergeTree +PARTITION BY toYYYYMM(ts) +ORDER BY (namespace, param_set_id, ts, decision_id) +TTL ts + INTERVAL 180 DAY; + +CREATE TABLE IF NOT EXISTS dolphin.vibriss_rewards +( + ts DateTime64(6, 'UTC'), + namespace 
LowCardinality(String), + param_set_id LowCardinality(String), + decision_id String, + trade_id String, + reward_status LowCardinality(String), + reward_delay_s Float32, + actual_exit_reason LowCardinality(String), + counterfactual_exit_reason LowCardinality(String), + actual_exit_pnl Float64, + counterfactual_exit_pnl Float64, + saved_loss_delta Float64, + clipped_winner_delta Float64, + capital_curve_delta Float64, + drawdown_delta Float64, + recovery_lag_s Float32, + extra_bars_to_recovery Float32, + normalized_reward Float32, + opportunity_cost_charged UInt8, + replay_artifact_path String, + reward_components_json String +) +ENGINE = MergeTree +PARTITION BY toYYYYMM(ts) +ORDER BY (namespace, param_set_id, ts, decision_id) +TTL ts + INTERVAL 365 DAY; + +CREATE TABLE IF NOT EXISTS dolphin.vibriss_policy_state +( + ts DateTime64(6, 'UTC'), + namespace LowCardinality(String), + param_set_id LowCardinality(String), + policy_version String, + mode LowCardinality(String), + learner LowCardinality(String), + checkpoint_path String, + checkpoint_hash String, + spec_hash String, + compiled_config_hash String, + n_decisions UInt64, + n_rewards UInt64, + shadow_samples UInt64, + walk_forward_status LowCardinality(String), + active_baseline_value String, + active_recommended_value String, + confidence Float32, + state_json String +) +ENGINE = ReplacingMergeTree(ts) +ORDER BY (namespace, param_set_id, policy_version); + +CREATE TABLE IF NOT EXISTS dolphin.vibriss_subtasks +( + ts DateTime64(6, 'UTC'), + namespace LowCardinality(String), + subtask_id String, + param_set_id LowCardinality(String), + kind LowCardinality(String), + status LowCardinality(String), + started_at DateTime64(6, 'UTC'), + finished_at Nullable(DateTime64(6, 'UTC')), + input_window String, + n_trades UInt64, + n_decisions UInt64, + primary_metric Float64, + baseline_metric Float64, + artifact_path String, + artifact_hash String, + failure_reason String, + payload_json String +) +ENGINE = MergeTree 
+PARTITION BY toYYYYMM(started_at) +ORDER BY (namespace, param_set_id, started_at, subtask_id) +TTL started_at + INTERVAL 365 DAY; + +CREATE TABLE IF NOT EXISTS dolphin.vibriss_promotions +( + ts DateTime64(6, 'UTC'), + namespace LowCardinality(String), + param_set_id LowCardinality(String), + promotion_id String, + from_mode LowCardinality(String), + to_mode LowCardinality(String), + requested_by LowCardinality(String), + approved_by LowCardinality(String), + policy_version String, + checkpoint_hash String, + evidence_window String, + n_decisions UInt64, + n_rewards UInt64, + n_shadow_samples UInt64, + n_live_samples UInt64, + recursive_capital_delta Float64, + opportunity_cost_delta Float64, + max_drawdown_delta Float64, + worst_region_delta Float64, + baseline_metric Float64, + candidate_metric Float64, + guardrail_status LowCardinality(String), + decision LowCardinality(String), + reason String, + artifact_path String, + payload_json String +) +ENGINE = MergeTree +PARTITION BY toYYYYMM(ts) +ORDER BY (namespace, param_set_id, ts, promotion_id) +TTL ts + INTERVAL 730 DAY; + +CREATE TABLE IF NOT EXISTS dolphin.vibriss_meta_cadence_decisions +( + ts DateTime64(6, 'UTC'), + namespace LowCardinality(String), + param_set_id LowCardinality(String), + cadence_id LowCardinality(String), + decision_id String, + mode LowCardinality(String), + context_hash String, + maras_composite_hash UInt32, + maras_regime LowCardinality(String), + exof_state String, + esof_state String, + candidate_set_json String, + chosen_value String, + baseline_value String, + confidence Float32, + reward_status LowCardinality(String), + reward_value Float32, + guardrail_status LowCardinality(String), + fallback_reason String, + policy_version String, + payload_json String +) +ENGINE = MergeTree +PARTITION BY toYYYYMM(ts) +ORDER BY (namespace, param_set_id, cadence_id, ts, decision_id) +TTL ts + INTERVAL 365 DAY; +``` + +These tables are deliberately narrow enough for hot audit reads and broad 
enough +to replay the decision. Large path arrays, per-bar simulations, and model +artifacts must be written to artifact storage, not inlined into ClickHouse. + +### 15.2 Artifact Layout + +Use a non-SMB path for generated artifacts: + +```text +/mnt/dolphin_training/vibriss/ + specs/ + advsl.hold_substitute.v1.yaml + checkpoints/ + blue/advsl.hold_substitute.v1// + state.json + learner.pkl + manifest.json + replays/ + // + config.yaml + replay_summary.json + capital_curve.csv + per_trade_counterfactuals.parquet + opportunity_cost_audit.parquet + reports/ + walk_forward/ + obf_binding/ + maras_hash_priors/ +``` + +Every artifact directory must contain a `manifest.json`: + +```json +{ + "schema": "vibriss.artifact_manifest.v1", + "subtask_id": "wf-20260603-001", + "param_set_id": "advsl.hold_substitute.v1", + "namespace": "blue", + "created_at": "2026-06-03T00:00:00Z", + "git_sha": "unknown-or-sha", + "spec_hash": "sha256:...", + "input_tables": { + "trade_events": {"min_ts": "...", "max_ts": "...", "row_count": 1234}, + "v7_decision_events": {"min_ts": "...", "max_ts": "...", "row_count": 9999} + }, + "tape_sources": ["/mnt/ng6_data/arrow_scans/..."], + "random_seed": 0, + "artifact_hashes": { + "replay_summary.json": "sha256:...", + "per_trade_counterfactuals.parquet": "sha256:..." + } +} +``` + +## 16. Replay, OPE, and Causality Rules + +VIBRISS must be explicit about what kind of evidence it has. + +Evidence classes: + +| Class | Meaning | Allowed use | +|---|---|---| +| `realized_live` | Parameter was actually used live. | Highest-quality reward. | +| `shadow_counterfactual` | Advice logged, baseline used, tape can replay alternative. | OPE/research only unless validated. | +| `historical_replay` | Offline replay over historical trades with no logged propensity. | Calibration prior, not proof. | +| `synthetic_mc` | Monte Carlo augmentation from validated distribution. | Stress coverage only. | +| `expert_baseline` | Human/research default such as 12 bars. 
| Fallback/prior. | + +Counterfactual replay must store: + +- actual entry, actual exit, and actual capital before/after; +- counterfactual exit scan/bar and price; +- whether the counterfactual exit depends on sub-bar, bar-close, or tape-close + cadence; +- whether the trade later recovered; +- how many bars/seconds were needed for recovery; +- opportunity cost charged; +- recursive capital state after applying the counterfactual. + +OPE rules: + +- Use inverse propensity or doubly robust estimators only when propensities were + actually logged. +- Do not pretend historical replay has logged propensities. +- For shadow decisions without randomized action, report them as model + counterfactuals, not causal estimates. +- Region splits must be contiguous first; randomized splits are secondary + robustness checks only. +- A policy that wins by one tail event and loses broadly must be flagged as + fragile even when net capital delta is positive. + +Minimum replay report: + +```text +baseline_end_capital +policy_end_capital +recursive_delta +gross_saved_loss +gross_opportunity_cost +net_trade_pnl_delta +max_drawdown_delta +tail_loss_count_delta +clipped_winner_count +recovered_cut_count +median_recovery_lag_s +worst_region_delta +best_region_delta +per_asset_concentration +per_hash_concentration +``` + +## 17. Mode State Machine + +VIBRISS modes are explicit and monotonic unless an operator command or guardrail +forces demotion. 
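+
+That rule can be sketched as a small transition function. This is a
+hypothetical illustration, not the runner's code. It assumes that promotions
+advance exactly one rung, that any demotion is allowed because it only removes
+authority, and that entering an actuating mode needs both manual approval and
+the `VIBRISS_ENABLE_LIVE_ACTUATION` gate from §12.13.
+
+```python
+from enum import IntEnum
+
+
+class Mode(IntEnum):
+    DISABLED = 0
+    SHADOW = 1
+    ADVISORY = 2
+    CANARY_LIVE = 3
+    CONTROLLED_LIVE = 4
+
+
+# Modes in which the engine may act on VIBRISS advice.
+ACTUATING = {Mode.CANARY_LIVE, Mode.CONTROLLED_LIVE}
+
+
+class TransitionError(ValueError):
+    """Raised when a requested promotion violates the ladder rules."""
+
+
+def transition(current: Mode, target: Mode, *, manual_approval: bool = False,
+               live_actuation_enabled: bool = False) -> Mode:
+    """Return the new mode, or raise if the promotion is not allowed."""
+    if target == current:
+        return current
+    if target < current:
+        # Demotion by operator command or guardrail is always permitted.
+        return target
+    if target != current + 1:
+        raise TransitionError(f"cannot skip from {current.name} to {target.name}")
+    if target in ACTUATING:
+        if not manual_approval:
+            raise TransitionError(f"{target.name} requires manual approval")
+        if not live_actuation_enabled:
+            raise TransitionError("VIBRISS_ENABLE_LIVE_ACTUATION is not set")
+    return target
+```
+
+A guardrail demotion such as a stale required sensor calls
+`transition(current, Mode.SHADOW)` and always succeeds; climbing back requires
+the same per-rung checks, so health recovery alone never restores actuation.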

```text
disabled
  -> shadow
  -> advisory
  -> canary_live
  -> controlled_live
```

Mode meanings:

| Mode | Publishes advice | Engine may read | Engine may act | Learner updates |
|---|---:|---:|---:|---:|
| `disabled` | no | no | no | no |
| `shadow` | yes | no | no | yes |
| `advisory` | yes | yes, display only | no | yes |
| `canary_live` | yes | yes | yes, one ParamSet/namespace | yes |
| `controlled_live` | yes | yes | yes, bounded | yes |

Automatic demotions:

- stale required sensor -> `shadow` or fallback advice;
- invalid spec -> affected ParamSet disabled;
- reward backlog beyond threshold -> freeze learner updates;
- drawdown alarm -> deterministic safe baseline;
- ClickHouse unavailable -> keep publishing only if the checkpoint is fresh; mark
  reward collection degraded;
- Hazelcast unavailable -> no advice publication;
- policy drift alarm -> freeze to the last known-good checkpoint.

Promotion technique, thresholds, cadence, and evidence gates must be declared
inside the affected ParamSet spec. The runner evaluates and records those gates;
it is not allowed to invent a promotion policy from global defaults.

Promotion must be manual and auditable for any transition that enables live
actuation. No health-recovery path may silently promote VIBRISS into a stronger
actuation mode.

### 17.1 ParamSet-Owned Promotion Lifecycle

Every ParamSet must answer these questions before it can leave `shadow`:

| Question | Required ParamSet field |
|---|---|
| What baseline is being challenged? | `promotion_policy.baseline_policy` |
| What evidence class is allowed? | `promotion_policy.technique` and `evidence_gates` |
| How often is the evidence recomputed? | `promotion_policy.cadence.replay_calibration` |
| How often is promotion eligibility reviewed? | `promotion_policy.cadence.promotion_review` |
| When may the engine replace the old value? | `promotion_policy.cadence.live_replacement_rhythm` |
| What samples are required? | `promotion_policy.evidence_gates.*min*` |
| What demotes it? | `promotion_policy.automatic_demotion` |
| Who approves live use? | `promotion_policy.*manual_approval_required` |

Promotion is also subject to the control-plane elegance constraints in §4.1:
one writer per parameter, spec-owned promotion, slow-governed meta-cadence,
context inputs instead of arbitrary controllers, reproducible live changes, no
hidden cross-subsystem mutation, and shadow/replay/canary before live.

Default lifecycle:

```text
historical_replay
  -> walk_forward_replay
  -> shadow_advice_logging
  -> advisory_display
  -> canary_live_capture
  -> controlled_live
```

The cadence of each phase is also ParamSet-owned:

- `advice cadence`: how often the ParamSet emits advice.
- `reward cadence`: how often delayed rewards are joined and scored.
- `calibration cadence`: how often the learner updates from replay/rewards.
- `promotion-review cadence`: how often mode eligibility is evaluated.
- `replacement rhythm`: the exact engine decision point where a live parameter
  can replace the baseline.

For safety-critical exit parameters, the replacement rhythm should usually be
`capture_on_entry` or `between_trades`, not arbitrary intratrade mutation.

### 17.2 Meta-Cadences as Governed Parameters

Meta-cadences are tunable parameters. If VIBRISS changes them, they must be
declared in the ParamSet under `meta_cadence_policy`.

Examples:

| Meta-cadence | Meaning |
|---|---|
| `replay_calibration_interval_s` | How often to re-run replay/calibration. |
| `promotion_review_interval_s` | How often to evaluate mode promotion/demotion. |
| `checkpoint_interval_s` | How often to persist learner state. |
| `min_new_rewards_before_recalibration` | Event-driven cadence threshold. |
| `shadow_to_canary_cooldown_trades` | Minimum stable evidence before live canary. |

MARAS, ExoF, EsoF, OBF, V7, MHS, and drawdown state may be context inputs for
meta-cadence advice, but the cadence learner is subject to the same evidence
rules as any other parameter learner. In particular:

- fixed cadence is the baseline;
- shadow cadence decisions must be logged with candidate set and confidence;
- replay must estimate missed-adaptation cost and false-promotion cost;
- compute/backlog cost is part of reward;
- live control of promotion cadence requires explicit manual approval.

## 18. Engine Consumption Contract

The engine must treat VIBRISS advice as optional, expiring input.

Consumption algorithm:

```text
read advice payload
validate schema and spec_version
check namespace matches runtime
check mode permits consumption
check expires_at > now
check trade_scope is current decision point
check recommendation within hard range
check guardrail_status == PASS or permitted advisory state
check fallback/catastrophic floor remains active
capture value into trade-local immutable parameter snapshot
emit consumption audit
```

For `advsl.hold_substitute.v1`, the first live contract should be:

- consume only on entry;
- store the selected hold bars in the pending/open trade state;
- do not mutate it intratrade;
- allow intratrade VIBRISS values only as shadow comparisons;
- let the catastrophic floor and max-dollar floor override hold advice.

This avoids a subtle failure mode in which a learner changes the hold target
after seeing adverse movement that was not available at entry. Intratrade
contraction can be researched later, but it belongs in a different ParamSet.

## 19. Drift, Novelty, and Freezing

VIBRISS must separate three conditions:

1. data-quality degradation,
2. market/regime novelty,
3. policy underperformance.

Drift sensors:

| Sensor | Trigger |
|---|---|
| context distribution drift | MARAS/OBF/V7 feature distribution shifts versus the training window. |
| reward drift | rolling reward below baseline beyond the confidence bound. |
| regret drift | chosen arm underperforms the baseline arm in shadow replay. |
| tail cluster | tail-loss or floor-hit count above the historical percentile. |
| sparse regime | nearest-neighbor distance to known MARAS/OBF contexts too high. |

Actions:

- distribution drift alone: shrink toward baseline and raise uncertainty;
- reward drift: freeze learner updates and publish fallback;
- tail cluster: tighten safety floors only if pre-authorized by the ParamSet;
- sparse regime: use the global safe prior rather than overfitting to the
  nearest hash;
- data-quality drift: stop consuming affected sensors.

VIBRISS should publish drift state in `vibriss_latest` and
`vibriss_paramset_status`.

## 20. Data Volume and Backpressure

The ClickHouse-outage and spool-backlog failure mode applies to VIBRISS too.

Rules:

- VIBRISS must have its own spool and backlog metric.
- Advice publication must not block on ClickHouse.
- Reward collection may lag, but the lag must be visible in MHS.
- Large per-bar OBF or path arrays must not be written to hot audit tables.
- Calibration workers must rate-limit writes and should prefer compact Parquet
  artifacts for heavy outputs.
- If the ClickHouse spool backlog exceeds its threshold, VIBRISS must degrade to
  `shadow_no_update`: publish from checkpoint only, and do not update learners
  from partial reward data.

Recommended thresholds:

| Metric | GREEN | DEGRADED | CRITICAL |
|---|---:|---:|---:|
| decision spool backlog | `<1k` | `1k-50k` | `>50k` |
| reward backlog age | `<10m` | `10m-2h` | `>2h` |
| artifact disk free | `>20GB` | `5-20GB` | `<5GB` |
| CH write failure rate | `<1%` | `1-10%` | `>10%` |

VIBRISS must not repeat the OBF-style failure mode of letting millions of
low-priority rows delay high-priority trade/reward rows. Use priority queues:

1. decisions, rewards, policy state;
2. trade/path summary;
3. calibration summary;
4. heavy diagnostics.

## 21. Security and Operational Guardrails

Secrets:

- use the existing ClickHouse user/password environment-variable pattern;
- do not write credentials into spec files;
- do not put secrets in artifact manifests.

Filesystem:

- the code/spec mount is read-only inside the container;
- learner state and replay artifacts are written outside the SMB repo path;
- the runner must check free disk space before replay subtasks;
- no large file writes to `/mnt/dolphinng5_predict`.

Runtime:

- do not restart Hazelcast;
- do not use systemd for Dolphin services;
- use supervisord as the owner of the container process;
- if gVisor is used, treat it as a host-selected sandbox/runtime wrapper, not a
  process owned by VIBRISS internals;
- worker OOM must not kill the live advice runner;
- health checks must distinguish "runner alive" from "learner valid".

## 22. Implementation Defaults

These decisions are now recommended defaults, not open questions:

- First learner: discounted UCB for the non-contextual hold-bar baseline, plus a
  LinUCB shadow branch for MARAS/OBF/V7 context.
- First live dependency posture: internal finite-arm learners and compact
  checkpointed state in the runner; no VW, OBP, ABIDES, Pyro/NumPyro, CATX, or
  broad benchmark libraries in the live advice path.
- First worker dependency posture: VW, River, OBP, MABWiser, lifelines,
  statsmodels, and benchmark libraries are allowed only in replay/OPE/calibration
  jobs with bounded memory and artifact output.
- First drift implementation: simple internal rolling statistics plus optional
  River-backed detectors if the dependency remains stable inside the runner.
- First HZ publication surface: `DOLPHIN_FEATURES["vibriss_param_advice"]` plus
  dedicated keys for high-value ParamSets such as
  `vibriss_hold_substitute_advice`.
- First consumption point for ADVSL hold substitute: capture-on-entry only.
- Counterfactual rewards: store as `shadow_counterfactual` with an explicit
  replay artifact path and no causal-propensity claim.
- Drift ownership: VIBRISS computes policy/reward drift and subscribes to MHS,
  MARAS, OBF, and SurvivalStack for external drift/context.
- Container launch: use a small wrapper script under supervisord in production
  so image existence, disk space, mount health, and env are checked before
  `podman run` or `docker run`.
- MHS integration: prefer a generic external-sensor loader eventually, but V1
  may implement a VIBRISS-specific optional sensor as long as it is neutral when
  disabled.
- Infrastructure posture: keep Hazelcast + ClickHouse + supervisord for V1;
  Kafka/Flink are deferred until measured event volume or recovery requirements
  exceed the existing bus/audit pattern.

## 23. Open Implementation Questions

- Exact minimum sample thresholds per parameter family after the full 1.7k+
  trade corpus is rebuilt under the same capital geometry.
- Whether hard `$400` floors should be a separate ParamSet or remain outside
  VIBRISS as fixed safety policy.
- How to measure sub-bar TP/cadence opportunity cost in a way compatible with
  bar-based ADVSL replay.
- Whether intratrade hold contraction deserves a second ParamSet after
  entry-captured hold advice is validated.
- How much MC/synthetic data is statistically acceptable without overstating
  confidence in rare-tail regimes.
- Whether PINK can share BLUE priors after venue slippage, fills, and exchange
  state are included, or must maintain separate priors from day one.

## 24. Recommended First Build

Build VIBRISS V1 as a shadow-only package with:

- `ParamSpec` dataclasses and YAML loader.
- `ParamSetSpec` support for `advsl.hold_substitute.v1`.
- discrete UCB/Thompson learner.
- contextual LinUCB learner stub or implementation.
- advice publisher.
- ClickHouse audit writer.
- MHS-compatible sensor publisher.
- supervisord/container runner definition.
- offline replay harness for conditional fast TP and ADVSL hold bars.
- capital-aware replay and opportunity-cost accounting for the hold substitute.
- no live actuation.

Recommended package layout:

```text
/mnt/dolphinng5_predict/vibriss/
  __init__.py
  specs.py              # ParamSpec / ParamSetSpec dataclasses and validation
  context.py            # HZ/CH context snapshots, masks, point-in-time joins
  features.py           # deterministic feature construction
  learners/
    __init__.py
    ucb.py              # discounted UCB over finite arms
    thompson.py         # categorical Thompson sampling
    linucb.py           # contextual finite-arm learner
    priors.py           # MARAS/label/asset/side shrinkage priors
  guardrails.py         # hard range, freshness, confidence, drawdown gates
  advice.py             # advice payload builder + schema validation
  publisher.py          # Hazelcast publication
  audit.py              # ClickHouse writer facade and spool priority
  rewards.py            # delayed reward joining and opportunity cost
  replay/
    tape.py             # tape/path loading
    capital_curve.py    # recursive capital replay
    counterfactuals.py  # arm-level exit simulation
    walk_forward.py     # contiguous and moving-window validation
    reports.py          # JSON/CSV/Parquet artifact writers
  runner.py             # live shadow/advisory daemon
  worker.py             # offline subtasks
  cli.py                # ops commands and local replay entry points
  tests/
```

V1 module responsibilities:

| Module | Must do | Must not do |
|---|---|---|
| `specs.py` | validate ranges, modes, required sensors, output surfaces | import live trader code |
| `context.py` | build point-in-time snapshots with freshness masks | fill missing market data with fake zeros |
| `features.py` | compute deterministic feature vectors | read future outcome labels |
| `learners/*` | expose `choose`, `update`, `checkpoint`, `restore` | know about ADVSL internals |
| `guardrails.py` | enforce hard safety and fallback | optimize reward |
| `advice.py` | produce schema-valid advice payloads | publish directly to HZ |
| `publisher.py` | write HZ advice and heartbeat | mutate engine state |
| `rewards.py` | join decisions to realized/counterfactual outcomes | update policy without reward status |
| `replay/*` | reproduce capital-aware backtests | depend on live HZ |
| `runner.py` | run shadow loops and publish MHS payloads | run full replay inline |
| `worker.py` | run heavy calibration/replay jobs | publish live advice |

Minimum local commands:

```bash
python -m vibriss.cli validate-specs \
  --spec-dir /mnt/dolphin_training/vibriss/specs

python -m vibriss.cli replay \
  --param-set advsl.hold_substitute.v1 \
  --namespace blue \
  --from 2026-05-01 --to 2026-06-04 \
  --out /mnt/dolphin_training/vibriss/replays/manual

python -m vibriss.runner \
  --mode shadow \
  --namespace blue \
  --spec-dir /mnt/dolphin_training/vibriss/specs \
  --state-dir /mnt/dolphin_training/vibriss/checkpoints
```

Minimum test set:

| Test | Purpose |
|---|---|
| `test_spec_validation.py` | rejects invalid ranges, missing sensors, unsafe live policies. |
| `test_advice_schema.py` | validates HZ payloads and expiry/fallback fields. |
| `test_guardrails.py` | proves stale OBF/MARAS and drawdown alarms force fallback. |
| `test_replay_determinism.py` | same tape/spec/seed gives the same capital curve. |
| `test_opportunity_cost.py` | recovered cut trades charge missed upside. |
| `test_priority_spool.py` | high-priority decision/reward rows flush before diagnostics. |
| `test_mode_state_machine.py` | promotion is manual; demotion is automatic. |
| `test_no_live_actuation_default.py` | default env cannot make the engine consume advice. |

The first acceptance test is not "did it make more money in-sample?" It is
that:

1. the same historical decision can be replayed deterministically,
2. every recommended parameter has a valid spec and guardrail trail,
3. baseline fallback is used under stale/low-confidence context,
4. reward accounting includes clipped-winner opportunity cost,
5. the replayed capital curve is reproducible.

The first useful artifact is a replay bundle, not a daemon:

```text
replay_summary.json
capital_curve.csv
per_trade_counterfactuals.parquet
opportunity_cost_audit.parquet
maras_hash_hold_priors.parquet
obf_hold_binding_report.json
walk_forward_summary.json
```

Only after that bundle is reproducible should the shadow runner be started.
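Whether a bundle "is reproducible" can be checked mechanically. The following is a minimal sketch, assuming that two replay runs with the same tape, spec, and seed write their bundles to separate directories. It hashes each bundle file in the same `sha256:<hex>` form that `manifest.json` uses under `artifact_hashes`. A pair of bundles is accepted only when both are complete and byte-identical. `hash_bundle` and `bundles_match` are hypothetical helpers, not part of the planned `vibriss` package.

```python
import hashlib
from pathlib import Path

# File names from the replay bundle listed above.
BUNDLE_FILES = (
    "replay_summary.json",
    "capital_curve.csv",
    "per_trade_counterfactuals.parquet",
    "opportunity_cost_audit.parquet",
    "maras_hash_hold_priors.parquet",
    "obf_hold_binding_report.json",
    "walk_forward_summary.json",
)


def _sha256_file(path: Path) -> str:
    """Stream the file in 1 MiB chunks so large Parquet artifacts are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def hash_bundle(bundle_dir, names=BUNDLE_FILES):
    """Map each bundle file name to its "sha256:<hex>" digest, or None if it is missing."""
    hashes = {}
    for name in names:
        path = Path(bundle_dir) / name
        hashes[name] = _sha256_file(path) if path.is_file() else None
    return hashes


def bundles_match(dir_a, dir_b, names=BUNDLE_FILES):
    """True only if both bundles contain every file and all bytes are identical."""
    a = hash_bundle(dir_a, names)
    b = hash_bundle(dir_b, names)
    complete = None not in a.values() and None not in b.values()
    return complete and a == b
```

Byte-level equality is the strictest form of this check. If a writer embeds run-specific metadata such as a timestamp in a file, two runs with identical content will hash differently. That file would then need a content-level comparison instead.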