siloqy/prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md
Codex c3a18f693a docs: VIBRISS spec (+ §10.6 cascade/adaptive-TP paramsets), PINK accounting fix spec, BLUE incident docs
VIBRISS_PARAMETER_GOVERNANCE_SPEC §10.6: ob_cascade.count_threshold
(currently cascade_count>0 = ONE asset widens every TP x1.40),
tp_widen_factor, withdrawal_velocity_threshold as governance candidates;
adaptive/Dynamic-TP threshold marked fit for VIBRISS governance; TP_FLOOR
joint-policy reward requirement.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 15:04:15 +02:00



# VIBRISS Parameter Governance Spec
**Name**: VIBRISS — Variational Input-driven Bandit-Reactive Intelligent Sensing System
**Status**: Design doctrine / implementation target
**Scope**: BLUE/PINK parameter governance, initially shadow/advisory only
**Canonical dependency**: `SYSTEM_BIBLE_v7.md`
**Operational stance**: shadow-first, replay-first, guardrail-first. VIBRISS
must be useful even when it never gets permission to actuate live.
## 1. Purpose
VIBRISS is the engine's active parameter-sensing and adaptive execution layer.
Its job is to replace brittle hardcoded execution constants with bounded,
auditable, continuously re-evaluated parameter recommendations.
VIBRISS is not a new alpha model and not a full RL layer. It is an online
statistical parameter-governance system: observe outcomes, test safe candidate
values, score the realized response, retire weak settings, and keep enough
controlled exploration alive to detect drift.
The first intended target is exit-parameter governance, especially ADVSL and
fast/cubic TP parameters such as hold-bar limits, floor thresholds, pressure
thresholds, and TP posture. Later targets can include sizing haircuts, urgency,
asset-selection posture, and venue-specific execution parameters.
## 2. Design Stance
VIBRISS must be modular, spec-driven, replayable, and safety bounded.
Key doctrine:
- One learner per parameter spec by default.
- Bundle/slate learning only after interaction effects are repeatedly material.
- Contextual bandits first; full RL only later if decisions are truly sequential
and materially coupled across multiple execution steps.
- Discrete and bucketed parameters use Thompson Sampling, UCB, LinTS, or LinUCB.
- Continuous bounded scalars are discretized into safe buckets first.
- Nonstationary behavior uses discounted or sliding-window evidence plus drift
detection.
- Safety-critical parameters require baseline-safe exploration, confidence
thresholds, step limits, cooldowns, and hard guardrails.
- Passive fill and time-to-fill decisions should use survival-analysis modules
where censoring matters.
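Below is a minimal sketch of the drift-detection half of this doctrine: a Page-Hinkley test that watches bounded rewards for a downward shift in the mean. The class name and the `delta`/`threshold` defaults are illustrative assumptions, not calibrated VIBRISS settings. A maintained implementation such as River's `drift.PageHinkley` detector is what the worker would more likely use.

```python
from dataclasses import dataclass


@dataclass
class PageHinkleyDrop:
    """Page-Hinkley test for a downward shift in the mean of a reward stream.

    delta: tolerated drift per observation; threshold: alarm level (lambda).
    Both defaults are illustrative, not calibrated VIBRISS settings.
    """

    delta: float = 0.005
    threshold: float = 2.0
    n: int = 0
    mean: float = 0.0
    cum: float = 0.0      # cumulative deviation m_t
    cum_max: float = 0.0  # running maximum of m_t

    def update(self, x: float) -> bool:
        """Feed one reward; return True when a downward drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n   # incremental running mean
        self.cum += x - self.mean + self.delta  # falls when rewards drop below the mean
        self.cum_max = max(self.cum_max, self.cum)
        if self.cum_max - self.cum > self.threshold:
            self.reset()
            return True
        return False

    def reset(self) -> None:
        self.n, self.mean, self.cum, self.cum_max = 0, 0.0, 0.0, 0.0
```

The detector resets after an alarm. In the freeze-or-shrink posture, the caller would then shrink the affected ParamSet toward baseline rather than chase the new regime.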
## 3. System Boundary
VIBRISS must not silently mutate engine internals.
The correct production shape is:
```text
context ingestion
-> admissible candidate generation
-> learner scoring
-> guardrail filter
-> action selection
-> advice publication
-> allowed engine consumption point
-> delayed outcome capture
-> reward mapping
-> online update
```
The hot execution path consumes advice only at documented decision points. The
learner/update path is separate and may lag. If advice is stale, low-confidence,
or invalid, the engine falls back to the baseline parameter.
BLUE is in-memory/paper and not BingX-enabled. PINK is the BingX venue-facing
world. VIBRISS may govern both, but its output contract must be namespace-aware
and must not assume that BLUE has exchange state.
Non-goals:
- VIBRISS does not pick assets.
- VIBRISS does not replace MARAS, OBF, V7, ACB, EFSM, or SurvivalStack.
- VIBRISS does not own exchange reconciliation.
- VIBRISS does not rewrite frozen champion configs.
- VIBRISS does not turn offline backtest winners into live settings without
a shadow/OPE/promotion path.
Its only authority is to publish bounded, versioned parameter advice and to
learn from the outcome trail.
## 4. Terminology
| Term | Meaning |
|---|---|
| `vibrissa` | One probe-trade, parameter test, or market feeler. |
| `vibrissae` | The active parameter-probe array. |
| `parameter spec` | Loadable contract defining one tunable parameter. |
| `arm` | One candidate value or execution configuration. |
| `reward` | Bounded realized execution-quality score. |
| `posture` | Current preferred parameter set plus confidence and fallback metadata. |
| `baseline` | The currently trusted hardcoded or documented production value. |
## 4.1 Control-Plane Elegance Constraints
VIBRISS must remain a disciplined parameter-governance control plane, not an
unbounded mesh of subsystems mutating each other. Adaptive behavior is allowed
only when it preserves ownership, auditability, and bounded actuation.
Hard architecture rules:
1. One writer per parameter.
   - A live parameter may have many sensors and many context inputs, but only
     one ParamSet is allowed to publish the effective value for that parameter
     in a given namespace.
2. ParamSpecs and ParamSetSpecs own promotion rules.
   - Promotion cadence, evidence gates, rollback rules, manual-approval
     requirements, and replacement rhythm are part of the spec. The runner must
     execute declared policy, not invent policy.
3. Meta-cadence is itself a parameter, but only at a slower cadence.
   - VIBRISS may tune replay cadence, promotion-review cadence, checkpoint
     cadence, or reward-join cadence, but those meta-parameters must move more
     slowly than the governed trading/execution parameter and must have
     stronger guardrails.
4. EsoF, ExoF, MARAS, OBF, V7, MHS, and drawdown state are context inputs, not
   arbitrary controllers.
   - They may influence candidate scoring, confidence, demotion, or fallback,
     but they must not directly mutate live parameters outside the owning
     ParamSet.
5. Every live change must be reproducible.
   - Log candidate set, chosen action, action probability or confidence,
     context hash, reward mapping, model version, compiled config hash,
     fallback reason, promotion state, and rollback path.
6. No hidden cross-subsystem mutation.
   - If one subsystem changes another subsystem's effective behavior, the change
     must appear as a typed ParamSet advice event and an audited engine-consumed
     posture update.
7. Shadow first, replay/OPE second, canary third, live last.
   - No safety-critical parameter may skip directly from idea or in-sample
     replay to live actuation. Live promotion requires held-out evidence,
     shadow logging, explicit approval when required, and automatic demotion
     conditions.
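Rule 1 is easiest to enforce mechanically at registration time. The sketch below assumes a hypothetical `WriterRegistry` keyed by `(namespace, parameter)`. A ParamSet may re-claim its own parameter, and a parameter may have different owners in different namespaces, but a second writer in the same namespace is rejected outright.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class OwnershipConflict(RuntimeError):
    """Raised when a second ParamSet tries to own an already-owned parameter."""


@dataclass
class WriterRegistry:
    """Hypothetical registry enforcing one writer per (namespace, parameter)."""

    _owners: dict[tuple[str, str], str] = field(default_factory=dict)

    def claim(self, namespace: str, param: str, paramset_id: str) -> None:
        key = (namespace, param)
        owner = self._owners.get(key)
        if owner is not None and owner != paramset_id:
            raise OwnershipConflict(
                f"{param} in {namespace} already owned by {owner}; "
                f"{paramset_id} may only contribute context"
            )
        self._owners[key] = paramset_id  # idempotent for the existing owner

    def owner(self, namespace: str, param: str) -> str | None:
        return self._owners.get((namespace, param))
```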
These constraints are mandatory for all future ADVSL, TP, DVOL/VOL, IRP,
asset-picker, EFSM/overlay, and meta-cadence ParamSets. If a design violates
them, the design is considered tangled and must be simplified before
implementation.
## 5. Parameter Spec Contract
Each adaptive parameter must be declared by a loadable spec. VIBRISS should not
hardcode knowledge of individual parameters.
Important terminology:
- `ParamSetSpec`: the loadable contract for a family of related parameters.
- `paramset_config`: configuration that applies to the ParamSet as a whole.
- `params`: the parameter declarations contained by the ParamSet.
- `param_defaults`: defaults inherited by every parameter in `params`.
- per-param override: a field inside one `params.<param_name>` entry that
overrides `param_defaults` for that parameter only.
The live runner must not perform complex inheritance during scoring. Specs are
authored in a rich hierarchical form, validated, compiled, and hash-stamped into
a flat canonical policy document before the runner consumes them.
Required fields:
```yaml
identity:
  name: advsl.overlay_min_hold_bars
  type: integer
  units: bars
  default: 6
domain:
  candidates: [4, 6, 8, 10, 12, 16, 20]
  hard_min: 0
  hard_max: 40
safety:
  fallback_baseline: 6
  max_step_change: 4
  cooldown_trades: 5
  min_shadow_samples: 100
  min_live_confidence: 0.80
  max_exploration_rate: 0.05
placement:
  consumer: advanced_sl
  decision_point: open_trade_exit_evaluation
  namespace: blue
live_change_policy:
  mode: between_trades
  allow_intratrade_change: false
candidate_policy:
  learner: linucb
  nonstationarity: sliding_window
  window_trades: 300
success:
  primary_metric: capital_curve_delta_after_cost
  secondary_metrics:
    - clipped_winner_cost
    - saved_loss
    - drawdown_delta
    - recovery_lag
inputs:
  - maras_latest
  - v7_decision_events
  - advanced_sl_monitor_latest
  - obf_universe_latest
  - eigen_scan
  - trade_path
reward_mapping:
  bounded_range: [-1.0, 1.0]
  delayed_until: trade_close_or_counterfactual_terminal
  components:
    saved_loss: +1.0
    missed_profit: -1.5
    drawdown_reduction: +0.5
    tail_loss: -2.0
promotion_policy:
  owner: param_set
  technique: replay_shadow_canary
  review_cadence_s: 900
  min_replay_trades: 300
  min_shadow_decisions: 200
  min_realized_rewards: 50
  min_contiguous_regions: 4
  required_evidence:
    recursive_capital_curve_delta_after_cost: "> 0"
    worst_region_delta: ">= configured_floor"
    clipped_winner_cost: "<= configured_budget"
    drawdown_delta: "<= 0"
  allowed_transitions:
    - disabled_to_shadow
    - shadow_to_advisory
    - advisory_to_canary_live
    - canary_live_to_controlled_live
  manual_approval_required:
    - advisory_to_canary_live
    - canary_live_to_controlled_live
  automatic_demotion_on:
    - stale_required_sensor
    - reward_drift
    - drawdown_alarm
    - invalid_checkpoint
meta_cadence_policy:
  owner: param_set
  status: shadow_first
  tunable_cadences:
    calibration_interval_s: [300, 900, 1800, 3600]
    promotion_review_interval_s: [900, 1800, 3600, 7200]
    checkpoint_interval_s: [30, 60, 120, 300]
    shadow_to_canary_cooldown_trades: [25, 50, 100, 200]
  context_inputs:
    - maras_latest
    - exof_latest
    - esof_latest
    - mhs_latest
    - reward_backlog
    - drawdown_state
  success:
    primary_metric: policy_stability_adjusted_reward
    secondary_metrics:
      - stale_advice_rate
      - promotion_false_positive_rate
      - missed_adaptation_cost
      - operator_churn
      - compute_cost
  live_change_policy:
    calibration_cadence: controlled_after_shadow
    promotion_cadence: advisory_only_until_explicit_approval
outputs:
  hz_key: DOLPHIN_FEATURES.vibriss_param_advice
  clickhouse_table: dolphin.vibriss_decisions
  state_table: dolphin.vibriss_policy_state
```
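Part of the CI validation step can be illustrated against the example spec above, parsed into a dict. The sketch is hedged: `validate_param_spec` is hypothetical, and it covers only a few invariants implied by the example:

- candidates are unique, ascending, and inside the hard bounds;
- the default is an admissible candidate;
- the fallback lies inside the hard bounds;
- the rates and the reward range are well-formed.

A production check would be generated from the JSON Schema/CUE contract described in Section 5.2.

```python
def validate_param_spec(spec: dict) -> list[str]:
    """Return human-readable violations; an empty list means the spec passes."""
    errors: list[str] = []
    dom, safety = spec["domain"], spec["safety"]
    lo, hi = dom["hard_min"], dom["hard_max"]
    cands = dom["candidates"]
    if lo > hi:
        errors.append("domain.hard_min exceeds domain.hard_max")
    if cands != sorted(set(cands)):
        errors.append("domain.candidates must be unique and ascending")
    if any(c < lo or c > hi for c in cands):
        errors.append("domain.candidates outside [hard_min, hard_max]")
    if spec["identity"]["default"] not in cands:
        errors.append("identity.default is not an admissible candidate")
    if not lo <= safety["fallback_baseline"] <= hi:
        errors.append("safety.fallback_baseline outside [hard_min, hard_max]")
    if not 0.0 <= safety["max_exploration_rate"] <= 1.0:
        errors.append("safety.max_exploration_rate must lie in [0, 1]")
    if not 0.0 < safety["min_live_confidence"] <= 1.0:
        errors.append("safety.min_live_confidence must lie in (0, 1]")
    r_lo, r_hi = spec["reward_mapping"]["bounded_range"]
    if r_lo >= r_hi:
        errors.append("reward_mapping.bounded_range must be increasing")
    return errors
```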
### 5.1 ParamSet Config and Per-Parameter Overrides
The canonical authoring shape is:
```yaml
param_set:
  id: advsl.hold_substitute.v1
  version: 1.0.0
  namespace_default: blue
  status: shadow_first
paramset_config:
  consumer: advanced_sl
  decision_family: exit_risk_timing
  placement:
    decision_point: trade_entry
    live_replacement_rhythm: capture_on_entry
  promotion_policy:
    technique: replay_shadow_canary
    review_cadence_s: 1800
  meta_cadence_policy:
    status: shadow_first
  outputs:
    hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice
    decision_table: dolphin.vibriss_decisions
    reward_table: dolphin.vibriss_rewards
param_defaults:
  learner:
    type: discounted_ucb
    nonstationarity: sliding_window
    window_trades: 300
  safety:
    fallback_baseline: 12
    min_shadow_samples: 200
    min_live_confidence: 0.80
    max_exploration_rate: 0.0
  reward_mapping:
    bounded_range: [-1.0, 1.0]
    primary_metric: recursive_capital_curve_delta_after_cost
  guardrails:
    stale_sensor_policy: shrink_to_baseline
    drawdown_alarm_policy: freeze_to_baseline
params:
  advsl.min_hold_bars_before_floor_arm:
    type: integer
    units: bars
    domain:
      candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]
      hard_min: 0
      hard_max: 48
    default: 12
    baseline_reference: 20
  advsl.recovery_extension_max_bars:
    type: integer
    units: bars
    domain:
      candidates: [0, 4, 8, 12, 20, 34]
      hard_min: 0
      hard_max: 40
    default: 0
    learner:
      type: shadow_only_discounted_ucb
    safety:
      min_shadow_samples: 500
      min_live_confidence: 0.90
```
Merge precedence:
```text
compiled_param =
      built_in_schema_defaults
    < paramset_config
    < param_defaults
    < params.<param_name>
    < namespace/runtime override if explicitly allowed by spec
```
Rules:
- ParamSet-wide promotion and meta-cadence policy live in `paramset_config`
unless a parameter explicitly overrides a narrower field.
- Per-param overrides may tighten safety, narrow domains, increase sample
requirements, or change learner type only if the ParamSet allows it.
- Per-param overrides may not weaken global catastrophic guardrails.
- The compiler must emit both the original source spec hash and the compiled
canonical hash.
- The runner consumes only the compiled canonical form.
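Below is a sketch of the merge order and the no-weakening rule, assuming each layer has already been loaded as a nested dict. The layer names, `deep_merge`, `compile_param`, and the choice of which safety fields count as tighten-only floors are all hypothetical. The sketch also merges the runtime-override layer unconditionally, whereas the spec only admits it when explicitly allowed.

```python
from copy import deepcopy
from typing import Any

# Layers from lowest to highest precedence, mirroring the order above.
LAYER_ORDER = ("built_in_schema_defaults", "paramset_config", "param_defaults",
               "param_override", "runtime_override")

# Hypothetical: inherited safety floors that a per-param override may raise but never lower.
TIGHTEN_ONLY_FLOORS = ("min_shadow_samples", "min_live_confidence")


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a merged copy: nested dicts merge recursively, scalars and lists replace."""
    out = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


def compile_param(layers: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge the layers in precedence order, then reject weakened safety floors."""
    merged: dict[str, Any] = {}
    for layer in LAYER_ORDER:
        merged = deep_merge(merged, layers.get(layer, {}))
    floors = layers.get("param_defaults", {}).get("safety", {})
    safety = merged.get("safety", {})
    for name in TIGHTEN_ONLY_FLOORS:
        if name in floors and safety.get(name, floors[name]) < floors[name]:
            raise ValueError(f"override weakens safety.{name}")
    return merged
```

Because the check runs at compile time, the runner never loads a ParamSet whose override lowered an inherited floor.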
### 5.2 Spec Compiler and Validation Library
Use an existing platform-agnostic schema/config tool for the authoring layer.
Do not invent a bespoke inheritance language.
Recommended stance:
| Need | Recommended tool | Runtime placement |
|---|---|---|
| Cross-language schema contract | JSON Schema | CI, compiler, runner validation. |
| Rich defaults, constraints, unification, inheritance-like config | CUE | Spec compiler / CI, not hot path. |
| Human-friendly authoring | YAML | Source only; compiled immediately. |
| Runner consumption | canonical JSON | Hot path. |
| Fast internal representation | dataclass / Pydantic / msgspec-style object | Runner load time only. |
VIBRISS should prefer:
```text
YAML authoring -> CUE/JSON-Schema validation -> canonical JSON -> runner cache
```
The live runner should never parse CUE, run template expansion, or resolve a
large inheritance tree during an advice decision. It should load a precompiled
canonical JSON document, verify hashes and schema version, then use direct field
access.
Performance requirements:
- spec compile can be slower because it is CI/worker time;
- runner spec load should be bounded and rare;
- advice scoring must use already-merged values;
- every compiled ParamSet must include a deterministic `compiled_config_hash`;
- all advice/audit rows must log `spec_hash` and `compiled_config_hash`.
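One way to make `compiled_config_hash` deterministic is to hash a canonical JSON serialization. The sketch below sorts keys, strips insignificant whitespace, and rejects NaN. That only approximates canonical JSON; a stricter implementation might adopt RFC 8785 (JSON Canonicalization Scheme) so that number formatting is pinned down too. The `sha256:` prefix is a naming assumption.

```python
import hashlib
import json


def canonical_json(doc: dict) -> str:
    """Sorted keys, no insignificant whitespace, NaN/Infinity rejected."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)


def compiled_config_hash(doc: dict) -> str:
    """Hex SHA-256 over the canonical form of a compiled ParamSet."""
    digest = hashlib.sha256(canonical_json(doc).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Applying the same function to the parsed source spec would yield `spec_hash`, so both audit hashes could come from one code path.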
## 6. Candidate Algorithms
V1 should support a small set of algorithms well, rather than a broad library
surface poorly.
Recommended V1 learners:
| Parameter type | Default learner | Notes |
|---|---|---|
| Small categorical | Thompson Sampling | Useful for urgency, route, retry, fixed mode selection. |
| Ordered discrete scalar | UCB or discounted UCB | Good for hold bars, TP buckets, pressure thresholds. |
| Contextual finite arms | LinUCB or LinTS | First choice for MARAS/OBF/V7-conditioned advice. |
| Continuous scalar | Adaptive discretization | Start bucketed; upgrade only if buckets are too coarse. |
| Passive fill/delay | Survival model | Explicitly handle censored fill and recovery windows. |
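For the ordered-discrete row, a minimal discounted UCB looks like the sketch below. It assumes rewards are already mapped into `[-1, 1]` (Section 7), and the `gamma` and `c` values are illustrative. Every count and reward sum is multiplied by `gamma` on each update, so evidence older than roughly `1 / (1 - gamma)` trades fades out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DiscountedUCB:
    """Discounted UCB over a small finite arm set (e.g. hold-bar buckets)."""

    arms: list[int]
    gamma: float = 0.99  # forgetting factor; < 1 lets the learner track drift
    c: float = 1.0       # exploration weight
    counts: dict[int, float] = field(init=False, default_factory=dict)
    sums: dict[int, float] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        self.counts = {a: 0.0 for a in self.arms}
        self.sums = {a: 0.0 for a in self.arms}

    def select(self) -> int:
        for a in self.arms:  # play every arm once before trusting the index
            if self.counts[a] == 0.0:
                return a
        total = sum(self.counts.values())

        def index(a: int) -> float:
            mean = self.sums[a] / self.counts[a]
            bonus = math.sqrt(max(math.log(total), 0.0) / self.counts[a])
            return mean + self.c * bonus

        return max(self.arms, key=index)

    def update(self, arm: int, reward: float) -> None:
        for a in self.arms:  # discount all past evidence, then add the new observation
            self.counts[a] *= self.gamma
            self.sums[a] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward
```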
Useful libraries to inspect:
- Vowpal Wabbit for contextual bandits, logged propensities, and OPE.
- River for streaming statistics, online GLMs, and drift detection.
- Open Bandit Pipeline for offline policy evaluation.
- MABWiser for fast Python prototype comparison.
- lifelines or statsmodels for survival analysis.
- NumPyro/Pyro only when hierarchical Bayesian pooling is justified.
### 6.1 Dependency Placement and Reliability Policy
VIBRISS must distinguish algorithm research from live parameter governance.
Performance and reliability are more important than using the most general
library in the first live version.
Dependency rule:
- The live runner should have a small deterministic dependency surface.
- Heavy learning, OPE, simulation, Bayesian inference, and broad model
comparison belong in `vibriss_worker` or offline jobs.
- The engine consumes compact checkpointed policy state and advice payloads. It
must not shell out to a learner or wait on an offline library.
- ClickHouse writes, model updates, and replay jobs must never block the hot
advice publication loop.
- If a dependency is not needed to score the current checkpointed policy, it is
not a live-runner dependency.
Recommended V1 split:
| Layer | Allowed dependency posture | Reason |
|---|---|---|
| Engine hot path | no VIBRISS learner dependency | Engine reads validated advice only. |
| `vibriss_runner` | stdlib + NumPy/Pandas only if needed; optional River subset for drift/stats | Keep startup, memory, and failure modes bounded. |
| `vibriss_worker` | VW, River, OBP, MABWiser, lifelines, statsmodels, contextual libraries | Calibration, OPE, replay, walk-forward, and report generation. |
| Research/simulation | ABIDES, Pyro/NumPyro, CATX, experimental packages | Valuable, but not part of the live critical path. |
### 6.2 Library Decision Matrix
| Library / stack | VIBRISS use | Placement | Decision |
|---|---|---|---|
| Internal UCB/TS/LinUCB | First production learners for bounded discrete arms. | runner + worker | Use first; easiest to audit and checkpoint. |
| Vowpal Wabbit | Contextual bandit benchmark, action-dependent features, OPE workflows, possible future compact policy generator. | worker/offline | Approved for evaluation; not a V1 hot-path dependency. |
| River | Streaming stats, reward normalization, ADWIN/Page-Hinkley/KSWIN-style drift detection, progressive validation. | runner optional; worker default | Approved, but keep live usage narrow. |
| Open Bandit Pipeline | OPE estimator benchmarking and logged-bandit evaluation. | offline/worker | Approved for reports; not live. |
| MABWiser | Fast Python comparison of TS/UCB/LinTS/LinUCB policies. | offline/worker | Approved for prototyping; not live. |
| lifelines / statsmodels | Survival models, recursive diagnostics, stability checks. | worker/offline | Approved for passive fill/recovery modeling. |
| contextualbandits | Alternative contextual-bandit benchmark implementations. | offline/worker | Research benchmark only. |
| SMPyBandits / BanditPylib / PyBandits | Algorithm comparison and stochastic-bandit sandboxing. | offline/research | Optional; do not add to live image. |
| NumPyro / Pyro | Hierarchical Bayesian pooling for sparse per-symbol/per-hash modules. | research/worker | Defer until sparse-data pooling is clearly needed. |
| CATX | Continuous-action contextual bandit research. | research | Defer; bucketed actions first. |
| ABIDES / ABIDES-Gym | Market-interactive simulation and stress rehearsal. | research/simulation | Useful later; too heavy for V1 runner. |
| Kafka / Flink | Durable event-stream backbone and stateful stream processing. | future infra | Defer; Dolphin already has Hazelcast + ClickHouse + supervisord. |
| scikit-multiflow | Historical stream-learning reference. | none | Do not use for net-new code; prefer River. |
| banditml | Architectural reference for production bandit services. | research only | Do not depend on it without a fresh maintenance review. |
### 6.3 Performance Budgets
Initial budgets for the live runner:
| Operation | Target | Hard behavior on miss |
|---|---:|---|
| Score one ParamSet advice snapshot | `p95 <= 10 ms` | publish fallback or previous checkpoint. |
| Full live advice loop over enabled ParamSets | `p95 <= 50 ms` | skip noncritical ParamSets first. |
| Hazelcast publish | nonblocking best effort | mark advice degraded if publish fails. |
| ClickHouse audit write | never blocks advice | spool locally and expose backlog. |
| Runner startup with warm checkpoint | `<5 s` target | publish no advice until checkpoint valid. |
| Memory footprint | bounded and observable | disable worker-style models in runner. |
Candidate sets must stay small. For `advsl.hold_substitute.v1`, a dozen finite
hold-bar arms is acceptable; hundreds of arms are not. Continuous-action
learners are disallowed in live V1 because they make bounded behavior harder to
audit and harder to replay exactly.
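Below is a sketch of the "hard behavior on miss" column for a single scoring call. Python cannot pre-empt a running function, so this version only detects a late result and refuses to publish it, falling back instead. It also checks every call against the 10 ms figure, which is stricter than the p95 target in the table. `score_with_budget` is a hypothetical helper.

```python
from __future__ import annotations

import time
from typing import Callable

SCORE_BUDGET_S = 0.010  # 10 ms, taken from the per-snapshot row above


def score_with_budget(score: Callable[[], int], fallback: int,
                      budget_s: float = SCORE_BUDGET_S) -> tuple[int, str | None]:
    """Run one scoring call; return (value_to_publish, fallback_reason)."""
    start = time.perf_counter()
    try:
        value = score()
    except Exception as exc:  # any learner failure degrades to the fallback
        return fallback, f"score_error:{type(exc).__name__}"
    elapsed = time.perf_counter() - start
    if elapsed > budget_s:
        # The late result is discarded rather than published.
        return fallback, f"budget_miss:{elapsed * 1000:.1f}ms"
    return value, None
```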
### 6.4 Algorithm Defaults by Parameter Class
Concrete defaults:
| Parameter situation | Default | Upgrade path | Notes |
|---|---|---|---|
| Small finite categorical, weak context | Thompson Sampling or UCB1 | discounted UCB if drift appears | Use for mode, urgency, route, retry-like knobs. |
| Ordered discrete scalar | discounted UCB with monotone/smoothness diagnostics | contextual finite-arm learner | Good first fit for hold bars and TP buckets. |
| Finite arms with rich context | LinUCB or LinTS | GLM-UCB/GLM-TS if reward shape demands it | Use MARAS/OBF/V7/EFSM context. |
| Continuous bounded scalar | adaptive discretization | continuous-action contextual bandit only after bucket failure | Prefer auditability over fine resolution. |
| Coupled parameter bundle | small safe bundle catalog | slate/combinatorial learner only if interaction is proven | Avoid action-space explosion. |
| Nonstationary regime | discounted/sliding-window learner + drift detector | replay-reset logic | Freeze or shrink on drift; do not blindly chase. |
| Safety/budget constrained parameter | baseline-safe gating around the learner | conservative contextual bandit / budgeted bandit | Guardrails must dominate learner output. |
| Passive fill or recovery delay | survival model | richer survival only after classical model stability | Treat censoring explicitly. |
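For the "finite arms with rich context" row, a disjoint LinUCB keeps one ridge-regression model per arm. It adds an exploration bonus proportional to the context's uncertainty under that arm. This is a minimal NumPy sketch; it assumes the MARAS/OBF/V7/EFSM context is already encoded as a short, normalized feature vector, and `alpha` is illustrative.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per finite arm."""

    def __init__(self, arms: list[int], n_features: int, alpha: float = 1.0) -> None:
        self.arms = list(arms)
        self.alpha = alpha
        self.A = {a: np.eye(n_features) for a in self.arms}    # I + sum of x x^T
        self.b = {a: np.zeros(n_features) for a in self.arms}  # sum of reward * x

    def scores(self, x: np.ndarray) -> dict[int, float]:
        out = {}
        for a in self.arms:
            A_inv = np.linalg.inv(self.A[a])  # fine for a handful of small arms
            theta = A_inv @ self.b[a]         # ridge estimate for this arm
            out[a] = float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return out

    def select(self, x: np.ndarray) -> int:
        s = self.scores(x)
        return max(self.arms, key=s.__getitem__)

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```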
### 6.5 Explicit Deferrals
VIBRISS V1 should not attempt:
- full RL;
- continuous-action live control;
- live probe trades by default;
- Kafka/Flink migration;
- ABIDES-in-the-loop production scoring;
- hierarchical Bayesian pooling in the runner;
- joint optimization of many parameters before single-ParamSet evidence exists.
These are not rejected ideas. They are deferred because the current bottleneck is
reliable evidence collection, replay/OPE discipline, and safe advice
publication.
## 7. Reward Design
Rewards must be decomposed, bounded, and auditable. Store both raw components
and normalized reward.
Typical reward components:
- positive: saved loss, lower drawdown, better realized terminal PnL, better
capital compounding trajectory, successful recovery without excess hold.
- negative: clipped winner, missed TP, extra adverse selection, slippage, timeout,
excessive hold, larger tail loss, oscillation, stale-data actuation.
For ADVSL/TP research, the primary reward should be capital-curve delta after
opportunity cost, not terminal trade PnL alone. A rule that saves losses but
systematically clips larger winners must be penalized accordingly.
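Below is a sketch of storing raw components next to the bounded reward, reusing the component weights from the `reward_mapping` example in Section 5. The normalizing `scale` (for example, capital at risk) is an assumption, since the spec does not yet fix how raw components are put on a common unit. Components are logged as non-negative magnitudes, and the sign comes from the weight.

```python
from dataclasses import dataclass

# Weights copied from the example reward_mapping.components block in Section 5.
WEIGHTS = {"saved_loss": 1.0, "missed_profit": -1.5,
           "drawdown_reduction": 0.5, "tail_loss": -2.0}


@dataclass(frozen=True)
class RewardRecord:
    raw: dict[str, float]  # per-component magnitudes, kept for audit
    weighted_sum: float    # unclipped normalized score
    reward: float          # clipped into the bounded range


def map_reward(raw: dict[str, float], scale: float,
               bounded_range: tuple[float, float] = (-1.0, 1.0)) -> RewardRecord:
    """Weight raw components, divide by an assumed normalizing scale, then clip."""
    unknown = set(raw) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unmapped reward components: {sorted(unknown)}")
    total = sum(WEIGHTS[k] * v for k, v in raw.items()) / scale
    lo, hi = bounded_range
    return RewardRecord(dict(raw), total, min(hi, max(lo, total)))
```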
## 8. Required Audit Logging
Every VIBRISS decision must be replayable.
Minimum decision log fields:
- timestamp and scan number
- namespace: blue, pink, prodgreen, research
- parameter spec id and version
- context snapshot hash
- MARAS regime, scalar hash, composite hash when available
- candidate set
- chosen arm
- action probability or confidence
- baseline value
- guardrail decisions and fallback reason
- model version
- advice publication timestamp
- engine consumption timestamp, if consumed
- delayed reward components
- terminal reward
- policy update version
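The minimum field list maps naturally onto a typed row. The dataclass below is an illustrative shape, not a fixed ClickHouse schema. The field names are assumptions layered on top of the list above, as are the checks that the chosen arm comes from the logged candidate set and that the probability lies in `[0, 1]`. Delayed fields start empty and are filled later by the reward join.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

NAMESPACES = {"blue", "pink", "prodgreen", "research"}


@dataclass
class DecisionLog:
    """One replayable VIBRISS decision row (field names are illustrative)."""

    ts: str
    scan_number: int
    namespace: str
    spec_id: str
    spec_version: str
    context_hash: str
    candidate_set: list[int]
    chosen_arm: int
    action_probability: float
    baseline_value: int
    guardrail_status: str
    model_version: str
    advice_published_at: str
    fallback_reason: str | None = None
    maras_regime: str | None = None
    maras_scalar_hash: str | None = None
    maras_composite_hash: str | None = None
    engine_consumed_at: str | None = None  # set only if the engine consumed it
    reward_components: dict[str, float] = field(default_factory=dict)  # delayed
    terminal_reward: float | None = None    # delayed
    policy_update_version: str | None = None

    def __post_init__(self) -> None:
        if self.namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace {self.namespace!r}")
        if self.chosen_arm not in self.candidate_set:
            raise ValueError("chosen arm must come from the logged candidate set")
        if not 0.0 <= self.action_probability <= 1.0:
            raise ValueError("action_probability must lie in [0, 1]")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```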
## 9. Control-Plane Output
VIBRISS publishes advice, not imperative mutations.
Recommended HZ shape:
```json
{
  "schema": "vibriss.param_advice.v1",
  "namespace": "blue",
  "ts": "2026-06-03T00:00:00Z",
  "spec_id": "advsl.overlay_min_hold_bars",
  "spec_version": "1.0.0",
  "baseline_value": 6,
  "recommended_value": 12,
  "confidence": 0.82,
  "candidate_set": [4, 6, 8, 10, 12, 16, 20],
  "context_hash": "maras:57957|asset:XLMUSDT|side:LONG",
  "learner": "linucb",
  "guardrail_status": "PASS",
  "fallback_reason": null,
  "expires_at": "2026-06-03T00:05:00Z"
}
```
Consumption rule: the engine may consume this only if the parameter spec says
the current state is an allowed change point and all guardrails pass. Otherwise
the baseline remains in force.
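The consumption rule can be read as a chain of checks in which any doubt resolves to the baseline. This sketch uses the field names from the JSON example. `effective_value`, its parameters, and the extra namespace and candidate-set checks are hypothetical. It returns a reason string so the fallback can be logged as `fallback_reason`.

```python
from datetime import datetime, timezone

ADVICE_SCHEMA = "vibriss.param_advice.v1"


def effective_value(advice: dict, baseline: int, now: datetime, namespace: str,
                    allowed_change_point: bool, min_confidence: float) -> tuple[int, str]:
    """Return (value the engine should use, reason); any doubt keeps the baseline."""
    if not allowed_change_point:
        return baseline, "not_a_change_point"
    if advice.get("schema") != ADVICE_SCHEMA:
        return baseline, "unknown_schema"
    if advice.get("namespace") != namespace:
        return baseline, "namespace_mismatch"
    if advice.get("guardrail_status") != "PASS":
        return baseline, "guardrail_not_pass"
    expires_raw = advice.get("expires_at")
    if not expires_raw:
        return baseline, "missing_expiry"
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    expires = datetime.fromisoformat(expires_raw.replace("Z", "+00:00"))
    if now >= expires:
        return baseline, "stale_advice"
    if advice.get("confidence", 0.0) < min_confidence:
        return baseline, "low_confidence"
    value = advice.get("recommended_value")
    if value not in advice.get("candidate_set", []):
        return baseline, "value_outside_candidate_set"
    return value, "advice_consumed"
```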
## 10. Initial VIBRISS Targets
### 10.1 Conditional Fast TP
First replay-backed target:
- `fast_tp.tp_pct`
- `fast_tp.bars_held_min`
- `fast_tp.exit_pressure_min`
- `fast_tp.mfe_decay_min`
- `fast_tp.pnl_mfe_frac_max`
Current evidence says blanket first-touch `0.20%` TP clips too many winners, but
conditional fast TP is net positive in both full corpus and capital-known BLUE
subset. The first VIBRISS job is to turn those calibrated constants into a
shadow policy with logged propensities and OOS replay.
This TP percentage is a prime VIBRISS assistance target. Treat it as a
first-class tunable rather than a frozen constant once replay coverage is
sufficient.
Open research note:
- investigate whether the `0.20%` TP should be risk-normalized by notional
risked, using a monotone nonlinearity such as a cubic retract/expansion curve;
- the candidate question is whether high-notional or high-leverage trades should
have a proportionally different TP posture, while keeping the first-touch
semantics intact for replay accounting;
- if tested, this must be evaluated with full capital-curve compounding and
opportunity cost, not just raw win-rate or per-trade PnL.
#### 10.1.1 Re-entry-Conditioned Fast TP
Same-asset reentries after a profitable exit are a separate research bucket.
They should not inherit the exact same fast-TP posture as a first-entry trade
without evidence. In current BLUE history, same-asset reentries after wins are
usually profitable, but the average second-leg move is smaller than the initial
leg, which means a lower TP multiplier may preserve geometry better than a blunt
`2.0x` repeat.
Recommended candidate arms:
- `fast_tp.reentry_tp_multiplier = 1.2`
- `fast_tp.reentry_tp_multiplier = 1.5`
- `fast_tp.reentry_tp_multiplier = 2.0`
Interpretation:
- first-entry trades keep the baseline conditional fast TP
- re-entry-after-win trades may use a smaller multiplier band
- re-entry-after-loss trades should remain a separate bucket and may need a
slower TP or stronger confirmation, not just a smaller multiplier
- a mild nonlinear / cubic trim on re-entry is a valid shadow-only follow-up
candidate, but only after the flat multiplier band has been replayed first
Ownership rule:
- VIBRISS should learn and score the candidate multiplier in shadow replay
- EFSM should own live application if the runtime ever consumes the bucket
- do not flatten the geometric ROI curve by forcing a single multiplier on all
reentries
#### 10.1.2 TP Near-Miss Replay
The TP research set must include a distinct near-miss population:
- trades that came within a small epsilon of the candidate TP but did not
satisfy the live trigger on the observed cadence
- trades that briefly exceeded the candidate TP and then reversed before the
engine observed the touch
- trades that later stopped out after first-touch proximity, because those are
the exact counterexamples needed to learn whether a lower TP bucket would
have been better
This bucket is mandatory because a corpus dominated by profitable TP closes is
survivorship-biased. A learner trained only on winners can learn that the
current TP is "usually profitable" while remaining blind to the trades where a
slightly lower TP would have caught the move and prevented a later stop-loss.
Required replay semantics:
- use first-touch TP labels, not close-only labels
- keep near-miss candidates separate from clean TP hits
- score each candidate by recursive capital-curve delta after opportunity cost
- preserve scan-cadence effects when the live engine is scan-driven
Primary use:
- learn whether a tighter TP bucket is justified for specific regimes, assets,
or reentry conditions
- quantify the opportunity cost of the missed touch itself, not just the later
realized close
- explain repeated "why did this one not TP?" incidents without overfitting to
already-winning trades
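Below is a sketch of first-touch labelling for a LONG trade that keeps near-misses separate from clean hits. `label_first_touch`, `eps_pct`, and the per-scan `highs` input are hypothetical. Running it once on intrabar highs and once on the scan-cadence prices the engine actually saw would isolate touches the engine never observed. The later stop-out join for the third population is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TouchLabel:
    kind: str         # "tp_hit", "near_miss", or "no_touch"
    bar: int | None   # first bar at which the label was decided
    best_move: float  # best favorable move seen, as a fraction of entry


def label_first_touch(entry: float, highs: list[float], tp_pct: float,
                      eps_pct: float) -> TouchLabel:
    """First-touch TP labelling for a LONG path; tp_pct and eps_pct are fractions."""
    best, near_bar = 0.0, None
    for i, high in enumerate(highs):
        move = high / entry - 1.0
        if move >= tp_pct:
            return TouchLabel("tp_hit", i, max(best, move))  # first touch wins
        if move >= tp_pct - eps_pct and near_bar is None:
            near_bar = i  # came within epsilon without triggering
        best = max(best, move)
    kind = "near_miss" if near_bar is not None else "no_touch"
    return TouchLabel(kind, near_bar, best)
```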
### 10.2 ADVSL Hold/Floor
Second target:
- `advsl.base_catastrophic_floor_pct`
- `advsl.overlay_catastrophic_floor_pct`
- `advsl.overlay_max_loss_usd`
- `advsl.overlay_min_hold_bars`
- `advsl.overlay_pressure_min`
- `advsl.overlay_mae_risk_min`
This is safety-critical. VIBRISS may advise, but live application requires
strong guardrails, bounded step changes, and explicit fallback to the current
documented ADVSL values.
Floor percentage is also a prime VIBRISS assistance target, but the learner
must never be able to disable the catastrophic floor entirely.
Hard safety ceiling:
- the operator may define a non-negotiable max-loss ceiling per trade, per leg,
or per session
- this ceiling is distinct from the replay optimum and distinct from the
learner's preferred floor/TP/hold posture
- if a candidate policy exceeds the ceiling, the ceiling wins even when the
replayed recursive capital curve would otherwise look better
- VIBRISS may tune inside the ceiling, but it must not optimize the ceiling
away, relax it implicitly, or treat operator pain tolerance as a soft signal
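The ordering matters more than the arithmetic: the ceiling filters first, and replay scores only rank what survives. In this sketch, `pick_within_ceiling` and the per-arm `worst_loss_usd` estimates (for example, notional times floor distance) are hypothetical inputs. The ceiling is a plain argument supplied by the operator and is never read from learner state. When no arm fits, the function raises instead of silently returning a breaching arm.

```python
class CeilingViolation(RuntimeError):
    """No candidate arm fits inside the operator's hard max-loss ceiling."""


def pick_within_ceiling(replay_score: dict[int, float], worst_loss_usd: dict[int, float],
                        ceiling_usd: float) -> int:
    """Filter arms by the ceiling first, then rank survivors by replay score."""
    admissible = [arm for arm, loss in worst_loss_usd.items() if loss <= ceiling_usd]
    if not admissible:
        raise CeilingViolation(f"no arm has worst-case loss <= {ceiling_usd}")
    # A better replayed capital curve cannot readmit an arm removed above.
    return max(admissible, key=lambda arm: replay_score[arm])
```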
### 10.3 MARAS-Conditioned Hold Bars
Third target:
- per-hash or per-regime hold-bar posture
- per-label bias around known hash medians
- OBF-conditioned hold extension or contraction
Do not use MARAS labels as hard filters. Labels such as CHOPPY can contain both
many wins and severe losses. Use the composite hash, raw signature dimensions,
confidence, conflict, and nearest-neighbor regime evidence as context features.
### 10.4 DVOL/VOL Gate and Trade-Pause Posture
Candidate carefulness-critical target:
- `entry_gate.dvol_threshold`
- `entry_gate.vol_open_persistence_bars`
- `entry_gate.min_qualified_cross_rate`
- `entry_gate.pick_latency_pause_s`
- `entry_gate.open_gate_no_pick_pause_score`
This target exists because a VOL/DVOL gate can be technically open while the
engine still sees low-quality entry conditions: few accepted threshold crosses,
weak asset-pick evidence, or no fresh accepted pick after a normally sufficient
latency window.
The first useful derived sensor is:
```text
open_gate_no_pick_pause_score =
    VOL/DVOL gate open
  + low recent vel_div threshold-cross density
  + no accepted entry for expected_pick_latency_s
  + neutral/hostile EsoF/ExoF/MARAS context
  + no evidence of stale scans or halted runtime
```
This must not be treated as an urgent kill switch by default. It is a
carefulness parameter: VIBRISS should first log it, correlate it with later
trade quality, and test whether it predicts profitable trade pauses or smaller
position sizing. The baseline is no pause beyond current gate logic.
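Below is a shadow-only sketch of the derived sensor. It treats the gate-open and no-stale-scans terms as preconditions, so that a stale feed never looks like a quiet market. The remaining three indicators are averaged with equal weight. Both the precondition reading and the weights are assumptions, as are `GateSnapshot` and its field names. The output is only logged, consistent with the no-pause baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateSnapshot:
    gate_open: bool                       # VOL/DVOL gate currently open
    recent_cross_count: int               # vel_div threshold crosses in the lookback window
    seconds_since_accepted_entry: float
    context_unsupportive: bool            # neutral/hostile EsoF/ExoF/MARAS, pre-reduced
    scans_fresh: bool                     # False when scans are stale or runtime halted


def open_gate_no_pick_pause_score(s: GateSnapshot, min_crosses: int,
                                  expected_pick_latency_s: float) -> float:
    """Shadow-only carefulness score in [0, 1]; logging it never pauses trading."""
    # Preconditions: with the gate closed or the feed stale, the score is reported as 0.
    if not (s.gate_open and s.scans_fresh):
        return 0.0
    terms = (
        s.recent_cross_count < min_crosses,                         # low cross density
        s.seconds_since_accepted_entry >= expected_pick_latency_s,  # no fresh pick
        s.context_unsupportive,
    )
    return sum(terms) / len(terms)  # equal weights (assumption)
```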
Related empirical TODOs:
- Reconsider `min_irp_alignment=0.0` empirically. The live gold config disables
the IRP alignment filter, but the larger current corpus may now be sufficient
to retest whether a nonzero IRP alignment floor improves asset-pick quality.
- Examine whether the apparent `VOL open / no immediate pick` condition is a
useful trade-pause state or simply the expected effect of the stricter
effective signal-strength gate (`vel_div` below roughly `-0.03`).
- Initial live observation: recent quiet after the last known good picks appears
protective rather than broken. This must be tested with opportunity cost:
measure what the system avoided during quiet periods and what it missed by not
entering.
- Examine whether MARAS composite hashes need more granularity: more distinct
market-descriptive buckets while preserving the sortable scalar hash and
nearest-neighbor/similarity behavior.
### 10.5 Capital-Protect / Profit-Lock
Fourth target:
- `capital.protect_arm_threshold_pct`
- `capital.protect_full_threshold_pct`
- `capital.protect_tp_min_multiplier`
- `capital.protect_cubic_coeff`
- `capital.protect_reset_drawdown_pct`
- `capital.protect_hysteresis_bars`
- reset family selector: `capital.protect_reset_mode`
- time-based reset controls: `capital.protect_reset_time_trades`, `capital.protect_reset_time_seconds`
- regime/hash reset controls: `capital.protect_reset_regime_whitelist`, `capital.protect_reset_fingerprint_whitelist`
- sc-EsoF reset controls: `capital.protect_reset_sc_floor`, `capital.protect_reset_sc_neutral_floor`, `capital.protect_reset_sc_positive_floor`
This is the profit-protect / peak-lock family. The idea is not to mute risk
management, but to preserve capital once the day/session has already become
meaningfully profitable. The study must test whether a gain threshold such as
`1.2%`, `2.3%`, `3.3%`, ... should arm a more conservative TP posture for
subsequent trades, and whether a cubic trim on the TP multiplier is better than
an abrupt step change.
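To make the flat-step versus cubic comparison concrete, here is one hypothetical parameterization of the TP multiplier:

- progress `p` runs from 0 at `capital.protect_arm_threshold_pct` to 1 at `capital.protect_full_threshold_pct`;
- the multiplier falls from 1.0 to `capital.protect_tp_min_multiplier`;
- `capital.protect_cubic_coeff` blends `p**3` with `p`.

The blend is an assumed shape, not a documented formula.

```python
def protect_tp_multiplier(session_gain_pct: float, arm_pct: float, full_pct: float,
                          tp_min_mult: float, cubic_coeff: float,
                          mode: str = "cubic") -> float:
    """TP multiplier in [tp_min_mult, 1.0] as a function of session gain (in %).

    'cubic' uses cubic_coeff * p**3 + (1 - cubic_coeff) * p, which is monotone for
    cubic_coeff in [0, 1] and gentler just after arming as the coefficient grows;
    'step' jumps straight to tp_min_mult once armed.
    """
    if full_pct <= arm_pct:
        raise ValueError("full threshold must exceed arm threshold")
    if not 0.0 <= cubic_coeff <= 1.0:
        raise ValueError("cubic_coeff must lie in [0, 1]")
    if session_gain_pct < arm_pct:
        return 1.0  # protect state not armed: baseline TP posture
    p = min(1.0, (session_gain_pct - arm_pct) / (full_pct - arm_pct))
    if mode == "step":
        shape = 1.0
    elif mode == "cubic":
        shape = cubic_coeff * p ** 3 + (1.0 - cubic_coeff) * p
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return 1.0 - (1.0 - tp_min_mult) * shape
```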
Required policy questions:
- what profit threshold should arm the protect state
- how quickly TP should tighten once the threshold is crossed
- whether the tighten curve should be cubic, stepped, or mixed
- when the protect state must reset
- how much drawdown from the protected peak is required to disarm
- how many bars/trades of hysteresis are needed before a reset is valid
- whether reset should be keyed to time, regime, known fingerprint, sc-EsoF, or mixed logic
- whether reset should use a whitelist gate or a change-detection gate for regime/fingerprint families
The baseline reset rule should be conservative:
- arm only after the gain threshold is crossed on the recursive capital curve
- keep the lock until a real drawdown-from-peak or day/session reset occurs
- do not reset on a single noisy bar if the protected peak is still intact
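The baseline arm/reset rule behaves like a small state machine with bar-count hysteresis. The sketch below is illustrative, not the engine's implementation. It assumes one capital value per bar. It also assumes that a disarm rebases the reference capital, so re-arming needs a fresh gain; that rebasing choice is an assumption, not something the spec states.

```python
from dataclasses import dataclass


@dataclass
class ProfitLock:
    start_capital: float
    arm_pct: float                 # gain (%) on the recursive capital curve that arms the lock
    reset_drawdown_pct: float      # drawdown (%) from the protected peak that counts as a breach
    hysteresis_bars: int = 1       # consecutive breach bars needed to disarm; 1 = no hysteresis
    armed: bool = False
    peak: float = 0.0
    breach_bars: int = 0

    def _disarm(self, capital: float) -> None:
        # Assumption: re-arming needs a fresh gain measured from this point.
        self.start_capital, self.armed, self.peak, self.breach_bars = capital, False, 0.0, 0

    def update(self, capital: float, session_reset: bool = False) -> bool:
        """Feed one bar of capital; returns whether the lock is armed afterwards."""
        if session_reset:
            self._disarm(capital)
            return self.armed
        if not self.armed:
            if (capital / self.start_capital - 1.0) * 100.0 >= self.arm_pct:
                self.armed, self.peak, self.breach_bars = True, capital, 0
            return self.armed
        self.peak = max(self.peak, capital)
        drawdown_pct = (1.0 - capital / self.peak) * 100.0
        # One noisy bar only increments the counter; the protected peak survives it.
        if drawdown_pct >= self.reset_drawdown_pct:
            self.breach_bars += 1
        else:
            self.breach_bars = 0
        if self.breach_bars >= self.hysteresis_bars:
            self._disarm(capital)
        return self.armed
```

With `hysteresis_bars=2`, a single bar that breaches the drawdown threshold followed by a recovery leaves the lock armed. Only two consecutive breaching bars disarm it.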
This target must be evaluated against:
- recursive capital-curve delta after opportunity cost
- clipped-winner cost from over-tightening
- saved-loss from avoiding giveback after the day is already up
- win-return statistics after the arm event
- ceiling-violation count, because the profit protect should never create an
implicit max-loss escape hatch
It is especially important to compare:
- flat threshold steps vs cubic tightening
- no hysteresis vs bar-count hysteresis
- immediate reset vs drawdown-based reset
- day-reset vs rolling-session reset
The tape should be replayed on the same capital curve used by the live engine,
so the protect state is evaluated recursively, not from a fixed post-hoc label.
### 10.6 OB Cascade TP-Modulation (added 2026-06-12, LINK 5e05eeeb post-mortem)
Candidate target requiring particular care: the parameters of the OB
tail-avoidance layer in `alpha_exit_manager.evaluate()` that silently
modulate the "fixed" TP:
- `ob_cascade.count_threshold`: number of assets withdrawing liquidity
(depth withdrawal velocity < CASCADE_THRESHOLD) required to enter cascade
mode. **Currently hardcoded as `cascade_count > 0`, i.e. a SINGLE asset
anywhere in the tracked set widens every open trade's TP by x1.40.** The
LINK 5e05eeeb diagnosis (2026-06-11, -$1,248.71) showed this trigger is
active on a large fraction of trades because entries occur during panics
by construction. Domain candidates: {1, 2, 3, n_assets//4, n_assets//2};
fallback_baseline: 1 (current behavior).
- `ob_cascade.tp_widen_factor`: currently hardcoded at 1.40. Population
evidence (post-2026-05-11 cohort): widening earned ~+$84.7K on
continuation trades vs ~-$16.9K given back on reversals, so the factor is
net-positive but fat-left-tailed. Domain: [1.0 .. 1.6]; 1.0 = modulation
off.
- `ob_cascade.withdrawal_velocity_threshold`: maps to `CASCADE_THRESHOLD` in
`ob_features.py`, currently -0.10 (10% of depth pulled over the lookback).
Required sensors already exist since 2026-06-12: `dynamic_tp_pct`,
`tp_mod_factor`, `cascade_count`, `ob_regime_signal`, `tp_floor_armed` are
logged on every `dolphin.v7_decision_events` row, so reward attribution can
be computed offline from the live tape with no new instrumentation.
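The three cascade parameters compose into a simple gate. The sketch below shows that gate with the current hardcoded values as defaults; `count_threshold=1` reproduces today's `cascade_count > 0` behavior. It is not the body of `alpha_exit_manager.evaluate()`. The function names are hypothetical, and the input is assumed to be a per-asset map of depth withdrawal velocities.

```python
def cascade_count(withdrawal_velocity: dict[str, float],
                  velocity_threshold: float = -0.10) -> int:
    """Number of tracked assets pulling depth faster than the threshold."""
    return sum(1 for v in withdrawal_velocity.values() if v < velocity_threshold)


def cascade_tp_factor(count: int, count_threshold: int = 1,
                      widen_factor: float = 1.40) -> float:
    """TP multiplier from cascade mode; count_threshold=1 == `cascade_count > 0`."""
    if count_threshold < 1:
        raise ValueError("count_threshold must be >= 1")
    if not 1.0 <= widen_factor <= 1.6:
        raise ValueError("widen_factor outside governed domain [1.0, 1.6]")
    return widen_factor if count >= count_threshold else 1.0


def count_threshold_candidates(n_assets: int) -> list[int]:
    """Domain {1, 2, 3, n//4, n//2}, deduplicated, dropping values below 1."""
    return sorted({c for c in (1, 2, 3, n_assets // 4, n_assets // 2) if c >= 1})
```

Logging `cascade_count` alongside the applied factor, as the live tape already does, is what lets each candidate threshold be re-scored offline.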
INTERPLAY (REQUIRED reading for the paramset author): these parameters
interact with (a) the TP_FLOOR profit-floor ratchet (2026-06-12,
`DOLPHIN_TP_FLOOR`), which caps the left tail of the widening, so reward must
be computed on the JOINT policy (widen + floor), not on the widen alone; and
(b) §10.1 Conditional Fast TP / the future ADAPTIVE TP THRESHOLD ("Dynamic
TP"). The adaptive TP threshold itself is hereby marked FIT FOR VIBRISS
GOVERNANCE: the effective TP should ultimately be one governed surface
(base x leverage-curve x market-state x cascade modulation), with VIBRISS
owning the modulation terms and the champion base (0.20%) remaining frozen
outside governance. A VIOLET-era sub-second exit guard changes the
actuation latency of both TP and floor; cadence is therefore a context
feature, not a governed parameter, per the data-cadence operator rule.
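The single governed surface can be pictured as a frozen base times a product of governed modulation terms. This is a minimal sketch; the clamp bounds `(0.5, 2.0)` on the combined modulation are hypothetical and are not spec values. The per-term breakdown is returned so that reward attribution can see which term moved the TP.

```python
CHAMPION_BASE_TP_PCT = 0.20  # frozen champion base, outside governance


def effective_tp_pct(leverage_curve_mult: float, market_state_mult: float,
                     cascade_mult: float,
                     modulation_bounds: tuple[float, float] = (0.5, 2.0)) -> dict:
    """base x leverage-curve x market-state x cascade modulation.

    Only the three modulation terms are governed; their product is clamped
    to hypothetical bounds so no single term can push TP to an extreme.
    """
    lo, hi = modulation_bounds
    raw = leverage_curve_mult * market_state_mult * cascade_mult
    modulation = min(hi, max(lo, raw))
    return {
        "base_tp_pct": CHAMPION_BASE_TP_PCT,
        "modulation_raw": raw,
        "modulation_applied": modulation,
        "effective_tp_pct": CHAMPION_BASE_TP_PCT * modulation,
    }
```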
## 11. First Concrete ParamSet: ADVSL Hold Substitute
### 11.1 Objective
This is the first concrete VIBRISS use case.
The parameter set replaces a static ADVSL no-arm / min-hold rule with a bounded,
evidence-scored hold target. The original research problem was the legacy
`20`-bar hold window: it protects winners from premature ADVSL exits, but it can
also let fast adverse trades slip through before the floor arms. Replay work
found that shorter centers, especially around `12` bars, can protect capital in
tail events, while longer holds can be correct in snapback/recovery pockets.
The VIBRISS answer is not "always use 12" and not "always use 20." It is:
- choose a hold target from a bounded set,
- condition the choice on current trade/path/regime sensors,
- score it by recursive capital-curve impact after opportunity cost,
- keep catastrophic loss floors outside the learner as non-negotiable safety.
The sweep geometry itself is also a VIBRISS parameter. The ParamSet may carry a
global sweep window plus per-regime/per-hash sweep windows in `sweep_policy`.
When the derived best band touches the search window boundary, treat that as a
signal that the search is still censored by the current bounds, not as proof
that the optimum is "wide open." In that case, expand the admissible sweep
window and re-evaluate before promoting the range.
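The boundary-censoring rule can be written as a widen-and-re-evaluate loop. This is a sketch under stated assumptions. `evaluate(lo, hi)` is a hypothetical callback that returns the best band found inside the window. A window edge counts as "soft", and may be widened, only while it sits inside the ParamSet's hard bounds. A band resting on a hard bound is reported as such rather than treated as an interior optimum.

```python
from typing import Callable, Tuple

Band = Tuple[float, float]


def sweep_until_uncensored(evaluate: Callable[[float, float], Band],
                           window: Band, hard: Band,
                           factor: float = 1.5, max_rounds: int = 8):
    """Widen the sweep window while the best band touches a soft window edge.

    Returns (band, final_window, status); status is "interior" (optimum not
    censored), "hard_bound" (band sits on a hard limit and cannot widen), or
    "still_censored" (max_rounds exhausted).
    """
    lo, hi = window
    band = evaluate(lo, hi)
    for _ in range(max_rounds):
        touches_lo = band[0] <= lo and lo > hard[0]
        touches_hi = band[1] >= hi and hi < hard[1]
        if not (touches_lo or touches_hi):
            on_hard = band[0] <= hard[0] or band[1] >= hard[1]
            return band, (lo, hi), "hard_bound" if on_hard else "interior"
        pad = (hi - lo) * (factor - 1.0) / 2.0
        if touches_lo:
            lo = max(hard[0], lo - pad)
        if touches_hi:
            hi = min(hard[1], hi + pad)
        band = evaluate(lo, hi)
    return band, (lo, hi), "still_censored"
```

For example, with an initial window of `(4, 20)` and a true optimum near 32 bars, the first band sits against the 20-bar edge. The loop widens the upper edge until the band `(28, 36)` is interior.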
### 11.2 ParamSet Identity
```yaml
param_set:
id: advsl.hold_substitute.v1
name: ADVSL Hold Substitute
status: shadow_first
namespace_default: blue
consumer: advanced_sl
decision_family: exit_risk_timing
replaces:
- legacy_advsl_min_hold_bars_20
related_live_controls:
- advsl.base_catastrophic_floor_pct
- advsl.overlay_catastrophic_floor_pct
- advsl.overlay_max_loss_usd
- advsl.overlay_pressure_min
- advsl.overlay_mae_risk_min
```
This spec governs the hold/arming decision only. It may recommend when ADVSL
is allowed to arm, but it must not remove the catastrophic floor.
### 11.3 ParamSet Config and Parameters
Shared ParamSet config:
```yaml
paramset_config:
consumer: advanced_sl
decision_family: exit_risk_timing
placement:
decision_point: trade_entry
live_replacement_rhythm: capture_on_entry
intratrade_change_policy: shadow_only
outputs:
hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice
decision_table: dolphin.vibriss_decisions
reward_table: dolphin.vibriss_rewards
param_defaults:
learner:
type: discounted_ucb
contextual_shadow_branch: linucb
nonstationarity: sliding_window
window_trades: 300
safety:
fallback_baseline: 12
max_exploration_rate: 0.0
min_shadow_samples: 200
min_live_confidence: 0.80
reward_mapping:
primary_metric: recursive_capital_curve_delta_after_opportunity_cost
bounded_range: [-1.0, 1.0]
guardrails:
stale_obf_policy: ignore_obf_features
low_maras_confidence_policy: shrink_to_global_prior
drawdown_alarm_policy: freeze_to_safe_baseline
```
Primary learned parameter:
```yaml
params:
advsl.min_hold_bars_before_floor_arm:
type: integer
units: bars
baseline_reference: 20
starting_center: 12
current_live_overlay_reference: 6
default: 12
domain:
candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]
hard_min: 0
hard_max: 48
```
Companion deterministic guardrails:
```yaml
params:
advsl.max_loss_usd_floor:
type: float
units: usd
default_overlay: 500.0
research_candidate: 400.0
learner_controlled: false
advsl.catastrophic_floor_pct:
type: float
units: pct
default_base: 0.0120
default_overlay: 0.0050
learner_controlled: false
advsl.recovery_extension_max_bars:
type: integer
units: bars
default: 0
domain:
candidates: [0, 4, 8, 12, 20, 34]
hard_min: 0
hard_max: 40
learner_controlled: shadow_only_until_validated
safety:
min_shadow_samples: 500
min_live_confidence: 0.90
```
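A loader for these specs would ideally reject inconsistent domains before any learner sees them. This is a minimal validator sketch over plain dicts transcribed from the YAML above, with no YAML library assumed. It encodes one additional rule as an assumption: a `learner_controlled: false` guardrail must not expose a learner domain.

```python
PARAMS = {
    "advsl.min_hold_bars_before_floor_arm": {
        "type": "integer", "default": 12,
        "domain": {"candidates": [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40],
                   "hard_min": 0, "hard_max": 48}},
    "advsl.max_loss_usd_floor": {"type": "float", "default_overlay": 500.0,
                                 "learner_controlled": False},
    "advsl.catastrophic_floor_pct": {"type": "float", "default_base": 0.0120,
                                     "default_overlay": 0.0050,
                                     "learner_controlled": False},
    "advsl.recovery_extension_max_bars": {
        "type": "integer", "default": 0,
        "domain": {"candidates": [0, 4, 8, 12, 20, 34], "hard_min": 0, "hard_max": 40},
        "learner_controlled": "shadow_only_until_validated"},
}


def validate_param(name: str, spec: dict) -> list[str]:
    """Return a list of problems with one ParamSet parameter spec."""
    problems = []
    domain = spec.get("domain")
    if domain is not None:
        lo, hi = domain["hard_min"], domain["hard_max"]
        outside = [c for c in domain["candidates"] if not lo <= c <= hi]
        if outside:
            problems.append(f"{name}: candidates outside hard bounds: {outside}")
        if "default" in spec and spec["default"] not in domain["candidates"]:
            problems.append(f"{name}: default {spec['default']} is not a candidate")
    if spec.get("learner_controlled") is False and domain is not None:
        problems.append(f"{name}: fixed guardrail must not expose a learner domain")
    return problems


def learner_params(params: dict) -> list[str]:
    """Params the learner may act on; `learner_controlled: false` is excluded."""
    return [n for n, s in params.items() if s.get("learner_controlled", True) is not False]
```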
Interpretation:
- `baseline_reference=20` preserves the historical question.
- `starting_center=12` is the current replay-derived center.
- `current_live_overlay_reference=6` records the tightened overlay state and
must be reported separately from the legacy 20-bar research baseline.
- `34` and `40` remain candidates because contiguous-region medians observed
during replay included materially longer optima.
### 11.4 Required Sensors
The hold substitute must use point-in-time sensors only. End-of-trade labels may
be used for reward calculation, not for action selection.
Core context sensors:
| Sensor | Source | Use |
|---|---|---|
| `asset` | live trade state | Asset-level prior and OBF join key. |
| `side` | live trade state / EFSM | Separate SHORT base from EFSM-flipped LONG contexts. |
| `bars_held` | live trade state | Determines current arming progress. |
| `entry_price` / `current_price` | live trade state | Signed path and current PnL. |
| `post_gross_path_pct` | trade path replay/live path state | Measures post-entry excursion shape. |
| `mae_pct` | live path state | Adverse excursion severity. |
| `mfe_pct` | live path state | Favorable excursion and recovery potential. |
| `mfe_decay` | derived from MFE/current PnL | Detects giveback and weakening recovery. |
| `current_pnl_mfe_frac` | derived from current PnL / MFE | Indicates whether recovery is intact or mostly lost. |
| `v7_exit_pressure` | `v7_decision_events` / live V7 snapshot | Pressure/continuation signal for recovery-unlikely cases. |
| `v7_mae_risk` | V7 snapshot | Separates ordinary drawdown from risk-tier drawdown. |
| `v7_action` | V7 snapshot | EXIT/RETRACT/EXTEND/HOLD context. |
| `state_confidence` | market-state / MARAS / bundle confidence | Low confidence forces conservative fallback. |
OBF sensors:
| Sensor | Source | Use |
|---|---|---|
| `obf_depth_1pct_usd` | `obf_universe_latest` / OBF CH | Recovery-capacity and liquidity depth. |
| `obf_depth_quality` | OBF derived quality | Distinguishes deep snapback pockets from weak-book grinds. |
| `obf_spread_bps` | OBF | Penalizes bad microstructure. |
| `obf_imbalance` | OBF | Directional liquidity pressure. |
| `obf_imbalance_ma5` / `obf_imbalance_ma10` | OBF derived path | Smooths raw book pressure for in-trade TP/SL context. |
| `obf_imbalance_slope` | OBF derived path | Detects whether pressure is strengthening or fading. |
| `obf_imbalance_persistence` | OBF derived path | Measures sign stability rather than one-tick noise. |
| `obf_imbalance_reaccel` | OBF derived path | Detects renewed pressure after a mid-trade weakening/plateau. |
| `obf_staleness_s` | OBF timestamp | Guardrail; stale OBF cannot steer hold. |
Regime sensors:
| Sensor | Source | Use |
|---|---|---|
| `maras_regime` | `maras_latest` / `maras_fingerprint` | Label-level bias only, never hard filter. |
| `maras_composite_hash` | MARAS Scope B | Exact historical hash prior when sample size is enough. |
| `maras_scalar_hash` | MARAS Scope A | Coarse sortable regime prior. |
| `maras_confidence` | MARAS | Low confidence reduces live trust. |
| `maras_conflict_level` | MARAS | High conflict increases uncertainty/exploration penalty. |
| `s_eigen_vd`, `s_eigen_w50`, `s_eigen_w750` | MARAS raw signature | Eigen-state context. |
| `s_btc_dev_pct`, `raw_btc_ma99` | MARAS BTC tier | Trend/uptrend/downtrend pressure context. |
| `s_acb_boost`, `s_acb_beta` | MARAS/ACB | Protective/risk-on context. |
Outcome-only reward sensors:
| Sensor | Source | Use |
|---|---|---|
| `actual_exit_pnl` | `trade_events` | Realized baseline outcome. |
| `counterfactual_exit_pnl_by_hold` | tape replay | Arm-level reward. |
| `recovery_lag_s` | tape replay | Time to recover after floor/cut. |
| `extra_bars_to_recovery` | tape replay | Cost of too-short hold. |
| `clipped_winner_delta` | tape replay | Opportunity cost of premature exit. |
| `saved_loss_delta` | tape replay | Loss avoided by earlier floor arm. |
| `capital_curve_delta` | recursive replay | Primary reward accounting. |
### 11.5 Feature Construction
VIBRISS should compute a compact feature vector from the sensors:
```text
path_speed = abs(post_gross_path_pct) / max(1, bars_held)
mae_velocity = mae_pct / max(1, bars_held)
mfe_velocity = mfe_pct / max(1, bars_held)
recovery_ratio = current_pnl_mfe_frac
giveback_ratio = 1.0 - current_pnl_mfe_frac
liquidity_score = f(obf_depth_1pct_usd, obf_depth_quality, obf_spread_bps)
signed_obf_imbalance = side_sign * obf_imbalance
imbalance_confirmation = f(signed_obf_imbalance_ma5, persistence, slope)
imbalance_reacceleration = f(prior_weakening, current_signed_slope, persistence)
pressure_score = f(v7_exit_pressure, v7_mae_risk, v7_action)
regime_key = maras_composite_hash if sample_count(hash) >= min_hash_n else maras_regime
confidence_weight = min(state_confidence, maras_confidence) * (1.0 - maras_conflict_level)
```
Feature requirements:
- All features must be point-in-time.
- Missing OBF must not become zero-depth unless zero-depth is the actual
observation. Missing OBF is its own mask feature.
- MARAS labels are context, not filters. Use hash/sample priors and raw
signature dimensions where possible.
- Side must be explicit. EFSM-flipped LONG trades cannot share a blind SHORT
prior.
- OBF imbalance must be side-normalized. For a SHORT, negative raw imbalance is
confirming; for a LONG, positive raw imbalance is confirming.
- Raw imbalance is not enough. Use moving averages, persistence, slope, and
re-acceleration after weakening so a single noisy tick cannot steer ADVSL.
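A subset of the feature vector, built under the requirements above, could look like the sketch below. The `TradeSnapshot` type, the `side_sign` convention (`-1` for SHORT, so positive always means confirming), and the `max_obf_age_s=10.0` staleness limit are illustrative assumptions. Missing or stale OBF is carried as an explicit mask feature rather than read as a real observation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TradeSnapshot:
    side: str                       # "SHORT" or "LONG" (EFSM flips included)
    bars_held: int
    post_gross_path_pct: float
    mae_pct: float
    mfe_pct: float
    current_pnl_mfe_frac: float
    obf_imbalance: Optional[float]  # None when no OBF row exists
    obf_staleness_s: Optional[float]
    state_confidence: float
    maras_confidence: float
    maras_conflict_level: float
    maras_regime: str
    maras_composite_hash: int


def build_features(s: TradeSnapshot, hash_counts: dict[int, int],
                   min_hash_n: int = 30, max_obf_age_s: float = 10.0) -> dict:
    """Point-in-time feature vector; uses only fields known at decision time."""
    bars = max(1, s.bars_held)
    side_sign = -1.0 if s.side == "SHORT" else 1.0  # SHORT: negative raw imbalance confirms
    obf_ok = (s.obf_imbalance is not None and s.obf_staleness_s is not None
              and s.obf_staleness_s <= max_obf_age_s)
    use_hash = hash_counts.get(s.maras_composite_hash, 0) >= min_hash_n
    return {
        "path_speed": abs(s.post_gross_path_pct) / bars,
        "mae_velocity": s.mae_pct / bars,
        "mfe_velocity": s.mfe_pct / bars,
        "recovery_ratio": s.current_pnl_mfe_frac,
        "giveback_ratio": 1.0 - s.current_pnl_mfe_frac,
        # Missing/stale OBF: placeholder value, the mask carries the information.
        "obf_missing_mask": 0.0 if obf_ok else 1.0,
        "signed_obf_imbalance": side_sign * s.obf_imbalance if obf_ok else 0.0,
        "regime_key": s.maras_composite_hash if use_hash else s.maras_regime,
        "confidence_weight": min(s.state_confidence, s.maras_confidence)
        * (1.0 - s.maras_conflict_level),
    }
```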
### 11.5.1 OBF Imbalance Assistance Research
Live ENJUSDT observation on `2026-06-04` motivates an explicit research feature
family for ADVSL/TP assistance. The trade entered SHORT near `10:06:14 UTC` and
closed `FIXED_TP` near `10:10:11 UTC` for `+$118.53`.
Observed OBF path:
- entry imbalance was near neutral (`~ -0.015` to `+0.001`);
- within seconds it snapped SHORT-confirming (`~ -0.18` to `-0.21`);
- mid-trade it weakened and oscillated around neutral in 30s buckets;
- into TP it re-strengthened materially (`~ -0.30` to `-0.35`).
Conclusion:
- Imbalance did not monotonically increase from entry to exit.
- It behaved as a confirmation/re-acceleration signal: neutral -> confirming
pressure -> weakening/plateau -> renewed confirming pressure into TP.
- Therefore VIBRISS should not use raw imbalance as a simple exit trigger.
Candidate uses:
| Use | Candidate rule |
|---|---|
| TP assist | If price is near TP and side-normalized imbalance re-accelerates in favor, avoid premature ADVSL/retract exits. |
| SL/ADVSL assist | If adverse PnL appears and side-normalized imbalance persistently contradicts the trade, recovery probability should shrink. |
| Hold assist | If imbalance is neutral/choppy but not contradictory, do not force an exit from imbalance alone. |
| Floor timing | Combine `price_progress_to_tp * imbalance_confirmation` with MAE/MFE path shape to decide whether the floor should wait or arm. |
Candidate feature names:
```text
imbalance_signed_for_trade
imbalance_ma5_signed
imbalance_ma10_signed
imbalance_slope_signed
imbalance_persistence_signed
imbalance_reacceleration_after_weakening
price_progress_to_tp_x_imbalance_confirmation
adverse_pnl_x_imbalance_contradiction
```
Research requirement: replay this across completed trades before live use. Score
it by recursive capital delta after opportunity cost, not by whether it explains
one ENJ winner.
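The ENJ pattern (neutral, then confirming, then weakening, then confirming again) can be detected with a few side-normalized helpers. This is a minimal sketch. The levels `confirm=0.15` and `weak=0.05` are hypothetical starting values that the replay would tune. The example path is a coarse transcription of the observed ENJ ranges, not the actual tape.

```python
def side_normalize(raw: list[float], side: str) -> list[float]:
    """Positive = confirms the trade (SHORT flips the sign of raw imbalance)."""
    sign = -1.0 if side == "SHORT" else 1.0
    return [sign * x for x in raw]


def persistence(signed: list[float], n: int) -> float:
    """Fraction of the last n samples that confirm the trade."""
    tail = signed[-n:]
    return sum(1 for x in tail if x > 0) / len(tail) if tail else 0.0


def slope(signed: list[float], n: int) -> float:
    """Least-squares slope over the last n samples (per sample)."""
    tail = signed[-n:]
    m = len(tail)
    if m < 2:
        return 0.0
    x_mean = (m - 1) / 2.0
    y_mean = sum(tail) / m
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(tail))
    den = sum((i - x_mean) ** 2 for i in range(m))
    return num / den


def reaccelerated_after_weakening(signed: list[float], confirm: float = 0.15,
                                  weak: float = 0.05) -> bool:
    """True for the confirm -> weaken/plateau -> confirm-again pattern."""
    first_confirm = next((i for i, x in enumerate(signed) if x >= confirm), None)
    if first_confirm is None:
        return False
    weakened = next((i for i in range(first_confirm + 1, len(signed))
                     if signed[i] <= weak), None)
    if weakened is None:
        return False
    return signed[-1] >= confirm and len(signed) - 1 > weakened
```

A path that confirms monotonically returns `False` here. That is intentional, because re-acceleration is a distinct feature from plain confirmation strength.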
### 11.5.2 Macro-Thesis Persistence vs Local Danger Research
Live XLMUSDT observation on `2026-06-04` motivates a mandatory ADVSL/VIBRISS
research direction. The trade suffered a large adverse excursion before closing
at `FIXED_TP`. Local OBF imbalance and V7 pressure were frightening during the
worst MAE; they did not cleanly foresee the recovery. The higher-level
eigen/MARAS context, however, stayed coherent with the trade thesis: bearish or
choppy-bearish posture, low conflict, active dislocation, and bearish BTC
context.
Actionable lesson to test to exhaustion:
```text
ADVSL/V7 local danger should be overruled only when macro thesis persistence
remains strong, MARAS conflict/novelty remains low, and OBF contradiction is not
persistent/deep enough to invalidate the thesis.
```
This is not a live rule yet. It is a research requirement for the first
VIBRISS-governed ADVSL/bar-hold policy. The learner must explicitly measure
when local pain is a true invalidation signal versus when it is survivable
excursion inside a still-valid macro/eigen thesis.
The required research output is a weighting model, not a binary exception. The
policy must estimate how much authority belongs to local danger signals versus
macro-thesis persistence under the current context. Those weights are themselves
VIBRISS-tunable parameters and must be represented in the ParamSet spec with
safe defaults, bounded candidate ranges, promotion rules, and audit logging.
Candidate feature names:
```text
macro_thesis_persistence
maras_conflict_low_during_mae
maras_hash_knownness_during_mae
eigen_dislocation_persistence_during_mae
btc_context_alignment_during_mae
local_obf_contradiction_persistence
local_obf_contradiction_depth_weighted
v7_pressure_without_macro_invalidation
adverse_move_vs_macro_persistence
late_recovery_obf_reacceleration
```
Candidate tunable parameters:
```text
local_danger_weight
macro_thesis_weight
obf_contradiction_weight
maras_conflict_weight
eigen_persistence_weight
btc_context_weight
v7_pressure_weight
macro_override_min_confidence
local_invalidation_min_persistence_bars
```
The initial decision form should be simple and auditable:
```text
local_danger_score =
local_danger_weight * v7_pressure
+ obf_contradiction_weight * local_obf_contradiction_persistence
+ maras_conflict_weight * maras_conflict_or_novelty
macro_thesis_score =
macro_thesis_weight * macro_thesis_persistence
+ eigen_persistence_weight * eigen_dislocation_persistence_during_mae
+ btc_context_weight * btc_context_alignment_during_mae
hold_or_cut_bias = macro_thesis_score - local_danger_score
```
VIBRISS may tune the weights, but guardrails must prevent pathological behavior:
local danger cannot be ignored at extreme MAE, and macro thesis cannot override
persistent high-depth OBF contradiction plus MARAS conflict/novelty.
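The scoring form and its two guardrails can be sketched directly. The weights follow the formula above as written. Several inputs are hypothetical and are not listed in the spec: the `macro_confidence` input, the `obf_contradiction_bars` input, and the thresholds `extreme_mae_pct`, `high_conflict` and `deep_obf`. A third guardrail applies `macro_override_min_confidence` so that a low-confidence macro read cannot produce a hold bias.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DangerMacroWeights:
    local_danger_weight: float = 1.0
    obf_contradiction_weight: float = 1.0
    maras_conflict_weight: float = 1.0
    macro_thesis_weight: float = 1.0
    eigen_persistence_weight: float = 1.0
    btc_context_weight: float = 1.0
    macro_override_min_confidence: float = 0.6
    local_invalidation_min_persistence_bars: int = 3


def hold_or_cut_bias(f: dict, w: DangerMacroWeights, *, extreme_mae_pct: float = 1.0,
                     high_conflict: float = 0.5, deep_obf: float = 0.5) -> float:
    """Positive = lean toward holding, negative = lean toward cutting."""
    local = (w.local_danger_weight * f["v7_pressure"]
             + w.obf_contradiction_weight * f["local_obf_contradiction_persistence"]
             + w.maras_conflict_weight * f["maras_conflict_or_novelty"])
    macro = (w.macro_thesis_weight * f["macro_thesis_persistence"]
             + w.eigen_persistence_weight * f["eigen_dislocation_persistence_during_mae"]
             + w.btc_context_weight * f["btc_context_alignment_during_mae"])
    bias = macro - local
    # Guardrail 1: at extreme MAE, local danger cannot be outvoted by macro.
    if f["mae_pct"] >= extreme_mae_pct:
        return -local
    # Guardrail 2: persistent deep OBF contradiction plus MARAS conflict/novelty
    # removes any macro-driven hold bias.
    if (f["obf_contradiction_bars"] >= w.local_invalidation_min_persistence_bars
            and f["local_obf_contradiction_depth_weighted"] >= deep_obf
            and f["maras_conflict_or_novelty"] >= high_conflict):
        return min(bias, 0.0)
    # Guardrail 3: a low-confidence macro read cannot justify holding.
    if f["macro_confidence"] < w.macro_override_min_confidence:
        return min(bias, 0.0)
    return bias
```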
Required tests:
- replay all completed trades with this feature family available point-in-time;
- isolate high-MAE trades that later TP'd from high-MAE trades that continued
into real loss;
- charge every delayed cut for worst-case tail loss and every early cut for
missed recovery/opportunity cost;
- evaluate separately for base SHORTs and EFSM/overlay-flipped LONGs;
- report per-MARAS-hash, per-label, and nearest-neighbor raw-signature results;
- report learned/suggested weights and their stability by contiguous region,
MARAS hash, side, and asset-liquidity bucket;
- promote only if held-out contiguous regions improve recursive capital delta
without hiding clipped winners or worse tail events.
### 11.5.3 Macro/OBF Evidence Hierarchy Research
Live DASHUSDT observations on `2026-06-04` add a third case study to the XLM
and ETC findings. DASH produced two fast SHORT `FIXED_TP` trades, including
`efcc6dce`, which entered near `11:00:15 UTC` and closed near `11:00:38 UTC`
after only `2` bars for `+$367.92`.
The large DASH trade was not a scary hold-through-MAE case:
- V7 recorded `mae = 0` for the trade path;
- entry `vel_div` was extreme (`~ -0.2463`);
- MARAS at entry was `BEARISH`, low conflict, composite hash `58981`;
- BTC context remained bearish (`s_btc_above_ma99 = 0`);
- OBF imbalance initially leaned against the SHORT, then flipped materially
SHORT-confirming during the price break.
This suggests an evidence hierarchy that must be tested explicitly:
```text
macro/eigen OK + OBF confirms
> macro/eigen OK + OBF neutral/choppy
> macro/eigen OK + OBF counters transiently but then flips confirming
> macro/eigen OK + OBF persistently counters with depth
> macro/eigen weak/conflicted regardless of OBF
```
The hierarchy is not a live rule. DASH shows that a very strong macro/eigen
impulse can overcome early OBF contradiction when the contradiction is shallow
or transient. ETC shows the stronger case, where OBF remained SHORT-confirming
through adverse price movement. XLM shows the weaker/riskier case, where macro
thesis persistence carried the trade while OBF was ugly at the worst point.
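To rank outcomes by `macro_obf_alignment_class`, each trade first needs a class label. The sketch below assigns one from a side-normalized OBF series, where positive means confirming. The thresholds `eps`, `persistent_bars` and `deep` are hypothetical. Treating a shallow counter that has not yet flipped as neutral/choppy is an assumption, because the hierarchy does not place that case.

```python
from enum import IntEnum
from typing import Optional


class MacroObfClass(IntEnum):
    """Lower value = stronger evidence, following the hierarchy above."""
    MACRO_OK_OBF_CONFIRMS = 0
    MACRO_OK_OBF_NEUTRAL = 1
    MACRO_OK_OBF_COUNTER_THEN_FLIP = 2
    MACRO_OK_OBF_PERSISTENT_COUNTER = 3
    MACRO_WEAK = 4


def classify_macro_obf(macro_ok: bool, signed_obf: list[float],
                       counter_depth_weighted: float, eps: float = 0.05,
                       persistent_bars: int = 3, deep: float = 0.5) -> MacroObfClass:
    if not macro_ok:
        return MacroObfClass.MACRO_WEAK
    if not signed_obf:
        return MacroObfClass.MACRO_OK_OBF_NEUTRAL
    trailing_counter = 0  # length of the contradiction run ending at "now"
    for v in reversed(signed_obf):
        if v >= -eps:
            break
        trailing_counter += 1
    if trailing_counter >= persistent_bars and counter_depth_weighted >= deep:
        return MacroObfClass.MACRO_OK_OBF_PERSISTENT_COUNTER
    any_counter = any(v < -eps for v in signed_obf)
    if signed_obf[-1] > eps:
        return (MacroObfClass.MACRO_OK_OBF_COUNTER_THEN_FLIP if any_counter
                else MacroObfClass.MACRO_OK_OBF_CONFIRMS)
    # Assumption: a shallow counter that has not flipped yet counts as choppy.
    return MacroObfClass.MACRO_OK_OBF_NEUTRAL


def flip_to_confirmation_latency_s(signed_obf: list[float], sample_s: float,
                                   eps: float = 0.05) -> Optional[float]:
    """Seconds from the first contradicting sample to the first confirming one."""
    first_counter = next((i for i, v in enumerate(signed_obf) if v < -eps), None)
    if first_counter is None:
        return None
    flip = next((i for i in range(first_counter + 1, len(signed_obf))
                 if signed_obf[i] > eps), None)
    return None if flip is None else (flip - first_counter) * sample_s
```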
Candidate features:
```text
macro_obf_alignment_class
macro_extreme_impulse_score
obf_counter_transience_bars
obf_counter_depth_weighted
obf_flip_to_confirmation_latency_s
obf_confirmation_after_macro_impulse
macro_ok_obf_confirm_weight
macro_ok_obf_counter_weight
macro_extreme_overrides_obf_counter_weight
```
Required tests:
- rank outcomes by `macro_obf_alignment_class`;
- compare `macro OK + OBF confirm` against `macro OK + OBF counter`;
- split OBF counter cases into transient, shallow, persistent, and
depth-weighted contradiction;
- measure whether OBF flip-to-confirmation latency predicts TP speed;
- report whether extreme `vel_div` can safely receive more weight than early
OBF contradiction, and where that becomes unsafe;
- expose the learned hierarchy weights as VIBRISS-tunable parameters, not
hardcoded doctrine.
### 11.5.4 Falling-Knife / Missing-Bounce-Sensor Case Study
Live LTCUSDT observation on `2026-06-04` (`c0139cea`) adds an open/pending case
study for the opposite side of the DASH impulse capture. The trade entered SHORT
near `11:15:12 UTC` with extreme entry `vel_div` (`~ -0.1942`) and high notional,
but subsequently showed severe adverse excursion and no meaningful favorable
excursion at the time of review. V7 also emitted repeated `RETRACT`
recommendations, but V7 pressure is not treated as truth by itself; XLM showed
that V7 can scream during a trade that later recovers profitably.
Observed at review time:
- `inverse_ars_bounce_shadow` was stale; latest row was `2026-06-03 18:42:26
UTC`, so the bounce detector was not assisting live;
- V7 repeatedly emitted `RETRACT / V7_RISK_DOMINANT`, which is local-pain
evidence only;
- V7 observed `mae ~ 0.854%`, `mfe = 0`, and `exit_pressure = 3`;
- OBF was mostly neutral/choppy with weak, oscillating side-normalized evidence,
not a strong rescue signal;
- MARAS/BTC remained broadly bearish/low-conflict, but recent eigen values were
intermittent rather than steadily thesis-confirming.
Research meaning:
```text
macro/eigen entry impulse alone is insufficient when local danger is extreme,
MFE remains zero, OBF does not confirm, and the bounce/inverse-risk sensor is
missing or stale.
```
V7 pressure must be weighted conditionally:
```text
V7 pressure is discounted when macro thesis remains strong, OBF confirms, and
MFE exists.
V7 pressure receives more weight only when independent local invalidation
features agree: zero MFE, rising MAE, neutral/counter OBF, stale/missing bounce
sensor, macro impulse decay, or MARAS conflict/novelty.
```
Candidate features:
```text
bounce_sensor_freshness_s
bounce_sensor_missing_mask
extreme_macro_without_mfe
v7_retract_persistence_bars
zero_mfe_high_mae_flag
obf_neutral_or_counter_during_mae
macro_impulse_decay_after_entry
```
Required replay treatment:
- stale/missing bounce data must be an explicit mask feature, not an assumed
neutral score;
- compare extreme-entry trades that get early MFE against extreme-entry trades
with zero MFE and rising MAE;
- treat persistent V7 `RETRACT` as a local-danger amplifier only when confirmed
by independent invalidation sensors such as stale bounce, zero MFE, rising
MAE, neutral/counter OBF, or macro impulse decay;
- only promote a macro override if it survives this LTC-style case family after
opportunity-cost and tail-loss accounting.
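The conditional V7 weighting and the stale-bounce mask can be expressed as two small functions. This is a sketch, not a tuned policy. The freshness limit `max_age_s=120.0`, the discount and amplify factors, `min_agreeing` and `min_retract_bars` are all hypothetical defaults.

```python
from datetime import datetime
from typing import Optional, Tuple


def bounce_sensor_state(last_row_ts: Optional[datetime], now: datetime,
                        max_age_s: float = 120.0) -> Tuple[Optional[float], bool]:
    """(freshness_s, missing_mask). A stale or absent row is masked, not neutral."""
    if last_row_ts is None:
        return None, True
    age_s = (now - last_row_ts).total_seconds()
    return age_s, age_s > max_age_s


def v7_pressure_multiplier(*, macro_strong: bool, obf_confirms: bool, mfe_pct: float,
                           mae_rising: bool, obf_neutral_or_counter: bool,
                           bounce_missing: bool, macro_impulse_decay: bool,
                           maras_conflict_or_novelty: bool,
                           v7_retract_persistence_bars: int,
                           discount: float = 0.5, amplify: float = 1.5,
                           min_agreeing: int = 2, min_retract_bars: int = 3) -> float:
    # Discount V7 when the thesis is intact: macro strong, OBF confirms, MFE exists.
    if macro_strong and obf_confirms and mfe_pct > 0:
        return discount
    independent = sum([mfe_pct <= 0, mae_rising, obf_neutral_or_counter,
                       bounce_missing, macro_impulse_decay, maras_conflict_or_novelty])
    # Amplify only a persistent RETRACT that independent sensors confirm.
    if v7_retract_persistence_bars >= min_retract_bars and independent >= min_agreeing:
        return amplify
    return 1.0
```

Applied to the LTC review (bounce row last seen at `2026-06-03 18:42:26 UTC`, review at entry time `11:15:12 UTC` the next day), the bounce sensor is about 16.5 hours stale. That staleness, together with zero MFE, rising MAE and neutral OBF, is enough to amplify a persistent RETRACT.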
### 11.6 Learning / Computing Model
V1 should use a two-layer policy:
1. Prior/posture estimator:
- computes candidate priors from historical replay by MARAS composite hash,
MARAS label, asset, side, and contiguous time region.
- uses shrinkage: hash prior -> label prior -> global prior.
- initializes the hold target near `12` bars unless the context prior has
enough evidence to move it.
2. Online contextual bandit:
- learner: discounted LinUCB or LinTS over finite hold-bar arms.
- arms: `[4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]`.
- reward: delayed until trade close or replay terminal.
- discount/window: sliding 300 closed trades, plus faster decay when drift is
detected.
- exploration: shadow-only by default; live exploration cap starts at `0`.
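The non-contextual fallback named in the config (`discounted_ucb` with `nonstationarity: sliding_window`, `window_trades: 300`) can be sketched as sliding-window UCB1 over the hold arms. This is a swapped-in, simpler technique and is not LinUCB. It ignores context and only illustrates how the window and the bounded, delayed reward interact. The class name is hypothetical.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """UCB1 over the last `window` closed-trade rewards (non-contextual sketch)."""

    def __init__(self, arms, window: int = 300, c: float = 1.0):
        self.arms = list(arms)
        self.c = c
        self.history = deque(maxlen=window)  # (arm, reward); oldest evicted first

    def update(self, arm, reward: float) -> None:
        if arm not in self.arms:
            raise ValueError(f"unknown arm {arm}")
        # Rewards arrive at trade close and are bounded to [-1, 1].
        self.history.append((arm, max(-1.0, min(1.0, reward))))

    def select(self):
        counts = {a: 0 for a in self.arms}
        sums = {a: 0.0 for a in self.arms}
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # An arm with no reward inside the window is chosen first; in shadow
        # mode this only changes what is logged, never what is traded.
        for arm in self.arms:
            if counts[arm] == 0:
                return arm
        total = len(self.history)

        def score(arm):
            bonus = self.c * math.sqrt(2.0 * math.log(total) / counts[arm])
            return sums[arm] / counts[arm] + bonus

        return max(self.arms, key=score)
```

Because old rewards fall out of the window, an arm that stopped paying during a regime shift loses its advantage within `window` trades. This is the drift-tracking property the spec asks for.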
Recommended fallback if contextual coverage is sparse:
```text
if hash_sample_n >= 30:
prior = median_best_hold_for_hash
elif label_side_sample_n >= 100:
prior = median_best_hold_for_label_side + label_bias
else:
prior = 12
advice = guardrail_filter(contextual_bandit(prior, candidates))
```
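The same shrinkage chain as runnable Python, with the bandit step replaced by snapping to the nearest admissible arm. This is a sketch: `hold_prior` and `snap_to_candidate` are hypothetical names, and breaking ties toward the shorter hold is an assumption chosen for conservatism.

```python
from statistics import median

HOLD_CANDIDATES = (4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40)


def hold_prior(hash_best_holds: list[int], label_side_best_holds: list[int],
               label_bias: float = 0.0, hash_min_n: int = 30,
               label_min_n: int = 100, global_prior: int = 12) -> float:
    """Shrinkage chain: hash prior -> label/side prior -> global prior."""
    if len(hash_best_holds) >= hash_min_n:
        return median(hash_best_holds)
    if len(label_side_best_holds) >= label_min_n:
        return median(label_side_best_holds) + label_bias
    return global_prior


def snap_to_candidate(value: float, candidates=HOLD_CANDIDATES) -> int:
    """Nearest admissible arm; ties resolve to the shorter (tighter) hold."""
    return min(candidates, key=lambda c: (abs(c - value), c))
```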
Optional recovery model:
- Train a survival model for `extra_bars_to_recovery`.
- Use it only as a veto/adjuster until validated.
- It may increase hold only when recovery probability is high and expected
extra hold is short.
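The spec does not name a survival method. A Kaplan-Meier estimator is one standard non-parametric option and is used here as a swapped-in example. Durations are the extra bars until recovery. Trades whose tape ended without recovery are treated as censored (`event=0`) rather than dropped. The veto helper and its `min_prob` threshold are hypothetical.

```python
def kaplan_meier(durations: list[int], events: list[int]) -> list[tuple[int, float]]:
    """Return [(time, survival)] at each time where a recovery occurred.

    Survival here = probability the trade has NOT yet recovered.
    """
    pairs = sorted(zip(durations, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        recovered = censored = 0
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                recovered += 1
            else:
                censored += 1
            i += 1
        if recovered:
            surv *= 1.0 - recovered / n_at_risk
            curve.append((t, surv))
        n_at_risk -= recovered + censored  # censored trades leave after time t
    return curve


def prob_recovered_within(curve: list[tuple[int, float]], extra_bars: int) -> float:
    s = 1.0
    for t, sv in curve:
        if t > extra_bars:
            break
        s = sv
    return 1.0 - s


def allow_hold_extension(curve, extra_bars: int, min_prob: float = 0.7) -> bool:
    """Veto-style use: extend only when recovery within the extension is likely."""
    return prob_recovered_within(curve, extra_bars) >= min_prob
```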
### 11.7 Success Definition
Primary success metric:
```text
recursive_capital_curve_delta_after_opportunity_cost
```
This means the replay must account for saved capital compounding forward, and
must subtract the opportunity cost of trades that would have recovered or won
after a premature floor/ADVSL action.
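The compounding requirement is easiest to see on a toy series. This sketch assumes that each trade's notional is a fixed fraction of current capital, which is an assumed stand-in for the live sizing geometry. It also assumes that each counterfactual return is the replayed exit under the candidate policy. Under those assumptions, opportunity cost from cutting a winner early is charged automatically.

```python
def recursive_capital_curve(start_capital: float, trade_returns: list[float],
                            position_fraction: float = 1.0) -> list[float]:
    """Compound per-trade returns; notional is a fixed fraction of current capital."""
    capital = start_capital
    curve = [capital]
    for r in trade_returns:
        capital += capital * position_fraction * r  # saved/lost capital resizes later trades
        curve.append(capital)
    return curve


def capital_curve_delta(start_capital: float, baseline_returns: list[float],
                        counterfactual_returns: list[float],
                        position_fraction: float = 1.0) -> float:
    """Final-capital difference, counterfactual minus baseline."""
    if len(baseline_returns) != len(counterfactual_returns):
        raise ValueError("return series must cover the same trades")
    base = recursive_capital_curve(start_capital, baseline_returns, position_fraction)
    cf = recursive_capital_curve(start_capital, counterfactual_returns, position_fraction)
    return cf[-1] - base[-1]
```

For example, starting from `1000` with baseline returns `[0.01, -0.02, 0.03]` and a counterfactual that halves the middle loss, the delta is about `+10.40`. That is slightly more than the `+10.10` saved on the loss trade itself, because the saved capital also rides the final winner.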
Secondary metrics:
- net PnL delta
- ROI delta
- max drawdown delta
- tail-loss count and severity
- number of hard/floor cuts
- number of clipped winners
- gross saved loss
- gross missed upside
- average and median recovery lag
- average and median extra bars to recovery
- TP near-miss count, TP near-miss recovery lag, and first-touch TP hit rate
- per-hash and per-label stability
- OOD region performance
- worst contiguous-region degradation
- explicit ceiling-violation count and worst single-loss size under the tested
policy, because a "best" replay result is not acceptable if it breaches the
operator's declared loss ceiling
Promotion requires:
- positive recursive capital-curve delta on held-out contiguous regions,
- no unacceptable increase in clipped-winner opportunity cost,
- no hidden dependence on a single asset or single MARAS hash,
- improvement or neutral behavior on EFSM-flipped LONG subset,
- deterministic replay reproducibility,
- shadow logging coverage sufficient for OPE.
### 11.8 Calibration Protocol
Calibration must run in this order:
1. Full-tape replay:
- evaluate every candidate hold arm on every eligible historical trade path.
- include all available BLUE/PINK/PRODGREEN executed trade history only when
namespace semantics are kept separate.
2. Capital-aware replay:
- recursively recompute capital after each counterfactual exit.
- preserve position sizing geometry when the saved/lost capital changes the
subsequent notional.
3. Opportunity-cost audit:
- for every floor/ADVSL cut, measure whether the trade later recovered.
- record recovery lag, extra bars, and missed PnL.
4. Region validation:
- split into contiguous time regions with enough trades.
- repeat with moving/randomized boundaries.
- report median/best hold per region.
5. MARAS proximity validation:
- group by composite hash when sample size is enough.
- otherwise use nearest-neighbor distance over MARAS raw signature fields.
- report whether per-hash/per-neighbor priors outperform global 12-bar center.
6. OBF validation:
- condition the optimum hold on `obf_depth_1pct_usd`, `obf_depth_quality`,
spread, and imbalance.
- test on OOD time slices; do not promote an OBF rule from in-sample fit only.
7. TP near-miss validation:
- include trades that nearly touched candidate TP but missed on the observed
cadence.
- compute first-touch labels from the highest-resolution available path.
- isolate the opportunity cost of late reversal after near-touch.
- compare the resulting TP bucket against the profitable-close-only sample.
8. Walk-forward:
- train on region N, validate on N+1.
- repeat across the full history.
- freeze the learner if the current best policy degrades versus baseline.
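The region split and the train-on-N / validate-on-N+1 loop can be sketched as follows. This is a minimal sketch that assumes per-trade rewards are already computed for every arm, in time order. Picking the arm by summed training reward is an illustrative choice. A fold whose validation delta is negative is flagged for freezing, per the rule above.

```python
def split_contiguous(n_trades: int, n_regions: int) -> list[tuple[int, int]]:
    """Half-open [start, end) index ranges over time-ordered trades."""
    if n_regions < 2 or n_trades < n_regions:
        raise ValueError("need at least 2 regions and 1 trade per region")
    edges = [(i * n_trades) // n_regions for i in range(n_regions + 1)]
    return [(edges[i], edges[i + 1]) for i in range(n_regions)]


def walk_forward(rewards: dict, regions: list[tuple[int, int]], baseline_arm) -> list[dict]:
    """Train on region N (pick best arm), validate on N+1 against the baseline.

    rewards maps arm -> per-trade reward list, all in the same trade order.
    """
    folds = []
    for (t0, t1), (v0, v1) in zip(regions, regions[1:]):
        best = max(rewards, key=lambda arm: sum(rewards[arm][t0:t1]))
        delta = sum(rewards[best][v0:v1]) - sum(rewards[baseline_arm][v0:v1])
        folds.append({"train": (t0, t1), "best_arm": best,
                      "val_delta": delta, "freeze": delta < 0})
    return folds
```

Moving or randomized boundaries can reuse the same loop by shifting `edges` before building the regions.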
### 11.9 Advice Payload
Example advice:
```json
{
"schema": "vibriss.param_set_advice.v1",
"namespace": "blue",
"param_set_id": "advsl.hold_substitute.v1",
"spec_version": "1.0.0",
"trade_scope": "on_entry",
"baseline_reference": 20,
"current_live_overlay_reference": 6,
"recommended": {
"advsl.min_hold_bars_before_floor_arm": 12,
"advsl.recovery_extension_max_bars": 0
},
"candidate_set": [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40],
"confidence": 0.74,
"context": {
"asset": "XLMUSDT",
"side": "LONG",
"maras_composite_hash": 57957,
"maras_regime": "CHOPPY_BEARISH",
"obf_depth_quality_bucket": "weak",
"v7_pressure_bucket": "high"
},
"guardrail_status": "SHADOW_ONLY",
"fallback_value": 12,
"expires_at": "2026-06-03T00:05:00Z"
}
```
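A consumer of this payload should fall back whenever the advice is doubtful. The sketch below shows a minimal read path that does so. The reason strings, the `live_enabled` switch and the non-shadow status used in the example are hypothetical. `now` must be timezone-aware, because `expires_at` carries a UTC offset.

```python
import json
from datetime import datetime


def consume_hold_advice(payload: str, now: datetime, live_enabled: bool = False,
                        param: str = "advsl.min_hold_bars_before_floor_arm"):
    """Return (value_to_use, reason). Anything doubtful falls back."""
    advice = json.loads(payload)
    fallback = advice["fallback_value"]
    if advice.get("schema") != "vibriss.param_set_advice.v1":
        return fallback, "UNKNOWN_SCHEMA"
    # "Z" suffix normalized for Python versions whose fromisoformat rejects it.
    expires = datetime.fromisoformat(advice["expires_at"].replace("Z", "+00:00"))
    if now >= expires:
        return fallback, "EXPIRED"
    value = advice["recommended"].get(param)
    if value not in advice["candidate_set"]:
        return fallback, "OUT_OF_DOMAIN"
    if advice["guardrail_status"] == "SHADOW_ONLY" or not live_enabled:
        return fallback, "SHADOW_LOGGED"
    return value, "APPLIED"
```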
### 11.10 Guardrails
Mandatory guardrails:
- Shadow-only until walk-forward validation is positive.
- No live exploration by default.
- Do not allow the learner to disable catastrophic floors.
- If OBF is stale, ignore OBF-derived hold extension.
- If MARAS confidence is low or conflict is high, shrink toward global prior.
- If context is EFSM-flipped LONG and LONG sample count is sparse, use the
tighter safe prior, not a broad SHORT-derived prior.
- If the recommended hold would increase worst-case open loss beyond the active
floor/cap, the floor/cap wins.
- If capital drawdown alarm is active, freeze to deterministic safe baseline.
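The guardrails can be applied in a fixed order, with the drawdown freeze first and the floor/cap last so that it always wins. This is a sketch under stated assumptions. The confidence and conflict cutoffs, `min_long_n=50` and the EFSM LONG prior of `6` are hypothetical; `6` is the low end of the documented 6-to-12 band. `max_hold_within_floor` stands in for the longest hold that keeps worst-case open loss inside the active floor/cap.

```python
from dataclasses import dataclass

HOLD_CANDIDATES = (4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40)


@dataclass(frozen=True)
class GuardrailContext:
    obf_stale: bool = False
    maras_confidence: float = 1.0
    maras_conflict: float = 0.0
    efsm_long: bool = False
    long_sample_n: int = 0
    drawdown_alarm: bool = False
    max_hold_within_floor: int = 40  # hypothetical floor/cap-derived ceiling


def apply_guardrails(base_hold: float, obf_extension: float, ctx: GuardrailContext, *,
                     global_prior: int = 12, efsm_long_prior: int = 6,
                     min_confidence: float = 0.5, max_conflict: float = 0.5,
                     min_long_n: int = 50) -> int:
    if ctx.drawdown_alarm:
        return global_prior  # freeze to deterministic safe baseline
    hold = base_hold + (0.0 if ctx.obf_stale else obf_extension)  # stale OBF: no extension
    if ctx.maras_confidence < min_confidence or ctx.maras_conflict > max_conflict:
        w = ctx.maras_confidence * (1.0 - ctx.maras_conflict)
        hold = global_prior + w * (hold - global_prior)  # shrink toward global prior
    if ctx.efsm_long and ctx.long_sample_n < min_long_n:
        hold = min(hold, efsm_long_prior)  # never borrow a broad SHORT prior
    snapped = min(HOLD_CANDIDATES, key=lambda c: (abs(c - hold), c))
    if snapped > ctx.max_hold_within_floor:  # the floor/cap always wins
        allowed = [c for c in HOLD_CANDIDATES if c <= ctx.max_hold_within_floor]
        snapped = max(allowed) if allowed else 0
    return snapped
```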
### 11.11 Starting Priors From Current Research
Current replay-derived starting posture:
| Context | Starting prior | Rationale |
|---|---:|---|
| Global ADVSL hold substitute | `12` bars | Best current center for reducing 20-bar tail slips without assuming all contexts need long waits. |
| Legacy baseline comparison | `20` bars | Historical no-arm/min-hold reference. |
| Tight overlay reference | `6` bars | Current live overlay guardrail reference, not the general learned policy. |
| Recovery/snapback pockets | `24` to `40` bars | Some contiguous-region medians were materially longer; keep as candidates, not defaults. |
| Sparse/unknown context | `12` bars | Conservative research center with shrinkage. |
| EFSM-flipped LONG sparse context | `6` to `12` bars | Do not borrow broad SHORT recovery priors blindly. |
Known caution:
- A `$400` hard cap improved one capital-aware slice by about `+$592.83` versus
the 12-bar-only replay, but generated a gross forgone-upside bucket around
`+$6,617.30` on hard-cap hits. Therefore max-loss floors must be evaluated
with opportunity cost and recovery lag, not judged by saved-loss totals alone.
### 11.12 Promotion Policy
Promotion is part of this ParamSet, not a global runner decision.
```yaml
promotion_policy:
owner: advsl.hold_substitute.v1
technique: replay_shadow_canary
baseline_policy:
legacy_reference: 20
current_overlay_reference: 6
fallback_value: 12
cadence:
replay_calibration: every_6h_or_50_new_rewards
promotion_review: every_30m
checkpoint_review: every_60s
live_replacement_rhythm: at_trade_entry_only
evidence_gates:
shadow_to_advisory:
min_replay_trades: 300
min_contiguous_regions: 4
recursive_capital_curve_delta_after_cost: "> 0"
worst_region_delta: ">= -0.10 * positive_total_delta"
clipped_winner_cost_budget: "documented_and_bounded"
advisory_to_canary_live:
min_shadow_decisions: 200
min_closed_trade_rewards: 50
min_days_observed: 3
no_unexplained_tail_loss_cluster: true
manual_approval_required: true
canary_live_to_controlled_live:
min_live_consumed_trades: 50
live_vs_shadow_regret: "<= 0"
no_guardrail_violation: true
manual_approval_required: true
canary_scope:
namespaces: [blue]
max_paramsets_live: 1
max_live_exploration_rate: 0.0
allow_only_capture_on_entry: true
automatic_demotion:
- stale_obf_or_maras_required_context
- reward_backlog_critical
- drawdown_alarm
- candidate_underperforms_baseline_in_shadow
- checkpoint_hash_mismatch
```
Interpretation:
- `replay_calibration` answers how often the ParamSet re-estimates candidate
quality from historical/newly closed data.
- `promotion_review` answers how often the ParamSet is checked for stronger
mode eligibility.
- `live_replacement_rhythm` answers when the engine may replace the old
parameter with the VIBRISS value. For this ParamSet it is only at trade entry.
- The runner executes this contract. It does not invent promotion thresholds.
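The `shadow_to_advisory` gate can be evaluated mechanically from per-region replay deltas. This sketch assumes that the region deltas sum to the recursive capital-curve delta after cost. It reads `positive_total_delta` as that total whenever it is positive; the worst-region check is skipped when the total is not positive, since that gate has already failed.

```python
def shadow_to_advisory_gate(region_deltas: list[float], n_replay_trades: int,
                            clipped_winner_budget_documented: bool,
                            min_replay_trades: int = 300,
                            min_contiguous_regions: int = 4) -> tuple[bool, list[str]]:
    """Evaluate the shadow_to_advisory gates; returns (passed, failed_gate_names)."""
    failed = []
    total = sum(region_deltas)
    if n_replay_trades < min_replay_trades:
        failed.append("min_replay_trades")
    if len(region_deltas) < min_contiguous_regions:
        failed.append("min_contiguous_regions")
    if total <= 0:
        failed.append("recursive_capital_curve_delta_after_cost")
    elif min(region_deltas) < -0.10 * total:
        failed.append("worst_region_delta")
    if not clipped_winner_budget_documented:
        failed.append("clipped_winner_cost_budget")
    return not failed, failed
```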
### 11.13 Meta-Cadence Policy
The cadence parameters are themselves governed by this ParamSet. They are not
free-floating daemon settings.
```yaml
meta_cadence_policy:
  owner: advsl.hold_substitute.v1
  status: shadow_first
  learner: discounted_ucb_then_linucb
  tunable_cadences:
    replay_calibration_interval_s:
      baseline: 21600
      candidates: [1800, 3600, 10800, 21600, 43200]
    promotion_review_interval_s:
      baseline: 1800
      candidates: [900, 1800, 3600, 7200]
    checkpoint_interval_s:
      baseline: 60
      candidates: [30, 60, 120, 300]
    min_new_rewards_before_recalibration:
      baseline: 50
      candidates: [10, 25, 50, 100]
    shadow_to_canary_cooldown_trades:
      baseline: 100
      candidates: [25, 50, 100, 200]
  context_inputs:
    maras:
      - maras_composite_hash
      - maras_confidence
      - maras_conflict_level
      - maras_nearest_distance
    exof:
      - exf_latest
      - btc_regime_features
      - market_volatility_context
    esof:
      - session_bucket
      - day_of_week
      - calendar_event_flags
    ops:
      - reward_backlog_age_s
      - ch_write_failure_rate
      - artifact_disk_free_gb
      - drawdown_state
  reward_mapping:
    positive:
      - faster_detection_of_degraded_hold_policy
      - lower_stale_advice_rate
      - lower_missed_adaptation_cost
    negative:
      - promotion_false_positive
      - noisy_recalibration_churn
      - excessive_compute_or_backlog
      - operator_churn
  live_change_policy:
    replay_calibration_interval_s: controlled_after_shadow
    promotion_review_interval_s: advisory_only_until_manual_approval
    checkpoint_interval_s: fixed_by_ops_until_runner_load_tested
    shadow_to_canary_cooldown_trades: advisory_only
```
This makes MARAS, ExoF, and EsoF eligible context for cadence advice. For
example, VIBRISS may learn that high MARAS novelty plus hostile ExoF context
requires faster recalibration review, while ordinary stable regimes can use a
slower cadence to avoid overreacting.
Cadence testing is permitted, but first in shadow:
- log what cadence would have been chosen;
- replay whether that cadence would have detected degradation sooner;
- charge compute/backlog cost;
- charge false-promotion cost;
- compare against fixed-cadence baseline.
Only after the meta-cadence policy beats fixed cadence in walk-forward replay
and shadow operation may it control any real scheduler interval.
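
The `learner: discounted_ucb_then_linucb` entry can be illustrated with its discounted-UCB half only. This is a minimal sketch. It assumes each cadence candidate is an arm and each shadow evaluation yields one scalar reward already mapped through `reward_mapping`. The class name and the `gamma` and `c` values are hypothetical. The later LinUCB stage, which would condition on MARAS/ExoF/EsoF context, is not shown. Discounting old counts lets the learner forget stale regimes, which is the drift-detection property this section asks for.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DiscountedUCB:
    arms: list                # e.g. candidate replay_calibration_interval_s values
    gamma: float = 0.98       # per-update discount on past evidence (hypothetical)
    c: float = 0.5            # exploration bonus scale (hypothetical)
    n: dict = field(default_factory=dict)  # discounted pull counts per arm
    s: dict = field(default_factory=dict)  # discounted reward sums per arm

    def __post_init__(self):
        for a in self.arms:
            self.n.setdefault(a, 0.0)
            self.s.setdefault(a, 0.0)

    def select(self):
        # Try every arm once before trusting any estimate.
        for a in self.arms:
            if self.n[a] == 0.0:
                return a
        total = sum(self.n.values())

        def index(a):
            mean = self.s[a] / self.n[a]
            bonus = self.c * math.sqrt(max(math.log(total), 0.0) / self.n[a])
            return mean + bonus

        return max(self.arms, key=index)

    def update(self, arm, reward):
        # Discount all arms, then credit the chosen one.
        for a in self.arms:
            self.n[a] *= self.gamma
            self.s[a] *= self.gamma
        self.n[arm] += 1.0
        self.s[arm] += reward
```

In shadow, `select()` only determines which cadence is logged to `vibriss_meta_cadence_decisions`; the live scheduler keeps the fixed baseline until the walk-forward comparison passes.
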
### 11.14 Catastrophic Floor Derivation Study
The floor percentage is now a dedicated shadow-only VIBRISS research target.
```yaml
param_set:
id: advsl.catastrophic_floor_derivation.v1
name: ADVSL Catastrophic Floor Derivation
status: shadow_first
success:
primary_metric: recursive_capital_curve_delta_after_opportunity_cost
artifact_kinds: [code, test, spec]
artifact_refs:
- prod/vibriss/floor_derivation.py
- prod/vibriss/test_floor_derivation.py
- prod/docs/ADVSL_CATASTROPHIC_FLOOR_DERIVATION_STUDY.md
- prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md
```
Current full-tape replay on the blue trade tape:
- replayable trades: `802`
- actual end capital: `$51,937.21`
- floor-only best aggregate candidate: `1.50%`
- floor-only per-regime averages: still centered at `0.50%`
Interpretation:
- this study does **not** validate `1.20%` as a universal standalone floor;
- it validates the need for a derivation path and the ability to bind the
floor to code/test/spec evidence;
- `1.20%` remains a coupled-policy prior for the broader ADVSL/TP/hold stack,
not a floor-only truth.
The floor-only study must remain shadow-only. Live use may only follow a
coupled policy that demonstrates positive recursive capital curve delta on
held-out contiguous regions.
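
A floor-only sweep of the kind summarized above reduces to compounding each trade's return forward under each candidate floor. This minimal sketch assumes each trade is reduced to its maximum adverse excursion and its natural-exit return, and that a trade touching the floor fills exactly at `-floor_pct`. Real replay must use the tape path, bar cadence, and fees. `TradePath`, `replay_floor`, and `sweep_floors` are hypothetical names, not the API of `prod/vibriss/floor_derivation.py`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TradePath:
    mae_pct: float    # worst adverse excursion during the trade, percent (positive)
    final_pct: float  # return at the trade's natural (baseline) exit, percent


def replay_floor(trades, floor_pct, start_capital=1.0):
    """Compound capital forward under a hypothetical catastrophic floor.

    A trade whose adverse excursion reaches the floor is cut at -floor_pct.
    If its natural exit would have been better, the give-up is recorded as
    opportunity cost (it is already reflected in the compounded capital).
    """
    capital = start_capital
    opportunity_cost = 0.0
    for t in trades:
        if t.mae_pct >= floor_pct:
            ret = -floor_pct
            if t.final_pct > ret:
                opportunity_cost += capital * (t.final_pct - ret) / 100.0
        else:
            ret = t.final_pct
        capital *= 1.0 + ret / 100.0
    return capital, opportunity_cost


def sweep_floors(trades, candidates, start_capital=1.0):
    """Recursive capital delta of each floor candidate versus no floor at all."""
    baseline, _ = replay_floor(trades, math.inf, start_capital)
    return {f: replay_floor(trades, f, start_capital)[0] - baseline for f in candidates}
```

Because returns compound, each cut is charged against the capital actually at risk at that point in the tape, which a summed per-trade PnL would miss.
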
### 11.15 Acceptance Tests
Minimum tests before implementation can be called complete:
- Given a fixed replay window, the same hold recommendation and reward are
reproduced bit-for-bit or within declared float tolerance.
- Candidate arms outside the hard range are rejected.
- Stale OBF creates a masked feature, not a fake zero-depth observation.
- Low MARAS confidence or high conflict shrinks advice toward the global prior.
- EFSM-flipped LONG contexts do not use unqualified SHORT-only priors.
- Capital-aware replay compounds saved/lost capital forward.
- Opportunity cost is charged when a cut trade later recovers.
- The shadow advice payload contains candidate set, chosen arm, confidence,
baseline, guardrail result, and reproducibility keys.
- Promotion decisions are rejected when the ParamSet omits `promotion_policy`.
- Meta-cadence advice is logged as a ParamSet decision, not a runner-local
heuristic.
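
Several of these acceptance tests translate directly into small, pure helpers plus pytest-style test functions. This is a minimal sketch under assumed names. The 10-second OBF staleness limit and the linear `confidence * (1 - conflict)` shrinkage weight are hypothetical choices, not values taken from the ParamSet spec.

```python
from dataclasses import dataclass
from typing import Optional

OBF_MAX_AGE_S = 10.0  # hypothetical staleness limit


def reject_out_of_range(candidates, hard_min, hard_max):
    """Split candidate arms into (accepted, rejected) against the hard range."""
    accepted = [c for c in candidates if hard_min <= c <= hard_max]
    rejected = [c for c in candidates if not hard_min <= c <= hard_max]
    return accepted, rejected


@dataclass(frozen=True)
class MaskedFeature:
    value: Optional[float]  # None when masked; never a fabricated 0.0
    masked: bool
    reason: str = ""


def obf_depth_feature(depth_usd, age_s, max_age_s=OBF_MAX_AGE_S):
    """Return stale or missing OBF depth as a masked feature, not zero depth."""
    if depth_usd is None or age_s > max_age_s:
        return MaskedFeature(value=None, masked=True, reason="obf_stale")
    return MaskedFeature(value=float(depth_usd), masked=False)


def shrink_to_prior(local_advice, global_prior, maras_confidence, maras_conflict):
    """Blend toward the global prior as MARAS confidence falls or conflict rises."""
    weight = max(0.0, min(1.0, maras_confidence * (1.0 - maras_conflict)))
    return weight * local_advice + (1.0 - weight) * global_prior


def test_out_of_range_arms_are_rejected():
    accepted, rejected = reject_out_of_range([4, 6, 12, 20, 40], hard_min=6, hard_max=20)
    assert accepted == [6, 12, 20]
    assert rejected == [4, 40]


def test_stale_obf_is_masked_not_zero():
    feature = obf_depth_feature(depth_usd=0.0, age_s=45.0)
    assert feature.masked and feature.value is None


def test_low_maras_confidence_shrinks_toward_prior():
    local, prior = 6.0, 12.0
    confident = shrink_to_prior(local, prior, maras_confidence=0.9, maras_conflict=0.0)
    unsure = shrink_to_prior(local, prior, maras_confidence=0.2, maras_conflict=0.5)
    assert abs(unsure - prior) < abs(confident - prior)
```
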
## 12. VIBRISS Ops / Runner System
### 12.1 Operational Objective
VIBRISS must run as an observable production subsystem, not as an ad hoc
notebook or one-off replay script.
The runner is responsible for:
- loading parameter specs and ParamSet specs,
- ingesting live context from Hazelcast and historical context from ClickHouse,
- publishing shadow/advisory parameter postures,
- scheduling replay/calibration subtasks,
- writing full audit logs,
- exposing health sensors to MHS,
- feeding TUI/observability surfaces,
- checkpointing learner state so recommendations are reproducible after restart.
The runner must reuse the existing infrastructure pattern:
- supervisord is the process authority;
- Hazelcast is the live bus;
- ClickHouse is the audit/event store;
- NATS is the optional event transport for replay, reward, and policy-state
fanout when decoupled workers or durable queues are useful;
- MHS reads composite health from HZ and reports it in `DOLPHIN_META_HEALTH`;
- TUI observes primarily through HZ listeners and polls CH only for heavier
historical panels;
- Prefect is optional for scheduled offline jobs, not required for the hot
VIBRISS daemon.
### 12.2 Process Topology
VIBRISS should be containerized, but still owned by supervisord.
In the current production layout, the host supervisord owns only the
container bootstrap wrapper; the container itself runs its own supervisord
instance, which owns the live runner process. That makes later full-system
containerization easier without changing the runner contract.
If sandboxing is enabled, gVisor is the outer runtime boundary for the
container or worker container. VIBRISS does not instantiate or manage gVisor
from inside the container; the host/container runtime selects that boundary at
launch time. The containerized runner must still reach host Hazelcast and
ClickHouse over the configured backplane. If NATS is enabled, it runs as a
sibling stack service on the host backplane and the container talks to it over
`nats://localhost:4222`.
Recommended process shape:
```text
supervisord
  -> vibriss_runner container
       -> live advice loop
       -> spec loader
       -> health publisher
       -> lightweight replay scheduler
       -> learner checkpoint writer
  -> optional vibriss_worker container(s)
       -> full-tape replay
       -> walk-forward validation
       -> OBF/MARAS proximity calibration
       -> offline policy evaluation
```
The live runner is a long-lived daemon. Heavy replay/calibration jobs are
separate subtasks so the live advice loop cannot be blocked by ML work.
The experiment-side harness that replays trade episodes, sweep ranges, and
walk-forward windows is specified separately in
[`VIBRASS_EXPERIMENT_RUNNER_SPEC.md`](VIBRASS_EXPERIMENT_RUNNER_SPEC.md).
Container runtime:
- Docker or Podman is acceptable.
- Prefer Podman if rootless isolation becomes important.
- Optional sandbox runtime: gVisor may wrap the launched container or worker
container, but it is selected outside VIBRISS by the host/container runtime.
VIBRISS must not attempt to manage the sandbox boundary from inside the
container.
- Do not put Hazelcast in the VIBRISS container.
- Do not restart Hazelcast as part of VIBRISS recovery.
- Mount large replay outputs to `/mnt/dolphin_training/vibriss/`, not the SMB
repo path.
- Write only small docs/specs to `/mnt/dolphinng5_predict/prod/docs/`.
### 12.3 Supervisor Contract
Recommended supervisord entries:
```ini
[program:vibriss_runner]
command=/usr/bin/podman run --rm --name dolphin-vibriss-runner
    --network host
    -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro
    -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw
    -v /mnt/ng6_data:/mnt/ng6_data:ro
    -e HZ_HOST=localhost:5701
    -e CH_URL=http://localhost:8123/
    -e CH_DB=dolphin
    dolphin-vibriss:latest
    python -m vibriss.runner --mode shadow
directory=/mnt/dolphinng5_predict/prod
autostart=true
autorestart=true
startsecs=10
startretries=5
stopwaitsecs=20
stopasgroup=true
killasgroup=true
stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner.log
stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner-error.log

[program:vibriss_worker]
command=/usr/bin/podman run --rm --name dolphin-vibriss-worker
    --network host
    -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro
    -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw
    -v /mnt/ng6_data:/mnt/ng6_data:ro
    dolphin-vibriss:latest
    python -m vibriss.worker --idle
directory=/mnt/dolphinng5_predict/prod
autostart=false
autorestart=false
startsecs=0
stopwaitsecs=30
stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker.log
stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker-error.log
```
Group placement:
```ini
[group:dolphin_data]
programs=exf_fetcher,acb_processor,obf_universe,meta_health,system_stats,
    esof_advisor,maras_service,vibriss_runner
```
Rationale:
- VIBRISS is data/control-plane infrastructure, not the trader itself.
- The runner can be autostarted because it begins shadow-only.
- Workers remain manual or scheduler-launched because full replay can be heavy.
- MHS must observe VIBRISS health, but must not fight the container runtime
through systemd.
### 12.4 Container Interface
Required environment variables:
| Env | Meaning |
|---|---|
| `HZ_HOST` | Hazelcast host/port, default `localhost:5701`. |
| `CH_URL` | ClickHouse HTTP URL. |
| `CH_DB` | Namespace DB: `dolphin`, `dolphin_prodgreen`, or PINK-specific DB. |
| `CH_USER` / `CH_PASS` | ClickHouse credentials. |
| `NATS_URL` | Optional NATS server URL, default `nats://localhost:4222`. |
| `VIBRISS_ENABLE_NATS_TRANSPORT` | Enable best-effort NATS publication. |
| `VIBRISS_NATS_SUBJECT_PREFIX` | Subject prefix, default `vibriss`. |
| `VIBRISS_MODE` | `shadow`, `advisory`, `canary`, or `disabled`. |
| `VIBRISS_NAMESPACE` | `blue`, `pink`, `prodgreen`, or `research`. |
| `VIBRISS_SPEC_DIR` | Param spec directory. |
| `VIBRISS_STATE_DIR` | Checkpoint/output directory. |
| `VIBRISS_ENABLE_LIVE_ACTUATION` | Must default to `0`. |
| `VIBRISS_CALIBRATION_INTERVAL_S` | Default replay/calibration scheduler interval. |
| `VIBRISS_PROMOTION_REVIEW_INTERVAL_S` | Default promotion-gate review interval. |
| `VIBRISS_META_CADENCE_MODE` | `fixed`, `shadow`, or `controlled`; defaults to `fixed`. |
| `VIBRISS_MHS_SENSOR_KEY` | Default `vibriss_sensors_blue`. |
| `VIBRISS_HEALTH_INTERVAL_S` | Default `5`. |
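
A minimal sketch of environment parsing with fail-safe defaults, assuming a plain `os.environ` lookup. Three choices here are assumptions for illustration: the default mode of `shadow`, the per-namespace MHS key pattern, and the rule that live actuation is accepted only together with `VIBRISS_MODE=canary`. The table itself only fixes that `VIBRISS_ENABLE_LIVE_ACTUATION` defaults to `0`.

```python
import os
from dataclasses import dataclass

VALID_MODES = {"shadow", "advisory", "canary", "disabled"}
VALID_NAMESPACES = {"blue", "pink", "prodgreen", "research"}


@dataclass(frozen=True)
class RunnerConfig:
    hz_host: str
    ch_url: str
    ch_db: str
    mode: str
    namespace: str
    enable_live_actuation: bool
    enable_nats: bool
    nats_url: str
    mhs_sensor_key: str
    health_interval_s: float


def _flag(env, name):
    # Only an explicit "1" enables a flag; unset, "true", or typos stay off.
    return env.get(name, "0").strip() == "1"


def load_config(env=None):
    env = os.environ if env is None else env
    mode = env.get("VIBRISS_MODE", "shadow")              # assumed default
    namespace = env.get("VIBRISS_NAMESPACE", "blue")      # assumed default
    if mode not in VALID_MODES:
        raise ValueError(f"VIBRISS_MODE={mode!r} not in {sorted(VALID_MODES)}")
    if namespace not in VALID_NAMESPACES:
        raise ValueError(f"VIBRISS_NAMESPACE={namespace!r} not in {sorted(VALID_NAMESPACES)}")
    live = _flag(env, "VIBRISS_ENABLE_LIVE_ACTUATION")
    if live and mode != "canary":
        # Assumed rule: refuse a live flag outside canary mode.
        raise ValueError("VIBRISS_ENABLE_LIVE_ACTUATION=1 requires VIBRISS_MODE=canary")
    return RunnerConfig(
        hz_host=env.get("HZ_HOST", "localhost:5701"),
        ch_url=env.get("CH_URL", "http://localhost:8123/"),
        ch_db=env.get("CH_DB", "dolphin"),
        mode=mode,
        namespace=namespace,
        enable_live_actuation=live,
        enable_nats=_flag(env, "VIBRISS_ENABLE_NATS_TRANSPORT"),
        nats_url=env.get("NATS_URL", "nats://localhost:4222"),
        mhs_sensor_key=env.get("VIBRISS_MHS_SENSOR_KEY", f"vibriss_sensors_{namespace}"),
        health_interval_s=float(env.get("VIBRISS_HEALTH_INTERVAL_S", "5")),
    )
```

Treating anything other than an explicit `1` as off means a value such as `VIBRISS_ENABLE_LIVE_ACTUATION=true` fails closed.
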
Filesystem contract:
| Path | Mode | Use |
|---|---|---|
| `/mnt/dolphinng5_predict` | read-only in container | Code/spec/doc access. |
| `/mnt/dolphin_training/vibriss` | read-write | Learner state, replay artifacts, reports. |
| `/mnt/ng6_data` | read-only | Tape, OBF, scan data. |
| `/tmp` inside container | read-write ephemeral | Small temporary files only. |
### 12.5 Internal Runner Loops
The runner should have separate loops with independent health status:
| Loop | Cadence | Responsibility |
|---|---:|---|
| `spec_loader` | startup + 60s | Load/validate ParamSpec and ParamSetSpec files. |
| `context_ingestor` | 0.5s to 5s | Read HZ live context and keep a point-in-time snapshot. |
| `advice_loop` | on context/trade event | Score candidates and publish shadow/advisory advice. |
| `reward_collector` | 10s to 60s | Join closed trades to advice and write delayed rewards. |
| `checkpoint_loop` | 60s | Persist learner state and model metadata. |
| `calibration_scheduler` | 5m+ | Queue replay/validation subtasks when new data warrants it. |
| `promotion_evaluator` | 15m+ | Evaluate whether a ParamSet may move to a stronger mode. |
| `meta_cadence_evaluator` | 15m+ | Shadow-test cadence settings for calibration/promotion/update loops. |
| `health_publisher` | 5s | Publish MHS-compatible sensor payload. |
The advice loop must never wait on full replay, model training, or ClickHouse
backfill. If ClickHouse is slow, the advice loop may continue from the latest
checkpoint while the runner marks reward collection as degraded.
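
The isolation rule above can be illustrated with `asyncio`: the advice loop keeps ticking while a blocking replay job runs in a worker thread. This is a minimal sketch with hypothetical names. In the recommended topology, heavy replay runs in the separate `vibriss_worker` container, and the thread executor here only stands in for that boundary.

```python
import asyncio
import time


class LoopHealth:
    """Per-loop tick counters and timestamps, read by the health publisher."""

    def __init__(self):
        self.last_tick = {}
        self.ticks = {}

    def tick(self, name):
        self.last_tick[name] = time.monotonic()
        self.ticks[name] = self.ticks.get(name, 0) + 1


def blocking_replay(seconds):
    # Stand-in for a heavy, blocking replay/calibration job.
    time.sleep(seconds)
    return "replay-done"


async def advice_loop(health, interval_s):
    while True:
        # Score candidates from the latest checkpoint; never awaits replay.
        health.tick("advice_loop")
        await asyncio.sleep(interval_s)


async def calibration_scheduler(health, replay_s):
    loop = asyncio.get_running_loop()
    # Blocking work goes to a thread so the event loop stays responsive.
    result = await loop.run_in_executor(None, blocking_replay, replay_s)
    health.tick("calibration_scheduler")
    return result


async def run_for(duration_s, advice_interval_s=0.01, replay_s=0.2):
    health = LoopHealth()
    advice = asyncio.create_task(advice_loop(health, advice_interval_s))
    calib = asyncio.create_task(calibration_scheduler(health, replay_s))
    await asyncio.sleep(duration_s)
    result = await calib
    advice.cancel()
    try:
        await advice
    except asyncio.CancelledError:
        pass
    return health, result
```
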
### 12.6 Hazelcast Surfaces
Recommended HZ maps/keys:
| Map | Key | Producer | Consumer | Purpose |
|---|---|---|---|---|
| `DOLPHIN_FEATURES` | `vibriss_param_advice` | runner | BLUE/PINK/TUI | Latest general parameter advice. |
| `DOLPHIN_FEATURES` | `vibriss_hold_substitute_advice` | runner | ADVSL/TUI | Latest ADVSL hold-substitute advice. |
| `DOLPHIN_FEATURES` | `vibriss_latest` | runner | TUI/MHS/manual ops | Compact subsystem summary. |
| `DOLPHIN_META_HEALTH` | `vibriss_sensors_blue` | runner | MHS | BLUE VIBRISS sensor payload. |
| `DOLPHIN_META_HEALTH` | `vibriss_sensors_pink` | runner | MHS | PINK VIBRISS sensor payload. |
| `DOLPHIN_HEARTBEAT` | `vibriss_runner_heartbeat` | runner | MHS/TUI | Liveness heartbeat. |
| `DOLPHIN_CONTROL_PLANE` | `vibriss_commands` | ops/TUI | runner | Freeze, unfreeze, replay, reload specs. |
Advice remains separate from commands. An advice key tells the engine what
VIBRISS recommends; a command key tells VIBRISS what operators want it to do.
### 12.7 ClickHouse Tables
VIBRISS needs durable audit tables. Recommended tables:
| Table | Purpose |
|---|---|
| `dolphin.vibriss_decisions` | One row per candidate-scoring decision. |
| `dolphin.vibriss_rewards` | Delayed realized/counterfactual reward rows. |
| `dolphin.vibriss_policy_state` | Checkpoint metadata and active posture versions. |
| `dolphin.vibriss_paramset_status` | Per-ParamSet health/performance summary. |
| `dolphin.vibriss_subtasks` | Replay/calibration/ML subtask lifecycle. |
Minimum `vibriss_decisions` fields:
```sql
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
mode LowCardinality(String),
param_set_id LowCardinality(String),
spec_version String,
decision_id String,
trade_id String,
asset LowCardinality(String),
side LowCardinality(String),
scan_number UInt64,
context_hash String,
maras_composite_hash UInt16,
maras_regime LowCardinality(String),
candidate_set_json String,
chosen_arm String,
baseline_value String,
recommended_value String,
confidence Float32,
propensity Float32,
guardrail_status LowCardinality(String),
fallback_reason String,
model_version String,
payload_json String
```
Minimum `vibriss_rewards` fields:
```sql
ts DateTime64(6, 'UTC'),
decision_id String,
trade_id String,
reward_status LowCardinality(String),
raw_actual_pnl Float64,
raw_counterfactual_pnl Float64,
saved_loss_delta Float64,
clipped_winner_delta Float64,
capital_curve_delta Float64,
drawdown_delta Float64,
recovery_lag_s Float32,
extra_bars_to_recovery Float32,
normalized_reward Float32,
reward_components_json String
```
Subtask rows must include `subtask_id`, `param_set_id`, `kind`, `status`,
`started_at`, `finished_at`, `input_window`, `artifact_path`, `n_trades`,
`primary_metric`, `failure_reason`, and `parent_decision_id` when applicable.
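
Reward and decision rows can be written through the ClickHouse HTTP interface at `CH_URL` as `INSERT ... FORMAT JSONEachRow`, with credentials passed in the `X-ClickHouse-User` and `X-ClickHouse-Key` headers. This minimal sketch only builds the request with the standard library, and the helper name is hypothetical. Sending it, for example with `urllib.request.urlopen(req, timeout=5)`, belongs in the reward collector, never in the advice loop.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(ch_url, table, rows, user=None, password=None):
    """Build (but do not send) a ClickHouse HTTP INSERT in JSONEachRow format."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = ch_url.rstrip("/") + "/?" + urllib.parse.urlencode({"query": query})
    # One JSON object per line; keys must match the table's column names.
    body = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows).encode("utf-8")
    headers = {}
    if user is not None:
        headers["X-ClickHouse-User"] = user
        headers["X-ClickHouse-Key"] = password or ""
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```
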
### 12.8 MHS Sensor Contract
VIBRISS should expose an MHS-compatible composite payload, modeled after the
existing optional DITA sensor pattern.
Recommended HZ key:
```text
DOLPHIN_META_HEALTH["vibriss_sensors_blue"]
```
Payload:
```json
{
"schema": "vibriss.mhs_sensors.v1",
"namespace": "blue",
"ts": "2026-06-03T00:00:00Z",
"rm_meta": 0.93,
"status": "GREEN",
"m14_vibriss_runner_liveness": 1.0,
"m15_vibriss_spec_integrity": 1.0,
"m16_vibriss_data_freshness": 0.9,
"m17_vibriss_advice_integrity": 1.0,
"m18_vibriss_reward_backlog": 0.85,
"m19_vibriss_paramset_health": 0.95,
"param_sets": {
"advsl.hold_substitute.v1": {
"score": 0.94,
"status": "GREEN",
"mode": "shadow",
"last_advice_age_s": 2.4,
"last_reward_age_s": 31.0,
"open_decisions": 1,
"reward_backlog": 3,
"shadow_samples": 240,
"walk_forward_status": "pending",
"latest_recommended_hold": 12
}
},
"subtasks": {
"full_tape_replay": {"score": 1.0, "status": "IDLE"},
"walk_forward": {"score": 0.8, "status": "STALE"},
"obf_binding": {"score": 1.0, "status": "IDLE"}
}
}
```
Sensor scoring:
| Sensor | Score rule |
|---|---|
| `m14_vibriss_runner_liveness` | 1 if heartbeat age < 15s, 0.5 if < 60s, else 0. |
| `m15_vibriss_spec_integrity` | Fraction of loaded specs passing validation. |
| `m16_vibriss_data_freshness` | Freshness of HZ context, CH close rows, OBF/MARAS context. |
| `m17_vibriss_advice_integrity` | 1 when latest advice is schema-valid and guardrailed. |
| `m18_vibriss_reward_backlog` | Penalizes unjoined decisions awaiting reward too long. |
| `m19_vibriss_paramset_health` | Mean score of all enabled ParamSets. |
MHS integration rule:
- VIBRISS starts with weight `0.0` in RM_META until stable.
- Then enable a small optional weight, analogous to DITA sensors.
- Suggested initial weight: `0.02`.
- Maximum allowed weight: `0.10` until the subsystem is live-actuating.
- If VIBRISS is disabled, MHS score must be neutral and must not degrade BLUE.
Suggested MHS env shape:
```text
DOLPHIN_MHS_USE_VIBRISS_SENSORS=1
DOLPHIN_MHS_VIBRISS_SENSOR_WEIGHT=0.02
DOLPHIN_VIBRISS_SENSOR_KEY=vibriss_sensors_blue
DOLPHIN_MHS_VIBRISS_SENSOR_MAPS=DOLPHIN_META_HEALTH,DOLPHIN_FEATURES
```
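
The concrete parts of the sensor scoring and MHS weighting rules can be sketched as small functions. The `m14` thresholds and the `0.10` cap come from the rules above. The linear blend into RM_META and the neutral value for an empty ParamSet list are assumptions, since the DITA weighting pattern is not reproduced here.

```python
MAX_WEIGHT_BEFORE_LIVE = 0.10


def liveness_score(heartbeat_age_s):
    """m14: 1.0 below 15 s, 0.5 below 60 s, else 0.0."""
    if heartbeat_age_s < 15:
        return 1.0
    if heartbeat_age_s < 60:
        return 0.5
    return 0.0


def paramset_health(scores):
    """m19: mean score of enabled ParamSets (assumed neutral 1.0 if none)."""
    return sum(scores) / len(scores) if scores else 1.0


def blend_rm_meta(base_rm_meta, vibriss_score, weight, live_actuating=False):
    """Fold the VIBRISS composite into RM_META as a small optional term."""
    if vibriss_score is None or weight <= 0.0:
        return base_rm_meta  # disabled or unweighted: neutral, never degrades BLUE
    if not live_actuating:
        weight = min(weight, MAX_WEIGHT_BEFORE_LIVE)
    return (1.0 - weight) * base_rm_meta + weight * vibriss_score
```
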
### 12.9 Observability / TUI Integration
TUI integration should follow the existing v9 pattern:
- use HZ listeners for latest VIBRISS state;
- add CH polling only for historical/replay-heavy summaries;
- never poll origin subsystems directly from the TUI.
Recommended panels:
| Panel | Source | Cadence | Content |
|---|---|---:|---|
| `VIBRISS` main panel | `DOLPHIN_FEATURES/vibriss_latest` | HZ listener | mode, status, latest ParamSet advice, confidence, MHS score. |
| `VIBRISS Hold` footer | `vibriss_hold_substitute_advice` + CH rewards | HZ + 60s CH | recommended hold, baseline, prior, reward backlog, recent net delta. |
| `VIBRISS Tasks` footer | `vibriss_subtasks` | 60s CH | replay/walk-forward/OBF binding status. |
| `MHS` existing panel | `DOLPHIN_META_HEALTH/latest` | HZ listener | include VIBRISS sensor details if enabled. |
Display fields for `advsl.hold_substitute.v1`:
```text
VIBRISS HOLD mode=shadow rec=12b base=20b live_ref=6b
conf=74% guard=PASS hash=57957 obf=weak pressure=high
reward_backlog=3 wf=pending samples=240
```
The TUI must clearly distinguish:
- baseline reference,
- current live reference,
- VIBRISS recommendation,
- whether recommendation is shadow-only or live-consumed.
Implementation note:
- `prod/vibriss/vibriss_tui.py` now provides the Textual dashboard, and
`python -m vibriss.vibriss_runner tui` launches it in read-only shadow mode.
- The UI is panel-registry based so additional metrics can be added without
rewriting the dashboard shell.
### 12.10 Control Commands
Commands should be written to `DOLPHIN_CONTROL_PLANE["vibriss_commands"]`.
Allowed commands:
| Command | Effect |
|---|---|
| `RELOAD_SPECS` | Reload ParamSpec/ParamSetSpec files and validate. |
| `FREEZE_PARAMSET` | Stop updating and publish fallback for one ParamSet. |
| `UNFREEZE_PARAMSET` | Resume shadow/advisory scoring. |
| `RUN_REPLAY` | Queue replay subtask for a parameter set/window. |
| `RUN_WALK_FORWARD` | Queue walk-forward validation. |
| `SET_MODE` | Move `disabled -> shadow -> advisory`; live/canary requires explicit code/config gate. |
| `CHECKPOINT_NOW` | Persist learner state immediately. |
Commands must be acknowledged to:
```text
DOLPHIN_CONTROL_PLANE["vibriss_command_ack"]
```
Ack payloads must include command id, acceptance/rejection, reason, and current
mode. Queue consumption alone is not success.
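
Command handling and acknowledgement can be sketched as a pure function that returns the new mode and the ack payload for `vibriss_command_ack`. All of the following are hypothetical assumptions:

- commands arrive as dicts with `command_id`, `command`, and `mode` keys;
- `SET_MODE` allows one-step promotion within `disabled`/`shadow`/`advisory` and any operator demotion into those modes;
- other commands are acked as accepted, which is not the same as completed.

```python
import uuid

ALLOWED_COMMANDS = {
    "RELOAD_SPECS", "FREEZE_PARAMSET", "UNFREEZE_PARAMSET",
    "RUN_REPLAY", "RUN_WALK_FORWARD", "SET_MODE", "CHECKPOINT_NOW",
}
MODE_ORDER = ["disabled", "shadow", "advisory", "canary_live", "controlled_live"]
SET_MODE_TARGETS = {"disabled", "shadow", "advisory"}  # live modes need a code/config gate


def handle_command(cmd, current_mode):
    """Validate one command; return (new_mode, ack payload)."""
    command_id = cmd.get("command_id") or str(uuid.uuid4())
    name = cmd.get("command")
    new_mode, accepted = current_mode, False
    if name not in ALLOWED_COMMANDS:
        reason = f"unknown command {name!r}"
    elif name != "SET_MODE":
        # Accepted for execution; completion must be reported separately.
        accepted, reason = True, "accepted"
    else:
        target = cmd.get("mode")
        if target not in SET_MODE_TARGETS:
            reason = f"SET_MODE to {target!r} requires an explicit code/config gate"
        elif current_mode not in MODE_ORDER:
            reason = f"unknown current mode {current_mode!r}"
        elif MODE_ORDER.index(target) - MODE_ORDER.index(current_mode) > 1:
            reason = f"SET_MODE may promote only one step: {current_mode}->{target}"
        else:
            new_mode, accepted, reason = target, True, "mode changed"
    ack = {
        "command_id": command_id,
        "command": name,
        "accepted": accepted,
        "reason": reason,
        "mode": new_mode,
    }
    return new_mode, ack
```
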
### 12.11 Prefect Role
Prefect is optional for VIBRISS. It should not be required for live advice.
Acceptable Prefect use:
- daily full-tape replay,
- scheduled walk-forward validation,
- artifact publication,
- long offline calibration runs.
Not acceptable:
- live advice loop,
- hot-path reward joining,
- health publication,
- operator freeze/unfreeze commands.
If Prefect is unavailable, the VIBRISS runner should continue shadow/advisory
operation from the last checkpoint and mark scheduled calibration stale.
### 12.12 Failure Modes and Fallback
| Failure | Required behavior |
|---|---|
| HZ unavailable | Runner logs degraded, cannot publish advice, MHS score <= 0.5. |
| CH unavailable | Advice may continue from checkpoint; reward collector degrades. |
| OBF stale | Mask OBF features; do not use OBF hold extension. |
| MARAS stale | Shrink to global/label-free prior. |
| Spec validation failure | Disable affected ParamSet, publish fallback. |
| Learner checkpoint corrupt | Revert to last good checkpoint or baseline prior. |
| Replay worker OOM/fails | Mark subtask failed; live runner continues. |
| Advice schema invalid | Do not publish; MHS advice integrity drops. |
| Drawdown alarm | Freeze to deterministic safe baseline. |
### 12.13 Promotion Gates
Before any engine consumes VIBRISS hold advice live:
1. Runner has been stable for at least 7 calendar days.
2. MHS VIBRISS sensors are GREEN or neutral for at least 95% of runner uptime.
3. `advsl.hold_substitute.v1` has completed full-tape replay.
4. Walk-forward is positive versus baseline on capital-curve delta after
opportunity cost.
5. OOD region performance has no catastrophic degradation.
6. TUI displays baseline/current/recommended state correctly.
7. Command ack path is verified.
8. Safe fallback is tested by intentionally freezing the ParamSet.
9. Engine consumption is limited to one ParamSet and one namespace.
10. `VIBRISS_ENABLE_LIVE_ACTUATION=1` is explicitly set and reviewed.
## 13. V1 Rollout Plan
1. Offline replay only:
   - replay historical decisions from ClickHouse and tape.
   - benchmark against baseline constants.
   - compute OPE where logged propensities exist.
   - report by asset, side, MARAS hash, regime label, V7 reason, OBF bucket,
     and contiguous time region.
2. Shadow mode:
   - publish advice to HZ.
   - do not allow engine consumption.
   - write `vibriss_decisions`, `vibriss_rewards`, and `vibriss_policy_state`.
3. Guarded advisory:
   - engine reads advice and surfaces what it would have used.
   - still no actuation.
4. Canary live:
   - one parameter only.
   - no simultaneous bundle changes.
   - low exploration cap.
   - hard fallback on stale data, drawdown alarm, or drift alarm.
5. Controlled live comparison:
   - compare baseline-vs-advised on matched contexts.
   - freeze policy if replay quality deteriorates.
## 14. Safety Rules
Mandatory:
- no direct mutation of `blue.yml` or frozen champion config from VIBRISS.
- no live promotion without replay, shadow, and documented approval.
- no advice consumption when data is stale.
- no advice consumption inside disallowed live-change windows.
- no multi-parameter bundle learning until single-parameter learners prove that
independent adaptation is insufficient.
- every live-consumed recommendation must be reconstructable from logs.
- every safety-critical parameter must preserve a catastrophic fallback floor.
## 15. Concrete Storage and Schema
VIBRISS must be event-sourced. Current policy state is a cache; decisions and
rewards are the durable truth.
### 15.1 ClickHouse DDL
Recommended DDL:
```sql
CREATE TABLE IF NOT EXISTS dolphin.vibriss_decisions
(
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
mode LowCardinality(String),
param_set_id LowCardinality(String),
spec_version String,
decision_id String,
parent_decision_id String,
trade_id String,
asset LowCardinality(String),
side LowCardinality(String),
scan_number UInt64,
bars_held UInt32,
context_hash String,
context_schema String,
maras_composite_hash UInt32,
maras_scalar_hash UInt32,
maras_regime LowCardinality(String),
maras_confidence Float32,
maras_conflict Float32,
obf_stale UInt8,
obf_depth_1pct_usd Float64,
obf_depth_quality Float32,
v7_pressure Float32,
v7_mae_risk Float32,
candidate_set_json String,
chosen_arm String,
baseline_value String,
recommended_value String,
confidence Float32,
propensity Float32,
guardrail_status LowCardinality(String),
fallback_reason String,
model_version String,
policy_version String,
compiled_config_hash String,
consumed UInt8,
consumed_ts Nullable(DateTime64(6, 'UTC')),
payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, decision_id)
TTL ts + INTERVAL 180 DAY;
CREATE TABLE IF NOT EXISTS dolphin.vibriss_rewards
(
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
param_set_id LowCardinality(String),
decision_id String,
trade_id String,
reward_status LowCardinality(String),
reward_delay_s Float32,
actual_exit_reason LowCardinality(String),
counterfactual_exit_reason LowCardinality(String),
actual_exit_pnl Float64,
counterfactual_exit_pnl Float64,
saved_loss_delta Float64,
clipped_winner_delta Float64,
capital_curve_delta Float64,
drawdown_delta Float64,
recovery_lag_s Float32,
extra_bars_to_recovery Float32,
normalized_reward Float32,
opportunity_cost_charged UInt8,
replay_artifact_path String,
reward_components_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, decision_id)
TTL ts + INTERVAL 365 DAY;
CREATE TABLE IF NOT EXISTS dolphin.vibriss_policy_state
(
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
param_set_id LowCardinality(String),
policy_version String,
mode LowCardinality(String),
learner LowCardinality(String),
checkpoint_path String,
checkpoint_hash String,
spec_hash String,
compiled_config_hash String,
n_decisions UInt64,
n_rewards UInt64,
shadow_samples UInt64,
walk_forward_status LowCardinality(String),
active_baseline_value String,
active_recommended_value String,
confidence Float32,
state_json String
)
ENGINE = ReplacingMergeTree(ts)
ORDER BY (namespace, param_set_id, policy_version);
CREATE TABLE IF NOT EXISTS dolphin.vibriss_subtasks
(
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
subtask_id String,
param_set_id LowCardinality(String),
kind LowCardinality(String),
status LowCardinality(String),
started_at DateTime64(6, 'UTC'),
finished_at Nullable(DateTime64(6, 'UTC')),
input_window String,
n_trades UInt64,
n_decisions UInt64,
primary_metric Float64,
baseline_metric Float64,
artifact_path String,
artifact_hash String,
failure_reason String,
payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(started_at)
ORDER BY (namespace, param_set_id, started_at, subtask_id)
TTL started_at + INTERVAL 365 DAY;
CREATE TABLE IF NOT EXISTS dolphin.vibriss_promotions
(
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
param_set_id LowCardinality(String),
promotion_id String,
from_mode LowCardinality(String),
to_mode LowCardinality(String),
requested_by LowCardinality(String),
approved_by LowCardinality(String),
policy_version String,
checkpoint_hash String,
evidence_window String,
n_decisions UInt64,
n_rewards UInt64,
n_shadow_samples UInt64,
n_live_samples UInt64,
recursive_capital_delta Float64,
opportunity_cost_delta Float64,
max_drawdown_delta Float64,
worst_region_delta Float64,
baseline_metric Float64,
candidate_metric Float64,
guardrail_status LowCardinality(String),
decision LowCardinality(String),
reason String,
artifact_path String,
payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, promotion_id)
TTL ts + INTERVAL 730 DAY;
CREATE TABLE IF NOT EXISTS dolphin.vibriss_meta_cadence_decisions
(
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
param_set_id LowCardinality(String),
cadence_id LowCardinality(String),
decision_id String,
mode LowCardinality(String),
context_hash String,
maras_composite_hash UInt32,
maras_regime LowCardinality(String),
exof_state String,
esof_state String,
candidate_set_json String,
chosen_value String,
baseline_value String,
confidence Float32,
reward_status LowCardinality(String),
reward_value Float32,
guardrail_status LowCardinality(String),
fallback_reason String,
policy_version String,
payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, cadence_id, ts, decision_id)
TTL ts + INTERVAL 365 DAY;
```
These tables are deliberately narrow enough for hot audit reads and broad enough
to replay the decision. Large path arrays, per-bar simulations, and model
artifacts must be written to artifact storage, not inlined into ClickHouse.
### 15.2 Artifact Layout
Use a non-SMB path for generated artifacts:
```text
/mnt/dolphin_training/vibriss/
  specs/
    advsl.hold_substitute.v1.yaml
  checkpoints/
    blue/advsl.hold_substitute.v1/<policy_version>/
      state.json
      learner.pkl
      manifest.json
  replays/
    <YYYY-MM-DD>/<subtask_id>/
      config.yaml
      replay_summary.json
      capital_curve.csv
      per_trade_counterfactuals.parquet
      opportunity_cost_audit.parquet
  reports/
    walk_forward/
    obf_binding/
    maras_hash_priors/
```
Every artifact directory must contain a `manifest.json`:
```json
{
"schema": "vibriss.artifact_manifest.v1",
"subtask_id": "wf-20260603-001",
"param_set_id": "advsl.hold_substitute.v1",
"namespace": "blue",
"created_at": "2026-06-03T00:00:00Z",
"git_sha": "unknown-or-sha",
"spec_hash": "sha256:...",
"input_tables": {
"trade_events": {"min_ts": "...", "max_ts": "...", "row_count": 1234},
"v7_decision_events": {"min_ts": "...", "max_ts": "...", "row_count": 9999}
},
"tape_sources": ["/mnt/ng6_data/arrow_scans/..."],
"random_seed": 0,
"artifact_hashes": {
"replay_summary.json": "sha256:...",
"per_trade_counterfactuals.parquet": "sha256:..."
}
}
```
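
Manifest generation is mostly hashing. This is a minimal sketch. It assumes artifacts sit as flat files in the subtask directory, so subdirectories are not hashed, and that the caller supplies `spec_hash`, `input_tables`, and `tape_sources`. The function names are hypothetical; the keys mirror the `vibriss.artifact_manifest.v1` example above.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large parquet artifacts stay cheap on memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def write_manifest(artifact_dir, subtask_id, param_set_id, namespace, spec_hash,
                   git_sha="unknown-or-sha", random_seed=0,
                   input_tables=None, tape_sources=()):
    artifact_dir = Path(artifact_dir)
    # Hash every top-level file except the manifest itself.
    hashes = {
        p.name: sha256_file(p)
        for p in sorted(artifact_dir.iterdir())
        if p.is_file() and p.name != "manifest.json"
    }
    manifest = {
        "schema": "vibriss.artifact_manifest.v1",
        "subtask_id": subtask_id,
        "param_set_id": param_set_id,
        "namespace": namespace,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "git_sha": git_sha,
        "spec_hash": spec_hash,
        "input_tables": input_tables or {},
        "tape_sources": list(tape_sources),
        "random_seed": random_seed,
        "artifact_hashes": hashes,
    }
    (artifact_dir / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```
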
## 16. Replay, OPE, and Causality Rules
VIBRISS must be explicit about what kind of evidence it has.
Evidence classes:
| Class | Meaning | Allowed use |
|---|---|---|
| `realized_live` | Parameter was actually used live. | Highest-quality reward. |
| `shadow_counterfactual` | Advice logged, baseline used, tape can replay alternative. | OPE/research only unless validated. |
| `historical_replay` | Offline replay over historical trades with no logged propensity. | Calibration prior, not proof. |
| `synthetic_mc` | Monte Carlo augmentation from validated distribution. | Stress coverage only. |
| `expert_baseline` | Human/research default such as 12 bars. | Fallback/prior. |
Counterfactual replay must store:
- actual entry, actual exit, and actual capital before/after;
- counterfactual exit scan/bar and price;
- whether the counterfactual exit depends on sub-bar, bar-close, or tape-close
cadence;
- whether the trade later recovered;
- how many bars/seconds were needed for recovery;
- opportunity cost charged;
- recursive capital state after applying the counterfactual.
OPE rules:
- Use inverse propensity or doubly robust estimators only when propensities were
actually logged.
- Do not pretend historical replay has logged propensities.
- For shadow decisions without randomized action, report them as model
counterfactuals, not causal estimates.
- Region splits must be contiguous first; randomized splits are secondary
robustness checks only.
- A policy that wins by one tail event and loses broadly must be flagged as
fragile even when net capital delta is positive.
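
The first OPE rule can be enforced in code: an inverse-propensity estimator that refuses to run on rows without logged propensities. This is a minimal sketch for a deterministic target policy, with hypothetical names. A doubly robust estimator would add a reward model on top of the same propensity check and is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class LoggedDecision:
    context: dict
    action: int
    reward: float
    propensity: Optional[float]  # probability the logging policy chose `action`


def ips_estimate(rows: Sequence[LoggedDecision],
                 target_policy: Callable[[dict], int]) -> float:
    """Inverse-propensity estimate of a deterministic target policy's mean reward.

    Raises unless every row carries a logged propensity in (0, 1].
    """
    if not rows:
        raise ValueError("no logged decisions")
    for r in rows:
        if r.propensity is None or not 0.0 < r.propensity <= 1.0:
            raise ValueError("missing or invalid propensity: report as model counterfactual")
    total = 0.0
    for r in rows:
        # Only rows where the target policy agrees with the logged action contribute.
        if target_policy(r.context) == r.action:
            total += r.reward / r.propensity
    return total / len(rows)
```

Raising on a missing propensity, instead of silently assuming 1.0, is what keeps historical replay from being passed off as logged-propensity evidence.
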
Minimum replay report:
```text
baseline_end_capital
policy_end_capital
recursive_delta
gross_saved_loss
gross_opportunity_cost
net_trade_pnl_delta
max_drawdown_delta
tail_loss_count_delta
clipped_winner_count
recovered_cut_count
median_recovery_lag_s
worst_region_delta
best_region_delta
per_asset_concentration
per_hash_concentration
```
## 17. Mode State Machine
VIBRISS modes are explicit and monotonic unless an operator command or guardrail
forces demotion.
```text
disabled
-> shadow
-> advisory
-> canary_live
-> controlled_live
```
Mode meanings:
| Mode | Publishes advice | Engine may read | Engine may act | Learner updates |
|---|---:|---:|---:|---:|
| `disabled` | no | no | no | no |
| `shadow` | yes | no | no | yes |
| `advisory` | yes | yes, display only | no | yes |
| `canary_live` | yes | yes | yes, one ParamSet/namespace | yes |
| `controlled_live` | yes | yes | yes, bounded | yes |
Automatic demotions:
- stale required sensor -> `shadow` or fallback advice;
- invalid spec -> affected ParamSet disabled;
- reward backlog beyond threshold -> freeze learner updates;
- drawdown alarm -> deterministic safe baseline;
- ClickHouse unavailable -> keep publishing only if checkpoint is fresh; mark
reward collection degraded;
- Hazelcast unavailable -> no advice publication;
- policy drift alarm -> freeze to last known-good checkpoint.
Promotion technique, thresholds, cadence, and evidence gates must be declared
inside the affected ParamSet spec. The runner evaluates and records those gates;
it is not allowed to invent a promotion policy from global defaults.
Promotion must be manual and auditable for any transition that enables live
actuation. No health recovery path may silently promote VIBRISS into a stronger
actuation mode.
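The promotion and demotion rules can be expressed as a small state machine. In this sketch, `Mode`, `ModeMachine`, and the one-step-at-a-time promotion rule are assumptions; what it takes from the text is that live modes require a named operator, that ParamSet evidence gates must pass, that demotion can drop to any lower mode, and that every transition is recorded.

```python
from enum import IntEnum
from typing import List, Optional, Tuple


class Mode(IntEnum):
    # Ordered so that a higher value means stronger actuation.
    DISABLED = 0
    SHADOW = 1
    ADVISORY = 2
    CANARY_LIVE = 3
    CONTROLLED_LIVE = 4


LIVE_MODES = {Mode.CANARY_LIVE, Mode.CONTROLLED_LIVE}


class ModeMachine:
    """Illustrative mode machine: explicit upward promotion, automatic demotion."""

    def __init__(self) -> None:
        self.mode = Mode.DISABLED
        self.audit: List[Tuple[str, str, str]] = []  # (from, to, why)

    def promote(self, target: Mode, *, operator: Optional[str], gates_passed: bool) -> None:
        if target != self.mode + 1:
            raise ValueError(f"promotion must be one step: {self.mode.name} -> {target.name}")
        if not gates_passed:
            raise PermissionError("ParamSet evidence gates not satisfied")
        if target in LIVE_MODES and not operator:
            raise PermissionError("live actuation requires manual operator approval")
        self._record(target, f"promote by {operator or 'runner'}")

    def demote(self, target: Mode, reason: str) -> None:
        if target >= self.mode:
            raise ValueError("demotion must lower the mode")
        self._record(target, f"demote: {reason}")

    def _record(self, target: Mode, why: str) -> None:
        self.audit.append((self.mode.name, target.name, why))
        self.mode = target
```

Note that no health signal calls `promote`; recovery only clears the condition, and an operator must still request the stronger mode.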
### 17.1 ParamSet-Owned Promotion Lifecycle
Every ParamSet must answer these questions before it can leave `shadow`:
| Question | Required ParamSet field |
|---|---|
| What baseline is being challenged? | `promotion_policy.baseline_policy` |
| What evidence class is allowed? | `promotion_policy.technique` and `evidence_gates` |
| How often is the evidence recomputed? | `promotion_policy.cadence.replay_calibration` |
| How often is promotion eligibility reviewed? | `promotion_policy.cadence.promotion_review` |
| When may the engine replace the old value? | `promotion_policy.cadence.live_replacement_rhythm` |
| What samples are required? | `promotion_policy.evidence_gates.*min*` |
| What demotes it? | `promotion_policy.automatic_demotion` |
| Who approves live use? | `promotion_policy.*manual_approval_required` |
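A spec validator can check these fields before a ParamSet is allowed out of `shadow`. The sketch below assumes the ParamSet has already been loaded from YAML into nested dictionaries; the function names and the example field values (`fixed_hold_bars`, `daily`, `min_trades`, and so on) are hypothetical. The wildcard rows in the table are interpreted as "at least one gate key containing `min`" and "at least one key ending in `manual_approval_required`".

```python
from typing import Any, List, Mapping

REQUIRED_PATHS = (
    "baseline_policy",
    "technique",
    "evidence_gates",
    "cadence.replay_calibration",
    "cadence.promotion_review",
    "cadence.live_replacement_rhythm",
    "automatic_demotion",
)


def _get(node: Any, dotted: str) -> Any:
    """Follow a dotted path through nested mappings; None if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def promotion_policy_gaps(param_set: Mapping[str, Any]) -> List[str]:
    """Return missing promotion fields; an empty list means the spec is complete."""
    policy = param_set.get("promotion_policy")
    if not isinstance(policy, Mapping):
        return ["promotion_policy"]
    gaps = [f"promotion_policy.{p}" for p in REQUIRED_PATHS if _get(policy, p) is None]
    gates = policy.get("evidence_gates")
    if not isinstance(gates, Mapping) or not any("min" in k for k in gates):
        gaps.append("promotion_policy.evidence_gates.*min*")
    if not any(k.endswith("manual_approval_required") for k in policy):
        gaps.append("promotion_policy.*manual_approval_required")
    return gaps
```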
Promotion is also subject to the control-plane elegance constraints in §4.1:
one writer per parameter, spec-owned promotion, slow-governed meta-cadence,
context inputs instead of arbitrary controllers, reproducible live changes, no
hidden cross-subsystem mutation, and shadow/replay/canary before live.
Default lifecycle:
```text
historical_replay
-> walk_forward_replay
-> shadow_advice_logging
-> advisory_display
-> canary_live_capture
-> controlled_live
```
The cadence of each phase is also ParamSet-owned:
- `advice cadence`: how often the ParamSet emits advice.
- `reward cadence`: how often delayed rewards are joined and scored.
- `calibration cadence`: how often the learner updates from replay/rewards.
- `promotion-review cadence`: how often mode eligibility is evaluated.
- `replacement rhythm`: the exact engine decision point where a live parameter
can replace the baseline.
For safety-critical exit parameters, replacement rhythm should usually be
`capture_on_entry` or `between_trades`, not arbitrary intratrade mutation.
### 17.2 Meta-Cadences as Governed Parameters
Meta-cadences are tunable parameters. If VIBRISS changes them, they must be
declared in the ParamSet under `meta_cadence_policy`.
Examples:
| Meta-cadence | Meaning |
|---|---|
| `replay_calibration_interval_s` | How often to re-run replay/calibration. |
| `promotion_review_interval_s` | How often to evaluate mode promotion/demotion. |
| `checkpoint_interval_s` | How often to persist learner state. |
| `min_new_rewards_before_recalibration` | Event-driven cadence threshold. |
| `shadow_to_canary_cooldown_trades` | Minimum number of trades with stable shadow evidence before live canary. |
MARAS, ExoF, EsoF, OBF, V7, MHS, and drawdown state may be context inputs for
meta-cadence advice, but the cadence learner is subject to the same evidence
rules as any other parameter learner. In particular:
- fixed cadence is the baseline;
- shadow cadence decisions must be logged with candidate set and confidence;
- replay must estimate missed-adaptation cost and false-promotion cost;
- compute/backlog cost is part of reward;
- live control of promotion cadence requires explicit manual approval.
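A fixed-interval baseline with an event-driven early trigger might look like the sketch below. `CalibrationCadence` is a hypothetical name; the field names mirror the meta-cadence table. Two choices here are assumptions: either condition alone triggers recalibration, and recalibration never runs when no new rewards have arrived.

```python
from dataclasses import dataclass


@dataclass
class CalibrationCadence:
    """Fixed-interval baseline plus an event-driven early trigger."""
    replay_calibration_interval_s: float
    min_new_rewards_before_recalibration: int
    last_run_ts: float = 0.0
    rewards_since_last_run: int = 0

    def record_rewards(self, n: int) -> None:
        self.rewards_since_last_run += n

    def due(self, now_ts: float) -> bool:
        # Without new evidence there is nothing to learn from.
        if self.rewards_since_last_run == 0:
            return False
        interval_elapsed = now_ts - self.last_run_ts >= self.replay_calibration_interval_s
        enough_rewards = self.rewards_since_last_run >= self.min_new_rewards_before_recalibration
        return interval_elapsed or enough_rewards

    def mark_run(self, now_ts: float) -> None:
        self.last_run_ts = now_ts
        self.rewards_since_last_run = 0
```

If VIBRISS later governs these two fields, the learner changes only the dataclass values; the trigger logic stays fixed and reproducible.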
## 18. Engine Consumption Contract
The engine must treat VIBRISS advice as optional, expiring input.
Consumption algorithm:
```text
read advice payload
validate schema and spec_version
check namespace matches runtime
check mode permits consumption
check expires_at > now
check trade_scope is current decision point
check recommendation within hard range
check guardrail_status == PASS or permitted advisory state
check fallback/catastrophic floor remains active
capture value into trade-local immutable parameter snapshot
emit consumption audit
```
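The consumption checks above can be sketched as one function that returns either the advised value or the baseline fallback, together with an audit reason. This is a sketch under assumptions: the payload keys (`spec_version`, `namespace`, `expires_at`, `trade_scope`, `recommendation`, `guardrail_status`) follow the pseudocode, but the exact schema, the `ParamCapture` type, and the set of consuming modes are illustrative. Schema validation is reduced to key checks, and the catastrophic floor is left to the engine.

```python
import time
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Modes in which the engine may act on advice (per the mode table).
CONSUMING_MODES = frozenset({"canary_live", "controlled_live"})


@dataclass(frozen=True)
class ParamCapture:
    """Immutable record of the value used for this decision, plus the audit reason."""
    value: Any
    used_advice: bool
    reason: str


def consume_advice(advice: Optional[Mapping[str, Any]], *, namespace: str, mode: str,
                   decision_point: str, spec_version: str, hard_min: float,
                   hard_max: float, fallback: Any,
                   now: Optional[float] = None) -> ParamCapture:
    """Run the checks in order; any failure falls back to the baseline value."""
    now = time.time() if now is None else now

    def reject(reason: str) -> ParamCapture:
        return ParamCapture(fallback, False, reason)

    if advice is None:
        return reject("no_advice")
    if advice.get("spec_version") != spec_version:
        return reject("spec_version_mismatch")
    if advice.get("namespace") != namespace:
        return reject("namespace_mismatch")
    if mode not in CONSUMING_MODES:
        return reject(f"mode_{mode}_not_consuming")
    if not float(advice.get("expires_at", 0.0)) > now:
        return reject("expired")
    if advice.get("trade_scope") != decision_point:
        return reject("wrong_decision_point")
    value = advice.get("recommendation")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return reject("non_numeric_recommendation")
    if not hard_min <= value <= hard_max:
        return reject("outside_hard_range")
    if advice.get("guardrail_status") != "PASS":
        return reject("guardrail_not_pass")
    return ParamCapture(value, True, "consumed")
```

The returned `reason` is what the consumption audit row would carry, so fallbacks are as visible as successful consumptions.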
For `advsl.hold_substitute.v1`, the first live contract should be:
- consume only on entry;
- store the selected hold bars in the pending/open trade state;
- do not mutate it intratrade;
- allow intratrade VIBRISS values only as shadow comparisons;
- let catastrophic floor and max-dollar floor override hold advice.
This avoids a subtle failure mode where a learner changes the hold target after
seeing adverse movement that was not available at entry. Intratrade contraction
can be researched later, but it is a different ParamSet.
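Entry capture plus floor precedence can be sketched with a frozen snapshot. The exit semantics here are assumptions for illustration only: hold advice is modeled as the maximum number of bars the hold substitute may keep an adverse trade open, and the floor values (a percentage floor and a dollar floor such as `400.0`) are example numbers, not the engine's configuration. `EntryParamSnapshot` and `exit_decision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EntryParamSnapshot:
    """Parameters captured once at entry; frozen so nothing can change them intratrade."""
    hold_bars: int                 # selected by VIBRISS (or the baseline) at entry
    catastrophic_floor_pct: float  # e.g. -0.05: exit at -5% unrealized regardless of advice
    max_dollar_loss: float         # e.g. 400.0: exit once unrealized loss reaches this


def exit_decision(snap: EntryParamSnapshot, bars_held: int,
                  unrealized_pct: float, unrealized_usd: float) -> Tuple[bool, str]:
    # Safety floors are evaluated first, so they always override hold advice.
    if unrealized_pct <= snap.catastrophic_floor_pct:
        return True, "catastrophic_floor"
    if -unrealized_usd >= snap.max_dollar_loss:
        return True, "max_dollar_floor"
    # Assumption: hold advice caps how many bars the substitute may keep the trade open.
    if bars_held >= snap.hold_bars:
        return True, "hold_window_elapsed"
    return False, "hold"
```

Because the dataclass is frozen, an attempt to rewrite `hold_bars` after entry raises instead of silently applying intratrade advice.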
## 19. Drift, Novelty, and Freezing
VIBRISS must separate three conditions:
1. data-quality degradation,
2. market/regime novelty,
3. policy underperformance.
Drift sensors:
| Sensor | Trigger |
|---|---|
| context distribution drift | MARAS/OBF/V7 feature distribution shifts versus training window. |
| reward drift | rolling reward lower than baseline beyond confidence bound. |
| regret drift | chosen arm underperforms baseline arm in shadow replay. |
| tail cluster | tail-loss or floor-hit count above historical percentile. |
| sparse regime | nearest-neighbor distance to known MARAS/OBF contexts exceeds a threshold. |
Actions:
- distribution drift alone: shrink toward baseline and raise uncertainty;
- reward drift: freeze learner updates and publish fallback;
- tail cluster: tighten safety floors only if pre-authorized by the ParamSet;
- sparse regime: use global safe prior, not nearest hash overfit;
- data-quality drift: stop consuming affected sensors.
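The action list maps naturally to a lookup table. In this sketch, the action names are hypothetical labels for the behaviors listed above. Simultaneous drift conditions take the union of their actions, and tail-cluster tightening is skipped unless the ParamSet pre-authorizes it.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class Drift(Enum):
    DISTRIBUTION = "distribution"
    REWARD = "reward"
    TAIL_CLUSTER = "tail_cluster"
    SPARSE_REGIME = "sparse_regime"
    DATA_QUALITY = "data_quality"


ACTIONS: Dict[Drift, Set[str]] = {
    Drift.DISTRIBUTION: {"shrink_to_baseline", "raise_uncertainty"},
    Drift.REWARD: {"freeze_learner_updates", "publish_fallback"},
    Drift.TAIL_CLUSTER: {"tighten_safety_floors"},
    Drift.SPARSE_REGIME: {"use_global_safe_prior"},
    Drift.DATA_QUALITY: {"stop_consuming_affected_sensors"},
}


def drift_actions(active: Iterable[Drift], tail_tightening_authorized: bool) -> Set[str]:
    """Union of actions for all active drift conditions."""
    actions: Set[str] = set()
    for d in active:
        if d is Drift.TAIL_CLUSTER and not tail_tightening_authorized:
            continue  # only a pre-authorized ParamSet may tighten floors
        actions |= ACTIONS[d]
    return actions
```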
VIBRISS should publish drift state in `vibriss_latest` and
`vibriss_paramset_status`.
## 20. Data Volume and Backpressure
The ClickHouse-outage and spool-backlog failure mode applies directly to VIBRISS:
when audit writes stall, queued rows accumulate and can delay reward collection.
Rules:
- VIBRISS must have its own spool and backlog metric.
- Advice publication must not block on ClickHouse.
- Reward collection may lag, but the lag must be visible in MHS.
- Large per-bar OBF or path arrays must not be written to hot audit tables.
- Calibration workers must rate-limit writes and should prefer compact Parquet
artifacts for heavy outputs.
- If ClickHouse spool backlog exceeds threshold, VIBRISS must degrade to
`shadow_no_update`: publish from checkpoint only, do not update learners from
partial reward data.
Recommended thresholds:
| Metric | GREEN | DEGRADED | CRITICAL |
|---|---:|---:|---:|
| decision spool backlog | `<1k` | `1k-50k` | `>50k` |
| reward backlog age | `<10m` | `10m-2h` | `>2h` |
| artifact disk free | `>20GB` | `5-20GB` | `<5GB` |
| CH write failure rate | `<1%` | `1-10%` | `>10%` |
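The threshold table translates into a small classifier. In this sketch, the band edges follow the table: values exactly on an edge (for example 1k rows, 50k rows, or 20 GB free) fall in DEGRADED. The overall status is the worst individual status. The function names and units (seconds, GB, failure fraction) are assumptions.

```python
from typing import Dict

GREEN, DEGRADED, CRITICAL = "GREEN", "DEGRADED", "CRITICAL"
_ORDER = [GREEN, DEGRADED, CRITICAL]


def _band_high_is_bad(value: float, green_below: float, critical_above: float) -> str:
    if value < green_below:
        return GREEN
    if value > critical_above:
        return CRITICAL
    return DEGRADED


def _band_low_is_bad(value: float, green_above: float, critical_below: float) -> str:
    if value > green_above:
        return GREEN
    if value < critical_below:
        return CRITICAL
    return DEGRADED


def vibriss_health(decision_backlog_rows: int, reward_backlog_age_s: float,
                   artifact_disk_free_gb: float, ch_write_failure_rate: float) -> Dict[str, str]:
    bands = {
        "decision_spool_backlog": _band_high_is_bad(decision_backlog_rows, 1_000, 50_000),
        "reward_backlog_age": _band_high_is_bad(reward_backlog_age_s, 10 * 60, 2 * 3600),
        "artifact_disk_free": _band_low_is_bad(artifact_disk_free_gb, 20.0, 5.0),
        "ch_write_failure_rate": _band_high_is_bad(ch_write_failure_rate, 0.01, 0.10),
    }
    bands["overall"] = max(bands.values(), key=_ORDER.index)  # worst band wins
    return bands
```

One possible runner policy is to enter `shadow_no_update` whenever `decision_spool_backlog` reaches CRITICAL; the exact trigger remains a ParamSet or runner decision.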
VIBRISS must not repeat the OBF-style failure mode of letting millions of
low-priority rows delay high-priority trade/reward rows. Use priority queues:
1. decisions, rewards, policy state;
2. trade/path summary;
3. calibration summary;
4. heavy diagnostics.
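The four priority tiers map onto a heap keyed by (tier, sequence number), which preserves FIFO order within a tier. This is an in-memory sketch only; a production spool would be durable on disk. The row kinds, `PrioritySpool`, and the diagnostic cap are assumptions. Shedding excess diagnostics is one illustrative way to keep heavy rows from crowding out decisions and rewards.

```python
import heapq
import itertools
from typing import Any, List, Tuple

PRIORITY = {
    "decision": 1, "reward": 1, "policy_state": 1,
    "trade_path_summary": 2,
    "calibration_summary": 3,
    "diagnostic": 4,
}


class PrioritySpool:
    """Lower tier number flushes first; FIFO within a tier."""

    def __init__(self, max_diagnostic_rows: int = 10_000) -> None:
        self._heap: List[Tuple[int, int, str, Any]] = []
        self._seq = itertools.count()  # tie-breaker so rows themselves are never compared
        self._max_diag = max_diagnostic_rows
        self._diag_count = 0
        self.dropped_diagnostics = 0

    def put(self, kind: str, row: Any) -> None:
        prio = PRIORITY[kind]
        if prio == 4:
            if self._diag_count >= self._max_diag:
                self.dropped_diagnostics += 1  # shed heavy diagnostics, never tier-1 rows
                return
            self._diag_count += 1
        heapq.heappush(self._heap, (prio, next(self._seq), kind, row))

    def drain(self, max_rows: int) -> List[Tuple[str, Any]]:
        out: List[Tuple[str, Any]] = []
        while self._heap and len(out) < max_rows:
            prio, _, kind, row = heapq.heappop(self._heap)
            if prio == 4:
                self._diag_count -= 1
            out.append((kind, row))
        return out

    def backlog(self) -> int:
        return len(self._heap)
```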
## 21. Security and Operational Guardrails
Secrets:
- use existing ClickHouse user/password env pattern;
- do not write credentials into spec files;
- do not put secrets in artifact manifests.
Filesystem:
- code/spec mount is read-only inside the container;
- learner state and replay artifacts are written outside the SMB repo path;
- runner must check free disk before replay subtasks;
- no large file writes to `/mnt/dolphinng5_predict`.
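A pre-flight check for replay subtasks can enforce two filesystem rules together: no writes under the repo mount, and enough free disk. This is a minimal sketch; the 20 GB default borrows the GREEN disk threshold from §20 and is an assumption, as is the function name.

```python
import shutil
from pathlib import Path

FORBIDDEN_WRITE_ROOT = Path("/mnt/dolphinng5_predict")


def check_replay_output(out_dir: str, min_free_gb: float = 20.0) -> Path:
    """Refuse a replay output dir under the repo mount or on a nearly full disk."""
    out = Path(out_dir).resolve()
    if out == FORBIDDEN_WRITE_ROOT or FORBIDDEN_WRITE_ROOT in out.parents:
        raise PermissionError(f"refusing to write replay artifacts under {FORBIDDEN_WRITE_ROOT}")
    # Walk up to the nearest existing ancestor so disk_usage works before mkdir.
    probe = out
    while not probe.exists():
        probe = probe.parent
    free_gb = shutil.disk_usage(probe).free / 1e9
    if free_gb < min_free_gb:
        raise OSError(f"only {free_gb:.1f} GB free at {probe}; need {min_free_gb} GB")
    return out
```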
Runtime:
- do not restart Hazelcast;
- do not use systemd for Dolphin services;
- use supervisord as the owner of the container process;
- if gVisor is used, treat it as a host-selected sandbox/runtime wrapper, not a
process owned by VIBRISS internals;
- worker OOM must not kill the live advice runner;
- health checks must distinguish runner alive from learner valid.
## 22. Implementation Defaults
These decisions are now recommended defaults, not open questions:
- First learner: discounted UCB for non-contextual hold-bar baseline plus LinUCB
shadow branch for MARAS/OBF/V7 context.
- First live dependency posture: internal finite-arm learners and compact
checkpointed state in the runner; no VW, OBP, ABIDES, Pyro/NumPyro, CATX, or
broad benchmark libraries in the live advice path.
- First worker dependency posture: VW, River, OBP, MABWiser, lifelines,
statsmodels, and benchmark libraries are allowed only in replay/OPE/calibration
jobs with bounded memory and artifact output.
- First drift implementation: simple internal rolling statistics plus optional
River-backed detectors if the dependency remains stable inside the runner.
- First HZ publication surface: `DOLPHIN_FEATURES["vibriss_param_advice"]` plus
dedicated keys for high-value ParamSets such as
`vibriss_hold_substitute_advice`.
- First consumption point for ADVSL hold substitute: capture-on-entry only.
- Counterfactual rewards: store as `shadow_counterfactual` with explicit
replay artifact path and no causal-propensity claim.
- Drift ownership: VIBRISS computes policy/reward drift and subscribes to MHS,
MARAS, OBF, and SurvivalStack for external drift/context.
- Container launch: use a small wrapper script under supervisord in production
so image existence, disk space, mount health, and env are checked before
`podman run` or `docker run`.
- MHS integration: prefer a generic external-sensor loader eventually, but V1
may implement a VIBRISS-specific optional sensor as long as it is neutral when
disabled.
- Infrastructure posture: keep Hazelcast + ClickHouse + supervisord for V1;
Kafka/Flink are deferred until measured event volume or recovery requirements
exceed the existing bus/audit pattern.
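The first learner default can be sketched as a simplified discounted UCB over a finite set of hold-bar arms. It exposes the `choose` / `update` / `checkpoint` / `restore` interface listed in the V1 module table. On every update, each arm's discounted count and reward sum decay by `gamma`, so stale evidence fades and neglected arms are eventually re-explored. The single `exploration` constant is a simplification of the usual padding term, and the arm values in the usage are hypothetical.

```python
import json
import math
from typing import Sequence


class DiscountedUCB:
    """Discounted UCB over finite arms (e.g. candidate hold-bar values)."""

    def __init__(self, arms: Sequence[int], gamma: float = 0.99, exploration: float = 1.0) -> None:
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.arms = list(arms)
        self.gamma = gamma
        self.exploration = exploration
        self.counts = [0.0] * len(self.arms)  # discounted play counts
        self.sums = [0.0] * len(self.arms)    # discounted reward sums

    def choose(self) -> int:
        # Play every arm once before trusting the confidence bounds.
        for i, n in enumerate(self.counts):
            if n == 0.0:
                return self.arms[i]
        log_total = max(math.log(sum(self.counts)), 0.0)

        def index(i: int) -> float:
            mean = self.sums[i] / self.counts[i]
            return mean + self.exploration * math.sqrt(log_total / self.counts[i])

        return self.arms[max(range(len(self.arms)), key=index)]

    def update(self, arm: int, reward: float) -> None:
        i = self.arms.index(arm)
        self.counts = [c * self.gamma for c in self.counts]
        self.sums = [s * self.gamma for s in self.sums]
        self.counts[i] += 1.0
        self.sums[i] += reward

    def checkpoint(self) -> str:
        return json.dumps({"arms": self.arms, "gamma": self.gamma,
                           "exploration": self.exploration,
                           "counts": self.counts, "sums": self.sums})

    @classmethod
    def restore(cls, blob: str) -> "DiscountedUCB":
        state = json.loads(blob)
        learner = cls(state["arms"], state["gamma"], state["exploration"])
        learner.counts = state["counts"]
        learner.sums = state["sums"]
        return learner
```

`gamma` sets the effective memory to roughly `1 / (1 - gamma)` updates (about 20 at `0.95`), which makes it a natural candidate for a spec-declared bounded parameter.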
## 23. Open Implementation Questions
- Exact minimum sample thresholds per parameter family after the full 1.7k+
trade corpus is rebuilt under the same capital geometry.
- Whether hard `$400` floors should be a separate ParamSet or remain outside
VIBRISS as fixed safety policy.
- How to measure sub-bar TP/cadence opportunity cost in a way compatible with
bar-based ADVSL replay.
- Whether intratrade hold contraction deserves a second ParamSet after
entry-captured hold advice is validated.
- How much MC/synthetic data is statistically acceptable without overstating
confidence in rare-tail regimes.
- Whether PINK can share BLUE priors after venue slippage, fills, and exchange
state are included, or must maintain separate priors from day one.
## 24. Recommended First Build
Build VIBRISS V1 as a shadow-only package with:
- `ParamSpec` dataclasses and YAML loader.
- `ParamSetSpec` support for `advsl.hold_substitute.v1`.
- discrete UCB/Thompson learner.
- contextual LinUCB learner stub or implementation.
- advice publisher.
- ClickHouse audit writer.
- MHS-compatible sensor publisher.
- supervisord/container runner definition.
- offline replay harness for conditional fast TP and ADVSL hold bars.
- capital-aware replay and opportunity-cost accounting for the hold substitute.
- no live actuation.
Recommended package layout:
```text
/mnt/dolphinng5_predict/vibriss/
__init__.py
specs.py # ParamSpec / ParamSetSpec dataclasses and validation
context.py # HZ/CH context snapshots, masks, point-in-time joins
features.py # deterministic feature construction
learners/
__init__.py
ucb.py # discounted UCB over finite arms
thompson.py # categorical Thompson sampling
linucb.py # contextual finite-arm learner
priors.py # MARAS/label/asset/side shrinkage priors
guardrails.py # hard range, freshness, confidence, drawdown gates
advice.py # advice payload builder + schema validation
publisher.py # Hazelcast publication
audit.py # ClickHouse writer facade and spool priority
rewards.py # delayed reward joining and opportunity cost
replay/
tape.py # tape/path loading
capital_curve.py # recursive capital replay
counterfactuals.py # arm-level exit simulation
walk_forward.py # contiguous and moving-window validation
reports.py # JSON/CSV/Parquet artifact writers
runner.py # live shadow/advisory daemon
worker.py # offline subtasks
cli.py # ops commands and local replay entry points
tests/
```
V1 module responsibilities:
| Module | Must do | Must not do |
|---|---|---|
| `specs.py` | validate ranges, modes, required sensors, output surfaces | import live trader code |
| `context.py` | build point-in-time snapshots with freshness masks | fill missing market data with fake zeros |
| `features.py` | compute deterministic feature vectors | read future outcome labels |
| `learners/*` | expose `choose`, `update`, `checkpoint`, `restore` | know about ADVSL internals |
| `guardrails.py` | enforce hard safety and fallback | optimize reward |
| `advice.py` | produce schema-valid advice payloads | publish directly to HZ |
| `publisher.py` | write HZ advice and heartbeat | mutate engine state |
| `rewards.py` | join decisions to realized/counterfactual outcomes | update policy without reward status |
| `replay/*` | reproduce capital-aware backtests | depend on live HZ |
| `runner.py` | run shadow loops and MHS payloads | run full replay inline |
| `worker.py` | run heavy calibration/replay jobs | publish live advice |
Minimum local commands:
```bash
python -m vibriss.cli validate-specs \
--spec-dir /mnt/dolphin_training/vibriss/specs
python -m vibriss.cli replay \
--param-set advsl.hold_substitute.v1 \
--namespace blue \
--from 2026-05-01 --to 2026-06-04 \
--out /mnt/dolphin_training/vibriss/replays/manual
python -m vibriss.runner \
--mode shadow \
--namespace blue \
--spec-dir /mnt/dolphin_training/vibriss/specs \
--state-dir /mnt/dolphin_training/vibriss/checkpoints
```
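A minimal `argparse` skeleton matching the `validate-specs` and `replay` commands might look like the sketch below. The flags are taken from the commands above. The `blue`/`pink` namespace choices, the ISO-date parsing, and the placeholder handlers are assumptions; the real handlers would call into `specs.py` and `replay/*`.

```python
import argparse
from datetime import date
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m vibriss.cli")
    sub = parser.add_subparsers(dest="command", required=True)

    validate = sub.add_parser("validate-specs", help="validate all ParamSet specs")
    validate.add_argument("--spec-dir", required=True)

    replay = sub.add_parser("replay", help="capital-aware replay for one ParamSet")
    replay.add_argument("--param-set", required=True)
    replay.add_argument("--namespace", choices=["blue", "pink"], required=True)
    # "from" is a Python keyword, so store the dates under explicit dest names.
    replay.add_argument("--from", dest="date_from", type=date.fromisoformat, required=True)
    replay.add_argument("--to", dest="date_to", type=date.fromisoformat, required=True)
    replay.add_argument("--out", required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command == "replay" and args.date_from > args.date_to:
        parser.error("--from must not be after --to")
    if args.command == "validate-specs":
        print(f"validating specs in {args.spec_dir}")  # placeholder for specs.py
    else:
        print(f"replay {args.param_set} [{args.namespace}] "
              f"{args.date_from}..{args.date_to} -> {args.out}")  # placeholder for replay/*
    return 0
```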
Minimum test set:
| Test | Purpose |
|---|---|
| `test_spec_validation.py` | rejects invalid ranges, missing sensors, unsafe live policies. |
| `test_advice_schema.py` | validates HZ payloads and expiry/fallback fields. |
| `test_guardrails.py` | proves stale OBF/MARAS and drawdown alarms force fallback. |
| `test_replay_determinism.py` | same tape/spec/seed gives same capital curve. |
| `test_opportunity_cost.py` | recovered cut trades charge missed upside. |
| `test_priority_spool.py` | high-priority decision/reward rows flush before diagnostics. |
| `test_mode_state_machine.py` | promotion is manual; demotion is automatic. |
| `test_no_live_actuation_default.py` | default env cannot make engine consume advice. |
The first acceptance test is not "did it make more money in-sample?" Instead, it
requires that:
1. the same historical decision can be replayed deterministically,
2. every recommended parameter has a valid spec and guardrail trail,
3. baseline fallback is used under stale/low-confidence context,
4. reward accounting includes clipped-winner opportunity cost,
5. the replayed capital curve is reproducible.
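Criteria 1 and 5 hinge on recursive capital accounting and seeded randomness. The sketch below assumes a simple geometry: each trade's notional is a fixed fraction of current capital. The real capital geometry is defined elsewhere, and both function names are hypothetical. Any resampling uses a private `random.Random(seed)`, so the same tape, fraction, and seed always reproduce the same curve.

```python
import random
from typing import List, Sequence


def capital_curve(trade_returns: Sequence[float], fraction: float,
                  start_capital: float) -> List[float]:
    """Recursive (compounding) replay: each trade is sized from current capital,
    so earlier outcomes change later trade sizes."""
    curve = [start_capital]
    for r in trade_returns:
        capital = curve[-1]
        curve.append(capital + capital * fraction * r)
    return curve


def resampled_curve(trade_returns: Sequence[float], fraction: float,
                    start_capital: float, seed: int) -> List[float]:
    """One bootstrap ordering; a private RNG keeps it reproducible per seed."""
    rng = random.Random(seed)
    sample = rng.choices(list(trade_returns), k=len(trade_returns))
    return capital_curve(sample, fraction, start_capital)
```

With `fraction=0.5`, a +10% trade followed by a -10% trade ends at 997.5 from 1000.0, while additive accounting would report 1000.0. That gap is why the acceptance criteria insist on a recursive capital curve.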
The first useful artifact is a replay bundle, not a daemon:
```text
replay_summary.json
capital_curve.csv
per_trade_counterfactuals.parquet
opportunity_cost_audit.parquet
maras_hash_hold_priors.parquet
obf_hold_binding_report.json
walk_forward_summary.json
```
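Reproducibility of the bundle can be checked by hashing each artifact from two independent runs and comparing. This is a minimal sketch with hypothetical function names. Byte-level comparison is strict: if any writer embeds run timestamps or library versions in its output, those artifacts would need a content-level comparison instead.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable

BUNDLE_FILES = (
    "replay_summary.json",
    "capital_curve.csv",
    "per_trade_counterfactuals.parquet",
    "opportunity_cost_audit.parquet",
    "maras_hash_hold_priors.parquet",
    "obf_hold_binding_report.json",
    "walk_forward_summary.json",
)


def bundle_digest(bundle_dir: str, files: Iterable[str] = BUNDLE_FILES) -> Dict[str, str]:
    """SHA-256 per artifact; a missing file raises instead of being skipped."""
    digests: Dict[str, str] = {}
    for name in files:
        h = hashlib.sha256()
        with (Path(bundle_dir) / name).open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB chunks
                h.update(chunk)
        digests[name] = h.hexdigest()
    return digests


def bundles_match(dir_a: str, dir_b: str,
                  files: Iterable[str] = BUNDLE_FILES) -> Dict[str, bool]:
    """Per-artifact equality between two replay runs of the same tape/spec/seed."""
    names = list(files)
    a, b = bundle_digest(dir_a, names), bundle_digest(dir_b, names)
    return {name: a[name] == b[name] for name in names}
```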
Only after that bundle is reproducible should the shadow runner be started.