# VIBRISS Parameter Governance Spec

**Name**: VIBRISS (Variational Input-driven Bandit-Reactive Intelligent Sensing System)
**Status**: Design doctrine / implementation target
**Scope**: BLUE/PINK parameter governance, initially shadow/advisory only
**Canonical dependency**: `SYSTEM_BIBLE_v7.md`
**Operational stance**: shadow-first, replay-first, guardrail-first. VIBRISS
must be useful even when it never gets permission to actuate live.

## 1. Purpose

VIBRISS is the engine's active parameter-sensing and adaptive execution layer.
Its job is to replace brittle hardcoded execution constants with bounded,
auditable, continuously re-evaluated parameter recommendations.

VIBRISS is not a new alpha model and not a full RL layer. It is an online
statistical parameter-governance system: observe outcomes, test safe candidate
values, score the realized response, retire weak settings, and keep enough
controlled exploration alive to detect drift.

The first intended target is exit-parameter governance, especially ADVSL and
fast/cubic TP parameters such as hold-bar limits, floor thresholds, pressure
thresholds, and TP posture. Later targets can include sizing haircuts, urgency,
asset-selection posture, and venue-specific execution parameters.

## 2. Design Stance

VIBRISS must be modular, spec-driven, replayable, and safety bounded.

Key doctrine:

- One learner per parameter spec by default.
- Bundle/slate learning only after interaction effects are repeatedly material.
- Contextual bandits first; full RL only later if decisions are truly sequential
  and materially coupled across multiple execution steps.
- Discrete and bucketed parameters use Thompson Sampling, UCB, LinTS, or LinUCB.
- Continuous bounded scalars are discretized into safe buckets first.
- Nonstationary behavior uses discounted or sliding-window evidence plus drift
  detection.
- Safety-critical parameters require baseline-safe exploration, confidence
  thresholds, step limits, cooldowns, and hard guardrails.
- Passive fill and time-to-fill decisions should use survival-analysis modules
  where censoring matters.

## 3. System Boundary

VIBRISS must not silently mutate engine internals.

The correct production shape is:

```text
context ingestion
  -> admissible candidate generation
  -> learner scoring
  -> guardrail filter
  -> action selection
  -> advice publication
  -> allowed engine consumption point
  -> delayed outcome capture
  -> reward mapping
  -> online update
```

The hot execution path consumes advice only at documented decision points. The
learner/update path is separate and may lag. If advice is stale, low-confidence,
or invalid, the engine falls back to the baseline parameter.

BLUE is in-memory/paper and not BingX-enabled. PINK is the BingX venue-facing
world. VIBRISS may govern both, but its output contract must be namespace-aware
and must not assume that BLUE has exchange state.

Non-goals:

- VIBRISS does not pick assets.
- VIBRISS does not replace MARAS, OBF, V7, ACB, EFSM, or SurvivalStack.
- VIBRISS does not own exchange reconciliation.
- VIBRISS does not rewrite frozen champion configs.
- VIBRISS does not turn offline backtest winners into live settings without
  a shadow/OPE/promotion path.

Its only authority is to publish bounded, versioned parameter advice and to
learn from the outcome trail.

## 4. Terminology

| Term | Meaning |
|---|---|
| `vibrissa` | One probe-trade, parameter test, or market feeler. |
| `vibrissae` | The active parameter-probe array. |
| `parameter spec` | Loadable contract defining one tunable parameter. |
| `arm` | One candidate value or execution configuration. |
| `reward` | Bounded realized execution-quality score. |
| `posture` | Current preferred parameter set plus confidence and fallback metadata. |
| `baseline` | The currently trusted hardcoded or documented production value. |

### 4.1 Control-Plane Elegance Constraints

VIBRISS must remain a disciplined parameter-governance control plane, not an
unbounded mesh of subsystems mutating each other. Adaptive behavior is allowed
only when it preserves ownership, auditability, and bounded actuation.

Hard architecture rules:

1. One writer per parameter.
   - A live parameter may have many sensors and many context inputs, but only
     one ParamSet is allowed to publish the effective value for that parameter
     in a given namespace.

2. ParamSpecs and ParamSetSpecs own promotion rules.
   - Promotion cadence, evidence gates, rollback rules, manual-approval
     requirements, and replacement rhythm are part of the spec. The runner must
     execute declared policy, not invent policy.

3. Meta-cadence is itself a parameter, but only at a slower cadence.
   - VIBRISS may tune replay cadence, promotion-review cadence, checkpoint
     cadence, or reward-join cadence, but those meta-parameters must move more
     slowly than the governed trading/execution parameter and must have
     stronger guardrails.

4. EsoF, ExoF, MARAS, OBF, V7, MHS, and drawdown state are context inputs, not
   arbitrary controllers.
   - They may influence candidate scoring, confidence, demotion, or fallback,
     but they must not directly mutate live parameters outside the owning
     ParamSet.

5. Every live change must be reproducible.
   - Log candidate set, chosen action, action probability or confidence,
     context hash, reward mapping, model version, compiled config hash,
     fallback reason, promotion state, and rollback path.

6. No hidden cross-subsystem mutation.
   - If one subsystem changes another subsystem's effective behavior, the change
     must appear as a typed ParamSet advice event and an audited engine-consumed
     posture update.

7. Shadow first, replay/OPE second, canary third, live last.
   - No safety-critical parameter may skip directly from idea or in-sample
     replay to live actuation. Live promotion requires held-out evidence,
     shadow logging, explicit approval when required, and automatic demotion
     conditions.

These constraints are mandatory for all future ADVSL, TP, DVOL/VOL, IRP,
asset-picker, EFSM/overlay, and meta-cadence ParamSets. If a design violates
them, the design is considered tangled and must be simplified before
implementation.

## 5. Parameter Spec Contract

Each adaptive parameter must be declared by a loadable spec. VIBRISS should not
hardcode knowledge of individual parameters.

Important terminology:

- `ParamSetSpec`: the loadable contract for a family of related parameters.
- `paramset_config`: configuration that applies to the ParamSet as a whole.
- `params`: the parameter declarations contained by the ParamSet.
- `param_defaults`: defaults inherited by every parameter in `params`.
- per-param override: a field inside one `params.<param_name>` entry that
  overrides `param_defaults` for that parameter only.

The live runner must not perform complex inheritance during scoring. Specs are
authored in a rich hierarchical form, validated, compiled, and hash-stamped into
a flat canonical policy document before the runner consumes them.

Required fields:

```yaml
identity:
  name: advsl.overlay_min_hold_bars
  type: integer
  units: bars
  default: 6

domain:
  candidates: [4, 6, 8, 10, 12, 16, 20]
  hard_min: 0
  hard_max: 40

safety:
  fallback_baseline: 6
  max_step_change: 4
  cooldown_trades: 5
  min_shadow_samples: 100
  min_live_confidence: 0.80
  max_exploration_rate: 0.05

placement:
  consumer: advanced_sl
  decision_point: open_trade_exit_evaluation
  namespace: blue

live_change_policy:
  mode: between_trades
  allow_intratrade_change: false

candidate_policy:
  learner: linucb
  nonstationarity: sliding_window
  window_trades: 300

success:
  primary_metric: capital_curve_delta_after_cost
  secondary_metrics:
    - clipped_winner_cost
    - saved_loss
    - drawdown_delta
    - recovery_lag

inputs:
  - maras_latest
  - v7_decision_events
  - advanced_sl_monitor_latest
  - obf_universe_latest
  - eigen_scan
  - trade_path

reward_mapping:
  bounded_range: [-1.0, 1.0]
  delayed_until: trade_close_or_counterfactual_terminal
  components:
    saved_loss: +1.0
    missed_profit: -1.5
    drawdown_reduction: +0.5
    tail_loss: -2.0

promotion_policy:
  owner: param_set
  technique: replay_shadow_canary
  review_cadence_s: 900
  min_replay_trades: 300
  min_shadow_decisions: 200
  min_realized_rewards: 50
  min_contiguous_regions: 4
  required_evidence:
    recursive_capital_curve_delta_after_cost: "> 0"
    worst_region_delta: ">= configured_floor"
    clipped_winner_cost: "<= configured_budget"
    drawdown_delta: "<= 0"
  allowed_transitions:
    - disabled_to_shadow
    - shadow_to_advisory
    - advisory_to_canary_live
    - canary_live_to_controlled_live
  manual_approval_required:
    - advisory_to_canary_live
    - canary_live_to_controlled_live
  automatic_demotion_on:
    - stale_required_sensor
    - reward_drift
    - drawdown_alarm
    - invalid_checkpoint

meta_cadence_policy:
  owner: param_set
  status: shadow_first
  tunable_cadences:
    calibration_interval_s: [300, 900, 1800, 3600]
    promotion_review_interval_s: [900, 1800, 3600, 7200]
    checkpoint_interval_s: [30, 60, 120, 300]
    shadow_to_canary_cooldown_trades: [25, 50, 100, 200]
  context_inputs:
    - maras_latest
    - exof_latest
    - esof_latest
    - mhs_latest
    - reward_backlog
    - drawdown_state
  success:
    primary_metric: policy_stability_adjusted_reward
    secondary_metrics:
      - stale_advice_rate
      - promotion_false_positive_rate
      - missed_adaptation_cost
      - operator_churn
      - compute_cost
  live_change_policy:
    calibration_cadence: controlled_after_shadow
    promotion_cadence: advisory_only_until_explicit_approval

outputs:
  hz_key: DOLPHIN_FEATURES.vibriss_param_advice
  clickhouse_table: dolphin.vibriss_decisions
  state_table: dolphin.vibriss_policy_state
```

### 5.1 ParamSet Config and Per-Parameter Overrides

The canonical authoring shape is:

```yaml
param_set:
  id: advsl.hold_substitute.v1
  version: 1.0.0
  namespace_default: blue
  status: shadow_first

paramset_config:
  consumer: advanced_sl
  decision_family: exit_risk_timing
  placement:
    decision_point: trade_entry
    live_replacement_rhythm: capture_on_entry
  promotion_policy:
    technique: replay_shadow_canary
    review_cadence_s: 1800
  meta_cadence_policy:
    status: shadow_first
  outputs:
    hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice
    decision_table: dolphin.vibriss_decisions
    reward_table: dolphin.vibriss_rewards

param_defaults:
  learner:
    type: discounted_ucb
    nonstationarity: sliding_window
    window_trades: 300
  safety:
    fallback_baseline: 12
    min_shadow_samples: 200
    min_live_confidence: 0.80
    max_exploration_rate: 0.0
  reward_mapping:
    bounded_range: [-1.0, 1.0]
    primary_metric: recursive_capital_curve_delta_after_cost
  guardrails:
    stale_sensor_policy: shrink_to_baseline
    drawdown_alarm_policy: freeze_to_baseline

params:
  advsl.min_hold_bars_before_floor_arm:
    type: integer
    units: bars
    domain:
      candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]
      hard_min: 0
      hard_max: 48
    default: 12
    baseline_reference: 20

  advsl.recovery_extension_max_bars:
    type: integer
    units: bars
    domain:
      candidates: [0, 4, 8, 12, 20, 34]
      hard_min: 0
      hard_max: 40
    default: 0
    learner:
      type: shadow_only_discounted_ucb
    safety:
      min_shadow_samples: 500
      min_live_confidence: 0.90
```

Merge precedence:

```text
compiled_param =
  built_in_schema_defaults
  < paramset_config
  < param_defaults
  < params.<param_name>
  < namespace/runtime override if explicitly allowed by spec
```

Rules:

- ParamSet-wide promotion and meta-cadence policy live in `paramset_config`
  unless a parameter explicitly overrides a narrower field.
- Per-param overrides may tighten safety, narrow domains, increase sample
  requirements, or change learner type only if the ParamSet allows it.
- Per-param overrides may not weaken global catastrophic guardrails.
- The compiler must emit both the original source spec hash and the compiled
  canonical hash.
- The runner consumes only the compiled canonical form.

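The merge precedence above amounts to a left-to-right deep merge in which later layers win. The sketch below is illustrative only: `deep_merge` and `compile_param` are hypothetical names, the optional namespace/runtime layer is omitted, and the "may not weaken guardrails" rule is not enforced here (a real compiler would validate that separately).

```python
from copy import deepcopy
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale, never concatenated.
            merged[key] = deepcopy(value)
    return merged


def compile_param(
    schema_defaults: dict,
    paramset_config: dict,
    param_defaults: dict,
    param_entry: dict,
) -> dict:
    """Apply layers in precedence order; the last layer has the final say."""
    layers = [schema_defaults, paramset_config, param_defaults, param_entry]
    return reduce(deep_merge, layers, {})
```

Applied to `advsl.recovery_extension_max_bars`, the per-param `safety` block would raise `min_shadow_samples` to 500 and `min_live_confidence` to 0.90 while still inheriting `fallback_baseline` and `max_exploration_rate` from `param_defaults`.
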
### 5.2 Spec Compiler and Validation Library

Use an existing platform-agnostic schema/config tool for the authoring layer.
Do not invent a bespoke inheritance language.

Recommended stance:

| Need | Recommended tool | Runtime placement |
|---|---|---|
| Cross-language schema contract | JSON Schema | CI, compiler, runner validation. |
| Rich defaults, constraints, unification, inheritance-like config | CUE | Spec compiler / CI, not hot path. |
| Human-friendly authoring | YAML | Source only; compiled immediately. |
| Runner consumption | canonical JSON | Hot path. |
| Fast internal representation | dataclass / Pydantic / msgspec-style object | Runner load time only. |

VIBRISS should prefer:

```text
YAML authoring -> CUE/JSON-Schema validation -> canonical JSON -> runner cache
```

The live runner should never parse CUE, run template expansion, or resolve a
large inheritance tree during an advice decision. It should load a precompiled
canonical JSON document, verify hashes and schema version, then use direct field
access.

Performance requirements:

- spec compile can be slower because it is CI/worker time;
- runner spec load should be bounded and rare;
- advice scoring must use already-merged values;
- every compiled ParamSet must include a deterministic `compiled_config_hash`;
- all advice/audit rows must log `spec_hash` and `compiled_config_hash`.

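A deterministic `compiled_config_hash` requires a byte-stable serialization of the compiled document. One possible approach, shown here as a sketch under assumptions, is sorted-key compact JSON hashed with SHA-256. The function names are hypothetical, the choice of SHA-256 is an assumption (the spec does not name a hash), and this is not a full RFC 8785 canonicalization.

```python
import hashlib
import json


def canonical_json(doc: dict) -> str:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )


def compiled_config_hash(doc: dict) -> str:
    """SHA-256 hex digest of the canonical JSON form (assumed hash choice)."""
    return hashlib.sha256(canonical_json(doc).encode("utf-8")).hexdigest()
```

Because keys are sorted before hashing, two compiles that produce the same merged values in a different key order yield the same digest, which is what lets audit rows be joined back to an exact compiled policy.
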
## 6. Candidate Algorithms

V1 should support a small set of algorithms well, rather than a broad library
surface poorly.

Recommended V1 learners:

| Parameter type | Default learner | Notes |
|---|---|---|
| Small categorical | Thompson Sampling | Useful for urgency, route, retry, fixed mode selection. |
| Ordered discrete scalar | UCB or discounted UCB | Good for hold bars, TP buckets, pressure thresholds. |
| Contextual finite arms | LinUCB or LinTS | First choice for MARAS/OBF/V7-conditioned advice. |
| Continuous scalar | Adaptive discretization | Start bucketed; upgrade only if buckets are too coarse. |
| Passive fill/delay | Survival model | Explicitly handle censored fill and recovery windows. |

Useful libraries to inspect:

- Vowpal Wabbit for contextual bandits, logged propensities, and OPE.
- River for streaming statistics, online GLMs, and drift detection.
- Open Bandit Pipeline for offline policy evaluation.
- MABWiser for fast Python prototype comparison.
- lifelines or statsmodels for survival analysis.
- NumPyro/Pyro only when hierarchical Bayesian pooling is justified.

### 6.1 Dependency Placement and Reliability Policy

VIBRISS must distinguish algorithm research from live parameter governance.
Performance and reliability are more important than using the most general
library in the first live version.

Dependency rule:

- The live runner should have a small deterministic dependency surface.
- Heavy learning, OPE, simulation, Bayesian inference, and broad model
  comparison belong in `vibriss_worker` or offline jobs.
- The engine consumes compact checkpointed policy state and advice payloads. It
  must not shell out to a learner or wait on an offline library.
- ClickHouse writes, model updates, and replay jobs must never block the hot
  advice publication loop.
- If a dependency is not needed to score the current checkpointed policy, it is
  not a live-runner dependency.

Recommended V1 split:

| Layer | Allowed dependency posture | Reason |
|---|---|---|
| Engine hot path | no VIBRISS learner dependency | Engine reads validated advice only. |
| `vibriss_runner` | stdlib + NumPy/Pandas only if needed; optional River subset for drift/stats | Keep startup, memory, and failure modes bounded. |
| `vibriss_worker` | VW, River, OBP, MABWiser, lifelines, statsmodels, contextual libraries | Calibration, OPE, replay, walk-forward, and report generation. |
| Research/simulation | ABIDES, Pyro/NumPyro, CATX, experimental packages | Valuable, but not part of the live critical path. |

### 6.2 Library Decision Matrix

| Library / stack | VIBRISS use | Placement | Decision |
|---|---|---|---|
| Internal UCB/TS/LinUCB | First production learners for bounded discrete arms. | runner + worker | Use first; easiest to audit and checkpoint. |
| Vowpal Wabbit | Contextual bandit benchmark, action-dependent features, OPE workflows, possible future compact policy generator. | worker/offline | Approved for evaluation; not a V1 hot-path dependency. |
| River | Streaming stats, reward normalization, ADWIN/Page-Hinkley/KSWIN-style drift detection, progressive validation. | runner optional; worker default | Approved, but keep live usage narrow. |
| Open Bandit Pipeline | OPE estimator benchmarking and logged-bandit evaluation. | offline/worker | Approved for reports; not live. |
| MABWiser | Fast Python comparison of TS/UCB/LinTS/LinUCB policies. | offline/worker | Approved for prototyping; not live. |
| lifelines / statsmodels | Survival models, recursive diagnostics, stability checks. | worker/offline | Approved for passive fill/recovery modeling. |
| contextualbandits | Alternative contextual-bandit benchmark implementations. | offline/worker | Research benchmark only. |
| SMPyBandits / BanditPylib / PyBandits | Algorithm comparison and stochastic-bandit sandboxing. | offline/research | Optional; do not add to live image. |
| NumPyro / Pyro | Hierarchical Bayesian pooling for sparse per-symbol/per-hash modules. | research/worker | Defer until sparse-data pooling is clearly needed. |
| CATX | Continuous-action contextual bandit research. | research | Defer; bucketed actions first. |
| ABIDES / ABIDES-Gym | Market-interactive simulation and stress rehearsal. | research/simulation | Useful later; too heavy for V1 runner. |
| Kafka / Flink | Durable event-stream backbone and stateful stream processing. | future infra | Defer; Dolphin already has Hazelcast + ClickHouse + supervisord. |
| scikit-multiflow | Historical stream-learning reference. | none | Do not use for net-new code; prefer River. |
| banditml | Architectural reference for production bandit services. | research only | Do not depend on it without a fresh maintenance review. |

### 6.3 Performance Budgets

Initial budgets for the live runner:

| Operation | Target | Hard behavior on miss |
|---|---:|---|
| Score one ParamSet advice snapshot | `p95 <= 10 ms` | publish fallback or previous checkpoint. |
| Full live advice loop over enabled ParamSets | `p95 <= 50 ms` | skip noncritical ParamSets first. |
| Hazelcast publish | nonblocking best effort | mark advice degraded if publish fails. |
| ClickHouse audit write | never blocks advice | spool locally and expose backlog. |
| Runner startup with warm checkpoint | `<5 s` target | publish no advice until checkpoint valid. |
| Memory footprint | bounded and observable | disable worker-style models in runner. |

Candidate sets must stay small. For `advsl.hold_substitute.v1`, a dozen finite
hold-bar arms is acceptable; hundreds of arms are not. Continuous-action
learners are disallowed in live V1 because they make bounded behavior harder to
audit and harder to replay exactly.

### 6.4 Algorithm Defaults by Parameter Class

Concrete defaults:

| Parameter situation | Default | Upgrade path | Notes |
|---|---|---|---|
| Small finite categorical, weak context | Thompson Sampling or UCB1 | discounted UCB if drift appears | Use for mode, urgency, route, retry-like knobs. |
| Ordered discrete scalar | discounted UCB with monotone/smoothness diagnostics | contextual finite-arm learner | Good first fit for hold bars and TP buckets. |
| Finite arms with rich context | LinUCB or LinTS | GLM-UCB/GLM-TS if reward shape demands it | Use MARAS/OBF/V7/EFSM context. |
| Continuous bounded scalar | adaptive discretization | continuous-action contextual bandit only after bucket failure | Prefer auditability over fine resolution. |
| Coupled parameter bundle | small safe bundle catalog | slate/combinatorial learner only if interaction is proven | Avoid action-space explosion. |
| Nonstationary regime | discounted/sliding-window learner + drift detector | replay-reset logic | Freeze or shrink on drift; do not blindly chase. |
| Safety/budget constrained parameter | baseline-safe gating around the learner | conservative contextual bandit / budgeted bandit | Guardrails must dominate learner output. |
| Passive fill or recovery delay | survival model | richer survival only after classical model stability | Treat censoring explicitly. |

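To make the "discounted UCB" default concrete, here is a simplified sketch for a small finite arm set such as hold bars. It is not the project's learner: the class name, the `gamma` and `bonus` defaults, and the exact form of the exploration term are assumptions chosen for readability. Each update decays all past evidence by `gamma`, so old rewards fade and the learner can track drift.

```python
import math


class DiscountedUCB:
    """Simplified discounted-UCB over a fixed, finite arm set (illustrative)."""

    def __init__(self, arms, gamma: float = 0.99, bonus: float = 1.0) -> None:
        self.arms = list(arms)
        self.gamma = gamma      # per-update decay applied to all evidence
        self.bonus = bonus      # scale of the exploration term
        self.counts = {arm: 0.0 for arm in self.arms}  # discounted pull counts
        self.sums = {arm: 0.0 for arm in self.arms}    # discounted reward sums

    def select(self):
        """Return an unplayed arm first, then the arm with the highest index."""
        for arm in self.arms:
            if self.counts[arm] == 0.0:
                return arm
        total = sum(self.counts.values())

        def index(arm):
            n = self.counts[arm]
            mean = self.sums[arm] / n
            explore = math.sqrt(math.log(max(total, 1.0)) / n)
            return mean + self.bonus * explore

        return max(self.arms, key=index)

    def update(self, arm, reward: float) -> None:
        """Decay all evidence, then credit the played arm with a bounded reward."""
        for a in self.arms:
            self.counts[a] *= self.gamma
            self.sums[a] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward
```

In shadow mode the selected arm would only be logged alongside the baseline, never actuated; the decay also means rarely played arms regain exploration bonus over time, which is the controlled exploration the doctrine asks for.
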
### 6.5 Explicit Deferrals

VIBRISS V1 should not attempt:

- full RL;
- continuous-action live control;
- live probe trades by default;
- Kafka/Flink migration;
- ABIDES-in-the-loop production scoring;
- hierarchical Bayesian pooling in the runner;
- joint optimization of many parameters before single-ParamSet evidence exists.

These are not rejected ideas. They are deferred because the current bottleneck is
reliable evidence collection, replay/OPE discipline, and safe advice
publication.

## 7. Reward Design

Rewards must be decomposed, bounded, and auditable. Store both raw components
and normalized reward.

Typical reward components:

- positive: saved loss, lower drawdown, better realized terminal PnL, better
  capital compounding trajectory, successful recovery without excess hold.
- negative: clipped winner, missed TP, extra adverse selection, slippage, timeout,
  excessive hold, larger tail loss, oscillation, stale-data actuation.

For ADVSL/TP research, the primary reward should be capital-curve delta after
opportunity cost, not terminal trade PnL alone. A rule that saves losses but
systematically clips larger winners must be penalized accordingly.

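As an illustration of "decomposed, bounded, and auditable", the sketch below combines component magnitudes with the signed weights from the `reward_mapping.components` example in Section 5 and clips the result to `bounded_range`. It assumes each component arrives as a non-negative magnitude already normalized to a comparable scale; that normalization, and the `map_reward` name, are assumptions rather than part of the spec.

```python
# Signed weights copied from the Section 5 reward_mapping example.
WEIGHTS = {
    "saved_loss": 1.0,
    "missed_profit": -1.5,
    "drawdown_reduction": 0.5,
    "tail_loss": -2.0,
}


def map_reward(components: dict, weights: dict = WEIGHTS,
               lo: float = -1.0, hi: float = 1.0) -> dict:
    """Return the raw components, the raw weighted sum, and the clipped reward."""
    unknown = set(components) - set(weights)
    if unknown:
        # Refuse silently-dropped components so the audit trail stays complete.
        raise KeyError(f"unweighted reward components: {sorted(unknown)}")
    raw = sum(weights[name] * value for name, value in components.items())
    return {
        "components": dict(components),
        "raw": raw,
        "reward": max(lo, min(hi, raw)),
    }
```

Keeping `raw` next to the clipped `reward` preserves the information that a clip occurred, so a saved loss that also clipped a larger winner shows up as a net penalty rather than disappearing into the bound.
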
## 8. Required Audit Logging

Every VIBRISS decision must be replayable.

Minimum decision log fields:

- timestamp and scan number
- namespace: blue, pink, prodgreen, research
- parameter spec id and version
- context snapshot hash
- MARAS regime, scalar hash, composite hash when available
- candidate set
- chosen arm
- action probability or confidence
- baseline value
- guardrail decisions and fallback reason
- model version
- advice publication timestamp
- engine consumption timestamp, if consumed
- delayed reward components
- terminal reward
- policy update version

## 9. Control-Plane Output

VIBRISS publishes advice, not imperative mutations.

Recommended HZ shape:

```json
{
  "schema": "vibriss.param_advice.v1",
  "namespace": "blue",
  "ts": "2026-06-03T00:00:00Z",
  "spec_id": "advsl.overlay_min_hold_bars",
  "spec_version": "1.0.0",
  "baseline_value": 6,
  "recommended_value": 12,
  "confidence": 0.82,
  "candidate_set": [4, 6, 8, 10, 12, 16, 20],
  "context_hash": "maras:57957|asset:XLMUSDT|side:LONG",
  "learner": "linucb",
  "guardrail_status": "PASS",
  "fallback_reason": null,
  "expires_at": "2026-06-03T00:05:00Z"
}
```

Consumption rule: the engine may consume this only if the parameter spec says
the current state is an allowed change point and all guardrails pass. Otherwise
the baseline remains in force.

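The consumption rule can be read as a pure function from an advice payload to either the recommended value or the baseline plus a reason. The sketch below is one possible reading, not the engine's implementation: `effective_value`, the fallback reason strings, and the `min_confidence` default (borrowed from `min_live_confidence` in Section 5) are assumptions. It uses only fields present in the payload above.

```python
from datetime import datetime, timezone


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def effective_value(advice: dict, now: datetime, at_change_point: bool,
                    min_confidence: float = 0.80):
    """Return (value, fallback_reason); reason is None when advice is consumed."""
    baseline = advice["baseline_value"]
    if not at_change_point:
        return baseline, "not_at_change_point"
    if advice.get("guardrail_status") != "PASS":
        return baseline, "guardrail_block"
    if now >= _parse_ts(advice["expires_at"]):
        return baseline, "stale_advice"
    if advice["confidence"] < min_confidence:
        return baseline, "low_confidence"
    if advice["recommended_value"] not in advice["candidate_set"]:
        return baseline, "invalid_value"
    return advice["recommended_value"], None


EXAMPLE_ADVICE = {
    "baseline_value": 6,
    "recommended_value": 12,
    "confidence": 0.82,
    "candidate_set": [4, 6, 8, 10, 12, 16, 20],
    "guardrail_status": "PASS",
    "expires_at": "2026-06-03T00:05:00Z",
}
NOW = datetime(2026, 6, 3, 0, 1, tzinfo=timezone.utc)
value, reason = effective_value(EXAMPLE_ADVICE, NOW, at_change_point=True)
```

Every rejection path returns the baseline together with a reason string, which maps directly onto the "fallback reason" audit field in Section 8.
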
## 10. Initial VIBRISS Targets

### 10.1 Conditional Fast TP

First replay-backed target:

- `fast_tp.tp_pct`
- `fast_tp.bars_held_min`
- `fast_tp.exit_pressure_min`
- `fast_tp.mfe_decay_min`
- `fast_tp.pnl_mfe_frac_max`

Current evidence says blanket first-touch `0.20%` TP clips too many winners, but
conditional fast TP is net positive in both full corpus and capital-known BLUE
subset. The first VIBRISS job is to turn those calibrated constants into a
shadow policy with logged propensities and OOS replay.

This TP percentage is a prime VIBRISS assistance target. Treat it as a
first-class tunable rather than a frozen constant once replay coverage is
sufficient.

Open research note:

- investigate whether the `0.20%` TP should be risk-normalized by notional
  risked, using a monotone nonlinearity such as a cubic retract/expansion curve;
- the candidate question is whether high-notional or high-leverage trades should
  have a proportionally different TP posture, while keeping the first-touch
  semantics intact for replay accounting;
- if tested, this must be evaluated with full capital-curve compounding and
  opportunity cost, not just raw win-rate or per-trade PnL.

#### 10.1.1 Re-entry-Conditioned Fast TP

Same-asset re-entries after a profitable exit are a separate research bucket.
They should not inherit the exact same fast-TP posture as a first-entry trade
without evidence. In current BLUE history, same-asset re-entries after wins are
usually profitable, but the average second-leg move is smaller than the initial
leg, which means a lower TP multiplier may preserve geometry better than a blunt
`2.0x` repeat.

Recommended candidate arms:

- `fast_tp.reentry_tp_multiplier = 1.2`
- `fast_tp.reentry_tp_multiplier = 1.5`
- `fast_tp.reentry_tp_multiplier = 2.0`

Interpretation:

- first-entry trades keep the baseline conditional fast TP
- re-entry-after-win trades may use a smaller multiplier band
- re-entry-after-loss trades should remain a separate bucket and may need a
  slower TP or stronger confirmation, not just a smaller multiplier
- a mild nonlinear / cubic trim on re-entry is a valid shadow-only follow-up
  candidate, but only after the flat multiplier band has been replayed first

Ownership rule:

- VIBRISS should learn and score the candidate multiplier in shadow replay
- EFSM should own live application if the runtime ever consumes the bucket
- do not flatten the geometric ROI curve by forcing a single multiplier on all
  re-entries

#### 10.1.2 TP Near-Miss Replay

The TP research set must include a distinct near-miss population:

- trades that came within a small epsilon of the candidate TP but did not
  satisfy the live trigger on the observed cadence
- trades that briefly exceeded the candidate TP and then reversed before the
  engine observed the touch
- trades that later stopped out after first-touch proximity, because those are
  the exact counterexamples needed to learn whether a lower TP bucket would
  have been better

This bucket is mandatory because a corpus dominated by profitable TP closes is
survivorship-biased. A learner trained only on winners can learn that the
current TP is "usually profitable" while remaining blind to the trades where a
slightly lower TP would have caught the move and prevented a later stop-loss.

Required replay semantics:

- use first-touch TP labels, not close-only labels
- keep near-miss candidates separate from clean TP hits
- score each candidate by recursive capital-curve delta after opportunity cost
- preserve scan-cadence effects when the live engine is scan-driven

Primary use:

- learn whether a tighter TP bucket is justified for specific regimes, assets,
  or re-entry conditions
- quantify the opportunity cost of the missed touch itself, not just the later
  realized close
- explain repeated "why did this one not TP?" incidents without overfitting to
  already-winning trades

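One way to keep near-misses separate from clean hits during replay is to label each trade against a candidate TP using two excursion measures: what the engine saw at scan cadence and what the finer-grained price path actually reached. The sketch below is illustrative only. The function name, the label strings, the `epsilon_pct` default, and the assumption that both excursions are available as percentages are all hypothetical.

```python
def classify_tp_outcome(tp_pct: float, observed_mfe_pct: float,
                        path_mfe_pct: float, epsilon_pct: float = 0.02) -> str:
    """Label one trade against one candidate TP level.

    observed_mfe_pct: max favorable excursion seen at the engine's scan cadence.
    path_mfe_pct: max favorable excursion on the finer-grained price path.
    """
    if observed_mfe_pct >= tp_pct:
        return "clean_hit"         # engine saw the touch
    if path_mfe_pct >= tp_pct:
        return "unobserved_touch"  # price touched, then reversed between scans
    if path_mfe_pct >= tp_pct - epsilon_pct:
        return "near_miss"         # came within epsilon but never touched
    return "no_touch"
```

Joining these labels with the eventual exit reason would isolate the third population in the list above (proximity followed by a stop-out) without mixing it into the clean-hit statistics.
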
### 10.2 ADVSL Hold/Floor

Second target:

- `advsl.base_catastrophic_floor_pct`
- `advsl.overlay_catastrophic_floor_pct`
- `advsl.overlay_max_loss_usd`
- `advsl.overlay_min_hold_bars`
- `advsl.overlay_pressure_min`
- `advsl.overlay_mae_risk_min`

This is safety-critical. VIBRISS may advise, but live application requires
strong guardrails, bounded step changes, and explicit fallback to the current
documented ADVSL values.

Floor percentage is also a prime VIBRISS assistance target, but it must stay
outside the learner’s ability to disable the catastrophic floor entirely.

Hard safety ceiling:

- the operator may define a non-negotiable max-loss ceiling per trade, per leg,
  or per session
- this ceiling is distinct from the replay optimum and distinct from the
  learner’s preferred floor/TP/hold posture
- if a candidate policy exceeds the ceiling, the ceiling wins even when the
  replayed recursive capital curve would otherwise look better
- VIBRISS may tune inside the ceiling, but it must not optimize the ceiling
  away, relax it implicitly, or treat operator pain tolerance as a soft signal

### 10.3 MARAS-Conditioned Hold Bars

Third target:

- per-hash or per-regime hold-bar posture
- per-label bias around known hash medians
- OBF-conditioned hold extension or contraction

Do not use MARAS labels as hard filters. Labels such as CHOPPY can contain both
many wins and severe losses. Use the composite hash, raw signature dimensions,
confidence, conflict, and nearest-neighbor regime evidence as context features.

### 10.4 DVOL/VOL Gate and Trade-Pause Posture

Candidate carefulness-critical target:

- `entry_gate.dvol_threshold`
- `entry_gate.vol_open_persistence_bars`
- `entry_gate.min_qualified_cross_rate`
- `entry_gate.pick_latency_pause_s`
- `entry_gate.open_gate_no_pick_pause_score`

This target exists because a VOL/DVOL gate can be technically open while the
engine still sees low-quality entry conditions: few accepted threshold crosses,
weak asset-pick evidence, or no fresh accepted pick after a normally sufficient
latency window.

The first useful derived sensor is:

```text
open_gate_no_pick_pause_score =
    VOL/DVOL gate open
  + low recent vel_div threshold-cross density
  + no accepted entry for expected_pick_latency_s
  + neutral/hostile EsoF/ExoF/MARAS context
  + no evidence of stale scans or halted runtime
```

This must not be treated as an urgent kill switch by default. It is a
carefulness parameter: VIBRISS should first log it, correlate it with later
trade quality, and test whether it predicts profitable trade pauses or smaller
position sizing. The baseline is no pause beyond current gate logic.

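One way to turn the derived sensor above into a loggable number is sketched here. The additive, equally weighted form, the `[0, 1]` scaling, and every argument name are assumptions made for illustration; the spec only lists the ingredients.

```python
def open_gate_no_pick_pause_score(
    gate_open: bool,
    cross_density: float,            # recent accepted vel_div crosses per bar (assumed unit)
    expected_cross_density: float,   # normal density for this context (assumed input)
    seconds_since_last_entry: float,
    expected_pick_latency_s: float,  # assumed to be > 0
    context_hostility: float,        # 0.0 = supportive, 1.0 = hostile (assumed scale)
    runtime_healthy: bool,
) -> float:
    """Advisory score in [0, 1]; 0.0 means no evidence for a pause."""
    # A closed gate or a stale/halted runtime is a different condition, not a pause signal.
    if not gate_open or not runtime_healthy:
        return 0.0
    density_gap = 1.0 - min(1.0, cross_density / max(expected_cross_density, 1e-9))
    latency_excess = min(1.0, max(0.0, seconds_since_last_entry / expected_pick_latency_s - 1.0))
    hostility = min(1.0, max(0.0, context_hostility))
    # Equal weights are a placeholder; the weights would themselves need replay evidence.
    return (density_gap + latency_excess + hostility) / 3.0
```

Because the score is advisory, it is written to be logged and correlated with later trade quality rather than compared against a hard cutoff.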
Related empirical TODOs:

- Reconsider `min_irp_alignment=0.0` empirically. The live gold config disables
  the IRP alignment filter, but the larger current corpus may now be sufficient
  to retest whether a nonzero IRP alignment floor improves asset-pick quality.
- Examine whether the apparent `VOL open / no immediate pick` condition is a
  useful trade-pause state or simply the expected effect of the stricter
  effective signal-strength gate (`vel_div` below about `-0.03`).
- Initial live observation: recent quiet after the last known good picks appears
  protective rather than broken. This must be tested with opportunity cost:
  measure what the system avoided during quiet periods and what it missed by not
  entering.
- Examine whether MARAS composite hashes need more granularity: more distinct
  market-descriptive buckets while preserving the sortable scalar hash and
  nearest-neighbor/similarity behavior.

### 10.5 Capital-Protect / Profit-Lock

Fourth target:

- `capital.protect_arm_threshold_pct`
- `capital.protect_full_threshold_pct`
- `capital.protect_tp_min_multiplier`
- `capital.protect_cubic_coeff`
- `capital.protect_reset_drawdown_pct`
- `capital.protect_hysteresis_bars`
- reset family selector: `capital.protect_reset_mode`
- time-based reset controls: `capital.protect_reset_time_trades`, `capital.protect_reset_time_seconds`
- regime/hash reset controls: `capital.protect_reset_regime_whitelist`, `capital.protect_reset_fingerprint_whitelist`
- sc-EsoF reset controls: `capital.protect_reset_sc_floor`, `capital.protect_reset_sc_neutral_floor`, `capital.protect_reset_sc_positive_floor`

This is the profit-protect / peak-lock family. The idea is not to mute risk
management, but to preserve capital once the day/session has already become
meaningfully profitable. The study must test whether a gain threshold such as
`1.2%`, `2.3%`, `3.3%`, ... should arm a more conservative TP posture for
subsequent trades, and whether a cubic trim on the TP multiplier is better than
an abrupt step change.

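To make the cubic-versus-step comparison concrete, the sketch below shows one possible shape for the TP multiplier between the arm and full thresholds. The interpolation formula is an assumption, not the documented meaning of `capital.protect_cubic_coeff`, and `protect_tp_multiplier` is a hypothetical helper.

```python
def protect_tp_multiplier(gain_pct: float, arm_pct: float, full_pct: float,
                          min_multiplier: float, cubic: bool = True) -> float:
    """TP multiplier in [min_multiplier, 1.0] as session gain grows.

    Below the arm threshold the TP is untouched. Above it, the step variant
    jumps straight to min_multiplier, while the cubic variant tightens slowly
    at first and reaches min_multiplier at the full threshold.
    """
    if full_pct <= arm_pct:
        raise ValueError("full threshold must exceed arm threshold")
    if gain_pct < arm_pct:
        return 1.0
    if not cubic:
        return min_multiplier
    progress = min(1.0, (gain_pct - arm_pct) / (full_pct - arm_pct))
    return 1.0 - (1.0 - min_multiplier) * progress ** 3
```

With `arm_pct=1.0`, `full_pct=3.0`, and `min_multiplier=0.5`, a session gain of `2.0` leaves the cubic multiplier at `0.9375`, while the step variant is already at `0.5`. That gap is the clipped-winner cost the replay has to price.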
Required policy questions:

- what profit threshold should arm the protect state
- how quickly TP should tighten once the threshold is crossed
- whether the tighten curve should be cubic, stepped, or mixed
- when the protect state must reset
- how much drawdown from the protected peak is required to disarm
- how many bars/trades of hysteresis are needed before a reset is valid
- whether reset should be keyed to time, regime, known fingerprint, sc-EsoF, or mixed logic
- whether reset should use a whitelist gate or a change-detection gate for regime/fingerprint families

The baseline reset rule should be conservative:

- arm only after the gain threshold is crossed on the recursive capital curve
- keep the lock until a real drawdown-from-peak or day/session reset occurs
- do not reset on a single noisy bar if the protected peak is still intact

This target must be evaluated against:

- recursive capital-curve delta after opportunity cost
- clipped-winner cost from over-tightening
- saved-loss from avoiding giveback after the day is already up
- win-return statistics after the arm event
- ceiling-violation count, because the profit protect should never create an
  implicit max-loss escape hatch

It is especially important to compare:

- flat threshold steps vs cubic tightening
- no hysteresis vs bar-count hysteresis
- immediate reset vs drawdown-based reset
- day-reset vs rolling-session reset

The tape should be replayed on the same capital curve used by the live engine,
so the protect state is evaluated recursively, not from a fixed post-hoc label.

### 10.6 OB Cascade TP-Modulation (added 2026-06-12, LINK 5e05eeeb post-mortem)

Candidate carefulness-critical target: the parameters of the OB
tail-avoidance layer in `alpha_exit_manager.evaluate()` that silently
modulate the "fixed" TP:

- `ob_cascade.count_threshold`: number of assets withdrawing liquidity
  (depth withdrawal velocity < CASCADE_THRESHOLD) required to enter cascade
  mode. **Currently hardcoded as `cascade_count > 0`, i.e. a SINGLE asset
  anywhere in the tracked set widens every open trade's TP by x1.40.** The
  LINK 5e05eeeb diagnosis (2026-06-11, -$1,248.71) showed this trigger is
  active on a large fraction of trades because entries occur during panics
  by construction. Domain candidates: {1, 2, 3, n_assets//4, n_assets//2};
  fallback_baseline: 1 (current behavior).
- `ob_cascade.tp_widen_factor`: currently hardcoded 1.40. Population
  evidence (post-2026-05-11 cohort): widening earned ~+$84.7K on
  continuation trades vs ~-$16.9K given back on reversals, so the factor is
  net-positive but fat-left-tailed. Domain: [1.0 .. 1.6]; 1.0 = modulation
  off.
- `ob_cascade.withdrawal_velocity_threshold`: `CASCADE_THRESHOLD` in
  `ob_features.py`, currently -0.10 (10% depth pulled over lookback).

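The interaction of the three candidates can be illustrated with a small sketch. It is not the code in `alpha_exit_manager.evaluate()`; the function name and argument layout are hypothetical, and only the defaults (`1`, `1.40`, `-0.10`) are taken from the current behavior described above.

```python
from typing import Iterable


def cascade_tp_pct(base_tp_pct: float,
                   withdrawal_velocities: Iterable[float],
                   count_threshold: int = 1,
                   tp_widen_factor: float = 1.40,
                   velocity_threshold: float = -0.10) -> tuple[float, int]:
    """Return (effective TP pct, cascade_count) for one evaluation.

    An asset counts toward the cascade when its depth withdrawal velocity is
    below velocity_threshold. With count_threshold=1 this reproduces the
    current `cascade_count > 0` trigger.
    """
    cascade_count = sum(1 for v in withdrawal_velocities if v < velocity_threshold)
    in_cascade = cascade_count >= count_threshold
    factor = tp_widen_factor if in_cascade else 1.0
    return base_tp_pct * factor, cascade_count
```

With a single withdrawing asset, `count_threshold=1` widens a `0.20` TP to roughly `0.28`, while `count_threshold=2` leaves it at `0.20`. That difference is the behavior a governed threshold would be scored on.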
Required sensors have existed since 2026-06-12: `dynamic_tp_pct`,
`tp_mod_factor`, `cascade_count`, `ob_regime_signal`, and `tp_floor_armed` are
logged on every `dolphin.v7_decision_events` row, so reward attribution can
be computed offline from the live tape with no new instrumentation.

INTERPLAY (REQUIRED reading for the paramset author): these parameters
interact with (a) the TP_FLOOR profit-floor ratchet (2026-06-12,
`DOLPHIN_TP_FLOOR`), which caps the left tail of the widening, so reward must
be computed on the JOINT policy (widen + floor), not the widen alone; and
(b) §10.1 Conditional Fast TP / the future ADAPTIVE TP THRESHOLD ("Dynamic
TP"): the adaptive TP threshold itself is hereby marked FIT FOR VIBRISS
GOVERNANCE. The effective TP should ultimately be one governed surface
(base x leverage-curve x market-state x cascade modulation), with VIBRISS
owning the modulation terms and the champion base (0.20%) remaining frozen
outside governance. A VIOLET-era sub-second exit guard changes the
actuation latency of both TP and floor; cadence is therefore a context
feature, not a governed parameter, per the data-cadence operator rule.

## 11. First Concrete ParamSet: ADVSL Hold Substitute

### 11.1 Objective

This is the first concrete VIBRISS use case.

The parameter set replaces a static ADVSL no-arm / min-hold rule with a bounded,
evidence-scored hold target. The original research problem was the legacy
`20`-bar hold window: it protects winners from premature ADVSL exits, but it can
also let fast adverse trades slip through before the floor arms. Replay work
found that shorter centers, especially around `12` bars, can protect capital in
tail events, while longer holds can be correct in snapback/recovery pockets.

The VIBRISS answer is not "always use 12" and not "always use 20." It is:

- choose a hold target from a bounded set,
- condition the choice on current trade/path/regime sensors,
- score it by recursive capital-curve impact after opportunity cost,
- keep catastrophic loss floors outside the learner as non-negotiable safety.

The sweep geometry itself is also a VIBRISS parameter. The ParamSet may carry a
global sweep window plus per-regime/per-hash sweep windows in `sweep_policy`.
When the derived best band touches the search window boundary, treat that as a
signal that the search is still censored by the current bounds, not as proof
that the optimum is "wide open." In that case, expand the admissible sweep
window and re-evaluate before promoting the range.

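The boundary-censoring rule can be sketched as a small helper that widens the sweep window whenever the best band touches an edge. The function name, the `(low, high)` tuple layout, and the fixed expansion `step` are illustrative assumptions, not part of `sweep_policy`.

```python
def expand_if_censored(best_band: tuple[int, int],
                       window: tuple[int, int],
                       hard_bounds: tuple[int, int],
                       step: int) -> tuple[tuple[int, int], bool]:
    """Return (new_window, expanded).

    If the best band touches either edge of the current window, push that
    edge outward by `step`, but never beyond the hard bounds. An unchanged
    window means the band is interior or the hard bound is already reached.
    """
    lo, hi = window
    band_lo, band_hi = best_band
    new_lo = max(hard_bounds[0], lo - step) if band_lo <= lo else lo
    new_hi = min(hard_bounds[1], hi + step) if band_hi >= hi else hi
    new_window = (new_lo, new_hi)
    return new_window, new_window != window
```

When `expanded` is true, the range should be re-swept on the wider window before any promotion decision is made.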
### 11.2 ParamSet Identity

```yaml
param_set:
  id: advsl.hold_substitute.v1
  name: ADVSL Hold Substitute
  status: shadow_first
  namespace_default: blue
  consumer: advanced_sl
  decision_family: exit_risk_timing
  replaces:
    - legacy_advsl_min_hold_bars_20
  related_live_controls:
    - advsl.base_catastrophic_floor_pct
    - advsl.overlay_catastrophic_floor_pct
    - advsl.overlay_max_loss_usd
    - advsl.overlay_pressure_min
    - advsl.overlay_mae_risk_min
```

This spec governs the hold/arming decision only. It may recommend when ADVSL
is allowed to arm, but it must not remove the catastrophic floor.

### 11.3 ParamSet Config and Parameters

Shared ParamSet config:

```yaml
paramset_config:
  consumer: advanced_sl
  decision_family: exit_risk_timing
  placement:
    decision_point: trade_entry
    live_replacement_rhythm: capture_on_entry
    intratrade_change_policy: shadow_only
  outputs:
    hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice
    decision_table: dolphin.vibriss_decisions
    reward_table: dolphin.vibriss_rewards

param_defaults:
  learner:
    type: discounted_ucb
    contextual_shadow_branch: linucb
    nonstationarity: sliding_window
    window_trades: 300
  safety:
    fallback_baseline: 12
    max_exploration_rate: 0.0
    min_shadow_samples: 200
    min_live_confidence: 0.80
  reward_mapping:
    primary_metric: recursive_capital_curve_delta_after_opportunity_cost
    bounded_range: [-1.0, 1.0]
  guardrails:
    stale_obf_policy: ignore_obf_features
    low_maras_confidence_policy: shrink_to_global_prior
    drawdown_alarm_policy: freeze_to_safe_baseline
```

Primary learned parameter:

```yaml
params:
  advsl.min_hold_bars_before_floor_arm:
    type: integer
    units: bars
    baseline_reference: 20
    starting_center: 12
    current_live_overlay_reference: 6
    default: 12
    domain:
      candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]
      hard_min: 0
      hard_max: 48
```

Companion deterministic guardrails:

```yaml
params:
  advsl.max_loss_usd_floor:
    type: float
    units: usd
    default_overlay: 500.0
    research_candidate: 400.0
    learner_controlled: false

  advsl.catastrophic_floor_pct:
    type: float
    units: pct
    default_base: 0.0120
    default_overlay: 0.0050
    learner_controlled: false

  advsl.recovery_extension_max_bars:
    type: integer
    units: bars
    default: 0
    domain:
      candidates: [0, 4, 8, 12, 20, 34]
      hard_min: 0
      hard_max: 40
    learner_controlled: shadow_only_until_validated
    safety:
      min_shadow_samples: 500
      min_live_confidence: 0.90
```

Interpretation:

- `baseline_reference=20` preserves the historical question.
- `starting_center=12` is the current replay-derived center.
- `current_live_overlay_reference=6` records the tightened overlay state and
  must be reported separately from the legacy 20-bar research baseline.
- `34` and `40` remain candidates because contiguous-region medians observed
  during replay included materially longer optima.

### 11.4 Required Sensors

The hold substitute must use point-in-time sensors only. End-of-trade labels may
be used for reward calculation, not for action selection.

Core context sensors:

| Sensor | Source | Use |
|---|---|---|
| `asset` | live trade state | Asset-level prior and OBF join key. |
| `side` | live trade state / EFSM | Separate SHORT base from EFSM-flipped LONG contexts. |
| `bars_held` | live trade state | Determines current arming progress. |
| `entry_price` / `current_price` | live trade state | Signed path and current PnL. |
| `post_gross_path_pct` | trade path replay/live path state | Measures post-entry excursion shape. |
| `mae_pct` | live path state | Adverse excursion severity. |
| `mfe_pct` | live path state | Favorable excursion and recovery potential. |
| `mfe_decay` | derived from MFE/current PnL | Detects giveback and weakening recovery. |
| `current_pnl_mfe_frac` | derived from current PnL / MFE | Indicates whether recovery is intact or mostly lost. |
| `v7_exit_pressure` | `v7_decision_events` / live V7 snapshot | Pressure/continuation signal for recovery-unlikely cases. |
| `v7_mae_risk` | V7 snapshot | Separates ordinary drawdown from risk-tier drawdown. |
| `v7_action` | V7 snapshot | EXIT/RETRACT/EXTEND/HOLD context. |
| `state_confidence` | market-state / MARAS / bundle confidence | Low confidence forces conservative fallback. |

OBF sensors:

| Sensor | Source | Use |
|---|---|---|
| `obf_depth_1pct_usd` | `obf_universe_latest` / OBF CH | Recovery-capacity and liquidity depth. |
| `obf_depth_quality` | OBF derived quality | Distinguishes deep snapback pockets from weak-book grinds. |
| `obf_spread_bps` | OBF | Penalizes bad microstructure. |
| `obf_imbalance` | OBF | Directional liquidity pressure. |
| `obf_imbalance_ma5` / `obf_imbalance_ma10` | OBF derived path | Smooths raw book pressure for in-trade TP/SL context. |
| `obf_imbalance_slope` | OBF derived path | Detects whether pressure is strengthening or fading. |
| `obf_imbalance_persistence` | OBF derived path | Measures sign stability rather than one-tick noise. |
| `obf_imbalance_reaccel` | OBF derived path | Detects renewed pressure after a mid-trade weakening/plateau. |
| `obf_staleness_s` | OBF timestamp | Guardrail; stale OBF cannot steer hold. |

Regime sensors:

| Sensor | Source | Use |
|---|---|---|
| `maras_regime` | `maras_latest` / `maras_fingerprint` | Label-level bias only, never hard filter. |
| `maras_composite_hash` | MARAS Scope B | Exact historical hash prior when sample size is enough. |
| `maras_scalar_hash` | MARAS Scope A | Coarse sortable regime prior. |
| `maras_confidence` | MARAS | Low confidence reduces live trust. |
| `maras_conflict_level` | MARAS | High conflict increases uncertainty/exploration penalty. |
| `s_eigen_vd`, `s_eigen_w50`, `s_eigen_w750` | MARAS raw signature | Eigen-state context. |
| `s_btc_dev_pct`, `raw_btc_ma99` | MARAS BTC tier | Trend/uptrend/downtrend pressure context. |
| `s_acb_boost`, `s_acb_beta` | MARAS/ACB | Protective/risk-on context. |

Outcome-only reward sensors:

| Sensor | Source | Use |
|---|---|---|
| `actual_exit_pnl` | `trade_events` | Realized baseline outcome. |
| `counterfactual_exit_pnl_by_hold` | tape replay | Arm-level reward. |
| `recovery_lag_s` | tape replay | Time to recover after floor/cut. |
| `extra_bars_to_recovery` | tape replay | Cost of too-short hold. |
| `clipped_winner_delta` | tape replay | Opportunity cost of premature exit. |
| `saved_loss_delta` | tape replay | Loss avoided by earlier floor arm. |
| `capital_curve_delta` | recursive replay | Primary reward accounting. |

### 11.5 Feature Construction

VIBRISS should compute a compact feature vector from the sensors:

```text
path_speed = abs(post_gross_path_pct) / max(1, bars_held)
mae_velocity = mae_pct / max(1, bars_since_entry)
mfe_velocity = mfe_pct / max(1, bars_since_entry)
recovery_ratio = current_pnl_mfe_frac
giveback_ratio = 1.0 - current_pnl_mfe_frac
liquidity_score = f(obf_depth_1pct_usd, obf_depth_quality, obf_spread_bps)
signed_obf_imbalance = side_sign * obf_imbalance
imbalance_confirmation = f(signed_obf_imbalance_ma5, persistence, slope)
imbalance_reacceleration = f(prior_weakening, current_signed_slope, persistence)
pressure_score = f(v7_exit_pressure, v7_mae_risk, v7_action)
regime_key = maras_composite_hash if sample_count(hash) >= min_hash_n else maras_regime
confidence_weight = min(state_confidence, maras_confidence) * (1.0 - maras_conflict_level)
```

Feature requirements:

- All features must be point-in-time.
- Missing OBF must not become zero-depth unless zero-depth is the actual
  observation. Missing OBF is its own mask feature.
- MARAS labels are context, not filters. Use hash/sample priors and raw
  signature dimensions where possible.
- Side must be explicit. EFSM-flipped LONG trades cannot share a blind SHORT
  prior.
- OBF imbalance must be side-normalized. For a SHORT, negative raw imbalance is
  confirming; for a LONG, positive raw imbalance is confirming.
- Raw imbalance is not enough. Use moving averages, persistence, slope, and
  re-acceleration after weakening so a single noisy tick cannot steer ADVSL.

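A few of the closed-form features above, together with the side-normalization and missing-OBF requirements, can be sketched as follows. `build_features`, `SIDE_SIGN`, and `obf_missing_mask` are hypothetical names. Mapping SHORT to `-1` is an assumption chosen so that confirming pressure is always positive, and the `0.0` placeholder for missing imbalance is meant to be read only together with the mask.

```python
from typing import Optional

# Assumed convention: after normalization, positive always means "confirms the trade".
SIDE_SIGN = {"LONG": 1.0, "SHORT": -1.0}


def build_features(side: str, bars_held: int, mae_pct: float, mfe_pct: float,
                   obf_imbalance: Optional[float],
                   state_confidence: float, maras_confidence: float,
                   maras_conflict_level: float) -> dict:
    """Compute a subset of the point-in-time features from raw sensors."""
    bars = max(1, bars_held)          # avoid dividing by zero on the entry bar
    obf_missing = obf_imbalance is None
    return {
        "mae_velocity": mae_pct / bars,
        "mfe_velocity": mfe_pct / bars,
        # Missing OBF is its own feature; the placeholder below is not an observation.
        "obf_missing_mask": 1.0 if obf_missing else 0.0,
        "signed_obf_imbalance": 0.0 if obf_missing else SIDE_SIGN[side] * obf_imbalance,
        "confidence_weight": min(state_confidence, maras_confidence)
                             * (1.0 - maras_conflict_level),
    }
```

For a SHORT with raw imbalance `-0.2`, the signed value is `+0.2`, so downstream rules can treat positive as confirming regardless of side.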
### 11.5.1 OBF Imbalance Assistance Research

Live ENJUSDT observation on `2026-06-04` motivates an explicit research feature
family for ADVSL/TP assistance. The trade entered SHORT near `10:06:14 UTC` and
closed `FIXED_TP` near `10:10:11 UTC` for `+$118.53`.

Observed OBF path:

- entry imbalance was near neutral (`~ -0.015` to `+0.001`);
- within seconds it snapped SHORT-confirming (`~ -0.18` to `-0.21`);
- mid-trade it weakened and oscillated around neutral in 30s buckets;
- into TP it re-strengthened materially (`~ -0.30` to `-0.35`).

Conclusion:

- Imbalance did not monotonically increase from entry to exit.
- It behaved as a confirmation/re-acceleration signal: neutral -> confirming
  pressure -> weakening/plateau -> renewed confirming pressure into TP.
- Therefore VIBRISS should not use raw imbalance as a simple exit trigger.

Candidate uses:

| Use | Candidate rule |
|---|---|
| TP assist | If price is near TP and side-normalized imbalance re-accelerates in favor, avoid premature ADVSL/retract exits. |
| SL/ADVSL assist | If adverse PnL appears and side-normalized imbalance persistently contradicts the trade, recovery probability should shrink. |
| Hold assist | If imbalance is neutral/choppy but not contradictory, do not force an exit from imbalance alone. |
| Floor timing | Combine `price_progress_to_tp * imbalance_confirmation` with MAE/MFE path shape to decide whether the floor should wait or arm. |

Candidate feature names:

```text
imbalance_signed_for_trade
imbalance_ma5_signed
imbalance_ma10_signed
imbalance_slope_signed
imbalance_persistence_signed
imbalance_reacceleration_after_weakening
price_progress_to_tp_x_imbalance_confirmation
adverse_pnl_x_imbalance_contradiction
```

Research requirement: replay this across completed trades before live use. Score
it by recursive capital delta after opportunity cost, not by whether it explains
one ENJ winner.

### 11.5.2 Macro-Thesis Persistence vs Local Danger Research

Live XLMUSDT observation on `2026-06-04` motivates a mandatory ADVSL/VIBRISS
research direction. The trade suffered a large adverse excursion before closing
at `FIXED_TP`. Local OBF imbalance and V7 pressure were frightening during the
worst MAE; they did not cleanly foresee the recovery. The higher-level
eigen/MARAS context, however, stayed coherent with the trade thesis: bearish or
choppy-bearish posture, low conflict, active dislocation, and bearish BTC
context.

Actionable lesson to test to exhaustion:

```text
ADVSL/V7 local danger should be overruled only when macro thesis persistence
remains strong, MARAS conflict/novelty remains low, and OBF contradiction is not
persistent/deep enough to invalidate the thesis.
```

This is not a live rule yet. It is a research requirement for the first
VIBRISS-governed ADVSL/bar-hold policy. The learner must explicitly measure
when local pain is a true invalidation signal versus when it is survivable
excursion inside a still-valid macro/eigen thesis.

The required research output is a weighting model, not a binary exception. The
policy must estimate how much authority belongs to local danger signals versus
macro-thesis persistence under the current context. Those weights are themselves
VIBRISS-tunable parameters and must be represented in the ParamSet spec with
safe defaults, bounded candidate ranges, promotion rules, and audit logging.

Candidate feature names:

```text
macro_thesis_persistence
maras_conflict_low_during_mae
maras_hash_knownness_during_mae
eigen_dislocation_persistence_during_mae
btc_context_alignment_during_mae
local_obf_contradiction_persistence
local_obf_contradiction_depth_weighted
v7_pressure_without_macro_invalidation
adverse_move_vs_macro_persistence
late_recovery_obf_reacceleration
```

Candidate tunable parameters:

```text
local_danger_weight
macro_thesis_weight
obf_contradiction_weight
maras_conflict_weight
eigen_persistence_weight
btc_context_weight
v7_pressure_weight
macro_override_min_confidence
local_invalidation_min_persistence_bars
```

The initial decision form should be simple and auditable:

```text
local_danger_score =
    local_danger_weight * v7_pressure
  + obf_contradiction_weight * local_obf_contradiction_persistence
  + maras_conflict_weight * maras_conflict_or_novelty

macro_thesis_score =
    macro_thesis_weight * macro_thesis_persistence
  + eigen_persistence_weight * eigen_dislocation_persistence_during_mae
  + btc_context_weight * btc_context_alignment_during_mae

hold_or_cut_bias = macro_thesis_score - local_danger_score
```

VIBRISS may tune the weights, but guardrails must prevent pathological behavior:
local danger cannot be ignored at extreme MAE, and macro thesis cannot override
persistent high-depth OBF contradiction plus MARAS conflict/novelty.

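The decision form and its first guardrail can be written out as a short sketch. The sign convention (positive bias favors holding), the `extreme_mae_pct` cutoff, and capping the bias at zero are assumptions made for illustration; the spec states the guardrail intent but not its mechanism.

```python
def hold_or_cut_bias(features: dict, weights: dict,
                     mae_pct: float, extreme_mae_pct: float) -> float:
    """Weighted macro-vs-local score; positive is assumed to favor holding."""
    local_danger_score = (
        weights["local_danger_weight"] * features["v7_pressure"]
        + weights["obf_contradiction_weight"] * features["local_obf_contradiction_persistence"]
        + weights["maras_conflict_weight"] * features["maras_conflict_or_novelty"]
    )
    macro_thesis_score = (
        weights["macro_thesis_weight"] * features["macro_thesis_persistence"]
        + weights["eigen_persistence_weight"] * features["eigen_dislocation_persistence_during_mae"]
        + weights["btc_context_weight"] * features["btc_context_alignment_during_mae"]
    )
    bias = macro_thesis_score - local_danger_score
    # Guardrail sketch: at extreme MAE the macro thesis may not produce a hold bias.
    if mae_pct >= extreme_mae_pct:
        bias = min(bias, 0.0)
    return bias
```

Keeping both scores as plain weighted sums means every logged decision can be audited term by term, which is the point of starting with this form.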
Required tests:

- replay all completed trades with this feature family available point-in-time;
- isolate high-MAE trades that later TP'd from high-MAE trades that continued
  into real loss;
- charge every delayed cut for worst-case tail loss and every early cut for
  missed recovery/opportunity cost;
- evaluate separately for base SHORTs and EFSM/overlay-flipped LONGs;
- report per-MARAS-hash, per-label, and nearest-neighbor raw-signature results;
- report learned/suggested weights and their stability by contiguous region,
  MARAS hash, side, and asset-liquidity bucket;
- promote only if held-out contiguous regions improve recursive capital delta
  without hiding clipped winners or worse tail events.

### 11.5.3 Macro/OBF Evidence Hierarchy Research

Live DASHUSDT observations on `2026-06-04` add a third case study to the XLM
and ETC findings. DASH produced two fast SHORT `FIXED_TP` trades, including
`efcc6dce`, which entered near `11:00:15 UTC` and closed near `11:00:38 UTC`
after only `2` bars for `+$367.92`.

The large DASH trade was not a scary hold-through-MAE case:

- V7 recorded `mae = 0` for the trade path;
- entry `vel_div` was extreme (`~ -0.2463`);
- MARAS at entry was `BEARISH`, low conflict, composite hash `58981`;
- BTC context remained bearish (`s_btc_above_ma99 = 0`);
- OBF imbalance initially leaned against the SHORT, then flipped materially
  SHORT-confirming during the price break.

This suggests an evidence hierarchy that must be tested explicitly:

```text
macro/eigen OK + OBF confirms
  > macro/eigen OK + OBF neutral/choppy
  > macro/eigen OK + OBF counters transiently but then flips confirming
  > macro/eigen OK + OBF persistently counters with depth
  > macro/eigen weak/conflicted regardless of OBF
```

The hierarchy is not a live rule. DASH shows that a very strong macro/eigen
impulse can overcome early OBF contradiction when the contradiction is shallow
or transient. ETC shows the stronger case, where OBF remained SHORT-confirming
through adverse price movement. XLM shows the weaker/riskier case, where macro
thesis persistence carried the trade while OBF was ugly at the worst point.

Candidate features:

```text
macro_obf_alignment_class
macro_extreme_impulse_score
obf_counter_transience_bars
obf_counter_depth_weighted
obf_flip_to_confirmation_latency_s
obf_confirmation_after_macro_impulse
macro_ok_obf_confirm_weight
macro_ok_obf_counter_weight
macro_extreme_overrides_obf_counter_weight
```

Required tests:

- rank outcomes by `macro_obf_alignment_class`;
- compare `macro OK + OBF confirm` against `macro OK + OBF counter`;
- split OBF counter cases into transient, shallow, persistent, and
  depth-weighted contradiction;
- measure whether OBF flip-to-confirmation latency predicts TP speed;
- report whether extreme `vel_div` can safely receive more weight than early
  OBF contradiction, and where that becomes unsafe;
- expose the learned hierarchy weights as VIBRISS-tunable parameters, not
  hardcoded doctrine.

### 11.5.4 Falling-Knife / Missing-Bounce-Sensor Case Study

Live LTCUSDT observation on `2026-06-04` (`c0139cea`) adds an open/pending case
study for the opposite side of the DASH impulse capture. The trade entered SHORT
near `11:15:12 UTC` with extreme entry `vel_div` (`~ -0.1942`) and high notional,
but subsequently showed severe adverse excursion and no meaningful favorable
excursion at the time of review. V7 also emitted repeated `RETRACT`
recommendations, but V7 pressure is not treated as truth by itself; XLM showed
that V7 can scream during a trade that later recovers profitably.

Observed at review time:

- `inverse_ars_bounce_shadow` was stale; latest row was
  `2026-06-03 18:42:26 UTC`, so the bounce detector was not assisting live;
- V7 repeatedly emitted `RETRACT / V7_RISK_DOMINANT`, which is local-pain
  evidence only;
- V7 observed `mae ~ 0.854%`, `mfe = 0`, and `exit_pressure = 3`;
- OBF was mostly neutral/choppy with weak, oscillating side-normalized evidence,
  not a strong rescue signal;
- MARAS/BTC remained broadly bearish/low-conflict, but recent eigen values were
  intermittent rather than steadily thesis-confirming.

Research meaning:

```text
macro/eigen entry impulse alone is insufficient when local danger is extreme,
MFE remains zero, OBF does not confirm, and the bounce/inverse-risk sensor is
missing or stale.
```

V7 pressure must be weighted conditionally:

```text
V7 pressure is discounted when macro thesis remains strong, OBF confirms, and
MFE exists.

V7 pressure receives more weight only when independent local invalidation
features agree: zero MFE, rising MAE, neutral/counter OBF, stale/missing bounce
sensor, macro impulse decay, or MARAS conflict/novelty.
```

Candidate features:

```text
bounce_sensor_freshness_s
bounce_sensor_missing_mask
extreme_macro_without_mfe
v7_retract_persistence_bars
zero_mfe_high_mae_flag
obf_neutral_or_counter_during_mae
macro_impulse_decay_after_entry
```

Required replay treatment:

- stale/missing bounce data must be an explicit mask feature, not an assumed
  neutral score;
- compare extreme-entry trades that get early MFE against extreme-entry trades
  with zero MFE and rising MAE;
- treat persistent V7 `RETRACT` as a local-danger amplifier only when confirmed
  by independent invalidation sensors such as stale bounce, zero MFE, rising
  MAE, neutral/counter OBF, or macro impulse decay;
- only promote a macro override if it survives this LTC-style case family after
  opportunity-cost and tail-loss accounting.

### 11.6 Learning / Computing Model

V1 should use a two-layer policy:

1. Prior/posture estimator:
   - computes candidate priors from historical replay by MARAS composite hash,
     MARAS label, asset, side, and contiguous time region.
   - uses shrinkage: hash prior -> label prior -> global prior.
   - initializes the hold target near `12` bars unless the context prior has
     enough evidence to move it.

2. Online contextual bandit:
   - learner: discounted LinUCB or LinTS over finite hold-bar arms.
   - arms: `[4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]`.
   - reward: delayed until trade close or replay terminal.
   - discount/window: sliding 300 closed trades, plus faster decay when drift is
     detected.
   - exploration: shadow-only by default; live exploration cap starts at `0`.

Recommended fallback if contextual coverage is sparse:

```text
if hash_sample_n >= 30:
    prior = median_best_hold_for_hash
elif label_side_sample_n >= 100:
    prior = median_best_hold_for_label_side + label_bias
else:
    prior = 12

advice = guardrail_filter(contextual_bandit(prior, candidates))
```

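The shrinkage fallback above translates almost directly into runnable form. The sketch below covers only the prior selection, plus a hypothetical `snap_to_candidates` step that maps the prior onto the nearest hold-bar arm; the bandit and `guardrail_filter` stages are left out, and the tie-break toward the shorter hold is an assumption.

```python
CANDIDATES = [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]


def hold_prior(hash_sample_n: int, median_best_hold_for_hash: float,
               label_side_sample_n: int, median_best_hold_for_label_side: float,
               label_bias: float = 0.0, global_prior: float = 12.0) -> float:
    """Shrinkage order: hash prior -> label/side prior -> global prior."""
    if hash_sample_n >= 30:
        return median_best_hold_for_hash
    if label_side_sample_n >= 100:
        return median_best_hold_for_label_side + label_bias
    return global_prior


def snap_to_candidates(prior: float, candidates=CANDIDATES) -> int:
    """Nearest allowed arm; ties resolve to the shorter (more protective) hold."""
    return min(candidates, key=lambda c: (abs(c - prior), c))
```

For example, a label/side median of `20` with a `+2` bias yields a prior of `22`, which snaps to the `20`-bar arm under this tie-break.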
Optional recovery model:

- Train a survival model for `extra_bars_to_recovery`.
- Use it only as a veto/adjuster until validated.
- It may increase hold only when recovery probability is high and expected
  extra hold is short.

### 11.7 Success Definition

Primary success metric:

```text
recursive_capital_curve_delta_after_opportunity_cost
```

This means the replay must account for saved capital compounding forward, and
must subtract the opportunity cost of trades that would have recovered or won
after a premature floor/ADVSL action.

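A toy version of the recursive accounting is sketched below. It assumes each trade risks a fixed fraction of current capital and that per-trade returns are already net of fees; both are simplifications, and the function names are hypothetical. Opportunity cost enters through the candidate's own return series: a clipped winner simply shows up as a smaller return on that trade.

```python
from typing import Sequence


def recursive_capital(start_capital: float, trade_returns: Sequence[float],
                      fraction: float = 1.0) -> float:
    """Compound capital trade by trade; each notional scales with current capital."""
    capital = start_capital
    for r in trade_returns:
        capital += capital * fraction * r
    return capital


def capital_curve_delta(start_capital: float,
                        baseline_returns: Sequence[float],
                        candidate_returns: Sequence[float],
                        fraction: float = 1.0) -> float:
    """Final-capital difference between candidate and baseline policy replays."""
    return (recursive_capital(start_capital, candidate_returns, fraction)
            - recursive_capital(start_capital, baseline_returns, fraction))
```

Starting from `1000`, a baseline of `[-0.10, 0.05]` ends near `945`, while a candidate that cuts the first loss to `-0.04` ends near `1008`. The `+63` delta exceeds the `60` saved on the first trade because the saved capital also compounds into the second.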
Secondary metrics:

- net PnL delta
- ROI delta
- max drawdown delta
- tail-loss count and severity
- number of hard/floor cuts
- number of clipped winners
- gross saved loss
- gross missed upside
- average and median recovery lag
- average and median extra bars to recovery
- TP near-miss count, TP near-miss recovery lag, and first-touch TP hit rate
- per-hash and per-label stability
- OOD region performance
- worst contiguous-region degradation
- explicit ceiling-violation count and worst single-loss size under the tested
  policy, because a "best" replay result is not acceptable if it breaches the
  operator's declared loss ceiling

Promotion requires:

- positive recursive capital-curve delta on held-out contiguous regions,
- no unacceptable increase in clipped-winner opportunity cost,
- no hidden dependence on a single asset or single MARAS hash,
- improvement or neutral behavior on EFSM-flipped LONG subset,
- deterministic replay reproducibility,
- shadow logging coverage sufficient for OPE.

### 11.8 Calibration Protocol

Calibration must run in this order:

1. Full-tape replay:
   - evaluate every candidate hold arm on every eligible historical trade path.
   - include all available BLUE/PINK/PRODGREEN executed trade history only when
     namespace semantics are kept separate.

2. Capital-aware replay:
   - recursively recompute capital after each counterfactual exit.
   - preserve position sizing geometry when the saved/lost capital changes the
     subsequent notional.

3. Opportunity-cost audit:
   - for every floor/ADVSL cut, measure whether the trade later recovered.
   - record recovery lag, extra bars, and missed PnL.

4. Region validation:
   - split into contiguous time regions with enough trades.
   - repeat with moving/randomized boundaries.
   - report median/best hold per region.

5. MARAS proximity validation:
   - group by composite hash when the sample size is sufficient.
   - otherwise use nearest-neighbor distance over MARAS raw signature fields.
   - report whether per-hash/per-neighbor priors outperform global 12-bar center.

6. OBF validation:
   - bind optimum hold to `obf_depth_1pct_usd`, `obf_depth_quality`, spread, and
     imbalance.
   - test on OOD time slices; do not promote an OBF rule from in-sample fit only.

7. TP near-miss validation:
   - include trades that nearly touched candidate TP but missed on the observed
     cadence.
   - compute first-touch labels from the highest-resolution available path.
   - isolate the opportunity cost of late reversal after near-touch.
   - compare the resulting TP bucket against the profitable-close-only sample.

8. Walk-forward:
   - train on region N, validate on N+1.
   - repeat across the full history.
   - freeze the learner if the current best policy degrades versus baseline.

### 11.9 Advice Payload

Example advice:

```json
{
  "schema": "vibriss.param_set_advice.v1",
  "namespace": "blue",
  "param_set_id": "advsl.hold_substitute.v1",
  "spec_version": "1.0.0",
  "trade_scope": "on_entry",
  "baseline_reference": 20,
  "current_live_overlay_reference": 6,
  "recommended": {
    "advsl.min_hold_bars_before_floor_arm": 12,
    "advsl.recovery_extension_max_bars": 0
  },
  "candidate_set": [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40],
  "confidence": 0.74,
  "context": {
    "asset": "XLMUSDT",
    "side": "LONG",
    "maras_composite_hash": 57957,
    "maras_regime": "CHOPPY_BEARISH",
    "obf_depth_quality_bucket": "weak",
    "v7_pressure_bucket": "high"
  },
  "guardrail_status": "SHADOW_ONLY",
  "fallback_value": 12,
  "expires_at": "2026-06-03T00:05:00Z"
}
```

### 11.10 Guardrails

Mandatory guardrails:

- Shadow-only until walk-forward validation is positive.
- No live exploration by default.
- Do not allow the learner to disable catastrophic floors.
- If OBF is stale, ignore OBF-derived hold extension.
- If MARAS confidence is low or conflict is high, shrink toward global prior.
- If context is EFSM-flipped LONG and LONG sample count is sparse, use the
  tighter safe prior, not a broad SHORT-derived prior.
- If the recommended hold would increase worst-case open loss beyond the active
  floor/cap, the floor/cap wins.
- If capital drawdown alarm is active, freeze to deterministic safe baseline.

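Three of the guardrails above (drawdown freeze, stale-OBF masking, and MARAS shrinkage) can be sketched as a small filter. This is an illustration under assumptions: the names `HoldContext` and `guard_hold_advice` are hypothetical, the linear blend is one possible reading of "shrink toward global prior", and the 12-bar defaults reuse the fallback value shown in the advice payload. The remaining guardrails are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class HoldContext:
    drawdown_alarm: bool     # capital drawdown alarm is active
    obf_stale: bool          # OBF snapshot is older than allowed
    maras_confidence: float  # expected in [0.0, 1.0]


def guard_hold_advice(
    learned_hold: float,
    obf_extension_bars: float,
    ctx: HoldContext,
    global_prior: float = 12.0,
    safe_baseline: float = 12.0,
) -> float:
    """Apply a subset of the mandatory guardrails to a learned hold value."""
    if ctx.drawdown_alarm:
        # Freeze to the deterministic safe baseline; ignore the learner.
        return safe_baseline
    # A stale OBF snapshot masks any OBF-derived hold extension.
    extension = 0.0 if ctx.obf_stale else obf_extension_bars
    hold = learned_hold + extension
    # Assumed linear shrinkage: full confidence keeps the learned value,
    # zero confidence returns the global prior.
    w = min(max(ctx.maras_confidence, 0.0), 1.0)
    return w * hold + (1.0 - w) * global_prior
```
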
### 11.11 Starting Priors From Current Research

Current replay-derived starting posture:

| Context | Starting prior | Rationale |
|---|---:|---|
| Global ADVSL hold substitute | `12` bars | Best current center for reducing 20-bar tail slips without assuming all contexts need long waits. |
| Legacy baseline comparison | `20` bars | Historical no-arm/min-hold reference. |
| Tight overlay reference | `6` bars | Current live overlay guardrail reference, not the general learned policy. |
| Recovery/snapback pockets | `24` to `40` bars | Some contiguous-region medians were materially longer; keep as candidates, not defaults. |
| Sparse/unknown context | `12` bars | Conservative research center with shrinkage. |
| EFSM-flipped LONG sparse context | `6` to `12` bars | Do not borrow broad SHORT recovery priors blindly. |

Known caution:

- A `$400` hard cap improved one capital-aware slice by about `+$592.83` versus
  the 12-bar-only replay, but generated a gross forgone-upside bucket around
  `+$6,617.30` on hard-cap hits. Therefore max-loss floors must be evaluated
  with opportunity cost and recovery lag, not judged by saved-loss totals alone.

### 11.12 Promotion Policy

Promotion is part of this ParamSet, not a global runner decision.

```yaml
promotion_policy:
  owner: advsl.hold_substitute.v1
  technique: replay_shadow_canary
  baseline_policy:
    legacy_reference: 20
    current_overlay_reference: 6
    fallback_value: 12
  cadence:
    replay_calibration: every_6h_or_50_new_rewards
    promotion_review: every_30m
    checkpoint_review: every_60s
    live_replacement_rhythm: at_trade_entry_only
  evidence_gates:
    shadow_to_advisory:
      min_replay_trades: 300
      min_contiguous_regions: 4
      recursive_capital_curve_delta_after_cost: "> 0"
      worst_region_delta: ">= -0.10 * positive_total_delta"
      clipped_winner_cost_budget: "documented_and_bounded"
    advisory_to_canary_live:
      min_shadow_decisions: 200
      min_closed_trade_rewards: 50
      min_days_observed: 3
      no_unexplained_tail_loss_cluster: true
      manual_approval_required: true
    canary_live_to_controlled_live:
      min_live_consumed_trades: 50
      live_vs_shadow_regret: "<= 0"
      no_guardrail_violation: true
      manual_approval_required: true
  canary_scope:
    namespaces: [blue]
    max_paramsets_live: 1
    max_live_exploration_rate: 0.0
    allow_only_capture_on_entry: true
  automatic_demotion:
    - stale_obf_or_maras_required_context
    - reward_backlog_critical
    - drawdown_alarm
    - candidate_underperforms_baseline_in_shadow
    - checkpoint_hash_mismatch
```

Interpretation:

- `replay_calibration` answers how often the ParamSet re-estimates candidate
  quality from historical/newly closed data.
- `promotion_review` answers how often the ParamSet is checked for stronger
  mode eligibility.
- `live_replacement_rhythm` answers when the engine may replace the old
  parameter with the VIBRISS value. For this ParamSet it is only at trade entry.
- The runner executes this contract. It does not invent promotion thresholds.

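As an illustration of the runner executing the contract rather than inventing thresholds, the `shadow_to_advisory` gate could be evaluated roughly as below. The `ReplayEvidence` record and function name are hypothetical, and `positive_total_delta` is read here as the total delta when it is positive, which is an assumption about the YAML expression. In a real runner the numeric thresholds would be read from the ParamSet spec rather than hardcoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplayEvidence:
    replay_trades: int
    contiguous_regions: int
    total_delta: float              # recursive capital-curve delta after cost
    worst_region_delta: float       # delta of the worst contiguous region
    clipped_winner_cost_documented: bool


def shadow_to_advisory_failures(ev: ReplayEvidence) -> List[str]:
    """Return the names of failed gates; an empty list means eligible."""
    failures: List[str] = []
    if ev.replay_trades < 300:
        failures.append("min_replay_trades")
    if ev.contiguous_regions < 4:
        failures.append("min_contiguous_regions")
    if ev.total_delta <= 0:
        failures.append("recursive_capital_curve_delta_after_cost")
    elif ev.worst_region_delta < -0.10 * ev.total_delta:
        # Worst region may give back at most 10% of the positive total.
        failures.append("worst_region_delta")
    if not ev.clipped_winner_cost_documented:
        failures.append("clipped_winner_cost_budget")
    return failures
```

Returning the failed gate names, rather than a bare boolean, keeps the promotion review auditable.
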
### 11.13 Meta-Cadence Policy

The cadence parameters are themselves governed by this ParamSet. They are not
free-floating daemon settings.

```yaml
meta_cadence_policy:
  owner: advsl.hold_substitute.v1
  status: shadow_first
  learner: discounted_ucb_then_linucb
  tunable_cadences:
    replay_calibration_interval_s:
      baseline: 21600
      candidates: [1800, 3600, 10800, 21600, 43200]
    promotion_review_interval_s:
      baseline: 1800
      candidates: [900, 1800, 3600, 7200]
    checkpoint_interval_s:
      baseline: 60
      candidates: [30, 60, 120, 300]
    min_new_rewards_before_recalibration:
      baseline: 50
      candidates: [10, 25, 50, 100]
    shadow_to_canary_cooldown_trades:
      baseline: 100
      candidates: [25, 50, 100, 200]
  context_inputs:
    maras:
      - maras_composite_hash
      - maras_confidence
      - maras_conflict_level
      - maras_nearest_distance
    exof:
      - exf_latest
      - btc_regime_features
      - market_volatility_context
    esof:
      - session_bucket
      - day_of_week
      - calendar_event_flags
    ops:
      - reward_backlog_age_s
      - ch_write_failure_rate
      - artifact_disk_free_gb
      - drawdown_state
  reward_mapping:
    positive:
      - faster_detection_of_degraded_hold_policy
      - lower_stale_advice_rate
      - lower_missed_adaptation_cost
    negative:
      - promotion_false_positive
      - noisy_recalibration_churn
      - excessive_compute_or_backlog
      - operator_churn
  live_change_policy:
    replay_calibration_interval_s: controlled_after_shadow
    promotion_review_interval_s: advisory_only_until_manual_approval
    checkpoint_interval_s: fixed_by_ops_until_runner_load_tested
    shadow_to_canary_cooldown_trades: advisory_only
```

This makes MARAS, ExoF, and EsoF eligible context for cadence advice. For
example, VIBRISS may learn that high MARAS novelty plus hostile ExoF context
requires faster recalibration review, while ordinary stable regimes can use a
slower cadence to avoid overreacting.

Cadence testing is permitted, but first in shadow:

- log what cadence would have been chosen;
- replay whether that cadence would have detected degradation sooner;
- charge compute/backlog cost;
- charge false-promotion cost;
- compare against fixed-cadence baseline.

Only after the meta-cadence policy beats fixed cadence in walk-forward replay
and shadow operation may it control any real scheduler interval.

### 11.14 Catastrophic Floor Derivation Study

The floor percentage is now a dedicated shadow-only VIBRISS research target.

```yaml
param_set:
  id: advsl.catastrophic_floor_derivation.v1
  name: ADVSL Catastrophic Floor Derivation
  status: shadow_first
  success:
    primary_metric: recursive_capital_curve_delta_after_opportunity_cost
  artifact_kinds: [code, test, spec]
  artifact_refs:
    - prod/vibriss/floor_derivation.py
    - prod/vibriss/test_floor_derivation.py
    - prod/docs/ADVSL_CATASTROPHIC_FLOOR_DERIVATION_STUDY.md
    - prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md
```

Current full-tape replay on the blue trade tape:

- replayable trades: `802`
- actual end capital: `$51,937.21`
- floor-only best aggregate candidate: `1.50%`
- floor-only per-regime averages: still centered at `0.50%`

Interpretation:

- this study does **not** validate `1.20%` as a universal standalone floor;
- it validates the need for a derivation path and the ability to bind the
  floor to code/test/spec evidence;
- `1.20%` remains a coupled-policy prior for the broader ADVSL/TP/hold stack,
  not a floor-only truth.

The floor-only study must remain shadow-only. Live use may only follow a
coupled policy that demonstrates positive recursive capital curve delta on
held-out contiguous regions.

### 11.15 Acceptance Tests

Minimum tests before implementation can be called complete:

- Given a fixed replay window, the same hold recommendation and reward are
  reproduced bit-for-bit or within declared float tolerance.
- Candidate arms outside the hard range are rejected.
- Stale OBF creates a masked feature, not a fake zero-depth observation.
- Low MARAS confidence or high conflict shrinks advice toward the global prior.
- EFSM-flipped LONG contexts do not use unqualified SHORT-only priors.
- Capital-aware replay compounds saved/lost capital forward.
- Opportunity cost is charged when a cut trade later recovers.
- The shadow advice payload contains candidate set, chosen arm, confidence,
  baseline, guardrail result, and reproducibility keys.
- Promotion decisions are rejected when the ParamSet omits `promotion_policy`.
- Meta-cadence advice is logged as a ParamSet decision, not a runner-local
  heuristic.

## 12. VIBRISS Ops / Runner System

### 12.1 Operational Objective

VIBRISS must run as an observable production subsystem, not as an ad hoc
notebook or one-off replay script.

The runner is responsible for:

- loading parameter specs and ParamSet specs,
- ingesting live context from Hazelcast and historical context from ClickHouse,
- publishing shadow/advisory parameter postures,
- scheduling replay/calibration subtasks,
- writing full audit logs,
- exposing health sensors to MHS,
- feeding TUI/observability surfaces,
- checkpointing learner state so recommendations are reproducible after restart.

The runner must reuse the existing infrastructure pattern:

- supervisord is the process authority;
- Hazelcast is the live bus;
- ClickHouse is the audit/event store;
- NATS is the optional event transport for replay, reward, and policy-state
  fanout when decoupled workers or durable queues are useful;
- MHS reads composite health from HZ and reports it in `DOLPHIN_META_HEALTH`;
- TUI observes primarily through HZ listeners and polls CH only for heavier
  historical panels;
- Prefect is optional for scheduled offline jobs, not required for the hot
  VIBRISS daemon.

### 12.2 Process Topology

VIBRISS should be containerized, but still owned by supervisord.
In the current production layout, the host supervisord owns only the
container bootstrap wrapper; the container itself runs its own supervisord
instance, which owns the live runner process. That makes later full-system
containerization easier without changing the runner contract.

If sandboxing is enabled, gVisor is the outer runtime boundary for the
container or worker container. VIBRISS does not instantiate or manage gVisor
from inside the container; the host/container runtime selects that boundary at
launch time. The containerized runner must still reach host Hazelcast and
ClickHouse over the configured backplane. If NATS is enabled, it runs as a
sibling stack service on the host backplane and the container talks to it over
`nats://localhost:4222`.

Recommended process shape:

```text
supervisord
  -> vibriss_runner container
       -> live advice loop
       -> spec loader
       -> health publisher
       -> lightweight replay scheduler
       -> learner checkpoint writer

  -> optional vibriss_worker container(s)
       -> full-tape replay
       -> walk-forward validation
       -> OBF/MARAS proximity calibration
       -> offline policy evaluation
```

The live runner is a long-lived daemon. Heavy replay/calibration jobs are
separate subtasks so the live advice loop cannot be blocked by ML work.

The experiment-side harness that replays trade episodes, sweep ranges, and
walk-forward windows is specified separately in
[`VIBRASS_EXPERIMENT_RUNNER_SPEC.md`](VIBRASS_EXPERIMENT_RUNNER_SPEC.md).

Container runtime:

- Docker or Podman is acceptable.
- Prefer Podman if rootless isolation becomes important.
- Optional sandbox runtime: gVisor may wrap the launched container or worker
  container, but it is selected outside VIBRISS by the host/container runtime.
  VIBRISS must not attempt to manage the sandbox boundary from inside the
  container.
- Do not put Hazelcast in the VIBRISS container.
- Do not restart Hazelcast as part of VIBRISS recovery.
- Mount large replay outputs to `/mnt/dolphin_training/vibriss/`, not the SMB
  repo path.
- Write only small docs/specs to `/mnt/dolphinng5_predict/prod/docs/`.

### 12.3 Supervisor Contract

Recommended supervisord entries:

```ini
[program:vibriss_runner]
command=/usr/bin/podman run --rm --name dolphin-vibriss-runner
  --network host
  -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro
  -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw
  -v /mnt/ng6_data:/mnt/ng6_data:ro
  -e HZ_HOST=localhost:5701
  -e CH_URL=http://localhost:8123/
  -e CH_DB=dolphin
  dolphin-vibriss:latest
  python -m vibriss.runner --mode shadow
directory=/mnt/dolphinng5_predict/prod
autostart=true
autorestart=true
startsecs=10
startretries=5
stopwaitsecs=20
stopasgroup=true
killasgroup=true
stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner.log
stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner-error.log

[program:vibriss_worker]
command=/usr/bin/podman run --rm --name dolphin-vibriss-worker
  --network host
  -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro
  -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw
  -v /mnt/ng6_data:/mnt/ng6_data:ro
  dolphin-vibriss:latest
  python -m vibriss.worker --idle
directory=/mnt/dolphinng5_predict/prod
autostart=false
autorestart=false
startsecs=0
stopwaitsecs=30
stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker.log
stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker-error.log
```

Group placement:

```ini
[group:dolphin_data]
programs=exf_fetcher,acb_processor,obf_universe,meta_health,system_stats,
  esof_advisor,maras_service,vibriss_runner
```

Rationale:

- VIBRISS is data/control-plane infrastructure, not the trader itself.
- The runner can be autostarted because it begins shadow-only.
- Workers remain manual or scheduler-launched because full replay can be heavy.
- MHS must observe VIBRISS health, but must not fight the container runtime
  through systemd.

### 12.4 Container Interface

Required environment variables:

| Env | Meaning |
|---|---|
| `HZ_HOST` | Hazelcast host/port, default `localhost:5701`. |
| `CH_URL` | ClickHouse HTTP URL. |
| `CH_DB` | Namespace DB: `dolphin`, `dolphin_prodgreen`, or PINK-specific DB. |
| `CH_USER` / `CH_PASS` | ClickHouse credentials. |
| `NATS_URL` | Optional NATS server URL, default `nats://localhost:4222`. |
| `VIBRISS_ENABLE_NATS_TRANSPORT` | Enable best-effort NATS publication. |
| `VIBRISS_NATS_SUBJECT_PREFIX` | Subject prefix, default `vibriss`. |
| `VIBRISS_MODE` | `shadow`, `advisory`, `canary`, or `disabled`. |
| `VIBRISS_NAMESPACE` | `blue`, `pink`, `prodgreen`, or `research`. |
| `VIBRISS_SPEC_DIR` | Param spec directory. |
| `VIBRISS_STATE_DIR` | Checkpoint/output directory. |
| `VIBRISS_ENABLE_LIVE_ACTUATION` | Must default to `0`. |
| `VIBRISS_CALIBRATION_INTERVAL_S` | Default replay/calibration scheduler interval. |
| `VIBRISS_PROMOTION_REVIEW_INTERVAL_S` | Default promotion-gate review interval. |
| `VIBRISS_META_CADENCE_MODE` | `fixed`, `shadow`, or `controlled`; defaults to `fixed`. |
| `VIBRISS_MHS_SENSOR_KEY` | Default `vibriss_sensors_blue`. |
| `VIBRISS_HEALTH_INTERVAL_S` | Default `5`. |

Filesystem contract:

| Path | Mode | Use |
|---|---|---|
| `/mnt/dolphinng5_predict` | read-only in container | Code/spec/doc access. |
| `/mnt/dolphin_training/vibriss` | read-write | Learner state, replay artifacts, reports. |
| `/mnt/ng6_data` | read-only | Tape, OBF, scan data. |
| `/tmp` inside container | read-write ephemeral | Small temporary files only. |

### 12.5 Internal Runner Loops

The runner should have separate loops with independent health status:

| Loop | Cadence | Responsibility |
|---|---:|---|
| `spec_loader` | startup + 60s | Load/validate ParamSpec and ParamSetSpec files. |
| `context_ingestor` | 0.5s to 5s | Read HZ live context and keep a point-in-time snapshot. |
| `advice_loop` | on context/trade event | Score candidates and publish shadow/advisory advice. |
| `reward_collector` | 10s to 60s | Join closed trades to advice and write delayed rewards. |
| `checkpoint_loop` | 60s | Persist learner state and model metadata. |
| `calibration_scheduler` | 5m+ | Queue replay/validation subtasks when new data warrants it. |
| `promotion_evaluator` | 15m+ | Evaluate whether a ParamSet may move to a stronger mode. |
| `meta_cadence_evaluator` | 15m+ | Shadow-test cadence settings for calibration/promotion/update loops. |
| `health_publisher` | 5s | Publish MHS-compatible sensor payload. |

The advice loop must never wait on full replay, model training, or ClickHouse
backfill. If ClickHouse is slow, advice may continue from latest checkpoint and
mark reward collection degraded.

### 12.6 Hazelcast Surfaces

Recommended HZ maps/keys:

| Map | Key | Producer | Consumer | Purpose |
|---|---|---|---|---|
| `DOLPHIN_FEATURES` | `vibriss_param_advice` | runner | BLUE/PINK/TUI | Latest general parameter advice. |
| `DOLPHIN_FEATURES` | `vibriss_hold_substitute_advice` | runner | ADVSL/TUI | Latest ADVSL hold-substitute advice. |
| `DOLPHIN_FEATURES` | `vibriss_latest` | runner | TUI/MHS/manual ops | Compact subsystem summary. |
| `DOLPHIN_META_HEALTH` | `vibriss_sensors_blue` | runner | MHS | BLUE VIBRISS sensor payload. |
| `DOLPHIN_META_HEALTH` | `vibriss_sensors_pink` | runner | MHS | PINK VIBRISS sensor payload. |
| `DOLPHIN_HEARTBEAT` | `vibriss_runner_heartbeat` | runner | MHS/TUI | Liveness heartbeat. |
| `DOLPHIN_CONTROL_PLANE` | `vibriss_commands` | ops/TUI | runner | Freeze, unfreeze, replay, reload specs. |

Advice remains separate from commands. An advice key tells the engine what
VIBRISS recommends; a command key tells VIBRISS what operators want it to do.

### 12.7 ClickHouse Tables

VIBRISS needs durable audit tables. Recommended tables:

| Table | Purpose |
|---|---|
| `dolphin.vibriss_decisions` | One row per candidate-scoring decision. |
| `dolphin.vibriss_rewards` | Delayed realized/counterfactual reward rows. |
| `dolphin.vibriss_policy_state` | Checkpoint metadata and active posture versions. |
| `dolphin.vibriss_paramset_status` | Per-ParamSet health/performance summary. |
| `dolphin.vibriss_subtasks` | Replay/calibration/ML subtask lifecycle. |

Minimum `vibriss_decisions` fields:

```sql
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
mode LowCardinality(String),
param_set_id LowCardinality(String),
spec_version String,
decision_id String,
trade_id String,
asset LowCardinality(String),
side LowCardinality(String),
scan_number UInt64,
context_hash String,
maras_composite_hash UInt16,
maras_regime LowCardinality(String),
candidate_set_json String,
chosen_arm String,
baseline_value String,
recommended_value String,
confidence Float32,
propensity Float32,
guardrail_status LowCardinality(String),
fallback_reason String,
model_version String,
payload_json String
```

Minimum `vibriss_rewards` fields:

```sql
ts DateTime64(6, 'UTC'),
decision_id String,
trade_id String,
reward_status LowCardinality(String),
raw_actual_pnl Float64,
raw_counterfactual_pnl Float64,
saved_loss_delta Float64,
clipped_winner_delta Float64,
capital_curve_delta Float64,
drawdown_delta Float64,
recovery_lag_s Float32,
extra_bars_to_recovery Float32,
normalized_reward Float32,
reward_components_json String
```

Subtask rows must include `subtask_id`, `param_set_id`, `kind`, `status`,
`started_at`, `finished_at`, `input_window`, `artifact_path`, `n_trades`,
`primary_metric`, `failure_reason`, and `parent_decision_id` when applicable.

### 12.8 MHS Sensor Contract

VIBRISS should expose an MHS-compatible composite payload, modeled after the
existing optional DITA sensor pattern.

Recommended HZ key:

```text
DOLPHIN_META_HEALTH["vibriss_sensors_blue"]
```

Payload:

```json
{
  "schema": "vibriss.mhs_sensors.v1",
  "namespace": "blue",
  "ts": "2026-06-03T00:00:00Z",
  "rm_meta": 0.93,
  "status": "GREEN",
  "m14_vibriss_runner_liveness": 1.0,
  "m15_vibriss_spec_integrity": 1.0,
  "m16_vibriss_data_freshness": 0.9,
  "m17_vibriss_advice_integrity": 1.0,
  "m18_vibriss_reward_backlog": 0.85,
  "m19_vibriss_paramset_health": 0.95,
  "param_sets": {
    "advsl.hold_substitute.v1": {
      "score": 0.94,
      "status": "GREEN",
      "mode": "shadow",
      "last_advice_age_s": 2.4,
      "last_reward_age_s": 31.0,
      "open_decisions": 1,
      "reward_backlog": 3,
      "shadow_samples": 240,
      "walk_forward_status": "pending",
      "latest_recommended_hold": 12
    }
  },
  "subtasks": {
    "full_tape_replay": {"score": 1.0, "status": "IDLE"},
    "walk_forward": {"score": 0.8, "status": "STALE"},
    "obf_binding": {"score": 1.0, "status": "IDLE"}
  }
}
```

Sensor scoring:

| Sensor | Score rule |
|---|---|
| `m14_vibriss_runner_liveness` | 1 if heartbeat age < 15s, 0.5 if < 60s, else 0. |
| `m15_vibriss_spec_integrity` | Fraction of loaded specs passing validation. |
| `m16_vibriss_data_freshness` | Freshness of HZ context, CH close rows, OBF/MARAS context. |
| `m17_vibriss_advice_integrity` | 1 when latest advice is schema-valid and guardrailed. |
| `m18_vibriss_reward_backlog` | Penalizes unjoined decisions awaiting reward too long. |
| `m19_vibriss_paramset_health` | Mean score of all enabled ParamSets. |

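The two fully specified rules in the table, `m14` and `m19`, can be sketched directly. The function names are hypothetical, and returning `1.0` when no ParamSet is enabled is an assumption chosen so that an empty configuration does not read as a failure.

```python
from typing import Mapping


def runner_liveness_score(heartbeat_age_s: float) -> float:
    """m14: 1 if heartbeat age < 15s, 0.5 if < 60s, else 0."""
    if heartbeat_age_s < 15:
        return 1.0
    if heartbeat_age_s < 60:
        return 0.5
    return 0.0


def paramset_health_score(enabled_scores: Mapping[str, float]) -> float:
    """m19: mean score of all enabled ParamSets.

    Assumption: with no enabled ParamSets the sensor is neutral (1.0).
    """
    if not enabled_scores:
        return 1.0
    return sum(enabled_scores.values()) / len(enabled_scores)
```
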
MHS integration rule:

- VIBRISS starts with weight `0.0` in RM_META until stable.
- Then enable a small optional weight, analogous to DITA sensors.
- Suggested initial weight: `0.02`.
- Maximum allowed weight: `0.10` until the subsystem is live-actuating.
- If VIBRISS is disabled, MHS score must be neutral and must not degrade BLUE.

Suggested MHS env shape:

```text
DOLPHIN_MHS_USE_VIBRISS_SENSORS=1
DOLPHIN_MHS_VIBRISS_SENSOR_WEIGHT=0.02
DOLPHIN_VIBRISS_SENSOR_KEY=vibriss_sensors_blue
DOLPHIN_MHS_VIBRISS_SENSOR_MAPS=DOLPHIN_META_HEALTH,DOLPHIN_FEATURES
```

### 12.9 Observability / TUI Integration

TUI integration should follow the existing v9 pattern:

- use HZ listeners for latest VIBRISS state;
- add CH polling only for historical/replay-heavy summaries;
- never poll origin subsystems directly from the TUI.

Recommended panels:

| Panel | Source | Cadence | Content |
|---|---|---:|---|
| `VIBRISS` main panel | `DOLPHIN_FEATURES/vibriss_latest` | HZ listener | mode, status, latest ParamSet advice, confidence, MHS score. |
| `VIBRISS Hold` footer | `vibriss_hold_substitute_advice` + CH rewards | HZ + 60s CH | recommended hold, baseline, prior, reward backlog, recent net delta. |
| `VIBRISS Tasks` footer | `vibriss_subtasks` | 60s CH | replay/walk-forward/OBF binding status. |
| `MHS` existing panel | `DOLPHIN_META_HEALTH/latest` | HZ listener | include VIBRISS sensor details if enabled. |

Display fields for `advsl.hold_substitute.v1`:

```text
VIBRISS HOLD mode=shadow rec=12b base=20b live_ref=6b
conf=74% guard=PASS hash=57957 obf=weak pressure=high
reward_backlog=3 wf=pending samples=240
```

The TUI must clearly distinguish:

- baseline reference,
- current live reference,
- VIBRISS recommendation,
- whether recommendation is shadow-only or live-consumed.

Implementation note:

- `prod/vibriss/vibriss_tui.py` now provides the Textual dashboard, and
  `python -m vibriss.vibriss_runner tui` launches it in read-only shadow mode.
- The UI is panel-registry based so additional metrics can be added without
  rewriting the dashboard shell.

### 12.10 Control Commands

Commands should be written to `DOLPHIN_CONTROL_PLANE["vibriss_commands"]`.

Allowed commands:

| Command | Effect |
|---|---|
| `RELOAD_SPECS` | Reload ParamSpec/ParamSetSpec files and validate. |
| `FREEZE_PARAMSET` | Stop updating and publish fallback for one ParamSet. |
| `UNFREEZE_PARAMSET` | Resume shadow/advisory scoring. |
| `RUN_REPLAY` | Queue replay subtask for a parameter set/window. |
| `RUN_WALK_FORWARD` | Queue walk-forward validation. |
| `SET_MODE` | Move `disabled -> shadow -> advisory`; live/canary requires explicit code/config gate. |
| `CHECKPOINT_NOW` | Persist learner state immediately. |

Commands must be acknowledged to:

```text
DOLPHIN_CONTROL_PLANE["vibriss_command_ack"]
```

Ack payloads must include command id, acceptance/rejection, reason, and current
mode. Queue consumption alone is not success.

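One possible shape for an ack payload is sketched below. The JSON field names (`command_id`, `accepted`, `reason`, `current_mode`), the schema string, and the function name are assumptions for illustration; the spec fixes only which pieces of information must be present. The sketch builds the payload and does not write it to Hazelcast.

```python
import json
from typing import Optional

ALLOWED_COMMANDS = {
    "RELOAD_SPECS", "FREEZE_PARAMSET", "UNFREEZE_PARAMSET", "RUN_REPLAY",
    "RUN_WALK_FORWARD", "SET_MODE", "CHECKPOINT_NOW",
}


def build_command_ack(
    command_id: str,
    command: str,
    current_mode: str,
    rejection_reason: Optional[str] = None,
) -> str:
    """Serialize an ack carrying the four required pieces of information."""
    if command not in ALLOWED_COMMANDS:
        accepted, reason = False, "unknown_command"
    elif rejection_reason is not None:
        accepted, reason = False, rejection_reason
    else:
        accepted, reason = True, "ok"
    return json.dumps(
        {
            "schema": "vibriss.command_ack.v1",  # hypothetical schema name
            "command_id": command_id,
            "command": command,
            "accepted": accepted,
            "reason": reason,
            "current_mode": current_mode,
        },
        sort_keys=True,
    )
```

An explicit rejection with a reason lets operators tell a refused command apart from one that was never consumed.
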
### 12.11 Prefect Role

Prefect is optional for VIBRISS. It should not be required for live advice.

Acceptable Prefect use:

- daily full-tape replay,
- scheduled walk-forward validation,
- artifact publication,
- long offline calibration runs.

Not acceptable:

- live advice loop,
- hot-path reward joining,
- health publication,
- operator freeze/unfreeze commands.

If Prefect is unavailable, the VIBRISS runner should continue shadow/advisory
operation from the last checkpoint and mark scheduled calibration stale.

### 12.12 Failure Modes and Fallback

| Failure | Required behavior |
|---|---|
| HZ unavailable | Runner logs degraded, cannot publish advice, MHS score <= 0.5. |
| CH unavailable | Advice may continue from checkpoint; reward collector degrades. |
| OBF stale | Mask OBF features; do not use OBF hold extension. |
| MARAS stale | Shrink to global/label-free prior. |
| Spec validation failure | Disable affected ParamSet, publish fallback. |
| Learner checkpoint corrupt | Revert to last good checkpoint or baseline prior. |
| Replay worker OOM/fails | Mark subtask failed; live runner continues. |
| Advice schema invalid | Do not publish; MHS advice integrity drops. |
| Drawdown alarm | Freeze to deterministic safe baseline. |

### 12.13 Promotion Gates

Before any engine consumes VIBRISS hold advice live:

1. Runner has been stable for at least 7 calendar days.
2. MHS VIBRISS sensors are GREEN or neutral for 95% of runner uptime.
3. `advsl.hold_substitute.v1` has completed full-tape replay.
4. Walk-forward is positive versus baseline on capital-curve delta after
   opportunity cost.
5. OOD region performance has no catastrophic degradation.
6. TUI displays baseline/current/recommended state correctly.
7. Command ack path is verified.
8. Safe fallback is tested by intentionally freezing the ParamSet.
9. Engine consumption is limited to one ParamSet and one namespace.
10. `VIBRISS_ENABLE_LIVE_ACTUATION=1` is explicitly set and reviewed.

## 13. V1 Rollout Plan
|
||
|
||
1. Offline replay only:
|
||
- replay historical decisions from ClickHouse and tape.
|
||
- benchmark against baseline constants.
|
||
- compute OPE where logged propensities exist.
|
||
- report by asset, side, MARAS hash, regime label, V7 reason, OBF bucket,
|
||
and contiguous time region.
|
||
|
||
2. Shadow mode:
|
||
- publish advice to HZ.
|
||
- do not allow engine consumption.
|
||
- write `vibriss_decisions`, `vibriss_rewards`, and `vibriss_policy_state`.
|
||
|
||
3. Guarded advisory:
|
||
- engine reads advice and surfaces what it would have used.
|
||
- still no actuation.
|
||
|
||
4. Canary live:
|
||
- one parameter only.
|
||
- no simultaneous bundle changes.
|
||
- low exploration cap.
|
||
- hard fallback on stale data, drawdown alarm, or drift alarm.
|
||
|
||
5. Controlled live comparison:
|
||
- compare baseline-vs-advised on matched contexts.
|
||
- freeze policy if replay quality deteriorates.

## 14. Safety Rules

Mandatory:

- no direct mutation of `blue.yml` or frozen champion config from VIBRISS.
- no live promotion without replay, shadow, and documented approval.
- no advice consumption when data is stale.
- no advice consumption inside disallowed live-change windows.
- no multi-parameter bundle learning until single-parameter learners prove that
  independent adaptation is insufficient.
- every live-consumed recommendation must be reconstructable from logs.
- every safety-critical parameter must preserve a catastrophic fallback floor.

## 15. Concrete Storage and Schema

VIBRISS must be event-sourced. Current policy state is a cache; decisions and
rewards are the durable truth.

### 15.1 ClickHouse DDL

Recommended DDL:

```sql
CREATE TABLE IF NOT EXISTS dolphin.vibriss_decisions
(
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    mode LowCardinality(String),
    param_set_id LowCardinality(String),
    spec_version String,
    decision_id String,
    parent_decision_id String,
    trade_id String,
    asset LowCardinality(String),
    side LowCardinality(String),
    scan_number UInt64,
    bars_held UInt32,
    context_hash String,
    context_schema String,
    maras_composite_hash UInt32,
    maras_scalar_hash UInt32,
    maras_regime LowCardinality(String),
    maras_confidence Float32,
    maras_conflict Float32,
    obf_stale UInt8,
    obf_depth_1pct_usd Float64,
    obf_depth_quality Float32,
    v7_pressure Float32,
    v7_mae_risk Float32,
    candidate_set_json String,
    chosen_arm String,
    baseline_value String,
    recommended_value String,
    confidence Float32,
    propensity Float32,
    guardrail_status LowCardinality(String),
    fallback_reason String,
    model_version String,
    policy_version String,
    compiled_config_hash String,
    consumed UInt8,
    consumed_ts Nullable(DateTime64(6, 'UTC')),
    payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, decision_id)
TTL ts + INTERVAL 180 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_rewards
(
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    decision_id String,
    trade_id String,
    reward_status LowCardinality(String),
    reward_delay_s Float32,
    actual_exit_reason LowCardinality(String),
    counterfactual_exit_reason LowCardinality(String),
    actual_exit_pnl Float64,
    counterfactual_exit_pnl Float64,
    saved_loss_delta Float64,
    clipped_winner_delta Float64,
    capital_curve_delta Float64,
    drawdown_delta Float64,
    recovery_lag_s Float32,
    extra_bars_to_recovery Float32,
    normalized_reward Float32,
    opportunity_cost_charged UInt8,
    replay_artifact_path String,
    reward_components_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, decision_id)
TTL ts + INTERVAL 365 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_policy_state
(
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    policy_version String,
    mode LowCardinality(String),
    learner LowCardinality(String),
    checkpoint_path String,
    checkpoint_hash String,
    spec_hash String,
    compiled_config_hash String,
    n_decisions UInt64,
    n_rewards UInt64,
    shadow_samples UInt64,
    walk_forward_status LowCardinality(String),
    active_baseline_value String,
    active_recommended_value String,
    confidence Float32,
    state_json String
)
ENGINE = ReplacingMergeTree(ts)
ORDER BY (namespace, param_set_id, policy_version);

CREATE TABLE IF NOT EXISTS dolphin.vibriss_subtasks
(
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    subtask_id String,
    param_set_id LowCardinality(String),
    kind LowCardinality(String),
    status LowCardinality(String),
    started_at DateTime64(6, 'UTC'),
    finished_at Nullable(DateTime64(6, 'UTC')),
    input_window String,
    n_trades UInt64,
    n_decisions UInt64,
    primary_metric Float64,
    baseline_metric Float64,
    artifact_path String,
    artifact_hash String,
    failure_reason String,
    payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(started_at)
ORDER BY (namespace, param_set_id, started_at, subtask_id)
TTL started_at + INTERVAL 365 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_promotions
(
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    promotion_id String,
    from_mode LowCardinality(String),
    to_mode LowCardinality(String),
    requested_by LowCardinality(String),
    approved_by LowCardinality(String),
    policy_version String,
    checkpoint_hash String,
    evidence_window String,
    n_decisions UInt64,
    n_rewards UInt64,
    n_shadow_samples UInt64,
    n_live_samples UInt64,
    recursive_capital_delta Float64,
    opportunity_cost_delta Float64,
    max_drawdown_delta Float64,
    worst_region_delta Float64,
    baseline_metric Float64,
    candidate_metric Float64,
    guardrail_status LowCardinality(String),
    decision LowCardinality(String),
    reason String,
    artifact_path String,
    payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, promotion_id)
TTL ts + INTERVAL 730 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_meta_cadence_decisions
(
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    cadence_id LowCardinality(String),
    decision_id String,
    mode LowCardinality(String),
    context_hash String,
    maras_composite_hash UInt32,
    maras_regime LowCardinality(String),
    exof_state String,
    esof_state String,
    candidate_set_json String,
    chosen_value String,
    baseline_value String,
    confidence Float32,
    reward_status LowCardinality(String),
    reward_value Float32,
    guardrail_status LowCardinality(String),
    fallback_reason String,
    policy_version String,
    payload_json String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, cadence_id, ts, decision_id)
TTL ts + INTERVAL 365 DAY;
```

These tables are deliberately narrow enough for hot audit reads and broad enough
to replay the decision. Large path arrays, per-bar simulations, and model
artifacts must be written to artifact storage, not inlined into ClickHouse.

### 15.2 Artifact Layout

Use a non-SMB path for generated artifacts:

```text
/mnt/dolphin_training/vibriss/
  specs/
    advsl.hold_substitute.v1.yaml
  checkpoints/
    blue/advsl.hold_substitute.v1/<policy_version>/
      state.json
      learner.pkl
      manifest.json
  replays/
    <YYYY-MM-DD>/<subtask_id>/
      config.yaml
      replay_summary.json
      capital_curve.csv
      per_trade_counterfactuals.parquet
      opportunity_cost_audit.parquet
  reports/
    walk_forward/
    obf_binding/
    maras_hash_priors/
```

Every artifact directory must contain a `manifest.json`:

```json
{
  "schema": "vibriss.artifact_manifest.v1",
  "subtask_id": "wf-20260603-001",
  "param_set_id": "advsl.hold_substitute.v1",
  "namespace": "blue",
  "created_at": "2026-06-03T00:00:00Z",
  "git_sha": "unknown-or-sha",
  "spec_hash": "sha256:...",
  "input_tables": {
    "trade_events": {"min_ts": "...", "max_ts": "...", "row_count": 1234},
    "v7_decision_events": {"min_ts": "...", "max_ts": "...", "row_count": 9999}
  },
  "tape_sources": ["/mnt/ng6_data/arrow_scans/..."],
  "random_seed": 0,
  "artifact_hashes": {
    "replay_summary.json": "sha256:...",
    "per_trade_counterfactuals.parquet": "sha256:..."
  }
}
```
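
A manifest of this shape could be produced by hashing every file in the artifact directory. The sketch below is a minimal illustration using only the standard library; `sha256_file` and `write_manifest` are hypothetical helper names, and `input_tables` and `tape_sources` are passed through by the caller because only the replay job knows them.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large Parquet artifacts are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def write_manifest(artifact_dir, *, subtask_id, param_set_id, namespace,
                   spec_hash, input_tables=None, tape_sources=None,
                   git_sha="unknown", random_seed=0):
    """Hash every regular file in artifact_dir and write manifest.json beside them."""
    artifact_dir = Path(artifact_dir)
    hashes = {
        p.name: sha256_file(p)
        for p in sorted(artifact_dir.iterdir())
        if p.is_file() and p.name != "manifest.json"  # never hash the manifest itself
    }
    manifest = {
        "schema": "vibriss.artifact_manifest.v1",
        "subtask_id": subtask_id,
        "param_set_id": param_set_id,
        "namespace": namespace,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "git_sha": git_sha,
        "spec_hash": spec_hash,
        "input_tables": input_tables or {},
        "tape_sources": tape_sources or [],
        "random_seed": random_seed,
        "artifact_hashes": hashes,
    }
    (artifact_dir / "manifest.json").write_text(
        json.dumps(manifest, indent=2, sort_keys=True)
    )
    return manifest
```

Writing the manifest last, after all other files are closed, means a directory without a `manifest.json` can be treated as an incomplete subtask.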

## 16. Replay, OPE, and Causality Rules

VIBRISS must be explicit about what kind of evidence it has.

Evidence classes:

| Class | Meaning | Allowed use |
|---|---|---|
| `realized_live` | Parameter was actually used live. | Highest-quality reward. |
| `shadow_counterfactual` | Advice logged, baseline used, tape can replay alternative. | OPE/research only unless validated. |
| `historical_replay` | Offline replay over historical trades with no logged propensity. | Calibration prior, not proof. |
| `synthetic_mc` | Monte Carlo augmentation from validated distribution. | Stress coverage only. |
| `expert_baseline` | Human/research default such as 12 bars. | Fallback/prior. |

Counterfactual replay must store:

- actual entry, actual exit, and actual capital before/after;
- counterfactual exit scan/bar and price;
- whether the counterfactual exit depends on sub-bar, bar-close, or tape-close
  cadence;
- whether the trade later recovered;
- how many bars/seconds were needed for recovery;
- opportunity cost charged;
- recursive capital state after applying the counterfactual.
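
The last item matters because a changed exit alters the capital available to every later trade. The sketch below illustrates the idea under a loud assumption: each trade deploys a fixed fraction of current capital (`capital_fraction`), which is a stand-in and not the engine's actual sizing geometry. Both function names are hypothetical.

```python
def replay_capital(start_capital, trade_returns, capital_fraction=1.0):
    """Compound per-trade returns into a capital curve.

    trade_returns are fractional returns on deployed notional (0.02 means +2%).
    capital_fraction is an assumed fixed sizing rule, not the engine's sizing.
    """
    curve = [float(start_capital)]
    for r in trade_returns:
        deployed = curve[-1] * capital_fraction
        curve.append(curve[-1] + deployed * r)
    return curve


def recursive_delta(start_capital, baseline_returns, policy_returns,
                    capital_fraction=1.0):
    """End-capital difference when the policy's exits replace the baseline's."""
    base = replay_capital(start_capital, baseline_returns, capital_fraction)
    policy = replay_capital(start_capital, policy_returns, capital_fraction)
    return policy[-1] - base[-1]
```

With a start capital of 1000, baseline returns `[-0.10, 0.05]`, and a policy that cuts the first loss to `-0.04`, the baseline path is 1000, 900, 945 and the policy path is 1000, 960, 1008. The recursive delta is 63, while a fixed-notional per-trade sum would report only 60; the extra 3 is the second trade earning on a larger capital base.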

OPE rules:

- Use inverse propensity or doubly robust estimators only when propensities were
  actually logged.
- Do not pretend historical replay has logged propensities.
- For shadow decisions without randomized action, report them as model
  counterfactuals, not causal estimates.
- Region splits must be contiguous first; randomized splits are secondary
  robustness checks only.
- A policy that wins by one tail event and loses broadly must be flagged as
  fragile even when net capital delta is positive.
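
For reference, the two estimators named in the first rule can be sketched as follows. This is a textbook-style illustration, not the VIBRISS worker implementation: `LoggedDecision`, `target_prob`, and `reward_model` are hypothetical names, and both estimators refuse to run when a propensity is missing, in line with the rule that historical replay must not be treated as if it had logged propensities.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Optional, Sequence


@dataclass(frozen=True)
class LoggedDecision:
    context: Hashable
    arm: str
    reward: float
    propensity: Optional[float]  # None when the logging policy recorded nothing


def _require_propensity(d):
    if d.propensity is None or d.propensity <= 0.0:
        raise ValueError("IPS/DR requires a logged, positive propensity")
    return d.propensity


def ips_estimate(logs: Sequence[LoggedDecision],
                 target_prob: Callable[[Hashable, str], float]) -> float:
    """Inverse propensity scoring: reweight logged rewards by target/logging odds."""
    total = 0.0
    for d in logs:
        weight = target_prob(d.context, d.arm) / _require_propensity(d)
        total += weight * d.reward
    return total / len(logs)


def dr_estimate(logs: Sequence[LoggedDecision],
                target_prob: Callable[[Hashable, str], float],
                reward_model: Callable[[Hashable, str], float],
                arms: Sequence[str]) -> float:
    """Doubly robust: model-based value plus an IPS correction on the residual."""
    total = 0.0
    for d in logs:
        direct = sum(target_prob(d.context, a) * reward_model(d.context, a)
                     for a in arms)
        weight = target_prob(d.context, d.arm) / _require_propensity(d)
        total += direct + weight * (d.reward - reward_model(d.context, d.arm))
    return total / len(logs)
```

The doubly robust form stays unbiased if either the logged propensities or the reward model is correct, which is why it is only meaningful once real propensities exist.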

Minimum replay report:

```text
baseline_end_capital
policy_end_capital
recursive_delta
gross_saved_loss
gross_opportunity_cost
net_trade_pnl_delta
max_drawdown_delta
tail_loss_count_delta
clipped_winner_count
recovered_cut_count
median_recovery_lag_s
worst_region_delta
best_region_delta
per_asset_concentration
per_hash_concentration
```

## 17. Mode State Machine

VIBRISS modes are explicit and monotonic unless an operator command or guardrail
forces demotion.

```text
disabled
  -> shadow
  -> advisory
  -> canary_live
  -> controlled_live
```

Mode meanings:

| Mode | Publishes advice | Engine may read | Engine may act | Learner updates |
|---|---:|---:|---:|---:|
| `disabled` | no | no | no | no |
| `shadow` | yes | no | no | yes |
| `advisory` | yes | yes, display only | no | yes |
| `canary_live` | yes | yes | yes, one ParamSet/namespace | yes |
| `controlled_live` | yes | yes | yes, bounded | yes |

Automatic demotions:

- stale required sensor -> `shadow` or fallback advice;
- invalid spec -> affected ParamSet disabled;
- reward backlog beyond threshold -> freeze learner updates;
- drawdown alarm -> deterministic safe baseline;
- ClickHouse unavailable -> keep publishing only if checkpoint is fresh; mark
  reward collection degraded;
- Hazelcast unavailable -> no advice publication;
- policy drift alarm -> freeze to last known-good checkpoint.

Promotion technique, thresholds, cadence, and evidence gates must be declared
inside the affected ParamSet spec. The runner evaluates and records those gates;
it is not allowed to invent a promotion policy from global defaults.

Promotion must be manual and auditable for any transition that enables live
actuation. No health recovery path may silently promote VIBRISS into a stronger
actuation mode.
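
The asymmetry between promotion and demotion can be sketched as a small state machine. The names `Mode`, `PERMISSIONS`, `promote`, and `demote` are illustrative, and the sketch assumes promotion moves exactly one step at a time and that any step into an acting mode needs a named approver; the spec itself leaves the approval mechanics to the ParamSet.

```python
from enum import IntEnum


class Mode(IntEnum):
    DISABLED = 0
    SHADOW = 1
    ADVISORY = 2
    CANARY_LIVE = 3
    CONTROLLED_LIVE = 4


# (publishes advice, engine may read, engine may act, learner updates)
PERMISSIONS = {
    Mode.DISABLED: (False, False, False, False),
    Mode.SHADOW: (True, False, False, True),
    Mode.ADVISORY: (True, True, False, True),
    Mode.CANARY_LIVE: (True, True, True, True),
    Mode.CONTROLLED_LIVE: (True, True, True, True),
}


def promote(current, target, approved_by=None):
    """Move exactly one step up; acting modes require a named approver."""
    if target != current + 1:
        raise ValueError("promotion must advance exactly one mode")
    engine_may_act = PERMISSIONS[target][2]
    if engine_may_act and not approved_by:
        raise PermissionError("live actuation requires manual approval")
    return target


def demote(current, ceiling):
    """Guardrail demotion: cap the mode at `ceiling`, never raise it."""
    return min(current, ceiling)
```

Because `demote` takes the minimum, a recovering health signal that calls it with a higher mode leaves the current mode unchanged, so recovery alone cannot promote.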

### 17.1 ParamSet-Owned Promotion Lifecycle

Every ParamSet must answer these questions before it can leave `shadow`:

| Question | Required ParamSet field |
|---|---|
| What baseline is being challenged? | `promotion_policy.baseline_policy` |
| What evidence class is allowed? | `promotion_policy.technique` and `evidence_gates` |
| How often is the evidence recomputed? | `promotion_policy.cadence.replay_calibration` |
| How often is promotion eligibility reviewed? | `promotion_policy.cadence.promotion_review` |
| When may the engine replace the old value? | `promotion_policy.cadence.live_replacement_rhythm` |
| What samples are required? | `promotion_policy.evidence_gates.*min*` |
| What demotes it? | `promotion_policy.automatic_demotion` |
| Who approves live use? | `promotion_policy.*manual_approval_required` |

Promotion is also subject to the control-plane elegance constraints in §4.1:
one writer per parameter, spec-owned promotion, slow-governed meta-cadence,
context inputs instead of arbitrary controllers, reproducible live changes, no
hidden cross-subsystem mutation, and shadow/replay/canary before live.

Default lifecycle:

```text
historical_replay
  -> walk_forward_replay
  -> shadow_advice_logging
  -> advisory_display
  -> canary_live_capture
  -> controlled_live
```

The cadence of each phase is also ParamSet-owned:

- `advice cadence`: how often the ParamSet emits advice.
- `reward cadence`: how often delayed rewards are joined and scored.
- `calibration cadence`: how often the learner updates from replay/rewards.
- `promotion-review cadence`: how often mode eligibility is evaluated.
- `replacement rhythm`: the exact engine decision point where a live parameter
  can replace the baseline.

For safety-critical exit parameters, replacement rhythm should usually be
`capture_on_entry` or `between_trades`, not arbitrary intratrade mutation.

### 17.2 Meta-Cadences as Governed Parameters

Meta-cadences are tunable parameters. If VIBRISS changes them, they must be
declared in the ParamSet under `meta_cadence_policy`.

Examples:

| Meta-cadence | Meaning |
|---|---|
| `replay_calibration_interval_s` | How often to re-run replay/calibration. |
| `promotion_review_interval_s` | How often to evaluate mode promotion/demotion. |
| `checkpoint_interval_s` | How often to persist learner state. |
| `min_new_rewards_before_recalibration` | Event-driven cadence threshold. |
| `shadow_to_canary_cooldown_trades` | Minimum stable evidence before live canary. |

MARAS, ExoF, EsoF, OBF, V7, MHS, and drawdown state may be context inputs for
meta-cadence advice, but the cadence learner is subject to the same evidence
rules as any other parameter learner. In particular:

- fixed cadence is the baseline;
- shadow cadence decisions must be logged with candidate set and confidence;
- replay must estimate missed-adaptation cost and false-promotion cost;
- compute/backlog cost is part of reward;
- live control of promotion cadence requires explicit manual approval.

## 18. Engine Consumption Contract

The engine must treat VIBRISS advice as optional, expiring input.

Consumption algorithm:

```text
read advice payload
validate schema and spec_version
check namespace matches runtime
check mode permits consumption
check expires_at > now
check trade_scope is current decision point
check recommendation within hard range
check guardrail_status == PASS or permitted advisory state
check fallback/catastrophic floor remains active
capture value into trade-local immutable parameter snapshot
emit consumption audit
```

For `advsl.hold_substitute.v1`, the first live contract should be:

- consume only on entry;
- store the selected hold bars in the pending/open trade state;
- do not mutate it intratrade;
- allow intratrade VIBRISS values only as shadow comparisons;
- let catastrophic floor and max-dollar floor override hold advice.

This avoids a subtle failure mode where a learner changes the hold target after
seeing adverse movement that was not available at entry. Intratrade contraction
can be researched later, but it is a different ParamSet.
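
The consumption algorithm above maps naturally onto a single validation function that either returns an immutable snapshot or a rejection reason. The sketch below is a hedged illustration: the payload key names, the `CONSUMABLE_MODES` set, and the function signature are assumptions, `expires_at` is assumed to be an already-parsed timezone-aware `datetime`, and the "permitted advisory state" branch and the audit write are left to the caller.

```python
from datetime import datetime, timedelta, timezone
from types import MappingProxyType

# Assumed payload keys; the real advice schema is defined elsewhere in the spec.
REQUIRED_KEYS = ("spec_version", "namespace", "mode", "expires_at",
                 "trade_scope", "recommended_value", "guardrail_status")
CONSUMABLE_MODES = {"canary_live", "controlled_live"}


def consume_advice(advice, *, runtime_namespace, expected_spec_version, now,
                   decision_point, hard_range, catastrophic_floor_active):
    """Return (snapshot, reject_reason); exactly one of the two is None."""
    missing = [k for k in REQUIRED_KEYS if k not in advice]
    if missing:
        return None, "schema_invalid:" + ",".join(missing)
    if advice["spec_version"] != expected_spec_version:
        return None, "spec_version_mismatch"
    if advice["namespace"] != runtime_namespace:
        return None, "namespace_mismatch"
    if advice["mode"] not in CONSUMABLE_MODES:
        return None, "mode_forbids_consumption"
    if advice["expires_at"] <= now:
        return None, "expired"
    if advice["trade_scope"] != decision_point:
        return None, "wrong_trade_scope"
    low, high = hard_range
    if not low <= advice["recommended_value"] <= high:
        return None, "outside_hard_range"
    if advice["guardrail_status"] != "PASS":
        return None, "guardrail_not_pass"
    if not catastrophic_floor_active:
        return None, "catastrophic_floor_inactive"
    # Read-only view: the trade keeps this value and cannot mutate it intratrade.
    snapshot = MappingProxyType({
        "value": advice["recommended_value"],
        "spec_version": advice["spec_version"],
        "captured_at": now,
    })
    return snapshot, None
```

Every rejection path returns a distinct reason string, so the caller can fall back to the baseline value and log why the advice was not used.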

## 19. Drift, Novelty, and Freezing

VIBRISS must separate three conditions:

1. data-quality degradation,
2. market/regime novelty,
3. policy underperformance.

Drift sensors:

| Sensor | Trigger |
|---|---|
| context distribution drift | MARAS/OBF/V7 feature distribution shifts versus training window. |
| reward drift | rolling reward lower than baseline beyond confidence bound. |
| regret drift | chosen arm underperforms baseline arm in shadow replay. |
| tail cluster | tail-loss or floor-hit count above historical percentile. |
| sparse regime | nearest-neighbor distance to known MARAS/OBF contexts too high. |

Actions:

- distribution drift alone: shrink toward baseline and raise uncertainty;
- reward drift: freeze learner updates and publish fallback;
- tail cluster: tighten safety floors only if pre-authorized by the ParamSet;
- sparse regime: use global safe prior, not nearest hash overfit;
- data-quality drift: stop consuming affected sensors.

VIBRISS should publish drift state in `vibriss_latest` and
`vibriss_paramset_status`.

## 20. Data Volume and Backpressure

The ClickHouse outage and spool backlog failure mode matters for VIBRISS.

Rules:

- VIBRISS must have its own spool and backlog metric.
- Advice publication must not block on ClickHouse.
- Reward collection may lag, but the lag must be visible in MHS.
- Large per-bar OBF or path arrays must not be written to hot audit tables.
- Calibration workers must rate-limit writes and should prefer compact Parquet
  artifacts for heavy outputs.
- If ClickHouse spool backlog exceeds threshold, VIBRISS must degrade to
  `shadow_no_update`: publish from checkpoint only, do not update learners from
  partial reward data.

Recommended thresholds:

| Metric | GREEN | DEGRADED | CRITICAL |
|---|---:|---:|---:|
| decision spool backlog | `<1k` | `1k-50k` | `>50k` |
| reward backlog age | `<10m` | `10m-2h` | `>2h` |
| artifact disk free | `>20GB` | `5-20GB` | `<5GB` |
| CH write failure rate | `<1%` | `1-10%` | `>10%` |

VIBRISS must not repeat the OBF-style failure mode of letting millions of
low-priority rows delay high-priority trade/reward rows. Use priority queues:

1. decisions, rewards, policy state;
2. trade/path summary;
3. calibration summary;
4. heavy diagnostics.
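
The four priority tiers and the decision-spool thresholds can be sketched with a standard-library heap. `PrioritySpool`, `PRIORITY`, and `backlog_status` are hypothetical names, and the row `kind` labels are assumptions chosen to mirror the tiers above; a production spool would also need persistence, which this in-memory sketch omits.

```python
import heapq
import itertools

# Lower number flushes first; tiers follow the numbered list above.
PRIORITY = {
    "decision": 0, "reward": 0, "policy_state": 0,
    "trade_summary": 1,
    "calibration_summary": 2,
    "diagnostic": 3,
}


class PrioritySpool:
    """In-memory spool that drains high-priority rows before diagnostics."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a tier

    def put(self, kind, row):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, row))

    def drain(self, max_rows):
        """Pop up to max_rows rows, highest priority first."""
        batch = []
        while self._heap and len(batch) < max_rows:
            _, _, kind, row = heapq.heappop(self._heap)
            batch.append((kind, row))
        return batch

    def __len__(self):
        return len(self._heap)


def backlog_status(n_rows):
    """Classify decision spool backlog using the recommended thresholds."""
    if n_rows < 1_000:
        return "GREEN"
    if n_rows <= 50_000:
        return "DEGRADED"
    return "CRITICAL"
```

The monotonically increasing sequence number guarantees the heap never compares row payloads, so rows can be arbitrary dicts.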

## 21. Security and Operational Guardrails

Secrets:

- use existing ClickHouse user/password env pattern;
- do not write credentials into spec files;
- do not put secrets in artifact manifests.

Filesystem:

- code/spec mount is read-only inside the container;
- learner state and replay artifacts are written outside the SMB repo path;
- runner must check free disk before replay subtasks;
- no large file writes to `/mnt/dolphinng5_predict`.

Runtime:

- do not restart Hazelcast;
- do not use systemd for Dolphin services;
- use supervisord as the owner of the container process;
- if gVisor is used, treat it as a host-selected sandbox/runtime wrapper, not a
  process owned by VIBRISS internals;
- worker OOM must not kill the live advice runner;
- health checks must distinguish runner alive from learner valid.

## 22. Implementation Defaults

These decisions are now recommended defaults, not open questions:

- First learner: discounted UCB for non-contextual hold-bar baseline plus LinUCB
  shadow branch for MARAS/OBF/V7 context.
- First live dependency posture: internal finite-arm learners and compact
  checkpointed state in the runner; no VW, OBP, ABIDES, Pyro/NumPyro, CATX, or
  broad benchmark libraries in the live advice path.
- First worker dependency posture: VW, River, OBP, MABWiser, lifelines,
  statsmodels, and benchmark libraries are allowed only in replay/OPE/calibration
  jobs with bounded memory and artifact output.
- First drift implementation: simple internal rolling statistics plus optional
  River-backed detectors if the dependency remains stable inside the runner.
- First HZ publication surface: `DOLPHIN_FEATURES["vibriss_param_advice"]` plus
  dedicated keys for high-value ParamSets such as
  `vibriss_hold_substitute_advice`.
- First consumption point for ADVSL hold substitute: capture-on-entry only.
- Counterfactual rewards: store as `shadow_counterfactual` with explicit
  replay artifact path and no causal-propensity claim.
- Drift ownership: VIBRISS computes policy/reward drift and subscribes to MHS,
  MARAS, OBF, and SurvivalStack for external drift/context.
- Container launch: use a small wrapper script under supervisord in production
  so image existence, disk space, mount health, and env are checked before
  `podman run` or `docker run`.
- MHS integration: prefer a generic external-sensor loader eventually, but V1
  may implement a VIBRISS-specific optional sensor as long as it is neutral when
  disabled.
- Infrastructure posture: keep Hazelcast + ClickHouse + supervisord for V1;
  Kafka/Flink are deferred until measured event volume or recovery requirements
  exceed the existing bus/audit pattern.
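
As an illustration of the first default, a discounted UCB learner over a finite set of hold-bar arms can be written without external dependencies. This is a minimal sketch of the usual discounted-UCB form (discounted counts and sums, with an exploration bonus that grows as an arm's discounted count decays); the class name, the default `gamma`, `xi`, and `reward_bound` values, and the checkpoint format are all assumptions, not tuned or specified values.

```python
import math


class DiscountedUCB:
    """Discounted UCB over a finite arm set; rewards assumed within [0, reward_bound]."""

    def __init__(self, arms, gamma=0.99, xi=0.6, reward_bound=1.0):
        self.arms = list(arms)
        self.gamma = gamma
        self.xi = xi
        self.reward_bound = reward_bound
        self.n = {a: 0.0 for a in self.arms}   # discounted pull counts
        self.s = {a: 0.0 for a in self.arms}   # discounted reward sums

    def choose(self):
        for a in self.arms:                    # play every untried arm once
            if self.n[a] == 0.0:
                return a
        total = max(sum(self.n.values()), 1.0)  # keep log() non-negative

        def score(a):
            mean = self.s[a] / self.n[a]
            bonus = 2.0 * self.reward_bound * math.sqrt(
                self.xi * math.log(total) / self.n[a]
            )
            return mean + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        for a in self.arms:                    # old evidence decays on every step
            self.n[a] *= self.gamma
            self.s[a] *= self.gamma
        self.n[arm] += 1.0
        self.s[arm] += reward

    def checkpoint(self):
        return {"arms": list(self.arms),
                "n": [self.n[a] for a in self.arms],
                "s": [self.s[a] for a in self.arms]}

    def restore(self, state):
        self.arms = list(state["arms"])
        self.n = dict(zip(self.arms, state["n"]))
        self.s = dict(zip(self.arms, state["s"]))
```

Because unplayed arms lose discounted count over time, their bonus rises again, which is what keeps a small amount of exploration alive to detect drift. The learner only sees arm labels and rewards, so it needs no knowledge of ADVSL internals.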

## 23. Open Implementation Questions

- Exact minimum sample thresholds per parameter family after the full 1.7k+
  trade corpus is rebuilt under the same capital geometry.
- Whether hard `$400` floors should be a separate ParamSet or remain outside
  VIBRISS as fixed safety policy.
- How to measure sub-bar TP/cadence opportunity cost in a way compatible with
  bar-based ADVSL replay.
- Whether intratrade hold contraction deserves a second ParamSet after
  entry-captured hold advice is validated.
- How much MC/synthetic data is statistically acceptable without overstating
  confidence in rare-tail regimes.
- Whether PINK can share BLUE priors after venue slippage, fills, and exchange
  state are included, or must maintain separate priors from day one.

## 24. Recommended First Build

Build VIBRISS V1 as a shadow-only package with:

- `ParamSpec` dataclasses and YAML loader.
- `ParamSetSpec` support for `advsl.hold_substitute.v1`.
- discrete UCB/Thompson learner.
- contextual LinUCB learner stub or implementation.
- advice publisher.
- ClickHouse audit writer.
- MHS-compatible sensor publisher.
- supervisord/container runner definition.
- offline replay harness for conditional fast TP and ADVSL hold bars.
- capital-aware replay and opportunity-cost accounting for the hold substitute.
- no live actuation.

Recommended package layout:

```text
/mnt/dolphinng5_predict/vibriss/
  __init__.py
  specs.py                # ParamSpec / ParamSetSpec dataclasses and validation
  context.py              # HZ/CH context snapshots, masks, point-in-time joins
  features.py             # deterministic feature construction
  learners/
    __init__.py
    ucb.py                # discounted UCB over finite arms
    thompson.py           # categorical Thompson sampling
    linucb.py             # contextual finite-arm learner
  priors.py               # MARAS/label/asset/side shrinkage priors
  guardrails.py           # hard range, freshness, confidence, drawdown gates
  advice.py               # advice payload builder + schema validation
  publisher.py            # Hazelcast publication
  audit.py                # ClickHouse writer facade and spool priority
  rewards.py              # delayed reward joining and opportunity cost
  replay/
    tape.py               # tape/path loading
    capital_curve.py      # recursive capital replay
    counterfactuals.py    # arm-level exit simulation
    walk_forward.py       # contiguous and moving-window validation
    reports.py            # JSON/CSV/Parquet artifact writers
  runner.py               # live shadow/advisory daemon
  worker.py               # offline subtasks
  cli.py                  # ops commands and local replay entry points
  tests/
```

V1 module responsibilities:

| Module | Must do | Must not do |
|---|---|---|
| `specs.py` | validate ranges, modes, required sensors, output surfaces | import live trader code |
| `context.py` | build point-in-time snapshots with freshness masks | fill missing market data with fake zeros |
| `features.py` | compute deterministic feature vectors | read future outcome labels |
| `learners/*` | expose `choose`, `update`, `checkpoint`, `restore` | know about ADVSL internals |
| `guardrails.py` | enforce hard safety and fallback | optimize reward |
| `advice.py` | produce schema-valid advice payloads | publish directly to HZ |
| `publisher.py` | write HZ advice and heartbeat | mutate engine state |
| `rewards.py` | join decisions to realized/counterfactual outcomes | update policy without reward status |
| `replay/*` | reproduce capital-aware backtests | depend on live HZ |
| `runner.py` | run shadow loops and MHS payloads | run full replay inline |
| `worker.py` | run heavy calibration/replay jobs | publish live advice |

Minimum local commands:

```bash
python -m vibriss.cli validate-specs \
  --spec-dir /mnt/dolphin_training/vibriss/specs

python -m vibriss.cli replay \
  --param-set advsl.hold_substitute.v1 \
  --namespace blue \
  --from 2026-05-01 --to 2026-06-04 \
  --out /mnt/dolphin_training/vibriss/replays/manual

python -m vibriss.runner \
  --mode shadow \
  --namespace blue \
  --spec-dir /mnt/dolphin_training/vibriss/specs \
  --state-dir /mnt/dolphin_training/vibriss/checkpoints
```

Minimum test set:

| Test | Purpose |
|---|---|
| `test_spec_validation.py` | rejects invalid ranges, missing sensors, unsafe live policies. |
| `test_advice_schema.py` | validates HZ payloads and expiry/fallback fields. |
| `test_guardrails.py` | proves stale OBF/MARAS and drawdown alarms force fallback. |
| `test_replay_determinism.py` | same tape/spec/seed gives same capital curve. |
| `test_opportunity_cost.py` | recovered cut trades charge missed upside. |
| `test_priority_spool.py` | high-priority decision/reward rows flush before diagnostics. |
| `test_mode_state_machine.py` | promotion is manual; demotion is automatic. |
| `test_no_live_actuation_default.py` | default env cannot make engine consume advice. |

The first acceptance test is not "did it make more money in-sample." The first
acceptance test is:

1. the same historical decision can be replayed deterministically,
2. every recommended parameter has a valid spec and guardrail trail,
3. baseline fallback is used under stale/low-confidence context,
4. reward accounting includes clipped-winner opportunity cost,
5. the replayed capital curve is reproducible.

The first useful artifact is a replay bundle, not a daemon:

```text
replay_summary.json
capital_curve.csv
per_trade_counterfactuals.parquet
opportunity_cost_audit.parquet
maras_hash_hold_priors.parquet
obf_hold_binding_report.json
walk_forward_summary.json
```

Only after that bundle is reproducible should the shadow runner be started.