# VIBRISS Parameter Governance Spec

**Name**: VIBRISS (Variational Input-driven Bandit-Reactive Intelligent Sensing System)

**Status**: Design doctrine / implementation target

**Scope**: BLUE/PINK parameter governance, initially shadow/advisory only

**Canonical dependency**: `SYSTEM_BIBLE_v7.md`

**Operational stance**: shadow-first, replay-first, guardrail-first. VIBRISS must be useful even when it never gets permission to actuate live.

## 1. Purpose

VIBRISS is the engine's active parameter-sensing and adaptive execution layer. Its job is to replace brittle hardcoded execution constants with bounded, auditable, continuously re-evaluated parameter recommendations.

VIBRISS is not a new alpha model and not a full RL layer. It is an online statistical parameter-governance system: observe outcomes, test safe candidate values, score the realized response, retire weak settings, and keep enough controlled exploration alive to detect drift.

The first intended target is exit-parameter governance, especially ADVSL and fast/cubic TP parameters such as hold-bar limits, floor thresholds, pressure thresholds, and TP posture. Later targets can include sizing haircuts, urgency, asset-selection posture, and venue-specific execution parameters.

## 2. Design Stance

VIBRISS must be modular, spec-driven, replayable, and safety bounded.

Key doctrine:

- One learner per parameter spec by default.
- Bundle/slate learning only after interaction effects are repeatedly material.
- Contextual bandits first; full RL only later if decisions are truly sequential and materially coupled across multiple execution steps.
- Discrete and bucketed parameters use Thompson Sampling, UCB, LinTS, or LinUCB.
- Continuous bounded scalars are discretized into safe buckets first.
- Nonstationary behavior uses discounted or sliding-window evidence plus drift detection.
- Safety-critical parameters require baseline-safe exploration, confidence thresholds, step limits, cooldowns, and hard guardrails.
- Passive fill and time-to-fill decisions should use survival-analysis modules where censoring matters.

## 3. System Boundary

VIBRISS must not silently mutate engine internals. The correct production shape is:

```text
context ingestion
  -> admissible candidate generation
  -> learner scoring
  -> guardrail filter
  -> action selection
  -> advice publication
  -> allowed engine consumption point
  -> delayed outcome capture
  -> reward mapping
  -> online update
```

The hot execution path consumes advice only at documented decision points. The learner/update path is separate and may lag. If advice is stale, low-confidence, or invalid, the engine falls back to the baseline parameter.

BLUE is in-memory/paper and not BingX-enabled. PINK is the BingX venue-facing world. VIBRISS may govern both, but its output contract must be namespace-aware and must not assume that BLUE has exchange state.

Non-goals:

- VIBRISS does not pick assets.
- VIBRISS does not replace MARAS, OBF, V7, ACB, EFSM, or SurvivalStack.
- VIBRISS does not own exchange reconciliation.
- VIBRISS does not rewrite frozen champion configs.
- VIBRISS does not turn offline backtest winners into live settings without a shadow/OPE/promotion path.

Its only authority is to publish bounded, versioned parameter advice and to learn from the outcome trail.

## 4. Terminology

| Term | Meaning |
|---|---|
| `vibrissa` | One probe-trade, parameter test, or market feeler. |
| `vibrissae` | The active parameter-probe array. |
| `parameter spec` | Loadable contract defining one tunable parameter. |
| `arm` | One candidate value or execution configuration. |
| `reward` | Bounded realized execution-quality score. |
| `posture` | Current preferred parameter set plus confidence and fallback metadata. |
| `baseline` | The currently trusted hardcoded or documented production value. |

### 4.1 Control-Plane Elegance Constraints

VIBRISS must remain a disciplined parameter-governance control plane, not an unbounded mesh of subsystems mutating each other. Adaptive behavior is allowed only when it preserves ownership, auditability, and bounded actuation.

Hard architecture rules:

1. One writer per parameter.
   - A live parameter may have many sensors and many context inputs, but only one ParamSet is allowed to publish the effective value for that parameter in a given namespace.
2. ParamSpecs and ParamSetSpecs own promotion rules.
   - Promotion cadence, evidence gates, rollback rules, manual-approval requirements, and replacement rhythm are part of the spec. The runner must execute declared policy, not invent policy.
3. Meta-cadence is itself a parameter, but only at a slower cadence.
   - VIBRISS may tune replay cadence, promotion-review cadence, checkpoint cadence, or reward-join cadence, but those meta-parameters must move more slowly than the governed trading/execution parameter and must have stronger guardrails.
4. EsoF, ExoF, MARAS, OBF, V7, MHS, and drawdown state are context inputs, not arbitrary controllers.
   - They may influence candidate scoring, confidence, demotion, or fallback, but they must not directly mutate live parameters outside the owning ParamSet.
5. Every live change must be reproducible.
   - Log candidate set, chosen action, action probability or confidence, context hash, reward mapping, model version, compiled config hash, fallback reason, promotion state, and rollback path.
6. No hidden cross-subsystem mutation.
   - If one subsystem changes another subsystem's effective behavior, the change must appear as a typed ParamSet advice event and an audited engine-consumed posture update.
7. Shadow first, replay/OPE second, canary third, live last.
   - No safety-critical parameter may skip directly from idea or in-sample replay to live actuation. Live promotion requires held-out evidence, shadow logging, explicit approval when required, and automatic demotion conditions.

These constraints are mandatory for all future ADVSL, TP, DVOL/VOL, IRP, asset-picker, EFSM/overlay, and meta-cadence ParamSets. If a design violates them, the design is considered tangled and must be simplified before implementation.

## 5. Parameter Spec Contract

Each adaptive parameter must be declared by a loadable spec. VIBRISS should not hardcode knowledge of individual parameters.

Important terminology:

- `ParamSetSpec`: the loadable contract for a family of related parameters.
- `paramset_config`: configuration that applies to the ParamSet as a whole.
- `params`: the parameter declarations contained by the ParamSet.
- `param_defaults`: defaults inherited by every parameter in `params`.
- per-param override: a field inside one `params.` entry that overrides `param_defaults` for that parameter only.

The live runner must not perform complex inheritance during scoring. Specs are authored in a rich hierarchical form, validated, compiled, and hash-stamped into a flat canonical policy document before the runner consumes them.
Required fields:

```yaml
identity:
  name: advsl.overlay_min_hold_bars
  type: integer
  units: bars
  default: 6

domain:
  candidates: [4, 6, 8, 10, 12, 16, 20]
  hard_min: 0
  hard_max: 40

safety:
  fallback_baseline: 6
  max_step_change: 4
  cooldown_trades: 5
  min_shadow_samples: 100
  min_live_confidence: 0.80
  max_exploration_rate: 0.05

placement:
  consumer: advanced_sl
  decision_point: open_trade_exit_evaluation
  namespace: blue

live_change_policy:
  mode: between_trades
  allow_intratrade_change: false

candidate_policy:
  learner: linucb
  nonstationarity: sliding_window
  window_trades: 300

success:
  primary_metric: capital_curve_delta_after_cost
  secondary_metrics:
    - clipped_winner_cost
    - saved_loss
    - drawdown_delta
    - recovery_lag

inputs:
  - maras_latest
  - v7_decision_events
  - advanced_sl_monitor_latest
  - obf_universe_latest
  - eigen_scan
  - trade_path

reward_mapping:
  bounded_range: [-1.0, 1.0]
  delayed_until: trade_close_or_counterfactual_terminal
  components:
    saved_loss: +1.0
    missed_profit: -1.5
    drawdown_reduction: +0.5
    tail_loss: -2.0

promotion_policy:
  owner: param_set
  technique: replay_shadow_canary
  review_cadence_s: 900
  min_replay_trades: 300
  min_shadow_decisions: 200
  min_realized_rewards: 50
  min_contiguous_regions: 4
  required_evidence:
    recursive_capital_curve_delta_after_cost: "> 0"
    worst_region_delta: ">= configured_floor"
    clipped_winner_cost: "<= configured_budget"
    drawdown_delta: "<= 0"
  allowed_transitions:
    - disabled_to_shadow
    - shadow_to_advisory
    - advisory_to_canary_live
    - canary_live_to_controlled_live
  manual_approval_required:
    - advisory_to_canary_live
    - canary_live_to_controlled_live
  automatic_demotion_on:
    - stale_required_sensor
    - reward_drift
    - drawdown_alarm
    - invalid_checkpoint

meta_cadence_policy:
  owner: param_set
  status: shadow_first
  tunable_cadences:
    calibration_interval_s: [300, 900, 1800, 3600]
    promotion_review_interval_s: [900, 1800, 3600, 7200]
    checkpoint_interval_s: [30, 60, 120, 300]
    shadow_to_canary_cooldown_trades: [25, 50, 100, 200]
  context_inputs:
    - maras_latest
    - exof_latest
    - esof_latest
    - mhs_latest
    - reward_backlog
    - drawdown_state
  success:
    primary_metric: policy_stability_adjusted_reward
    secondary_metrics:
      - stale_advice_rate
      - promotion_false_positive_rate
      - missed_adaptation_cost
      - operator_churn
      - compute_cost
  live_change_policy:
    calibration_cadence: controlled_after_shadow
    promotion_cadence: advisory_only_until_explicit_approval

outputs:
  hz_key: DOLPHIN_FEATURES.vibriss_param_advice
  clickhouse_table: dolphin.vibriss_decisions
  state_table: dolphin.vibriss_policy_state
```

### 5.1 ParamSet Config and Per-Parameter Overrides

The canonical authoring shape is:

```yaml
param_set:
  id: advsl.hold_substitute.v1
  version: 1.0.0
  namespace_default: blue
  status: shadow_first

paramset_config:
  consumer: advanced_sl
  decision_family: exit_risk_timing
  placement:
    decision_point: trade_entry
    live_replacement_rhythm: capture_on_entry
  promotion_policy:
    technique: replay_shadow_canary
    review_cadence_s: 1800
  meta_cadence_policy:
    status: shadow_first
  outputs:
    hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice
    decision_table: dolphin.vibriss_decisions
    reward_table: dolphin.vibriss_rewards

param_defaults:
  learner:
    type: discounted_ucb
    nonstationarity: sliding_window
    window_trades: 300
  safety:
    fallback_baseline: 12
    min_shadow_samples: 200
    min_live_confidence: 0.80
    max_exploration_rate: 0.0
  reward_mapping:
    bounded_range: [-1.0, 1.0]
    primary_metric: recursive_capital_curve_delta_after_cost
  guardrails:
    stale_sensor_policy: shrink_to_baseline
    drawdown_alarm_policy: freeze_to_baseline

params:
  advsl.min_hold_bars_before_floor_arm:
    type: integer
    units: bars
    domain:
      candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]
      hard_min: 0
      hard_max: 48
    default: 12
    baseline_reference: 20

  advsl.recovery_extension_max_bars:
    type: integer
    units: bars
    domain:
      candidates: [0, 4, 8, 12, 20, 34]
      hard_min: 0
      hard_max: 40
    default: 0
    learner:
      type: shadow_only_discounted_ucb
    safety:
      min_shadow_samples: 500
      min_live_confidence: 0.90
```

Merge precedence:

```text
compiled_param =
  built_in_schema_defaults
  < paramset_config
  < param_defaults
  < params.
  < namespace/runtime override if explicitly allowed by spec
```

Rules:

- ParamSet-wide promotion and meta-cadence policy live in `paramset_config` unless a parameter explicitly overrides a narrower field.
- Per-param overrides may tighten safety, narrow domains, increase sample requirements, or change learner type only if the ParamSet allows it.
- Per-param overrides may not weaken global catastrophic guardrails.
- The compiler must emit both the original source spec hash and the compiled canonical hash.
- The runner consumes only the compiled canonical form.

### 5.2 Spec Compiler and Validation Library

Use an existing platform-agnostic schema/config tool for the authoring layer. Do not invent a bespoke inheritance language.

Recommended stance:

| Need | Recommended tool | Runtime placement |
|---|---|---|
| Cross-language schema contract | JSON Schema | CI, compiler, runner validation. |
| Rich defaults, constraints, unification, inheritance-like config | CUE | Spec compiler / CI, not hot path. |
| Human-friendly authoring | YAML | Source only; compiled immediately. |
| Runner consumption | canonical JSON | Hot path. |
| Fast internal representation | dataclass / Pydantic / msgspec-style object | Runner load time only. |

VIBRISS should prefer:

```text
YAML authoring -> CUE/JSON-Schema validation -> canonical JSON -> runner cache
```

The live runner should never parse CUE, run template expansion, or resolve a large inheritance tree during an advice decision. It should load a precompiled canonical JSON document, verify hashes and schema version, then use direct field access.
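The merge precedence and hash-stamping described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual compiler: the helper names `deep_merge`, `compile_param`, and `canonical_hash` are hypothetical, lists such as `candidates` are replaced wholesale rather than merged, and the sketch does not enforce the rule that overrides may not weaken catastrophic guardrails. The sample values are taken from the `param_defaults` and `advsl.recovery_extension_max_bars` entries in §5.1.

```python
import copy
import hashlib
import json


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced, never concatenated.
            merged[key] = copy.deepcopy(value)
    return merged


def compile_param(*layers: dict) -> dict:
    """Fold layers from lowest to highest precedence into one flat-to-read dict."""
    compiled: dict = {}
    for layer in layers:
        compiled = deep_merge(compiled, layer)
    return compiled


def canonical_hash(doc: dict) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON encoding of `doc`."""
    payload = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Layers in precedence order: schema defaults < paramset_config < param_defaults < per-param entry.
schema_defaults: dict = {}
paramset_config: dict = {}
param_defaults = {
    "learner": {"type": "discounted_ucb"},
    "safety": {"fallback_baseline": 12, "min_shadow_samples": 200, "min_live_confidence": 0.80},
}
param_entry = {
    "learner": {"type": "shadow_only_discounted_ucb"},
    "safety": {"min_shadow_samples": 500, "min_live_confidence": 0.90},
}

compiled = compile_param(schema_defaults, paramset_config, param_defaults, param_entry)
compiled_config_hash = canonical_hash(compiled)
```

In this sketch the per-param entry tightens `min_shadow_samples` and `min_live_confidence` while inheriting `fallback_baseline` from `param_defaults`. Because the hash is computed over a key-sorted encoding, two source specs that differ only in key order produce the same `compiled_config_hash`.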
Performance requirements:

- spec compile can be slower because it is CI/worker time;
- runner spec load should be bounded and rare;
- advice scoring must use already-merged values;
- every compiled ParamSet must include a deterministic `compiled_config_hash`;
- all advice/audit rows must log `spec_hash` and `compiled_config_hash`.

## 6. Candidate Algorithms

V1 should support a small set of algorithms well, rather than a broad library surface poorly.

Recommended V1 learners:

| Parameter type | Default learner | Notes |
|---|---|---|
| Small categorical | Thompson Sampling | Useful for urgency, route, retry, fixed mode selection. |
| Ordered discrete scalar | UCB or discounted UCB | Good for hold bars, TP buckets, pressure thresholds. |
| Contextual finite arms | LinUCB or LinTS | First choice for MARAS/OBF/V7-conditioned advice. |
| Continuous scalar | Adaptive discretization | Start bucketed; upgrade only if buckets are too coarse. |
| Passive fill/delay | Survival model | Explicitly handle censored fill and recovery windows. |

Useful libraries to inspect:

- Vowpal Wabbit for contextual bandits, logged propensities, and OPE.
- River for streaming statistics, online GLMs, and drift detection.
- Open Bandit Pipeline for offline policy evaluation.
- MABWiser for fast Python prototype comparison.
- lifelines or statsmodels for survival analysis.
- NumPyro/Pyro only when hierarchical Bayesian pooling is justified.

### 6.1 Dependency Placement and Reliability Policy

VIBRISS must distinguish algorithm research from live parameter governance. Performance and reliability are more important than using the most general library in the first live version.

Dependency rule:

- The live runner should have a small deterministic dependency surface.
- Heavy learning, OPE, simulation, Bayesian inference, and broad model comparison belong in `vibriss_worker` or offline jobs.
- The engine consumes compact checkpointed policy state and advice payloads. It must not shell out to a learner or wait on an offline library.
- ClickHouse writes, model updates, and replay jobs must never block the hot advice publication loop.
- If a dependency is not needed to score the current checkpointed policy, it is not a live-runner dependency.

Recommended V1 split:

| Layer | Allowed dependency posture | Reason |
|---|---|---|
| Engine hot path | no VIBRISS learner dependency | Engine reads validated advice only. |
| `vibriss_runner` | stdlib + NumPy/Pandas only if needed; optional River subset for drift/stats | Keep startup, memory, and failure modes bounded. |
| `vibriss_worker` | VW, River, OBP, MABWiser, lifelines, statsmodels, contextual libraries | Calibration, OPE, replay, walk-forward, and report generation. |
| Research/simulation | ABIDES, Pyro/NumPyro, CATX, experimental packages | Valuable, but not part of the live critical path. |

### 6.2 Library Decision Matrix

| Library / stack | VIBRISS use | Placement | Decision |
|---|---|---|---|
| Internal UCB/TS/LinUCB | First production learners for bounded discrete arms. | runner + worker | Use first; easiest to audit and checkpoint. |
| Vowpal Wabbit | Contextual bandit benchmark, action-dependent features, OPE workflows, possible future compact policy generator. | worker/offline | Approved for evaluation; not a V1 hot-path dependency. |
| River | Streaming stats, reward normalization, ADWIN/Page-Hinkley/KSWIN-style drift detection, progressive validation. | runner optional; worker default | Approved, but keep live usage narrow. |
| Open Bandit Pipeline | OPE estimator benchmarking and logged-bandit evaluation. | offline/worker | Approved for reports; not live. |
| MABWiser | Fast Python comparison of TS/UCB/LinTS/LinUCB policies. | offline/worker | Approved for prototyping; not live. |
| lifelines / statsmodels | Survival models, recursive diagnostics, stability checks. | worker/offline | Approved for passive fill/recovery modeling. |
| contextualbandits | Alternative contextual-bandit benchmark implementations. | offline/worker | Research benchmark only. |
| SMPyBandits / BanditPylib / PyBandits | Algorithm comparison and stochastic-bandit sandboxing. | offline/research | Optional; do not add to live image. |
| NumPyro / Pyro | Hierarchical Bayesian pooling for sparse per-symbol/per-hash modules. | research/worker | Defer until sparse-data pooling is clearly needed. |
| CATX | Continuous-action contextual bandit research. | research | Defer; bucketed actions first. |
| ABIDES / ABIDES-Gym | Market-interactive simulation and stress rehearsal. | research/simulation | Useful later; too heavy for V1 runner. |
| Kafka / Flink | Durable event-stream backbone and stateful stream processing. | future infra | Defer; Dolphin already has Hazelcast + ClickHouse + supervisord. |
| scikit-multiflow | Historical stream-learning reference. | none | Do not use for net-new code; prefer River. |
| banditml | Architectural reference for production bandit services. | research only | Do not depend on it without a fresh maintenance review. |

### 6.3 Performance Budgets

Initial budgets for the live runner:

| Operation | Target | Hard behavior on miss |
|---|---:|---|
| Score one ParamSet advice snapshot | `p95 <= 10 ms` | publish fallback or previous checkpoint. |
| Full live advice loop over enabled ParamSets | `p95 <= 50 ms` | skip noncritical ParamSets first. |
| Hazelcast publish | nonblocking best effort | mark advice degraded if publish fails. |
| ClickHouse audit write | never blocks advice | spool locally and expose backlog. |
| Runner startup with warm checkpoint | `<5 s` target | publish no advice until checkpoint valid. |
| Memory footprint | bounded and observable | disable worker-style models in runner. |

Candidate sets must stay small. For `advsl.hold_substitute.v1`, a dozen finite hold-bar arms is acceptable; hundreds of arms are not. Continuous-action learners are disallowed in live V1 because they make bounded behavior harder to audit and harder to replay exactly.

### 6.4 Algorithm Defaults by Parameter Class

Concrete defaults:

| Parameter situation | Default | Upgrade path | Notes |
|---|---|---|---|
| Small finite categorical, weak context | Thompson Sampling or UCB1 | discounted UCB if drift appears | Use for mode, urgency, route, retry-like knobs. |
| Ordered discrete scalar | discounted UCB with monotone/smoothness diagnostics | contextual finite-arm learner | Good first fit for hold bars and TP buckets. |
| Finite arms with rich context | LinUCB or LinTS | GLM-UCB/GLM-TS if reward shape demands it | Use MARAS/OBF/V7/EFSM context. |
| Continuous bounded scalar | adaptive discretization | continuous-action contextual bandit only after bucket failure | Prefer auditability over fine resolution. |
| Coupled parameter bundle | small safe bundle catalog | slate/combinatorial learner only if interaction is proven | Avoid action-space explosion. |
| Nonstationary regime | discounted/sliding-window learner + drift detector | replay-reset logic | Freeze or shrink on drift; do not blindly chase. |
| Safety/budget constrained parameter | baseline-safe gating around the learner | conservative contextual bandit / budgeted bandit | Guardrails must dominate learner output. |
| Passive fill or recovery delay | survival model | richer survival only after classical model stability | Treat censoring explicitly. |

### 6.5 Explicit Deferrals

VIBRISS V1 should not attempt:

- full RL;
- continuous-action live control;
- live probe trades by default;
- Kafka/Flink migration;
- ABIDES-in-the-loop production scoring;
- hierarchical Bayesian pooling in the runner;
- joint optimization of many parameters before single-ParamSet evidence exists.

These are not rejected ideas. They are deferred because the current bottleneck is reliable evidence collection, replay/OPE discipline, and safe advice publication.

## 7. Reward Design

Rewards must be decomposed, bounded, and auditable. Store both raw components and normalized reward.

Typical reward components:

- positive: saved loss, lower drawdown, better realized terminal PnL, better capital compounding trajectory, successful recovery without excess hold.
- negative: clipped winner, missed TP, extra adverse selection, slippage, timeout, excessive hold, larger tail loss, oscillation, stale-data actuation.

For ADVSL/TP research, the primary reward should be capital-curve delta after opportunity cost, not terminal trade PnL alone. A rule that saves losses but systematically clips larger winners must be penalized accordingly.

## 8. Required Audit Logging

Every VIBRISS decision must be replayable.

Minimum decision log fields:

- timestamp and scan number
- namespace: blue, pink, prodgreen, research
- parameter spec id and version
- context snapshot hash
- MARAS regime, scalar hash, composite hash when available
- candidate set
- chosen arm
- action probability or confidence
- baseline value
- guardrail decisions and fallback reason
- model version
- advice publication timestamp
- engine consumption timestamp, if consumed
- delayed reward components
- terminal reward
- policy update version

## 9. Control-Plane Output

VIBRISS publishes advice, not imperative mutations.

Recommended HZ shape:

```json
{
  "schema": "vibriss.param_advice.v1",
  "namespace": "blue",
  "ts": "2026-06-03T00:00:00Z",
  "spec_id": "advsl.overlay_min_hold_bars",
  "spec_version": "1.0.0",
  "baseline_value": 6,
  "recommended_value": 12,
  "confidence": 0.82,
  "candidate_set": [4, 6, 8, 10, 12, 16, 20],
  "context_hash": "maras:57957|asset:XLMUSDT|side:LONG",
  "learner": "linucb",
  "guardrail_status": "PASS",
  "fallback_reason": null,
  "expires_at": "2026-06-03T00:05:00Z"
}
```

Consumption rule: the engine may consume this only if the parameter spec says the current state is an allowed change point and all guardrails pass. Otherwise the baseline remains in force.

## 10. Initial VIBRISS Targets

### 10.1 Conditional Fast TP

First replay-backed target:

- `fast_tp.tp_pct`
- `fast_tp.bars_held_min`
- `fast_tp.exit_pressure_min`
- `fast_tp.mfe_decay_min`
- `fast_tp.pnl_mfe_frac_max`

Current evidence says blanket first-touch `0.20%` TP clips too many winners, but conditional fast TP is net positive in both full corpus and capital-known BLUE subset. The first VIBRISS job is to turn those calibrated constants into a shadow policy with logged propensities and OOS replay.

This TP percentage is a prime VIBRISS assistance target. Treat it as a first-class tunable rather than a frozen constant once replay coverage is sufficient.

Open research note:

- investigate whether the `0.20%` TP should be risk-normalized by notional risked, using a monotone nonlinearity such as a cubic retract/expansion curve;
- the candidate question is whether high-notional or high-leverage trades should have a proportionally different TP posture, while keeping the first-touch semantics intact for replay accounting;
- if tested, this must be evaluated with full capital-curve compounding and opportunity cost, not just raw win-rate or per-trade PnL.

#### 10.1.1 Re-entry-Conditioned Fast TP

Same-asset reentries after a profitable exit are a separate research bucket. They should not inherit the exact same fast-TP posture as a first-entry trade without evidence. In current BLUE history, same-asset reentries after wins are usually profitable, but the average second-leg move is smaller than the initial leg, which means a lower TP multiplier may preserve geometry better than a blunt `2.0x` repeat.
Recommended candidate arms:

- `fast_tp.reentry_tp_multiplier = 1.2`
- `fast_tp.reentry_tp_multiplier = 1.5`
- `fast_tp.reentry_tp_multiplier = 2.0`

Interpretation:

- first-entry trades keep the baseline conditional fast TP
- re-entry-after-win trades may use a smaller multiplier band
- re-entry-after-loss trades should remain a separate bucket and may need a slower TP or stronger confirmation, not just a smaller multiplier
- a mild nonlinear / cubic trim on re-entry is a valid shadow-only follow-up candidate, but only after the flat multiplier band has been replayed first

Ownership rule:

- VIBRISS should learn and score the candidate multiplier in shadow replay
- EFSM should own live application if the runtime ever consumes the bucket
- do not flatten the geometric ROI curve by forcing a single multiplier on all reentries

#### 10.1.2 TP Near-Miss Replay

The TP research set must include a distinct near-miss population:

- trades that came within a small epsilon of the candidate TP but did not satisfy the live trigger on the observed cadence
- trades that briefly exceeded the candidate TP and then reversed before the engine observed the touch
- trades that later stopped out after first-touch proximity, because those are the exact counterexamples needed to learn whether a lower TP bucket would have been better

This bucket is mandatory because a corpus dominated by profitable TP closes is survivorship-biased. A learner trained only on winners can learn that the current TP is "usually profitable" while remaining blind to the trades where a slightly lower TP would have caught the move and prevented a later stop-loss.
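The three near-miss populations above can be separated with a small labeling function. The following is a sketch under stated assumptions rather than the engine's replay logic: `PathBar`, `classify_tp_outcome`, the label strings, and the idea that each bar carries both an intrabar peak return and the return the engine saw at its scan are all hypothetical, introduced only to make the distinction concrete. Returns are expressed in percent so the same code applies to either trade side.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PathBar:
    peak_pct: float      # best unrealized return reached inside the bar, in percent
    observed_pct: float  # return visible to the engine at its scan for this bar, in percent


def classify_tp_outcome(
    path: Sequence[PathBar],
    tp_pct: float,
    epsilon_pct: float,
    stopped_out: bool = False,
) -> str:
    """Label one trade path against a candidate TP using first-touch semantics."""
    # A touch the engine actually observed is a clean TP hit.
    if any(bar.observed_pct >= tp_pct for bar in path):
        return "clean_hit"

    best_peak = max((bar.peak_pct for bar in path), default=float("-inf"))
    if best_peak >= tp_pct:
        # Price exceeded the TP intrabar but reversed before any scan saw it.
        label = "unobserved_touch"
    elif best_peak >= tp_pct - epsilon_pct:
        # Price came within epsilon of the TP without touching it.
        label = "near_miss"
    else:
        return "no_approach"

    # Near-miss trades that later stopped out are kept as their own counterexample bucket.
    return label + "_then_stop" if stopped_out else label
```

For example, with a candidate TP of `0.20` and an epsilon of `0.02`, a path whose only bar peaked at `0.19` and then stopped out would be labeled `near_miss_then_stop`, which is exactly the population a winners-only corpus hides.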
Required replay semantics:

- use first-touch TP labels, not close-only labels
- keep near-miss candidates separate from clean TP hits
- score each candidate by recursive capital-curve delta after opportunity cost
- preserve scan-cadence effects when the live engine is scan-driven

Primary use:

- learn whether a tighter TP bucket is justified for specific regimes, assets, or reentry conditions
- quantify the opportunity cost of the missed touch itself, not just the later realized close
- explain repeated "why did this one not TP?" incidents without overfitting to already-winning trades

### 10.2 ADVSL Hold/Floor

Second target:

- `advsl.base_catastrophic_floor_pct`
- `advsl.overlay_catastrophic_floor_pct`
- `advsl.overlay_max_loss_usd`
- `advsl.overlay_min_hold_bars`
- `advsl.overlay_pressure_min`
- `advsl.overlay_mae_risk_min`

This is safety-critical. VIBRISS may advise, but live application requires strong guardrails, bounded step changes, and explicit fallback to the current documented ADVSL values.

Floor percentage is also a prime VIBRISS assistance target, but it must stay outside the learner’s ability to disable the catastrophic floor entirely.

Hard safety ceiling:

- the operator may define a non-negotiable max-loss ceiling per trade, per leg, or per session
- this ceiling is distinct from the replay optimum and distinct from the learner’s preferred floor/TP/hold posture
- if a candidate policy exceeds the ceiling, the ceiling wins even when the replayed recursive capital curve would otherwise look better
- VIBRISS may tune inside the ceiling, but it must not optimize the ceiling away, relax it implicitly, or treat operator pain tolerance as a soft signal

### 10.3 MARAS-Conditioned Hold Bars

Third target:

- per-hash or per-regime hold-bar posture
- per-label bias around known hash medians
- OBF-conditioned hold extension or contraction

Do not use MARAS labels as hard filters. Labels such as CHOPPY can contain both many wins and severe losses. Use the composite hash, raw signature dimensions, confidence, conflict, and nearest-neighbor regime evidence as context features.

### 10.4 DVOL/VOL Gate and Trade-Pause Posture

Candidate carefulness-critical target:

- `entry_gate.dvol_threshold`
- `entry_gate.vol_open_persistence_bars`
- `entry_gate.min_qualified_cross_rate`
- `entry_gate.pick_latency_pause_s`
- `entry_gate.open_gate_no_pick_pause_score`

This target exists because a VOL/DVOL gate can be technically open while the engine still sees low-quality entry conditions: few accepted threshold crosses, weak asset-pick evidence, or no fresh accepted pick after a normally sufficient latency window.

The first useful derived sensor is:

```text
open_gate_no_pick_pause_score =
  VOL/DVOL gate open
  + low recent vel_div threshold-cross density
  + no accepted entry for expected_pick_latency_s
  + neutral/hostile EsoF/ExoF/MARAS context
  + no evidence of stale scans or halted runtime
```

This must not be treated as an urgent kill switch by default. It is a carefulness parameter: VIBRISS should first log it, correlate it with later trade quality, and test whether it predicts profitable trade pauses or smaller position sizing. The baseline is no pause beyond current gate logic.

Related empirical TODOs:

- Reconsider `min_irp_alignment=0.0` empirically. The live gold config disables the IRP alignment filter, but the larger current corpus may now be sufficient to retest whether a nonzero IRP alignment floor improves asset-pick quality.
- Examine whether the apparent `VOL open / no immediate pick` condition is a useful trade-pause state or simply the expected effect of the stricter effective signal-strength gate (`vel_div` below roughly `-0.03`).
- Initial live observation: recent quiet after the last known good picks appears protective rather than broken. This must be tested with opportunity cost: measure what the system avoided during quiet periods and what it missed by not entering.
- Examine whether MARAS composite hashes need more granularity: more distinct market-descriptive buckets while preserving the sortable scalar hash and nearest-neighbor/similarity behavior.

### 10.5 Capital-Protect / Profit-Lock

Fourth target:

- `capital.protect_arm_threshold_pct`
- `capital.protect_full_threshold_pct`
- `capital.protect_tp_min_multiplier`
- `capital.protect_cubic_coeff`
- `capital.protect_reset_drawdown_pct`
- `capital.protect_hysteresis_bars`
- reset family selector: `capital.protect_reset_mode`
- time-based reset controls: `capital.protect_reset_time_trades`, `capital.protect_reset_time_seconds`
- regime/hash reset controls: `capital.protect_reset_regime_whitelist`, `capital.protect_reset_fingerprint_whitelist`
- sc-EsoF reset controls: `capital.protect_reset_sc_floor`, `capital.protect_reset_sc_neutral_floor`, `capital.protect_reset_sc_positive_floor`

This is the profit-protect / peak-lock family. The idea is not to mute risk management, but to preserve capital once the day/session has already become meaningfully profitable. The study must test whether a gain threshold such as `1.2%`, `2.3%`, `3.3%`, ... should arm a more conservative TP posture for subsequent trades, and whether a cubic trim on the TP multiplier is better than an abrupt step change.
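To make the "cubic trim versus abrupt step" comparison concrete, here is one possible shape for each curve. The spec does not define the functional form, so everything below is an assumption for illustration: the function names, the cubic easing between an arm threshold and a full threshold, and the sample numbers are hypothetical, and `capital.protect_cubic_coeff` is deliberately not modeled because its semantics are not stated.

```python
def step_tp_multiplier(gain_pct: float, arm_pct: float, min_multiplier: float) -> float:
    """Abrupt posture: full tightening as soon as the arm threshold is crossed."""
    return min_multiplier if gain_pct >= arm_pct else 1.0


def cubic_tp_multiplier(
    gain_pct: float,
    arm_pct: float,
    full_pct: float,
    min_multiplier: float,
) -> float:
    """Gradual posture: tighten from 1.0 toward min_multiplier along a cubic curve.

    Assumed form: multiplier = 1 - (1 - min_multiplier) * progress**3, where
    progress runs from 0 at arm_pct to 1 at full_pct and is clamped outside.
    """
    if full_pct <= arm_pct:
        raise ValueError("full_pct must exceed arm_pct")
    progress = (gain_pct - arm_pct) / (full_pct - arm_pct)
    progress = min(1.0, max(0.0, progress))
    return 1.0 - (1.0 - min_multiplier) * progress ** 3


# Hypothetical settings: arm at 1.0% session gain, full tightening at 3.0%, floor multiplier 0.5.
for gain in (0.5, 2.0, 3.0):
    print(gain, step_tp_multiplier(gain, 1.0, 0.5), cubic_tp_multiplier(gain, 1.0, 3.0, 0.5))
```

Under these assumed settings, a session gain halfway between the two thresholds already halves the TP under the step rule but trims it only slightly under the cubic rule. That difference is what the clipped-winner cost and saved-loss metrics in this section would need to arbitrate on the recursive capital curve.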
Required policy questions:

- what profit threshold should arm the protect state
- how quickly TP should tighten once the threshold is crossed
- whether the tighten curve should be cubic, stepped, or mixed
- when the protect state must reset
- how much drawdown from the protected peak is required to disarm
- how many bars/trades of hysteresis are needed before a reset is valid
- whether reset should be keyed to time, regime, known fingerprint, sc-EsoF, or mixed logic
- whether reset should use a whitelist gate or a change-detection gate for regime/fingerprint families

The baseline reset rule should be conservative:

- arm only after the gain threshold is crossed on the recursive capital curve
- keep the lock until a real drawdown-from-peak or day/session reset occurs
- do not reset on a single noisy bar if the protected peak is still intact

This target must be evaluated against:

- recursive capital-curve delta after opportunity cost
- clipped-winner cost from over-tightening
- saved-loss from avoiding giveback after the day is already up
- win-return statistics after the arm event
- ceiling-violation count, because the profit protect should never create an implicit max-loss escape hatch

It is especially important to compare:

- flat threshold steps vs cubic tightening
- no hysteresis vs bar-count hysteresis
- immediate reset vs drawdown-based reset
- day-reset vs rolling-session reset

The tape should be replayed on the same capital curve used by the live engine, so the protect state is evaluated recursively, not from a fixed post-hoc label.

### 10.6 OB Cascade TP-Modulation (added 2026-06-12, LINK 5e05eeeb post-mortem)

Candidate carefulness-critical target: the parameters of the OB tail-avoidance layer in `alpha_exit_manager.evaluate()` that silently modulate the "fixed" TP:

- `ob_cascade.count_threshold`: number of assets withdrawing liquidity (depth withdrawal velocity < CASCADE_THRESHOLD) required to enter cascade mode. **Currently hardcoded as `cascade_count > 0`, i.e. a SINGLE asset anywhere in the tracked set widens every open trade's TP by x1.40.** The LINK 5e05eeeb diagnosis (2026-06-11, -$1,248.71) showed this trigger is active on a large fraction of trades because entries occur during panics by construction. Domain candidates: {1, 2, 3, n_assets//4, n_assets//2}; fallback_baseline: 1 (current behavior).
- `ob_cascade.tp_widen_factor`: currently hardcoded 1.40. Population evidence (post-2026-05-11 cohort): widening earned ~+$84.7K on continuation trades vs ~-$16.9K given back on reversals, so the factor is net-positive but fat-left-tailed. Domain: [1.0 .. 1.6]; 1.0 = modulation off.
- `ob_cascade.withdrawal_velocity_threshold`: `CASCADE_THRESHOLD` in `ob_features.py`, currently -0.10 (10% depth pulled over lookback).

Required sensors already exist since 2026-06-12: `dynamic_tp_pct`, `tp_mod_factor`, `cascade_count`, `ob_regime_signal`, `tp_floor_armed` are logged on every `dolphin.v7_decision_events` row, so reward attribution can be computed offline from the live tape with no new instrumentation.

INTERPLAY (REQUIRED reading for the paramset author): these parameters interact with (a) the TP_FLOOR profit-floor ratchet (2026-06-12, `DOLPHIN_TP_FLOOR`), which caps the left tail of the widening, so reward must be computed on the JOINT policy (widen + floor), not the widen alone; and (b) §10.1 Conditional Fast TP / the future ADAPTIVE TP THRESHOLD ("Dynamic TP"): the adaptive TP threshold itself is hereby marked FIT FOR VIBRISS GOVERNANCE. The effective TP should ultimately be one governed surface (base x leverage-curve x market-state x cascade modulation), with VIBRISS owning the modulation terms and the champion base (0.20%) remaining frozen outside governance. A VIOLET-era sub-second exit guard changes the actuation latency of both TP and floor; cadence is therefore a context feature, not a governed parameter, per the data-cadence operator rule.

## 11. First Concrete ParamSet: ADVSL Hold Substitute

### 11.1 Objective

This is the first concrete VIBRISS use case. The parameter set replaces a static ADVSL no-arm / min-hold rule with a bounded, evidence-scored hold target.

The original research problem was the legacy `20`-bar hold window: it protects winners from premature ADVSL exits, but it can also let fast adverse trades slip through before the floor arms. Replay work found that shorter centers, especially around `12` bars, can protect capital in tail events, while longer holds can be correct in snapback/recovery pockets.

The VIBRISS answer is not "always use 12" and not "always use 20." It is:

- choose a hold target from a bounded set,
- condition the choice on current trade/path/regime sensors,
- score it by recursive capital-curve impact after opportunity cost,
- keep catastrophic loss floors outside the learner as non-negotiable safety.

The sweep geometry itself is also a VIBRISS parameter. The ParamSet may carry a global sweep window plus per-regime/per-hash sweep windows in `sweep_policy`. When the derived best band touches the search window boundary, treat that as a signal that the search is still censored by the current bounds, not as proof that the optimum is "wide open." In that case, expand the admissible sweep window and re-evaluate before promoting the range.

### 11.2 ParamSet Identity

```yaml
param_set:
  id: advsl.hold_substitute.v1
  name: ADVSL Hold Substitute
  status: shadow_first
  namespace_default: blue
  consumer: advanced_sl
  decision_family: exit_risk_timing
  replaces:
    - legacy_advsl_min_hold_bars_20
  related_live_controls:
    - advsl.base_catastrophic_floor_pct
    - advsl.overlay_catastrophic_floor_pct
    - advsl.overlay_max_loss_usd
    - advsl.overlay_pressure_min
    - advsl.overlay_mae_risk_min
```

This spec governs the hold/arming decision only. It may recommend when ADVSL is allowed to arm, but it must not remove the catastrophic floor.
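The boundary-touch rule from §11.1 (a best arm sitting on the edge of the sweep window means the search is censored) can be sketched as below. This is a minimal illustration, not the runner's implementation: the function name, the `step` argument, and the return shape are hypothetical, and the hard bounds are assumed to come from the parameter's declared domain.

```python
def expand_sweep_window(candidates, best, hard_min, hard_max, step):
    """Return (new_lo, new_hi, censored) for a sweep over integer candidates.

    `censored` is True when the best arm sits on a window edge that can still
    be widened inside the hard bounds; the caller should then re-run the sweep
    on the widened window before promoting any range.
    """
    lo, hi = min(candidates), max(candidates)
    # Touching an edge that already equals the hard bound is not censoring by
    # the sweep window; the hard bound is a guardrail and is never widened here.
    touches_lo = best == lo and lo > hard_min
    touches_hi = best == hi and hi < hard_max
    new_lo = max(hard_min, lo - step) if touches_lo else lo
    new_hi = min(hard_max, hi + step) if touches_hi else hi
    return new_lo, new_hi, touches_lo or touches_hi


if __name__ == "__main__":
    arms = [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]
    print(expand_sweep_window(arms, best=40, hard_min=0, hard_max=48, step=8))
    print(expand_sweep_window(arms, best=12, hard_min=0, hard_max=48, step=8))
```

An interior optimum leaves the window unchanged, while an optimum on the edge widens only that side and flags the result as censored.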
### 11.3 ParamSet Config and Parameters

Shared ParamSet config:

```yaml
paramset_config:
  consumer: advanced_sl
  decision_family: exit_risk_timing
  placement:
    decision_point: trade_entry
    live_replacement_rhythm: capture_on_entry
    intratrade_change_policy: shadow_only
  outputs:
    hz_key: DOLPHIN_FEATURES.vibriss_hold_substitute_advice
    decision_table: dolphin.vibriss_decisions
    reward_table: dolphin.vibriss_rewards
  param_defaults:
    learner:
      type: discounted_ucb
      contextual_shadow_branch: linucb
      nonstationarity: sliding_window
      window_trades: 300
    safety:
      fallback_baseline: 12
      max_exploration_rate: 0.0
      min_shadow_samples: 200
      min_live_confidence: 0.80
    reward_mapping:
      primary_metric: recursive_capital_curve_delta_after_opportunity_cost
      bounded_range: [-1.0, 1.0]
    guardrails:
      stale_obf_policy: ignore_obf_features
      low_maras_confidence_policy: shrink_to_global_prior
      drawdown_alarm_policy: freeze_to_safe_baseline
```

Primary learned parameter:

```yaml
params:
  advsl.min_hold_bars_before_floor_arm:
    type: integer
    units: bars
    baseline_reference: 20
    starting_center: 12
    current_live_overlay_reference: 6
    default: 12
    domain:
      candidates: [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]
      hard_min: 0
      hard_max: 48
```

Companion deterministic guardrails:

```yaml
params:
  advsl.max_loss_usd_floor:
    type: float
    units: usd
    default_overlay: 500.0
    research_candidate: 400.0
    learner_controlled: false
  advsl.catastrophic_floor_pct:
    type: float
    units: pct
    default_base: 0.0120
    default_overlay: 0.0050
    learner_controlled: false
  advsl.recovery_extension_max_bars:
    type: integer
    units: bars
    default: 0
    domain:
      candidates: [0, 4, 8, 12, 20, 34]
      hard_min: 0
      hard_max: 40
    learner_controlled: shadow_only_until_validated
    safety:
      min_shadow_samples: 500
      min_live_confidence: 0.90
```

Interpretation:

- `baseline_reference=20` preserves the historical question.
- `starting_center=12` is the current replay-derived center.
- `current_live_overlay_reference=6` records the tightened overlay state and must be reported separately from the legacy 20-bar research baseline.
- `34` and `40` remain candidates because contiguous-region medians observed during replay included materially longer optima.

### 11.4 Required Sensors

The hold substitute must use point-in-time sensors only. End-of-trade labels may be used for reward calculation, not for action selection.

Core context sensors:

| Sensor | Source | Use |
|---|---|---|
| `asset` | live trade state | Asset-level prior and OBF join key. |
| `side` | live trade state / EFSM | Separate SHORT base from EFSM-flipped LONG contexts. |
| `bars_held` | live trade state | Determines current arming progress. |
| `entry_price` / `current_price` | live trade state | Signed path and current PnL. |
| `post_gross_path_pct` | trade path replay/live path state | Measures post-entry excursion shape. |
| `mae_pct` | live path state | Adverse excursion severity. |
| `mfe_pct` | live path state | Favorable excursion and recovery potential. |
| `mfe_decay` | derived from MFE/current PnL | Detects giveback and weakening recovery. |
| `current_pnl_mfe_frac` | derived from current PnL / MFE | Indicates whether recovery is intact or mostly lost. |
| `v7_exit_pressure` | `v7_decision_events` / live V7 snapshot | Pressure/continuation signal for recovery-unlikely cases. |
| `v7_mae_risk` | V7 snapshot | Separates ordinary drawdown from risk-tier drawdown. |
| `v7_action` | V7 snapshot | EXIT/RETRACT/EXTEND/HOLD context. |
| `state_confidence` | market-state / MARAS / bundle confidence | Low confidence forces conservative fallback. |

OBF sensors:

| Sensor | Source | Use |
|---|---|---|
| `obf_depth_1pct_usd` | `obf_universe_latest` / OBF CH | Recovery-capacity and liquidity depth. |
| `obf_depth_quality` | OBF derived quality | Distinguishes deep snapback pockets from weak-book grinds. |
| `obf_spread_bps` | OBF | Penalizes bad microstructure. |
| `obf_imbalance` | OBF | Directional liquidity pressure. |
| `obf_imbalance_ma5` / `obf_imbalance_ma10` | OBF derived path | Smooths raw book pressure for in-trade TP/SL context. |
| `obf_imbalance_slope` | OBF derived path | Detects whether pressure is strengthening or fading. |
| `obf_imbalance_persistence` | OBF derived path | Measures sign stability rather than one-tick noise. |
| `obf_imbalance_reaccel` | OBF derived path | Detects renewed pressure after a mid-trade weakening/plateau. |
| `obf_staleness_s` | OBF timestamp | Guardrail; stale OBF cannot steer hold. |

Regime sensors:

| Sensor | Source | Use |
|---|---|---|
| `maras_regime` | `maras_latest` / `maras_fingerprint` | Label-level bias only, never hard filter. |
| `maras_composite_hash` | MARAS Scope B | Exact historical hash prior when sample size is enough. |
| `maras_scalar_hash` | MARAS Scope A | Coarse sortable regime prior. |
| `maras_confidence` | MARAS | Low confidence reduces live trust. |
| `maras_conflict_level` | MARAS | High conflict increases uncertainty/exploration penalty. |
| `s_eigen_vd`, `s_eigen_w50`, `s_eigen_w750` | MARAS raw signature | Eigen-state context. |
| `s_btc_dev_pct`, `raw_btc_ma99` | MARAS BTC tier | Trend/uptrend/downtrend pressure context. |
| `s_acb_boost`, `s_acb_beta` | MARAS/ACB | Protective/risk-on context. |

Outcome-only reward sensors:

| Sensor | Source | Use |
|---|---|---|
| `actual_exit_pnl` | `trade_events` | Realized baseline outcome. |
| `counterfactual_exit_pnl_by_hold` | tape replay | Arm-level reward. |
| `recovery_lag_s` | tape replay | Time to recover after floor/cut. |
| `extra_bars_to_recovery` | tape replay | Cost of too-short hold. |
| `clipped_winner_delta` | tape replay | Opportunity cost of premature exit. |
| `saved_loss_delta` | tape replay | Loss avoided by earlier floor arm. |
| `capital_curve_delta` | recursive replay | Primary reward accounting. |

### 11.5 Feature Construction

VIBRISS should compute a compact feature vector from the sensors:

```text
path_speed = abs(post_gross_path_pct) / max(1, bars_held)
mae_velocity = mae_pct / max(1, bars_held)
mfe_velocity = mfe_pct / max(1, bars_held)
recovery_ratio = current_pnl_mfe_frac
giveback_ratio = 1.0 - current_pnl_mfe_frac
liquidity_score = f(obf_depth_1pct_usd, obf_depth_quality, obf_spread_bps)
signed_obf_imbalance = side_sign * obf_imbalance
imbalance_confirmation = f(signed_obf_imbalance_ma5, persistence, slope)
imbalance_reacceleration = f(prior_weakening, current_signed_slope, persistence)
pressure_score = f(v7_exit_pressure, v7_mae_risk, v7_action)
regime_key = maras_composite_hash if sample_count(hash) >= min_hash_n else maras_regime
confidence_weight = min(state_confidence, maras_confidence) * (1.0 - maras_conflict_level)
```

Feature requirements:

- All features must be point-in-time.
- Missing OBF must not become zero-depth unless zero-depth is the actual observation. Missing OBF is its own mask feature.
- MARAS labels are context, not filters. Use hash/sample priors and raw signature dimensions where possible.
- Side must be explicit. EFSM-flipped LONG trades cannot share a blind SHORT prior.
- OBF imbalance must be side-normalized. For a SHORT, negative raw imbalance is confirming; for a LONG, positive raw imbalance is confirming.
- Raw imbalance is not enough. Use moving averages, persistence, slope, and re-acceleration after weakening so a single noisy tick cannot steer ADVSL.

### 11.5.1 OBF Imbalance Assistance Research

Live ENJUSDT observation on `2026-06-04` motivates an explicit research feature family for ADVSL/TP assistance. The trade entered SHORT near `10:06:14 UTC` and closed `FIXED_TP` near `10:10:11 UTC` for `+$118.53`.
Observed OBF path:

- entry imbalance was near neutral (`~ -0.015` to `+0.001`);
- within seconds it snapped SHORT-confirming (`~ -0.18` to `-0.21`);
- mid-trade it weakened and oscillated around neutral in 30s buckets;
- into TP it re-strengthened materially (`~ -0.30` to `-0.35`).

Conclusion:

- Imbalance did not monotonically increase from entry to exit.
- It behaved as a confirmation/re-acceleration signal: neutral -> confirming pressure -> weakening/plateau -> renewed confirming pressure into TP.
- Therefore VIBRISS should not use raw imbalance as a simple exit trigger.

Candidate uses:

| Use | Candidate rule |
|---|---|
| TP assist | If price is near TP and side-normalized imbalance re-accelerates in favor, avoid premature ADVSL/retract exits. |
| SL/ADVSL assist | If adverse PnL appears and side-normalized imbalance persistently contradicts the trade, recovery probability should shrink. |
| Hold assist | If imbalance is neutral/choppy but not contradictory, do not force an exit from imbalance alone. |
| Floor timing | Combine `price_progress_to_tp * imbalance_confirmation` with MAE/MFE path shape to decide whether the floor should wait or arm. |

Candidate feature names:

```text
imbalance_signed_for_trade
imbalance_ma5_signed
imbalance_ma10_signed
imbalance_slope_signed
imbalance_persistence_signed
imbalance_reacceleration_after_weakening
price_progress_to_tp_x_imbalance_confirmation
adverse_pnl_x_imbalance_contradiction
```

Research requirement: replay this across completed trades before live use. Score it by recursive capital delta after opportunity cost, not by whether it explains one ENJ winner.

### 11.5.2 Macro-Thesis Persistence vs Local Danger Research

Live XLMUSDT observation on `2026-06-04` motivates a mandatory ADVSL/VIBRISS research direction. The trade suffered a large adverse excursion before closing at `FIXED_TP`. Local OBF imbalance and V7 pressure were frightening during the worst MAE; they did not cleanly foresee the recovery. The higher-level eigen/MARAS context, however, stayed coherent with the trade thesis: bearish or choppy-bearish posture, low conflict, active dislocation, and bearish BTC context.

Actionable lesson to test to exhaustion:

```text
ADVSL/V7 local danger should be overruled only when macro thesis persistence
remains strong, MARAS conflict/novelty remains low, and OBF contradiction is
not persistent/deep enough to invalidate the thesis.
```

This is not a live rule yet. It is a research requirement for the first VIBRISS-governed ADVSL/bar-hold policy. The learner must explicitly measure when local pain is a true invalidation signal versus when it is survivable excursion inside a still-valid macro/eigen thesis.

The required research output is a weighting model, not a binary exception. The policy must estimate how much authority belongs to local danger signals versus macro-thesis persistence under the current context. Those weights are themselves VIBRISS-tunable parameters and must be represented in the ParamSet spec with safe defaults, bounded candidate ranges, promotion rules, and audit logging.
Candidate feature names:

```text
macro_thesis_persistence
maras_conflict_low_during_mae
maras_hash_knownness_during_mae
eigen_dislocation_persistence_during_mae
btc_context_alignment_during_mae
local_obf_contradiction_persistence
local_obf_contradiction_depth_weighted
v7_pressure_without_macro_invalidation
adverse_move_vs_macro_persistence
late_recovery_obf_reacceleration
```

Candidate tunable parameters:

```text
local_danger_weight
macro_thesis_weight
obf_contradiction_weight
maras_conflict_weight
eigen_persistence_weight
btc_context_weight
v7_pressure_weight
macro_override_min_confidence
local_invalidation_min_persistence_bars
```

The initial decision form should be simple and auditable:

```text
local_danger_score =
    local_danger_weight * v7_pressure
  + obf_contradiction_weight * local_obf_contradiction_persistence
  + maras_conflict_weight * maras_conflict_or_novelty

macro_thesis_score =
    macro_thesis_weight * macro_thesis_persistence
  + eigen_persistence_weight * eigen_dislocation_persistence_during_mae
  + btc_context_weight * btc_context_alignment_during_mae

hold_or_cut_bias = macro_thesis_score - local_danger_score
```

VIBRISS may tune the weights, but guardrails must prevent pathological behavior: local danger cannot be ignored at extreme MAE, and macro thesis cannot override persistent high-depth OBF contradiction plus MARAS conflict/novelty.
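A minimal Python sketch of the decision form and its two guardrails follows. It assumes every feature is pre-normalized to `[0, 1]` and that `mae_pct` is expressed in percent; the class name, the default weights, and both thresholds (`extreme_mae_pct`, `contradiction_cap`) are placeholders chosen for illustration, not values taken from replay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiasWeights:
    # Names follow the candidate tunable parameters; defaults are placeholders.
    local_danger_weight: float = 1.0
    obf_contradiction_weight: float = 1.0
    maras_conflict_weight: float = 1.0
    macro_thesis_weight: float = 1.0
    eigen_persistence_weight: float = 1.0
    btc_context_weight: float = 1.0


def hold_or_cut_bias(features, weights, extreme_mae_pct=1.0, contradiction_cap=0.8):
    """Positive bias favors holding; negative bias favors cutting."""
    local_danger_score = (
        weights.local_danger_weight * features["v7_pressure"]
        + weights.obf_contradiction_weight * features["local_obf_contradiction_persistence"]
        + weights.maras_conflict_weight * features["maras_conflict_or_novelty"]
    )
    macro_thesis_score = (
        weights.macro_thesis_weight * features["macro_thesis_persistence"]
        + weights.eigen_persistence_weight * features["eigen_dislocation_persistence_during_mae"]
        + weights.btc_context_weight * features["btc_context_alignment_during_mae"]
    )
    # Guardrails (assumed thresholds): macro thesis gets no authority at
    # extreme MAE, or when OBF contradiction is persistent and deep while
    # MARAS conflict/novelty is also high.
    macro_blocked = features["mae_pct"] >= extreme_mae_pct or (
        features["local_obf_contradiction_persistence"] >= contradiction_cap
        and features["local_obf_contradiction_depth_weighted"] >= contradiction_cap
        and features["maras_conflict_or_novelty"] >= contradiction_cap
    )
    if macro_blocked:
        return -local_danger_score
    return macro_thesis_score - local_danger_score


if __name__ == "__main__":
    calm = {
        "v7_pressure": 0.2,
        "local_obf_contradiction_persistence": 0.1,
        "local_obf_contradiction_depth_weighted": 0.1,
        "maras_conflict_or_novelty": 0.1,
        "macro_thesis_persistence": 0.9,
        "eigen_dislocation_persistence_during_mae": 0.8,
        "btc_context_alignment_during_mae": 0.7,
        "mae_pct": 0.3,
    }
    print(hold_or_cut_bias(calm, BiasWeights()))
    print(hold_or_cut_bias({**calm, "mae_pct": 1.5}, BiasWeights()))
```

Keeping the guardrails outside the weighted sum means a learner that tunes the weights cannot tune the guardrails away, which matches the stated requirement that catastrophic protections stay outside learner control.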
Required tests:

- replay all completed trades with this feature family available point-in-time;
- isolate high-MAE trades that later TP'd from high-MAE trades that continued into real loss;
- charge every delayed cut for worst-case tail loss and every early cut for missed recovery/opportunity cost;
- evaluate separately for base SHORTs and EFSM/overlay-flipped LONGs;
- report per-MARAS-hash, per-label, and nearest-neighbor raw-signature results;
- report learned/suggested weights and their stability by contiguous region, MARAS hash, side, and asset-liquidity bucket;
- promote only if held-out contiguous regions improve recursive capital delta without hiding clipped winners or worse tail events.

### 11.5.3 Macro/OBF Evidence Hierarchy Research

Live DASHUSDT observations on `2026-06-04` add a third case study to the XLM and ETC findings. DASH produced two fast SHORT `FIXED_TP` trades, including `efcc6dce`, which entered near `11:00:15 UTC` and closed near `11:00:38 UTC` after only `2` bars for `+$367.92`.

The large DASH trade was not a scary hold-through-MAE case:

- V7 recorded `mae = 0` for the trade path;
- entry `vel_div` was extreme (`~ -0.2463`);
- MARAS at entry was `BEARISH`, low conflict, composite hash `58981`;
- BTC context remained bearish (`s_btc_above_ma99 = 0`);
- OBF imbalance initially leaned against the SHORT, then flipped materially SHORT-confirming during the price break.

This suggests an evidence hierarchy that must be tested explicitly:

```text
macro/eigen OK + OBF confirms
  > macro/eigen OK + OBF neutral/choppy
  > macro/eigen OK + OBF counters transiently but then flips confirming
  > macro/eigen OK + OBF persistently counters with depth
  > macro/eigen weak/conflicted regardless of OBF
```

The hierarchy is not a live rule. DASH shows that a very strong macro/eigen impulse can overcome early OBF contradiction when the contradiction is shallow or transient. ETC shows the stronger case, where OBF remained SHORT-confirming through adverse price movement. XLM shows the weaker/riskier case, where macro thesis persistence carried the trade while OBF was ugly at the worst point.

Candidate features:

```text
macro_obf_alignment_class
macro_extreme_impulse_score
obf_counter_transience_bars
obf_counter_depth_weighted
obf_flip_to_confirmation_latency_s
obf_confirmation_after_macro_impulse
macro_ok_obf_confirm_weight
macro_ok_obf_counter_weight
macro_extreme_overrides_obf_counter_weight
```

Required tests:

- rank outcomes by `macro_obf_alignment_class`;
- compare `macro OK + OBF confirm` against `macro OK + OBF counter`;
- split OBF counter cases into transient, shallow, persistent, and depth-weighted contradiction;
- measure whether OBF flip-to-confirmation latency predicts TP speed;
- report whether extreme `vel_div` can safely receive more weight than early OBF contradiction, and where that becomes unsafe;
- expose the learned hierarchy weights as VIBRISS-tunable parameters, not hardcoded doctrine.

### 11.5.4 Falling-Knife / Missing-Bounce-Sensor Case Study

Live LTCUSDT observation on `2026-06-04` (`c0139cea`) adds an open/pending case study for the opposite side of the DASH impulse capture. The trade entered SHORT near `11:15:12 UTC` with extreme entry `vel_div` (`~ -0.1942`) and high notional, but subsequently showed severe adverse excursion and no meaningful favorable excursion at the time of review. V7 also emitted repeated `RETRACT` recommendations, but V7 pressure is not treated as truth by itself; XLM showed that V7 can scream during a trade that later recovers profitably.
Observed at review time:

- `inverse_ars_bounce_shadow` was stale; latest row was `2026-06-03 18:42:26 UTC`, so the bounce detector was not assisting live;
- V7 repeatedly emitted `RETRACT / V7_RISK_DOMINANT`, which is local-pain evidence only;
- V7 observed `mae ~ 0.854%`, `mfe = 0`, and `exit_pressure = 3`;
- OBF was mostly neutral/choppy with weak, oscillating side-normalized evidence, not a strong rescue signal;
- MARAS/BTC remained broadly bearish/low-conflict, but recent eigen values were intermittent rather than steadily thesis-confirming.

Research meaning:

```text
macro/eigen entry impulse alone is insufficient when local danger is extreme,
MFE remains zero, OBF does not confirm, and the bounce/inverse-risk sensor is
missing or stale.
```

V7 pressure must be weighted conditionally:

```text
V7 pressure is discounted when macro thesis remains strong, OBF confirms, and
MFE exists.
V7 pressure receives more weight only when independent local invalidation
features agree: zero MFE, rising MAE, neutral/counter OBF, stale/missing bounce
sensor, macro impulse decay, or MARAS conflict/novelty.
```

Candidate features:

```text
bounce_sensor_freshness_s
bounce_sensor_missing_mask
extreme_macro_without_mfe
v7_retract_persistence_bars
zero_mfe_high_mae_flag
obf_neutral_or_counter_during_mae
macro_impulse_decay_after_entry
```

Required replay treatment:

- stale/missing bounce data must be an explicit mask feature, not an assumed neutral score;
- compare extreme-entry trades that get early MFE against extreme-entry trades with zero MFE and rising MAE;
- treat persistent V7 `RETRACT` as a local-danger amplifier only when confirmed by independent invalidation sensors such as stale bounce, zero MFE, rising MAE, neutral/counter OBF, or macro impulse decay;
- only promote a macro override if it survives this LTC-style case family after opportunity-cost and tail-loss accounting.

### 11.6 Learning / Computing Model

V1 should use a two-layer policy:

1. Prior/posture estimator:
   - computes candidate priors from historical replay by MARAS composite hash, MARAS label, asset, side, and contiguous time region.
   - uses shrinkage: hash prior -> label prior -> global prior.
   - initializes the hold target near `12` bars unless the context prior has enough evidence to move it.
2. Online contextual bandit:
   - learner: discounted LinUCB or LinTS over finite hold-bar arms.
   - arms: `[4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40]`.
   - reward: delayed until trade close or replay terminal.
   - discount/window: sliding 300 closed trades, plus faster decay when drift is detected.
   - exploration: shadow-only by default; live exploration cap starts at `0`.

Recommended fallback if contextual coverage is sparse:

```text
if hash_sample_n >= 30:
    prior = median_best_hold_for_hash
elif label_side_sample_n >= 100:
    prior = median_best_hold_for_label_side + label_bias
else:
    prior = 12
advice = guardrail_filter(contextual_bandit(prior, candidates))
```

Optional recovery model:

- Train a survival model for `extra_bars_to_recovery`.
- Use it only as a veto/adjuster until validated.
- It may increase hold only when recovery probability is high and expected extra hold is short.

### 11.7 Success Definition

Primary success metric:

```text
recursive_capital_curve_delta_after_opportunity_cost
```

This means the replay must account for saved capital compounding forward, and must subtract the opportunity cost of trades that would have recovered or won after a premature floor/ADVSL action.
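The difference between recursive and fixed-notional accounting can be shown with a short sketch. It assumes each trade is sized as a constant fraction of current capital and that per-trade returns are already net of costs; the function names and the `size_fraction` parameter are hypothetical, and real sizing geometry is richer than this.

```python
def replay_capital(start_capital, trade_returns, size_fraction=1.0):
    """Compound capital trade by trade; each notional depends on capital so far."""
    capital = start_capital
    for r in trade_returns:
        notional = capital * size_fraction
        capital += notional * r
    return capital


def recursive_capital_curve_delta(start_capital, baseline_returns, policy_returns,
                                  size_fraction=1.0):
    """End-capital difference between the candidate policy and the baseline.

    Opportunity cost is charged implicitly: if the policy cut a trade that the
    baseline held to a win, the policy's return for that trade is the worse
    cut return, and the shortfall compounds forward like any other loss.
    """
    if len(baseline_returns) != len(policy_returns):
        raise ValueError("both replays must cover the same trades")
    return (replay_capital(start_capital, policy_returns, size_fraction)
            - replay_capital(start_capital, baseline_returns, size_fraction))


if __name__ == "__main__":
    baseline = [-0.05, 0.02, 0.03]   # held through a tail loss, later held a winner
    policy = [-0.01, 0.02, -0.01]    # saved loss on trade 1, clipped winner on trade 3
    print(recursive_capital_curve_delta(1000.0, baseline, policy))
```

In this toy tape both return series sum to zero, so a fixed-notional comparison would call the policies equal. The recursive replay still separates them, because the early saved loss leaves more capital to compound and the clipped winner is charged at the later, larger notional.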
Secondary metrics:

- net PnL delta
- ROI delta
- max drawdown delta
- tail-loss count and severity
- number of hard/floor cuts
- number of clipped winners
- gross saved loss
- gross missed upside
- average and median recovery lag
- average and median extra bars to recovery
- TP near-miss count, TP near-miss recovery lag, and first-touch TP hit rate
- per-hash and per-label stability
- OOD region performance
- worst contiguous-region degradation
- explicit ceiling-violation count and worst single-loss size under the tested policy, because a "best" replay result is not acceptable if it breaches the operator's declared loss ceiling

Promotion requires:

- positive recursive capital-curve delta on held-out contiguous regions,
- no unacceptable increase in clipped-winner opportunity cost,
- no hidden dependence on a single asset or single MARAS hash,
- improvement or neutral behavior on EFSM-flipped LONG subset,
- deterministic replay reproducibility,
- shadow logging coverage sufficient for OPE.

### 11.8 Calibration Protocol

Calibration must run in this order:

1. Full-tape replay:
   - evaluate every candidate hold arm on every eligible historical trade path.
   - include all available BLUE/PINK/PRODGREEN executed trade history only when namespace semantics are kept separate.
2. Capital-aware replay:
   - recursively recompute capital after each counterfactual exit.
   - preserve position sizing geometry when the saved/lost capital changes the subsequent notional.
3. Opportunity-cost audit:
   - for every floor/ADVSL cut, measure whether the trade later recovered.
   - record recovery lag, extra bars, and missed PnL.
4. Region validation:
   - split into contiguous time regions with enough trades.
   - repeat with moving/randomized boundaries.
   - report median/best hold per region.
5. MARAS proximity validation:
   - group by composite hash when sample size is enough.
   - otherwise use nearest-neighbor distance over MARAS raw signature fields.
   - report whether per-hash/per-neighbor priors outperform global 12-bar center.
6. OBF validation:
   - bind optimum hold to `obf_depth_1pct_usd`, `obf_depth_quality`, spread, and imbalance.
   - test on OOD time slices; do not promote an OBF rule from in-sample fit only.
7. TP near-miss validation:
   - include trades that nearly touched candidate TP but missed on the observed cadence.
   - compute first-touch labels from the highest-resolution available path.
   - isolate the opportunity cost of late reversal after near-touch.
   - compare the resulting TP bucket against the profitable-close-only sample.
8. Walk-forward:
   - train on region N, validate on N+1.
   - repeat across the full history.
   - freeze the learner if the current best policy degrades versus baseline.

### 11.9 Advice Payload

Example advice:

```json
{
  "schema": "vibriss.param_set_advice.v1",
  "namespace": "blue",
  "param_set_id": "advsl.hold_substitute.v1",
  "spec_version": "1.0.0",
  "trade_scope": "on_entry",
  "baseline_reference": 20,
  "current_live_overlay_reference": 6,
  "recommended": {
    "advsl.min_hold_bars_before_floor_arm": 12,
    "advsl.recovery_extension_max_bars": 0
  },
  "candidate_set": [4, 6, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40],
  "confidence": 0.74,
  "context": {
    "asset": "XLMUSDT",
    "side": "LONG",
    "maras_composite_hash": 57957,
    "maras_regime": "CHOPPY_BEARISH",
    "obf_depth_quality_bucket": "weak",
    "v7_pressure_bucket": "high"
  },
  "guardrail_status": "SHADOW_ONLY",
  "fallback_value": 12,
  "expires_at": "2026-06-03T00:05:00Z"
}
```

### 11.10 Guardrails

Mandatory guardrails:

- Shadow-only until walk-forward validation is positive.
- No live exploration by default.
- Do not allow the learner to disable catastrophic floors.
- If OBF is stale, ignore OBF-derived hold extension.
- If MARAS confidence is low or conflict is high, shrink toward global prior.
- If context is EFSM-flipped LONG and LONG sample count is sparse, use the tighter safe prior, not a broad SHORT-derived prior.
- If the recommended hold would increase worst-case open loss beyond the active floor/cap, the floor/cap wins.
- If capital drawdown alarm is active, freeze to deterministic safe baseline.

### 11.11 Starting Priors From Current Research

Current replay-derived starting posture:

| Context | Starting prior | Rationale |
|---|---:|---|
| Global ADVSL hold substitute | `12` bars | Best current center for reducing 20-bar tail slips without assuming all contexts need long waits. |
| Legacy baseline comparison | `20` bars | Historical no-arm/min-hold reference. |
| Tight overlay reference | `6` bars | Current live overlay guardrail reference, not the general learned policy. |
| Recovery/snapback pockets | `24` to `40` bars | Some contiguous-region medians were materially longer; keep as candidates, not defaults. |
| Sparse/unknown context | `12` bars | Conservative research center with shrinkage. |
| EFSM-flipped LONG sparse context | `6` to `12` bars | Do not borrow broad SHORT recovery priors blindly. |

Known caution:

- A `$400` hard cap improved one capital-aware slice by about `+$592.83` versus the 12-bar-only replay, but generated a gross forgone-upside bucket around `+$6,617.30` on hard-cap hits. Therefore max-loss floors must be evaluated with opportunity cost and recovery lag, not judged by saved-loss totals alone.

### 11.12 Promotion Policy

Promotion is part of this ParamSet, not a global runner decision.
```yaml
promotion_policy:
  owner: advsl.hold_substitute.v1
  technique: replay_shadow_canary
  baseline_policy:
    legacy_reference: 20
    current_overlay_reference: 6
    fallback_value: 12
  cadence:
    replay_calibration: every_6h_or_50_new_rewards
    promotion_review: every_30m
    checkpoint_review: every_60s
    live_replacement_rhythm: at_trade_entry_only
  evidence_gates:
    shadow_to_advisory:
      min_replay_trades: 300
      min_contiguous_regions: 4
      recursive_capital_curve_delta_after_cost: "> 0"
      worst_region_delta: ">= -0.10 * positive_total_delta"
      clipped_winner_cost_budget: "documented_and_bounded"
    advisory_to_canary_live:
      min_shadow_decisions: 200
      min_closed_trade_rewards: 50
      min_days_observed: 3
      no_unexplained_tail_loss_cluster: true
      manual_approval_required: true
    canary_live_to_controlled_live:
      min_live_consumed_trades: 50
      live_vs_shadow_regret: "<= 0"
      no_guardrail_violation: true
      manual_approval_required: true
  canary_scope:
    namespaces: [blue]
    max_paramsets_live: 1
    max_live_exploration_rate: 0.0
    allow_only_capture_on_entry: true
  automatic_demotion:
    - stale_obf_or_maras_required_context
    - reward_backlog_critical
    - drawdown_alarm
    - candidate_underperforms_baseline_in_shadow
    - checkpoint_hash_mismatch
```

Interpretation:

- `replay_calibration` answers how often the ParamSet re-estimates candidate quality from historical/newly closed data.
- `promotion_review` answers how often the ParamSet is checked for stronger mode eligibility.
- `live_replacement_rhythm` answers when the engine may replace the old parameter with the VIBRISS value. For this ParamSet it is only at trade entry.
- The runner executes this contract. It does not invent promotion thresholds.

### 11.13 Meta-Cadence Policy

The cadence parameters are themselves governed by this ParamSet. They are not free-floating daemon settings.
```yaml
meta_cadence_policy:
  owner: advsl.hold_substitute.v1
  status: shadow_first
  learner: discounted_ucb_then_linucb
  tunable_cadences:
    replay_calibration_interval_s:
      baseline: 21600
      candidates: [1800, 3600, 10800, 21600, 43200]
    promotion_review_interval_s:
      baseline: 1800
      candidates: [900, 1800, 3600, 7200]
    checkpoint_interval_s:
      baseline: 60
      candidates: [30, 60, 120, 300]
    min_new_rewards_before_recalibration:
      baseline: 50
      candidates: [10, 25, 50, 100]
    shadow_to_canary_cooldown_trades:
      baseline: 100
      candidates: [25, 50, 100, 200]
  context_inputs:
    maras:
      - maras_composite_hash
      - maras_confidence
      - maras_conflict_level
      - maras_nearest_distance
    exof:
      - exf_latest
      - btc_regime_features
      - market_volatility_context
    esof:
      - session_bucket
      - day_of_week
      - calendar_event_flags
    ops:
      - reward_backlog_age_s
      - ch_write_failure_rate
      - artifact_disk_free_gb
      - drawdown_state
  reward_mapping:
    positive:
      - faster_detection_of_degraded_hold_policy
      - lower_stale_advice_rate
      - lower_missed_adaptation_cost
    negative:
      - promotion_false_positive
      - noisy_recalibration_churn
      - excessive_compute_or_backlog
      - operator_churn
  live_change_policy:
    replay_calibration_interval_s: controlled_after_shadow
    promotion_review_interval_s: advisory_only_until_manual_approval
    checkpoint_interval_s: fixed_by_ops_until_runner_load_tested
    shadow_to_canary_cooldown_trades: advisory_only
```

This makes MARAS, ExoF, and EsoF eligible context for cadence advice. For example, VIBRISS may learn that high MARAS novelty plus hostile ExoF context requires faster recalibration review, while ordinary stable regimes can use a slower cadence to avoid overreacting.

Cadence testing is permitted, but first in shadow:

- log what cadence would have been chosen;
- replay whether that cadence would have detected degradation sooner;
- charge compute/backlog cost;
- charge false-promotion cost;
- compare against fixed-cadence baseline.
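A rough sketch of the shadow comparison against a fixed cadence is shown below. It models only detection lag and compute cost for a periodic check; false-promotion cost is deliberately left out and would need its own term. The cost coefficients, the function names, and the assumption that degradation timestamps are known from replay are all hypothetical.

```python
def detection_lag_s(event_t_s, interval_s):
    """Seconds between a degradation event and the next periodic check."""
    next_check = -(-event_t_s // interval_s) * interval_s  # integer ceiling
    return next_check - event_t_s


def score_cadence(interval_s, degradation_times_s, horizon_s,
                  lag_cost_per_s=1.0, run_cost=10.0):
    """Higher is better: negative of (total detection lag cost + compute cost)."""
    total_lag = sum(detection_lag_s(t, interval_s) for t in degradation_times_s)
    runs = horizon_s // interval_s
    return -(total_lag * lag_cost_per_s + runs * run_cost)


def advantage_over_fixed(candidates, baseline_interval_s, degradation_times_s, horizon_s):
    """Score of each candidate interval minus the fixed-cadence baseline score."""
    base = score_cadence(baseline_interval_s, degradation_times_s, horizon_s)
    return {c: score_cadence(c, degradation_times_s, horizon_s) - base
            for c in candidates}


if __name__ == "__main__":
    events = [1000, 5000]  # replayed degradation timestamps, in seconds
    print(advantage_over_fixed([1800, 3600, 21600], 21600, events, 21600))
```

A candidate interval only earns control of a real scheduler if its advantage stays positive across walk-forward windows once every cost term, including the ones omitted here, is charged.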
Only after the meta-cadence policy beats fixed cadence in walk-forward replay and shadow operation may it control any real scheduler interval. ### 11.14 Catastrophic Floor Derivation Study The floor percentage is now a dedicated shadow-only VIBRISS research target. ```yaml param_set: id: advsl.catastrophic_floor_derivation.v1 name: ADVSL Catastrophic Floor Derivation status: shadow_first success: primary_metric: recursive_capital_curve_delta_after_opportunity_cost artifact_kinds: [code, test, spec] artifact_refs: - prod/vibriss/floor_derivation.py - prod/vibriss/test_floor_derivation.py - prod/docs/ADVSL_CATASTROPHIC_FLOOR_DERIVATION_STUDY.md - prod/docs/VIBRISS_PARAMETER_GOVERNANCE_SPEC.md ``` Current full-tape replay on the blue trade tape: - replayable trades: `802` - actual end capital: `$51,937.21` - floor-only best aggregate candidate: `1.50%` - floor-only per-regime averages: still centered at `0.50%` Interpretation: - this study does **not** validate `1.20%` as a universal standalone floor; - it validates the need for a derivation path and the ability to bind the floor to code/test/spec evidence; - `1.20%` remains a coupled-policy prior for the broader ADVSL/TP/hold stack, not a floor-only truth. The floor-only study must remain shadow-only. Live use may only follow a coupled policy that demonstrates positive recursive capital curve delta on held-out contiguous regions. ### 11.15 Acceptance Tests Minimum tests before implementation can be called complete: - Given a fixed replay window, the same hold recommendation and reward are reproduced bit-for-bit or within declared float tolerance. - Candidate arms outside the hard range are rejected. - Stale OBF creates a masked feature, not a fake zero-depth observation. - Low MARAS confidence or high conflict shrinks advice toward the global prior. - EFSM-flipped LONG contexts do not use unqualified SHORT-only priors. - Capital-aware replay compounds saved/lost capital forward. 
- Opportunity cost is charged when a cut trade later recovers.
- The shadow advice payload contains candidate set, chosen arm, confidence, baseline, guardrail result, and reproducibility keys.
- Promotion decisions are rejected when the ParamSet omits `promotion_policy`.
- Meta-cadence advice is logged as a ParamSet decision, not a runner-local heuristic.

## 12. VIBRISS Ops / Runner System

### 12.1 Operational Objective

VIBRISS must run as an observable production subsystem, not as an ad hoc notebook or one-off replay script.

The runner is responsible for:

- loading parameter specs and ParamSet specs,
- ingesting live context from Hazelcast and historical context from ClickHouse,
- publishing shadow/advisory parameter postures,
- scheduling replay/calibration subtasks,
- writing full audit logs,
- exposing health sensors to MHS,
- feeding TUI/observability surfaces,
- checkpointing learner state so recommendations are reproducible after restart.

The runner must reuse the existing infrastructure pattern:

- supervisord is the process authority;
- Hazelcast is the live bus;
- ClickHouse is the audit/event store;
- NATS is the optional event transport for replay, reward, and policy-state fanout when decoupled workers or durable queues are useful;
- MHS reads composite health from HZ and reports it in `DOLPHIN_META_HEALTH`;
- TUI observes primarily through HZ listeners and polls CH only for heavier historical panels;
- Prefect is optional for scheduled offline jobs, not required for the hot VIBRISS daemon.

### 12.2 Process Topology

VIBRISS should be containerized, but still owned by supervisord. In the current production layout, the host supervisord owns only the container bootstrap wrapper; the container itself runs its own supervisord instance, which owns the live runner process. That makes later full-system containerization easier without changing the runner contract.
If sandboxing is enabled, gVisor is the outer runtime boundary for the container or worker container. VIBRISS does not instantiate or manage gVisor from inside the container; the host/container runtime selects that boundary at launch time. The containerized runner must still reach host Hazelcast and ClickHouse over the configured backplane. If NATS is enabled, it runs as a sibling stack service on the host backplane and the container talks to it over `nats://localhost:4222`.

Recommended process shape:

```text
supervisord
  -> vibriss_runner container
       -> live advice loop
       -> spec loader
       -> health publisher
       -> lightweight replay scheduler
       -> learner checkpoint writer
  -> optional vibriss_worker container(s)
       -> full-tape replay
       -> walk-forward validation
       -> OBF/MARAS proximity calibration
       -> offline policy evaluation
```

The live runner is a long-lived daemon. Heavy replay/calibration jobs are separate subtasks so the live advice loop cannot be blocked by ML work. The experiment-side harness that replays trade episodes, sweep ranges, and walk-forward windows is specified separately in [`VIBRASS_EXPERIMENT_RUNNER_SPEC.md`](VIBRASS_EXPERIMENT_RUNNER_SPEC.md).

Container runtime:

- Docker or Podman is acceptable.
- Prefer Podman if rootless isolation becomes important.
- Optional sandbox runtime: gVisor may wrap the launched container or worker container, but it is selected outside VIBRISS by the host/container runtime. VIBRISS must not attempt to manage the sandbox boundary from inside the container.
- Do not put Hazelcast in the VIBRISS container.
- Do not restart Hazelcast as part of VIBRISS recovery.
- Mount large replay outputs to `/mnt/dolphin_training/vibriss/`, not the SMB repo path.
- Write only small docs/specs to `/mnt/dolphinng5_predict/prod/docs/`.
### 12.3 Supervisor Contract

Recommended supervisord entries:

```ini
[program:vibriss_runner]
command=/usr/bin/podman run --rm --name dolphin-vibriss-runner
  --network host
  -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro
  -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw
  -v /mnt/ng6_data:/mnt/ng6_data:ro
  -e HZ_HOST=localhost:5701
  -e CH_URL=http://localhost:8123/
  -e CH_DB=dolphin
  dolphin-vibriss:latest
  python -m vibriss.runner --mode shadow
directory=/mnt/dolphinng5_predict/prod
autostart=true
autorestart=true
startsecs=10
startretries=5
stopwaitsecs=20
stopasgroup=true
killasgroup=true
stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner.log
stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_runner-error.log

[program:vibriss_worker]
command=/usr/bin/podman run --rm --name dolphin-vibriss-worker
  --network host
  -v /mnt/dolphinng5_predict:/mnt/dolphinng5_predict:ro
  -v /mnt/dolphin_training/vibriss:/mnt/dolphin_training/vibriss:rw
  -v /mnt/ng6_data:/mnt/ng6_data:ro
  dolphin-vibriss:latest
  python -m vibriss.worker --idle
directory=/mnt/dolphinng5_predict/prod
autostart=false
autorestart=false
startsecs=0
stopwaitsecs=30
stdout_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker.log
stderr_logfile=/mnt/dolphin_training/vibriss/logs/supervisor/vibriss_worker-error.log
```

Group placement:

```ini
[group:dolphin_data]
programs=exf_fetcher,acb_processor,obf_universe,meta_health,system_stats,
  esof_advisor,maras_service,vibriss_runner
```

Rationale:

- VIBRISS is data/control-plane infrastructure, not the trader itself.
- The runner can be autostarted because it begins shadow-only.
- Workers remain manual or scheduler-launched because full replay can be heavy.
- MHS must observe VIBRISS health, but must not fight the container runtime through systemd.

### 12.4 Container Interface

Required environment variables:

| Env | Meaning |
|---|---|
| `HZ_HOST` | Hazelcast host/port, default `localhost:5701`. |
| `CH_URL` | ClickHouse HTTP URL. |
| `CH_DB` | Namespace DB: `dolphin`, `dolphin_prodgreen`, or PINK-specific DB. |
| `CH_USER` / `CH_PASS` | ClickHouse credentials. |
| `NATS_URL` | Optional NATS server URL, default `nats://localhost:4222`. |
| `VIBRISS_ENABLE_NATS_TRANSPORT` | Enable best-effort NATS publication. |
| `VIBRISS_NATS_SUBJECT_PREFIX` | Subject prefix, default `vibriss`. |
| `VIBRISS_MODE` | `shadow`, `advisory`, `canary`, or `disabled`. |
| `VIBRISS_NAMESPACE` | `blue`, `pink`, `prodgreen`, or `research`. |
| `VIBRISS_SPEC_DIR` | Param spec directory. |
| `VIBRISS_STATE_DIR` | Checkpoint/output directory. |
| `VIBRISS_ENABLE_LIVE_ACTUATION` | Must default to `0`. |
| `VIBRISS_CALIBRATION_INTERVAL_S` | Default replay/calibration scheduler interval. |
| `VIBRISS_PROMOTION_REVIEW_INTERVAL_S` | Default promotion-gate review interval. |
| `VIBRISS_META_CADENCE_MODE` | `fixed`, `shadow`, or `controlled`; defaults to `fixed`. |
| `VIBRISS_MHS_SENSOR_KEY` | Default `vibriss_sensors_blue`. |
| `VIBRISS_HEALTH_INTERVAL_S` | Default `5`. |

Filesystem contract:

| Path | Mode | Use |
|---|---|---|
| `/mnt/dolphinng5_predict` | read-only in container | Code/spec/doc access. |
| `/mnt/dolphin_training/vibriss` | read-write | Learner state, replay artifacts, reports. |
| `/mnt/ng6_data` | read-only | Tape, OBF, scan data. |
| `/tmp` inside container | read-write ephemeral | Small temporary files only. |

### 12.5 Internal Runner Loops

The runner should have separate loops with independent health status:

| Loop | Cadence | Responsibility |
|---|---:|---|
| `spec_loader` | startup + 60s | Load/validate ParamSpec and ParamSetSpec files. |
| `context_ingestor` | 0.5s to 5s | Read HZ live context and keep a point-in-time snapshot. |
| `advice_loop` | on context/trade event | Score candidates and publish shadow/advisory advice. |
| `reward_collector` | 10s to 60s | Join closed trades to advice and write delayed rewards. |
| `checkpoint_loop` | 60s | Persist learner state and model metadata. |
| `calibration_scheduler` | 5m+ | Queue replay/validation subtasks when new data warrants it. |
| `promotion_evaluator` | 15m+ | Evaluate whether a ParamSet may move to a stronger mode. |
| `meta_cadence_evaluator` | 15m+ | Shadow-test cadence settings for calibration/promotion/update loops. |
| `health_publisher` | 5s | Publish MHS-compatible sensor payload. |

The advice loop must never wait on full replay, model training, or ClickHouse backfill. If ClickHouse is slow, advice may continue from latest checkpoint and mark reward collection degraded.

### 12.6 Hazelcast Surfaces

Recommended HZ maps/keys:

| Map | Key | Producer | Consumer | Purpose |
|---|---|---|---|---|
| `DOLPHIN_FEATURES` | `vibriss_param_advice` | runner | BLUE/PINK/TUI | Latest general parameter advice. |
| `DOLPHIN_FEATURES` | `vibriss_hold_substitute_advice` | runner | ADVSL/TUI | Latest ADVSL hold-substitute advice. |
| `DOLPHIN_FEATURES` | `vibriss_latest` | runner | TUI/MHS/manual ops | Compact subsystem summary. |
| `DOLPHIN_META_HEALTH` | `vibriss_sensors_blue` | runner | MHS | BLUE VIBRISS sensor payload. |
| `DOLPHIN_META_HEALTH` | `vibriss_sensors_pink` | runner | MHS | PINK VIBRISS sensor payload. |
| `DOLPHIN_HEARTBEAT` | `vibriss_runner_heartbeat` | runner | MHS/TUI | Liveness heartbeat. |
| `DOLPHIN_CONTROL_PLANE` | `vibriss_commands` | ops/TUI | runner | Freeze, unfreeze, replay, reload specs. |

Advice remains separate from commands. An advice key tells the engine what VIBRISS recommends; a command key tells VIBRISS what operators want it to do.

### 12.7 ClickHouse Tables

VIBRISS needs durable audit tables.

Recommended tables:

| Table | Purpose |
|---|---|
| `dolphin.vibriss_decisions` | One row per candidate-scoring decision. |
| `dolphin.vibriss_rewards` | Delayed realized/counterfactual reward rows. |
| `dolphin.vibriss_policy_state` | Checkpoint metadata and active posture versions. |
| `dolphin.vibriss_paramset_status` | Per-ParamSet health/performance summary. |
| `dolphin.vibriss_subtasks` | Replay/calibration/ML subtask lifecycle. |

Minimum `vibriss_decisions` fields:

```sql
ts DateTime64(6, 'UTC'),
namespace LowCardinality(String),
mode LowCardinality(String),
param_set_id LowCardinality(String),
spec_version String,
decision_id String,
trade_id String,
asset LowCardinality(String),
side LowCardinality(String),
scan_number UInt64,
context_hash String,
maras_composite_hash UInt16,
maras_regime LowCardinality(String),
candidate_set_json String,
chosen_arm String,
baseline_value String,
recommended_value String,
confidence Float32,
propensity Float32,
guardrail_status LowCardinality(String),
fallback_reason String,
model_version String,
payload_json String
```

Minimum `vibriss_rewards` fields:

```sql
ts DateTime64(6, 'UTC'),
decision_id String,
trade_id String,
reward_status LowCardinality(String),
raw_actual_pnl Float64,
raw_counterfactual_pnl Float64,
saved_loss_delta Float64,
clipped_winner_delta Float64,
capital_curve_delta Float64,
drawdown_delta Float64,
recovery_lag_s Float32,
extra_bars_to_recovery Float32,
normalized_reward Float32,
reward_components_json String
```

Subtask rows must include `subtask_id`, `param_set_id`, `kind`, `status`, `started_at`, `finished_at`, `input_window`, `artifact_path`, `n_trades`, `primary_metric`, `failure_reason`, and `parent_decision_id` when applicable.

### 12.8 MHS Sensor Contract

VIBRISS should expose an MHS-compatible composite payload, modeled after the existing optional DITA sensor pattern.
Recommended HZ key:

```text
DOLPHIN_META_HEALTH["vibriss_sensors_blue"]
```

Payload:

```json
{
  "schema": "vibriss.mhs_sensors.v1",
  "namespace": "blue",
  "ts": "2026-06-03T00:00:00Z",
  "rm_meta": 0.93,
  "status": "GREEN",
  "m14_vibriss_runner_liveness": 1.0,
  "m15_vibriss_spec_integrity": 1.0,
  "m16_vibriss_data_freshness": 0.9,
  "m17_vibriss_advice_integrity": 1.0,
  "m18_vibriss_reward_backlog": 0.85,
  "m19_vibriss_paramset_health": 0.95,
  "param_sets": {
    "advsl.hold_substitute.v1": {
      "score": 0.94,
      "status": "GREEN",
      "mode": "shadow",
      "last_advice_age_s": 2.4,
      "last_reward_age_s": 31.0,
      "open_decisions": 1,
      "reward_backlog": 3,
      "shadow_samples": 240,
      "walk_forward_status": "pending",
      "latest_recommended_hold": 12
    }
  },
  "subtasks": {
    "full_tape_replay": {"score": 1.0, "status": "IDLE"},
    "walk_forward": {"score": 0.8, "status": "STALE"},
    "obf_binding": {"score": 1.0, "status": "IDLE"}
  }
}
```

Sensor scoring:

| Sensor | Score rule |
|---|---|
| `m14_vibriss_runner_liveness` | 1 if heartbeat age < 15s, 0.5 if < 60s, else 0. |
| `m15_vibriss_spec_integrity` | Fraction of loaded specs passing validation. |
| `m16_vibriss_data_freshness` | Freshness of HZ context, CH close rows, OBF/MARAS context. |
| `m17_vibriss_advice_integrity` | 1 when latest advice is schema-valid and guardrailed. |
| `m18_vibriss_reward_backlog` | Penalizes unjoined decisions awaiting reward too long. |
| `m19_vibriss_paramset_health` | Mean score of all enabled ParamSets. |

MHS integration rule:

- VIBRISS starts with weight `0.0` in RM_META until stable.
- Then enable a small optional weight, analogous to DITA sensors.
- Suggested initial weight: `0.02`.
- Maximum allowed weight: `0.10` until the subsystem is live-actuating.
- If VIBRISS is disabled, MHS score must be neutral and must not degrade BLUE.
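The two fully specified score rules and the weight cap above can be sketched as follows. The liveness thresholds and the `0.10` cap come from the table and list; the convex blend in `blend_rm_meta` and the neutral `1.0` returned when no ParamSet is enabled are assumptions, since this spec does not state how MHS combines sensor weights.

```python
MAX_VIBRISS_WEIGHT = 0.10  # cap from the integration rule above


def liveness_score(heartbeat_age_s: float) -> float:
    """m14 rule: 1 if heartbeat age < 15s, 0.5 if < 60s, else 0."""
    if heartbeat_age_s < 15.0:
        return 1.0
    if heartbeat_age_s < 60.0:
        return 0.5
    return 0.0


def paramset_health(enabled_scores: dict) -> float:
    """m19 rule: mean score of all enabled ParamSets.

    Returning 1.0 when nothing is enabled is an assumption made here so
    that an idle subsystem does not read as unhealthy.
    """
    if not enabled_scores:
        return 1.0
    return sum(enabled_scores.values()) / len(enabled_scores)


def blend_rm_meta(base_rm_meta: float, vibriss_score: float,
                  weight: float, enabled: bool) -> float:
    """Fold the VIBRISS composite into RM_META (hypothetical convex blend).

    When VIBRISS is disabled or its weight is 0.0, the base score is
    returned unchanged, which keeps the sensor neutral for BLUE.
    """
    if not enabled or weight <= 0.0:
        return base_rm_meta
    w = min(weight, MAX_VIBRISS_WEIGHT)
    return (1.0 - w) * base_rm_meta + w * vibriss_score


# Usage with the suggested initial weight of 0.02.
blended = blend_rm_meta(0.90, 0.50, weight=0.02, enabled=True)
```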
Suggested MHS env shape:

```text
DOLPHIN_MHS_USE_VIBRISS_SENSORS=1
DOLPHIN_MHS_VIBRISS_SENSOR_WEIGHT=0.02
DOLPHIN_VIBRISS_SENSOR_KEY=vibriss_sensors_blue
DOLPHIN_MHS_VIBRISS_SENSOR_MAPS=DOLPHIN_META_HEALTH,DOLPHIN_FEATURES
```

### 12.9 Observability / TUI Integration

TUI integration should follow the existing v9 pattern:

- use HZ listeners for latest VIBRISS state;
- add CH polling only for historical/replay-heavy summaries;
- never poll origin subsystems directly from the TUI.

Recommended panels:

| Panel | Source | Cadence | Content |
|---|---|---:|---|
| `VIBRISS` main panel | `DOLPHIN_FEATURES/vibriss_latest` | HZ listener | mode, status, latest ParamSet advice, confidence, MHS score. |
| `VIBRISS Hold` footer | `vibriss_hold_substitute_advice` + CH rewards | HZ + 60s CH | recommended hold, baseline, prior, reward backlog, recent net delta. |
| `VIBRISS Tasks` footer | `vibriss_subtasks` | 60s CH | replay/walk-forward/OBF binding status. |
| `MHS` existing panel | `DOLPHIN_META_HEALTH/latest` | HZ listener | include VIBRISS sensor details if enabled. |

Display fields for `advsl.hold_substitute.v1`:

```text
VIBRISS HOLD mode=shadow rec=12b base=20b live_ref=6b conf=74% guard=PASS hash=57957 obf=weak pressure=high reward_backlog=3 wf=pending samples=240
```

The TUI must clearly distinguish:

- baseline reference,
- current live reference,
- VIBRISS recommendation,
- whether recommendation is shadow-only or live-consumed.

Implementation note:

- `prod/vibriss/vibriss_tui.py` now provides the Textual dashboard, and `python -m vibriss.vibriss_runner tui` launches it in read-only shadow mode.
- The UI is panel-registry based so additional metrics can be added without rewriting the dashboard shell.

### 12.10 Control Commands

Commands should be written to `DOLPHIN_CONTROL_PLANE["vibriss_commands"]`.

Allowed commands:

| Command | Effect |
|---|---|
| `RELOAD_SPECS` | Reload ParamSpec/ParamSetSpec files and validate. |
| `FREEZE_PARAMSET` | Stop updating and publish fallback for one ParamSet. |
| `UNFREEZE_PARAMSET` | Resume shadow/advisory scoring. |
| `RUN_REPLAY` | Queue replay subtask for a parameter set/window. |
| `RUN_WALK_FORWARD` | Queue walk-forward validation. |
| `SET_MODE` | Move `disabled -> shadow -> advisory`; live/canary requires explicit code/config gate. |
| `CHECKPOINT_NOW` | Persist learner state immediately. |

Commands must be acknowledged to:

```text
DOLPHIN_CONTROL_PLANE["vibriss_command_ack"]
```

Ack payloads must include command id, acceptance/rejection, reason, and current mode. Queue consumption alone is not success.

### 12.11 Prefect Role

Prefect is optional for VIBRISS. It should not be required for live advice.

Acceptable Prefect use:

- daily full-tape replay,
- scheduled walk-forward validation,
- artifact publication,
- long offline calibration runs.

Not acceptable:

- live advice loop,
- hot-path reward joining,
- health publication,
- operator freeze/unfreeze commands.

If Prefect is unavailable, the VIBRISS runner should continue shadow/advisory operation from the last checkpoint and mark scheduled calibration stale.

### 12.12 Failure Modes and Fallback

| Failure | Required behavior |
|---|---|
| HZ unavailable | Runner logs degraded, cannot publish advice, MHS score <= 0.5. |
| CH unavailable | Advice may continue from checkpoint; reward collector degrades. |
| OBF stale | Mask OBF features; do not use OBF hold extension. |
| MARAS stale | Shrink to global/label-free prior. |
| Spec validation failure | Disable affected ParamSet, publish fallback. |
| Learner checkpoint corrupt | Revert to last good checkpoint or baseline prior. |
| Replay worker OOM/fails | Mark subtask failed; live runner continues. |
| Advice schema invalid | Do not publish; MHS advice integrity drops. |
| Drawdown alarm | Freeze to deterministic safe baseline. |

### 12.13 Promotion Gates

Before any engine consumes VIBRISS hold advice live:

1. Runner has been stable for at least 7 calendar days.
2. MHS VIBRISS sensors are GREEN or neutral for 95% of runner uptime.
3. `advsl.hold_substitute.v1` has completed full-tape replay.
4. Walk-forward is positive versus baseline on capital-curve delta after opportunity cost.
5. OOD region performance has no catastrophic degradation.
6. TUI displays baseline/current/recommended state correctly.
7. Command ack path is verified.
8. Safe fallback is tested by intentionally freezing the ParamSet.
9. Engine consumption is limited to one ParamSet and one namespace.
10. `VIBRISS_ENABLE_LIVE_ACTUATION=1` is explicitly set and reviewed.

## 13. V1 Rollout Plan

1. Offline replay only:
   - replay historical decisions from ClickHouse and tape.
   - benchmark against baseline constants.
   - compute OPE where logged propensities exist.
   - report by asset, side, MARAS hash, regime label, V7 reason, OBF bucket, and contiguous time region.
2. Shadow mode:
   - publish advice to HZ.
   - do not allow engine consumption.
   - write `vibriss_decisions`, `vibriss_rewards`, and `vibriss_policy_state`.
3. Guarded advisory:
   - engine reads advice and surfaces what it would have used.
   - still no actuation.
4. Canary live:
   - one parameter only.
   - no simultaneous bundle changes.
   - low exploration cap.
   - hard fallback on stale data, drawdown alarm, or drift alarm.
5. Controlled live comparison:
   - compare baseline-vs-advised on matched contexts.
   - freeze policy if replay quality deteriorates.

## 14. Safety Rules

Mandatory:

- no direct mutation of `blue.yml` or frozen champion config from VIBRISS.
- no live promotion without replay, shadow, and documented approval.
- no advice consumption when data is stale.
- no advice consumption inside disallowed live-change windows.
- no multi-parameter bundle learning until single-parameter learners prove that independent adaptation is insufficient.
- every live-consumed recommendation must be reconstructable from logs.
- every safety-critical parameter must preserve a catastrophic fallback floor.

## 15. Concrete Storage and Schema

VIBRISS must be event-sourced. Current policy state is a cache; decisions and rewards are the durable truth.

### 15.1 ClickHouse DDL

Recommended DDL:

```sql
CREATE TABLE IF NOT EXISTS dolphin.vibriss_decisions (
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    mode LowCardinality(String),
    param_set_id LowCardinality(String),
    spec_version String,
    decision_id String,
    parent_decision_id String,
    trade_id String,
    asset LowCardinality(String),
    side LowCardinality(String),
    scan_number UInt64,
    bars_held UInt32,
    context_hash String,
    context_schema String,
    maras_composite_hash UInt32,
    maras_scalar_hash UInt32,
    maras_regime LowCardinality(String),
    maras_confidence Float32,
    maras_conflict Float32,
    obf_stale UInt8,
    obf_depth_1pct_usd Float64,
    obf_depth_quality Float32,
    v7_pressure Float32,
    v7_mae_risk Float32,
    candidate_set_json String,
    chosen_arm String,
    baseline_value String,
    recommended_value String,
    confidence Float32,
    propensity Float32,
    guardrail_status LowCardinality(String),
    fallback_reason String,
    model_version String,
    policy_version String,
    compiled_config_hash String,
    consumed UInt8,
    consumed_ts Nullable(DateTime64(6, 'UTC')),
    payload_json String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, decision_id)
TTL ts + INTERVAL 180 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_rewards (
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    decision_id String,
    trade_id String,
    reward_status LowCardinality(String),
    reward_delay_s Float32,
    actual_exit_reason LowCardinality(String),
    counterfactual_exit_reason LowCardinality(String),
    actual_exit_pnl Float64,
    counterfactual_exit_pnl Float64,
    saved_loss_delta Float64,
    clipped_winner_delta Float64,
    capital_curve_delta Float64,
    drawdown_delta Float64,
    recovery_lag_s Float32,
    extra_bars_to_recovery Float32,
    normalized_reward Float32,
    opportunity_cost_charged UInt8,
    replay_artifact_path String,
    reward_components_json String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, decision_id)
TTL ts + INTERVAL 365 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_policy_state (
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    policy_version String,
    mode LowCardinality(String),
    learner LowCardinality(String),
    checkpoint_path String,
    checkpoint_hash String,
    spec_hash String,
    compiled_config_hash String,
    n_decisions UInt64,
    n_rewards UInt64,
    shadow_samples UInt64,
    walk_forward_status LowCardinality(String),
    active_baseline_value String,
    active_recommended_value String,
    confidence Float32,
    state_json String
) ENGINE = ReplacingMergeTree(ts)
ORDER BY (namespace, param_set_id, policy_version);

CREATE TABLE IF NOT EXISTS dolphin.vibriss_subtasks (
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    subtask_id String,
    param_set_id LowCardinality(String),
    kind LowCardinality(String),
    status LowCardinality(String),
    started_at DateTime64(6, 'UTC'),
    finished_at Nullable(DateTime64(6, 'UTC')),
    input_window String,
    n_trades UInt64,
    n_decisions UInt64,
    primary_metric Float64,
    baseline_metric Float64,
    artifact_path String,
    artifact_hash String,
    failure_reason String,
    payload_json String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(started_at)
ORDER BY (namespace, param_set_id, started_at, subtask_id)
TTL started_at + INTERVAL 365 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_promotions (
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    promotion_id String,
    from_mode LowCardinality(String),
    to_mode LowCardinality(String),
    requested_by LowCardinality(String),
    approved_by LowCardinality(String),
    policy_version String,
    checkpoint_hash String,
    evidence_window String,
    n_decisions UInt64,
    n_rewards UInt64,
    n_shadow_samples UInt64,
    n_live_samples UInt64,
    recursive_capital_delta Float64,
    opportunity_cost_delta Float64,
    max_drawdown_delta Float64,
    worst_region_delta Float64,
    baseline_metric Float64,
    candidate_metric Float64,
    guardrail_status LowCardinality(String),
    decision LowCardinality(String),
    reason String,
    artifact_path String,
    payload_json String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, ts, promotion_id)
TTL ts + INTERVAL 730 DAY;

CREATE TABLE IF NOT EXISTS dolphin.vibriss_meta_cadence_decisions (
    ts DateTime64(6, 'UTC'),
    namespace LowCardinality(String),
    param_set_id LowCardinality(String),
    cadence_id LowCardinality(String),
    decision_id String,
    mode LowCardinality(String),
    context_hash String,
    maras_composite_hash UInt32,
    maras_regime LowCardinality(String),
    exof_state String,
    esof_state String,
    candidate_set_json String,
    chosen_value String,
    baseline_value String,
    confidence Float32,
    reward_status LowCardinality(String),
    reward_value Float32,
    guardrail_status LowCardinality(String),
    fallback_reason String,
    policy_version String,
    payload_json String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (namespace, param_set_id, cadence_id, ts, decision_id)
TTL ts + INTERVAL 365 DAY;
```

These tables are deliberately narrow enough for hot audit reads and broad enough to replay the decision. Large path arrays, per-bar simulations, and model artifacts must be written to artifact storage, not inlined into ClickHouse.
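One way to enforce the "no large inline payloads" rule at write time is to build each `vibriss_decisions` row through a helper that refuses oversized `payload_json`. The sketch below is an illustration under assumptions: `build_decision_row`, the `MAX_PAYLOAD_BYTES` limit, and the choice of SHA-256 over canonical JSON for `context_hash` are hypothetical and not defined by this spec. Only a subset of the DDL columns is filled.

```python
import hashlib
import json
from datetime import datetime, timezone

MAX_PAYLOAD_BYTES = 16_384  # hypothetical limit, not a spec value


def _canonical(obj) -> str:
    """Stable JSON encoding so equal contexts hash identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def build_decision_row(*, namespace, param_set_id, decision_id, context,
                       candidate_set, chosen_arm, baseline_value,
                       recommended_value, confidence, propensity,
                       guardrail_status, payload):
    """Return a dict for a subset of the vibriss_decisions columns.

    Raises ValueError when the payload is too large to inline; heavy
    data belongs in artifact storage, referenced by path.
    """
    payload_json = _canonical(payload)
    if len(payload_json.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large for hot audit table; "
                         "write it to artifact storage instead")
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "namespace": namespace,
        "param_set_id": param_set_id,
        "decision_id": decision_id,
        "context_hash": hashlib.sha256(_canonical(context).encode("utf-8")).hexdigest(),
        "candidate_set_json": _canonical(candidate_set),
        # The DDL stores arm and value columns as String.
        "chosen_arm": str(chosen_arm),
        "baseline_value": str(baseline_value),
        "recommended_value": str(recommended_value),
        "confidence": float(confidence),
        "propensity": float(propensity),
        "guardrail_status": guardrail_status,
        "payload_json": payload_json,
    }
```

Hashing a canonical encoding means two runs that see the same context in a different key order still produce the same `context_hash`, which helps the reproducibility requirement.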
### 15.2 Artifact Layout

Use a non-SMB path for generated artifacts:

```text
/mnt/dolphin_training/vibriss/
  specs/
    advsl.hold_substitute.v1.yaml
  checkpoints/
    blue/advsl.hold_substitute.v1//
      state.json
      learner.pkl
      manifest.json
  replays/
    //
      config.yaml
      replay_summary.json
      capital_curve.csv
      per_trade_counterfactuals.parquet
      opportunity_cost_audit.parquet
  reports/
    walk_forward/
    obf_binding/
    maras_hash_priors/
```

Every artifact directory must contain a `manifest.json`:

```json
{
  "schema": "vibriss.artifact_manifest.v1",
  "subtask_id": "wf-20260603-001",
  "param_set_id": "advsl.hold_substitute.v1",
  "namespace": "blue",
  "created_at": "2026-06-03T00:00:00Z",
  "git_sha": "unknown-or-sha",
  "spec_hash": "sha256:...",
  "input_tables": {
    "trade_events": {"min_ts": "...", "max_ts": "...", "row_count": 1234},
    "v7_decision_events": {"min_ts": "...", "max_ts": "...", "row_count": 9999}
  },
  "tape_sources": ["/mnt/ng6_data/arrow_scans/..."],
  "random_seed": 0,
  "artifact_hashes": {
    "replay_summary.json": "sha256:...",
    "per_trade_counterfactuals.parquet": "sha256:..."
  }
}
```

## 16. Replay, OPE, and Causality Rules

VIBRISS must be explicit about what kind of evidence it has.

Evidence classes:

| Class | Meaning | Allowed use |
|---|---|---|
| `realized_live` | Parameter was actually used live. | Highest-quality reward. |
| `shadow_counterfactual` | Advice logged, baseline used, tape can replay alternative. | OPE/research only unless validated. |
| `historical_replay` | Offline replay over historical trades with no logged propensity. | Calibration prior, not proof. |
| `synthetic_mc` | Monte Carlo augmentation from validated distribution. | Stress coverage only. |
| `expert_baseline` | Human/research default such as 12 bars. | Fallback/prior. |

Counterfactual replay must store:

- actual entry, actual exit, and actual capital before/after;
- counterfactual exit scan/bar and price;
- whether the counterfactual exit depends on sub-bar, bar-close, or tape-close cadence;
- whether the trade later recovered;
- how many bars/seconds were needed for recovery;
- opportunity cost charged;
- recursive capital state after applying the counterfactual.

OPE rules:

- Use inverse propensity or doubly robust estimators only when propensities were actually logged.
- Do not pretend historical replay has logged propensities.
- For shadow decisions without randomized action, report them as model counterfactuals, not causal estimates.
- Region splits must be contiguous first; randomized splits are secondary robustness checks only.
- A policy that wins by one tail event and loses broadly must be flagged as fragile even when net capital delta is positive.

Minimum replay report:

```text
baseline_end_capital
policy_end_capital
recursive_delta
gross_saved_loss
gross_opportunity_cost
net_trade_pnl_delta
max_drawdown_delta
tail_loss_count_delta
clipped_winner_count
recovered_cut_count
median_recovery_lag_s
worst_region_delta
best_region_delta
per_asset_concentration
per_hash_concentration
```

## 17. Mode State Machine

VIBRISS modes are explicit and monotonic unless an operator command or guardrail forces demotion.
```text
disabled -> shadow -> advisory -> canary_live -> controlled_live
```

Mode meanings:

| Mode | Publishes advice | Engine may read | Engine may act | Learner updates |
|---|---:|---:|---:|---:|
| `disabled` | no | no | no | no |
| `shadow` | yes | no | no | yes |
| `advisory` | yes | yes, display only | no | yes |
| `canary_live` | yes | yes | yes, one ParamSet/namespace | yes |
| `controlled_live` | yes | yes | yes, bounded | yes |

Automatic demotions:

- stale required sensor -> `shadow` or fallback advice;
- invalid spec -> affected ParamSet disabled;
- reward backlog beyond threshold -> freeze learner updates;
- drawdown alarm -> deterministic safe baseline;
- ClickHouse unavailable -> keep publishing only if checkpoint is fresh; mark reward collection degraded;
- Hazelcast unavailable -> no advice publication;
- policy drift alarm -> freeze to last known-good checkpoint.

Promotion technique, thresholds, cadence, and evidence gates must be declared inside the affected ParamSet spec. The runner evaluates and records those gates; it is not allowed to invent a promotion policy from global defaults.

Promotion must be manual and auditable for any transition that enables live actuation. No health recovery path may silently promote VIBRISS into a stronger actuation mode.

### 17.1 ParamSet-Owned Promotion Lifecycle

Every ParamSet must answer these questions before it can leave `shadow`:

| Question | Required ParamSet field |
|---|---|
| What baseline is being challenged? | `promotion_policy.baseline_policy` |
| What evidence class is allowed? | `promotion_policy.technique` and `evidence_gates` |
| How often is the evidence recomputed? | `promotion_policy.cadence.replay_calibration` |
| How often is promotion eligibility reviewed? | `promotion_policy.cadence.promotion_review` |
| When may the engine replace the old value? | `promotion_policy.cadence.live_replacement_rhythm` |
| What samples are required? | `promotion_policy.evidence_gates.*min*` |
| What demotes it? | `promotion_policy.automatic_demotion` |
| Who approves live use? | `promotion_policy.*manual_approval_required` |

Promotion is also subject to the control-plane elegance constraints in §4.1: one writer per parameter, spec-owned promotion, slow-governed meta-cadence, context inputs instead of arbitrary controllers, reproducible live changes, no hidden cross-subsystem mutation, and shadow/replay/canary before live.

Default lifecycle:

```text
historical_replay
  -> walk_forward_replay
  -> shadow_advice_logging
  -> advisory_display
  -> canary_live_capture
  -> controlled_live
```

The cadence of each phase is also ParamSet-owned:

- `advice cadence`: how often the ParamSet emits advice.
- `reward cadence`: how often delayed rewards are joined and scored.
- `calibration cadence`: how often the learner updates from replay/rewards.
- `promotion-review cadence`: how often mode eligibility is evaluated.
- `replacement rhythm`: the exact engine decision point where a live parameter can replace the baseline.

For safety-critical exit parameters, replacement rhythm should usually be `capture_on_entry` or `between_trades`, not arbitrary intratrade mutation.

### 17.2 Meta-Cadences as Governed Parameters

Meta-cadences are tunable parameters. If VIBRISS changes them, they must be declared in the ParamSet under `meta_cadence_policy`.

Examples:

| Meta-cadence | Meaning |
|---|---|
| `replay_calibration_interval_s` | How often to re-run replay/calibration. |
| `promotion_review_interval_s` | How often to evaluate mode promotion/demotion. |
| `checkpoint_interval_s` | How often to persist learner state. |
| `min_new_rewards_before_recalibration` | Event-driven cadence threshold. |
| `shadow_to_canary_cooldown_trades` | Minimum stable evidence before live canary. |

MARAS, ExoF, EsoF, OBF, V7, MHS, and drawdown state may be context inputs for meta-cadence advice, but the cadence learner is subject to the same evidence rules as any other parameter learner.
In particular:

- fixed cadence is the baseline;
- shadow cadence decisions must be logged with candidate set and confidence;
- replay must estimate missed-adaptation cost and false-promotion cost;
- compute/backlog cost is part of reward;
- live control of promotion cadence requires explicit manual approval.

## 18. Engine Consumption Contract

The engine must treat VIBRISS advice as optional, expiring input.

Consumption algorithm:

```text
read advice payload
validate schema and spec_version
check namespace matches runtime
check mode permits consumption
check expires_at > now
check trade_scope is current decision point
check recommendation within hard range
check guardrail_status == PASS or permitted advisory state
check fallback/catastrophic floor remains active
capture value into trade-local immutable parameter snapshot
emit consumption audit
```

For `advsl.hold_substitute.v1`, the first live contract should be:

- consume only on entry;
- store the selected hold bars in the pending/open trade state;
- do not mutate it intratrade;
- allow intratrade VIBRISS values only as shadow comparisons;
- let catastrophic floor and max-dollar floor override hold advice.

This avoids a subtle failure mode where a learner changes the hold target after seeing adverse movement that was not available at entry. Intratrade contraction can be researched later, but it is a different ParamSet.

## 19. Drift, Novelty, and Freezing

VIBRISS must separate three conditions:

1. data-quality degradation,
2. market/regime novelty,
3. policy underperformance.

Drift sensors:

| Sensor | Trigger |
|---|---|
| context distribution drift | MARAS/OBF/V7 feature distribution shifts versus training window. |
| reward drift | rolling reward lower than baseline beyond confidence bound. |
| regret drift | chosen arm underperforms baseline arm in shadow replay. |
| tail cluster | tail-loss or floor-hit count above historical percentile. |
| sparse regime | nearest-neighbor distance to known MARAS/OBF contexts too high. |

Actions:

- distribution drift alone: shrink toward baseline and raise uncertainty;
- reward drift: freeze learner updates and publish fallback;
- tail cluster: tighten safety floors only if pre-authorized by the ParamSet;
- sparse regime: use global safe prior, not nearest hash overfit;
- data-quality drift: stop consuming affected sensors.

VIBRISS should publish drift state in `vibriss_latest` and `vibriss_paramset_status`.

## 20. Data Volume and Backpressure

The ClickHouse outage and spool backlog failure mode matters for VIBRISS.

Rules:

- VIBRISS must have its own spool and backlog metric.
- Advice publication must not block on ClickHouse.
- Reward collection may lag, but the lag must be visible in MHS.
- Large per-bar OBF or path arrays must not be written to hot audit tables.
- Calibration workers must rate-limit writes and should prefer compact Parquet artifacts for heavy outputs.
- If ClickHouse spool backlog exceeds threshold, VIBRISS must degrade to `shadow_no_update`: publish from checkpoint only, do not update learners from partial reward data.

Recommended thresholds:

| Metric | GREEN | DEGRADED | CRITICAL |
|---|---:|---:|---:|
| decision spool backlog | `<1k` | `1k-50k` | `>50k` |
| reward backlog age | `<10m` | `10m-2h` | `>2h` |
| artifact disk free | `>20GB` | `5-20GB` | `<5GB` |
| CH write failure rate | `<1%` | `1-10%` | `>10%` |

VIBRISS must not repeat the OBF-style failure mode of letting millions of low-priority rows delay high-priority trade/reward rows. Use priority queues:

1. decisions, rewards, policy state;
2. trade/path summary;
3. calibration summary;
4. heavy diagnostics.

## 21. Security and Operational Guardrails

Secrets:

- use existing ClickHouse user/password env pattern;
- do not write credentials into spec files;
- do not put secrets in artifact manifests.

Filesystem:

- code/spec mount is read-only inside the container;
- learner state and replay artifacts are written outside the SMB repo path;
- runner must check free disk before replay subtasks;
- no large file writes to `/mnt/dolphinng5_predict`.

Runtime:

- do not restart Hazelcast;
- do not use systemd for Dolphin services;
- use supervisord as the owner of the container process;
- if gVisor is used, treat it as a host-selected sandbox/runtime wrapper, not a process owned by VIBRISS internals;
- worker OOM must not kill the live advice runner;
- health checks must distinguish runner alive from learner valid.

## 22. Implementation Defaults

These decisions are now recommended defaults, not open questions:

- First learner: discounted UCB for non-contextual hold-bar baseline plus LinUCB shadow branch for MARAS/OBF/V7 context.
- First live dependency posture: internal finite-arm learners and compact checkpointed state in the runner; no VW, OBP, ABIDES, Pyro/NumPyro, CATX, or broad benchmark libraries in the live advice path.
- First worker dependency posture: VW, River, OBP, MABWiser, lifelines, statsmodels, and benchmark libraries are allowed only in replay/OPE/calibration jobs with bounded memory and artifact output.
- First drift implementation: simple internal rolling statistics plus optional River-backed detectors if the dependency remains stable inside the runner.
- First HZ publication surface: `DOLPHIN_FEATURES["vibriss_param_advice"]` plus dedicated keys for high-value ParamSets such as `vibriss_hold_substitute_advice`.
- First consumption point for ADVSL hold substitute: capture-on-entry only.
- Counterfactual rewards: store as `shadow_counterfactual` with explicit replay artifact path and no causal-propensity claim.
- Drift ownership: VIBRISS computes policy/reward drift and subscribes to MHS, MARAS, OBF, and SurvivalStack for external drift/context.
- Container launch: use a small wrapper script under supervisord in production so image existence, disk space, mount health, and env are checked before `podman run` or `docker run`.
- MHS integration: prefer a generic external-sensor loader eventually, but V1 may implement a VIBRISS-specific optional sensor as long as it is neutral when disabled.
- Infrastructure posture: keep Hazelcast + ClickHouse + supervisord for V1; Kafka/Flink are deferred until measured event volume or recovery requirements exceed the existing bus/audit pattern.

## 23. Open Implementation Questions

- Exact minimum sample thresholds per parameter family after the full 1.7k+ trade corpus is rebuilt under the same capital geometry.
- Whether hard `$400` floors should be a separate ParamSet or remain outside VIBRISS as fixed safety policy.
- How to measure sub-bar TP/cadence opportunity cost in a way compatible with bar-based ADVSL replay.
- Whether intratrade hold contraction deserves a second ParamSet after entry-captured hold advice is validated.
- How much MC/synthetic data is statistically acceptable without overstating confidence in rare-tail regimes.
- Whether PINK can share BLUE priors after venue slippage, fills, and exchange state are included, or must maintain separate priors from day one.

## 24. Recommended First Build

Build VIBRISS V1 as a shadow-only package with:

- `ParamSpec` dataclasses and YAML loader.
- `ParamSetSpec` support for `advsl.hold_substitute.v1`.
- discrete UCB/Thompson learner.
- contextual LinUCB learner stub or implementation.
- advice publisher.
- ClickHouse audit writer.
- MHS-compatible sensor publisher.
- supervisord/container runner definition.
- offline replay harness for conditional fast TP and ADVSL hold bars.
- capital-aware replay and opportunity-cost accounting for the hold substitute.
- no live actuation.
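
To make the "discrete UCB/Thompson learner" item concrete, the following is a minimal sketch of a discounted UCB learner over a finite arm set. It is illustrative only: the class name `DiscountedUCB`, the discount `gamma`, the exploration weight `c`, the JSON checkpoint format, and the use of `log1p` in the bonus term are all assumptions, not values or interfaces fixed by this spec. Rewards are assumed to be bounded, for example in `[0, 1]`.

```python
import json
import math


class DiscountedUCB:
    """Illustrative discounted UCB over a finite arm set (not the spec's code)."""

    def __init__(self, arms, baseline, gamma=0.99, c=1.0):
        if baseline not in arms:
            raise ValueError("baseline must be one of the arms")
        self.arms = list(arms)
        self.baseline = baseline
        self.gamma = gamma  # assumed discount: older evidence decays each update
        self.c = c          # assumed exploration weight
        self.counts = [0.0] * len(self.arms)  # discounted pull counts
        self.sums = [0.0] * len(self.arms)    # discounted reward sums

    def choose(self):
        total = sum(self.counts)
        if total <= 0.0:
            return self.baseline  # no evidence yet: stay on the baseline
        best_arm, best_score = self.baseline, -math.inf
        for arm, n, s in zip(self.arms, self.counts, self.sums):
            if n <= 0.0:
                return arm  # every arm gets one look before scoring starts
            # log1p keeps the bonus defined when the discounted total is < 1
            score = s / n + self.c * math.sqrt(math.log1p(total) / n)
            if score > best_score:
                best_arm, best_score = arm, score
        return best_arm

    def update(self, arm, reward):
        i = self.arms.index(arm)
        # Decay all evidence first, then credit the arm that was played.
        self.counts = [self.gamma * n for n in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[i] += 1.0
        self.sums[i] += reward

    def checkpoint(self):
        return json.dumps(self.__dict__)

    @classmethod
    def restore(cls, blob):
        state = json.loads(blob)
        learner = cls(state["arms"], state["baseline"], state["gamma"], state["c"])
        learner.counts = state["counts"]
        learner.sums = state["sums"]
        return learner
```

In a shadow deployment the chosen arm would only be logged, never applied; the guardrail layer, not the learner, would remain responsible for falling back to the baseline.
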

Recommended package layout:

```text
/mnt/dolphinng5_predict/vibriss/
  __init__.py
  specs.py              # ParamSpec / ParamSetSpec dataclasses and validation
  context.py            # HZ/CH context snapshots, masks, point-in-time joins
  features.py           # deterministic feature construction
  learners/
    __init__.py
    ucb.py              # discounted UCB over finite arms
    thompson.py         # categorical Thompson sampling
    linucb.py           # contextual finite-arm learner
    priors.py           # MARAS/label/asset/side shrinkage priors
  guardrails.py         # hard range, freshness, confidence, drawdown gates
  advice.py             # advice payload builder + schema validation
  publisher.py          # Hazelcast publication
  audit.py              # ClickHouse writer facade and spool priority
  rewards.py            # delayed reward joining and opportunity cost
  replay/
    tape.py             # tape/path loading
    capital_curve.py    # recursive capital replay
    counterfactuals.py  # arm-level exit simulation
    walk_forward.py     # contiguous and moving-window validation
    reports.py          # JSON/CSV/Parquet artifact writers
  runner.py             # live shadow/advisory daemon
  worker.py             # offline subtasks
  cli.py                # ops commands and local replay entry points
  tests/
```

V1 module responsibilities:

| Module | Must do | Must not do |
|---|---|---|
| `specs.py` | validate ranges, modes, required sensors, output surfaces | import live trader code |
| `context.py` | build point-in-time snapshots with freshness masks | fill missing market data with fake zeros |
| `features.py` | compute deterministic feature vectors | read future outcome labels |
| `learners/*` | expose `choose`, `update`, `checkpoint`, `restore` | know about ADVSL internals |
| `guardrails.py` | enforce hard safety and fallback | optimize reward |
| `advice.py` | produce schema-valid advice payloads | publish directly to HZ |
| `publisher.py` | write HZ advice and heartbeat | mutate engine state |
| `rewards.py` | join decisions to realized/counterfactual outcomes | update policy without reward status |
| `replay/*` | reproduce capital-aware backtests | depend on live HZ |
| `runner.py` | run shadow loops and MHS payloads | run full replay inline |
| `worker.py` | run heavy calibration/replay jobs | publish live advice |

Minimum local commands:

```bash
python -m vibriss.cli validate-specs \
  --spec-dir /mnt/dolphin_training/vibriss/specs

python -m vibriss.cli replay \
  --param-set advsl.hold_substitute.v1 \
  --namespace blue \
  --from 2026-05-01 --to 2026-06-04 \
  --out /mnt/dolphin_training/vibriss/replays/manual

python -m vibriss.runner \
  --mode shadow \
  --namespace blue \
  --spec-dir /mnt/dolphin_training/vibriss/specs \
  --state-dir /mnt/dolphin_training/vibriss/checkpoints
```

Minimum test set:

| Test | Purpose |
|---|---|
| `test_spec_validation.py` | rejects invalid ranges, missing sensors, unsafe live policies. |
| `test_advice_schema.py` | validates HZ payloads and expiry/fallback fields. |
| `test_guardrails.py` | proves stale OBF/MARAS and drawdown alarms force fallback. |
| `test_replay_determinism.py` | same tape/spec/seed gives same capital curve. |
| `test_opportunity_cost.py` | recovered cut trades charge missed upside. |
| `test_priority_spool.py` | high-priority decision/reward rows flush before diagnostics. |
| `test_mode_state_machine.py` | promotion is manual; demotion is automatic. |
| `test_no_live_actuation_default.py` | default env cannot make engine consume advice. |

The first acceptance test is not "did it make more money in-sample." The first acceptance test is:

1. the same historical decision can be replayed deterministically,
2. every recommended parameter has a valid spec and guardrail trail,
3. baseline fallback is used under stale/low-confidence context,
4. reward accounting includes clipped-winner opportunity cost,
5. the replayed capital curve is reproducible.
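
The determinism and reproducibility criteria above can be checked mechanically. The sketch below is a toy, assuming a hypothetical tape format (one list of per-bar returns per trade), a hypothetical seeded slippage term, and hypothetical function names `replay_capital_curve` and `curve_fingerprint`; none of these come from the real replay harness. It shows the shape of the check: a local seeded RNG, a recursive capital curve, and a stable fingerprint that two runs must share.

```python
import hashlib
import json
import random


def replay_capital_curve(tape, hold_bars, seed, start_capital=1000.0):
    """Toy replay: hold each trade for at most `hold_bars` bars, compound capital."""
    rng = random.Random(seed)  # local RNG so global random state cannot leak in
    capital = start_capital
    curve = [capital]
    for bar_returns in tape:
        trade_return = 1.0
        for r in bar_returns[:hold_bars]:
            trade_return *= 1.0 + r
        slippage = rng.uniform(0.0, 0.0005)  # hypothetical seeded noise term
        capital *= trade_return * (1.0 - slippage)
        curve.append(capital)
    return curve


def curve_fingerprint(curve, ndigits=8):
    """Hash a rounded curve so two runs can be compared by a single string."""
    payload = json.dumps([round(x, ndigits) for x in curve]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


tape = [[0.01, -0.02, 0.03], [0.0, 0.01]]
first = replay_capital_curve(tape, hold_bars=2, seed=7)
second = replay_capital_curve(tape, hold_bars=2, seed=7)
assert curve_fingerprint(first) == curve_fingerprint(second)
```

Storing the fingerprint alongside the tape, spec version, and seed would let a later run confirm that it reproduced the same capital curve rather than a similar-looking one.
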

The first useful artifact is a replay bundle, not a daemon:

```text
replay_summary.json
capital_curve.csv
per_trade_counterfactuals.parquet
opportunity_cost_audit.parquet
maras_hash_hold_priors.parquet
obf_hold_binding_report.json
walk_forward_summary.json
```

Only after that bundle is reproducible should the shadow runner be started.
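
One way to gate the shadow runner on bundle reproducibility is to compare two independently produced bundles file by file. The sketch below assumes hypothetical helper names (`missing_artifacts`, `bundle_digests`, `bundles_match`) and assumes that byte-identical output is the reproducibility criterion, which the spec does not state. Only the seven artifact file names are taken from the list above.

```python
import hashlib
from pathlib import Path

# Artifact names taken from the replay bundle list; everything else is illustrative.
REQUIRED_ARTIFACTS = (
    "replay_summary.json",
    "capital_curve.csv",
    "per_trade_counterfactuals.parquet",
    "opportunity_cost_audit.parquet",
    "maras_hash_hold_priors.parquet",
    "obf_hold_binding_report.json",
    "walk_forward_summary.json",
)


def missing_artifacts(bundle_dir):
    """Return the required artifact names that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]


def bundle_digests(bundle_dir):
    """Map each required artifact to the SHA-256 hex digest of its bytes."""
    root = Path(bundle_dir)
    return {
        name: hashlib.sha256((root / name).read_bytes()).hexdigest()
        for name in REQUIRED_ARTIFACTS
    }


def bundles_match(dir_a, dir_b):
    """True only if both bundles are complete and every artifact is byte-identical."""
    if missing_artifacts(dir_a) or missing_artifacts(dir_b):
        return False
    return bundle_digests(dir_a) == bundle_digests(dir_b)
```

Byte equality is a strict criterion. Files that embed timestamps or writer metadata may differ between runs even when the data is identical, in which case a content-level comparison (for example, comparing loaded tables) may be more appropriate for those artifacts.
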