# LONG Deterministic Rule Research

Date: 2026-05-07

## Goal

Find the simplest deterministic long-side market rule, using primarily Dolphin
NG eigendata, that behaves like the original short Alpha Engine rule in spirit:

- few moving parts
- market-structural
- explainable in one breath
- reliable enough to serve as a basal gate before asset selection and later
  overlays

This note is explicitly **not** about a fitted long model.

## Data source

The analysis uses the raw daily scan cache summarized by:

- `adaptive_exit/characterize_long_signals.py`
- `/mnt/dolphin_training/long_signal_research/long_signal_scan_summary_h24.parquet`
- `/mnt/dolphin_training/long_signal_research/long_signal_characterization_report.json`

Only eigendata and scan-price-derived outcomes are used here:

- `instability_50`
- `v50/v150/v300/v750_lambda_max_velocity`
- `vel_div`
- `vel_div` lag / delta terms

No ExF, EsoF, or OBF are required for the core finding.

## What does **not** work as the basal long rule

The obvious mirror thesis, `vel_div > 0.01`, is too weak to be the basal long
edge.

Recent HQ slice (`2025-12-31` onward):

- support: `39.65%`
- `strong_long` lift: `1.15x`
- `broad_long` lift: `1.22x`

That is not useless, but it is neither elegant nor selective enough to be the
long analogue of `vel_div < -0.02`.

## Strongest deterministic shape

The long side shows up most clearly as a **stressed unwind / squeeze** regime,
not as a generic bullish breakout regime.

### Candidate primary deterministic rule

```text
LONG_REGIME if
    instability_50 >= 20.5
    and v300_lambda_max_velocity < 0
    and v750_lambda_max_velocity < 0
```

Interpretation:

- `instability_50 >= 20.5`: the market is structurally stressed
- `v300 < 0` and `v750 < 0`: the slower eigenspace is still negative / damaged
- together: this is a high-stress unwind state where long opportunities tend to
  appear as reversals / squeezes on the same manifold that produces short
  dislocations
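The gate is simple enough to express directly. The following is a minimal Python sketch; the `EigenRow` container and the `long_regime` name are illustrative assumptions, and only the column names and thresholds come from this note.

```python
from dataclasses import dataclass

# Rounded recent-HQ 90th percentile of instability_50 (20.546996...).
INSTABILITY_THRESHOLD = 20.5


@dataclass(frozen=True)
class EigenRow:
    """Hypothetical container for the three eigendata columns the gate reads."""

    instability_50: float
    v300_lambda_max_velocity: float
    v750_lambda_max_velocity: float


def long_regime(row: EigenRow) -> bool:
    """Stressed-unwind gate: high instability while both slow velocities are negative."""
    return (
        row.instability_50 >= INSTABILITY_THRESHOLD
        and row.v300_lambda_max_velocity < 0
        and row.v750_lambda_max_velocity < 0
    )


print(long_regime(EigenRow(21.3, -0.04, -0.01)))  # True
print(long_regime(EigenRow(21.3, 0.02, -0.01)))   # False: v300 is not negative
```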

### Why `20.5`

`20.5` is the rounded recent-HQ `instability_50` 90th-percentile threshold
(`20.546996...`). It is the most practical fixed threshold found in the
recent-era characterization.

## Empirical support

### Recent HQ (`2025-12-31` onward)

Base rates:

- `strong_long`: `0.1648`
- `broad_long`: `0.1367`

Rule:

- support: `6356` rows (`3.58%`)
- `strong_long`: `0.3409` (`2.07x` lift)
- `broad_long`: `0.3538` (`2.59x` lift)

### Full history

Base rates:

- `strong_long`: `0.2603`
- `broad_long`: `0.2472`

Rule:

- support: `300,728` rows (`12.59%`)
- `strong_long`: `0.3330` (`1.28x` lift)
- `broad_long`: `0.3375` (`1.37x` lift)
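The reported lifts are consistent with "outcome rate inside the gate divided by the unconditional base rate" (for example `0.3409 / 0.1648 ≈ 2.07` and `0.3330 / 0.2603 ≈ 1.28`). A small sketch of that computation, assuming boolean per-row gate and outcome columns (the function name is hypothetical):

```python
from typing import Sequence


def support_and_lift(gate: Sequence[bool], outcome: Sequence[bool]) -> tuple[float, float, float]:
    """Return (support, outcome rate inside the gate, lift over the unconditional base rate)."""
    if len(gate) != len(outcome) or not gate:
        raise ValueError("gate and outcome must be aligned and non-empty")
    gated = [hit for fired, hit in zip(gate, outcome) if fired]
    base_rate = sum(outcome) / len(outcome)
    if not gated or base_rate == 0:
        raise ValueError("lift is undefined when the gate never fires or the base rate is zero")
    hit_rate = sum(gated) / len(gated)
    return len(gated) / len(gate), hit_rate, hit_rate / base_rate


# Ten rows: the gate fires on two, one of which is a strong_long outcome.
gate = [True, True] + [False] * 8
strong_long = [True, False, True] + [False] * 7
print(support_and_lift(gate, strong_long))  # (0.2, 0.5, 2.5)
```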

## Simpler fallback

If maximum elegance is preferred over extra selectivity, the one-factor
fallback is:

```text
LONG_REGIME_SIMPLE if instability_50 >= 20.5
```

Recent HQ:

- support: `10.10%`
- `strong_long`: `0.3297` (`2.00x` lift)
- `broad_long`: `0.3420` (`2.50x` lift)

This is surprisingly strong for a one-variable rule. It is the closest thing
found to a pure long-side analogue of the short `vel_div < -0.02` gate.

Tradeoff:

- simpler
- broader
- slightly less selective than adding `v300 < 0` and `v750 < 0`

## Optional stricter confirmation

If later tuning calls for more explicit “healing after stress” confirmation, the
strict variant is:

```text
LONG_REGIME_STRICT if
    instability_50 >= 20.5
    and vel_div_lag6 < -0.03
    and vel_div_delta6 > 0.02
```

This is directionally sensible, but it is not materially better than the
`instability_50 + v300 + v750` rule, so it should be treated as an optional
refinement, not the basal rule.
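This note does not define `vel_div_lag6` and `vel_div_delta6`. The sketch below assumes they mean "`vel_div` six bars ago" and "current `vel_div` minus that lagged value"; treat both definitions, and the function name, as hypothetical.

```python
from typing import Optional, Sequence

LAG = 6  # assumed meaning of the "6" in vel_div_lag6 / vel_div_delta6: six bars


def long_regime_strict(instability_50: float, vel_div_history: Sequence[float]) -> Optional[bool]:
    """Strict variant; vel_div_history is ordered oldest -> newest.

    Returns None until LAG + 1 bars are available.
    """
    if len(vel_div_history) < LAG + 1:
        return None
    vel_div_lag6 = vel_div_history[-1 - LAG]             # assumed: value 6 bars ago
    vel_div_delta6 = vel_div_history[-1] - vel_div_lag6  # assumed: change over 6 bars
    return instability_50 >= 20.5 and vel_div_lag6 < -0.03 and vel_div_delta6 > 0.02


# Stressed six bars ago (-0.05), healed back to 0.0 now: delta6 = +0.05.
history = [-0.05, -0.04, -0.03, -0.02, -0.01, 0.0, 0.0]
print(long_regime_strict(21.0, history))  # True
```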

## Monthly sanity check

For the candidate primary rule (`instability_50 >= 20.5 && v300 < 0 && v750 < 0`)
in the recent HQ window:

- `2026-01`: `strong_long = 0.348`
- `2026-02`: `strong_long = 0.344`
- `2026-03`: `strong_long = 0.312`

The monthly base rates for the same period were:

- `2026-01`: `0.289`
- `2026-02`: `0.271`
- `2026-03`: `0.068`

So even into the weak March tape, the rule remains elevated relative to base.

## Practical interpretation

This should be viewed as a **market-state gate**, not a complete trade engine.

It says:

- “the market is in the sort of stressed, damaged regime where long squeeze /
  unwind opportunities become meaningfully more likely”

It does **not** by itself say:

- which asset is the best expression
- how to size
- how to exit

That is where the next layers belong:

- deterministic or learned asset selection
- OBF / ARS / bounce overlays
- TP / MAX_HOLD policy

## Recommendation

If a single deterministic long gate must be named now, use:

```text
LONG_REGIME if instability_50 >= 20.5 and v300 < 0 and v750 < 0
```

If maximum simplicity is the priority, use:

```text
LONG_REGIME_SIMPLE if instability_50 >= 20.5
```

And explicitly do **not** promote `vel_div > 0.01` as the basal long rule.

## Deferred analysis idea: dual-shadow regime sampler

This is a **later analysis / control-layer research note**, not a live-rule
recommendation.

One plausible way to sample the market in real time without committing the full
system immediately is a very lightweight **dual-shadow engine**:

- Shadow A: the basal SHORT engine (`vel_div < -0.02` Alpha Engine posture)
- Shadow B: the basal LONG engine (currently the older negative-`vel_div`
  mean-reversion LONG posture is the best simple candidate)

The intent is not merely paper PnL logging. It is to use live, recent
sample-trade outcomes as a **micro-regime probe**:

- if SHORT shadow performance degrades while LONG shadow performance improves,
  the tape may have rotated into a LONG-favorable regime
- if LONG degrades while SHORT improves, the inverse may be true
- if both are performing acceptably, the tape may be permissive / broad enough
  that either side can express edge
- if both are failing, the tape is likely choppy / non-coherent and abstention
  becomes a first-class candidate

This should be implemented, if ever pursued, as:

- very fast
- very lightweight
- explicitly shadow-only at first
- based on small, recent sample trades rather than a heavy fitted model

Longer-term, the entire shadow stream can itself become training data:

- market fingerprints at shadow-entry time
- concurrent SHORT-shadow and LONG-shadow outcomes
- relative WR / ROI-per-trade / drawdown / time-to-win asymmetries

That would allow a later learner to predict or simplify the regime switcher.
But even before ML, the dual-shadow process may already serve as a useful
real-time market-sampling / regime-detection mechanism.

## Dual-shadow persistence characterization

This section records the first persistence pass over extant trades. The goal
was not to prove a full regime-switch system, but to test whether the observed
short-loss streaks are durable enough to justify a regime-favorableness probe.

Important caveat:

- the live SHORT series and the replay LONG series are on different date spans
- this is therefore a side-specific persistence study, not a same-bar paired
  dominance study
- the numbers below are still useful for run-length and hysteresis design

### Live SHORT stream

From the current BLUE trader log:

- trades: `234`
- win rate: `44.44%`
- mean `pnl_pct`: `+0.000506`
- median `pnl_pct`: `-0.000234`
- average win streak: `1.65 trades`
- average loss streak: `2.03 trades`
- `P(win -> win) = 0.394`
- `P(loss -> loss) = 0.512`
- average positive-day run: `1.5 days`
- average negative-day run: `1.5 days`

Interpretation:

- short failures do cluster
- the cluster is real enough to notice
- but it is only mildly persistent
- by itself, it is not strong enough to justify a raw ping-pong switch

### Basal LONG shadow, old mirror posture

Using the recent bullish-month replay and the single comparable `10-bar /
worst_10bar` configuration:

- trades: `2,243`
- win rate: `48.33%`
- mean `pnl_pct`: `+0.000320`
- median `pnl_pct`: `-0.000400`
- average win streak: `1.93 trades`
- average loss streak: `2.07 trades`
- `P(win -> win) = 0.483`
- `P(loss -> loss) = 0.517`
- average positive-day run: `3.0 days`
- average negative-day run: `1.86 days`

Interpretation:

- this is the clearest durable long-favorable candidate seen so far
- the multi-day positive run length is materially better than the live short
  stream
- this supports a long-favorable regime probe, but not an unconditional flip

### Basal LONG shadow, new stressed-unwind posture

Same replay setup:

- trades: `569`
- win rate: `50.44%`
- mean `pnl_pct`: `-0.000078`
- median `pnl_pct`: `+0.000068`
- average win streak: `2.24 trades`
- average loss streak: `2.20 trades`
- `P(win -> win) = 0.556`
- `P(loss -> loss) = 0.546`
- average positive-day run: `1.36 days`
- average negative-day run: `1.18 days`

Interpretation:

- the new long posture has decent local persistence
- but it is more fragile than the mirror-long posture as a regime switch
- it does not yet justify itself as the primary flip trigger

### Conclusion for regime switching

The data support a **smoothed regime-favorableness detector**, not a raw
flip-on-first-loss system.

Practical reading:

- short-loss streak persistence is real but modest
- long-favorable states exist and can persist
- persistence is on the order of a few trades, not a dramatic regime lock
- the correct implementation is a shadow score with hysteresis and abstain
  logic, not a hard immediate SHORT/LONG switch

Suggested rule shape for later analysis:

- compute rolling shadow scores for SHORT and LONG
- use persistence thresholds before flipping
- require stronger evidence to reverse than to stay put
- abstain when both shadows are weak or both are losing

This is enough to justify the next engineering step:

- live dual-shadow logging on the same bars
- market-fingerprint tagging of each shadow entry
- later ML over shadow outcomes if the deterministic layer proves stable
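The suggested rule shape (rolling scores, persistence before flipping, asymmetric evidence, abstain) can be sketched as a small hysteretic selector. Everything here is hypothetical: the class name, the rolling-mean score, and the `window`, `flip_margin`, `persist` and `abstain_floor` values are illustrative placeholders, not calibrated settings.

```python
from collections import deque
from typing import Deque, Literal

Side = Literal["SHORT", "LONG", "ABSTAIN"]


class ShadowSideSelector:
    """Hypothetical hysteretic side selector over two same-bar shadow return streams."""

    def __init__(self, window: int = 5, flip_margin: float = 0.002,
                 persist: int = 3, abstain_floor: float = 0.0) -> None:
        self.short_rets: Deque[float] = deque(maxlen=window)
        self.long_rets: Deque[float] = deque(maxlen=window)
        self.flip_margin = flip_margin
        self.persist = persist
        self.abstain_floor = abstain_floor
        self.side: Side = "SHORT"
        self._pressure = 0  # consecutive updates in which the other side led by > flip_margin

    def update(self, short_ret: float, long_ret: float) -> Side:
        """Record one shadow pair and return the side to trade next."""
        self.short_rets.append(short_ret)
        self.long_rets.append(long_ret)
        short_score = sum(self.short_rets) / len(self.short_rets)
        long_score = sum(self.long_rets) / len(self.long_rets)
        if short_score < self.abstain_floor and long_score < self.abstain_floor:
            self.side, self._pressure = "ABSTAIN", 0  # both shadows losing
        elif self.side == "ABSTAIN":
            self.side = "LONG" if long_score > short_score else "SHORT"
        else:
            lead = long_score - short_score if self.side == "SHORT" else short_score - long_score
            self._pressure = self._pressure + 1 if lead > self.flip_margin else 0
            if self._pressure >= self.persist:  # reversal needs sustained evidence
                self.side = "LONG" if self.side == "SHORT" else "SHORT"
                self._pressure = 0
        return self.side
```

Staying put costs nothing, while reversing requires the other side to lead by `flip_margin` for `persist` consecutive updates; that asymmetry is what makes the switch hysteretic rather than ping-pong.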

## Rolling flip-worthiness test

To make the side-switch question stricter, the recent live short slice was
retested with a `5-trade` rolling shadow-delta proxy:

- short shadow return = actual live short `pnl_pct`
- long shadow return = counterfactual `-pnl_pct - fee`
- rolling delta = rolling mean of `(long_shadow - short_shadow)`
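Because the long shadow is derived from the short return `r`, the per-trade delta is `(-r - fee) - r = -2r - fee`, which is positive only when the short lost more than half the fee. A sketch of the rolling proxy with a three-way bucket follows; the 4 bps fee (borrowed from the later counterfactual method), the `deadband`, and the bucket labels are assumptions, since the exact flip-signal definition is not recorded here.

```python
from typing import Sequence

FEE = 0.0004  # assumed round-trip fee, matching the 4 bps used later in this note


def rolling_shadow_buckets(short_pnl_pct: Sequence[float], window: int = 5,
                           deadband: float = 0.0005) -> list[str]:
    """Label each full rolling window as 'long', 'short' (favourable) or 'neutral'."""
    long_shadow = [-r - FEE for r in short_pnl_pct]
    delta = [lg - sh for lg, sh in zip(long_shadow, short_pnl_pct)]
    labels = []
    for end in range(window, len(delta) + 1):
        mean_delta = sum(delta[end - window:end]) / window
        if mean_delta > deadband:
            labels.append("long")
        elif mean_delta < -deadband:
            labels.append("short")
        else:
            labels.append("neutral")
    return labels


print(rolling_shadow_buckets([-0.01] * 5))  # ['long']
```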

Recent 3-day slice (`2026-05-04` to `2026-05-06`):

- trades: `168`
- short actual WR: `39.88%`
- short actual compounded return: `+10.02%`
- long counterfactual WR: `47.62%`
- long counterfactual compounded return: `-16.92%`
- flip-to-long signals from the `5-trade` rolling delta: `68`
- flip-to-short signals from the `5-trade` rolling delta: `79`

Interpretation:

- the rolling delta does detect alternating regime pockets
- but it does so often enough that a raw flip would be too twitchy
- on the most recent 30 live trades, the regime buckets were:
  - `13` long-favorable
  - `7` short-favorable
  - `10` neutral
- the long-favorable bucket had positive expected PnL, but the short-favorable
  bucket was also positive and slightly stronger

The important point is that the signal is not “switch now on first loss.”
It is:

- keep a smoothed side-dominance score
- require persistence before flipping
- use hysteresis
- abstain when the shadow spread is weak or oscillatory

So the stricter test reinforces the earlier conclusion:

- there is enough structure to justify a regime-favorableness detector
- there is not yet enough stability to justify a raw mechanical flip
- the right next step is live dual-shadow logging on the same bars, then
  threshold and persistence calibration on that shared stream

## Flip-after-loss counterfactual

The actual live short ledger was also replayed under a simple finite-state
side-switch rule:

- start `SHORT`
- if the current side loses `N` trades in a row, flip to the other side
- keep applying the same rule across the whole trade sequence

This is the cleanest way to test the idea “short losses are the long cue.”
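A minimal replay of that finite-state rule, assuming per-trade returns compound on full equity, the flipped leg earns `-r - fee` with a hypothetical 4 bps fee, and a zero return counts as a loss. Setting `n` larger than the ledger length reproduces always-short.

```python
from typing import Sequence

FEE = 0.0004  # assumed per-trade fee for the counterfactual long leg


def flip_after_losses(short_returns: Sequence[float], n: int) -> dict[str, float]:
    """Start SHORT; after n consecutive losses on the current side, flip sides."""
    side, streak, flips, wins = "SHORT", 0, 0, 0
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in short_returns:
        ret = r if side == "SHORT" else -r - FEE
        equity *= 1.0 + ret
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)
        if ret > 0:
            wins, streak = wins + 1, 0
        else:
            streak += 1
            if streak >= n:  # loss cluster reached: switch side and reset the count
                side = "LONG" if side == "SHORT" else "SHORT"
                streak, flips = 0, flips + 1
    return {
        "win_rate": wins / len(short_returns) if short_returns else 0.0,
        "compounded_return": equity - 1.0,
        "max_dd": max_dd,
        "flips": float(flips),
    }
```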

On the current `234`-trade live ledger:

- always short: WR `44.44%`, compounded return `+11.35%`, max DD `5.71%`
- always long: WR `44.87%`, compounded return `-20.13%`, max DD `23.09%`

Threshold sweep:

- `N=1`: WR `40.60%`, compounded return `+5.33%`, max DD `11.11%`, flips `139`
- `N=2`: WR `44.44%`, compounded return `-17.72%`, max DD `17.77%`, flips `43`
- `N=3`: WR `48.29%`, compounded return `+5.48%`, max DD `6.35%`, flips `13`
- `N=4`: WR `47.86%`, compounded return `+6.21%`, max DD `6.55%`, flips `7`
- `N=5`: WR `43.59%`, compounded return `+10.52%`, max DD `5.59%`, flips `5`
- `N=6`: WR `45.73%`, compounded return `+15.17%`, max DD `4.84%`, flips `3`

Interpretation:

- side switching can help
- it helps best when the flip threshold is fairly high
- the best observed threshold in this small grid was `N=6`
- low thresholds are too twitchy and can destroy the edge

So the practical conclusion is:

- a raw flip-on-first-loss rule is not justified
- a slower loss-cluster regime switcher is plausible
- the switcher must be hysteretic and persistence-gated

This is consistent with the earlier shadow-score recommendation and explains
why the observed “8 or 9 losses, then a couple wins” pattern can be useful
without being directly automatable at a low threshold.

## Condition-gated flip replay

I then reran the side-switch counterfactual with an additional gate:

- the current side must first hit `N` consecutive losses
- the opposite side must also satisfy its own deterministic long/short entry condition
- the replay uses the same 10-bar tape skeleton and the worst-10-bar asset expression

Two long theories were tested separately:

- **Old mirror-long**: `vel_div < -0.02` and cross-sectional 10-bar momentum `< 0`
- **New stressed-unwind long**: `instability_50 >= 20.5` and `v300 < 0` and `v750 < 0`
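The gates and the asset expression can be sketched as below. Two readings are assumptions: "cross-sectional 10-bar momentum" is taken as the mean trailing 10-bar return across assets, and "worst-10-bar asset expression" as the asset with the lowest trailing 10-bar return. All function names are hypothetical.

```python
from statistics import fmean
from typing import Mapping, Optional


def old_mirror_long(vel_div: float, ret_10bar: Mapping[str, float]) -> bool:
    """vel_div < -0.02 and cross-sectional 10-bar momentum (assumed: mean) < 0."""
    return vel_div < -0.02 and bool(ret_10bar) and fmean(ret_10bar.values()) < 0


def stressed_unwind_long(instability_50: float, v300: float, v750: float) -> bool:
    return instability_50 >= 20.5 and v300 < 0 and v750 < 0


def worst_10bar_asset(ret_10bar: Mapping[str, float]) -> Optional[str]:
    """Assumed reading of the worst-10-bar expression: lowest trailing 10-bar return."""
    return min(ret_10bar, key=ret_10bar.__getitem__) if ret_10bar else None


def gated_flip(loss_streak: int, n: int, opposite_gate: bool) -> bool:
    """Flip only after n consecutive losses AND when the opposite side's gate is open."""
    return loss_streak >= n and opposite_gate


rets = {"ALGOUSDT": 0.004, "DASHUSDT": -0.012, "STXUSDT": -0.003}
print(old_mirror_long(-0.025, rets), worst_10bar_asset(rets))  # True DASHUSDT
print(gated_flip(loss_streak=6, n=6, opposite_gate=False))      # False
```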

Results on the long research windows:

- old mirror-long becomes marginally usable only at high thresholds:
  - `N=5`: WR `47.00%`, compounded return `+6.34%`, DD `46.23%`, flips `11`
  - `N=6`: WR `46.52%`, compounded return `+28.34%`, DD `43.78%`, flips `5`
- the new stressed-unwind long does **not** survive this gate cleanly:
  - `N=1..6`: compounded return stays negative, with severe drawdown

Interpretation:

- the condition gate does not rescue the new long theory
- it does preserve the old mirror-long as a late, low-frequency fallback
- the market still looks too unstable for a low-threshold flip rule
- if we keep this path, it should be a smoothed regime sampler, not an immediate switcher

Report:

- [`flip_on_loss_condition_gate_report.md`](</mnt/dolphinng5_predict/run_logs/flip_on_loss_condition_gate_report.md>)

## Full-history condition-gated replay

I then ran the same condition-gated flip simulator across the entire
available price tape:

- root: `/mnt/dolphin_training/share_offload/vbt_cache_klines`
- rows: `2,553,401`
- span: `2021-06-15 00:01:00+00:00 -> 2026-03-18 18:16:40.041456896+00:00`

This is the hardest and most useful stress test because it removes the
recent-slice bias entirely.

Results:

- **old mirror-long**
  - `N=1..6` win rate range: `44.95% -> 46.60%`
  - best mean PnL at `N=6`: `-0.000163` per trade
  - best threshold still compounds to `-100%` over the full archive
- **new stressed-unwind long**
  - `N=1..6` win rate range: `44.16% -> 46.86%`
  - best mean PnL at `N=6`: `-0.000218` per trade
  - best threshold also compounds to `-100%`

Interpretation:

- the condition gate does not rescue either long theory at full-archive scale
- the old mirror-long is still the stronger of the two, but only marginally
- the long-side edge, if it exists, is too weak or too regime-dependent to
  survive this archive-wide flip rule without additional filtering
- the full-tape result is a warning against over-trusting the favorable
  recent-month slices

Report:

- [`flip_on_loss_condition_gate_stream_full_report.md`](</mnt/dolphinng5_predict/run_logs/flip_on_loss_condition_gate_stream_full_report.md>)

## Post-outlier-short-win long-flip probe

Motivation: the May 8 live footer showed a familiar-looking pattern:

- large 9x short win, e.g. `ALGOUSDT` `+$466` or `VETUSDT` `+$574`
- immediately followed by a somewhat larger-than-normal short loss, e.g.
  `DASHUSDT -$191` or `STXUSDT -$54`

The question was whether this is a real post-outlier rebound signature:

```text
after a very large short win,
should the next trade, or next few trades, be treated as LONG candidates?
```

Dataset and hygiene:

- source: BLUE only
- ClickHouse `dolphin.trade_events`: `1305` rows, `1296` unique trade IDs
- trader logs: `1712` exit rows, `1092` unique trade IDs
- merged near-duplicate-cleaned sequence: `1609` unique trade IDs
- analysis subset after excluding hibernate / subday ACB exits: `1321` trades
- span: `2026-03-31 01:10:34 UTC` to `2026-05-08 13:26:06 UTC`

The log and warehouse streams overlap but do not have perfectly identical
timestamps, so the analysis de-duplicates by trade id where possible and by
near-time / asset / reason / realized PnL where the same exit was written by
both paths. This matters because a naive merge double-counts many recent exits.
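A sketch of that two-stage de-duplication: first by trade id, then by near-time rows with the same asset, exit reason and realized PnL. The `ExitRecord` fields, the 5-second window and the $0.01 PnL tolerance are hypothetical choices, not the values used in this replay.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class ExitRecord:
    trade_id: Optional[str]  # None when a source row carries no usable id
    ts: datetime
    asset: str
    reason: str
    pnl: float


def dedupe_exits(records: Iterable[ExitRecord],
                 time_tol: timedelta = timedelta(seconds=5),
                 pnl_tol: float = 0.01) -> list[ExitRecord]:
    """Drop repeated trade ids, then near-time rows with identical asset/reason/PnL."""
    kept: list[ExitRecord] = []
    seen_ids: set[str] = set()
    for rec in sorted(records, key=lambda r: r.ts):
        if rec.trade_id is not None and rec.trade_id in seen_ids:
            continue
        is_dup = False
        for prev in reversed(kept):  # kept is time-ordered, newest last
            if rec.ts - prev.ts > time_tol:
                break  # older rows are even further away in time
            if (prev.asset == rec.asset and prev.reason == rec.reason
                    and abs(prev.pnl - rec.pnl) <= pnl_tol):
                is_dup = True
                break
        if rec.trade_id is not None:
            seen_ids.add(rec.trade_id)
        if not is_dup:
            kept.append(rec)
    return kept
```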

Counterfactual method:

- keep the same entry/exit skeleton
- actual side is the live BLUE short
- counterfactual long return is approximated as `-short_return - 4 bps`
- this is not a separately selected long engine; it only tests whether the
  immediate post-win tape direction would have favored the other side

Baseline over the cleaned sequence:

- always short: `1321` trades, WR `55.79%`, mean return/trade `+0.0781%`,
  compounded return `+166.36%`, max DD `15.70%`
- always long on the same skeleton: WR `38.46%`, mean return/trade `-0.1181%`,
  compounded return `-80.08%`, max DD `80.48%`

So the full ledger does **not** support a broad long flip. The question only
survives as a narrow post-outlier condition.

Primary post-outlier trigger:

```text
trigger if prior trade:
  pnl_abs >= $400
  leverage >= 8.5x
  pnl_pct >= +0.50%
```
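A sketch of the trigger and the immediate next-trade comparison. It assumes `pnl_pct` is the return on position notional and uses a hypothetical `notional` field to turn the `-short_return - 4 bps` counterfactual into dollars; the replay's actual dollar estimator is not shown in this note.

```python
from dataclasses import dataclass
from typing import Sequence

FEE = 0.0004  # 4 bps, as in the counterfactual method above


@dataclass(frozen=True)
class Trade:
    pnl_abs: float   # realized SHORT PnL in USD (signed)
    leverage: float
    pnl_pct: float   # realized SHORT return as a fraction (0.005 == +0.50%)
    notional: float  # hypothetical field: position notional for the dollar estimate


def is_outlier_short_win(t: Trade) -> bool:
    return t.pnl_abs >= 400 and t.leverage >= 8.5 and t.pnl_pct >= 0.005


def next_trade_flip_delta(trades: Sequence[Trade]) -> tuple[int, float, float]:
    """Return (triggers, leave-SHORT USD, flip-LONG USD) over trades right after a trigger."""
    triggers, short_usd, long_usd = 0, 0.0, 0.0
    for prev, nxt in zip(trades, trades[1:]):
        if is_outlier_short_win(prev):
            triggers += 1
            short_usd += nxt.pnl_abs
            long_usd += nxt.notional * (-nxt.pnl_pct - FEE)  # same-skeleton long estimate
    return triggers, short_usd, long_usd
```

The marginal overlay value is then `long_usd - short_usd` on the affected trades only.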

Immediate next-trade result:

- triggers: `47`
- next trades affected: `47`
- actual next short subset: WR `53.19%`, mean return `-0.0821%`,
  compounded return `-4.05%`, realized PnL `-$1,725.40`
- flipped-to-long subset: WR `40.43%`, mean return `+0.0421%`,
  compounded return `+1.72%`, estimated PnL `+$409.47`
- estimated dollar delta: `+$2,134.88`
- whole-sequence policy if only those next trades are flipped:
  compounded return improves from `+166.36%` to `+182.38%`
  and max DD improves from `15.70%` to `13.33%`

The stricter trigger `pnl_abs >= $400`, `leverage >= 8.5x`,
`pnl_pct >= +0.95%` is similar:

- triggers: `46`
- actual next short subset: `-$1,534.21`
- flipped-to-long estimate: `+$276.64`
- estimated dollar delta: `+$1,810.85`
- whole-sequence compounded return: `+180.91%`

The effect is strongest on the immediately following trade. It decays quickly:

- next `2` trades after the primary trigger: affected `91`, actual `-$2,689.16`,
  flipped estimate `+$555.98`, dollar delta `+$3,245.15`
- next `3` trades: affected `134`, actual `-$2,357.77`, flipped estimate
  `-$588.02`; dollar delta about `+$1,770`, still positive because the flip
  loses less than leaving the trades SHORT
- next `5` trades: benefit becomes materially less clean

Examples from the live tail:

- `ALGOUSDT` `2026-05-08 09:55 UTC`, `+466.34`, `9x`, `+0.929%`
  - next trade `DASHUSDT`: actual short `-191.19`; same-skeleton long would
    have been directionally positive after fee
- `VETUSDT` `2026-05-08 12:37 UTC`, `+573.64`, `9x`, `+1.546%`
  - next trade `STXUSDT`: actual short `-53.52`; same-skeleton long would
    have been directionally positive after fee
- larger historic outlier `STXUSDT` `2026-05-05 20:29 UTC`, `+6796.86`,
  `9x`, `+13.845%`
  - the following trade was a small short loss, and the next several trades
    were mixed rather than uniformly long-favorable

Interpretation:

- there is a real event-conditioned post-outlier rebound / exhaustion signal
- it is not a win-rate improvement; it is a dollar / drawdown improvement
- it should not be promoted as a general long engine
- it is best framed as a one-trade post-outlier **long probe** or short
  cooldown candidate, not as a multi-trade regime flip

Relationship to the long-system research:

- this is different from both deterministic long theories already studied:
  - old mirror-long: negative `vel_div` mean-reversion long
  - new stressed-unwind long: high instability plus negative slow velocities
- the post-outlier signal is more local and path-conditioned:
  - a violent short win likely means the chosen asset or local basket has
    just completed an exhaustion leg
  - the next trade may therefore be more exposed to a rebound (continuation
    against the short) than to fresh downside continuation
- this should become a feature inside the dual-shadow side-selection sampler:
  - `last_trade_was_outlier_short_win`
  - `last_trade_leverage`
  - `last_trade_realized_pnl_abs`
  - `last_trade_return_pct`
  - `bars_since_outlier_win`
  - `same_asset_or_correlated_asset_followup`
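A sketch of how those features might be assembled for the next entry. The `ClosedShort` fields, the bar-index convention, and the `correlated` asset set are hypothetical; "outlier" reuses the primary trigger above, and `same_asset_or_correlated_asset_followup` is assumed to refer to the most recent outlier win.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClosedShort:
    asset: str
    exit_bar: int
    pnl_abs: float   # realized USD PnL (signed)
    leverage: float
    pnl_pct: float   # fraction: 0.005 == +0.50%


def is_outlier_win(t: ClosedShort) -> bool:
    # Primary post-outlier trigger from this note.
    return t.pnl_abs >= 400 and t.leverage >= 8.5 and t.pnl_pct >= 0.005


def post_outlier_features(history: Sequence[ClosedShort], entry_bar: int, entry_asset: str,
                          correlated: frozenset = frozenset()) -> dict:
    """Feature row for the next entry; `correlated` is a hypothetical related-asset set."""
    last = history[-1] if history else None
    outlier: Optional[ClosedShort] = next(
        (t for t in reversed(history) if is_outlier_win(t)), None)
    return {
        "last_trade_was_outlier_short_win": last is not None and is_outlier_win(last),
        "last_trade_leverage": last.leverage if last else None,
        "last_trade_realized_pnl_abs": last.pnl_abs if last else None,
        "last_trade_return_pct": last.pnl_pct if last else None,  # kept as a fraction
        "bars_since_outlier_win": entry_bar - outlier.exit_bar if outlier else None,
        "same_asset_or_correlated_asset_followup": (
            outlier is not None
            and (entry_asset == outlier.asset or entry_asset in correlated)
        ),
    }
```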

Research conclusion:

- broad `SHORT -> LONG` inversion remains false on the full sequence
- immediate one-trade long probing after a large 9x short win is empirically
  plausible and improved historical BLUE dollars in this cleaned replay
- the next test should condition this event trigger on the existing long gates
  and market fingerprint state, rather than using it as a naked side switch

## Leverage-as-conviction win-probe sweep

Follow-up thesis:

```text
leverage is a conviction expression

if a high-conviction short probe wins:
  make subsequent / next trades LONG

if leverage is below roughly 0.69:
  possibly do not trade
```

The initial test used:

```text
trigger_lev = 0.70
trade_min_lev = 0.69
win = net PnL > 0
```

Two side-selection forms were tested:

- **persistent shadow probe**: the short engine continues to run as a shadow.
  A high-lev short-shadow win turns the traded side LONG. A high-lev
  short-shadow loss resets the traded side SHORT.
- **one-shot after win**: a high-lev short-shadow win arms only the next
  eligible trade as LONG, then resets.
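The two forms differ only in what happens after a LONG trade is taken. A sketch, assuming each short-engine signal arrives as `(leverage, shadow_short_net_pnl)` in time order; the function name and tuple layout are hypothetical. With the defaults, `trigger_lev >= trade_min_lev`, so a skipped signal could never have been a trigger anyway.

```python
from typing import Sequence


def side_sequence(signals: Sequence[tuple[float, float]], mode: str,
                  trigger_lev: float = 0.70, trade_min_lev: float = 0.69) -> list[str]:
    """Traded side per short-engine signal: 'SHORT', 'LONG' or 'SKIP'.

    mode is 'persistent' or 'one_shot'.
    """
    if mode not in ("persistent", "one_shot"):
        raise ValueError(f"unknown mode: {mode}")
    side, out = "SHORT", []
    for lev, shadow_pnl in signals:
        if lev < trade_min_lev:
            out.append("SKIP")  # below the conviction floor: do not trade
            continue
        out.append(side)
        if mode == "one_shot" and side == "LONG":
            side = "SHORT"  # the single armed slot has been consumed
        if lev >= trigger_lev:  # high-conviction shadow outcome updates the side
            if shadow_pnl > 0:
                side = "LONG"
            elif mode == "persistent":
                side = "SHORT"
    return out


sig = [(1.0, 5.0), (0.695, 1.0), (0.695, 1.0)]
print(side_sequence(sig, "persistent"))  # ['SHORT', 'LONG', 'LONG']
print(side_sequence(sig, "one_shot"))    # ['SHORT', 'LONG', 'SHORT']
```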

The test used the same cleaned BLUE sequence as the post-outlier study, updated
through `2026-05-08 13:40:04 UTC`:

- ClickHouse rows: `1307`
- ClickHouse unique trade IDs: `1298`
- trader-log exit rows: `1716`
- merged near-duplicate-cleaned trade IDs: `1612`
- analysis subset after excluding hibernate / subday ACB exits: `1324`

Baselines:

- always short: `1324` trades, WR `55.82%`, mean return/trade `+0.0784%`,
  compounded return `+168.02%`, max DD `15.70%`, PnL `+$11,135.86`
- always long on the same skeleton: WR `38.44%`, compounded return `-80.23%`,
  max DD `80.62%`, PnL `-$36,875.48`
- short-only with `trade_min_lev >= 0.69`: `1050` trades, compounded return
  `+81.86%`, max DD `20.80%`, PnL `+$11,063.86`
- short-only with `trade_min_lev >= 5.0`: `565` trades, compounded return
  `+88.08%`, max DD `8.94%`, PnL `+$11,980.01`
- short-only with `trade_min_lev >= 8.5`: `501` trades, compounded return
  `+82.57%`, max DD `7.58%`, PnL `+$12,193.65`

Initial `0.70 / 0.69` thesis result:

- persistent shadow-probe switch:
  - traded: `1050`
  - LONG trades: `457`
  - flips to LONG: `249`
  - WR `37.08%`
  - compounded return `-5.61%`
  - max DD `26.60%`
  - PnL `-$2,527.65`
- one-shot after high-lev win:
  - traded: `1050`
  - LONG trades: `455`
  - flips to LONG: `456`
  - WR `37.24%`
  - compounded return `-3.56%`
  - max DD `26.19%`
  - PnL `-$2,113.83`

So the literal initial thesis fails. `0.70` is too low as a
side-switch trigger. It arms hundreds of LONG trades and turns a strong
short-led ledger into a slightly losing one.

Important evaluation frame:

The goal is **not** to find a LONG overlay that beats the whole short-only
engine by itself. The goal is to find a side-selection overlay that adds
marginal value on exactly the subset where it intervenes. The correct comparison
is therefore:

```text
overlay_delta =
    pnl_if_intervened_long_on_triggered_trades
  - pnl_if_original_short_was_left_unchanged_on_same_triggered_trades
```

The overlay is useful only if it satisfies all of the following:

- it has positive `overlay_delta` after fees and conservative slippage
- it reduces realized drawdown or loss clustering on the intervention subset
- it does not cut too many profitable short trades
- it remains positive across time splits, assets, and neighboring thresholds
- it has enough triggers to be statistically more than a single accident
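Three of these criteria (positive `overlay_delta` per split, not cutting too many profitable shorts, and a minimum trigger count) can be checked mechanically. A sketch, assuming each triggered trade is a `(pnl_if_long, pnl_if_left_short)` pair already net of fees and slippage; `min_triggers` and `max_cut_share` are hypothetical thresholds, and the drawdown criterion is not covered here.

```python
from typing import Sequence

Pair = tuple[float, float]  # (pnl if flipped LONG, pnl if left SHORT) for one triggered trade


def overlay_delta(pairs: Sequence[Pair]) -> float:
    return sum(long_pnl - short_pnl for long_pnl, short_pnl in pairs)


def passes_overlay_checks(splits: Sequence[Sequence[Pair]], min_triggers: int = 30,
                          max_cut_share: float = 0.5) -> bool:
    """Positive delta in every non-empty time split, few winners cut, enough triggers."""
    total = sum(len(split) for split in splits)
    non_empty = [split for split in splits if split]
    cut_winners = sum(1 for split in splits for _, short_pnl in split if short_pnl > 0)
    return (total >= min_triggers and bool(non_empty)
            and cut_winners <= max_cut_share * total
            and all(overlay_delta(split) > 0 for split in non_empty))
```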

Under that marginal-overlay framing, the broad leverage-win thesis still fails:

- persistent `0.70 / 0.69` switch delta vs same `lev >= 0.69` short-only
  baseline: about `-$13,591.51`
- one-shot `0.70 / 0.69` switch delta vs same `lev >= 0.69` short-only
  baseline: about `-$13,177.69`
- best swept dollar switch delta vs same `lev >= 0.69` short-only baseline:
  about `-$5,949.36`

By contrast, the narrower post-outlier rule did show positive marginal overlay
value on its triggered subset:

- triggered next-trade cases: `47`
- leaving the next trade SHORT: PnL `-$1,725.40`
- flipping only that next trade LONG: PnL `+$409.47`
- marginal overlay delta: `+$2,134.87`
- whole-sequence drawdown improved from about `15.70%` to `13.33%`

That is the key distinction. The broad high-leverage-win rule is not reliable
enough. The narrow post-outlier rule is a legitimate candidate for guarded
shadow/live-probe research because it adds value exactly where it intervenes,
but the sample is still too small for unconditional deployment.

### Lowered big-win threshold grid

The phrase "sample too small" applies only to the original high-tail trigger
(`pnl_abs >= $400`, `lev >= 8.5`, immediate next trade). It does **not** mean
the BLUE ledger is small. The cleaned replay now spans:

- `1328` non-hibernate / non-subday-ACB BLUE trades
- `1616` merged near-duplicate-cleaned trade IDs
- `2026-03-31 01:10:34 UTC` through `2026-05-08 14:21:31 UTC`

To test whether the effect survives with more triggers, the post-win sweep was
expanded to:

- dollar win thresholds: `$10`, `$25`, `$50`, `$75`, `$100`, `$150`, `$200`,
  `$300`, `$400`, `$500`, `$750`, `$1000`
- leverage thresholds: `0`, `0.69`, `0.70`, `1`, `2`, `3`, `5`, `8.5`, `9`
- return thresholds: `0`, `0.10%`, `0.25%`, `0.50%`, `0.75%`, `0.95%`,
  `1.25%`
- follow-on horizons: next `1`, `2`, `3`, and `5` trades
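The grid above spans `12 × 9 × 7 = 756` threshold combinations per horizon. A sketch of the sweep, assuming overlapping follow-on windows count each affected trade once and that "eligible" means a minimum number of affected trades (the real eligibility rule is not recorded here). The `Trade` fields, including the precomputed `long_usd`, are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Sequence

DOLLARS = [10, 25, 50, 75, 100, 150, 200, 300, 400, 500, 750, 1000]
LEVERAGES = [0, 0.69, 0.70, 1, 2, 3, 5, 8.5, 9]
RETURNS = [0.0, 0.0010, 0.0025, 0.0050, 0.0075, 0.0095, 0.0125]  # fractions
HORIZONS = [1, 2, 3, 5]


@dataclass(frozen=True)
class Trade:
    pnl_usd: float   # realized SHORT PnL
    leverage: float
    pnl_pct: float   # realized SHORT return as a fraction
    long_usd: float  # hypothetical precomputed same-skeleton LONG estimate


def sweep(trades: Sequence[Trade], min_affected: int = 1) -> list[dict]:
    rows = []
    for dollars, lev, ret, horizon in product(DOLLARS, LEVERAGES, RETURNS, HORIZONS):
        affected: set[int] = set()  # union of follow-on windows, each trade counted once
        for i, t in enumerate(trades):
            if t.pnl_usd >= dollars and t.leverage >= lev and t.pnl_pct >= ret:
                affected.update(range(i + 1, min(i + 1 + horizon, len(trades))))
        if len(affected) < min_affected:
            continue
        short_usd = sum(trades[j].pnl_usd for j in affected)
        long_usd = sum(trades[j].long_usd for j in affected)
        rows.append({"dollars": dollars, "lev": lev, "ret": ret, "horizon": horizon,
                     "affected": len(affected), "leave_short": short_usd,
                     "flip_long": long_usd, "delta": long_usd - short_usd})
    return rows
```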

Important result:

- lowering **dollar threshold alone** does not work
- lowering dollar threshold **with a realized-return threshold** does work
- the effect is mostly next `1` to `2` trades
- by next `5` trades, flipping LONG is not positive; cooldown / abstain is
  better than LONG if the horizon is that wide

Grid-wide stability:

- horizon `1`: `630` eligible threshold combinations, `60.0%` positive
  marginal delta, `45.87%` positive LONG PnL
- horizon `2`: `630` eligible threshold combinations, `57.30%` positive
  marginal delta, `39.52%` positive LONG PnL
- horizon `3`: `693` eligible threshold combinations, `59.60%` positive
  marginal delta, `12.99%` positive LONG PnL
- horizon `5`: `693` eligible threshold combinations, `51.08%` positive
  marginal delta, `0.0%` positive LONG PnL

This says the post-win effect is a short-lived exhaustion / rebound artifact,
not a durable multi-trade LONG regime.

Fixed dollar-only immediate-next-trade rows:

| Trigger | Affected next trades | Leave SHORT PnL | Flip LONG PnL | Delta (LONG - SHORT) | Whole-policy compounded return | Whole-policy max DD |
|---|---:|---:|---:|---:|---:|---:|
| `$10+`, no lev gate | 277 | `+$3,044` | `-$9,146` | `-$12,190` | `+24.74%` | `24.55%` |
| `$50+`, no lev gate | 181 | `+$4,495` | `-$8,870` | `-$13,365` | `+42.58%` | `22.18%` |
| `$100+`, no lev gate | 135 | `+$908` | `-$4,252` | `-$5,160` | `+97.78%` | `18.09%` |
| `$200+`, no lev gate | 89 | `-$947` | `-$1,496` | `-$549` | `+140.76%` | `14.96%` |
| `$300+`, no lev gate | 62 | `-$1,695` | `-$45` | `+$1,651` | `+174.25%` | `13.70%` |
| `$400+`, no lev gate | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |
| `$500+`, no lev gate | 40 | `-$1,153` | `+$90` | `+$1,242` | `+173.51%` | `13.33%` |

Dollar-only conclusion:

- below about `$300`, the next short trade is still net-profitable or less bad
  than the LONG flip
- around `$300`, the next short trade turns bad, but LONG is only near-flat
- around `$400` to `$500`, the next-trade LONG flip becomes positive

Fixed immediate-next-trade rows with a `+0.75%` realized-return trigger:

| Trigger | Affected next trades | Leave SHORT PnL | Flip LONG PnL | Delta (LONG - SHORT) | Whole-policy compounded return | Whole-policy max DD |
|---|---:|---:|---:|---:|---:|---:|
| `$10+` and `+0.75%` | 99 | `-$1,735` | `-$409` | `+$1,326` | `+104.45%` | `14.03%` |
| `$50+` and `+0.75%` | 74 | `-$1,950` | `+$105` | `+$2,055` | `+155.62%` | `14.03%` |
| `$75+` and `+0.75%` | 70 | `-$2,028` | `+$194` | `+$2,223` | `+166.91%` | `13.95%` |
| `$100+` and `+0.75%` | 67 | `-$2,083` | `+$336` | `+$2,419` | `+168.60%` | `13.69%` |
| `$150+` and `+0.75%` | 63 | `-$2,082` | `+$344` | `+$2,426` | `+175.37%` | `13.69%` |
| `$300+` and `+0.75%` | 58 | `-$1,738` | `+$58` | `+$1,796` | `+173.61%` | `13.70%` |
| `$400+` and `+0.75%` | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |

Return-conditioned conclusion:

- the effect becomes visible with more triggers when the dollar threshold is
  lowered to `$50-$150` **and** the prior win is also at least `+0.75%`
- the best immediate-next-trade delta in this grid was around `$150+` and
  `+0.75%`: `63` next trades, SHORT `-$2,081.81`, LONG `+$343.94`, delta
  `+$2,425.75`
- the original `$400+`, high-leverage trigger remains good but is not the only
  viable threshold; it is the cleaner high-tail version

Two-trade horizon:

| Trigger | Affected next trades | Leave SHORT PnL | Flip LONG PnL | Delta (LONG - SHORT) | Whole-policy compounded return | Whole-policy max DD |
|---|---:|---:|---:|---:|---:|---:|
| `$300+`, `lev >= 8.5` | 115 | `-$3,201` | `+$511` | `+$3,712` | `+168.52%` | `14.27%` |
| `$400+`, `lev >= 8.5` | 91 | `-$2,689` | `+$556` | `+$3,245` | `+175.26%` | `13.71%` |
| `$500+`, `lev >= 8.5` | 75 | `-$2,237` | `+$509` | `+$2,747` | `+167.53%` | `14.71%` |

Two-trade conclusion:

- the high-leverage `$300-$500` zone supports a two-trade exhaustion rebound
  more strongly than the original one-trade-only statement
- the best two-trade variant in this fixed grid was `$300+`, `lev >= 8.5`,
  next two trades: delta `+$3,712`, estimated LONG PnL `+$511`
- the five-trade horizon should not be traded LONG; it is only a damage-control
  / cooldown signal

Reliability statement:

The post-win overlay is more solid than initially stated. The robust form is
not "after any win"; that is false. The robust form is:

```text
after a sufficiently large realized short win,
especially a high-return or high-leverage win,
the next 1-2 short-engine opportunities are often contaminated by rebound risk
and can be improved by LONG flip or, at minimum, cooldown/abstain.
```

The strongest candidates for shadow/live-probe research are:

- immediate next trade after `$100-$200` win **and** prior return `>= +0.75%`
- immediate next trade after `$400+` win, especially `lev >= 8.5`
- next two trades after `$300-$500` win with `lev >= 8.5`

Guardrail:

The overlay should not optimize on WR. LONG WR remains lower than SHORT WR on
many triggered subsets. The edge is payoff asymmetry / loss-tail avoidance:
short wins become smaller or disappear after the exhaustion event, while short
losses on the next trade(s) become expensive.

### Candidate codified overlay rule and EFSM

Terminology:

- **EFSM** means **Execution FSM**
- refer to this component as the post-win **EFSM**, not merely a generic
  "state machine"

Candidate rule proposed after the lowered-threshold sweep:

```text
after a completed BLUE SHORT trade (rules checked in order; first match wins):

  if pnl_abs > $397 and leverage > 8.6:
      tag next 2 trades as FLIP_LONG      # big_win_high_lev
  elif pnl_abs > $397:
      tag next 1 trade as FLIP_LONG       # big_win
  elif 0 < pnl_abs < $250 and pnl_pct >= +0.75%:
      tag next 1 trade as FLIP_LONG       # small_dollar_high_return

  after the armed slots are consumed:
      reset to SHORT
```

EFSM semantics:

- this is a **slot-based Execution FSM**, not a persistent regime switch
- each trigger arms an explicit number of future slots
- each future entry consumes exactly one slot
- when `slots_remaining == 0`, the state resets to SHORT
- while slots are active, new triggers are ignored by default
- a flipped LONG trade outcome is not allowed to re-arm the overlay
- this prevents the reset bug where one flipped trade recursively arms the next
  and converts a bounded rebound probe into an unbounded side switch
- the implementation supports arbitrary future slot counts, not only `1` and
  `2`
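A minimal sketch of these semantics, using a hypothetical `SlotEFSM` class rather than the real `PostWinExecutionFSM` (whose API is not reproduced here). It assumes the caller passes the slot count produced by the trigger rule and a `was_flipped_long` flag for each closed trade; timestamps are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotEFSM:
    """Hypothetical slot-based Execution FSM sketch (not PostWinExecutionFSM)."""

    ttl_seconds: Optional[float] = None  # optional expiry for armed slots
    slots_remaining: int = 0
    armed_at: Optional[float] = None
    ignored_rearms: int = 0

    def on_trade_closed(self, slots: int, now: float, was_flipped_long: bool) -> None:
        """Arm `slots` future FLIP_LONG entries unless an invariant forbids it."""
        if slots <= 0:
            return
        if was_flipped_long:
            return  # a flipped LONG outcome may never re-arm the overlay
        if self.slots_remaining > 0:
            self.ignored_rearms += 1  # no re-arm while slots are active
            return
        self.slots_remaining = slots
        self.armed_at = now

    def next_entry_side(self, now: float) -> str:
        """Consume exactly one slot per entry; reset to SHORT when none remain."""
        if (
            self.slots_remaining > 0
            and self.ttl_seconds is not None
            and self.armed_at is not None
            and now - self.armed_at > self.ttl_seconds
        ):
            self.slots_remaining = 0  # TTL expired: drop unused slots
        if self.slots_remaining == 0:
            self.armed_at = None
            return "SHORT"
        self.slots_remaining -= 1
        if self.slots_remaining == 0:
            self.armed_at = None
        return "LONG"


if __name__ == "__main__":
    fsm = SlotEFSM(ttl_seconds=3600.0)
    fsm.on_trade_closed(slots=2, now=0.0, was_flipped_long=False)   # big_win_high_lev
    first = fsm.next_entry_side(now=600.0)                           # consumes slot 1
    fsm.on_trade_closed(slots=1, now=900.0, was_flipped_long=True)  # cannot re-arm
    second = fsm.next_entry_side(now=1200.0)                         # consumes slot 2
    third = fsm.next_entry_side(now=1500.0)                          # reset to SHORT
    print(first, second, third)  # LONG LONG SHORT
```

Because arming takes an integer slot count, `3+` slot rules need no extra state; the two guard clauses in `on_trade_closed` are the no-recursive-rearm invariant.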

Implementation location:

- EFSM: `adaptive_exit/post_win_long_overlay.py`
- canonical class names: `PostWinExecutionFSM`,
  `PostWinExecutionFSMConfig`
- compatibility aliases: `PostWinLongOverlay`, `PostWinLongOverlayConfig`
- tests: `prod/tests/test_post_win_long_overlay.py`

Focused test coverage:

- `$397+` non-high-leverage win arms one slot
- `$397+` and `lev > 8.6` arms two slots
- `< $250` and `pnl_pct >= +0.75%` arms one slot
- active arms consume deterministically and reset to SHORT
- re-arm attempts while active are ignored
- flipped LONG outcomes cannot re-arm
- optional TTL expiry works
- future `3+` slot rules work

Focused verification:

```text
python -m pytest -o cache_dir=/tmp/pytest-cache-post-win-overlay \
  prod/tests/test_post_win_long_overlay.py -q

7 passed
```

Exact candidate replay, no re-arm during active flip slots:

- input: `1333` cleaned BLUE trades through `2026-05-08 14:34:57 UTC`
- baseline short-only estimated PnL: `+$10,953.50`
- candidate policy estimated PnL: `+$12,464.30`
- marginal dollar delta: `+$1,510.80`
- baseline max DD: `15.70%`
- candidate max DD: `14.78%`
- long-flipped trades: `160`
- affected subset left SHORT: `-$2,415.46`
- affected subset flipped LONG: `-$904.67`
- affected subset marginal delta: `+$1,510.80`
- triggers armed:
  - `small_dollar_high_return`: `77`
  - `big_win_high_lev`: `41`
  - `big_win`: `1`
- slots consumed:
  - `small_dollar_high_return`: `77`
  - `big_win_high_lev`: `82`
  - `big_win`: `1`
- consumed arms: `119`
- dangling slots at end: `0`
- ignored re-arm attempts while active: `20`
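The replay counts above reconcile through simple identities: each `big_win_high_lev` arm consumes two slots, every consumed slot is one LONG-flipped trade, and since unaffected trades are identical in both runs, the portfolio delta equals the affected-subset delta. A quick check using only the reported figures (the one-cent gap is rounding in the published totals):

```python
# Figures copied from the exact candidate replay above.
armed = {"small_dollar_high_return": 77, "big_win_high_lev": 41, "big_win": 1}
slots_per_arm = {"small_dollar_high_return": 1, "big_win_high_lev": 2, "big_win": 1}

consumed_slots = sum(armed[k] * slots_per_arm[k] for k in armed)  # = long-flipped trades
consumed_arms = sum(armed.values())

baseline_pnl, candidate_pnl = 10_953.50, 12_464.30
affected_short, affected_long = -2_415.46, -904.67

portfolio_delta = round(candidate_pnl - baseline_pnl, 2)
subset_delta = round(affected_long - affected_short, 2)

print(consumed_slots, consumed_arms)  # 160 119
print(portfolio_delta, subset_delta)  # 1510.8 1510.79
```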

Reset sensitivity:

Allowing active flipped trades / active arms to re-arm is harmful:

- unsafe recursive re-arm variant long flips: `183`
- unsafe marginal delta: `-$5,425.32`
- safe no-rearm marginal delta: `+$1,510.80`

Therefore the no-recursive-rearm reset invariant is not optional. It is part of
the edge definition.

Compound-return caveat:

- baseline short-only compound: `+164.89%`
- candidate compound: `+107.26%`

This is why the overlay must be treated as a dollar-tail / drawdown-control
overlay first, not as a compounding optimizer. The current counterfactual reuses
the same entry/exit skeleton with an estimated flipped LONG PnL, so the next
validation step must include actual LONG execution assumptions, long-side V7
behavior, and time-to-next-entry gating.

Time dependency:

The replay showed material timing dependence:

| Delay from trigger to flipped entry | n | SHORT PnL | LONG PnL | Delta |
|---|---:|---:|---:|---:|
| `<=15m` | 19 | `+$2,765.51` | `-$3,062.37` | `-$5,827.88` |
| `15-30m` | 67 | `-$3,588.76` | `+$2,381.96` | `+$5,970.72` |
| `30-60m` | 40 | `-$882.57` | `-$104.33` | `+$778.24` |
| `>60m` | 34 | `-$709.64` | `-$119.93` | `+$589.72` |

This means the overlay may need a lower-bound delay, an upper-bound TTL, or
market-state confirmation. The current EFSM already supports TTL;
the exact timing gate remains research, not deployed doctrine.
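A minimal sketch of such a timing gate, assuming a hypothetical `flip_allowed` helper. The `15m` lower bound is read off the delay table above as a placeholder, not a fitted or deployed value, and the TTL is optional.

```python
from typing import Optional

MINUTE = 60.0


def flip_allowed(
    trigger_ts: float,
    entry_ts: float,
    min_delay_s: float = 15 * MINUTE,  # placeholder lower bound, not deployed doctrine
    ttl_s: Optional[float] = None,  # optional upper bound; None disables expiry
) -> bool:
    """Return True if a FLIP_LONG slot may be used at entry_ts.

    Timestamps are seconds since epoch. Entries arriving within min_delay_s of
    the trigger (the `<=15m` bucket was strongly negative in the replay) are
    not flipped.
    """
    delay = entry_ts - trigger_ts
    if delay < 0:
        raise ValueError("entry precedes trigger")
    if delay <= min_delay_s:
        return False
    if ttl_s is not None and delay > ttl_s:
        return False
    return True
```

Whether a too-early entry should consume the slot, stay SHORT, or abstain is a separate design choice; this sketch only answers the gate question.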

AdvancedExitManagerV7 / AlphaExitEngineV7 caveat:

`AlphaExitEngineV7` is mechanically side-aware:

- `side=0` means LONG
- `side=1` means SHORT
- PnL, MFE, MAE, trend direction, and adverse/favorable movement are signed by
  `ctx.side`
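The signing convention can be illustrated with a hypothetical helper that is not V7's code. The same price path produces mirrored PnL, MFE, and MAE for `side=0` (LONG) and `side=1` (SHORT). Returns are fractional, and MAE is reported as a non-negative magnitude.

```python
from typing import List, Tuple

LONG, SHORT = 0, 1  # side encoding used by AlphaExitEngineV7


def path_stats(entry: float, prices: List[float], side: int) -> Tuple[float, float, float]:
    """Return (pnl_pct, mfe, mae) for one side over a post-entry price path.

    Favorable movement is up for LONG and down for SHORT. MFE is the best
    signed return seen; MAE is the worst adverse excursion as a magnitude.
    """
    if side not in (LONG, SHORT):
        raise ValueError("side must be 0 (LONG) or 1 (SHORT)")
    sign = 1.0 if side == LONG else -1.0
    signed = [sign * (p - entry) / entry for p in prices]
    mfe = max(0.0, max(signed))
    mae = max(0.0, -min(signed))
    return signed[-1], mfe, mae


if __name__ == "__main__":
    path = [99.0, 97.0, 101.0, 102.0]  # dips below entry 100, then rebounds
    print(path_stats(100.0, path, LONG))   # positive PnL, deeper MAE
    print(path_stats(100.0, path, SHORT))  # negative PnL, larger MFE
```

Mechanical symmetry of this kind says nothing about whether the fitted thresholds transfer between sides, which is the calibration caveat that follows.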

However, V7 calibration is SHORT-lineage:

- bounce model labels were trained on BLUE SHORT adverse-bar samples
- pressure threshold `2.69` was selected on SHORT/GREEN-lineage replay
- MAE/MFE concepts are symmetric in code but not guaranteed symmetric in
  fitted thresholds or bounce probabilities

Before any live FLIP_LONG execution, V7 must be validated in one of these modes:

- shadow-only LONG contexts using actual flipped LONG entries
- conservative LONG-specific V7 threshold override
- disable V7 live exits for overlay LONGs until enough shadow decisions show
  it does not prematurely cut the rebound edge

The rule can be codified, but production wiring must keep the EFSM, side
selection, and V7 exit policy explicitly separable.

Sweep results:

- best by compounded return:
  - mode: one-shot after win
  - `trigger_lev = 9.0`
  - `trade_min_lev = 0.0`
  - traded: `1324`
  - LONG trades: `222`
  - WR `50.91%`
  - compounded return `+61.93%`
  - max DD `19.36%`
  - PnL `-$257.03`
- best by estimated dollars:
  - mode: one-shot after win
  - `trigger_lev = 2.0`
  - `trade_min_lev = 0.69`
  - traded: `1050`
  - LONG trades: `297`
  - WR `40.03%`
  - compounded return `+27.71%`
  - max DD `22.44%`
  - PnL `+$5,114.50`

Both sweep optima still underperform the relevant short-only baselines. In
particular, simply treating high leverage as a short-side quality filter is
stronger than using high-leverage short wins as a broad long-switch trigger:

- `lev >= 8.5`, short-only: PnL `+$12,193.65`, max DD `7.58%`
- best long-switch dollar policy: PnL `+$5,114.50`, max DD `22.44%`

Interpretation:

- leverage does behave like conviction, but the first-order use is filtering /
  sizing, not side inversion
- ordinary high-lev wins are too common to serve as a LONG regime switch
- the previous post-outlier result survives only because it was much narrower:
  large dollar win, 9x, and immediate next trade
- high-lev wins may still be useful as **features** in the dual-shadow /
  market-fingerprint layer:
  - `last_high_lev_short_win`
  - `last_high_lev_short_win_count`
  - `last_high_lev_short_win_pnl_abs`
  - `last_high_lev_short_win_return_pct`
  - `bars_since_high_lev_short_win`
  - `consecutive_high_lev_short_wins`
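A sketch of how a subset of these features could be computed from closed-trade history. `ClosedTrade`, `exit_bar`, and the helper are hypothetical. The `lev >= 8.5` cut is borrowed from the comparison above as an assumption. `last_high_lev_short_win` and `last_high_lev_short_win_count` are omitted because their exact definitions are not fixed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HIGH_LEV = 8.5  # assumption: reuses the `lev >= 8.5` cut from the comparison above


@dataclass
class ClosedTrade:
    exit_bar: int  # hypothetical bar index at which the trade closed
    side: str  # "SHORT" or "LONG"
    leverage: float
    pnl_abs: float
    pnl_pct: float


def is_high_lev_short_win(t: ClosedTrade) -> bool:
    return t.side == "SHORT" and t.leverage >= HIGH_LEV and t.pnl_abs > 0


def high_lev_short_win_features(trades: List[ClosedTrade], now_bar: int) -> Dict[str, Optional[float]]:
    """Subset of post-win fingerprint features; `trades` ordered by exit_bar."""
    closed = [t for t in trades if t.exit_bar <= now_bar]
    last = next((t for t in reversed(closed) if is_high_lev_short_win(t)), None)
    consecutive = 0
    for t in reversed(closed):  # trailing run ending at the most recent close
        if not is_high_lev_short_win(t):
            break
        consecutive += 1
    return {
        "last_high_lev_short_win_pnl_abs": last.pnl_abs if last is not None else None,
        "last_high_lev_short_win_return_pct": last.pnl_pct if last is not None else None,
        "bars_since_high_lev_short_win": float(now_bar - last.exit_bar) if last is not None else None,
        "consecutive_high_lev_short_wins": float(consecutive),
    }
```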

Research conclusion:

- do not implement the literal `lev > 0.70` long switch
- do preserve leverage as a strong conviction feature
- do keep the narrower post-outlier one-trade long probe in the research queue
- the strongest immediate operational lesson is that low-leverage trades may be
  unnecessary, while high-leverage shorts remain the cleaner expression

## AlphaExitEngineV7 LONG calibration replay

Date: `2026-05-08`

Scope:

- system: BLUE only
- exit engine: `AlphaExitEngineV7`
- harness: `adaptive_exit/calibrate_v7_long_from_journal.py`
- source data: ClickHouse `dolphin.v7_decision_events`
- source rows: `6,812`
- reconstructed BLUE V7-tracked paths: `97`
- path side in source journal: SHORT
- replay side for calibration: synthetic LONG (`side=0`)
- fee assumption: `4 bps`
- natural exit comparator: final logged decision-row price for the same path
- V7 exit comparator: first replayed V7 `EXIT` on the same price path
- bounce model: disabled for this replay by intentionally using a missing model
  path, because the current bounce model is trained on BLUE SHORT adverse-bar
  samples and should not be treated as a validated LONG probability model

This is a LONG-exit calibration proxy, not proof from exchange-filled LONG
trades. It answers a narrower question: if the post-win EFSM had flipped a
trade LONG on price paths that BLUE V7 actually observed, would a LONG-side V7
cut/exit surface have improved or harmed the synthetic LONG outcome versus
holding to the path's natural end?

### Original V7 SHORT calibration pattern

The original V7 calibration was a pressure-threshold sweep over live shadow
decisions. V7 computes:

```text
exit_pressure = clamp(directional_term + risk_term, -3.0, +3.0)
```

Then:

```text
if exit_pressure > 2.69:
    EXIT
elif exit_pressure > 1.0:
    RETRACT
elif exit_pressure < -0.5 and pnl_pct > 0:
    EXTEND
else:
    HOLD
```
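The same ladder is sketched below as runnable Python, with the deployed thresholds as keyword defaults. The function and enum names are hypothetical stand-ins, not V7's internals. Because the comparisons are strict and pressure is clamped to `+3.0`, an exit threshold of `3.0` cannot fire, which is consistent with the `exit_p3.0` row in the LONG replay table showing `0` exits.

```python
from enum import Enum


class V7Decision(Enum):
    EXIT = "EXIT"
    RETRACT = "RETRACT"
    EXTEND = "EXTEND"
    HOLD = "HOLD"


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def decide(
    directional_term: float,
    risk_term: float,
    pnl_pct: float,
    exit_threshold: float = 2.69,
    retract_threshold: float = 1.0,
    extend_threshold: float = -0.5,
    pressure_min: float = -3.0,
    pressure_max: float = 3.0,
) -> V7Decision:
    """Map V7-style pressure terms to a decision using the documented ladder."""
    pressure = clamp(directional_term + risk_term, pressure_min, pressure_max)
    if pressure > exit_threshold:
        return V7Decision.EXIT
    if pressure > retract_threshold:
        return V7Decision.RETRACT
    if pressure < extend_threshold and pnl_pct > 0:
        return V7Decision.EXTEND  # only profitable positions are extended
    return V7Decision.HOLD
```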

The documented SHORT lineage was:

| Pressure threshold | Fires | Result |
|---:|---:|---:|
| `2.00` | `22/24` | `+$439`, ROI `+1.67%` |
| `2.35` | `17/24` | `+$891`, ROI `+3.38%` |
| `2.60` | `17/24` | `+$891`, ROI `+3.38%` |
| `3.00` | `14/24` | `+$796`, ROI `+3.02%` |
| base/no V7 | n/a | `+$784`, ROI `+2.98%` |

The deployed threshold `2.69` was chosen as the high end of the useful
`2.35-2.70` band so V7 stayed closer to base behavior and avoided cutting
winners on transient pressure.

### Threshold surface now explicit

`AlphaExitEngineV7` now accepts an optional per-engine
`AlphaExitV7Config`. Defaults preserve the deployed SHORT-calibrated behavior.
This lets BLUE instantiate separate SHORT and LONG V7 engines later without
global mutation.
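The per-engine pattern can be sketched with a frozen dataclass. `V7ConfigSketch` is a hypothetical stand-in carrying only a few fields from the configuration table, not the real `AlphaExitV7Config`. The point is that deriving a LONG variant with `dataclasses.replace` returns a new object and leaves the SHORT defaults untouched.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class V7ConfigSketch:
    """Hypothetical stand-in for AlphaExitV7Config (subset of fields)."""

    exit_pressure_threshold: float = 2.69
    retract_pressure_threshold: float = 1.0
    extend_pressure_threshold: float = -0.5
    mfe_convexity_exit_risk: float = 1.5
    mfe_convexity_soft_risk: float = 0.3
    mfe_accel_risk: float = 0.2


short_cfg = V7ConfigSketch()  # defaults = deployed SHORT-calibrated surface
long_cfg = replace(short_cfg, mfe_convexity_exit_risk=0.75)  # new instance

try:
    short_cfg.exit_pressure_threshold = 2.0  # type: ignore[misc]
except FrozenInstanceError:
    pass  # frozen: the shared default cannot be mutated in place
```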

V7-specific configurable fields:

| Config field | Default | Meaning |
|---|---:|---|
| `rvol_w15` | `0.50` | realized-vol composite weight for 15-bar volatility |
| `rvol_w30` | `0.30` | realized-vol composite weight for 30-bar volatility |
| `rvol_w50` | `0.20` | realized-vol composite weight for 50-bar volatility |
| `rvol_floor` | `0.000001` | minimum realized-vol denominator |
| `mae_tier1_k` | `3.5` | MAE tier-1 multiplier on `rv_comp` |
| `mae_tier2_k` | `7.0` | MAE tier-2 multiplier on `rv_comp` |
| `mae_tier3_k` | `12.0` | MAE tier-3 multiplier on `rv_comp` |
| `mae_tier1_floor` | `0.005` | MAE tier-1 absolute floor |
| `mae_tier2_floor` | `0.012` | MAE tier-2 absolute floor |
| `mae_tier3_floor` | `0.025` | MAE tier-3 absolute floor |
| `mae_tier1_risk` | `0.5` | pressure contribution once tier 1 is breached |
| `mae_tier2_risk` | `0.8` | pressure contribution once tier 2 is breached |
| `mae_tier3_risk` | `1.2` | pressure contribution once tier 3 is breached |
| `mae_accel_min_bars` | `3` | minimum bars before adverse-acceleration gate can fire |
| `mae_accel_peak_floor` | `0.003` | adverse peak floor for MAE acceleration risk |
| `mae_accel_risk` | `0.6` | pressure contribution for MAE acceleration |
| `mae_recovery_peak_floor` | `0.004` | adverse peak floor for failed-recovery gate |
| `mae_recovery_prev_min` | `0.25` | prior recovery ratio required before snapback risk |
| `mae_recovery_snapback_max` | `0.10` | recovery ratio below which recovery is treated as failed |
| `mae_recovery_risk` | `1.0` | pressure contribution for failed recovery |
| `mae_late_floor` | `0.003` | MAE required before late adverse ramp applies |
| `mae_late_start_frac` | `0.60` | bars-held fraction where late adverse ramp starts |
| `mae_late_risk_max` | `0.4` | maximum late adverse pressure contribution |
| `max_hold_ref_mult_3m` | `3.0` | V7 internal max-hold reference multiplier |
| `mfe_slope_peak_floor` | `0.01` | peak favorable floor for convexity slope break |
| `mfe_convexity_decay_exit` | `0.35` | decay ratio for hard MFE giveback pressure |
| `mfe_convexity_decay_soft` | `0.20` | decay ratio for soft MFE giveback pressure |
| `mfe_convexity_exit_risk` | `1.5` | pressure contribution for hard MFE giveback |
| `mfe_convexity_soft_risk` | `0.3` | pressure contribution for soft MFE giveback |
| `mfe_accel_floor` | `-0.00001` | MFE acceleration floor for adverse convexity |
| `mfe_accel_peak_floor` | `0.005` | peak favorable floor for MFE acceleration risk |
| `mfe_accel_risk` | `0.2` | pressure contribution for MFE acceleration risk |
| `bounce_dir_w` | `0.15` | bounce score directional-term weight |
| `bounce_risk_w` | `0.35` | bounce risk-term weight |
| `bounce_rv_safe_floor` | `0.00001` | bounce feature volatility denominator floor |
| `exit_pressure_threshold` | `2.69` | live `EXIT` threshold |
| `retract_pressure_threshold` | `1.0` | `RETRACT` threshold |
| `extend_pressure_threshold` | `-0.5` | profitable `EXTEND` threshold |
| `pressure_min` | `-3.0` | pressure clamp lower bound |
| `pressure_max` | `3.0` | pressure clamp upper bound |
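One reading of the realized-vol and MAE-tier rows, as a sketch: `rv_comp` is the weighted blend of the 15/30/50-bar volatilities with a floor, each tier threshold is `max(k * rv_comp, floor)`, and only the highest breached tier contributes pressure. The `max(...)` combination, the `>=` comparison, and the highest-tier-only rule are assumptions made for illustration; the table does not state how V7 combines them.

```python
# Defaults copied from the configuration table above.
RVOL_WEIGHTS = (0.50, 0.30, 0.20)  # 15-, 30-, 50-bar realized vol
RVOL_FLOOR = 0.000001
MAE_TIERS = (  # (k multiplier on rv_comp, absolute floor, pressure contribution)
    (3.5, 0.005, 0.5),
    (7.0, 0.012, 0.8),
    (12.0, 0.025, 1.2),
)


def rv_comp(rv15: float, rv30: float, rv50: float) -> float:
    """Weighted realized-vol composite with a minimum denominator floor."""
    w15, w30, w50 = RVOL_WEIGHTS
    return max(w15 * rv15 + w30 * rv30 + w50 * rv50, RVOL_FLOOR)


def mae_tier_risk(mae: float, rv: float) -> float:
    """Pressure from the highest MAE tier breached (assumed rule, not V7's code)."""
    risk = 0.0
    for k, floor, tier_risk in MAE_TIERS:
        threshold = max(k * rv, floor)  # assumed: vol-scaled, never below the floor
        if mae >= threshold:
            risk = tier_risk
    return risk
```

Under these assumptions, a higher-volatility regime widens every tier, so the same MAE that breaches tier 1 in a calm tape may add no pressure in a volatile one.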

Inherited V6 weight priors remain configurable through the existing
`WeightAdapter`/`WeightPriors` seam. The new config is specifically for V7
threshold/gate surfaces and is init-time/per-engine configurable.

### LONG replay results

Baseline synthetic LONG natural exit across the 97 paths:

- natural PnL: `-$328.84`
- natural WR: `59.79%`
- natural compound: `+3.50%`
- natural max DD: `2.28%`

The dollar PnL and compound can diverge because path notionals differ. For this
exit calibration, dollar PnL is the more relevant metric because BLUE sizing is
not uniform.
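The toy illustration below uses made-up numbers, not replay data. It assumes compound return is the product of per-path returns, which ignores notional, while dollar PnL weights each return by its notional. Under that assumption, a small-notional winner and a large-notional loser give a positive compound but a negative dollar total.

```python
from math import prod
from typing import Sequence


def dollar_pnl(returns: Sequence[float], notionals: Sequence[float]) -> float:
    """Sum of notional-weighted per-trade returns."""
    return sum(r * n for r, n in zip(returns, notionals))


def compound_return(returns: Sequence[float]) -> float:
    """Product of (1 + r) across trades, minus 1; notional-agnostic."""
    return prod(1.0 + r for r in returns) - 1.0


# Toy example: +10% on $100 notional, then -5% on $1,000 notional.
rets = [0.10, -0.05]
notionals = [100.0, 1_000.0]
print(round(dollar_pnl(rets, notionals), 2))  # -40.0
print(round(compound_return(rets), 4))  # 0.045
```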

Top tested surfaces:

| Candidate | V7 PnL | Delta vs natural | Exits | Exit rate | V7 WR | V7 max DD |
|---|---:|---:|---:|---:|---:|---:|
| `mfe_risk_scale_0.5` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `mfe_risk_scale_0.75` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `combo_p1.7_mae0.75` | `+$47.24` | `+$376.08` | `51` | `52.58%` | `47.42%` | `1.55%` |
| `exit_p1.7` | `+$36.88` | `+$365.72` | `51` | `52.58%` | `47.42%` | `1.53%` |
| `exit_p2.0` | `+$19.68` | `+$348.52` | `41` | `42.27%` | `49.48%` | `1.53%` |
| `short_default` / `exit_p2.69` | `+$1.43` | `+$330.26` | `38` | `39.18%` | `49.48%` | `1.81%` |
| `exit_p3.0` | `-$328.84` | `$0.00` | `0` | `0.00%` | `59.79%` | `2.28%` |

Interpretation:

- The deployed SHORT default is not mechanically broken for LONG. It improved
  synthetic LONG dollar outcome by `+$330.26` versus natural exit on the 97
  replayed paths.
- The best tested LONG proxy did not come from lowering the pressure threshold.
  It came from reducing the MFE giveback/convexity pressure contributions
  (`mfe_risk_scale_0.5` or `0.75`).
- Aggressively lowering `exit_pressure_threshold` to `1.4` over-fires:
  `78/97` exits, V7 PnL `-$11.78`, and many negative deltas. That resembles the
  original SHORT calibration at `2.0` (`22/24` fires, `+$439` versus the `+$784`
  base): a threshold that is too sensitive exits on transient noise.
- A moderate threshold around `1.7-2.0` is useful, but in this proxy it is still
  inferior to keeping `exit_pressure_threshold` at `2.69` and reducing the
  MFE-risk contributions.

Recommended LONG overlay calibration candidate for shadow:

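```python
# Import path inferred from nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py
from nautilus_dolphin.nautilus.alpha_exit_v7_engine import AlphaExitEngineV7, AlphaExitV7Config

# mfe_risk_scale_0.5: halve the three MFE risk contributions; all other fields keep defaults
long_v7_config = AlphaExitV7Config(
    mfe_convexity_exit_risk=0.75,  # default 1.5
    mfe_convexity_soft_risk=0.15,  # default 0.3
    mfe_accel_risk=0.10,  # default 0.2
)
long_v7_engine = AlphaExitEngineV7(config=long_v7_config)
```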

This is the `mfe_risk_scale_0.5` surface. It keeps:

- `exit_pressure_threshold = 2.69`
- all MAE vol-normalized loss-cut thresholds unchanged
- pressure clamp unchanged
- bounce disabled or neutral until a LONG-trained bounce model exists
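The `mfe_risk_scale_<s>` candidate names read as a single multiplier on the three MFE risk contributions. A hypothetical helper, not part of the calibration harness, reproduces the `0.5` surface from the SHORT defaults:

```python
from typing import Dict

# SHORT-calibrated defaults from the configuration table.
MFE_RISK_DEFAULTS: Dict[str, float] = {
    "mfe_convexity_exit_risk": 1.5,
    "mfe_convexity_soft_risk": 0.3,
    "mfe_accel_risk": 0.2,
}


def mfe_risk_overrides(scale: float) -> Dict[str, float]:
    """Scale only the MFE risk contributions; thresholds and MAE tiers are untouched."""
    if scale < 0.0:
        raise ValueError("scale must be non-negative")
    return {name: round(value * scale, 6) for name, value in MFE_RISK_DEFAULTS.items()}


print(mfe_risk_overrides(0.5))
# {'mfe_convexity_exit_risk': 0.75, 'mfe_convexity_soft_risk': 0.15, 'mfe_accel_risk': 0.1}
```

The resulting dict uses the same keyword names as the recommended `AlphaExitV7Config(...)` call above, so it could be unpacked into that constructor.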

Why this candidate is preferable to simply lowering `exit_pressure_threshold`:

- it preserved the useful loss-cut behavior while avoiding broad pressure
  over-firing
- it improved dollar PnL more than any pressure-threshold setting tested
- it left MAE protection intact, which matters if the flipped LONG thesis is
  wrong and the asset continues down
- it respects that the post-win EFSM edge is a rebound/cooldown edge, so the
  exit manager should not over-penalize ordinary post-entry MFE shape

Do not deploy this LONG config live yet. It should first be run in shadow on
actual EFSM-flipped candidate LONG contexts, because this replay uses SHORT
entries inverted to LONG and not real LONG fills.

### Regression and safety notes

Implemented code seams:

- `nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py`
  defines `AlphaExitV7Config`
- default `AlphaExitEngineV7()` behavior remains the SHORT-calibrated config
- a LONG-specific engine can be instantiated with `AlphaExitEngineV7(config=...)`
- the calibration harness writes full results to `/tmp/v7_long_calibration.json`

Tests added:

- default config equals the legacy SHORT threshold surface
- custom config is per-instance and does not mutate the default engine
- V7 remains mechanically side-aware for LONG and SHORT PnL/MFE/MAE
- BLUE live V7 provider wiring still records journal decisions and uses OB
  signal input
- EFSM reset/no-recursive-rearm tests remain separate from V7 exit calibration

Research caveats:

- only `97` V7-tracked BLUE paths existed in the current decision journal
- this is enough to reject obviously bad LONG exit settings, but not enough to
  canonize a live LONG exit policy
- bounce must remain neutral for LONG until trained or validated on LONG samples
- V7 `max_hold_ref_mult_3m` still uses an internal time reference rather than
  the orchestrator's effective max hold; the system bible already tracks this
  as a V7 TODO/bug because it can start the late adverse-ramp pressure too early
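The effect of the reference can be seen in a sketch of the late adverse ramp. For illustration only, it assumes the ramp grows linearly from `mae_late_start_frac` of the max-hold reference up to the full reference. The defaults come from the configuration table; the linear shape and the helper name are hypothetical. With a shorter reference, the same bars-held count sits further along the ramp, so pressure is added sooner.

```python
MAE_LATE_FLOOR = 0.003
MAE_LATE_START_FRAC = 0.60
MAE_LATE_RISK_MAX = 0.4


def late_adverse_risk(mae: float, bars_held: int, max_hold_ref_bars: float) -> float:
    """Late-hold adverse pressure under an assumed linear ramp."""
    if mae < MAE_LATE_FLOOR or max_hold_ref_bars <= 0:
        return 0.0
    frac = bars_held / max_hold_ref_bars
    if frac <= MAE_LATE_START_FRAC:
        return 0.0
    ramp = min(1.0, (frac - MAE_LATE_START_FRAC) / (1.0 - MAE_LATE_START_FRAC))
    return MAE_LATE_RISK_MAX * ramp


# Same trade state, two references: an internal reference shorter than the
# orchestrator's effective max hold starts the ramp earlier.
print(round(late_adverse_risk(0.004, bars_held=40, max_hold_ref_bars=100), 6))  # 0.0
print(round(late_adverse_risk(0.004, bars_held=40, max_hold_ref_bars=50), 6))  # 0.2
```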