siloqy/prod/docs/LONG_DETERMINISTIC_RULE_RESEARCH.md
2026-05-08 19:54:13 +02:00


LONG Deterministic Rule Research

Date: 2026-05-07

Goal

Find the simplest deterministic long-side market rule, using primarily Dolphin NG eigendata, that behaves like the original short Alpha Engine rule in spirit:

  • few moving parts
  • market-structural
  • explainable in one breath
  • reliable enough to serve as a basal gate before asset selection and later overlays

This note is explicitly not about a fitted long model.

Data source

The analysis uses the raw daily scan cache summarized by:

  • adaptive_exit/characterize_long_signals.py
  • /mnt/dolphin_training/long_signal_research/long_signal_scan_summary_h24.parquet
  • /mnt/dolphin_training/long_signal_research/long_signal_characterization_report.json

Only eigendata and scan-price-derived outcomes are used here:

  • instability_50
  • v50/v150/v300/v750_lambda_max_velocity
  • vel_div
  • vel_div lag / delta terms

No ExF, EsoF, or OBF are required for the core finding.

What does not work as the basal long rule

The obvious mirror thesis,

  • vel_div > 0.01

is too weak to be the basal long edge.

Recent HQ slice (2025-12-31 onward):

  • support: 39.65%
  • strong_long lift: 1.15x
  • broad_long lift: 1.22x

That is not useless, but it is neither elegant enough nor selective enough to be the long analogue of vel_div < -0.02.

Strongest deterministic shape

The long side shows up most clearly as a stressed unwind / squeeze regime, not as a generic bullish breakout regime.

Candidate primary deterministic rule

LONG_REGIME if
    instability_50 >= 20.5
    and v300_lambda_max_velocity < 0
    and v750_lambda_max_velocity < 0

Interpretation:

  • instability_50 >= 20.5: the market is structurally stressed
  • v300 < 0 and v750 < 0: the slower eigenspace is still negative / damaged
  • together: this is a high-stress unwind state where long opportunities tend to appear as reversals / squeezes on the same manifold that produces short dislocations
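The candidate rule can be written as a pure predicate over one scan row. The following is a minimal sketch, not the production gate: the `ScanRow` container and the function name are hypothetical, and only the three column names and the 20.5 threshold come from this note.

```python
from dataclasses import dataclass

INSTABILITY_THRESHOLD = 20.5  # fixed threshold proposed in this note


@dataclass(frozen=True)
class ScanRow:
    """Hypothetical container; field names mirror the eigendata columns."""
    instability_50: float
    v300_lambda_max_velocity: float
    v750_lambda_max_velocity: float


def long_regime(row: ScanRow) -> bool:
    """Candidate primary rule: high stress plus negative slow velocities."""
    return (
        row.instability_50 >= INSTABILITY_THRESHOLD
        and row.v300_lambda_max_velocity < 0
        and row.v750_lambda_max_velocity < 0
    )


print(long_regime(ScanRow(21.3, -0.004, -0.001)))  # stressed, slow eigenspace negative
print(long_regime(ScanRow(21.3, 0.002, -0.001)))   # v300 has turned positive
```

Because every comparison against NaN evaluates to False in Python, a row with missing eigendata fails the gate in this sketch rather than passing it silently.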

Why 20.5

20.5 is the rounded recent-HQ instability_50 90th-percentile threshold (20.546996...). It is the most practical fixed threshold found in the recent-era characterization.

Empirical support

Recent HQ (2025-12-31 onward)

Base rates:

  • strong_long: 0.1648
  • broad_long: 0.1367

Rule:

  • support: 6356 rows (3.58%)
  • strong_long: 0.3409 (2.07x lift)
  • broad_long: 0.3538 (2.59x lift)

Full history

Base rates:

  • strong_long: 0.2603
  • broad_long: 0.2472

Rule:

  • support: 300,728 rows (12.59%)
  • strong_long: 0.3330 (1.28x lift)
  • broad_long: 0.3375 (1.37x lift)
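The lift figures quoted above are the rule-conditioned hit rate divided by the unconditional base rate. A short sketch that reproduces them from the rates in this section (the helper name `lift` is illustrative):

```python
def lift(rule_rate: float, base_rate: float) -> float:
    """Hit rate under the rule divided by the unconditional base rate."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return rule_rate / base_rate


# (rule rate, base rate) pairs copied from the figures above.
RECENT_HQ = {"strong_long": (0.3409, 0.1648), "broad_long": (0.3538, 0.1367)}
FULL_HISTORY = {"strong_long": (0.3330, 0.2603), "broad_long": (0.3375, 0.2472)}

for window, rates in (("recent HQ", RECENT_HQ), ("full history", FULL_HISTORY)):
    for label, (rule_rate, base_rate) in rates.items():
        print(f"{window:12s} {label:12s} lift = {lift(rule_rate, base_rate):.2f}x")
```

The recent-HQ lift is larger than the full-history lift mainly because the recent base rates are lower; the rule-conditioned rates themselves are similar in both windows (roughly 0.33 to 0.35).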

Simpler fallback

If maximum elegance is preferred over extra selectivity, the one-factor fallback is:

LONG_REGIME_SIMPLE if instability_50 >= 20.5

Recent HQ:

  • support: 10.10%
  • strong_long: 0.3297 (2.00x lift)
  • broad_long: 0.3420 (2.50x lift)

This is surprisingly strong for a one-variable rule. It is the closest thing found to a pure long-side analogue of the short vel_div < -0.02 gate.

Tradeoff:

  • simpler
  • broader
  • slightly less selective than adding v300 < 0 and v750 < 0

Optional stricter confirmation

If later tuning wants more explicit “healing after stress” confirmation, the strict variant is:

LONG_REGIME_STRICT if
    instability_50 >= 20.5
    and vel_div_lag6 < -0.03
    and vel_div_delta6 > 0.02

This is directionally sensible, but it is not materially better than the instability_50 + v300 + v750 rule, so it should be treated as an optional refinement, not the basal rule.

Monthly sanity check

For the candidate primary rule (instability_50 >= 20.5 and v300 < 0 and v750 < 0) in the recent HQ window:

  • 2026-01: strong_long = 0.348
  • 2026-02: strong_long = 0.344
  • 2026-03: strong_long = 0.312

The monthly base rates for the same period were:

  • 2026-01: 0.289
  • 2026-02: 0.271
  • 2026-03: 0.068

So even into the weak March tape, the rule remains elevated relative to base.
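Expressed as per-month lift, the same numbers look as follows. This sketch assumes the monthly base rates listed above are strong_long base rates, which the text implies but does not state outright.

```python
# (rule strong_long rate, monthly base rate) copied from the two lists above.
MONTHLY = {
    "2026-01": (0.348, 0.289),
    "2026-02": (0.344, 0.271),
    "2026-03": (0.312, 0.068),
}

monthly_lift = {month: rule / base for month, (rule, base) in MONTHLY.items()}

for month, value in monthly_lift.items():
    print(f"{month}: lift = {value:.2f}x")
```

The March lift is large because the base rate collapsed while the rule-conditioned rate barely moved, which is the sense in which the rule "remains elevated relative to base".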

Practical interpretation

This should be viewed as a market-state gate, not a complete trade engine.

It says:

  • “the market is in the sort of stressed, damaged regime where long squeeze / unwind opportunities become meaningfully more likely”

It does not by itself say:

  • which asset is the best expression
  • how to size
  • how to exit

That is where the next layers belong:

  • deterministic or learned asset selection
  • OBF / ARS / bounce overlays
  • TP / MAX_HOLD policy

Recommendation

If a single deterministic long gate must be named now, use:

LONG_REGIME if instability_50 >= 20.5 and v300 < 0 and v750 < 0

If maximum simplicity is the priority, use:

LONG_REGIME_SIMPLE if instability_50 >= 20.5

And explicitly do not promote vel_div > 0.01 as the basal long rule.

Deferred analysis idea: dual-shadow regime sampler

This is a later analysis / control-layer research note, not a live-rule recommendation.

One plausible way to sample the market in real time without committing the full system immediately is a very lightweight dual-shadow engine:

  • Shadow A: the basal SHORT engine (vel_div < -0.02 Alpha Engine posture)
  • Shadow B: the basal LONG engine (currently the older negative-vel_div mean-reversion LONG posture is the best simple candidate)

The intent is not merely paper PnL logging. It is to use live, recent sample-trade outcomes as a micro-regime probe:

  • if SHORT shadow performance degrades while LONG shadow performance improves, the tape may have rotated into a LONG-favorable regime
  • if LONG degrades while SHORT improves, the inverse may be true
  • if both are performing acceptably, the tape may be permissive / broad enough that either side can express edge
  • if both are failing, the tape is likely choppy / non-coherent and abstention becomes a first-class candidate

This should be implemented, if ever pursued, as:

  • very fast
  • very lightweight
  • explicitly shadow-only at first
  • based on small, recent sample trades rather than a heavy fitted model

Longer-term, the entire shadow stream can itself become training data:

  • market fingerprints at shadow-entry time
  • concurrent SHORT-shadow and LONG-shadow outcomes
  • relative WR / ROI-per-trade / drawdown / time-to-win asymmetries

That would allow a later learner to predict or simplify the regime switcher. But even before ML, the dual-shadow process may already serve as a useful real-time market-sampling / regime-detection mechanism.

Dual-shadow persistence characterization

This section records the first persistence pass over extant trades. The goal was not to prove a full regime-switch system, but to test whether the observed short-loss streaks are durable enough to justify a regime-favorableness probe.

Important caveat:

  • the live SHORT series and the replay LONG series are on different date spans
  • this is therefore a side-specific persistence study, not a same-bar paired dominance study
  • the numbers below are still useful for run-length and hysteresis design

Live SHORT stream

From the current BLUE trader log:

  • trades: 234
  • win rate: 44.44%
  • mean pnl_pct: +0.000506
  • median pnl_pct: -0.000234
  • average win streak: 1.65 trades
  • average loss streak: 2.03 trades
  • P(win -> win) = 0.394
  • P(loss -> loss) = 0.512
  • average positive-day run: 1.5 days
  • average negative-day run: 1.5 days

Interpretation:

  • short failures do cluster
  • the cluster is real enough to notice
  • but it is only mildly persistent
  • by itself, it is not strong enough to justify a raw ping-pong switch
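For reference, the streak and transition statistics above can be computed from an ordered win/loss sequence along these lines. This is a sketch under assumptions: the function name is hypothetical, and the note does not say how break-even trades were classified in the original pass.

```python
from itertools import groupby
from statistics import fmean


def persistence_stats(wins: list[bool]) -> dict[str, float]:
    """Average run lengths and first-order transition probabilities."""
    runs = [(is_win, sum(1 for _ in group)) for is_win, group in groupby(wins)]
    win_runs = [n for is_win, n in runs if is_win]
    loss_runs = [n for is_win, n in runs if not is_win]

    pairs = list(zip(wins, wins[1:]))  # (previous outcome, next outcome)
    after_win = [nxt for prev, nxt in pairs if prev]
    after_loss = [nxt for prev, nxt in pairs if not prev]

    nan = float("nan")
    return {
        "avg_win_streak": fmean(win_runs) if win_runs else nan,
        "avg_loss_streak": fmean(loss_runs) if loss_runs else nan,
        "p_win_win": sum(after_win) / len(after_win) if after_win else nan,
        "p_loss_loss": (
            sum(1 for nxt in after_loss if not nxt) / len(after_loss)
            if after_loss else nan
        ),
    }


# Toy sequence: W W L L L W L
print(persistence_stats([True, True, False, False, False, True, False]))
```

A memoryless stream with win rate p would show P(loss -> loss) close to 1 - p, so the gap between the observed value and 1 - p is the quantity that indicates clustering.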

Basal LONG shadow, old mirror posture

Using the recent bullish-month replay and the single comparable 10-bar / worst_10bar configuration:

  • trades: 2,243
  • win rate: 48.33%
  • mean pnl_pct: +0.000320
  • median pnl_pct: -0.000400
  • average win streak: 1.93 trades
  • average loss streak: 2.07 trades
  • P(win -> win) = 0.483
  • P(loss -> loss) = 0.517
  • average positive-day run: 3.0 days
  • average negative-day run: 1.86 days

Interpretation:

  • this is the clearest durable long-favorable candidate seen so far
  • the multi-day positive run length is materially better than the live short stream
  • this supports a long-favorable regime probe, but not an unconditional flip

Basal LONG shadow, new stressed-unwind posture

Same replay setup:

  • trades: 569
  • win rate: 50.44%
  • mean pnl_pct: -0.000078
  • median pnl_pct: +0.000068
  • average win streak: 2.24 trades
  • average loss streak: 2.20 trades
  • P(win -> win) = 0.556
  • P(loss -> loss) = 0.546
  • average positive-day run: 1.36 days
  • average negative-day run: 1.18 days

Interpretation:

  • the new long posture has decent local persistence
  • but it is more fragile than the mirror-long posture as a regime switch
  • it does not yet justify itself as the primary flip trigger

Conclusion for regime switching

The data support a smoothed regime-favorableness detector, not a raw flip-on-first-loss system.

Practical reading:

  • short-loss streak persistence is real but modest
  • long-favorable states exist and can persist
  • persistence is on the order of a few trades, not a dramatic regime lock
  • the correct implementation is a shadow score with hysteresis and abstain logic, not a hard immediate SHORT/LONG switch

Suggested rule shape for later analysis:

  • compute rolling shadow scores for SHORT and LONG
  • use persistence thresholds before flipping
  • require stronger evidence to reverse than to stay put
  • abstain when both shadows are weak or both are losing
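One possible reading of that rule shape is sketched below. Everything here is an assumption made for illustration: the class name, the rolling-mean score, and the default window, margin, and persistence values are placeholders, not calibrated settings.

```python
from collections import deque
from statistics import fmean


class ShadowSideSelector:
    """Rolling shadow scores with a flip margin, persistence gate, and abstain."""

    def __init__(self, window: int = 5, flip_margin: float = 0.001,
                 persistence: int = 3, weak_floor: float = 0.0) -> None:
        self.short_returns: deque[float] = deque(maxlen=window)
        self.long_returns: deque[float] = deque(maxlen=window)
        self.flip_margin = flip_margin    # challenger must lead by this much
        self.persistence = persistence    # ...for this many consecutive updates
        self.weak_floor = weak_floor      # both scores at or below this: abstain
        self.side = "SHORT"
        self._challenger_streak = 0

    def update(self, short_ret: float, long_ret: float) -> str:
        """Feed one pair of shadow outcomes; return SHORT, LONG, or ABSTAIN."""
        self.short_returns.append(short_ret)
        self.long_returns.append(long_ret)
        scores = {"SHORT": fmean(self.short_returns),
                  "LONG": fmean(self.long_returns)}
        other = "LONG" if self.side == "SHORT" else "SHORT"

        # Reversing needs a margin; staying put needs nothing (hysteresis).
        if scores[other] - scores[self.side] > self.flip_margin:
            self._challenger_streak += 1
        else:
            self._challenger_streak = 0
        if self._challenger_streak >= self.persistence:
            self.side, self._challenger_streak = other, 0

        if max(scores.values()) <= self.weak_floor:
            return "ABSTAIN"
        return self.side


selector = ShadowSideSelector(window=3, persistence=2)
for pair in [(-0.002, 0.002), (-0.002, 0.002), (-0.001, -0.001)]:
    print(selector.update(*pair))
```

The margin makes reversal strictly harder than staying put, and the persistence counter resets on any update where the challenger fails to lead, so a single favorable sample cannot flip the side.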

This is enough to justify the next engineering step:

  • live dual-shadow logging on the same bars
  • market-fingerprint tagging of each shadow entry
  • later ML over shadow outcomes if the deterministic layer proves stable

Rolling flip-worthiness test

To make the side-switch question stricter, the recent live short slice was retested with a 5-trade rolling shadow-delta proxy:

  • short shadow return = actual live short pnl_pct
  • long shadow return = counterfactual -pnl_pct - fee
  • rolling delta = rolling mean of (long_shadow - short_shadow)
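The proxy defined above can be sketched as follows. The fee used in this particular test is not stated here, so it is left as a required argument; the 4 bps in the example call is borrowed from the counterfactual method later in this note and is an assumption for this section. Returns are taken to be fractions, not percentages.

```python
from collections import deque
from typing import Optional


def rolling_shadow_delta(short_pnl_pct: list[float], fee: float,
                         window: int = 5) -> list[Optional[float]]:
    """Rolling mean of (long_shadow - short_shadow) over the last `window` trades."""
    buffer: deque[float] = deque(maxlen=window)
    out: list[Optional[float]] = []
    for short_ret in short_pnl_pct:
        long_shadow = -short_ret - fee          # counterfactual long on same skeleton
        buffer.append(long_shadow - short_ret)  # positive means long-favorable
        out.append(sum(buffer) / window if len(buffer) == window else None)
    return out


deltas = rolling_shadow_delta(
    [0.001, -0.002, -0.003, 0.004, -0.001, -0.002], fee=0.0004
)
print(deltas)  # None until five trades have accumulated
```

Since each per-trade delta equals -2 * short_ret - fee, the proxy is a rescaled, sign-flipped rolling mean of the short returns, which helps explain why it oscillates as often as the short ledger itself.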

Recent 3-day slice (2026-05-04 to 2026-05-06):

  • trades: 168
  • short actual WR: 39.88%
  • short actual compounded return: +10.02%
  • long counterfactual WR: 47.62%
  • long counterfactual compounded return: -16.92%
  • flip-to-long signals from the 5-trade rolling delta: 68
  • flip-to-short signals from the 5-trade rolling delta: 79

Interpretation:

  • the rolling delta does detect alternating regime pockets
  • but it does so often enough that a raw flip would be too twitchy
  • on the most recent 30 live trades, the regime buckets were:
    • 13 long-favorable
    • 7 short-favorable
    • 10 neutral
  • the long-favorable bucket had positive expected PnL, but the short-favorable bucket was also positive and slightly stronger

The important point is that the signal is not “switch now on first loss.” It is:

  • keep a smoothed side-dominance score
  • require persistence before flipping
  • use hysteresis
  • abstain when the shadow spread is weak or oscillatory

So the stricter test reinforces the earlier conclusion:

  • there is enough structure to justify a regime-favorableness detector
  • there is not yet enough stability to justify a raw mechanical flip
  • the right next step is live dual-shadow logging on the same bars, then threshold and persistence calibration on that shared stream

Flip-after-loss counterfactual

The actual live short ledger was also replayed under a simple finite-state side-switch rule:

  • start SHORT
  • if the current side loses N trades in a row, flip to the other side
  • keep applying the same rule across the whole trade sequence

This is the cleanest way to test the idea “short losses are the long cue.”
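The finite-state rule above can be replayed over a short ledger roughly as follows. This is a sketch under stated assumptions: a loss is taken to mean a traded return at or below zero, the opposite-side return is approximated as the negated short return minus a fee, and the fee value is a parameter because this section does not state it.

```python
def replay_flip_after_losses(short_returns: list[float], n_losses: int,
                             fee: float = 0.0) -> dict[str, float]:
    """Start SHORT; flip side after n_losses consecutive losses on the current side."""
    side = "SHORT"
    losing_streak = flips = wins = 0
    equity = peak = 1.0
    max_dd = 0.0

    for short_ret in short_returns:
        traded = short_ret if side == "SHORT" else -short_ret - fee
        equity *= 1.0 + traded
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)

        if traded > 0:
            wins += 1
            losing_streak = 0
        else:
            losing_streak += 1
        if losing_streak >= n_losses:
            side = "LONG" if side == "SHORT" else "SHORT"
            flips += 1
            losing_streak = 0

    n = len(short_returns)
    return {
        "win_rate": wins / n if n else float("nan"),
        "compounded_return": equity - 1.0,
        "max_dd": max_dd,
        "flips": flips,
    }


# Toy ledger: three short losses, then a short win.
print(replay_flip_after_losses([-0.01, -0.01, -0.01, 0.02], n_losses=2))
```

Sweeping `n_losses` over 1 to 6 on the real ledger is what produces the threshold grid; setting `n_losses` larger than the ledger length reproduces the always-short baseline.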

On the current 234-trade live ledger:

  • always short: WR 44.44%, compounded return +11.35%, max DD 5.71%
  • always long: WR 44.87%, compounded return -20.13%, max DD 23.09%

Threshold sweep:

  • N=1: WR 40.60%, compounded return +5.33%, max DD 11.11%, flips 139
  • N=2: WR 44.44%, compounded return -17.72%, max DD 17.77%, flips 43
  • N=3: WR 48.29%, compounded return +5.48%, max DD 6.35%, flips 13
  • N=4: WR 47.86%, compounded return +6.21%, max DD 6.55%, flips 7
  • N=5: WR 43.59%, compounded return +10.52%, max DD 5.59%, flips 5
  • N=6: WR 45.73%, compounded return +15.17%, max DD 4.84%, flips 3

Interpretation:

  • side switching can help
  • it helps most when the flip threshold is fairly high
  • the best observed threshold in this small grid was N=6
  • low thresholds are too twitchy and can destroy the edge

So the practical conclusion is:

  • a raw flip-on-first-loss rule is not justified
  • a slower loss-cluster regime switcher is plausible
  • the switcher must be hysteretic and persistence-gated

This is consistent with the earlier shadow-score recommendation and explains why the observed “8 or 9 losses, then a couple wins” pattern can be useful without being directly automatable at a low threshold.

Condition-gated flip replay

I then reran the side-switch counterfactual with an additional gate:

  • the current side must first hit N consecutive losses
  • the opposite side must also satisfy its own deterministic long/short entry condition
  • the replay uses the same 10-bar tape skeleton and the worst-10-bar asset expression

Two long theories were tested separately:

  • Old mirror-long: vel_div < -0.02 and cross-sectional 10-bar momentum < 0
  • New stressed-unwind long: instability_50 >= 20.5 and v300 < 0 and v750 < 0

Results on the long research windows:

  • old mirror-long becomes marginally usable only at high thresholds:
    • N=5: WR 47.00%, compounded return +6.34%, DD 46.23%, flips 11
    • N=6: WR 46.52%, compounded return +28.34%, DD 43.78%, flips 5
  • the new stressed-unwind long does not survive this gate cleanly:
    • N=1..6: compounded return stays negative, with severe drawdown

Interpretation:

  • the condition gate does not rescue the new long theory
  • it does preserve the old mirror-long as a late, low-frequency fallback
  • the market still looks too unstable for a low-threshold flip rule
  • if we keep this path, it should be a smoothed regime sampler, not an immediate switcher

Report:

Full-history condition-gated replay

I then ran the same condition-gated flip simulator across the entire available price tape:

  • root: /mnt/dolphin_training/share_offload/vbt_cache_klines
  • rows: 2,553,401
  • span: 2021-06-15 00:01:00+00:00 -> 2026-03-18 18:16:40.041456896+00:00

This is the hardest and most useful stress test because it removes the recent-slice bias entirely.

Results:

  • old mirror-long
    • N=1..6 win rate range: 44.95% -> 46.60%
    • best mean PnL at N=6: -0.000163 per trade
    • best threshold still compounds to -100% over the full archive
  • new stressed-unwind long
    • N=1..6 win rate range: 44.16% -> 46.86%
    • best mean PnL at N=6: -0.000218 per trade
    • best threshold also compounds to -100%

Interpretation:

  • the condition gate does not rescue either long theory at full-archive scale
  • the old mirror-long is still the stronger of the two, but only marginally
  • the long-side edge, if it exists, is too weak or too regime-dependent to survive this archive-wide flip rule without additional filtering
  • the full-tape result is a warning against over-trusting the favorable recent-month slices

Report:

Post-outlier-short-win long-flip probe

Motivation: the May 8 live footer showed a familiar-looking pattern:

  • large 9x short win, e.g. ALGOUSDT +$466 or VETUSDT +$574
  • immediately followed by a somewhat larger-than-normal short loss, e.g. DASHUSDT -$191 or STXUSDT -$54

The question was whether this is a real post-outlier rebound signature:

after a very large short win,
should the next trade, or next few trades, be treated as LONG candidates?

Dataset and hygiene:

  • source: BLUE only
  • ClickHouse dolphin.trade_events: 1305 rows, 1296 unique trade IDs
  • trader logs: 1712 exit rows, 1092 unique trade IDs
  • merged near-duplicate-cleaned sequence: 1609 unique trade IDs
  • analysis subset after excluding hibernate / subday ACB exits: 1321 trades
  • span: 2026-03-31 01:10:34 UTC to 2026-05-08 13:26:06 UTC

The log and warehouse streams overlap but do not have perfectly identical timestamps, so the analysis de-duplicates by trade ID where possible and by near-time / asset / reason / realized PnL where the same exit was written by both paths. This matters because a naive merge double-counts many recent exits.

Counterfactual method:

  • keep the same entry/exit skeleton
  • actual side is the live BLUE short
  • counterfactual long return is approximated as -short_return - 4 bps
  • this is not a separately selected long engine; it only tests whether the immediate post-win tape direction would have favored the other side
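The counterfactual transform is a one-liner. The sketch below assumes returns are expressed as fractions (so 4 bps is 0.0004); the function name is illustrative.

```python
FEE = 0.0004  # 4 bps, as stated in the counterfactual method


def counterfactual_long_return(short_return: float, fee: float = FEE) -> float:
    """Approximate same-skeleton long return: mirror the short return, pay the fee."""
    return -short_return - fee


print(counterfactual_long_return(0.0100))   # short won 1%: the long loses
print(counterfactual_long_return(-0.0100))  # short lost 1%: the long wins, less fee
print(counterfactual_long_return(-0.0003))  # tiny short loss: the long also loses
```

The last case is why the two win rates need not sum to 100%: any short loss smaller than the fee is a loss on both sides.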

Baseline over the cleaned sequence:

  • always short: 1321 trades, WR 55.79%, mean return/trade +0.0781%, compounded return +166.36%, max DD 15.70%
  • always long on the same skeleton: WR 38.46%, mean return/trade -0.1181%, compounded return -80.08%, max DD 80.48%

So the full ledger does not support a broad long flip. The question only survives as a narrow post-outlier condition.

Primary post-outlier trigger:

trigger if prior trade:
  pnl_abs >= $400
  leverage >= 8.5x
  pnl_pct >= +0.50%
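As a predicate, the trigger might look like the sketch below. It assumes the three conditions are combined with AND (the block above lists them without a connective) and that pnl_pct is stored as a fraction; `ClosedTrade` and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedTrade:
    """Hypothetical record of a completed BLUE short trade."""
    pnl_abs: float   # realized dollars
    leverage: float  # e.g. 9.0 for a 9x trade
    pnl_pct: float   # fraction: 0.005 means +0.50%


def is_post_outlier_trigger(trade: ClosedTrade, min_pnl_abs: float = 400.0,
                            min_leverage: float = 8.5,
                            min_pnl_pct: float = 0.005) -> bool:
    """True when the prior trade was a large, high-leverage short win."""
    return (
        trade.pnl_abs >= min_pnl_abs
        and trade.leverage >= min_leverage
        and trade.pnl_pct >= min_pnl_pct
    )


algo = ClosedTrade(pnl_abs=466.34, leverage=9.0, pnl_pct=0.00929)
print(is_post_outlier_trigger(algo))                      # primary trigger
print(is_post_outlier_trigger(algo, min_pnl_pct=0.0095))  # stricter +0.95% variant
```

The ALGOUSDT example from the live tail passes the primary trigger but not the +0.95% variant, since its realized return was +0.929%.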

Immediate next-trade result:

  • triggers: 47
  • next trades affected: 47
  • actual next short subset: WR 53.19%, mean return -0.0821%, compounded return -4.05%, realized PnL -$1,725.40
  • flipped-to-long subset: WR 40.43%, mean return +0.0421%, compounded return +1.72%, estimated PnL +$409.47
  • estimated dollar delta: +$2,134.87
  • whole-sequence policy if only those next trades are flipped: compounded return improves from +166.36% to +182.38% and max DD improves from 15.70% to 13.33%

The stricter trigger pnl_abs >= $400, leverage >= 8.5x, pnl_pct >= +0.95% is similar:

  • triggers: 46
  • actual next short subset: -$1,534.21
  • flipped-to-long estimate: +$276.64
  • estimated dollar delta: +$1,810.85
  • whole-sequence compounded return: +180.91%

The effect is strongest on the immediately following trade. It decays quickly:

  • next 2 trades after the primary trigger: affected 91, actual -$2,689.16, flipped estimate +$555.98, dollar delta +$3,245.15
  • next 3 trades: affected 134, actual -$2,357.77, flipped estimate -$588.02, dollar delta still positive (about +$1,769.75) because the flip loses less
  • next 5 trades: benefit becomes materially less clean

Examples from the live tail:

  • ALGOUSDT 2026-05-08 09:55 UTC, +466.34, 9x, +0.929%
    • next trade DASHUSDT: actual short -191.19; same-skeleton long would have been directionally positive after fee
  • VETUSDT 2026-05-08 12:37 UTC, +573.64, 9x, +1.546%
    • next trade STXUSDT: actual short -53.52; same-skeleton long would have been directionally positive after fee
  • larger historic outlier STXUSDT 2026-05-05 20:29 UTC, +6796.86, 9x, +13.845%
    • the following trade was a small short loss, and the next several trades were mixed rather than uniformly long-favorable

Interpretation:

  • there is a real event-conditioned post-outlier rebound / exhaustion signal
  • it is not a win-rate improvement; it is a dollar / drawdown improvement
  • it should not be promoted as a general long engine
  • it is best framed as a one-trade post-outlier long probe or short cooldown candidate, not as a multi-trade regime flip

Relationship to the long-system research:

  • this is different from both deterministic long theories already studied:
    • old mirror-long: negative vel_div mean-reversion long
    • new stressed-unwind long: high instability plus negative slow velocities
  • the post-outlier signal is more local and path-conditioned:
    • a violent short win likely means the chosen asset or local basket has just completed an exhaustion leg
    • the next trade may be more exposed to rebound / adverse short continuation than to fresh downside continuation
  • this should become a feature inside the dual-shadow side-selection sampler:
    • last_trade_was_outlier_short_win
    • last_trade_leverage
    • last_trade_realized_pnl_abs
    • last_trade_return_pct
    • bars_since_outlier_win
    • same_asset_or_correlated_asset_followup

Research conclusion:

  • broad SHORT -> LONG inversion remains false on the full sequence
  • immediate one-trade long probing after a large 9x short win is empirically plausible and improved historical BLUE dollars in this cleaned replay
  • the next test should condition this event trigger on the existing long gates and market fingerprint state, rather than using it as a naked side switch

Leverage-as-conviction win-probe sweep

Follow-up thesis:

leverage is a conviction expression

if a high-conviction short probe wins:
  make subsequent / next trades LONG

if leverage is below roughly 0.69:
  possibly do not trade

The initial test used:

trigger_lev = 0.70
trade_min_lev = 0.69
win = net PnL > 0

Two side-selection forms were tested:

  • persistent shadow probe: the short engine continues to run as a shadow. A high-lev short-shadow win turns the traded side LONG. A high-lev short-shadow loss resets the traded side SHORT.
  • one-shot after win: a high-lev short-shadow win arms only the next eligible trade as LONG, then resets.

The test used the same cleaned BLUE sequence as the post-outlier study, updated through 2026-05-08 13:40:04 UTC:

  • ClickHouse rows: 1307
  • ClickHouse unique trade IDs: 1298
  • trader-log exit rows: 1716
  • merged near-duplicate-cleaned trade IDs: 1612
  • analysis subset after excluding hibernate / subday ACB exits: 1324

Baselines:

  • always short: 1324 trades, WR 55.82%, mean return/trade +0.0784%, compounded return +168.02%, max DD 15.70%, PnL +$11,135.86
  • always long on the same skeleton: WR 38.44%, compounded return -80.23%, max DD 80.62%, PnL -$36,875.48
  • short-only with trade_min_lev >= 0.69: 1050 trades, compounded return +81.86%, max DD 20.80%, PnL +$11,063.86
  • short-only with trade_min_lev >= 5.0: 565 trades, compounded return +88.08%, max DD 8.94%, PnL +$11,980.01
  • short-only with trade_min_lev >= 8.5: 501 trades, compounded return +82.57%, max DD 7.58%, PnL +$12,193.65

Initial 0.70 / 0.69 thesis result:

  • persistent shadow-probe switch:
    • traded: 1050
    • LONG trades: 457
    • flips to LONG: 249
    • WR 37.08%
    • compounded return -5.61%
    • max DD 26.60%
    • PnL -$2,527.65
  • one-shot after high-lev win:
    • traded: 1050
    • LONG trades: 455
    • flips to LONG: 456
    • WR 37.24%
    • compounded return -3.56%
    • max DD 26.19%
    • PnL -$2,113.83

So the literal initial thesis fails. 0.70 is too low as a side-switch trigger. It arms hundreds of LONG trades and turns a strong short-led ledger into a slightly losing one.

Important evaluation frame:

The goal is not to find a LONG overlay that beats the whole short-only engine by itself. The goal is to find a side-selection overlay that adds marginal value only on the subset where it intervenes. The correct comparison is therefore:

overlay_delta =
    pnl_if_intervened_long_on_triggered_trades
  - pnl_if_original_short_was_left_unchanged_on_same_triggered_trades

The overlay is useful only if it satisfies all of the following:

  • it has positive overlay_delta after fees and conservative slippage
  • it reduces realized drawdown or loss clustering on the intervention subset
  • it does not cut too many profitable short trades
  • it remains positive across time splits, assets, and neighboring thresholds
  • it has enough triggers to be statistically more than a single accident
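The marginal comparison can be sketched as a function over the triggered subset only. The function name and the per-trade list inputs are illustrative; the example call uses the aggregate post-outlier next-trade figures reported earlier in this note as single-element lists.

```python
def overlay_delta(flipped_long_pnl: list[float],
                  original_short_pnl: list[float]) -> float:
    """PnL of intervening LONG minus PnL of leaving SHORT, on the same trades."""
    if len(flipped_long_pnl) != len(original_short_pnl):
        raise ValueError("both series must cover the same triggered trades")
    return sum(flipped_long_pnl) - sum(original_short_pnl)


# Post-outlier subset totals: flipped LONG +$409.47 vs. left SHORT -$1,725.40.
delta = overlay_delta([409.47], [-1725.40])
print(f"overlay_delta = {delta:+,.2f}")
```

Untriggered trades are identical under both policies and cancel out, so this delta equals the whole-ledger dollar difference between the overlay policy and the short-only baseline.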

Under that marginal-overlay framing, the broad leverage-win thesis still fails:

  • persistent 0.70 / 0.69 switch delta vs same lev >= 0.69 short-only baseline: about -$13,591.51
  • one-shot 0.70 / 0.69 switch delta vs same lev >= 0.69 short-only baseline: about -$13,177.69
  • best swept dollar switch delta vs same lev >= 0.69 short-only baseline: about -$5,949.36

By contrast, the narrower post-outlier rule did show positive marginal overlay value on its triggered subset:

  • triggered next-trade cases: 47
  • leaving the next trade SHORT: PnL -$1,725.40
  • flipping only that next trade LONG: PnL +$409.47
  • marginal overlay delta: +$2,134.87
  • whole-sequence drawdown improved from about 15.70% to 13.33%

That is the key distinction. The broad high-leverage-win rule is not reliable enough. The narrow post-outlier rule is a legitimate candidate for guarded shadow/live-probe research because it adds value exactly where it intervenes, but the sample is still too small for unconditional deployment.

Lowered big-win threshold grid

The phrase "sample too small" applies only to the original high-tail trigger (pnl_abs >= $400, lev >= 8.5, immediate next trade). It does not mean the BLUE ledger is small. The cleaned replay now spans:

  • 1328 non-hibernate / non-subday-ACB BLUE trades
  • 1616 merged near-duplicate-cleaned trade IDs
  • 2026-03-31 01:10:34 UTC through 2026-05-08 14:21:31 UTC

To test whether the effect survives with more triggers, the post-win sweep was expanded to:

  • dollar win thresholds: $10, $25, $50, $75, $100, $150, $200, $300, $400, $500, $750, $1000
  • leverage thresholds: 0, 0.69, 0.70, 1, 2, 3, 5, 8.5, 9
  • return thresholds: 0, 0.10%, 0.25%, 0.50%, 0.75%, 0.95%, 1.25%
  • follow-on horizons: next 1, 2, 3, and 5 trades

Important result:

  • lowering dollar threshold alone does not work
  • lowering dollar threshold with a realized-return threshold does work
  • the effect is mostly next 1 to 2 trades
  • by next 5 trades, flipping LONG is not positive; cooldown / abstain is better than LONG if the horizon is that wide

Grid-wide stability:

  • horizon 1: 630 eligible threshold combinations, 60.0% positive marginal delta, 45.87% positive LONG PnL
  • horizon 2: 630 eligible threshold combinations, 57.30% positive marginal delta, 39.52% positive LONG PnL
  • horizon 3: 693 eligible threshold combinations, 59.60% positive marginal delta, 12.99% positive LONG PnL
  • horizon 5: 693 eligible threshold combinations, 51.08% positive marginal delta, 0.0% positive LONG PnL

This says the post-win effect is a short-lived exhaustion / rebound artifact, not a durable multi-trade LONG regime.

Fixed dollar-only immediate-next-trade rows:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---|---|---|---|---|---|
| $10+, no lev gate | 277 | +$3,044 | -$9,146 | -$12,190 | +24.74% | 24.55% |
| $50+, no lev gate | 181 | +$4,495 | -$8,870 | -$13,365 | +42.58% | 22.18% |
| $100+, no lev gate | 135 | +$908 | -$4,252 | -$5,160 | +97.78% | 18.09% |
| $200+, no lev gate | 89 | -$947 | -$1,496 | -$549 | +140.76% | 14.96% |
| $300+, no lev gate | 62 | -$1,695 | -$45 | +$1,651 | +174.25% | 13.70% |
| $400+, no lev gate | 48 | -$1,725 | +$407 | +$2,133 | +180.70% | 13.33% |
| $500+, no lev gate | 40 | -$1,153 | +$90 | +$1,242 | +173.51% | 13.33% |

Dollar-only conclusion:

  • below about $300, the next short trade is still net-profitable or less bad than the LONG flip
  • around $300, the next short trade turns bad, but LONG is only near-flat
  • around $400 to $500, the next-trade LONG flip becomes positive

Fixed immediate-next-trade rows with a +0.75% realized-return trigger:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---|---|---|---|---|---|
| $10+ and +0.75% | 99 | -$1,735 | -$409 | +$1,326 | +104.45% | 14.03% |
| $50+ and +0.75% | 74 | -$1,950 | +$105 | +$2,055 | +155.62% | 14.03% |
| $75+ and +0.75% | 70 | -$2,028 | +$194 | +$2,223 | +166.91% | 13.95% |
| $100+ and +0.75% | 67 | -$2,083 | +$336 | +$2,419 | +168.60% | 13.69% |
| $150+ and +0.75% | 63 | -$2,082 | +$344 | +$2,426 | +175.37% | 13.69% |
| $300+ and +0.75% | 58 | -$1,738 | +$58 | +$1,796 | +173.61% | 13.70% |
| $400+ and +0.75% | 48 | -$1,725 | +$407 | +$2,133 | +180.70% | 13.33% |

Return-conditioned conclusion:

  • the effect becomes visible with more triggers when the dollar threshold is lowered to $50-$150 and the prior win is also at least +0.75%
  • the best immediate-next-trade delta in this grid was around $150+ and +0.75%: 63 next trades, SHORT -$2,081.81, LONG +$343.94, delta +$2,425.75
  • the original $400+, high-leverage trigger remains good but is not the only viable threshold; it is the cleaner high-tail version

Two-trade horizon:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---|---|---|---|---|---|
| $300+, lev >= 8.5 | 115 | -$3,201 | +$511 | +$3,712 | +168.52% | 14.27% |
| $400+, lev >= 8.5 | 91 | -$2,689 | +$556 | +$3,245 | +175.26% | 13.71% |
| $500+, lev >= 8.5 | 75 | -$2,237 | +$509 | +$2,747 | +167.53% | 14.71% |

Two-trade conclusion:

  • the high-leverage $300-$500 zone supports a two-trade exhaustion rebound more strongly than the original one-trade-only statement
  • the best two-trade variant in this fixed grid was $300+, lev >= 8.5, next two trades: delta +$3,712, estimated LONG PnL +$511
  • the five-trade horizon should not be traded LONG; it is only a damage-control / cooldown signal

Reliability statement:

The post-win overlay is more solid than initially stated. The robust form is not "after any win"; that is false. The robust form is:

after a sufficiently large realized short win,
especially a high-return or high-leverage win,
the next 1-2 short-engine opportunities are often contaminated by rebound risk
and can be improved by LONG flip or, at minimum, cooldown/abstain.

The strongest candidates for shadow/live-probe research are:

  • immediate next trade after $100-$200 win and prior return >= +0.75%
  • immediate next trade after $400+ win, especially lev >= 8.5
  • next two trades after $300-$500 win with lev >= 8.5

Guardrail:

The overlay should not optimize on WR. LONG WR remains lower than SHORT WR on many triggered subsets. The edge is payoff asymmetry / loss-tail avoidance: short wins become smaller or disappear after the exhaustion event, while short losses on the next trade(s) become expensive.

Candidate codified overlay rule and EFSM

Terminology:

  • EFSM means Execution FSM
  • refer to this component as the post-win EFSM, not merely a generic "state machine"

Candidate rule proposed after the lowered-threshold sweep:

after a completed BLUE SHORT trade:

  if pnl_abs > $397:
      tag next 1 trade as FLIP_LONG

  if pnl_abs > $397 and leverage > 8.6:
      tag next 2 trades as FLIP_LONG

  if 0 < pnl_abs < $250 and pnl_pct >= +0.75%:
      tag next 1 trade as FLIP_LONG

  after the armed slots are consumed:
      reset to SHORT

EFSM semantics:

  • this is a slot-based Execution FSM, not a persistent regime switch
  • each trigger arms an explicit number of future slots
  • each future entry consumes exactly one slot
  • when slots_remaining == 0, the state resets to SHORT
  • while slots are active, new triggers are ignored by default
  • a flipped LONG trade outcome is not allowed to re-arm the overlay
  • this prevents the reset bug where one flipped trade recursively arms the next and converts a bounded rebound probe into an unbounded side switch
  • the implementation supports arbitrary future slot counts, not only 1 and 2
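The semantics above can be illustrated with a small slot-based machine. This is a sketch written for this note, not the code in adaptive_exit/post_win_long_overlay.py: the class and method names are hypothetical, TTL handling is omitted, and pnl_pct is assumed to be a fraction.

```python
from dataclasses import dataclass


@dataclass
class SlotFlipFSM:
    """Illustrative slot-based FSM: triggers arm N future LONG slots."""
    slots_remaining: int = 0
    ignored_rearms: int = 0

    @staticmethod
    def slots_for(pnl_abs: float, leverage: float, pnl_pct: float) -> int:
        """Number of slots the candidate rule would arm for a completed SHORT."""
        if pnl_abs > 397 and leverage > 8.6:
            return 2
        if pnl_abs > 397:
            return 1
        if 0 < pnl_abs < 250 and pnl_pct >= 0.0075:
            return 1
        return 0

    def on_exit(self, side: str, pnl_abs: float, leverage: float,
                pnl_pct: float) -> None:
        """Process a completed trade; only SHORT outcomes may arm the overlay."""
        if side != "SHORT":
            return  # a flipped LONG outcome never re-arms
        slots = self.slots_for(pnl_abs, leverage, pnl_pct)
        if slots == 0:
            return
        if self.slots_remaining > 0:
            self.ignored_rearms += 1  # new triggers are ignored while active
            return
        self.slots_remaining = slots

    def next_side(self) -> str:
        """Side for the next entry; each entry consumes exactly one slot."""
        if self.slots_remaining > 0:
            self.slots_remaining -= 1
            return "LONG"
        return "SHORT"


fsm = SlotFlipFSM()
fsm.on_exit("SHORT", pnl_abs=466.34, leverage=9.0, pnl_pct=0.00929)
print([fsm.next_side() for _ in range(3)])  # two armed slots, then back to SHORT
```

Keeping the arming logic in `on_exit` and the consumption logic in `next_side` makes the no-recursive-re-arm invariant easy to test: no sequence of LONG exits can ever raise `slots_remaining`.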

Implementation location:

  • EFSM: adaptive_exit/post_win_long_overlay.py
  • canonical class names: PostWinExecutionFSM, PostWinExecutionFSMConfig
  • compatibility aliases: PostWinLongOverlay, PostWinLongOverlayConfig
  • tests: prod/tests/test_post_win_long_overlay.py

Focused test coverage:

  • $397+ non-high-leverage win arms one slot
  • $397+ and lev > 8.6 arms two slots
  • < $250 and pnl_pct >= +0.75% arms one slot
  • active arms consume deterministically and reset to SHORT
  • re-arm attempts while active are ignored
  • flipped LONG outcomes cannot re-arm
  • optional TTL expiry works
  • future 3+ slot rules work

Focused verification:

python -m pytest -o cache_dir=/tmp/pytest-cache-post-win-overlay \
  prod/tests/test_post_win_long_overlay.py -q

7 passed

Exact candidate replay, no re-arm during active flip slots:

  • input: 1333 cleaned BLUE trades through 2026-05-08 14:34:57 UTC
  • baseline short-only estimated PnL: +$10,953.50
  • candidate policy estimated PnL: +$12,464.30
  • marginal dollar delta: +$1,510.80
  • baseline max DD: 15.70%
  • candidate max DD: 14.78%
  • long-flipped trades: 160
  • affected subset left SHORT: -$2,415.46
  • affected subset flipped LONG: -$904.67
  • affected subset marginal delta: +$1,510.80
  • triggers armed:
    • small_dollar_high_return: 77
    • big_win_high_lev: 41
    • big_win: 1
  • slots consumed:
    • small_dollar_high_return: 77
    • big_win_high_lev: 82
    • big_win: 1
  • consumed arms: 119
  • dangling slots at end: 0
  • ignored re-arm attempts while active: 20
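The arm, slot, and dollar figures above reconcile arithmetically. The short check below uses only numbers reported in this replay; the slots-per-arm mapping is taken from the candidate rule, and the variable names are illustrative.

```python
# Arms per trigger type, as reported in the replay.
arms = {"small_dollar_high_return": 77, "big_win_high_lev": 41, "big_win": 1}
# Slots armed by each trigger type, from the candidate rule.
slots_per_arm = {"small_dollar_high_return": 1, "big_win_high_lev": 2, "big_win": 1}

slots_consumed = {name: arms[name] * slots_per_arm[name] for name in arms}
total_arms = sum(arms.values())             # consumed arms
total_slots = sum(slots_consumed.values())  # long-flipped trades

# Marginal delta on the affected subset: flipped LONG minus left SHORT.
subset_delta = -904.67 - (-2415.46)
# Marginal delta on the full book: candidate minus baseline.
book_delta = 12464.30 - 10953.50
```

`total_arms` is 119 and `total_slots` is 160, matching the reported consumed arms and long-flipped trades. The subset and book deltas agree to within one cent of rounding.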

Reset sensitivity:

Allowing active flipped trades / active arms to re-arm is harmful:

  • unsafe recursive re-arm variant long flips: 183
  • unsafe marginal delta: -$5,425.32
  • safe no-rearm marginal delta: +$1,510.80

Therefore the no-recursive-rearm reset invariant is not optional. It is part of the edge definition.

Compound-return caveat:

  • baseline short-only compound: +164.89%
  • candidate compound: +107.26%

This is why the overlay must be treated as a dollar-tail / drawdown-control overlay first, not as a compounding optimizer. The current counterfactual uses the same entry/exit skeleton and an estimated flipped LONG PnL, so the next validation step must include actual LONG execution assumptions, long-side V7 behavior, and time-to-next-entry gating.

Time dependency:

The replay showed material timing dependence:

| Delay from trigger to flipped entry | n | SHORT PnL | LONG PnL | Delta |
|---|---:|---:|---:|---:|
| <=15m | 19 | +$2,765.51 | -$3,062.37 | -$5,827.88 |
| 15-30m | 67 | -$3,588.76 | +$2,381.96 | +$5,970.72 |
| 30-60m | 40 | -$882.57 | -$104.33 | +$778.24 |
| >60m | 34 | -$709.64 | -$119.93 | +$589.72 |

This means the overlay may need a lower-bound delay, an upper-bound TTL, or market-state confirmation. The current EFSM already supports TTL; the exact timing gate remains research, not deployed doctrine.
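A delay window of this kind could be expressed as a pure predicate in front of the flipped entry. The sketch below is an assumption-laden illustration: the function name `flip_allowed` and its parameters are hypothetical, and the 15-minute value in the usage lines only mirrors the first bucket of the replay; it is not a recommended setting.

```python
from datetime import datetime, timedelta
from typing import Optional


def flip_allowed(trigger_ts: datetime, entry_ts: datetime,
                 min_delay: timedelta,
                 ttl: Optional[timedelta] = None) -> bool:
    """Return True if a flipped entry falls inside the allowed delay window.

    min_delay is an exclusive lower bound (delays <= min_delay are rejected);
    ttl is an optional inclusive upper bound.
    """
    delay = entry_ts - trigger_ts
    if delay <= min_delay:
        return False
    if ttl is not None and delay > ttl:
        return False
    return True


t0 = datetime(2026, 5, 8, 12, 0, 0)
too_early = flip_allowed(t0, t0 + timedelta(minutes=10), timedelta(minutes=15))
in_window = flip_allowed(t0, t0 + timedelta(minutes=20), timedelta(minutes=15))
```

Keeping the gate separate from the slot counter leaves open whether a rejected entry should consume its slot or leave it armed, which the replay does not settle.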

AdvancedExitManagerV7 / AlphaExitEngineV7 caveat:

AlphaExitEngineV7 is mechanically side-aware:

  • side=0 means LONG
  • side=1 means SHORT
  • PnL, MFE, MAE, trend direction, and adverse/favorable movement are signed by ctx.side
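The side convention can be illustrated with a sign flip on the price return. This is a minimal sketch, assuming only the `side=0` LONG / `side=1` SHORT encoding stated above; the helper names `signed_return` and `mfe_mae` are hypothetical and do not come from AlphaExitEngineV7.

```python
from typing import Sequence, Tuple


def signed_return(side: int, entry_price: float, price: float) -> float:
    """Fractional return signed by side: 0 = LONG, 1 = SHORT."""
    direction = 1.0 if side == 0 else -1.0
    return direction * (price - entry_price) / entry_price


def mfe_mae(side: int, entry_price: float,
            prices: Sequence[float]) -> Tuple[float, float]:
    """Return (MFE, MAE) as non-negative fractions over a price path."""
    returns = [signed_return(side, entry_price, p) for p in prices]
    mfe = max(0.0, max(returns))   # best favorable excursion
    mae = max(0.0, -min(returns))  # worst adverse excursion
    return mfe, mae


path = [101.0, 98.0, 100.5]
long_mfe, long_mae = mfe_mae(0, 100.0, path)
short_mfe, short_mae = mfe_mae(1, 100.0, path)
```

On the same path the LONG excursions are the SHORT excursions swapped, which is the mechanical symmetry the text describes. Fitted thresholds applied to those excursions need not share that symmetry.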

However, V7 calibration is SHORT-lineage:

  • bounce model labels were trained on BLUE SHORT adverse-bar samples
  • pressure threshold 2.69 was selected on SHORT/GREEN-lineage replay
  • MAE/MFE concepts are symmetric in code but not guaranteed symmetric in fitted thresholds or bounce probabilities

Before any live FLIP_LONG execution, V7 must be validated in one of these modes:

  • shadow-only LONG contexts using actual flipped LONG entries
  • conservative LONG-specific V7 threshold override
  • disable V7 live exits for overlay LONGs until enough shadow decisions show it does not prematurely cut the rebound edge

The rule can be codified, but production wiring must keep the EFSM, side selection, and V7 exit policy explicitly separable.

Sweep results:

  • best by compounded return:
    • mode: one-shot after win
    • trigger_lev = 9.0
    • trade_min_lev = 0.0
    • traded: 1324
    • LONG trades: 222
    • WR 50.91%
    • compounded return +61.93%
    • max DD 19.36%
    • PnL -$257.03
  • best by estimated dollars:
    • mode: one-shot after win
    • trigger_lev = 2.0
    • trade_min_lev = 0.69
    • traded: 1050
    • LONG trades: 297
    • WR 40.03%
    • compounded return +27.71%
    • max DD 22.44%
    • PnL +$5,114.50

Both sweep optima still underperform the relevant short-only baselines. In particular, simply treating high leverage as a short-side quality filter is stronger than using high-leverage short wins as a broad long-switch trigger:

  • lev >= 8.5, short-only: PnL +$12,193.65, max DD 7.58%
  • best long-switch dollar policy: PnL +$5,114.50, max DD 22.44%

Interpretation:

  • leverage does behave like conviction, but the first-order use is filtering / sizing, not side inversion
  • ordinary high-lev wins are too common to serve as a LONG regime switch
  • the previous post-outlier result survives only because it was much narrower: a large dollar win, 9x leverage, and the immediate next trade only
  • high-lev wins may still be useful as features in the dual-shadow / market-fingerprint layer:
    • last_high_lev_short_win
    • last_high_lev_short_win_count
    • last_high_lev_short_win_pnl_abs
    • last_high_lev_short_win_return_pct
    • bars_since_high_lev_short_win
    • consecutive_high_lev_short_wins

Research conclusion:

  • do not implement the literal lev > 0.70 long switch
  • do preserve leverage as a strong conviction feature
  • do keep the narrower post-outlier one-trade long probe in the research queue
  • the strongest immediate operational lesson is that low-leverage trades may be unnecessary, while high-leverage shorts remain the cleaner expression

AlphaExitEngineV7 LONG calibration replay

Date: 2026-05-08

Scope:

  • system: BLUE only
  • exit engine: AlphaExitEngineV7
  • harness: adaptive_exit/calibrate_v7_long_from_journal.py
  • source data: ClickHouse dolphin.v7_decision_events
  • source rows: 6,812
  • reconstructed BLUE V7-tracked paths: 97
  • path side in source journal: SHORT
  • replay side for calibration: synthetic LONG (side=0)
  • fee assumption: 4 bps
  • natural exit comparator: final logged decision-row price for the same path
  • V7 exit comparator: first replayed V7 EXIT on the same price path
  • bounce model: disabled for this replay by intentionally using a missing model path, because the current bounce model is trained on BLUE SHORT adverse-bar samples and should not be treated as a validated LONG probability model

This is a LONG-exit calibration proxy, not proof from exchange-filled LONG trades. It answers a narrower question: if the post-win EFSM had flipped a trade LONG on price paths that BLUE V7 actually observed, would a LONG-side V7 cut/exit surface have improved or harmed the synthetic LONG outcome versus holding to the path's natural end?
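The comparison between the two comparators can be sketched on a single path. This is an illustration only: the function names are hypothetical, the decision labels are assumed to be plain strings, and the 4 bps fee is assumed to be charged once on notional per round trip, which the replay description does not specify.

```python
from typing import Sequence, Tuple


def long_pnl(entry: float, exit_price: float, notional: float,
             fee_bps: float = 4.0) -> float:
    """Dollar PnL of a synthetic LONG, net of an assumed round-trip fee."""
    gross = notional * (exit_price - entry) / entry
    return gross - notional * fee_bps / 1e4


def compare_exits(prices: Sequence[float], decisions: Sequence[str],
                  notional: float, fee_bps: float = 4.0) -> Tuple[float, float, float]:
    """Return (natural PnL, V7 PnL, delta) for one replayed path.

    Natural exit uses the final logged price; the V7 exit uses the price at
    the first EXIT decision, falling back to natural if V7 never fires.
    """
    entry = prices[0]
    natural = long_pnl(entry, prices[-1], notional, fee_bps)
    exit_idx = next((i for i, d in enumerate(decisions) if d == "EXIT"), None)
    if exit_idx is None:
        return natural, natural, 0.0
    v7 = long_pnl(entry, prices[exit_idx], notional, fee_bps)
    return natural, v7, v7 - natural


natural, v7, delta = compare_exits(
    prices=[100.0, 102.0, 99.0],
    decisions=["HOLD", "EXIT", "HOLD"],
    notional=1000.0,
)
```

A surface that never emits EXIT reproduces the natural outcome exactly, so its delta is zero by construction.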

Original V7 SHORT calibration pattern

The original V7 calibration was a pressure-threshold sweep over live shadow decisions. V7 computes:

exit_pressure = clamp(directional_term + risk_term, -3.0, +3.0)

Then:

if exit_pressure > 2.69:
    EXIT
elif exit_pressure > 1.0:
    RETRACT
elif exit_pressure < -0.5 and pnl_pct > 0:
    EXTEND
else:
    HOLD
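The clamp and ladder above translate directly into a small pure function. This is a sketch for reading the thresholds, not the engine's code: the function name `v7_action` and its keyword names are hypothetical, with defaults copied from the values quoted in this section.

```python
def v7_action(directional_term: float, risk_term: float, pnl_pct: float,
              exit_thr: float = 2.69, retract_thr: float = 1.0,
              extend_thr: float = -0.5,
              p_min: float = -3.0, p_max: float = 3.0) -> str:
    """Map pressure terms to EXIT / RETRACT / EXTEND / HOLD."""
    # Clamp the summed pressure into [p_min, p_max] before comparing.
    pressure = max(p_min, min(p_max, directional_term + risk_term))
    if pressure > exit_thr:
        return "EXIT"
    if pressure > retract_thr:
        return "RETRACT"
    if pressure < extend_thr and pnl_pct > 0:
        return "EXTEND"
    return "HOLD"


actions = [
    v7_action(2.0, 1.5, 0.0),     # sum 3.5 is clamped to 3.0
    v7_action(1.0, 0.5, 0.0),     # 1.5 sits between retract and exit
    v7_action(-1.0, 0.0, 0.01),   # negative pressure on a profitable trade
    v7_action(-1.0, 0.0, -0.01),  # negative pressure on a losing trade
]
```

The four calls evaluate to EXIT, RETRACT, EXTEND, and HOLD respectively.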

The documented SHORT lineage was:

| Pressure threshold | Fires | Result |
|---|---|---|
| 2.00 | 22/24 | +$439, ROI +1.67% |
| 2.35 | 17/24 | +$891, ROI +3.38% |
| 2.60 | 17/24 | +$891, ROI +3.38% |
| 3.00 | 14/24 | +$796, ROI +3.02% |
| base/no V7 | n/a | +$784, ROI +2.98% |

The deployed threshold 2.69 was chosen as the high end of the useful 2.35-2.70 band so V7 stayed closer to base behavior and avoided cutting winners on transient pressure.

Threshold surface now explicit

AlphaExitEngineV7 now accepts an optional per-engine AlphaExitV7Config. Defaults preserve the deployed SHORT-calibrated behavior. This lets BLUE instantiate separate SHORT and LONG V7 engines later without global mutation.

V7-specific configurable fields:

| Config field | Default | Meaning |
|---|---|---|
| rvol_w15 | 0.50 | realized-vol composite weight for 15-bar volatility |
| rvol_w30 | 0.30 | realized-vol composite weight for 30-bar volatility |
| rvol_w50 | 0.20 | realized-vol composite weight for 50-bar volatility |
| rvol_floor | 0.000001 | minimum realized-vol denominator |
| mae_tier1_k | 3.5 | MAE tier-1 multiplier on rv_comp |
| mae_tier2_k | 7.0 | MAE tier-2 multiplier on rv_comp |
| mae_tier3_k | 12.0 | MAE tier-3 multiplier on rv_comp |
| mae_tier1_floor | 0.005 | MAE tier-1 absolute floor |
| mae_tier2_floor | 0.012 | MAE tier-2 absolute floor |
| mae_tier3_floor | 0.025 | MAE tier-3 absolute floor |
| mae_tier1_risk | 0.5 | pressure contribution once tier 1 is breached |
| mae_tier2_risk | 0.8 | pressure contribution once tier 2 is breached |
| mae_tier3_risk | 1.2 | pressure contribution once tier 3 is breached |
| mae_accel_min_bars | 3 | minimum bars before adverse-acceleration gate can fire |
| mae_accel_peak_floor | 0.003 | adverse peak floor for MAE acceleration risk |
| mae_accel_risk | 0.6 | pressure contribution for MAE acceleration |
| mae_recovery_peak_floor | 0.004 | adverse peak floor for failed-recovery gate |
| mae_recovery_prev_min | 0.25 | prior recovery ratio required before snapback risk |
| mae_recovery_snapback_max | 0.10 | recovery ratio below which recovery is treated as failed |
| mae_recovery_risk | 1.0 | pressure contribution for failed recovery |
| mae_late_floor | 0.003 | MAE required before late adverse ramp applies |
| mae_late_start_frac | 0.60 | bars-held fraction where late adverse ramp starts |
| mae_late_risk_max | 0.4 | maximum late adverse pressure contribution |
| max_hold_ref_mult_3m | 3.0 | V7 internal max-hold reference multiplier |
| mfe_slope_peak_floor | 0.01 | peak favorable floor for convexity slope break |
| mfe_convexity_decay_exit | 0.35 | decay ratio for hard MFE giveback pressure |
| mfe_convexity_decay_soft | 0.20 | decay ratio for soft MFE giveback pressure |
| mfe_convexity_exit_risk | 1.5 | pressure contribution for hard MFE giveback |
| mfe_convexity_soft_risk | 0.3 | pressure contribution for soft MFE giveback |
| mfe_accel_floor | -0.00001 | MFE acceleration floor for adverse convexity |
| mfe_accel_peak_floor | 0.005 | peak favorable floor for MFE acceleration risk |
| mfe_accel_risk | 0.2 | pressure contribution for MFE acceleration risk |
| bounce_dir_w | 0.15 | bounce score directional-term weight |
| bounce_risk_w | 0.35 | bounce risk-term weight |
| bounce_rv_safe_floor | 0.00001 | bounce feature volatility denominator floor |
| exit_pressure_threshold | 2.69 | live EXIT threshold |
| retract_pressure_threshold | 1.0 | RETRACT threshold |
| extend_pressure_threshold | -0.5 | profitable EXTEND threshold |
| pressure_min | -3.0 | pressure clamp lower bound |
| pressure_max | 3.0 | pressure clamp upper bound |

Inherited V6 weight priors remain configurable through the existing WeightAdapter/WeightPriors seam. The new config is specifically for V7 threshold/gate surfaces and is init-time/per-engine configurable.

LONG replay results

Baseline synthetic LONG natural exit across the 97 paths:

  • natural PnL: -$328.84
  • natural WR: 59.79%
  • natural compound: +3.50%
  • natural max DD: 2.28%

The dollar PnL and compound can diverge because path notionals differ. For this exit calibration, dollar PnL is the more relevant metric because BLUE sizing is not uniform.
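A two-path toy example shows how the sign of dollar PnL and the sign of compound return can disagree. The numbers are invented for illustration, and the compound is assumed here to be the product of per-path fractional returns, which may differ from how the harness computes it.

```python
# (notional in dollars, fractional return) for two hypothetical paths.
paths = [
    (100.0, 0.10),    # small position, +10%
    (1000.0, -0.02),  # large position, -2%
]

# Dollar PnL weights each return by its notional.
dollar_pnl = sum(notional * ret for notional, ret in paths)

# Compound return treats every path as an equal-weight step.
compound = 1.0
for _, ret in paths:
    compound *= 1.0 + ret
compound -= 1.0
```

Here `dollar_pnl` is -$10 while `compound` is about +7.8%: the small winning path dominates the compound, while the large losing path dominates the dollars.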

Top tested surfaces:

| Candidate | V7 PnL | Delta vs natural | Exits | Exit rate | V7 WR | V7 max DD |
|---|---:|---:|---:|---:|---:|---:|
| mfe_risk_scale_0.5 | +$205.32 | +$534.15 | 36 | 37.11% | 50.52% | 1.69% |
| mfe_risk_scale_0.75 | +$205.32 | +$534.15 | 36 | 37.11% | 50.52% | 1.69% |
| combo_p1.7_mae0.75 | +$47.24 | +$376.08 | 51 | 52.58% | 47.42% | 1.55% |
| exit_p1.7 | +$36.88 | +$365.72 | 51 | 52.58% | 47.42% | 1.53% |
| exit_p2.0 | +$19.68 | +$348.52 | 41 | 42.27% | 49.48% | 1.53% |
| short_default / exit_p2.69 | +$1.43 | +$330.26 | 38 | 39.18% | 49.48% | 1.81% |
| exit_p3.0 | -$328.84 | $0.00 | 0 | 0.00% | 59.79% | 2.28% |

Interpretation:

  • The deployed SHORT default is not mechanically broken for LONG. It improved synthetic LONG dollar outcome by +$330.26 versus natural exit on the 97 replayed paths.
  • The best tested LONG proxy did not come from lowering the pressure threshold. It came from reducing MFE giveback/convexity pressure contribution (mfe_risk_scale_0.5 or 0.75).
  • Aggressively lowering exit_pressure_threshold to 1.4 over-fires: 78/97 exits, V7 PnL -$11.78, and many negative deltas. That resembles the original SHORT calibration failure at 2.0: pressure that is too sensitive cuts too much transient noise.
  • A moderate pressure threshold around 1.7-2.0 is useful, but still inferior to leaving pressure at 2.69 and reducing MFE-risk contributions in this proxy.

Recommended LONG overlay calibration candidate for shadow:

AlphaExitV7Config(
    mfe_convexity_exit_risk=0.75,
    mfe_convexity_soft_risk=0.15,
    mfe_accel_risk=0.10,
)

This is the mfe_risk_scale_0.5 surface. It keeps:

  • exit_pressure_threshold = 2.69
  • all MAE vol-normalized loss-cut thresholds unchanged
  • pressure clamp unchanged
  • bounce disabled or neutral until a LONG-trained bounce model exists
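The relationship between the SHORT defaults and the `mfe_risk_scale_0.5` values can be sketched with a frozen dataclass and `dataclasses.replace`. The class `MfeRiskSketch` and the helper `scale_mfe_risk` are hypothetical stand-ins that carry only three of the documented fields; they are not AlphaExitV7Config or its API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MfeRiskSketch:
    """Hypothetical stand-in holding only the three MFE risk contributions."""

    mfe_convexity_exit_risk: float = 1.5
    mfe_convexity_soft_risk: float = 0.3
    mfe_accel_risk: float = 0.2


def scale_mfe_risk(cfg: MfeRiskSketch, scale: float) -> MfeRiskSketch:
    """Return a new config with all MFE risk contributions multiplied by scale."""
    return replace(
        cfg,
        mfe_convexity_exit_risk=cfg.mfe_convexity_exit_risk * scale,
        mfe_convexity_soft_risk=cfg.mfe_convexity_soft_risk * scale,
        mfe_accel_risk=cfg.mfe_accel_risk * scale,
    )


short_default = MfeRiskSketch()
long_candidate = scale_mfe_risk(short_default, 0.5)
```

Scaling by 0.5 yields 0.75, 0.15, and 0.10, the values in the recommended candidate. Because the dataclass is frozen and `replace` returns a new instance, the default object is left untouched, which matches the per-instance requirement described for the real config.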

Why this candidate is preferable to simply lowering exit_pressure_threshold:

  • it preserved the useful loss-cut behavior while avoiding broad pressure over-firing
  • it improved dollar PnL more than all pressure-threshold sweeps tested
  • it left MAE protection intact, which matters if the flipped LONG thesis is wrong and the asset continues down
  • it respects that the post-win EFSM edge is a rebound/cooldown edge, so the exit manager should not over-penalize ordinary post-entry MFE shape

Do not deploy this LONG config live yet. It should first be run in shadow on actual EFSM-flipped candidate LONG contexts, because this replay uses SHORT entries inverted to LONG and not real LONG fills.

Regression and safety notes

Implemented code seams:

  • nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py defines AlphaExitV7Config
  • default AlphaExitEngineV7() behavior remains the SHORT-calibrated config
  • a LONG-specific engine can be instantiated with AlphaExitEngineV7(config=...)
  • the calibration harness writes full results to /tmp/v7_long_calibration.json

Tests added:

  • default config equals the legacy SHORT threshold surface
  • custom config is per-instance and does not mutate the default engine
  • V7 remains mechanically side-aware for LONG and SHORT PnL/MFE/MAE
  • BLUE live V7 provider wiring still records journal decisions and uses OB signal input
  • EFSM reset/no-recursive-rearm tests remain separate from V7 exit calibration

Research caveats:

  • only 97 V7-tracked BLUE paths existed in the current decision journal
  • this is enough to reject obviously bad LONG exit settings, but not enough to canonize a live LONG exit policy
  • bounce must remain neutral for LONG until trained or validated on LONG samples
  • V7 max_hold_ref_mult_3m still uses an internal time reference rather than the orchestrator's effective max hold; the system bible already tracks this as a V7 TODO/bug because it can make adverse-ramp pressure fire too early