LONG Deterministic Rule Research
Date: 2026-05-07
Goal
Find the simplest deterministic long-side market rule, using primarily Dolphin NG eigendata, that behaves like the original short Alpha Engine rule in spirit:
- few moving parts
- market-structural
- explainable in one breath
- reliable enough to serve as a basal gate before asset selection and later overlays
This note is explicitly not about a fitted long model.
Data source
The analysis uses the raw daily scan cache summarized by:
- adaptive_exit/characterize_long_signals.py
- /mnt/dolphin_training/long_signal_research/long_signal_scan_summary_h24.parquet
- /mnt/dolphin_training/long_signal_research/long_signal_characterization_report.json
Only eigendata and scan-price-derived outcomes are used here:
- instability_50
- v50/v150/v300/v750_lambda_max_velocity
- vel_div
- vel_div lag / delta terms
No ExF, EsoF, or OBF are required for the core finding.
What does not work as the basal long rule
The obvious mirror thesis,
vel_div > 0.01
is too weak to be the basal long edge.
Recent HQ slice (2025-12-31 onward):
- support: 39.65%
- strong_long lift: 1.15x
- broad_long lift: 1.22x
That is not useless, but it is neither elegant nor selective enough to be
the long analogue of vel_div < -0.02.
Strongest deterministic shape
The long side shows up most clearly as a stressed unwind / squeeze regime, not as a generic bullish breakout regime.
Candidate primary deterministic rule
LONG_REGIME if
instability_50 >= 20.5
and v300_lambda_max_velocity < 0
and v750_lambda_max_velocity < 0
Interpretation:
- instability_50 >= 20.5: the market is structurally stressed
- v300 < 0 and v750 < 0: the slower eigenspace is still negative / damaged
- together: this is a high-stress unwind state where long opportunities tend to appear as reversals / squeezes on the same manifold that produces short dislocations
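The candidate gate above can be sketched as a small predicate. This is a minimal illustration, not the production gate: the `EigenSnapshot` container and the function name `is_long_regime` are hypothetical, and the field names simply mirror the eigendata columns named in this note.

```python
from dataclasses import dataclass

# Fixed threshold quoted in this note (rounded recent-HQ 90th percentile).
INSTABILITY_THRESHOLD = 20.5


@dataclass(frozen=True)
class EigenSnapshot:
    """Hypothetical container for one scan row; not an existing project type."""
    instability_50: float
    v300_lambda_max_velocity: float
    v750_lambda_max_velocity: float


def is_long_regime(row: EigenSnapshot) -> bool:
    """Candidate primary gate: high stress plus negative slow velocities."""
    return (
        row.instability_50 >= INSTABILITY_THRESHOLD
        and row.v300_lambda_max_velocity < 0
        and row.v750_lambda_max_velocity < 0
    )


if __name__ == "__main__":
    print(is_long_regime(EigenSnapshot(21.3, -0.4, -0.1)))  # stressed, slow velocities negative
    print(is_long_regime(EigenSnapshot(21.3, 0.2, -0.1)))   # v300 not negative
```

Keeping the gate as a pure function of one row makes it easy to evaluate on the scan cache and to compose with later asset-selection layers.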
Why 20.5
20.5 is the rounded recent-HQ instability_50 90th-percentile threshold
(20.546996...). It is the most practical fixed threshold found in the
recent-era characterization.
Empirical support
Recent HQ (2025-12-31 onward)
Base rates:
- strong_long: 0.1648
- broad_long: 0.1367
Rule:
- support: 6356 rows (3.58%)
- strong_long: 0.3409 (2.07x lift)
- broad_long: 0.3538 (2.59x lift)
Full history
Base rates:
- strong_long: 0.2603
- broad_long: 0.2472
Rule:
- support: 300,728 rows (12.59%)
- strong_long: 0.3330 (1.28x lift)
- broad_long: 0.3375 (1.37x lift)
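The lift figures quoted above are the hit rate under the rule divided by the unconditional base rate. A short sketch, using only the rates reported in this note (the helper name `lift` is illustrative):

```python
def lift(rule_rate: float, base_rate: float) -> float:
    """Lift = P(outcome | rule fires) / P(outcome)."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return rule_rate / base_rate


# (rule rate, base rate) pairs copied from the empirical-support figures.
RECENT_HQ = {"strong_long": (0.3409, 0.1648), "broad_long": (0.3538, 0.1367)}
FULL_HISTORY = {"strong_long": (0.3330, 0.2603), "broad_long": (0.3375, 0.2472)}

if __name__ == "__main__":
    for window, rates in (("recent HQ", RECENT_HQ), ("full history", FULL_HISTORY)):
        for label, (rule_rate, base_rate) in rates.items():
            print(f"{window} {label}: {lift(rule_rate, base_rate):.2f}x")
```

The gap between the recent-HQ and full-history lifts comes mostly from the much higher full-history base rates, not from a collapse in the conditional hit rate.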
Simpler fallback
If maximum elegance is preferred over extra selectivity, the one-factor fallback is:
LONG_REGIME_SIMPLE if instability_50 >= 20.5
Recent HQ:
- support: 10.10%
- strong_long: 0.3297 (2.00x lift)
- broad_long: 0.3420 (2.50x lift)
This is surprisingly strong for a one-variable rule. It is the closest thing
found to a pure long-side analogue of the short vel_div < -0.02 gate.
Tradeoff:
- simpler
- broader
- slightly less selective than adding v300 < 0 and v750 < 0
Optional stricter confirmation
If later tuning wants more explicit “healing after stress” confirmation, the strict variant is:
LONG_REGIME_STRICT if
instability_50 >= 20.5
and vel_div_lag6 < -0.03
and vel_div_delta6 > 0.02
This is directionally sensible, but it is not materially better than the
instability_50 + v300 + v750 rule, so it should be treated as an optional
refinement, not the basal rule.
Monthly sanity check
For the candidate primary rule (instability_50 >= 20.5 && v300 < 0 && v750 < 0)
in the recent HQ window:
- 2026-01: strong_long = 0.348
- 2026-02: strong_long = 0.344
- 2026-03: strong_long = 0.312
The monthly base rates for the same period were:
- 2026-01: 0.289
- 2026-02: 0.271
- 2026-03: 0.068
So even into the weak March tape, the rule remains elevated relative to base.
Practical interpretation
This should be viewed as a market-state gate, not a complete trade engine.
It says:
- “the market is in the sort of stressed, damaged regime where long squeeze / unwind opportunities become meaningfully more likely”
It does not by itself say:
- which asset is the best expression
- how to size
- how to exit
That is where the next layers belong:
- deterministic or learned asset selection
- OBF / ARS / bounce overlays
- TP / MAX_HOLD policy
Recommendation
If a single deterministic long gate must be named now, use:
LONG_REGIME if instability_50 >= 20.5 and v300 < 0 and v750 < 0
If maximum simplicity is the priority, use:
LONG_REGIME_SIMPLE if instability_50 >= 20.5
And explicitly do not promote vel_div > 0.01 as the basal long rule.
Deferred analysis idea: dual-shadow regime sampler
This is a later analysis / control-layer research note, not a live-rule recommendation.
One plausible way to sample the market in real time without committing the full system immediately is a very lightweight dual-shadow engine:
- Shadow A: the basal SHORT engine (vel_div < -0.02 Alpha Engine posture)
- Shadow B: the basal LONG engine (currently the older negative-vel_div mean-reversion LONG posture is the best simple candidate)
The intent is not merely paper PnL logging. It is to use live, recent sample-trade outcomes as a micro-regime probe:
- if SHORT shadow performance degrades while LONG shadow performance improves, the tape may have rotated into a LONG-favorable regime
- if LONG degrades while SHORT improves, the inverse may be true
- if both are performing acceptably, the tape may be permissive / broad enough that either side can express edge
- if both are failing, the tape is likely choppy / non-coherent and abstention becomes a first-class candidate
This should be implemented, if ever pursued, as:
- very fast
- very lightweight
- explicitly shadow-only at first
- based on small, recent sample trades rather than a heavy fitted model
Longer-term, the entire shadow stream can itself become training data:
- market fingerprints at shadow-entry time
- concurrent SHORT-shadow and LONG-shadow outcomes
- relative WR / ROI-per-trade / drawdown / time-to-win asymmetries
That would allow a later learner to predict or simplify the regime switcher. But even before ML, the dual-shadow process may already serve as a useful real-time market-sampling / regime-detection mechanism.
Dual-shadow persistence characterization
This section records the first persistence pass over extant trades. The goal was not to prove a full regime-switch system, but to test whether the observed short-loss streaks are durable enough to justify a regime-favorableness probe.
Important caveat:
- the live SHORT series and the replay LONG series are on different date spans
- this is therefore a side-specific persistence study, not a same-bar paired dominance study
- the numbers below are still useful for run-length and hysteresis design
Live SHORT stream
From the current BLUE trader log:
- trades: 234
- win rate: 44.44%
- mean pnl_pct: +0.000506
- median pnl_pct: -0.000234
- average win streak: 1.65 trades
- average loss streak: 2.03 trades
- P(win -> win) = 0.394
- P(loss -> loss) = 0.512
- average positive-day run: 1.5 days
- average negative-day run: 1.5 days
Interpretation:
- short failures do cluster
- the cluster is real enough to notice
- but it is only mildly persistent
- by itself, it is not strong enough to justify a raw ping-pong switch
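The streak and transition statistics in this section can be computed from an ordered win/loss sequence. The sketch below is one plausible way to do it, not the script that produced the numbers above; the function name `streak_stats` and the boolean encoding of outcomes are assumptions.

```python
from itertools import groupby


def streak_stats(wins: list[bool]) -> dict[str, float]:
    """Average run lengths and first-order transition rates for a win/loss sequence."""
    # Collapse the sequence into (outcome, run length) pairs.
    runs = [(is_win, sum(1 for _ in grp)) for is_win, grp in groupby(wins)]
    win_runs = [n for is_win, n in runs if is_win]
    loss_runs = [n for is_win, n in runs if not is_win]

    # Consecutive (previous, next) outcome pairs for the transition estimates.
    pairs = list(zip(wins, wins[1:]))
    after_win = [nxt for prev, nxt in pairs if prev]
    after_loss = [nxt for prev, nxt in pairs if not prev]

    def mean(values: list[int]) -> float:
        return sum(values) / len(values) if values else 0.0

    return {
        "avg_win_streak": mean(win_runs),
        "avg_loss_streak": mean(loss_runs),
        "p_win_given_win": mean([int(x) for x in after_win]),
        "p_loss_given_loss": mean([int(not x) for x in after_loss]),
    }


if __name__ == "__main__":
    print(streak_stats([True, True, False, False, False, True, False]))
```

For a first-order process the expected streak length is roughly 1 / (1 - P(same -> same)), which is consistent with the reported pairs (for example 0.394 and 1.65 trades).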
Basal LONG shadow, old mirror posture
Using the recent bullish-month replay and the single comparable 10-bar / worst_10bar configuration:
- trades: 2,243
- win rate: 48.33%
- mean pnl_pct: +0.000320
- median pnl_pct: -0.000400
- average win streak: 1.93 trades
- average loss streak: 2.07 trades
- P(win -> win) = 0.483
- P(loss -> loss) = 0.517
- average positive-day run: 3.0 days
- average negative-day run: 1.86 days
Interpretation:
- this is the clearest durable long-favorable candidate seen so far
- the multi-day positive run length is materially better than the live short stream
- this supports a long-favorable regime probe, but not an unconditional flip
Basal LONG shadow, new stressed-unwind posture
Same replay setup:
- trades: 569
- win rate: 50.44%
- mean pnl_pct: -0.000078
- median pnl_pct: +0.000068
- average win streak: 2.24 trades
- average loss streak: 2.20 trades
- P(win -> win) = 0.556
- P(loss -> loss) = 0.546
- average positive-day run: 1.36 days
- average negative-day run: 1.18 days
Interpretation:
- the new long posture has decent local persistence
- but it is more fragile than the mirror-long posture as a regime switch
- it does not yet justify itself as the primary flip trigger
Conclusion for regime switching
The data support a smoothed regime-favorableness detector, not a raw flip-on-first-loss system.
Practical reading:
- short-loss streak persistence is real but modest
- long-favorable states exist and can persist
- persistence is on the order of a few trades, not a dramatic regime lock
- the correct implementation is a shadow score with hysteresis and abstain logic, not a hard immediate SHORT/LONG switch
Suggested rule shape for later analysis:
- compute rolling shadow scores for SHORT and LONG
- use persistence thresholds before flipping
- require stronger evidence to reverse than to stay put
- abstain when both shadows are weak or both are losing
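The suggested rule shape can be sketched as a small hysteretic selector. Everything here is hypothetical: the class name, the score inputs, and especially the threshold values are placeholders for later calibration, not values derived from the data in this note.

```python
from dataclasses import dataclass


@dataclass
class HystereticSideSelector:
    """Illustrative selector; thresholds are uncalibrated placeholders."""
    flip_margin: float = 0.002  # score advantage the other side needs to challenge
    min_score: float = 0.0      # both scores at or below this -> abstain
    persistence: int = 3        # consecutive confirmations required before flipping
    side: str = "SHORT"
    _pending: int = 0

    def update(self, short_score: float, long_score: float) -> str:
        """Feed the latest rolling shadow scores; returns SHORT, LONG, or ABSTAIN."""
        if short_score <= self.min_score and long_score <= self.min_score:
            self._pending = 0
            return "ABSTAIN"  # both shadows weak: stand aside, keep the held side
        if self.side == "SHORT":
            current, other = short_score, long_score
        else:
            current, other = long_score, short_score
        if other - current >= self.flip_margin:
            self._pending += 1
            if self._pending >= self.persistence:
                self.side = "LONG" if self.side == "SHORT" else "SHORT"
                self._pending = 0
        else:
            self._pending = 0  # evidence lapsed; staying put needs no margin
        return self.side
```

Staying on the current side requires no evidence at all, while reversing requires both a margin and persistence, which is the asymmetry described in the list above.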
This is enough to justify the next engineering step:
- live dual-shadow logging on the same bars
- market-fingerprint tagging of each shadow entry
- later ML over shadow outcomes if the deterministic layer proves stable
Rolling flip-worthiness test
To make the side-switch question stricter, the recent live short slice was
retested with a 5-trade rolling shadow-delta proxy:
- short shadow return = actual live short pnl_pct
- long shadow return = counterfactual -pnl_pct - fee
- rolling delta = rolling mean of (long_shadow - short_shadow)
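The proxy defined above can be sketched as follows. The fee value is not stated for this test, so it is left as a required parameter; the function name and the choice to emit `None` until the window fills are assumptions.

```python
from collections import deque
from typing import Optional


def rolling_shadow_delta(
    short_pnl_pct: list[float], fee: float, window: int = 5
) -> list[Optional[float]]:
    """Rolling mean of (long_shadow - short_shadow) over the last `window` trades."""
    recent: deque = deque(maxlen=window)
    out: list[Optional[float]] = []
    for short_ret in short_pnl_pct:
        long_ret = -short_ret - fee          # counterfactual long on the same skeleton
        recent.append(long_ret - short_ret)  # per-trade shadow spread
        out.append(sum(recent) / window if len(recent) == window else None)
    return out


if __name__ == "__main__":
    print(rolling_shadow_delta([0.01, -0.01, 0.02], fee=0.0, window=2))
```

A positive rolling delta marks a long-favorable pocket and a negative one a short-favorable pocket; the flip-signal counts reported below would correspond to changes in that state under whatever threshold the original analysis used.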
Recent 3-day slice (2026-05-04 to 2026-05-06):
- trades: 168
- short actual WR: 39.88%
- short actual compounded return: +10.02%
- long counterfactual WR: 47.62%
- long counterfactual compounded return: -16.92%
- flip-to-long signals from the 5-trade rolling delta: 68
- flip-to-short signals from the 5-trade rolling delta: 79
Interpretation:
- the rolling delta does detect alternating regime pockets
- but it does so often enough that a raw flip would be too twitchy
- on the most recent 30 live trades, the regime buckets were:
  - 13 long-favorable
  - 7 short-favorable
  - 10 neutral
- the long-favorable bucket had positive expected PnL, but the short-favorable bucket was also positive and slightly stronger
The important point is that the signal is not “switch now on first loss.” It is:
- keep a smoothed side-dominance score
- require persistence before flipping
- use hysteresis
- abstain when the shadow spread is weak or oscillatory
So the stricter test reinforces the earlier conclusion:
- there is enough structure to justify a regime-favorableness detector
- there is not yet enough stability to justify a raw mechanical flip
- the right next step is live dual-shadow logging on the same bars, then threshold and persistence calibration on that shared stream
Flip-after-loss counterfactual
The actual live short ledger was also replayed under a simple finite-state side-switch rule:
- start SHORT
- if the current side loses N trades in a row, flip to the other side
- keep applying the same rule across the whole trade sequence
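The finite-state rule above can be replayed in a few lines. This is a sketch under stated assumptions, not the original replay script: the long leg is approximated as the negated short return minus a fee, a non-positive return counts as a loss, and the loss counter resets after each flip.

```python
def replay_flip_after_losses(
    short_returns: list[float], n: int, fee: float = 0.0
) -> dict[str, float]:
    """Replay a short ledger, flipping sides after `n` consecutive losses."""
    side = "SHORT"
    equity = peak = 1.0
    max_dd = 0.0
    loss_run = 0
    flips = 0
    for short_ret in short_returns:
        # Assumed counterfactual: the long leg mirrors the short leg, less a fee.
        realized = short_ret if side == "SHORT" else -short_ret - fee
        equity *= 1.0 + realized
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)
        loss_run = loss_run + 1 if realized <= 0 else 0
        if loss_run >= n:
            side = "LONG" if side == "SHORT" else "SHORT"
            loss_run = 0
            flips += 1
    return {"compounded_return": equity - 1.0, "max_dd": max_dd, "flips": flips}


if __name__ == "__main__":
    print(replay_flip_after_losses([-0.01, -0.01, 0.02, 0.01], n=2))
```

Sweeping `n` over a small grid on the live ledger is then a one-line loop over this function.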
This is the cleanest way to test the idea “short losses are the long cue.”
On the current 234-trade live ledger:
- always short: WR 44.44%, compounded return +11.35%, max DD 5.71%
- always long: WR 44.87%, compounded return -20.13%, max DD 23.09%
Threshold sweep:
- N=1: WR 40.60%, compounded return +5.33%, max DD 11.11%, flips 139
- N=2: WR 44.44%, compounded return -17.72%, max DD 17.77%, flips 43
- N=3: WR 48.29%, compounded return +5.48%, max DD 6.35%, flips 13
- N=4: WR 47.86%, compounded return +6.21%, max DD 6.55%, flips 7
- N=5: WR 43.59%, compounded return +10.52%, max DD 5.59%, flips 5
- N=6: WR 45.73%, compounded return +15.17%, max DD 4.84%, flips 3
Interpretation:
- side switching can help
- it helps most when the flip threshold is fairly high
- the best observed threshold in this small grid was N=6, the only setting that beat always-short on both compounded return and max DD
- low thresholds are too twitchy and can destroy the edge
So the practical conclusion is:
- a raw flip-on-first-loss rule is not justified
- a slower loss-cluster regime switcher is plausible
- the switcher must be hysteretic and persistence-gated
This is consistent with the earlier shadow-score recommendation and explains why the observed “8 or 9 losses, then a couple wins” pattern can be useful without being directly automatable at a low threshold.
Condition-gated flip replay
I then reran the side-switch counterfactual with an additional gate:
- the current side must first hit N consecutive losses
- the opposite side must also satisfy its own deterministic long/short entry condition
- the replay uses the same 10-bar tape skeleton and the worst-10-bar asset expression
Two long theories were tested separately:
- Old mirror-long: vel_div < -0.02 and cross-sectional 10-bar momentum < 0
- New stressed-unwind long: instability_50 >= 20.5 and v300 < 0 and v750 < 0
Results on the long research windows:
- old mirror-long becomes marginally usable only at high thresholds:
  - N=5: WR 47.00%, compounded return +6.34%, DD 46.23%, flips 11
  - N=6: WR 46.52%, compounded return +28.34%, DD 43.78%, flips 5
- the new stressed-unwind long does not survive this gate cleanly:
  - N=1..6: compounded return stays negative, with severe drawdown
Interpretation:
- the condition gate does not rescue the new long theory
- it does preserve the old mirror-long as a late, low-frequency fallback
- the market still looks too unstable for a low-threshold flip rule
- if we keep this path, it should be a smoothed regime sampler, not an immediate switcher
Report:
Full-history condition-gated replay
I then ran the same condition-gated flip simulator across the entire available price tape:
- root: /mnt/dolphin_training/share_offload/vbt_cache_klines
- rows: 2,553,401
- span: 2021-06-15 00:01:00+00:00 -> 2026-03-18 18:16:40.041456896+00:00
This is the hardest and most useful stress test because it removes the recent-slice bias entirely.
Results:
- old mirror-long
  - N=1..6 win rate range: 44.95% -> 46.60%
  - best mean PnL at N=6: -0.000163 per trade
  - best threshold still compounds to -100% over the full archive
- new stressed-unwind long
  - N=1..6 win rate range: 44.16% -> 46.86%
  - best mean PnL at N=6: -0.000218 per trade
  - best threshold also compounds to -100%
Interpretation:
- the condition gate does not rescue either long theory at full-archive scale
- the old mirror-long is still the stronger of the two, but only marginally
- the long-side edge, if it exists, is too weak or too regime-dependent to survive this archive-wide flip rule without additional filtering
- the full-tape result is a warning against over-trusting the favorable recent-month slices
Report:
Post-outlier-short-win long-flip probe
Motivation: the May 8 live footer showed a familiar-looking pattern:
- large 9x short win, e.g. ALGOUSDT +$466 or VETUSDT +$574
- immediately followed by a somewhat larger-than-normal short loss, e.g. DASHUSDT -$191 or STXUSDT -$54
The question was whether this is a real post-outlier rebound signature:
after a very large short win,
should the next trade, or next few trades, be treated as LONG candidates?
Dataset and hygiene:
- source: BLUE only
- ClickHouse dolphin.trade_events: 1305 rows, 1296 unique trade IDs
- trader logs: 1712 exit rows, 1092 unique trade IDs
- merged near-duplicate-cleaned sequence: 1609 unique trade IDs
- analysis subset after excluding hibernate / subday ACB exits: 1321 trades
- span: 2026-03-31 01:10:34 UTC to 2026-05-08 13:26:06 UTC
The log and warehouse streams overlap but do not have perfectly identical timestamps, so the analysis de-duplicates by trade id where possible and by near-time / asset / reason / realized PnL where the same exit was written by both paths. This matters because a naive merge double-counts many recent exits.
Counterfactual method:
- keep the same entry/exit skeleton
- actual side is the live BLUE short
- counterfactual long return is approximated as -short_return - 4 bps
- this is not a separately selected long engine; it only tests whether the immediate post-win tape direction would have favored the other side
Baseline over the cleaned sequence:
- always short: 1321 trades, WR 55.79%, mean return/trade +0.0781%, compounded return +166.36%, max DD 15.70%
- always long on the same skeleton: WR 38.46%, mean return/trade -0.1181%, compounded return -80.08%, max DD 80.48%
So the full ledger does not support a broad long flip. The question only survives as a narrow post-outlier condition.
Primary post-outlier trigger:
trigger if prior trade:
pnl_abs >= $400
leverage >= 8.5x
pnl_pct >= +0.50%
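The primary trigger and the same-skeleton counterfactual can be written down directly. In this sketch the function names are illustrative, and `pnl_pct` is assumed to be a fraction (so +0.50% is 0.0050); if the ledger stores percent units the constant needs rescaling.

```python
LONG_FEE = 0.0004  # 4 bps, as stated in the counterfactual method


def is_post_outlier_trigger(pnl_abs: float, leverage: float, pnl_pct: float) -> bool:
    """Primary post-outlier trigger evaluated on the prior (short) trade.

    Assumes pnl_pct is a fraction, e.g. 0.0050 for +0.50%.
    """
    return pnl_abs >= 400.0 and leverage >= 8.5 and pnl_pct >= 0.0050


def counterfactual_long_return(short_return: float, fee: float = LONG_FEE) -> float:
    """Approximate long return on the same entry/exit skeleton."""
    return -short_return - fee


if __name__ == "__main__":
    # ALGOUSDT example from the live tail: +466.34, 9x, +0.929%
    print(is_post_outlier_trigger(466.34, 9.0, 0.00929))
    print(counterfactual_long_return(-0.0020))
```

Because the counterfactual reuses the short engine's entries and exits, it measures tape direction after the trigger rather than the quality of a dedicated long engine.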
Immediate next-trade result:
- triggers: 47
- next trades affected: 47
- actual next short subset: WR 53.19%, mean return -0.0821%, compounded return -4.05%, realized PnL -$1,725.40
- flipped-to-long subset: WR 40.43%, mean return +0.0421%, compounded return +1.72%, estimated PnL +$409.47
- estimated dollar delta: +$2,134.88
- whole-sequence policy if only those next trades are flipped: compounded return improves from +166.36% to +182.38% and max DD improves from 15.70% to 13.33%
The stricter trigger pnl_abs >= $400, leverage >= 8.5x,
pnl_pct >= +0.95% is similar:
- triggers: 46
- actual next short subset: -$1,534.21
- flipped-to-long estimate: +$276.64
- estimated dollar delta: +$1,810.85
- whole-sequence compounded return: +180.91%
The effect is strongest on the immediately following trade. It decays quickly:
- next 2 trades after the primary trigger: affected 91, actual -$2,689.16, flipped estimate +$555.98, dollar delta +$3,245.15
- next 3 trades: affected 134, actual -$2,357.77, flipped estimate -$588.02, dollar delta still positive because the flip loses less
- next 5 trades: benefit becomes materially less clean
Examples from the live tail:
- ALGOUSDT 2026-05-08 09:55 UTC, +466.34, 9x, +0.929%
  - next trade DASHUSDT: actual short -191.19; same-skeleton long would have been directionally positive after fee
- VETUSDT 2026-05-08 12:37 UTC, +573.64, 9x, +1.546%
  - next trade STXUSDT: actual short -53.52; same-skeleton long would have been directionally positive after fee
- larger historic outlier STXUSDT 2026-05-05 20:29 UTC, +6796.86, 9x, +13.845%
  - the following trade was a small short loss, and the next several trades were mixed rather than uniformly long-favorable
Interpretation:
- there is a real event-conditioned post-outlier rebound / exhaustion signal
- it is not a win-rate improvement; it is a dollar / drawdown improvement
- it should not be promoted as a general long engine
- it is best framed as a one-trade post-outlier long probe or short cooldown candidate, not as a multi-trade regime flip
Relationship to the long-system research:
- this is different from both deterministic long theories already studied:
  - old mirror-long: negative vel_div mean-reversion long
  - new stressed-unwind long: high instability plus negative slow velocities
- the post-outlier signal is more local and path-conditioned:
- a violent short win likely means the chosen asset or local basket has just completed an exhaustion leg
- the next trade may be more exposed to rebound / adverse short continuation than to fresh downside continuation
- this should become a feature inside the dual-shadow side-selection sampler:
  - last_trade_was_outlier_short_win
  - last_trade_leverage
  - last_trade_realized_pnl_abs
  - last_trade_return_pct
  - bars_since_outlier_win
  - same_asset_or_correlated_asset_followup
Research conclusion:
- broad SHORT -> LONG inversion remains false on the full sequence
- immediate one-trade long probing after a large 9x short win is empirically plausible and improved historical BLUE dollars in this cleaned replay
- the next test should condition this event trigger on the existing long gates and market fingerprint state, rather than using it as a naked side switch
Leverage-as-conviction win-probe sweep
Follow-up thesis:
leverage is a conviction expression
if a high-conviction short probe wins:
make subsequent / next trades LONG
if leverage is below roughly 0.69:
possibly do not trade
The initial test used:
trigger_lev = 0.70
trade_min_lev = 0.69
win = net PnL > 0
Two side-selection forms were tested:
- persistent shadow probe: the short engine continues to run as a shadow. A high-lev short-shadow win turns the traded side LONG. A high-lev short-shadow loss resets the traded side SHORT.
- one-shot after win: a high-lev short-shadow win arms only the next eligible trade as LONG, then resets.
The test used the same cleaned BLUE sequence as the post-outlier study, updated
through 2026-05-08 13:40:04 UTC:
- ClickHouse rows: 1307
- ClickHouse unique trade IDs: 1298
- trader-log exit rows: 1716
- merged near-duplicate-cleaned trade IDs: 1612
- analysis subset after excluding hibernate / subday ACB exits: 1324
Baselines:
- always short: 1324 trades, WR 55.82%, mean return/trade +0.0784%, compounded return +168.02%, max DD 15.70%, PnL +$11,135.86
- always long on the same skeleton: WR 38.44%, compounded return -80.23%, max DD 80.62%, PnL -$36,875.48
- short-only with trade_min_lev >= 0.69: 1050 trades, compounded return +81.86%, max DD 20.80%, PnL +$11,063.86
- short-only with trade_min_lev >= 5.0: 565 trades, compounded return +88.08%, max DD 8.94%, PnL +$11,980.01
- short-only with trade_min_lev >= 8.5: 501 trades, compounded return +82.57%, max DD 7.58%, PnL +$12,193.65
Initial 0.70 / 0.69 thesis result:
- persistent shadow-probe switch:
  - traded: 1050
  - LONG trades: 457
  - flips to LONG: 249
  - WR 37.08%
  - compounded return -5.61%
  - max DD 26.60%
  - PnL -$2,527.65
- one-shot after high-lev win:
  - traded: 1050
  - LONG trades: 455
  - flips to LONG: 456
  - WR 37.24%
  - compounded return -3.56%
  - max DD 26.19%
  - PnL -$2,113.83
So the literal initial thesis fails. 0.70 is too low as a
side-switch trigger. It arms hundreds of LONG trades and turns a strong
short-led ledger into a slightly losing one.
Important evaluation frame:
The goal is not to find a LONG overlay that beats the whole short-only engine by itself. The goal is to find a side-selection overlay that adds marginal value only on the subset where it intervenes. The correct comparison is therefore:
overlay_delta =
pnl_if_intervened_long_on_triggered_trades
- pnl_if_original_short_was_left_unchanged_on_same_triggered_trades
The overlay is useful only if it satisfies all of the following:
- it has positive overlay_delta after fees and conservative slippage
- it reduces realized drawdown or loss clustering on the intervention subset
- it does not cut too many profitable short trades
- it remains positive across time splits, assets, and neighboring thresholds
- it has enough triggers to be statistically more than a single accident
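The marginal-overlay comparison is a difference of two sums over the same triggered trades. A minimal sketch, with an illustrative function name, checked against the post-outlier subset totals reported earlier in this note:

```python
def overlay_delta(
    long_pnl_on_triggered: list[float], short_pnl_on_triggered: list[float]
) -> float:
    """PnL if triggered trades were flipped LONG minus PnL if left SHORT."""
    if len(long_pnl_on_triggered) != len(short_pnl_on_triggered):
        raise ValueError("both legs must cover the same triggered trades")
    return sum(long_pnl_on_triggered) - sum(short_pnl_on_triggered)


if __name__ == "__main__":
    # Subset totals from the post-outlier study, passed as single aggregates.
    print(round(overlay_delta([409.47], [-1725.40]), 2))
```

Untriggered trades cancel out of the comparison entirely, which is why the overlay is judged only on the subset where it intervenes.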
Under that marginal-overlay framing, the broad leverage-win thesis still fails:
- persistent 0.70 / 0.69 switch delta vs same lev >= 0.69 short-only baseline: about -$13,591.51
- one-shot 0.70 / 0.69 switch delta vs same lev >= 0.69 short-only baseline: about -$13,177.69
- best swept dollar switch delta vs same lev >= 0.69 short-only baseline: about -$5,949.36
By contrast, the narrower post-outlier rule did show positive marginal overlay value on its triggered subset:
- triggered next-trade cases: 47
- leaving the next trade SHORT: PnL -$1,725.40
- flipping only that next trade LONG: PnL +$409.47
- marginal overlay delta: +$2,134.87
- whole-sequence drawdown improved from about 15.70% to 13.33%
That is the key distinction. The broad high-leverage-win rule is not reliable enough. The narrow post-outlier rule is a legitimate candidate for guarded shadow/live-probe research because it adds value exactly where it intervenes, but the sample is still too small for unconditional deployment.
Lowered big-win threshold grid
The phrase "sample too small" applies only to the original high-tail trigger
(pnl_abs >= $400, lev >= 8.5, immediate next trade). It does not mean
the BLUE ledger is small. The cleaned replay now spans:
- 1328 non-hibernate / non-subday-ACB BLUE trades
- 1616 merged near-duplicate-cleaned trade IDs
- 2026-03-31 01:10:34 UTC through 2026-05-08 14:21:31 UTC
To test whether the effect survives with more triggers, the post-win sweep was expanded to:
- dollar win thresholds: $10, $25, $50, $75, $100, $150, $200, $300, $400, $500, $750, $1000
- leverage thresholds: 0, 0.69, 0.70, 1, 2, 3, 5, 8.5, 9
- return thresholds: 0, 0.10%, 0.25%, 0.50%, 0.75%, 0.95%, 1.25%
- follow-on horizons: next 1, 2, 3, and 5 trades
Important result:
- lowering dollar threshold alone does not work
- lowering dollar threshold with a realized-return threshold does work
- the effect is mostly next 1 to 2 trades
- by next 5 trades, flipping LONG is not positive; cooldown / abstain is better than LONG if the horizon is that wide
Grid-wide stability:
- horizon 1: 630 eligible threshold combinations, 60.0% positive marginal delta, 45.87% positive LONG PnL
- horizon 2: 630 eligible threshold combinations, 57.30% positive marginal delta, 39.52% positive LONG PnL
- horizon 3: 693 eligible threshold combinations, 59.60% positive marginal delta, 12.99% positive LONG PnL
- horizon 5: 693 eligible threshold combinations, 51.08% positive marginal delta, 0.0% positive LONG PnL
This says the post-win effect is a short-lived exhaustion / rebound artifact, not a durable multi-trade LONG regime.
Fixed dollar-only immediate-next-trade rows:
| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---|---|---|---|---|---|
| $10+, no lev gate | 277 | +$3,044 | -$9,146 | -$12,190 | +24.74% | 24.55% |
| $50+, no lev gate | 181 | +$4,495 | -$8,870 | -$13,365 | +42.58% | 22.18% |
| $100+, no lev gate | 135 | +$908 | -$4,252 | -$5,160 | +97.78% | 18.09% |
| $200+, no lev gate | 89 | -$947 | -$1,496 | -$549 | +140.76% | 14.96% |
| $300+, no lev gate | 62 | -$1,695 | -$45 | +$1,651 | +174.25% | 13.70% |
| $400+, no lev gate | 48 | -$1,725 | +$407 | +$2,133 | +180.70% | 13.33% |
| $500+, no lev gate | 40 | -$1,153 | +$90 | +$1,242 | +173.51% | 13.33% |
Dollar-only conclusion:
- below about $300, the next short trade is still net-profitable or less bad than the LONG flip
- around $300, the next short trade turns bad, but LONG is only near-flat
- around $400 to $500, the next-trade LONG flip becomes positive
Fixed immediate-next-trade rows with a +0.75% realized-return trigger:
| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---|---|---|---|---|---|
| $10+ and +0.75% | 99 | -$1,735 | -$409 | +$1,326 | +104.45% | 14.03% |
| $50+ and +0.75% | 74 | -$1,950 | +$105 | +$2,055 | +155.62% | 14.03% |
| $75+ and +0.75% | 70 | -$2,028 | +$194 | +$2,223 | +166.91% | 13.95% |
| $100+ and +0.75% | 67 | -$2,083 | +$336 | +$2,419 | +168.60% | 13.69% |
| $150+ and +0.75% | 63 | -$2,082 | +$344 | +$2,426 | +175.37% | 13.69% |
| $300+ and +0.75% | 58 | -$1,738 | +$58 | +$1,796 | +173.61% | 13.70% |
| $400+ and +0.75% | 48 | -$1,725 | +$407 | +$2,133 | +180.70% | 13.33% |
Return-conditioned conclusion:
- the effect becomes visible with more triggers when the dollar threshold is lowered to $50-$150 and the prior win is also at least +0.75%
- the best immediate-next-trade delta in this grid was around $150+ and +0.75%: 63 next trades, SHORT -$2,081.81, LONG +$343.94, delta +$2,425.75
- the original $400+, high-leverage trigger remains good but is not the only viable threshold; it is the cleaner high-tail version
Two-trade horizon:
| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---|---|---|---|---|---|
| $300+, lev >= 8.5 | 115 | -$3,201 | +$511 | +$3,712 | +168.52% | 14.27% |
| $400+, lev >= 8.5 | 91 | -$2,689 | +$556 | +$3,245 | +175.26% | 13.71% |
| $500+, lev >= 8.5 | 75 | -$2,237 | +$509 | +$2,747 | +167.53% | 14.71% |
Two-trade conclusion:
- the high-leverage $300-$500 zone supports a two-trade exhaustion rebound more strongly than the original one-trade-only statement
- the best two-trade variant in this fixed grid was $300+, lev >= 8.5, next two trades: delta +$3,712, estimated LONG PnL +$511
- the five-trade horizon should not be traded LONG; it is only a damage-control / cooldown signal
Reliability statement:
The post-win overlay is more solid than initially stated. The robust form is not "after any win"; that is false. The robust form is:
after a sufficiently large realized short win,
especially a high-return or high-leverage win,
the next 1-2 short-engine opportunities are often contaminated by rebound risk
and can be improved by LONG flip or, at minimum, cooldown/abstain.
The strongest candidates for shadow/live-probe research are:
- immediate next trade after $100-$200 win and prior return >= +0.75%
- immediate next trade after $400+ win, especially lev >= 8.5
- next two trades after $300-$500 win with lev >= 8.5
Guardrail:
The overlay should not optimize on WR. LONG WR remains lower than SHORT WR on many triggered subsets. The edge is payoff asymmetry / loss-tail avoidance: short wins become smaller or disappear after the exhaustion event, while short losses on the next trade(s) become expensive.
Candidate codified overlay rule and EFSM
Terminology:
- EFSM means Execution FSM
- refer to this component as the post-win EFSM, not merely a generic "state machine"
Candidate rule proposed after the lowered-threshold sweep:
after a completed BLUE SHORT trade:
if pnl_abs > $397:
tag next 1 trade as FLIP_LONG
if pnl_abs > $397 and leverage > 8.6:
tag next 2 trades as FLIP_LONG
if 0 < pnl_abs < $250 and pnl_pct >= +0.75%:
tag next 1 trade as FLIP_LONG
after the armed slots are consumed:
reset to SHORT
EFSM semantics:
- this is a slot-based Execution FSM, not a persistent regime switch
- each trigger arms an explicit number of future slots
- each future entry consumes exactly one slot
- when slots_remaining == 0, the state resets to SHORT
- while slots are active, new triggers are ignored by default
- a flipped LONG trade outcome is not allowed to re-arm the overlay
- this prevents the reset bug where one flipped trade recursively arms the next and converts a bounded rebound probe into an unbounded side switch
- the implementation supports arbitrary future slot counts, not only 1 and 2
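The slot semantics above can be illustrated with a compact state machine. This is a standalone sketch and not the code in adaptive_exit/post_win_long_overlay.py: the class name `SlotFlipFSM`, its method names, and the fractional `pnl_pct` units (0.0075 for +0.75%) are all assumptions made for the example, and TTL expiry is omitted.

```python
from dataclasses import dataclass


@dataclass
class SlotFlipFSM:
    """Illustrative slot-based execution FSM for the candidate post-win rule."""
    slots_remaining: int = 0
    ignored_rearms: int = 0

    @staticmethod
    def slots_for(pnl_abs: float, leverage: float, pnl_pct: float) -> int:
        """Number of FLIP_LONG slots a completed SHORT trade would arm."""
        if pnl_abs > 397.0:
            return 2 if leverage > 8.6 else 1
        if 0.0 < pnl_abs < 250.0 and pnl_pct >= 0.0075:
            return 1
        return 0

    def on_exit(self, side: str, pnl_abs: float, leverage: float, pnl_pct: float) -> None:
        """Process a completed trade; only SHORT outcomes may arm slots."""
        if side != "SHORT":
            return  # a flipped LONG outcome is never allowed to re-arm
        slots = self.slots_for(pnl_abs, leverage, pnl_pct)
        if slots == 0:
            return
        if self.slots_remaining > 0:
            self.ignored_rearms += 1  # triggers while active are ignored
            return
        self.slots_remaining = slots

    def next_side(self) -> str:
        """Side tag for the next entry; each entry consumes exactly one slot."""
        if self.slots_remaining > 0:
            self.slots_remaining -= 1
            return "FLIP_LONG"
        return "SHORT"
```

A typical sequence: a $466 win at 9x arms two slots, the next two entries are tagged FLIP_LONG regardless of how the first flipped trade turns out, and the third entry is back to SHORT.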
Implementation location:
- EFSM: adaptive_exit/post_win_long_overlay.py
- canonical class names: PostWinExecutionFSM, PostWinExecutionFSMConfig
- compatibility aliases: PostWinLongOverlay, PostWinLongOverlayConfig
- tests: prod/tests/test_post_win_long_overlay.py
Focused test coverage:
- $397+ non-high-leverage win arms one slot
- $397+ and lev > 8.6 arms two slots
- < $250 and pnl_pct >= +0.75% arms one slot
- active arms consume deterministically and reset to SHORT
- re-arm attempts while active are ignored
- flipped LONG outcomes cannot re-arm
- optional TTL expiry works
- future 3+ slot rules work
Focused verification:
python -m pytest -o cache_dir=/tmp/pytest-cache-post-win-overlay \
prod/tests/test_post_win_long_overlay.py -q
7 passed
Exact candidate replay, no re-arm during active flip slots:
- input: 1333 cleaned BLUE trades through 2026-05-08 14:34:57 UTC
- baseline short-only estimated PnL: +$10,953.50
- candidate policy estimated PnL: +$12,464.30
- marginal dollar delta: +$1,510.80
- baseline max DD: 15.70%
- candidate max DD: 14.78%
- long-flipped trades: 160
- affected subset left SHORT: -$2,415.46
- affected subset flipped LONG: -$904.67
- affected subset marginal delta: +$1,510.80
- triggers armed:
  - small_dollar_high_return: 77
  - big_win_high_lev: 41
  - big_win: 1
- slots consumed:
  - small_dollar_high_return: 77
  - big_win_high_lev: 82
  - big_win: 1
- consumed arms: 119
- dangling slots at end: 0
- ignored re-arm attempts while active: 20
Reset sensitivity:
Allowing active flipped trades / active arms to re-arm is harmful:
- unsafe recursive re-arm variant long flips: 183
- unsafe marginal delta: -$5,425.32
- safe no-rearm marginal delta: +$1,510.80
Therefore the no-recursive-rearm reset invariant is not optional. It is part of the edge definition.
Compound-return caveat:
- baseline short-only compound: +164.89%
- candidate compound: +107.26%
This is why the overlay must be treated as a dollar-tail / drawdown-control overlay first, not as a compounding optimizer. The current counterfactual uses same entry/exit skeleton and estimated flipped LONG PnL, so the next validation step must include actual LONG execution assumptions, long-side V7 behavior, and time-to-next-entry gating.
Time dependency:
The replay showed material timing dependence:
| Delay from trigger to flipped entry | n | SHORT PnL | LONG PnL | Delta |
|---|---|---|---|---|
| `<=15m` | 19 | +$2,765.51 | -$3,062.37 | -$5,827.88 |
| `15-30m` | 67 | -$3,588.76 | +$2,381.96 | +$5,970.72 |
| `30-60m` | 40 | -$882.57 | -$104.33 | +$778.24 |
| `>60m` | 34 | -$709.64 | -$119.93 | +$589.72 |
This means the overlay may need a lower-bound delay, an upper-bound TTL, or market-state confirmation. The current EFSM already supports TTL; the exact timing gate remains research, not deployed doctrine.
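A first-order view of what a delay gate would do can be read off the bucket table. The sketch below only re-aggregates the reported buckets; `gated_delta` and its parameters are hypothetical names, the result is in-sample, and it ignores the fact that a real gate would change which later trades consume slots.

```python
# (label, lower_min, upper_min, n, short_pnl, long_pnl) copied from the table above.
BUCKETS = [
    ("<=15m", 0, 15, 19, 2765.51, -3062.37),
    ("15-30m", 15, 30, 67, -3588.76, 2381.96),
    ("30-60m", 30, 60, 40, -882.57, -104.33),
    (">60m", 60, None, 34, -709.64, -119.93),
]


def gated_delta(min_delay_min=0, ttl_min=None):
    """Sum the LONG-minus-SHORT delta over buckets lying wholly inside the gate.

    Returns (number of flipped trades kept, summed delta in dollars).
    """
    kept_n, kept_delta = 0, 0.0
    for _label, lower, upper, n, short_pnl, long_pnl in BUCKETS:
        if lower < min_delay_min:
            continue  # entry would be too soon after the trigger
        if ttl_min is not None and (upper is None or upper > ttl_min):
            continue  # entry would arrive after the arm expired
        kept_n += n
        kept_delta += long_pnl - short_pnl
    return kept_n, round(kept_delta, 2)


print(gated_delta())                              # no gate: all 160 flips
print(gated_delta(min_delay_min=15))              # lower-bound delay only
print(gated_delta(min_delay_min=15, ttl_min=30))  # delay plus TTL
```

On these buckets, dropping the `<=15m` entries alone would have raised the in-sample delta from about `+$1,510` to about `+$7,338`. That is an estimate from four coarse buckets, not a validated gate.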
AdvancedExitManagerV7 / AlphaExitEngineV7 caveat:
AlphaExitEngineV7 is mechanically side-aware:
- `side=0` means LONG
- `side=1` means SHORT
- PnL, MFE, MAE, trend direction, and adverse/favorable movement are signed by `ctx.side`
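Mechanical side-awareness of this kind usually amounts to one sign flip applied consistently. The sketch below is illustrative and is not the V7 implementation: the function names are hypothetical, returns are simple price returns, and fees are ignored.

```python
LONG_SIDE, SHORT_SIDE = 0, 1  # same encoding as ctx.side


def signed_return(entry_price: float, price: float, side: int) -> float:
    """Return of the position from entry to `price`, positive when favorable."""
    direction = 1.0 if side == LONG_SIDE else -1.0
    return direction * (price - entry_price) / entry_price


def mfe_mae(entry_price: float, prices: list, side: int) -> tuple:
    """Max favorable and max adverse excursion, both reported as non-negative.

    `prices` must be a non-empty sequence of prices observed after entry.
    """
    returns = [signed_return(entry_price, p, side) for p in prices]
    mfe = max(0.0, max(returns))
    mae = max(0.0, -min(returns))
    return mfe, mae
```

On the same price path, flipping `side` swaps the two excursions: a path that dips 1% and then rallies 2% has MFE 2% / MAE 1% as a LONG and MFE 1% / MAE 2% as a SHORT. The code is symmetric; fitted thresholds applied to those values need not be.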
However, V7 calibration is SHORT-lineage:
- bounce model labels were trained on BLUE SHORT adverse-bar samples
- pressure threshold `2.69` was selected on SHORT/GREEN-lineage replay
- MAE/MFE concepts are symmetric in code but not guaranteed symmetric in fitted thresholds or bounce probabilities
Before any live FLIP_LONG execution, V7 must be validated in one of these modes:
- shadow-only LONG contexts using actual flipped LONG entries
- conservative LONG-specific V7 threshold override
- disable V7 live exits for overlay LONGs until enough shadow decisions show it does not prematurely cut the rebound edge
The rule can be codified, but production wiring must keep the EFSM, side selection, and V7 exit policy explicitly separable.
Sweep results:
- best by compounded return:
  - mode: one-shot after win
  - `trigger_lev = 9.0`
  - `trade_min_lev = 0.0`
  - traded: `1324`
  - LONG trades: `222`
  - WR `50.91%`
  - compounded return `+61.93%`
  - max DD `19.36%`
  - PnL `-$257.03`
- best by estimated dollars:
  - mode: one-shot after win
  - `trigger_lev = 2.0`
  - `trade_min_lev = 0.69`
  - traded: `1050`
  - LONG trades: `297`
  - WR `40.03%`
  - compounded return `+27.71%`
  - max DD `22.44%`
  - PnL `+$5,114.50`
Both sweep optima still underperform the relevant short-only baselines. In particular, simply treating high leverage as a short-side quality filter is stronger than using high-leverage short wins as a broad long-switch trigger:
- `lev >= 8.5`, short-only: PnL `+$12,193.65`, max DD `7.58%`
- best long-switch dollar policy: PnL `+$5,114.50`, max DD `22.44%`
Interpretation:
- leverage does behave like conviction, but the first-order use is filtering / sizing, not side inversion
- ordinary high-lev wins are too common to serve as a LONG regime switch
- the previous post-outlier result survives only because it was much narrower: large dollar win, 9x, and immediate next trade
- high-lev wins may still be useful as features in the dual-shadow / market-fingerprint layer:
  - `last_high_lev_short_win`
  - `last_high_lev_short_win_count`
  - `last_high_lev_short_win_pnl_abs`
  - `last_high_lev_short_win_return_pct`
  - `bars_since_high_lev_short_win`
  - `consecutive_high_lev_short_wins`
Research conclusion:
- do not implement the literal `lev > 0.70` long switch
- do preserve leverage as a strong conviction feature
- do keep the narrower post-outlier one-trade long probe in the research queue
- the strongest immediate operational lesson is that low-leverage trades may be unnecessary, while high-leverage shorts remain the cleaner expression
AlphaExitEngineV7 LONG calibration replay
Date: 2026-05-08
Scope:
- system: BLUE only
- exit engine: `AlphaExitEngineV7`
- harness: `adaptive_exit/calibrate_v7_long_from_journal.py`
- source data: ClickHouse `dolphin.v7_decision_events`
- source rows: `6,812`
- reconstructed BLUE V7-tracked paths: `97`
- path side in source journal: SHORT
- replay side for calibration: synthetic LONG (`side=0`)
- fee assumption: `4 bps`
- natural exit comparator: final logged decision-row price for the same path
- V7 exit comparator: first replayed V7 `EXIT` on the same price path
- bounce model: disabled for this replay by intentionally using a missing model path, because the current bounce model is trained on BLUE SHORT adverse-bar samples and should not be treated as a validated LONG probability model
This is a LONG-exit calibration proxy, not proof from exchange-filled LONG trades. It answers a narrower question: if the post-win EFSM had flipped a trade LONG on price paths that BLUE V7 actually observed, would a LONG-side V7 cut/exit surface have improved or harmed the synthetic LONG outcome versus holding to the path's natural end?
Original V7 SHORT calibration pattern
The original V7 calibration was a pressure-threshold sweep over live shadow decisions. V7 computes:
    exit_pressure = clamp(directional_term + risk_term, -3.0, +3.0)

Then:

    if exit_pressure > 2.69:
        EXIT
    elif exit_pressure > 1.0:
        RETRACT
    elif exit_pressure < -0.5 and pnl_pct > 0:
        EXTEND
    else:
        HOLD
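The same decision ladder can be written as a runnable function. This is a sketch of the pseudocode above, not the engine's code: `v7_decision` is a hypothetical name, and the keyword defaults reuse the deployed values quoted in this note.

```python
def clamp(value: float, lower: float, upper: float) -> float:
    """Limit `value` to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def v7_decision(
    directional_term: float,
    risk_term: float,
    pnl_pct: float,
    exit_pressure_threshold: float = 2.69,
    retract_pressure_threshold: float = 1.0,
    extend_pressure_threshold: float = -0.5,
    pressure_min: float = -3.0,
    pressure_max: float = 3.0,
) -> str:
    """Map the two pressure terms to EXIT / RETRACT / EXTEND / HOLD."""
    exit_pressure = clamp(directional_term + risk_term, pressure_min, pressure_max)
    if exit_pressure > exit_pressure_threshold:
        return "EXIT"
    if exit_pressure > retract_pressure_threshold:
        return "RETRACT"
    # EXTEND requires both low pressure and a trade that is currently profitable.
    if exit_pressure < extend_pressure_threshold and pnl_pct > 0:
        return "EXTEND"
    return "HOLD"
```

Exposing the thresholds as arguments makes a sweep straightforward: the same `(directional_term, risk_term)` pair that yields `RETRACT` at `2.69` can yield `EXIT` at a lower threshold, which is what the calibration tables compare.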
The documented SHORT lineage was:
| Pressure threshold | Fires | Result |
|---|---|---|
| `2.00` | `22/24` | +$439, ROI +1.67% |
| `2.35` | `17/24` | +$891, ROI +3.38% |
| `2.60` | `17/24` | +$891, ROI +3.38% |
| `3.00` | `14/24` | +$796, ROI +3.02% |
| base/no V7 | n/a | +$784, ROI +2.98% |
The deployed threshold 2.69 was chosen as the high end of the useful
2.35-2.70 band so V7 stayed closer to base behavior and avoided cutting
winners on transient pressure.
Threshold surface now explicit
AlphaExitEngineV7 now accepts an optional per-engine
AlphaExitV7Config. Defaults preserve the deployed SHORT-calibrated behavior.
This lets BLUE instantiate separate SHORT and LONG V7 engines later without
global mutation.
V7-specific configurable fields:
| Config field | Default | Meaning |
|---|---|---|
| `rvol_w15` | `0.50` | realized-vol composite weight for 15-bar volatility |
| `rvol_w30` | `0.30` | realized-vol composite weight for 30-bar volatility |
| `rvol_w50` | `0.20` | realized-vol composite weight for 50-bar volatility |
| `rvol_floor` | `0.000001` | minimum realized-vol denominator |
| `mae_tier1_k` | `3.5` | MAE tier-1 multiplier on rv_comp |
| `mae_tier2_k` | `7.0` | MAE tier-2 multiplier on rv_comp |
| `mae_tier3_k` | `12.0` | MAE tier-3 multiplier on rv_comp |
| `mae_tier1_floor` | `0.005` | MAE tier-1 absolute floor |
| `mae_tier2_floor` | `0.012` | MAE tier-2 absolute floor |
| `mae_tier3_floor` | `0.025` | MAE tier-3 absolute floor |
| `mae_tier1_risk` | `0.5` | pressure contribution once tier 1 is breached |
| `mae_tier2_risk` | `0.8` | pressure contribution once tier 2 is breached |
| `mae_tier3_risk` | `1.2` | pressure contribution once tier 3 is breached |
| `mae_accel_min_bars` | `3` | minimum bars before adverse-acceleration gate can fire |
| `mae_accel_peak_floor` | `0.003` | adverse peak floor for MAE acceleration risk |
| `mae_accel_risk` | `0.6` | pressure contribution for MAE acceleration |
| `mae_recovery_peak_floor` | `0.004` | adverse peak floor for failed-recovery gate |
| `mae_recovery_prev_min` | `0.25` | prior recovery ratio required before snapback risk |
| `mae_recovery_snapback_max` | `0.10` | recovery ratio below which recovery is treated as failed |
| `mae_recovery_risk` | `1.0` | pressure contribution for failed recovery |
| `mae_late_floor` | `0.003` | MAE required before late adverse ramp applies |
| `mae_late_start_frac` | `0.60` | bars-held fraction where late adverse ramp starts |
| `mae_late_risk_max` | `0.4` | maximum late adverse pressure contribution |
| `max_hold_ref_mult_3m` | `3.0` | V7 internal max-hold reference multiplier |
| `mfe_slope_peak_floor` | `0.01` | peak favorable floor for convexity slope break |
| `mfe_convexity_decay_exit` | `0.35` | decay ratio for hard MFE giveback pressure |
| `mfe_convexity_decay_soft` | `0.20` | decay ratio for soft MFE giveback pressure |
| `mfe_convexity_exit_risk` | `1.5` | pressure contribution for hard MFE giveback |
| `mfe_convexity_soft_risk` | `0.3` | pressure contribution for soft MFE giveback |
| `mfe_accel_floor` | `-0.00001` | MFE acceleration floor for adverse convexity |
| `mfe_accel_peak_floor` | `0.005` | peak favorable floor for MFE acceleration risk |
| `mfe_accel_risk` | `0.2` | pressure contribution for MFE acceleration risk |
| `bounce_dir_w` | `0.15` | bounce score directional-term weight |
| `bounce_risk_w` | `0.35` | bounce risk-term weight |
| `bounce_rv_safe_floor` | `0.00001` | bounce feature volatility denominator floor |
| `exit_pressure_threshold` | `2.69` | live EXIT threshold |
| `retract_pressure_threshold` | `1.0` | RETRACT threshold |
| `extend_pressure_threshold` | `-0.5` | profitable EXTEND threshold |
| `pressure_min` | `-3.0` | pressure clamp lower bound |
| `pressure_max` | `3.0` | pressure clamp upper bound |
Inherited V6 weight priors remain configurable through the existing
WeightAdapter/WeightPriors seam. The new config is specifically for V7
threshold/gate surfaces and is init-time/per-engine configurable.
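To make the per-engine idea concrete, here is a minimal sketch using a frozen dataclass with a subset of the fields above. `V7ConfigSketch`, `rv_comp`, and `mae_tier_risk` are hypothetical stand-ins, not the real `AlphaExitV7Config`. Two formulas are assumptions inferred from the field descriptions rather than from the engine source: that the volatility composite is a floored weighted sum, and that each MAE tier threshold is `max(floor, k * rv_comp)` with only the highest breached tier contributing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class V7ConfigSketch:
    """Illustrative subset of the table; defaults copied from it."""
    rvol_w15: float = 0.50
    rvol_w30: float = 0.30
    rvol_w50: float = 0.20
    rvol_floor: float = 0.000001
    mae_tier1_k: float = 3.5
    mae_tier2_k: float = 7.0
    mae_tier3_k: float = 12.0
    mae_tier1_floor: float = 0.005
    mae_tier2_floor: float = 0.012
    mae_tier3_floor: float = 0.025
    mae_tier1_risk: float = 0.5
    mae_tier2_risk: float = 0.8
    mae_tier3_risk: float = 1.2
    exit_pressure_threshold: float = 2.69


def rv_comp(cfg: V7ConfigSketch, rv15: float, rv30: float, rv50: float) -> float:
    """ASSUMPTION: composite vol is a weighted sum, floored at rvol_floor."""
    weighted = cfg.rvol_w15 * rv15 + cfg.rvol_w30 * rv30 + cfg.rvol_w50 * rv50
    return max(cfg.rvol_floor, weighted)


def mae_tier_risk(cfg: V7ConfigSketch, mae: float, rv: float) -> float:
    """ASSUMPTION: threshold = max(floor, k * rv); highest breached tier wins."""
    tiers = [
        (cfg.mae_tier3_k, cfg.mae_tier3_floor, cfg.mae_tier3_risk),
        (cfg.mae_tier2_k, cfg.mae_tier2_floor, cfg.mae_tier2_risk),
        (cfg.mae_tier1_k, cfg.mae_tier1_floor, cfg.mae_tier1_risk),
    ]
    for k, floor, risk in tiers:
        if mae > max(floor, k * rv):
            return risk
    return 0.0


short_cfg = V7ConfigSketch()                                # deployed defaults
long_cfg = replace(short_cfg, exit_pressure_threshold=2.0)  # separate instance
```

Because the dataclass is frozen and `replace` returns a new object, building `long_cfg` cannot alter `short_cfg`. That is the property the "without global mutation" requirement asks for.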
LONG replay results
Baseline synthetic LONG natural exit across the 97 paths:
- natural PnL: `-$328.84`
- natural WR: `59.79%`
- natural compound: `+3.50%`
- natural max DD: `2.28%`
The dollar PnL and compound can diverge because path notionals differ. For this exit calibration, dollar PnL is the more relevant metric because BLUE sizing is not uniform.
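A toy example shows how the two metrics can point in opposite directions. The three paths below are made-up numbers chosen for illustration, and chaining per-path returns at equal weight is an assumption about how a compound figure is formed, not a description of the harness.

```python
# HYPOTHETICAL paths: (notional in dollars, return as a fraction).
paths = [
    (1_000.0, 0.02),    # small position, +2%
    (1_000.0, 0.02),    # small position, +2%
    (10_000.0, -0.01),  # large position, -1%
]

# Dollar PnL weights each return by the size of the position that earned it.
dollar_pnl = sum(notional * ret for notional, ret in paths)

# Compounding chains the returns and ignores notional entirely.
equity = 1.0
for _notional, ret in paths:
    equity *= 1.0 + ret
compound_return = equity - 1.0

win_rate = sum(ret > 0 for _notional, ret in paths) / len(paths)

print(f"dollar PnL {dollar_pnl:+.2f}, compound {compound_return:+.2%}, WR {win_rate:.2%}")
```

Two small winners and one large loser give a positive compound return and a majority win rate alongside a negative dollar total. The natural-exit baseline shows the same shape.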
Top tested surfaces:
| Candidate | V7 PnL | Delta vs natural | Exits | Exit rate | V7 WR | V7 max DD |
|---|---|---|---|---|---|---|
| `mfe_risk_scale_0.5` | +$205.32 | +$534.15 | 36 | 37.11% | 50.52% | 1.69% |
| `mfe_risk_scale_0.75` | +$205.32 | +$534.15 | 36 | 37.11% | 50.52% | 1.69% |
| `combo_p1.7_mae0.75` | +$47.24 | +$376.08 | 51 | 52.58% | 47.42% | 1.55% |
| `exit_p1.7` | +$36.88 | +$365.72 | 51 | 52.58% | 47.42% | 1.53% |
| `exit_p2.0` | +$19.68 | +$348.52 | 41 | 42.27% | 49.48% | 1.53% |
| `short_default / exit_p2.69` | +$1.43 | +$330.26 | 38 | 39.18% | 49.48% | 1.81% |
| `exit_p3.0` | -$328.84 | $0.00 | 0 | 0.00% | 59.79% | 2.28% |
Interpretation:
- The deployed SHORT default is not mechanically broken for LONG. It improved synthetic LONG dollar outcome by `+$330.26` versus natural exit on the 97 replayed paths.
- The best tested LONG proxy did not come from lowering the pressure threshold. It came from reducing MFE giveback/convexity pressure contribution (`mfe_risk_scale_0.5` or `0.75`).
- Aggressively lowering `exit_pressure_threshold` to `1.4` over-fires: `78/97` exits, V7 PnL `-$11.78`, and many negative deltas. That resembles the original SHORT calibration failure at `2.0`: pressure that is too sensitive cuts too much transient noise.
- A moderate pressure threshold around `1.7-2.0` is useful, but still inferior to leaving pressure at `2.69` and reducing MFE-risk contributions in this proxy.
Recommended LONG overlay calibration candidate for shadow:
    AlphaExitV7Config(
        mfe_convexity_exit_risk=0.75,
        mfe_convexity_soft_risk=0.15,
        mfe_accel_risk=0.10,
    )
This is the `mfe_risk_scale_0.5` surface. It keeps:
- `exit_pressure_threshold = 2.69`
- all MAE vol-normalized loss-cut thresholds unchanged
- pressure clamp unchanged
- bounce disabled or neutral until a LONG-trained bounce model exists
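The three overrides are the default MFE risk contributions multiplied by `0.5`, which a short helper makes explicit. The helper name `scale_mfe_risks` is hypothetical. Reading `mfe_risk_scale_X` as "multiply these three defaults by X" is confirmed by the recommended values for `0.5` and is an inference for other scales.

```python
# Defaults copied from the config table.
DEFAULT_MFE_RISKS = {
    "mfe_convexity_exit_risk": 1.5,
    "mfe_convexity_soft_risk": 0.3,
    "mfe_accel_risk": 0.2,
}


def scale_mfe_risks(scale: float, defaults: dict = DEFAULT_MFE_RISKS) -> dict:
    """Return keyword overrides with every MFE risk contribution scaled."""
    return {name: round(value * scale, 6) for name, value in defaults.items()}


long_overrides = scale_mfe_risks(0.5)
print(long_overrides)
```

The resulting dictionary could be splatted into the config constructor as keyword arguments, leaving every other field at its SHORT-calibrated default.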
Why this candidate is preferable to simply lowering exit_pressure_threshold:
- it preserved the useful loss-cut behavior while avoiding broad pressure over-firing
- it improved dollar PnL more than all pressure-threshold sweeps tested
- it left MAE protection intact, which matters if the flipped LONG thesis is wrong and the asset continues down
- it respects that the post-win EFSM edge is a rebound/cooldown edge, so the exit manager should not over-penalize ordinary post-entry MFE shape
Do not deploy this LONG config live yet. It should first be run in shadow on actual EFSM-flipped candidate LONG contexts, because this replay uses SHORT entries inverted to LONG and not real LONG fills.
Regression and safety notes
Implemented code seams:
- `nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py` defines `AlphaExitV7Config`
- default `AlphaExitEngineV7()` behavior remains the SHORT-calibrated config
- a LONG-specific engine can be instantiated with `AlphaExitEngineV7(config=...)`
- the calibration harness writes full results to `/tmp/v7_long_calibration.json`
Tests added:
- default config equals the legacy SHORT threshold surface
- custom config is per-instance and does not mutate the default engine
- V7 remains mechanically side-aware for LONG and SHORT PnL/MFE/MAE
- BLUE live V7 provider wiring still records journal decisions and uses OB signal input
- EFSM reset/no-recursive-rearm tests remain separate from V7 exit calibration
Research caveats:
- only `97` V7-tracked BLUE paths existed in the current decision journal
- this is enough to reject obviously bad LONG exit settings, but not enough to canonize a live LONG exit policy
- bounce must remain neutral for LONG until trained or validated on LONG samples
- V7 `max_hold_ref_mult_3m` still uses an internal time reference rather than the orchestrator's effective max hold; the system bible already tracks this as a V7 TODO/bug because it can make adverse-ramp pressure fire too early