# LONG Deterministic Rule Research

Date: 2026-05-07

## Goal

Find the simplest deterministic long-side market rule, using primarily Dolphin NG eigendata, that behaves like the original short Alpha Engine rule in spirit:

- few moving parts
- market-structural
- explainable in one breath
- reliable enough to serve as a basal gate before asset selection and later overlays

This note is explicitly **not** about a fitted long model.

## Data source

The analysis uses the raw daily scan cache summarized by:

- `adaptive_exit/characterize_long_signals.py`
- `/mnt/dolphin_training/long_signal_research/long_signal_scan_summary_h24.parquet`
- `/mnt/dolphin_training/long_signal_research/long_signal_characterization_report.json`

Only eigendata and scan-price-derived outcomes are used here:

- `instability_50`
- `v50/v150/v300/v750_lambda_max_velocity`
- `vel_div`
- `vel_div` lag / delta terms

No ExF, EsoF, or OBF are required for the core finding.

## What does **not** work as the basal long rule

The obvious mirror thesis,

- `vel_div > 0.01`

is too weak to be the basal long edge.

Recent HQ slice (`2025-12-31` onward):

- support: `39.65%`
- `strong_long` lift: `1.15x`
- `broad_long` lift: `1.22x`

That is not useless, but it is neither elegant enough nor selective enough to be the long analogue of `vel_div < -0.02`.

## Strongest deterministic shape

The long side shows up most clearly as a **stressed unwind / squeeze** regime, not as a generic bullish breakout regime.

### Candidate primary deterministic rule

```text
LONG_REGIME if
    instability_50 >= 20.5
    and v300_lambda_max_velocity < 0
    and v750_lambda_max_velocity < 0
```

Interpretation:

- `instability_50 >= 20.5`: the market is structurally stressed
- `v300 < 0` and `v750 < 0`: the slower eigenspace is still negative / damaged
- together: this is a high-stress unwind state where long opportunities tend to appear as reversals / squeezes on the same manifold that produces short dislocations

### Why `20.5`

`20.5` is the rounded recent-HQ `instability_50` 90th-percentile threshold (`20.546996...`). It is the most practical fixed threshold found in the recent-era characterization.

## Empirical support

### Recent HQ (`2025-12-31` onward)

Base rates:

- `strong_long`: `0.1648`
- `broad_long`: `0.1367`

Rule:

- support: `6356` rows (`3.58%`)
- `strong_long`: `0.3409` (`2.07x` lift)
- `broad_long`: `0.3538` (`2.59x` lift)

### Full history

Base rates:

- `strong_long`: `0.2603`
- `broad_long`: `0.2472`

Rule:

- support: `300,728` rows (`12.59%`)
- `strong_long`: `0.3330` (`1.28x` lift)
- `broad_long`: `0.3375` (`1.37x` lift)

## Simpler fallback

If maximum elegance is preferred over extra selectivity, the one-factor fallback is:

```text
LONG_REGIME_SIMPLE if
    instability_50 >= 20.5
```

Recent HQ:

- support: `10.10%`
- `strong_long`: `0.3297` (`2.00x` lift)
- `broad_long`: `0.3420` (`2.50x` lift)

This is surprisingly strong for a one-variable rule. It is the closest thing found to a pure long-side analogue of the short `vel_div < -0.02` gate.

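The two gates and the lift figures quoted above can be expressed in a few lines. This is a minimal sketch: `EigenRow`, `long_regime`, `long_regime_simple`, and `lift` are hypothetical names introduced for illustration, not part of `characterize_long_signals.py`; only the column names and the `20.5` threshold come from this note.

```python
from dataclasses import dataclass

# Rounded recent-HQ 90th percentile of instability_50 (from this note).
INSTABILITY_THRESHOLD = 20.5


@dataclass(frozen=True)
class EigenRow:
    """One scan row; fields mirror the eigendata columns named in this note."""
    instability_50: float
    v300_lambda_max_velocity: float
    v750_lambda_max_velocity: float


def long_regime(row: EigenRow) -> bool:
    """Primary gate: high structural stress plus negative slow velocities."""
    return (
        row.instability_50 >= INSTABILITY_THRESHOLD
        and row.v300_lambda_max_velocity < 0
        and row.v750_lambda_max_velocity < 0
    )


def long_regime_simple(row: EigenRow) -> bool:
    """One-factor fallback: the stress threshold alone."""
    return row.instability_50 >= INSTABILITY_THRESHOLD


def lift(rule_rate: float, base_rate: float) -> float:
    """Lift = outcome rate inside the rule / unconditional outcome rate."""
    return rule_rate / base_rate


if __name__ == "__main__":
    # Recent-HQ strong_long figures from this note: 0.3409 vs base 0.1648.
    print(round(lift(0.3409, 0.1648), 2))
```

By construction every row that passes `long_regime` also passes `long_regime_simple`, which is why the fallback has broader support.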
Tradeoff:

- simpler
- broader
- slightly less selective than adding `v300 < 0` and `v750 < 0`

## Optional stricter confirmation

If later tuning wants more explicit “healing after stress” confirmation, the strict variant is:

```text
LONG_REGIME_STRICT if
    instability_50 >= 20.5
    and vel_div_lag6 < -0.03
    and vel_div_delta6 > 0.02
```

This is directionally sensible, but it is not materially better than the `instability_50 + v300 + v750` rule, so it should be treated as an optional refinement, not the basal rule.

## Monthly sanity check

For the candidate primary rule (`instability_50 >= 20.5 && v300 < 0 && v750 < 0`) in the recent HQ window:

- `2026-01`: `strong_long = 0.348`
- `2026-02`: `strong_long = 0.344`
- `2026-03`: `strong_long = 0.312`

The monthly base rates for the same period were:

- `2026-01`: `0.289`
- `2026-02`: `0.271`
- `2026-03`: `0.068`

So even into the weak March tape, the rule remains elevated relative to base.

## Practical interpretation

This should be viewed as a **market-state gate**, not a complete trade engine.

It says:

- “the market is in the sort of stressed, damaged regime where long squeeze / unwind opportunities become meaningfully more likely”

It does **not** by itself say:

- which asset is the best expression
- how to size
- how to exit

That is where the next layers belong:

- deterministic or learned asset selection
- OBF / ARS / bounce overlays
- TP / MAX_HOLD policy

## Recommendation

If a single deterministic long gate must be named now, use:

```text
LONG_REGIME if
    instability_50 >= 20.5
    and v300 < 0
    and v750 < 0
```

If maximum simplicity is the priority, use:

```text
LONG_REGIME_SIMPLE if
    instability_50 >= 20.5
```

And explicitly do **not** promote `vel_div > 0.01` as the basal long rule.

## Deferred analysis idea: dual-shadow regime sampler

This is a **later analysis / control-layer research note**, not a live-rule recommendation.

One plausible way to sample the market in real time without committing the full system immediately is a very lightweight **dual-shadow engine**:

- Shadow A: the basal SHORT engine (`vel_div < -0.02` Alpha Engine posture)
- Shadow B: the basal LONG engine (currently the older negative-`vel_div` mean-reversion LONG posture is the best simple candidate)

The intent is not merely paper PnL logging. It is to use live, recent sample-trade outcomes as a **micro-regime probe**:

- if SHORT shadow performance degrades while LONG shadow performance improves, the tape may have rotated into a LONG-favorable regime
- if LONG degrades while SHORT improves, the inverse may be true
- if both are performing acceptably, the tape may be permissive / broad enough that either side can express edge
- if both are failing, the tape is likely choppy / non-coherent and abstention becomes a first-class candidate

This should be implemented, if ever pursued, as:

- very fast
- very lightweight
- explicitly shadow-only at first
- based on small, recent sample trades rather than a heavy fitted model

Longer-term, the entire shadow stream can itself become training data:

- market fingerprints at shadow-entry time
- concurrent SHORT-shadow and LONG-shadow outcomes
- relative WR / ROI-per-trade / drawdown / time-to-win asymmetries

That would allow a later learner to predict or simplify the regime switcher. But even before ML, the dual-shadow process may already serve as a useful real-time market-sampling / regime-detection mechanism.

## Dual-shadow persistence characterization

This section records the first persistence pass over extant trades. The goal was not to prove a full regime-switch system, but to test whether the observed short-loss streaks are durable enough to justify a regime-favorableness probe.

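The streak and transition statistics used in this section can be computed from a plain win/loss sequence. The following is a minimal sketch under that assumption; `persistence_stats` is a hypothetical helper, not the script that produced the reported numbers.

```python
from __future__ import annotations

from itertools import groupby


def _mean(values: list[float]) -> float:
    """Arithmetic mean, NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


def persistence_stats(wins: list[bool]) -> dict[str, float]:
    """Run-length and first-order transition statistics of a win/loss sequence."""
    # Lengths of maximal runs of identical outcomes, in order.
    runs = [(outcome, sum(1 for _ in grp)) for outcome, grp in groupby(wins)]
    win_runs = [float(n) for outcome, n in runs if outcome]
    loss_runs = [float(n) for outcome, n in runs if not outcome]

    # Consecutive (previous, next) outcome pairs.
    pairs = list(zip(wins, wins[1:]))
    after_win = [1.0 if nxt else 0.0 for prev, nxt in pairs if prev]
    after_loss = [0.0 if nxt else 1.0 for prev, nxt in pairs if not prev]

    return {
        "win_rate": _mean([1.0 if w else 0.0 for w in wins]),
        "avg_win_streak": _mean(win_runs),
        "avg_loss_streak": _mean(loss_runs),
        "p_win_to_win": _mean(after_win),
        "p_loss_to_loss": _mean(after_loss),
    }


if __name__ == "__main__":
    print(persistence_stats([True, True, False, False, False, True]))
```

For a first-order process the expected streak length is `1 / (1 - p)`, which gives a quick consistency check between a reported transition probability and the matching average streak: `p = 0.394` implies `1 / 0.606 ≈ 1.65` trades.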
Important caveat:

- the live SHORT series and the replay LONG series are on different date spans
- this is therefore a side-specific persistence study, not a same-bar paired dominance study
- the numbers below are still useful for run-length and hysteresis design

### Live SHORT stream

From the current BLUE trader log:

- trades: `234`
- win rate: `44.44%`
- mean `pnl_pct`: `+0.000506`
- median `pnl_pct`: `-0.000234`
- average win streak: `1.65 trades`
- average loss streak: `2.03 trades`
- `P(win -> win) = 0.394`
- `P(loss -> loss) = 0.512`
- average positive-day run: `1.5 days`
- average negative-day run: `1.5 days`

Interpretation:

- short failures do cluster
- the cluster is real enough to notice
- but it is only mildly persistent
- by itself, it is not strong enough to justify a raw ping-pong switch

### Basal LONG shadow, old mirror posture

Using the recent bullish-month replay and the single comparable `10-bar / worst_10bar` configuration:

- trades: `2,243`
- win rate: `48.33%`
- mean `pnl_pct`: `+0.000320`
- median `pnl_pct`: `-0.000400`
- average win streak: `1.93 trades`
- average loss streak: `2.07 trades`
- `P(win -> win) = 0.483`
- `P(loss -> loss) = 0.517`
- average positive-day run: `3.0 days`
- average negative-day run: `1.86 days`

Interpretation:

- this is the clearest durable long-favorable candidate seen so far
- the multi-day positive run length is materially better than the live short stream
- this supports a long-favorable regime probe, but not an unconditional flip

### Basal LONG shadow, new stressed-unwind posture

Same replay setup:

- trades: `569`
- win rate: `50.44%`
- mean `pnl_pct`: `-0.000078`
- median `pnl_pct`: `+0.000068`
- average win streak: `2.24 trades`
- average loss streak: `2.20 trades`
- `P(win -> win) = 0.556`
- `P(loss -> loss) = 0.546`
- average positive-day run: `1.36 days`
- average negative-day run: `1.18 days`

Interpretation:

- the new long posture has decent local persistence
- but it is more fragile than the mirror-long
posture as a regime switch
- it does not yet justify itself as the primary flip trigger

### Conclusion for regime switching

The data support a **smoothed regime-favorableness detector**, not a raw flip-on-first-loss system.

Practical reading:

- short-loss streak persistence is real but modest
- long-favorable states exist and can persist
- persistence is on the order of a few trades, not a dramatic regime lock
- the correct implementation is a shadow score with hysteresis and abstain logic, not a hard immediate SHORT/LONG switch

Suggested rule shape for later analysis:

- compute rolling shadow scores for SHORT and LONG
- use persistence thresholds before flipping
- require stronger evidence to reverse than to stay put
- abstain when both shadows are weak or both are losing

This is enough to justify the next engineering step:

- live dual-shadow logging on the same bars
- market-fingerprint tagging of each shadow entry
- later ML over shadow outcomes if the deterministic layer proves stable

## Rolling flip-worthiness test

To make the side-switch question stricter, the recent live short slice was retested with a `5-trade` rolling shadow-delta proxy:

- short shadow return = actual live short `pnl_pct`
- long shadow return = counterfactual `-pnl_pct - fee`
- rolling delta = rolling mean of `(long_shadow - short_shadow)`

Recent 3-day slice (`2026-05-04` to `2026-05-06`):

- trades: `168`
- short actual WR: `39.88%`
- short actual compounded return: `+10.02%`
- long counterfactual WR: `47.62%`
- long counterfactual compounded return: `-16.92%`
- flip-to-long signals from the `5-trade` rolling delta: `68`
- flip-to-short signals from the `5-trade` rolling delta: `79`

Interpretation:

- the rolling delta does detect alternating regime pockets
- but it does so often enough that a raw flip would be too twitchy
- on the most recent 30 live trades, the regime buckets were:
  - `13` long-favorable
  - `7` short-favorable
  - `10` neutral
- the long-favorable bucket had positive expected
PnL, but the short-favorable bucket was also positive and slightly stronger

The important point is that the signal is not “switch now on first loss.” It is:

- keep a smoothed side-dominance score
- require persistence before flipping
- use hysteresis
- abstain when the shadow spread is weak or oscillatory

So the stricter test reinforces the earlier conclusion:

- there is enough structure to justify a regime-favorableness detector
- there is not yet enough stability to justify a raw mechanical flip
- the right next step is live dual-shadow logging on the same bars, then threshold and persistence calibration on that shared stream

## Flip-after-loss counterfactual

The actual live short ledger was also replayed under a simple finite-state side-switch rule:

- start `SHORT`
- if the current side loses `N` trades in a row, flip to the other side
- keep applying the same rule across the whole trade sequence

This is the cleanest way to test the idea “short losses are the long cue.”

On the current `234`-trade live ledger:

- always short: WR `44.44%`, compounded return `+11.35%`, max DD `5.71%`
- always long: WR `44.87%`, compounded return `-20.13%`, max DD `23.09%`

Threshold sweep:

- `N=1`: WR `40.60%`, compounded return `+5.33%`, max DD `11.11%`, flips `139`
- `N=2`: WR `44.44%`, compounded return `-17.72%`, max DD `17.77%`, flips `43`
- `N=3`: WR `48.29%`, compounded return `+5.48%`, max DD `6.35%`, flips `13`
- `N=4`: WR `47.86%`, compounded return `+6.21%`, max DD `6.55%`, flips `7`
- `N=5`: WR `43.59%`, compounded return `+10.52%`, max DD `5.59%`, flips `5`
- `N=6`: WR `45.73%`, compounded return `+15.17%`, max DD `4.84%`, flips `3`

Interpretation:

- side switching can help
- it helps best when the flip threshold is fairly high
- the best observed threshold in this small grid was `N=6`
- low thresholds are too twitchy and can destroy the edge

So the practical conclusion is:

- a raw flip-on-first-loss rule is not justified
- a slower loss-cluster regime switcher is
plausible
- the switcher must be hysteretic and persistence-gated

This is consistent with the earlier shadow-score recommendation and explains why the observed “8 or 9 losses, then a couple wins” pattern can be useful without being directly automatable at a low threshold.

## Condition-gated flip replay

I then reran the side-switch counterfactual with an additional gate:

- the current side must first hit `N` consecutive losses
- the opposite side must also satisfy its own deterministic long/short entry condition
- the replay uses the same 10-bar tape skeleton and the worst-10-bar asset expression

Two long theories were tested separately:

- **Old mirror-long**: `vel_div < -0.02` and cross-sectional 10-bar momentum `< 0`
- **New stressed-unwind long**: `instability_50 >= 20.5` and `v300 < 0` and `v750 < 0`

Results on the long research windows:

- old mirror-long becomes marginally usable only at high thresholds:
  - `N=5`: WR `47.00%`, compounded return `+6.34%`, DD `46.23%`, flips `11`
  - `N=6`: WR `46.52%`, compounded return `+28.34%`, DD `43.78%`, flips `5`
- the new stressed-unwind long does **not** survive this gate cleanly:
  - `N=1..6`: compounded return stays negative, with severe drawdown

Interpretation:

- the condition gate does not rescue the new long theory
- it does preserve the old mirror-long as a late, low-frequency fallback
- the market still looks too unstable for a low-threshold flip rule
- if we keep this path, it should be a smoothed regime sampler, not an immediate switcher

Report:

- [`flip_on_loss_condition_gate_report.md`]()

## Full-history condition-gated replay

I then ran the same condition-gated flip simulator across the entire available price tape:

- root: `/mnt/dolphin_training/share_offload/vbt_cache_klines`
- rows: `2,553,401`
- span: `2021-06-15 00:01:00+00:00 -> 2026-03-18 18:16:40.041456896+00:00`

This is the hardest and most useful stress test because it removes the recent-slice bias entirely.

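The flip-after-`N`-losses rule and its condition-gated variant can be sketched as a small replay loop. This is an illustrative reconstruction, not the simulator behind the reports: `flip_after_losses`, `compounded`, the `gate` callback, and the default `fee` of 4 bps (borrowed from the post-outlier counterfactual later in this note) are all assumptions.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def flip_after_losses(
    short_returns: Sequence[float],
    n_losses: int,
    fee: float = 0.0004,
    gate: Optional[Callable[[int, str], bool]] = None,
) -> tuple[list[float], int]:
    """Replay SHORT returns under a flip-after-N-consecutive-losses rule.

    The LONG return on the same skeleton is approximated as
    -short_return - fee. `gate(i, new_side)` may veto a flip, which models
    the "opposite side must satisfy its own entry condition" requirement.
    Returns (realized per-trade returns, number of flips).
    """
    side, streak, flips = "SHORT", 0, 0
    realized: list[float] = []
    for i, r in enumerate(short_returns):
        ret = r if side == "SHORT" else -r - fee
        realized.append(ret)
        # A non-positive return extends the loss streak of the current side.
        streak = streak + 1 if ret <= 0 else 0
        if streak >= n_losses:
            other = "LONG" if side == "SHORT" else "SHORT"
            if gate is None or gate(i, other):
                side, streak, flips = other, 0, flips + 1
    return realized, flips


def compounded(returns: Sequence[float]) -> float:
    """Compounded return of a sequence of per-trade fractional returns."""
    equity = 1.0
    for r in returns:
        equity *= 1.0 + r
    return equity - 1.0
```

Passing `gate=None` reproduces the ungated flip-after-loss rule; a gate that always returns `False` degenerates to always-short.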
Results:

- **old mirror-long**
  - `N=1..6` win rate range: `44.95% -> 46.60%`
  - best mean PnL at `N=6`: `-0.000163` per trade
  - best threshold still compounds to `-100%` over the full archive
- **new stressed-unwind long**
  - `N=1..6` win rate range: `44.16% -> 46.86%`
  - best mean PnL at `N=6`: `-0.000218` per trade
  - best threshold also compounds to `-100%`

Interpretation:

- the condition gate does not rescue either long theory at full-archive scale
- the old mirror-long is still the stronger of the two, but only marginally
- the long-side edge, if it exists, is too weak or too regime-dependent to survive this archive-wide flip rule without additional filtering
- the full-tape result is a warning against over-trusting the favorable recent-month slices

Report:

- [`flip_on_loss_condition_gate_stream_full_report.md`]()

## Post-outlier-short-win long-flip probe

Motivation: the May 8 live footer showed a familiar-looking pattern:

- large 9x short win, e.g. `ALGOUSDT` `+$466` or `VETUSDT` `+$574`
- immediately followed by a somewhat larger-than-normal short loss, e.g. `DASHUSDT -$191` or `STXUSDT -$54`

The question was whether this is a real post-outlier rebound signature:

```text
after a very large short win, should the next trade, or next few trades, be treated as LONG candidates?
```

Dataset and hygiene:

- source: BLUE only
- ClickHouse `dolphin.trade_events`: `1305` rows, `1296` unique trade IDs
- trader logs: `1712` exit rows, `1092` unique trade IDs
- merged near-duplicate-cleaned sequence: `1609` unique trade IDs
- analysis subset after excluding hibernate / subday ACB exits: `1321` trades
- span: `2026-03-31 01:10:34 UTC` to `2026-05-08 13:26:06 UTC`

The log and warehouse streams overlap but do not have perfectly identical timestamps, so the analysis de-duplicates by trade id where possible and by near-time / asset / reason / realized PnL where the same exit was written by both paths. This matters because a naive merge double-counts many recent exits.

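The two-stage de-duplication described above (trade id first, then near-time / asset / reason / PnL) could look roughly like the following. This is a sketch under stated assumptions: `ExitRow`, `dedupe_exits`, the 5-second window, and the 1-cent PnL tolerance are hypothetical choices, not the values used in the actual merge.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class ExitRow:
    trade_id: Optional[str]  # may be missing on one of the two write paths
    ts: datetime
    asset: str
    reason: str
    pnl: float


def dedupe_exits(
    rows: Iterable[ExitRow],
    window: timedelta = timedelta(seconds=5),  # assumed matching window
    pnl_tol: float = 0.01,                     # assumed PnL tolerance in dollars
) -> list[ExitRow]:
    """Keep one row per exit: match by trade id, then by near-identical fields."""
    kept: list[ExitRow] = []
    seen_ids: set[str] = set()
    for row in sorted(rows, key=lambda r: r.ts):
        if row.trade_id is not None and row.trade_id in seen_ids:
            continue  # exact duplicate by trade id
        duplicate = False
        for prev in reversed(kept):
            if row.ts - prev.ts > window:
                break  # kept is time-ordered, so older rows cannot match
            if (
                prev.asset == row.asset
                and prev.reason == row.reason
                and abs(prev.pnl - row.pnl) <= pnl_tol
            ):
                duplicate = True  # same exit written by both paths
                break
        if duplicate:
            continue
        kept.append(row)
        if row.trade_id is not None:
            seen_ids.add(row.trade_id)
    return kept
```

The window and tolerance trade off double-counting against accidentally merging two genuine exits, so they would need checking against the real streams.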
Counterfactual method:

- keep the same entry/exit skeleton
- actual side is the live BLUE short
- counterfactual long return is approximated as `-short_return - 4 bps`
- this is not a separately selected long engine; it only tests whether the immediate post-win tape direction would have favored the other side

Baseline over the cleaned sequence:

- always short: `1321` trades, WR `55.79%`, mean return/trade `+0.0781%`, compounded return `+166.36%`, max DD `15.70%`
- always long on the same skeleton: WR `38.46%`, mean return/trade `-0.1181%`, compounded return `-80.08%`, max DD `80.48%`

So the full ledger does **not** support a broad long flip. The question only survives as a narrow post-outlier condition.

Primary post-outlier trigger:

```text
trigger if prior trade:
    pnl_abs >= $400
    leverage >= 8.5x
    pnl_pct >= +0.50%
```

Immediate next-trade result:

- triggers: `47`
- next trades affected: `47`
- actual next short subset: WR `53.19%`, mean return `-0.0821%`, compounded return `-4.05%`, realized PnL `-$1,725.40`
- flipped-to-long subset: WR `40.43%`, mean return `+0.0421%`, compounded return `+1.72%`, estimated PnL `+$409.47`
- estimated dollar delta: `+$2,134.88`
- whole-sequence policy if only those next trades are flipped: compounded return improves from `+166.36%` to `+182.38%` and max DD improves from `15.70%` to `13.33%`

The stricter trigger `pnl_abs >= $400`, `leverage >= 8.5x`, `pnl_pct >= +0.95%` is similar:

- triggers: `46`
- actual next short subset: `-$1,534.21`
- flipped-to-long estimate: `+$276.64`
- estimated dollar delta: `+$1,810.85`
- whole-sequence compounded return: `+180.91%`

The effect is strongest on the immediately following trade.
It decays quickly:

- next `2` trades after the primary trigger: affected `91`, actual `-$2,689.16`, flipped estimate `+$555.98`, dollar delta `+$3,245.15`
- next `3` trades: affected `134`, actual `-$2,357.77`, flipped estimate `-$588.02`, dollar delta still positive because the flip loses less
- next `5` trades: benefit becomes materially less clean

Examples from the live tail:

- `ALGOUSDT` `2026-05-08 09:55 UTC`, `+466.34`, `9x`, `+0.929%`
  - next trade `DASHUSDT`: actual short `-191.19`; same-skeleton long would have been directionally positive after fee
- `VETUSDT` `2026-05-08 12:37 UTC`, `+573.64`, `9x`, `+1.546%`
  - next trade `STXUSDT`: actual short `-53.52`; same-skeleton long would have been directionally positive after fee
- larger historic outlier `STXUSDT` `2026-05-05 20:29 UTC`, `+6796.86`, `9x`, `+13.845%`
  - the following trade was a small short loss, and the next several trades were mixed rather than uniformly long-favorable

Interpretation:

- there is a real event-conditioned post-outlier rebound / exhaustion signal
- it is not a win-rate improvement; it is a dollar / drawdown improvement
- it should not be promoted as a general long engine
- it is best framed as a one-trade post-outlier **long probe** or short cooldown candidate, not as a multi-trade regime flip

Relationship to the long-system research:

- this is different from both deterministic long theories already studied:
  - old mirror-long: negative `vel_div` mean-reversion long
  - new stressed-unwind long: high instability plus negative slow velocities
- the post-outlier signal is more local and path-conditioned:
  - a violent short win likely means the chosen asset or local basket has just completed an exhaustion leg
  - the next trade may be more exposed to rebound / adverse short continuation than to fresh downside continuation
- this should become a feature inside the dual-shadow side-selection sampler:
  - `last_trade_was_outlier_short_win`
  - `last_trade_leverage`
  - `last_trade_realized_pnl_abs`
  - `last_trade_return_pct`
  - `bars_since_outlier_win`
  - `same_asset_or_correlated_asset_followup`

Research conclusion:

- broad `SHORT -> LONG` inversion remains false on the full sequence
- immediate one-trade long probing after a large 9x short win is empirically plausible and improved historical BLUE dollars in this cleaned replay
- the next test should condition this event trigger on the existing long gates and market fingerprint state, rather than using it as a naked side switch

## Leverage-as-conviction win-probe sweep

Follow-up thesis:

```text
leverage is a conviction expression
if a high-conviction short probe wins: make subsequent / next trades LONG
if leverage is below roughly 0.69: possibly do not trade
```

The initial test used:

```text
trigger_lev = 0.70
trade_min_lev = 0.69
win = net PnL > 0
```

Two side-selection forms were tested:

- **persistent shadow probe**: the short engine continues to run as a shadow. A high-lev short-shadow win turns the traded side LONG. A high-lev short-shadow loss resets the traded side SHORT.
- **one-shot after win**: a high-lev short-shadow win arms only the next eligible trade as LONG, then resets.

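The difference between the two side-selection forms is easiest to see in code. The sketch below is one plausible reading of the descriptions above, not the tested implementation: `ShadowTrade` and `select_sides` are hypothetical names, and the exact ordering of "consume the armed slot" versus "update from the shadow outcome" is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ShadowTrade:
    leverage: float
    short_pnl: float  # net PnL of the SHORT shadow on this trade


def select_sides(
    trades: Sequence[ShadowTrade],
    mode: str,  # "persistent" or "one_shot"
    trigger_lev: float = 0.70,
    trade_min_lev: float = 0.69,
) -> list[Optional[str]]:
    """Traded side per trade: 'SHORT', 'LONG', or None when leverage is too low."""
    sides: list[Optional[str]] = []
    long_armed = False
    for t in trades:
        if t.leverage < trade_min_lev:
            sides.append(None)  # below the trade floor: do not trade
        else:
            sides.append("LONG" if long_armed else "SHORT")
            if mode == "one_shot":
                long_armed = False  # the single armed slot is consumed
        # The SHORT shadow keeps running; only high-leverage outcomes count.
        if t.leverage >= trigger_lev:
            if t.short_pnl > 0:
                long_armed = True   # high-lev shadow win arms LONG
            elif mode == "persistent":
                long_armed = False  # high-lev shadow loss resets to SHORT
    return sides
```

In `persistent` mode the LONG state survives until a high-leverage shadow loss, whereas `one_shot` spends it on the next eligible trade, so the same ledger can produce different side sequences.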
The test used the same cleaned BLUE sequence as the post-outlier study, updated through `2026-05-08 13:40:04 UTC`:

- ClickHouse rows: `1307`
- ClickHouse unique trade IDs: `1298`
- trader-log exit rows: `1716`
- merged near-duplicate-cleaned trade IDs: `1612`
- analysis subset after excluding hibernate / subday ACB exits: `1324`

Baselines:

- always short: `1324` trades, WR `55.82%`, mean return/trade `+0.0784%`, compounded return `+168.02%`, max DD `15.70%`, PnL `+$11,135.86`
- always long on the same skeleton: WR `38.44%`, compounded return `-80.23%`, max DD `80.62%`, PnL `-$36,875.48`
- short-only with `trade_min_lev >= 0.69`: `1050` trades, compounded return `+81.86%`, max DD `20.80%`, PnL `+$11,063.86`
- short-only with `trade_min_lev >= 5.0`: `565` trades, compounded return `+88.08%`, max DD `8.94%`, PnL `+$11,980.01`
- short-only with `trade_min_lev >= 8.5`: `501` trades, compounded return `+82.57%`, max DD `7.58%`, PnL `+$12,193.65`

Initial `0.70 / 0.69` thesis result:

- persistent shadow-probe switch:
  - traded: `1050`
  - LONG trades: `457`
  - flips to LONG: `249`
  - WR `37.08%`
  - compounded return `-5.61%`
  - max DD `26.60%`
  - PnL `-$2,527.65`
- one-shot after high-lev win:
  - traded: `1050`
  - LONG trades: `455`
  - flips to LONG: `456`
  - WR `37.24%`
  - compounded return `-3.56%`
  - max DD `26.19%`
  - PnL `-$2,113.83`

So the literal initial thesis fails. `0.70` is too low as a side-switch trigger. It arms hundreds of LONG trades and turns a strong short-led ledger into a slightly losing one.

Important evaluation frame:

The goal is **not** to find a LONG overlay that beats the whole short-only engine by itself. The goal is to find a side-selection overlay that adds marginal value only on the subset where it intervenes.
The correct comparison is therefore:

```text
overlay_delta =
    pnl_if_intervened_long_on_triggered_trades
    - pnl_if_original_short_was_left_unchanged_on_same_triggered_trades
```

The overlay is useful only if it satisfies all of the following:

- it has positive `overlay_delta` after fees and conservative slippage
- it reduces realized drawdown or loss clustering on the intervention subset
- it does not cut too many profitable short trades
- it remains positive across time splits, assets, and neighboring thresholds
- it has enough triggers to be statistically more than a single accident

Under that marginal-overlay framing, the broad leverage-win thesis still fails:

- persistent `0.70 / 0.69` switch delta vs same `lev >= 0.69` short-only baseline: about `-$13,591.51`
- one-shot `0.70 / 0.69` switch delta vs same `lev >= 0.69` short-only baseline: about `-$13,177.69`
- best swept dollar switch delta vs same `lev >= 0.69` short-only baseline: about `-$5,949.36`

By contrast, the narrower post-outlier rule did show positive marginal overlay value on its triggered subset:

- triggered next-trade cases: `47`
- leaving the next trade SHORT: PnL `-$1,725.40`
- flipping only that next trade LONG: PnL `+$409.47`
- marginal overlay delta: `+$2,134.87`
- whole-sequence drawdown improved from about `15.70%` to `13.33%`

That is the key distinction. The broad high-leverage-win rule is not reliable enough. The narrow post-outlier rule is a legitimate candidate for guarded shadow/live-probe research because it adds value exactly where it intervenes, but the sample is still too small for unconditional deployment.

### Lowered big-win threshold grid

The phrase "sample too small" applies only to the original high-tail trigger (`pnl_abs >= $400`, `lev >= 8.5`, immediate next trade). It does **not** mean the BLUE ledger is small.
The cleaned replay now spans:

- `1328` non-hibernate / non-subday-ACB BLUE trades
- `1616` merged near-duplicate-cleaned trade IDs
- `2026-03-31 01:10:34 UTC` through `2026-05-08 14:21:31 UTC`

To test whether the effect survives with more triggers, the post-win sweep was expanded to:

- dollar win thresholds: `$10`, `$25`, `$50`, `$75`, `$100`, `$150`, `$200`, `$300`, `$400`, `$500`, `$750`, `$1000`
- leverage thresholds: `0`, `0.69`, `0.70`, `1`, `2`, `3`, `5`, `8.5`, `9`
- return thresholds: `0`, `0.10%`, `0.25%`, `0.50%`, `0.75%`, `0.95%`, `1.25%`
- follow-on horizons: next `1`, `2`, `3`, and `5` trades

Important result:

- lowering **dollar threshold alone** does not work
- lowering dollar threshold **with a realized-return threshold** does work
- the effect is mostly next `1` to `2` trades
- by next `5` trades, flipping LONG is not positive; cooldown / abstain is better than LONG if the horizon is that wide

Grid-wide stability:

- horizon `1`: `630` eligible threshold combinations, `60.0%` positive marginal delta, `45.87%` positive LONG PnL
- horizon `2`: `630` eligible threshold combinations, `57.30%` positive marginal delta, `39.52%` positive LONG PnL
- horizon `3`: `693` eligible threshold combinations, `59.60%` positive marginal delta, `12.99%` positive LONG PnL
- horizon `5`: `693` eligible threshold combinations, `51.08%` positive marginal delta, `0.0%` positive LONG PnL

This says the post-win effect is a short-lived exhaustion / rebound artifact, not a durable multi-trade LONG regime.

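A threshold sweep of this kind reduces to computing `overlay_delta` on the affected follow-on trades for each combination. The sketch below assumes triggers are evaluated on the original SHORT ledger and that dollar LONG PnL can be estimated as `notional * (-short_return - 4 bps)`; `Trade`, its `notional` field, `overlay_delta`, and `sweep` are hypothetical names, and the real replay may convert returns to dollars differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Sequence

FEE = 0.0004  # 4 bps, as in the counterfactual method of this note


@dataclass(frozen=True)
class Trade:
    pnl_abs: float   # realized dollar PnL of the live SHORT
    pnl_pct: float   # realized SHORT return as a fraction (0.0075 == +0.75%)
    leverage: float
    notional: float  # hypothetical field used to convert returns to dollars


def overlay_delta(
    trades: Sequence[Trade],
    min_dollars: float,
    min_lev: float,
    min_ret: float,
    horizon: int,
) -> dict[str, float]:
    """Marginal dollars from flipping the next `horizon` trades after a trigger."""
    affected: set[int] = set()
    for i, t in enumerate(trades):
        if t.pnl_abs >= min_dollars and t.leverage >= min_lev and t.pnl_pct >= min_ret:
            affected.update(range(i + 1, min(i + 1 + horizon, len(trades))))
    short_pnl = sum(trades[j].pnl_abs for j in affected)
    long_pnl = sum(trades[j].notional * (-trades[j].pnl_pct - FEE) for j in affected)
    return {
        "affected": float(len(affected)),
        "short_pnl": short_pnl,
        "long_pnl": long_pnl,
        "delta": long_pnl - short_pnl,
    }


def sweep(trades, dollars, levs, rets, horizons) -> list[dict[str, float]]:
    """Evaluate overlay_delta on every threshold combination."""
    return [
        {"min_dollars": d, "min_lev": lv, "min_ret": r, "horizon": h,
         **overlay_delta(trades, d, lv, r, h)}
        for d, lv, r, h in product(dollars, levs, rets, horizons)
    ]
```

Reporting `delta` and `long_pnl` separately matters here: a combination can have a positive marginal delta (the flip loses less than the short) while the LONG leg itself is still negative, which is the distinction between "flip LONG" and "cooldown / abstain".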
Fixed dollar-only immediate-next-trade rows:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---:|---:|---:|---:|---:|---:|
| `$10+`, no lev gate | 277 | `+$3,044` | `-$9,146` | `-$12,190` | `+24.74%` | `24.55%` |
| `$50+`, no lev gate | 181 | `+$4,495` | `-$8,870` | `-$13,365` | `+42.58%` | `22.18%` |
| `$100+`, no lev gate | 135 | `+$908` | `-$4,252` | `-$5,160` | `+97.78%` | `18.09%` |
| `$200+`, no lev gate | 89 | `-$947` | `-$1,496` | `-$549` | `+140.76%` | `14.96%` |
| `$300+`, no lev gate | 62 | `-$1,695` | `-$45` | `+$1,651` | `+174.25%` | `13.70%` |
| `$400+`, no lev gate | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |
| `$500+`, no lev gate | 40 | `-$1,153` | `+$90` | `+$1,242` | `+173.51%` | `13.33%` |

Dollar-only conclusion:

- below about `$300`, the next short trade is still net-profitable or less bad than the LONG flip
- around `$300`, the next short trade turns bad, but LONG is only near-flat
- around `$400` to `$500`, the next-trade LONG flip becomes positive

Fixed immediate-next-trade rows with a `+0.75%` realized-return trigger:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---:|---:|---:|---:|---:|---:|
| `$10+` and `+0.75%` | 99 | `-$1,735` | `-$409` | `+$1,326` | `+104.45%` | `14.03%` |
| `$50+` and `+0.75%` | 74 | `-$1,950` | `+$105` | `+$2,055` | `+155.62%` | `14.03%` |
| `$75+` and `+0.75%` | 70 | `-$2,028` | `+$194` | `+$2,223` | `+166.91%` | `13.95%` |
| `$100+` and `+0.75%` | 67 | `-$2,083` | `+$336` | `+$2,419` | `+168.60%` | `13.69%` |
| `$150+` and `+0.75%` | 63 | `-$2,082` | `+$344` | `+$2,426` | `+175.37%` | `13.69%` |
| `$300+` and `+0.75%` | 58 | `-$1,738` | `+$58` | `+$1,796` | `+173.61%` | `13.70%` |
| `$400+` and `+0.75%` | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |

Return-conditioned conclusion:

- the effect becomes visible with more triggers when the dollar threshold is lowered to
`$50-$150` **and** the prior win is also at least `+0.75%`
- the best immediate-next-trade delta in this grid was around `$150+` and `+0.75%`: `63` next trades, SHORT `-$2,081.81`, LONG `+$343.94`, delta `+$2,425.75`
- the original `$400+`, high-leverage trigger remains good but is not the only viable threshold; it is the cleaner high-tail version

Two-trade horizon:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|---|---:|---:|---:|---:|---:|---:|
| `$300+`, `lev >= 8.5` | 115 | `-$3,201` | `+$511` | `+$3,712` | `+168.52%` | `14.27%` |
| `$400+`, `lev >= 8.5` | 91 | `-$2,689` | `+$556` | `+$3,245` | `+175.26%` | `13.71%` |
| `$500+`, `lev >= 8.5` | 75 | `-$2,237` | `+$509` | `+$2,747` | `+167.53%` | `14.71%` |

Two-trade conclusion:

- the high-leverage `$300-$500` zone supports a two-trade exhaustion rebound more strongly than the original one-trade-only statement
- the best two-trade variant in this fixed grid was `$300+`, `lev >= 8.5`, next two trades: delta `+$3,712`, estimated LONG PnL `+$511`
- the five-trade horizon should not be traded LONG; it is only a damage-control / cooldown signal

Reliability statement:

The post-win overlay is more solid than initially stated. The robust form is not "after any win"; that is false. The robust form is:

```text
after a sufficiently large realized short win, especially a high-return or high-leverage win,
the next 1-2 short-engine opportunities are often contaminated by rebound risk and can be
improved by LONG flip or, at minimum, cooldown/abstain.
```

The strongest candidates for shadow/live-probe research are:

- immediate next trade after `$100-$200` win **and** prior return `>= +0.75%`
- immediate next trade after `$400+` win, especially `lev >= 8.5`
- next two trades after `$300-$500` win with `lev >= 8.5`

Guardrail:

The overlay should not optimize on WR. LONG WR remains lower than SHORT WR on many triggered subsets.
The edge is payoff asymmetry / loss-tail avoidance: short wins become smaller or disappear after the exhaustion event, while short losses on the next trade(s) become expensive.

### Candidate codified overlay rule and EFSM

Terminology:

- **EFSM** means **Execution FSM**
- refer to this component as the post-win **EFSM**, not merely a generic "state machine"

Candidate rule proposed after the lowered-threshold sweep:

```text
after a completed BLUE SHORT trade:
    if pnl_abs > $397:
        tag next 1 trade as FLIP_LONG
    if pnl_abs > $397 and leverage > 8.6:
        tag next 2 trades as FLIP_LONG
    if 0 < pnl_abs < $250 and pnl_pct >= +0.75%:
        tag next 1 trade as FLIP_LONG

after the armed slots are consumed:
    reset to SHORT
```

EFSM semantics:

- this is a **slot-based Execution FSM**, not a persistent regime switch
- each trigger arms an explicit number of future slots
- each future entry consumes exactly one slot
- when `slots_remaining == 0`, the state resets to SHORT
- while slots are active, new triggers are ignored by default
- a flipped LONG trade outcome is not allowed to re-arm the overlay
- this prevents the reset bug where one flipped trade recursively arms the next and converts a bounded rebound probe into an unbounded side switch
- the implementation supports arbitrary future slot counts, not only `1` and `2`

Implementation location:

- EFSM: `adaptive_exit/post_win_long_overlay.py`
- canonical class names: `PostWinExecutionFSM`, `PostWinExecutionFSMConfig`
- compatibility aliases: `PostWinLongOverlay`, `PostWinLongOverlayConfig`
- tests: `prod/tests/test_post_win_long_overlay.py`

Focused test coverage:

- `$397+` non-high-leverage win arms one slot
- `$397+` and `lev > 8.6` arms two slots
- `< $250` and `pnl_pct >= +0.75%` arms one slot
- active arms consume deterministically and reset to SHORT
- re-arm attempts while active are ignored
- flipped LONG outcomes cannot re-arm
- optional TTL expiry works
- future `3+` slot rules work

Focused verification:

```text
python -m pytest -o cache_dir=/tmp/pytest-cache-post-win-overlay \
  prod/tests/test_post_win_long_overlay.py -q

7 passed
```

Exact candidate replay, no re-arm during active flip slots:

- input: `1333` cleaned BLUE trades through `2026-05-08 14:34:57 UTC`
- baseline short-only estimated PnL: `+$10,953.50`
- candidate policy estimated PnL: `+$12,464.30`
- marginal dollar delta: `+$1,510.80`
- baseline max DD: `15.70%`
- candidate max DD: `14.78%`
- long-flipped trades: `160`
- affected subset left SHORT: `-$2,415.46`
- affected subset flipped LONG: `-$904.67`
- affected subset marginal delta: `+$1,510.80`
- triggers armed:
  - `small_dollar_high_return`: `77`
  - `big_win_high_lev`: `41`
  - `big_win`: `1`
- slots consumed:
  - `small_dollar_high_return`: `77`
  - `big_win_high_lev`: `82`
  - `big_win`: `1`
- consumed arms: `119`
- dangling slots at end: `0`
- ignored re-arm attempts while active: `20`

Reset sensitivity:

Allowing active flipped trades / active arms to re-arm is harmful:

- unsafe recursive re-arm variant long flips: `183`
- unsafe marginal delta: `-$5,425.32`
- safe no-rearm marginal delta: `+$1,510.80`

Therefore the no-recursive-rearm reset invariant is not optional. It is part of the edge definition.

Compound-return caveat:

- baseline short-only compound: `+164.89%`
- candidate compound: `+107.26%`

This is why the overlay must be treated as a dollar-tail / drawdown-control overlay first, not as a compounding optimizer. The current counterfactual uses same entry/exit skeleton and estimated flipped LONG PnL, so the next validation step must include actual LONG execution assumptions, long-side V7 behavior, and time-to-next-entry gating.
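
The slot-based EFSM semantics and the no-recursive-rearm invariant described in this section can be illustrated with a minimal sketch. This is not the code in `adaptive_exit/post_win_long_overlay.py`: the class name `SlotFSMSketch`, its method names, and the string side labels are hypothetical, TTL expiry is omitted, and only the thresholds (`$397`, `8.6`, `$250`, `+0.75%`) are taken from the candidate rule.

```python
from dataclasses import dataclass

SHORT, FLIP_LONG = "SHORT", "FLIP_LONG"  # hypothetical side labels


@dataclass
class SlotFSMSketch:
    """Illustrative stand-in for the post-win EFSM; not the production class."""

    big_win_usd: float = 397.0
    high_lev: float = 8.6
    small_win_usd: float = 250.0
    min_return_pct: float = 0.75
    slots_remaining: int = 0
    ignored_rearms: int = 0

    def _slots_for(self, pnl_abs: float, pnl_pct: float, leverage: float) -> int:
        """Map a completed SHORT win to the number of slots it would arm."""
        if pnl_abs > self.big_win_usd:
            return 2 if leverage > self.high_lev else 1
        if 0 < pnl_abs < self.small_win_usd and pnl_pct >= self.min_return_pct:
            return 1
        return 0

    def on_trade_closed(self, side: str, pnl_abs: float,
                        pnl_pct: float, leverage: float) -> None:
        """Consider arming after a trade closes."""
        if side != SHORT:
            return  # a flipped LONG outcome is never allowed to re-arm
        slots = self._slots_for(pnl_abs, pnl_pct, leverage)
        if slots == 0:
            return
        if self.slots_remaining > 0:
            self.ignored_rearms += 1  # triggers while active are ignored
            return
        self.slots_remaining = slots

    def next_side(self) -> str:
        """Each new entry consumes exactly one slot; at zero the side is SHORT."""
        if self.slots_remaining > 0:
            self.slots_remaining -= 1
            return FLIP_LONG
        return SHORT


fsm = SlotFSMSketch()
fsm.on_trade_closed(SHORT, pnl_abs=450.0, pnl_pct=0.4, leverage=9.0)
sides = [fsm.next_side() for _ in range(3)]
print(sides)
```

Because arming is only ever evaluated on SHORT outcomes and is skipped while slots remain, a flipped trade cannot extend its own window, which is the property the reset-sensitivity comparison depends on.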
Time dependency:

The replay showed material timing dependence:

| Delay from trigger to flipped entry | n | SHORT PnL | LONG PnL | Delta |
|---|---:|---:|---:|---:|
| `<=15m` | 19 | `+$2,765.51` | `-$3,062.37` | `-$5,827.88` |
| `15-30m` | 67 | `-$3,588.76` | `+$2,381.96` | `+$5,970.72` |
| `30-60m` | 40 | `-$882.57` | `-$104.33` | `+$778.24` |
| `>60m` | 34 | `-$709.64` | `-$119.93` | `+$589.72` |

This means the overlay may need a lower-bound delay, an upper-bound TTL, or market-state confirmation. The current EFSM already supports TTL; the exact timing gate remains research, not deployed doctrine.

AdvancedExitManagerV7 / AlphaExitEngineV7 caveat:

`AlphaExitEngineV7` is mechanically side-aware:

- `side=0` means LONG
- `side=1` means SHORT
- PnL, MFE, MAE, trend direction, and adverse/favorable movement are signed by `ctx.side`

However, V7 calibration is SHORT-lineage:

- bounce model labels were trained on BLUE SHORT adverse-bar samples
- pressure threshold `2.69` was selected on SHORT/GREEN-lineage replay
- MAE/MFE concepts are symmetric in code but not guaranteed symmetric in fitted thresholds or bounce probabilities

Before any live FLIP_LONG execution, V7 must be validated in one of these modes:

- shadow-only LONG contexts using actual flipped LONG entries
- conservative LONG-specific V7 threshold override
- disable V7 live exits for overlay LONGs until enough shadow decisions show it does not prematurely cut the rebound edge

The rule can be codified, but production wiring must keep the EFSM, side selection, and V7 exit policy explicitly separable.
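
The side convention stated above (`side=0` is LONG, `side=1` is SHORT, with PnL, MFE, and MAE signed by side) can be sketched as follows. This is an illustration of the convention only, not the V7 source: the function names `signed_return` and `mfe_mae` are hypothetical, fees are ignored, and the price path is made up.

```python
from typing import Sequence, Tuple

LONG, SHORT = 0, 1  # side encoding as stated in this note


def signed_return(entry_price: float, price: float, side: int) -> float:
    """Fractional return, positive when the move favors the held side."""
    raw = (price - entry_price) / entry_price
    return raw if side == LONG else -raw


def mfe_mae(entry_price: float, prices: Sequence[float], side: int) -> Tuple[float, float]:
    """Return (MFE, MAE) as non-negative fractions over a price path."""
    returns = [signed_return(entry_price, p, side) for p in prices]
    mfe = max(0.0, max(returns, default=0.0))   # best favorable excursion
    mae = max(0.0, -min(returns, default=0.0))  # worst adverse excursion
    return mfe, mae


# The same hypothetical path seen from both sides: the excursions swap roles.
path = [99.0, 98.0, 101.0]
short_mfe, short_mae = mfe_mae(100.0, path, SHORT)
long_mfe, long_mae = mfe_mae(100.0, path, LONG)
print(short_mfe, short_mae, long_mfe, long_mae)
```

The sketch shows why mechanical side-awareness is cheap while calibration is not: flipping `side` swaps MFE and MAE on the same path, so thresholds fitted to SHORT excursion shapes are evaluated against differently shaped LONG excursions.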
Sweep results:

- best by compounded return:
  - mode: one-shot after win
  - `trigger_lev = 9.0`
  - `trade_min_lev = 0.0`
  - traded: `1324`
  - LONG trades: `222`
  - WR `50.91%`
  - compounded return `+61.93%`
  - max DD `19.36%`
  - PnL `-$257.03`
- best by estimated dollars:
  - mode: one-shot after win
  - `trigger_lev = 2.0`
  - `trade_min_lev = 0.69`
  - traded: `1050`
  - LONG trades: `297`
  - WR `40.03%`
  - compounded return `+27.71%`
  - max DD `22.44%`
  - PnL `+$5,114.50`

Both sweep optima still underperform the relevant short-only baselines. In particular, simply treating high leverage as a short-side quality filter is stronger than using high-leverage short wins as a broad long-switch trigger:

- `lev >= 8.5`, short-only: PnL `+$12,193.65`, max DD `7.58%`
- best long-switch dollar policy: PnL `+$5,114.50`, max DD `22.44%`

Interpretation:

- leverage does behave like conviction, but the first-order use is filtering / sizing, not side inversion
- ordinary high-lev wins are too common to serve as a LONG regime switch
- the previous post-outlier result survives only because it was much narrower: large dollar win, 9x, and immediate next trade
- high-lev wins may still be useful as **features** in the dual-shadow / market-fingerprint layer:
  - `last_high_lev_short_win`
  - `last_high_lev_short_win_count`
  - `last_high_lev_short_win_pnl_abs`
  - `last_high_lev_short_win_return_pct`
  - `bars_since_high_lev_short_win`
  - `consecutive_high_lev_short_wins`

Research conclusion:

- do not implement the literal `lev > 0.70` long switch
- do preserve leverage as a strong conviction feature
- do keep the narrower post-outlier one-trade long probe in the research queue
- the strongest immediate operational lesson is that low-leverage trades may be unnecessary, while high-leverage shorts remain the cleaner expression

## AlphaExitEngineV7 LONG calibration replay

Date: `2026-05-08`

Scope:

- system: BLUE only
- exit engine: `AlphaExitEngineV7`
- harness: `adaptive_exit/calibrate_v7_long_from_journal.py`
- source data: ClickHouse `dolphin.v7_decision_events`
- source rows: `6,812`
- reconstructed BLUE V7-tracked paths: `97`
- path side in source journal: SHORT
- replay side for calibration: synthetic LONG (`side=0`)
- fee assumption: `4 bps`
- natural exit comparator: final logged decision-row price for the same path
- V7 exit comparator: first replayed V7 `EXIT` on the same price path
- bounce model: disabled for this replay by intentionally using a missing model path, because the current bounce model is trained on BLUE SHORT adverse-bar samples and should not be treated as a validated LONG probability model

This is a LONG-exit calibration proxy, not proof from exchange-filled LONG trades. It answers a narrower question: if the post-win EFSM had flipped a trade LONG on price paths that BLUE V7 actually observed, would a LONG-side V7 cut/exit surface have improved or harmed the synthetic LONG outcome versus holding to the path's natural end?

### Original V7 SHORT calibration pattern

The original V7 calibration was a pressure-threshold sweep over live shadow decisions. V7 computes:

```text
exit_pressure = clamp(directional_term + risk_term, -3.0, +3.0)
```

Then:

```text
if exit_pressure > 2.69:
    EXIT
elif exit_pressure > 1.0:
    RETRACT
elif exit_pressure < -0.5 and pnl_pct > 0:
    EXTEND
else:
    HOLD
```

The documented SHORT lineage was:

| Pressure threshold | Fires | Result |
|---:|---:|---:|
| `2.00` | `22/24` | `+$439`, ROI `+1.67%` |
| `2.35` | `17/24` | `+$891`, ROI `+3.38%` |
| `2.60` | `17/24` | `+$891`, ROI `+3.38%` |
| `3.00` | `14/24` | `+$796`, ROI `+3.02%` |
| base/no V7 | n/a | `+$784`, ROI `+2.98%` |

The deployed threshold `2.69` was chosen as the high end of the useful `2.35-2.70` band so V7 stayed closer to base behavior and avoided cutting winners on transient pressure.

### Threshold surface now explicit

`AlphaExitEngineV7` now accepts an optional per-engine `AlphaExitV7Config`. Defaults preserve the deployed SHORT-calibrated behavior.
This lets BLUE instantiate separate SHORT and LONG V7 engines later without global mutation.

V7-specific configurable fields:

| Config field | Default | Meaning |
|---|---:|---|
| `rvol_w15` | `0.50` | realized-vol composite weight for 15-bar volatility |
| `rvol_w30` | `0.30` | realized-vol composite weight for 30-bar volatility |
| `rvol_w50` | `0.20` | realized-vol composite weight for 50-bar volatility |
| `rvol_floor` | `0.000001` | minimum realized-vol denominator |
| `mae_tier1_k` | `3.5` | MAE tier-1 multiplier on `rv_comp` |
| `mae_tier2_k` | `7.0` | MAE tier-2 multiplier on `rv_comp` |
| `mae_tier3_k` | `12.0` | MAE tier-3 multiplier on `rv_comp` |
| `mae_tier1_floor` | `0.005` | MAE tier-1 absolute floor |
| `mae_tier2_floor` | `0.012` | MAE tier-2 absolute floor |
| `mae_tier3_floor` | `0.025` | MAE tier-3 absolute floor |
| `mae_tier1_risk` | `0.5` | pressure contribution once tier 1 is breached |
| `mae_tier2_risk` | `0.8` | pressure contribution once tier 2 is breached |
| `mae_tier3_risk` | `1.2` | pressure contribution once tier 3 is breached |
| `mae_accel_min_bars` | `3` | minimum bars before adverse-acceleration gate can fire |
| `mae_accel_peak_floor` | `0.003` | adverse peak floor for MAE acceleration risk |
| `mae_accel_risk` | `0.6` | pressure contribution for MAE acceleration |
| `mae_recovery_peak_floor` | `0.004` | adverse peak floor for failed-recovery gate |
| `mae_recovery_prev_min` | `0.25` | prior recovery ratio required before snapback risk |
| `mae_recovery_snapback_max` | `0.10` | recovery ratio below which recovery is treated as failed |
| `mae_recovery_risk` | `1.0` | pressure contribution for failed recovery |
| `mae_late_floor` | `0.003` | MAE required before late adverse ramp applies |
| `mae_late_start_frac` | `0.60` | bars-held fraction where late adverse ramp starts |
| `mae_late_risk_max` | `0.4` | maximum late adverse pressure contribution |
| `max_hold_ref_mult_3m` | `3.0` | V7 internal max-hold reference multiplier |
| `mfe_slope_peak_floor` | `0.01` | peak favorable floor for convexity slope break |
| `mfe_convexity_decay_exit` | `0.35` | decay ratio for hard MFE giveback pressure |
| `mfe_convexity_decay_soft` | `0.20` | decay ratio for soft MFE giveback pressure |
| `mfe_convexity_exit_risk` | `1.5` | pressure contribution for hard MFE giveback |
| `mfe_convexity_soft_risk` | `0.3` | pressure contribution for soft MFE giveback |
| `mfe_accel_floor` | `-0.00001` | MFE acceleration floor for adverse convexity |
| `mfe_accel_peak_floor` | `0.005` | peak favorable floor for MFE acceleration risk |
| `mfe_accel_risk` | `0.2` | pressure contribution for MFE acceleration risk |
| `bounce_dir_w` | `0.15` | bounce score directional-term weight |
| `bounce_risk_w` | `0.35` | bounce risk-term weight |
| `bounce_rv_safe_floor` | `0.00001` | bounce feature volatility denominator floor |
| `exit_pressure_threshold` | `2.69` | live `EXIT` threshold |
| `retract_pressure_threshold` | `1.0` | `RETRACT` threshold |
| `extend_pressure_threshold` | `-0.5` | profitable `EXTEND` threshold |
| `pressure_min` | `-3.0` | pressure clamp lower bound |
| `pressure_max` | `3.0` | pressure clamp upper bound |

Inherited V6 weight priors remain configurable through the existing `WeightAdapter`/`WeightPriors` seam. The new config is specifically for V7 threshold/gate surfaces and is init-time/per-engine configurable.

### LONG replay results

Baseline synthetic LONG natural exit across the 97 paths:

- natural PnL: `-$328.84`
- natural WR: `59.79%`
- natural compound: `+3.50%`
- natural max DD: `2.28%`

The dollar PnL and compound can diverge because path notionals differ. For this exit calibration, dollar PnL is the more relevant metric because BLUE sizing is not uniform.
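
A small worked example shows how a negative dollar PnL and a positive compound can coexist. The three paths below are invented for illustration and are not replay data; the sketch also assumes that "compound" means the product of `1 + r` over per-path fractional returns, which this note does not spell out.

```python
from math import prod

# Hypothetical (notional_usd, fractional_return) pairs; not replay data.
paths = [
    (1_000.0, 0.02),     # small winner
    (1_000.0, 0.02),     # small winner
    (10_000.0, -0.005),  # large-notional small-percentage loser
]

# Dollar PnL weights each return by the notional actually deployed.
dollar_pnl = sum(notional * ret for notional, ret in paths)

# Compound (assumed definition) treats every path as equally sized.
compound = prod(1.0 + ret for _, ret in paths) - 1.0

win_rate = sum(ret > 0 for _, ret in paths) / len(paths)

print(f"dollar PnL: {dollar_pnl:+.2f}")
print(f"compound:   {compound:+.4%}")
print(f"win rate:   {win_rate:.2%}")
```

Two small winners dominate the compound and the win rate, while the single large-notional loser dominates the dollar total, which is the same shape as a positive compound alongside a negative natural PnL.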
Top tested surfaces:

| Candidate | V7 PnL | Delta vs natural | Exits | Exit rate | V7 WR | V7 max DD |
|---|---:|---:|---:|---:|---:|---:|
| `mfe_risk_scale_0.5` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `mfe_risk_scale_0.75` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `combo_p1.7_mae0.75` | `+$47.24` | `+$376.08` | `51` | `52.58%` | `47.42%` | `1.55%` |
| `exit_p1.7` | `+$36.88` | `+$365.72` | `51` | `52.58%` | `47.42%` | `1.53%` |
| `exit_p2.0` | `+$19.68` | `+$348.52` | `41` | `42.27%` | `49.48%` | `1.53%` |
| `short_default` / `exit_p2.69` | `+$1.43` | `+$330.26` | `38` | `39.18%` | `49.48%` | `1.81%` |
| `exit_p3.0` | `-$328.84` | `$0.00` | `0` | `0.00%` | `59.79%` | `2.28%` |

Interpretation:

- The deployed SHORT default is not mechanically broken for LONG. It improved synthetic LONG dollar outcome by `+$330.26` versus natural exit on the 97 replayed paths.
- The best tested LONG proxy did not come from lowering the pressure threshold. It came from reducing MFE giveback/convexity pressure contribution (`mfe_risk_scale_0.5` or `0.75`).
- Aggressively lowering `exit_pressure_threshold` to `1.4` over-fires: `78/97` exits, V7 PnL `-$11.78`, and many negative deltas. That resembles the original SHORT calibration failure at `2.0`: pressure that is too sensitive cuts too much transient noise.
- A moderate pressure threshold around `1.7-2.0` is useful, but still inferior to leaving pressure at `2.69` and reducing MFE-risk contributions in this proxy.

Recommended LONG overlay calibration candidate for shadow:

```python
AlphaExitV7Config(
    mfe_convexity_exit_risk=0.75,
    mfe_convexity_soft_risk=0.15,
    mfe_accel_risk=0.10,
)
```

This is the `mfe_risk_scale_0.5` surface.
It keeps:

- `exit_pressure_threshold = 2.69`
- all MAE vol-normalized loss-cut thresholds unchanged
- pressure clamp unchanged
- bounce disabled or neutral until a LONG-trained bounce model exists

Why this candidate is preferable to simply lowering `exit_pressure_threshold`:

- it preserved the useful loss-cut behavior while avoiding broad pressure over-firing
- it improved dollar PnL more than all pressure-threshold sweeps tested
- it left MAE protection intact, which matters if the flipped LONG thesis is wrong and the asset continues down
- it respects that the post-win EFSM edge is a rebound/cooldown edge, so the exit manager should not over-penalize ordinary post-entry MFE shape

Do not deploy this LONG config live yet. It should first be run in shadow on actual EFSM-flipped candidate LONG contexts, because this replay uses SHORT entries inverted to LONG and not real LONG fills.

### Regression and safety notes

Implemented code seams:

- `nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py` defines `AlphaExitV7Config`
- default `AlphaExitEngineV7()` behavior remains the SHORT-calibrated config
- a LONG-specific engine can be instantiated with `AlphaExitEngineV7(config=...)`
- the calibration harness writes full results to `/tmp/v7_long_calibration.json`

Tests added:

- default config equals the legacy SHORT threshold surface
- custom config is per-instance and does not mutate the default engine
- V7 remains mechanically side-aware for LONG and SHORT PnL/MFE/MAE
- BLUE live V7 provider wiring still records journal decisions and uses OB signal input
- EFSM reset/no-recursive-rearm tests remain separate from V7 exit calibration

Research caveats:

- only `97` V7-tracked BLUE paths existed in the current decision journal
- this is enough to reject obviously bad LONG exit settings, but not enough to canonize a live LONG exit policy
- bounce must remain neutral for LONG until trained or validated on LONG samples
- V7 `max_hold_ref_mult_3m` still uses an internal time reference rather than the orchestrator's effective max hold; the system bible already tracks this as a V7 TODO/bug because it can make adverse-ramp pressure build too early
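
The per-engine config seam and the `mfe_risk_scale_0.5` surface can be illustrated with a stand-in dataclass. This is not `AlphaExitV7Config`: the class `MfeRiskConfigSketch` and the helper `scale_mfe_risk` are hypothetical, only four fields are modeled, and the sketch assumes that the scale name means multiplying the three MFE risk contributions by the scale factor, which matches the recommended values above. Defaults are copied from the config table in this note.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MfeRiskConfigSketch:
    """Hypothetical subset of the V7 config surface; not the real class."""

    mfe_convexity_exit_risk: float = 1.5
    mfe_convexity_soft_risk: float = 0.3
    mfe_accel_risk: float = 0.2
    exit_pressure_threshold: float = 2.69


def scale_mfe_risk(cfg: MfeRiskConfigSketch, scale: float) -> MfeRiskConfigSketch:
    """Return a new config with only the MFE risk contributions scaled."""
    return replace(
        cfg,
        mfe_convexity_exit_risk=cfg.mfe_convexity_exit_risk * scale,
        mfe_convexity_soft_risk=cfg.mfe_convexity_soft_risk * scale,
        mfe_accel_risk=cfg.mfe_accel_risk * scale,
    )


short_cfg = MfeRiskConfigSketch()            # SHORT-calibrated defaults
long_cfg = scale_mfe_risk(short_cfg, 0.5)    # candidate LONG shadow surface

print(long_cfg)
print(short_cfg)  # unchanged: frozen instances cannot be mutated in place
```

Using a frozen dataclass and `dataclasses.replace` is one way to get the "per-instance, no global mutation" property that the added tests check: the LONG surface is a new object, and the SHORT defaults cannot be altered by accident.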