# LONG Deterministic Rule Research

Date: 2026-05-07

## Goal

Find the simplest deterministic long-side market rule, using primarily Dolphin
NG eigendata, that behaves like the original short Alpha Engine rule in spirit:

- few moving parts
- market-structural
- explainable in one breath
- reliable enough to serve as a basal gate before asset selection and later
  overlays

This note is explicitly **not** about a fitted long model.

## Data source

The analysis uses the raw daily scan cache summarized by:

- `adaptive_exit/characterize_long_signals.py`
- `/mnt/dolphin_training/long_signal_research/long_signal_scan_summary_h24.parquet`
- `/mnt/dolphin_training/long_signal_research/long_signal_characterization_report.json`

Only eigendata and scan-price-derived outcomes are used here:

- `instability_50`
- `v50/v150/v300/v750_lambda_max_velocity`
- `vel_div`
- `vel_div` lag / delta terms

No ExF, EsoF, or OBF are required for the core finding.

## What does **not** work as the basal long rule

The obvious mirror thesis, `vel_div > 0.01`, is too weak to be the basal long
edge.

Recent HQ slice (`2025-12-31` onward):

- support: `39.65%`
- `strong_long` lift: `1.15x`
- `broad_long` lift: `1.22x`

That is not useless, but it is neither elegant nor selective enough to be
the long analogue of `vel_div < -0.02`.

## Strongest deterministic shape

The long side shows up most clearly as a **stressed unwind / squeeze** regime,
not as a generic bullish breakout regime.

### Candidate primary deterministic rule

```text
LONG_REGIME if
    instability_50 >= 20.5
    and v300_lambda_max_velocity < 0
    and v750_lambda_max_velocity < 0
```

Interpretation:

- `instability_50 >= 20.5`: the market is structurally stressed
- `v300 < 0` and `v750 < 0`: the slower eigenspace is still negative / damaged
- together: this is a high-stress unwind state where long opportunities tend to
  appear as reversals / squeezes on the same manifold that produces short
  dislocations
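
As a minimal sketch, the candidate rule can be written as a plain predicate over one scan row. This assumes each row is a mapping keyed by the column names used in this note. The constant and function names are illustrative and are not taken from `characterize_long_signals.py`. The rule text implies specific boundary behavior: `instability_50` exactly at `20.5` qualifies, but a velocity of exactly `0` does not.

```python
from typing import Mapping

# Rounded recent-HQ 90th percentile of instability_50 (see "Why 20.5").
INSTABILITY_THRESHOLD = 20.5


def long_regime(row: Mapping[str, float]) -> bool:
    """Candidate primary gate: structural stress plus a damaged slow eigenspace."""
    stressed = row["instability_50"] >= INSTABILITY_THRESHOLD
    slow_damaged = (
        row["v300_lambda_max_velocity"] < 0
        and row["v750_lambda_max_velocity"] < 0
    )
    return stressed and slow_damaged


# Hypothetical scan rows, for illustration only.
stressed_unwind = {
    "instability_50": 22.1,
    "v300_lambda_max_velocity": -0.004,
    "v750_lambda_max_velocity": -0.001,
}
healed_slow_space = dict(stressed_unwind, v750_lambda_max_velocity=0.002)
print(long_regime(stressed_unwind))    # True
print(long_regime(healed_slow_space))  # False
```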

### Why `20.5`

`20.5` is the rounded recent-HQ `instability_50` 90th-percentile threshold
(`20.546996...`). It is the most practical fixed threshold found in the
recent-era characterization.
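
A fixed threshold like this can be re-derived from any window by taking the 90th percentile and rounding it. The sketch below uses the standard library's linear-interpolation quantile (`method="inclusive"`). This note does not state which quantile convention the original characterization used, so that choice is an assumption. Rounding to one decimal turns `20.546996...` into `20.5`.

```python
import statistics
from typing import Sequence, Tuple


def rounded_p90(values: Sequence[float], ndigits: int = 1) -> Tuple[float, float]:
    """Return (raw 90th percentile, rounded threshold) for a sample of instability_50."""
    # quantiles(n=10) returns the nine decile cut points; the last one is P90.
    p90 = statistics.quantiles(values, n=10, method="inclusive")[-1]
    return p90, round(p90, ndigits)


print(rounded_p90([float(v) for v in range(1, 12)]))  # (10.0, 10.0)
print(round(20.546996, 1))                            # 20.5
```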

## Empirical support

### Recent HQ (`2025-12-31` onward)

Base rates:

- `strong_long`: `0.1648`
- `broad_long`: `0.1367`

Rule:

- support: `6356` rows (`3.58%`)
- `strong_long`: `0.3409` (`2.07x` lift)
- `broad_long`: `0.3538` (`2.59x` lift)

### Full history

Base rates:

- `strong_long`: `0.2603`
- `broad_long`: `0.2472`

Rule:

- support: `300,728` rows (`12.59%`)
- `strong_long`: `0.3330` (`1.28x` lift)
- `broad_long`: `0.3375` (`1.37x` lift)
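
Support and lift follow directly from two boolean columns: whether the rule fired, and whether the row carried the outcome label. The minimal sketch below assumes the labels (`strong_long`, `broad_long`) are already computed per row. Their exact definitions live in the characterization script and are not reproduced here. Lift is taken as the in-rule hit rate divided by the base rate, which matches the figures above (for example, `0.3409 / 0.1648 ≈ 2.07`).

```python
from typing import Dict, Sequence


def support_and_lift(rule_fired: Sequence[bool], label: Sequence[bool]) -> Dict[str, float]:
    """Support, in-rule hit rate, base rate and lift for one boolean outcome label."""
    if len(rule_fired) != len(label) or not label:
        raise ValueError("need two non-empty columns of equal length")
    n = len(label)
    in_rule = [hit for fired, hit in zip(rule_fired, label) if fired]
    base_rate = sum(label) / n
    rule_rate = sum(in_rule) / len(in_rule) if in_rule else float("nan")
    return {
        "support_rows": len(in_rule),
        "support_frac": len(in_rule) / n,
        "rule_rate": rule_rate,
        "base_rate": base_rate,
        "lift": rule_rate / base_rate if base_rate else float("nan"),
    }


stats = support_and_lift([True, True, False, False], [True, False, False, False])
print(stats["support_frac"], stats["rule_rate"], stats["lift"])  # 0.5 0.5 2.0
```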

## Simpler fallback

If maximum elegance is preferred over extra selectivity, the one-factor
fallback is:

```text
LONG_REGIME_SIMPLE if instability_50 >= 20.5
```

Recent HQ:

- support: `10.10%`
- `strong_long`: `0.3297` (`2.00x` lift)
- `broad_long`: `0.3420` (`2.50x` lift)

This is surprisingly strong for a one-variable rule. It is the closest thing
found to a pure long-side analogue of the short `vel_div < -0.02` gate.

Tradeoff:

- simpler
- broader
- slightly less selective than adding `v300 < 0` and `v750 < 0`

## Optional stricter confirmation

If later tuning wants more explicit “healing after stress” confirmation, the
strict variant is:

```text
LONG_REGIME_STRICT if
    instability_50 >= 20.5
    and vel_div_lag6 < -0.03
    and vel_div_delta6 > 0.02
```

This is directionally sensible, but it is not materially better than the
`instability_50 + v300 + v750` rule, so it should be treated as an optional
refinement, not the basal rule.
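
This note does not define `vel_div_lag6` or `vel_div_delta6`. The sketch below assumes a natural reading that the source does not confirm: the lag is the `vel_div` value six bars earlier, and the delta is the current `vel_div` minus that lagged value. Under that assumption, the strict variant is:

```python
from typing import Optional, Sequence, Tuple


def lag_and_delta(vel_div: Sequence[float], i: int, k: int = 6) -> Optional[Tuple[float, float]]:
    """ASSUMED definitions: lag_k = vel_div[i - k], delta_k = vel_div[i] - lag_k."""
    if i < k:
        return None  # not enough history yet
    lagged = vel_div[i - k]
    return lagged, vel_div[i] - lagged


def long_regime_strict(instability_50: float, vel_div: Sequence[float], i: int) -> bool:
    """Stress plus a deeply negative vel_div six bars ago that has since healed."""
    features = lag_and_delta(vel_div, i)
    if features is None:
        return False
    lag6, delta6 = features
    return instability_50 >= 20.5 and lag6 < -0.03 and delta6 > 0.02


series = [-0.05, -0.04, -0.03, -0.02, -0.01, -0.01, -0.02]
print(long_regime_strict(21.0, series, 6))  # True: lag6 = -0.05, delta6 ≈ +0.03
```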

## Monthly sanity check

For the candidate primary rule (`instability_50 >= 20.5 && v300 < 0 && v750 < 0`)
in the recent HQ window:

- `2026-01`: `strong_long = 0.348`
- `2026-02`: `strong_long = 0.344`
- `2026-03`: `strong_long = 0.312`

The monthly base rates for the same period were:

- `2026-01`: `0.289`
- `2026-02`: `0.271`
- `2026-03`: `0.068`

So even into the weak March tape, the rule remains elevated relative to base.
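
The monthly check groups rows by calendar month and compares the in-rule label rate with the unconditional rate. The minimal stdlib sketch below assumes rows arrive as `(timestamp, rule_fired, strong_long)` tuples. The tuple layout is illustrative.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[datetime, bool, bool]  # (scan time, rule fired, strong_long label)


def monthly_rates(rows: Iterable[Row]) -> Dict[str, Dict[str, Optional[float]]]:
    counts = defaultdict(lambda: {"n": 0, "pos": 0, "rule_n": 0, "rule_pos": 0})
    for ts, fired, label in rows:
        c = counts[ts.strftime("%Y-%m")]
        c["n"] += 1
        c["pos"] += int(label)
        if fired:
            c["rule_n"] += 1
            c["rule_pos"] += int(label)
    return {
        month: {
            "base": c["pos"] / c["n"],
            # None when the rule never fired in that month
            "rule": c["rule_pos"] / c["rule_n"] if c["rule_n"] else None,
        }
        for month, c in sorted(counts.items())
    }


rows = [
    (datetime(2026, 3, 2), True, True),
    (datetime(2026, 3, 5), True, False),
    (datetime(2026, 3, 9), False, False),
    (datetime(2026, 3, 12), False, False),
]
print(monthly_rates(rows))  # {'2026-03': {'base': 0.25, 'rule': 0.5}}
```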

## Practical interpretation

This should be viewed as a **market-state gate**, not a complete trade engine.

It says:

- “the market is in the sort of stressed, damaged regime where long squeeze /
  unwind opportunities become meaningfully more likely”

It does **not** by itself say:

- which asset is the best expression
- how to size
- how to exit

That is where the next layers belong:

- deterministic or learned asset selection
- OBF / ARS / bounce overlays
- TP / MAX_HOLD policy

## Recommendation

If a single deterministic long gate must be named now, use:

```text
LONG_REGIME if instability_50 >= 20.5 and v300 < 0 and v750 < 0
```

If maximum simplicity is the priority, use:

```text
LONG_REGIME_SIMPLE if instability_50 >= 20.5
```

And explicitly do **not** promote `vel_div > 0.01` as the basal long rule.

## Deferred analysis idea: dual-shadow regime sampler

This is a **later analysis / control-layer research note**, not a live-rule
recommendation.

One plausible way to sample the market in real time without committing the full
system immediately is a very lightweight **dual-shadow engine**:

- Shadow A: the basal SHORT engine (`vel_div < -0.02` Alpha Engine posture)
- Shadow B: the basal LONG engine (currently the older negative-`vel_div`
  mean-reversion LONG posture is the best simple candidate)

The intent is not merely paper PnL logging. It is to use live, recent
sample-trade outcomes as a **micro-regime probe**:

- if SHORT shadow performance degrades while LONG shadow performance improves,
  the tape may have rotated into a LONG-favorable regime
- if LONG degrades while SHORT improves, the inverse may be true
- if both are performing acceptably, the tape may be permissive / broad enough
  that either side can express edge
- if both are failing, the tape is likely choppy / non-coherent and abstention
  becomes a first-class candidate
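
Once each shadow has a recent-performance score, the four cases above reduce to a small lookup. The minimal sketch below assumes that "performing acceptably" means the mean of the last few shadow returns clears a floor, and it records one return per shadow per step. The window length, the floor and the class name are hypothetical, uncalibrated choices.

```python
from collections import deque
from statistics import fmean


class DualShadowProbe:
    """Hypothetical micro-regime probe over recent SHORT and LONG shadow returns."""

    def __init__(self, window: int = 10, floor: float = 0.0):
        self.short = deque(maxlen=window)
        self.long = deque(maxlen=window)
        self.floor = floor

    def record(self, short_ret: float, long_ret: float) -> None:
        self.short.append(short_ret)
        self.long.append(long_ret)

    def state(self) -> str:
        if not self.short:
            return "abstain"  # no evidence yet
        short_ok = fmean(self.short) > self.floor
        long_ok = fmean(self.long) > self.floor
        if short_ok and long_ok:
            return "permissive"  # either side can express edge
        if not short_ok and not long_ok:
            return "abstain"  # choppy / non-coherent tape
        return "long_favorable" if long_ok else "short_favorable"


probe = DualShadowProbe(window=3)
for s, lg in [(-0.002, 0.003), (-0.001, 0.001), (-0.003, 0.002)]:
    probe.record(s, lg)
print(probe.state())  # long_favorable
```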

This should be implemented, if ever pursued, as:

- very fast
- very lightweight
- explicitly shadow-only at first
- based on small, recent sample trades rather than a heavy fitted model

Longer-term, the entire shadow stream can itself become training data:

- market fingerprints at shadow-entry time
- concurrent SHORT-shadow and LONG-shadow outcomes
- relative WR / ROI-per-trade / drawdown / time-to-win asymmetries

That would allow a later learner to predict or simplify the regime switcher.
But even before ML, the dual-shadow process may already serve as a useful
real-time market-sampling / regime-detection mechanism.

## Dual-shadow persistence characterization

This section records the first persistence pass over extant trades. The goal
was not to prove a full regime-switch system, but to test whether the observed
short-loss streaks are durable enough to justify a regime-favorableness probe.

Important caveat:

- the live SHORT series and the replay LONG series are on different date spans
- this is therefore a side-specific persistence study, not a same-bar paired
  dominance study
- the numbers below are still useful for run-length and hysteresis design

### Live SHORT stream

From the current BLUE trader log:

- trades: `234`
- win rate: `44.44%`
- mean `pnl_pct`: `+0.000506`
- median `pnl_pct`: `-0.000234`
- average win streak: `1.65 trades`
- average loss streak: `2.03 trades`
- `P(win -> win) = 0.394`
- `P(loss -> loss) = 0.512`
- average positive-day run: `1.5 days`
- average negative-day run: `1.5 days`

Interpretation:

- short failures do cluster
- the cluster is real enough to notice
- but it is only mildly persistent
- by itself, it is not strong enough to justify a raw ping-pong switch
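
The streak and transition figures above can be computed from a win/loss sequence in a few lines, as the minimal sketch below shows. The same function applied to the sign of daily PnL gives the positive- and negative-day run lengths. As a consistency check, under a first-order Markov model the mean streak length is `1 / (1 - P(stay))`. For the live SHORT stream, `1 / (1 - 0.394) ≈ 1.65` matches the reported average win streak, and `1 / (1 - 0.512) ≈ 2.05` is close to the reported `2.03`.

```python
from itertools import groupby
from statistics import fmean
from typing import Dict, Sequence


def persistence_stats(outcomes: Sequence[bool]) -> Dict[str, float]:
    """Streak lengths and same-state transition probabilities for a win (True) / loss (False) sequence."""
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(outcomes)]
    win_runs = [n for state, n in runs if state]
    loss_runs = [n for state, n in runs if not state]
    pairs = list(zip(outcomes, outcomes[1:]))

    def p_stay(state: bool) -> float:
        following = [nxt for prev, nxt in pairs if prev == state]
        return sum(nxt == state for nxt in following) / len(following) if following else float("nan")

    return {
        "avg_win_streak": fmean(win_runs) if win_runs else float("nan"),
        "avg_loss_streak": fmean(loss_runs) if loss_runs else float("nan"),
        "p_win_win": p_stay(True),
        "p_loss_loss": p_stay(False),
    }


seq = [True, True, False, True, False, False, False]
result = persistence_stats(seq)
print(result["avg_win_streak"], result["avg_loss_streak"])  # 1.5 2.0
```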

### Basal LONG shadow, old mirror posture

Using the recent bullish-month replay and the single comparable
`10-bar / worst_10bar` configuration:

- trades: `2,243`
- win rate: `48.33%`
- mean `pnl_pct`: `+0.000320`
- median `pnl_pct`: `-0.000400`
- average win streak: `1.93 trades`
- average loss streak: `2.07 trades`
- `P(win -> win) = 0.483`
- `P(loss -> loss) = 0.517`
- average positive-day run: `3.0 days`
- average negative-day run: `1.86 days`

Interpretation:

- this is the clearest durable long-favorable candidate seen so far
- the multi-day positive run length is materially better than the live short
  stream
- this supports a long-favorable regime probe, but not an unconditional flip

### Basal LONG shadow, new stressed-unwind posture

Same replay setup:

- trades: `569`
- win rate: `50.44%`
- mean `pnl_pct`: `-0.000078`
- median `pnl_pct`: `+0.000068`
- average win streak: `2.24 trades`
- average loss streak: `2.20 trades`
- `P(win -> win) = 0.556`
- `P(loss -> loss) = 0.546`
- average positive-day run: `1.36 days`
- average negative-day run: `1.18 days`

Interpretation:

- the new long posture has decent local persistence
- but it is more fragile than the mirror-long posture as a regime switch
- it does not yet justify itself as the primary flip trigger

### Conclusion for regime switching

The data support a **smoothed regime-favorableness detector**, not a raw
flip-on-first-loss system.

Practical reading:

- short-loss streak persistence is real but modest
- long-favorable states exist and can persist
- persistence is on the order of a few trades, not a dramatic regime lock
- the correct implementation is a shadow score with hysteresis and abstain
  logic, not a hard immediate SHORT/LONG switch

Suggested rule shape for later analysis:

- compute rolling shadow scores for SHORT and LONG
- use persistence thresholds before flipping
- require stronger evidence to reverse than to stay put
- abstain when both shadows are weak or both are losing
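
The minimal sketch below implements that rule shape, assuming one paired shadow return per step. Hysteresis comes from two places. Staying put needs no evidence. Reversing needs the rolling score to clear `switch_margin` for `persist` consecutive steps. All parameter values and names here are hypothetical and uncalibrated.

```python
from collections import deque
from statistics import fmean


class HystereticSideSelector:
    """Hypothetical smoothed SHORT/LONG selector with persistence and abstain logic."""

    def __init__(self, window: int = 10, switch_margin: float = 0.002,
                 persist: int = 3, abstain_floor: float = 0.0):
        self.short = deque(maxlen=window)
        self.long = deque(maxlen=window)
        self.switch_margin = switch_margin
        self.persist = persist
        self.abstain_floor = abstain_floor
        self.side = "SHORT"
        self._evidence = 0  # consecutive steps favoring the other side

    def update(self, short_ret: float, long_ret: float) -> str:
        self.short.append(short_ret)
        self.long.append(long_ret)
        short_score, long_score = fmean(self.short), fmean(self.long)
        if max(short_score, long_score) <= self.abstain_floor:
            self._evidence = 0
            return "ABSTAIN"  # both shadows weak or losing; current side is kept
        # Positive edge means the side we are NOT on is doing better.
        if self.side == "SHORT":
            edge = long_score - short_score
        else:
            edge = short_score - long_score
        self._evidence = self._evidence + 1 if edge > self.switch_margin else 0
        if self._evidence >= self.persist:
            self.side = "LONG" if self.side == "SHORT" else "SHORT"
            self._evidence = 0
        return self.side


sel = HystereticSideSelector()
print([sel.update(-0.001, 0.004) for _ in range(5)])
# ['SHORT', 'SHORT', 'LONG', 'LONG', 'LONG']
```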

This is enough to justify the next engineering step:

- live dual-shadow logging on the same bars
- market-fingerprint tagging of each shadow entry
- later ML over shadow outcomes if the deterministic layer proves stable

## Rolling flip-worthiness test

To make the side-switch question stricter, the recent live short slice was
retested with a `5-trade` rolling shadow-delta proxy:

- short shadow return = actual live short `pnl_pct`
- long shadow return = counterfactual `-pnl_pct - fee`
- rolling delta = rolling mean of `(long_shadow - short_shadow)`
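
The minimal sketch below implements the proxy. Two details are assumptions because this section does not pin them down. First, the fee is set to `4 bps`, the haircut used in the post-outlier study later in this note. Second, a "flip signal" is counted each time the rolling delta crosses zero: upward for flip-to-long and downward for flip-to-short.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def rolling_shadow_delta(short_pnl: Sequence[float], fee: float = 0.0004,
                         window: int = 5) -> List[Optional[float]]:
    """Rolling mean of (long_shadow - short_shadow); None until the window is full."""
    buf = deque(maxlen=window)
    out: List[Optional[float]] = []
    for r in short_pnl:
        long_shadow = -r - fee  # counterfactual long on the same skeleton
        buf.append(long_shadow - r)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


def count_flip_signals(deltas: Sequence[Optional[float]]) -> Tuple[int, int]:
    """ASSUMED definition: zero crossings of the rolling delta -> (to_long, to_short)."""
    to_long = to_short = 0
    prev = None
    for d in deltas:
        if d is None:
            continue
        if prev is not None:
            if prev <= 0 < d:
                to_long += 1
            elif prev > 0 >= d:
                to_short += 1
        prev = d
    return to_long, to_short


deltas = rolling_shadow_delta([0.01] * 5 + [-0.01] * 5)
print(count_flip_signals(deltas))  # (1, 0)
```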

Recent 3-day slice (`2026-05-04` to `2026-05-06`):

- trades: `168`
- short actual WR: `39.88%`
- short actual compounded return: `+10.02%`
- long counterfactual WR: `47.62%`
- long counterfactual compounded return: `-16.92%`
- flip-to-long signals from the `5-trade` rolling delta: `68`
- flip-to-short signals from the `5-trade` rolling delta: `79`

Interpretation:

- the rolling delta does detect alternating regime pockets
- but it does so often enough that a raw flip would be too twitchy
- on the most recent 30 live trades, the regime buckets were:
  - `13` long-favorable
  - `7` short-favorable
  - `10` neutral
- the long-favorable bucket had positive expected PnL, but the short-favorable
  bucket was also positive and slightly stronger

The important point is that the signal is not “switch now on first loss.”
It is:

- keep a smoothed side-dominance score
- require persistence before flipping
- use hysteresis
- abstain when the shadow spread is weak or oscillatory

So the stricter test reinforces the earlier conclusion:

- there is enough structure to justify a regime-favorableness detector
- there is not yet enough stability to justify a raw mechanical flip
- the right next step is live dual-shadow logging on the same bars, then
  threshold and persistence calibration on that shared stream

## Flip-after-loss counterfactual

The actual live short ledger was also replayed under a simple finite-state
side-switch rule:

- start `SHORT`
- if the current side loses `N` trades in a row, flip to the other side
- keep applying the same rule across the whole trade sequence
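
The finite-state rule can be replayed directly over a ledger of short returns. The minimal sketch below rests on several assumptions that this section does not confirm:

- returns are per-trade fractions of equity that compound multiplicatively
- the counterfactual long return is `-r - fee`, with an assumed `4 bps` fee
- a trade counts as a win when its side's return is positive
- the loss counter resets after each flip

```python
from typing import Dict, Sequence


def replay_flip_after_losses(short_returns: Sequence[float], n: int,
                             fee: float = 0.0004) -> Dict[str, float]:
    """Start SHORT; flip sides after n consecutive losses on the current side."""
    side, losses, flips, wins = "SHORT", 0, 0, 0
    equity = peak = 1.0
    max_dd = 0.0
    for r in short_returns:
        ret = r if side == "SHORT" else -r - fee
        equity *= 1.0 + ret
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)
        if ret > 0:
            wins += 1
            losses = 0
        else:
            losses += 1
            if losses >= n:
                side = "LONG" if side == "SHORT" else "SHORT"
                flips += 1
                losses = 0
    return {"win_rate": wins / len(short_returns), "compounded": equity - 1.0,
            "max_dd": max_dd, "flips": flips}


ledger = [-0.01, -0.01, 0.02]
print(replay_flip_after_losses(ledger, n=2)["flips"])  # 1
print(replay_flip_after_losses(ledger, n=3)["flips"])  # 0
```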

This is the cleanest way to test the idea “short losses are the long cue.”

On the current `234`-trade live ledger:

- always short: WR `44.44%`, compounded return `+11.35%`, max DD `5.71%`
- always long: WR `44.87%`, compounded return `-20.13%`, max DD `23.09%`

Threshold sweep:

- `N=1`: WR `40.60%`, compounded return `+5.33%`, max DD `11.11%`, flips `139`
- `N=2`: WR `44.44%`, compounded return `-17.72%`, max DD `17.77%`, flips `43`
- `N=3`: WR `48.29%`, compounded return `+5.48%`, max DD `6.35%`, flips `13`
- `N=4`: WR `47.86%`, compounded return `+6.21%`, max DD `6.55%`, flips `7`
- `N=5`: WR `43.59%`, compounded return `+10.52%`, max DD `5.59%`, flips `5`
- `N=6`: WR `45.73%`, compounded return `+15.17%`, max DD `4.84%`, flips `3`

Interpretation:

- side switching can help
- it helps best when the flip threshold is fairly high
- the best observed threshold in this small grid was `N=6`, and it was the only
  one that beat always-short (`+15.17%` vs `+11.35%` compounded, max DD
  `4.84%` vs `5.71%`)
- low thresholds are too twitchy and can destroy the edge

So the practical conclusion is:

- a raw flip-on-first-loss rule is not justified
- a slower loss-cluster regime switcher is plausible
- the switcher must be hysteretic and persistence-gated

This is consistent with the earlier shadow-score recommendation and explains
why the observed “8 or 9 losses, then a couple wins” pattern can be useful
without being directly automatable at a low threshold.

## Condition-gated flip replay

I then reran the side-switch counterfactual with an additional gate:

- the current side must first hit `N` consecutive losses
- the opposite side must also satisfy its own deterministic long/short entry condition
- the replay uses the same 10-bar tape skeleton and the worst-10-bar asset expression

Two long theories were tested separately:

- **Old mirror-long**: `vel_div < -0.02` and cross-sectional 10-bar momentum `< 0`
- **New stressed-unwind long**: `instability_50 >= 20.5` and `v300 < 0` and `v750 < 0`
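
The extra gate changes only the flip condition: a loss streak is necessary but no longer sufficient. The minimal sketch below shows the two long entry predicates and the gated flip test. The argument names are illustrative. This note does not specify how cross-sectional 10-bar momentum is computed, so the sketch takes it as an input. It also leaves open whether a blocked flip keeps or resets the loss counter.

```python
def old_mirror_long_ok(vel_div: float, xs_momentum_10bar: float) -> bool:
    """Old mirror-long entry: negative vel_div and negative cross-sectional 10-bar momentum."""
    return vel_div < -0.02 and xs_momentum_10bar < 0


def stressed_unwind_long_ok(instability_50: float, v300: float, v750: float) -> bool:
    """New stressed-unwind long entry: the candidate primary gate."""
    return instability_50 >= 20.5 and v300 < 0 and v750 < 0


def gated_flip(loss_streak: int, n: int, opposite_entry_ok: bool) -> bool:
    """Flip only after n consecutive losses AND when the other side's own entry condition holds."""
    return loss_streak >= n and opposite_entry_ok


print(gated_flip(6, 6, stressed_unwind_long_ok(21.0, -0.01, -0.02)))  # True
print(gated_flip(6, 6, old_mirror_long_ok(-0.01, -0.5)))              # False
```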

Results on the long research windows:

- old mirror-long becomes marginally usable only at high thresholds:
  - `N=5`: WR `47.00%`, compounded return `+6.34%`, DD `46.23%`, flips `11`
  - `N=6`: WR `46.52%`, compounded return `+28.34%`, DD `43.78%`, flips `5`
- the new stressed-unwind long does **not** survive this gate cleanly:
  - `N=1..6`: compounded return stays negative, with severe drawdown

Interpretation:

- the condition gate does not rescue the new long theory
- it does preserve the old mirror-long as a late, low-frequency fallback
- the market still looks too unstable for a low-threshold flip rule
- if we keep this path, it should be a smoothed regime sampler, not an immediate switcher

Report:

- [`flip_on_loss_condition_gate_report.md`](</mnt/dolphinng5_predict/run_logs/flip_on_loss_condition_gate_report.md>)

## Full-history condition-gated replay

I then ran the same condition-gated flip simulator across the entire
available price tape:

- root: `/mnt/dolphin_training/share_offload/vbt_cache_klines`
- rows: `2,553,401`
- span: `2021-06-15 00:01:00+00:00 -> 2026-03-18 18:16:40.041456896+00:00`

This is the hardest and most useful stress test because it removes the
recent-slice bias entirely.

Results:

- **old mirror-long**
  - `N=1..6` win rate range: `44.95% -> 46.60%`
  - best mean PnL at `N=6`: `-0.000163` per trade
  - best threshold still compounds to `-100%` over the full archive
- **new stressed-unwind long**
  - `N=1..6` win rate range: `44.16% -> 46.86%`
  - best mean PnL at `N=6`: `-0.000218` per trade
  - best threshold also compounds to `-100%`

Interpretation:

- the condition gate does not rescue either long theory at full-archive scale
- the old mirror-long is still the stronger of the two, but only marginally
- the long-side edge, if it exists, is too weak or too regime-dependent to
  survive this archive-wide flip rule without additional filtering
- the full-tape result is a warning against over-trusting the favorable
  recent-month slices

Report:

- [`flip_on_loss_condition_gate_stream_full_report.md`](</mnt/dolphinng5_predict/run_logs/flip_on_loss_condition_gate_stream_full_report.md>)

## Post-outlier-short-win long-flip probe

Motivation: the May 8 live footer showed a familiar-looking pattern:

- large 9x short win, e.g. `ALGOUSDT` `+$466` or `VETUSDT` `+$574`
- immediately followed by a somewhat larger-than-normal short loss, e.g.
  `DASHUSDT -$191` or `STXUSDT -$54`

The question was whether this is a real post-outlier rebound signature:

```text
after a very large short win,
should the next trade, or next few trades, be treated as LONG candidates?
```

Dataset and hygiene:

- source: BLUE only
- ClickHouse `dolphin.trade_events`: `1305` rows, `1296` unique trade IDs
- trader logs: `1712` exit rows, `1092` unique trade IDs
- merged near-duplicate-cleaned sequence: `1609` unique trade IDs
- analysis subset after excluding hibernate / subday ACB exits: `1321` trades
- span: `2026-03-31 01:10:34 UTC` to `2026-05-08 13:26:06 UTC`

The log and warehouse streams overlap but do not have perfectly identical
timestamps, so the analysis de-duplicates by trade id where possible and by
near-time / asset / reason / realized PnL where the same exit was written by
both paths. This matters because a naive merge double-counts many recent exits.
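
The minimal sketch below implements that two-pass de-duplication, assuming each exit carries an optional trade id, a timestamp, the asset, the exit reason and realized PnL. The `5`-second and `$0.01` tolerances are hypothetical, because this note does not state the actual matching windows. The trade ids and exit reasons in the usage example are illustrative. The near-duplicate scan is quadratic, which is acceptable for a few thousand exits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ExitRow:
    trade_id: Optional[str]
    ts: datetime
    asset: str
    reason: str
    pnl: float


def dedupe_exits(rows: Iterable[ExitRow],
                 time_tol: timedelta = timedelta(seconds=5),
                 pnl_tol: float = 0.01) -> List[ExitRow]:
    kept: List[ExitRow] = []
    seen_ids = set()
    for row in sorted(rows, key=lambda r: r.ts):
        # Pass 1: exact trade-id match.
        if row.trade_id is not None and row.trade_id in seen_ids:
            continue
        # Pass 2: same exit written by both paths with slightly different timestamps.
        if any(k.asset == row.asset and k.reason == row.reason
               and abs(k.pnl - row.pnl) <= pnl_tol
               and abs(k.ts - row.ts) <= time_tol for k in kept):
            continue
        kept.append(row)
        if row.trade_id is not None:
            seen_ids.add(row.trade_id)
    return kept


t0 = datetime(2026, 5, 8, 9, 55, 0)
rows = [
    ExitRow("a1", t0, "ALGOUSDT", "TP", 466.34),
    ExitRow("a1", t0 + timedelta(seconds=1), "ALGOUSDT", "TP", 466.34),  # same id
    ExitRow(None, t0 + timedelta(seconds=2), "ALGOUSDT", "TP", 466.34),  # log copy, no id
    ExitRow("d7", t0 + timedelta(minutes=40), "DASHUSDT", "MAX_HOLD", -191.19),
]
print(len(dedupe_exits(rows)))  # 2
```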

Counterfactual method:

- keep the same entry/exit skeleton
- actual side is the live BLUE short
- counterfactual long return is approximated as `-short_return - 4 bps`
- this is not a separately selected long engine; it only tests whether the
  immediate post-win tape direction would have favored the other side

Baseline over the cleaned sequence:

- always short: `1321` trades, WR `55.79%`, mean return/trade `+0.0781%`,
  compounded return `+166.36%`, max DD `15.70%`
- always long on the same skeleton: WR `38.46%`, mean return/trade `-0.1181%`,
  compounded return `-80.08%`, max DD `80.48%`

So the full ledger does **not** support a broad long flip. The question only
survives as a narrow post-outlier condition.

Primary post-outlier trigger:

```text
trigger if prior trade:
    pnl_abs >= $400
    leverage >= 8.5x
    pnl_pct >= +0.50%
```
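
The minimal sketch below shows the trigger and the "flip only the affected next trades" policy. It works in return space with the `-short_return - 4 bps` counterfactual and assumes `pnl_pct` is stored as a fraction. Triggers are evaluated on the original short ledger, and overlapping follow-on windows are merged. That merging is consistent with `91` rather than `2 x 47 = 94` affected trades at the two-trade horizon. This note does not specify the dollar conversion behind the estimated PnL figures, so the sketch omits it. `Trade` and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence

FEE = 0.0004  # 4 bps haircut on the counterfactual long


@dataclass(frozen=True)
class Trade:
    pnl_abs: float   # realized USD PnL of the live short
    leverage: float
    pnl_pct: float   # realized short return as a fraction (0.005 == +0.50%)


def is_outlier_short_win(t: Trade, min_abs: float = 400.0,
                         min_lev: float = 8.5, min_pct: float = 0.005) -> bool:
    return t.pnl_abs >= min_abs and t.leverage >= min_lev and t.pnl_pct >= min_pct


def affected_indices(trades: Sequence[Trade], horizon: int = 1) -> List[int]:
    """Indices of the next `horizon` trades after each trigger; overlapping windows merged."""
    hit = set()
    for i, t in enumerate(trades):
        if is_outlier_short_win(t):
            hit.update(range(i + 1, min(i + 1 + horizon, len(trades))))
    return sorted(hit)


def flip_policy_returns(trades: Sequence[Trade], affected: Sequence[int],
                        fee: float = FEE) -> List[float]:
    """Whole-sequence returns with only the affected trades flipped LONG."""
    flip = set(affected)
    return [-t.pnl_pct - fee if i in flip else t.pnl_pct for i, t in enumerate(trades)]


ledger = [Trade(500.0, 9.0, 0.010), Trade(-120.0, 9.0, -0.002), Trade(15.0, 9.0, 0.001)]
print(affected_indices(ledger, horizon=1))  # [1]
print(affected_indices(ledger, horizon=2))  # [1, 2]
```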

Immediate next-trade result:

- triggers: `47`
- next trades affected: `47`
- actual next short subset: WR `53.19%`, mean return `-0.0821%`,
  compounded return `-4.05%`, realized PnL `-$1,725.40`
- flipped-to-long subset: WR `40.43%`, mean return `+0.0421%`,
  compounded return `+1.72%`, estimated PnL `+$409.47`
- estimated dollar delta: `+$2,134.88`
- whole-sequence policy if only those next trades are flipped:
  compounded return improves from `+166.36%` to `+182.38%`
  and max DD improves from `15.70%` to `13.33%`

The stricter trigger `pnl_abs >= $400`, `leverage >= 8.5x`,
`pnl_pct >= +0.95%` is similar:

- triggers: `46`
- actual next short subset: `-$1,534.21`
- flipped-to-long estimate: `+$276.64`
- estimated dollar delta: `+$1,810.85`
- whole-sequence compounded return: `+180.91%`

The effect is strongest on the immediately following trade. It decays quickly:

- next `2` trades after the primary trigger: affected `91`, actual `-$2,689.16`,
  flipped estimate `+$555.98`, dollar delta `+$3,245.15`
- next `3` trades: affected `134`, actual `-$2,357.77`, flipped estimate
  `-$588.02`; the dollar delta is still positive (about `+$1,769.75`) because
  the flip loses less than the short
- next `5` trades: benefit becomes materially less clean

Examples from the live tail:

- `ALGOUSDT` `2026-05-08 09:55 UTC`, `+466.34`, `9x`, `+0.929%`
  - next trade `DASHUSDT`: actual short `-191.19`; same-skeleton long would
    have been directionally positive after fee
- `VETUSDT` `2026-05-08 12:37 UTC`, `+573.64`, `9x`, `+1.546%`
  - next trade `STXUSDT`: actual short `-53.52`; same-skeleton long would
    have been directionally positive after fee
- larger historic outlier `STXUSDT` `2026-05-05 20:29 UTC`, `+6796.86`,
  `9x`, `+13.845%`
  - the following trade was a small short loss, and the next several trades
    were mixed rather than uniformly long-favorable

Interpretation:

- there is a real event-conditioned post-outlier rebound / exhaustion signal
- it is not a win-rate improvement; it is a dollar / drawdown improvement
- it should not be promoted as a general long engine
- it is best framed as a one-trade post-outlier **long probe** or short
  cooldown candidate, not as a multi-trade regime flip

Relationship to the long-system research:

- this is different from both deterministic long theories already studied:
  - old mirror-long: negative `vel_div` mean-reversion long
  - new stressed-unwind long: high instability plus negative slow velocities
- the post-outlier signal is more local and path-conditioned:
  - a violent short win likely means the chosen asset or local basket has
    just completed an exhaustion leg
  - the next trade may be more exposed to rebound / adverse short continuation
    than to fresh downside continuation
- this should become a feature inside the dual-shadow side-selection sampler:
  - `last_trade_was_outlier_short_win`
  - `last_trade_leverage`
  - `last_trade_realized_pnl_abs`
  - `last_trade_return_pct`
  - `bars_since_outlier_win`
  - `same_asset_or_correlated_asset_followup`

Research conclusion:

- broad `SHORT -> LONG` inversion remains false on the full sequence
- immediate one-trade long probing after a large 9x short win is empirically
  plausible and improved historical BLUE dollars in this cleaned replay
- the next test should condition this event trigger on the existing long gates
  and market fingerprint state, rather than using it as a naked side switch

## Leverage-as-conviction win-probe sweep

Follow-up thesis:

```text
leverage is a conviction expression

if a high-conviction short probe wins:
    make subsequent / next trades LONG

if leverage is below roughly 0.69:
    possibly do not trade
```

The initial test used:

```text
trigger_lev = 0.70
trade_min_lev = 0.69
win = net PnL > 0
```

Two side-selection forms were tested:

- **persistent shadow probe**: the short engine continues to run as a shadow.
  A high-lev short-shadow win turns the traded side LONG. A high-lev
  short-shadow loss resets the traded side SHORT.
- **one-shot after win**: a high-lev short-shadow win arms only the next
  eligible trade as LONG, then resets.
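
The two forms differ only in how the LONG arm is cleared. The minimal sketch below runs over the short shadow's `(leverage, net_pnl)` sequence, using the `0.70 / 0.69` parameters above. Trades below `trade_min_lev` are skipped and do not consume a one-shot arm. The function name and the `None` marker for skipped trades are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def side_sequence(shadow: Sequence[Tuple[float, float]], mode: str,
                  trigger_lev: float = 0.70, trade_min_lev: float = 0.69) -> List[Optional[str]]:
    """Traded side per shadow trade: 'SHORT', 'LONG', or None when skipped for low leverage."""
    if mode not in ("persistent", "one_shot"):
        raise ValueError(mode)
    armed_long = False
    sides: List[Optional[str]] = []
    for lev, short_net_pnl in shadow:
        if lev < trade_min_lev:
            sides.append(None)  # below minimum conviction: do not trade
        else:
            sides.append("LONG" if armed_long else "SHORT")
            if mode == "one_shot":
                armed_long = False  # the arm is consumed by this eligible trade
        # The short engine keeps running as a shadow; high-lev outcomes update the arm.
        if lev >= trigger_lev:
            if short_net_pnl > 0:
                armed_long = True
            elif mode == "persistent":
                armed_long = False  # high-lev shadow loss resets to SHORT
    return sides


shadow = [(1.0, 5.0), (0.695, -1.0), (0.695, -1.0)]
print(side_sequence(shadow, "persistent"))  # ['SHORT', 'LONG', 'LONG']
print(side_sequence(shadow, "one_shot"))    # ['SHORT', 'LONG', 'SHORT']
```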

The test used the same cleaned BLUE sequence as the post-outlier study, updated
through `2026-05-08 13:40:04 UTC`:

- ClickHouse rows: `1307`
- ClickHouse unique trade IDs: `1298`
- trader-log exit rows: `1716`
- merged near-duplicate-cleaned trade IDs: `1612`
- analysis subset after excluding hibernate / subday ACB exits: `1324`

Baselines:

- always short: `1324` trades, WR `55.82%`, mean return/trade `+0.0784%`,
  compounded return `+168.02%`, max DD `15.70%`, PnL `+$11,135.86`
- always long on the same skeleton: WR `38.44%`, compounded return `-80.23%`,
  max DD `80.62%`, PnL `-$36,875.48`
- short-only with `trade_min_lev >= 0.69`: `1050` trades, compounded return
  `+81.86%`, max DD `20.80%`, PnL `+$11,063.86`
- short-only with `trade_min_lev >= 5.0`: `565` trades, compounded return
  `+88.08%`, max DD `8.94%`, PnL `+$11,980.01`
- short-only with `trade_min_lev >= 8.5`: `501` trades, compounded return
  `+82.57%`, max DD `7.58%`, PnL `+$12,193.65`

Initial `0.70 / 0.69` thesis result:

- persistent shadow-probe switch:
  - traded: `1050`
  - LONG trades: `457`
  - flips to LONG: `249`
  - WR `37.08%`
  - compounded return `-5.61%`
  - max DD `26.60%`
  - PnL `-$2,527.65`
- one-shot after high-lev win:
  - traded: `1050`
  - LONG trades: `455`
  - flips to LONG: `456`
  - WR `37.24%`
  - compounded return `-3.56%`
  - max DD `26.19%`
  - PnL `-$2,113.83`

So the literal initial thesis fails. `0.70` is too low as a
side-switch trigger. It arms hundreds of LONG trades and turns a strong
short-led ledger into a slightly losing one.

Important evaluation frame:

The goal is **not** to find a LONG overlay that beats the whole short-only
engine by itself. The goal is to find a side-selection overlay that adds
marginal value only on the subset where it intervenes. The correct comparison
is therefore:

```text
overlay_delta =
    pnl_if_intervened_long_on_triggered_trades
  - pnl_if_original_short_was_left_unchanged_on_same_triggered_trades
```

The overlay is useful only if it satisfies all of the following:

- it has positive `overlay_delta` after fees and conservative slippage
- it reduces realized drawdown or loss clustering on the intervention subset
- it does not cut too many profitable short trades
- it remains positive across time splits, assets, and neighboring thresholds
- it has enough triggers to be statistically more than a single accident
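
Three of those five criteria are mechanical enough to sketch: positive delta after costs, enough triggers, and stability across time splits. The sketch works in return space only and assumes the `-r - fee` counterfactual, so each flipped trade changes the return by `-2r - fee - slippage`. The slippage value and `min_triggers` are hypothetical placeholders. The drawdown criterion and the profitable-shorts-cut criterion are left out.

```python
from typing import Sequence


def overlay_delta(short_returns: Sequence[float], fee: float = 0.0004,
                  slippage: float = 0.0) -> float:
    """Sum of (flipped LONG - original SHORT) returns over the triggered trades."""
    return sum((-r - fee - slippage) - r for r in short_returns)


def overlay_passes(splits: Sequence[Sequence[float]], min_triggers: int = 30,
                   fee: float = 0.0004, slippage: float = 0.0002) -> bool:
    """Positive after costs overall and in every non-empty time split, with enough triggers."""
    pooled = [r for split in splits for r in split]
    if len(pooled) < min_triggers:
        return False
    if overlay_delta(pooled, fee, slippage) <= 0:
        return False
    return all(overlay_delta(s, fee, slippage) > 0 for s in splits if s)


print(round(overlay_delta([-0.01, -0.01]), 6))       # 0.0392
print(overlay_passes([[-0.01] * 20, [-0.01] * 20]))  # True
print(overlay_passes([[-0.01] * 20, [0.01] * 20]))   # False
```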

Under that marginal-overlay framing, the broad leverage-win thesis still fails:

- persistent `0.70 / 0.69` switch delta vs same `lev >= 0.69` short-only
  baseline: about `-$13,591.51`
- one-shot `0.70 / 0.69` switch delta vs same `lev >= 0.69` short-only
  baseline: about `-$13,177.69`
- best swept dollar switch delta vs same `lev >= 0.69` short-only baseline:
  about `-$5,949.36`

By contrast, the narrower post-outlier rule did show positive marginal overlay
value on its triggered subset:

- triggered next-trade cases: `47`
- leaving the next trade SHORT: PnL `-$1,725.40`
- flipping only that next trade LONG: PnL `+$409.47`
- marginal overlay delta: `+$2,134.87`
- whole-sequence drawdown improved from about `15.70%` to `13.33%`

That is the key distinction. The broad high-leverage-win rule is not reliable
enough. The narrow post-outlier rule is a legitimate candidate for guarded
shadow/live-probe research because it adds value exactly where it intervenes,
but the sample is still too small for unconditional deployment.

### Lowered big-win threshold grid

The phrase "sample too small" applies only to the original high-tail trigger
(`pnl_abs >= $400`, `lev >= 8.5`, immediate next trade). It does **not** mean
the BLUE ledger is small. The cleaned replay now spans:

- `1328` non-hibernate / non-subday-ACB BLUE trades
- `1616` merged near-duplicate-cleaned trade IDs
- `2026-03-31 01:10:34 UTC` through `2026-05-08 14:21:31 UTC`

To test whether the effect survives with more triggers, the post-win sweep was
expanded to:

- dollar win thresholds: `$10`, `$25`, `$50`, `$75`, `$100`, `$150`, `$200`,
  `$300`, `$400`, `$500`, `$750`, `$1000`
- leverage thresholds: `0`, `0.69`, `0.70`, `1`, `2`, `3`, `5`, `8.5`, `9`
- return thresholds: `0`, `0.10%`, `0.25%`, `0.50%`, `0.75%`, `0.95%`,
  `1.25%`
- follow-on horizons: next `1`, `2`, `3`, and `5` trades
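
The minimal sketch below runs the sweep, reusing the `-r - fee` counterfactual in return space. The grid is the one listed above, which gives `12 x 9 x 7 = 756` threshold combinations per horizon. The note reports `630` or `693` eligible combinations per horizon but does not state the eligibility rule. The `min_affected` cut used here is therefore a hypothetical stand-in.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

DOLLARS = [10, 25, 50, 75, 100, 150, 200, 300, 400, 500, 750, 1000]
LEVERAGES = [0, 0.69, 0.70, 1, 2, 3, 5, 8.5, 9]
RETURNS = [0.0, 0.0010, 0.0025, 0.0050, 0.0075, 0.0095, 0.0125]
HORIZONS = [1, 2, 3, 5]

TradeRow = Tuple[float, float, float]  # (pnl_abs, leverage, pnl_pct) of the live short


def sweep(trades: Sequence[TradeRow], min_affected: int = 10, fee: float = 0.0004) -> List[Dict]:
    results = []
    for h, d, lev, ret in product(HORIZONS, DOLLARS, LEVERAGES, RETURNS):
        affected = set()
        for i, (pnl_abs, trade_lev, pct) in enumerate(trades):
            if pnl_abs >= d and trade_lev >= lev and pct >= ret:
                affected.update(range(i + 1, min(i + 1 + h, len(trades))))
        if len(affected) < min_affected:
            continue  # hypothetical eligibility rule
        short_sum = sum(trades[j][2] for j in affected)
        long_sum = sum(-trades[j][2] - fee for j in affected)
        results.append({"horizon": h, "dollars": d, "lev": lev, "ret": ret,
                        "n": len(affected), "short": short_sum, "long": long_sum,
                        "delta": long_sum - short_sum})
    return results


def summarize(results: Sequence[Dict]) -> Dict[int, Dict[str, float]]:
    by_h: Dict[int, List[Dict]] = {}
    for r in results:
        by_h.setdefault(r["horizon"], []).append(r)
    return {h: {"eligible": len(rs),
                "pos_delta": sum(r["delta"] > 0 for r in rs) / len(rs),
                "pos_long": sum(r["long"] > 0 for r in rs) / len(rs)}
            for h, rs in sorted(by_h.items())}


toy = [(2000.0, 9.0, 0.02), (-50.0, 9.0, -0.004)] * 5
summary = summarize(sweep(toy, min_affected=1))
print(summary[1]["pos_delta"], summary[2]["pos_delta"])  # 1.0 0.0
```

On this toy ledger every big win is followed by a small short loss, so a one-trade flip always helps. At the two-trade horizon, however, the flip also hits the next big short win, mirroring the decay pattern reported below.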
|
|
|
|
Important result:
|
|
|
|
- lowering **dollar threshold alone** does not work
|
|
- lowering dollar threshold **with a realized-return threshold** does work
|
|
- the effect is mostly next `1` to `2` trades
|
|
- by next `5` trades, flipping LONG is not positive; cooldown / abstain is
|
|
better than LONG if the horizon is that wide
|
|
|
|
Grid-wide stability:
|
|
|
|
- horizon `1`: `630` eligible threshold combinations, `60.0%` positive
|
|
marginal delta, `45.87%` positive LONG PnL
|
|
- horizon `2`: `630` eligible threshold combinations, `57.30%` positive
|
|
marginal delta, `39.52%` positive LONG PnL
|
|
- horizon `3`: `693` eligible threshold combinations, `59.60%` positive
|
|
marginal delta, `12.99%` positive LONG PnL
|
|
- horizon `5`: `693` eligible threshold combinations, `51.08%` positive
|
|
marginal delta, `0.0%` positive LONG PnL
|
|
|
|
This says the post-win effect is a short-lived exhaustion / rebound artifact,
|
|
not a durable multi-trade LONG regime.
|
|
|
|
Fixed dollar-only immediate-next-trade rows:
|
|
|
|
| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compound | DD |
|
|
|---|---:|---:|---:|---:|---:|---:|
|
|
| `$10+`, no lev gate | 277 | `+$3,044` | `-$9,146` | `-$12,190` | `+24.74%` | `24.55%` |
|
|
| `$50+`, no lev gate | 181 | `+$4,495` | `-$8,870` | `-$13,365` | `+42.58%` | `22.18%` |
|
|
| `$100+`, no lev gate | 135 | `+$908` | `-$4,252` | `-$5,160` | `+97.78%` | `18.09%` |
|
|
| `$200+`, no lev gate | 89 | `-$947` | `-$1,496` | `-$549` | `+140.76%` | `14.96%` |
|
|
| `$300+`, no lev gate | 62 | `-$1,695` | `-$45` | `+$1,651` | `+174.25%` | `13.70%` |
|
|
| `$400+`, no lev gate | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |
|
|
| `$500+`, no lev gate | 40 | `-$1,153` | `+$90` | `+$1,242` | `+173.51%` | `13.33%` |
|
|
|
|
Dollar-only conclusion:

- below about `$300`, the next short trade is still net-profitable or less bad
  than the LONG flip
- around `$300`, the next short trade turns bad, but LONG is only near-flat
- around `$400` to `$500`, the next-trade LONG flip becomes positive

Fixed immediate-next-trade rows with a `+0.75%` realized-return trigger:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta (LONG - SHORT) | Whole-policy compound | Max DD |
|---|---:|---:|---:|---:|---:|---:|
| `$10+` and `+0.75%` | 99 | `-$1,735` | `-$409` | `+$1,326` | `+104.45%` | `14.03%` |
| `$50+` and `+0.75%` | 74 | `-$1,950` | `+$105` | `+$2,055` | `+155.62%` | `14.03%` |
| `$75+` and `+0.75%` | 70 | `-$2,028` | `+$194` | `+$2,223` | `+166.91%` | `13.95%` |
| `$100+` and `+0.75%` | 67 | `-$2,083` | `+$336` | `+$2,419` | `+168.60%` | `13.69%` |
| `$150+` and `+0.75%` | 63 | `-$2,082` | `+$344` | `+$2,426` | `+175.37%` | `13.69%` |
| `$300+` and `+0.75%` | 58 | `-$1,738` | `+$58` | `+$1,796` | `+173.61%` | `13.70%` |
| `$400+` and `+0.75%` | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |

Return-conditioned conclusion:

- the effect becomes visible with more triggers when the dollar threshold is
  lowered to `$50-$150` **and** the prior win is also at least `+0.75%`
- the best immediate-next-trade delta in this grid was around `$150+` and
  `+0.75%`: `63` next trades, SHORT `-$2,081.81`, LONG `+$343.94`, delta
  `+$2,425.75`
- the original `$400+`, high-leverage trigger remains good but is not the only
  viable threshold; it is the cleaner high-tail version

Two-trade horizon:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta (LONG - SHORT) | Whole-policy compound | Max DD |
|---|---:|---:|---:|---:|---:|---:|
| `$300+`, `lev >= 8.5` | 115 | `-$3,201` | `+$511` | `+$3,712` | `+168.52%` | `14.27%` |
| `$400+`, `lev >= 8.5` | 91 | `-$2,689` | `+$556` | `+$3,245` | `+175.26%` | `13.71%` |
| `$500+`, `lev >= 8.5` | 75 | `-$2,237` | `+$509` | `+$2,747` | `+167.53%` | `14.71%` |

Two-trade conclusion:

- the high-leverage `$300-$500` zone supports a two-trade exhaustion rebound
  more strongly than the original one-trade-only statement
- the best two-trade variant in this fixed grid was `$300+`, `lev >= 8.5`,
  next two trades: delta `+$3,712`, estimated LONG PnL `+$511`
- the five-trade horizon should not be traded LONG; it is only a damage-control
  / cooldown signal

Reliability statement:

The post-win overlay is more solid than initially stated. The robust form is
not "after any win"; that is false. The robust form is:

```text
after a sufficiently large realized short win,
especially a high-return or high-leverage win,
the next 1-2 short-engine opportunities are often contaminated by rebound risk
and can be improved by LONG flip or, at minimum, cooldown/abstain.
```

The strongest candidates for shadow/live-probe research are:

- immediate next trade after a `$100-$200` win **and** prior return `>= +0.75%`
- immediate next trade after a `$400+` win, especially `lev >= 8.5`
- next two trades after a `$300-$500` win with `lev >= 8.5`

Guardrail:

The overlay should not be optimized for WR. LONG WR remains lower than SHORT WR
on many triggered subsets. The edge is payoff asymmetry / loss-tail avoidance:
short wins become smaller or disappear after the exhaustion event, while short
losses on the next trade(s) become expensive.

### Candidate codified overlay rule and EFSM

Terminology:

- **EFSM** means **Execution FSM**
- refer to this component as the post-win **EFSM**, not merely a generic
  "state machine"

Candidate rule proposed after the lowered-threshold sweep:

```text
after a completed BLUE SHORT trade:

    if pnl_abs > $397 and leverage > 8.6:
        tag next 2 trades as FLIP_LONG
    elif pnl_abs > $397:
        tag next 1 trade as FLIP_LONG
    elif 0 < pnl_abs < $250 and pnl_pct >= +0.75%:
        tag next 1 trade as FLIP_LONG

    after the armed slots are consumed:
        reset to SHORT
```

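Below is a direct Python transcription of the arming rule, for unit-testing the thresholds in isolation. It makes three assumptions:

- The constant names and `slots_to_arm` are hypothetical.
- `pnl_pct` is a fraction (`0.0075` for `+0.75%`).
- The trigger labels mirror the names in the replay counts. They are not a confirmed return value of `PostWinExecutionFSM`.

```python
from typing import Optional, Tuple

# Thresholds copied from the candidate rule; constant names are hypothetical.
BIG_WIN_USD = 397.0
HIGH_LEV = 8.6
SMALL_WIN_MAX_USD = 250.0
SMALL_WIN_MIN_PCT = 0.0075  # +0.75% expressed as a fraction (assumed units)


def slots_to_arm(pnl_abs: float, pnl_pct: float, leverage: float) -> Tuple[int, Optional[str]]:
    """Return (future FLIP_LONG slots, trigger label) for one closed BLUE SHORT trade."""
    # High-leverage big win is checked first so it takes precedence over plain big win.
    if pnl_abs > BIG_WIN_USD and leverage > HIGH_LEV:
        return 2, "big_win_high_lev"
    if pnl_abs > BIG_WIN_USD:
        return 1, "big_win"
    if 0 < pnl_abs < SMALL_WIN_MAX_USD and pnl_pct >= SMALL_WIN_MIN_PCT:
        return 1, "small_dollar_high_return"
    return 0, None  # losses, low-return small wins, and the $250-$397 gap
```

Under this candidate, wins between `$250` and `$397` arm nothing and stay SHORT. So do small wins below `+0.75%`.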
EFSM semantics:

- this is a **slot-based Execution FSM**, not a persistent regime switch
- each trigger arms an explicit number of future slots
- each future entry consumes exactly one slot
- when `slots_remaining == 0`, the state resets to SHORT
- while slots are active, new triggers are ignored by default
- a flipped LONG trade outcome is not allowed to re-arm the overlay
- this prevents the reset bug where one flipped trade recursively arms the next
  and converts a bounded rebound probe into an unbounded side switch
- the implementation supports arbitrary future slot counts, not only `1` and
  `2`

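The slot semantics can be sketched as a small Python class. This is not `PostWinExecutionFSM`. The class name, the method names and measuring the TTL from arming time are all assumptions made for illustration.

The sketch encodes three invariants from the list:

- one slot per entry
- triggers ignored while slots are active
- no re-arm from a flipped LONG outcome

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotEFSMSketch:
    """Hypothetical slot-based post-win EFSM; not the production class."""
    ttl_seconds: Optional[float] = None  # optional expiry of armed slots
    slots_remaining: int = 0
    armed_at: Optional[float] = None
    ignored_rearms: int = 0

    def _expire(self, now: float) -> None:
        # TTL elapsed since arming: drop remaining slots, i.e. reset to SHORT.
        if (self.slots_remaining > 0 and self.ttl_seconds is not None
                and self.armed_at is not None
                and now - self.armed_at > self.ttl_seconds):
            self.slots_remaining = 0

    def side_for_next_entry(self, now: float) -> str:
        """Called at each new entry; consumes exactly one slot if any are armed."""
        self._expire(now)
        if self.slots_remaining > 0:
            self.slots_remaining -= 1
            return "LONG"
        return "SHORT"

    def on_trade_closed(self, slots: int, was_flipped: bool, now: float) -> None:
        """Offer the arming request produced by the trigger rule for a closed trade."""
        self._expire(now)
        if slots <= 0:
            return
        if self.slots_remaining > 0:
            self.ignored_rearms += 1  # active slots: new triggers are ignored
            return
        if was_flipped:
            return  # a flipped LONG outcome never re-arms (no recursive switch)
        self.slots_remaining = slots
        self.armed_at = now
```

The active-slot check runs before the flipped-flag check. As a result, an arming attempt that arrives while slots remain is counted as an ignored re-arm, whichever side produced it. The flipped-flag check closes the recursive path once the last slot has been consumed.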
Implementation location:

- EFSM: `adaptive_exit/post_win_long_overlay.py`
- canonical class names: `PostWinExecutionFSM`,
  `PostWinExecutionFSMConfig`
- compatibility aliases: `PostWinLongOverlay`, `PostWinLongOverlayConfig`
- tests: `prod/tests/test_post_win_long_overlay.py`

Focused test coverage:

- `$397+` non-high-leverage win arms one slot
- `$397+` and `lev > 8.6` arms two slots
- `< $250` and `pnl_pct >= +0.75%` arms one slot
- active arms consume deterministically and reset to SHORT
- re-arm attempts while active are ignored
- flipped LONG outcomes cannot re-arm
- optional TTL expiry works
- future `3+` slot rules work

Focused verification:

```text
python -m pytest -o cache_dir=/tmp/pytest-cache-post-win-overlay \
  prod/tests/test_post_win_long_overlay.py -q

7 passed
```

Exact candidate replay, no re-arm during active flip slots:

- input: `1333` cleaned BLUE trades through `2026-05-08 14:34:57 UTC`
- baseline short-only estimated PnL: `+$10,953.50`
- candidate policy estimated PnL: `+$12,464.30`
- marginal dollar delta: `+$1,510.80`
- baseline max DD: `15.70%`
- candidate max DD: `14.78%`
- long-flipped trades: `160`
- affected subset left SHORT: `-$2,415.46`
- affected subset flipped LONG: `-$904.67`
- affected subset marginal delta: `+$1,510.80`
- triggers armed:
  - `small_dollar_high_return`: `77`
  - `big_win_high_lev`: `41`
  - `big_win`: `1`
- slots consumed:
  - `small_dollar_high_return`: `77`
  - `big_win_high_lev`: `82`
  - `big_win`: `1`
- consumed arms: `119`
- dangling slots at end: `0`
- ignored re-arm attempts while active: `20`

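The replay counts reconcile arithmetically:

- each `big_win_high_lev` arm consumes two slots and every other trigger consumes one
- the candidate PnL is the baseline plus the affected-subset delta

A quick Python check (variable names are illustrative):

```python
triggers_armed = {"small_dollar_high_return": 77, "big_win_high_lev": 41, "big_win": 1}
slots_per_trigger = {"small_dollar_high_return": 1, "big_win_high_lev": 2, "big_win": 1}

consumed_arms = sum(triggers_armed.values())
slots_consumed = {k: n * slots_per_trigger[k] for k, n in triggers_armed.items()}
long_flipped = sum(slots_consumed.values())  # every armed slot became one LONG trade

baseline_pnl = 10_953.50
left_short = -2_415.46   # affected subset if left SHORT
flipped_long = -904.67   # affected subset as flipped LONG
subset_delta = round(flipped_long - left_short, 2)
candidate_pnl = round(baseline_pnl + subset_delta, 2)

print(consumed_arms, slots_consumed, long_flipped, subset_delta, candidate_pnl)
```

The subset delta recomputes to `+$1,510.79`, one cent off the reported `+$1,510.80`. That gap is consistent with rounding of the underlying per-trade values. With zero dangling slots, the `160` flipped trades equal the slots consumed.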
Reset sensitivity:

Allowing active flipped trades / active arms to re-arm is harmful:

- unsafe recursive re-arm variant long flips: `183`
- unsafe marginal delta: `-$5,425.32`
- safe no-rearm marginal delta: `+$1,510.80`

Therefore the no-recursive-rearm reset invariant is not optional. It is part of
the edge definition.

Compound-return caveat:

- baseline short-only compound: `+164.89%`
- candidate compound: `+107.26%`

This is why the overlay must be treated as a dollar-tail / drawdown-control
overlay first, not as a compounding optimizer. The current counterfactual uses
the same entry/exit skeleton and an estimated flipped LONG PnL, so the next
validation step must include actual LONG execution assumptions, long-side V7
behavior, and time-to-next-entry gating.

Time dependency:

The replay showed material timing dependence:

| Delay from trigger to flipped entry | n | SHORT PnL | LONG PnL | Delta (LONG - SHORT) |
|---|---:|---:|---:|---:|
| `<=15m` | 19 | `+$2,765.51` | `-$3,062.37` | `-$5,827.88` |
| `15-30m` | 67 | `-$3,588.76` | `+$2,381.96` | `+$5,970.72` |
| `30-60m` | 40 | `-$882.57` | `-$104.33` | `+$778.24` |
| `>60m` | 34 | `-$709.64` | `-$119.93` | `+$589.72` |

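The delay buckets partition the `160` flipped trades. Their SHORT and LONG columns should therefore sum to the replay's affected-subset totals (`-$2,415.46` and `-$904.67`). A short Python check, with values copied from the table:

```python
# (n, SHORT PnL, LONG PnL, table Delta) per delay bucket, copied from the table above.
buckets = {
    "<=15m": (19, 2_765.51, -3_062.37, -5_827.88),
    "15-30m": (67, -3_588.76, 2_381.96, 5_970.72),
    "30-60m": (40, -882.57, -104.33, 778.24),
    ">60m": (34, -709.64, -119.93, 589.72),
}

total_n = sum(n for n, _, _, _ in buckets.values())
total_short = round(sum(sp for _, sp, _, _ in buckets.values()), 2)
total_long = round(sum(lp for _, _, lp, _ in buckets.values()), 2)
# Recompute each bucket delta and compare with the displayed value.
recomputed = {k: round(lp - sp, 2) for k, (_, sp, lp, _) in buckets.items()}
max_gap = max(abs(recomputed[k] - d) for k, (_, _, _, d) in buckets.items())
net_delta = round(sum(d for _, _, _, d in buckets.values()), 2)

print(total_n, total_short, total_long, net_delta, max_gap)
```

The bucket deltas sum to the overall `+$1,510.80`. The `<=15m` bucket alone gives back `-$5,827.88` of the gross `+$7,338.68` earned by the later buckets.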
This means the overlay may need a lower-bound delay (the `<=15m` bucket is the
only one where flipping lost money), an upper-bound TTL, or market-state
confirmation. The current EFSM already supports TTL; the exact timing gate
remains research, not deployed doctrine.

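A lower-bound delay plus an optional TTL can be expressed as a pure predicate, evaluated when an armed slot is about to be used. This is a hypothetical sketch: `flip_allowed` and its parameters are not part of the EFSM. The `15`-minute default only echoes the table's bucket edge and is not a calibrated value.

```python
from typing import Optional


def flip_allowed(delay_minutes: float,
                 min_delay_minutes: float = 15.0,
                 ttl_minutes: Optional[float] = None) -> bool:
    """Hypothetical timing gate for using an armed FLIP_LONG slot."""
    if delay_minutes <= min_delay_minutes:
        return False  # too soon after the trigger (the losing <=15m bucket)
    if ttl_minutes is not None and delay_minutes > ttl_minutes:
        return False  # the arm has gone stale
    return True
```

What a blocked entry should do (consume the slot, defer it, or abstain) is part of the open timing question. The sketch deliberately leaves that choice to the caller.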
AdvancedExitManagerV7 / AlphaExitEngineV7 caveat:

`AlphaExitEngineV7` is mechanically side-aware:

- `side=0` means LONG
- `side=1` means SHORT
- PnL, MFE, MAE, trend direction, and adverse/favorable movement are signed by
  `ctx.side`

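Side-aware signing can be illustrated with a small Python function that computes PnL, MFE and MAE for a price path under either side encoding. `signed_path_stats` is a hypothetical helper, not V7's implementation. It uses simple returns relative to entry and ignores fees.

```python
from typing import Sequence, Tuple

LONG, SHORT = 0, 1  # same encoding as ctx.side in the text


def signed_path_stats(entry: float, prices: Sequence[float], side: int) -> Tuple[float, float, float]:
    """Return (pnl_pct, mfe, mae) for a price path, signed by side."""
    sign = 1.0 if side == LONG else -1.0
    # Favorable moves are positive for the given side.
    moves = [sign * (p - entry) / entry for p in prices]
    pnl_pct = moves[-1]
    mfe = max(0.0, max(moves))   # best favorable excursion
    mae = max(0.0, -min(moves))  # worst adverse excursion, reported as a positive number
    return pnl_pct, mfe, mae
```

The same path yields mirrored numbers for LONG and SHORT. That is the mechanical symmetry described above. It says nothing about whether fitted thresholds transfer across sides.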
However, V7 calibration is SHORT-lineage:

- bounce model labels were trained on BLUE SHORT adverse-bar samples
- pressure threshold `2.69` was selected on SHORT/GREEN-lineage replay
- MAE/MFE concepts are symmetric in code but not guaranteed symmetric in
  fitted thresholds or bounce probabilities

Before any live FLIP_LONG execution, V7 must be validated in one of these modes:

- shadow-only LONG contexts using actual flipped LONG entries
- conservative LONG-specific V7 threshold override
- disable V7 live exits for overlay LONGs until enough shadow decisions show
  it does not prematurely cut the rebound edge

The rule can be codified, but production wiring must keep the EFSM, side
selection, and V7 exit policy explicitly separable.

Sweep results:

- best by compounded return:
  - mode: one-shot after win
  - `trigger_lev = 9.0`
  - `trade_min_lev = 0.0`
  - traded: `1324`
  - LONG trades: `222`
  - WR `50.91%`
  - compounded return `+61.93%`
  - max DD `19.36%`
  - PnL `-$257.03`
- best by estimated dollars:
  - mode: one-shot after win
  - `trigger_lev = 2.0`
  - `trade_min_lev = 0.69`
  - traded: `1050`
  - LONG trades: `297`
  - WR `40.03%`
  - compounded return `+27.71%`
  - max DD `22.44%`
  - PnL `+$5,114.50`

Both sweep optima still underperform the relevant short-only baselines. In
particular, simply treating high leverage as a short-side quality filter is
stronger than using high-leverage short wins as a broad long-switch trigger:

- `lev >= 8.5`, short-only: PnL `+$12,193.65`, max DD `7.58%`
- best long-switch dollar policy: PnL `+$5,114.50`, max DD `22.44%`

Interpretation:

- leverage does behave like conviction, but the first-order use is filtering /
  sizing, not side inversion
- ordinary high-lev wins are too common to serve as a LONG regime switch
- the previous post-outlier result survives only because it was much narrower:
  large dollar win, 9x, and immediate next trade
- high-lev wins may still be useful as **features** in the dual-shadow /
  market-fingerprint layer:
  - `last_high_lev_short_win`
  - `last_high_lev_short_win_count`
  - `last_high_lev_short_win_pnl_abs`
  - `last_high_lev_short_win_return_pct`
  - `bars_since_high_lev_short_win`
  - `consecutive_high_lev_short_wins`

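Several of the fingerprint features above could be derived from the closed-trade history roughly as follows. This is a hypothetical Python sketch, and all of the following are assumptions:

- `ShortTrade` and its bar-index field
- treating `pnl_abs > 0` as a win
- the `8.5` leverage cut-off

`last_high_lev_short_win` and `last_high_lev_short_win_count` are omitted because their exact definitions are not stated here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HIGH_LEV = 8.5  # assumed cut-off, mirroring the `lev >= 8.5` filter above


@dataclass(frozen=True)
class ShortTrade:
    exit_bar: int    # bar index at which the SHORT closed (hypothetical field)
    leverage: float
    pnl_abs: float
    pnl_pct: float


def is_high_lev_win(t: ShortTrade) -> bool:
    return t.leverage >= HIGH_LEV and t.pnl_abs > 0


def high_lev_win_features(trades: List[ShortTrade], now_bar: int) -> Dict[str, Optional[float]]:
    """Fingerprint features from closed SHORT trades ordered oldest first."""
    wins = [t for t in trades if is_high_lev_win(t)]
    consecutive = 0
    for t in reversed(trades):  # count the streak ending at the most recent trade
        if not is_high_lev_win(t):
            break
        consecutive += 1
    last = wins[-1] if wins else None
    return {
        "last_high_lev_short_win_pnl_abs": last.pnl_abs if last is not None else None,
        "last_high_lev_short_win_return_pct": last.pnl_pct if last is not None else None,
        "bars_since_high_lev_short_win": float(now_bar - last.exit_bar) if last is not None else None,
        "consecutive_high_lev_short_wins": float(consecutive),
    }
```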
Research conclusion:

- do not implement the literal `lev > 0.70` long switch
- do preserve leverage as a strong conviction feature
- do keep the narrower post-outlier one-trade long probe in the research queue
- the strongest immediate operational lesson is that low-leverage trades may be
  unnecessary, while high-leverage shorts remain the cleaner expression

## AlphaExitEngineV7 LONG calibration replay

Date: `2026-05-08`

Scope:

- system: BLUE only
- exit engine: `AlphaExitEngineV7`
- harness: `adaptive_exit/calibrate_v7_long_from_journal.py`
- source data: ClickHouse `dolphin.v7_decision_events`
- source rows: `6,812`
- reconstructed BLUE V7-tracked paths: `97`
- path side in source journal: SHORT
- replay side for calibration: synthetic LONG (`side=0`)
- fee assumption: `4 bps`
- natural exit comparator: final logged decision-row price for the same path
- V7 exit comparator: first replayed V7 `EXIT` on the same price path
- bounce model: disabled for this replay by intentionally using a missing model
  path, because the current bounce model is trained on BLUE SHORT adverse-bar
  samples and should not be treated as a validated LONG probability model

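The two comparators can be sketched per path in Python. The following are hypothetical assumptions:

- the `4 bps` fee is charged once per round trip
- returns are simple returns on entry price
- the path is the sequence of logged decision-row prices

`synthetic_long_return` and `compare_exits` are illustrative names. The real harness also converts returns to dollars using each path's notional.

```python
from typing import Optional, Sequence

FEE = 0.0004  # 4 bps; assumed here to be charged once per round trip


def synthetic_long_return(entry: float, exit_price: float, fee: float = FEE) -> float:
    """LONG return on a price path that was originally traded SHORT."""
    return (exit_price - entry) / entry - fee


def compare_exits(entry: float, path: Sequence[float], v7_exit_idx: Optional[int]) -> float:
    """V7-exit minus natural-exit LONG return for one replayed path.

    Natural exit is the last logged price; the V7 exit is the price at the first
    replayed EXIT, falling back to the natural exit if V7 never fired.
    """
    natural = synthetic_long_return(entry, path[-1])
    v7_price = path[v7_exit_idx] if v7_exit_idx is not None else path[-1]
    return synthetic_long_return(entry, v7_price) - natural
```

A positive value means the V7 cut beat holding to the path's natural end. Paths where V7 never fires contribute exactly zero, which is why a surface with no exits reproduces the natural-exit result.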
This is a LONG-exit calibration proxy, not proof from exchange-filled LONG
trades. It answers a narrower question: if the post-win EFSM had flipped a
trade LONG on price paths that BLUE V7 actually observed, would a LONG-side V7
cut/exit surface have improved or harmed the synthetic LONG outcome versus
holding to the path's natural end?

### Original V7 SHORT calibration pattern

The original V7 calibration was a pressure-threshold sweep over live shadow
decisions. V7 computes:

```text
exit_pressure = clamp(directional_term + risk_term, -3.0, +3.0)
```

Then:

```text
if exit_pressure > 2.69:
    EXIT
elif exit_pressure > 1.0:
    RETRACT
elif exit_pressure < -0.5 and pnl_pct > 0:
    EXTEND
else:
    HOLD
```

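The decision ladder can be written as a runnable Python sketch with the thresholds as parameters. `directional_term` and `risk_term` are placeholders for V7's internal terms, and `v7_action` is a hypothetical name, not the engine's API.

The pressure is clamped to `[-3.0, +3.0]` and the exit test is a strict `>`. In this form, an exit threshold of exactly `3.0` can therefore never fire. That is consistent with the zero-exit `exit_p3.0` row in the LONG replay results below.

```python
def clamp(x: float, lo: float = -3.0, hi: float = 3.0) -> float:
    return max(lo, min(hi, x))


def v7_action(directional_term: float, risk_term: float, pnl_pct: float,
              exit_threshold: float = 2.69,
              retract_threshold: float = 1.0,
              extend_threshold: float = -0.5) -> str:
    """Map clamped exit pressure to EXIT / RETRACT / EXTEND / HOLD (sketch)."""
    pressure = clamp(directional_term + risk_term)
    if pressure > exit_threshold:
        return "EXIT"
    if pressure > retract_threshold:
        return "RETRACT"
    if pressure < extend_threshold and pnl_pct > 0:
        return "EXTEND"  # only extend trades that are already profitable
    return "HOLD"
```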
The documented SHORT lineage was:

| Pressure threshold | Fires | Result |
|---:|---:|---:|
| `2.00` | `22/24` | `+$439`, ROI `+1.67%` |
| `2.35` | `17/24` | `+$891`, ROI `+3.38%` |
| `2.60` | `17/24` | `+$891`, ROI `+3.38%` |
| `3.00` | `14/24` | `+$796`, ROI `+3.02%` |
| base/no V7 | n/a | `+$784`, ROI `+2.98%` |

The deployed threshold `2.69` was chosen as the high end of the useful
`2.35-2.70` band so V7 stayed closer to base behavior and avoided cutting
winners on transient pressure.

### Threshold surface now explicit

`AlphaExitEngineV7` now accepts an optional per-engine
`AlphaExitV7Config`. Defaults preserve the deployed SHORT-calibrated behavior.
This lets BLUE instantiate separate SHORT and LONG V7 engines later without
global mutation.

V7-specific configurable fields:

| Config field | Default | Meaning |
|---|---:|---|
| `rvol_w15` | `0.50` | realized-vol composite weight for 15-bar volatility |
| `rvol_w30` | `0.30` | realized-vol composite weight for 30-bar volatility |
| `rvol_w50` | `0.20` | realized-vol composite weight for 50-bar volatility |
| `rvol_floor` | `0.000001` | minimum realized-vol denominator |
| `mae_tier1_k` | `3.5` | MAE tier-1 multiplier on `rv_comp` |
| `mae_tier2_k` | `7.0` | MAE tier-2 multiplier on `rv_comp` |
| `mae_tier3_k` | `12.0` | MAE tier-3 multiplier on `rv_comp` |
| `mae_tier1_floor` | `0.005` | MAE tier-1 absolute floor |
| `mae_tier2_floor` | `0.012` | MAE tier-2 absolute floor |
| `mae_tier3_floor` | `0.025` | MAE tier-3 absolute floor |
| `mae_tier1_risk` | `0.5` | pressure contribution once tier 1 is breached |
| `mae_tier2_risk` | `0.8` | pressure contribution once tier 2 is breached |
| `mae_tier3_risk` | `1.2` | pressure contribution once tier 3 is breached |
| `mae_accel_min_bars` | `3` | minimum bars before adverse-acceleration gate can fire |
| `mae_accel_peak_floor` | `0.003` | adverse peak floor for MAE acceleration risk |
| `mae_accel_risk` | `0.6` | pressure contribution for MAE acceleration |
| `mae_recovery_peak_floor` | `0.004` | adverse peak floor for failed-recovery gate |
| `mae_recovery_prev_min` | `0.25` | prior recovery ratio required before snapback risk |
| `mae_recovery_snapback_max` | `0.10` | recovery ratio below which recovery is treated as failed |
| `mae_recovery_risk` | `1.0` | pressure contribution for failed recovery |
| `mae_late_floor` | `0.003` | MAE required before late adverse ramp applies |
| `mae_late_start_frac` | `0.60` | bars-held fraction where late adverse ramp starts |
| `mae_late_risk_max` | `0.4` | maximum late adverse pressure contribution |
| `max_hold_ref_mult_3m` | `3.0` | V7 internal max-hold reference multiplier |
| `mfe_slope_peak_floor` | `0.01` | peak favorable floor for convexity slope break |
| `mfe_convexity_decay_exit` | `0.35` | decay ratio for hard MFE giveback pressure |
| `mfe_convexity_decay_soft` | `0.20` | decay ratio for soft MFE giveback pressure |
| `mfe_convexity_exit_risk` | `1.5` | pressure contribution for hard MFE giveback |
| `mfe_convexity_soft_risk` | `0.3` | pressure contribution for soft MFE giveback |
| `mfe_accel_floor` | `-0.00001` | MFE acceleration floor for adverse convexity |
| `mfe_accel_peak_floor` | `0.005` | peak favorable floor for MFE acceleration risk |
| `mfe_accel_risk` | `0.2` | pressure contribution for MFE acceleration risk |
| `bounce_dir_w` | `0.15` | bounce score directional-term weight |
| `bounce_risk_w` | `0.35` | bounce risk-term weight |
| `bounce_rv_safe_floor` | `0.00001` | bounce feature volatility denominator floor |
| `exit_pressure_threshold` | `2.69` | live `EXIT` threshold |
| `retract_pressure_threshold` | `1.0` | `RETRACT` threshold |
| `extend_pressure_threshold` | `-0.5` | profitable `EXTEND` threshold |
| `pressure_min` | `-3.0` | pressure clamp lower bound |
| `pressure_max` | `3.0` | pressure clamp upper bound |

Inherited V6 weight priors remain configurable through the existing
`WeightAdapter`/`WeightPriors` seam. The new config is specifically for V7
threshold/gate surfaces and is init-time/per-engine configurable.

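The per-engine configuration pattern can be illustrated with a frozen dataclass and `dataclasses.replace`. `V7ConfigSketch` and `EngineSketch` are hypothetical stand-ins that copy only a few defaults from the table. The real `AlphaExitV7Config` is not stated to be a frozen dataclass.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class V7ConfigSketch:
    """Hypothetical stand-in for a few AlphaExitV7Config fields (defaults from the table)."""
    exit_pressure_threshold: float = 2.69
    retract_pressure_threshold: float = 1.0
    extend_pressure_threshold: float = -0.5
    mfe_convexity_exit_risk: float = 1.5
    mfe_convexity_soft_risk: float = 0.3
    mfe_accel_risk: float = 0.2


class EngineSketch:
    """Each engine keeps its own immutable config; nothing module-level is mutated."""

    def __init__(self, config: V7ConfigSketch = V7ConfigSketch()) -> None:
        self.config = config


short_engine = EngineSketch()  # SHORT-calibrated defaults
long_engine = EngineSketch(replace(V7ConfigSketch(), mfe_convexity_exit_risk=0.75))
```

The LONG engine receives a modified copy, so the SHORT engine's defaults are untouched. That is the property the per-instance config test checks.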
### LONG replay results

Baseline synthetic LONG natural exit across the 97 paths:

- natural PnL: `-$328.84`
- natural WR: `59.79%`
- natural compound: `+3.50%`
- natural max DD: `2.28%`

The dollar PnL and compound can diverge because path notionals differ. For this
exit calibration, dollar PnL is the more relevant metric because BLUE sizing is
not uniform.

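A toy Python example of how the two metrics can have opposite signs. It assumes, hypothetically, that the compound figure chains per-path returns with equal weight, while the dollar figure sums notional times return. The paths are invented for illustration.

```python
from math import prod

# Toy paths (hypothetical): (notional in dollars, return on that notional)
paths = [
    (200.0, 0.03),     # small notional, large % win
    (200.0, 0.02),     # small notional, moderate % win
    (5_000.0, -0.01),  # large notional, small % loss
]

dollar_pnl = sum(notional * r for notional, r in paths)  # 6 + 4 - 50 = -40
compound = prod(1.0 + r for _, r in paths) - 1.0          # 1.03 * 1.02 * 0.99 - 1

print(f"dollar PnL {dollar_pnl:+.2f}, compound {compound:+.2%}")
```

The small-notional winners lift the compound, while one large-notional loser dominates the dollars. This is the same sign pattern as natural PnL `-$328.84` against compound `+3.50%`.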
Top tested surfaces:

| Candidate | V7 PnL | Delta vs natural | Exits | Exit rate | V7 WR | V7 max DD |
|---|---:|---:|---:|---:|---:|---:|
| `mfe_risk_scale_0.5` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `mfe_risk_scale_0.75` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `combo_p1.7_mae0.75` | `+$47.24` | `+$376.08` | `51` | `52.58%` | `47.42%` | `1.55%` |
| `exit_p1.7` | `+$36.88` | `+$365.72` | `51` | `52.58%` | `47.42%` | `1.53%` |
| `exit_p2.0` | `+$19.68` | `+$348.52` | `41` | `42.27%` | `49.48%` | `1.53%` |
| `short_default` / `exit_p2.69` | `+$1.43` | `+$330.26` | `38` | `39.18%` | `49.48%` | `1.81%` |
| `exit_p3.0` | `-$328.84` | `$0.00` | `0` | `0.00%` | `59.79%` | `2.28%` |

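Two columns are derived from the others:

- Delta vs natural is V7 PnL minus the natural `-$328.84`.
- Exit rate is exits divided by the `97` paths.

A Python check over the table values (the dictionary is only a copy of the rows):

```python
NATURAL_PNL = -328.84  # synthetic LONG natural-exit PnL over the 97 paths
N_PATHS = 97

# (V7 PnL, exits) per candidate surface, copied from the table above.
surfaces = {
    "mfe_risk_scale_0.5": (205.32, 36),
    "mfe_risk_scale_0.75": (205.32, 36),
    "combo_p1.7_mae0.75": (47.24, 51),
    "exit_p1.7": (36.88, 51),
    "exit_p2.0": (19.68, 41),
    "short_default": (1.43, 38),
    "exit_p3.0": (-328.84, 0),
}

# (delta vs natural in dollars, exit rate in percent) per surface
derived = {
    name: (round(pnl - NATURAL_PNL, 2), round(100.0 * exits / N_PATHS, 2))
    for name, (pnl, exits) in surfaces.items()
}
for name, (delta, rate) in derived.items():
    print(f"{name:<22} delta={delta:+.2f} exit_rate={rate:.2f}%")
```

Two deltas recompute one cent away from the table: `534.16` vs `534.15`, and `330.27` vs `330.26`. Both gaps are consistent with rounding of unrounded inputs.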
Interpretation:

- The deployed SHORT default is not mechanically broken for LONG. It improved
  the synthetic LONG dollar outcome by `+$330.26` versus natural exit on the 97
  replayed paths.
- The best tested LONG proxy did not come from lowering the pressure threshold.
  It came from reducing the MFE giveback/convexity pressure contribution
  (`mfe_risk_scale_0.5` or `0.75`).
- Aggressively lowering `exit_pressure_threshold` to `1.4` over-fires:
  `78/97` exits, V7 PnL `-$11.78`, and many negative per-path deltas. That
  resembles the original SHORT calibration failure at `2.0`: pressure that is
  too sensitive cuts too much transient noise.
- A moderate pressure threshold around `1.7-2.0` is useful, but still inferior
  to leaving pressure at `2.69` and reducing MFE-risk contributions in this
  proxy.

Recommended LONG overlay calibration candidate for shadow:

```python
# Package root assumed from alpha_exit_v7_engine.py's location in the repo.
from nautilus_dolphin.nautilus.alpha_exit_v7_engine import AlphaExitEngineV7, AlphaExitV7Config

long_v7_config = AlphaExitV7Config(
    mfe_convexity_exit_risk=0.75,
    mfe_convexity_soft_risk=0.15,
    mfe_accel_risk=0.10,
)
long_v7 = AlphaExitEngineV7(config=long_v7_config)  # separate shadow LONG engine
```

This is the `mfe_risk_scale_0.5` surface. It keeps:

- `exit_pressure_threshold = 2.69`
- all MAE vol-normalized loss-cut thresholds unchanged
- pressure clamp unchanged
- bounce disabled or neutral until a LONG-trained bounce model exists

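Read literally, an `mfe_risk_scale_<k>` surface multiplies the three MFE-risk contributions by `k` and leaves every other field at its default. The recommended values are exactly half the defaults (`1.5`, `0.3`, `0.2`).

Below is a hypothetical Python helper for generating such override sets in a sweep. Applying the same scaling to `0.75` is an assumption.

```python
# Default MFE-risk contributions, copied from the V7 config table.
MFE_RISK_DEFAULTS = {
    "mfe_convexity_exit_risk": 1.5,
    "mfe_convexity_soft_risk": 0.3,
    "mfe_accel_risk": 0.2,
}


def mfe_risk_scale(scale: float) -> dict:
    """Keyword overrides for a hypothetical `mfe_risk_scale_<scale>` surface."""
    # Rounding removes float noise such as 0.2 * 0.75 = 0.15000000000000002.
    return {name: round(default * scale, 6) for name, default in MFE_RISK_DEFAULTS.items()}


print(mfe_risk_scale(0.5))
print(mfe_risk_scale(0.75))
```

Each result could then be passed as keyword overrides, for example `AlphaExitV7Config(**mfe_risk_scale(0.5))`.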
Why this candidate is preferable to simply lowering `exit_pressure_threshold`:

- it preserved the useful loss-cut behavior while avoiding broad pressure
  over-firing
- it improved dollar PnL more than any pressure-threshold sweep tested
- it left MAE protection intact, which matters if the flipped LONG thesis is
  wrong and the asset continues down
- it respects that the post-win EFSM edge is a rebound/cooldown edge, so the
  exit manager should not over-penalize ordinary post-entry MFE shape

Do not deploy this LONG config live yet. It should first be run in shadow on
actual EFSM-flipped candidate LONG contexts, because this replay uses SHORT
entries inverted to LONG and not real LONG fills.

### Regression and safety notes

Implemented code seams:

- `nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py`
  defines `AlphaExitV7Config`
- default `AlphaExitEngineV7()` behavior remains the SHORT-calibrated config
- a LONG-specific engine can be instantiated with `AlphaExitEngineV7(config=...)`
- the calibration harness writes full results to `/tmp/v7_long_calibration.json`

Tests added:

- default config equals the legacy SHORT threshold surface
- custom config is per-instance and does not mutate the default engine
- V7 remains mechanically side-aware for LONG and SHORT PnL/MFE/MAE
- BLUE live V7 provider wiring still records journal decisions and uses OB
  signal input
- EFSM reset/no-recursive-rearm tests remain separate from V7 exit calibration

Research caveats:

- only `97` V7-tracked BLUE paths existed in the current decision journal
- this is enough to reject obviously bad LONG exit settings, but not enough to
  canonize a live LONG exit policy
- bounce must remain neutral for LONG until trained or validated on LONG samples
- V7 `max_hold_ref_mult_3m` still uses an internal time reference rather than
  the orchestrator's effective max hold; the system bible already tracks this
  as a V7 TODO/bug because it can make adverse-ramp pressure kick in too early
