Codex 83f007caa8 Checkpoint BLUE V7 long overlay work

2026-05-08 19:54:13 +02:00

LONG Deterministic Rule Research

Date: 2026-05-07

Goal

Find the simplest deterministic long-side market rule, using primarily Dolphin NG eigendata, that behaves like the original short Alpha Engine rule in spirit:

few moving parts
market-structural
explainable in one breath
reliable enough to serve as a basal gate before asset selection and later overlays

This note is explicitly not about a fitted long model.

Data source

The analysis uses the raw daily scan cache summarized by:

adaptive_exit/characterize_long_signals.py
/mnt/dolphin_training/long_signal_research/long_signal_scan_summary_h24.parquet
/mnt/dolphin_training/long_signal_research/long_signal_characterization_report.json

Only eigendata and scan-price-derived outcomes are used here:

instability_50
v50/v150/v300/v750_lambda_max_velocity
vel_div
vel_div lag / delta terms

No ExF, EsoF, or OBF are required for the core finding.

What does not work as the basal long rule

The obvious mirror thesis,

vel_div > 0.01

is too weak to be the basal long edge.

Recent HQ slice (2025-12-31 onward):

support: 39.65%
strong_long lift: 1.15x
broad_long lift: 1.22x

That is not useless, but it is neither elegant enough nor selective enough to be the long analogue of vel_div < -0.02.

Strongest deterministic shape

The long side shows up most clearly as a stressed unwind / squeeze regime, not as a generic bullish breakout regime.

Candidate primary deterministic rule

LONG_REGIME if
    instability_50 >= 20.5
    and v300_lambda_max_velocity < 0
    and v750_lambda_max_velocity < 0

Interpretation:

instability_50 >= 20.5: the market is structurally stressed
v300 < 0 and v750 < 0: the slower eigenspace is still negative / damaged
together: this is a high-stress unwind state where long opportunities tend to appear as reversals / squeezes on the same manifold that produces short dislocations
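The candidate rule is small enough to express as a single predicate. The following is a minimal sketch, assuming one scan row is available as a simple record; the `EigenSnapshot` container and the example values are hypothetical, and only the three field names and the 20.5 threshold come from this note.

```python
from dataclasses import dataclass

INSTABILITY_THRESHOLD = 20.5  # fixed threshold named in this note


@dataclass(frozen=True)
class EigenSnapshot:
    """Hypothetical container for one scan row; field names mirror the note."""
    instability_50: float
    v300_lambda_max_velocity: float
    v750_lambda_max_velocity: float


def long_regime(s: EigenSnapshot) -> bool:
    """Candidate primary rule: high stress and both slow velocities negative."""
    return (
        s.instability_50 >= INSTABILITY_THRESHOLD
        and s.v300_lambda_max_velocity < 0
        and s.v750_lambda_max_velocity < 0
    )


# Illustrative values only, not taken from the scan cache.
stressed_unwind = EigenSnapshot(22.1, -0.004, -0.001)
calm_tape = EigenSnapshot(12.0, -0.004, -0.001)
v300_positive = EigenSnapshot(22.1, 0.002, -0.001)

print(long_regime(stressed_unwind))  # True
print(long_regime(calm_tape))        # False
print(long_regime(v300_positive))    # False
```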

Why `20.5`

20.5 is the rounded recent-HQ instability_50 90th-percentile threshold (20.546996...). It is the most practical fixed threshold found in the recent-era characterization.

Empirical support

Recent HQ (`2025-12-31` onward)

Base rates:

strong_long: 0.1648
broad_long: 0.1367

Rule:

support: 6356 rows (3.58%)
strong_long: 0.3409 (2.07x lift)
broad_long: 0.3538 (2.59x lift)

Full history

Base rates:

strong_long: 0.2603
broad_long: 0.2472

Rule:

support: 300,728 rows (12.59%)
strong_long: 0.3330 (1.28x lift)
broad_long: 0.3375 (1.37x lift)
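The lift figures above are the rule's hit rate divided by the unconditional base rate. As a quick arithmetic check, this sketch recomputes all four lifts from the reported rates; the dictionary layout is only an illustration.

```python
def lift(rule_rate: float, base_rate: float) -> float:
    """Lift = hit rate inside the rule divided by the unconditional base rate."""
    return rule_rate / base_rate


# (rule_rate, base_rate) pairs copied from the figures reported above.
reported = {
    ("recent_hq", "strong_long"): (0.3409, 0.1648),
    ("recent_hq", "broad_long"): (0.3538, 0.1367),
    ("full_history", "strong_long"): (0.3330, 0.2603),
    ("full_history", "broad_long"): (0.3375, 0.2472),
}

lifts = {key: round(lift(*rates), 2) for key, rates in reported.items()}
for (window, label), value in lifts.items():
    print(f"{window:12s} {label:11s} {value:.2f}x")
```

The recomputed values are 2.07x and 2.59x for the recent HQ window and 1.28x and 1.37x for the full history, matching the reported lifts.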

Simpler fallback

If maximum elegance is preferred over extra selectivity, the one-factor fallback is:

LONG_REGIME_SIMPLE if instability_50 >= 20.5

Recent HQ:

support: 10.10%
strong_long: 0.3297 (2.00x lift)
broad_long: 0.3420 (2.50x lift)

This is surprisingly strong for a one-variable rule. It is the closest thing found to a pure long-side analogue of the short vel_div < -0.02 gate.

Tradeoff:

simpler
broader
slightly less selective than adding v300 < 0 and v750 < 0

Optional stricter confirmation

If later tuning wants more explicit “healing after stress” confirmation, the strict variant is:

LONG_REGIME_STRICT if
    instability_50 >= 20.5
    and vel_div_lag6 < -0.03
    and vel_div_delta6 > 0.02

This is directionally sensible, but it is not materially better than the instability_50 + v300 + v750 rule, so it should be treated as an optional refinement, not the basal rule.
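This note does not define the lag and delta terms. The sketch below assumes that vel_div_lag6 is the vel_div value six scans earlier and that vel_div_delta6 is the current vel_div minus that lagged value. Both definitions are assumptions and should be checked against adaptive_exit/characterize_long_signals.py before any use.

```python
from typing import Sequence


def long_regime_strict(
    instability_50: float,
    vel_div_history: Sequence[float],
    lag: int = 6,
) -> bool:
    """Strict variant under ASSUMED definitions of the lag/delta terms.

    vel_div_history is ordered oldest -> newest; the last item is the
    current scan. Returns False when there is not enough history.
    """
    if len(vel_div_history) <= lag:
        return False
    vel_div_lag = vel_div_history[-1 - lag]            # assumed: value `lag` scans ago
    vel_div_delta = vel_div_history[-1] - vel_div_lag  # assumed: change since then
    return (
        instability_50 >= 20.5
        and vel_div_lag < -0.03
        and vel_div_delta > 0.02
    )


# Illustrative history: deeply negative six scans ago, recovering since.
healing = [-0.05, -0.04, -0.04, -0.03, -0.03, -0.02, -0.01]
print(long_regime_strict(21.0, healing))      # True
print(long_regime_strict(21.0, healing[-3:])) # False (not enough history)
```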

Monthly sanity check

For the candidate primary rule (instability_50 >= 20.5 && v300 < 0 && v750 < 0) in the recent HQ window:

2026-01: strong_long = 0.348
2026-02: strong_long = 0.344
2026-03: strong_long = 0.312

The monthly base rates for the same period were:

2026-01: 0.289
2026-02: 0.271
2026-03: 0.068

So even into the weak March tape, the rule remains elevated relative to base.

Practical interpretation

This should be viewed as a market-state gate, not a complete trade engine.

It says:

“the market is in the sort of stressed, damaged regime where long squeeze / unwind opportunities become meaningfully more likely”

It does not by itself say:

which asset is the best expression
how to size
how to exit

That is where the next layers belong:

deterministic or learned asset selection
OBF / ARS / bounce overlays
TP / MAX_HOLD policy

Recommendation

If a single deterministic long gate must be named now, use:

LONG_REGIME if instability_50 >= 20.5 and v300 < 0 and v750 < 0

If maximum simplicity is the priority, use:

LONG_REGIME_SIMPLE if instability_50 >= 20.5

And explicitly do not promote vel_div > 0.01 as the basal long rule.

Deferred analysis idea: dual-shadow regime sampler

This is a later analysis / control-layer research note, not a live-rule recommendation.

One plausible way to sample the market in real time without committing the full system immediately is a very lightweight dual-shadow engine:

Shadow A: the basal SHORT engine (vel_div < -0.02 Alpha Engine posture)
Shadow B: the basal LONG engine (currently the older negative-vel_div mean-reversion LONG posture is the best simple candidate)

The intent is not merely paper PnL logging. It is to use live, recent sample-trade outcomes as a micro-regime probe:

if SHORT shadow performance degrades while LONG shadow performance improves, the tape may have rotated into a LONG-favorable regime
if LONG degrades while SHORT improves, the inverse may be true
if both are performing acceptably, the tape may be permissive / broad enough that either side can express edge
if both are failing, the tape is likely choppy / non-coherent and abstention becomes a first-class candidate

This should be implemented, if ever pursued, as:

very fast
very lightweight
explicitly shadow-only at first
based on small, recent sample trades rather than a heavy fitted model

Longer-term, the entire shadow stream can itself become training data:

market fingerprints at shadow-entry time
concurrent SHORT-shadow and LONG-shadow outcomes
relative WR / ROI-per-trade / drawdown / time-to-win asymmetries

That would allow a later learner to predict or simplify the regime switcher. But even before ML, the dual-shadow process may already serve as a useful real-time market-sampling / regime-detection mechanism.

Dual-shadow persistence characterization

This section records the first persistence pass over extant trades. The goal was not to prove a full regime-switch system, but to test whether the observed short-loss streaks are durable enough to justify a regime-favorableness probe.

Important caveat:

the live SHORT series and the replay LONG series are on different date spans
this is therefore a side-specific persistence study, not a same-bar paired dominance study
the numbers below are still useful for run-length and hysteresis design

Live SHORT stream

From the current BLUE trader log:

trades: 234
win rate: 44.44%
mean pnl_pct: +0.000506
median pnl_pct: -0.000234
average win streak: 1.65 trades
average loss streak: 2.03 trades
P(win -> win) = 0.394
P(loss -> loss) = 0.512
average positive-day run: 1.5 days
average negative-day run: 1.5 days

Interpretation:

short failures do cluster
the cluster is real enough to notice
but it is only mildly persistent
by itself, it is not strong enough to justify a raw ping-pong switch
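Streak and transition statistics of this kind can be derived from an ordered win/loss sequence. The sketch below is one way to compute them, not the script that produced the numbers above; the function name and the example sequence are hypothetical.

```python
from itertools import groupby
from typing import Dict, List


def streak_stats(wins: List[bool]) -> Dict[str, float]:
    """Average run lengths and first-order transition probabilities."""
    runs = [(is_win, len(list(group))) for is_win, group in groupby(wins)]
    win_runs = [n for is_win, n in runs if is_win]
    loss_runs = [n for is_win, n in runs if not is_win]

    pairs = list(zip(wins, wins[1:]))  # (previous outcome, next outcome)
    after_win = [nxt for prev, nxt in pairs if prev]
    after_loss = [nxt for prev, nxt in pairs if not prev]

    def mean(xs: List[float]) -> float:
        return sum(xs) / len(xs) if xs else 0.0

    return {
        "avg_win_streak": mean(win_runs),
        "avg_loss_streak": mean(loss_runs),
        "p_win_win": mean([1.0 if nxt else 0.0 for nxt in after_win]),
        "p_loss_loss": mean([0.0 if nxt else 1.0 for nxt in after_loss]),
    }


example = [True, True, False, False, False, True, False, True, True, False]
stats = streak_stats(example)
print(stats)
```

If outcomes behaved like a first-order Markov chain, the expected streak length would be 1 / (1 - p) for self-transition probability p; 1 / (1 - 0.394) is about 1.65, which agrees with the reported average win streak. When judging persistence, it can also help to compare P(loss -> loss) against the unconditional loss rate, since clustering beyond chance requires the former to exceed the latter.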

Basal LONG shadow, old mirror posture

Using the recent bullish-month replay and the single comparable 10-bar / worst_10bar configuration:

trades: 2,243
win rate: 48.33%
mean pnl_pct: +0.000320
median pnl_pct: -0.000400
average win streak: 1.93 trades
average loss streak: 2.07 trades
P(win -> win) = 0.483
P(loss -> loss) = 0.517
average positive-day run: 3.0 days
average negative-day run: 1.86 days

Interpretation:

this is the clearest durable long-favorable candidate seen so far
the multi-day positive run length is materially better than the live short stream
this supports a long-favorable regime probe, but not an unconditional flip

Basal LONG shadow, new stressed-unwind posture

Same replay setup:

trades: 569
win rate: 50.44%
mean pnl_pct: -0.000078
median pnl_pct: +0.000068
average win streak: 2.24 trades
average loss streak: 2.20 trades
P(win -> win) = 0.556
P(loss -> loss) = 0.546
average positive-day run: 1.36 days
average negative-day run: 1.18 days

Interpretation:

the new long posture has decent local persistence
but it is more fragile than the mirror-long posture as a regime switch
it does not yet justify itself as the primary flip trigger

Conclusion for regime switching

The data support a smoothed regime-favorableness detector, not a raw flip-on-first-loss system.

Practical reading:

short-loss streak persistence is real but modest
long-favorable states exist and can persist
persistence is on the order of a few trades, not a dramatic regime lock
the correct implementation is a shadow score with hysteresis and abstain logic, not a hard immediate SHORT/LONG switch

Suggested rule shape for later analysis:

compute rolling shadow scores for SHORT and LONG
use persistence thresholds before flipping
require stronger evidence to reverse than to stay put
abstain when both shadows are weak or both are losing
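The rule shape above can be sketched as a small detector. Everything here is an assumption for illustration: the class name, the window length, the flip margin, and the persistence count are hypothetical and uncalibrated. Abstention is modeled as "both rolling scores are non-positive".

```python
from collections import deque


class ShadowRegimeSketch:
    """Rolling shadow scores with a persistence-gated, margin-gated flip."""

    def __init__(self, window: int = 20, flip_margin: float = 0.0005,
                 persistence: int = 5) -> None:
        self.short = deque(maxlen=window)  # recent SHORT-shadow returns
        self.long = deque(maxlen=window)   # recent LONG-shadow returns
        self.flip_margin = flip_margin
        self.persistence = persistence
        self.side = "SHORT"
        self._streak = 0  # consecutive updates in which the challenger led

    def update(self, short_ret: float, long_ret: float) -> str:
        self.short.append(short_ret)
        self.long.append(long_ret)
        short_score = sum(self.short) / len(self.short)
        long_score = sum(self.long) / len(self.long)

        # Abstain when neither shadow is making money.
        if short_score <= 0 and long_score <= 0:
            self._streak = 0
            return "ABSTAIN"

        # The challenger must beat the incumbent by a margin, repeatedly.
        if self.side == "SHORT":
            edge = long_score - short_score
        else:
            edge = short_score - long_score
        self._streak = self._streak + 1 if edge > self.flip_margin else 0

        if self._streak >= self.persistence:
            self.side = "LONG" if self.side == "SHORT" else "SHORT"
            self._streak = 0
        return self.side


detector = ShadowRegimeSketch(window=3, flip_margin=0.0005, persistence=2)
stream = [(0.001, -0.001), (-0.001, 0.003), (-0.001, 0.003),
          (-0.002, -0.004), (-0.002, -0.004)]
signals = [detector.update(s, l) for s, l in stream]
print(signals)  # ['SHORT', 'SHORT', 'LONG', 'LONG', 'ABSTAIN']
```

The incumbent side keeps its position on ties and on small leads, which is what makes reversing harder than staying put.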

This is enough to justify the next engineering step:

live dual-shadow logging on the same bars
market-fingerprint tagging of each shadow entry
later ML over shadow outcomes if the deterministic layer proves stable

Rolling flip-worthiness test

To make the side-switch question stricter, the recent live short slice was retested with a 5-trade rolling shadow-delta proxy:

short shadow return = actual live short pnl_pct
long shadow return = counterfactual -pnl_pct - fee
rolling delta = rolling mean of (long_shadow - short_shadow)
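A minimal sketch of the proxy defined above. The fee value is not stated at this point in the note, so the 4 bps default (0.0004 in return units) is an assumption, as is the choice to emit no value until the window is full.

```python
from collections import deque
from typing import List, Optional, Sequence


def rolling_shadow_delta(
    short_pnl_pct: Sequence[float],
    fee: float = 0.0004,   # ASSUMED round-trip fee in return units
    window: int = 5,
) -> List[Optional[float]]:
    """Rolling mean of (long_shadow - short_shadow) over the last `window` trades."""
    buf = deque(maxlen=window)
    out: List[Optional[float]] = []
    for short_ret in short_pnl_pct:
        long_shadow = -short_ret - fee      # counterfactual long on the same skeleton
        buf.append(long_shadow - short_ret)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


# Five identical small short wins: the long shadow trails on every trade.
deltas = rolling_shadow_delta([0.001] * 5)
print(deltas)
```

Because long_shadow - short_shadow equals -2 * short_return - fee, the delta is a fee-shifted, sign-flipped rolling mean of the short stream rather than independent evidence about a separately selected long engine.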

Recent 3-day slice (2026-05-04 to 2026-05-06):

trades: 168
short actual WR: 39.88%
short actual compounded return: +10.02%
long counterfactual WR: 47.62%
long counterfactual compounded return: -16.92%
flip-to-long signals from the 5-trade rolling delta: 68
flip-to-short signals from the 5-trade rolling delta: 79

Interpretation:

the rolling delta does detect alternating regime pockets
but it does so often enough that a raw flip would be too twitchy
on the most recent 30 live trades, the regime buckets were:
- 13 long-favorable
- 7 short-favorable
- 10 neutral
the long-favorable bucket had positive expected PnL, but the short-favorable bucket was also positive and slightly stronger

The important point is that the signal is not “switch now on first loss.” It is:

keep a smoothed side-dominance score
require persistence before flipping
use hysteresis
abstain when the shadow spread is weak or oscillatory

So the stricter test reinforces the earlier conclusion:

there is enough structure to justify a regime-favorableness detector
there is not yet enough stability to justify a raw mechanical flip
the right next step is live dual-shadow logging on the same bars, then threshold and persistence calibration on that shared stream

Flip-after-loss counterfactual

The actual live short ledger was also replayed under a simple finite-state side-switch rule:

start SHORT
if the current side loses N trades in a row, flip to the other side
keep applying the same rule across the whole trade sequence

This is the cleanest way to test the idea “short losses are the long cue.”
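The finite-state rule can be sketched as follows. This is not the replay script itself: it assumes the counterfactual long return is -short_return - fee with a 4 bps fee, that a win means a strictly positive realized return, and that the loss counter resets after each flip.

```python
from typing import List, Sequence, Tuple


def flip_after_n_losses(
    short_returns: Sequence[float],
    n: int,
    fee: float = 0.0004,  # ASSUMED fee for the counterfactual long leg
) -> Tuple[List[float], int]:
    """Replay a short ledger, flipping side after n consecutive losses."""
    side = "SHORT"
    loss_run = 0
    flips = 0
    realized: List[float] = []
    for short_ret in short_returns:
        ret = short_ret if side == "SHORT" else -short_ret - fee
        realized.append(ret)
        if ret > 0:
            loss_run = 0
            continue
        loss_run += 1
        if loss_run >= n:
            side = "LONG" if side == "SHORT" else "SHORT"
            loss_run = 0
            flips += 1
    return realized, flips


realized, flips = flip_after_n_losses([-0.01, -0.01, -0.01, 0.02], n=2)
print(realized, flips)
```

In the example, two short losses trigger one flip, the third trade is taken long and wins, and the fourth trade loses on the long side because the tape turned back in favor of the short.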

On the current 234-trade live ledger:

always short: WR 44.44%, compounded return +11.35%, max DD 5.71%
always long: WR 44.87%, compounded return -20.13%, max DD 23.09%

Threshold sweep:

N=1: WR 40.60%, compounded return +5.33%, max DD 11.11%, flips 139
N=2: WR 44.44%, compounded return -17.72%, max DD 17.77%, flips 43
N=3: WR 48.29%, compounded return +5.48%, max DD 6.35%, flips 13
N=4: WR 47.86%, compounded return +6.21%, max DD 6.55%, flips 7
N=5: WR 43.59%, compounded return +10.52%, max DD 5.59%, flips 5
N=6: WR 45.73%, compounded return +15.17%, max DD 4.84%, flips 3

Interpretation:

side switching can help
it helps best when the flip threshold is fairly high
the best observed threshold in this small grid was N=6
low thresholds are too twitchy and can destroy the edge

So the practical conclusion is:

a raw flip-on-first-loss rule is not justified
a slower loss-cluster regime switcher is plausible
the switcher must be hysteretic and persistence-gated

This is consistent with the earlier shadow-score recommendation and explains why the observed “8 or 9 losses, then a couple wins” pattern can be useful without being directly automatable at a low threshold.

Condition-gated flip replay

I then reran the side-switch counterfactual with an additional gate:

the current side must first hit N consecutive losses
the opposite side must also satisfy its own deterministic long/short entry condition
the replay uses the same 10-bar tape skeleton and the worst-10-bar asset expression

Two long theories were tested separately:

Old mirror-long: vel_div < -0.02 and cross-sectional 10-bar momentum < 0
New stressed-unwind long: instability_50 >= 20.5 and v300 < 0 and v750 < 0

Results on the long research windows:

old mirror-long becomes marginally usable only at high thresholds:
- N=5: WR 47.00%, compounded return +6.34%, DD 46.23%, flips 11
- N=6: WR 46.52%, compounded return +28.34%, DD 43.78%, flips 5
the new stressed-unwind long does not survive this gate cleanly:
- N=1..6: compounded return stays negative, with severe drawdown

Interpretation:

the condition gate does not rescue the new long theory
it does preserve the old mirror-long as a late, low-frequency fallback
the market still looks too unstable for a low-threshold flip rule
if we keep this path, it should be a smoothed regime sampler, not an immediate switcher

Report:

flip_on_loss_condition_gate_report.md

Full-history condition-gated replay

I then ran the same condition-gated flip simulator across the entire available price tape:

root: /mnt/dolphin_training/share_offload/vbt_cache_klines
rows: 2,553,401
span: 2021-06-15 00:01:00+00:00 -> 2026-03-18 18:16:40.041456896+00:00

This is the hardest and most useful stress test because it removes the recent-slice bias entirely.

Results:

old mirror-long
- N=1..6 win rate range: 44.95% -> 46.60%
- best mean PnL at N=6: -0.000163 per trade
- best threshold still compounds to -100% over the full archive
new stressed-unwind long
- N=1..6 win rate range: 44.16% -> 46.86%
- best mean PnL at N=6: -0.000218 per trade
- best threshold also compounds to -100%

Interpretation:

the condition gate does not rescue either long theory at full-archive scale
the old mirror-long is still the stronger of the two, but only marginally
the long-side edge, if it exists, is too weak or too regime-dependent to survive this archive-wide flip rule without additional filtering
the full-tape result is a warning against over-trusting the favorable recent-month slices

Report:

flip_on_loss_condition_gate_stream_full_report.md

Post-outlier-short-win long-flip probe

Motivation: the May 8 live footer showed a familiar-looking pattern:

large 9x short win, e.g. ALGOUSDT +$466 or VETUSDT +$574
immediately followed by a somewhat larger-than-normal short loss, e.g. DASHUSDT -$191 or STXUSDT -$54

The question was whether this is a real post-outlier rebound signature:

after a very large short win,
should the next trade, or next few trades, be treated as LONG candidates?

Dataset and hygiene:

source: BLUE only
ClickHouse dolphin.trade_events: 1305 rows, 1296 unique trade IDs
trader logs: 1712 exit rows, 1092 unique trade IDs
merged near-duplicate-cleaned sequence: 1609 unique trade IDs
analysis subset after excluding hibernate / subday ACB exits: 1321 trades
span: 2026-03-31 01:10:34 UTC to 2026-05-08 13:26:06 UTC

The log and warehouse streams overlap but do not have perfectly identical timestamps, so the analysis de-duplicates by trade id where possible and by near-time / asset / reason / realized PnL where the same exit was written by both paths. This matters because a naive merge double-counts many recent exits.
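The two-stage de-duplication described above might look like the sketch below. The row keys (`trade_id`, `asset`, `reason`, `exit_time`, `pnl`) and both tolerances are hypothetical; the note does not state the actual schema or the matching windows.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def dedupe_exits(
    rows: List[Dict],
    time_tol: timedelta = timedelta(seconds=5),  # ASSUMED near-time window
    pnl_tol: float = 0.02,                       # ASSUMED PnL tolerance in dollars
) -> List[Dict]:
    """Drop repeats by trade ID first, then by near-time/asset/reason/PnL match."""
    kept: List[Dict] = []
    seen_ids = set()
    for row in sorted(rows, key=lambda r: r["exit_time"]):
        trade_id = row.get("trade_id")
        if trade_id is not None and trade_id in seen_ids:
            continue  # same exit already recorded under this ID
        near_duplicate = any(
            k["asset"] == row["asset"]
            and k["reason"] == row["reason"]
            and abs(k["exit_time"] - row["exit_time"]) <= time_tol
            and abs(k["pnl"] - row["pnl"]) <= pnl_tol
            for k in kept
        )
        if near_duplicate:
            continue  # same exit written by the other path
        if trade_id is not None:
            seen_ids.add(trade_id)
        kept.append(row)
    return kept


t0 = datetime(2026, 5, 8, 9, 55, 0)
rows = [  # synthetic rows with placeholder assets
    {"trade_id": "T1", "asset": "AAAUSDT", "reason": "TP", "exit_time": t0, "pnl": 12.50},
    {"trade_id": "T1", "asset": "AAAUSDT", "reason": "TP",
     "exit_time": t0 + timedelta(seconds=2), "pnl": 12.50},
    {"trade_id": None, "asset": "AAAUSDT", "reason": "TP",
     "exit_time": t0 + timedelta(seconds=3), "pnl": 12.49},
    {"trade_id": "T2", "asset": "BBBUSDT", "reason": "MAX_HOLD",
     "exit_time": t0 + timedelta(minutes=7), "pnl": -4.10},
]
cleaned = dedupe_exits(rows)
print(len(cleaned))  # 2
```

The pairwise scan is quadratic, which is acceptable for a ledger of a few thousand exits but would need an index for larger streams.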

Counterfactual method:

keep the same entry/exit skeleton
actual side is the live BLUE short
counterfactual long return is approximated as -short_return - 4 bps
this is not a separately selected long engine; it only tests whether the immediate post-win tape direction would have favored the other side

Baseline over the cleaned sequence:

always short: 1321 trades, WR 55.79%, mean return/trade +0.0781%, compounded return +166.36%, max DD 15.70%
always long on the same skeleton: WR 38.46%, mean return/trade -0.1181%, compounded return -80.08%, max DD 80.48%
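For reference, the baseline metrics can be computed from a per-trade return series as in this sketch. It assumes each return applies to the whole equity and compounds trade by trade, which may differ from how the replay sizes positions; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def counterfactual_long(short_returns: Sequence[float], fee: float = 0.0004) -> List[float]:
    """Same-skeleton long approximation from the note: -short_return - 4 bps."""
    return [-r - fee for r in short_returns]


def summarize(returns: Sequence[float]) -> Dict[str, float]:
    """Win rate, mean return, compounded return, and max drawdown."""
    equity = 1.0
    peak = 1.0
    max_drawdown = 0.0
    for r in returns:
        equity *= 1.0 + r                       # ASSUMED: full-equity compounding
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, 1.0 - equity / peak)
    n = len(returns)
    return {
        "win_rate": sum(1 for r in returns if r > 0) / n,
        "mean_return": sum(returns) / n,
        "compounded_return": equity - 1.0,
        "max_drawdown": max_drawdown,
    }


demo = summarize([0.10, -0.50, 0.20])  # illustrative returns, not ledger data
print(demo)
long_leg = counterfactual_long([0.001, -0.002])
print(long_leg)
```

Because the fee is charged on every flipped trade, the long win rate and the short win rate on the same skeleton need not sum to 100%.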

So the full ledger does not support a broad long flip. The question only survives as a narrow post-outlier condition.

Primary post-outlier trigger:

trigger if prior trade:
  pnl_abs >= $400
  leverage >= 8.5x
  pnl_pct >= +0.50%
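The trigger can be written as a predicate over the prior closed trade. In this sketch the `ClosedTrade` record is hypothetical and `pnl_pct` is assumed to be in percent units (0.50 means +0.50%). The ALGOUSDT and VETUSDT values are the live-tail figures quoted in this note; the `NEXT_*` rows are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClosedTrade:
    """Hypothetical record of one completed BLUE short."""
    asset: str
    pnl_abs: float   # realized dollars
    leverage: float
    pnl_pct: float   # ASSUMED percent units: 0.50 means +0.50%


def is_outlier_short_win(
    t: ClosedTrade,
    min_pnl_abs: float = 400.0,
    min_leverage: float = 8.5,
    min_pnl_pct: float = 0.50,
) -> bool:
    return (
        t.pnl_abs >= min_pnl_abs
        and t.leverage >= min_leverage
        and t.pnl_pct >= min_pnl_pct
    )


def next_trade_flip_flags(trades: List[ClosedTrade]) -> List[bool]:
    """Flag trade i when trade i-1 satisfied the primary trigger."""
    return [i > 0 and is_outlier_short_win(trades[i - 1]) for i in range(len(trades))]


algo = ClosedTrade("ALGOUSDT", 466.34, 9.0, 0.929)
vet = ClosedTrade("VETUSDT", 573.64, 9.0, 1.546)
placeholder_1 = ClosedTrade("NEXT_1", -1.0, 9.0, -0.01)  # placeholder values
placeholder_2 = ClosedTrade("NEXT_2", -1.0, 9.0, -0.01)  # placeholder values

flags = next_trade_flip_flags([algo, placeholder_1, vet, placeholder_2])
print(flags)  # [False, True, False, True]
print(is_outlier_short_win(algo, min_pnl_pct=0.95))  # False
```

Raising the return threshold to +0.95% excludes the ALGOUSDT win at +0.929% while still including the VETUSDT win.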

Immediate next-trade result:

triggers: 47
next trades affected: 47
actual next short subset: WR 53.19%, mean return -0.0821%, compounded return -4.05%, realized PnL -$1,725.40
flipped-to-long subset: WR 40.43%, mean return +0.0421%, compounded return +1.72%, estimated PnL +$409.47
estimated dollar delta: +$2,134.88
whole-sequence policy if only those next trades are flipped: compounded return improves from +166.36% to +182.38% and max DD improves from 15.70% to 13.33%

The stricter trigger pnl_abs >= $400, leverage >= 8.5x, pnl_pct >= +0.95% is similar:

triggers: 46
actual next short subset: -$1,534.21
flipped-to-long estimate: +$276.64
estimated dollar delta: +$1,810.85
whole-sequence compounded return: +180.91%

The effect is strongest on the immediately following trade. It decays quickly:

next 2 trades after the primary trigger: affected 91, actual -$2,689.16, flipped estimate +$555.98, dollar delta +$3,245.15
next 3 trades: affected 134, actual -$2,357.77, flipped estimate -$588.02, dollar delta +$1,769.75 (still positive, but only because the flip loses less)
next 5 trades: benefit becomes materially less clean

Examples from the live tail:

ALGOUSDT 2026-05-08 09:55 UTC, +466.34, 9x, +0.929%
- next trade DASHUSDT: actual short -191.19; same-skeleton long would have been directionally positive after fee
VETUSDT 2026-05-08 12:37 UTC, +573.64, 9x, +1.546%
- next trade STXUSDT: actual short -53.52; same-skeleton long would have been directionally positive after fee
larger historic outlier STXUSDT 2026-05-05 20:29 UTC, +6796.86, 9x, +13.845%
- the following trade was a small short loss, and the next several trades were mixed rather than uniformly long-favorable

Interpretation:

there is a real event-conditioned post-outlier rebound / exhaustion signal
it is not a win-rate improvement; it is a dollar / drawdown improvement
it should not be promoted as a general long engine
it is best framed as a one-trade post-outlier long probe or short cooldown candidate, not as a multi-trade regime flip

Relationship to the long-system research:

this is different from both deterministic long theories already studied:
- old mirror-long: negative vel_div mean-reversion long
- new stressed-unwind long: high instability plus negative slow velocities
the post-outlier signal is more local and path-conditioned:
- a violent short win likely means the chosen asset or local basket has just completed an exhaustion leg
- the next trade may be more exposed to rebound / adverse short continuation than to fresh downside continuation
this should become a feature inside the dual-shadow side-selection sampler:
- last_trade_was_outlier_short_win
- last_trade_leverage
- last_trade_realized_pnl_abs
- last_trade_return_pct
- bars_since_outlier_win
- same_asset_or_correlated_asset_followup

Research conclusion:

broad SHORT -> LONG inversion remains false on the full sequence
immediate one-trade long probing after a large 9x short win is empirically plausible and improved historical BLUE dollars in this cleaned replay
the next test should condition this event trigger on the existing long gates and market fingerprint state, rather than using it as a naked side switch

Leverage-as-conviction win-probe sweep

Follow-up thesis:

leverage is a conviction expression

if a high-conviction short probe wins:
  make subsequent / next trades LONG

if leverage is below roughly 0.69:
  possibly do not trade

The initial test used:

trigger_lev = 0.70
trade_min_lev = 0.69
win = net PnL > 0

Two side-selection forms were tested:

persistent shadow probe: the short engine continues to run as a shadow. A high-lev short-shadow win turns the traded side LONG. A high-lev short-shadow loss resets the traded side SHORT.
one-shot after win: a high-lev short-shadow win arms only the next eligible trade as LONG, then resets.

The test used the same cleaned BLUE sequence as the post-outlier study, updated through 2026-05-08 13:40:04 UTC:

ClickHouse rows: 1307
ClickHouse unique trade IDs: 1298
trader-log exit rows: 1716
merged near-duplicate-cleaned trade IDs: 1612
analysis subset after excluding hibernate / subday ACB exits: 1324

Baselines:

always short: 1324 trades, WR 55.82%, mean return/trade +0.0784%, compounded return +168.02%, max DD 15.70%, PnL +$11,135.86
always long on the same skeleton: WR 38.44%, compounded return -80.23%, max DD 80.62%, PnL -$36,875.48
short-only with trade_min_lev >= 0.69: 1050 trades, compounded return +81.86%, max DD 20.80%, PnL +$11,063.86
short-only with trade_min_lev >= 5.0: 565 trades, compounded return +88.08%, max DD 8.94%, PnL +$11,980.01
short-only with trade_min_lev >= 8.5: 501 trades, compounded return +82.57%, max DD 7.58%, PnL +$12,193.65

Initial 0.70 / 0.69 thesis result:

persistent shadow-probe switch:
- traded: 1050
- LONG trades: 457
- flips to LONG: 249
- WR 37.08%
- compounded return -5.61%
- max DD 26.60%
- PnL -$2,527.65
one-shot after high-lev win:
- traded: 1050
- LONG trades: 455
- flips to LONG: 456
- WR 37.24%
- compounded return -3.56%
- max DD 26.19%
- PnL -$2,113.83

So the literal initial thesis fails. 0.70 is too low as a side-switch trigger. It arms hundreds of LONG trades and turns a strong short-led ledger into a slightly losing one.

Important evaluation frame:

The goal is not to find a LONG overlay that beats the whole short-only engine by itself. The goal is to find a side-selection overlay that adds marginal value only on the subset where it intervenes. The correct comparison is therefore:

overlay_delta =
    pnl_if_intervened_long_on_triggered_trades
  - pnl_if_original_short_was_left_unchanged_on_same_triggered_trades
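A minimal sketch of this comparison, assuming per-trade dollar PnL for both sides and a boolean mask marking the trades where the overlay intervened. The function name and the sample values are illustrative.

```python
from typing import Sequence


def overlay_delta(
    short_pnl: Sequence[float],
    long_pnl: Sequence[float],
    triggered: Sequence[bool],
) -> float:
    """Sum of (long - short) PnL over the triggered trades only."""
    return sum(
        long_ - short_
        for short_, long_, hit in zip(short_pnl, long_pnl, triggered)
        if hit
    )


# Illustrative three-trade ledger; only the middle trade is triggered.
delta = overlay_delta(
    short_pnl=[100.0, -50.0, 30.0],
    long_pnl=[-104.0, 46.0, -34.0],
    triggered=[False, True, False],
)
print(delta)  # 96.0
```

Untriggered trades contribute nothing, so a large profitable short book cannot mask an overlay that loses money where it actually acts.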

The overlay is useful only if it satisfies all of the following:

it has positive overlay_delta after fees and conservative slippage
it reduces realized drawdown or loss clustering on the intervention subset
it does not cut too many profitable short trades
it remains positive across time splits, assets, and neighboring thresholds
it has enough triggers to be statistically more than a single accident

Under that marginal-overlay framing, the broad leverage-win thesis still fails:

persistent 0.70 / 0.69 switch delta vs same lev >= 0.69 short-only baseline: about -$13,591.51
one-shot 0.70 / 0.69 switch delta vs same lev >= 0.69 short-only baseline: about -$13,177.69
best swept dollar switch delta vs same lev >= 0.69 short-only baseline: about -$5,949.36

By contrast, the narrower post-outlier rule did show positive marginal overlay value on its triggered subset:

triggered next-trade cases: 47
leaving the next trade SHORT: PnL -$1,725.40
flipping only that next trade LONG: PnL +$409.47
marginal overlay delta: +$2,134.87
whole-sequence drawdown improved from about 15.70% to 13.33%

That is the key distinction. The broad high-leverage-win rule is not reliable enough. The narrow post-outlier rule is a legitimate candidate for guarded shadow/live-probe research because it adds value exactly where it intervenes, but the sample is still too small for unconditional deployment.

Lowered big-win threshold grid

The phrase "sample too small" applies only to the original high-tail trigger (pnl_abs >= $400, lev >= 8.5, immediate next trade). It does not mean the BLUE ledger is small. The cleaned replay now spans:

1328 non-hibernate / non-subday-ACB BLUE trades
1616 merged near-duplicate-cleaned trade IDs
2026-03-31 01:10:34 UTC through 2026-05-08 14:21:31 UTC

To test whether the effect survives with more triggers, the post-win sweep was expanded to:

dollar win thresholds: $10, $25, $50, $75, $100, $150, $200, $300, $400, $500, $750, $1000
leverage thresholds: 0, 0.69, 0.70, 1, 2, 3, 5, 8.5, 9
return thresholds: 0, 0.10%, 0.25%, 0.50%, 0.75%, 0.95%, 1.25%
follow-on horizons: next 1, 2, 3, and 5 trades
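The sweep grid can be enumerated directly from the threshold lists above. This is only a sketch of the search space; the variable names are hypothetical, and the criterion that decides which combinations count as eligible is not stated in this note.

```python
from itertools import product

dollar_thresholds = [10, 25, 50, 75, 100, 150, 200, 300, 400, 500, 750, 1000]
leverage_thresholds = [0, 0.69, 0.70, 1, 2, 3, 5, 8.5, 9]
return_thresholds_pct = [0, 0.10, 0.25, 0.50, 0.75, 0.95, 1.25]
horizons = [1, 2, 3, 5]

# One (dollar, leverage, return) combination per grid point, per horizon.
grid = list(product(dollar_thresholds, leverage_thresholds, return_thresholds_pct))
print(len(grid))                  # 756 combinations per horizon
print(len(grid) * len(horizons))  # 3024 combinations across all horizons
```

The full grid holds 756 combinations per horizon, so any per-horizon eligible count below 756 implies that some combinations were filtered out before scoring.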

Important result:

lowering dollar threshold alone does not work
lowering dollar threshold with a realized-return threshold does work
the effect is mostly next 1 to 2 trades
by next 5 trades, flipping LONG is not positive; cooldown / abstain is better than LONG if the horizon is that wide

Grid-wide stability:

horizon 1: 630 eligible threshold combinations, 60.0% positive marginal delta, 45.87% positive LONG PnL
horizon 2: 630 eligible threshold combinations, 57.30% positive marginal delta, 39.52% positive LONG PnL
horizon 3: 693 eligible threshold combinations, 59.60% positive marginal delta, 12.99% positive LONG PnL
horizon 5: 693 eligible threshold combinations, 51.08% positive marginal delta, 0.0% positive LONG PnL

This says the post-win effect is a short-lived exhaustion / rebound artifact, not a durable multi-trade LONG regime.

Fixed dollar-only immediate-next-trade rows:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compounded return | Max DD |
| --- | --- | --- | --- | --- | --- | --- |
| `$10+`, no lev gate | 277 | `+$3,044` | `-$9,146` | `-$12,190` | `+24.74%` | `24.55%` |
| `$50+`, no lev gate | 181 | `+$4,495` | `-$8,870` | `-$13,365` | `+42.58%` | `22.18%` |
| `$100+`, no lev gate | 135 | `+$908` | `-$4,252` | `-$5,160` | `+97.78%` | `18.09%` |
| `$200+`, no lev gate | 89 | `-$947` | `-$1,496` | `-$549` | `+140.76%` | `14.96%` |
| `$300+`, no lev gate | 62 | `-$1,695` | `-$45` | `+$1,651` | `+174.25%` | `13.70%` |
| `$400+`, no lev gate | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |
| `$500+`, no lev gate | 40 | `-$1,153` | `+$90` | `+$1,242` | `+173.51%` | `13.33%` |

Dollar-only conclusion:

below about $300, the next short trade is still net-profitable or less bad than the LONG flip
around $300, the next short trade turns bad, but LONG is only near-flat
around $400 to $500, the next-trade LONG flip becomes positive

Fixed immediate-next-trade rows with a +0.75% realized-return trigger:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compounded return | Max DD |
| --- | --- | --- | --- | --- | --- | --- |
| `$10+` and `+0.75%` | 99 | `-$1,735` | `-$409` | `+$1,326` | `+104.45%` | `14.03%` |
| `$50+` and `+0.75%` | 74 | `-$1,950` | `+$105` | `+$2,055` | `+155.62%` | `14.03%` |
| `$75+` and `+0.75%` | 70 | `-$2,028` | `+$194` | `+$2,223` | `+166.91%` | `13.95%` |
| `$100+` and `+0.75%` | 67 | `-$2,083` | `+$336` | `+$2,419` | `+168.60%` | `13.69%` |
| `$150+` and `+0.75%` | 63 | `-$2,082` | `+$344` | `+$2,426` | `+175.37%` | `13.69%` |
| `$300+` and `+0.75%` | 58 | `-$1,738` | `+$58` | `+$1,796` | `+173.61%` | `13.70%` |
| `$400+` and `+0.75%` | 48 | `-$1,725` | `+$407` | `+$2,133` | `+180.70%` | `13.33%` |

Return-conditioned conclusion:

the effect becomes visible with more triggers when the dollar threshold is lowered to $50-$150 and the prior win is also at least +0.75%
the best immediate-next-trade delta in this grid was around $150+ and +0.75%: 63 next trades, SHORT -$2,081.81, LONG +$343.94, delta +$2,425.75
the original $400+, high-leverage trigger remains good but is not the only viable threshold; it is the cleaner high-tail version

Two-trade horizon:

| Trigger | Affected next trades | Leave SHORT | Flip LONG | Delta | Whole-policy compounded return | Max DD |
| --- | --- | --- | --- | --- | --- | --- |
| `$300+`, `lev >= 8.5` | 115 | `-$3,201` | `+$511` | `+$3,712` | `+168.52%` | `14.27%` |
| `$400+`, `lev >= 8.5` | 91 | `-$2,689` | `+$556` | `+$3,245` | `+175.26%` | `13.71%` |
| `$500+`, `lev >= 8.5` | 75 | `-$2,237` | `+$509` | `+$2,747` | `+167.53%` | `14.71%` |

Two-trade conclusion:

the high-leverage $300-$500 zone supports a two-trade exhaustion rebound more strongly than the original one-trade-only statement
the best two-trade variant in this fixed grid was $300+, lev >= 8.5, next two trades: delta +$3,712, estimated LONG PnL +$511
the five-trade horizon should not be traded LONG; it is only a damage-control / cooldown signal

Reliability statement:

The post-win overlay is more solid than initially stated. The robust form is not "after any win"; that is false. The robust form is:

after a sufficiently large realized short win,
especially a high-return or high-leverage win,
the next 1-2 short-engine opportunities are often contaminated by rebound risk
and can be improved by LONG flip or, at minimum, cooldown/abstain.

The strongest candidates for shadow/live-probe research are:

immediate next trade after $100-$200 win and prior return >= +0.75%
immediate next trade after $400+ win, especially lev >= 8.5
next two trades after $300-$500 win with lev >= 8.5

Guardrail:

The overlay should not optimize on WR. LONG WR remains lower than SHORT WR on many triggered subsets. The edge is payoff asymmetry / loss-tail avoidance: short wins become smaller or disappear after the exhaustion event, while short losses on the next trade(s) become expensive.

Candidate codified overlay rule and EFSM

Terminology:

EFSM means Execution FSM
refer to this component as the post-win EFSM, not merely a generic "state machine"

Candidate rule proposed after the lowered-threshold sweep:

after a completed BLUE SHORT trade:

  if pnl_abs > $397:
      tag next 1 trade as FLIP_LONG

  if pnl_abs > $397 and leverage > 8.6:
      tag next 2 trades as FLIP_LONG

  if 0 < pnl_abs < $250 and pnl_pct >= +0.75%:
      tag next 1 trade as FLIP_LONG

  after the armed slots are consumed:
      reset to SHORT

EFSM semantics:

this is a slot-based Execution FSM, not a persistent regime switch
each trigger arms an explicit number of future slots
each future entry consumes exactly one slot
when slots_remaining == 0, the state resets to SHORT
while slots are active, new triggers are ignored by default
a flipped LONG trade outcome is not allowed to re-arm the overlay
this prevents the reset bug where one flipped trade recursively arms the next and converts a bounded rebound probe into an unbounded side switch
the implementation supports arbitrary future slot counts, not only 1 and 2
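The semantics above can be illustrated with a compact slot machine. This is an independent sketch, not the code in adaptive_exit/post_win_long_overlay.py: the class name, method names, rule precedence (the two-slot rule is checked first), and the percent units for `pnl_pct` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SlotOverlaySketch:
    """Illustrative slot-based overlay; not the repository implementation."""
    slots_remaining: int = 0
    ignored_rearms: int = 0

    def side_for_next_entry(self) -> str:
        """Each entry consumes exactly one armed slot; otherwise trade SHORT."""
        if self.slots_remaining > 0:
            self.slots_remaining -= 1
            return "LONG"
        return "SHORT"

    def on_trade_closed(self, side: str, pnl_abs: float,
                        leverage: float, pnl_pct: float) -> None:
        """Arm slots from a completed SHORT; flipped LONG outcomes never arm."""
        if side != "SHORT":
            return
        slots = self._slots_for(pnl_abs, leverage, pnl_pct)
        if slots == 0:
            return
        if self.slots_remaining > 0:
            self.ignored_rearms += 1  # triggers are ignored while slots are active
            return
        self.slots_remaining = slots

    @staticmethod
    def _slots_for(pnl_abs: float, leverage: float, pnl_pct: float) -> int:
        # ASSUMED precedence: the high-leverage two-slot rule wins over one slot.
        if pnl_abs > 397 and leverage > 8.6:
            return 2
        if pnl_abs > 397:
            return 1
        if 0 < pnl_abs < 250 and pnl_pct >= 0.75:  # percent units assumed
            return 1
        return 0


fsm = SlotOverlaySketch()
fsm.on_trade_closed("SHORT", pnl_abs=500.0, leverage=9.0, pnl_pct=1.2)  # arms 2 slots
first = fsm.side_for_next_entry()
fsm.on_trade_closed("LONG", pnl_abs=600.0, leverage=9.0, pnl_pct=1.5)   # cannot re-arm
fsm.on_trade_closed("SHORT", pnl_abs=450.0, leverage=5.0, pnl_pct=0.9)  # ignored: active
second = fsm.side_for_next_entry()
fsm.on_trade_closed("LONG", pnl_abs=800.0, leverage=9.0, pnl_pct=2.0)   # cannot re-arm
third = fsm.side_for_next_entry()
print(first, second, third, fsm.ignored_rearms)  # LONG LONG SHORT 1
```

Keeping the slot counter as the only state makes the reset invariant easy to test: once the counter reaches zero, nothing but a completed SHORT trade can raise it again.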

Implementation location:

EFSM: adaptive_exit/post_win_long_overlay.py
canonical class names: PostWinExecutionFSM, PostWinExecutionFSMConfig
compatibility aliases: PostWinLongOverlay, PostWinLongOverlayConfig
tests: prod/tests/test_post_win_long_overlay.py

Focused test coverage:

$397+ non-high-leverage win arms one slot
$397+ and lev > 8.6 arms two slots
< $250 and pnl_pct >= +0.75% arms one slot
active arms consume deterministically and reset to SHORT
re-arm attempts while active are ignored
flipped LONG outcomes cannot re-arm
optional TTL expiry works
future 3+ slot rules work

Focused verification:

python -m pytest -o cache_dir=/tmp/pytest-cache-post-win-overlay \
  prod/tests/test_post_win_long_overlay.py -q

7 passed

Exact candidate replay, no re-arm during active flip slots:

input: 1333 cleaned BLUE trades through 2026-05-08 14:34:57 UTC
baseline short-only estimated PnL: +$10,953.50
candidate policy estimated PnL: +$12,464.30
marginal dollar delta: +$1,510.80
baseline max DD: 15.70%
candidate max DD: 14.78%
long-flipped trades: 160
affected subset left SHORT: -$2,415.46
affected subset flipped LONG: -$904.67
affected subset marginal delta: +$1,510.80
triggers armed:
- small_dollar_high_return: 77
- big_win_high_lev: 41
- big_win: 1
slots consumed:
- small_dollar_high_return: 77
- big_win_high_lev: 82
- big_win: 1
consumed arms: 119
dangling slots at end: 0
ignored re-arm attempts while active: 20

Reset sensitivity:

Allowing active flipped trades / active arms to re-arm is harmful:

unsafe recursive re-arm variant long flips: 183
unsafe marginal delta: -$5,425.32
safe no-rearm marginal delta: +$1,510.80

Therefore the no-recursive-rearm reset invariant is not optional. It is part of the edge definition.

Compound-return caveat:

baseline short-only compound: +164.89%
candidate compound: +107.26%

This is why the overlay must be treated as a dollar-tail / drawdown-control overlay first, not as a compounding optimizer. The current counterfactual uses same entry/exit skeleton and estimated flipped LONG PnL, so the next validation step must include actual LONG execution assumptions, long-side V7 behavior, and time-to-next-entry gating.

Time dependency:

The replay showed material timing dependence:

| Delay from trigger to flipped entry | n | SHORT PnL | LONG PnL | Delta |
| --- | --- | --- | --- | --- |
| `<=15m` | 19 | `+$2,765.51` | `-$3,062.37` | `-$5,827.88` |
| `15-30m` | 67 | `-$3,588.76` | `+$2,381.96` | `+$5,970.72` |
| `30-60m` | 40 | `-$882.57` | `-$104.33` | `+$778.24` |
| `>60m` | 34 | `-$709.64` | `-$119.93` | `+$589.72` |

This means the overlay may need a lower-bound delay, an upper-bound TTL, or market-state confirmation. The current EFSM already supports TTL; the exact timing gate remains research, not deployed doctrine.
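As a quick arithmetic check, the delay buckets can be summed to confirm they reconcile with the replay totals (160 flipped trades, `+$1,510.80` marginal delta). The last line shows what the same in-sample numbers would give if the `<=15m` bucket had simply been left SHORT. That figure is illustrative arithmetic on this replay only, not a validated timing gate.

```python
from decimal import Decimal as D

# (bucket, n, short_pnl, long_pnl, delta) copied from the timing table.
ROWS = [
    ("<=15m", 19, D("2765.51"), D("-3062.37"), D("-5827.88")),
    ("15-30m", 67, D("-3588.76"), D("2381.96"), D("5970.72")),
    ("30-60m", 40, D("-882.57"), D("-104.33"), D("778.24")),
    (">60m", 34, D("-709.64"), D("-119.93"), D("589.72")),
]

total_n = sum(r[1] for r in ROWS)
total_short = sum(r[2] for r in ROWS)
total_long = sum(r[3] for r in ROWS)
total_delta = sum(r[4] for r in ROWS)

# Hypothetical lower-bound delay gate: the fastest bucket stays SHORT,
# so its delta contribution is dropped.
delta_without_fast = sum(r[4] for r in ROWS if r[0] != "<=15m")

print(total_n, total_short, total_long, total_delta, delta_without_fast)
```

The per-bucket SHORT and LONG columns reproduce the affected-subset totals; individual deltas can differ from `LONG - SHORT` by a cent because of rounding in the source figures.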

AdvancedExitManagerV7 / AlphaExitEngineV7 caveat:

AlphaExitEngineV7 is mechanically side-aware:

side=0 means LONG
side=1 means SHORT
PnL, MFE, MAE, trend direction, and adverse/favorable movement are signed by ctx.side
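A minimal sketch of what "signed by `ctx.side`" means in practice, using the side encoding above. The function names are hypothetical and this is not the engine's code; MFE and MAE are assumed here to be non-negative fractions of the entry price.

```python
from typing import Sequence, Tuple

LONG, SHORT = 0, 1  # side encoding from the list above


def signed_return(entry: float, price: float, side: int) -> float:
    """Fractional PnL, positive when price moves in the position's favor."""
    direction = 1.0 if side == LONG else -1.0
    return direction * (price - entry) / entry


def mfe_mae(entry: float, prices: Sequence[float], side: int) -> Tuple[float, float]:
    """Max favorable / max adverse excursion over a price path (both >= 0)."""
    rets = [signed_return(entry, p, side) for p in prices]
    mfe = max(0.0, max(rets))
    mae = max(0.0, -min(rets))
    return mfe, mae
```

On the same path the two sides simply swap roles: what is MFE for a LONG is MAE for a SHORT. The mechanics are symmetric even though the fitted thresholds need not be.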

However, V7 calibration is SHORT-lineage:

bounce model labels were trained on BLUE SHORT adverse-bar samples
pressure threshold 2.69 was selected on SHORT/GREEN-lineage replay
MAE/MFE concepts are symmetric in code but not guaranteed symmetric in fitted thresholds or bounce probabilities

Before any live FLIP_LONG execution, V7 must be validated in one of these modes:

shadow-only LONG contexts using actual flipped LONG entries
conservative LONG-specific V7 threshold override
disable V7 live exits for overlay LONGs until enough shadow decisions show it does not prematurely cut the rebound edge

The rule can be codified, but production wiring must keep the EFSM, side selection, and V7 exit policy explicitly separable.

Sweep results:

best by compounded return:
- mode: one-shot after win
- trigger_lev = 9.0
- trade_min_lev = 0.0
- traded: 1324
- LONG trades: 222
- WR 50.91%
- compounded return +61.93%
- max DD 19.36%
- PnL -$257.03
best by estimated dollars:
- mode: one-shot after win
- trigger_lev = 2.0
- trade_min_lev = 0.69
- traded: 1050
- LONG trades: 297
- WR 40.03%
- compounded return +27.71%
- max DD 22.44%
- PnL +$5,114.50

Both sweep optima still underperform the relevant short-only baselines. In particular, simply treating high leverage as a short-side quality filter is stronger than using high-leverage short wins as a broad long-switch trigger:

lev >= 8.5, short-only: PnL +$12,193.65, max DD 7.58%
best long-switch dollar policy: PnL +$5,114.50, max DD 22.44%

Interpretation:

leverage does behave like conviction, but the first-order use is filtering / sizing, not side inversion
ordinary high-lev wins are too common to serve as a LONG regime switch
the previous post-outlier result survives only because it was much narrower: large dollar win, 9x, and immediate next trade
high-lev wins may still be useful as features in the dual-shadow / market-fingerprint layer:
- last_high_lev_short_win
- last_high_lev_short_win_count
- last_high_lev_short_win_pnl_abs
- last_high_lev_short_win_return_pct
- bars_since_high_lev_short_win
- consecutive_high_lev_short_wins

Research conclusion:

do not implement the literal lev > 0.70 long switch
do preserve leverage as a strong conviction feature
do keep the narrower post-outlier one-trade long probe in the research queue
the strongest immediate operational lesson is that low-leverage trades may be unnecessary, while high-leverage shorts remain the cleaner expression

AlphaExitEngineV7 LONG calibration replay

Date: 2026-05-08

Scope:

system: BLUE only
exit engine: AlphaExitEngineV7
harness: adaptive_exit/calibrate_v7_long_from_journal.py
source data: ClickHouse dolphin.v7_decision_events
source rows: 6,812
reconstructed BLUE V7-tracked paths: 97
path side in source journal: SHORT
replay side for calibration: synthetic LONG (side=0)
fee assumption: 4 bps
natural exit comparator: final logged decision-row price for the same path
V7 exit comparator: first replayed V7 EXIT on the same price path
bounce model: disabled for this replay by intentionally using a missing model path, because the current bounce model is trained on BLUE SHORT adverse-bar samples and should not be treated as a validated LONG probability model

This is a LONG-exit calibration proxy, not proof from exchange-filled LONG trades. It answers a narrower question: if the post-win EFSM had flipped a trade LONG on price paths that BLUE V7 actually observed, would a LONG-side V7 cut/exit surface have improved or harmed the synthetic LONG outcome versus holding to the path's natural end?
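The two comparators in the scope list can be sketched as follows. This is an illustration under stated assumptions, not the harness in `adaptive_exit/calibrate_v7_long_from_journal.py`: the entry is taken to be the first logged price of the path, the 4 bps fee is assumed to be charged once per path on notional, and `should_exit` is a hypothetical stand-in for the replayed V7 decision.

```python
from typing import Callable, Sequence, Tuple

FEE_BPS = 4.0  # fee assumption from the scope list


def long_pnl_usd(entry: float, exit_price: float, notional: float,
                 fee_bps: float = FEE_BPS) -> float:
    """Dollar PnL of a synthetic LONG, net of a single fee on notional (assumed)."""
    gross = (exit_price - entry) / entry
    return notional * (gross - fee_bps / 10_000.0)


def compare_path(prices: Sequence[float], notional: float,
                 should_exit: Callable[[int, float], bool]) -> Tuple[float, float, float]:
    """Return (natural_pnl, v7_pnl, delta) for one replayed price path."""
    entry = prices[0]
    natural = long_pnl_usd(entry, prices[-1], notional)
    exit_price = prices[-1]  # if V7 never fires, the path ends naturally
    for i, price in enumerate(prices[1:], start=1):
        if should_exit(i, price):
            exit_price = price  # first replayed EXIT wins
            break
    v7 = long_pnl_usd(entry, exit_price, notional)
    return natural, v7, v7 - natural
```

When the exit surface never fires, both comparators coincide and the delta is exactly zero, which is the expected behavior for a threshold the clamped pressure cannot exceed.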

Original V7 SHORT calibration pattern

The original V7 calibration was a pressure-threshold sweep over live shadow decisions. V7 computes:

exit_pressure = clamp(directional_term + risk_term, -3.0, +3.0)

Then:

if exit_pressure > 2.69:
    EXIT
elif exit_pressure > 1.0:
    RETRACT
elif exit_pressure < -0.5 and pnl_pct > 0:
    EXTEND
else:
    HOLD
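The same decision surface as a runnable Python sketch. The function name `decide` is hypothetical; the keyword defaults mirror the deployed values, and the engine's actual computation of `directional_term` and `risk_term` is not reproduced here.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def decide(directional_term: float, risk_term: float, pnl_pct: float,
           exit_threshold: float = 2.69,
           retract_threshold: float = 1.0,
           extend_threshold: float = -0.5,
           pressure_min: float = -3.0,
           pressure_max: float = 3.0) -> str:
    """Map clamped exit pressure to one of EXIT / RETRACT / EXTEND / HOLD."""
    pressure = clamp(directional_term + risk_term, pressure_min, pressure_max)
    if pressure > exit_threshold:
        return "EXIT"
    if pressure > retract_threshold:
        return "RETRACT"
    if pressure < extend_threshold and pnl_pct > 0:
        return "EXTEND"
    return "HOLD"
```

Because the comparison is strict and pressure is clamped at `+3.0`, an exit threshold of `3.0` can never produce `EXIT` under this reading of the rule.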

The documented SHORT lineage was:

| Pressure threshold | Fires | Result |
| --- | --- | --- |
| `2.00` | `22/24` | `+$439`, ROI `+1.67%` |
| `2.35` | `17/24` | `+$891`, ROI `+3.38%` |
| `2.60` | `17/24` | `+$891`, ROI `+3.38%` |
| `3.00` | `14/24` | `+$796`, ROI `+3.02%` |
| base/no V7 | n/a | `+$784`, ROI `+2.98%` |

The deployed threshold 2.69 was chosen as the high end of the useful 2.35-2.70 band so V7 stayed closer to base behavior and avoided cutting winners on transient pressure.

Threshold surface now explicit

AlphaExitEngineV7 now accepts an optional per-engine AlphaExitV7Config. Defaults preserve the deployed SHORT-calibrated behavior. This lets BLUE instantiate separate SHORT and LONG V7 engines later without global mutation.

V7-specific configurable fields:

| Config field | Default | Meaning |
| --- | --- | --- |
| `rvol_w15` | `0.50` | realized-vol composite weight for 15-bar volatility |
| `rvol_w30` | `0.30` | realized-vol composite weight for 30-bar volatility |
| `rvol_w50` | `0.20` | realized-vol composite weight for 50-bar volatility |
| `rvol_floor` | `0.000001` | minimum realized-vol denominator |
| `mae_tier1_k` | `3.5` | MAE tier-1 multiplier on `rv_comp` |
| `mae_tier2_k` | `7.0` | MAE tier-2 multiplier on `rv_comp` |
| `mae_tier3_k` | `12.0` | MAE tier-3 multiplier on `rv_comp` |
| `mae_tier1_floor` | `0.005` | MAE tier-1 absolute floor |
| `mae_tier2_floor` | `0.012` | MAE tier-2 absolute floor |
| `mae_tier3_floor` | `0.025` | MAE tier-3 absolute floor |
| `mae_tier1_risk` | `0.5` | pressure contribution once tier 1 is breached |
| `mae_tier2_risk` | `0.8` | pressure contribution once tier 2 is breached |
| `mae_tier3_risk` | `1.2` | pressure contribution once tier 3 is breached |
| `mae_accel_min_bars` | `3` | minimum bars before adverse-acceleration gate can fire |
| `mae_accel_peak_floor` | `0.003` | adverse peak floor for MAE acceleration risk |
| `mae_accel_risk` | `0.6` | pressure contribution for MAE acceleration |
| `mae_recovery_peak_floor` | `0.004` | adverse peak floor for failed-recovery gate |
| `mae_recovery_prev_min` | `0.25` | prior recovery ratio required before snapback risk |
| `mae_recovery_snapback_max` | `0.10` | recovery ratio below which recovery is treated as failed |
| `mae_recovery_risk` | `1.0` | pressure contribution for failed recovery |
| `mae_late_floor` | `0.003` | MAE required before late adverse ramp applies |
| `mae_late_start_frac` | `0.60` | bars-held fraction where late adverse ramp starts |
| `mae_late_risk_max` | `0.4` | maximum late adverse pressure contribution |
| `max_hold_ref_mult_3m` | `3.0` | V7 internal max-hold reference multiplier |
| `mfe_slope_peak_floor` | `0.01` | peak favorable floor for convexity slope break |
| `mfe_convexity_decay_exit` | `0.35` | decay ratio for hard MFE giveback pressure |
| `mfe_convexity_decay_soft` | `0.20` | decay ratio for soft MFE giveback pressure |
| `mfe_convexity_exit_risk` | `1.5` | pressure contribution for hard MFE giveback |
| `mfe_convexity_soft_risk` | `0.3` | pressure contribution for soft MFE giveback |
| `mfe_accel_floor` | `-0.00001` | MFE acceleration floor for adverse convexity |
| `mfe_accel_peak_floor` | `0.005` | peak favorable floor for MFE acceleration risk |
| `mfe_accel_risk` | `0.2` | pressure contribution for MFE acceleration risk |
| `bounce_dir_w` | `0.15` | bounce score directional-term weight |
| `bounce_risk_w` | `0.35` | bounce risk-term weight |
| `bounce_rv_safe_floor` | `0.00001` | bounce feature volatility denominator floor |
| `exit_pressure_threshold` | `2.69` | live `EXIT` threshold |
| `retract_pressure_threshold` | `1.0` | `RETRACT` threshold |
| `extend_pressure_threshold` | `-0.5` | profitable `EXTEND` threshold |
| `pressure_min` | `-3.0` | pressure clamp lower bound |
| `pressure_max` | `3.0` | pressure clamp upper bound |

Inherited V6 weight priors remain configurable through the existing WeightAdapter/WeightPriors seam. The new config is specifically for V7 threshold/gate surfaces and is init-time/per-engine configurable.
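To make the MAE tier fields concrete, here is one plausible reading of how they could combine. The field names and defaults come from the table; the combination logic is an assumption, not confirmed by this note: each tier's threshold is taken as `max(floor, k * rv_comp)`, only the highest breached tier contributes, and `mae` is a non-negative fraction of entry price. `MaeTierSketch` is a hypothetical stand-in for the corresponding subset of `AlphaExitV7Config`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaeTierSketch:
    # Defaults copied from the config table.
    rvol_w15: float = 0.50
    rvol_w30: float = 0.30
    rvol_w50: float = 0.20
    rvol_floor: float = 0.000001
    mae_tier1_k: float = 3.5
    mae_tier2_k: float = 7.0
    mae_tier3_k: float = 12.0
    mae_tier1_floor: float = 0.005
    mae_tier2_floor: float = 0.012
    mae_tier3_floor: float = 0.025
    mae_tier1_risk: float = 0.5
    mae_tier2_risk: float = 0.8
    mae_tier3_risk: float = 1.2


def rv_comp(cfg: MaeTierSketch, rv15: float, rv30: float, rv50: float) -> float:
    """Weighted realized-vol composite, floored to keep it usable as a scale."""
    blended = cfg.rvol_w15 * rv15 + cfg.rvol_w30 * rv30 + cfg.rvol_w50 * rv50
    return max(cfg.rvol_floor, blended)


def mae_tier_risk(cfg: MaeTierSketch, mae: float, rv: float) -> float:
    """Pressure contribution of the highest breached MAE tier (assumed logic)."""
    tiers = [
        (cfg.mae_tier3_k, cfg.mae_tier3_floor, cfg.mae_tier3_risk),
        (cfg.mae_tier2_k, cfg.mae_tier2_floor, cfg.mae_tier2_risk),
        (cfg.mae_tier1_k, cfg.mae_tier1_floor, cfg.mae_tier1_risk),
    ]
    for k, floor, risk in tiers:
        if mae >= max(floor, k * rv):  # vol-normalized threshold with absolute floor
            return risk
    return 0.0
```

Under this reading, the floors dominate in quiet markets and the `k * rv_comp` terms widen the loss-cut thresholds as realized volatility rises.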

LONG replay results

Baseline synthetic LONG natural exit across the 97 paths:

natural PnL: -$328.84
natural WR: 59.79%
natural compound: +3.50%
natural max DD: 2.28%

The dollar PnL and compound can diverge because path notionals differ. For this exit calibration, dollar PnL is the more relevant metric because BLUE sizing is not uniform.
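A toy example of that divergence, using invented numbers rather than replay data. It assumes the compound figure is built from per-path returns without notional weighting, which this note does not state explicitly.

```python
import math

# Toy per-path returns and notionals (not from the replay).
returns = [0.01, 0.01, -0.015]
notionals = [1_000.0, 1_000.0, 5_000.0]

# Dollar PnL weights each return by its path notional.
dollar_pnl = sum(n * r for n, r in zip(notionals, returns))

# Compound return treats every path as an equal-weight step (assumed).
compound = math.prod(1.0 + r for r in returns) - 1.0

print(f"dollar PnL: {dollar_pnl:+.2f}, compound: {compound:+.4%}")
```

Two small winners and one larger-notional loser give a negative dollar total alongside a positive compound, the same sign pattern as the natural-exit baseline above.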

Top tested surfaces:

| Candidate | V7 PnL | Delta vs natural | Exits | Exit rate | V7 WR | V7 max DD |
| --- | --- | --- | --- | --- | --- | --- |
| `mfe_risk_scale_0.5` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `mfe_risk_scale_0.75` | `+$205.32` | `+$534.15` | `36` | `37.11%` | `50.52%` | `1.69%` |
| `combo_p1.7_mae0.75` | `+$47.24` | `+$376.08` | `51` | `52.58%` | `47.42%` | `1.55%` |
| `exit_p1.7` | `+$36.88` | `+$365.72` | `51` | `52.58%` | `47.42%` | `1.53%` |
| `exit_p2.0` | `+$19.68` | `+$348.52` | `41` | `42.27%` | `49.48%` | `1.53%` |
| `short_default` / `exit_p2.69` | `+$1.43` | `+$330.26` | `38` | `39.18%` | `49.48%` | `1.81%` |
| `exit_p3.0` | `-$328.84` | `$0.00` | `0` | `0.00%` | `59.79%` | `2.28%` |

Interpretation:

The deployed SHORT default is not mechanically broken for LONG. It improved synthetic LONG dollar outcome by +$330.26 versus natural exit on the 97 replayed paths.
The best tested LONG proxy did not come from lowering the pressure threshold. It came from reducing MFE giveback/convexity pressure contribution (mfe_risk_scale_0.5 or 0.75).
Aggressively lowering exit_pressure_threshold to 1.4 over-fires: 78/97 exits, V7 PnL -$11.78, and many negative deltas. That resembles the original SHORT calibration failure at 2.0: pressure that is too sensitive cuts too much transient noise.
A moderate pressure threshold around 1.7-2.0 is useful, but still inferior to leaving pressure at 2.69 and reducing MFE-risk contributions in this proxy.

Recommended LONG overlay calibration candidate for shadow:

AlphaExitV7Config(
    mfe_convexity_exit_risk=0.75,
    mfe_convexity_soft_risk=0.15,
    mfe_accel_risk=0.10,
)

This is the mfe_risk_scale_0.5 surface. It keeps:

exit_pressure_threshold = 2.69
all MAE vol-normalized loss-cut thresholds unchanged
pressure clamp unchanged
bounce disabled or neutral until a LONG-trained bounce model exists
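The candidate values are the three default MFE risk contributions scaled by 0.5. A minimal sketch of that derivation, using a hypothetical frozen dataclass as a stand-in for the relevant `AlphaExitV7Config` fields (the real class is not reproduced here):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class V7MfeRiskSketch:
    # Hypothetical stand-in; defaults copied from the config table.
    mfe_convexity_exit_risk: float = 1.5
    mfe_convexity_soft_risk: float = 0.3
    mfe_accel_risk: float = 0.2
    exit_pressure_threshold: float = 2.69


def scale_mfe_risk(cfg: V7MfeRiskSketch, scale: float) -> V7MfeRiskSketch:
    """Return a new config with only the MFE risk contributions scaled."""
    return replace(
        cfg,
        mfe_convexity_exit_risk=cfg.mfe_convexity_exit_risk * scale,
        mfe_convexity_soft_risk=cfg.mfe_convexity_soft_risk * scale,
        mfe_accel_risk=cfg.mfe_accel_risk * scale,
    )


SHORT_DEFAULT = V7MfeRiskSketch()
LONG_SHADOW = scale_mfe_risk(SHORT_DEFAULT, 0.5)
```

Using an immutable config and `dataclasses.replace` keeps the SHORT default untouched, in line with the per-engine, no-global-mutation intent described earlier.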

Why this candidate is preferable to simply lowering exit_pressure_threshold:

it preserved the useful loss-cut behavior while avoiding broad pressure over-firing
it improved dollar PnL more than all pressure-threshold sweeps tested
it left MAE protection intact, which matters if the flipped LONG thesis is wrong and the asset continues down
it respects that the post-win EFSM edge is a rebound/cooldown edge, so the exit manager should not over-penalize ordinary post-entry MFE shape

Do not deploy this LONG config live yet. It should first be run in shadow on actual EFSM-flipped candidate LONG contexts, because this replay uses SHORT entries inverted to LONG and not real LONG fills.

Regression and safety notes

Implemented code seams:

nautilus_dolphin/nautilus_dolphin/nautilus/alpha_exit_v7_engine.py defines AlphaExitV7Config
default AlphaExitEngineV7() behavior remains the SHORT-calibrated config
a LONG-specific engine can be instantiated with AlphaExitEngineV7(config=...)
the calibration harness writes full results to /tmp/v7_long_calibration.json

Tests added:

default config equals the legacy SHORT threshold surface
custom config is per-instance and does not mutate the default engine
V7 remains mechanically side-aware for LONG and SHORT PnL/MFE/MAE
BLUE live V7 provider wiring still records journal decisions and uses OB signal input
EFSM reset/no-recursive-rearm tests remain separate from V7 exit calibration

Research caveats:

only 97 V7-tracked BLUE paths existed in the current decision journal
this is enough to reject obviously bad LONG exit settings, but not enough to canonize a live LONG exit policy
bounce must remain neutral for LONG until trained or validated on LONG samples
V7 max_hold_ref_mult_3m still uses an internal time reference rather than the orchestrator's effective max hold; the system bible already tracks this as a V7 TODO/bug because it can make late adverse-ramp pressure apply too early
