DOLPHIN/nautilus_dolphin/Tail_Stats_FINAL_TEST.md

Prompt: Definitive Convex Hazard Validation for Micro-Entropy (v50)
You are to rigorously test whether extreme micro-entropy (v50_lambda_max_velocity) represents a true convex hazard zone that justifies nonlinear leverage tapering.
The goal is not correlation. The goal is economic exploitability.
Follow these steps precisely.
1️⃣ Data Requirements
Use:
Entire available historical dataset, run through the fast vbt backtesting engine
Same production trading engine
Same 6.0x leverage ceiling
No modification to signal logic
T-1 precursor alignment only (strict shift)
Define:
Daily return
Tail event = bottom 10% of daily returns (fixed percentile, global)
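The definitions above can be sketched as follows. This is a minimal standard-library illustration, not the production pipeline: the helper names (`percentile`, `label_tails`, `align_t_minus_1`) are hypothetical, and the inputs are assumed to be plain lists of daily values covering the same days. The tail cutoff is computed once on the full return series (global), and the strict shift pairs v50 from day t-1 with the tail label of day t.

```python
def percentile(values, q):
    """Linear-interpolation percentile; q is in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def label_tails(daily_returns, tail_pct=10.0):
    """Flag days at or below the global tail_pct percentile of daily returns."""
    cutoff = percentile(daily_returns, tail_pct)
    return [r <= cutoff for r in daily_returns]


def align_t_minus_1(v50, is_tail):
    """Pair v50[t-1] with is_tail[t]; day 0 has no precursor and is dropped."""
    if len(v50) != len(is_tail):
        raise ValueError("series must cover the same days")
    return v50[:-1], is_tail[1:]
```

Labeling tails before aligning keeps the percentile global; only the pairing of signal and outcome is shifted.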
2️⃣ Core Conditional Hazard Curve
Compute:
Baseline:
P(Tail)
Then for v50 (T-1):
For thresholds:
75th percentile
85th percentile
90th percentile
95th percentile
97.5th percentile
99th percentile
Compute:
P(Tail | v50 > threshold)
Also record:
Number of days above threshold
Number of tail days inside threshold
95% confidence interval (Wilson or exact binomial)
Output full hazard curve.
We are looking for nonlinear convex acceleration, not linear drift.
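One way to compute the hazard curve with Wilson intervals is sketched below. It assumes `v50_lag` is already shifted to T-1 and `is_tail` is the matching list of tail flags; the function names and the row layout are illustrative, not part of the production engine.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile; q is in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (z=1.96 for 95%)."""
    if n == 0:
        return float("nan"), float("nan")
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def hazard_curve(v50_lag, is_tail, pcts=(75, 85, 90, 95, 97.5, 99)):
    """Return baseline P(Tail) and one row per threshold percentile."""
    baseline = sum(is_tail) / len(is_tail)
    rows = []
    for q in pcts:
        thr = percentile(v50_lag, q)
        hits = [t for s, t in zip(v50_lag, is_tail) if s > thr]
        n, k = len(hits), sum(hits)
        p = k / n if n else float("nan")
        ci_low, ci_high = wilson_interval(k, n)
        rows.append({
            "pct": q, "threshold": thr, "n_days": n, "n_tail": k,
            "p_tail": p, "lift": p / baseline if baseline else float("nan"),
            "ci_low": ci_low, "ci_high": ci_high,
        })
    return baseline, rows
```

At the 97.5th and 99th percentiles the day counts are small, so the interval width matters as much as the point estimate when judging convexity.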
3️⃣ Economic Viability Test (CRITICAL)
For each threshold:
Compute:
Mean return on spike days
Mean return on non-spike days
Median return
Standard deviation
Contribution of spike days to total CAGR
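The per-threshold statistics above might be computed as in this sketch. The "contribution to total CAGR" is read here as the spike days' share of total log growth, which is one possible interpretation and an assumption of this example. `is_spike` is assumed to be the T-1 flag `v50 > threshold`, and all returns are assumed to be greater than -100%.

```python
import math
import statistics


def summarize(xs):
    """Count, mean, median and sample standard deviation of a list of returns."""
    return {
        "n": len(xs),
        "mean": statistics.fmean(xs) if xs else float("nan"),
        "median": statistics.median(xs) if xs else float("nan"),
        "std": statistics.stdev(xs) if len(xs) > 1 else float("nan"),
    }


def spike_day_stats(daily_returns, is_spike):
    """Compare spike days with non-spike days for one threshold."""
    spike = [r for r, s in zip(daily_returns, is_spike) if s]
    calm = [r for r, s in zip(daily_returns, is_spike) if not s]
    # Log returns add up across days, so the share below is well defined.
    total_log = sum(math.log1p(r) for r in daily_returns)
    spike_log = sum(math.log1p(r) for r in spike)
    share = spike_log / total_log if total_log != 0 else float("nan")
    return {
        "spike": summarize(spike),
        "non_spike": summarize(calm),
        "spike_log_growth_share": share,
    }
```

A negative share means spike days subtracted from cumulative growth over the sample.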
Then simulate:
Scenario A: Static 6.0x (baseline)
Scenario B: 6.0x with taper when v50 > threshold
(e.g., reduce leverage to 5.0x or apply 0.8 multiplier)
For each scenario, report:
Median CAGR
5th percentile CAGR
P(max DD > 40%)
Median max DD
Terminal wealth distribution (Monte Carlo, 1,000+ paths)
If tapering:
Reduces DD materially
Does not reduce median CAGR significantly
Improves 5th percentile CAGR
→ Hazard is economically real.
If CAGR drops more than DD improves:
→ It is volatility clustering, not exploitable convexity.
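The scenario comparison can be approximated outside the engine as sketched below. Several assumptions are specific to this example: daily returns at 6.0x are taken as given, the taper is modeled by scaling spike-day returns (5.0/6.0 for a 5.0x cap), paths are generated with a circular block bootstrap, and a 365-day year is used. None of these come from the production engine, which should remain the reference for final numbers.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile; q is in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def cagr(daily_returns, periods_per_year=365):
    equity = 1.0
    for r in daily_returns:
        equity *= 1.0 + r
    if equity <= 0.0:
        return -1.0  # account wiped out
    return equity ** (periods_per_year / len(daily_returns)) - 1.0


def max_drawdown(daily_returns):
    equity = peak = 1.0
    worst = 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def apply_taper(daily_returns, is_spike, multiplier=5.0 / 6.0):
    """Scenario B: scale returns on spike days, leave other days unchanged."""
    return [r * multiplier if s else r for r, s in zip(daily_returns, is_spike)]


def monte_carlo(daily_returns, n_paths=1000, block=5, dd_limit=0.40, seed=0):
    """Circular block bootstrap; the same seed yields the same resampled days."""
    rng = random.Random(seed)
    n = len(daily_returns)
    cagrs, dds = [], []
    for _ in range(n_paths):
        path = []
        while len(path) < n:
            start = rng.randrange(n)
            path.extend(daily_returns[(start + i) % n] for i in range(block))
        path = path[:n]
        cagrs.append(cagr(path))
        dds.append(max_drawdown(path))
    return {
        "median_cagr": percentile(cagrs, 50),
        "p5_cagr": percentile(cagrs, 5),
        "median_max_dd": percentile(dds, 50),
        "p_dd_over_limit": sum(d > dd_limit for d in dds) / n_paths,
    }
```

Calling `monte_carlo(static_returns, seed=s)` and `monte_carlo(apply_taper(static_returns, is_spike), seed=s)` with the same seed resamples the same days in both scenarios, so the comparison is paired rather than confounded by sampling noise.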
4️⃣ Stability / Overfit Check
Split data:
First 50%
Second 50%
Compute hazard curve independently.
If convexity disappears out-of-sample, discard hypothesis.
Then estimate the hazard over a rolling 60-day window and check that the lift is consistent.
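A possible shape for the stability checks is sketched here. The choices are assumptions of the example: lift is measured at a single percentile (95th by default), each half uses its own threshold and its own baseline, and the rolling estimate uses a fixed threshold passed in by the caller with a within-window baseline. Windows with no spike days or no tail days return `None` instead of a lift.

```python
def percentile(values, q):
    """Linear-interpolation percentile; q is in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def lift(v50_lag, is_tail, threshold):
    """P(Tail | v50 > threshold) / P(Tail), or None when undefined."""
    base = sum(is_tail) / len(is_tail)
    hits = [t for s, t in zip(v50_lag, is_tail) if s > threshold]
    if not hits or base == 0:
        return None
    return (sum(hits) / len(hits)) / base


def split_half_lift(v50_lag, is_tail, q=95):
    """Lift in the first and second half, each computed independently."""
    mid = len(v50_lag) // 2
    halves = ((v50_lag[:mid], is_tail[:mid]), (v50_lag[mid:], is_tail[mid:]))
    return tuple(lift(s, t, percentile(s, q)) for s, t in halves)


def rolling_lift(v50_lag, is_tail, threshold, window=60):
    """Lift over each trailing window of `window` days."""
    return [
        lift(v50_lag[i - window:i], is_tail[i - window:i], threshold)
        for i in range(window, len(v50_lag) + 1)
    ]
```

With a 60-day window and a 95th-percentile threshold, only about three spike days are expected per window, so many windows will be undefined or very noisy; the fraction of defined windows is worth reporting alongside the lift itself.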
5️⃣ Interaction Test
Test whether hazard strengthens when:
v50 > 95th AND cross_corr > 95th
Compute:
P(Tail | joint condition)
If joint hazard > 50% with low frequency, this may justify stronger taper.
If not, keep taper mild.
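The joint condition could be evaluated as in the sketch below. It assumes `cross_corr` is lagged to T-1 in the same way as v50, which the steps above do not state explicitly, and that both thresholds are global percentiles of their own series.

```python
def percentile(values, q):
    """Linear-interpolation percentile; q is in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def joint_hazard(v50_lag, cross_corr_lag, is_tail, q=95):
    """P(Tail | v50 > q-th pct AND cross_corr > q-th pct) plus its frequency."""
    thr_v50 = percentile(v50_lag, q)
    thr_cc = percentile(cross_corr_lag, q)
    hits = [
        t for a, b, t in zip(v50_lag, cross_corr_lag, is_tail)
        if a > thr_v50 and b > thr_cc
    ]
    n, k = len(hits), sum(hits)
    return {
        "n_days": n,
        "n_tail": k,
        "frequency": n / len(is_tail),
        "p_tail": k / n if n else None,
    }
```

Because the joint condition is rare by construction, `n_days` should be read together with `p_tail`; a 50% hazard on a handful of days is weak evidence.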
6️⃣ Randomization Sanity Check
Shuffle daily returns (this destroys the temporal relation between v50 and returns). Recompute hazard curve.
If similar convexity appears in the shuffled data, the signal is a statistical artifact.
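The shuffle check can be run as a permutation test, sketched below. Since the tail event is a fixed global percentile, shuffling the tail flags is equivalent to shuffling the daily returns. The single-threshold focus (95th percentile) and the function name are choices made for this example; repeating the test at each threshold would cover the full curve.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile; q is in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def permutation_pvalue(v50_lag, is_tail, q=95, n_shuffles=1000, seed=0):
    """Observed P(Tail | spike) and the share of shuffles that match or beat it."""
    rng = random.Random(seed)
    thr = percentile(v50_lag, q)
    is_spike = [s > thr for s in v50_lag]

    def p_tail_given_spike(labels):
        hits = [t for sp, t in zip(is_spike, labels) if sp]
        return sum(hits) / len(hits) if hits else 0.0

    observed = p_tail_given_spike(is_tail)
    labels = list(is_tail)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # breaks any link between signal and outcome
        if p_tail_given_spike(labels) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_shuffles + 1)
```

A large p-value means shuffled data reproduce the observed hazard about as often as not, which points to an artifact.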
7️⃣ Decision Criteria
Micro-entropy qualifies as a true convex hazard zone only if:
P(Tail | v50 > 95th percentile) ≥ 2.5× baseline P(Tail)
Convex acceleration visible between 90 → 95 → 97.5
Spike frequency ≤ 8% of days
Taper improves 5th percentile CAGR
Out-of-sample lift persists
If any of these fail, reject hypothesis.
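The criteria can be encoded as a single all-or-nothing check, as in this sketch. Two readings are assumptions of the example: "convex acceleration" is taken to mean that the increase in P(Tail) per percentile point is positive and larger from 95 to 97.5 than from 90 to 95, and out-of-sample persistence is passed in as a boolean because the criteria do not give a numeric cutoff for it.

```python
def evaluate_criteria(baseline, p_tail, spike_frequency,
                      p5_cagr_static, p5_cagr_taper, oos_lift_persists):
    """p_tail maps percentile (90, 95, 97.5) to P(Tail | v50 > that percentile)."""
    # Slopes are per percentile point because the two steps are unequal in size.
    slope_low = (p_tail[95] - p_tail[90]) / 5.0
    slope_high = (p_tail[97.5] - p_tail[95]) / 2.5
    checks = {
        "lift_at_95": p_tail[95] >= 2.5 * baseline,
        "convex_90_95_975": slope_high > slope_low > 0.0,
        "spike_frequency": spike_frequency <= 0.08,
        "p5_cagr_improves": p5_cagr_taper > p5_cagr_static,
        "oos_lift_persists": bool(oos_lift_persists),
    }
    return all(checks.values()), checks
```

Returning the individual checks alongside the overall result makes it clear which criterion caused a rejection.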
8️⃣ Final Output
Produce:
Hazard curve table
Economic impact table
Out-of-sample comparison
Monte Carlo comparison
Final verdict (exactly one of):
True convex hazard
Weak clustering
Statistical artifact
No narrative.
Only statistical and economic evidence.