# Noise Experiment Findings + Adaptive Parameter Sensing Architecture
**Date:** 2026-03-05
**Experiment:** `test_noise_experiment.py` (branch: `experiment/noise-resonance`)
**Runtime:** 7.3 hours | 176 runs (7 noise configs × 25 seeds + 1 baseline) | Results: `run_logs/noise_exp_20260304_230311.csv`

---

## 1. EXPERIMENT RESULTS — FULL FINDINGS

### Setup
- Baseline: deterministic champion (ROI=+44.89%, PF=1.123, DD=14.95%, Sharpe=2.50, T=2128)
- 8 configurations: 1 deterministic baseline (single run) + 7 noise configurations × N=25 seeds each
- All engine stack layers active: ACBv6 + OB 4D + MC-Forewarner + EsoF(neutral) + ExF

### Results Table

| Config | σ | E[ROI] | ΔROI (pp) | std(ROI) | E[PF] | E[Trades] | Beat% |
|---|---|---|---|---|---|---|---|
| baseline | n/a | +44.89% | n/a | 0.00% | 1.123 | 2128 | n/a |
| sr_5pct | 0.001 | +41.43% | **-3.5** | 12.53% | 1.117 | 2130 | 32% |
| sr_15pct | 0.003 | +26.27% | **-18.6** | 21.33% | 1.075 | 2137 | 20% |
| sr_25pct | 0.005 | +18.27% | **-26.6** | 19.78% | 1.055 | 2148 | 16% |
| sr_50pct | 0.010 | +6.44% | **-38.5** | 21.22% | 1.017 | 2201 | 8% |
| price_1bp | 0.0001 | **-52.56%** | **-97.5** | 8.86% | 0.760 | 2721 | 0% |
| price_5bp | 0.0005 | **-63.40%** | **-108.3** | 12.75% | 0.707 | 2717 | 0% |
| tp_1bp | 0.0001 | **+49.16%** | **+4.3** | 2.17% | 1.134 | 2130 | **96%** |

ΔROI is the difference from the baseline ROI in percentage points. Beat% is the share of seeds whose ROI exceeded the baseline.

---

## 2. INTERPRETATION BY HYPOTHESIS

### H1 — Stochastic Resonance on vel_div: REJECTED

**Result:** Monotonic degradation. No sweet spot. Every sigma level hurts.

**Why SR failed here:**
The necessary condition for stochastic resonance is a signal that is *periodically sub-threshold*, i.e., a real signal that exists but is too weak to cross the detection boundary. In this system, vel_div carries genuine eigenvalue velocity structure. Bars below -0.02 have real regime signal. Bars above -0.02 are genuinely noise. There is no latent sub-threshold signal to unlock.

Adding Gaussian noise to vel_div:
1. Fires false entries (noise pushes bars that sit above -0.02 below it)
2. Suppresses some real entries (noise pushes genuine signal bars from below -0.02 back above it)
3. Net effect: trade count creeps up (+0.1% at σ=0.001, +3.4% at σ=0.010), WR stays flat (the noise-induced trades are coin-flips), PF degrades monotonically (1.117 → 1.017)

**WR stability is the tell:** WR held at ~49.3% across ALL sigma levels. If SR were working, WR would *improve* (near-miss real signals would fire with higher selectivity). Instead WR is flat because noise-added entries are indistinguishable from random. The signal space is not sub-threshold; it is binary.

**VERDICT:** vel_div threshold of -0.02 is correct and tight. Do not perturb it. SR is not applicable to this signal type.

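
The two flip mechanisms listed for H1 can be illustrated with a minimal sketch. It assumes an entry fires when `vel_div < -0.02`, as the text states, and uses a synthetic Gaussian sample as a stand-in for the real vel_div series; `count_flips` and `toy_vel_div` are hypothetical names, not part of `test_noise_experiment.py`.

```python
import random

THRESHOLD = -0.02  # entry fires when vel_div < THRESHOLD (from the text)

def count_flips(vel_div, sigma, seed):
    """Count entries created and entries suppressed by additive Gaussian noise."""
    rng = random.Random(seed)
    false_entries = suppressed = 0
    for v in vel_div:
        noisy = v + rng.gauss(0.0, sigma)
        fired_clean = v < THRESHOLD
        fired_noisy = noisy < THRESHOLD
        if fired_noisy and not fired_clean:
            false_entries += 1      # noise pushed a quiet bar below the threshold
        elif fired_clean and not fired_noisy:
            suppressed += 1         # noise pushed a genuine signal bar back above it
    return false_entries, suppressed

# Synthetic stand-in for vel_div: NOT the real series, only a toy sample.
data_rng = random.Random(0)
toy_vel_div = [data_rng.gauss(0.0, 0.02) for _ in range(10_000)]

for sigma in (0.001, 0.003, 0.005, 0.010):
    fe, sup = count_flips(toy_vel_div, sigma, seed=1)
    print(f"sigma={sigma}: +{fe} false entries, -{sup} suppressed")
```

On data where more bars sit just above the threshold than just below it, false entries tend to outnumber suppressed ones, which is consistent with the slowly rising trade count in the results table. The toy distribution says nothing about the win rate of those extra trades.
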
---
### H2 — Price Dither (fill clustering avoidance): CATASTROPHIC FAILURE

**Result:** +1bp of price noise → E[ROI] = -52.56%, trade count 2128 → 2721 (+28%).

**What happened (root cause analysis):**

The catastrophic failure is not from fills; it comes from the **volatility gate cascading**. The vol gate (`vol_ok = dvol > p60`) is computed from rolling BTC price standard deviations. With 1bp multiplicative noise on BTC prices:
- Rolling std (50-bar window) changes on every bar
- vol_ok flips for many borderline bars (those near the p60 percentile)
- The net result is ~600 additional entries per run (2721 vs 2128 trades)
- Those additional entries have no real signal → immediate dilution

**Unintended finding (system fragility warning):**
This experiment revealed that the vol gate is *extremely sensitive* to input data quality. Even 1bp of feed noise in live BTC price data could add on the order of 28% spurious entries. In production with real WebSocket feeds:
- Feed jitter / interpolation artifacts > 1bp could trigger spurious vol_ok flips
- Should add: EMA smoothing of the rolling vol estimate (instead of the raw per-bar std) to dampen sensitivity
- Consider: a vol_ok gate based on a 5-bar EWMA of realized vol rather than a point estimate

**VERDICT:** Price dither is inapplicable, and it revealed a production risk. Vol gate needs smoothing before live deployment.

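
A minimal sketch of the proposed smoothing, useful for measuring how many `vol_ok` decisions flip under 1bp price noise with and without a 5-bar EWMA. Everything here is an assumption for illustration: the prices are a synthetic random walk, the p60 threshold is a single static percentile (the real gate's percentile window is not specified in this document), and `rolling_std`, `ewma` and `gate_flips` are hypothetical helpers.

```python
import random
import statistics

def rolling_std(prices, window=50):
    """Std of simple returns over the trailing `window` bars (None until warm)."""
    rets = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        out[i] = statistics.pstdev(rets[i - window:i])
    return out

def ewma(values, span=5):
    """EWMA with alpha = 2 / (span + 1); leading None values pass through."""
    alpha, prev, out = 2.0 / (span + 1.0), None, []
    for v in values:
        if v is not None:
            prev = v if prev is None else alpha * v + (1.0 - alpha) * prev
        out.append(prev if v is not None else None)
    return out

def gate_flips(clean, noisy, threshold):
    """Number of warm bars where vol_ok differs between clean and noisy inputs."""
    return sum((c > threshold) != (n > threshold)
               for c, n in zip(clean, noisy) if c is not None and n is not None)

# Synthetic random-walk prices (a stand-in, not BTC data) plus 1bp multiplicative noise.
rng = random.Random(42)
prices = [50_000.0]
for _ in range(2_000):
    prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.001)))
noisy_prices = [p * (1.0 + rng.gauss(0.0, 0.0001)) for p in prices]

raw_clean, raw_noisy = rolling_std(prices), rolling_std(noisy_prices)
warm = sorted(v for v in raw_clean if v is not None)
p60 = warm[int(0.60 * len(warm))]  # static p60 threshold (assumption)

print("raw flips:     ", gate_flips(raw_clean, raw_noisy, p60))
print("smoothed flips:", gate_flips(ewma(raw_clean), ewma(raw_noisy), p60))
```

Whether the EWMA actually reduces the flip count has to be measured on real data: a 50-bar rolling std is already strongly autocorrelated, so a short smoothing window may change less than expected.
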
---
### H3 — TP Target Dither (sensitivity analysis): POSITIVE SIGNAL

**Result:** 24/25 seeds (96%) beat baseline. E[ROI]=+49.16% (+4.3pp ΔROI), std=2.17% (tight).

**What's actually being measured:**
TP dither with σ=0.0001 samples from ~N(0.0099, 0.0001²). The effective TP range across 25 seeds was approximately 0.0096–0.0103, i.e. about -3bp/+4bp around the 99bps center (wider than the ±2bp 2-sigma band that σ=1bp implies). One seed (seed=22) drew a TP effectively equal to 0.0099 and produced ROI=+44.89%, identical to baseline, which confirms the system is deterministic for a given TP.

**Interpretation:**
Most TP values in the 99–103bps range outperformed the current 99bps. This is a sign that TP=99bps sits on a slope of the ROI curve rather than at its peak, so it may not be the global optimum. The distribution of winning seeds leans toward *slightly higher TP*, suggesting trades frequently pass through 99bps and continue to profit before eventually reversing. Recall: 86% of exits are MAX_HOLD and only 14% hit TP. A slightly higher TP (e.g., 103–107bps) would capture more of the winning tail before the 120-bar limit fires.

**VERDICT:** TP=99bps looks sub-optimal. A proper 1D sweep from 85–120bps is warranted. The optimum is likely around 103–108bps given the distribution shape, although that range lies at or beyond the upper edge of what the dither sampled, so it remains an extrapolation until the sweep confirms it. **Priority: HIGH.** Expected gain: +3–6% ROI with no other changes.

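
For reference, the sampling behind the TP dither can be sketched as follows. The center and sigma come from the text; the per-seed RNG scheme is an assumption, so the draws below will not reproduce the experiment's actual TP values.

```python
import random

TP_CENTER = 0.0099   # current fixed_tp_pct (99bps), from the text
TP_SIGMA = 0.0001    # tp_1bp dither level, from the text

def dithered_tp(seed):
    """Draw one dithered TP for a seed; the seeding scheme here is an assumption."""
    return random.Random(seed).gauss(TP_CENTER, TP_SIGMA)

tps = [dithered_tp(seed) for seed in range(25)]
lo_bp, hi_bp = min(tps) * 1e4, max(tps) * 1e4
within_2sigma = sum(abs(tp - TP_CENTER) <= 2 * TP_SIGMA for tp in tps)
print(f"range: {lo_bp:.1f}-{hi_bp:.1f} bps, {within_2sigma}/25 within 2 sigma (2bp)")
```

Because each seed maps to exactly one TP value, the 25 dithered runs amount to a coarse random sweep of TP near 99bps, which is why a deterministic 1D sweep is the natural follow-up.
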
---
## 3. ADAPTIVE PARAMETER SENSING SYSTEM (APSS) — ARCHITECTURAL PROPOSAL

### Motivation

The noise experiment confirmed two truths simultaneously:
1. **Undirected noise on signal and price inputs always hurt** (the current params are near-optimal for the current regime); only the TP dither helped, and that result points to a mis-set parameter rather than to a benefit of noise
2. **But markets are non-stationary**: what is optimal for Dec 2025–Feb 2026 will not be optimal indefinitely

Fold-3 (Feb 6–25) ROI = -9.4%, PF = 0.906 while Fold-2 (Jan 18–Feb 5) ROI = +54.7%, PF = 1.458. The *same fixed params* swung from dominant to loss-making in 20 days. This is not a code problem; it is regime non-stationarity. The solution is not to fix params permanently but to build a system that senses regime-driven parameter drift and adapts.

This is **not** the same as adding undirected noise. It is a *directed* adaptive system that measures the gradient of performance with respect to each parameter and follows that gradient with appropriate dampening.

---

### Architecture: APSS v1

```
┌────────────────────────────────────────────────────────────────┐
│                     LIVE ENGINE (Champion)                     │
│  Fixed params, executes real trades                            │
│  Emits: per-trade PnL stream                                   │
└──────────────────────────┬─────────────────────────────────────┘
                           │  PnL stream (shared, read-only)
                           ▼
┌────────────────────────────────────────────────────────────────┐
│               SHADOW ENGINE POOL (N=2K variants)               │
│  Each variant: champion params + single ±δ perturbation        │
│  Runs on SAME market data, ZERO real capital                   │
│  Output: per-trade PnL stream per variant                      │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│              PARAMETER GRADIENT ESTIMATOR (SPSA)               │
│  For each param P:                                             │
│    ΔP_estimate = EWMA[ (PnL(P+δ) - PnL(P-δ)) / 2δ ]            │
│  Decay λ tuned to regime half-life (~30 trading days)          │
│  Min N_trades = 200 before any gradient is trusted             │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│                      DAMPENED UPDATE RULE                      │
│  New_P = Old_P + η * ΔP_estimate                               │
│  Constraints:                                                  │
│    - η = 0 if N_trades < N_MIN                                 │
│    - |ΔP| per cycle ≤ MAX_STEP[P] (per-param hard limit)       │
│    - P must stay within BOUNDS[P] (pre-tested feasible range)  │
│    - Revert if 7-day Sharpe drops > 0.5σ after update          │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
            ┌──────────────┴──────────────┐
            ▼                             ▼
   ADJUST live engine              LOG + WARN only
  (if confidence high)          (if significance low)
   + write audit trail            + alert dashboard
```

---

### Parameter Classification

#### ADAPTIVE (can be tuned by APSS)
| Parameter | Current | Bounds | Max Step/Cycle | Mechanism |
|---|---|---|---|---|
| `fixed_tp_pct` | 99bps | [70, 150]bps | ±3bps | SPSA gradient |
| `max_hold_bars` | 120 | [60, 180] | ±5 bars | SPSA gradient |
| `abs_max_leverage` ceiling | 6.0x | [4.0, 8.0] | ±0.25x | Regime-conditional |
| `vol_gate_pct` | p60 | [p45, p75] | ±2pct | SPSA gradient |
| `min_irp_alignment` | 0.45 | [0.30, 0.60] | ±0.02 | SPSA gradient |
| `ob_min_depth_quality` | 0.40 | [0.20, 0.65] | ±0.03 | SPSA gradient |
| `ob_confirm_rate` | 0.40 | [0.25, 0.60] | ±0.03 | SPSA gradient |

#### LOCKED — IRON RULE (never touched by APSS)
| Parameter | Reason |
|---|---|
| `vel_div_threshold` (-0.02) | Eigenvalue physics. Not a tunable hyperparameter. |
| `vel_div_extreme` (-0.05) | Same. |
| Signal generation logic | Iron Rule: ExF/EsoF/APSS never touch entry signal |
| ACB NPZ-based signals | Pre-computed from exogenous data, not engine params |
| `fraction` (0.20) | Risk management constant, changes require full re-risk |

#### MONITOR-ONLY (log drift, don't adapt)
| Parameter | What to watch |
|---|---|
| `dc_min_magnitude_bps` | Direction confirm sensitivity |
| `sp_maker_entry_rate` | Fill rate drift (venue changes) |
| ACB boost formula constants | Beta/boost landscape shifts |

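
One way to make the three classes enforceable in code rather than by convention is a small parameter registry. The sketch below is an assumption about how that could look: the `AdaptiveParam` dataclass and `propose` function are hypothetical, only three ADAPTIVE rows are copied in for brevity, and the values and units are taken from the tables above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdaptiveParam:
    name: str
    current: float
    lo: float
    hi: float
    max_step: float

# Values copied from the ADAPTIVE table; units follow the table (bps, bars, ratio).
ADAPTIVE = {
    p.name: p
    for p in (
        AdaptiveParam("fixed_tp_pct", 99.0, 70.0, 150.0, 3.0),
        AdaptiveParam("max_hold_bars", 120.0, 60.0, 180.0, 5.0),
        AdaptiveParam("min_irp_alignment", 0.45, 0.30, 0.60, 0.02),
    )
}
# Names from the LOCKED table; any attempt to move these is rejected.
LOCKED = frozenset({"vel_div_threshold", "vel_div_extreme", "fraction"})

def propose(name, requested_step):
    """Return the new value after applying the per-cycle step limit and the bounds."""
    if name in LOCKED:
        raise PermissionError(f"{name} is locked (Iron Rule)")
    p = ADAPTIVE[name]  # KeyError for unknown or monitor-only params
    step = max(-p.max_step, min(p.max_step, requested_step))
    return max(p.lo, min(p.hi, p.current + step))

print(propose("fixed_tp_pct", 10.0))   # request of +10bps is limited to +3bps
```

Raising on locked names, instead of silently ignoring them, keeps an Iron Rule violation visible in logs and tests.
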
---

### The SPSA Algorithm (Simultaneous Perturbation Stochastic Approximation)

SPSA was chosen because:
1. Estimates the gradient in **2 evaluations** per cycle regardless of parameter dimensionality (vs 2N for two-sided finite differences over N params)
2. **Proven convergent** under noisy objective measurements (Spall 1992, 1998); tracking a drifting optimum additionally requires gains that do not decay to zero
3. Naturally handles noise in the objective (PnL is inherently stochastic)
4. Step size schedule provides built-in dampening

**Per-cycle update:**
```python
import numpy as np

def spsa_step(theta, k, evaluate, trades_in_window, rng, a, c, A, alpha=0.602, gamma=0.101):
    """One SPSA ascent step, run once per cycle (every N_CYCLE=500 trades); k = 1, 2, ..."""
    theta = np.asarray(theta, dtype=float)
    a_k = a / (k + A) ** alpha                                   # step-size schedule
    c_k = c / k ** gamma                                         # perturbation-size schedule
    delta_k = rng.choice([-1.0, 1.0], size=theta.shape)          # Rademacher random direction
    y_plus = evaluate(theta + c_k * delta_k, trades_in_window)   # shadow engine PnL
    y_minus = evaluate(theta - c_k * delta_k, trades_in_window)  # shadow engine PnL
    g_hat_k = (y_plus - y_minus) / (2.0 * c_k * delta_k)         # elementwise gradient estimate
    return theta + a_k * g_hat_k                                 # ascent, since PnL is maximized

# Schedules: a_k = a/(k+A)^alpha, c_k = c/k^gamma
# Standard: alpha=0.602, gamma=0.101 (Spall recommended)
```

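
To see the update rule converge, here is a self-contained toy run in plain Python. It is only a sketch: `toy_pnl` is a made-up concave objective with additive noise standing in for shadow-engine PnL, and the gain constants `a`, `c`, `A` are illustrative choices, not tuned values for this system.

```python
import random

def spsa_maximize(objective, theta, n_iter, a, c, A, seed, alpha=0.602, gamma=0.101):
    """Maximize a noisy objective with SPSA; returns the final parameter list."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, n_iter + 1):
        a_k = a / (k + A) ** alpha
        c_k = c / k ** gamma
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]      # Rademacher direction
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = objective(plus) - objective(minus)              # two evaluations per step
        theta = [t + a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta

# Toy stand-in for shadow-engine PnL: a concave bowl peaking at (1.0, -2.0) plus noise.
noise = random.Random(123)

def toy_pnl(theta):
    x, y = theta
    return -((x - 1.0) ** 2) - ((y + 2.0) ** 2) + noise.gauss(0.0, 0.01)

result = spsa_maximize(toy_pnl, [0.0, 0.0], n_iter=2000, a=0.2, c=0.1, A=20.0, seed=7)
print(result)
```

Note that each iteration costs two objective evaluations no matter how many parameters `theta` holds, which is the property that motivates SPSA here.
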
---

### Dampening System (anti-whipsaw)

Non-stationarity creates a fundamental problem: the gradient estimate from recent trades reflects the *current* regime, not the long-run optimum. A false-positive gradient can move params in the wrong direction and leave them stuck there.

**Three-layer dampening:**

**Layer 1 — Minimum observations gate**
```
N_trades_since_last_update < N_MIN (200) → skip update, accumulate
```

**Layer 2 — EWMA smoothing of gradient**
```
g_smooth = λ * g_smooth_prev + (1-λ) * g_hat_k
λ = 0.5^(1 / HALFLIFE)   (exp(-1 / HALFLIFE) would make HALFLIFE the e-folding time, not the half-life)
HALFLIFE = 30 trading days, converted to update steps (with 10-minute bars, 30 days is 4320 bars)
Only apply update when |g_smooth| > SIGNIFICANCE_THRESHOLD[P]
```

**Layer 3 — Reversion trigger**
```
After each update, monitor 7-day Sharpe:
  If Sharpe drops > 0.5σ below pre-update 30-day Sharpe → revert to prior params
  Reversion is hard: write prior params back, increment reversion counter
  If 3 reversions on same param in 30 days → freeze that param, escalate alert
```

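
The three layers can be combined into one small per-parameter state object. This is a sketch under stated assumptions: the `Dampener` class is hypothetical, the half-life is expressed in gradient updates, `sharpe_sigma` is assumed to be the standard deviation of the Sharpe estimate, and the significance threshold value is a placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class Dampener:
    halflife_updates: float          # EWMA half-life, in gradient updates
    significance: float              # |g_smooth| threshold (placeholder value)
    n_min: int = 200                 # Layer 1: minimum trades per update
    g_smooth: float = 0.0
    reversions: list = field(default_factory=list)  # day stamps of past reversions
    frozen: bool = False

    def smoothed_gradient(self, g_hat, n_trades):
        """Layers 1-2: return the gradient to act on, or None to skip this cycle."""
        if self.frozen or n_trades < self.n_min:
            return None
        lam = 0.5 ** (1.0 / self.halflife_updates)
        self.g_smooth = lam * self.g_smooth + (1.0 - lam) * g_hat
        return self.g_smooth if abs(self.g_smooth) > self.significance else None

    def should_revert(self, sharpe_7d, sharpe_30d_pre, sharpe_sigma, day):
        """Layer 3: revert when 7-day Sharpe falls > 0.5 sigma below the pre-update level."""
        if sharpe_7d >= sharpe_30d_pre - 0.5 * sharpe_sigma:
            return False
        self.reversions = [d for d in self.reversions if day - d < 30] + [day]
        if len(self.reversions) >= 3:
            self.frozen = True       # caller is expected to escalate an alert
        return True

d = Dampener(halflife_updates=1.0, significance=0.1)
print(d.smoothed_gradient(1.0, n_trades=100))   # too few trades, cycle is skipped
print(d.smoothed_gradient(1.0, n_trades=250))   # enough trades, smoothed value returned
```

Keeping the freeze flag inside the same object means a frozen parameter also stops accumulating gradient, so it cannot resume with a stale estimate.
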
---

### Pseudo-Genetic / Self-Improvement Angle

The "pseudo-genetic" improvement idea maps closely to **Evolution Strategies (ES)**:

```
Generation    = N_CYCLE trades
Population    = M shadow engines (M=20, each with perturbed params)
Fitness       = EWMA Sharpe over generation
Selection     = top K% params survive
Mutation      = ±δ Gaussian perturbation of survivors
Recombination = weighted average of top K survivors (CMA-ES style)
```

CMA-ES (Covariance Matrix Adaptation ES) is a widely used, strong default for this kind of black-box optimization:
- Learns the *covariance structure* of the parameter landscape
- Adapts the mutation ellipse to the geometry of the fitness function
- Self-tunes step size, so no manual schedule is needed
- Directly applicable here: each "generation" is a rolling window of N trades

**The key advantage over SPSA:** CMA-ES captures *parameter interactions* (e.g., TP and max_hold are correlated, so increasing both together may be better than either alone). SPSA produces only a first-order gradient estimate and carries no information about how parameters interact.

**Recommended implementation order:**
1. SPSA first (simpler, proven; 2 shadow engines per param in the pool above, or just 2 per cycle with simultaneous perturbation)
2. ES/CMA-ES second (requires M shadow engines, more compute)

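
The generation loop described in this section can be sketched as a plain rank-weighted evolution strategy. This is deliberately not CMA-ES: it keeps a fixed isotropic mutation sigma and does no covariance adaptation. The fitness function is a toy with a known peak, standing in for the EWMA Sharpe of a shadow engine, and `es_generation` is a hypothetical name.

```python
import random

def es_generation(parent, sigma, fitness, rng, pop_size=20, top_frac=0.25):
    """One generation: mutate, score, keep the top fraction, recombine by rank weights."""
    population = [[p + rng.gauss(0.0, sigma) for p in parent] for _ in range(pop_size)]
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: max(1, int(pop_size * top_frac))]
    # Linearly decreasing rank weights: the best survivor gets the largest weight.
    raw = [len(survivors) - i for i in range(len(survivors))]
    weights = [w / sum(raw) for w in raw]
    return [sum(w * s[j] for w, s in zip(weights, survivors)) for j in range(len(parent))]

def toy_fitness(params):
    """Toy stand-in for EWMA Sharpe: peaks at TP=105bps, max_hold=130 bars (made up)."""
    tp_bps, hold_bars = params
    return -((tp_bps - 105.0) ** 2) - ((hold_bars - 130.0) ** 2)

rng = random.Random(11)
parent = [99.0, 120.0]          # champion TP (bps) and max_hold (bars)
for _ in range(60):
    parent = es_generation(parent, sigma=2.0, fitness=toy_fitness, rng=rng)
print([round(p, 2) for p in parent])
```

With a real, noisy fitness the population would also need the dampening and bounds described earlier; the toy omits them to keep the selection and recombination steps visible.
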
---

### Practical Deployment Considerations

**Compute:**
- Each shadow engine run ≈ 145s for 55 days / 2128 trades
- In live: shadow engines run on real-time bar stream, not replay
- Real-time bar = 10 minutes → shadow engine lag ≈ 0 (same bar, different params)
- CPU cost: M shadow engines × bar computation = M × (live bar compute time)
- With M=10, estimate: 10x live compute overhead. Acceptable with Hazelcast Jet.

**Hazelcast integration (Phase MIG6 alignment):**
- Shadow engine outputs → Hazelcast IMap `SHADOW_PERF`
- APSS gradient estimator reads from `SHADOW_PERF`, writes param updates to `LIVE_PARAMS` IMap
- Live engine reads current params from `LIVE_PARAMS` on each bar
- Full audit trail: every param change logged to Hazelcast journal (append-only)

**Confidence gating:**
- APSS never adapts params during RED/ORANGE MC-Forewarner state
- APSS freezes during high-DD periods (current DD > 10%): regime is pathological, don't adapt
- APSS resets EWMA after 5-day gap in trading (data discontinuity)

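
The confidence-gating rules reduce to a small predicate evaluated before every adaptation cycle. In the sketch below, `GateState` and `adaptation_decision` are hypothetical names, the "GREEN" state label is assumed (only RED and ORANGE are named in this document), and blocking adaptation on the cycle where the EWMA is reset is an added assumption.

```python
from dataclasses import dataclass

@dataclass
class GateState:
    forewarner: str               # MC-Forewarner state, e.g. "GREEN", "ORANGE", "RED"
    drawdown_pct: float           # current drawdown, in percent
    days_since_last_trade: float

def adaptation_decision(state, dd_limit_pct=10.0, gap_days=5.0):
    """Return (may_adapt, reset_ewma) following the three gating rules."""
    reset_ewma = state.days_since_last_trade >= gap_days
    blocked = state.forewarner in {"RED", "ORANGE"} or state.drawdown_pct > dd_limit_pct
    # After a reset there is no usable gradient yet, so adaptation is skipped as well.
    return (not blocked and not reset_ewma), reset_ewma

print(adaptation_decision(GateState("GREEN", 3.0, 0.5)))
print(adaptation_decision(GateState("ORANGE", 3.0, 0.5)))
```
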
---

### What This System Is NOT

- It is **not** a signal generator: APSS never touches vel_div or the signal generation logic; it can only move the gating and exit parameters listed as ADAPTIVE
- It is **not** an ExF replacement: macro factors are already tested and found redundant to ACB
- It is **not** a replacement for dataset expansion: 55 days is still too short for confident adaptation; APSS becomes meaningful at 6+ months of production data
- It is **not** autonomous: all param changes require audit logging; reversions are automatic; human override is always possible

---

### Does It Make Sense? — Assessment

**YES, conditionally.**

The core thesis is correct: markets are non-stationary, and fixed params will decay. The Fold-2 → Fold-3 swing is the evidence for drift; the TP finding from this experiment (+4.3pp by sampling a band only a few bp wide) shows that even the current params are not exactly optimal. The vol gate sensitivity finding (1bp price noise → 28% more trades) shows the system has tunable knobs that meaningfully affect performance.

**The right conditions for APSS to be productive:**
1. ≥ 6 months of live production data (currently have 55 days of backtest, which is insufficient for confident adaptation)
2. Shadow engine pool is computationally feasible (Hazelcast Jet + Phase MIG6)
3. Dampening is implemented before adaptation (Phase MIG6 prerequisite)
4. Iron Rule is enforced at architecture level (not just code comments)

**Current priority:** LOG-ONLY mode. Wire shadow engines in production, accumulate drift statistics for 90 days, do not adapt yet. Use the data to validate whether TP drift is real and directional before enabling live adaptation.

---

## 4. IMMEDIATE ACTIONABLE FINDINGS

### Priority 1 — TP Sweep (HIGH, 1 day work)
The experiment strongly suggests TP=99bps is sub-optimal: 24 of 25 dithered seeds (96%), with TPs in the 96–103bps range, beat baseline.
- **Action:** Run `test_tp_sweep.py` from 85bps upward in 2bps steps (19 runs covering 85–121bps, ~45min)
- **Expected:** Global optimum around 103–108bps, +3–6% ROI
- **Risk:** Low (pure TP change, no other modifications)

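
A quick way to pin down the sweep grid and its runtime before launching. `tp_sweep_grid` is a hypothetical helper, not the contents of `test_tp_sweep.py`; the 145s per run figure is taken from the experiment metadata.

```python
def tp_sweep_grid(start_bps=85, stop_bps=120, step_bps=2):
    """TP targets in bps from start up to stop (inclusive if it lands on the grid)."""
    return list(range(start_bps, stop_bps + 1, step_bps))

def estimated_minutes(grid, seconds_per_run=145):
    """Wall-clock estimate for a sequential sweep at ~145 s per run."""
    return len(grid) * seconds_per_run / 60

for stop in (120, 121):
    grid = tp_sweep_grid(stop_bps=stop)
    tp_fractions = [bps / 1e4 for bps in grid]   # fixed_tp_pct as a fraction, e.g. 0.0099
    print(stop, len(grid), grid[-1], round(estimated_minutes(grid), 1))
```

A 2bps grid starting at 85 lands on odd values, so it stops at 119bps (18 runs) unless the upper end is extended to 121bps (19 runs). Odd values also mean the grid includes the current 99bps, which gives a built-in check against the baseline ROI.
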
### Priority 2 — Vol Gate Smoothing (MEDIUM, production risk)
The price dither experiment revealed that the vol gate is brittle to input data quality.
- **Action:** Replace point-estimate `dvol` with 5-bar EWMA before p60 comparison
- **Location:** `test_pf_dynamic_beta_validate.py` data loading + live feed preprocessor
- **Risk:** May change baseline slightly; re-benchmark afterwards

### Priority 3 — APSS Shadow Engine (LOW, 6 month horizon)
As described in Section 3. Do not implement adaptation until 6 months of live data are available.
- **Action:** Start with LOG-ONLY shadow engine pool in Phase MIG6
- **Prerequisites:** Hazelcast Jet (Phase MIG6), 6 months production data

---

## 5. EXPERIMENT METADATA
- Branch: `experiment/noise-resonance`
- Script: `test_noise_experiment.py`
- Results CSV: `run_logs/noise_exp_20260304_230311.csv`
- Total compute: 7.3 hours (176 runs, ~145s/run on Siloqy venv)
- Baseline confirmed: ROI=+44.89%, PF=1.123, DD=14.95%, Sharpe=2.50, T=2128