# Agent Change Analysis Report

**Date: 2026-03-21**

**Author: Claude Code audit of Antigravity AI agent document**

---
## Executive Summary

**FORK TEST RESULT: 0/2 PASS.** The two fork tests produce +12.83% and +5.92% ROI vs gold +181.81%.

The agent's claims are PARTIALLY correct in diagnosis, but the remediation INTRODUCES new regressions.

---
## Test Results

| Test | ROI | Trades | DD | Verdict |
|------|-----|--------|-----|---------|
| D_LIQ_GOLD perfect-maker (fork) | +12.83% | 1739 | 26.24% | FAIL ✗ |
| D_LIQ_GOLD stochastic 0.62 (fork) | +5.92% | 1739 | 27.95% | FAIL ✗ |
| replicate_181 style (no hazard call, float64, static vol_p60) | +111.03% | 1959 | 16.89% | FAIL ✗ |
| Gold reference | +181.81% | 2155 | 17.65% | n/a |

---
## Root Cause Analysis

### Cause 1: `set_esoteric_hazard_multiplier(0.0)` in `exp_shared.run_backtest`

The agent added `eng.set_esoteric_hazard_multiplier(0.0)` to `exp_shared.run_backtest`. With the new ceiling=10.0:

- Sets `base_max_leverage = 10.0` on a D_LIQ engine designed for 8.0 soft / 9.0 hard
- On unboosted days: effective leverage = 9.0x (vs certified 8.0x)
- 5-day comparison confirms: TEST A at 9.0x amplifies bad-day losses more than good-day gains

**Effect**: Increased variance which, over 56 days, results in +12.83% vs +111.03% (replicate style).
### Cause 2: Rolling vol_p60 (lower threshold on some days)

The rolling vol_p60 can be LOWER than the static vol_p60 (especially after quiet days like the Jan 1 holiday). This allows more bars to trade in low-quality signal environments.

Day 2 (Jan 1): TEST A vol_ok=1588 bars vs TEST B vol_ok=791 (about 2× more eligible; vol_p60=0.000099 vs 0.000121).

More trades on bad signal days → net negative over 56 days.
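
The threshold effect described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the data is synthetic (not taken from the backtest), the helper names `percentile` and `eligible_bars` are hypothetical, and the gate is assumed to admit a bar when its volatility exceeds vol_p60, which matches the report's observation that a lower threshold yields more eligible bars.

```python
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def eligible_bars(vols: Sequence[float], threshold: float) -> int:
    """Count bars whose volatility exceeds the gate threshold (assumed gate rule)."""
    return sum(1 for v in vols if v > threshold)


# Synthetic volatility samples, NOT backtest data.
full_history = [0.00008 + 0.000001 * i for i in range(100)]  # long, mixed sample
quiet_window = [0.00006 + 0.000001 * i for i in range(50)]   # recent quiet period

static_p60 = percentile(full_history, 60)   # fixed threshold from the full sample
rolling_p60 = percentile(quiet_window, 60)  # threshold recomputed on the quiet window

# Hypothetical bars for a single trading day.
today = [0.00007 + 0.000001 * i for i in range(80)]

static_count = eligible_bars(today, static_p60)
rolling_count = eligible_bars(today, rolling_p60)

print(f"static  vol_p60={static_p60:.6f} eligible={static_count}")
print(f"rolling vol_p60={rolling_p60:.6f} eligible={rolling_count}")
```

Because the rolling window contains only quiet bars, its 60th percentile sits below the static one, and the same day passes more bars through the gate.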
### Cause 3: Pre-existing regression (111.03% vs 181.81%)

Even WITHOUT the agent's specific `exp_shared` changes, the current code produces 111.03%/1959 trades vs gold 181.81%/2155 trades. This regression predates the agent's changes and stems from:

1. **ACB change**: `fund_dbt_btc` (Deribit funding) is now preferred over `funding_btc`. If Deribit funding is less bearish in the Dec-Feb 2026 period, ACB gives lower boost → lower leverage → lower ROI.
2. **Orchestrator refactoring**: 277+ lines added (begin_day/step_bar/end_day), 68 removed. Subtle behavioral changes may have affected trade quality.

---
## Verdict on Agent's Claims

| Claim | Assessment |
|-------|-----------|
| A. Ceiling_lev 6→10 | CORRECT in concept: old 6.0 DID suppress D_LIQ below certified 8.0x. But the fix leaves `set_esoteric_hazard_multiplier(0.0)` in `run_backtest`, which now drives leverage to 9.0x (not 8.0x): an over-correction. |
| B. MC proportional 0.8x | NEUTRAL for no-forewarner runs (forewarner=None → never called). |
| C. Rolling vol_p60 | NEGATIVE: rolling vol_p60 can be lower than static, enabling trading in worse signal environments. |
| D. Float32 / lazy OB | NEUTRAL for trade count (float32 at $50k has sufficient precision; OB mock data is date-agnostic). |

---
## Confirmed Mechanism (leverage verification)

Direct Python verification of the hazard call effect:

```
BEFORE set_esoteric_hazard_multiplier(0.0) [ceiling=10.0]:
base_max_leverage = 8.0 (certified D_LIQ soft cap)
bet_sizer.max_leverage = 8.0
abs_max_leverage = 9.0 (certified D_LIQ hard cap)

AFTER set_esoteric_hazard_multiplier(0.0) [ceiling=10.0]:
base_max_leverage = 10.0 ← overridden!
bet_sizer.max_leverage = 10.0 ← overridden!
abs_max_leverage = 9.0 (unchanged — abs is not touched by hazard call)
```
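
The before/after state above can be reproduced with a toy model. This is a sketch under stated assumptions, not the real engine: `ToyEngine` and `leverage_after_call` are hypothetical names, only the hazard=0.0 case reported here is modeled, and effective leverage is taken as the smaller of the base and absolute caps.

```python
from dataclasses import dataclass


@dataclass
class ToyEngine:
    """Hypothetical stand-in for the leverage fields of a D_LIQ engine."""
    ceiling_lev: float                   # ceiling applied by the hazard call
    base_max_leverage: float = 8.0       # certified D_LIQ soft cap
    bet_sizer_max_leverage: float = 8.0
    abs_max_leverage: float = 9.0        # certified D_LIQ hard cap

    def set_esoteric_hazard_multiplier(self, hazard: float) -> None:
        # Only the hazard=0.0 case from the report is modeled (assumption).
        if hazard != 0.0:
            raise ValueError("this sketch models hazard=0.0 only")
        self.base_max_leverage = self.ceiling_lev
        self.bet_sizer_max_leverage = self.ceiling_lev
        # abs_max_leverage is left untouched, as observed in the verification.

    def effective_leverage(self) -> float:
        # Assumed rule: the tighter of the two caps wins.
        return min(self.base_max_leverage, self.abs_max_leverage)


def leverage_after_call(ceiling: float) -> float:
    """Effective leverage once the hazard call has run with the given ceiling."""
    eng = ToyEngine(ceiling_lev=ceiling)
    eng.set_esoteric_hazard_multiplier(0.0)
    return eng.effective_leverage()


no_call = ToyEngine(ceiling_lev=10.0).effective_leverage()  # hazard call skipped
old_ceiling = leverage_after_call(6.0)                       # pre-fix ceiling
new_ceiling = leverage_after_call(10.0)                      # agent's new ceiling

print(no_call, old_ceiling, new_ceiling)
```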
Result: effective leverage = min(base=10, abs=9) = **9.0x on ALL days**.
D_LIQ is certified at 8.0x soft / 9.0x hard. The hard cap should only trigger on proxy_B boost events.
The hazard call **unconditionally removes the 8.0x soft limit**, so every day runs at 9.0x.

---
## The Real Problem

The gold standard (181.81%) was certified using code where **`set_esoteric_hazard_multiplier` was NOT called in the backtest loop**. The `replicate_181_gold.py` script (which doesn't call it) was the certification vehicle.

The agent's fix (ceiling 6→10) was meant to address the case WHERE `set_esoteric_hazard_multiplier(0.0)` IS called. With ceiling=6.0, the call sets base=6.0 < D_LIQ's 8.0 and suppresses leverage. With ceiling=10.0, it sets base=10.0 > D_LIQ's abs=9.0 and raises leverage beyond the certified level. Both are wrong.

**Correct fix**: Remove `eng.set_esoteric_hazard_multiplier(0.0)` from `exp_shared.run_backtest`, OR don't call it when using D_LIQ (which manages its own leverage via extended_soft_cap/extended_abs_cap).
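
The second option (skipping the call for D_LIQ) might look like the guard below. This is a minimal sketch, not the actual `exp_shared` code: `maybe_apply_hazard`, `PlainEngine`, and `DLiqEngine` are hypothetical names, the 5.0 starting leverage is a placeholder, and detecting D_LIQ by the presence of `extended_soft_cap` / `extended_abs_cap` attributes is an assumption about how those caps are exposed.

```python
class PlainEngine:
    """Hypothetical engine whose leverage is set by the hazard call."""

    def __init__(self, ceiling_lev: float = 10.0) -> None:
        self.ceiling_lev = ceiling_lev
        self.base_max_leverage = 5.0  # placeholder value for illustration

    def set_esoteric_hazard_multiplier(self, hazard: float) -> None:
        # Toy behavior for the hazard=0.0 case only: base jumps to the ceiling.
        self.base_max_leverage = self.ceiling_lev


class DLiqEngine(PlainEngine):
    """Hypothetical D_LIQ engine that manages its own soft and hard caps."""

    def __init__(self) -> None:
        super().__init__()
        self.extended_soft_cap = 8.0
        self.extended_abs_cap = 9.0
        self.base_max_leverage = self.extended_soft_cap


def maybe_apply_hazard(eng, hazard: float = 0.0) -> bool:
    """Apply the hazard call only to engines that do not manage their own caps.

    Returns True if the call was made, False if it was skipped.
    """
    manages_own_caps = hasattr(eng, "extended_soft_cap") and hasattr(eng, "extended_abs_cap")
    if manages_own_caps:
        return False
    eng.set_esoteric_hazard_multiplier(hazard)
    return True


d_liq = DLiqEngine()
plain = PlainEngine()
d_liq_called = maybe_apply_hazard(d_liq)
plain_called = maybe_apply_hazard(plain)

print(d_liq_called, d_liq.base_max_leverage)
print(plain_called, plain.base_max_leverage)
```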

---

## Gold Standard Status

The gold standard (181.81%/2155/DD=17.65%) **CANNOT be replicated** from current code via ANY tested path:

- `exp_shared.run_backtest`: 12.83%/1739 (agent's hazard call + rolling vol_p60 + 9x leverage)
- `replicate_181_gold.py` style: 111.03%/1959 (pre-existing regression from orchestrator/ACB changes)

The agent correctly identified that the codebase had regressed, but their fix is incomplete.