DOLPHIN/prod/docs/AGENT_CHANGE_ANALYSIS_REPORT.md


# Agent Change Analysis Report
**Date: 2026-03-21**

**Author: Claude Code audit of Antigravity AI agent document**

---
## Executive Summary
**FORK TEST RESULT: 0/2 PASS.** The two fork tests produce +12.83% and +5.92% ROI against the gold reference of +181.81%.
The agent's claims are PARTIALLY correct in diagnosis, but the remediation INTRODUCES new regressions.

---
## Test Results
| Test | ROI | Trades | DD | Verdict |
|------|-----|--------|-----|---------|
| D_LIQ_GOLD perfect-maker (fork) | +12.83% | 1739 | 26.24% | FAIL ✗ |
| D_LIQ_GOLD stochastic 0.62 (fork) | +5.92% | 1739 | 27.95% | FAIL ✗ |
| replicate_181 style (no hazard call, float64, static vol_p60) | +111.03% | 1959 | 16.89% | FAIL ✗ |
| Gold reference | +181.81% | 2155 | 17.65% | — |
---
## Root Cause Analysis
### Cause 1: `set_esoteric_hazard_multiplier(0.0)` in exp_shared.run_backtest
The agent added `eng.set_esoteric_hazard_multiplier(0.0)` to `exp_shared.run_backtest`. With the new ceiling=10.0:
- Sets `base_max_leverage = 10.0` on a D_LIQ engine designed for 8.0 soft / 9.0 hard
- On unboosted days: effective leverage = 9.0x (vs certified 8.0x)
- 5-day comparison confirms: TEST A at 9.0x amplifies bad-day losses more than good-day gains
**Effect**: Higher return variance that, over 56 days, contributes to +12.83% ROI versus +111.03% for the replicate-style run.
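
The asymmetry described in Cause 1 can be illustrated with a small compounding sketch. The daily returns below are synthetic (alternating +1% / -1% unlevered) and are not DOLPHIN data; the model ignores fees, caps, and position sizing. It only shows that, when the unlevered returns average to zero, compounding at 9.0x ends lower than compounding at 8.0x.

```python
def compound(unlevered_returns, leverage):
    """Compound a sequence of daily returns scaled by a constant leverage."""
    equity = 1.0
    for r in unlevered_returns:
        equity *= 1.0 + leverage * r
    return equity


# Synthetic assumption: 56 days alternating +1% and -1% unlevered return.
returns = [0.01, -0.01] * 28

equity_8x = compound(returns, 8.0)
equity_9x = compound(returns, 9.0)

# Each up/down pair multiplies equity by (1 + L*r) * (1 - L*r) = 1 - (L*r)**2,
# so the drag grows with the square of the leverage L.
print(f"8.0x final equity: {equity_8x:.4f}")
print(f"9.0x final equity: {equity_9x:.4f}")
```

Real days are not symmetric, so this is not a model of the 56-day run; it only shows why an extra 1.0x of leverage can cost more on losing days than it earns on winning days.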
### Cause 2: Rolling vol_p60 (lower threshold on some days)
The rolling vol_p60 can be LOWER than the static vol_p60, especially after quiet days such as the Jan 1 holiday. A lower threshold lets more bars trade in low-quality signal environments.
Day 2 (Jan 1): TEST A (rolling vol_p60=0.000099) has vol_ok=1588 bars vs 791 for TEST B (static vol_p60=0.000121), about 2× more eligible.
More trades on bad-signal days → net negative over 56 days.
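
A minimal sketch of the mechanism, under stated assumptions: the volatility values are synthetic, the gate is assumed to be `vol > vol_p60`, and the window sizes and the `percentile` helper are hypothetical rather than taken from the DOLPHIN code. It shows how a quiet day inside the rolling window pulls the 60th percentile down and doubles the number of eligible bars.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * q / 100))
    return ordered[rank - 1]


# Synthetic per-bar volatilities (assumption, not real data).
normal_day = [0.00010, 0.00012, 0.00014, 0.00016, 0.00018]
quiet_day = [0.00004, 0.00005, 0.00006, 0.00007, 0.00008]
today = [0.00009, 0.00011, 0.00013, 0.00015, 0.00017]

# Static threshold: computed once over a fixed reference sample.
static_p60 = percentile(normal_day * 4, 60)

# Rolling threshold: computed over the trailing window, which here
# contains one normal day and one quiet (holiday-like) day.
rolling_p60 = percentile(normal_day + quiet_day, 60)

# Assumed gate: a bar is eligible when its volatility exceeds the threshold.
eligible_static = sum(v > static_p60 for v in today)
eligible_rolling = sum(v > rolling_p60 for v in today)

print(f"static  vol_p60={static_p60:.5f} eligible bars={eligible_static}")
print(f"rolling vol_p60={rolling_p60:.5f} eligible bars={eligible_rolling}")
```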
### Cause 3: Pre-existing regression (111% vs 181.81%)
Even WITHOUT the agent's specific exp_shared changes, the current code produces 111.03%/1959 vs gold 181.81%/2155. This regression predates the agent's changes and likely stems from:
1. **ACB change**: `fund_dbt_btc` (Deribit funding) is now preferred over `funding_btc`. If Deribit funding is less bearish in the Dec-Feb 2026 period, ACB gives a lower boost → lower leverage → lower ROI.
2. **Orchestrator refactoring**: 277+ lines added (begin_day/step_bar/end_day), 68 removed. Subtle behavioral changes may have affected trade quality.
---
## Verdict on Agent's Claims
| Claim | Assessment |
|-------|-----------|
| A. Ceiling_lev 6→10 | CORRECT in concept: the old 6.0 DID suppress D_LIQ below its certified 8.0x. But the fix leaves `set_esoteric_hazard_multiplier(0.0)` in run_backtest, which now drives leverage to 9.0x (not 8.0x): an over-correction. |
| B. MC proportional 0.8x | NEUTRAL for no-forewarner runs (forewarner=None → never called). |
| C. Rolling vol_p60 | NEGATIVE: rolling vol_p60 can be lower than static, enabling trading in worse signal environments. |
| D. Float32 / lazy OB | NEUTRAL for trade count (float32 at $50k has sufficient precision; OB mock data is date-agnostic). |
---
## Confirmed Mechanism (leverage verification)
Direct Python verification of the hazard call effect:
```
BEFORE set_esoteric_hazard_multiplier(0.0) [ceiling=10.0]:
base_max_leverage = 8.0 (certified D_LIQ soft cap)
bet_sizer.max_leverage = 8.0
abs_max_leverage = 9.0 (certified D_LIQ hard cap)
AFTER set_esoteric_hazard_multiplier(0.0) [ceiling=10.0]:
base_max_leverage = 10.0 ← overridden!
bet_sizer.max_leverage = 10.0 ← overridden!
abs_max_leverage = 9.0 (unchanged — abs is not touched by hazard call)
```
Result: effective leverage = min(base=10, abs=9) = **9.0x on ALL days**.
D_LIQ is certified at 8.0x soft / 9.0x hard. The hard cap should only trigger on proxy_B boost events.
The hazard call **unconditionally removes the 8.0x soft limit** — every day runs at 9.0x.
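
The `min(base, abs)` rule stated in the result line can be checked directly. The function name below is hypothetical; only the cap values come from the verification output above.

```python
def effective_leverage(base_max_leverage, abs_max_leverage):
    """Leverage actually available: the soft cap, clipped by the hard cap."""
    return min(base_max_leverage, abs_max_leverage)


# Values from the verification output above.
before_call = effective_leverage(base_max_leverage=8.0, abs_max_leverage=9.0)
after_call = effective_leverage(base_max_leverage=10.0, abs_max_leverage=9.0)

print(f"before hazard call: {before_call}x")
print(f"after hazard call:  {after_call}x")
```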
---
## The Real Problem
The gold standard (181.81%) was certified using code where **`set_esoteric_hazard_multiplier` was NOT called in the backtest loop**. The replicate_181_gold.py script (which doesn't call it) was the certification vehicle.
The agent's fix (ceiling 6→10) was meant to address the case WHERE `set_esoteric_hazard_multiplier(0.0)` IS called. With ceiling=6.0: sets base=6.0 < D_LIQ's 8.0 → suppresses leverage below the certified level. With ceiling=10.0: sets base=10.0 > D_LIQ's abs=9.0 → effective leverage becomes 9.0x, above the certified 8.0x soft cap. Both are wrong.
**Correct fix**: Remove `eng.set_esoteric_hazard_multiplier(0.0)` from `exp_shared.run_backtest`, OR don't call it when using D_LIQ (which manages its own leverage via extended_soft_cap/extended_abs_cap).
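
A rough sketch of the second option (skip the call for engines that manage their own leverage). `StubEngine`, `prepare_engine`, and the `manages_own_leverage` flag are hypothetical stand-ins, not the real DOLPHIN classes. The stub models `set_esoteric_hazard_multiplier` only for the `0.0` case, where this report observes `base_max_leverage` being set to the ceiling.

```python
class StubEngine:
    """Stand-in for the backtest engine; NOT the real DOLPHIN class."""

    def __init__(self, base_max_leverage, abs_max_leverage, ceiling_lev):
        self.base_max_leverage = base_max_leverage
        self.abs_max_leverage = abs_max_leverage
        self.ceiling_lev = ceiling_lev

    def set_esoteric_hazard_multiplier(self, hazard):
        # Assumption: only hazard == 0.0 is modelled, matching the
        # behaviour observed in the report (base is set to the ceiling).
        if hazard != 0.0:
            raise NotImplementedError("only hazard=0.0 is described")
        self.base_max_leverage = self.ceiling_lev

    def effective_leverage(self):
        return min(self.base_max_leverage, self.abs_max_leverage)


def prepare_engine(engine, manages_own_leverage):
    """Hypothetical guard: skip the hazard call for engines like D_LIQ."""
    if not manages_own_leverage:
        engine.set_esoteric_hazard_multiplier(0.0)
    return engine


results = {}
for ceiling in (6.0, 10.0):
    called = prepare_engine(StubEngine(8.0, 9.0, ceiling), False)
    skipped = prepare_engine(StubEngine(8.0, 9.0, ceiling), True)
    results[ceiling] = (called.effective_leverage(), skipped.effective_leverage())
    print(f"ceiling={ceiling}: called -> {results[ceiling][0]}x, "
          f"skipped -> {results[ceiling][1]}x")
```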
---
## Gold Standard Status
The gold standard (181.81%/2155/DD=17.65%) **CANNOT be replicated** from current code via ANY tested path:
- `exp_shared.run_backtest`: 12.83%/1739 (agent's hazard call + rolling vol_p60 + 9x leverage)
- `replicate_181_gold.py` style: 111.03%/1959 (pre-existing regression from orchestrator/ACB changes)
The agent correctly identified that the codebase had regressed, but its fix is incomplete.