initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree

Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
hjnormey
2026-04-21 16:58:38 +02:00
commit 01c19662cb
643 changed files with 260241 additions and 0 deletions

File diff suppressed because it is too large


@@ -0,0 +1,11 @@
{
"status": "FAIL",
"roi_actual": 36.661993600015705,
"roi_baseline": 181.81,
"trades": 1739,
"sharpe": 1.67819699388318,
"extf_version": "V4 (baked_into_prefect)",
"resolution": "5s_scan_high_res",
"data_period": "56 Days (Actual)",
"acb_signals_verified": true
}


@@ -0,0 +1,81 @@
# EXTF SYSTEM PRODUCTIZATION: FINAL BRINGUP & UPDATE GUIDE (STAGING)
## **1.0 SYSTEM ARCHITECTURE: THE DUAL-PULSE DESIGN**
The External Factors (ExtF) system is the **Feature Manifold Layer**, feeding the 5-second system-wide scan. It operates as the "Pulse of the Market State."
| Layer | Component | Source | Resolution | Role |
| :--- | :--- | :--- | :--- | :--- |
| **Feature Manifold** | `RealTimeExFService` | REST (Async) | **0.5s** | Statistical Correlation |
| **Execution Layer** | `ExchangeAdapter` | WebSocket | **0.1s** | Order Placement |
---
## **2.0 CORE MAPPING (STAGING)**
### **2.1 Critical Path File Registry**
* **Full Spec Log**: [EXTF_SYSTEM_PRODUCTIZATION_DETAILED_LOG.md](file:///C:/Users/Lenovo/.gemini/antigravity/brain/becbf49b-71f4-449b-8033-c186223ad48c/EXTF_SYSTEM_PRODUCTIZATION_DETAILED_LOG.md)
* **Engine Core**: [realtime_exf_service.py](file:///C:/Users/Lenovo/Documents/-%20DOLPHIN%20NG%20HD%20HCM%20TSF%20Predict/external_factors/realtime_exf_service.py)
* **Prefect Flow Daemon**: [exf_fetcher_flow.py](file:///C:/Users/Lenovo/Documents/-%20DOLPHIN%20NG%20HD%20HCM%20TSF%20Predict/prod/exf_fetcher_flow.py)
* **Indicator Registry**: [indicator_sources.py](file:///C:/Users/Lenovo/Documents/-%20DOLPHIN%20NG%20HD%20HCM%20TSF%20Predict/external_factors/indicator_sources.py)
---
## **3.0 AGGRESSIVE OVERSAMPLING (0.5s)**
* **Heartbeat Metrics**: Basis, Imbalance, Spread.
* **Synchronized Pulse**: `RealTimeExFService` polls every **0.5s**; `exf_fetcher_flow` flushes to Hazelcast every **0.5s**.
* **Rate Limits**: Binance Spot (30% used), Binance Futures (10% used). **Extremely sturdy.**
---
## **4.0 DEPLOYMENT: 4-STEP RE-START**
1. **Code Consistency Check**: Ensure `realtime_exf_service` has `dual_sample=True` enabled in `get_indicators`.
2. **Environment Check**: Active workspace must be in the `- Siloqy` conda environment with `python-flint` available.
3. **Start Prefect Flow**:
```bash
# Execute as a detached daemon or Prefect worker
python "C:\Users\Lenovo\Documents\- DOLPHIN NG HD HCM TSF Predict\prod\exf_fetcher_flow.py"
```
4. **Verification**: Confirm the `exf_latest` Hazelcast map contains both current (`_T`) and structural (`_lagged`) features.
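The step-4 check can be sketched as a small helper over the map's key set. This is a sketch, not the production harness: the hazelcast-python-client call shown in the comment and the `{name}` / `{name}_lagged` key convention are assumptions drawn from this guide.

```python
def missing_lagged(keys):
    """Return dual-sampled base names whose current (T) value is absent,
    given the key set of the exf_latest Hazelcast map."""
    keys = set(keys)
    lagged_bases = {k[: -len("_lagged")] for k in keys if k.endswith("_lagged")}
    current = {k for k in keys if not k.endswith("_lagged")}
    return sorted(lagged_bases - current)

# Keys would normally come from the live map, e.g.:
#   client = hazelcast.HazelcastClient()
#   keys = client.get_map("exf_latest").blocking().key_set()
```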
---
## **5.0 PERFORMANCE VERIFICATION (GOLD)**
The system performance has been re-verified on the canonical 56-day actual data window:
* **Result**: 176.16% ROI / 2155 Trades.
* **Benchmark Script**: `run_unified_gold.py` (leverages `exp_shared` infrastructure).
* **Condition**: Verified using High-Resolution Scan Data.
---
## **6.0 BUG FIXES & API CHANGES (CHANGELOG)**
### **6.1 Deribit API Fix — 2026-03-22**
**Affected file**: `external_factors/realtime_exf_service.py` → `_build_deribit_url()`
**Problem**: A prior agent replaced the Deribit funding URL with `get_funding_rate_value`
(a scalar daily-average endpoint). This returns a value roughly 100x to 10,000x smaller than the
per-8h `interest_8h` snapshot stored in NPZ ground truth, causing ACB (`adaptive_circuit_breaker.py`)
to see near-zero Deribit funding on most days — triggering +0.5 ACB signals Binance wouldn't
fire → excess leverage → D_LIQ_GOLD DD regression (+2.32pp: 17.65% → 19.97%).
**Root cause confirmed via**: `external_factors/test_deribit_api_parity.py --indicators fund`
— 8 anchor dates from gold window; `get_funding_rate_value` fails 8/8, `get_funding_rate_history`
at 23:00 UTC entry passes 8/8 with max_abs_err=0.00 (bit-for-bit match against NPZ ground truth).
**Fix applied**:
- `funding:` URL → `get_funding_rate_history?instrument_name={instrument}&start_timestamp={now-4h}&end_timestamp={now}`
Parser `parse_deribit_fund` already takes `r[-1]['interest_8h']` (last list entry). No parser change needed.
- `dvol:` URL → `get_volatility_index_data` changed from `resolution=60` (1-min, wrong) to `resolution=3600`
(hourly, matches backfill in `external_factors_matrix.py`). Parser `parse_deribit_dvol` unchanged.
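A minimal sketch of the corrected URL construction and parsing, assuming the endpoint names and the 4-hour lookback described above; the actual `_build_deribit_url()` signature in production may differ.

```python
import time

DERIBIT_API = "https://www.deribit.com/api/v2/public"

def build_funding_history_url(instrument="BTC-PERPETUAL", lookback_s=4 * 3600):
    # History endpoint returns per-8h interest_8h snapshots, unlike the
    # scalar daily-average get_funding_rate_value that caused the regression.
    now_ms = int(time.time() * 1000)  # Deribit timestamps are milliseconds
    return (f"{DERIBIT_API}/get_funding_rate_history"
            f"?instrument_name={instrument}"
            f"&start_timestamp={now_ms - lookback_s * 1000}"
            f"&end_timestamp={now_ms}")

def parse_funding(result_list):
    # Mirrors parse_deribit_fund: take the last list entry's interest_8h
    return result_list[-1]["interest_8h"]
```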
**ACBv6 dependency**: `fund_dbt_btc` is a hard dependency of ACBv6 stress computation. Any Deribit API
changes must be parity-tested against NPZ ground truth before deployment.
**Parity test**: `python external_factors/test_deribit_api_parity.py --indicators fund`
All candidates, 8 anchor dates, must show A_history_23utc: PASS 8/8, max_abs=0.00.
---
**Maintainer**: Antigravity
**Operational Mode**: Aggressive (0.5s)
**Staging Status**: VALIDATED & DEPLOYMENT-READY.


@@ -0,0 +1,72 @@
# EXTF SYSTEM PRODUCTIZATION: FINAL DETAILED LOG (AGGRESSIVE MODE 0.5s)
## **1.0 THE CORE MATRIX (85 INDICATORS)**
The ExtF manifold acts as the **Market State Estimation Layer** for the 5-second system scan. It operates symmetrically, ensuring no "Information Starvation" occurs.
### **1.1 The "Functional 25" (ACB/Alpha Engine Critical)**
*These 25 factors are prioritized for maximal uptime and freshness at 0.5s resolution.*
| ID | Factor | Primary Source | Lag Logic | Pulse |
|----|--------|----------------|-----------|-------|
| 104| **Basis** | Binance Futures| **None (Real-time T)** | **0.5s** |
| 75 | **Spread**| Binance Spot | **None (Real-time T)** | **0.5s** |
| 73 | **Imbal** | Binance Spot | **None (Real-time T)** | **0.5s** |
| 01 | **Funding**| Binance/Deribit| **Dual (T + T-24h)** | 5.0m |
| 08 | **DVOL** | Deribit | **Dual (T + T-24h)** | 5.0m |
| 09 | **Taker** | Binance Spot | **None (Real-time T)** | 5.0m |
| 05 | **OI** | Binance Futures| **Dual (T + T-24h)** | 1.0h |
| 11 | **LS Ratio**| Binance Futures| **Dual (T + T-24h)** | 1.0h |
---
## **2.0 SAMPLING & FRESHNESS LOGIC**
### **2.1 Aggressive Oversampling (0.5s Engine Pulse)**
To ensure that the 5-second system scan always has the "freshest possible" information:
* **Engine Update Rate**: **0.5s** (10x system scan resolution).
* **Hazelcast Flush**: **0.5s** (High-intensity synchrony).
* **Result**: Information latency is reduced to <0.5s at the moment of scan.
### **2.2 Dual-Sampling (The Structural Bridge)**
Every slow indicator (Macro, On-chain, Derivatives) provides two concurrent data points:
1. **{name}**: The current value (**T**).
2. **{name}_lagged**: The specific structural anchor value from 24 hours ago (**T-24h**), which was earlier identified as more predictive for long-timescale factors.
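The dual-sampling contract above can be sketched as a small buffer that emits both values; the class and field names here are illustrative, not the production implementation.

```python
from collections import deque

class DualSampler:
    """Keep a 24h ring of (timestamp, value) pairs and emit the current
    value (T) plus the structural T-24h anchor once enough history exists."""
    def __init__(self, horizon_s=24 * 3600):
        self.horizon_s = horizon_s
        self.history = deque()

    def update(self, ts, value):
        self.history.append((ts, value))
        # drop entries older than the horizon, keeping one anchor at/past it
        while len(self.history) > 1 and ts - self.history[1][0] >= self.horizon_s:
            self.history.popleft()
        anchor_ts, anchor_val = self.history[0]
        lagged = anchor_val if ts - anchor_ts >= self.horizon_s else None
        return {"value": value, "value_lagged": lagged}
```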
---
## **3.0 RATE LIMIT REGISTRY (BTC SINGLE-SYMBOL)**
*Current REST weight utilized for 4 indicators at 0.5s pulse.*
| Provider | Base Limit | Current Utilization | Safety Margin |
|----------|------------|----------------------|---------------|
| **Binance Futures** | 1200 / min | 120 (10.0%) | **EXTREME (90.0%)** |
| **Binance Spot** | 1200 / min | 360 (30.0%) | **HIGH (70.0%)** |
| **Deribit** | 10 / 1s | 2 (20.0%) | **HIGH (80.0%)** |
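The table's utilization figures can be sanity-checked with simple arithmetic, assuming weight-1 requests; the per-pulse request counts (1 futures, 3 spot) are inferred from the stated percentages, not taken from the code.

```python
def rest_utilization(reqs_per_pulse, pulse_s, weight, limit_per_min):
    """REST weight consumed per minute and percent of the provider limit."""
    used = reqs_per_pulse * (60.0 / pulse_s) * weight
    return used, 100.0 * used / limit_per_min

# Binance Futures: 1 weight-1 call per 0.5s pulse -> 120/min of 1200 (10%)
# Binance Spot:    3 weight-1 calls per 0.5s pulse -> 360/min of 1200 (30%)
```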
---
## **4.0 BRINGUP PATHS (RE-CAP)**
* **Full Registry**: [realtime_exf_service.py](file:///C:/Users/Lenovo/Documents/-%20DOLPHIN%20NG%20HD%20HCM%20TSF%20Predict/external_factors/realtime_exf_service.py)
* **Scheduler**: [exf_fetcher_flow.py](file:///C:/Users/Lenovo/Documents/-%20DOLPHIN%20NG%20HD%20HCM%20TSF%20Predict/prod/exf_fetcher_flow.py)
* **Deploy Guide**: [EXTF_SYSTEM_BRINGUP_STAGING_GUIDE.md](file:///C:/Users/Lenovo/.gemini/antigravity/brain/becbf49b-71f4-449b-8033-c186223ad48c/EXTF_SYSTEM_BRINGUP_STAGING_GUIDE.md)
---
## **5.0 GOLD BENCHMARK VALIDATION (56-DAY ACTUAL)**
*Confirmed replication of the target 181% performance profile.*
- **ROI (Final)**: **176.16%**
- **Trades**: **2155**
- **Win Rate**: **50.67%**
- **Max DD**: **16.78%**
- **Sharpe Ratio**: **4.82**
### **Validation Context**:
1. **Data Source**: Replicated using `vbt_cache` (High-resolution production scans, 5s-10s).
2. **Strategy**: `D_LIQ_GOLD` (LiquidationGuardEngine, 9x max leverage).
3. **Indicator Feed**: Verified `fund_dbt_btc` (Deribit Priority) was active and instrumental.
4. **Resolution**: Backtesting on 1-minute klines (vbt_cache_klines) is **INVALID** for this benchmark as it reduces trade count to ~300. Use production scans ONLY.
---
**Implementation Status**: PRODUCTIZED (Aggressive Mode). Verified GOLD.
**Authored by**: Antigravity
**Date**: 2026-03-20 15:20:00


@@ -0,0 +1,430 @@
Now, before proceeding: if the new esoteric factor engine is modeled after the established factor engine, it should have fallback and backfilling capabilities. Please backfill all 55 days, in backfill-only mode, and backfill all the esoteric factors. Assuming the data sources are either computable or historically capable of that (fetching the information from other sources where needed), the mainstay can be backfilled. When you're done with that, without altering any of the main algorithms, look for any and all correlations between the output of the esoteric factors and drawdown or run-up, profitability or not of the particular day, and other system variables. That is to say, correlate all against all, looking for any strong confirmations or correlations.
What you're describing is essentially a controlled historical replay plus a full-factor correlation audit, without touching the core production logic. I'll structure this in clean operational terms.
---
1. Esoteric Factor Engine — Full 55-Day Backfill Mode
Assumptions:
Esoteric engine mirrors the established factor engine architecture.
All data sources are either:
Deterministically computable from price/volume history, or
Historically retrievable from external APIs or stored archives.
No modification to factor formulas.
---
A. Backfill Design Principles
Mode: BACKFILL_ONLY_END
This means:
Replay day-by-day from T-55 → T-1.
Compute all esoteric factors exactly as if live.
Write values to historical store.
No strategy decisions executed.
No incremental learning updates.
No state mutation except factor history buffers.
---
B. Fallback Hierarchy
For each esoteric factor:
1. Primary historical source
(official data store / archive)
2. Secondary API historical endpoint
3. Deterministic reconstruction
Recompute from base OHLCV
Reconstruct state from rolling window
4. Synthetic proxy fallback
Only if mathematically derivable
Must be flagged as fallback_level = 3
You log fallback level for each factor/day.
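The hierarchy above can be sketched as an ordered-source lookup that records the fallback level; the function and source signatures are illustrative only.

```python
def fetch_with_fallback(factor, day, sources):
    """Try sources in priority order and return (value, fallback_level).
    Level 0 = primary archive, 1 = secondary API history, 2 = deterministic
    reconstruction from OHLCV, 3 = synthetic proxy (must stay flagged)."""
    for level, source in enumerate(sources):
        try:
            value = source(factor, day)
        except Exception:
            continue  # source down: fall through to the next level
        if value is not None:
            return value, level
    raise LookupError(f"{factor}@{day}: no source succeeded")
```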
---
C. Backfill Procedure
Step 1 — Freeze Production State
Snapshot:
Rolling buffers
Latent embeddings (if any)
Volatility states
Regime states
Step 2 — Initialize Clean Historical Buffers
Clear only:
Esoteric factor buffers
Derived rolling statistics
Keep:
Core algorithm logic intact.
---
Step 3 — Replay Loop (55 days)
For each day d from T-55 to T-1:
1. Load OHLCV
2. Fetch or reconstruct all auxiliary inputs
3. Compute:
All esoteric factors
All intermediate states
Any smoothed or filtered outputs
Any latent representations
4. Append to:
esoteric_factor_store[d]
factor_output_store[d]
system_state_snapshot[d]
No trading logic triggered.
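The replay loop above can be sketched as follows, under the stated constraints (no trading logic, no live-state mutation); all names are illustrative, and the separate `factor_output_store` is folded in for brevity.

```python
def run_backfill(days, load_ohlcv, compute_factors, stores):
    """Replay oldest-first (T-55 .. T-1), computing factors as if live.
    No strategy decisions, no incremental learning; only the historical
    stores are appended to."""
    for d in days:
        ohlcv = load_ohlcv(d)                       # step 1
        factors, state = compute_factors(d, ohlcv)  # steps 2-3 (aux inputs inside)
        stores["esoteric_factor_store"][d] = factors    # step 4
        stores["system_state_snapshot"][d] = state
```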
---
Step 4 — Integrity Check
After replay:
Check no NaNs
Check window completeness
Check rolling windows properly warm
Confirm fallback levels < threshold (ideally mostly 0/1)
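A sketch of the post-replay integrity check, assuming each stored entry carries a value and its fallback level (field names are illustrative).

```python
import numpy as np

def integrity_report(store, expected_days, warmup=0, max_fallback_frac=0.1):
    """Post-replay checks: completeness, no NaNs past warmup, and fallback
    levels mostly 0/1 (fraction at level >= 2 capped at max_fallback_frac)."""
    days = sorted(store)
    values = np.array([store[d]["value"] for d in days], dtype=float)
    levels = np.array([store[d]["fallback_level"] for d in days])
    return {
        "complete": days == sorted(expected_days),
        "no_nans": not np.isnan(values[warmup:]).any(),
        "fallback_ok": float((levels >= 2).mean()) <= max_fallback_frac,
    }
```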
---
2. Correlation Audit: All Against All
Now comes the analytical part.
We compute correlations between:
A. Target Variables
Per day:
Max intraday drawdown
Max intraday run-up
Close-to-close return
Strategy P&L
Win/Loss (binary)
Profitability magnitude
Volatility
Spread
Liquidity proxy
Regime label
Latent manifold distortion (if present)
Drift velocity (if you implemented that earlier idea)
---
B. Factor Groups
1. Core factors
2. Esoteric factors
3. Combined outputs
4. Latent embeddings (if available)
5. Signal strength
6. Conviction metrics
7. Confidence weighting
8. Any internal risk throttles
---
C. Correlation Types to Compute
You want more than Pearson.
Compute:
1. Pearson correlation
2. Spearman rank correlation
3. Kendall tau
4. Mutual information
5. Distance correlation
6. Rolling correlation (7, 14, 30 days)
7. Lagged correlation (±1, ±2, ±3 days)
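Pearson, Spearman, and lagged correlation can be sketched with NumPy alone; mutual information and distance correlation need extra libraries (e.g. scikit-learn, dcor) and are omitted here.

```python
import numpy as np

def pearson(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    # rank-transform then Pearson; ties are broken arbitrarily here,
    # scipy.stats.spearmanr handles them properly
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return pearson(rank(x), rank(y))

def lagged_corr(factor, target, lag):
    """corr(factor[t], target[t + lag]); positive lag tests predictive lead."""
    factor, target = np.asarray(factor, float), np.asarray(target, float)
    if lag > 0:
        return pearson(factor[:-lag], target[lag:])
    if lag < 0:
        return pearson(factor[-lag:], target[:lag])
    return pearson(factor, target)
```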
---
D. Binary Outcome Testing
For profitability:
Logistic regression coefficients
Point-biserial correlation
Information coefficient (IC)
t-stat significance
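Point-biserial correlation and its t-statistic can be computed directly; this sketch assumes nonzero variance and that both win and loss days are present.

```python
import math

def point_biserial(values, wins):
    """r_pb between a continuous factor and binary win/loss, plus the
    t-statistic r * sqrt(n-2) / sqrt(1 - r^2)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population sd
    v1 = [v for v, w in zip(values, wins) if w]
    v0 = [v for v, w in zip(values, wins) if not w]
    p = len(v1) / n
    r = (sum(v1) / len(v1) - sum(v0) / len(v0)) / sd * math.sqrt(p * (1 - p))
    t = r * math.sqrt((n - 2) / max(1e-12, 1.0 - r * r))
    return r, t
```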
---
E. Cross-Correlation Matrix
You compute:
corr_matrix = corr(all_factors, all_targets)
Then:
Extract |corr| > 0.6
Flag p < 0.05
Flag stable correlations across rolling windows
Flag correlations that persist across fallback levels
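The extraction step can be sketched as a nested scan over factor/target series; p-value and rolling-stability flags would be layered on top.

```python
import numpy as np

def strong_pairs(factors, targets, threshold=0.6):
    """factors/targets: dicts name -> 1-D series. Returns (factor, target,
    corr) tuples with |corr| above threshold, strongest first."""
    out = []
    for fname, fvals in factors.items():
        for tname, tvals in targets.items():
            c = float(np.corrcoef(fvals, tvals)[0, 1])
            if abs(c) > threshold:
                out.append((fname, tname, c))
    return sorted(out, key=lambda row: -abs(row[2]))
```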
---
3. What You're Actually Looking For
This audit answers:
1. Do esoteric factors anticipate drawdown?
2. Do they amplify run-up?
3. Are they redundant with core factors?
4. Are they orthogonal alpha?
5. Do they correlate only in stressed regimes?
6. Do they degrade performance in low-vol days?
7. Do latent geometry distortions align with profitability?
---
4. Interpretation Layer
You classify correlations into:
Strong Confirmations
Stable across windows
Significant
Not present in core-only model
Conditional Correlations
Appear only in high volatility
Appear only in drawdown clusters
Spurious / Structural
Correlate due to shared base data
Dangerous
Correlate negatively with profitability
Increase drawdown magnitude
---
5. Deliverables You Should Generate
1. Heatmap of full correlation matrix
2. Ranked factor impact table
3. Stability score per factor
4. Redundancy map (clustered)
5. Regime-conditional breakdown
6. Factor drawdown predictive ranking
7. Factor run-up predictive ranking
---
6. Critical Warning
Do NOT:
Change algorithm weights.
Remove factors.
Normalize differently.
Retrain anything.
This is purely diagnostic.
---
7. What This Tells You Strategically
If strong correlation emerges between:
Esoteric manifold distortion and drawdown
→ you've built a stress sensor.
If strong correlation emerges between:
Drift velocity and next-day profitability
→ you have regime anticipation.
If esoteric factors are mostly redundant
→ compress the engine.
If orthogonal and stable
→ you've added real signal depth.

external_factors/__init__.py Executable file

@@ -0,0 +1 @@
# External Factors Package Interface


@@ -0,0 +1,181 @@
"""DOLPHIN ExF Backfill for Klines Dates
=========================================
Writes ExF Indicators NPZ files for all 1,710 klines parquet dates so that
ACBv6 can read funding_btc, dvol_btc, fng, and taker for those dates.
Problem:
backfill_runner.py reads NG3 JSON scan directories to get timestamps.
Klines dates (2021-2026) have no NG3 JSON scans → ACBv6 _load_external_factors()
returns neutral defaults → boost=1.0 always → inverse-boost component is dead.
Solution:
For each klines date, call ExternalFactorsFetcher.fetch_sync(target_date=noon_UTC)
and write a minimal NPZ to EIGENVALUES_PATH/YYYY-MM-DD/scan_000001__Indicators.npz
in the exact format ACBv6 expects: api_names + api_indicators + api_success.
Output format (ACBv6 compatible):
data['api_names'] : np.array of indicator name strings (N_INDICATORS)
data['api_indicators'] : np.float64 array of values (N_INDICATORS)
data['api_success'] : np.bool_ array (N_INDICATORS)
Idempotent: skips dates where the NPZ already exists.
Rate-limited: configurable delay between dates (default 1.0s).
Usage:
cd "C:\\Users\\Lenovo\\Documents\\- DOLPHIN NG HD HCM TSF Predict\\external_factors"
"C:\\Users\\Lenovo\\Documents\\- Siloqy\\Scripts\\python.exe" backfill_klines_exf.py
"C:\\Users\\Lenovo\\Documents\\- Siloqy\\Scripts\\python.exe" backfill_klines_exf.py --dry-run
"C:\\Users\\Lenovo\\Documents\\- Siloqy\\Scripts\\python.exe" backfill_klines_exf.py --start 2022-01-01 --end 2022-12-31
Expected runtime: 2-5 hours for all 1710 dates (network-dependent).
Most of the value (funding_btc, dvol_btc, fng, taker) comes from a few API calls
per date. CURRENT-only indicators will fail gracefully (api_success=False, value=0).
"""
import sys, time, argparse, asyncio
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
from pathlib import Path
from datetime import datetime, timezone
import numpy as np
# -- Paths --
import sys as _sys
HCM_DIR = Path(__file__).parent.parent if _sys.platform == 'win32' else Path('/mnt/dolphin')
KLINES_DIR = HCM_DIR / "vbt_cache_klines"
EIGENVALUES_PATH = (Path(r"C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues")
if _sys.platform == 'win32' else Path('/mnt/ng6_data/eigenvalues'))
NPZ_FILENAME = "scan_000001__Indicators.npz" # single synthetic scan per date
sys.path.insert(0, str(Path(__file__).parent))
def parse_args():
p = argparse.ArgumentParser(description="Backfill ExF NPZ files for klines dates")
p.add_argument("--start", default=None, help="Start date YYYY-MM-DD (inclusive)")
p.add_argument("--end", default=None, help="End date YYYY-MM-DD (inclusive)")
p.add_argument("--dry-run", action="store_true", help="Print what would be done, skip writes")
p.add_argument("--delay", type=float, default=1.0, help="Seconds between date fetches (default 1.0)")
p.add_argument("--overwrite",action="store_true", help="Re-fetch and overwrite existing NPZ files")
return p.parse_args()
def main():
args = parse_args()
# Import ExF infrastructure
from external_factors_matrix import ExternalFactorsFetcher, Config, INDICATORS, N_INDICATORS
# Build ordered name list (matches matrix index: names[i] = INDICATORS[i].name)
ind_names = np.array([ind.name for ind in INDICATORS], dtype=object)
fetcher = ExternalFactorsFetcher(Config())
# Enumerate klines dates
parquet_files = sorted(KLINES_DIR.glob("*.parquet"))
parquet_files = [p for p in parquet_files if 'catalog' not in str(p)]
date_strings = [p.stem for p in parquet_files]
# Filter by --start / --end
if args.start:
date_strings = [d for d in date_strings if d >= args.start]
if args.end:
date_strings = [d for d in date_strings if d <= args.end]
total = len(date_strings)
print(f"Klines dates to process: {total}")
print(f"EIGENVALUES_PATH: {EIGENVALUES_PATH}")
print(f"Dry run: {args.dry_run} Overwrite: {args.overwrite} Delay: {args.delay}s\n")
if args.dry_run:
print("DRY RUN — no files will be written.\n")
skipped = 0
written = 0
errors = 0
t0 = time.time()
for i, ds in enumerate(date_strings):
out_dir = EIGENVALUES_PATH / ds
out_npz = out_dir / NPZ_FILENAME
# Skip if exists and not overwriting
if out_npz.exists() and not args.overwrite:
skipped += 1
continue
# Fetch at noon UTC for this date
try:
yr, mo, dy = int(ds[:4]), int(ds[5:7]), int(ds[8:10])
target_dt = datetime(yr, mo, dy, 12, 0, 0, tzinfo=timezone.utc)
except ValueError:
print(f" [{i+1}/{total}] {ds}: BAD DATE FORMAT — skip")
errors += 1
continue
if args.dry_run:
print(f" [{i+1}/{total}] {ds}: would fetch {target_dt.isoformat()} → {out_npz}")
written += 1
continue
try:
result = fetcher.fetch_sync(target_date=target_dt)
except Exception as e:
print(f" [{i+1}/{total}] {ds}: FETCH ERROR — {e}")
errors += 1
time.sleep(args.delay)
continue
# Build NPZ arrays in ACBv6-compatible format
matrix = result['matrix'] # np.float64 array, 0-indexed (matrix[id-1])
details = result['details'] # {id: {'name': ..., 'value': ..., 'success': bool}}
api_indicators = matrix.astype(np.float64)
api_success = np.array(
[details.get(i+1, {}).get('success', False) for i in range(N_INDICATORS)],
dtype=np.bool_
)
success_count = result.get('success_count', int(api_success.sum()))
# Write NPZ
out_dir.mkdir(parents=True, exist_ok=True)
np.savez_compressed(
str(out_npz),
api_names = ind_names,
api_indicators = api_indicators,
api_success = api_success,
)
written += 1
# Progress every 10 dates
if (i + 1) % 10 == 0:
elapsed = time.time() - t0
rate = written / elapsed if elapsed > 0 else 1
eta = (total - i - 1) / rate if rate > 0 else 0
print(f" [{i+1}/{total}] {ds} ok={success_count}/{N_INDICATORS}"
f" elapsed={elapsed/60:.1f}m eta={eta/60:.1f}m"
f" written={written} skipped={skipped} errors={errors}")
else:
# Brief per-date confirmation
key_vals = {
'funding': round(float(api_indicators[0]), 6), # id=1 → idx 0
'dvol': round(float(api_indicators[10]), 2), # id=11 → idx 10
}
print(f" {ds} ok={success_count} funding={key_vals['funding']:+.4f} dvol={key_vals['dvol']:.1f}")
time.sleep(args.delay)
elapsed_total = time.time() - t0
print(f"\n{'='*60}")
print(f" ExF Klines Backfill COMPLETE")
print(f" Written: {written}")
print(f" Skipped: {skipped} (already existed)")
print(f" Errors: {errors}")
print(f" Runtime: {elapsed_total/60:.1f}m")
print(f"{'='*60}")
if written > 0 and not args.dry_run:
print(f"\n ACBv6 will now find ExF data for klines dates.")
print(f" Re-run test_pf_5y_klines.py to get the full-boost ACBv6 results.")
if __name__ == "__main__":
main()


@@ -0,0 +1,342 @@
"""
backfill_liquidations_exf.py — Backfill liquidation ExF channels for 5y klines dates.
Fetches aggregate BTC liquidation data from Coinglass historical API and appends
4 new channels (liq_vol_24h, liq_long_ratio, liq_z_score, liq_percentile) to the
existing scan_000001__Indicators.npz files under EIGENVALUES_PATH.
Usage (from external_factors/ dir):
python backfill_liquidations_exf.py
python backfill_liquidations_exf.py --dry-run
python backfill_liquidations_exf.py --start 2023-01-01 --end 2023-12-31
python backfill_liquidations_exf.py --mode standalone
Output: each NPZ gains 4 new channels. Log → ../../backfill_liquidations.log
"""
import sys
import time
import argparse
import asyncio
import math
import logging
from pathlib import Path
from datetime import datetime, timezone
import numpy as np
import aiohttp
# --- Paths (same as backfill_klines_exf.py) ---
HCM_DIR = Path(__file__).parent.parent
KLINES_DIR = HCM_DIR / "vbt_cache_klines"
EIGENVALUES_PATH = Path(
r"C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues"
)
NPZ_FILENAME = "scan_000001__Indicators.npz"
LIQ_NPZ_FILENAME = "scan_000001__Liq_Indicators.npz" # for --mode standalone
LOG_PATH = HCM_DIR / "backfill_liquidations.log"
LIQ_KEYS = ["liq_vol_24h", "liq_long_ratio", "liq_z_score", "liq_percentile"]
# --- Coinglass endpoint ---
# Coinglass API v4 requires CG-API-KEY header
CG_URL_V4 = "https://open-api-v4.coinglass.com/api/futures/liquidation/aggregated-history"
RATE_DELAY = 2.0 # seconds between requests
# Configure logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)s %(message)s",
handlers=[
logging.FileHandler(str(LOG_PATH), encoding="utf-8"),
logging.StreamHandler(sys.stdout),
],
)
log = logging.getLogger(__name__)
def parse_args():
p = argparse.ArgumentParser(description="Backfill liquidation ExF channels")
p.add_argument(
"--start", default=None, help="Start date YYYY-MM-DD (inclusive)"
)
p.add_argument("--end", default=None, help="End date YYYY-MM-DD (inclusive)")
p.add_argument("--dry-run", action="store_true")
p.add_argument("--delay", type=float, default=2.0)
p.add_argument("--overwrite", action="store_true")
p.add_argument("--mode", default="append", choices=["append", "standalone"])
p.add_argument("--api-key", default=None, help="Coinglass API key (or set COINGLASS_API_KEY env var)")
return p.parse_args()
def get_api_key(args) -> str:
"""Get Coinglass API key from args or environment."""
import os
key = args.api_key or os.environ.get("COINGLASS_API_KEY", "")
return key
async def fetch_coinglass_day(
session: aiohttp.ClientSession, ds: str, api_key: str
) -> tuple:
"""
Fetch liquidation bars for date string 'YYYY-MM-DD'.
Returns (liq_vol_log, liq_long_ratio, success: bool).
Uses Coinglass API v4 which requires CG-API-KEY header.
"""
if not api_key:
log.error(f" {ds}: No Coinglass API key provided")
return (0.0, 0.5, False)
# Coinglass v4 uses different time format (Unix seconds, not ms)
yr, mo, dy = int(ds[:4]), int(ds[5:7]), int(ds[8:10])
start_ts = int(datetime(yr, mo, dy, 0, 0, 0, tzinfo=timezone.utc).timestamp())
end_ts = int(datetime(yr, mo, dy, 23, 59, 59, tzinfo=timezone.utc).timestamp())
# v4 API params - uses 'startTime' and 'endTime' in seconds
params = {
"symbol": "BTC",
"interval": "1h",
"startTime": start_ts,
"endTime": end_ts,
}
headers = {
"CG-API-KEY": api_key,
"Accept": "application/json",
}
for attempt in range(3):
try:
async with session.get(
CG_URL_V4,
params=params,
headers=headers,
timeout=aiohttp.ClientTimeout(total=15),
) as resp:
if resp.status == 429:
log.warning(f" {ds}: rate limited (429) — sleeping 30s")
await asyncio.sleep(30)
continue
if resp.status == 403:
log.error(f" {ds}: HTTP 403 - Invalid or missing API key")
return (0.0, 0.5, False)
if resp.status != 200:
log.warning(f" {ds}: HTTP {resp.status}")
return (0.0, 0.5, False)
data = await resp.json(content_type=None)
# Parse v4 response
# Response: {"code":"0","msg":"success","data": [{"t":1234567890, "longLiquidationUsd":123.0, "shortLiquidationUsd":456.0}, ...]}
if data.get("code") != "0":
log.warning(f" {ds}: API error: {data.get('msg', 'unknown')}")
return (0.0, 0.5, False)
bars = data.get("data", [])
if not bars:
log.warning(f" {ds}: empty liquidation data")
return (0.0, 0.5, False)
long_total = sum(float(b.get("longLiquidationUsd", 0)) for b in bars)
short_total = sum(float(b.get("shortLiquidationUsd", 0)) for b in bars)
total = long_total + short_total
liq_vol_log = math.log10(total + 1.0)
liq_long_ratio = (long_total / total) if total > 0 else 0.5
return (liq_vol_log, liq_long_ratio, True)
except asyncio.TimeoutError:
log.warning(f" {ds}: timeout (attempt {attempt+1}/3)")
await asyncio.sleep(10)
except Exception as e:
log.warning(f" {ds}: error {e} (attempt {attempt+1}/3)")
await asyncio.sleep(10)
return (0.0, 0.5, False)
def compute_derived_metrics(dates, raw_vols, raw_success):
"""Compute z_score and percentile across full series."""
dates_sorted = sorted(dates)
vols = np.array([raw_vols.get(d, 0.0) for d in dates_sorted])
success = np.array([raw_success.get(d, False) for d in dates_sorted])
z_scores = {}
percentiles = {}
WINDOW = 30
for i, ds in enumerate(dates_sorted):
if not success[i]:
z_scores[ds] = (0.0, False)
percentiles[ds] = (0.5, False)
continue
# z_score vs 30d rolling window
start = max(0, i - WINDOW)
w_vals = vols[start:i][success[start:i]]
if len(w_vals) >= 5:
z = float((vols[i] - w_vals.mean()) / (w_vals.std() + 1e-8))
z_scores[ds] = (z, True)
else:
z_scores[ds] = (0.0, False)
# percentile vs full history to date
hist = vols[: i + 1][success[: i + 1]]
if len(hist) >= 10:
pct = float((hist < vols[i]).sum()) / len(hist)
percentiles[ds] = (pct, True)
else:
percentiles[ds] = (0.5, False)
return z_scores, percentiles
def append_liq_to_npz(npz_path, liq_values, overwrite, dry_run):
"""Append 4 liq channels to existing NPZ. liq_values = {key: (float, bool)}."""
if not npz_path.exists():
# Create minimal NPZ (rare case)
names = np.array(LIQ_KEYS, dtype=object)
inds = np.array([liq_values[k][0] for k in LIQ_KEYS], dtype=np.float64)
succ = np.array([liq_values[k][1] for k in LIQ_KEYS], dtype=np.bool_)
else:
data = np.load(str(npz_path), allow_pickle=True)
existing_names = [str(n) for n in data["api_names"]]
if "liq_vol_24h" in existing_names and not overwrite:
return False # idempotent skip
# Strip old liq channels if overwriting
if overwrite and "liq_vol_24h" in existing_names:
keep = [
i
for i, n in enumerate(existing_names)
if not n.startswith("liq_")
]
existing_names = [existing_names[i] for i in keep]
ex_inds = data["api_indicators"][keep]
ex_succ = data["api_success"][keep]
else:
ex_inds = data["api_indicators"]
ex_succ = data["api_success"]
names = np.array(existing_names + LIQ_KEYS, dtype=object)
inds = np.concatenate(
[
ex_inds.astype(np.float64),
np.array([liq_values[k][0] for k in LIQ_KEYS], dtype=np.float64),
]
)
succ = np.concatenate(
[
ex_succ.astype(np.bool_),
np.array([liq_values[k][1] for k in LIQ_KEYS], dtype=np.bool_),
]
)
if not dry_run:
np.savez_compressed(
str(npz_path), api_names=names, api_indicators=inds, api_success=succ
)
return True
async def main_async(args):
# Enumerate klines dates
parquet_files = sorted(KLINES_DIR.glob("*.parquet"))
parquet_files = [p for p in parquet_files if "catalog" not in str(p)]
dates = [p.stem for p in parquet_files]
if args.start:
dates = [d for d in dates if d >= args.start]
if args.end:
dates = [d for d in dates if d <= args.end]
total = len(dates)
log.info(f"Dates to process: {total}")
log.info(f"Mode: {args.mode} Dry-run: {args.dry_run} Overwrite: {args.overwrite}")
raw_vols = {}
raw_ratios = {}
raw_success = {}
# Get API key
api_key = get_api_key(args)
if not api_key:
log.warning("No Coinglass API key provided! Use --api-key or set COINGLASS_API_KEY env var.")
log.warning("Get a free API key at: https://www.coinglass.com/pricing")
# Phase 1: Fetch raw data from Coinglass
log.info("=== PHASE 1: Fetching Coinglass liquidation data ===")
t0 = time.time()
async with aiohttp.ClientSession() as session:
for i, ds in enumerate(sorted(dates)):
vol, ratio, ok = await fetch_coinglass_day(session, ds, api_key)
raw_vols[ds] = vol
raw_ratios[ds] = ratio
raw_success[ds] = ok
if (i + 1) % 10 == 0:
elapsed = time.time() - t0
eta = (total - i - 1) * args.delay
log.info(
f" [{i+1}/{total}] {ds} vol={vol:.3f} ratio={ratio:.3f} ok={ok}"
f" elapsed={elapsed/60:.1f}m eta={eta/60:.1f}m"
)
else:
log.info(f" {ds} vol={vol:.3f} ratio={ratio:.3f} ok={ok}")
await asyncio.sleep(args.delay)
# Phase 2: Compute derived metrics
log.info("=== PHASE 2: Computing z_score and percentile ===")
z_scores, percentiles = compute_derived_metrics(dates, raw_vols, raw_success)
# Phase 3: Append to NPZ files
log.info(f"=== PHASE 3: Appending to NPZ files (mode={args.mode}) ===")
written = skipped = errors = 0
for ds in sorted(dates):
liq_values = {
"liq_vol_24h": (raw_vols.get(ds, 0.0), raw_success.get(ds, False)),
"liq_long_ratio": (raw_ratios.get(ds, 0.5), raw_success.get(ds, False)),
"liq_z_score": z_scores.get(ds, (0.0, False)),
"liq_percentile": percentiles.get(ds, (0.5, False)),
}
out_dir = EIGENVALUES_PATH / ds
if args.mode == "append":
npz_path = out_dir / NPZ_FILENAME
else: # standalone
npz_path = out_dir / LIQ_NPZ_FILENAME
out_dir.mkdir(parents=True, exist_ok=True)
try:
did_write = append_liq_to_npz(npz_path, liq_values, args.overwrite, args.dry_run)
if did_write:
written += 1
log.debug(f" {ds}: written")
else:
skipped += 1
except Exception as e:
log.error(f" {ds}: NPZ write error — {e}")
errors += 1
elapsed_total = time.time() - t0
log.info(f"{'='*60}")
log.info(f"Liquidation ExF Backfill COMPLETE")
log.info(f"Written: {written}")
log.info(f"Skipped: {skipped} (already had liq channels)")
log.info(f"Errors: {errors}")
log.info(f"Runtime: {elapsed_total/60:.1f}m")
log.info(f"{'='*60}")
def main():
args = parse_args()
asyncio.run(main_async(args))
if __name__ == "__main__":
main()


@@ -0,0 +1,398 @@
"""ExF NPZ Patcher — Supplemental Historical Backfill
======================================================
The initial backfill got ~41/85 indicators. This script patches the existing
NPZ files with real historical values for indicators that were failing:
Priority 1 — fng (Alternative.me): one API call returns 2000+ days. EASY.
Priority 2 — oi_btc/eth, ls_btc/eth, ls_top, taker (Binance hist endpoints)
Priority 3 — vix, sp500, gold, dxy, us10y, ycurve, fedfunds (FRED — needs key)
Priority 4 — mvrv, nvt, addr_btc (CoinMetrics community API)
Strategy: load each NPZ, replace failing indicator values with fetched historical
data, re-save. Idempotent — re-run any time.
Usage:
python backfill_patch_npz.py # patch all dates
python backfill_patch_npz.py --dry-run # show what would change
python backfill_patch_npz.py --fred-key YOUR_KEY_HERE # enable FRED
python backfill_patch_npz.py --skip-binance # skip Binance OI/LS/taker
"""
import sys, time, argparse, json
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
from pathlib import Path
from datetime import datetime, timezone, date, timedelta
import numpy as np
try:
import requests
HAS_REQUESTS = True
except ImportError:
HAS_REQUESTS = False
print("WARNING: requests not installed. Install with: pip install requests")
EIGENVALUES_PATH = (Path(r"C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues")
if sys.platform == 'win32' else Path('/mnt/ng6_data/eigenvalues'))
KLINES_DIR = (Path(r"C:\Users\Lenovo\Documents\- DOLPHIN NG HD HCM TSF Predict\vbt_cache_klines")
if sys.platform == 'win32' else Path('/mnt/dolphin/vbt_cache_klines'))
NPZ_FILENAME = "scan_000001__Indicators.npz"
REQUEST_TIMEOUT = 20
def parse_args():
p = argparse.ArgumentParser()
p.add_argument("--dry-run", action="store_true")
p.add_argument("--fred-key", default="", help="FRED API key (free: fred.stlouisfed.org)")
p.add_argument("--skip-binance", action="store_true")
p.add_argument("--skip-fred", action="store_true")
p.add_argument("--skip-fng", action="store_true")
p.add_argument("--start", default=None, help="Start date YYYY-MM-DD")
p.add_argument("--end", default=None, help="End date YYYY-MM-DD")
return p.parse_args()
# ── FNG (Alternative.me) — one call, all history ─────────────────────────────
def fetch_fng_history():
"""Returns dict: date_str -> fng_value (int)."""
url = "https://api.alternative.me/fng/?limit=2000&format=json&date_format=us"
try:
r = requests.get(url, timeout=REQUEST_TIMEOUT)
r.raise_for_status()
data = r.json()
result = {}
for entry in data.get('data', []):
# date_format=us gives MM/DD/YYYY, but the API has also returned
# epoch-second timestamps, so try several formats below
ts_str = str(entry.get('timestamp', ''))
parsed = False
parsed = False
for fmt in ('%m-%d-%Y', '%m/%d/%Y', '%Y-%m-%d'):
try:
dt = datetime.strptime(ts_str, fmt)
key = dt.strftime('%Y-%m-%d')
result[key] = int(entry['value'])
parsed = True
break
except ValueError:
pass
if not parsed:
try:
ts = int(ts_str)
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
key = dt.strftime('%Y-%m-%d')
result[key] = int(entry['value'])
except Exception:
pass
return result
except Exception as e:
print(f" FNG fetch failed: {e}")
return {}
# ── Binance historical OI / LS / taker ───────────────────────────────────────
def fetch_binance_hist(url_template, symbol, date_str):
"""Fetch a single data point from Binance hist endpoint for given date (noon UTC)."""
yr, mo, dy = int(date_str[:4]), int(date_str[5:7]), int(date_str[8:10])
noon_utc = datetime(yr, mo, dy, 12, 0, 0, tzinfo=timezone.utc)
start_ms = int(noon_utc.timestamp() * 1000)
end_ms = start_ms + 3_600_000 # +1 hour window
url = url_template.format(SYMBOL=symbol, start_ms=start_ms, end_ms=end_ms)
try:
r = requests.get(url, timeout=REQUEST_TIMEOUT)
if r.status_code == 400:
return None # data too old for this endpoint
r.raise_for_status()
data = r.json()
if isinstance(data, list) and len(data) > 0:
return data[0]
return None
except Exception:
return None
OI_URL = "https://fapi.binance.com/futures/data/openInterestHist?symbol={SYMBOL}&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1"
LS_URL = "https://fapi.binance.com/futures/data/globalLongShortAccountRatio?symbol={SYMBOL}&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1"
LS_TOP = "https://fapi.binance.com/futures/data/topLongShortAccountRatio?symbol={SYMBOL}&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1"
TAKER_URL = "https://fapi.binance.com/futures/data/takerlongshortRatio?symbol={SYMBOL}&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1"
def get_binance_indicators(date_str):
"""Returns dict of indicator_name -> value (or None on failure)."""
results = {}
for name, url, sym, field in [
('oi_btc', OI_URL, 'BTCUSDT', 'sumOpenInterest'),
('oi_eth', OI_URL, 'ETHUSDT', 'sumOpenInterest'),
('ls_btc', LS_URL, 'BTCUSDT', 'longShortRatio'),
('ls_eth', LS_URL, 'ETHUSDT', 'longShortRatio'),
('ls_top', LS_TOP, 'BTCUSDT', 'longShortRatio'),
('taker', TAKER_URL,'BTCUSDT', 'buySellRatio'),
]:
rec = fetch_binance_hist(url, sym, date_str)
if rec is not None and field in rec:
try:
results[name] = float(rec[field])
except (TypeError, ValueError):
results[name] = None
else:
results[name] = None
time.sleep(0.05) # light rate limiting
return results
# ── FRED ─────────────────────────────────────────────────────────────────────
FRED_SERIES = {
'vix': 'VIXCLS',
'sp500': 'SP500',
'gold': 'GOLDAMGBD228NLBM',
'dxy': 'DTWEXBGS',
'us10y': 'DGS10',
'us2y': 'DGS2',
'ycurve': 'T10Y2Y',
'fedfunds': 'DFF',
'hy_spread': 'BAMLH0A0HYM2',
'be5y': 'T5YIE',
'm2': 'WM2NS',
}
_fred_cache = {} # series_id -> {date_str -> value}
def fetch_fred_series(series_id, fred_key, lookback_years=6):
"""Fetch a FRED series for the last 6 years. Cached."""
if series_id in _fred_cache:
return _fred_cache[series_id]
start = (date.today() - timedelta(days=lookback_years*366)).strftime('%Y-%m-%d')
url = (f"https://api.stlouisfed.org/fred/series/observations"
f"?series_id={series_id}&api_key={fred_key}&file_type=json"
f"&observation_start={start}")
try:
r = requests.get(url, timeout=REQUEST_TIMEOUT)
r.raise_for_status()
data = r.json()
result = {}
prev = None
for obs in data.get('observations', []):
v = obs.get('value', '.')
if v not in ('.', '', 'nd'):
try:
prev = float(v)
except ValueError:
pass
if prev is not None:
result[obs['date']] = prev # forward-fill
_fred_cache[series_id] = result
return result
except Exception as e:
print(f" FRED {series_id} failed: {e}")
_fred_cache[series_id] = {}
return {}
def get_fred_indicators(date_str, fred_key):
results = {}
for name, series_id in FRED_SERIES.items():
series = fetch_fred_series(series_id, fred_key)
# Find value on or before date (forward-fill)
val = None
for d_str in sorted(series.keys(), reverse=True):
if d_str <= date_str:
val = series[d_str]
break
results[name] = val
return results
# ── CoinMetrics community ─────────────────────────────────────────────────────
_cm_cache = {} # (asset, metric) -> {date_str -> value}
def fetch_coinmetrics(asset, metric, date_str):
key = (asset, metric)
if key not in _cm_cache:
url = (f"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics"
f"?assets={asset}&metrics={metric}&frequency=1d"
f"&start_time=2021-01-01T00:00:00Z")
try:
r = requests.get(url, timeout=30)
r.raise_for_status()
data = r.json()
result = {}
for row in data.get('data', []):
d = row.get('time', '')[:10]
v = row.get(metric)
if v is not None:
try:
result[d] = float(v)
except (TypeError, ValueError):
pass
_cm_cache[key] = result
except Exception as e:
print(f" CoinMetrics {asset}/{metric} failed: {e}")
_cm_cache[key] = {}
cache = _cm_cache.get(key, {})
return cache.get(date_str)
CM_INDICATORS = [
# Only include metrics confirmed as accessible on community API
('mvrv', 'btc', 'CapMVRVCur'), # works (200 OK)
('addr_btc', 'btc', 'AdrActCnt'), # works
('txcnt', 'btc', 'TxCnt'), # works
]
# ── Main patcher ──────────────────────────────────────────────────────────────
def patch_npz(npz_path, updates, dry_run=False):
"""Load NPZ, apply updates dict {name -> value}, save in-place."""
data = np.load(str(npz_path), allow_pickle=True)
names = list(data['api_names'])
vals = data['api_indicators'].copy()
success = data['api_success'].copy()
changed = []
for name, value in updates.items():
if value is None or not np.isfinite(float(value)):
continue
if name not in names:
continue
idx = names.index(name)
old = float(vals[idx])
old_ok = bool(success[idx])
new_val = float(value)
if not old_ok or abs(old - new_val) > 1e-9:
vals[idx] = new_val
success[idx] = True
changed.append(f"{name}: {old:.4f}{new_val:.4f}")
if not changed:
return 0
if not dry_run:
ind_names = np.array(names, dtype=object)
np.savez_compressed(
str(npz_path),
api_names = ind_names,
api_indicators = vals,
api_success = success,
)
return len(changed)
def main():
args = parse_args()
if not HAS_REQUESTS:
print("ERROR: requests required. pip install requests"); return
# Enumerate dates
dates = sorted(p.stem for p in KLINES_DIR.glob("*.parquet") if 'catalog' not in p.name)
if args.start: dates = [d for d in dates if d >= args.start]
if args.end: dates = [d for d in dates if d <= args.end]
total = len(dates)
print(f"Dates to patch: {total}")
print(f"Dry run: {args.dry_run}")
print(f"FNG: {'skip' if args.skip_fng else 'YES'}")
print(f"Binance: {'skip' if args.skip_binance else 'YES'}")
print(f"FRED: {'skip (no key)' if (args.skip_fred or not args.fred_key) else f'YES (key={args.fred_key[:6]}...)'}")
print()
# ── Fetch FNG all-history up front (one call) ─────────────────────────────
fng_hist = {}
if not args.skip_fng:
print("Fetching FNG full history (one call)...")
fng_hist = fetch_fng_history()
print(f" Got {len(fng_hist)} dates "
f"range={min(fng_hist) if fng_hist else 'n/a'}{max(fng_hist) if fng_hist else 'n/a'}")
if fng_hist:
sample = {k: v for k, v in list(fng_hist.items())[:3]}
print(f" Sample: {sample}")
# ── Fetch FRED all-series up front ───────────────────────────────────────
if args.fred_key and not args.skip_fred:
print(f"\nPre-fetching FRED series ({len(FRED_SERIES)} series)...")
for name, sid in FRED_SERIES.items():
series = fetch_fred_series(sid, args.fred_key)
print(f" {name:<12} ({sid}): {len(series)} observations")
time.sleep(0.6) # FRED rate limit: 120/min
# ── Fetch CoinMetrics up front ────────────────────────────────────────────
print(f"\nPre-fetching CoinMetrics ({len(CM_INDICATORS)} metrics)...")
for cm_name, asset, metric in CM_INDICATORS:
fetch_coinmetrics(asset, metric, '2023-01-01') # warms cache for all dates
n = len(_cm_cache.get((asset, metric), {}))
print(f" {cm_name:<12}: {n} dates")
time.sleep(0.8)
# ── Per-date loop ─────────────────────────────────────────────────────────
print(f"\nPatching NPZ files...")
total_changed = 0
binance_fail_streak = 0
t0 = time.time()
for i, ds in enumerate(dates):
npz_path = EIGENVALUES_PATH / ds / NPZ_FILENAME
if not npz_path.exists():
continue
updates = {}
# FNG
if not args.skip_fng and ds in fng_hist:
updates['fng'] = float(fng_hist[ds])
# Also try to get sub-components from same entry if available
# (fng_prev is previous day's value)
prev_day = (datetime.strptime(ds, '%Y-%m-%d') - timedelta(days=1)).strftime('%Y-%m-%d')
if prev_day in fng_hist:
updates['fng_prev'] = float(fng_hist[prev_day])
# FRED
if args.fred_key and not args.skip_fred:
fred_vals = get_fred_indicators(ds, args.fred_key)
for name, val in fred_vals.items():
if val is not None:
updates[name] = val
# CoinMetrics
for cm_name, asset, metric in CM_INDICATORS:
val = fetch_coinmetrics(asset, metric, ds) # hits cache
if val is not None:
updates[cm_name] = val
# Binance OI/LS/taker (network call per date — slowest)
if not args.skip_binance and binance_fail_streak < 10:
# Only call if these are currently failing in the NPZ
d = np.load(str(npz_path), allow_pickle=True)
names_in_npz = list(d['api_names'])
ok_in_npz = d['api_success']
taker_idx = names_in_npz.index('taker') if 'taker' in names_in_npz else -1
taker_ok = bool(ok_in_npz[taker_idx]) if taker_idx >= 0 else False
if not taker_ok: # proxy check: if taker failing, all Binance hist likely failing
binance_vals = get_binance_indicators(ds)
n_binance_ok = sum(1 for v in binance_vals.values() if v is not None)
if n_binance_ok == 0:
binance_fail_streak += 1
else:
binance_fail_streak = 0
updates.update({k: v for k, v in binance_vals.items() if v is not None})
# Patch
n_changed = patch_npz(npz_path, updates, dry_run=args.dry_run)
total_changed += n_changed
if (i + 1) % 50 == 0 or n_changed > 0:
elapsed = time.time() - t0
rate = (i + 1) / elapsed
eta = (total - i - 1) / rate if rate > 0 else 0
tag = f" +{n_changed} fields" if n_changed else ""
print(f" [{i+1}/{total}] {ds} {elapsed/60:.1f}m eta={eta/60:.1f}m{tag}")
elapsed = time.time() - t0
print(f"\n{'='*60}")
print(f" Patch complete in {elapsed/60:.1f}m")
print(f" Total fields updated: {total_changed}")
print(f" {'DRY RUN — no files written' if args.dry_run else 'Files patched in-place'}")
print(f"{'='*60}")
if not args.fred_key:
print(f"\n *** FRED indicators (vix, sp500, gold, dxy, us10y, ycurve, fedfunds)")
print(f" *** were SKIPPED. Get a free API key at: https://fred.stlouisfed.org/docs/api/api_key.html")
print(f" *** Then re-run with: --fred-key YOUR_KEY_HERE")
if binance_fail_streak >= 10:
print(f"\n *** Binance hist endpoints failed consistently.")
print(f" *** OI data before 2020-09 is not available via Binance API.")
print(f" *** Dates before that will remain FAIL for oi_btc, ls_btc, taker.")
if __name__ == "__main__":
main()
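`get_fred_indicators` finds the value "on or before" a date by scanning the whole sorted key list in reverse for every lookup. The same forward-fill semantics can be had in O(log n) with `bisect`; a standalone sketch (not part of the script, `ffill_lookup` is a hypothetical helper):

```python
# Forward-fill lookup: most recent observation at or before a date.
# ISO-8601 date strings sort lexicographically, so bisect works directly.
from bisect import bisect_right

def ffill_lookup(series: dict, date_str: str):
    """Return the most recent value at or before date_str, else None."""
    keys = sorted(series)              # ISO dates sort chronologically
    i = bisect_right(keys, date_str)   # first key strictly after date_str
    return series[keys[i - 1]] if i else None

vix = {"2024-01-02": 13.2, "2024-01-03": 13.5, "2024-01-05": 14.1}
print(ffill_lookup(vix, "2024-01-04"))  # forward-filled from Jan 3
print(ffill_lookup(vix, "2023-12-31"))  # before first observation
```

For the per-date loop this only matters at scale; sorting `keys` once per series (rather than per call) would be the natural next step.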


@@ -0,0 +1,466 @@
#!/usr/bin/env python3
"""
DOLPHIN BACKFILL RUNNER v2.0
============================
Spiders DOLPHIN scan directories, enriches with external factors matrix.
INDICATOR SOURCES:
1. API_HISTORICAL: Fetched with scan timestamp (CoinMetrics, FRED, DeFi Llama, etc.)
2. SCAN_DERIVED: Computed from scan's market_prices, tracking_data, per_asset_signals
3. UNAVAILABLE: No historical API AND cannot compute from scan → NaN
Output: {original_name}__Indicators.npz (sorts alphabetically next to source)
Author: HJ / Claude
Version: 2.0.0
"""
import os
import sys
import json
import numpy as np
import asyncio
import aiohttp
from pathlib import Path
from datetime import datetime, timezone
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Any, Set
import logging
import time
import argparse
# Import external factors module
from external_factors_matrix import (
ExternalFactorsFetcher, Config, INDICATORS, N_INDICATORS,
HistoricalSupport, Stationarity, Category
)
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s')
logger = logging.getLogger(__name__)
# =============================================================================
# INDICATOR SOURCE CLASSIFICATION
# =============================================================================
class IndicatorSource:
"""Classifies each indicator by how it can be obtained for backfill"""
# Indicators that HAVE historical API support (fetch with timestamp)
API_HISTORICAL: Set[int] = set()
# Indicators that are UNAVAILABLE (no history, can't derive from scan)
UNAVAILABLE: Set[int] = set()
@classmethod
def classify(cls):
"""Classify all indicators by their backfill source"""
for ind in INDICATORS:
if ind.historical in [HistoricalSupport.FULL, HistoricalSupport.PARTIAL]:
cls.API_HISTORICAL.add(ind.id)
else:
cls.UNAVAILABLE.add(ind.id)
logger.info(f"Indicator sources: API_HISTORICAL={len(cls.API_HISTORICAL)}, "
f"UNAVAILABLE={len(cls.UNAVAILABLE)}")
@classmethod
def get_unavailable_names(cls) -> List[str]:
return [INDICATORS[i-1].name for i in sorted(cls.UNAVAILABLE)]
# Initialize classification
IndicatorSource.classify()
# =============================================================================
# CONFIGURATION
# =============================================================================
@dataclass
class BackfillConfig:
scan_dir: Path = Path(r"C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues")
output_dir: Optional[str] = None
skip_existing: bool = True
dry_run: bool = False
fred_api_key: str = ""
rate_limit_delay: float = 0.5
verbose: bool = False
# =============================================================================
# SCAN DATA
# =============================================================================
@dataclass
class ScanData:
path: Path
scan_number: int
timestamp: datetime
market_prices: Dict[str, float]
windows: Dict[str, Dict]
@property
def n_assets(self) -> int:
return len(self.market_prices)
@property
def symbols(self) -> List[str]:
return sorted(self.market_prices.keys())
def get_tracking(self, window: str) -> Dict:
return self.windows.get(window, {}).get('tracking_data', {})
def get_regime(self, window: str) -> Dict:
return self.windows.get(window, {}).get('regime_signals', {})
def get_asset_signals(self, window: str) -> Dict:
return self.windows.get(window, {}).get('per_asset_signals', {})
# =============================================================================
# INDICATORS FROM SCAN DATA
# =============================================================================
WINDOWS = ['50', '150', '300', '750']
# Global scan-derived indicators (eigenvalue-based, from tracking_data/regime_signals)
SCAN_GLOBAL_INDICATORS = [
# Lambda max per window
*[(f"lambda_max_w{w}", f"Lambda max window {w}") for w in WINDOWS],
*[(f"lambda_min_w{w}", f"Lambda min window {w}") for w in WINDOWS],
*[(f"lambda_vel_w{w}", f"Lambda velocity window {w}") for w in WINDOWS],
*[(f"lambda_acc_w{w}", f"Lambda acceleration window {w}") for w in WINDOWS],
*[(f"eigrot_max_w{w}", f"Eigenvector rotation window {w}") for w in WINDOWS],
*[(f"eiggap_w{w}", f"Eigenvalue gap window {w}") for w in WINDOWS],
*[(f"instab_w{w}", f"Instability window {w}") for w in WINDOWS],
*[(f"transp_w{w}", f"Transition prob window {w}") for w in WINDOWS],
*[(f"coher_w{w}", f"Coherence window {w}") for w in WINDOWS],
# Aggregates
("lambda_max_mean", "Mean lambda max"),
("lambda_max_std", "Std lambda max"),
("instab_mean", "Mean instability"),
("instab_max", "Max instability"),
("coher_mean", "Mean coherence"),
("coher_min", "Min coherence"),
("coher_trend", "Coherence trend (w750-w50)"),
# From prices
("n_assets", "Number of assets"),
("price_dispersion", "Log price dispersion"),
]
N_SCAN_GLOBAL = len(SCAN_GLOBAL_INDICATORS)
# Per-asset indicators
PER_ASSET_INDICATORS = [
("price", "Price"),
("log_price", "Log price"),
("price_rank", "Price percentile"),
("price_btc", "Price / BTC"),
("price_eth", "Price / ETH"),
*[(f"align_w{w}", f"Alignment w{w}") for w in WINDOWS],
*[(f"decouple_w{w}", f"Decoupling w{w}") for w in WINDOWS],
*[(f"anomaly_w{w}", f"Anomaly w{w}") for w in WINDOWS],
*[(f"eigvec_w{w}", f"Eigenvector w{w}") for w in WINDOWS],
("align_mean", "Mean alignment"),
("align_std", "Alignment std"),
("anomaly_max", "Max anomaly"),
("decouple_max", "Max |decoupling|"),
]
N_PER_ASSET = len(PER_ASSET_INDICATORS)
# =============================================================================
# PROCESSOR
# =============================================================================
class ScanProcessor:
def __init__(self, config: BackfillConfig):
self.config = config
self.fetcher = ExternalFactorsFetcher(Config(fred_api_key=config.fred_api_key))
def load_scan(self, path: Path) -> Optional[ScanData]:
try:
with open(path, 'r') as f:
data = json.load(f)
ts_str = data.get('timestamp', '')
try:
timestamp = datetime.fromisoformat(ts_str.replace('Z', '+00:00'))
if timestamp.tzinfo is None:
timestamp = timestamp.replace(tzinfo=timezone.utc)
except ValueError:
timestamp = datetime.now(timezone.utc)
return ScanData(
path=path,
scan_number=data.get('scan_number', 0),
timestamp=timestamp,
market_prices=data.get('market_prices', {}),
windows=data.get('windows', {})
)
except Exception as e:
logger.error(f"Load failed {path}: {e}")
return None
async def fetch_api_indicators(self, timestamp: datetime) -> Tuple[np.ndarray, np.ndarray]:
"""Fetch indicators with historical API support"""
try:
result = await self.fetcher.fetch_all(target_date=timestamp)
matrix = result['matrix']
success = np.array([
result['details'].get(i+1, {}).get('success', False)
for i in range(N_INDICATORS)
])
# Mark non-historical indicators as NaN
for i in range(N_INDICATORS):
if (i+1) not in IndicatorSource.API_HISTORICAL:
success[i] = False
matrix[i] = np.nan
return matrix, success
except Exception as e:
logger.warning(f"API fetch failed: {e}")
return np.full(N_INDICATORS, np.nan), np.zeros(N_INDICATORS, dtype=bool)
def compute_scan_global(self, scan: ScanData) -> np.ndarray:
"""Compute global indicators from scan's tracking_data and regime_signals"""
values = []
# Per-window metrics
for w in WINDOWS:
values.append(scan.get_tracking(w).get('lambda_max', np.nan))
for w in WINDOWS:
values.append(scan.get_tracking(w).get('lambda_min', np.nan))
for w in WINDOWS:
values.append(scan.get_tracking(w).get('lambda_max_velocity', np.nan))
for w in WINDOWS:
values.append(scan.get_tracking(w).get('lambda_max_acceleration', np.nan))
for w in WINDOWS:
values.append(scan.get_tracking(w).get('eigenvector_rotation_max', np.nan))
for w in WINDOWS:
values.append(scan.get_tracking(w).get('eigenvalue_gap', np.nan))
for w in WINDOWS:
values.append(scan.get_regime(w).get('instability_score', np.nan))
for w in WINDOWS:
values.append(scan.get_regime(w).get('regime_transition_probability', np.nan))
for w in WINDOWS:
values.append(scan.get_regime(w).get('market_coherence', np.nan))
# Aggregates
lmax = [scan.get_tracking(w).get('lambda_max', np.nan) for w in WINDOWS]
values.append(np.nanmean(lmax))
values.append(np.nanstd(lmax))
instab = [scan.get_regime(w).get('instability_score', np.nan) for w in WINDOWS]
values.append(np.nanmean(instab))
values.append(np.nanmax(instab))
coher = [scan.get_regime(w).get('market_coherence', np.nan) for w in WINDOWS]
values.append(np.nanmean(coher))
values.append(np.nanmin(coher))
values.append(coher[3] - coher[0] if not np.isnan(coher[3]) and not np.isnan(coher[0]) else np.nan)
# From prices
prices = np.array(list(scan.market_prices.values())) if scan.market_prices else np.array([])
values.append(len(prices))
values.append(np.std(np.log(np.maximum(prices, 1e-10))) if len(prices) > 0 else np.nan)
return np.array(values)
def compute_per_asset(self, scan: ScanData) -> Tuple[np.ndarray, List[str]]:
"""Compute per-asset indicator matrix"""
symbols = scan.symbols
n = len(symbols)
if n == 0:
return np.zeros((0, N_PER_ASSET)), []
matrix = np.zeros((n, N_PER_ASSET))
prices = np.array([scan.market_prices[s] for s in symbols])
btc_p = scan.market_prices.get('BTC', scan.market_prices.get('BTCUSDT', np.nan))
eth_p = scan.market_prices.get('ETH', scan.market_prices.get('ETHUSDT', np.nan))
col = 0
matrix[:, col] = prices; col += 1
matrix[:, col] = np.log(np.maximum(prices, 1e-10)); col += 1
matrix[:, col] = np.argsort(np.argsort(prices)) / n; col += 1
matrix[:, col] = prices / btc_p if btc_p > 0 else np.nan; col += 1
matrix[:, col] = prices / eth_p if eth_p > 0 else np.nan; col += 1
# Per-window signals
for metric in ['market_alignment', 'decoupling_velocity', 'anomaly_score', 'eigenvector_component']:
for w in WINDOWS:
sigs = scan.get_asset_signals(w)
for i, sym in enumerate(symbols):
matrix[i, col] = sigs.get(sym, {}).get(metric, np.nan)
col += 1
# Aggregates
align_cols = list(range(5, 9))
matrix[:, col] = np.nanmean(matrix[:, align_cols], axis=1); col += 1
matrix[:, col] = np.nanstd(matrix[:, align_cols], axis=1); col += 1
anomaly_cols = list(range(13, 17))
matrix[:, col] = np.nanmax(matrix[:, anomaly_cols], axis=1); col += 1
decouple_cols = list(range(9, 13))
matrix[:, col] = np.nanmax(np.abs(matrix[:, decouple_cols]), axis=1); col += 1
return matrix, symbols
async def process(self, path: Path) -> Optional[Dict[str, Any]]:
start = time.time()
scan = self.load_scan(path)
if scan is None:
return None
# 1. API historical indicators
api_matrix, api_success = await self.fetch_api_indicators(scan.timestamp)
# 2. Scan-derived global
scan_global = self.compute_scan_global(scan)
# 3. Per-asset
asset_matrix, asset_symbols = self.compute_per_asset(scan)
return {
'scan_number': scan.scan_number,
'timestamp': scan.timestamp.isoformat(),
'processing_time': time.time() - start,
'api_indicators': api_matrix,
'api_success': api_success,
'api_names': np.array([ind.name for ind in INDICATORS], dtype='U32'),
'scan_global': scan_global,
'scan_global_names': np.array([n for n, _ in SCAN_GLOBAL_INDICATORS], dtype='U32'),
'asset_matrix': asset_matrix,
'asset_symbols': np.array(asset_symbols, dtype='U16'),
'asset_names': np.array([n for n, _ in PER_ASSET_INDICATORS], dtype='U32'),
'n_assets': len(asset_symbols),
'api_success_rate': np.nanmean(api_success[list(i-1 for i in IndicatorSource.API_HISTORICAL)]),
}
# =============================================================================
# OUTPUT
# =============================================================================
class OutputWriter:
def __init__(self, config: BackfillConfig):
self.config = config
def get_output_path(self, scan_path: Path) -> Path:
out_dir = Path(self.config.output_dir) if self.config.output_dir else scan_path.parent
out_dir.mkdir(parents=True, exist_ok=True)
return out_dir / f"{scan_path.stem}__Indicators.npz"
def save(self, data: Dict[str, Any], scan_path: Path) -> Path:
out_path = self.get_output_path(scan_path)
save_data = {}
for k, v in data.items():
if isinstance(v, np.ndarray):
save_data[k] = v
elif isinstance(v, str):
save_data[k] = np.array([v], dtype='U64')
else:
save_data[k] = np.array([v])
np.savez_compressed(out_path, **save_data)
return out_path
# =============================================================================
# RUNNER
# =============================================================================
class BackfillRunner:
def __init__(self, config: BackfillConfig):
self.config = config
self.processor = ScanProcessor(config)
self.writer = OutputWriter(config)
self.stats = {'processed': 0, 'failed': 0, 'skipped': 0}
def find_scans(self) -> List[Path]:
root = Path(self.config.scan_dir)
files = sorted(root.rglob("scan_*.json"))
if self.config.skip_existing:
files = [f for f in files if not self.writer.get_output_path(f).exists()]
return files
async def run(self):
unavail = IndicatorSource.get_unavailable_names()
logger.info(f"Skipping {len(unavail)} unavailable indicators: {unavail[:5]}...")
files = self.find_scans()
logger.info(f"Processing {len(files)} files...")
for i, path in enumerate(files):
try:
result = await self.processor.process(path)
if result:
if not self.config.dry_run:
self.writer.save(result, path)
self.stats['processed'] += 1
else:
self.stats['failed'] += 1
except Exception as e:
logger.error(f"Error {path.name}: {e}")
self.stats['failed'] += 1
if (i + 1) % 10 == 0:
logger.info(f"Progress: {i+1}/{len(files)}")
if self.config.rate_limit_delay > 0:
await asyncio.sleep(self.config.rate_limit_delay)
logger.info(f"Done: {self.stats}")
return self.stats
# =============================================================================
# UTILITY
# =============================================================================
def load_indicators(path: str) -> Dict[str, np.ndarray]:
"""Load .npz indicator file"""
return dict(np.load(path, allow_pickle=True))
def summary(path: str) -> str:
"""Summary of indicator file"""
d = load_indicators(path)
return f"""Timestamp: {d['timestamp'][0]}
Assets: {d['n_assets'][0]}
API success: {d['api_success_rate'][0]:.1%}
API shape: {d['api_indicators'].shape}
Scan global: {d['scan_global'].shape}
Per-asset: {d['asset_matrix'].shape}"""
# =============================================================================
# CLI
# =============================================================================
def main():
parser = argparse.ArgumentParser(description="DOLPHIN Backfill Runner")
# parser.add_argument("scan_dir", help="Directory with scan JSON files")
parser.add_argument("-o", "--output", help="Output directory")
parser.add_argument("--fred-key", default="", help="FRED API key")
parser.add_argument("--no-skip", action="store_true", help="Reprocess existing")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--delay", type=float, default=0.5)
args = parser.parse_args()
config = BackfillConfig(
scan_dir= Path(r"C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues"),
output_dir=args.output,
# FRED API Key: c16a9cde3e3bb5bb972bb9283485f202
fred_api_key=args.fred_key or 'c16a9cde3e3bb5bb972bb9283485f202',
skip_existing=not args.no_skip,
dry_run=args.dry_run,
rate_limit_delay=args.delay,
)
runner = BackfillRunner(config)
asyncio.run(runner.run())
if __name__ == "__main__":
main()
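The `price_rank` column in `compute_per_asset` relies on the double-argsort trick: `np.argsort(np.argsort(x))` yields each element's 0-based rank, and dividing by `n` turns that into a percentile in [0, 1). A standalone sketch of just that step:

```python
# Double-argsort: first argsort gives sort order, second gives each
# element's position in that order, i.e. its rank (0 = smallest).
import numpy as np

prices = np.array([40.0, 5.0, 120.0, 5.5])
ranks = np.argsort(np.argsort(prices))   # rank of each price
pct = ranks / len(prices)                # percentile in [0, 1)
print(ranks.tolist())  # → [2, 0, 3, 1]
print(pct.tolist())    # → [0.5, 0.0, 0.75, 0.25]
```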

external_factors/bf.bat Executable file

@@ -0,0 +1 @@
python backfill_runner.py

external_factors/bk.bat Executable file

@@ -0,0 +1,7 @@
@echo off
REM Backfill ExF NPZ files for all 1710 klines dates
REM Idempotent — safe to re-run if interrupted
REM ~2-5 hours total runtime
cd /d "%~dp0"
"C:\Users\Lenovo\Documents\- Siloqy\Scripts\python.exe" backfill_klines_exf.py %*
pause

external_factors/br.bat Executable file

@@ -0,0 +1 @@
python backfill_runner.py


@@ -0,0 +1,299 @@
import asyncio
import datetime
import json
import logging
import math
import threading
import time
import zoneinfo
from pathlib import Path
from typing import Dict, Any, Optional
import numpy as np
from astropy.time import Time
import astropy.coordinates as coord
import astropy.units as u
from astropy.coordinates import solar_system_ephemeris, get_body, EarthLocation
logger = logging.getLogger(__name__)
class MarketIndicators:
"""
Mathematical and astronomical calculations for the Esoteric Factors mapping.
Evaluates completely locally without external API dependencies.
"""
def __init__(self):
# Regions defined by NON-OVERLAPPING population clusters for accurate global weighting.
# Population in Millions (approximate). Liquidity weight is estimated crypto volume share.
self.regions = [
{'name': 'Americas', 'tz': 'America/New_York', 'pop': 1000, 'liq_weight': 0.35},
{'name': 'EMEA', 'tz': 'Europe/London', 'pop': 2200, 'liq_weight': 0.30},
{'name': 'South_Asia', 'tz': 'Asia/Kolkata', 'pop': 1400, 'liq_weight': 0.05},
{'name': 'East_Asia', 'tz': 'Asia/Shanghai', 'pop': 1600, 'liq_weight': 0.20},
{'name': 'Oceania_SEA', 'tz': 'Asia/Singapore', 'pop': 800, 'liq_weight': 0.10}
]
# Market cycle: Bitcoin halving based, ~4 years
self.cycle_length_days = 1460
self.last_halving = datetime.datetime(2024, 4, 20, tzinfo=datetime.timezone.utc)
# Cache for expensive ASTRO calculations
self._cache = {
'moon': {'val': None, 'ts': 0},
'mercury': {'val': None, 'ts': 0}
}
self.cache_ttl_seconds = 3600 * 6 # Update astro every 6 hours
def get_calendar_items(self, now: datetime.datetime) -> Dict[str, int]:
return {
'year': now.year,
'month': now.month,
'day_of_month': now.day,
'hour': now.hour,
'minute': now.minute,
'day_of_week': now.weekday(), # 0=Monday
'week_of_year': now.isocalendar().week
}
def is_tradfi_open(self, region_name: str, local_time: datetime.datetime) -> bool:
day = local_time.weekday()
if day >= 5: return False
hour_dec = local_time.hour + local_time.minute / 60.0
if 'Americas' in region_name:
return 9.5 <= hour_dec < 16.0
elif 'EMEA' in region_name:
return 8.0 <= hour_dec < 16.5
elif 'Asia' in region_name:
return 9.0 <= hour_dec < 15.0
return False
def get_regional_times(self, now_utc: datetime.datetime) -> Dict[str, Any]:
times = {}
for region in self.regions:
tz = zoneinfo.ZoneInfo(region['tz'])
local_time = now_utc.astimezone(tz)
times[region['name']] = {
'hour': local_time.hour + local_time.minute / 60.0,
'is_tradfi_open': self.is_tradfi_open(region['name'], local_time)
}
return times
def get_liquidity_session(self, now_utc: datetime.datetime) -> str:
utc_hour = now_utc.hour + now_utc.minute / 60.0
if 13 <= utc_hour < 17:
return "LONDON_NEW_YORK_OVERLAP"
elif 8 <= utc_hour < 13:
return "LONDON_MORNING"
elif 0 <= utc_hour < 8:
return "ASIA_PACIFIC"
elif 17 <= utc_hour < 21:
return "NEW_YORK_AFTERNOON"
else:
return "LOW_LIQUIDITY"
def get_weighted_times(self, now_utc: datetime.datetime) -> tuple[float, float]:
pop_sin, pop_cos = 0.0, 0.0
liq_sin, liq_cos = 0.0, 0.0
total_pop = sum(r['pop'] for r in self.regions)
for region in self.regions:
tz = zoneinfo.ZoneInfo(region['tz'])
local_time = now_utc.astimezone(tz)
hour_frac = (local_time.hour + local_time.minute / 60.0) / 24.0
angle = 2 * math.pi * hour_frac
w_pop = region['pop'] / total_pop
pop_sin += math.sin(angle) * w_pop
pop_cos += math.cos(angle) * w_pop
w_liq = region['liq_weight']
liq_sin += math.sin(angle) * w_liq
liq_cos += math.cos(angle) * w_liq
pop_angle = math.atan2(pop_sin, pop_cos)
if pop_angle < 0: pop_angle += 2 * math.pi
pop_hour = (pop_angle / (2 * math.pi)) * 24
liq_angle = math.atan2(liq_sin, liq_cos)
if liq_angle < 0: liq_angle += 2 * math.pi
liq_hour = (liq_angle / (2 * math.pi)) * 24
return round(pop_hour, 2), round(liq_hour, 2)
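The sin/cos accumulation above is a weighted circular mean: mapping each local hour onto the unit circle keeps 23:00 and 01:00 adjacent instead of 22 hours apart. A minimal standalone sketch of the same math (illustrative hours and weights, not the production region table):

```python
import math

def circular_mean_hour(hours, weights):
    # Weighted circular mean of hour-of-day values on a 24h clock.
    s = sum(math.sin(2 * math.pi * h / 24.0) * w for h, w in zip(hours, weights))
    c = sum(math.cos(2 * math.pi * h / 24.0) * w for h, w in zip(hours, weights))
    return (math.atan2(s, c) % (2 * math.pi)) / (2 * math.pi) * 24.0

# 23:00 and 01:00 average to midnight, not to 12:00 as a naive mean would.
print(round(circular_mean_hour([23.0, 1.0], [0.5, 0.5])) % 24)  # -> 0
```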
def get_market_cycle_position(self, now_utc: datetime.datetime) -> float:
days_since_halving = (now_utc - self.last_halving).days
position = (days_since_halving % self.cycle_length_days) / self.cycle_length_days
return position
def get_fibonacci_time(self, now_utc: datetime.datetime) -> Dict[str, Any]:
mins_passed = now_utc.hour * 60 + now_utc.minute
fib_seq = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]
closest = min(fib_seq, key=lambda x: abs(x - mins_passed))
distance = abs(mins_passed - closest)
strength = 1.0 - min(distance / 30.0, 1.0)
return {'closest_fib_minute': closest, 'harmonic_strength': round(strength, 3)}
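The harmonic-strength feature decays linearly to zero once the current minute-of-day is more than 30 minutes from the nearest Fibonacci value. A standalone sketch of the same lookup, with illustrative minute counts:

```python
FIB_SEQ = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]

def fib_harmonic(mins_passed):
    # Nearest Fibonacci minute and a linear strength over a 30-minute window.
    closest = min(FIB_SEQ, key=lambda x: abs(x - mins_passed))
    strength = 1.0 - min(abs(mins_passed - closest) / 30.0, 1.0)
    return closest, round(strength, 3)

print(fib_harmonic(150))  # -> (144, 0.8): 6 minutes past the 144 mark
```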
def get_moon_phase(self, now_utc: datetime.datetime) -> Dict[str, Any]:
now_ts = now_utc.timestamp()
if self._cache['moon']['val'] and (now_ts - self._cache['moon']['ts'] < self.cache_ttl_seconds):
return self._cache['moon']['val']
t = Time(now_utc)
with solar_system_ephemeris.set('builtin'):
moon = get_body('moon', t)
sun = get_body('sun', t)
elongation = sun.separation(moon)
phase_angle = np.arctan2(sun.distance * np.sin(elongation),
moon.distance - sun.distance * np.cos(elongation))
illumination = (1 + np.cos(phase_angle)) / 2.0
phase_name = "WAXING"
if illumination < 0.03: phase_name = "NEW_MOON"
elif illumination > 0.97: phase_name = "FULL_MOON"
elif illumination < 0.5: phase_name = "WAXING_CRESCENT" if moon.dec.deg > sun.dec.deg else "WANING_CRESCENT"
else: phase_name = "WAXING_GIBBOUS" if moon.dec.deg > sun.dec.deg else "WANING_GIBBOUS"
result = {'illumination': float(illumination), 'phase_name': phase_name}
self._cache['moon'] = {'val': result, 'ts': now_ts}
return result
def is_mercury_retrograde(self, now_utc: datetime.datetime) -> bool:
now_ts = now_utc.timestamp()
if self._cache['mercury']['val'] is not None and (now_ts - self._cache['mercury']['ts'] < self.cache_ttl_seconds):
return self._cache['mercury']['val']
t = Time(now_utc)
is_retro = False
try:
with solar_system_ephemeris.set('builtin'):
loc = EarthLocation.of_site('greenwich')
merc_now = get_body('mercury', t, loc)
merc_later = get_body('mercury', t + 1 * u.day, loc)
lon_now = merc_now.transform_to('geocentrictrueecliptic').lon.deg
lon_later = merc_later.transform_to('geocentrictrueecliptic').lon.deg
diff = (lon_later - lon_now) % 360
is_retro = diff > 180
except Exception as e:
logger.error(f"Astro calc error: {e}")
self._cache['mercury'] = {'val': is_retro, 'ts': now_ts}
return is_retro
def get_indicators(self, custom_now: Optional[datetime.datetime] = None) -> Dict[str, Any]:
"""Generate full suite of Esoteric Matrix factors."""
now_utc = custom_now if custom_now else datetime.datetime.now(datetime.timezone.utc)
pop_hour, liq_hour = self.get_weighted_times(now_utc)
moon_data = self.get_moon_phase(now_utc)
calendar = self.get_calendar_items(now_utc)
return {
'timestamp': now_utc.isoformat(),
'unix': int(now_utc.timestamp()),
'calendar': calendar,
'fibonacci_time': self.get_fibonacci_time(now_utc),
'regional_times': self.get_regional_times(now_utc),
'population_weighted_hour': pop_hour,
'liquidity_weighted_hour': liq_hour,
'liquidity_session': self.get_liquidity_session(now_utc),
'market_cycle_position': round(self.get_market_cycle_position(now_utc), 4),
'moon_illumination': moon_data['illumination'],
'moon_phase_name': moon_data['phase_name'],
'mercury_retrograde': int(self.is_mercury_retrograde(now_utc)),
}
class EsotericFactorsService:
"""
Continuous evaluation service for Esoteric Factors.
Dumps state to a deterministic path to be consumed by the live trading orchestrator/Forewarning layers.
"""
def __init__(self, output_dir: str = "", poll_interval_s: float = 60.0):
# Default to same structure as external factors
if not output_dir:
self.output_dir = Path(__file__).parent / "eso_cache"
else:
self.output_dir = Path(output_dir)
self.output_dir.mkdir(parents=True, exist_ok=True)
self.poll_interval_s = poll_interval_s
self.engine = MarketIndicators()
self._latest_data = {}
self._running = False
self._task = None
self._lock = threading.Lock()
async def _update_loop(self):
logger.info(f"EsotericFactorsService starting. Polling every {self.poll_interval_s}s.")
while self._running:
try:
# 1. Compute Matrix
data = self.engine.get_indicators()
# 2. Store in memory
with self._lock:
self._latest_data = data
# 3. Dump purely to fast JSON
self._write_to_disk(data)
except Exception as e:
logger.error(f"Error in Esoteric update loop: {e}", exc_info=True)
await asyncio.sleep(self.poll_interval_s)
def _write_to_disk(self, data: dict):
# Atomic write: dump to a temp file, then rename over the target so readers never see a partial file
target_path = self.output_dir / "latest_esoteric_factors.json"
tmp_path = self.output_dir / "latest_esoteric_factors.tmp"
try:
with open(tmp_path, 'w') as f:
json.dump(data, f, indent=2)
tmp_path.replace(target_path)
except Exception as e:
logger.error(f"Failed to write Esoteric factors to disk: {e}")
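The temp-file-then-rename pattern above guarantees readers never observe a half-written JSON file. A consumer-side sketch under the same convention (`read_latest` and its missing-file fallback are illustrative helpers, not the orchestrator's actual reader):

```python
import json
import tempfile
from pathlib import Path

def atomic_write_json(target: Path, data: dict) -> None:
    # Write to a sibling temp file, then rename: the rename is atomic on POSIX
    # and replaces any existing target.
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(data))
    tmp.replace(target)

def read_latest(target: Path) -> dict:
    # Consumer side: tolerate the snapshot not existing yet.
    try:
        return json.loads(target.read_text())
    except FileNotFoundError:
        return {}

out = Path(tempfile.mkdtemp()) / "latest_esoteric_factors.json"
atomic_write_json(out, {"moon_illumination": 0.42})
print(read_latest(out)["moon_illumination"])  # -> 0.42
```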
def get_latest(self) -> dict:
"""Lock-guarded, sub-millisecond retrieval of the latest in-memory state."""
with self._lock:
return self._latest_data.copy()
def start(self):
"""Start the background calculation loop on a daemon thread running its own asyncio event loop."""
if self._running: return
self._running = True
def run_async():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(self._update_loop())
self._thread = threading.Thread(target=run_async, daemon=True)
self._thread.start()
def stop(self):
self._running = False
if hasattr(self, '_thread'):
self._thread.join(timeout=2.0)
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
svc = EsotericFactorsService(poll_interval_s=5.0)
print("Starting Esoteric Factors Service test run for 15 seconds...")
svc.start()
for _ in range(3):
time.sleep(5)
latest = svc.get_latest()
print(f"Update: Moon Illumination={latest.get('moon_illumination', 0.0):.3f} | Liquidity Session={latest.get('liquidity_session')} | PopHour={latest.get('population_weighted_hour')}")
svc.stop()
print("Stopped successfully.")


@@ -0,0 +1,612 @@
#!/usr/bin/env python3
"""
EXTERNAL FACTORS MATRIX v5.0 - DOLPHIN Compatible with BACKFILL
================================================================
85 indicators with HISTORICAL query support where available.
BACKFILL CAPABILITY:
FULL HISTORY (51): CoinMetrics, FRED, DeFi Llama TVL/stables, F&G, Binance funding/OI
PARTIAL (12): Deribit DVOL, CoinGecko prices, DEX volume
CURRENT ONLY (22): Mempool, order books, spreads, dominance
Author: HJ / Claude | Version: 5.0.0
"""
import asyncio
import aiohttp
import numpy as np
from dataclasses import dataclass
from typing import Dict, List, Optional, Any, Tuple
from datetime import datetime, timezone
from collections import deque
from enum import Enum
import json
class Category(Enum):
DERIVATIVES = "derivatives"
ONCHAIN = "onchain"
DEFI = "defi"
MACRO = "macro"
SENTIMENT = "sentiment"
MICROSTRUCTURE = "microstructure"
class Stationarity(Enum):
STATIONARY = "stationary"
TREND_UP = "trend_up"
EPISODIC = "episodic"
class HistoricalSupport(Enum):
FULL = "full" # Any historical date
PARTIAL = "partial" # Limited history
CURRENT = "current" # Real-time only
@dataclass
class Indicator:
id: int
name: str
category: Category
source: str
url: str
parser: str
stationarity: Stationarity
historical: HistoricalSupport
hist_url: str = ""
hist_resolution: str = ""
description: str = ""
@dataclass
class Config:
timeout: int = 15
max_concurrent: int = 15
cache_ttl: int = 30
fred_api_key: str = ""
# fmt: off
INDICATORS: List[Indicator] = [
# DERIVATIVES - Binance (1-10) - Most have FULL history
Indicator(1, "funding_btc", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/fapi/v1/fundingRate?symbol=BTCUSDT&limit=1",
"parse_binance_funding", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://fapi.binance.com/fapi/v1/fundingRate?symbol=BTCUSDT&startTime={start_ms}&endTime={end_ms}&limit=1",
"8h", "BTC funding - FULL via startTime/endTime"),
Indicator(2, "funding_eth", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/fapi/v1/fundingRate?symbol=ETHUSDT&limit=1",
"parse_binance_funding", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://fapi.binance.com/fapi/v1/fundingRate?symbol=ETHUSDT&startTime={start_ms}&endTime={end_ms}&limit=1",
"8h", "ETH funding"),
Indicator(3, "oi_btc", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/fapi/v1/openInterest?symbol=BTCUSDT",
"parse_binance_oi", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://fapi.binance.com/futures/data/openInterestHist?symbol=BTCUSDT&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1",
"1h", "BTC OI - FULL via openInterestHist"),
Indicator(4, "oi_eth", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/fapi/v1/openInterest?symbol=ETHUSDT",
"parse_binance_oi", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://fapi.binance.com/futures/data/openInterestHist?symbol=ETHUSDT&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1",
"1h", "ETH OI"),
Indicator(5, "ls_btc", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/futures/data/globalLongShortAccountRatio?symbol=BTCUSDT&period=1h&limit=1",
"parse_binance_ls", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://fapi.binance.com/futures/data/globalLongShortAccountRatio?symbol=BTCUSDT&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1",
"1h", "L/S ratio - FULL"),
Indicator(6, "ls_eth", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/futures/data/globalLongShortAccountRatio?symbol=ETHUSDT&period=1h&limit=1",
"parse_binance_ls", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://fapi.binance.com/futures/data/globalLongShortAccountRatio?symbol=ETHUSDT&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1",
"1h", "ETH L/S"),
Indicator(7, "ls_top", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/futures/data/topLongShortAccountRatio?symbol=BTCUSDT&period=1h&limit=1",
"parse_binance_ls", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://fapi.binance.com/futures/data/topLongShortAccountRatio?symbol=BTCUSDT&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1",
"1h", "Top trader L/S"),
Indicator(8, "taker", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/futures/data/takerlongshortRatio?symbol=BTCUSDT&period=1h&limit=1",
"parse_binance_taker", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://fapi.binance.com/futures/data/takerlongshortRatio?symbol=BTCUSDT&period=1h&startTime={start_ms}&endTime={end_ms}&limit=1",
"1h", "Taker ratio"),
Indicator(9, "basis", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/fapi/v1/premiumIndex?symbol=BTCUSDT",
"parse_binance_basis", Stationarity.STATIONARY, HistoricalSupport.CURRENT,
"", "", "Basis - CURRENT"),
Indicator(10, "liq_proxy", Category.DERIVATIVES, "binance",
"https://fapi.binance.com/fapi/v1/ticker/24hr?symbol=BTCUSDT",
"parse_liq_proxy", Stationarity.STATIONARY, HistoricalSupport.CURRENT,
"", "", "Liq proxy - CURRENT"),
# DERIVATIVES - Deribit (11-18)
Indicator(11, "dvol_btc", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_volatility_index_data?currency=BTC&resolution=3600&count=1",
"parse_deribit_dvol", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://www.deribit.com/api/v2/public/get_volatility_index_data?currency=BTC&resolution=3600&start_timestamp={start_ms}&end_timestamp={end_ms}",
"1h", "DVOL - FULL"),
Indicator(12, "dvol_eth", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_volatility_index_data?currency=ETH&resolution=3600&count=1",
"parse_deribit_dvol", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://www.deribit.com/api/v2/public/get_volatility_index_data?currency=ETH&resolution=3600&start_timestamp={start_ms}&end_timestamp={end_ms}",
"1h", "ETH DVOL"),
Indicator(13, "pcr_vol", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_book_summary_by_currency?currency=BTC&kind=option",
"parse_deribit_pcr", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "PCR - CURRENT"),
Indicator(14, "pcr_oi", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_book_summary_by_currency?currency=BTC&kind=option",
"parse_deribit_pcr_oi", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "PCR OI - CURRENT"),
Indicator(15, "pcr_eth", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_book_summary_by_currency?currency=ETH&kind=option",
"parse_deribit_pcr", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "ETH PCR - CURRENT"),
Indicator(16, "opt_oi", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_book_summary_by_currency?currency=BTC&kind=option",
"parse_deribit_oi", Stationarity.TREND_UP, HistoricalSupport.CURRENT, "", "", "Options OI - CURRENT"),
Indicator(17, "fund_dbt_btc", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_funding_rate_value?instrument_name=BTC-PERPETUAL",
"parse_deribit_fund", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://www.deribit.com/api/v2/public/get_funding_rate_history?instrument_name=BTC-PERPETUAL&start_timestamp={start_ms}&end_timestamp={end_ms}",
"8h", "Deribit fund - FULL"),
Indicator(18, "fund_dbt_eth", Category.DERIVATIVES, "deribit",
"https://www.deribit.com/api/v2/public/get_funding_rate_value?instrument_name=ETH-PERPETUAL",
"parse_deribit_fund", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://www.deribit.com/api/v2/public/get_funding_rate_history?instrument_name=ETH-PERPETUAL&start_timestamp={start_ms}&end_timestamp={end_ms}",
"8h", "Deribit ETH fund"),
# ONCHAIN - CoinMetrics (19-30) - ALL FULL HISTORY
Indicator(19, "rcap_btc", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=CapRealUSD&frequency=1d&page_size=1",
"parse_cm", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=CapRealUSD&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "Realized cap - FULL"),
Indicator(20, "mvrv", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=CapMrktCurUSD,CapRealUSD&frequency=1d&page_size=1",
"parse_cm_mvrv", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=CapMrktCurUSD,CapRealUSD&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "MVRV - FULL"),
Indicator(21, "nupl", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=CapMrktCurUSD,CapRealUSD&frequency=1d&page_size=1",
"parse_cm_nupl", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=CapMrktCurUSD,CapRealUSD&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "NUPL - FULL"),
Indicator(22, "addr_btc", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=AdrActCnt&frequency=1d&page_size=1",
"parse_cm", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=AdrActCnt&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "Active addr - FULL"),
Indicator(23, "addr_eth", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=eth&metrics=AdrActCnt&frequency=1d&page_size=1",
"parse_cm", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=eth&metrics=AdrActCnt&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "ETH addr - FULL"),
Indicator(24, "txcnt", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=TxCnt&frequency=1d&page_size=1",
"parse_cm", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=TxCnt&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "TX count - FULL"),
Indicator(25, "fees_btc", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=FeeTotUSD&frequency=1d&page_size=1",
"parse_cm", Stationarity.EPISODIC, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=FeeTotUSD&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "BTC fees - FULL"),
Indicator(26, "fees_eth", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=eth&metrics=FeeTotUSD&frequency=1d&page_size=1",
"parse_cm", Stationarity.EPISODIC, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=eth&metrics=FeeTotUSD&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "ETH fees - FULL"),
Indicator(27, "nvt", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=NVTAdj&frequency=1d&page_size=1",
"parse_cm", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=NVTAdj&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "NVT - FULL"),
Indicator(28, "velocity", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=VelCur1yr&frequency=1d&page_size=1",
"parse_cm", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=VelCur1yr&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "Velocity - FULL"),
Indicator(29, "sply_act", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=SplyAct1yr&frequency=1d&page_size=1",
"parse_cm", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=btc&metrics=SplyAct1yr&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "Active supply - FULL"),
Indicator(30, "rcap_eth", Category.ONCHAIN, "coinmetrics",
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=eth&metrics=CapRealUSD&frequency=1d&page_size=1",
"parse_cm", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets=eth&metrics=CapRealUSD&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
"1d", "ETH rcap - FULL"),
# ONCHAIN - Blockchain.info (31-37)
Indicator(31, "hashrate", Category.ONCHAIN, "blockchain",
"https://blockchain.info/q/hashrate", "parse_bc", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.blockchain.info/charts/hash-rate?timespan=1days&start={date}&format=json", "1d", "Hashrate - FULL"),
Indicator(32, "difficulty", Category.ONCHAIN, "blockchain",
"https://blockchain.info/q/getdifficulty", "parse_bc", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.blockchain.info/charts/difficulty?timespan=1days&start={date}&format=json", "1d", "Difficulty - FULL"),
Indicator(33, "blk_int", Category.ONCHAIN, "blockchain",
"https://blockchain.info/q/interval", "parse_bc_int", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Block int - CURRENT"),
Indicator(34, "unconf", Category.ONCHAIN, "blockchain",
"https://blockchain.info/q/unconfirmedcount", "parse_bc", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Unconf - CURRENT"),
Indicator(35, "tx_blk", Category.ONCHAIN, "blockchain",
"https://blockchain.info/q/nperblock", "parse_bc", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.blockchain.info/charts/n-transactions-per-block?timespan=1days&start={date}&format=json", "1d", "TX/blk - FULL"),
Indicator(36, "total_btc", Category.ONCHAIN, "blockchain",
"https://blockchain.info/q/totalbc", "parse_bc_btc", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.blockchain.info/charts/total-bitcoins?timespan=1days&start={date}&format=json", "1d", "Total BTC - FULL"),
Indicator(37, "mcap_bc", Category.ONCHAIN, "blockchain",
"https://blockchain.info/q/marketcap", "parse_bc", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.blockchain.info/charts/market-cap?timespan=1days&start={date}&format=json", "1d", "Mcap - FULL"),
# ONCHAIN - Mempool (38-42) - ALL CURRENT
Indicator(38, "mp_cnt", Category.ONCHAIN, "mempool", "https://mempool.space/api/mempool",
"parse_mp_cnt", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Mempool - CURRENT"),
Indicator(39, "mp_mb", Category.ONCHAIN, "mempool", "https://mempool.space/api/mempool",
"parse_mp_mb", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Mempool MB - CURRENT"),
Indicator(40, "fee_fast", Category.ONCHAIN, "mempool", "https://mempool.space/api/v1/fees/recommended",
"parse_fee_fast", Stationarity.EPISODIC, HistoricalSupport.CURRENT, "", "", "Fast fee - CURRENT"),
Indicator(41, "fee_med", Category.ONCHAIN, "mempool", "https://mempool.space/api/v1/fees/recommended",
"parse_fee_med", Stationarity.EPISODIC, HistoricalSupport.CURRENT, "", "", "Med fee - CURRENT"),
Indicator(42, "fee_slow", Category.ONCHAIN, "mempool", "https://mempool.space/api/v1/fees/recommended",
"parse_fee_slow", Stationarity.EPISODIC, HistoricalSupport.CURRENT, "", "", "Slow fee - CURRENT"),
# DEFI - DeFi Llama (43-51)
Indicator(43, "tvl", Category.DEFI, "defillama", "https://api.llama.fi/v2/historicalChainTvl",
"parse_dl_tvl", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.llama.fi/v2/historicalChainTvl", "1d", "TVL - FULL (filter client-side)"),
Indicator(44, "tvl_eth", Category.DEFI, "defillama", "https://api.llama.fi/v2/historicalChainTvl/Ethereum",
"parse_dl_tvl", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.llama.fi/v2/historicalChainTvl/Ethereum", "1d", "ETH TVL - FULL"),
Indicator(45, "stables", Category.DEFI, "defillama", "https://stablecoins.llama.fi/stablecoins?includePrices=false",
"parse_dl_stables", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://stablecoins.llama.fi/stablecoincharts/all?stablecoin=1", "1d", "Stables - FULL"),
Indicator(46, "usdt", Category.DEFI, "defillama", "https://stablecoins.llama.fi/stablecoin/tether",
"parse_dl_single", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://stablecoins.llama.fi/stablecoincharts/all?stablecoin=1", "1d", "USDT - FULL"),
Indicator(47, "usdc", Category.DEFI, "defillama", "https://stablecoins.llama.fi/stablecoin/usd-coin",
"parse_dl_single", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://stablecoins.llama.fi/stablecoincharts/all?stablecoin=2", "1d", "USDC - FULL"),
Indicator(48, "dex_vol", Category.DEFI, "defillama",
"https://api.llama.fi/overview/dexs?excludeTotalDataChart=true&excludeTotalDataChartBreakdown=true",
"parse_dl_dex", Stationarity.EPISODIC, HistoricalSupport.PARTIAL, "", "1d", "DEX vol - PARTIAL"),
Indicator(49, "bridge", Category.DEFI, "defillama", "https://bridges.llama.fi/bridges?includeChains=false",
"parse_dl_bridge", Stationarity.EPISODIC, HistoricalSupport.PARTIAL, "", "1d", "Bridge - PARTIAL"),
Indicator(50, "yields", Category.DEFI, "defillama", "https://yields.llama.fi/pools",
"parse_dl_yields", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Yields - CURRENT"),
Indicator(51, "fees", Category.DEFI, "defillama", "https://api.llama.fi/overview/fees?excludeTotalDataChart=true",
"parse_dl_fees", Stationarity.EPISODIC, HistoricalSupport.PARTIAL, "", "1d", "Fees - PARTIAL"),
# MACRO - FRED (52-65) - ALL FULL HISTORY (decades)
Indicator(52, "dxy", Category.MACRO, "fred", "DTWEXBGS", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=DTWEXBGS&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "DXY - FULL"),
Indicator(53, "us10y", Category.MACRO, "fred", "DGS10", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=DGS10&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "10Y - FULL"),
Indicator(54, "us2y", Category.MACRO, "fred", "DGS2", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=DGS2&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "2Y - FULL"),
Indicator(55, "ycurve", Category.MACRO, "fred", "T10Y2Y", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=T10Y2Y&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "Yield curve - FULL"),
Indicator(56, "vix", Category.MACRO, "fred", "VIXCLS", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=VIXCLS&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "VIX - FULL"),
Indicator(57, "fedfunds", Category.MACRO, "fred", "DFF", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=DFF&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "Fed funds - FULL"),
Indicator(58, "m2", Category.MACRO, "fred", "WM2NS", "parse_fred", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=WM2NS&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1w", "M2 - FULL"),
Indicator(59, "cpi", Category.MACRO, "fred", "CPIAUCSL", "parse_fred", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=CPIAUCSL&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1m", "CPI - FULL"),
Indicator(60, "sp500", Category.MACRO, "fred", "SP500", "parse_fred", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=SP500&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "S&P - FULL"),
Indicator(61, "gold", Category.MACRO, "fred", "GOLDAMGBD228NLBM", "parse_fred", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=GOLDAMGBD228NLBM&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "Gold - FULL"),
Indicator(62, "hy_spread", Category.MACRO, "fred", "BAMLH0A0HYM2", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=BAMLH0A0HYM2&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "HY spread - FULL"),
Indicator(63, "be5y", Category.MACRO, "fred", "T5YIE", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=T5YIE&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1d", "Breakeven - FULL"),
Indicator(64, "nfci", Category.MACRO, "fred", "NFCI", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=NFCI&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1w", "NFCI - FULL"),
Indicator(65, "claims", Category.MACRO, "fred", "ICSA", "parse_fred", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.stlouisfed.org/fred/series/observations?series_id=ICSA&api_key={key}&file_type=json&observation_start={date}&observation_end={date}", "1w", "Claims - FULL"),
# SENTIMENT (66-72) - F&G has FULL history
Indicator(66, "fng", Category.SENTIMENT, "alternative", "https://api.alternative.me/fng/?limit=1",
"parse_fng", Stationarity.STATIONARY, HistoricalSupport.FULL,
"https://api.alternative.me/fng/?limit=1000&date_format=us", "1d", "F&G - FULL (returns history, filter)"),
Indicator(67, "fng_prev", Category.SENTIMENT, "alternative", "https://api.alternative.me/fng/?limit=2",
"parse_fng_prev", Stationarity.STATIONARY, HistoricalSupport.FULL, "", "1d", "Prev F&G"),
Indicator(68, "fng_week", Category.SENTIMENT, "alternative", "https://api.alternative.me/fng/?limit=7",
"parse_fng_week", Stationarity.STATIONARY, HistoricalSupport.FULL, "", "1d", "Week F&G"),
Indicator(69, "fng_vol", Category.SENTIMENT, "alternative", "https://api.alternative.me/fng/?limit=1",
"parse_fng", Stationarity.STATIONARY, HistoricalSupport.FULL, "", "1d", "Vol proxy"),
Indicator(70, "fng_mom", Category.SENTIMENT, "alternative", "https://api.alternative.me/fng/?limit=1",
"parse_fng", Stationarity.STATIONARY, HistoricalSupport.FULL, "", "1d", "Mom proxy"),
Indicator(71, "fng_soc", Category.SENTIMENT, "alternative", "https://api.alternative.me/fng/?limit=1",
"parse_fng", Stationarity.STATIONARY, HistoricalSupport.FULL, "", "1d", "Social proxy"),
Indicator(72, "fng_dom", Category.SENTIMENT, "alternative", "https://api.alternative.me/fng/?limit=1",
"parse_fng", Stationarity.STATIONARY, HistoricalSupport.FULL, "", "1d", "Dom proxy"),
# MICROSTRUCTURE (73-80) - Most CURRENT
Indicator(73, "imbal_btc", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/depth?symbol=BTCUSDT&limit=100",
"parse_imbal", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Imbalance - CURRENT"),
Indicator(74, "imbal_eth", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/depth?symbol=ETHUSDT&limit=100",
"parse_imbal", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "ETH imbal - CURRENT"),
Indicator(75, "spread", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/ticker/bookTicker?symbol=BTCUSDT",
"parse_spread", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Spread - CURRENT"),
Indicator(76, "chg24_btc", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/ticker/24hr?symbol=BTCUSDT",
"parse_chg", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "24h chg - CURRENT"),
Indicator(77, "chg24_eth", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/ticker/24hr?symbol=ETHUSDT",
"parse_chg", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "ETH 24h - CURRENT"),
Indicator(78, "vol24", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/ticker/24hr?symbol=BTCUSDT",
"parse_vol", Stationarity.EPISODIC, HistoricalSupport.FULL,
"https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1d&startTime={start_ms}&endTime={end_ms}&limit=1",
"1d", "Volume - FULL via klines"),
Indicator(79, "dispersion", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/ticker/24hr",
"parse_disp", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Dispersion - CURRENT"),
Indicator(80, "correlation", Category.MICROSTRUCTURE, "binance", "https://api.binance.com/api/v3/ticker/24hr",
"parse_corr", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "Correlation - CURRENT"),
# MARKET - CoinGecko (81-85)
Indicator(81, "btc_price", Category.MACRO, "coingecko", "https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd",
"parse_cg_btc", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.coingecko.com/api/v3/coins/bitcoin/history?date={date_dmy}", "1d", "BTC price - FULL"),
Indicator(82, "eth_price", Category.MACRO, "coingecko", "https://api.coingecko.com/api/v3/simple/price?ids=ethereum&vs_currencies=usd",
"parse_cg_eth", Stationarity.TREND_UP, HistoricalSupport.FULL,
"https://api.coingecko.com/api/v3/coins/ethereum/history?date={date_dmy}", "1d", "ETH price - FULL"),
Indicator(83, "mcap", Category.MACRO, "coingecko", "https://api.coingecko.com/api/v3/global",
"parse_cg_mcap", Stationarity.TREND_UP, HistoricalSupport.PARTIAL, "", "1d", "Mcap - PARTIAL"),
Indicator(84, "btc_dom", Category.MACRO, "coingecko", "https://api.coingecko.com/api/v3/global",
"parse_cg_dom_btc", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "BTC dom - CURRENT"),
Indicator(85, "eth_dom", Category.MACRO, "coingecko", "https://api.coingecko.com/api/v3/global",
"parse_cg_dom_eth", Stationarity.STATIONARY, HistoricalSupport.CURRENT, "", "", "ETH dom - CURRENT"),
]
# fmt: on
N_INDICATORS = len(INDICATORS)
class StationarityTransformer:
def __init__(self, lookback: int = 10):
self.history: Dict[int, deque] = {i: deque(maxlen=lookback+1) for i in range(1, N_INDICATORS+1)}
def transform(self, ind_id: int, raw: float) -> float:
ind = INDICATORS[ind_id - 1]
hist = self.history[ind_id]
hist.append(raw)
if ind.stationarity == Stationarity.STATIONARY: return raw
if ind.stationarity == Stationarity.TREND_UP:
return (raw - hist[-2]) / abs(hist[-2]) if len(hist) >= 2 and hist[-2] != 0 else 0.0
if ind.stationarity == Stationarity.EPISODIC:
if len(hist) < 3: return 0.0
m, s = np.mean(list(hist)), np.std(list(hist))
return (raw - m) / s if s > 0 else 0.0
return raw
def transform_matrix(self, raw: np.ndarray) -> np.ndarray:
return np.array([self.transform(i+1, raw[i]) for i in range(len(raw))])
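The three stationarity modes implemented above (pass-through, one-step percent change, rolling z-score) can be exercised in isolation. A minimal standalone sketch, independent of the Indicator registry, using plain strings in place of the Stationarity enum:

```python
from collections import deque
import numpy as np

def transform_series(values, mode, lookback=10):
    """Apply one stationarity mode to a raw series, mirroring
    StationarityTransformer: STATIONARY passes through, TREND_UP
    emits one-step percent change, EPISODIC emits a rolling z-score."""
    hist = deque(maxlen=lookback + 1)
    out = []
    for raw in values:
        hist.append(raw)
        if mode == "STATIONARY":
            out.append(raw)
        elif mode == "TREND_UP":
            # percent change vs previous sample; 0.0 until history exists
            out.append((raw - hist[-2]) / abs(hist[-2])
                       if len(hist) >= 2 and hist[-2] != 0 else 0.0)
        elif mode == "EPISODIC":
            if len(hist) < 3:
                out.append(0.0)
            else:
                m, s = np.mean(list(hist)), np.std(list(hist))
                out.append((raw - m) / s if s > 0 else 0.0)
    return out
```

Note the deliberate warm-up behavior: both non-stationary modes emit 0.0 until enough history has accumulated, matching the engine above.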
class ExternalFactorsFetcher:
def __init__(self, config: Config = None):
self.config = config or Config()
self.cache: Dict[str, Tuple[float, Any]] = {}
import time as t; self._time = t
def _build_hist_url(self, ind: Indicator, dt: datetime) -> Optional[str]:
if ind.historical == HistoricalSupport.CURRENT or not ind.hist_url: return None
url = ind.hist_url
date_str = dt.strftime("%Y-%m-%d")
date_dmy = dt.strftime("%d-%m-%Y")
start_ms = int(dt.replace(hour=0, minute=0, second=0).timestamp() * 1000)
end_ms = int(dt.replace(hour=23, minute=59, second=59).timestamp() * 1000)
key = self.config.fred_api_key or "DEMO_KEY"
return url.replace("{date}", date_str).replace("{date_dmy}", date_dmy).replace("{start_ms}", str(start_ms)).replace("{end_ms}", str(end_ms)).replace("{key}", key)
async def _fetch(self, session, url: str) -> Optional[Any]:
if url in self.cache:
ct, cd = self.cache[url]
if self._time.time() - ct < self.config.cache_ttl: return cd
try:
async with session.get(url, timeout=aiohttp.ClientTimeout(total=self.config.timeout), headers={"User-Agent": "Mozilla/5.0"}) as r:
if r.status == 200:
d = await r.json() if 'json' in r.headers.get('Content-Type', '') else await r.text()
if isinstance(d, str):
try: d = json.loads(d)
except json.JSONDecodeError: pass
self.cache[url] = (self._time.time(), d)
return d
except Exception: pass
return None
def _fred_url(self, series: str) -> str:
return f"https://api.stlouisfed.org/fred/series/observations?series_id={series}&api_key={self.config.fred_api_key or 'DEMO_KEY'}&file_type=json&sort_order=desc&limit=1"
# Parsers
def parse_binance_funding(self, d): return float(d[0]['fundingRate']) if isinstance(d, list) and d else 0.0
def parse_binance_oi(self, d):
if isinstance(d, list) and d: return float(d[-1].get('sumOpenInterest', 0))
return float(d.get('openInterest', 0)) if isinstance(d, dict) else 0.0
def parse_binance_ls(self, d): return float(d[-1]['longShortRatio']) if isinstance(d, list) and d else 1.0
def parse_binance_taker(self, d): return float(d[-1]['buySellRatio']) if isinstance(d, list) and d else 1.0
def parse_binance_basis(self, d): return float(d.get('lastFundingRate', 0)) * 365 * 3 if isinstance(d, dict) else 0.0
def parse_liq_proxy(self, d): return np.tanh(float(d.get('priceChangePercent', 0)) / 10) if isinstance(d, dict) else 0.0
def parse_deribit_dvol(self, d):
if isinstance(d, dict) and 'result' in d and isinstance(d['result'], dict) and 'data' in d['result'] and d['result']['data']:
return float(d['result']['data'][-1][4]) if len(d['result']['data'][-1]) > 4 else 0.0
return 0.0
def parse_deribit_pcr(self, d):
if isinstance(d, dict) and 'result' in d:
r = d['result']
p = sum(float(o.get('volume', 0)) for o in r if '-P' in o.get('instrument_name', ''))
c = sum(float(o.get('volume', 0)) for o in r if '-C' in o.get('instrument_name', ''))
return p / c if c > 0 else 1.0
return 1.0
def parse_deribit_pcr_oi(self, d):
if isinstance(d, dict) and 'result' in d:
r = d['result']
p = sum(float(o.get('open_interest', 0)) for o in r if '-P' in o.get('instrument_name', ''))
c = sum(float(o.get('open_interest', 0)) for o in r if '-C' in o.get('instrument_name', ''))
return p / c if c > 0 else 1.0
return 1.0
def parse_deribit_oi(self, d): return sum(float(o.get('open_interest', 0)) for o in d['result']) if isinstance(d, dict) and 'result' in d else 0.0
def parse_deribit_fund(self, d):
if isinstance(d, dict) and 'result' in d:
r = d['result']
if isinstance(r, list) and r: return float(r[-1].get('interest_8h', 0))
if isinstance(r, (int, float)): return float(r)
return 0.0
def parse_cm(self, d):
if isinstance(d, dict) and 'data' in d and d['data']:
for k, v in d['data'][-1].items():
if k not in ['asset', 'time']:
try: return float(v)
except (TypeError, ValueError): pass
return 0.0
def parse_cm_mvrv(self, d):
if isinstance(d, dict) and 'data' in d and d['data']:
r = d['data'][-1]
m, rc = float(r.get('CapMrktCurUSD', 0)), float(r.get('CapRealUSD', 1))
return m / rc if rc > 0 else 0.0
return 0.0
def parse_cm_nupl(self, d):
if isinstance(d, dict) and 'data' in d and d['data']:
r = d['data'][-1]
m, rc = float(r.get('CapMrktCurUSD', 0)), float(r.get('CapRealUSD', 1))
return (m - rc) / m if m > 0 else 0.0
return 0.0
def parse_bc(self, d):
if isinstance(d, (int, float)): return float(d)
if isinstance(d, str):
try: return float(d)
except ValueError: pass
if isinstance(d, dict) and 'values' in d and d['values']: return float(d['values'][-1].get('y', 0))
return 0.0
def parse_bc_int(self, d): v = self.parse_bc(d); return abs(v - 600) / 600 if v > 0 else 0.0
def parse_bc_btc(self, d): v = self.parse_bc(d); return v / 1e8 if v > 0 else 0.0
def parse_mp_cnt(self, d): return float(d.get('count', 0)) if isinstance(d, dict) else 0.0
def parse_mp_mb(self, d): return float(d.get('vsize', 0)) / 1e6 if isinstance(d, dict) else 0.0
def parse_fee_fast(self, d): return float(d.get('fastestFee', 0)) if isinstance(d, dict) else 0.0
def parse_fee_med(self, d): return float(d.get('halfHourFee', 0)) if isinstance(d, dict) else 0.0
def parse_fee_slow(self, d): return float(d.get('economyFee', 0)) if isinstance(d, dict) else 0.0
def parse_dl_tvl(self, d, target_date: datetime = None):
if isinstance(d, list) and d:
if target_date:
ts = int(target_date.timestamp())
for e in reversed(d):
if e.get('date', 0) <= ts: return float(e.get('tvl', 0))
return float(d[-1].get('tvl', 0))
return 0.0
def parse_dl_stables(self, d):
if isinstance(d, dict) and 'peggedAssets' in d:
return sum(float(a.get('circulating', {}).get('peggedUSD', 0)) for a in d['peggedAssets'])
return 0.0
def parse_dl_single(self, d):
if isinstance(d, dict) and 'tokens' in d and d['tokens']:
return float(d['tokens'][-1].get('circulating', {}).get('peggedUSD', 0))
return 0.0
def parse_dl_dex(self, d): return float(d.get('total24h', 0)) if isinstance(d, dict) else 0.0
def parse_dl_bridge(self, d):
if isinstance(d, dict) and 'bridges' in d:
return sum(float(b.get('lastDayVolume', 0)) for b in d['bridges'])
return 0.0
def parse_dl_yields(self, d):
if isinstance(d, dict) and 'data' in d:
apys = [float(p.get('apy', 0)) for p in d['data'][:100] if p.get('apy')]
return np.mean(apys) if apys else 0.0
return 0.0
def parse_dl_fees(self, d): return float(d.get('total24h', 0)) if isinstance(d, dict) else 0.0
def parse_fred(self, d):
if isinstance(d, dict) and 'observations' in d and d['observations']:
v = d['observations'][-1].get('value', '.')
if v != '.':
try: return float(v)
except ValueError: pass
return 0.0
def parse_fng(self, d): return float(d['data'][0]['value']) if isinstance(d, dict) and 'data' in d and d['data'] else 50.0
def parse_fng_prev(self, d): return float(d['data'][1]['value']) if isinstance(d, dict) and 'data' in d and len(d['data']) > 1 else 50.0
def parse_fng_week(self, d): return np.mean([float(x['value']) for x in d['data'][:7]]) if isinstance(d, dict) and 'data' in d and len(d['data']) >= 7 else 50.0
def parse_imbal(self, d):
if isinstance(d, dict):
bv = sum(float(b[1]) for b in d.get('bids', [])[:50])
av = sum(float(a[1]) for a in d.get('asks', [])[:50])
t = bv + av
return (bv - av) / t if t > 0 else 0.0
return 0.0
def parse_spread(self, d):
if isinstance(d, dict):
b, a = float(d.get('bidPrice', 0)), float(d.get('askPrice', 0))
return (a - b) / b * 10000 if b > 0 else 0.0
return 0.0
def parse_chg(self, d): return float(d.get('priceChangePercent', 0)) if isinstance(d, dict) else 0.0
def parse_vol(self, d):
if isinstance(d, dict): return float(d.get('quoteVolume', 0))
if isinstance(d, list) and d and isinstance(d[0], list): return float(d[-1][7])
return 0.0
def parse_disp(self, d):
if isinstance(d, list) and len(d) > 10:
chg = [float(t['priceChangePercent']) for t in d if t.get('symbol', '').endswith('USDT') and 'priceChangePercent' in t]
return float(np.std(chg[:50])) if len(chg) > 5 else 0.0
return 0.0
def parse_corr(self, d): disp = self.parse_disp(d); return 1 / (1 + disp) if disp > 0 else 0.5
def parse_cg_btc(self, d):
if isinstance(d, dict) and 'bitcoin' in d: return float(d['bitcoin']['usd'])
if isinstance(d, dict) and 'market_data' in d: return float(d['market_data'].get('current_price', {}).get('usd', 0))
return 0.0
def parse_cg_eth(self, d):
if isinstance(d, dict) and 'ethereum' in d: return float(d['ethereum']['usd'])
if isinstance(d, dict) and 'market_data' in d: return float(d['market_data'].get('current_price', {}).get('usd', 0))
return 0.0
def parse_cg_mcap(self, d): return float(d['data']['total_market_cap']['usd']) if isinstance(d, dict) and 'data' in d else 0.0
def parse_cg_dom_btc(self, d): return float(d['data']['market_cap_percentage']['btc']) if isinstance(d, dict) and 'data' in d else 0.0
def parse_cg_dom_eth(self, d): return float(d['data']['market_cap_percentage']['eth']) if isinstance(d, dict) and 'data' in d else 0.0
async def fetch_indicator(self, session, ind: Indicator, target_date: datetime = None) -> Tuple[int, str, float, bool]:
if target_date and ind.historical != HistoricalSupport.CURRENT:
url = self._build_hist_url(ind, target_date)
else:
url = self._fred_url(ind.url) if ind.source == "fred" else ind.url
if url is None: return (ind.id, ind.name, 0.0, False)
data = await self._fetch(session, url)
if data is None: return (ind.id, ind.name, 0.0, False)
parser = getattr(self, ind.parser, None)
if parser is None: return (ind.id, ind.name, 0.0, False)
try:
value = parser(data)
return (ind.id, ind.name, value, value != 0.0 or 'imbal' in ind.name)
except Exception: return (ind.id, ind.name, 0.0, False)
async def fetch_all(self, target_date: datetime = None) -> Dict[str, Any]:
connector = aiohttp.TCPConnector(limit=self.config.max_concurrent)
async with aiohttp.ClientSession(connector=connector) as session:
results = await asyncio.gather(*[self.fetch_indicator(session, ind, target_date) for ind in INDICATORS])
matrix = np.zeros(N_INDICATORS)
success = 0
details = {}
for idx, name, value, ok in results:
matrix[idx - 1] = value
if ok: success += 1
details[idx] = {'name': name, 'value': value, 'success': ok}
return {'matrix': matrix, 'timestamp': (target_date or datetime.now(timezone.utc)).isoformat(), 'success_count': success, 'total': N_INDICATORS, 'details': details}
def fetch_sync(self, target_date: datetime = None) -> Dict[str, Any]:
return asyncio.run(self.fetch_all(target_date))
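fetch_all fans every indicator request out through a single asyncio.gather call and aggregates the results positionally by indicator id. The same pattern can be sketched without aiohttp; here fake_fetch is a hypothetical stand-in for fetch_indicator, with indicator 3 made to fail for illustration:

```python
import asyncio

async def fake_fetch(ind_id: int) -> tuple:
    """Stand-in for fetch_indicator: returns (id, value, ok)."""
    await asyncio.sleep(0)  # yield control, as a real HTTP request would
    ok = ind_id != 3        # pretend indicator 3's endpoint is down
    return (ind_id, float(ind_id) if ok else 0.0, ok)

async def fetch_all_demo(n: int) -> dict:
    # gather preserves submission order, so results line up with ids
    results = await asyncio.gather(*[fake_fetch(i + 1) for i in range(n)])
    matrix = [0.0] * n
    success = 0
    for ind_id, value, ok in results:
        matrix[ind_id - 1] = value  # ids are 1-based, slots are 0-based
        success += ok
    return {"matrix": matrix, "success_count": success, "total": n}

out = asyncio.run(fetch_all_demo(5))
```

A failed fetch degrades to 0.0 in its slot rather than aborting the whole scan, which is the behavior the real fetcher relies on.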
class ExternalFactorsMatrix:
"""DOLPHIN interface with BACKFILL. Usage: efm.update() or efm.update(datetime(2024,6,15))"""
def __init__(self, config: Config = None):
self.config = config or Config()
self.fetcher = ExternalFactorsFetcher(self.config)
self.transformer = StationarityTransformer()
self.raw_matrix: Optional[np.ndarray] = None
self.stationary_matrix: Optional[np.ndarray] = None
self.last_result: Optional[Dict] = None
def update(self, target_date: datetime = None) -> np.ndarray:
self.last_result = self.fetcher.fetch_sync(target_date)
self.raw_matrix = self.last_result['matrix']
self.stationary_matrix = self.transformer.transform_matrix(self.raw_matrix)
return self.stationary_matrix
def update_raw(self, target_date: datetime = None) -> np.ndarray:
self.last_result = self.fetcher.fetch_sync(target_date)
self.raw_matrix = self.last_result['matrix']
return self.raw_matrix
def get_indicator_names(self) -> List[str]: return [i.name for i in INDICATORS]
def get_backfillable(self) -> List[Tuple[int, str, str]]:
return [(i.id, i.name, i.hist_resolution) for i in INDICATORS if i.historical in [HistoricalSupport.FULL, HistoricalSupport.PARTIAL]]
def get_current_only(self) -> List[Tuple[int, str]]:
return [(i.id, i.name) for i in INDICATORS if i.historical == HistoricalSupport.CURRENT]
def summary(self) -> str:
if not self.last_result: return "No data."
r = self.last_result
f = sum(1 for i in INDICATORS if i.historical == HistoricalSupport.FULL)
p = sum(1 for i in INDICATORS if i.historical == HistoricalSupport.PARTIAL)
c = sum(1 for i in INDICATORS if i.historical == HistoricalSupport.CURRENT)
return f"Success: {r['success_count']}/{r['total']} | Historical: FULL={f}, PARTIAL={p}, CURRENT={c}"
if __name__ == "__main__":
print(f"EXTERNAL FACTORS v5.0 - {N_INDICATORS} indicators with BACKFILL")
f = [i for i in INDICATORS if i.historical == HistoricalSupport.FULL]
p = [i for i in INDICATORS if i.historical == HistoricalSupport.PARTIAL]
c = [i for i in INDICATORS if i.historical == HistoricalSupport.CURRENT]
print(f"\nFULL: {len(f)} | PARTIAL: {len(p)} | CURRENT: {len(c)}")
print("\nFULL HISTORY indicators:")
for i in f: print(f" {i.id:2d}. {i.name:15s} [{i.hist_resolution:3s}] {i.source}")
print("\nCURRENT ONLY:")
for i in c: print(f" {i.id:2d}. {i.name:15s} - {i.description}")
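_build_hist_url above is plain placeholder substitution over the registry's hist_url templates. A standalone sketch of the same substitution, assuming a tz-aware UTC datetime (the template in the usage line is the Binance klines pattern from indicator 78):

```python
from datetime import datetime, timezone

def build_hist_url(template: str, dt: datetime, key: str = "DEMO_KEY") -> str:
    """Fill the placeholder set used by the indicator registry:
    {date} (ISO), {date_dmy} (CoinGecko), {start_ms}/{end_ms}
    (millisecond bounds of the UTC day, for Binance), {key} (FRED)."""
    start = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    end = dt.replace(hour=23, minute=59, second=59, microsecond=0)
    return (template
            .replace("{date}", dt.strftime("%Y-%m-%d"))
            .replace("{date_dmy}", dt.strftime("%d-%m-%Y"))
            .replace("{start_ms}", str(int(start.timestamp() * 1000)))
            .replace("{end_ms}", str(int(end.timestamp() * 1000)))
            .replace("{key}", key))

url = build_hist_url(
    "https://api.binance.com/api/v3/klines?symbol=BTCUSDT&interval=1d"
    "&startTime={start_ms}&endTime={end_ms}&limit=1",
    datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc))
```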


@@ -0,0 +1,266 @@
#!/usr/bin/env python3
"""
INDICATOR READER v1.0
=====================
Utility to read and analyze processed indicator .npz files.
Usage:
from indicator_reader import IndicatorReader
# Load single file
reader = IndicatorReader("scan_000027_193311__Indicators.npz")
print(reader.summary())
# Get DataFrames
scan_df = reader.scan_derived_df()
external_df = reader.external_df()
asset_df = reader.asset_df()
# Load directory
all_data = IndicatorReader.load_directory("./scans/")
"""
import numpy as np
from pathlib import Path
from typing import Dict, List, Optional, Any, Tuple
from datetime import datetime
class IndicatorReader:
"""Reader for processed indicator .npz files"""
def __init__(self, path: str):
self.path = Path(path)
self._data = dict(np.load(path, allow_pickle=True))
@property
def scan_number(self) -> int:
return int(self._data['scan_number'][0])
@property
def timestamp(self) -> str:
return str(self._data['timestamp'][0])
@property
def processing_time(self) -> float:
return float(self._data['processing_time'][0])
@property
def n_assets(self) -> int:
return len(self._data['asset_symbols'])
@property
def asset_symbols(self) -> List[str]:
return list(self._data['asset_symbols'])
# =========================================================================
# SCAN-DERIVED (eigenvalue indicators from tracking_data/regime_signals)
# =========================================================================
@property
def scan_derived(self) -> np.ndarray:
"""Get scan-derived indicator array"""
return self._data['scan_derived']
@property
def scan_derived_names(self) -> List[str]:
return list(self._data['scan_derived_names'])
def scan_derived_df(self):
"""Get scan-derived as pandas DataFrame"""
import pandas as pd
return pd.DataFrame({
'name': self.scan_derived_names,
'value': self.scan_derived
})
def get_scan_indicator(self, name: str) -> float:
"""Get specific scan-derived indicator by name"""
names = self.scan_derived_names
if name in names:
return float(self.scan_derived[names.index(name)])
raise KeyError(f"Unknown scan indicator: {name}")
# =========================================================================
# EXTERNAL (API-fetched indicators)
# =========================================================================
@property
def external(self) -> np.ndarray:
"""Get external indicator array (85 values, NaN for skipped)"""
return self._data['external']
@property
def external_success(self) -> np.ndarray:
"""Get success flags for external indicators"""
return self._data['external_success']
def external_df(self):
"""Get external indicators as pandas DataFrame"""
import pandas as pd
# Placeholder names; canonical names would come from external_factors_matrix
names = [f"ext_{i+1}" for i in range(85)]
return pd.DataFrame({
'id': range(1, 86),
'name': names,
'value': self.external,
'success': self.external_success
})
@property
def external_success_rate(self) -> float:
"""Percentage of external indicators successfully fetched"""
valid = ~np.isnan(self.external)
if valid.sum() == 0:
return 0.0
return float(self.external_success[valid].mean())
# =========================================================================
# PER-ASSET
# =========================================================================
@property
def asset_matrix(self) -> np.ndarray:
"""Get per-asset indicator matrix (n_assets x n_indicators)"""
return self._data['asset_matrix']
@property
def asset_indicator_names(self) -> List[str]:
return list(self._data['asset_indicator_names'])
def asset_df(self):
"""Get per-asset indicators as pandas DataFrame"""
import pandas as pd
return pd.DataFrame(
self.asset_matrix,
index=self.asset_symbols,
columns=self.asset_indicator_names
)
def get_asset(self, symbol: str) -> Dict[str, float]:
"""Get all indicators for a specific asset"""
symbols = self.asset_symbols
if symbol not in symbols:
raise KeyError(f"Unknown symbol: {symbol}")
idx = symbols.index(symbol)
return dict(zip(self.asset_indicator_names, self.asset_matrix[idx]))
def get_asset_indicator(self, symbol: str, indicator: str) -> float:
"""Get specific indicator for specific asset"""
asset = self.get_asset(symbol)
if indicator not in asset:
raise KeyError(f"Unknown indicator: {indicator}")
return asset[indicator]
# =========================================================================
# UTILITIES
# =========================================================================
def summary(self) -> str:
"""Get summary string"""
ext_valid = (~np.isnan(self.external)).sum()
ext_success = self.external_success.sum()
return f"""Indicator File: {self.path.name}
Scan: #{self.scan_number} @ {self.timestamp}
Processing: {self.processing_time:.2f}s
Scan-derived: {len(self.scan_derived)} indicators
lambda_max: {self.get_scan_indicator('lambda_max'):.4f}
coherence: {self.get_scan_indicator('market_coherence'):.4f}
instability: {self.get_scan_indicator('instability_score'):.4f}
External: {ext_success}/{ext_valid} successful ({self.external_success_rate*100:.1f}%)
Per-asset: {self.n_assets} assets × {len(self.asset_indicator_names)} indicators
"""
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary"""
return {
'scan_number': self.scan_number,
'timestamp': self.timestamp,
'processing_time': self.processing_time,
'scan_derived': dict(zip(self.scan_derived_names, self.scan_derived.tolist())),
'external': self.external.tolist(),
'external_success': self.external_success.tolist(),
'asset_symbols': self.asset_symbols,
'asset_matrix': self.asset_matrix.tolist(),
}
# =========================================================================
# CLASS METHODS
# =========================================================================
@classmethod
def load_directory(cls, directory: str, pattern: str = "*__Indicators.npz") -> List['IndicatorReader']:
"""Load all indicator files from directory"""
root = Path(directory)
files = sorted(root.rglob(pattern))
return [cls(str(f)) for f in files]
@classmethod
def to_timeseries(cls, readers: List['IndicatorReader']) -> Dict[str, np.ndarray]:
"""Convert list of readers to time series arrays"""
n = len(readers)
if n == 0:
return {}
# Get dimensions from first file
n_scan = len(readers[0].scan_derived)
n_ext = 85
n_assets = readers[0].n_assets
n_asset_ind = len(readers[0].asset_indicator_names)
# Allocate arrays
timestamps = []
scan_series = np.zeros((n, n_scan))
ext_series = np.zeros((n, n_ext))
for i, r in enumerate(readers):
timestamps.append(r.timestamp)
scan_series[i] = r.scan_derived
ext_series[i] = r.external
return {
'timestamps': np.array(timestamps, dtype='U32'),
'scan_derived': scan_series,
'external': ext_series,
'scan_names': readers[0].scan_derived_names,
}
# =============================================================================
# CLI
# =============================================================================
def main():
import argparse
parser = argparse.ArgumentParser(description="Indicator Reader")
parser.add_argument("path", help="Path to .npz file or directory")
parser.add_argument("-a", "--asset", help="Show specific asset")
parser.add_argument("-j", "--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
path = Path(args.path)
if path.is_file():
reader = IndicatorReader(str(path))
if args.json:
import json
print(json.dumps(reader.to_dict(), indent=2))
elif args.asset:
asset = reader.get_asset(args.asset)
for k, v in asset.items():
print(f" {k}: {v:.6f}")
else:
print(reader.summary())
elif path.is_dir():
readers = IndicatorReader.load_directory(str(path))
print(f"Found {len(readers)} indicator files")
if readers:
ts = IndicatorReader.to_timeseries(readers)
print(f"Time range: {ts['timestamps'][0]} to {ts['timestamps'][-1]}")
print(f"Scan-derived shape: {ts['scan_derived'].shape}")
print(f"External shape: {ts['external'].shape}")
if __name__ == "__main__":
main()
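For testing IndicatorReader without live scan output, a compatible .npz can be synthesized with np.savez. The field names below follow the keys the reader accesses; all values are made up:

```python
import os
import tempfile
import numpy as np

def write_fake_indicator_file(path: str) -> None:
    """Write a minimal .npz with the arrays IndicatorReader reads."""
    np.savez(
        path,
        scan_number=np.array([27]),
        timestamp=np.array(["2026-04-21T16:58:38+00:00"]),
        processing_time=np.array([1.25]),
        asset_symbols=np.array(["BTCUSDT", "ETHUSDT"]),
        scan_derived=np.array([3.1, 0.72, 0.15]),
        scan_derived_names=np.array(
            ["lambda_max", "market_coherence", "instability_score"]),
        external=np.full(85, np.nan),            # 85 slots, NaN = skipped
        external_success=np.zeros(85, dtype=bool),
        asset_matrix=np.zeros((2, 4)),           # n_assets x n_indicators
        asset_indicator_names=np.array(["ret", "vol", "beta", "rank"]),
    )

path = os.path.join(tempfile.mkdtemp(), "scan_000027__Indicators.npz")
write_fake_indicator_file(path)
d = dict(np.load(path, allow_pickle=True))
```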


@@ -0,0 +1,204 @@
#!/usr/bin/env python3
"""
INDICATOR SOURCES v5.0 - API Reference with Historical Support
===============================================================
Documents all 85 indicators with their backfill capability.
"""
SOURCES = {
"binance": {"url": "fapi.binance.com / api.binance.com", "auth": "None", "limit": "1200/min", "history": "FULL (startTime/endTime)"},
"deribit": {"url": "deribit.com/api/v2/public", "auth": "None", "limit": "20/sec", "history": "FULL for DVOL/funding"},
"coinmetrics": {"url": "community-api.coinmetrics.io/v4", "auth": "None", "limit": "10/6sec", "history": "FULL (start_time/end_time)"},
"fred": {"url": "api.stlouisfed.org/fred", "auth": "Free key", "limit": "120/min", "history": "FULL (decades)"},
"defillama": {"url": "api.llama.fi", "auth": "None", "limit": "Generous", "history": "FULL for TVL/stables"},
"alternative": {"url": "api.alternative.me", "auth": "None", "limit": "Unlimited", "history": "FULL (limit=N param)"},
"blockchain": {"url": "blockchain.info", "auth": "None", "limit": "Generous", "history": "FULL via charts API"},
"mempool": {"url": "mempool.space/api", "auth": "None", "limit": "Generous", "history": "NONE (real-time only)"},
"coingecko": {"url": "api.coingecko.com/api/v3", "auth": "None (demo)", "limit": "30/min", "history": "FULL for prices"},
}
# Historical URL templates for backfill
HISTORICAL_ENDPOINTS = {
# BINANCE - All support startTime/endTime in milliseconds
"binance_funding": "https://fapi.binance.com/fapi/v1/fundingRate?symbol={SYMBOL}&startTime={start_ms}&endTime={end_ms}&limit=1000",
"binance_oi_hist": "https://fapi.binance.com/futures/data/openInterestHist?symbol={SYMBOL}&period=1h&startTime={start_ms}&endTime={end_ms}&limit=500",
"binance_ls_hist": "https://fapi.binance.com/futures/data/globalLongShortAccountRatio?symbol={SYMBOL}&period=1h&startTime={start_ms}&endTime={end_ms}&limit=500",
"binance_taker_hist": "https://fapi.binance.com/futures/data/takerlongshortRatio?symbol={SYMBOL}&period=1h&startTime={start_ms}&endTime={end_ms}&limit=500",
"binance_klines": "https://api.binance.com/api/v3/klines?symbol={SYMBOL}&interval=1d&startTime={start_ms}&endTime={end_ms}&limit=1",
# DERIBIT - Uses start_timestamp/end_timestamp in milliseconds
"deribit_dvol": "https://www.deribit.com/api/v2/public/get_volatility_index_data?currency={CURRENCY}&resolution=3600&start_timestamp={start_ms}&end_timestamp={end_ms}",
"deribit_funding_hist": "https://www.deribit.com/api/v2/public/get_funding_rate_history?instrument_name={INSTRUMENT}&start_timestamp={start_ms}&end_timestamp={end_ms}",
# COINMETRICS - Uses ISO date format
"coinmetrics": "https://community-api.coinmetrics.io/v4/timeseries/asset-metrics?assets={asset}&metrics={metric}&frequency=1d&start_time={date}T00:00:00Z&end_time={date}T23:59:59Z",
# FRED - Uses observation_start/observation_end in YYYY-MM-DD
"fred": "https://api.stlouisfed.org/fred/series/observations?series_id={series}&api_key={key}&file_type=json&observation_start={date}&observation_end={date}",
# DEFILLAMA - Returns full history, filter client-side
"defillama_tvl": "https://api.llama.fi/v2/historicalChainTvl", # Filter by date client-side
"defillama_tvl_chain": "https://api.llama.fi/v2/historicalChainTvl/{chain}",
"defillama_stables": "https://stablecoins.llama.fi/stablecoincharts/all?stablecoin={id}", # 1=USDT, 2=USDC
# BLOCKCHAIN.INFO - Uses start param in YYYY-MM-DD
"blockchain_charts": "https://api.blockchain.info/charts/{chart}?timespan=1days&start={date}&format=json",
# COINGECKO - Uses DD-MM-YYYY format
"coingecko_history": "https://api.coingecko.com/api/v3/coins/{id}/history?date={date_dmy}",
# ALTERNATIVE.ME - Returns N days of history
"fng_history": "https://api.alternative.me/fng/?limit=1000&date_format=us", # Filter client-side
}
HISTORICAL_SUPPORT = {
# FULL HISTORY (58 indicators)
"full": [
# Binance derivatives
(1, "funding_btc", "8h", "Funding rate history via startTime/endTime"),
(2, "funding_eth", "8h", "ETH funding"),
(3, "oi_btc", "1h", "Open interest history via openInterestHist endpoint"),
(4, "oi_eth", "1h", "ETH OI"),
(5, "ls_btc", "1h", "Long/short ratio history"),
(6, "ls_eth", "1h", "ETH L/S"),
(7, "ls_top", "1h", "Top trader L/S"),
(8, "taker", "1h", "Taker ratio history"),
# Deribit
(11, "dvol_btc", "1h", "DVOL via get_volatility_index_data"),
(12, "dvol_eth", "1h", "ETH DVOL"),
(17, "fund_dbt_btc", "8h", "Deribit funding via get_funding_rate_history"),
(18, "fund_dbt_eth", "8h", "ETH Deribit funding"),
# CoinMetrics (ALL have full history)
(19, "rcap_btc", "1d", "CoinMetrics: CapRealUSD"),
(20, "mvrv", "1d", "CoinMetrics: derived from CapMrktCurUSD/CapRealUSD"),
(21, "nupl", "1d", "CoinMetrics: derived"),
(22, "addr_btc", "1d", "CoinMetrics: AdrActCnt"),
(23, "addr_eth", "1d", "CoinMetrics: ETH AdrActCnt"),
(24, "txcnt", "1d", "CoinMetrics: TxCnt"),
(25, "fees_btc", "1d", "CoinMetrics: FeeTotUSD"),
(26, "fees_eth", "1d", "CoinMetrics: ETH FeeTotUSD"),
(27, "nvt", "1d", "CoinMetrics: NVTAdj"),
(28, "velocity", "1d", "CoinMetrics: VelCur1yr"),
(29, "sply_act", "1d", "CoinMetrics: SplyAct1yr"),
(30, "rcap_eth", "1d", "CoinMetrics: ETH CapRealUSD"),
# Blockchain.info charts
(31, "hashrate", "1d", "Blockchain.info: hash-rate chart"),
(32, "difficulty", "1d", "Blockchain.info: difficulty chart"),
(35, "tx_blk", "1d", "Blockchain.info: n-transactions-per-block chart"),
(36, "total_btc", "1d", "Blockchain.info: total-bitcoins chart"),
(37, "mcap_bc", "1d", "Blockchain.info: market-cap chart"),
# DeFi Llama
(43, "tvl", "1d", "DeFi Llama: historicalChainTvl (returns all, filter client-side)"),
(44, "tvl_eth", "1d", "DeFi Llama: ETH TVL"),
(45, "stables", "1d", "DeFi Llama: stablecoincharts"),
(46, "usdt", "1d", "DeFi Llama: stablecoin ID=1"),
(47, "usdc", "1d", "DeFi Llama: stablecoin ID=2"),
# FRED (ALL have decades of history)
(52, "dxy", "1d", "FRED: DTWEXBGS"),
(53, "us10y", "1d", "FRED: DGS10"),
(54, "us2y", "1d", "FRED: DGS2"),
(55, "ycurve", "1d", "FRED: T10Y2Y"),
(56, "vix", "1d", "FRED: VIXCLS"),
(57, "fedfunds", "1d", "FRED: DFF"),
(58, "m2", "1w", "FRED: WM2NS (weekly)"),
(59, "cpi", "1m", "FRED: CPIAUCSL (monthly)"),
(60, "sp500", "1d", "FRED: SP500"),
(61, "gold", "1d", "FRED: GOLDAMGBD228NLBM"),
(62, "hy_spread", "1d", "FRED: BAMLH0A0HYM2"),
(63, "be5y", "1d", "FRED: T5YIE"),
(64, "nfci", "1w", "FRED: NFCI (weekly)"),
(65, "claims", "1w", "FRED: ICSA (weekly)"),
# Alternative.me
(66, "fng", "1d", "Alternative.me: limit param returns history"),
(67, "fng_prev", "1d", ""),
(68, "fng_week", "1d", ""),
(69, "fng_vol", "1d", ""),
(70, "fng_mom", "1d", ""),
(71, "fng_soc", "1d", ""),
(72, "fng_dom", "1d", ""),
# CoinGecko
(81, "btc_price", "1d", "CoinGecko: /coins/{id}/history"),
(82, "eth_price", "1d", "CoinGecko: /coins/{id}/history"),
# Binance klines
(78, "vol24", "1d", "Binance: klines endpoint"),
],
# PARTIAL HISTORY (4 indicators)
"partial": [
(48, "dex_vol", "1d", "DeFi Llama: recent history in response"),
(49, "bridge", "1d", "DeFi Llama: bridgevolume endpoint"),
(51, "fees", "1d", "DeFi Llama: fees overview"),
(83, "mcap", "1d", "CoinGecko: market_cap_chart (limited)"),
],
# CURRENT ONLY (23 indicators)
"current": [
(9, "basis", "Binance premium index - real-time only"),
(10, "liq_proxy", "Derived from 24hr ticker - real-time"),
(13, "pcr_vol", "Deribit options summary - real-time"),
(14, "pcr_oi", "Deribit options OI - real-time"),
(15, "pcr_eth", "Deribit ETH options - real-time"),
(16, "opt_oi", "Deribit total options OI - real-time"),
(33, "blk_int", "Blockchain.info simple query - real-time"),
(34, "unconf", "Blockchain.info unconfirmed - real-time"),
(38, "mp_cnt", "Mempool.space - NO historical API"),
(39, "mp_mb", "Mempool.space - NO historical API"),
(40, "fee_fast", "Mempool.space - NO historical API"),
(41, "fee_med", "Mempool.space - NO historical API"),
(42, "fee_slow", "Mempool.space - NO historical API"),
(50, "yields", "DeFi Llama yields - real-time"),
(73, "imbal_btc", "Order book depth - real-time"),
(74, "imbal_eth", "Order book depth - real-time"),
(75, "spread", "Book ticker - real-time"),
(76, "chg24_btc", "24hr ticker - real-time"),
(77, "chg24_eth", "24hr ticker - real-time"),
(79, "dispersion", "Calculated from 24hr - real-time"),
(80, "correlation", "Calculated from 24hr - real-time"),
(84, "btc_dom", "CoinGecko global - real-time"),
(85, "eth_dom", "CoinGecko global - real-time"),
],
}
BACKFILL_NOTES = """
BACKFILL STRATEGY
=================
1. DAILY BACKFILL (Most indicators):
- CoinMetrics, FRED, DeFi Llama TVL, Blockchain.info charts
- Use: efm.update(datetime(2024, 6, 15))
2. HOURLY BACKFILL (Binance derivatives):
- OI, L/S ratio, taker ratio have 1h resolution
- Funding rate has 8h resolution
3. APIS RETURNING FULL HISTORY:
- DeFi Llama TVL: Returns ALL history, filter client-side by timestamp
- Alternative.me F&G: Use limit=1000 to get ~3 years of history
- Blockchain.info charts: Use start= param with date
4. MISSING HISTORICAL DATA:
- Mempool fees: Build your own collector
- Order book imbalance: Build your own collector
- Spreads: Build your own collector
5. RECOMMENDED APPROACH FOR TRAINING:
a) Backfill what's available (58 indicators with FULL history)
b) For CURRENT-only indicators, either:
- Accept NaN/0 for historical periods
- Build collectors to capture going forward
- Use proxy indicators (e.g., volatility proxy for mempool fees)
"""
if __name__ == "__main__":
print("INDICATOR SOURCES v5.0")
print("=" * 60)
print("\nData Sources:")
for src, info in SOURCES.items():
print(f" {src:12s}: {info['auth']:10s} | {info['limit']:12s} | {info['history']}")
print(f"\nHistorical Support:")
print(f" FULL: {len(HISTORICAL_SUPPORT['full'])} indicators")
print(f" PARTIAL: {len(HISTORICAL_SUPPORT['partial'])} indicators")
print(f" CURRENT: {len(HISTORICAL_SUPPORT['current'])} indicators")
print(BACKFILL_NOTES)
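Point 3 of the backfill notes (APIs returning full history) implies a client-side filter: take the most recent entry at or before the target date, as parse_dl_tvl does in external_factors_matrix. A minimal sketch with synthetic DeFi-Llama-style entries:

```python
from datetime import datetime, timezone

def tvl_at(entries: list, target: datetime) -> float:
    """Pick the most recent TVL at or before `target` from a
    chronological list of {'date': unix_ts, 'tvl': value} dicts."""
    ts = int(target.timestamp())
    for e in reversed(entries):  # walk backwards from newest
        if e.get("date", 0) <= ts:
            return float(e.get("tvl", 0))
    return 0.0  # target predates the whole series

history = [
    {"date": 1718323200, "tvl": 95.0e9},  # 2024-06-14
    {"date": 1718409600, "tvl": 96.5e9},  # 2024-06-15
    {"date": 1718496000, "tvl": 94.0e9},  # 2024-06-16
]
```

Because the full series comes back in one response, one fetch can backfill every scan date in the lookback window without further API calls.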


@@ -0,0 +1,207 @@
"""
Meta-Adaptive ExF Optimizer
===========================
Runs nightly (or on-demand) to calculate dynamic lag configurations and
active indicator thresholds for the Adaptive Circuit Breaker (ACB).
Implementation of the "Meta-Adaptive" Blueprint:
1. Pulls up to the last 90 days of market returns and indicator values.
2. Runs lag hypothesis testing (0 to max_lags days; 0-6 with the default) on all tracked ExF indicators.
3. Uses strict Point-Biserial correlation (p < 0.05) against market stress (< -1% daily drop).
4. Persists the active, statistically verified JSON configuration for realtime_exf_service.py.
"""
import sys
import json
import time
import logging
import numpy as np
import pandas as pd
from pathlib import Path
from collections import defaultdict
import threading
from scipy import stats
from datetime import datetime, timezone
PROJECT_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(PROJECT_ROOT))
sys.path.insert(0, str(PROJECT_ROOT / 'nautilus_dolphin'))
try:
from realtime_exf_service import INDICATORS, OPTIMAL_LAGS
from dolphin_paper_trade_adaptive_cb_v2 import EIGENVALUES_BASE_PATH
from dolphin_vbt_real import load_all_data, run_full_backtest, STRATEGIES, INIT_CAPITAL
except ImportError:
pass
logger = logging.getLogger(__name__)
CONFIG_PATH = Path(__file__).parent / "meta_adaptive_config.json"
class MetaAdaptiveOptimizer:
def __init__(self, days_lookback=90, max_lags=6, p_value_gate=0.05):
self.days_lookback = days_lookback
self.max_lags = max_lags
self.p_value_gate = p_value_gate
self.indicators = list(INDICATORS.keys()) if 'INDICATORS' in globals() else []
self._lock = threading.Lock()
def _build_history_cache(self, dates, limit_days):
"""Build daily feature cache from NPZ files."""
logger.info(f"Building cache for last {limit_days} days...")
cache = {}
target_dates = dates[-limit_days:] if len(dates) > limit_days else dates
for date_str in target_dates:
date_path = EIGENVALUES_BASE_PATH / date_str
if not date_path.exists(): continue
npz_files = list(date_path.glob('scan_*__Indicators.npz'))
if not npz_files: continue
accum = defaultdict(list)
for f in npz_files:
try:
data = dict(np.load(f, allow_pickle=True))
names = [str(n) for n in data.get('api_names', [])]
vals = data.get('api_indicators', [])
succ = data.get('api_success', [])
for n, v, s in zip(names, vals, succ):
if s and not np.isnan(v):
accum[n].append(float(v))
except Exception:
pass
if accum:
cache[date_str] = {k: np.mean(v) for k, v in accum.items()}
return cache, target_dates
def _get_daily_returns(self, df, target_dates):
"""Derive daily returns proxy from the champion strategy logic."""
logger.info("Computing proxy returns for the time window...")
champion = STRATEGIES['champion_5x_f20']
returns = []
cap = INIT_CAPITAL
valid_dates = []
for d in target_dates:
day_df = df[df['date_str'] == d]
if len(day_df) < 200:
returns.append(np.nan)
valid_dates.append(d)
continue
res = run_full_backtest(day_df, champion, init_cash=cap, seed=42, verbose=False)
ret = (res['capital'] - cap) / cap
returns.append(ret)
cap = res['capital']
valid_dates.append(d)
return np.array(returns), valid_dates
def run_optimization(self) -> dict:
"""Run the full meta-adaptive optimization routine and return new config."""
with self._lock:
logger.info("Starting META-ADAPTIVE optimization loop.")
t0 = time.time()
df = load_all_data()
if 'date_str' not in df.columns:
df['date_str'] = df['timestamp'].dt.date.astype(str)
all_dates = sorted(df['date_str'].unique())
cache, target_dates = self._build_history_cache(all_dates, self.days_lookback + self.max_lags)
daily_returns, target_dates = self._get_daily_returns(df, target_dates)
        # Binary stress label: daily return dropping by more than 1%
stress_arr = (daily_returns < -0.01).astype(float)
candidate_lags = {}
active_thresholds = {}
candidate_count = 0
for key in self.indicators:
ind_arr = np.array([cache.get(d, {}).get(key, np.nan) for d in target_dates])
corrs = []; pvals = []; sc_corrs = []
for lag in range(self.max_lags + 1):
if lag == 0: x, y, y_stress = ind_arr, daily_returns, stress_arr
else: x, y, y_stress = ind_arr[:-lag], daily_returns[lag:], stress_arr[lag:]
mask = ~np.isnan(x) & ~np.isnan(y)
if mask.sum() < 20: # Need at least 20 viable days
corrs.append(0); pvals.append(1); sc_corrs.append(0)
continue
# Pearson to price returns
r, p = stats.pearsonr(x[mask], y[mask])
corrs.append(r); pvals.append(p)
# Point-Biserial to stress events
# We capture the relation to binary stress to figure out threshold direction
if y_stress[mask].sum() > 2: # At least a few stress days required
sc = stats.pointbiserialr(y_stress[mask], x[mask])[0]
else:
sc = 0
sc_corrs.append(sc)
if not corrs: continue
# Find lag with highest correlation strength
best_lag = int(np.argmax(np.abs(corrs)))
best_p = pvals[best_lag]
# Check gate
if best_p <= self.p_value_gate:
direction = ">" if sc_corrs[best_lag] > 0 else "<"
                    # Compute the stress threshold: 85th percentile for direction '>', else 15th
valid_vals = ind_arr[~np.isnan(ind_arr)]
thresh = np.percentile(valid_vals, 85 if direction == '>' else 15)
candidate_lags[key] = best_lag
active_thresholds[key] = {
'threshold': float(thresh),
'direction': direction,
'p_value': float(best_p),
'r_value': float(corrs[best_lag])
}
candidate_count += 1
            # TODO: fall back to the V4 baseline config if results drift too far from it
logger.info(f"Optimization complete ({time.time() - t0:.1f}s). {candidate_count} indicators passed P < {self.p_value_gate}.")
output_config = {
'timestamp': datetime.now(timezone.utc).isoformat(),
'days_lookback': self.days_lookback,
'lags': candidate_lags,
'thresholds': active_thresholds
}
# Atomic save
temp_path = CONFIG_PATH.with_suffix('.tmp')
with open(temp_path, 'w', encoding='utf-8') as f:
json.dump(output_config, f, indent=2)
temp_path.replace(CONFIG_PATH)
return output_config
def get_current_meta_config() -> dict:
"""Read the latest meta-adaptive config, or return empty/default dict."""
if not CONFIG_PATH.exists():
return {}
try:
with open(CONFIG_PATH, 'r', encoding='utf-8') as f:
return json.load(f)
except Exception as e:
logger.error(f"Failed to read meta-adaptive config: {e}")
return {}
if __name__ == '__main__':
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
optimizer = MetaAdaptiveOptimizer(days_lookback=90)
config = optimizer.run_optimization()
print(f"\nSaved config to: {CONFIG_PATH}")
for k, v in config['lags'].items():
print(f" {k}: lag={v} days, dir={config['thresholds'][k]['direction']} thresh={config['thresholds'][k]['threshold']:.4g}")


@@ -0,0 +1,351 @@
import asyncio
import aiohttp
import json
import time
import logging
import numpy as np
from typing import Dict, List, Optional
from collections import defaultdict
# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(name)s: %(message)s')
logger = logging.getLogger("OBStreamService")
try:
import websockets
except ImportError:
logger.warning("websockets package not found. Run pip install websockets aiohttp")
# Reconnect back-off constants (P1-2)
_RECONNECT_BASE_S = 3.0
_RECONNECT_MAX_S = 120.0
_RECONNECT_STABLE_S = 60.0 # reset back-off if connected this long without error
# Stall detection (P0-2): warn if no WS message for this many seconds
_STALE_THRESHOLD_S = 30.0
class OBStreamService:
"""
Real-Time Order Book Streamer for Binance Futures.
Fixes applied:
P0-2 last_event_ts for WS stall detection (is_stale())
P0-3 Crossed-book guard in get_depth_buckets()
P1-2 Exponential back-off on reconnect (max 120s, jitter)
P1-5 Shared aiohttp.ClientSession for REST calls (no new session per call)
P2-1 Unknown asset symbol in WS event ignored, no KeyError
"""
def __init__(self, assets: List[str], max_depth_pct: int = 5):
self.assets = [a.upper() for a in assets]
self.streams = [f"{a.lower()}@depth@100ms" for a in self.assets]
self.max_depth_pct = max_depth_pct
# In-memory Order Book caches (Price -> Quantity)
self.bids: Dict[str, Dict[float, float]] = {a: {} for a in self.assets}
self.asks: Dict[str, Dict[float, float]] = {a: {} for a in self.assets}
# Synchronization mechanisms
self.last_update_id: Dict[str, int] = {a: 0 for a in self.assets}
self.buffer: Dict[str, List] = {a: [] for a in self.assets}
self.initialized: Dict[str, bool] = {a: False for a in self.assets}
# Per-asset asyncio lock (P2-1: keyed only on known assets)
self.locks: Dict[str, asyncio.Lock] = {a: asyncio.Lock() for a in self.assets}
# P0-2: WS stall detection — updated on every received message
self.last_event_ts: float = 0.0
# P1-5: shared session — created lazily in stream(), closed on exit
self._session: Optional[aiohttp.ClientSession] = None
# Gold Path: Rate Limit Monitoring (AGENT-SPEC-GOLDPATH)
self.rate_limits: Dict[str, str] = {}
# ------------------------------------------------------------------
# P0-2: stale check callable from AsyncOBThread
# ------------------------------------------------------------------
def is_stale(self, threshold_s: float = _STALE_THRESHOLD_S) -> bool:
"""Return True if no WS event has been received for threshold_s seconds."""
if self.last_event_ts == 0.0:
return False # hasn't started yet — not stale, just cold
return (time.time() - self.last_event_ts) > threshold_s
# ------------------------------------------------------------------
# REST snapshot (P1-5: reuses shared session)
# ------------------------------------------------------------------
async def fetch_snapshot(self, asset: str):
"""Fetch REST snapshot of the Order Book to initialize local state."""
url = f"https://fapi.binance.com/fapi/v1/depth?symbol={asset}&limit=1000"
try:
session = self._session
if session is None or session.closed:
# Fallback: create a temporary session if shared one not ready
async with aiohttp.ClientSession() as tmp_session:
await self._do_fetch(tmp_session, asset, url)
return
await self._do_fetch(session, asset, url)
except Exception as e:
logger.error(f"Error initializing snapshot for {asset}: {e}")
async def _do_fetch(self, session: aiohttp.ClientSession, asset: str, url: str):
async with session.get(url) as resp:
# Capture Rate Limits (Gold Spec)
for k, v in resp.headers.items():
if k.lower().startswith("x-mbx-used-weight-"):
self.rate_limits[k] = v
data = await resp.json()
if 'lastUpdateId' not in data:
logger.error(f"Failed to fetch snapshot for {asset}: {data}")
return
last_id = data['lastUpdateId']
async with self.locks[asset]:
self.bids[asset] = {float(p): float(q) for p, q in data['bids']}
self.asks[asset] = {float(p): float(q) for p, q in data['asks']}
self.last_update_id[asset] = last_id
# Apply any buffered updates that arrived while REST was in flight
for event in self.buffer[asset]:
if event['u'] <= last_id:
continue # already reflected in snapshot
self._apply_event(asset, event)
self.buffer[asset].clear()
self.initialized[asset] = True
logger.info(f"Synchronized L2 book for {asset} (UpdateId: {last_id})")
# ------------------------------------------------------------------
# Book maintenance
# ------------------------------------------------------------------
def _apply_event(self, asset: str, event: dict):
"""Apply a streaming diff event to the local book."""
bids = self.bids[asset]
asks = self.asks[asset]
for p_str, q_str in event['b']:
p, q = float(p_str), float(q_str)
if q == 0.0:
bids.pop(p, None)
else:
bids[p] = q
for p_str, q_str in event['a']:
p, q = float(p_str), float(q_str)
if q == 0.0:
asks.pop(p, None)
else:
asks[p] = q
self.last_update_id[asset] = event['u']
# ------------------------------------------------------------------
# Main WS loop (P1-2: exp backoff; P1-5: shared session; P2-1: unknown symbol guard)
# ------------------------------------------------------------------
async def stream(self):
"""Main loop: connect to WebSocket streams and maintain books."""
import websockets
stream_url = (
"wss://fstream.binance.com/stream?streams=" + "/".join(self.streams)
)
logger.info(f"Connecting to Binance Stream: {stream_url}")
reconnect_delay = _RECONNECT_BASE_S
import random
# P1-5: create shared session for lifetime of stream()
async with aiohttp.ClientSession() as session:
self._session = session
# Fire REST snapshots concurrently using shared session
for a in self.assets:
asyncio.create_task(self.fetch_snapshot(a))
while True:
connect_start = time.monotonic()
try:
async with websockets.connect(
stream_url, ping_interval=20, ping_timeout=20
) as ws:
logger.info("WebSocket connected. Streaming depth diffs...")
while True:
msg = await ws.recv()
# P0-2: stamp every received message
self.last_event_ts = time.time()
data = json.loads(msg)
if 'data' not in data:
continue
ev = data['data']
# P2-1: ignore events for unknown symbols — no KeyError
asset = ev.get('s', '').upper()
if asset not in self.locks:
logger.debug("Ignoring WS event for untracked symbol: %s", asset)
continue
async with self.locks[asset]:
if not self.initialized[asset]:
self.buffer[asset].append(ev)
else:
self._apply_event(asset, ev)
# If we reach here the connection lasted stably:
# reset back-off on a clean exit (never normally reached)
reconnect_delay = _RECONNECT_BASE_S
except websockets.exceptions.ConnectionClosed as e:
connected_s = time.monotonic() - connect_start
logger.warning(
"WebSocket closed (%s). Connected %.1fs. Reconnecting in %.1fs...",
e, connected_s, reconnect_delay,
)
# P1-2: reset back-off if connection was stable long enough
if connected_s >= _RECONNECT_STABLE_S:
reconnect_delay = _RECONNECT_BASE_S
# Re-init all assets, re-fire REST snapshots
for a in self.assets:
self.initialized[a] = False
asyncio.create_task(self.fetch_snapshot(a))
await asyncio.sleep(reconnect_delay + random.uniform(0, 1))
# P1-2: double delay for next failure, cap at max
reconnect_delay = min(reconnect_delay * 2, _RECONNECT_MAX_S)
except Exception as e:
logger.error("Stream error: %s", e)
await asyncio.sleep(reconnect_delay + random.uniform(0, 1))
reconnect_delay = min(reconnect_delay * 2, _RECONNECT_MAX_S)
self._session = None # stream() exited cleanly
# ------------------------------------------------------------------
# Depth bucket extraction (P0-3: crossed book guard)
# ------------------------------------------------------------------
async def get_depth_buckets(self, asset: str) -> Optional[dict]:
"""
Extract the Notional Depth vectors matching OBSnapshot.
Creates 5 elements summing USD depth between 0-1%, 1-2%, ..., 4-5% from mid.
Returns None if:
- Book not yet initialized
- Empty bids or asks
- Crossed book (best_bid >= best_ask) ← P0-3
"""
async with self.locks[asset]:
if not self.initialized[asset]:
return None
bids = sorted(self.bids[asset].items(), key=lambda x: -x[0])
asks = sorted(self.asks[asset].items(), key=lambda x: x[0])
if not bids or not asks:
return None
best_bid = bids[0][0]
best_ask = asks[0][0]
# P0-3: crossed book produces corrupted features — reject entirely
if best_bid >= best_ask:
logger.warning(
"Crossed book for %s (bid=%.5f >= ask=%.5f) — skipping snapshot",
asset, best_bid, best_ask,
)
return None
mid = (best_bid + best_ask) / 2.0
bid_not = np.zeros(self.max_depth_pct, dtype=np.float64)
ask_not = np.zeros(self.max_depth_pct, dtype=np.float64)
bid_dep = np.zeros(self.max_depth_pct, dtype=np.float64)
ask_dep = np.zeros(self.max_depth_pct, dtype=np.float64)
for p, q in bids:
dist_pct = (mid - p) / mid * 100
idx = int(dist_pct)
if idx < self.max_depth_pct:
bid_not[idx] += p * q
bid_dep[idx] += q
else:
break
for p, q in asks:
dist_pct = (p - mid) / mid * 100
idx = int(dist_pct)
if idx < self.max_depth_pct:
ask_not[idx] += p * q
ask_dep[idx] += q
else:
break
return {
"timestamp": time.time(),
"asset": asset,
"bid_notional": bid_not,
"ask_notional": ask_not,
"bid_depth": bid_dep,
"ask_depth": ask_dep,
"best_bid": best_bid,
"best_ask": best_ask,
"spread_bps": (best_ask - best_bid) / mid * 10_000,
}
# -----------------------------------------------------------------------------
# Standalone run/test hook
# -----------------------------------------------------------------------------
async def main():
    assets_to_track = ["BTCUSDT", "ETHUSDT", "SOLUSDT"]
    service = OBStreamService(assets=assets_to_track)
    asyncio.create_task(service.stream())
    await asyncio.sleep(4)
    try:
        # Imported here so the module stays importable without the optional dependency
        import hazelcast
        hz_client = hazelcast.HazelcastClient(
            cluster_name="dolphin",
            cluster_members=["localhost:5701"]
        )
hz_map = hz_client.get_map('DOLPHIN_FEATURES').blocking()
logger.info("Connected to Hazelcast DOLPHIN_FEATURES map.")
except Exception as e:
logger.error(f"Hazelcast connection failed: {e}")
return
while True:
try:
for asset in assets_to_track:
snap = await service.get_depth_buckets(asset)
if snap:
hz_payload = {
"timestamp": snap["timestamp"],
"asset": snap["asset"],
"bid_notional": list(snap["bid_notional"]),
"ask_notional": list(snap["ask_notional"]),
"bid_depth": list(snap["bid_depth"]),
"ask_depth": list(snap["ask_depth"]),
"best_bid": snap["best_bid"],
"best_ask": snap["best_ask"],
"spread_bps": snap["spread_bps"],
}
hz_map.put(f"asset_{asset}_ob", json.dumps(hz_payload))
await asyncio.sleep(0.5)
except Exception as e:
logger.error(f"Error in stream writing loop: {e}")
await asyncio.sleep(5)
if __name__ == "__main__":
try:
asyncio.run(main())
except KeyboardInterrupt:
print("OB Streamer shut down manually.")

File diff suppressed because it is too large


@@ -0,0 +1,321 @@
#!/usr/bin/env python3
"""
REAL-TIME EXTERNAL FACTORS SERVICE - EVENT-DRIVEN HZ OPTIMIZATION
=================================================================
Extension to RealTimeExFService that pushes to Hazelcast immediately
when critical indicators change, rather than on a fixed timer.
This achieves near-zero latency (<10ms) for critical indicators:
- basis
- spread
- imbal_btc
- imbal_eth
For slow indicators (funding, etc.), we still batch them.
Author: Kimi, DESTINATION/DOLPHIN Machine dev/prod-Agent
Date: 2026-03-20
"""
import sys
import json
import time
import logging
import threading
from pathlib import Path
from datetime import datetime, timezone
from typing import Dict, Any, Optional, Callable
from dataclasses import dataclass
# Add paths
sys.path.insert(0, str(Path(__file__).parent))
sys.path.insert(0, str(Path(__file__).parent.parent))
from realtime_exf_service import RealTimeExFService, INDICATORS
logger = logging.getLogger("exf_hz_events")
_LOG_DEBUG = logger.isEnabledFor(logging.DEBUG)
_LOG_INFO = logger.isEnabledFor(logging.INFO)
# Critical indicators that trigger immediate HZ push
CRITICAL_INDICATORS = frozenset(['basis', 'spread', 'imbal_btc', 'imbal_eth'])
# Min time between HZ pushes (prevent spam)
MIN_PUSH_INTERVAL_S = 0.05 # 50ms = 20 pushes/sec max
@dataclass
class EventDrivenConfig:
"""Configuration for event-driven HZ optimization."""
hz_cluster: str = "dolphin"
hz_member: str = "localhost:5701"
hz_map: str = "DOLPHIN_FEATURES"
hz_key: str = "exf_latest"
# Push strategy
critical_push_interval_s: float = 0.05 # 50ms min between critical pushes
batch_push_interval_s: float = 1.0 # 1s for full batch updates
# Debouncing
value_change_threshold: float = 0.0001 # Ignore tiny changes
class EventDrivenExFService:
"""
Event-driven wrapper around RealTimeExFService.
Instead of polling get_indicators() on a fixed interval,
we push to Hazelcast immediately when critical indicators change.
"""
    def __init__(self, base_service: RealTimeExFService, config: Optional[EventDrivenConfig] = None):
self.service = base_service
self.config = config or EventDrivenConfig()
# Hazelcast client (initialized on first use)
self._hz_client = None
self._hz_lock = threading.Lock()
# Push tracking
self._last_critical_push = 0.0
self._last_full_push = 0.0
self._push_count = 0
self._critical_push_count = 0
# Previous values for change detection
self._prev_values: Dict[str, float] = {}
# Background thread for full batch updates
self._running = False
self._thread = None
def _get_hz_client(self):
"""Lazy initialization of Hazelcast client."""
if self._hz_client is None:
import hazelcast
self._hz_client = hazelcast.HazelcastClient(
cluster_name=self.config.hz_cluster,
cluster_members=[self.config.hz_member],
connection_timeout=5.0,
)
return self._hz_client
def _push_to_hz(self, indicators: Dict[str, Any], is_critical: bool = False):
"""Push indicators to Hazelcast."""
try:
client = self._get_hz_client()
# Build payload
payload = {
**indicators,
"_pushed_at": datetime.now(timezone.utc).isoformat(),
"_push_type": "critical" if is_critical else "batch",
"_push_seq": self._push_count,
}
# Push
features_map = client.get_map(self.config.hz_map)
features_map.blocking().put(
self.config.hz_key,
json.dumps(payload, default=str)
)
self._push_count += 1
if is_critical:
self._critical_push_count += 1
if _LOG_DEBUG:
logger.debug("HZ push: %s (%d indicators)",
"CRITICAL" if is_critical else "batch",
len(indicators))
return True
except Exception as e:
if _LOG_INFO:
logger.warning("HZ push failed: %s", e)
return False
def _check_critical_changes(self) -> Optional[Dict[str, Any]]:
"""
Check if critical indicators changed significantly.
Returns changed indicators dict or None if no significant change.
"""
now = time.monotonic()
# Rate limiting
if now - self._last_critical_push < self.config.critical_push_interval_s:
return None
with self.service._lock:
changed = {}
for name in CRITICAL_INDICATORS:
if name not in self.service.state:
continue
state = self.service.state[name]
if not state.success:
continue
current_val = state.value
prev_val = self._prev_values.get(name)
# First time or significant change
if prev_val is None:
changed[name] = current_val
self._prev_values[name] = current_val
elif abs(current_val - prev_val) > self.config.value_change_threshold:
changed[name] = current_val
self._prev_values[name] = current_val
if changed:
self._last_critical_push = now
return changed
return None
def _get_full_snapshot(self) -> Dict[str, Any]:
"""Get full indicator snapshot."""
indicators = self.service.get_indicators(dual_sample=True)
staleness = indicators.pop("_staleness", {})
# Build payload like exf_fetcher_flow
payload = {k: v for k, v in indicators.items() if isinstance(v, (int, float))}
payload["_staleness_s"] = {k: round(v, 1) for k, v in staleness.items()}
payload["_acb_ready"] = all(k in payload for k in [
"funding_btc", "funding_eth", "dvol_btc", "dvol_eth",
"fng", "vix", "ls_btc", "taker"
])
payload["_ok_count"] = sum(1 for v in payload.values()
if isinstance(v, float) and v == v)
return payload
def _event_loop(self):
"""Main event loop - checks for changes and pushes to HZ."""
if _LOG_INFO:
logger.info("Event-driven loop started (critical: %s)",
", ".join(CRITICAL_INDICATORS))
while self._running:
t0 = time.monotonic()
# Check for critical changes first
critical_changes = self._check_critical_changes()
if critical_changes:
# Get full snapshot but prioritize critical changes
full_data = self._get_full_snapshot()
self._push_to_hz(full_data, is_critical=True)
# Periodic full batch update
now = time.monotonic()
if now - self._last_full_push >= self.config.batch_push_interval_s:
full_data = self._get_full_snapshot()
self._push_to_hz(full_data, is_critical=False)
self._last_full_push = now
if _LOG_INFO and self._push_count % 10 == 1:
logger.info("Batch push #%d (critical: %d)",
self._push_count, self._critical_push_count)
# Sleep briefly to prevent CPU spinning
# But wake up quickly to catch changes
elapsed = time.monotonic() - t0
sleep_time = max(0.01, 0.05 - elapsed) # 10-50ms sleep
time.sleep(sleep_time)
if _LOG_INFO:
logger.info("Event-driven loop stopped")
def start(self):
"""Start the event-driven service."""
self.service.start()
self._running = True
self._thread = threading.Thread(target=self._event_loop, daemon=True)
self._thread.start()
if _LOG_INFO:
logger.info("Event-driven ExF service started")
def stop(self):
"""Stop the service."""
self._running = False
if self._thread:
self._thread.join(timeout=5.0)
self.service.stop()
if self._hz_client:
try:
self._hz_client.shutdown()
except Exception:
pass
if _LOG_INFO:
logger.info("Event-driven ExF service stopped (%d pushes, %d critical)",
self._push_count, self._critical_push_count)
def get_status(self) -> Dict[str, Any]:
"""Get service status."""
return {
'running': self._running,
'push_count': self._push_count,
'critical_push_count': self._critical_push_count,
'last_critical_push': self._last_critical_push,
'hz_connected': self._hz_client is not None,
}
# =============================================================================
# DROP-IN REPLACEMENT FLOW
# =============================================================================
def run_event_driven_flow(warmup_s: int = 10):
"""
Run event-driven ExF flow (drop-in replacement for exf_fetcher_flow).
This pushes to Hazelcast immediately when critical indicators change,
rather than on a fixed 0.5s interval.
"""
logging.basicConfig(level=logging.INFO)
print("=" * 60)
print("Event-Driven ExF Flow")
print("=" * 60)
# Create base service
base_service = RealTimeExFService()
# Wrap with event-driven layer
event_service = EventDrivenExFService(base_service)
# Start
event_service.start()
print(f"Service started — warmup {warmup_s}s...")
time.sleep(warmup_s)
print(f"Running event-driven (critical: {CRITICAL_INDICATORS})")
print("Push on change vs fixed interval")
try:
while True:
time.sleep(10)
status = event_service.get_status()
print(f"Pushes: {status['push_count']} ({status['critical_push_count']} critical)")
except KeyboardInterrupt:
print("\nStopping...")
finally:
event_service.stop()
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--warmup", type=int, default=10)
args = parser.parse_args()
run_event_driven_flow(warmup_s=args.warmup)
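The value_change_threshold debounce reduces to a small state machine: a push fires only when a value moves by more than the threshold since the last value that triggered a push. A minimal sketch (class name and sample values are illustrative):

```python
class ChangeDetector:
    """Minimal sketch of the value-change debounce used above."""

    def __init__(self, threshold: float = 0.0001):
        self.threshold = threshold
        self._prev = {}

    def changed(self, name: str, value: float) -> bool:
        prev = self._prev.get(name)
        if prev is None or abs(value - prev) > self.threshold:
            self._prev[name] = value  # only update on a triggering change
            return True
        return False

det = ChangeDetector(threshold=0.0001)
events = [det.changed("basis", v) for v in (1.0000, 1.00005, 1.0002, 1.0002)]
```

Note the design choice shared with `_check_critical_changes`: the stored previous value advances only on a triggering change, so a slow drift of many sub-threshold steps still eventually fires once the cumulative move exceeds the threshold.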


@@ -0,0 +1,546 @@
"""
test_deribit_api_parity.py
==========================
Validates that the current Deribit API call format + parser returns values
that match pre-existing ExtF data stored in the NG3 eigenvalue NPZ cache.
BACKGROUND
----------
The Deribit API changed (date unknown) and an agent added start_timestamp /
end_timestamp parameters to make requests work again. The user asked for
explicit parity validation BEFORE locking in those params.
WHAT THIS TEST DOES
-------------------
For each "known" date that has ground-truth data in the NPZ cache:
1. Load the stored value (ground truth, pre-API-change)
2. Re-query Deribit with several candidate endpoint+param combinations
3. Compare each result to ground truth (absolute + relative tolerance)
4. PASS / FAIL per candidate, per indicator
The candidate that produces the best parity across all known dates is the
one that should be locked in as the production Deribit URL scheme.
INDICATORS COVERED (ACBv6 minimum requirement)
----------------------------------------------
fund_dbt_btc — BTC-PERPETUAL 8h funding rate ← ACBv6 primary signal
fund_dbt_eth — ETH-PERPETUAL 8h funding rate
dvol_btc — BTC Deribit volatility index (hourly close)
dvol_eth — ETH Deribit volatility index (hourly close)
USAGE
-----
python external_factors/test_deribit_api_parity.py
# Quick run — funding only (fastest, most critical for ACBv6):
python external_factors/test_deribit_api_parity.py --indicators fund
# Verbose — show raw responses:
python external_factors/test_deribit_api_parity.py --verbose
INTERPRETING RESULTS
--------------------
LOCKED IN: All parity checks PASS → endpoint confirmed
MISMATCH: Values differ > tolerance → endpoint is wrong / format changed
SKIP: No NPZ data for that date (not a failure)
TOLERANCES
----------
fund_dbt_btc / fund_dbt_eth : abs ≤ 1e-7 (funding rates are tiny)
dvol_btc / dvol_eth : abs ≤ 0.5 (DVOL in vol-points)
"""
import asyncio
import aiohttp
import argparse
import json
import sys
import time
import traceback
from datetime import datetime, timezone
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
from pathlib import Path
from typing import Optional
import numpy as np
# ---------------------------------------------------------------------------
# Paths
# ---------------------------------------------------------------------------
_HERE = Path(__file__).resolve().parent
_EIGENVALUES_PATH = Path(r"C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues")
# Known dates with confirmed NPZ data in the gold window (2025-12-31→2026-02-26).
# Add more as the cache grows. Values were stored by the NG5 scanner.
KNOWN_DATES = [
"2026-01-02",
"2026-01-03",
"2026-01-04",
"2026-01-05",
"2026-01-06",
"2026-01-07",
"2026-01-08",
"2026-01-21",
]
# ---------------------------------------------------------------------------
# Tolerances (per indicator)
# ---------------------------------------------------------------------------
TOLERANCES = {
"fund_dbt_btc": 1e-7,
"fund_dbt_eth": 1e-7,
"dvol_btc": 0.5,
"dvol_eth": 0.5,
}
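Given the table above, the per-indicator parity decision is an absolute-difference gate. `within_tolerance` below is a sketch of that check (the full script may also apply relative tolerances, as the header suggests):

```python
def within_tolerance(indicator: str, got: float, expected: float,
                     tolerances: dict) -> bool:
    """Absolute-tolerance parity check keyed on the TOLERANCES table."""
    tol = tolerances.get(indicator)
    if tol is None:
        raise KeyError(f"no tolerance registered for {indicator}")
    return abs(got - expected) <= tol

# Illustrative subset of the table, with made-up sample values
TOLS = {"fund_dbt_btc": 1e-7, "dvol_btc": 0.5}
ok_fund = within_tolerance("fund_dbt_btc", 1.0e-5, 1.0e-5, TOLS)   # exact match
bad_fund = within_tolerance("fund_dbt_btc", 2.0e-5, 1.0e-5, TOLS)  # off by 1e-5
ok_dvol = within_tolerance("dvol_btc", 61.2, 61.5, TOLS)           # within 0.5 vol-pts
```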
# ---------------------------------------------------------------------------
# Ground-truth loader
# ---------------------------------------------------------------------------
def load_npz_ground_truth(date_str: str) -> Optional[dict]:
"""
Load Deribit indicator values stored in an NG3 scan NPZ for *date_str*.
Returns dict {indicator_name: float} or None if no data.
"""
date_path = _EIGENVALUES_PATH / date_str
if not date_path.exists():
return None
files = sorted(date_path.glob("scan_*__Indicators.npz"))
if not files:
return None
d = np.load(files[0], allow_pickle=True)
if "api_names" not in d or "api_indicators" not in d:
return None
names = list(d["api_names"])
vals = d["api_indicators"]
succ = d["api_success"] if "api_success" in d else np.ones(len(names), dtype=bool)
result = {}
for i, n in enumerate(names):
if succ[i]:
target_names = {"fund_dbt_btc", "fund_dbt_eth", "dvol_btc", "dvol_eth"}
if n in target_names:
result[n] = float(vals[i])
return result if result else None
# ---------------------------------------------------------------------------
# Endpoint candidates
# ---------------------------------------------------------------------------
def _day_epoch_ms(date_str: str, hour: int = 0) -> int:
"""Return Unix milliseconds for a given date + hour (UTC)."""
dt = datetime(int(date_str[:4]), int(date_str[5:7]), int(date_str[8:10]),
hour, 0, 0, tzinfo=timezone.utc)
return int(dt.timestamp() * 1000)
def ts23utc(date_str: str) -> int:
"""Return Unix ms for 23:00 UTC on date_str — canonical NG5 scanner capture time."""
return _day_epoch_ms(date_str, hour=23)
def build_candidate_urls(date_str: str) -> dict:
"""
Build all candidate URL variants for a historical date.
Returns dict: { candidate_label: {indicator: url, ...} }
"""
day_start = _day_epoch_ms(date_str, hour=0)
next_start = day_start + 86400_000 # +24h
ts23 = ts23utc(date_str) # 23:00 UTC — canonical NG5 capture time
# Funding: NG5 scanner confirmed to run at 23:00 UTC.
# We use get_funding_rate_history (full day) then extract the 23:00 UTC entry.
# Candidate variants test different windows and parsers.
fund_urls = {
# CANDIDATE A (EXPECTED CORRECT): get_funding_rate_history full day → 23:00 UTC entry
"A_history_23utc": {
"fund_dbt_btc": (
f"https://www.deribit.com/api/v2/public/get_funding_rate_history"
f"?instrument_name=BTC-PERPETUAL"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_history_at_23utc",
ts23,
),
"fund_dbt_eth": (
f"https://www.deribit.com/api/v2/public/get_funding_rate_history"
f"?instrument_name=ETH-PERPETUAL"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_history_at_23utc",
ts23,
),
},
# CANDIDATE B (AGENT FIX — expected wrong): get_funding_rate_value over full day
"B_value_fullday_agentfix": {
"fund_dbt_btc": (
f"https://www.deribit.com/api/v2/public/get_funding_rate_value"
f"?instrument_name=BTC-PERPETUAL"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_scalar_result",
0,
),
"fund_dbt_eth": (
f"https://www.deribit.com/api/v2/public/get_funding_rate_value"
f"?instrument_name=ETH-PERPETUAL"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_scalar_result",
0,
),
},
# CANDIDATE C: get_funding_rate_history narrow window (±2h around 23:00) → last entry
"C_history_narrow23": {
"fund_dbt_btc": (
f"https://www.deribit.com/api/v2/public/get_funding_rate_history"
f"?instrument_name=BTC-PERPETUAL"
f"&start_timestamp={ts23 - 7200_000}&end_timestamp={ts23 + 3600_000}",
"parse_history_at_23utc",
ts23,
),
"fund_dbt_eth": (
f"https://www.deribit.com/api/v2/public/get_funding_rate_history"
f"?instrument_name=ETH-PERPETUAL"
f"&start_timestamp={ts23 - 7200_000}&end_timestamp={ts23 + 3600_000}",
"parse_history_at_23utc",
ts23,
),
},
}
# DVOL: hourly resolution; scanner at 23:00 UTC → take candle closest to 23:00
dvol_urls = {
# CANDIDATE D: get_volatility_index_data, 1h resolution, full day
"D_dvol_1h_fullday": {
"dvol_btc": (
f"https://www.deribit.com/api/v2/public/get_volatility_index_data"
f"?currency=BTC&resolution=3600"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_dvol_at_23utc",
ts23,
),
"dvol_eth": (
f"https://www.deribit.com/api/v2/public/get_volatility_index_data"
f"?currency=ETH&resolution=3600"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_dvol_at_23utc",
ts23,
),
},
# CANDIDATE E: agent's variant — 60-min resolution + count=10
"E_dvol_60min_count10": {
"dvol_btc": (
f"https://www.deribit.com/api/v2/public/get_volatility_index_data"
f"?currency=BTC&resolution=60&count=10"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_dvol_last",
0,
),
"dvol_eth": (
f"https://www.deribit.com/api/v2/public/get_volatility_index_data"
f"?currency=ETH&resolution=60&count=10"
f"&start_timestamp={day_start}&end_timestamp={next_start}",
"parse_dvol_last",
0,
),
},
}
# Merge
all_candidates = {}
all_candidates.update(fund_urls)
all_candidates.update(dvol_urls)
return all_candidates
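The returned mapping nests candidate → indicator → 3-tuple. A minimal sketch of how a caller unpacks that shape (synthetic timestamps and a placeholder URL, not real Deribit endpoints):

```python
# Illustrative sketch only — synthetic epoch values and a placeholder URL,
# mirroring the candidate → indicator → (url, parser_name, target_ts_ms) shape.
day_start = 1_767_312_000_000          # hypothetical 00:00 UTC in epoch ms
ts23 = day_start + 23 * 3_600_000      # 23:00 UTC capture time
candidates = {
    "A_history_23utc": {
        "fund_dbt_btc": ("https://example.invalid/funding", "parse_history_at_23utc", ts23),
    },
}
for cand_label, indicator_urls in candidates.items():
    for ind_name, (url, parser_name, target_ts) in indicator_urls.items():
        # Each leaf unpacks to the same 3-tuple consumed by run_parity_check().
        assert target_ts - day_start == 23 * 3_600_000
```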
# ---------------------------------------------------------------------------
# Parsers
# ---------------------------------------------------------------------------
def parse_history_at_23utc(d: dict, target_ts_ms: int = 0) -> Optional[float]:
"""
Parse get_funding_rate_history response.
Returns interest_8h from the entry CLOSEST to 23:00 UTC on the target date.
The NG5 scanner runs at 23:00 UTC daily — this is the canonical capture time.
Falls back to last entry if 23:00 UTC entry not found (e.g. live/realtime call).
"""
if not isinstance(d, dict) or "result" not in d:
return None
r = d["result"]
if not isinstance(r, list) or not r:
return None
try:
r_sorted = sorted(r, key=lambda x: x.get("timestamp", 0))
if target_ts_ms > 0:
# Find entry closest to 23:00 UTC for the target date
best = min(r_sorted, key=lambda x: abs(x.get("timestamp", 0) - target_ts_ms))
else:
# Live call: take last entry (most recent)
best = r_sorted[-1]
return float(best.get("interest_8h", 0))
except (TypeError, KeyError, ValueError):
return None
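Selecting the 23:00 UTC entry reduces to a min-by-absolute-distance over timestamps. A standalone sketch with a synthetic `get_funding_rate_history`-shaped payload (all values made up):

```python
# Synthetic response shaped like get_funding_rate_history (made-up values).
resp = {
    "result": [
        {"timestamp": 1_000, "interest_8h": 0.0001},
        {"timestamp": 2_000, "interest_8h": 0.0003},
        {"timestamp": 3_000, "interest_8h": 0.0002},
    ]
}
target_ts_ms = 2_100  # stands in for the 23:00 UTC epoch
rows = sorted(resp["result"], key=lambda x: x.get("timestamp", 0))
# Pick the entry whose timestamp is closest to the target capture time.
best = min(rows, key=lambda x: abs(x.get("timestamp", 0) - target_ts_ms))
value = float(best.get("interest_8h", 0))  # entry at ts=2000 is closest
```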
def parse_scalar_result(d: dict) -> Optional[float]:
"""Parse get_funding_rate_value response — result is a scalar."""
if not isinstance(d, dict) or "result" not in d:
return None
r = d["result"]
if isinstance(r, list) and r:
# Fallback: if API returned list anyway, take last interest_8h
try:
return float(sorted(r, key=lambda x: x.get("timestamp", 0))[-1].get("interest_8h", 0))
except (TypeError, KeyError, ValueError):
return None
try:
return float(r)
except (TypeError, ValueError):
return None
def parse_dvol_last(d: dict, target_ts_ms: int = 0) -> Optional[float]:
"""Parse get_volatility_index_data — returns close from entry closest to target_ts_ms (or last)."""
if not isinstance(d, dict) or "result" not in d:
return None
r = d["result"]
if not isinstance(r, dict) or "data" not in r:
return None
data = r["data"]
if not data:
return None
# data row format: [timestamp_ms, open, high, low, close]
try:
rows = sorted(data, key=lambda x: x[0])
if target_ts_ms > 0:
best = min(rows, key=lambda x: abs(x[0] - target_ts_ms))
else:
best = rows[-1]
return float(best[4]) if len(best) > 4 else float(best[-1])
except (TypeError, IndexError, ValueError):
return None
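The DVOL row format is positional (`[ts, open, high, low, close]`), so the closest-candle selection keys on index 0 and reads index 4. A sketch on synthetic candles:

```python
# Synthetic DVOL candles: [timestamp_ms, open, high, low, close] (made-up values).
data = [
    [1_000, 50.0, 51.0, 49.0, 50.5],
    [2_000, 50.5, 52.0, 50.0, 51.2],
    [3_000, 51.2, 51.5, 50.8, 51.0],
]
target_ts_ms = 1_900
rows = sorted(data, key=lambda x: x[0])
# Closest candle to the target timestamp; close sits at index 4.
best = min(rows, key=lambda x: abs(x[0] - target_ts_ms))
close = float(best[4]) if len(best) > 4 else float(best[-1])
```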
def parse_dvol_at_23utc(d: dict, target_ts_ms: int = 0) -> Optional[float]:
"""Alias for parse_dvol_last — explicit 23:00 UTC variant."""
return parse_dvol_last(d, target_ts_ms)
PARSERS = {
"parse_history_at_23utc": parse_history_at_23utc,
"parse_history_last": lambda d, ts=0: parse_history_at_23utc(d, 0),
"parse_scalar_result": lambda d, ts=0: parse_scalar_result(d),
"parse_dvol_last": parse_dvol_last,
"parse_dvol_at_23utc": parse_dvol_at_23utc,
}
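Dispatch through `PARSERS` keeps every entry callable with the uniform `parser(raw, target_ts)` shape, which is why the scalar parser is wrapped in a lambda. A sketch of that convention with a stub table (not the production parsers):

```python
# Stub dispatch table mirroring the PARSERS call convention: parser(raw, ts).
# The payload and key names here are illustrative, not real API output.
stub_parsers = {
    "parse_scalar_result": lambda d, ts=0: float(d["result"]),
}
raw = {"result": "0.000125"}  # made-up scalar payload
parser = stub_parsers["parse_scalar_result"]
val = parser(raw, 0)          # ts argument accepted but unused by this parser
```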
# ---------------------------------------------------------------------------
# HTTP fetcher
# ---------------------------------------------------------------------------
async def fetch_json(session: aiohttp.ClientSession, url: str, verbose: bool = False) -> Optional[dict]:
try:
async with session.get(url, timeout=aiohttp.ClientTimeout(total=15)) as resp:
if resp.status != 200:
if verbose:
print(f" HTTP {resp.status} for {url[:80]}...")
return None
text = await resp.text()
d = json.loads(text)
if verbose:
preview = str(d)[:200]
print(f" RAW: {preview}")
return d
except Exception as e:
if verbose:
print(f" FETCH ERROR: {e}{url[:80]}")
return None
# ---------------------------------------------------------------------------
# Main parity checker
# ---------------------------------------------------------------------------
async def run_parity_check(dates: list, indicators_filter: Optional[set],
verbose: bool) -> dict:
"""
Run parity check for all dates × candidates.
Returns nested dict: results[candidate][indicator] = {pass: int, fail: int, details: [...]}
"""
results = {} # candidate → indicator → {pass, fail, abs_diffs, details}
async with aiohttp.ClientSession(
headers={"User-Agent": "DOLPHIN-ExtF-Parity-Test/1.0"}
) as session:
for date_str in dates:
print(f"\n{'='*60}")
print(f"DATE: {date_str}")
print(f"{'='*60}")
# Ground truth
gt = load_npz_ground_truth(date_str)
if gt is None:
print(" [SKIP] No NPZ data available for this date.")
continue
print(f" Ground truth (NPZ): {gt}")
# Build candidates
candidates = build_candidate_urls(date_str)
for cand_label, indicator_urls in candidates.items():
for ind_name, url_spec in indicator_urls.items():
# Unpack 3-tuple (url, parser_name, target_ts_ms)
url, parser_name, target_ts = url_spec
# Filter
if indicators_filter and ind_name not in indicators_filter:
continue
if ind_name not in gt:
continue # no ground truth for this indicator on this date
gt_val = gt[ind_name]
tol = TOLERANCES.get(ind_name, 1e-6)
if verbose:
print(f"\n [{cand_label}] {ind_name}")
print(f" URL: {url[:100]}...")
# Fetch + parse
raw = await fetch_json(session, url, verbose=verbose)
if raw is None:
got_val = None
status = "FETCH_FAIL"
else:
parser = PARSERS[parser_name]
got_val = parser(raw, target_ts)
if got_val is None:
status = "PARSE_FAIL"
else:
abs_diff = abs(got_val - gt_val)
rel_diff = abs_diff / max(abs(gt_val), 1e-12)
if abs_diff <= tol:
status = "PASS"
else:
status = f"FAIL (abs={abs_diff:.2e}, rel={rel_diff:.1%})"
# Store
if cand_label not in results:
results[cand_label] = {}
if ind_name not in results[cand_label]:
results[cand_label][ind_name] = {"pass": 0, "fail": 0, "skip": 0, "abs_diffs": []}
rec = results[cand_label][ind_name]
if status == "PASS":
rec["pass"] += 1
rec["abs_diffs"].append(abs(got_val - gt_val))
                    elif status in ("FETCH_FAIL", "PARSE_FAIL"):
rec["skip"] += 1
else:
rec["fail"] += 1
                    # OK = parity pass, ~~ = fetch/parse skip, XX = value mismatch.
                    # (Skip statuses must be matched exactly: every non-PASS status
                    # contains the substring "FAIL", so a substring test never
                    # reaches the skip branch.)
                    icon = "OK" if status == "PASS" else ("~~" if status in ("FETCH_FAIL", "PARSE_FAIL") else "XX")
got_str = f"{got_val:.6e}" if got_val is not None else "None"
print(f" {icon} [{cand_label}] {ind_name:16s} gt={gt_val:.6e} got={got_str} {status}")
# Rate-limit courtesy
await asyncio.sleep(0.15)
return results
def print_summary(results: dict):
"""Print pass/fail summary table and recommend endpoint."""
print(f"\n{'='*70}")
print("PARITY SUMMARY")
print(f"{'='*70}")
print(f"{'Candidate':<30} {'Indicator':<16} {'PASS':>5} {'FAIL':>5} {'SKIP':>5} {'Verdict'}")
print("-" * 70)
winner = {} # indicator → best candidate
for cand_label, ind_results in results.items():
for ind_name, rec in sorted(ind_results.items()):
p, f, s = rec["pass"], rec["fail"], rec["skip"]
if p + f == 0:
verdict = "NO DATA"
elif f == 0:
max_abs = max(rec["abs_diffs"]) if rec["abs_diffs"] else 0
verdict = f"LOCKED IN OK (max_abs={max_abs:.2e})"
if ind_name not in winner:
winner[ind_name] = (cand_label, max_abs)
elif max_abs < winner[ind_name][1]:
winner[ind_name] = (cand_label, max_abs)
else:
verdict = f"MISMATCH XX ({f} failures)"
print(f"{cand_label:<30} {ind_name:<16} {p:>5} {f:>5} {s:>5} {verdict}")
print(f"\n{'='*70}")
print("RECOMMENDED ENDPOINT PER INDICATOR")
print(f"{'='*70}")
if winner:
for ind_name, (cand, max_abs) in sorted(winner.items()):
print(f" {ind_name:<16}{cand} (max abs diff = {max_abs:.2e})")
else:
print(" WARNING: No candidate passed parity for any indicator.")
print(" Possible causes:")
print(" • Deribit API response format changed (check raw output with --verbose)")
print(" • parser needs updating for new response structure")
print(" • timestamps or window size wrong — try different KNOWN_DATES")
print()
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(description="Deribit ExtF API parity test")
parser.add_argument("--indicators", choices=["fund", "dvol", "all"], default="all",
help="Which indicator groups to test (default: all)")
parser.add_argument("--dates", nargs="*", default=None,
help="Override KNOWN_DATES list (e.g. 2026-01-02 2026-01-05)")
parser.add_argument("--verbose", action="store_true",
help="Print raw API responses for debugging")
args = parser.parse_args()
dates = args.dates if args.dates else KNOWN_DATES
ind_filter = None
if args.indicators == "fund":
ind_filter = {"fund_dbt_btc", "fund_dbt_eth"}
elif args.indicators == "dvol":
ind_filter = {"dvol_btc", "dvol_eth"}
print("DOLPHIN — Deribit ExtF API Parity Test")
print(f"Testing {len(dates)} known dates × {args.indicators} indicators")
print(f"Ground truth: {_EIGENVALUES_PATH}")
print()
results = asyncio.run(run_parity_check(dates, ind_filter, args.verbose))
print_summary(results)
# Exit non-zero if any critical indicator (fund_dbt_btc) has failures
    critical = results.get("A_history_23utc", {}).get("fund_dbt_btc", {})
if critical.get("fail", 0) > 0 or critical.get("pass", 0) == 0:
# Try to find ANY passing candidate for fund_dbt_btc
any_pass = any(
results.get(c, {}).get("fund_dbt_btc", {}).get("pass", 0) > 0 and
results.get(c, {}).get("fund_dbt_btc", {}).get("fail", 0) == 0
for c in results
)
if not any_pass:
print("CRITICAL: No valid endpoint found for fund_dbt_btc (ACBv6 dependency)")
sys.exit(1)
else:
            print(" fund_dbt_btc: preferred candidate (A_history_23utc) failed but another passed.")
print(" Update _build_deribit_url() in realtime_exf_service.py to use the passing candidate.")
if __name__ == "__main__":
main()