Files
DOLPHIN/prod/docs/DATA_REFERENCE.md
hjnormey 01c19662cb initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
2026-04-21 16:58:38 +02:00

23 KiB
Executable File
Raw Permalink Blame History

Dolphin Data Layer Reference

Overview

The Dolphin system has a three-tier data architecture:

┌─────────────────────────────────────────────────────────────────────┐
│ LIVE HOT PATH (sub-second)                                          │
│ Hazelcast 5.3 (RAM-only) — single source of truth for all services │
│ Port: 5701 | Cluster: dolphin                                       │
└─────────────────────────────────────────────────────────────────────┘
         │ ch_writer (async fire-and-forget)
         ▼
┌─────────────────────────────────────────────────────────────────────┐
│ WARM STORE (analytics, dashboards, recovery)                        │
│ ClickHouse 24.3 (MergeTree) — structured historical data           │
│ Port: 8123 (HTTP) / 9000 (TCP) | DB: dolphin, dolphin_green       │
└─────────────────────────────────────────────────────────────────────┘
         │ periodic dumps / cache builds
         ▼
┌─────────────────────────────────────────────────────────────────────┐
│ COLD STORE (backtesting, training, research)                        │
│ Parquet / Arrow / JSON files on disk under /mnt/                    │
└─────────────────────────────────────────────────────────────────────┘

1. ClickHouse

Connection

Param Value
URL http://localhost:8123
User dolphin
Password dolphin_ch_2026
DB (blue) dolphin
DB (green) dolphin_green
Auth headers X-ClickHouse-User / X-ClickHouse-Key

Quick Query

# CLI (from host)
curl -s "http://localhost:8123/?database=dolphin" \
     -H "X-ClickHouse-User: dolphin" \
     -H "X-ClickHouse-Key: dolphin_ch_2026" \
     -d "SELECT count() FROM trade_events FORMAT JSON"

# Python (urllib)
import urllib.request, json, base64

def ch_query(sql):
    """Execute ClickHouse query, return parsed JSON result."""
    url = "http://localhost:8123/"
    req = urllib.request.Request(url, data=(sql + "\nFORMAT JSON").encode())
    req.add_header("X-ClickHouse-User", "dolphin")
    req.add_header("X-ClickHouse-Key", "dolphin_ch_2026")
    resp = urllib.request.urlopen(req, timeout=10)
    return json.loads(resp.read().decode())

result = ch_query("SELECT * FROM dolphin.trade_events ORDER BY ts DESC LIMIT 10")
for row in result["data"]:
    print(row)

Insert Pattern

# Async fire-and-forget (production — from ch_writer.py)
from ch_writer import ch_put, ts_us

ch_put("trade_events", {
    "ts": ts_us(),               # DateTime64(6) microsecond precision
    "date": "2026-04-15",
    "strategy": "blue",
    "asset": "BTCUSDT",
    "side": "SHORT",
    "entry_price": 84500.0,
    "exit_price": 84200.0,
    "quantity": 0.01,
    "pnl": 3.0,
    "pnl_pct": 0.00355,
    "exit_reason": "FIXED_TP",
    "vel_div_entry": -0.0319,
    "leverage": 9.0,
    "bars_held": 45,
})

# Direct insert (for one-off scripts)
import urllib.request, json

body = (json.dumps(row) + "\n").encode()
url = "http://localhost:8123/?database=dolphin&query=INSERT+INTO+trade_events+FORMAT+JSONEachRow"
req = urllib.request.Request(url, data=body, method="POST")
req.add_header("X-ClickHouse-User", "dolphin")
req.add_header("X-ClickHouse-Key", "dolphin_ch_2026")
urllib.request.urlopen(req, timeout=5)

dolphin Database — Tables

trade_events — Closed trades (471 rows, 2026-03-31 → ongoing)

Column Type Description
ts DateTime64(6, UTC) Trade close timestamp (microsecond)
date Date Trade date (partition key)
strategy LowCardinality(String) "blue" or "green"
asset LowCardinality(String) e.g. "ENJUSDT", "LTCUSDT"
side LowCardinality(String) "SHORT" (always in champion)
entry_price Float64 Entry price
exit_price Float64 Exit price
quantity Float64 Position size in asset units
pnl Float64 Profit/loss in USDT
pnl_pct Float32 PnL as fraction of notional
exit_reason LowCardinality(String) See exit reasons below
vel_div_entry Float32 Velocity divergence at entry
boost_at_entry Float32 ACB boost at entry time
beta_at_entry Float32 ACB beta at entry time
posture LowCardinality(String) System posture at entry
leverage Float32 Applied leverage
regime_signal Int8 Regime classification
capital_before Float64 Capital before trade
capital_after Float64 Capital after trade
peak_capital Float64 Peak capital at time
drawdown_at_entry Float32 Drawdown pct at entry
open_positions_count UInt8 Open positions (always 0 or 1)
scan_uuid String UUIDv7 trace ID
bars_held UInt16 Number of bars held

Engine: MergeTree | Partition: toYYYYMM(ts) | Order: (ts, asset)

Exit reasons observed in production:

Exit Reason Meaning
MAX_HOLD Held for max_hold_bars (125) without TP or stop hit
FIXED_TP Take-profit target (95bps) reached
HIBERNATE_HALT Posture changed to HIBERNATE, position closed
SUBDAY_ACB_NORMALIZATION ACB day-reset forced position close

eigen_scans — Processed eigenscans (68k rows, ~11s cadence)

Column Type Description
ts DateTime64(6, UTC) Scan timestamp
scan_number UInt32 Monotonic scan counter
vel_div Float32 Velocity divergence (v50 - v750)
w50_velocity Float32 50-window correlation velocity
w750_velocity Float32 750-window correlation velocity
instability_50 Float32 Instability measure
scan_to_fill_ms Float32 Latency: scan → fill
step_bar_ms Float32 Latency: step_bar computation
scan_uuid String UUIDv7 trace ID

Engine: MergeTree | Partition: toYYYYMM(ts) | Order: (ts, scan_number) | TTL: 10 years

status_snapshots — System status (686k rows, ~10s cadence, TTL 180 days)

Column Type Description
ts DateTime64(3, UTC) Snapshot timestamp
capital Float64 Current capital
roi_pct Float32 Return on investment %
dd_pct Float32 Current drawdown %
trades_executed UInt16 Total trades count
posture LowCardinality(String) APEX / TURTLE / HIBERNATE
rm Float32 Risk metric
vel_div Float32 Latest velocity divergence
vol_ok UInt8 Volatility within bounds (0/1)
phase LowCardinality(String) Trading phase
mhs_status LowCardinality(String) Meta health status
boost Float32 ACB boost
cat5 Float32 Category 5 risk metric

Engine: MergeTree | Order: ts | TTL: 180 days

posture_events — Posture transitions (92 rows)

Column Type Description
ts DateTime64(6, UTC) Transition timestamp
posture LowCardinality(String) New posture (APEX/TURTLE/HIBERNATE)
rm Float32 Risk metric at transition
prev_posture LowCardinality(String) Previous posture
trigger String JSON with Cat1-Cat4 values that triggered transition
scan_uuid String UUIDv7 trace ID

Postures (ordered by risk): APEX (full risk) → TURTLE (reduced) → HIBERNATE (minimal)

acb_state — Adaptive Circuit Breaker (26k rows, ~30s cadence)

Column Type Description
ts DateTime64(6, UTC) Timestamp
boost Float32 Leverage boost multiplier (≥1.0)
beta Float32 Risk scaling factor
signals Float32 Signal quality metric

meta_health — Meta Health Service v3 (78k rows, ~10s cadence)

Column Type Description
ts DateTime64(6, UTC) Timestamp
status LowCardinality(String) GREEN / YELLOW / RED
rm_meta Float32 Aggregate health score (01)
m1_data_infra Float32 Data infrastructure health
m1_trader Float32 Trader process health
m2_heartbeat Float32 Heartbeat freshness
m3_data_freshness Float32 Scan data freshness
m4_control_plane Float32 Control plane (HZ/Prefect) health
m5_coherence Float32 State coherence across services
m6_test_integrity Float32 Test gate pass status
service_status String JSON service states
hz_key_status String HZ key freshness

exf_data — External Factors (1.56M rows, ~0.5s cadence)

Column Type Description
ts DateTime64(6, UTC) Timestamp
funding_rate Float32 BTC funding rate
dvol Float32 Deribit DVOL (implied volatility)
fear_greed Float32 Fear & Greed index
taker_ratio Float32 Taker buy/sell ratio

obf_universe — Order Book Features (821M rows, ~500ms cadence, 542 symbols)

Column Type Description
ts DateTime64(3, UTC) Timestamp (millisecond)
symbol LowCardinality(String) Trading pair
spread_bps Float32 Bid-ask spread in basis points
depth_1pct_usd Float64 USD depth at 1% from mid
depth_quality Float32 Book quality metric
fill_probability Float32 Estimated fill probability
imbalance Float32 Bid/ask imbalance
best_bid Float64 Best bid price
best_ask Float64 Best ask price
n_bid_levels UInt8 Number of bid levels
n_ask_levels UInt8 Number of ask levels

Engine: MergeTree | Partition: toYYYYMM(ts) | Order: (symbol, ts)

supervisord_state — Process manager state (138k rows)

Column Type Description
ts DateTime64(6, UTC) Timestamp
process_name LowCardinality(String) Service name
group_name LowCardinality(String) dolphin_data or dolphin
state LowCardinality(String) RUNNING / STOPPED / EXITED
pid UInt32 Process ID
uptime_s UInt32 Uptime in seconds
exit_status Int16 Exit code
source LowCardinality(String) Source of state change

service_lifecycle — Service start/stop events (62 rows)

Column Type Description
ts DateTime64(6, UTC) Timestamp
service LowCardinality(String) Service name
event LowCardinality(String) START / EXIT
reason String NORMAL_START / SIGTERM / NORMAL_EXIT
exit_code Int16 Exit code
signal_num Int16 Signal number
pid UInt32 Process ID

system_stats — Host system metrics (35k rows)

Column Type Description
ts DateTime64(3, UTC) Timestamp
mem_used_gb Float32 Memory used (GB)
mem_available_gb Float32 Memory available (GB)
mem_pct Float32 Memory usage %
load_1m / load_5m / load_15m Float32 Load averages
net_rx_mb_s Float32 Network receive (MB/s)
net_tx_mb_s Float32 Network transmit (MB/s)
net_iface LowCardinality(String) Network interface

dolphin Database — Views

v_trade_summary_30d — 30-day rolling trade stats

SELECT * FROM dolphin.v_trade_summary_30d
-- Returns: strategy, n_trades, wins, win_rate_pct, total_pnl,
--          avg_pnl_pct, median_pnl_pct, max_dd_seen_pct

v_current_posture — Latest posture state

SELECT * FROM dolphin.v_current_posture
-- Returns: posture, rm, trigger, ts

v_process_health — Current process states

SELECT * FROM dolphin.v_process_health ORDER BY group_name, process_name
-- Returns: process_name, group_name, state, pid, uptime_s, last_seen

v_scan_latency_1h — Last hour scan latency percentiles

SELECT * FROM dolphin.v_scan_latency_1h
-- Returns: p50_ms, p95_ms, p99_ms, n_scans, window_start

v_system_stats_1h — Last hour system metrics (5-min buckets)

SELECT * FROM dolphin.v_system_stats_1h
-- Returns: bucket, mem_pct_avg, load_avg, net_rx_peak, net_tx_peak

v_scan_causal_chain — Trace scans → trades by scan_uuid

dolphin_green Database

Mirror of dolphin with tables: acb_state, account_events, daily_pnl, eigen_scans, exf_data, meta_health, obf_universe, posture_events, service_lifecycle, status_snapshots, supervisord_state, system_stats, trade_events (325 rows, 2026-04-12 → ongoing).

Useful Queries

-- Last 20 trades with key metrics
SELECT ts, asset, side, entry_price, exit_price, pnl, pnl_pct,
       exit_reason, leverage, vel_div_entry, bars_held, posture
FROM dolphin.trade_events ORDER BY ts DESC LIMIT 20;

-- Today's P&L summary
SELECT count() as trades, sum(pnl) as total_pnl,
       countIf(pnl>0) as wins, round(countIf(pnl>0)/count()*100,1) as win_rate
FROM dolphin.trade_events WHERE date = today();

-- Exit reason distribution
SELECT exit_reason, count() as n, round(sum(pnl),2) as total_pnl,
       round(countIf(pnl>0)/count()*100,1) as win_rate
FROM dolphin.trade_events GROUP BY exit_reason ORDER BY n DESC;

-- Per-asset performance
SELECT asset, count() as trades, round(sum(pnl),2) as pnl,
       round(countIf(pnl>0)/count()*100,1) as wr, round(avg(leverage),1) as avg_lev
FROM dolphin.trade_events GROUP BY asset ORDER BY pnl DESC;

-- Capital curve (from status snapshots)
SELECT ts, capital, roi_pct, dd_pct, posture, vel_div
FROM dolphin.status_snapshots
WHERE ts >= today() - INTERVAL 7 DAY
ORDER BY ts;

-- Scan-to-trade latency distribution
SELECT quantile(0.5)(scan_to_fill_ms) as p50,
       quantile(0.95)(scan_to_fill_ms) as p95,
       quantile(0.99)(scan_to_fill_ms) as p99
FROM dolphin.eigen_scans WHERE ts >= now() - INTERVAL 1 HOUR;

-- Leverage distribution
SELECT round(leverage,1) as lev, count() as n
FROM dolphin.trade_events GROUP BY lev ORDER BY lev;

-- Scan rate per hour
SELECT toStartOfHour(ts) as hour, count() as scans
FROM dolphin.eigen_scans
WHERE ts >= now() - INTERVAL 24 HOUR
GROUP BY hour ORDER BY hour;

2. Hazelcast

Connection

Param Value
Host localhost:5701
Cluster dolphin
Python client hazelcast-python-client 5.x
Management UI http://localhost:8080
import hazelcast

client = hazelcast.HazelcastClient(
    cluster_name="dolphin",
    cluster_members=["localhost:5701"],
    connection_timeout=5.0,
)

WARNING: Hazelcast is RAM-only. Never restart the container — all state is lost on restart.

IMap Reference

DOLPHIN_FEATURES (547 entries) — Central data bus

Key Type Writer Description
latest_eigen_scan JSON string scan_bridge Latest eigenvalue scan with scan_number, vel_div, regime, asset data
exf_latest JSON string exf_fetcher External factors: funding rates, OI, L/S ratio, taker, basis, etc.
acb_boost JSON string acb_processor ACB boost/beta/signals with CP lock
mc_forewarner_latest JSON string mc_forewarning Monte Carlo risk envelope status
asset_<SYMBOL>_ob JSON string obf_universe Per-asset order book snapshot

latest_eigen_scan structure:

{
  "scan_number": 134276,
  "timestamp": 1776269558.4,
  "file_mtime": 1776269558.4,
  "result": {
    "type": "scan",
    "timestamp": "2026-04-15 18:12:33",
    "asset": "BTCUSDT",
    "regime": "BEAR",
    "bull_pct": 42.86,
    "bear_pct": 57.14,
    "sentiment": "BEARISH",
    "total_symbols": 50,
    "correlation_symbol": "...",
    "vel_div": -0.0319,
    "...": "..."
  },
  "target_asset": "BTCUSDT",
  "version": "NG7",
  "_ng7_metadata": { "scan_number": 134276, "uuid": "...", ... }
}

exf_latest structure:

{
  "funding_btc": -5.085e-05,
  "funding_btc_lagged": -5.085e-05,
  "funding_eth": 1.648e-05,
  "oi_btc": 97583.527,
  "oi_eth": 2246343.627,
  "ls_btc": 0.8218,
  "ls_eth": 1.4067,
  "taker": 0.5317,
  "taker_lagged": 2.1506,
  "basis": -0.07784355,
  "imbalance_btc": -0.786,
  "dvol": 52.34,
  "fear_greed": 45.0
}

asset_<SYMBOL>_ob structure:

{
  "best_bid": 84500.0,
  "best_ask": 84501.0,
  "spread_bps": 1.18,
  "depth_1pct_usd": 50000.0,
  "depth_quality": 0.85,
  "fill_probability": 0.95,
  "imbalance": 0.03,
  "n_bid_levels": 5,
  "n_ask_levels": 5
}

DOLPHIN_STATE_BLUE (2 entries) — Blue strategy runtime state

Key Description
capital_checkpoint {"capital": 25705.50, "ts": 1776269557.97}
engine_snapshot Full engine state (see below)

engine_snapshot structure:

{
  "capital": 25705.50,
  "open_positions": [],
  "algo_version": "v2_gold_fix_v50-v750",
  "last_scan_number": 134276,
  "last_vel_div": 0.0201,
  "vol_ok": true,
  "posture": "APEX",
  "scans_processed": 6377,
  "trades_executed": 71,
  "bar_idx": 4655,
  "timestamp": "2026-04-15T16:12:37",
  "leverage_soft_cap": 8.0,
  "leverage_abs_cap": 9.0,
  "open_notional": 0.0,
  "current_leverage": 0.0
}

DOLPHIN_PNL_BLUE (3 entries) — Daily P&L

Keyed by date string: "2026-04-15"{"portfolio_capital": 20654.01, "engine_capital": 20654.01}

DOLPHIN_STATE_GREEN (1 entry) — Green strategy state

Same structure as blue: capital_checkpoint.

DOLPHIN_META_HEALTH (1 entry)

Key: "latest"{"rm_meta": 0.94, "status": "GREEN", "m1_data_infra": 1.0, "m1_trader": 1.0, ...}

Read/Write Patterns

# Read from HZ
features = client.get_map("DOLPHIN_FEATURES").blocking()
scan = json.loads(features.get("latest_eigen_scan"))

# Write to HZ
features.put("exf_latest", json.dumps(payload))

# Atomic update with CP lock (used by ACB)
lock = client.cp_subsystem.get_lock("acb_update_lock").blocking()
lock.lock()
try:
    features.put("acb_boost", json.dumps(acb_data))
finally:
    lock.unlock()

# HZ warmup after restart (reconstruct from ClickHouse)
from hz_warmup import _ch_query
latest = _ch_query("SELECT * FROM dolphin.acb_state ORDER BY ts DESC LIMIT 1")
features.put("acb_boost", json.dumps(latest[0]))

3. File-Based Data

Parquet (VBT Cache)

Location: /mnt/dolphinng5_predict/vbt_cache_synth_15M/

Daily parquet files (YYYY-MM-DD.parquet) containing scan data with columns: vel_div, v50_vel, v150_vel, v750_vel, asset prices, BTC price, and derived features. Used by CI test fixtures and backtesting.

Read:

import pandas as pd
df = pd.read_parquet("/mnt/dolphinng5_predict/vbt_cache_synth_15M/2026-02-25.parquet")

Arrow Scans (Live Pipeline)

Location: /mnt/dolphinng6_data/arrow_scans/<date>/*.arrow

PyArrow IPC files written by NG8 scanner. Each file = one eigenscan. Consumed by scan_bridge_service → pushed to Hazelcast.

Eigenvalue JSON

Location: /mnt/dolphinng6_data/eigenvalues/<date>/*.json

Per-scan JSON files with eigenvalue data: scan_number, eigenvalues array, regime, bull/bear percentages.

Correlation Matrices

Location: /mnt/dolphinng6_data/matrices/<date>/

ZST-compressed 50×50 correlation matrices: scan_NNNNNN_wWWW_HHMMSS.arb512.pkl.zst

Session Logs

Location: /mnt/dolphinng5_predict/session_logs/

Trade session logs: session_YYYYMMDD_HHMMSS.jsonl (JSON Lines) and session_YYYYMMDD.md (human-readable).

Run Logs

Location: /mnt/dolphinng5_predict/run_logs/

Engine run summaries and backtest parity logs. Key file: run_logs/summary_*.json.


4. Data Flow

                    ┌─────────────┐
                    │ NG8 Scanner │ (Linux, /mnt/dolphinng6_data/)
                    └──────┬──────┘
                           │ writes .arrow files
                           ▼
                    ┌─────────────┐
                    │ Scan Bridge │ (supervisord, dolphin group)
                    └──────┬──────┘
                           │ HZ put("latest_eigen_scan")
                           ▼
              ┌──── DOLPHIN_FEATURES (HZ) ────┐
              │                                │
    ┌─────────▼──────────┐          ┌─────────▼──────────┐
    │ nautilus_event_    │          │ clean_arch/         │
    │ trader (prod path) │          │ main.py (alt path)  │
    └─────────┬──────────┘          └────────────────────┘
              │ NDAlphaEngine.step_bar()
              │
    ┌─────────▼──────────┐
    │ ch_put("trade_     │  ← async fire-and-forget
    │ events", {...})     │
    └─────────┬──────────┘
              │
              ▼
    ┌──────────────────────┐
    │ ClickHouse (dolphin) │  ← queries, dashboards, HZ warmup
    └──────────────────────┘

Parallel writers to HZ:
  exf_fetcher    → "exf_latest"
  acb_processor  → "acb_boost"    (with CP lock)
  obf_universe   → "asset_*_ob"
  meta_health    → DOLPHIN_META_HEALTH["latest"]
  mc_forewarning → "mc_forewarner_latest"
  nautilus_trader→ DOLPHIN_STATE_BLUE["engine_snapshot"]
                  DOLPHIN_PNL_BLUE[date_str]

5. Current System State (Live Snapshot)

Metric Value
Blue capital $25,705.50
Blue ROI +5.92% (from $25,000)
Blue trades today 71 total
Posture APEX
MHS status GREEN (rm_meta=0.94)
ACB boost 1.4581 / beta=0.80
Latest scan #134276
Latest vel_div +0.0201
Scan cadence ~11s
Scan→fill latency ~1027ms
Process health All RUNNING (uptime ~22h)

Supervisord Groups

Group Services Autostart
dolphin_data exf_fetcher, acb_processor, obf_universe, meta_health, system_stats Yes
dolphin nautilus_trader, scan_bridge, clean_arch_trader, paper_portfolio, dolphin_live No (manual)