DOLPHIN/prod/docs/CLICKHOUSE_OBSERVABILITY.md
hjnormey 01c19662cb initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
2026-04-21 16:58:38 +02:00


ClickHouse Observability Layer

Deployed: 2026-04-06
CH Version: 24.3-alpine
Ports: HTTP :8123, Native :9000
OTel Collector: OTLP gRPC :4317 / HTTP :4318
Play UI: http://100.105.170.6:8123/play


Architecture

```
Dolphin services → ch_put() → ch_writer.py (async batch) → dolphin-clickhouse:8123
NG7 laptop       → ng_otel_writer.py (OTel SDK) → dolphin-otelcol:4317 → dolphin-clickhouse
/proc poller     → system_stats_service.py → dolphin.system_stats
supervisord      → supervisord_ch_listener.py (eventlistener) → dolphin.supervisord_state
```
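The /proc poller path can be sketched with the standard library alone. This is an illustration, not the contents of system_stats_service.py: the output keys (`load1`, `load5`, `load15`, `mem_used_pct`) are hypothetical column names, and `put` is a stand-in for `ch_put`. The default 30 s interval matches the system_stats rate listed under Tables.

```python
import time
from typing import Callable, Dict


def parse_loadavg(text: str) -> Dict[str, float]:
    """/proc/loadavg: '0.52 0.58 0.59 1/1234 5678' -> 1/5/15-minute load averages."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return {"load1": one, "load5": five, "load15": fifteen}


def parse_meminfo(text: str) -> Dict[str, int]:
    """/proc/meminfo: 'MemTotal:  16318796 kB' -> {'MemTotal': 16318796} (kB values)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def sample(root: str = "/proc") -> Dict[str, float]:
    """Read one system_stats sample (hypothetical column names)."""
    with open(f"{root}/loadavg") as f:
        row = parse_loadavg(f.read())
    with open(f"{root}/meminfo") as f:
        mem = parse_meminfo(f.read())
    row["mem_used_pct"] = round(100.0 * (1 - mem["MemAvailable"] / mem["MemTotal"]), 1)
    return row


def run(put: Callable[[str, Dict[str, float]], None], interval: float = 30.0) -> None:
    """Poll forever; with a non-blocking put, a slow ClickHouse never stalls this loop."""
    while True:
        put("system_stats", sample())
        time.sleep(interval)
```

The `root` parameter only exists so the parsers can be exercised against fixture files instead of the live /proc.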

All writes are fire-and-forget: ch_writer batches rows in a background thread and silently drops them when its queue is full, so the OBF hot loop (100 ms) is never blocked.
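A minimal sketch of that fire-and-forget pattern, assuming a bounded `queue.Queue` and one daemon flush thread. `BatchWriter`, its parameters, and the `post_jsonl` sink are hypothetical names for illustration, not the real `ch_writer.py` API. The sink uses ClickHouse's HTTP interface (the `INSERT ... FORMAT JSONEachRow` statement in the `query` URL parameter, one JSON row per line in the POST body) and omits authentication.

```python
import json
import queue
import threading
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Sink = Callable[[str, List[Row]], None]


def post_jsonl(table: str, rows: List[Row], url: str = "http://dolphin-clickhouse:8123/") -> None:
    """Hypothetical sink: POST rows to ClickHouse's HTTP interface as JSONEachRow."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    body = "\n".join(json.dumps(r) for r in rows).encode()
    req = urllib.request.Request(url + "?" + urllib.parse.urlencode({"query": query}), data=body)
    urllib.request.urlopen(req, timeout=5).close()


class BatchWriter:
    """Bounded queue plus a background thread; put() never blocks the caller."""

    def __init__(self, sink: Sink = post_jsonl, maxsize: int = 10_000,
                 batch_size: int = 500, flush_interval: float = 1.0) -> None:
        self._q: "queue.Queue[Tuple[str, Row]]" = queue.Queue(maxsize=maxsize)
        self._sink = sink
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self.dropped = 0  # approximate counter; not synchronized across producers
        threading.Thread(target=self._run, daemon=True).start()

    def put(self, table: str, row: Row) -> bool:
        try:
            self._q.put_nowait((table, row))
            return True
        except queue.Full:
            self.dropped += 1  # drop silently; the hot loop keeps going
            return False

    def _run(self) -> None:
        while True:
            # Collect up to batch_size rows or until flush_interval elapses.
            batch: Dict[str, List[Row]] = {}
            deadline = time.monotonic() + self._flush_interval
            n = 0
            while n < self._batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    table, row = self._q.get(timeout=timeout)
                except queue.Empty:
                    break
                batch.setdefault(table, []).append(row)
                n += 1
            # One INSERT per table per flush.
            for table, rows in batch.items():
                try:
                    self._sink(table, rows)
                except Exception:
                    pass  # fire-and-forget: write errors never reach the caller
```

The key choice is `put_nowait()`: when ClickHouse is slow or unreachable, the queue fills and new rows are counted and discarded instead of stalling the caller.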


Tables

| Table | Source | Rate | Retention |
|---|---|---|---|
| eigen_scans | nautilus_event_trader | ~8/min | 10 yr |
| posture_events | meta_health_service_v3 | few/day | forever |
| acb_state | acb_processor_service | ~5/day | forever |
| daily_pnl | paper_trade_flow | 1/day | forever |
| trade_events | DolphinActor (pending) | ~40/day | 10 yr |
| obf_universe | obf_universe_service | 540/min | forever |
| obf_fast_intrade | DolphinActor (pending) | 100 ms × assets | 5 yr |
| exf_data | exf_fetcher_flow | ~1/min | forever |
| meta_health | meta_health_service_v3 | ~1/10 s | forever |
| account_events | DolphinActor (pending) | rare | forever |
| supervisord_state | supervisord_ch_listener | push + 60 s poll | forever |
| system_stats | system_stats_service | 1/30 s | forever |
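The Rate column converts to daily row counts with simple arithmetic, which helps when sizing disk for the `forever` tables. A back-of-the-envelope sketch; `n_assets` and `hours_in_trade` for obf_fast_intrade are hypothetical inputs, since that table only writes while a position is open.

```python
SECONDS_PER_DAY = 86_400


def rows_per_day(rows: float, per_seconds: float) -> float:
    """Convert a rate of `rows` every `per_seconds` seconds into rows/day."""
    return rows * SECONDS_PER_DAY / per_seconds


# Rates from the table above (approximate where marked ~).
estimates = {
    "eigen_scans": rows_per_day(8, 60),     # ~8/min
    "obf_universe": rows_per_day(540, 60),  # 540/min
    "exf_data": rows_per_day(1, 60),        # ~1/min
    "meta_health": rows_per_day(1, 10),     # ~1/10 s
    "system_stats": rows_per_day(1, 30),    # 1/30 s
}


def obf_fast_intrade_per_day(n_assets: int, hours_in_trade: float) -> float:
    """A 100 ms tick is 10 rows/s per asset, written only while in a trade."""
    return rows_per_day(10 * n_assets, 1) * hours_in_trade / 24


for table, n in estimates.items():
    print(f"{table:14s} {n:>10,.0f} rows/day")
```

For scale, obf_universe at 540/min accumulates 777,600 rows/day, or about 284 million rows per year under its `forever` retention.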

OTel tables (otel_logs, otel_traces, otel_metrics_*) are auto-created by the collector for NG7 instrumentation.


Distributed Trace ID

scan_uuid (UUIDv7) is the causal trace root across all tables:

```
eigen_scans.scan_uuid  ←  NG7 generates one per scan
       │
       ├── obf_fast_intrade.scan_uuid  (100ms OBF while in-trade)
       ├── trade_events.scan_uuid      (entry + exit rows)
       └── posture_events.scan_uuid    (if scan triggered posture re-eval)
```

NG7 migration: replace uuid.uuid4() with uuid7() from ch_writer.py. The result is the same String format, so it is a drop-in replacement.
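For reference, a UUIDv7 (RFC 9562) puts a 48-bit Unix-millisecond timestamp in the most significant bits, followed by the version nibble `7`, 12 random bits, the `10` variant bits, and 62 more random bits. That layout is what lets scan_uuid act as a time-ordered trace root while keeping the canonical 36-character string form. A minimal stdlib sketch of the layout, not necessarily how `ch_writer.uuid7()` is implemented; `uuid7_ms` is a hypothetical helper.

```python
import os
import time
import uuid


def uuid7() -> str:
    """UUIDv7 string: 48-bit ms timestamp | ver 7 | 12 random | var 10 | 62 random."""
    ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits
    value = (
        (ts_ms & ((1 << 48) - 1)) << 80
        | 0x7 << 76   # version nibble
        | rand_a << 64
        | 0b10 << 62  # RFC variant bits
        | rand_b
    )
    return str(uuid.UUID(int=value))


def uuid7_ms(u: str) -> int:
    """Recover the embedded Unix-millisecond timestamp from a UUIDv7 string."""
    return uuid.UUID(u).int >> 80
```

Because the timestamp leads, lexicographic order of the strings matches creation order at millisecond granularity; IDs minted within the same millisecond are ordered randomly in this sketch.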


Key Queries (CH Play)

```sql
-- Current system state
SELECT * FROM dolphin.v_current_posture;

-- Scan latency last hour
SELECT * FROM dolphin.v_scan_latency_1h;

-- Trade summary last 30 days
SELECT * FROM dolphin.v_trade_summary_30d;

-- Process health
SELECT * FROM dolphin.v_process_health;

-- System resources (5min buckets, last hour)
SELECT * FROM dolphin.v_system_stats_1h ORDER BY bucket;

-- Full causal chain for a scan
SELECT event_type, ts, detail, value1, value2
FROM dolphin.v_scan_causal_chain
WHERE trace_id = '<scan_uuid>'
ORDER BY ts;

-- Scans that preceded losing trades
SELECT e.scan_number, e.vel_div, t.asset, t.pnl, t.exit_reason
FROM dolphin.trade_events t
JOIN dolphin.eigen_scans e ON e.scan_uuid = t.scan_uuid
WHERE t.pnl < 0 AND t.exit_price > 0
ORDER BY t.pnl ASC LIMIT 20;
```
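The causal-chain query pastes the scan UUID into the SQL text. When running it from a script instead of CH Play, ClickHouse's HTTP interface supports query parameters (a `{name:Type}` placeholder in the SQL plus a `param_name` URL argument), which avoids quoting problems. A minimal stdlib sketch, assuming the HTTP host and port from the header, the `dolphin` user, and that `trace_id` is a String column; `causal_chain_url` and `fetch_rows` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

CH_URL = "http://100.105.170.6:8123/"  # HTTP port from the header above

CAUSAL_CHAIN_SQL = """
SELECT event_type, ts, detail, value1, value2
FROM dolphin.v_scan_causal_chain
WHERE trace_id = {trace_id:String}
ORDER BY ts
FORMAT JSONEachRow
"""


def causal_chain_url(scan_uuid: str, base: str = CH_URL) -> str:
    """Build a GET URL; the UUID travels as a bound parameter, not as SQL text."""
    params = {"query": CAUSAL_CHAIN_SQL, "param_trace_id": scan_uuid}
    return base + "?" + urllib.parse.urlencode(params)


def fetch_rows(url: str, user: str, password: str) -> List[Dict[str, object]]:
    """Run the query and parse one JSON object per output line."""
    req = urllib.request.Request(
        url, headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return [json.loads(line) for line in resp.read().decode().splitlines() if line]
```

Usage would look like `fetch_rows(causal_chain_url(scan_uuid), "dolphin", password)`; a plain GET is enough because the statement is a read-only SELECT.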

Files

| File | Purpose |
|---|---|
| prod/ch_writer.py | Shared singleton: `from ch_writer import ch_put, ts_us, uuid7` |
| prod/system_stats_service.py | /proc poller, runs under supervisord:system_stats |
| prod/supervisord_ch_listener.py | supervisord eventlistener |
| prod/ng_otel_writer.py (on NG7) | OTel drop-in for remote machines |
| prod/clickhouse/config.xml | CH server config (40% RAM cap, async_insert) |
| prod/clickhouse/users.xml | dolphin user, wait_for_async_insert=0 |
| prod/otelcol/config.yaml | OTel Collector → dolphin.otel_* |
| /root/ch-setup/schema.sql | Full DDL (idempotent, re-runnable) |
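supervisord talks to an eventlistener over stdin/stdout: the listener prints `READY`, reads a header line of `key:value` tokens, reads exactly `len` bytes of payload, and acknowledges with `RESULT 2\nOK`. A minimal sketch of that loop for `PROCESS_STATE_*` events, not the contents of supervisord_ch_listener.py: `emit` is a hypothetical callback (in production it might wrap `ch_put("supervisord_state", ...)` with whatever columns the real listener writes), `max_events` exists only to make the loop testable, and the 60 s poll is not covered.

```python
import sys
from typing import Callable, Dict, Optional, TextIO


def parse_tokens(line: str) -> Dict[str, str]:
    """Parse supervisor's space-separated key:value token format."""
    return dict(tok.split(":", 1) for tok in line.split())


def listen(emit: Callable[[str, Dict[str, str]], None],
           stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout,
           max_events: Optional[int] = None) -> None:
    """Eventlistener loop: READY -> header -> payload -> RESULT, repeated."""
    handled = 0
    while max_events is None or handled < max_events:
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:
            break  # supervisord closed the pipe
        header = parse_tokens(header_line)
        # `len` counts bytes; reading characters is equivalent for ASCII payloads.
        payload = stdin.read(int(header["len"]))
        if header["eventname"].startswith("PROCESS_STATE_"):
            emit(header["eventname"], parse_tokens(payload))
        stdout.write("RESULT 2\nOK")  # acknowledge so supervisord sends the next event
        stdout.flush()
        handled += 1
```

Anything else written to stdout corrupts the protocol, so a listener must send its own logging to stderr.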

Credentials

  • User: dolphin / dolphin_ch_2026
  • OTel DSN: http://dolphin_uptrace_token@100.105.170.6:14318/1 (if Uptrace is ever deployed)

Pending (when DolphinActor is wired)

  • trade_events: add ch_put("trade_events", {...}) at entry and at exit
  • obf_fast_intrade: add in the OBF 100 ms tick, only when n_open_positions > 0
  • account_events: write from the STARTUP/SHUTDOWN/END_DAY hooks
  • daily_pnl: write at end of day in paper_trade_flow / nautilus_prefect_flow
  • See prod/service_integration.py for exact copy-paste snippets
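A sketch of the two trade-path hooks, written against the `ch_put(table, row)` call shape used for `trade_events`. The put function is passed in so the sketch runs outside prod/; in the actor it would be `ch_writer.ch_put`. Column names `scan_uuid`, `asset`, `pnl`, `exit_price`, and `exit_reason` come from the losing-trades query; `ts`, `side`, `entry_price`, and the OBF snapshot fields are hypothetical. It also assumes entry rows carry `exit_price = 0`, which is what the `t.exit_price > 0` filter in that query suggests. prod/service_integration.py remains the reference for the real snippets.

```python
import time
from typing import Callable, Dict, Optional

Put = Callable[[str, Dict[str, object]], None]


def _now_us() -> int:
    return time.time_ns() // 1_000  # microseconds since epoch (hypothetical ts column)


def record_entry(put: Put, scan_uuid: str, asset: str, side: str, entry_price: float) -> None:
    # exit_price = 0 marks an entry row, so `t.exit_price > 0` selects exits only.
    put("trade_events", {"ts": _now_us(), "scan_uuid": scan_uuid, "asset": asset,
                         "side": side, "entry_price": entry_price,
                         "exit_price": 0.0, "pnl": 0.0, "exit_reason": ""})


def record_exit(put: Put, scan_uuid: str, asset: str, exit_price: float,
                pnl: float, exit_reason: str) -> None:
    put("trade_events", {"ts": _now_us(), "scan_uuid": scan_uuid, "asset": asset,
                         "exit_price": exit_price, "pnl": pnl, "exit_reason": exit_reason})


def on_obf_tick(put: Put, n_open_positions: int, scan_uuid: Optional[str],
                snapshot: Dict[str, object]) -> bool:
    """Write one obf_fast_intrade row per 100 ms tick, but only while in a trade."""
    if n_open_positions <= 0:
        return False
    put("obf_fast_intrade", {"ts": _now_us(), "scan_uuid": scan_uuid, **snapshot})
    return True
```

Carrying the originating scan_uuid on every row is what lets the trade and OBF rows join back to eigen_scans in the causal-chain queries.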