# ClickHouse Observability Layer

- **Deployed:** 2026-04-06
- **CH Version:** 24.3-alpine
- **Ports:** HTTP :8123, Native :9000
- **OTel Collector:** OTLP gRPC :4317 / HTTP :4318
- **Play UI:** http://100.105.170.6:8123/play

---

## Architecture

```
Dolphin services → ch_put() → ch_writer.py (async batch) → dolphin-clickhouse:8123
NG7 laptop       → ng_otel_writer.py (OTel SDK) → dolphin-otelcol:4317 → dolphin-clickhouse
/proc poller     → system_stats_service.py → dolphin.system_stats
supervisord      → supervisord_ch_listener.py (eventlistener) → dolphin.supervisord_state
```

All writes are **fire-and-forget**: `ch_writer` batches rows in a background thread and silently drops them when its queue is full, so the OBF hot loop (100 ms) is never blocked.

---

## Tables

| Table | Source | Rate | Retention |
|---|---|---|---|
| `eigen_scans` | nautilus_event_trader | ~8/min | 10 yr |
| `posture_events` | meta_health_service_v3 | a few/day | forever |
| `acb_state` | acb_processor_service | ~5/day | forever |
| `daily_pnl` | paper_trade_flow | 1/day | forever |
| `trade_events` | DolphinActor (pending) | ~40/day | 10 yr |
| `obf_universe` | obf_universe_service | 540/min | forever |
| `obf_fast_intrade` | DolphinActor (pending) | every 100 ms × assets | 5 yr |
| `exf_data` | exf_fetcher_flow | ~1/min | forever |
| `meta_health` | meta_health_service_v3 | ~1/10 s | forever |
| `account_events` | DolphinActor (pending) | rare | forever |
| `supervisord_state` | supervisord_ch_listener | push + 60 s poll | forever |
| `system_stats` | system_stats_service | 1/30 s | forever |

The OTel tables (`otel_logs`, `otel_traces`, `otel_metrics_*`) are auto-created by the collector for NG7 instrumentation.
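The fire-and-forget behaviour described under Architecture (background-thread batching, silent drop on a full queue) can be sketched with a bounded `queue.Queue` and `put_nowait`. This is a minimal sketch under stated assumptions, not the contents of `ch_writer.py`: the class name `FireAndForgetWriter`, the queue size, batch size, flush interval, and the JSONEachRow HTTP sender `clickhouse_http_sender` are all hypothetical.

```python
import json
import queue
import threading
import time
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical sender type: receives (table, newline-delimited JSON body).
Sender = Callable[[str, str], None]


def clickhouse_http_sender(table: str, body: str,
                           url: str = "http://localhost:8123/",
                           user: str = "dolphin", password: str = "") -> None:
    """Hypothetical sender: POST rows to ClickHouse's HTTP interface as JSONEachRow."""
    query = urllib.parse.urlencode(
        {"query": f"INSERT INTO dolphin.{table} FORMAT JSONEachRow"})
    req = urllib.request.Request(
        f"{url}?{query}", data=body.encode("utf-8"), method="POST",
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        resp.read()


class FireAndForgetWriter:
    """Sketch of a non-blocking batch writer (hypothetical; not ch_writer.py itself)."""

    def __init__(self, send: Sender, maxsize: int = 10_000,
                 batch_size: int = 500, flush_interval: float = 1.0) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=maxsize)
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self.dropped = 0  # best-effort counter of rows discarded on a full queue
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, table: str, row: dict) -> None:
        """Enqueue a row without ever blocking the caller."""
        try:
            self._q.put_nowait((table, row))
        except queue.Full:
            self.dropped += 1  # drop silently; the hot loop must not wait

    def _flush(self, batch: list) -> None:
        # Group rows per table and send one JSONEachRow body per table.
        by_table: dict = {}
        for table, row in batch:
            by_table.setdefault(table, []).append(json.dumps(row))
        for table, lines in by_table.items():
            try:
                self._send(table, "\n".join(lines))
            except Exception:
                pass  # fire-and-forget: a failed flush is discarded

    def _run(self) -> None:
        batch: list = []
        deadline = time.monotonic() + self._flush_interval
        # Keep running until stopped, then drain whatever is still queued.
        while not self._stop.is_set() or not self._q.empty():
            timeout = max(0.0, deadline - time.monotonic())
            try:
                batch.append(self._q.get(timeout=timeout))
            except queue.Empty:
                pass
            # Flush when the batch is full or the flush interval has elapsed.
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                if batch:
                    self._flush(batch)
                    batch = []
                deadline = time.monotonic() + self._flush_interval
        if batch:
            self._flush(batch)

    def close(self, timeout: float = 5.0) -> None:
        """Stop the worker after draining what is already queued."""
        self._stop.set()
        self._thread.join(timeout)
```

The `put` path never blocks: `put_nowait` raises `queue.Full` immediately, and the sketch counts and discards the row. Flushing by size or by elapsed time keeps the number of inserts low, which matches ClickHouse's preference for fewer, larger inserts.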
---

## Distributed Trace ID

`scan_uuid` (UUIDv7) is the causal trace root that links rows across the scan-related tables:

```
eigen_scans.scan_uuid             ← NG7 generates one per scan
  │
  ├── obf_fast_intrade.scan_uuid  (100ms OBF while in-trade)
  ├── trade_events.scan_uuid      (entry + exit rows)
  └── posture_events.scan_uuid    (if scan triggered posture re-eval)
```

**NG7 migration:** replace `uuid.uuid4()` with `uuid7()` from `ch_writer.py`. It returns the same String format, so the change is drop-in.

---

## Key Queries (CH Play)

```sql
-- Current system state
SELECT * FROM dolphin.v_current_posture;

-- Scan latency, last hour
SELECT * FROM dolphin.v_scan_latency_1h;

-- Trade summary, last 30 days
SELECT * FROM dolphin.v_trade_summary_30d;

-- Process health
SELECT * FROM dolphin.v_process_health;

-- System resources (5-min buckets, last hour)
SELECT * FROM dolphin.v_system_stats_1h ORDER BY bucket;

-- Full causal chain for a scan
SELECT event_type, ts, detail, value1, value2
FROM dolphin.v_scan_causal_chain
WHERE trace_id = '<scan_uuid>'  -- replace with a scan_uuid from eigen_scans
ORDER BY ts;

-- Scans that preceded losing trades
SELECT e.scan_number, e.vel_div, t.asset, t.pnl, t.exit_reason
FROM dolphin.trade_events t
JOIN dolphin.eigen_scans e ON e.scan_uuid = t.scan_uuid
WHERE t.pnl < 0
  AND t.exit_price > 0  -- restrict to exit rows
ORDER BY t.pnl ASC
LIMIT 20;
```

---

## Files

| File | Purpose |
|---|---|
| `prod/ch_writer.py` | Shared singleton; import with `from ch_writer import ch_put, ts_us, uuid7` |
| `prod/system_stats_service.py` | `/proc` poller; runs under supervisord as `system_stats` |
| `prod/supervisord_ch_listener.py` | supervisord eventlistener |
| `prod/ng_otel_writer.py` (on NG7) | OTel drop-in for remote machines |
| `prod/clickhouse/config.xml` | CH server config (40% RAM cap, `async_insert`) |
| `prod/clickhouse/users.xml` | `dolphin` user, `wait_for_async_insert=0` |
| `prod/otelcol/config.yaml` | OTel Collector config; exports to `dolphin.otel_*` |
| `/root/ch-setup/schema.sql` | Full DDL; idempotent and safe to re-run |

---

## Credentials

- User: `dolphin` / `dolphin_ch_2026`
- OTel DSN: `http://dolphin_uptrace_token@100.105.170.6:14318/1` (if Uptrace is ever deployed)

---

## Pending (when DolphinActor is wired)

- `trade_events`: add `ch_put("trade_events", {...})` at entry and at exit
- `obf_fast_intrade`: add in the OBF 100ms tick (only when `n_open_positions > 0`)
- `account_events`: STARTUP/SHUTDOWN/END_DAY hooks
- `daily_pnl`: end-of-day write in paper_trade_flow / nautilus_prefect_flow
- See `prod/service_integration.py` for exact copy-paste snippets.
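The Distributed Trace ID scheme relies on UUIDv7 placing a millisecond Unix timestamp in its top 48 bits, so IDs sort roughly by creation time while keeping the same 36-character string form as `uuid4()`. A minimal sketch of that RFC 9562 layout follows; `uuid7_sketch` and `uuid7_unix_ms` are hypothetical names, not the actual `ch_writer.uuid7`, whose implementation this README does not show.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> str:
    """Return a UUIDv7 string (RFC 9562 layout); hypothetical stand-in for ch_writer.uuid7."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits
    value = (
        (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: 48-bit Unix ms timestamp
        | 0x7 << 76                        # bits 79..76: version 7
        | rand_a << 64                     # bits 75..64: 12 random bits
        | 0b10 << 62                       # bits 63..62: RFC variant
        | rand_b                           # bits 61..0: 62 random bits
    )
    return str(uuid.UUID(int=value))


def uuid7_unix_ms(u: str) -> int:
    """Recover the millisecond timestamp embedded in a UUIDv7 string."""
    return uuid.UUID(u).int >> 80
```

Because the timestamp occupies the most significant bits, two IDs from different milliseconds compare in creation order both numerically and as lowercase hex strings. Within a single millisecond this sketch orders IDs randomly; RFC 9562 describes optional counter-based methods if strict monotonicity is needed.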