# ClickHouse Observability Layer

**Deployed:** 2026-04-06
**CH Version:** 24.3-alpine
**Ports:** HTTP :8123, Native :9000
**OTel Collector:** OTLP gRPC :4317 / HTTP :4318
**Play UI:** http://100.105.170.6:8123/play

---

## Architecture

```
Dolphin services → ch_put() → ch_writer.py (async batch) → dolphin-clickhouse:8123
NG7 laptop       → ng_otel_writer.py (OTel SDK) → dolphin-otelcol:4317 → dolphin-clickhouse
/proc poller     → system_stats_service.py → dolphin.system_stats
supervisord      → supervisord_ch_listener.py (eventlistener) → dolphin.supervisord_state
```

All writes are **fire-and-forget**: `ch_writer` batches rows in a background thread and silently drops them when its queue is full, so the OBF hot loop (100ms) is never blocked.
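The fire-and-forget behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `ch_writer.py`: the class name `FireAndForgetWriter`, the injected `flush_fn`, the queue and batch sizes, and the `dropped` counter are all assumptions made for the example. The point it shows is that `put()` uses a non-blocking enqueue, so a caller such as the OBF loop pays only for a queue insert, while a daemon thread groups rows into batches and discards any flush error.

```python
import queue
import threading
import time


class FireAndForgetWriter:
    """Hypothetical stand-in for ch_writer: bounded queue plus background flusher."""

    def __init__(self, flush_fn, max_queue=10_000, batch_size=500, flush_interval_s=1.0):
        self._q = queue.Queue(maxsize=max_queue)
        self._flush_fn = flush_fn            # e.g. an HTTP INSERT; injected so the sketch runs offline
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self.dropped = 0                     # assumption: a counter added only for this sketch
        threading.Thread(target=self._run, daemon=True).start()

    def put(self, table, row):
        """Never blocks: when the queue is full the row is dropped without raising."""
        try:
            self._q.put_nowait((table, row))
        except queue.Full:
            self.dropped += 1

    def _run(self):
        while True:
            batch = []
            deadline = time.monotonic() + self._flush_interval_s
            # Collect rows until the batch is full or the flush interval elapses.
            while len(batch) < self._batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self._q.get(timeout=timeout))
                except queue.Empty:
                    break
            if batch:
                try:
                    self._flush_fn(batch)
                except Exception:
                    pass  # fire-and-forget: a failed flush is discarded, not retried
```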

---

## Tables

| Table | Source | Rate | Retention |
|---|---|---|---|
| `eigen_scans` | nautilus_event_trader | ~8/min | 10yr |
| `posture_events` | meta_health_service_v3 | few/day | forever |
| `acb_state` | acb_processor_service | ~5/day | forever |
| `daily_pnl` | paper_trade_flow | 1/day | forever |
| `trade_events` | DolphinActor (pending) | ~40/day | 10yr |
| `obf_universe` | obf_universe_service | 540/min | forever |
| `obf_fast_intrade` | DolphinActor (pending) | 100ms×assets | 5yr |
| `exf_data` | exf_fetcher_flow | ~1/min | forever |
| `meta_health` | meta_health_service_v3 | ~1/10s | forever |
| `account_events` | DolphinActor (pending) | rare | forever |
| `supervisord_state` | supervisord_ch_listener | push+60s poll | forever |
| `system_stats` | system_stats_service | 1/30s | forever |

OTel tables (`otel_logs`, `otel_traces`, `otel_metrics_*`) are auto-created by the collector for NG7 instrumentation.

---

## Distributed Trace ID

`scan_uuid` (UUIDv7) is the causal trace root across all tables:

```
eigen_scans.scan_uuid ← NG7 generates one per scan
│
├── obf_fast_intrade.scan_uuid (100ms OBF while in-trade)
├── trade_events.scan_uuid (entry + exit rows)
└── posture_events.scan_uuid (if scan triggered posture re-eval)
```

**NG7 migration:** replace `uuid.uuid4()` with `uuid7()` from `ch_writer.py`. The result has the same String format, so it is a drop-in change.
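To make the drop-in claim concrete, here is a minimal sketch of how a UUIDv7 string can be built. It is an illustration only: the function name `uuid7_sketch` is hypothetical and the real `uuid7()` in `ch_writer.py` may be implemented differently. It shows that the result has the same 36-character layout as `str(uuid.uuid4())`, and that the leading 48 bits carry a Unix millisecond timestamp, so values generated in later milliseconds sort after earlier ones.

```python
import os
import time
import uuid


def uuid7_sketch() -> str:
    """Assumed layout: 48-bit Unix ms timestamp, version 7, 12 random bits, variant, 62 random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (ts_ms & ((1 << 48) - 1)) << 80        # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version nibble = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return str(uuid.UUID(int=value))


# Same textual shape as a UUIDv4, so a String column accepts either.
example_v4 = str(uuid.uuid4())
example_v7 = uuid7_sketch()
```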

---

## Key Queries (CH Play)

```sql
-- Current system state
SELECT * FROM dolphin.v_current_posture;

-- Scan latency last hour
SELECT * FROM dolphin.v_scan_latency_1h;

-- Trade summary last 30 days
SELECT * FROM dolphin.v_trade_summary_30d;

-- Process health
SELECT * FROM dolphin.v_process_health;

-- System resources (5min buckets, last hour)
SELECT * FROM dolphin.v_system_stats_1h ORDER BY bucket;

-- Full causal chain for a scan
SELECT event_type, ts, detail, value1, value2
FROM dolphin.v_scan_causal_chain
WHERE trace_id = '<scan_uuid>'
ORDER BY ts;

-- Scans that preceded losing trades
SELECT e.scan_number, e.vel_div, t.asset, t.pnl, t.exit_reason
FROM dolphin.trade_events t
JOIN dolphin.eigen_scans e ON e.scan_uuid = t.scan_uuid
WHERE t.pnl < 0 AND t.exit_price > 0
ORDER BY t.pnl ASC LIMIT 20;
```
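The same queries can also be run outside the Play UI through ClickHouse's HTTP interface on port 8123. The sketch below is one possible way to do that with only the standard library. The helper names `build_request` and `run_query` are hypothetical, and the caller is assumed to supply the user and password from the Credentials section. It passes the scan UUID as a server-side query parameter (`param_trace_id` with a `{trace_id:String}` placeholder) instead of pasting it into the SQL text, and asks for `JSONEachRow` so each result line parses as one JSON object. Calling `run_query(CAUSAL_CHAIN_SQL, user, password, {"trace_id": some_scan_uuid})` would then return a list of dicts, assuming the server is reachable from the calling machine.

```python
import json
import urllib.parse
import urllib.request

CH_URL = "http://100.105.170.6:8123/"  # HTTP endpoint listed at the top of this document

CAUSAL_CHAIN_SQL = """
SELECT event_type, ts, detail, value1, value2
FROM dolphin.v_scan_causal_chain
WHERE trace_id = {trace_id:String}
ORDER BY ts
"""


def build_request(sql, user, password, params=None):
    """Build (but do not send) a POST request for the ClickHouse HTTP interface."""
    query = {"default_format": "JSONEachRow"}
    for name, value in (params or {}).items():
        query[f"param_{name}"] = value          # bound to {name:Type} placeholders in the SQL
    url = CH_URL + "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),
        method="POST",
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )


def run_query(sql, user, password, params=None, timeout=10):
    """Send the query and return one dict per result row."""
    request = build_request(sql, user, password, params)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return [json.loads(line) for line in body.splitlines() if line]
```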

---

## Files

| File | Purpose |
|---|---|
| `prod/ch_writer.py` | Shared singleton: `from ch_writer import ch_put, ts_us, uuid7` |
| `prod/system_stats_service.py` | /proc poller, runs under supervisord:system_stats |
| `prod/supervisord_ch_listener.py` | supervisord eventlistener |
| `prod/ng_otel_writer.py` (on NG7) | OTel drop-in for remote machines |
| `prod/clickhouse/config.xml` | CH server config (40% RAM cap, async_insert) |
| `prod/clickhouse/users.xml` | dolphin user, wait_for_async_insert=0 |
| `prod/otelcol/config.yaml` | OTel Collector → dolphin.otel_* |
| `/root/ch-setup/schema.sql` | Full DDL (idempotent, re-runnable) |

---

## Credentials

- User: `dolphin` / `dolphin_ch_2026`
- OTel DSN: `http://dolphin_uptrace_token@100.105.170.6:14318/1` (if Uptrace is ever deployed)

---

## Pending (when DolphinActor is wired)

- `trade_events`: add `ch_put("trade_events", {...})` at entry and exit
- `obf_fast_intrade`: add in the OBF 100ms tick (only when `n_open_positions > 0`)
- `account_events`: STARTUP/SHUTDOWN/END_DAY hooks
- `daily_pnl`: end-of-day in paper_trade_flow / nautilus_prefect_flow
- See `prod/service_integration.py` for exact copy-paste snippets