diff --git a/prod/docs/SYSTEM_BIBLE.md b/prod/docs/SYSTEM_BIBLE.md
index 6111820..cf32919 100644
--- a/prod/docs/SYSTEM_BIBLE.md
+++ b/prod/docs/SYSTEM_BIBLE.md
@@ -1362,19 +1362,83 @@ supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.con
 
 ### 16.10 Daemon Start Sequence
 
+**IMPORTANT**: supervisord has NO systemd unit — it is NOT auto-started on reboot.
+After any reboot or OOM kill, supervisord must be started manually (step 2 below).
+
+```bash
+# 1. Verify Hazelcast/Prefect are running (systemd-managed, survive reboots)
+systemctl status dolphin-prefect-worker
+
+# 2. Start supervisord (MUST export DOLPHIN_LOG_ROOT — used by logfile= directives)
+mkdir -p /tmp/dolphin_logs/supervisor /tmp/dolphin_logs/trader
+DOLPHIN_LOG_ROOT=/tmp/dolphin_logs supervisord \
+    -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf
+# dolphin_data group (OBF, ACB, MHS, exf, maras, esof) starts automatically
+
+# 3. Verify data pipeline is up
+supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status
+
+# 4. Start BLUE (manual — autostart=false; only start after verifying BingX position state)
+supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf \
+    start dolphin:nautilus_trader
+
+# 5. Prefect deployments run on schedule (daily):
+#    paper_trade_flow.py ← 00:05 UTC
+#    nautilus_prefect_flow ← 00:10 UTC
 ```
-1. docker-compose up -d ← Hazelcast 5701, ManCenter 8080, Prefect 4200
-2. supervisord (auto) ← starts dolphin_data group automatically on boot
-   └── exf_fetcher, acb_processor, obf_universe, meta_health start in parallel
 
-3. (Manual when needed):
-   supervisorctl start dolphin:nautilus_trader ← HZ entry listener
-   supervisorctl start dolphin:scan_bridge ← when DolphinNG6 active
+### 16.12 CRITICAL — supervisord.conf Safety Rules
 
-4. Prefect deployments (daily, scheduled):
-   paper_trade_flow.py ← 00:05 UTC
-   nautilus_prefect_flow.py ← 00:10 UTC
-   mc_forewarner_flow.py ← daily
+**RULE 1 — Never use /tmp paths for trader binaries.**
+`/tmp` is writable but survives reboots on this host (not a tmpfs). However, directories
+created by agents (e.g. `/tmp/blue_runtime_mirror/`) may be manually cleaned or never
+recreated after an OOM kill, leaving supervisord unable to start the process.
+**Canonical paths for all trader programs MUST reference `/mnt/dolphinng5_predict/`.**
+
+**RULE 2 — nautilus_trader correct config (BLUE live mainnet):**
+```ini
+command=/home/dolphin/siloqy_env/bin/python3 /mnt/dolphinng5_predict/prod/nautilus_event_trader.py
+directory=/mnt/dolphinng5_predict/prod
+environment=PYTHONPATH="/mnt/dolphinng5_predict:/mnt/dolphinng5_predict/nautilus_dolphin:/mnt/dolphinng5_predict/prod",DOLPHIN_LOCAL_RUNTIME_ROOT="/mnt/dolphinng5_predict",...
+```
+
+**RULE 3 — OBF starvation → BLUE freeze.**
+If `obf_universe` dies and is not restarted, BLUE logs
+`"OBF step_live: no snapshots for N consecutive bars — OBF gate degraded to random"`.
+After ~60 bars (~10 min) without OBF the survival stack degrades to TURTLE/HIBERNATE,
+blocking new ENTERs. The existing open position stays open in RETRACT. Fix: restart
+supervisord (which brings OBF up), then restart `dolphin:nautilus_trader`.
+
+**RULE 4 — Before restarting BLUE after a gap, check for open positions.**
+If BLUE was in RETRACT when it died, there MAY be an open position on live BingX mainnet.
+Check `/tmp/dolphin_capital_checkpoint.json` (last capital) and `/tmp/nautilus_trader.log`
+(last V7 decision) for context, but verify directly on BingX before restarting.
+
+### 16.13 OOM Recovery Runbook (post-reboot)
+
+```bash
+# Confirm nothing is running
+supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status 2>&1 || echo "supervisord down — need to start"
+
+# Check BLUE's last known state
+tail -5 /tmp/nautilus_trader.log
+cat /tmp/dolphin_capital_checkpoint.json
+
+# Restart supervisord (data pipeline only — do NOT auto-start BLUE)
+mkdir -p /tmp/dolphin_logs/supervisor /tmp/dolphin_logs/trader
+DOLPHIN_LOG_ROOT=/tmp/dolphin_logs supervisord \
+    -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf
+
+# Give services 15s to reach RUNNING state
+sleep 15
+supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status
+
+# Verify OBF is connected (look for "subscribed" in first 30 lines of log)
+head -30 /tmp/dolphin_logs/supervisor/obf_universe-error.log
+
+# Only then, start BLUE after manually confirming BingX position state
+supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf \
+    start dolphin:nautilus_trader
 ```
 
 ### 16.11 Monitoring Endpoints