DOLPHIN/prod/docs/EXTF_PROD_BRINGUP.md
hjnormey 01c19662cb initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
2026-04-21 16:58:38 +02:00


DOLPHIN Paper Trading — Production Bringup Guide

Purpose: Step-by-step ops guide for standing up the Prefect + Hazelcast paper trading stack.
Audience: Operations agent or junior dev. No research decisions required.
State as of: 2026-03-06
Assumes: Windows 11, Docker Desktop installed, Siloqy venv exists at C:\Users\Lenovo\Documents\- Siloqy\


Architecture Overview

[ARB512 Scanner] ─► eigenvalues/YYYY-MM-DD/ ─► [paper_trade_flow.py]
                                                        |
                                           [NDAlphaEngine (Python)]
                                                        |
                                         ┌──────────────┴──────────────┐
                                    [Hazelcast IMap]            [paper_logs/*.jsonl]
                                         |
                                   [Prefect UI :4200]
                                   [HZ-MC UI   :8080]

Components:

  • docker-compose.yml: Hazelcast 5.3 (port 5701) + HZ Management Center (port 8080) + Prefect Server (port 4200)
  • paper_trade_flow.py: Prefect flow, runs daily at 00:05 UTC
  • configs/blue.yml: Champion SHORT config (frozen, production)
  • configs/green.yml: Bidirectional config (STATUS: PENDING — LONG validation still in progress)
  • Python venv: C:\Users\Lenovo\Documents\- Siloqy\

Data flow: Prefect triggers daily → reads yesterday's Arrow/NPZ scans from eigenvalues dir → NDAlphaEngine processes → writes P&L to Hazelcast IMap + local JSONL log.


Step 1: Prerequisites Check

Open a terminal (Git Bash or PowerShell).

# 1a. Verify Docker Desktop is installed
docker --version
# Expected: Docker version 29.x.x

# 1b. Verify Python venv
"/c/Users/Lenovo/Documents/- Siloqy/Scripts/python.exe" --version
# Expected: Python 3.11.x or 3.12.x

# 1c. Verify working directories exist
ls "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod/"
# Expected: configs/  docker-compose.yml  paper_trade_flow.py  BRINGUP_GUIDE.md

ls "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod/configs/"
# Expected: blue.yml  green.yml

Step 2: Install Python Dependencies

Run once. Takes ~2-5 minutes.

"/c/Users/Lenovo/Documents/- Siloqy/Scripts/pip.exe" install \
    hazelcast-python-client \
    prefect \
    pyyaml \
    pyarrow \
    numpy \
    pandas

Verify:

"/c/Users/Lenovo/Documents/- Siloqy/Scripts/python.exe" -c "import hazelcast, prefect, yaml, pyarrow, numpy, pandas; print('OK')"

Step 3: Start Docker Desktop

Docker Desktop must be running before starting containers.

Option A (GUI): Double-click Docker Desktop from Start menu. Wait for the whale icon in the system tray to stop animating (~30-60 seconds).

Option B (PowerShell):

Start-Process "C:\Program Files\Docker\Docker\Docker Desktop.exe"
# Wait ~60 seconds, then verify:
docker ps

Verify Docker is ready:

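docker info | grep "Server Version"
# (Git Bash; in PowerShell use: docker info | Select-String "Server Version")
# Expected: Server Version: 27.x.x (exact version may differ from this example)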

Step 4: Start the Infrastructure Stack

cd "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod"
docker compose up -d

Expected output:

[+] Running 3/3
 - Container dolphin-hazelcast     Started
 - Container dolphin-hazelcast-mc  Started
 - Container dolphin-prefect       Started

Verify all containers healthy:

docker compose ps
# All 3 should show "healthy" or "running"

Wait ~30 seconds for Hazelcast to initialize, then verify:

curl http://localhost:5701/hazelcast/health/ready
# Expected: {"message":"Hazelcast is ready!"}

curl http://localhost:4200/api/health
# Expected: {"status":"healthy"}
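Because Hazelcast can take ~30 seconds to initialize, the two curl checks above can be wrapped in a retry loop instead of a fixed wait. A minimal standard-library sketch, assuming a service counts as ready once its health URL answers HTTP 200; the function names and timeouts are illustrative and not part of the prod scripts:

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """True if GET `url` answers HTTP 200 within 5 s (assumed readiness criterion)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status such as 503
        return False


def wait_until_healthy(url: str, timeout_s: float = 90.0, interval_s: float = 5.0,
                       check: Callable[[str], bool] = http_ok) -> bool:
    """Poll `url` until `check` succeeds; give up once `timeout_s` would be exceeded."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check(url):
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

Usage: call `wait_until_healthy("http://localhost:5701/hazelcast/health/ready")` and then `wait_until_healthy("http://localhost:4200/api/health")`; the `check` parameter exists so the loop can be exercised without a running stack.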

UIs:

  • Prefect UI: http://localhost:4200
  • Hazelcast Management Center (HZ-MC): http://localhost:8080

Step 5: Register Prefect Deployments

Run once to register the blue and green scheduled deployments.

cd "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod"
"/c/Users/Lenovo/Documents/- Siloqy/Scripts/python.exe" paper_trade_flow.py --register

Expected output:

Registered: dolphin-paper-blue
Registered: dolphin-paper-green

Verify in Prefect UI: http://localhost:4200 → Deployments → should show 2 deployments with CronSchedule "5 0 * * *".


Step 6: Start the Prefect Worker

The Prefect worker polls for scheduled runs. Run in a separate terminal (keep it open, or run as a service).

"/c/Users/Lenovo/Documents/- Siloqy/Scripts/prefect.exe" worker start --pool "dolphin"

OR (if prefect CLI not in PATH):

"/c/Users/Lenovo/Documents/- Siloqy/Scripts/python.exe" -m prefect worker start --pool "dolphin"

Leave this terminal running. It will pick up the 00:05 UTC scheduled runs.


Step 7: Manual Test Run

Before relying on the schedule, test with a known good date (a date that has scan data).

cd "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod"
"/c/Users/Lenovo/Documents/- Siloqy/Scripts/python.exe" paper_trade_flow.py \
    --date 2026-03-05 \
    --config configs/blue.yml

Expected output (abbreviated):

=== BLUE paper trade: 2026-03-05 ===
  Loaded N scans for 2026-03-05 | cols=XX
  2026-03-05: PnL=+XX.XX  T=X  boost=1.XXx  MC=OK
  HZ write OK → DOLPHIN_PNL_BLUE[2026-03-05]
=== DONE: blue 2026-03-05 | PnL=+XX.XX | Capital=25,XXX.XX ===

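Verify data written to Hazelcast: open http://localhost:8080 → Maps → DOLPHIN_PNL_BLUE and confirm there is an entry for key 2026-03-05.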

Verify log file written:

ls "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod/paper_logs/blue/"
cat "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod/paper_logs/blue/paper_pnl_2026-03.jsonl"
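To check the log for a specific day without reading the whole file by eye, a small helper can scan the monthly JSONL. This is a sketch under an assumption: each line is a JSON object carrying its trading day in a field called `date` (a hypothetical name; inspect an actual line of paper_pnl_YYYY-MM.jsonl and adjust `date_field`):

```python
import json
from typing import Iterable, Optional


def find_day_entry(lines: Iterable[str], day: str, date_field: str = "date") -> Optional[dict]:
    """Return the last JSON record whose `date_field` equals `day`, or None.

    `date_field` is an assumed field name, not confirmed by this guide.
    """
    found = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get(date_field) == day:
            found = record  # a re-run for the same day overrides the earlier match
    return found
```

Usage: pass an open file handle, e.g. `find_day_entry(open(path, encoding="utf-8"), "2026-03-05")`; a result of `None` means the run for that date did not log.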

Step 8: Scan Data Source Verification

The flow reads scan files from:

C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues\YYYY-MM-DD\

Each date directory should contain scan_*__Indicators.npz or scan_*.arrow files.

ls "/c/Users/Lenovo/Documents/- Dolphin NG HD (NG3)/correlation_arb512/eigenvalues/" | tail -5
# Expected: recent date directories like 2026-03-05, 2026-03-04, etc.

ls "/c/Users/Lenovo/Documents/- Dolphin NG HD (NG3)/correlation_arb512/eigenvalues/2026-03-05/"
# Expected: scan_NNNN__Indicators.npz files

If a date directory is missing, the flow logs a warning and writes pnl=0 for that day (non-critical).
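The same check can be scripted before a run. A minimal sketch, assuming "yesterday" means the previous UTC calendar day (consistent with the 00:05 UTC schedule); the flow's own date logic may differ, and both function names are illustrative:

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# File patterns named in this step
SCAN_PATTERNS = ("scan_*__Indicators.npz", "scan_*.arrow")


def target_scan_date(now: Optional[datetime] = None) -> str:
    """Previous UTC calendar day, i.e. the directory a 00:05 UTC run would read (assumed)."""
    now = now or datetime.now(timezone.utc)
    return (now.astimezone(timezone.utc).date() - timedelta(days=1)).isoformat()


def count_scan_files(eigen_root: Path, day: str) -> int:
    """Number of scan files under eigen_root/day; 0 if the directory is missing."""
    day_dir = eigen_root / day
    if not day_dir.is_dir():
        return 0
    return sum(1 for pattern in SCAN_PATTERNS for _ in day_dir.glob(pattern))
```

Usage: `count_scan_files(Path(r"C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues"), target_scan_date())`. A result of 0 corresponds to the pnl=0 case described above. Converting to UTC first matters on a machine running at +01:00/+02:00, where local midnight and UTC midnight fall on different sides of the boundary.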


Step 9: Daily Operations

Normal daily flow (automated):

  1. ARB512 scanner (extended_main.py) writes scans to eigenvalues/YYYY-MM-DD/ throughout the day
  2. At 00:05 UTC, Prefect triggers dolphin-paper-blue and dolphin-paper-green
  3. Each flow reads yesterday's scans, runs the engine, writes to HZ + JSONL log
  4. Monitor via Prefect UI and HZ-MC

Check today's run result:

# Latest P&L log entry:
tail -1 "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod/paper_logs/blue/paper_pnl_$(date +%Y-%m).jsonl"

Check HZ state:

  • http://localhost:8080 → Maps → DOLPHIN_STATE_BLUE → key "latest"
  • Should show: {"capital": XXXXX, "strategy": "blue", "last_date": "YYYY-MM-DD", ...}

Step 10: Restart After Reboot

After Windows restarts:

# 1. Start Docker Desktop (GUI or command — see Step 3)

# 2. Restart containers
cd "/c/Users/Lenovo/Documents/- DOLPHIN NG HD HCM TSF Predict/prod"
docker compose up -d

# 3. Restart Prefect worker (in a dedicated terminal)
"/c/Users/Lenovo/Documents/- Siloqy/Scripts/python.exe" -m prefect worker start --pool "dolphin"

Deployments and HZ data persist (docker volumes: hz_data, prefect_data).


Troubleshooting

"No scan dir for YYYY-MM-DD"

  • The ARB512 scanner may not have run for that date
  • Check: ls "/c/Users/Lenovo/Documents/- Dolphin NG HD (NG3)/correlation_arb512/eigenvalues/YYYY-MM-DD/"
  • Non-critical: flow logs pnl=0 and continues

"HZ write failed (not critical)"

  • Hazelcast container not running or not yet healthy
  • Run: docker compose ps → check dolphin-hazelcast shows "healthy"
  • Run: docker compose restart hazelcast

"ModuleNotFoundError: No module named 'hazelcast'"

  • Dependencies not installed in Siloqy venv
  • Rerun Step 2

"error during connect: open //./pipe/dockerDesktopLinuxEngine"

  • Docker Desktop not running
  • Start Docker Desktop (see Step 3), wait 60 seconds, retry

Prefect worker not picking up runs

  • Verify worker is running with --pool "dolphin" (matches work_queue_name in deployments)
  • Check Prefect UI → Work Pools → should show "dolphin" pool as online

Green deployment errors on bidirectional config

  • Green is PENDING LONG validation. If direction: bidirectional causes engine errors, temporarily set green.yml direction: short_only until LONG system is validated.

Key File Locations

| File | Path |
|---|---|
| Prefect flow | prod/paper_trade_flow.py |
| Blue config | prod/configs/blue.yml |
| Green config | prod/configs/green.yml |
| Docker stack | prod/docker-compose.yml |
| Blue P&L logs | prod/paper_logs/blue/paper_pnl_YYYY-MM.jsonl |
| Green P&L logs | prod/paper_logs/green/paper_pnl_YYYY-MM.jsonl |
| Scan data source | C:\Users\Lenovo\Documents\- Dolphin NG HD (NG3)\correlation_arb512\eigenvalues\ |
| NDAlphaEngine | HCM\nautilus_dolphin\nautilus_dolphin\nautilus\esf_alpha_orchestrator.py |
| MC-Forewarner models | HCM\nautilus_dolphin\mc_results\models\ |

Current Status (2026-03-06)

| Item | Status |
|---|---|
| Docker stack | Built — needs Docker Desktop running |
| Python deps (HZ + Prefect) | Installing (pip background job) |
| Blue config | Frozen champion SHORT — ready |
| Green config | PENDING — LONG validation running (b79rt78uv) |
| Prefect deployments | Not yet registered (run Step 5 after deps install) |
| Manual test run | Not yet done (run Step 7) |
| vol_p60 calibration | Hardcoded 0.000099 (pre-calibrated from 55-day window) — acceptable |
| Engine state persistence | Implemented — engine capital and open positions serialize to Hazelcast STATE IMap |

Engine State Persistence

The NDAlphaEngine is instantiated fresh during each daily Prefect run, but its internal state is loaded from the Hazelcast DOLPHIN_STATE_BLUE/GREEN maps. Both capital and any active position spanning midnight are accurately tracked and restored.

Impact for paper trading: P&L and cumulative capital growth track correctly across days.
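The restore-then-save pattern described above can be sketched as follows. Assumptions, none confirmed by this guide: the state is a JSON string under the key "latest" (the key shown in Step 9), the `open_positions` field name is hypothetical, and the map object offers blocking `get`/`put` as a Hazelcast `client.get_map("DOLPHIN_STATE_BLUE").blocking()` proxy does. An in-memory stand-in keeps the sketch runnable without a cluster:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class EngineState:
    strategy: str
    capital: float
    last_date: str
    open_positions: List[Dict[str, Any]] = field(default_factory=list)  # hypothetical field


class InMemoryMap:
    """Stand-in for a Hazelcast blocking map (get/put only), for local testing."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def load_state(state_map: Any, strategy: str, initial_capital: float) -> EngineState:
    """Restore the last saved state, or start fresh if nothing was saved yet."""
    raw = state_map.get("latest")
    if raw is None:
        return EngineState(strategy=strategy, capital=initial_capital, last_date="")
    return EngineState(**json.loads(raw))


def save_state(state_map: Any, state: EngineState) -> None:
    """Serialize end-of-day capital and open positions for the next run."""
    state_map.put("latest", json.dumps(asdict(state)))
```

The engine object itself stays disposable: each daily run calls `load_state`, trades, then `save_state`, so a position opened before midnight is still present the next day.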


Guide written 2026-03-08. Status updated.


Appendix D: Live Operations Monitoring — DEV "Realized Slippage"

Purpose: Track whether ExF latency (~10ms) is causing unacceptable fill slippage vs backtest assumptions.

Background

  • Backtest friction assumptions: 8-10 bps round-trip (2bps entry + 2bps exit + fees)
  • ExF latency-induced drift: ~0.055 bps (normal vol), ~0.17 bps (high vol events)
  • Current Python implementation is sufficient (latency << friction assumptions)

Metric Definition

realized_slippage_bps = abs(fill_price - signal_price) / signal_price * 10000

Monitoring Thresholds

| Threshold | Action |
|---|---|
| < 2 bps | Nominal — within backtest assumptions |
| 2-5 bps | ⚠️ Watch — approaching friction limits |
| > 5 bps | 🚨 ALERT — investigate latency/market impact issues |
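The metric and the threshold table translate directly into code. A minimal sketch with hypothetical function names (not existing code in paper_trade_flow.py); it counts exactly 2 bps and exactly 5 bps as Watch, which is one reading of the "2-5 bps" row:

```python
def realized_slippage_bps(fill_price: float, signal_price: float) -> float:
    """abs(fill - signal) / signal * 10000, i.e. deviation in basis points."""
    if signal_price <= 0:
        raise ValueError("signal_price must be positive")
    return abs(fill_price - signal_price) / signal_price * 10_000


def slippage_level(bps: float) -> str:
    """Map a slippage value onto the monitoring thresholds (boundaries assumed inclusive for Watch)."""
    if bps < 2.0:
        return "nominal"
    if bps <= 5.0:
        return "watch"
    return "alert"
```

For example, a signal at 100.00 filled at 100.03 gives about 3 bps, which lands in the Watch band.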

Implementation Notes

  • Log signal_price (price at signal generation) vs fill_price (actual execution)
  • Track per-trade slippage in paper_logs
  • Alert if 24h moving average exceeds 5 bps
  • If consistently > 5 bps → escalate to Java/Chronicle Queue port for <100μs latency
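The "24h moving average exceeds 5 bps" rule can be sketched as an unweighted mean over trades inside a sliding time window. The class name is hypothetical, the window and threshold are the values from the notes above, and trades are assumed to be recorded in timestamp order:

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Optional, Tuple


class SlippageMonitor:
    """Rolling 24h mean of per-trade slippage (bps) with an alert threshold."""

    def __init__(self, window: timedelta = timedelta(hours=24), alert_bps: float = 5.0) -> None:
        self.window = window
        self.alert_bps = alert_bps
        self._trades: Deque[Tuple[datetime, float]] = deque()

    def add(self, ts: datetime, bps: float) -> None:
        """Record one trade; assumes trades arrive in timestamp order."""
        self._trades.append((ts, bps))

    def average(self, now: datetime) -> Optional[float]:
        """Mean over trades newer than now - window; None if there are none."""
        cutoff = now - self.window
        while self._trades and self._trades[0][0] <= cutoff:
            self._trades.popleft()  # drop trades that fell out of the window
        if not self._trades:
            return None
        return sum(b for _, b in self._trades) / len(self._trades)

    def should_alert(self, now: datetime) -> bool:
        avg = self.average(now)
        return avg is not None and avg > self.alert_bps
```

Averaging per trade (rather than per notional) is a simplifying choice here; a volume-weighted mean would stop a burst of small fills from dominating the figure.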

TODO

  • Add slippage tracking to paper_trade_flow.py trade logging
  • Create Grafana/Prefect alert for slippage > 5 bps
  • Document slippage post-trade analysis pipeline

Last updated: 2026-03-17


Appendix E: External Factors (ExF) System v2.0

Date: 2026-03-17
Purpose: Complete production guide for the External Factors real-time data pipeline
Components: exf_fetcher_flow.py, exf_persistence.py, exf_integrity_monitor.py

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         EXTERNAL FACTORS SYSTEM v2.0                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐   │
│  │  Data Providers  │     │  Data Providers  │     │  Data Providers  │   │
│  │  (Binance)       │     │  (Deribit)       │     │  (FRED/Macro)    │   │
│  │  - funding_btc   │     │  - dvol_btc      │     │  - vix           │   │
│  │  - basis         │     │  - dvol_eth      │     │  - dxy           │   │
│  │  - spread        │     │  - fund_dbt_btc  │     │  - us10y         │   │
│  │  - imbal_*       │     │                  │     │                  │   │
│  └────────┬─────────┘     └────────┬─────────┘     └────────┬─────────┘   │
│           │                        │                        │             │
│           └────────────────────────┼────────────────────────┘             │
│                                    ▼                                      │
│  ┌──────────────────────────────────────────────────────────────────┐    │
│  │           RealTimeExFService (28 indicators)                     │    │
│  │  - Per-indicator async polling at native rate                    │    │
│  │  - Rate limiting per provider (Binance 20/s, FRED 2/s, etc)      │    │
│  │  - In-memory cache with <1ms read latency                        │    │
│  │  - Daily history rotation for lag support                        │    │
│  └────────────────────────────────┬─────────────────────────────────┘    │
│                                   │                                       │
│           ┌───────────────────────┼───────────────────────┐              │
│           │                       │                       │              │
│           ▼                       ▼                       ▼              │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐        │
│  │  HOT PATH       │   │  OFF HOT PATH   │   │  MONITORING     │        │
│  │  (0.5s interval)│   │ (5 min interval)│   │  (60s interval) │        │
│  │                 │   │                 │   │                 │        │
│  │ Hazelcast       │   │ Disk Persistence│   │ Integrity Check │        │
│  │ DOLPHIN_FEATURES│   │ NPZ Format      │   │ HZ vs Disk      │        │
│  │ ['exf_latest']  │   │ /mnt/ng6_data/  │   │ Staleness Check │        │
│  │                 │   │ eigenvalues/    │   │ ACB Validation  │        │
│  │ Instant access  │   │ Durability      │   │ Alert on drift  │        │
│  │ for Alpha Engine│   │ for Backtests   │   │                 │        │
│  └─────────────────┘   └─────────────────┘   └─────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Component Reference

| Component | File | Purpose | Update rate |
|---|---|---|---|
| RealTimeExFService | realtime_exf_service.py | Fetches 28 indicators from 8 providers | Per-indicator native rate |
| ExF Fetcher Flow | exf_fetcher_flow.py | Prefect flow orchestrating HZ push | 0.5s (500ms) |
| ExF Persistence | exf_persistence.py | Disk writer (NPZ format) | 5 minutes |
| ExF Integrity Monitor | exf_integrity_monitor.py | Data validation & alerts | 60 seconds |

Indicators (28 Total)

| Category | Indicators | Count |
|---|---|---|
| Binance Derivatives | funding_btc, funding_eth, oi_btc, oi_eth, ls_btc, ls_eth, ls_top, taker, basis | 9 |
| Microstructure | imbal_btc, imbal_eth, spread | 3 |
| Deribit | dvol_btc, dvol_eth, fund_dbt_btc, fund_dbt_eth | 4 |
| Macro (FRED) | vix, dxy, us10y, sp500, fedfunds | 5 |
| Sentiment | fng | 1 |
| On-chain | hashrate | 1 |
| DeFi | tvl | 1 |
| Liquidations | liq_vol_24h, liq_long_ratio, liq_z_score, liq_percentile | 4 |

ACB-Critical Indicators (9 Required for _acb_ready=True)

These indicators MUST be present and fresh for the Adaptive Circuit Breaker to function:

ACB_KEYS = [
    "funding_btc", "funding_eth",   # Binance funding rates
    "dvol_btc", "dvol_eth",        # Deribit volatility indices
    "fng",                          # Fear & Greed
    "vix",                          # VIX (market fear)
    "ls_btc",                       # Long/Short ratio
    "taker",                        # Taker buy/sell ratio
    "oi_btc",                       # Open interest
]
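A readiness check over these keys can be sketched as below. "Present" is taken to mean a finite number (absent keys, nulls, strings, NaN and infinity all count as missing). The guide does not give per-indicator freshness limits (the NPZ example below shows funding_btc at 3260 s of staleness, so a single 120 s limit would not fit every key), so freshness is checked only against an optional, caller-supplied `max_age_s` map; all names here are illustrative:

```python
import math
from typing import List, Mapping, Optional

ACB_KEYS = ["funding_btc", "funding_eth", "dvol_btc", "dvol_eth",
            "fng", "vix", "ls_btc", "taker", "oi_btc"]


def missing_acb_keys(values: Mapping[str, object]) -> List[str]:
    """ACB keys that are absent, null, non-numeric, NaN or infinite."""
    missing = []
    for key in ACB_KEYS:
        v = values.get(key)
        if isinstance(v, bool) or not isinstance(v, (int, float)) or not math.isfinite(v):
            missing.append(key)
    return missing


def acb_ready(values: Mapping[str, object],
              staleness_s: Optional[Mapping[str, float]] = None,
              max_age_s: Optional[Mapping[str, float]] = None) -> bool:
    """True if every ACB key is usable and, for keys listed in max_age_s, fresh enough."""
    if missing_acb_keys(values):
        return False
    if staleness_s is None or max_age_s is None:
        return True
    for key, limit in max_age_s.items():
        age = staleness_s.get(key)
        if age is None or age > limit:
            return False
    return True
```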

Data Flow

  1. Fetch: RealTimeExFService polls each provider at native rate
  2. Cache: Values stored in memory with staleness tracking
  3. HZ Push (every 0.5s): Hot path to Hazelcast for Alpha Engine
  4. Persistence (every 5min): Background flush to NPZ on disk
  5. Integrity Check (every 60s): Validate HZ vs disk consistency
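Steps 3-5 run on independent timers over one in-memory cache. A minimal asyncio sketch of that scheduling pattern follows; it is not the code in exf_fetcher_flow.py, and the three actions are placeholders that only count calls. In a real implementation, blocking work such as the NPZ write should be moved off the event loop (for example with `asyncio.to_thread`) so it cannot stall the 0.5 s push:

```python
import asyncio
from typing import Callable, Dict


async def run_every(interval_s: float, action: Callable[[], None], stop: asyncio.Event) -> None:
    """Run `action` once now, then every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        action()
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass  # interval elapsed without a stop request: run again


async def run_exf_loops(duration_s: float, hz_interval_s: float = 0.5,
                        persist_interval_s: float = 300.0,
                        check_interval_s: float = 60.0) -> Dict[str, int]:
    """Run the three cadences over one shared cache for `duration_s` seconds."""
    cache: Dict[str, float] = {"basis": 0.01178}  # the fetch side would keep this current
    published: Dict[str, float] = {}               # stand-in for DOLPHIN_FEATURES['exf_latest']
    calls = {"hz_push": 0, "persist": 0, "integrity": 0}

    def hz_push() -> None:    # placeholder for the Hazelcast write
        published.update(cache)
        calls["hz_push"] += 1

    def persist() -> None:    # placeholder for the NPZ snapshot write
        calls["persist"] += 1

    def integrity() -> None:  # placeholder for the HZ-vs-disk / staleness check
        calls["integrity"] += 1

    stop = asyncio.Event()
    schedule = ((hz_interval_s, hz_push), (persist_interval_s, persist), (check_interval_s, integrity))
    tasks = [asyncio.create_task(run_every(i, fn, stop)) for i, fn in schedule]
    await asyncio.sleep(duration_s)
    stop.set()
    await asyncio.gather(*tasks)
    return calls
```

Waiting on the stop event with a timeout, instead of a bare `asyncio.sleep`, lets all three loops exit promptly on shutdown even when one of them is in the middle of a 5-minute interval.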

File Locations (Linux)

| Data type | Path |
|---|---|
| Persistence root | /mnt/ng6_data/eigenvalues/ |
| Daily directory | /mnt/ng6_data/eigenvalues/{YYYY-MM-DD}/ |
| ExF snapshots | /mnt/ng6_data/eigenvalues/{YYYY-MM-DD}/extf_snapshot_{timestamp}__Indicators.npz |
| Checksum files | /mnt/ng6_data/eigenvalues/{YYYY-MM-DD}/extf_snapshot_{timestamp}__Indicators.npz.sha256 |
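Each snapshot has a sibling .sha256 file, so a snapshot can be checked before it is used for a backtest. A minimal sketch, assuming the checksum file starts with the hex digest (either bare or in `sha256sum` style, `<digest>  <filename>`); the actual format written by exf_persistence.py is not described here and may differ:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_snapshot(npz_path: Path) -> bool:
    """Compare a snapshot with its `.sha256` sidecar (assumed: first token is the digest)."""
    sidecar = npz_path.with_name(npz_path.name + ".sha256")
    if not sidecar.exists():
        return False
    tokens = sidecar.read_text(encoding="utf-8").split()
    return bool(tokens) and tokens[0].lower() == sha256_of(npz_path)
```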

NPZ File Format

{
    # Metadata (JSON string in _metadata array)
    "_metadata": json.dumps({
        "_timestamp_utc": "2026-03-17T12:00:00+00:00",
        "_version": "1.0",
        "_service": "ExFPersistence",
        "_staleness_s": json.dumps({"basis": 0.2, "funding_btc": 3260.0, ...}),
    }),
    
    # Numeric indicators (each as float64 array)
    "basis": np.array([0.01178]),
    "spread": np.array([0.00143]),
    "funding_btc": np.array([7.53e-06]),
    "vix": np.array([24.06]),
    ...
}
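A round trip through this layout with NumPy looks roughly as follows. This sketch mirrors the structure shown above (one-element float64 arrays plus a JSON string in `_metadata`, with `_staleness_s` itself JSON-encoded inside it); the writer in exf_persistence.py may differ in detail, and both function names are illustrative:

```python
import json
from typing import Dict, Tuple

import numpy as np


def write_exf_npz(target, values: Dict[str, float], metadata: dict) -> None:
    """Save indicators as 1-element float64 arrays plus a JSON `_metadata` string."""
    arrays = {k: np.array([v], dtype=np.float64) for k, v in values.items()}
    np.savez(target, _metadata=np.array(json.dumps(metadata)), **arrays)


def read_exf_npz(source) -> Tuple[dict, Dict[str, float]]:
    """Load a snapshot; returns (metadata, {indicator: value})."""
    with np.load(source, allow_pickle=False) as npz:
        meta = json.loads(npz["_metadata"].item())
        values = {k: float(npz[k][0]) for k in npz.files if k != "_metadata"}
    # In the layout above, _staleness_s is a JSON string nested inside the metadata
    if isinstance(meta.get("_staleness_s"), str):
        meta["_staleness_s"] = json.loads(meta["_staleness_s"])
    return meta, values
```

Storing the metadata as a plain unicode array rather than a Python object means the file can be loaded with `allow_pickle=False`. Note that `np.savez` appends `.npz` to a filename that lacks that extension.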

Running the ExF System

Option 1: Standalone (Development/Testing)

cd /root/extf_docs

# Test mode (no persistence, no monitoring)
python exf_fetcher_flow.py --no-persist --no-monitor --warmup 15

# With persistence (production)
python exf_fetcher_flow.py --warmup 30

# Run integration tests
python test_exf_integration.py --duration 30 --test all

Option 2: Prefect Deployment (Production)

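# Deploy to Prefect (Prefect 2.x CLI; --apply registers the deployment, without it build only writes a deployment YAML)
cd /mnt/dolphinng5_predict/prod
prefect deployment build exf_fetcher_flow.py:exf_fetcher_flow \
  --name "exf-live" \
  --pool dolphin \
  --cron "*/5 * * * *" \
  --apply  # Or run continuously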

# Start worker
prefect worker start --pool dolphin

Monitoring & Alerting

Health Status

The integrity monitor exposes health status via get_health_status():

{
    "timestamp": "2026-03-17T12:00:00+00:00",
    "overall": "healthy",  # healthy | degraded | critical
    "hz_connected": True,
    "persist_connected": True,
    "indicators_present": 28,
    "indicators_expected": 28,
    "acb_ready": True,
    "stale_count": 2,
    "alerts_active": 0,
}

Alert Thresholds

| Condition | Severity | Action / impact |
|---|---|---|
| ACB-critical indicator missing | CRITICAL | Alpha engine may fail |
| Hazelcast disconnected | CRITICAL | Real-time data unavailable |
| Indicator stale > 120s | WARNING | Check provider API |
| HZ/disk divergence > 3 indicators | WARNING | Investigate sync issue |
| Overall health = degraded | WARNING | Monitor closely |
| Overall health = critical | CRITICAL | Page on-call engineer |
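The table can be applied mechanically to a `get_health_status()`-style dict. A minimal sketch with assumptions labeled: per-indicator staleness and the HZ/disk divergence count are passed in separately because the health dict shown above does not carry them (its `stale_count` of 2 alongside zero active alerts suggests it uses a different criterion than the 120 s warning), and `evaluate_alerts` is a hypothetical name:

```python
from typing import List, Mapping, Optional, Tuple

Alert = Tuple[str, str]  # (severity, message)


def evaluate_alerts(status: Mapping[str, object],
                    staleness_s: Optional[Mapping[str, float]] = None,
                    stale_limit_s: float = 120.0,
                    divergent_count: int = 0) -> List[Alert]:
    """Apply the alert-threshold table; staleness_s and divergent_count are assumed inputs."""
    alerts: List[Alert] = []
    if not status.get("acb_ready", False):
        alerts.append(("CRITICAL", "ACB-critical indicator missing: alpha engine may fail"))
    if not status.get("hz_connected", False):
        alerts.append(("CRITICAL", "Hazelcast disconnected: real-time data unavailable"))
    for name, age in sorted((staleness_s or {}).items()):
        if age > stale_limit_s:
            alerts.append(("WARNING", f"{name} stale for {age:.0f}s: check provider API"))
    if divergent_count > 3:
        alerts.append(("WARNING", f"HZ/disk divergence on {divergent_count} indicators: investigate sync"))
    overall = status.get("overall")
    if overall == "critical":
        alerts.append(("CRITICAL", "Overall health critical: page on-call engineer"))
    elif overall == "degraded":
        alerts.append(("WARNING", "Overall health degraded: monitor closely"))
    return alerts
```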

Troubleshooting

Issue: _acb_ready=False

Symptoms: Health check shows acb_ready: False
Diagnosis: One or more ACB-critical indicators missing

# Check which indicators are missing
python3 << 'EOF'
import hazelcast, json, math
client = hazelcast.HazelcastClient(cluster_name='dolphin', cluster_members=['localhost:5701'])
data = json.loads(client.get_map("DOLPHIN_FEATURES").get("exf_latest").result())
acb_keys = ["funding_btc", "funding_eth", "dvol_btc", "dvol_eth", "fng", "vix", "ls_btc", "taker", "oi_btc"]
# Missing = absent, null, non-numeric, or NaN
missing = [k for k in acb_keys if not isinstance(data.get(k), (int, float)) or math.isnan(data[k])]
print(f"Missing ACB indicators: {missing}")
print(f"Present: {[k for k in acb_keys if k not in missing]}")
client.shutdown()
EOF

Common Causes:

  • Deribit API down (dvol_btc, dvol_eth)
  • Alternative.me API down (fng)
  • FRED API key expired (vix)

Fix: Check provider status, verify API keys in realtime_exf_service.py


Issue: No disk persistence

Symptoms: files_written: 0 in persistence stats
Diagnosis:

# Check mount
ls -la /mnt/ng6_data/eigenvalues/

# Check permissions
touch /mnt/ng6_data/eigenvalues/write_test && rm /mnt/ng6_data/eigenvalues/write_test

# Check disk space
df -h /mnt/ng6_data/

Fix:

# Remount if needed
sudo mount -t cifs //100.119.158.61/DolphinNG6_Data /mnt/ng6_data -o credentials=/root/.dolphin_creds

Issue: High staleness

Symptoms: Staleness > 120s for critical indicators
Diagnosis:

# Check fetcher process
ps aux | grep exf_fetcher

# Check logs
journalctl -u exf-fetcher -n 100

# Manual fetch test
curl -s "https://fapi.binance.com/fapi/v1/premiumIndex?symbol=BTCUSDT" | head -c 200
curl -s "https://www.deribit.com/api/v2/public/get_volatility_index_data?currency=BTC&resolution=3600&start_timestamp=$(( ($(date +%s) - 7200) * 1000 ))&end_timestamp=$(( $(date +%s) * 1000 ))" | head -c 200

Fix: Restart fetcher, check network connectivity, verify API rate limits not exceeded

TODO (Future Enhancements)

  • Expand indicators: Add 50+ additional indicators from CoinMetrics, Glassnode, etc.
  • Fix dead indicators: Repair broken parsers (see DEAD_INDICATORS in service)
  • Adaptive lag: Switch from uniform lag=1 to per-indicator optimal lags (needs 80+ days data)
  • Intra-day ACB: Move from daily to continuous ACB calculation
  • Arrow format: Dual output NPZ + Arrow for better performance
  • Redundancy: Multiple provider failover for critical indicators

Data Retention

| Data type | Retention | Cleanup |
|---|---|---|
| Hazelcast cache | Real-time only (no history) | N/A |
| Disk snapshots (NPZ) | 7 days | Automatic |
| Logs | 30 days | Manual/Logrotate |
| Backfill data | Permanent | Never |
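This guide does not describe how the automatic 7-day snapshot cleanup is implemented. A conservative pruning sketch, assuming snapshots live only in YYYY-MM-DD directories under the persistence root and that only files matching the extf_snapshot pattern (plus their .sha256 sidecars) may be deleted, since the same tree may also hold scan files and backfill data must never be removed. It defaults to a dry run; the function names are illustrative:

```python
from datetime import date, timedelta
from pathlib import Path
from typing import List

SNAPSHOT_GLOBS = ("extf_snapshot_*__Indicators.npz", "extf_snapshot_*__Indicators.npz.sha256")


def expired_snapshots(root: Path, today: date, keep_days: int = 7) -> List[Path]:
    """ExF snapshot files in daily dirs older than `keep_days`; all other files are ignored."""
    cutoff = today - timedelta(days=keep_days)
    expired: List[Path] = []
    for day_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        try:
            day = date.fromisoformat(day_dir.name)
        except ValueError:
            continue  # not a YYYY-MM-DD directory; leave it alone
        if day < cutoff:
            for pattern in SNAPSHOT_GLOBS:
                expired.extend(sorted(day_dir.glob(pattern)))
    return expired


def prune_snapshots(root: Path, today: date, keep_days: int = 7, dry_run: bool = True) -> List[Path]:
    """Delete expired snapshot files, or only list them when dry_run is True."""
    files = expired_snapshots(root, today, keep_days)
    if not dry_run:
        for f in files:
            f.unlink()
    return files
```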

Last updated: 2026-03-17