DOLPHIN/prod/docs/OPERATIONAL_STATUS.md
hjnormey 01c19662cb initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
2026-04-21 16:58:38 +02:00


Operational Status - NG7 Live

Last Updated: 2026-03-25 05:35 UTC
Status: FULLY OPERATIONAL


Current State

Component              Status    Details
NG7 (Windows)          LIVE      Writing directly to Hz over Tailscale
Hz (Hazelcast) Server  HEALTHY   Receiving scans at ~5s intervals
Nautilus Trader        RUNNING   Processing scans, 0 lag
Scan Bridge            RUNNING   Legacy backup (unused)

Recent Changes

1. NG7 Direct Hz Write (Primary)

  • Before: Arrow → SMB → Scan Bridge → Hz (~5-60s lag)
  • After: NG7 → Hz direct (~67ms network + ~55ms processing)
  • Result: ~40-460x lower latency (5-60s → ~130ms end to end), near real-time sync
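The size of the speedup depends on which end of the legacy 5-60s lag range is used as the baseline. Here is a quick check against the ~130ms end-to-end total reported in the latency benchmark. It is plain arithmetic on the figures in this document; the variable names are illustrative.

```python
# Legacy path lag range (Arrow → SMB → Scan Bridge → Hz), in seconds
legacy_lag_s = (5.0, 60.0)
# Measured end-to-end latency of the direct NG7 → Hz path (~130ms)
direct_s = 0.130

# Speedup factor for each end of the legacy range
speedups = [lag / direct_s for lag in legacy_lag_s]
print([round(s) for s in speedups])  # [38, 462]
```

Only the 60s worst case produces a factor in the hundreds. Against the 5s best case, the improvement is about 40x.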

2. Supervisord Migration

  • Migrated nautilus_trader and scan_bridge from systemd to supervisord
  • Config: /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf
  • Status: supervisorctl -c ... status
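The status check can also be scripted. The sketch below is minimal. It assumes the usual supervisorctl status layout: one program per line, giving the name, then the state, then a free-form description. The helper names parse_status and supervisor_states are hypothetical and are not part of the DOLPHIN codebase.

```python
import subprocess
from typing import Dict

# Config path from the migration notes above
CONF = "/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf"

def parse_status(output: str) -> Dict[str, str]:
    """Hypothetical helper: map each program name to its state (RUNNING, STOPPED, FATAL, ...)."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split(None, 2)  # name, state, free-form description
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states

def supervisor_states(conf: str = CONF) -> Dict[str, str]:
    """Hypothetical helper: run supervisorctl status and parse its output."""
    # supervisorctl may exit non-zero when a program is not RUNNING,
    # so the return code is not treated as an error here.
    result = subprocess.run(
        ["supervisorctl", "-c", conf, "status"],
        capture_output=True, text=True,
    )
    return parse_status(result.stdout)
```

Example usage: not_running = {name: s for name, s in supervisor_states().items() if s != "RUNNING"} lists any program that needs attention, such as nautilus_trader or scan_bridge.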

3. Bug Fix: file_mtime

  • Issue: Nautilus dedup failed because NG7 direct writes lack the file_mtime field
  • Fix: Added an NG7 compatibility fallback that uses the scan timestamp when file_mtime is missing
  • Location: nautilus_event_trader.py line ~320
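The fallback can be sketched as follows. The sketch assumes that scans arrive as dicts and that a scan counts as a duplicate when its version value equals the last one seen. The names ScanDeduper and scan_version are hypothetical, and the actual logic in nautilus_event_trader.py may differ.

```python
from typing import Any, Mapping

def scan_version(scan: Mapping[str, Any]) -> Any:
    """Hypothetical helper: prefer file_mtime, fall back to timestamp (NG7 compatibility)."""
    version = scan.get("file_mtime")
    if version is None:
        version = scan.get("timestamp")  # NG7 direct Hz writes carry no file_mtime
    return version

class ScanDeduper:
    """Hypothetical dedup guard: accept a scan only if its version changed."""

    def __init__(self) -> None:
        self._last: Any = None

    def is_new(self, scan: Mapping[str, Any]) -> bool:
        version = scan_version(scan)
        if version is None:
            return False  # neither field present: skip rather than reprocess (assumed policy)
        if version == self._last:
            return False  # same version as the previous scan: duplicate
        self._last = version
        return True
```

Using "is None" rather than a truthiness test means that a legitimate value of 0 is still accepted as a version.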

Test Results

Latency Benchmark

Network (Tailscale):  ~67ms  (52% of total)
Engine processing:    ~55ms  (42% of total)
Total end-to-end:     ~130ms
Sync quality:         0 lag (100% in-sync)
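Each percentage is that component divided by the ~130ms total. The two itemized legs sum to ~122ms, which leaves roughly 8ms (about 6%) that the benchmark does not break out. Quick check:

```python
network_ms = 67   # Tailscale network leg
engine_ms = 55    # engine processing
total_ms = 130    # measured end to end

other_ms = total_ms - network_ms - engine_ms  # not itemized in the benchmark
for name, ms in [("network", network_ms), ("engine", engine_ms), ("other", other_ms)]:
    print(f"{name:<8} {ms:>4}ms  {ms / total_ms:.0%}")
```

This prints 52%, 42% and 6% for network, engine and other respectively.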

Scan Statistics (Current)

Hz latest scan:     #1803
Engine last scan:   #1803
Scans processed:    1674
Bar index:          1613
Capital:            $25,000
Posture:            APEX
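The 0-lag figure is the difference between the newest scan number in Hz and the last scan number the engine processed. The small sketch below uses a hypothetical helper. It also converts the lag into approximate staleness using the ~5s scan interval from the Current State table.

```python
from typing import Tuple

SCAN_INTERVAL_S = 5.0  # approximate Hz scan cadence (~5s)

def scan_lag(hz_latest: int, engine_last: int,
             interval_s: float = SCAN_INTERVAL_S) -> Tuple[int, float]:
    """Hypothetical helper: return (scans behind, approximate seconds behind)."""
    behind = max(hz_latest - engine_last, 0)
    return behind, behind * interval_s

print(scan_lag(1803, 1803))  # (0, 0.0)
```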

Integrity Checks

  • NG7 metadata present
  • Eigenvalue tracking active
  • Pricing data (50 symbols)
  • Multi-window results
  • Byte-for-byte Hz/disk congruence
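Here is one way to implement the byte-for-byte check. It assumes the Hz map value is the same serialized text that NG7 writes to disk, encoded as UTF-8. The functions digest and congruent are hypothetical helpers, not existing DOLPHIN functions.

```python
import hashlib
from typing import Union

def _to_bytes(data: Union[str, bytes]) -> bytes:
    # Assumption: Hz stores the scan as UTF-8 text identical to the on-disk file
    return data.encode("utf-8") if isinstance(data, str) else data

def digest(data: Union[str, bytes]) -> str:
    """SHA-256 hex digest, handy for logging or comparing across hosts."""
    return hashlib.sha256(_to_bytes(data)).hexdigest()

def congruent(hz_payload: Union[str, bytes], disk_bytes: bytes) -> bool:
    """True only if the Hz payload and the on-disk file are identical byte for byte."""
    return _to_bytes(hz_payload) == disk_bytes
```

A typical call is congruent(value_from_hz, Path(scan_file).read_bytes()), where scan_file is whatever path NG7 writes to (not specified here). Even a single trailing newline makes the check fail, which is the intent of a strict comparison.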

Architecture

NG7 (Windows) ──Tailscale──→ Hz (Linux) ──→ Nautilus
         │                        │
         └────Disk (backup)───────┘

Bottleneck: network RTT (~67ms), which is limited by physical distance rather than by the software, so no further gain is expected on this leg.


Commands

# Status
supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status

# Hz check
python3 -c "import json, hazelcast; c = hazelcast.HazelcastClient(cluster_name='dolphin', cluster_members=['localhost:5701']); print(json.loads(c.get_map('DOLPHIN_FEATURES').get('latest_eigen_scan').result())); c.shutdown()"

# Logs
tail -50 /mnt/dolphinng5_predict/prod/supervisor/logs/nautilus_trader.log

Notes

  • Network latency (~67ms) is the dominant factor, as expected for the EU→Sweden link
  • Engine processing (~55ms) is the secondary contributor
  • 0 scan lag: the engine's last processed scan matches the latest scan in Hz
  • MHS disabled to prevent restart loops

System Recovery - 2026-03-26 08:00 UTC

Issue: System extremely sluggish, terminal locked, load average 16.6+

Root Causes

Issue                 Details
Zombie process storm  12,385 zombie timeout processes from the Hazelcast healthcheck
Hung CIFS mounts      3 DolphinNG6 share mounts from 100.119.158.61 unresponsive
Stuck process         grep -ri scanning /mnt, in D-state (uninterruptible sleep) for 24+ hours
I/O wait              38% of CPU time in I/O wait, from blocked SMB operations
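Counts of zombie (Z) and uninterruptible-sleep (D) processes like the ones above can be read directly from procfs. The Linux-only sketch below tallies the state letter in each /proc/<pid>/stat. The function names are hypothetical.

```python
import os
from collections import Counter

def proc_state(stat_text: str) -> str:
    """Return the one-letter state (R, S, D, Z, ...) from /proc/<pid>/stat.

    Field 2 is the command name in parentheses and may itself contain spaces
    or ')', so split after the last ')' rather than on whitespace alone.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]

def count_states(proc_root: str = "/proc") -> Counter:
    """Hypothetical helper: tally process states across numeric /proc entries."""
    counts: Counter = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                counts[proc_state(fh.read())] += 1
        except OSError:
            continue  # process exited between listdir() and open()
    return counts
```

Example usage: c = count_states(); print("zombies:", c["Z"], "D-state:", c["D"]).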

Actions Taken

  1. Killed stuck processes:

    • grep -ri (PID 101907) - unlocked terminal
    • meta_health_daemon_v2.py (PID 224047) - D-state cleared
    • Stuck ls processes on CIFS mounts
  2. Cleared zombie processes:

    • Killed the Hazelcast parent process (PID 2049), letting its zombie children be reparented and reaped
    • Lazy-unmounted (umount -l) the 3 hung CIFS shares
    • Zombie count: 12,385 → 3
  3. Fixed Hazelcast zombie leak:

    • Added init: true to docker-compose.yml
    • Recreated container with tini init system
    • Healthcheck timeout processes now properly reaped
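A zombie is a process that has exited but whose parent has not yet collected its exit status with wait(). Inside a container, orphaned processes are reparented to the container's PID 1. If that process never calls wait(), zombies accumulate, which is consistent with the healthcheck timeout processes piling up here. Docker's init option runs a small init (tini) as PID 1, and tini reaps them. The Linux-only sketch below reproduces a single zombie and the reaping step on a small scale. The helper zombie_lifecycle is hypothetical.

```python
import os
from typing import Tuple

def zombie_lifecycle(exit_code: int = 7) -> Tuple[int, int, bool]:
    """Hypothetical demo: create a zombie child, then reap it as an init (e.g. tini) would."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately, no cleanup

    # Parent: block until the child has exited but leave it un-reaped (WNOWAIT).
    # Until someone calls wait*(), the kernel keeps it as a zombie (state 'Z').
    info = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # Reap it. An init like tini does this for every orphan reparented to it;
    # a PID 1 that never calls wait*() lets zombies pile up instead.
    _, status = os.waitpid(pid, 0)

    # Once reaped, the child is gone: a further wait fails with ECHILD.
    try:
        os.waitpid(pid, os.WNOHANG)
        still_present = True
    except ChildProcessError:
        still_present = False
    return info.si_status, os.WEXITSTATUS(status), still_present
```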

Results

Metric            Before      After
Load average      16.6+       2.72
Zombie processes  12,385      3 (stable)
I/O wait          38%         0%
Total tasks       12,682      352
System response   Timed out   <100ms

Docker Compose Fix

# /mnt/dolphinng5_predict/prod/docker-compose.yml
services:
  hazelcast:
    image: hazelcast/hazelcast:5.3
    init: true  # Added: runs an init (tini) as PID 1 so zombie processes are reaped
    # ... rest of config

Current Status

Component             Status      Notes
Hazelcast             Healthy     init: true, zombie reaping working
Hz Management Center  Up 36h      Stable
Prefect Server        Up 36h      Stable
CIFS Mounts           ⚠️ Partial   Only DolphinNG5_Predict mounted
System Performance    Normal      Responsive, low latency

CIFS Mount Status

# Currently mounted:
//100.119.158.61/DolphinNG5_Predict on /mnt/dolphinng5_predict

# Unmounted (server unresponsive):
//100.119.158.61/DolphinNG6
//100.119.158.61/DolphinNG6_Data
//100.119.158.61/DolphinNG6_Data_New
//100.119.158.61/Vids

Note: The DolphinNG6 server at 100.119.158.61 does not respond to new mount attempts. The existing DolphinNG5_Predict mount from the same host remains operational.
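To confirm which shares are mounted without risk of hanging, parse the kernel mount table rather than touching the mount points. Running stat() or ls on a hung CIFS mount can block in D-state, as the stuck ls and grep processes did. The sketch assumes the expected shares are the ones listed above, and its function names are hypothetical.

```python
from typing import List, Set

EXPECTED_SHARES = [
    "//100.119.158.61/DolphinNG5_Predict",
    "//100.119.158.61/DolphinNG6",
    "//100.119.158.61/DolphinNG6_Data",
    "//100.119.158.61/DolphinNG6_Data_New",
    "//100.119.158.61/Vids",
]

def smb_sources(mounts_text: str) -> Set[str]:
    """Hypothetical helper: source field of every CIFS/SMB entry in a /proc/mounts dump."""
    sources: Set[str] = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: source mountpoint fstype options dump pass.
        # Newer kernels may report SMB3 mounts as "smb3" rather than "cifs".
        if len(fields) >= 3 and fields[2] in ("cifs", "smb3"):
            sources.add(fields[0])
    return sources

def missing_shares(mounts_text: str, expected: List[str]) -> List[str]:
    """Hypothetical helper: expected shares that are not currently mounted."""
    mounted = smb_sources(mounts_text)
    return [share for share in expected if share not in mounted]
```

Example usage: with open("/proc/mounts") as fh: print(missing_shares(fh.read(), EXPECTED_SHARES)). Reading /proc/mounts only queries the kernel and never contacts the SMB server.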


Last Updated: 2026-03-26 08:15 UTC
Status: OPERATIONAL (post-recovery)