DOLPHIN/prod/OPERATIONAL_STATUS.md
hjnormey 01c19662cb initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
2026-04-21 16:58:38 +02:00


# Operational Status - NG7 Live
**Last Updated:** 2026-03-25 05:35 UTC
**Status:** ✅ FULLY OPERATIONAL
---
## Current State
| Component | Status | Details |
|-----------|--------|---------|
| NG7 (Windows) | ✅ LIVE | Writing directly to Hz over Tailscale |
| Hz Server | ✅ HEALTHY | Receiving scans ~5s interval |
| Nautilus Trader | ✅ RUNNING | Processing scans, 0 lag |
| Scan Bridge | ✅ RUNNING | Legacy backup (unused) |
---
## Recent Changes
### 1. NG7 Direct Hz Write (Primary)
- **Before:** Arrow → SMB → Scan Bridge → Hz (~5-60s lag)
- **After:** NG7 → Hz direct (~67ms network + ~55ms processing)
- **Result:** Up to ~400-500x faster (measured against the 60s worst case; roughly 40x against the 5s best case), real-time sync
### 2. Supervisord Migration
- Migrated `nautilus_trader` and `scan_bridge` from systemd to supervisord
- Config: `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf`
- Status: `supervisorctl -c ... status`
### 3. Bug Fix: file_mtime
- **Issue:** Nautilus dedup failed (missing `file_mtime` field)
- **Fix:** Added NG7 compatibility fallback using `timestamp`
- **Location:** `nautilus_event_trader.py` line ~320
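
The fallback described above can be sketched as follows. This is a minimal illustration, not the code at `nautilus_event_trader.py` line ~320: the names `scan_dedup_key` and `ScanDeduper` are hypothetical, and the sketch assumes dedup compares one key per scan against the previously seen key.

```python
from typing import Any, Mapping, Optional


def scan_dedup_key(scan: Mapping[str, Any]) -> Optional[Any]:
    """Return the value used to detect a repeated scan.

    Prefers `file_mtime` and falls back to `timestamp`, which is the field
    the fix above uses for NG7 payloads. Returns None if neither is present.
    """
    for field in ("file_mtime", "timestamp"):
        value = scan.get(field)
        if value is not None:
            return value
    return None


class ScanDeduper:
    """Remembers the last key seen and reports whether a scan is new (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_key: Optional[Any] = None

    def is_new(self, scan: Mapping[str, Any]) -> bool:
        key = scan_dedup_key(scan)
        if key is None:
            # Assumption: a scan without any usable key is processed, not dropped.
            return True
        if key == self._last_key:
            return False
        self._last_key = key
        return True
```

Keeping `file_mtime` first in the lookup order leaves the previous dedup behaviour unchanged for payloads that still carry that field.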
---
## Test Results
### Latency Benchmark
```
Network (Tailscale): ~67ms (52% of total)
Engine processing: ~55ms (42% of total)
Total end-to-end: ~130ms
Sync quality: 0 lag (100% in-sync)
```
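
The percentage shares above follow from dividing each component by the ~130ms total. A small check using only the benchmark figures also shows that about 8ms of the total is not attributed to either component:

```python
# Approximate figures from the latency benchmark above, in milliseconds.
NETWORK_MS = 67.0
ENGINE_MS = 55.0
TOTAL_MS = 130.0


def share_pct(part_ms: float, total_ms: float) -> float:
    """Return part_ms as a percentage of total_ms."""
    return 100.0 * part_ms / total_ms


# Time in the end-to-end total that neither component accounts for.
unattributed_ms = TOTAL_MS - NETWORK_MS - ENGINE_MS

print(f"Network:      {share_pct(NETWORK_MS, TOTAL_MS):.0f}%")       # 52%
print(f"Engine:       {share_pct(ENGINE_MS, TOTAL_MS):.0f}%")        # 42%
print(f"Unattributed: {unattributed_ms:.0f}ms "
      f"({share_pct(unattributed_ms, TOTAL_MS):.0f}%)")              # 8ms (6%)
```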
### Scan Statistics (Current)
```
Hz latest scan: #1803 (at time of writing; ~56K now)
Engine last scan: #1803 (at time of writing; ~56K now)
Scans processed: 1674 (now close to, if not equal to, the Engine/Hz latest scan)
Bar index: 1613
Capital: $25,000 (~$26K after the latest tests)
Posture: APEX
```
### Integrity Checks
- ✅ NG7 metadata present
- ✅ Eigenvalue tracking active
- ✅ Pricing data (50 symbols)
- ✅ Multi-window results
- ✅ Byte-for-byte Hz/disk congruence
---
## Architecture
```
NG7 (Windows) ──Tailscale──→ Hz (Linux) ──→ Nautilus
     │                        │
     └────Disk (backup)───────┘
```
**Bottleneck:** Network RTT (~67ms) - physics limited, optimal.
---
## Commands
```bash
# Status
supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status
# Hz check
python3 -c "import json, hazelcast; c = hazelcast.HazelcastClient(cluster_name='dolphin', cluster_members=['localhost:5701']); print(json.loads(c.get_map('DOLPHIN_FEATURES').get('latest_eigen_scan').result())); c.shutdown()"
# Logs
tail -50 /mnt/dolphinng5_predict/prod/supervisor/logs/nautilus_trader.log
```
---
## Notes
- Network latency (~67ms) is the dominant factor - expected for EU→Sweden
- Engine processing (~55ms) is secondary
- 0 scan lag = optimal sync achieved
- MHS disabled to prevent restart loops
---
## System Recovery - 2026-03-26 08:00 UTC
**Issue:** System extremely sluggish, terminal locked, load average 16.6+
### Root Causes
| Issue | Details |
|-------|---------|
| Zombie Process Storm | 12,385 zombie `timeout` processes from Hazelcast healthcheck |
| Hung CIFS Mounts | DolphinNG6 shares (3 mounts) unresponsive from `100.119.158.61` |
| Stuck Process | `grep -ri` scanning `/mnt` in D-state for 24+ hours |
| I/O Wait | 38% wait time from blocked SMB operations |
### Actions Taken
1. **Killed stuck processes:**
   - `grep -ri` (PID 101907) - unlocked terminal
   - `meta_health_daemon_v2.py` (PID 224047) - D-state cleared
   - Stuck `ls` processes on CIFS mounts
2. **Cleared zombie processes:**
   - Killed Hazelcast parent (PID 2049)
   - Lazy unmounted 3 hung CIFS shares
   - Zombie count: 12,385 → 3
3. **Fixed Hazelcast zombie leak:**
   - Added `init: true` to `docker-compose.yml`
   - Recreated container with tini init system
   - Healthcheck `timeout` processes now properly reaped
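
To catch a recurrence early, zombies can be counted per parent PID. The following is a sketch only: the function names are hypothetical, and it assumes a `ps` that accepts the POSIX `-o name=` form, which prints the column without a header.

```python
import subprocess
from collections import Counter


def count_zombies_by_parent(ps_output: str) -> Counter:
    """Count zombie (state Z) processes per parent PID.

    Expects one process per line as "<STAT> <PPID> <COMMAND>".
    """
    counts: Counter = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        stat, ppid = parts[0], parts[1]
        if stat.startswith("Z") and ppid.isdigit():
            counts[int(ppid)] += 1
    return counts


def read_ps() -> str:
    """Return state, parent PID and command name for every process."""
    result = subprocess.run(
        ["ps", "-e", "-o", "stat=", "-o", "ppid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `count_zombies_by_parent(read_ps()).most_common(3)` lists the parents holding the most zombies. In the incident above that would have been the Hazelcast parent.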
### Results
| Metric | Before | After |
|--------|--------|-------|
| Load Average | 16.6+ | 2.72 |
| Zombie Processes | 12,385 | 3 (stable) |
| I/O Wait | 38% | 0% |
| Total Tasks | 12,682 | 352 |
| System Response | Timeout | <100ms |
### Docker Compose Fix
```yaml
# /mnt/dolphinng5_predict/prod/docker-compose.yml
services:
hazelcast:
image: hazelcast/hazelcast:5.3
init: true # Added: enables proper zombie reaping
# ... rest of config
```
### Current Status
| Component | Status | Notes |
|-----------|--------|-------|
| Hazelcast | Healthy | `init: true`, zombie reaping working |
| Hz Management Center | Up 36h | Stable |
| Prefect Server | Up 36h | Stable |
| CIFS Mounts | Partial | Only DolphinNG5_Predict mounted |
| System Performance | Normal | Responsive, low latency |
### CIFS Mount Status
```bash
# Currently mounted:
//100.119.158.61/DolphinNG5_Predict on /mnt/dolphinng5_predict
# Unmounted (server unresponsive):
//100.119.158.61/DolphinNG6
//100.119.158.61/DolphinNG6_Data
//100.119.158.61/DolphinNG6_Data_New
//100.119.158.61/Vids
```
**Note:** DolphinNG6 server at `100.119.158.61` is unresponsive for new mount attempts. DolphinNG5_Predict remains operational.
---
**Last Updated:** 2026-03-26 08:15 UTC
**Status:** OPERATIONAL (post-recovery)
---
## NG8 Development Status - 2026-03-26 09:00 UTC
### Objective
Create performance-optimized NG8 engine with **exact 512-bit numerical equivalence** to NG7.
### Discovery
NG7 code found on `DolphinNG6` share (Windows SMB mounted):
- `enhanced_main.py` - Entry point
- `dolphin_correlation_arb512_with_eigen_tracking.py` - Core eigenvalue engine (1,586 lines)
- Uses **`python-flint`** (Arb library) for 512-bit precision
- Power iteration + Rayleigh quotient algorithm
- Multi-window support: [50, 150, 300, 750]
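
For orientation, power iteration with a Rayleigh quotient looks roughly like the sketch below. It is not the NG7 code: it uses ordinary Python floats (float64) instead of 512-bit Arb arithmetic, and the function name, iteration cap and tolerance are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def power_iteration(
    matrix: Sequence[Sequence[float]],
    max_iterations: int = 500,
    tol: float = 1e-12,
) -> Tuple[float, List[float]]:
    """Estimate the dominant eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)

    def matvec(v: Sequence[float]) -> List[float]:
        return [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]

    # Uniform unit start vector; it must not be orthogonal to the dominant
    # eigenvector, otherwise the iteration cannot find it.
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(max_iterations):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("matrix maps the iterate to the zero vector")
        v = [x / norm for x in w]
        # Rayleigh quotient v^T A v / v^T v; v has unit norm, so the
        # denominator is 1.
        av = matvec(v)
        estimate = sum(v[i] * av[i] for i in range(n))
        converged = abs(estimate - eigenvalue) < tol
        eigenvalue = estimate
        if converged:
            break
    return eigenvalue, v
```

For the symmetric matrix `[[2.0, 1.0], [1.0, 2.0]]` the dominant eigenvalue is 3 with eigenvector proportional to `(1, 1)`, which makes a convenient sanity check.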
### NG8 Architecture
**CRITICAL: ZERO algorithmic changes to 512-bit paths**
| Component | NG7 | NG8 | Change |
|-----------|-----|-----|--------|
| 512-bit library | python-flint | python-flint | None |
| Eigenvalue algorithm | Power iteration | Power iteration | None |
| Correlation calc | Arb O(n³) | Arb O(n³) | None |
| Price validation | Python | Numba float64 | Optimized |
| Output format | JSON/Arrow | JSON/Arrow | None |
### Files Created
| Path | Purpose |
|------|---------|
| `/mnt/dolphinng5_predict/- Dolphin NG8/ng8_core_optimized.py` | Main engine (EXACT algorithm) |
| `/mnt/dolphinng5_predict/- Dolphin NG8/ng8_equivalence_test.py` | Test harness |
| `/mnt/dolphinng5_predict/- Dolphin NG8/README_NG8.md` | Documentation |
### Hz Temp Namespace
NG8 writes to `NG8_TEMP_FEATURES` for safe testing without affecting NG7 production.
### Safety
- Original NG7 code: **Untouched** (in `/mnt/dolphinng6/`)
- Production system: **Unaffected**
- Rollback: **Immediate** (NG7 still running)
### Status
🔄 **In Development** - Equivalence testing pending
---
**Last Updated:** 2026-03-26 09:00 UTC
---
## TODO — Backlog (2026-03-30)
### MHS "Threesome Test" — scan payload quality checks
Add to `meta_health_service_v3.py` (or dedicated health check):
1. `latest_eigen_scan` has `assets` list len > 0 AND `vel_div` field present and finite
2. `engine_snapshot` age < 120s when `nautilus_trader` is RUNNING
3. `scans_processed` counter increments between consecutive polls (monotonicity check)
Alert level: WARN on any failure; CRITICAL if all three fail simultaneously.
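
A possible shape for the three checks and the alert rule, as a sketch. The function names are hypothetical, and it assumes the snapshot timestamp and the poll time are both epoch seconds. How `meta_health_service_v3.py` actually obtains these values is not specified here.

```python
import math
from typing import Any, Mapping, Optional, Sequence


def check_scan_payload(scan: Mapping[str, Any]) -> bool:
    """Check 1: non-empty `assets` list and a finite numeric `vel_div`."""
    assets = scan.get("assets")
    vel_div = scan.get("vel_div")
    if not isinstance(assets, list) or len(assets) == 0:
        return False
    if isinstance(vel_div, bool) or not isinstance(vel_div, (int, float)):
        return False
    return math.isfinite(vel_div)


def check_snapshot_age(snapshot_ts: Optional[float], now: float,
                       trader_running: bool, max_age_s: float = 120.0) -> bool:
    """Check 2: `engine_snapshot` is younger than max_age_s while the trader runs."""
    if not trader_running:
        return True  # the check only applies while nautilus_trader is RUNNING
    return snapshot_ts is not None and (now - snapshot_ts) < max_age_s


def check_counter_advanced(previous: int, current: int) -> bool:
    """Check 3: `scans_processed` increased between two consecutive polls."""
    return current > previous


def alert_level(results: Sequence[bool]) -> str:
    """WARN on any failure, CRITICAL when every check fails."""
    if all(results):
        return "OK"
    if not any(results):
        return "CRITICAL"
    return "WARN"
```

Keeping each check a pure function of already-fetched values lets the Hz reads live in one place and makes the rules easy to unit test.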
### OBF wall-clock drift detection
In `obf_universe_service.py` or MHS: alert when the `timestamp` of any `asset_{symbol}_ob` key is
more than N seconds behind wall clock (N = 30s suggested).
Detects: WS stream stall, OBF service crash, Hz put failure.
Can spot-check a fixed set of liquid assets (BTC, ETH, SOL, BNB) rather than all 500+.
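
The drift check could look like the sketch below. Assumptions, all hypothetical: entries are already decoded into dicts keyed by the Hz key name, `timestamp` is in epoch seconds, the symbol strings match those used in the key names, and a missing entry counts as stale.

```python
from typing import Any, List, Mapping, Sequence

# Assumed spot-check set; the exact symbol strings used in the keys may differ.
DEFAULT_SYMBOLS = ("BTC", "ETH", "SOL", "BNB")


def stale_order_books(
    entries: Mapping[str, Any],
    now: float,
    max_lag_s: float = 30.0,
    symbols: Sequence[str] = DEFAULT_SYMBOLS,
) -> List[str]:
    """Return the symbols whose order-book entry lags wall clock by more than max_lag_s."""
    stale = []
    for symbol in symbols:
        entry = entries.get(f"asset_{symbol}_ob")
        ts = entry.get("timestamp") if isinstance(entry, Mapping) else None
        # A missing entry or timestamp is reported as stale, not ignored.
        if ts is None or (now - ts) > max_lag_s:
            stale.append(symbol)
    return stale
```

With `now = time.time()`, a non-empty return value would trigger the alert. If every spot-checked symbol is stale at once, a stream stall or service crash is more likely than a single-asset problem.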