Includes core prod + GREEN/BLUE subsystems:

- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.

# Operational Status - NG7 Live

**Last Updated:** 2026-03-25 05:35 UTC
**Status:** ✅ FULLY OPERATIONAL

---

## Current State

| Component | Status | Details |
|-----------|--------|---------|
| NG7 (Windows) | ✅ LIVE | Writing directly to Hz over Tailscale |
| Hz Server | ✅ HEALTHY | Receiving scans at ~5s intervals |
| Nautilus Trader | ✅ RUNNING | Processing scans, 0 lag |
| Scan Bridge | ✅ RUNNING | Legacy backup (unused) |

---

## Recent Changes

### 1. NG7 Direct Hz Write (Primary)
- **Before:** Arrow → SMB → Scan Bridge → Hz (~5-60s lag)
- **After:** NG7 → Hz direct (~67ms network + ~55ms processing)
- **Result:** Up to 400-500x faster (against the ~60s worst case; ~40x against the ~5s best case), real-time sync

### 2. Supervisord Migration
- Migrated `nautilus_trader` and `scan_bridge` from systemd to supervisord
- Config: `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf`
- Status: `supervisorctl -c ... status`

### 3. Bug Fix: file_mtime
- **Issue:** Nautilus dedup failed (missing `file_mtime` field)
- **Fix:** Added NG7 compatibility fallback using `timestamp`
- **Location:** `nautilus_event_trader.py` line ~320

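A minimal sketch of the fallback described above, assuming the scan arrives as a dict and that dedup compares a single freshness value against the last one processed. The helper names `dedup_key` and `is_duplicate` and the payload shape are hypothetical; the actual logic lives in `nautilus_event_trader.py`.

```python
from typing import Any, Mapping, Optional


def dedup_key(scan: Mapping[str, Any]) -> Optional[Any]:
    """Return the value used to detect duplicate scans.

    Legacy (Scan Bridge) payloads carry `file_mtime`; NG7 direct writes
    do not, so fall back to `timestamp`. Hypothetical helper.
    """
    mtime = scan.get("file_mtime")
    if mtime is not None:
        return mtime
    return scan.get("timestamp")


def is_duplicate(scan: Mapping[str, Any], last_key: Optional[Any]) -> bool:
    """A scan is a duplicate when its key matches the last processed key."""
    key = dedup_key(scan)
    return key is not None and key == last_key
```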
---

## Test Results

### Latency Benchmark
```
Network (Tailscale):    ~67ms   (52% of total)
Engine processing:      ~55ms   (42% of total)
Total end-to-end:       ~130ms
Sync quality:           0 lag (100% in-sync)
```

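The shares in the benchmark follow directly from the measured components, and the same figures give the speedup over the previous ~5-60s Scan Bridge lag. A quick check using only numbers from this document (the variable names are illustrative):

```python
# Figures taken from the latency benchmark; names are illustrative.
network_ms = 67.0
engine_ms = 55.0
total_ms = 130.0

network_share = network_ms / total_ms                 # fraction of end-to-end latency
engine_share = engine_ms / total_ms
unattributed_ms = total_ms - network_ms - engine_ms   # remainder not broken down in the benchmark

print(f"network: {network_share:.0%}, engine: {engine_share:.0%}, other: {unattributed_ms:.0f}ms")

# Speedup relative to the old path's best-case and worst-case lag
for old_lag_s in (5, 60):
    print(f"{old_lag_s}s -> {old_lag_s * 1000 / total_ms:.0f}x faster")
```

The ~8ms remainder is simply the part of the total that the benchmark does not attribute to either component.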
### Scan Statistics (Current)
```
Hz latest scan:         #1803 (at time of writing; ~56K now)
Engine last scan:       #1803 (now closer to 56K)
Scans processed:        1674 (now close or equal to the Engine/Hz latest)
Bar index:              1613
Capital:                $25,000 (~$26K after the latest tests)
Posture:                APEX
```

### Integrity Checks
- ✅ NG7 metadata present
- ✅ Eigenvalue tracking active
- ✅ Pricing data (50 symbols)
- ✅ Multi-window results
- ✅ Byte-for-byte Hz/disk congruence

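The last check in the list can be sketched as a direct byte comparison between the payload read back from Hz and the backup copy on disk. This is illustrative only: `payloads_congruent` and `digest` are hypothetical helpers, the `scan_number` field is a stand-in, and how the two byte strings are obtained is left to the caller.

```python
import hashlib


def payloads_congruent(hz_payload: bytes, disk_payload: bytes) -> bool:
    """True only if both payloads are identical byte for byte."""
    return hz_payload == disk_payload


def digest(payload: bytes) -> str:
    """SHA-256 hex digest, handy for logging which side differs."""
    return hashlib.sha256(payload).hexdigest()


hz_bytes = b'{"scan_number": 1803}'      # stand-in for the value read from Hz
disk_bytes = b'{"scan_number": 1803}'    # stand-in for the backup file contents
if not payloads_congruent(hz_bytes, disk_bytes):
    print("mismatch:", digest(hz_bytes), digest(disk_bytes))
```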
---

## Architecture

```
NG7 (Windows) ──Tailscale──→ Hz (Linux) ──→ Nautilus
     │                        │
     └────Disk (backup)───────┘
```

**Bottleneck:** Network RTT (~67ms) - physics-limited, so effectively optimal.

---

## Commands

```bash
# Status
supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf status

# Hz check (print the latest eigen scan)
python3 -c "import json, hazelcast; c=hazelcast.HazelcastClient(cluster_name='dolphin',cluster_members=['localhost:5701']); print(json.loads(c.get_map('DOLPHIN_FEATURES').get('latest_eigen_scan').result())); c.shutdown()"

# Logs
tail -50 /mnt/dolphinng5_predict/prod/supervisor/logs/nautilus_trader.log
```

---

## Notes

- Network latency (~67ms) is the dominant factor - expected for EU→Sweden
- Engine processing (~55ms) is secondary
- 0 scan lag = optimal sync achieved
- MHS disabled to prevent restart loops

---

## System Recovery - 2026-03-26 08:00 UTC

**Issue:** System extremely sluggish, terminal locked, load average 16.6+

### Root Causes

| Issue | Details |
|-------|---------|
| Zombie Process Storm | 12,385 zombie `timeout` processes from Hazelcast healthcheck |
| Hung CIFS Mounts | DolphinNG6 shares (3 mounts) unresponsive from `100.119.158.61` |
| Stuck Process | `grep -ri` scanning `/mnt` in D-state for 24+ hours |
| I/O Wait | 38% wait time from blocked SMB operations |

### Actions Taken

1. **Killed stuck processes:**
   - `grep -ri` (PID 101907) - unlocked terminal
   - `meta_health_daemon_v2.py` (PID 224047) - D-state cleared
   - Stuck `ls` processes on CIFS mounts

2. **Cleared zombie processes:**
   - Killed Hazelcast parent (PID 2049)
   - Lazy unmounted 3 hung CIFS shares
   - Zombie count: 12,385 → 3

3. **Fixed Hazelcast zombie leak:**
   - Added `init: true` to `docker-compose.yml`
   - Recreated container with tini init system
   - Healthcheck `timeout` processes now properly reaped

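To track the zombie count before and after a cleanup like this, process states can be read from `/proc` on Linux. A minimal sketch, assuming a Linux host with procfs mounted; `parse_state` and `count_zombies` are illustrative names, not part of the existing tooling.

```python
import os


def parse_state(stat_text: str) -> str:
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_zombies(proc_root: str = "/proc") -> int:
    """Count processes currently in state 'Z' (zombie)."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                if parse_state(fh.read()) == "Z":
                    zombies += 1
        except (OSError, IndexError):
            continue  # process exited, is inaccessible, or stat was malformed
    return zombies


if __name__ == "__main__" and os.path.isdir("/proc"):
    print("zombies:", count_zombies())
```

Running this periodically, or alerting when the count exceeds a threshold, would surface a leak like the healthcheck one long before it reaches five digits.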
### Results

| Metric | Before | After |
|--------|--------|-------|
| Load Average | 16.6+ | 2.72 |
| Zombie Processes | 12,385 | 3 (stable) |
| I/O Wait | 38% | 0% |
| Total Tasks | 12,682 | 352 |
| System Response | Timeout | <100ms |

### Docker Compose Fix

```yaml
# /mnt/dolphinng5_predict/prod/docker-compose.yml
services:
  hazelcast:
    image: hazelcast/hazelcast:5.3
    init: true  # Added: enables proper zombie reaping
    # ... rest of config
```

### Current Status

| Component | Status | Notes |
|-----------|--------|-------|
| Hazelcast | ✅ Healthy | `init: true`, zombie reaping working |
| Hz Management Center | ✅ Up 36h | Stable |
| Prefect Server | ✅ Up 36h | Stable |
| CIFS Mounts | ⚠️ Partial | Only DolphinNG5_Predict mounted |
| System Performance | ✅ Normal | Responsive, low latency |

### CIFS Mount Status

```bash
# Currently mounted:
//100.119.158.61/DolphinNG5_Predict on /mnt/dolphinng5_predict

# Unmounted (server unresponsive):
//100.119.158.61/DolphinNG6
//100.119.158.61/DolphinNG6_Data
//100.119.158.61/DolphinNG6_Data_New
//100.119.158.61/Vids
```

**Note:** The DolphinNG6 server at `100.119.158.61` does not respond to new mount attempts. DolphinNG5_Predict remains operational.

---

**Last Updated:** 2026-03-26 08:15 UTC
**Status:** ✅ OPERATIONAL (post-recovery)

---

## NG8 Development Status - 2026-03-26 09:00 UTC

### Objective
Create a performance-optimized NG8 engine with **exact 512-bit numerical equivalence** to NG7.

### Discovery

NG7 code found on the `DolphinNG6` share (Windows SMB mount):
- `enhanced_main.py` - Entry point
- `dolphin_correlation_arb512_with_eigen_tracking.py` - Core eigenvalue engine (1,586 lines)
  - Uses **`python-flint`** (Arb library) for 512-bit precision
  - Power iteration + Rayleigh quotient algorithm
  - Multi-window support: [50, 150, 300, 750]

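For readers unfamiliar with the method named above, here is a minimal sketch of power iteration with a Rayleigh quotient estimate. It uses plain Python floats purely for illustration, whereas NG7 runs the computation in 512-bit Arb arithmetic via `python-flint`. The function name, tolerance, iteration cap, and starting vector are assumptions, not taken from the NG7 source.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def power_iteration(a: Matrix, max_iter: int = 1000, tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Estimate the dominant eigenpair of a symmetric matrix (float64 stand-in)."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n          # unit-norm starting vector (assumed choice)
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Multiply: w = A v
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v                  # v lies in the null space of A
        v = [x / norm for x in w]
        # Rayleigh quotient v^T A v (v has unit norm, so no denominator needed)
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        estimate = sum(v[i] * av[i] for i in range(n))
        if abs(estimate - eigenvalue) < tol:
            return estimate, v
        eigenvalue = estimate
    return eigenvalue, v


corr = [[1.0, 0.5], [0.5, 1.0]]           # toy 2x2 correlation matrix
value, vector = power_iteration(corr)
print(round(value, 6))
```

For this toy matrix the eigenvalues are 1.5 and 0.5, so the estimate converges to 1.5. In an equivalence test the float64 result would only be a sanity bound; bit-exact comparison has to happen on the Arb outputs.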
### NG8 Architecture

**CRITICAL: ZERO algorithmic changes to 512-bit paths**

| Component | NG7 | NG8 | Change |
|-----------|-----|-----|--------|
| 512-bit library | python-flint | python-flint | None |
| Eigenvalue algorithm | Power iteration | Power iteration | None |
| Correlation calc | Arb O(n³) | Arb O(n³) | None |
| Price validation | Python | Numba float64 | Optimized |
| Output format | JSON/Arrow | JSON/Arrow | None |

### Files Created

| Path | Purpose |
|------|---------|
| `/mnt/dolphinng5_predict/- Dolphin NG8/ng8_core_optimized.py` | Main engine (EXACT algorithm) |
| `/mnt/dolphinng5_predict/- Dolphin NG8/ng8_equivalence_test.py` | Test harness |
| `/mnt/dolphinng5_predict/- Dolphin NG8/README_NG8.md` | Documentation |

### Hz Temp Namespace

NG8 writes to `NG8_TEMP_FEATURES` for safe testing without affecting NG7 production.

### Safety

- Original NG7 code: **Untouched** (in `/mnt/dolphinng6/`)
- Production system: **Unaffected**
- Rollback: **Immediate** (NG7 still running)

### Status

🔄 **In Development** - Equivalence testing pending

---

**Last Updated:** 2026-03-26 09:00 UTC

---

## TODO - Backlog (2026-03-30)

### MHS "Threesome Test" - scan payload quality checks
Add to `meta_health_service_v3.py` (or a dedicated health check):
1. `latest_eigen_scan` has an `assets` list with len > 0 AND a `vel_div` field that is present and finite
2. `engine_snapshot` age < 120s when `nautilus_trader` is RUNNING
3. `scans_processed` counter increments between consecutive polls (monotonicity check)

Alert level: WARN on any failure; CRITICAL if all three fail simultaneously.

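The three checks and the alert rule could be sketched as below. This is only an outline under stated assumptions: the function names are hypothetical, the snapshot age is assumed to be derived from an epoch-seconds timestamp, and fetching the values from Hz and supervisord is left out.

```python
import math
from typing import Any, List, Mapping, Optional


def check_scan_payload(scan: Mapping[str, Any]) -> bool:
    """Check 1: non-empty `assets` list and a present, finite `vel_div`."""
    assets = scan.get("assets")
    vel_div = scan.get("vel_div")
    return (
        isinstance(assets, list) and len(assets) > 0
        and isinstance(vel_div, (int, float)) and not isinstance(vel_div, bool)
        and math.isfinite(vel_div)
    )


def check_snapshot_age(snapshot_ts: float, now: float, trader_running: bool,
                       max_age_s: float = 120.0) -> bool:
    """Check 2: snapshot younger than max_age_s; only enforced while the trader runs."""
    return (not trader_running) or (now - snapshot_ts) < max_age_s


def check_counter_advanced(previous: Optional[int], current: int) -> bool:
    """Check 3: `scans_processed` grew since the previous poll (first poll passes)."""
    return previous is None or current > previous


def alert_level(results: List[bool]) -> str:
    """WARN on any failure, CRITICAL when every check fails, OK otherwise."""
    failures = results.count(False)
    if failures == 0:
        return "OK"
    return "CRITICAL" if failures == len(results) else "WARN"


results = [
    check_scan_payload({"assets": ["BTCUSDT"], "vel_div": -0.02}),
    check_snapshot_age(snapshot_ts=1000.0, now=1030.0, trader_running=True),
    check_counter_advanced(previous=1673, current=1674),
]
print(alert_level(results))
```

Keeping each check a pure function of already-fetched values makes them easy to unit-test without a live Hz cluster.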
### OBF "vs. wall clock drift detect"
In `obf_universe_service.py` or MHS: alert when the `timestamp` of any `asset_{symbol}_ob` key is more than N seconds behind the wall clock (N = 30s suggested).

Detects: WS stream stall, OBF service crash, Hz put failure.

Can spot-check a fixed set of liquid assets (BTC, ETH, SOL, BNB) rather than all 500+.

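A sketch of the drift check, assuming the `timestamp` values are epoch seconds and have already been read from the `asset_{symbol}_ob` entries into a dict. The function name, the symbol naming in `SPOT_CHECK`, and the choice to treat a missing entry as stale are all assumptions.

```python
import time
from typing import Dict, Iterable, List, Optional

SPOT_CHECK = ("BTCUSDT", "ETHUSDT", "SOLUSDT", "BNBUSDT")  # assumed symbol naming


def stale_assets(timestamps: Dict[str, float], symbols: Iterable[str] = SPOT_CHECK,
                 max_lag_s: float = 30.0, now: Optional[float] = None) -> List[str]:
    """Return symbols whose order-book timestamp lags the wall clock by more than max_lag_s.

    `timestamps` maps symbol -> epoch seconds taken from the `asset_{symbol}_ob` entry.
    A missing symbol is reported as stale, since it may indicate a failed put.
    """
    now = time.time() if now is None else now
    stale = []
    for symbol in symbols:
        ts = timestamps.get(symbol)
        if ts is None or (now - ts) > max_lag_s:
            stale.append(symbol)
    return stale


now = 1_000_000.0
observed = {"BTCUSDT": now - 2, "ETHUSDT": now - 45, "SOLUSDT": now - 10}
print(stale_assets(observed, now=now))   # ETHUSDT is 45s behind; BNBUSDT is missing
```

Passing `now` explicitly keeps the check deterministic in tests; in the service it would default to the current wall clock.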