initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
119
prod/services/ARCHITECTURE_CHOICE.md
Executable file
@@ -0,0 +1,119 @@
# Service Architecture Options

## Option 1: Single Supervisor (Recommended for You)

**One systemd service → Manages multiple internal components**

```
dolphin-supervisor.service
├── ExF Component (thread)
├── OB Component (thread)
├── Watchdog Component (thread)
└── MC Component (thread)
```

**Pros:**
- One systemd unit to manage
- Components share memory directly, with no IPC layer
- Centralized health monitoring
- The supervisor restarts failed components individually
- Lower system overhead

**Cons:**
- Single point of failure: if the supervisor process crashes, all components stop
- Less isolation between components

**Use when:** Components are tightly coupled and share data

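The thread-per-component layout with per-component restart can be sketched roughly as follows. This is a minimal illustration, not the actual supervisor shipped in this repository: the `Supervisor` class, its `max_restarts` and `backoff_s` parameters, and the convention that each component is a callable taking a stop event are all assumptions made for the example.

```python
import threading
from typing import Callable, Dict

# A component is any callable that runs until the stop event is set.
Component = Callable[[threading.Event], None]


class Supervisor:
    """Hypothetical sketch: one thread per component, restarted if it raises."""

    def __init__(self, max_restarts: int = 3, backoff_s: float = 0.0) -> None:
        self.max_restarts = max_restarts
        self.backoff_s = backoff_s
        self.stop_event = threading.Event()
        self.restarts: Dict[str, int] = {}
        self._threads: Dict[str, threading.Thread] = {}

    def add(self, name: str, target: Component) -> None:
        """Register a component; nothing runs until start() is called."""
        self.restarts[name] = 0
        self._threads[name] = threading.Thread(
            target=self._run, args=(name, target), name=name, daemon=True
        )

    def _run(self, name: str, target: Component) -> None:
        # Re-invoke the component after a crash, up to max_restarts times.
        while not self.stop_event.is_set():
            try:
                target(self.stop_event)
                return  # the component exited cleanly
            except Exception:
                if self.restarts[name] >= self.max_restarts:
                    return  # give up on this component only
                self.restarts[name] += 1
                # Sleep for the backoff, waking early if shutdown is requested.
                self.stop_event.wait(self.backoff_s)

    def start(self) -> None:
        for thread in self._threads.values():
            thread.start()

    def stop(self, timeout: float = 5.0) -> None:
        """Ask every component to stop, then wait for its thread to finish."""
        self.stop_event.set()
        for thread in self._threads.values():
            if thread.is_alive():
                thread.join(timeout)
```

A crash inside one component only restarts that component's loop, but an unhandled failure of the process itself (for example a segfault in a native extension) still takes every thread down, which is the trade-off listed under Cons.
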
**Commands:**
```bash
systemctl --user start dolphin-supervisor
journalctl --user -u dolphin-supervisor -f
```

---

## Option 2: Multiple Separate Services
**Each component = separate systemd service**

```
dolphin-exf.service
└── ExF Component

dolphin-ob.service
└── OB Component

dolphin-watchdog.service
└── Watchdog Component
```

**Pros:**
- Full isolation between components
- Independent restart/failure domains
- Can set different resource limits per service
- systemd handles restarts, logging, and dependencies for each unit

**Cons:**
- More systemd units to manage
- Higher memory overhead (separate processes)
- IPC needed for shared data

**Use when:** Components are independent or need strong isolation

**Commands:**
```bash
./service_manager.py start
./service_manager.py status
```

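With separate units, a wrapper such as `service_manager.py` mostly fans one action out to every unit. The sketch below shows one way that could look; it is not the contents of the real script. The helper names, the set of allowed actions, and the use of `--user` units are assumptions, and only the unit names come from the diagram above.

```python
import subprocess
from typing import List, Sequence

# Unit names taken from the Option 2 diagram.
UNITS = ("dolphin-exf", "dolphin-ob", "dolphin-watchdog")
ALLOWED_ACTIONS = {"start", "stop", "restart", "status"}


def systemctl_command(action: str, unit: str) -> List[str]:
    """Build the argv for one unit (assumes user-level systemd units)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["systemctl", "--user", action, f"{unit}.service"]


def run_all(
    action: str, units: Sequence[str] = UNITS, dry_run: bool = True
) -> List[List[str]]:
    """Apply one action to every unit; with dry_run=True nothing is executed."""
    commands = [systemctl_command(action, unit) for unit in units]
    if not dry_run:
        for command in commands:
            # check=False: `systemctl status` exits non-zero for stopped units.
            subprocess.run(command, check=False)
    return commands
```

Defaulting to `dry_run=True` keeps the example safe to run on a machine where these units are not installed.
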
---

## Option 3: Hybrid (Single Supervisor + Critical Services Separate)

```
dolphin-supervisor.service
├── ExF Component
├── OB Component
└── MC Component (scheduled)

dolphin-watchdog.service (separate - critical!)
└── Watchdog Component
```

**Use when:** One component is critical or safety-related and must keep running even if the supervisor fails

---

## Recommendation

For your Dolphin system, **Option 1 (Single Supervisor)** is likely best because:

1. **Tight coupling**: ExF, OB, and Watchdog all need Hazelcast
2. **Data sharing**: Components share state via memory
3. **Simplicity**: One command to start/stop everything
4. **Resource efficiency**: Lower overhead than separate processes

The supervisor handles:
- Auto-restart of failed components
- Health monitoring
- Structured logging
- Graceful shutdown

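Three of the responsibilities listed above (health monitoring, structured logging, graceful shutdown) can be sketched in a few lines. This is an illustration under assumptions rather than the supervisor's actual code: `HealthRegistry`, `JsonFormatter`, `install_shutdown_handlers`, and the heartbeat model are hypothetical names and design choices. The one external fact relied on is that `systemctl stop` sends SIGTERM by default.

```python
import json
import logging
import signal
import threading
import time
from typing import Dict, List, Optional


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so journald output is easy to parse."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            # "component" is an assumed extra field set via logger `extra=`.
            "component": getattr(record, "component", "supervisor"),
            "message": record.getMessage(),
        }
        return json.dumps(payload, sort_keys=True)


class HealthRegistry:
    """Tracks the last heartbeat per component and reports stale ones."""

    def __init__(self, stale_after_s: float) -> None:
        self.stale_after_s = stale_after_s
        self._last_beat: Dict[str, float] = {}
        self._lock = threading.Lock()

    def beat(self, name: str, now: Optional[float] = None) -> None:
        with self._lock:
            self._last_beat[name] = time.monotonic() if now is None else now

    def unhealthy(self, now: Optional[float] = None) -> List[str]:
        current = time.monotonic() if now is None else now
        with self._lock:
            return sorted(
                name
                for name, last in self._last_beat.items()
                if current - last > self.stale_after_s
            )


def install_shutdown_handlers(stop_event: threading.Event) -> None:
    """Set stop_event on SIGTERM/SIGINT; must be called from the main thread."""

    def _handler(signum, frame) -> None:
        stop_event.set()

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, _handler)
```

Components would call `beat()` from their own loops and check the shared stop event between iterations, so a stop request lets each one finish its current step before the process exits.
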
---

## Quick Start: Single Supervisor

```bash
# 1. Enable and start
cd /mnt/dolphinng5_predict/prod/services
systemctl --user enable dolphin-supervisor
systemctl --user start dolphin-supervisor

# 2. Check status
systemctl --user status dolphin-supervisor

# 3. View logs
journalctl --user -u dolphin-supervisor -f

# 4. Stop
systemctl --user stop dolphin-supervisor
```