initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
195
prod/services/README.md
Executable file
@@ -0,0 +1,195 @@
# Dolphin Userland Services

**Server-grade service management without root!** Uses `systemd --user` for reliability.

## 🚀 Quick Start

```bash
# Check status
./service_manager.py status

# Start all services
./service_manager.py start

# View logs
./service_manager.py logs exf -f
```

## 📋 Service Overview

| Service | File | Description | Interval |
|---------|------|-------------|----------|
| **exf** | `dolphin-exf.service` | External Factors (aggressive) | 0.5 s |
| **ob** | `dolphin-ob.service` | Order Book Streamer | 0.5 s |
| **watchdog** | `dolphin-watchdog.service` | Survival Stack | 10 s |
| **mc** | `dolphin-mc.timer` | MC-Forewarner | 4 h |

## 🔧 Service Manager Commands

```bash
# Status
./service_manager.py status              # All services
./service_manager.py status exf          # Specific service

# Control
./service_manager.py start               # Start all
./service_manager.py stop                # Stop all
./service_manager.py restart exf         # Restart specific

# Logs
./service_manager.py logs exf            # Last 50 lines
./service_manager.py logs exf -f         # Follow
./service_manager.py logs exf -n 100     # Last 100 lines

# Auto-start on boot
./service_manager.py enable              # Enable all
./service_manager.py disable             # Disable all

# After editing .service files
./service_manager.py reload              # Reload systemd
```

## 🏗️ Creating a New Service

### Option 1: Full Service Base (Recommended)

```python
#!/usr/bin/env python3
import asyncio

from services.service_base import ServiceBase


class MyService(ServiceBase):
    def __init__(self):
        super().__init__(
            name='my-service',
            check_interval=30,
            max_retries=3,
            notify_systemd=True,
        )

    async def run_cycle(self):
        # Your logic here; do_work() is a placeholder for your own coroutine
        await do_work()
        await asyncio.sleep(1)  # Cycle interval

    async def health_check(self) -> bool:
        # Optional: custom health check
        return True


if __name__ == '__main__':
    MyService().run()
```

Create the systemd user unit file:

```bash
# Make sure the user unit directory exists
mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/dolphin-my.service << 'SERVICEFILE'
[Unit]
Description=My Service
After=network.target

[Service]
Type=notify
ExecStart=/usr/bin/python3 /path/to/my_service.py
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target
SERVICEFILE

# Enable and start
systemctl --user daemon-reload
systemctl --user enable dolphin-my.service
systemctl --user start dolphin-my.service
```

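The unit above uses `Type=notify`, so systemd treats the service as started only once the process reports readiness over the socket named in the `NOTIFY_SOCKET` environment variable. How `ServiceBase` handles this when `notify_systemd=True` is not shown here; the following is a minimal standard-library sketch of the notification protocol itself, with `sd_notify` as a hypothetical helper name.

```python
import os
import socket


def sd_notify(state: str = "READY=1") -> bool:
    """Send a state string to systemd; return False when not run under it.

    Hypothetical helper, not part of ServiceBase.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with Type=notify
    if path.startswith("@"):
        path = "\0" + path[1:]  # abstract-namespace socket (Linux)
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True
```

A service would call `sd_notify("READY=1")` once its startup work is done. Outside systemd the call is a no-op that returns `False`, so the same script still runs from a terminal.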
### Option 2: Simple Scheduled Task

```python
from services.service_base import run_scheduled


def my_task():
    print("Running...")


run_scheduled(my_task, interval_seconds=60, name='my-task')
```

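To make the scheduling behaviour concrete, here is a rough stand-in for what a helper like `run_scheduled` could do. It is a sketch under assumptions, not the code in `services.service_base`: the name `run_scheduled_sketch` and the `max_runs` parameter are hypothetical, and the real helper may handle errors and timing differently.

```python
import time
from typing import Callable, Optional


def run_scheduled_sketch(
    task: Callable[[], None],
    interval_seconds: float,
    name: str = "task",
    max_runs: Optional[int] = None,  # hypothetical: lets a caller stop the loop
) -> int:
    """Call `task` every `interval_seconds`; return the number of runs."""
    runs = 0
    next_run = time.monotonic()
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception as exc:  # keep the schedule alive if one run fails
            print(f"[{name}] run failed: {exc!r}")
        runs += 1
        # Schedule against the monotonic clock so slow tasks do not add drift
        next_run += interval_seconds
        time.sleep(max(0.0, next_run - time.monotonic()))
    return runs
```

With `max_runs=None` the loop runs until the process is stopped, which matches how a long-lived service would use it.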
## 📊 Features

### Automatic
- **Restart on crash**: Services auto-restart with backoff
- **Health checks**: Built-in monitoring
- **Structured logging**: JSON to systemd journal
- **Resource limits**: Memory/CPU quotas
- **Graceful shutdown**: SIGTERM handling

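Graceful shutdown in the list above means that a `SIGTERM` from systemd lets the current cycle finish instead of killing it midway. The sketch below shows one common way to do that with `asyncio`; `run_until_stopped` and `main` are hypothetical names, and the actual handling inside `ServiceBase` may differ.

```python
import asyncio
import signal
from typing import Awaitable, Callable


async def run_until_stopped(
    cycle: Callable[[], Awaitable[None]],
    stop: asyncio.Event,
) -> int:
    """Run `cycle` repeatedly until `stop` is set; return completed cycles."""
    completed = 0
    while not stop.is_set():
        await cycle()  # the current cycle always finishes before exit
        completed += 1
    return completed


async def main(cycle: Callable[[], Awaitable[None]]) -> int:
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    # systemd sends SIGTERM on stop; SIGINT covers Ctrl-C in a terminal.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)  # Unix event loops only
    return await run_until_stopped(cycle, stop)
```

The signal handler only sets a flag, so cleanup code after the loop runs in the normal control flow rather than inside a signal context.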
### Retry Logic (Tenacity)
```python
@ServiceBase.retry_with_backoff
async def fetch_data(self):
    # Automatically retries with exponential backoff
    pass
```

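For readers unfamiliar with exponential backoff, the following standard-library sketch illustrates the idea. It is not `ServiceBase.retry_with_backoff` and does not use Tenacity; the name `retry_with_backoff_sketch` and its parameters are assumptions chosen for illustration.

```python
import asyncio
import functools


def retry_with_backoff_sketch(max_retries: int = 3, base_delay: float = 0.5,
                              factor: float = 2.0):
    """Retry an async function, multiplying the wait by `factor` each time."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: surface the last error
                    await asyncio.sleep(delay)
                    delay *= factor
        return wrapper
    return decorator
```

With the defaults, a failing call waits 0.5 s, 1 s, then 2 s between attempts before the final error is re-raised.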
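### Health Check Endpoint
Services expose their health via Hazelcast or a file. For example, a check that reports healthy only while the last update is less than 2 seconds old:
```python
import time

async def health_check(self) -> bool:
    # Healthy only if the last update happened within the past 2 seconds
    return self.last_update > time.time() - 2.0
```
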
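The file-based variant can be pictured as a heartbeat file that the service rewrites on every update and that a monitor reads back. The sketch below assumes a plain-text file holding a Unix timestamp; the function names and file format are hypothetical and may not match what the services actually write.

```python
import os
import time
from pathlib import Path


def write_heartbeat(path: Path) -> None:
    """Record 'alive now' by writing the current time, atomically."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(f"{time.time():.3f}\n")
    os.replace(tmp, path)  # atomic rename: readers never see a partial file


def is_healthy(path: Path, max_age_seconds: float = 2.0) -> bool:
    """Healthy if the heartbeat exists and is newer than max_age_seconds."""
    try:
        last_update = float(path.read_text())
    except (OSError, ValueError):
        return False  # a missing or unreadable file counts as unhealthy
    return last_update > time.time() - max_age_seconds
```

Treating a missing file as unhealthy means a service that never started is reported the same way as one that stalled.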
## 📝 Logging

All services log structured JSON:
```json
{
  "timestamp": "2024-03-25T15:30:00",
  "level": "INFO",
  "service": "exf",
  "message": "Indicators updated"
}
```

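A record in this shape can be produced with the standard `logging` module alone. The sketch below is one way to do it, assuming the four fields shown above; `JsonFormatter` and `make_logger` are hypothetical names, not the logging setup shipped with the services.

```python
import json
import logging
import sys
from datetime import datetime


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created)
                                 .isoformat(timespec="seconds"),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(service: str) -> logging.Logger:
    logger = logging.getLogger(f"dolphin.{service}")
    # With StandardOutput=journal, stdout lines end up in the journal
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger
```

Calling `make_logger("exf").info("Indicators updated")` then prints a single JSON line with the four fields.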
View logs:
```bash
# All services
journalctl --user -f

# Specific service
journalctl --user -u dolphin-exf -f
```

## 🔍 Monitoring

```bash
# Service status
systemctl --user status

# Resource usage
systemctl --user show dolphin-exf --property=MemoryCurrent,CPUUsageNSec

# Recent failures
systemctl --user --failed
```

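The resource-usage command prints one `KEY=VALUE` line per property, which is easy to consume from a script. The sketch below parses that output into a dictionary; the function names are hypothetical, and values that are not plain integers (systemd prints `[not set]` when accounting is unavailable) are mapped to `None`.

```python
import subprocess
from typing import Dict, Optional


def parse_show_output(text: str) -> Dict[str, Optional[int]]:
    """Parse KEY=VALUE lines; non-numeric values become None."""
    result: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        result[key] = int(value) if value.isdigit() else None
    return result


def unit_resources(unit: str) -> Dict[str, Optional[int]]:
    """Query one user unit; requires a running `systemd --user` instance."""
    out = subprocess.run(
        ["systemctl", "--user", "show", unit,
         "--property=MemoryCurrent,CPUUsageNSec"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_show_output(out)
```

For example, `unit_resources("dolphin-exf")` would return a dictionary with the keys `MemoryCurrent` (bytes) and `CPUUsageNSec` (nanoseconds).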
## 🛠️ Troubleshooting

| Issue | Solution |
|-------|----------|
| Service won't start | Check `journalctl --user -u dolphin-exf` |
| High memory usage | Adjust `MemoryMax=` in .service file |
| Restart loop | Check exit code: `systemctl --user status dolphin-exf` |
| Logs not showing | Ensure `StandardOutput=journal` |
| Permission denied | Service files must be in `~/.config/systemd/user/` |

## 🔄 Service Dependencies

```
exf -> hazelcast
ob -> hazelcast, exf
watchdog -> hazelcast, exf, ob
mc -> hazelcast (timer-triggered)
```

Configured via `After=` and `Wants=` in service files.
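
Reading each arrow as "depends on" (an assumption about the diagram), a valid start order can be derived with a topological sort. This is only an illustration of what the `After=` ordering achieves; systemd computes the order itself, and `DEPENDENCIES` and `start_order` are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service maps to the services it depends on (from the diagram above).
DEPENDENCIES = {
    "exf": {"hazelcast"},
    "ob": {"hazelcast", "exf"},
    "watchdog": {"hazelcast", "exf", "ob"},
    "mc": {"hazelcast"},
}


def start_order(deps):
    """Return one valid start order: every service after its dependencies."""
    return list(TopologicalSorter(deps).static_order())
```

Any order returned starts `hazelcast` first and `watchdog` after both `exf` and `ob`; the position of `mc` relative to the others is not fixed.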