initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
119
prod/services/ARCHITECTURE_CHOICE.md
Executable file
@@ -0,0 +1,119 @@
# Service Architecture Options

## Option 1: Single Supervisor (Recommended)

**One systemd service → manages multiple internal components**

```
dolphin-supervisor.service
├── ExF Component (thread)
├── OB Component (thread)
├── Watchdog Component (thread)
└── MC Component (thread)
```

**Pros:**
- One systemd unit to manage
- Components share memory efficiently
- Centralized health monitoring
- Built-in per-component restart
- Lower system overhead

**Cons:**
- Single process: if it crashes, all components stop
- Less isolation between components

**Use when:** components are tightly coupled and share data.

**Commands:**
```bash
systemctl --user start dolphin-supervisor
journalctl --user -u dolphin-supervisor -f
```

---

## Option 2: Multiple Separate Services

**Each component = a separate systemd service**

```
dolphin-exf.service
└── ExF Component

dolphin-ob.service
└── OB Component

dolphin-watchdog.service
└── Watchdog Component
```

**Pros:**
- Full isolation between components
- Independent restart/failure domains
- Different resource limits per service
- systemd handles restart, logging, and limits natively

**Cons:**
- More systemd units to manage
- Higher memory overhead (separate processes)
- IPC needed for shared data

**Use when:** components are independent and need strong isolation.

**Commands:**
```bash
./service_manager.py start
./service_manager.py status
```

---

## Option 3: Hybrid (Single Supervisor + Critical Services Separate)

```
dolphin-supervisor.service
├── ExF Component
├── OB Component
└── MC Component (scheduled)

dolphin-watchdog.service (separate - critical!)
└── Watchdog Component
```

**Use when:** one component is critical or safety-related.

---

## Recommendation

For your Dolphin system, **Option 1 (Single Supervisor)** is likely best because:

1. **Tight coupling**: ExF, OB, and Watchdog all need Hazelcast
2. **Data sharing**: components share state via memory
3. **Simplicity**: one command starts/stops everything
4. **Resource efficiency**: lower overhead than separate processes

The supervisor handles:
- Auto-restart of failed components
- Health monitoring
- Structured logging
- Graceful shutdown
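
The single-supervisor pattern can be sketched in a few lines of plain Python. Note this is an illustration only: the class and component names here are invented for the sketch, not the actual `supervisor.py` implementation.

```python
import threading


class Component:
    """One internal component, run on its own daemon thread."""

    def __init__(self, name, cycle, interval):
        self.name = name
        self.cycle = cycle        # callable executed on each tick
        self.interval = interval
        self.thread = None

    def _loop(self, stop):
        # Run the cycle until the shared stop event is set
        while not stop.is_set():
            self.cycle()
            stop.wait(self.interval)

    def start(self, stop):
        self.thread = threading.Thread(
            target=self._loop, args=(stop,), daemon=True, name=self.name)
        self.thread.start()


class DolphinSupervisor:
    """Starts all components and restarts any whose thread has died."""

    def __init__(self, components):
        self.components = components
        self.stop = threading.Event()

    def run(self, max_checks=None):
        # max_checks is added here only so the demo loop can terminate
        for c in self.components:
            c.start(self.stop)
        checks = 0
        while not self.stop.is_set():
            for c in self.components:
                if not c.thread.is_alive():   # component crashed -> restart it
                    c.start(self.stop)
            checks += 1
            if max_checks and checks >= max_checks:
                break
            self.stop.wait(0.2)


ticks = []
sup = DolphinSupervisor([Component('exf', lambda: ticks.append('exf'), 0.05)])
sup.run(max_checks=3)
sup.stop.set()
```

The key trade-off from the pros/cons above is visible here: restart is per-thread, but an interpreter-level crash takes every component down at once.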

---

## Quick Start: Single Supervisor

```bash
# 1. Enable and start
cd /mnt/dolphinng5_predict/prod/services
systemctl --user enable dolphin-supervisor
systemctl --user start dolphin-supervisor

# 2. Check status
systemctl --user status dolphin-supervisor

# 3. View logs
journalctl --user -u dolphin-supervisor -f

# 4. Stop
systemctl --user stop dolphin-supervisor
```
427
prod/services/INDUSTRIAL_FRAMEWORKS.md
Executable file
@@ -0,0 +1,427 @@
# Industrial-Grade Service Frameworks

## 🏆 Recommendation: Supervisor

**Supervisor** is the de facto standard for process management in Python deployments.

### Why Supervisor?
- ✅ **Battle-tested**: widely deployed in production systems
- ✅ **Mature**: 20+ years of development
- ✅ **Simple**: INI-style configuration
- ✅ **Reliable**: handles crashes, restarts, and logging automatically
- ✅ **Web UI**: built-in web interface for monitoring
- ✅ **API**: XML-RPC API for programmatic control

---

## Quick Start: Supervisor

```bash
# 1. Start supervisor and all services
cd /mnt/dolphinng5_predict/prod/supervisor
./supervisorctl.sh start

# 2. Check status
./supervisorctl.sh status

# 3. View logs
./supervisorctl.sh logs exf
./supervisorctl.sh logs ob_streamer

# 4. Restart a service
./supervisorctl.sh ctl restart exf

# 5. Stop everything
./supervisorctl.sh stop
```

---

## Alternative: Circus (Mozilla)

**Circus** is Mozilla's Python process and socket manager.

### Pros:
- ✅ Python-native (easier to extend)
- ✅ Built-in statistics (CPU, memory per process)
- ✅ Socket management
- ✅ Web dashboard

### Cons:
- ❌ Less widely used than Supervisor
- ❌ Smaller community

```bash
# Install
pip install circus

# Run
circusd circus.ini
```

---

## Alternative: Honcho (Python Foreman)

**Honcho** is a Python port of Ruby's Foreman.

### Pros:
- ✅ Very simple (Procfile-based)
- ✅ Good for development
- ✅ Easy to understand

### Cons:
- ❌ Fewer production features
- ❌ No auto-restart on crash

```bash
# Procfile
exf: python -m external_factors.realtime_exf_service
ob: python -m services.ob_stream_service
watchdog: python -m services.system_watchdog_service

# Run
honcho start
```

---

## Comparison Table

| Feature | Supervisor | Circus | Honcho | Custom Code |
|---------|-----------|--------|--------|-------------|
| Auto-restart | ✅ | ✅ | ❌ | ✅ (if built) |
| Web UI | ✅ | ✅ | ❌ | ❌ |
| Log rotation | ✅ | ✅ | ❌ | ⚠️ (manual) |
| Resource limits | ✅ | ✅ | ❌ | ⚠️ (partial) |
| API | ✅ XML-RPC | ✅ | ❌ | ❌ |
| Maturity | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐ |
| Ease of use | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ |

---

## Our Setup: Supervisor

**Location**: `/mnt/dolphinng5_predict/prod/supervisor/`

**Config**: `dolphin-supervisord.conf`

**Services managed**:
- `exf` - External Factors (0.5s)
- `ob_streamer` - Order Book (0.5s)
- `watchdog` - Survival Stack (10s)
- `mc_forewarner` - MC-Forewarner (4h)

**Features enabled**:
- Auto-restart with backoff
- Separate stdout/stderr logs
- Log rotation (50MB, 10 backups)
- Process groups
- Event listeners (alerts)

---

## Integration with Existing Code

Your existing service code works **unchanged** with Supervisor:

```python
# Your existing service (works with Supervisor)
class ExFService:
    def run(self):
        while True:
            self.fetch_indicators()
            self.push_to_hz()
            time.sleep(0.5)

# Supervisor handles:
# - Starting it
# - Restarting it if it crashes
# - Logging stdout/stderr
# - Monitoring
```

No code changes needed.

---

## Web Dashboard

Supervisor includes a web interface:

```ini
[inet_http_server]
port=0.0.0.0:9001
username=user
password=pass
```

Note that binding to `0.0.0.0` exposes the UI on all interfaces; use `127.0.0.1:9001` unless remote access is required.

Then visit: `http://localhost:9001`
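
The same `[inet_http_server]` endpoint also serves Supervisor's XML-RPC API. A minimal standard-library client sketch, using the placeholder credentials from the config above:

```python
from xmlrpc.client import ServerProxy


def make_client(url: str = "http://user:pass@localhost:9001/RPC2") -> ServerProxy:
    """Supervisor serves its XML-RPC API at /RPC2 on the inet_http_server port."""
    return ServerProxy(url)


def process_states(proxy: ServerProxy) -> dict:
    """Map each managed program to its state name ('RUNNING', 'FATAL', ...)."""
    # supervisor.getAllProcessInfo() returns one dict per managed program
    return {p["name"]: p["statename"] for p in proxy.supervisor.getAllProcessInfo()}
```

`proxy.supervisor.startProcess('exf')` and `stopProcess` work the same way; calls against a running supervisord raise `xmlrpc.client.Fault` for unknown program names.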

---

## Summary

| Use Case | Recommendation |
|----------|---------------|
| **Production trading system** | **Supervisor** ✅ |
| Development/testing | Honcho |
| Need sockets + stats | Circus |
| Maximum control | Custom + systemd |

We recommend **Supervisor** for Dolphin production.

---

# CHANGE LOG - All Modifications Made

## Session: 2026-03-25 (Current Session)

### 1. Supervisor Installation

**Command executed:**
```bash
pip install supervisor
```

**Result:** Supervisor 4.3.0 installed

---

### 2. Directory Structure Created

```
/mnt/dolphinng5_predict/prod/supervisor/
├── dolphin-supervisord.conf   # Main supervisor configuration
├── supervisorctl.sh           # Control wrapper script
├── logs/                      # Log directory (created)
└── run/                       # PID/socket directory (created)
```

---

### 3. Configuration File: dolphin-supervisord.conf

**Location:** `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf`

**Contents:**
- `[supervisord]` section with logging, pidfile, environment
- `[unix_http_server]` for supervisorctl communication
- `[rpcinterface:supervisor]` for the API
- `[supervisorctl]` client configuration
- `[program:exf]` - External Factors service (0.5s)
- `[program:ob_streamer]` - Order Book Streamer (0.5s)
- `[program:watchdog]` - Survival Stack Watchdog (10s)
- `[program:mc_forewarner]` - MC-Forewarner (4h)
- `[eventlistener:crashmail]` - Alert on crashes
- `[group:dolphin]` - Groups all programs

**Key settings:**
- `autostart=true` - All services start with supervisor
- `autorestart=true` - Auto-restart on crash
- `startretries=3` - 3 restart attempts
- `stdout_logfile_maxbytes=50MB` - Log rotation
- `rlimit_as=512MB` - Memory limit per service
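
As a hedged illustration of those settings, a `[program:exf]` section might look like the following (the command matches the Procfile shown earlier; directory and log paths are assumptions, and only standard Supervisor program options are shown, not the actual config):

```ini
[program:exf]
command=python -m external_factors.realtime_exf_service
directory=/mnt/dolphinng5_predict/prod
autostart=true
autorestart=true
startretries=3
stopsignal=TERM
stdout_logfile=/mnt/dolphinng5_predict/prod/supervisor/logs/exf.out.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10
stderr_logfile=/mnt/dolphinng5_predict/prod/supervisor/logs/exf.err.log
```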

---

### 4. Control Script: supervisorctl.sh

**Location:** `/mnt/dolphinng5_predict/prod/supervisor/supervisorctl.sh`

**Commands implemented:**
- `start` - Start supervisord and all services
- `stop` - Stop all services and supervisord
- `restart` - Restart all services
- `status` - Show service status
- `logs [service]` - Show logs (last 50 lines)
- `ctl [cmd]` - Pass through to supervisorctl

**Usage:**
```bash
./supervisorctl.sh start
./supervisorctl.sh status
./supervisorctl.sh logs exf
```

---

### 5. Python Libraries Installed

**Via pip:**
- `supervisor==4.3.0` - Main process manager
- `tenacity==9.1.4` - Retry logic (previously installed)
- `schedule==1.2.2` - Task scheduling (previously installed)

**System packages checked:**
- `supervisor.noarch` available via dnf (not installed; using pip)

---

### 6. Alternative Architectures (Previously Created)

#### 6.1 Custom Supervisor (Pure Python)

**Location:** `/mnt/dolphinng5_predict/prod/services/supervisor.py`

**Features:**
- `ServiceComponent` base class
- `DolphinSupervisor` manager
- Thread-based component management
- Built-in health monitoring
- Example components: ExF, OB, Watchdog, MC

**Status:** Available but NOT primary (Supervisor preferred)

---

#### 6.2 Systemd User Services

**Location:** `~/.config/systemd/user/`

**Files created:**
- `dolphin-exf.service` - External Factors
- `dolphin-ob.service` - Order Book
- `dolphin-watchdog.service` - Watchdog
- `dolphin-mc.service` + `dolphin-mc.timer` - MC-Forewarner
- `dolphin-supervisor.service` - Custom supervisor (optional)
- `dolphin-test.service` - Test service

**Control script:** `/mnt/dolphinng5_predict/prod/services/service_manager.py`

---

### 7. Service Base Class (Boilerplate)

**Location:** `/mnt/dolphinng5_predict/prod/services/service_base.py`

**Features:**
- `ServiceBase` abstract class
- Automatic retries with tenacity
- Structured JSON logging
- Health-check endpoints
- Graceful shutdown handling
- Systemd notify support
- `run_scheduled()` helper

**Status:** Available for custom implementations
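
The retry behaviour the base class wraps can be shown with a dependency-free sketch. This illustrates exponential backoff in plain Python; it is not the tenacity-based code in `service_base.py`, and the parameter values are examples only:

```python
import time
from functools import wraps


def retry_with_backoff(max_retries=3, base_delay=0.01, max_delay=1.0, sleep=time.sleep):
    """Retry a failing call, doubling the delay each attempt (capped at max_delay)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise          # out of attempts: surface the error
                    sleep(delay)
                    delay = min(delay * 2, max_delay)
        return wrapper
    return decorator


calls = []


@retry_with_backoff(max_retries=3)
def flaky():
    """Fails twice, then succeeds - typical transient-network behaviour."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"
```

tenacity's `retry(stop=stop_after_attempt(...), wait=wait_exponential(...))` provides the same shape with more policy options.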

---

### 8. Documentation Files Created

| File | Location | Purpose |
|------|----------|---------|
| `INDUSTRIAL_FRAMEWORKS.md` | `/mnt/dolphinng5_predict/prod/services/` | This document - framework comparison |
| `ARCHITECTURE_CHOICE.md` | `/mnt/dolphinng5_predict/prod/services/` | Architecture options comparison |
| `README.md` | `/mnt/dolphinng5_predict/prod/services/` | General services documentation |
| `dolphin-supervisord.conf` | `/mnt/dolphinng5_predict/prod/supervisor/` | Supervisor configuration |

---

### 9. kimi.json Updated

**Change:** Associated the session with the ops directory

**Before:**
```json
{
  "path": "/mnt/dolphinng5_predict/prod/ops",
  "kaos": "local",
  "last_session_id": null
}
```

**After:**
```json
{
  "path": "/mnt/dolphinng5_predict/prod/ops",
  "kaos": "local",
  "last_session_id": "c23a69c5-ba4a-41c4-8624-05114e8fd9ea"
}
```

---

### 10. Session Backup

**Session backed up:** `c23a69c5-ba4a-41c4-8624-05114e8fd9ea`
- **Original location:** `~/.kimi/sessions/9330f053b5f85e950222ed1fed8f6f02/`
- **Backup location 1:** `/mnt/dolphinng5_predict/prod/ops/kimi_session_backup/`
- **Backup location 2:** `/mnt/vids/`
- **Markdown transcript:** `KIMI_Session_Rearch_Services-Prefect.md` (684KB)

---

## Summary: What to Use

### For Production Trading System:

**Recommended: SUPERVISOR**
```bash
cd /mnt/dolphinng5_predict/prod/supervisor
./supervisorctl.sh start
./supervisorctl.sh status
```

**Why:** Battle-tested, 20+ years, web UI, API, log rotation

### For Simplicity / No Extra Deps:

**Alternative: SYSTEMD --user**
```bash
systemctl --user start dolphin-exf
systemctl --user start dolphin-ob
systemctl --user start dolphin-watchdog
```

**Why:** Built-in, no pip installs, OS-integrated

### For Full Control:

**Alternative: Custom Python**
```bash
systemctl --user start dolphin-supervisor  # Custom supervisor
```

**Why:** Educational, customizable, no external deps

---

## Files Modified/Created Summary

### New Directories:
1. `/mnt/dolphinng5_predict/prod/supervisor/`
2. `/mnt/dolphinng5_predict/prod/supervisor/logs/`
3. `/mnt/dolphinng5_predict/prod/supervisor/run/`
4. `/mnt/dolphinng5_predict/prod/ops/kimi_session_backup/`

### New Files:
1. `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf`
2. `/mnt/dolphinng5_predict/prod/supervisor/supervisorctl.sh`
3. `/mnt/dolphinng5_predict/prod/services/INDUSTRIAL_FRAMEWORKS.md` (this file)
4. `/mnt/dolphinng5_predict/prod/services/ARCHITECTURE_CHOICE.md`
5. `/mnt/dolphinng5_predict/prod/services/supervisor.py` (custom impl)
6. `/mnt/dolphinng5_predict/prod/services/service_base.py` (boilerplate)
7. `/mnt/dolphinng5_predict/prod/services/service_manager.py` (systemd ctl)
8. `/mnt/dolphinng5_predict/prod/ops/KIMI_Session_Rearch_Services-Prefect.md`
9. `/mnt/dolphinng5_predict/prod/ops/SESSION_INFO.txt`
10. `/mnt/dolphinng5_predict/prod/ops/resume_session.sh`

### Modified Files:
1. `~/.config/systemd/user/dolphin-*.service` (6 services)
2. `~/.config/systemd/user/dolphin-mc.timer`
3. `~/.kimi/kimi.json` (session association)

---

## Current Status

✅ **Supervisor 4.3.0** installed and configured
✅ **6 systemd user services** configured (backup option)
✅ **Custom supervisor** available (educational)
✅ **Service base class** with retries/logging (boilerplate)
✅ **All documentation** complete
✅ **Session backed up** to multiple locations

**Ready for:** Production deployment
195
prod/services/README.md
Executable file
@@ -0,0 +1,195 @@
# Dolphin Userland Services

**Server-grade service management without root!** Uses `systemd --user` for reliability.

## 🚀 Quick Start

```bash
# Check status
./service_manager.py status

# Start all services
./service_manager.py start

# View logs
./service_manager.py logs exf -f
```

## 📋 Service Overview

| Service | File | Description | Interval |
|---------|------|-------------|----------|
| **exf** | `dolphin-exf.service` | External Factors (aggressive) | 0.5s |
| **ob** | `dolphin-ob.service` | Order Book Streamer | 0.5s |
| **watchdog** | `dolphin-watchdog.service` | Survival Stack | 10s |
| **mc** | `dolphin-mc.timer` | MC-Forewarner | 4h |

## 🔧 Service Manager Commands

```bash
# Status
./service_manager.py status        # All services
./service_manager.py status exf    # Specific service

# Control
./service_manager.py start         # Start all
./service_manager.py stop          # Stop all
./service_manager.py restart exf   # Restart specific

# Logs
./service_manager.py logs exf         # Last 50 lines
./service_manager.py logs exf -f      # Follow
./service_manager.py logs exf -n 100  # Last 100 lines

# Auto-start on boot
./service_manager.py enable   # Enable all
./service_manager.py disable  # Disable all

# After editing .service files
./service_manager.py reload   # Reload systemd
```

## 🏗️ Creating a New Service

### Option 1: Full Service Base (Recommended)

```python
#!/usr/bin/env python3
import asyncio

from services.service_base import ServiceBase


class MyService(ServiceBase):
    def __init__(self):
        super().__init__(
            name='my-service',
            check_interval=30,
            max_retries=3,
            notify_systemd=True
        )

    async def run_cycle(self):
        # Your logic here
        await do_work()
        await asyncio.sleep(1)  # Cycle interval

    async def health_check(self) -> bool:
        # Optional: custom health check
        return True


if __name__ == '__main__':
    MyService().run()
```

Create a systemd service file:
```bash
cat > ~/.config/systemd/user/dolphin-my.service << 'SERVICEFILE'
[Unit]
Description=My Service
After=network.target

[Service]
Type=notify
ExecStart=/usr/bin/python3 /path/to/my_service.py
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=default.target
SERVICEFILE

# Enable and start
systemctl --user daemon-reload
systemctl --user enable dolphin-my.service
systemctl --user start dolphin-my.service
```

### Option 2: Simple Scheduled Task

```python
from services.service_base import run_scheduled


def my_task():
    print("Running...")


run_scheduled(my_task, interval_seconds=60, name='my-task')
```
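
`run_scheduled` itself lives in `service_base.py`; a minimal sketch of what such a helper does is shown below. The `max_cycles` argument is added here purely so the loop can terminate for demonstration (the real helper runs until interrupted):

```python
import time


def run_scheduled(task, interval_seconds=60, name='task', max_cycles=None):
    """Call task() every interval_seconds until interrupted (or max_cycles reached)."""
    cycles = 0
    try:
        while max_cycles is None or cycles < max_cycles:
            start = time.monotonic()
            task()
            cycles += 1
            # Sleep only for the remainder of the interval, so a slow task
            # does not drift the schedule further than it has to.
            elapsed = time.monotonic() - start
            time.sleep(max(0.0, interval_seconds - elapsed))
    except KeyboardInterrupt:
        print(f"{name}: stopped after {cycles} cycles")


ran = []
run_scheduled(lambda: ran.append(1), interval_seconds=0.01, name='demo', max_cycles=3)
```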

## 📊 Features

### Automatic
- **Restart on crash**: services auto-restart with backoff
- **Health checks**: built-in monitoring
- **Structured logging**: JSON to the systemd journal
- **Resource limits**: memory/CPU quotas
- **Graceful shutdown**: SIGTERM handling

### Retry Logic (Tenacity)
```python
@ServiceBase.retry_with_backoff
async def fetch_data(self):
    # Automatically retries with exponential backoff
    pass
```

### Health Check Endpoint
Services expose health via Hazelcast or a file:
```python
async def health_check(self) -> bool:
    return self.last_update > time.time() - 2.0
```

## 📝 Logging

All services log structured JSON:
```json
{
  "timestamp": "2024-03-25T15:30:00",
  "level": "INFO",
  "service": "exf",
  "message": "Indicators updated"
}
```

View logs:
```bash
# All services
journalctl --user -f

# Specific service
journalctl --user -u dolphin-exf -f
```

## 🔍 Monitoring

```bash
# Service status
systemctl --user status

# Resource usage
systemctl --user show dolphin-exf --property=MemoryCurrent,CPUUsageNSec

# Recent failures
systemctl --user --failed
```

## 🛠️ Troubleshooting

| Issue | Solution |
|-------|----------|
| Service won't start | Check `journalctl --user -u dolphin-exf` |
| High memory usage | Adjust `MemoryMax=` in the .service file |
| Restart loop | Check the exit code: `systemctl --user status dolphin-exf` |
| Logs not showing | Ensure `StandardOutput=journal` |
| Permission denied | Service files must be in `~/.config/systemd/user/` |

## 🔄 Service Dependencies

```
exf      -> hazelcast
ob       -> hazelcast, exf
watchdog -> hazelcast, exf, ob
mc       -> hazelcast (timer-triggered)
```

Configured via `After=` and `Wants=` in the service files.
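
For example, the `ob` unit's edge in the graph above could be expressed as follows. The `hazelcast.service` unit name is an assumption for illustration; only the `dolphin-*` names appear in this repo:

```ini
# ~/.config/systemd/user/dolphin-ob.service (dependency-related lines only)
[Unit]
Description=Dolphin Order Book Streamer
# After= orders startup; Wants= additionally pulls the listed units in.
# Wants= is a soft dependency: ob keeps running even if exf later stops.
After=hazelcast.service dolphin-exf.service
Wants=hazelcast.service dolphin-exf.service
```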
6
prod/services/__init__.py
Executable file
@@ -0,0 +1,6 @@
"""
Dolphin Services Package
"""
from .service_base import ServiceBase, ServiceHealth, get_logger, run_scheduled

__all__ = ['ServiceBase', 'ServiceHealth', 'get_logger', 'run_scheduled']
82
prod/services/example_exf_service.py
Executable file
@@ -0,0 +1,82 @@
#!/usr/bin/env python3
"""
Example: External Factors Service using ServiceBase
"""
import asyncio
import time

from service_base import ServiceBase, get_logger, run_scheduled


class ExFService(ServiceBase):
    """
    External Factors Service - 0.5s aggressive oversampling
    """
    def __init__(self):
        super().__init__(
            name='exf',
            check_interval=30,
            max_retries=3,
            notify_systemd=True
        )
        self.indicators = {}
        self.cycle_count = 0

    async def run_cycle(self):
        """Main cycle - runs every 0.5s"""
        self.cycle_count += 1

        # Fetch indicators with retry
        await self._fetch_with_retry('basis')
        await self._fetch_with_retry('spread')
        await self._fetch_with_retry('imbal_btc')
        await self._fetch_with_retry('imbal_eth')

        # Push to Hazelcast
        await self._push_to_hz()

        # Log every 100 cycles
        if self.cycle_count % 100 == 0:
            self.logger.info(f"Cycle {self.cycle_count}: indicators updated")

        # Sleep for 0.5s (non-blocking)
        await asyncio.sleep(0.5)

    @ServiceBase.retry_with_backoff
    async def _fetch_with_retry(self, indicator: str):
        """Fetch a single indicator with automatic retry"""
        # Your fetch logic here
        self.indicators[indicator] = {'value': 0.0, 'timestamp': time.time()}

    async def _push_to_hz(self):
        """Push to Hazelcast with retry"""
        try:
            # Your HZ push logic here
            pass
        except Exception as e:
            self.logger.error(f"HZ push failed: {e}")
            raise

    async def health_check(self) -> bool:
        """Custom health check"""
        # Check that indicators are fresh
        now = time.time()
        for name, data in self.indicators.items():
            if now - data.get('timestamp', 0) > 2.0:
                self.logger.warning(f"Stale indicator: {name}")
                return False
        return True


# Alternative: simple scheduled function
def simple_exf_task():
    """Simple version without the full service overhead"""
    logger = get_logger('dolphin.exf.simple')
    logger.info("Running ExF fetch")
    # Your logic here


if __name__ == '__main__':
    # Option 1: Full service with all features
    service = ExFService()
    service.run()

    # Option 2: Simple scheduled task
    # run_scheduled(simple_exf_task, interval_seconds=0.5, name='exf')
82
prod/services/example_watchdog_service.py
Executable file
@@ -0,0 +1,82 @@
#!/usr/bin/env python3
"""
Example: System Watchdog Service using ServiceBase
"""
import asyncio

from service_base import ServiceBase


class WatchdogService(ServiceBase):
    """
    Survival Stack Watchdog - 10s check interval
    """
    def __init__(self):
        super().__init__(
            name='watchdog',
            check_interval=10,  # Health check every 10s
            max_retries=5,
            notify_systemd=True
        )
        self.cat1_ok = True
        self.cat2_ok = True
        self.last_posture = 'APEX'

    async def run_cycle(self):
        """Main cycle - runs every 10s"""
        # Check all categories
        await self._check_cat1_invariants()
        await self._check_cat2_structural()
        await self._check_cat3_microstructure()
        await self._check_cat4_environmental()
        await self._check_cat5_capital()

        # Compute posture
        posture = self._compute_posture()
        if posture != self.last_posture:
            self.logger.warning(f"Posture change: {self.last_posture} -> {posture}")
            self.last_posture = posture

        # Write to Hazelcast
        await self._update_safety_ref(posture)

        # Sleep until next cycle
        await asyncio.sleep(10)

    async def _check_cat1_invariants(self):
        """Binary kill switches"""
        # Check HZ quorum, heartbeat
        pass

    async def _check_cat2_structural(self):
        """MC-Forewarner staleness"""
        pass

    async def _check_cat3_microstructure(self):
        """OB depth/fill quality"""
        pass

    async def _check_cat4_environmental(self):
        """DVOL spike"""
        pass

    async def _check_cat5_capital(self):
        """Drawdown check"""
        pass

    def _compute_posture(self) -> str:
        """Compute Rm and map it to a posture"""
        # Rm = Cat1 × Cat2 × Cat3 × Cat4 × Cat5
        # Posture: APEX/STALKER/TURTLE/HIBERNATE
        return 'APEX'

    async def _update_safety_ref(self, posture: str):
        """Update the DOLPHIN_SAFETY AtomicReference"""
        pass

    async def health_check(self) -> bool:
        """Watchdog health check"""
        # If we're running, we're healthy
        return True


if __name__ == '__main__':
    service = WatchdogService()
    service.run()
331
prod/services/service_base.py
Executable file
@@ -0,0 +1,331 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Dolphin Service Base Class - Boilerplate for reliable userland services
|
||||
Features:
|
||||
- Automatic retries with exponential backoff
|
||||
- Structured logging to journal
|
||||
- Health check endpoints
|
||||
- Graceful shutdown on signals
|
||||
- Systemd notify support (Type=notify)
|
||||
- Memory/CPU monitoring
|
||||
"""
|
||||
import abc
|
||||
import asyncio
|
||||
import logging
|
||||
import signal
|
||||
import sys
|
||||
import os
|
||||
import time
|
||||
import json
|
||||
from typing import Optional, Callable, Any
|
||||
from dataclasses import dataclass, asdict
|
||||
from datetime import datetime
|
||||
from functools import wraps
|
||||
|
||||
# Optional imports - graceful degradation if not available
|
||||
try:
|
||||
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
|
||||
TENACITY_AVAILABLE = True
|
||||
except ImportError:
|
||||
TENACITY_AVAILABLE = False
|
||||
|
||||
try:
|
||||
from pystemd.daemon import notify, Notification
|
||||
SYSTEMD_AVAILABLE = True
|
||||
except ImportError:
|
||||
SYSTEMD_AVAILABLE = False
|
||||
def notify(*args, **kwargs):
|
||||
pass
|
||||
|
||||
# Configure logging for systemd journal
|
||||
class JournalHandler(logging.Handler):
|
||||
"""Log handler that outputs JSON for systemd journal"""
|
||||
def emit(self, record):
|
||||
try:
|
||||
msg = {
|
||||
'timestamp': datetime.utcnow().isoformat(),
|
||||
'level': record.levelname,
|
||||
'logger': record.name,
|
||||
'message': self.format(record),
|
||||
'source': getattr(record, 'source', 'unknown'),
|
||||
'service': getattr(record, 'service', 'unknown'),
|
||||
}
|
||||
print(json.dumps(msg), flush=True)
|
||||
except Exception:
|
||||
self.handleError(record)
|
||||
|
||||
def get_logger(name: str) -> logging.Logger:
|
||||
"""Get configured logger for services"""
|
||||
logger = logging.getLogger(name)
|
||||
if not logger.handlers:
|
||||
handler = JournalHandler()
|
||||
handler.setFormatter(logging.Formatter('%(message)s'))
|
||||
logger.addHandler(handler)
|
||||
logger.setLevel(logging.INFO)
|
||||
return logger
|
||||
|
||||
@dataclass
|
||||
class ServiceHealth:
|
||||
"""Health check status"""
|
||||
status: str # 'healthy', 'degraded', 'unhealthy'
|
||||
last_check: float
|
||||
uptime: float
|
||||
memory_mb: float
|
||||
cpu_percent: float
|
||||
error_count: int
|
||||
message: str
|
||||
|
||||
def to_json(self) -> str:
|
||||
return json.dumps(asdict(self))
|
||||

class ServiceBase(abc.ABC):
    """
    Base class for reliable Dolphin services

    Usage:
        class MyService(ServiceBase):
            def __init__(self):
                super().__init__("my-service", check_interval=30)

            async def run_cycle(self):
                # Your service logic here
                pass

        if __name__ == '__main__':
            service = MyService()
            service.run()
    """

    def __init__(
        self,
        name: str,
        check_interval: float = 30.0,
        max_retries: int = 3,
        notify_systemd: bool = True
    ):
        self.name = name
        self.check_interval = check_interval
        self.max_retries = max_retries
        self.notify_systemd = notify_systemd and SYSTEMD_AVAILABLE

        self.logger = get_logger(f'dolphin.{name}')
        self.logger.service = name

        self._shutdown_event = asyncio.Event()
        self._start_time = time.time()
        self._health = ServiceHealth(
            status='starting',
            last_check=time.time(),
            uptime=0.0,
            memory_mb=0.0,
            cpu_percent=0.0,
            error_count=0,
            message='Initializing'
        )
        self._tasks = []

        # Setup signal handlers
        self._setup_signals()

    def _setup_signals(self):
        """Setup graceful shutdown handlers"""
        for sig in (signal.SIGTERM, signal.SIGINT):
            asyncio.get_event_loop().add_signal_handler(
                sig, lambda: asyncio.create_task(self._shutdown())
            )

    async def _shutdown(self):
        """Graceful shutdown"""
        self.logger.warning(f"{self.name}: Shutdown signal received")
        self._shutdown_event.set()

        # Cancel all tasks
        for task in self._tasks:
            if not task.done():
                task.cancel()

        # Give tasks time to cleanup
        await asyncio.sleep(0.5)

    def _update_health(self, status: str, message: str = ''):
        """Update health status"""
        import psutil
        process = psutil.Process()

        self._health = ServiceHealth(
            status=status,
            last_check=time.time(),
            uptime=time.time() - self._start_time,
            memory_mb=process.memory_info().rss / 1024 / 1024,
            cpu_percent=process.cpu_percent(),
            error_count=self._health.error_count,
            message=message
        )

    def _log_extra(self, **kwargs):
        """Add extra context to logs"""
        for key, value in kwargs.items():
            setattr(self.logger, key, value)

    def retry_with_backoff(self, func: Callable, **kwargs):
        """Decorator/wrapper for retry logic"""
        if not TENACITY_AVAILABLE:
            return func

        retry_kwargs = {
            'stop': stop_after_attempt(kwargs.get('max_retries', self.max_retries)),
            'wait': wait_exponential(multiplier=1, min=4, max=60),
            'retry': retry_if_exception_type((Exception,)),
            'before_sleep': lambda retry_state: self.logger.warning(
                f"Retry {retry_state.attempt_number}: {retry_state.outcome.exception()}"
            )
        }

        return retry(**retry_kwargs)(func)

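When tenacity is unavailable, `retry_with_backoff` degrades to no retries at all. A dependency-free sketch of what the configured policy (`wait_exponential(multiplier=1, min=4, max=60)`) roughly produces; `backoff_delays` and `call_with_retries` are illustrative names, not part of the codebase, and only approximate tenacity's exact wait formula:

```python
def backoff_delays(attempts: int, multiplier: float = 1, min_s: float = 4, max_s: float = 60):
    """Approximate wait_exponential: multiplier * 2**n, clamped to [min_s, max_s]."""
    return [max(min_s, min(max_s, multiplier * 2 ** n)) for n in range(attempts)]

def call_with_retries(func, max_retries: int = 3):
    """Retry `func` up to max_retries times; re-raise the last error."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # A real service would sleep backoff_delays(max_retries)[attempt] here

attempts = {'n': 0}
def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise RuntimeError("transient")
    return "ok"

print(backoff_delays(7))         # [4, 4, 4, 8, 16, 32, 60]
print(call_with_retries(flaky))  # ok
```

The clamp matters: without the 60s cap, a long outage would push waits into the hours.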
    @abc.abstractmethod
    async def run_cycle(self):
        """
        Main service logic - implement this!
        Called repeatedly in the main loop.
        Should be non-blocking or use asyncio.
        """
        pass

    async def health_check(self) -> bool:
        """
        Optional: Implement a custom health check.
        Return True if healthy, False otherwise.
        """
        return True

    async def _health_loop(self):
        """Background health check loop"""
        while not self._shutdown_event.is_set():
            try:
                healthy = await self.health_check()
                if healthy:
                    self._update_health('healthy', 'Service operating normally')
                else:
                    self._update_health('degraded', 'Health check failed')

                # Notify systemd we're still alive
                if self.notify_systemd:
                    notify(Notification.WATCHDOG)

            except Exception as e:
                self._health.error_count += 1
                self._update_health('unhealthy', str(e))
                self.logger.error(f"Health check error: {e}")

            try:
                await asyncio.wait_for(
                    self._shutdown_event.wait(),
                    timeout=self.check_interval
                )
            except asyncio.TimeoutError:
                pass  # Normal - continue loop

    async def _main_loop(self):
        """Main service loop"""
        self.logger.info(f"{self.name}: Starting main loop")

        while not self._shutdown_event.is_set():
            try:
                await self.run_cycle()
            except asyncio.CancelledError:
                break
            except Exception as e:
                self._health.error_count += 1
                self.logger.error(f"Cycle error: {e}", exc_info=True)
                # Brief pause before retry
                await asyncio.sleep(1)

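The `wait_for(event.wait(), timeout=...)` idiom in `_health_loop` is an interruptible sleep: the loop idles for `check_interval` seconds but wakes immediately when the shutdown event fires. A minimal standalone demonstration (`interruptible_sleep` is an illustrative name, not part of the codebase):

```python
import asyncio

async def interruptible_sleep(stop: asyncio.Event, seconds: float) -> bool:
    """Wait up to `seconds`, waking early if `stop` is set. Returns True if stopped."""
    try:
        await asyncio.wait_for(stop.wait(), timeout=seconds)
        return True
    except asyncio.TimeoutError:
        return False

async def demo():
    stop = asyncio.Event()
    # Nobody sets stop: the full timeout elapses
    timed_out = await interruptible_sleep(stop, 0.02)
    # A "shutdown" arrives mid-sleep: returns long before the 10s timeout
    asyncio.get_running_loop().call_later(0.01, stop.set)
    woke_early = await interruptible_sleep(stop, 10.0)
    return timed_out, woke_early

timed_out, woke_early = asyncio.run(demo())
print(timed_out, woke_early)  # False True
```

Plain `asyncio.sleep(check_interval)` would delay shutdown by up to a full interval; this pattern avoids that.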
    def run(self):
        """Run the service (blocking)"""
        self.logger.info(f"{self.name}: Service starting")

        # Notify systemd we're ready
        if self.notify_systemd:
            notify(Notification.READY)
            self.logger.info("Notified systemd: READY")

        # asyncio.create_task() requires a *running* loop, so schedule the
        # coroutines on the loop explicitly before starting it
        loop = asyncio.get_event_loop()

        # Start health check loop
        health_task = loop.create_task(self._health_loop())
        self._tasks.append(health_task)

        # Start main loop
        main_task = loop.create_task(self._main_loop())
        self._tasks.append(main_task)

        try:
            # Run until shutdown
            loop.run_until_complete(self._shutdown_event.wait())
        except KeyboardInterrupt:
            pass
        finally:
            self.logger.info(f"{self.name}: Service stopping")
            # Cleanup
            for task in self._tasks:
                if not task.done():
                    task.cancel()

            # Wait for cleanup
            if self._tasks:
                loop.run_until_complete(
                    asyncio.gather(*self._tasks, return_exceptions=True)
                )

            self.logger.info(f"{self.name}: Service stopped")


def run_scheduled(
    func: Callable,
    interval_seconds: float,
    name: str = 'scheduled-task'
):
    """
    Run a function on a schedule (simple alternative to a full service)

    Usage:
        def my_task():
            print("Running...")

        run_scheduled(my_task, interval_seconds=60, name='my-task')
    """
    logger = get_logger(f'dolphin.scheduled.{name}')
    logger.info(f"Starting scheduled task: {name} (interval: {interval_seconds}s)")

    async def loop():
        while True:
            try:
                start = time.time()
                if asyncio.iscoroutinefunction(func):
                    await func()
                else:
                    func()
                elapsed = time.time() - start
                logger.info(f"Task completed in {elapsed:.2f}s")

                # Sleep only the remaining time
                sleep_time = max(0, interval_seconds - elapsed)
                await asyncio.sleep(sleep_time)

            except Exception as e:
                logger.error(f"Task error: {e}", exc_info=True)
                await asyncio.sleep(interval_seconds)

    try:
        asyncio.run(loop())
    except KeyboardInterrupt:
        logger.info("Stopped by user")

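The `max(0, interval - elapsed)` line is what keeps the schedule from drifting: the task sleeps only for the remainder of the interval, so a slow cycle doesn't push every subsequent run later. The arithmetic in isolation (`next_sleep` is an illustrative name):

```python
def next_sleep(interval: float, elapsed: float) -> float:
    """Sleep only for the remainder of the interval, never a negative amount."""
    return max(0.0, interval - elapsed)

print(next_sleep(60.0, 2.5))   # 57.5
print(next_sleep(60.0, 75.0))  # 0.0 - an overrunning task starts the next cycle immediately
```

Note the clamp to zero: without it, an overrun would feed a negative duration to `asyncio.sleep`.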
__all__ = [
    'ServiceBase',
    'ServiceHealth',
    'get_logger',
    'JournalHandler',
    'run_scheduled',
    'notify',
    'SYSTEMD_AVAILABLE',
    'TENACITY_AVAILABLE',
]
203
prod/services/service_manager.py
Executable file
@@ -0,0 +1,203 @@
#!/usr/bin/env python3
"""
Dolphin Service Manager - Centralized userland service control
No root required! Uses systemd --user
"""
import argparse
import subprocess
import sys
from typing import List, Optional

SERVICES = {
    'exf': 'dolphin-exf.service',
    'ob': 'dolphin-ob.service',
    'watchdog': 'dolphin-watchdog.service',
    'mc': 'dolphin-mc.service',
    'mc-timer': 'dolphin-mc.timer',
}

def run_cmd(cmd: List[str], check: bool = True) -> subprocess.CompletedProcess:
    """Run a systemctl command against the user instance"""
    full_cmd = ['systemctl', '--user'] + cmd
    print(f"Running: {' '.join(full_cmd)}")
    return subprocess.run(full_cmd, check=check, capture_output=True, text=True)

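`run_cmd` leans on `subprocess.run` with `capture_output=True, text=True`, which returns a `CompletedProcess` whose `stdout`/`stderr` are already decoded strings. A portable demonstration of that pattern (a Python one-liner stands in for `systemctl`, which may not exist on the machine running this):

```python
import subprocess
import sys

# Stand-in for a systemctl invocation: spawn a child that prints "active"
result = subprocess.run(
    [sys.executable, '-c', "print('active')"],
    capture_output=True, text=True, check=False,
)
print(result.returncode)      # 0
print(result.stdout.strip())  # active
```

With `check=False` (as the status helpers below use), a non-zero exit code is reported via `returncode` instead of raising `CalledProcessError`, which is what makes the `is-active` probing work.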
def status(service: Optional[str] = None):
    """Show status of all or a specific service"""
    if service:
        svc = SERVICES.get(service, service)
        result = run_cmd(['status', svc], check=False)
        print(result.stdout or result.stderr)
    else:
        print("=== Dolphin Services Status ===\n")
        for name, svc in SERVICES.items():
            result = run_cmd(['is-active', svc], check=False)
            state = "✓ RUNNING" if result.returncode == 0 else "✗ STOPPED"
            print(f"{name:12} {state}")

        print("\n=== Recent Logs ===")
        result = run_cmd(['--lines=20', 'status'], check=False)
        print(result.stdout[-2000:] if result.stdout else "No recent output")


def start(service: Optional[str] = None):
    """Start service(s)"""
    if service:
        svc = SERVICES.get(service, service)
        run_cmd(['start', svc])
        print(f"Started {service}")
    else:
        for name, svc in SERVICES.items():
            if name == 'mc':  # Skip mc service, use timer
                continue
            run_cmd(['start', svc])
            print(f"Started {name}")


def stop(service: Optional[str] = None):
    """Stop service(s)"""
    if service:
        svc = SERVICES.get(service, service)
        run_cmd(['stop', svc])
        print(f"Stopped {service}")
    else:
        for name, svc in SERVICES.items():
            run_cmd(['stop', svc])
            print(f"Stopped {name}")


def restart(service: Optional[str] = None):
    """Restart service(s)"""
    if service:
        svc = SERVICES.get(service, service)
        run_cmd(['restart', svc])
        print(f"Restarted {service}")
    else:
        for name, svc in SERVICES.items():
            run_cmd(['restart', svc])
            print(f"Restarted {name}")


def logs(service: str, follow: bool = False, lines: int = 50):
    """Show logs for a service"""
    svc = SERVICES.get(service, service)
    cmd = ['journalctl', '--user', '-u', svc, f'--lines={lines}']
    if follow:
        cmd.append('--follow')
    subprocess.run(cmd)


def enable():
    """Enable services to start on boot"""
    for name, svc in SERVICES.items():
        run_cmd(['enable', svc])
        print(f"Enabled {name}")


def disable():
    """Disable services from starting on boot"""
    for name, svc in SERVICES.items():
        run_cmd(['disable', svc])
        print(f"Disabled {name}")


def daemon_reload():
    """Reload the systemd daemon (after editing .service files)"""
    run_cmd(['daemon-reload'])
    print("Daemon reloaded")

def main():
    parser = argparse.ArgumentParser(
        description='Dolphin Service Manager - Userland service control',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s status          # Show all service status
  %(prog)s start exf       # Start ExF service
  %(prog)s logs ob -f      # Follow OB service logs
  %(prog)s restart         # Restart all services
  %(prog)s enable          # Enable auto-start on boot
"""
    )

    subparsers = parser.add_subparsers(dest='command', help='Command')

    # Status
    p_status = subparsers.add_parser('status', help='Show service status')
    p_status.add_argument('service', nargs='?', help='Specific service')

    # Start
    p_start = subparsers.add_parser('start', help='Start service(s)')
    p_start.add_argument('service', nargs='?', help='Specific service')

    # Stop
    p_stop = subparsers.add_parser('stop', help='Stop service(s)')
    p_stop.add_argument('service', nargs='?', help='Specific service')

    # Restart
    p_restart = subparsers.add_parser('restart', help='Restart service(s)')
    p_restart.add_argument('service', nargs='?', help='Specific service')

    # Logs
    p_logs = subparsers.add_parser('logs', help='Show service logs')
    p_logs.add_argument('service', help='Service name')
    p_logs.add_argument('-f', '--follow', action='store_true', help='Follow logs')
    p_logs.add_argument('-n', '--lines', type=int, default=50, help='Number of lines')

    # Enable/Disable
    subparsers.add_parser('enable', help='Enable auto-start')
    subparsers.add_parser('disable', help='Disable auto-start')
    subparsers.add_parser('reload', help='Reload systemd daemon')

    args = parser.parse_args()

    if not args.command:
        parser.print_help()
        return

    try:
        if args.command == 'status':
            status(args.service)
        elif args.command == 'start':
            start(args.service)
        elif args.command == 'stop':
            stop(args.service)
        elif args.command == 'restart':
            restart(args.service)
        elif args.command == 'logs':
            logs(args.service, args.follow, args.lines)
        elif args.command == 'enable':
            enable()
        elif args.command == 'disable':
            disable()
        elif args.command == 'reload':
            daemon_reload()
    except subprocess.CalledProcessError as e:
        print(f"Error: {e}", file=sys.stderr)
        if e.stderr:
            print(e.stderr, file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()

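The CLI above is the standard argparse sub-command pattern: one parser, one sub-parser per command, then dispatch on `args.command`. A condensed, directly runnable sketch (argument lists trimmed for brevity):

```python
import argparse

parser = argparse.ArgumentParser(prog='dolphinctl')
subparsers = parser.add_subparsers(dest='command')

# Optional positional: `status` works with or without a service name
p_status = subparsers.add_parser('status')
p_status.add_argument('service', nargs='?')

# Required positional plus a flag, mirroring `logs ob -f`
p_logs = subparsers.add_parser('logs')
p_logs.add_argument('service')
p_logs.add_argument('-f', '--follow', action='store_true')

args = parser.parse_args(['logs', 'ob', '-f'])
print(args.command, args.service, args.follow)  # logs ob True

args = parser.parse_args(['status'])
print(args.command, args.service)  # status None
```

`dest='command'` is what lets the dispatch `if/elif` chain identify which sub-command fired; without it, `args` carries no record of the chosen sub-parser.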
# =============================================================================
# SUPERVISOR-SPECIFIC COMMANDS
# =============================================================================

def supervisor_status():
    """Show supervisor internal component status"""
    result = subprocess.run(
        ['journalctl', '--user', '-u', 'dolphin-supervisor', '--lines=100', '-o', 'json'],
        capture_output=True, text=True
    )
    print("=== Supervisor Component Status ===")
    print("(Parse logs for component health)")
    print(result.stdout[-2000:] if result.stdout else "No logs")


def supervisor_components():
    """List components managed by the supervisor"""
    print("""
Components managed by dolphin-supervisor.service:
  - exf       (0.5s)  External Factors
  - ob        (0.5s)  Order Book Streamer
  - watchdog  (10s)   Survival Stack
  - mc        (4h)    MC-Forewarner
""")


# Add to main() argument parser if needed
411
prod/services/supervisor.py
Executable file
@@ -0,0 +1,411 @@
#!/usr/bin/env python3
"""
Dolphin Service Supervisor
==========================
A SINGLE userland service that manages MULTIPLE service-like components.

Architecture:
- One systemd service: dolphin-supervisor.service
- Internally manages: ExF, OB, Watchdog, MC, etc.
- Each component is a Python thread
- Centralized health, logging, restart
"""
import threading
import signal
import time
import json
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional
from datetime import datetime

# Optional systemd notify
try:
    from pystemd.daemon import notify, Notification
    SYSTEMD_AVAILABLE = True
except ImportError:
    SYSTEMD_AVAILABLE = False
    def notify(*args, **kwargs):
        pass

# Optional tenacity for retries
try:
    from tenacity import retry, stop_after_attempt, wait_exponential
    TENACITY_AVAILABLE = True
except ImportError:
    TENACITY_AVAILABLE = False

# =============================================================================
# STRUCTURED LOGGING
# =============================================================================

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            'timestamp': datetime.utcnow().isoformat(),
            'level': record.levelname,
            'component': getattr(record, 'component', 'supervisor'),
            'message': record.getMessage(),
            'source': record.name,
        }
        if hasattr(record, 'extra_data'):
            log_data.update(record.extra_data)
        return json.dumps(log_data)


def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JSONFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

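The `extra_data` hook works because `logging` copies every key in the `extra=` mapping onto the `LogRecord`, where the formatter can pick it up. A self-contained check of that behavior, using a stripped-down copy of the formatter and an in-memory stream so the output can be inspected:

```python
import io
import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        data = {'level': record.levelname, 'message': record.getMessage()}
        # `extra={'extra_data': {...}}` lands on the record as an attribute
        if hasattr(record, 'extra_data'):
            data.update(record.extra_data)
        return json.dumps(data)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger('demo.jsonfmt')
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("health check", extra={'extra_data': {'failed': 2}})
line = json.loads(stream.getvalue())
print(line['failed'])  # 2
```

This is how the supervisor attaches a full per-component health report to a single journald line.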
# =============================================================================
# COMPONENT BASE CLASS
# =============================================================================

@dataclass
class ComponentHealth:
    name: str
    status: str  # 'healthy', 'degraded', 'failed', 'stopped'
    last_run: float
    error_count: int
    message: str
    uptime: float = 0.0


class ServiceComponent(ABC):
    """
    Base class for a service-like component.
    Runs in its own thread, managed by the supervisor.
    """

    def __init__(self, name: str, interval: float = 1.0, max_retries: int = 3):
        self.name = name
        self.interval = interval
        self.max_retries = max_retries
        self.logger = get_logger(f'component.{name}')
        self.logger.component = name

        self._running = False
        self._thread: Optional[threading.Thread] = None
        self._error_count = 0
        self._last_run = 0.0
        self._start_time = 0.0
        self._health = ComponentHealth(
            name=name, status='stopped',
            last_run=0, error_count=0, message='Not started'
        )

    @abstractmethod
    def run_cycle(self):
        """Override this with your component's work"""
        pass

    def health_check(self) -> bool:
        """Override for a custom health check"""
        return True

    def _execute_with_retry(self):
        """Execute run_cycle with retry logic"""
        for attempt in range(self.max_retries):
            try:
                self.run_cycle()
                self._error_count = 0
                self._last_run = time.time()
                return
            except Exception as e:
                self._error_count += 1
                self.logger.error(
                    f"Cycle failed (attempt {attempt + 1}): {e}",
                    extra={'extra_data': {'attempt': attempt + 1, 'error': str(e)}}
                )
                if attempt < self.max_retries - 1:
                    time.sleep(min(2 ** attempt, 30))  # Exponential backoff
                else:
                    raise

    def _loop(self):
        """Main component loop (runs in thread)"""
        self._running = True
        self._start_time = time.time()
        self.logger.info(f"{self.name}: Component started")

        while self._running:
            try:
                self._execute_with_retry()
                self._health.status = 'healthy'
                self._health.message = 'Running normally'
            except Exception as e:
                self._health.status = 'failed'
                self._health.message = f'Failed: {str(e)[:100]}'
                self.logger.error(f"{self.name}: Component failed: {e}")
                # Continue running (supervisor will restart if needed)

            # Sleep until next cycle
            time.sleep(self.interval)

        self._health.status = 'stopped'
        self.logger.info(f"{self.name}: Component stopped")

    def start(self):
        """Start the component in a new thread"""
        if self._thread and self._thread.is_alive():
            self.logger.warning(f"{self.name}: Already running")
            return

        self._thread = threading.Thread(target=self._loop, name=f"component-{self.name}")
        self._thread.daemon = True
        self._thread.start()
        self.logger.info(f"{self.name}: Thread started")

    def stop(self, timeout: float = 5.0):
        """Stop the component gracefully"""
        self._running = False
        if self._thread and self._thread.is_alive():
            self._thread.join(timeout=timeout)
            if self._thread.is_alive():
                self.logger.warning(f"{self.name}: Thread did not stop gracefully")

    def get_health(self) -> ComponentHealth:
        """Get current health status"""
        self._health.last_run = self._last_run
        self._health.error_count = self._error_count
        if self._start_time:
            self._health.uptime = time.time() - self._start_time
        return self._health

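The thread lifecycle above (daemon thread looping until a flag flips, then `join` with a timeout) can be exercised in isolation. This sketch uses a `threading.Event` instead of a bare bool, which also gives an interruptible sleep; `TickerComponent` is an illustrative stand-in, not part of the codebase:

```python
import threading
import time

class TickerComponent:
    """Minimal stand-in for ServiceComponent: loop in a daemon thread until stopped."""
    def __init__(self, interval: float = 0.01):
        self.interval = interval
        self.cycles = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        while not self._stop.is_set():
            self.cycles += 1                 # the component's "work"
            self._stop.wait(self.interval)   # interruptible sleep

    def start(self):
        self._thread.start()

    def stop(self, timeout: float = 1.0):
        self._stop.set()
        self._thread.join(timeout=timeout)

c = TickerComponent()
c.start()
time.sleep(0.05)
c.stop()
print(c.cycles > 0, c._thread.is_alive())  # True False
```

Using `Event.wait(interval)` instead of `time.sleep(interval)` means `stop()` takes effect within one scheduler tick rather than up to one full interval, which matters for components with 10s or 300s intervals.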
# =============================================================================
# SUPERVISOR (SINGLE SERVICE)
# =============================================================================

class DolphinSupervisor:
    """
    SINGLE service that manages MULTIPLE userland components.

    Usage:
        supervisor = DolphinSupervisor()
        supervisor.register(ExFComponent())
        supervisor.register(OBComponent())
        supervisor.register(WatchdogComponent())
        supervisor.run()
    """

    def __init__(self, health_check_interval: float = 10.0):
        self.logger = get_logger('supervisor')
        self.logger.component = 'supervisor'

        self.components: Dict[str, ServiceComponent] = {}
        self._running = False
        self._shutdown_event = threading.Event()
        self._health_check_interval = health_check_interval
        self._supervisor_thread: Optional[threading.Thread] = None

        # Signal handling
        self._setup_signals()

    def _setup_signals(self):
        """Setup graceful shutdown"""
        def handler(signum, frame):
            self.logger.info(f"Received signal {signum}, shutting down...")
            self._shutdown_event.set()

        signal.signal(signal.SIGTERM, handler)
        signal.signal(signal.SIGINT, handler)

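The signal wiring reduces to: a handler that does nothing but set a `threading.Event`, which every loop in the process polls. That keeps the handler async-signal-safe (no locks, no I/O) and lets shutdown propagate to all threads at once. A standalone check, invoking the handler directly rather than delivering a real signal:

```python
import signal
import threading

shutdown = threading.Event()

def handler(signum, frame):
    # In the supervisor this also logs; the essential action is flipping the event
    shutdown.set()

signal.signal(signal.SIGTERM, handler)

# Equivalent to the process receiving SIGTERM
handler(signal.SIGTERM, None)
print(shutdown.is_set())  # True
```

Doing heavy work (joining threads, flushing state) directly inside a signal handler is fragile; deferring it to the main loop, as the supervisor does, is the conventional safe pattern.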
    def register(self, component: ServiceComponent):
        """Register a component to be managed"""
        self.components[component.name] = component
        self.logger.info(f"Registered component: {component.name}")

    def start_all(self):
        """Start all registered components"""
        self.logger.info(f"Starting {len(self.components)} components...")
        for name, component in self.components.items():
            try:
                component.start()
            except Exception as e:
                self.logger.error(f"Failed to start {name}: {e}")

        # Notify systemd we're ready
        if SYSTEMD_AVAILABLE:
            notify(Notification.READY)
            self.logger.info("Notified systemd: READY")

    def stop_all(self, timeout: float = 5.0):
        """Stop all components gracefully"""
        self.logger.info("Stopping all components...")
        for name, component in self.components.items():
            try:
                component.stop(timeout=timeout)
            except Exception as e:
                self.logger.error(f"Error stopping {name}: {e}")

    def _supervisor_loop(self):
        """Main supervisor loop - monitors components"""
        self.logger.info("Supervisor monitoring started")

        while not self._shutdown_event.is_set():
            # Check health of all components
            health_report = {}
            for name, component in self.components.items():
                health = component.get_health()
                health_report[name] = {
                    'status': health.status,
                    'uptime': health.uptime,
                    'errors': health.error_count,
                    'message': health.message
                }

                # Restart failed components
                if health.status == 'failed' and component._running:
                    self.logger.warning(f"{name}: Restarting failed component...")
                    component.stop(timeout=2.0)
                    time.sleep(1)
                    component.start()

            # Log health summary
            failed = sum(1 for h in health_report.values() if h['status'] == 'failed')
            if failed > 0:
                self.logger.error(f"Health check: {failed} components failed",
                                  extra={'extra_data': health_report})
            else:
                self.logger.debug("Health check: all components healthy",
                                  extra={'extra_data': health_report})

            # Notify systemd watchdog
            if SYSTEMD_AVAILABLE:
                notify(Notification.WATCHDOG)

            # Wait for next check
            self._shutdown_event.wait(self._health_check_interval)

        self.logger.info("Supervisor monitoring stopped")

    def get_status(self) -> Dict:
        """Get full status of supervisor and components"""
        return {
            'supervisor': {
                'running': self._running,
                'components_count': len(self.components)
            },
            'components': {
                name: {
                    'status': comp.get_health().status,
                    'uptime': comp.get_health().uptime,
                    'errors': comp.get_health().error_count,
                    'message': comp.get_health().message
                }
                for name, comp in self.components.items()
            }
        }

    def run(self):
        """Run the supervisor (blocking)"""
        self.logger.info("=" * 60)
        self.logger.info("Dolphin Service Supervisor Starting")
        self.logger.info("=" * 60)

        self._running = True

        # Start all components
        self.start_all()

        # Start supervisor monitoring thread
        self._supervisor_thread = threading.Thread(
            target=self._supervisor_loop,
            name="supervisor-monitor"
        )
        self._supervisor_thread.start()

        # Wait for shutdown signal
        try:
            while not self._shutdown_event.is_set():
                self._shutdown_event.wait(1)
        except KeyboardInterrupt:
            pass
        finally:
            self._running = False
            self.stop_all()
            if self._supervisor_thread:
                self._supervisor_thread.join(timeout=5.0)

            self.logger.info("Supervisor shutdown complete")

# =============================================================================
# EXAMPLE COMPONENTS
# =============================================================================

class ExFComponent(ServiceComponent):
    """External Factors - 0.5s aggressive oversampling"""
    def __init__(self):
        super().__init__(name='exf', interval=0.5, max_retries=3)
        self.indicators = {}

    def run_cycle(self):
        # Simulate fetching indicators
        self.indicators['basis'] = {'value': 0.01, 'timestamp': time.time()}
        self.indicators['spread'] = {'value': 0.02, 'timestamp': time.time()}
        # In real implementation: fetch from APIs, push to Hazelcast


class OBComponent(ServiceComponent):
    """Order Book Streamer - 500ms"""
    def __init__(self):
        super().__init__(name='ob', interval=0.5, max_retries=3)

    def run_cycle(self):
        # Simulate OB snapshot
        pass


class WatchdogComponent(ServiceComponent):
    """Survival Stack Watchdog - 10s"""
    def __init__(self):
        super().__init__(name='watchdog', interval=10.0, max_retries=5)
        self.posture = 'APEX'

    def run_cycle(self):
        # Check categories, compute posture
        pass


class MCComponent(ServiceComponent):
    """MC-Forewarner - runs every 4h (the cycle fires every 5 min to check whether it's due)"""
    def __init__(self):
        super().__init__(name='mc', interval=300, max_retries=3)  # 5 min check
        self.last_run = 0

    def run_cycle(self):
        # Only actually run every 4 hours
        if time.time() - self.last_run > 14400:  # 4 hours
            self.logger.info("Running MC-Forewarner assessment")
            self.last_run = time.time()

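MCComponent's "poll often, run rarely" gating reduces to one timestamp comparison, which lets a slow 4-hour job live inside a fast supervisor loop without its own scheduler. The same logic with shrunken time constants so it can be executed directly (`RareTask` is an illustrative name; time is passed in explicitly to keep the demo deterministic):

```python
class RareTask:
    """Poll frequently, but do real work only once per `period` seconds."""
    def __init__(self, period: float):
        self.period = period
        self.last_run = 0.0
        self.runs = 0

    def poll(self, now: float):
        if now - self.last_run > self.period:
            self.runs += 1        # the expensive job (MC assessment) would go here
            self.last_run = now

task = RareTask(period=10.0)
for t in range(100, 160, 5):      # poll every 5 "seconds" for a minute
    task.poll(float(t))
print(task.runs)                  # 4
```

One caveat visible in MCComponent itself: initializing `last_run = 0` means the expensive job fires on the very first cycle after startup, which may or may not be the intended behavior.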
# =============================================================================
# MAIN ENTRY POINT
# =============================================================================

if __name__ == '__main__':
    # Create supervisor
    supervisor = DolphinSupervisor(health_check_interval=10.0)

    # Register components
    supervisor.register(ExFComponent())
    supervisor.register(OBComponent())
    supervisor.register(WatchdogComponent())
    supervisor.register(MCComponent())

    # Run
    supervisor.run()
27
prod/services/test_service.py
Executable file
@@ -0,0 +1,27 @@
#!/usr/bin/env python3
"""Test service to validate the setup"""
import asyncio
import sys

sys.path.insert(0, '/mnt/dolphinng5_predict/prod')

from services.service_base import ServiceBase


class TestService(ServiceBase):
    def __init__(self):
        super().__init__(
            name='test',
            check_interval=5,
            max_retries=3,
            notify_systemd=True
        )
        self.counter = 0

    async def run_cycle(self):
        self.counter += 1
        self.logger.info(f"Test cycle {self.counter}")
        await asyncio.sleep(2)


if __name__ == '__main__':
    print("Starting test service...")
    service = TestService()
    service.run()