initial: import DOLPHIN baseline 2026-04-21 from dolphinng5_predict working tree
Includes core prod + GREEN/BLUE subsystems:
- prod/ (BLUE harness, configs, scripts, docs)
- nautilus_dolphin/ (GREEN Nautilus-native impl + dvae/ preserved)
- adaptive_exit/ (AEM engine + models/bucket_assignments.pkl)
- Observability/ (EsoF advisor, TUI, dashboards)
- external_factors/ (EsoF producer)
- mc_forewarning_qlabs_fork/ (MC regime/envelope)

Excludes runtime caches, logs, backups, and reproducible artifacts per .gitignore.
191
prod/docs/SCAN_BRIDGE_PHASE2_COMPLETE.md
Executable file
@@ -0,0 +1,191 @@
# Scan Bridge Phase 2 Implementation - COMPLETE

**Date:** 2026-03-24
**Phase:** 2 - Prefect Integration
**Status:** ✅ IMPLEMENTATION COMPLETE

---

## Deliverables Created

| File | Purpose | Lines |
|------|---------|-------|
| `scan_bridge_prefect_daemon.py` | Prefect-managed daemon with health monitoring | 397 |
| `scan_bridge_deploy.py` | Deployment and management script | 152 |
| `prefect.yaml` | Prefect deployment configuration | 65 |
| `SCAN_BRIDGE_PHASE2_COMPLETE.md` | This completion document | - |

---

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      PREFECT ORCHESTRATION                      │
│                        (localhost:4200)                         │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────────────┐     ┌─────────────────────────────┐    │
│  │ Health Check Task   │────▶│ scan-bridge-daemon Flow     │    │
│  │ (every 30s)         │     │ (long-running)              │    │
│  └─────────────────────┘     └─────────────────────────────┘    │
│                                         │                       │
│                                         │ manages               │
│                                         ▼                       │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                 Scan Bridge Subprocess                  │    │
│  │                (scan_bridge_service.py)                 │    │
│  │                                                         │    │
│  │  • Watches Arrow files                                  │    │
│  │  • Pushes to Hazelcast                                  │    │
│  │  • Logs forwarded to Prefect                            │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                         │                       │
└─────────────────────────────────────────┼───────────────────────┘
                                          │
                                          ▼
                               ┌─────────────────────┐
                               │      Hazelcast      │
                               │ (DOLPHIN_FEATURES)  │
                               │  latest_eigen_scan  │
                               └─────────────────────┘
```

---

## Key Features

### 1. Automatic Restart
- Restarts bridge on crash
- Max 3 restart attempts
- 5-second delay between attempts
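
A minimal sketch of that restart policy, assuming the bridge is launched through a blocking callable that returns its exit code. `run_with_restarts`, `run_bridge`, and the injectable `sleep` are hypothetical names for illustration and are not taken from `scan_bridge_prefect_daemon.py`. Whether the real daemon resets its counter after a healthy period is not stated here, so this version never resets it.

```python
import time
from typing import Callable

MAX_RESTART_ATTEMPTS = 3    # "Max 3 restart attempts"
RESTART_DELAY_SECONDS = 5   # "5-second delay between attempts"


def run_with_restarts(
    run_bridge: Callable[[], int],
    max_restarts: int = MAX_RESTART_ATTEMPTS,
    delay: float = RESTART_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the bridge, restarting after a crash up to max_restarts times.

    run_bridge is a hypothetical callable that blocks until the bridge exits
    and returns its exit code (0 means a clean shutdown). Returns the number
    of restarts performed; raises RuntimeError once the budget is exhausted.
    """
    restarts = 0
    while True:
        exit_code = run_bridge()
        if exit_code == 0:
            return restarts  # clean exit, nothing left to restart
        if restarts >= max_restarts:
            raise RuntimeError(
                f"bridge failed after {restarts} restarts (exit code {exit_code})"
            )
        restarts += 1
        sleep(delay)  # wait before the next attempt
```

With the defaults, a bridge that keeps crashing is launched four times in total (one initial run plus three restarts) before the error is raised.
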

### 2. Health Monitoring
```python
HEALTH_CHECK_INTERVAL = 30   # seconds
DATA_STALE_THRESHOLD = 60    # Critical - triggers restart
DATA_WARNING_THRESHOLD = 30  # Warning only
```
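
One way these constants might drive a monitoring loop is sketched below. `monitor`, `get_data_age`, and `restart_bridge` are hypothetical names; the real daemon runs its check as a Prefect task, which this plain-Python loop does not try to reproduce.

```python
import logging
import time
from typing import Callable, Optional

HEALTH_CHECK_INTERVAL = 30   # seconds
DATA_STALE_THRESHOLD = 60    # critical, triggers restart
DATA_WARNING_THRESHOLD = 30  # warning only

logger = logging.getLogger("scan_bridge.health")


def monitor(
    get_data_age: Callable[[], float],    # hypothetical probe: seconds since last scan
    restart_bridge: Callable[[], None],   # hypothetical restart hook
    cycles: Optional[int] = None,         # None means loop forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check data age once per interval; return how many restarts were triggered."""
    restarts = 0
    completed = 0
    while cycles is None or completed < cycles:
        age = get_data_age()
        if age > DATA_STALE_THRESHOLD:
            restart_bridge()
            restarts += 1
        elif age >= DATA_WARNING_THRESHOLD:
            logger.warning("scan data is %.0fs old", age)
        completed += 1
        sleep(HEALTH_CHECK_INTERVAL)
    return restarts
```

Passing `sleep` and `cycles` as parameters keeps the loop testable without waiting for real time to pass.
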

### 3. Centralized Logging
All bridge output appears in Prefect UI:
```
[Bridge] [OK] Pushed 200 scans. Latest: #4228
[Bridge] Connected to Hazelcast
```
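
The forwarding could look roughly like the sketch below: the subprocess's output is read line by line and relayed to a logger with a `[Bridge]` prefix. `forward_output` is a hypothetical helper, and a standard-library logger stands in for the Prefect run logger so the snippet runs without Prefect installed. Inside a flow, the logger returned by Prefect's `get_run_logger()` could be passed in instead.

```python
import logging
import subprocess
from typing import List


def forward_output(cmd: List[str], logger: logging.Logger, prefix: str = "[Bridge]") -> int:
    """Run cmd, relay each line of its output to logger, and return the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,                 # line-buffered so lines arrive promptly
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        logger.info("%s %s", prefix, line.rstrip())
    return proc.wait()
```

A call such as `forward_output([sys.executable, "scan_bridge_service.py"], logging.getLogger("bridge"))` would block until the bridge exits, which pairs naturally with a supervising loop that decides whether to restart it.
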

### 4. Hazelcast Integration
Checks data freshness:
- Verifies `latest_eigen_scan` exists
- Monitors data age
- Alerts on staleness
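
The three checks above can be sketched as a pure function over the value read from Hazelcast. The payload format is an assumption: the sketch treats it as JSON text with a hypothetical `timestamp` field in epoch seconds, which may not match what `scan_bridge_service.py` actually writes. `check_freshness` is likewise an illustrative name.

```python
import json
import time
from typing import Optional, Tuple


def check_freshness(
    raw: Optional[str],
    now: Optional[float] = None,
    stale_after: float = 60.0,
) -> Tuple[str, Optional[float]]:
    """Return (status, age_seconds) for the value stored under latest_eigen_scan.

    raw is whatever the map lookup returned (None if the key is absent).
    Assumes JSON with a "timestamp" field in epoch seconds (hypothetical).
    """
    if raw is None:
        return "missing", None          # key does not exist yet
    now = time.time() if now is None else now
    age = now - float(json.loads(raw)["timestamp"])
    return ("stale" if age > stale_after else "fresh"), age
```

With the Hazelcast Python client, `raw` might come from `client.get_map("DOLPHIN_FEATURES").blocking().get("latest_eigen_scan")`, assuming `DOLPHIN_FEATURES` is the map name as the architecture diagram suggests.
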

---

## Usage

### Deploy to Prefect
```bash
cd /mnt/dolphinng5_predict/prod
source /home/dolphin/siloqy_env/bin/activate

# Create deployment
python scan_bridge_deploy.py create

# Or manually (--output pins the YAML filename that the apply step expects):
prefect deployment build scan_bridge_prefect_daemon.py:scan_bridge_daemon_flow \
    --name scan-bridge-daemon --pool dolphin-daemon-pool \
    --output scan-bridge-daemon-deployment.yaml
prefect deployment apply scan-bridge-daemon-deployment.yaml
```

### Start Worker
```bash
python scan_bridge_deploy.py start
# Or:
prefect worker start --pool dolphin-daemon-pool
```

### Check Status
```bash
python scan_bridge_deploy.py status
python scan_bridge_deploy.py health
```

---

## Health Check States

| Status | Condition | Action |
|--------|-----------|--------|
| ✅ Healthy | Data age < 30s | Continue monitoring |
| ⚠️ Warning | Data age 30-60s | Log warning |
| ❌ Stale | Data age > 60s | Restart bridge |
| ❌ Down | Process not running | Restart bridge |
| ❌ Error | Hazelcast unavailable | Alert, retry |
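
The table can be read as a small decision function. The sketch below is one possible encoding; the precedence between rows (process down is checked before Hazelcast errors) and the treatment of the exact 30s and 60s boundaries as warnings are assumptions, and `evaluate` is a hypothetical name.

```python
from typing import Optional, Tuple

DATA_WARNING_THRESHOLD = 30  # seconds
DATA_STALE_THRESHOLD = 60    # seconds


def evaluate(
    process_running: bool,
    hazelcast_ok: bool,
    data_age: Optional[float],
) -> Tuple[str, str]:
    """Map one health probe to (status, action) following the table above."""
    if not process_running:
        return "down", "restart bridge"
    if not hazelcast_ok or data_age is None:
        return "error", "alert, retry"
    if data_age > DATA_STALE_THRESHOLD:
        return "stale", "restart bridge"
    if data_age >= DATA_WARNING_THRESHOLD:
        return "warning", "log warning"
    return "healthy", "continue monitoring"
```

Keeping the mapping free of side effects lets the caller decide how to carry out each action, for example by handing `"restart bridge"` to whatever restart logic the daemon uses.
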

---

## Monitoring Metrics

The daemon tracks:
- Process uptime
- Data freshness (seconds)
- Scan number progression
- Asset count
- Restart count
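
For illustration, these five metrics could be held in a single record like the one below. `BridgeMetrics` and its field names are hypothetical and chosen only to mirror the list above; the daemon's actual data structure is not shown in this document.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BridgeMetrics:
    started_at: float = field(default_factory=time.time)  # basis for process uptime
    data_age_seconds: Optional[float] = None               # data freshness
    last_scan_number: Optional[int] = None                 # scan number progression
    asset_count: Optional[int] = None
    restart_count: int = 0

    def uptime(self, now: Optional[float] = None) -> float:
        """Seconds elapsed since the bridge process was started."""
        return (time.time() if now is None else now) - self.started_at

    def record_scan(self, scan_number: int, asset_count: int, data_age: float) -> bool:
        """Store the latest observation; return True if the scan number advanced."""
        advanced = self.last_scan_number is None or scan_number > self.last_scan_number
        self.last_scan_number = scan_number
        self.asset_count = asset_count
        self.data_age_seconds = data_age
        return advanced
```

A `False` return from `record_scan` on consecutive checks would indicate that scan numbers have stopped progressing even though the key still exists.
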

---

## Files Modified

- `SYSTEM_BIBLE.md` - Updated v4 with Prefect daemon info

---

## Next Steps (Phase 3)

1. **Deploy to production**
   ```bash
   python scan_bridge_deploy.py create
   prefect worker start --pool dolphin-daemon-pool
   ```

2. **Configure alerting**
   - Add Slack/Discord webhooks
   - Set up PagerDuty for critical alerts

3. **Dashboard**
   - Create Prefect dashboard
   - Monitor health over time

4. **Integration with main flows**
   - Ensure `paper_trade_flow` waits for bridge
   - Add dependency checks
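
As a rough idea of what the dependency check in step 4 might involve, the sketch below polls a freshness probe until the bridge is delivering recent data or a timeout expires. Everything here is hypothetical (`wait_for_bridge`, its parameters, and the default timings); it is not existing code from `paper_trade_flow`.

```python
import time
from typing import Callable, Optional


def wait_for_bridge(
    get_data_age: Callable[[], Optional[float]],  # hypothetical probe; None means no data yet
    max_age: float = 30.0,       # accept data younger than this many seconds
    timeout: float = 120.0,      # give up after this many seconds
    poll: float = 5.0,           # pause between probes
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once fresh data is seen, False if the timeout expires first."""
    deadline = clock() + timeout
    while True:
        age = get_data_age()
        if age is not None and age < max_age:
            return True
        if clock() >= deadline:
            return False
        sleep(poll)
```

A flow could call this at startup and abort or alert when it returns `False`, rather than trading on stale scans.
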

---

## Testing

```bash
# Test health check
python -c "
from scan_bridge_prefect_daemon import check_hazelcast_data_freshness
result = check_hazelcast_data_freshness()
print(f\"Status: {result}\")
"

# Run standalone health check
python scan_bridge_prefect_daemon.py
# Then: Ctrl+C to stop
```

---

**Phase 2 Status:** ✅ COMPLETE
**Ready for:** Production deployment
**Next Review:** After 7 days of running in production

---

*Document: SCAN_BRIDGE_PHASE2_COMPLETE.md*
*Version: 1.0*
*Date: 2026-03-24*