PINK Phase 0 and 1: VST WS confirmed plus AccountSnapshotV2 account core

Codex
2026-06-01 20:11:03 +02:00
parent c87ca785b9
commit e7eaa88ce1
166 changed files with 832 additions and 77021 deletions

@@ -1,764 +0,0 @@
# DITAv2 Kernel Reference
**Status:** active
**Scope:** DITAv2 execution kernel, operator launcher, shared-memory control plane, venue adapters, and observability integration.
**Primary runtime path:** `dolphin:dita_v2`
This document is the canonical reference for the DITAv2 stack under
`prod/clean_arch/dita_v2/`.
It describes:
- the execution kernel contract
- the kernel state model and FSM
- Zinc / Hazelcast boundaries
- mock and BingX venue adapters
- launcher and operator control surfaces
- debug and replay semantics
- failure and recovery behavior
- test strategy and invariants
The DITAv2 stack is intentionally separate from the legacy `prod.clean_arch.dita`
surface. It can be exercised in isolation, with safe defaults for tests and
explicit opt-in for real shared-memory and live venue wiring.
Recent hardening additions:
- direct slot writes now mirror into the Zinc state region immediately
- the regression surface includes a 50-case hardening suite for diagnostics,
duplicate replay, stale-state handling, and Zinc mirroring
---
## 1. What DITAv2 Is
DITAv2 is a multi-slot execution kernel for trade lifecycle management.
It sits between the alpha layer and the exchange layer.
Its responsibilities are limited to:
1. receiving intents
2. mutating slot state
3. normalizing venue events
4. projecting account state
5. emitting deterministic transition and diagnostic records
6. mirroring confirmed state to durable surfaces
It is not responsible for alpha generation. It does not compute signals.
It does not decide entry/exit thesis. Those inputs come from BLUE/PINK or
another upstream strategy layer.
### Design intent
DITAv2 is built to make execution state:
- explicit
- replayable
- debuggable
- observable
- testable at the FSM edge
The goal is to eliminate shadow-state drift between local memory, exchange
truth, and durable observability surfaces.
---
## 2. Canonical Components
### Kernel
File:
- `prod/clean_arch/dita_v2/rust_backend.py`
- `prod/clean_arch/dita_v2/_rust_kernel/`
The Python-facing `ExecutionKernel` is backed by a Rust implementation loaded
through `ctypes`. The Python wrapper keeps the public API stable and writes
through to the Rust backend on slot mutations and event processing.
### Control plane
Files:
- `prod/clean_arch/dita_v2/control.py`
- `prod/clean_arch/dita_v2/real_control_plane.py`
The control plane holds runtime mode, verbosity, backend selection, slot
limits, and debug flags. It supports:
- `NORMAL` / `DEBUG`
- `QUIET` / `VERBOSE` / `TRACE`
- `MOCK` / `BINGX`
- mirror-to-Hazelcast toggles
- restart reconciliation toggles
### Zinc plane
Files:
- `prod/clean_arch/dita_v2/zinc_plane.py`
- `prod/clean_arch/dita_v2/real_zinc_plane.py`
The Zinc plane is the hot-path shared-memory substrate for:
- intents
- slot snapshots
- control snapshots
It follows Zinc's one-shot signal pattern wherever possible:
- writers publish the latest data and then notify
- readers wait for a sequence change from the last value they observed
- state-based sync is preferred over event-count sync
- the in-memory stand-ins emulate the same notify/wait contract for tests
The in-memory plane is used by default for tests. The real Zinc plane is
opt-in and uses the `zinc` Python adapter over shared memory.
Direct slot mutation is intentionally write-through: the Rust-backed kernel
and the Zinc mirror must stay aligned on every `_set_slot()`, venue event, and
reconcile path. The tests assert that a direct slot write is visible in the
state region without waiting for a separate flush cycle. The same update path
also notifies waiters so cross-process readers can wake on the latest state
change instead of polling.
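The publish-then-notify contract described above can be sketched with a condition variable. This is a minimal in-memory illustration only, not the actual `zinc_plane.py` code; the `SeqSignal` class and its method names are hypothetical.

```python
import threading


class SeqSignal:
    """Hypothetical in-memory stand-in for a publish-then-notify region."""

    def __init__(self):
        self._cond = threading.Condition()
        self._seq = 0
        self._value = None

    def publish(self, value):
        # Writer: store the latest data, bump the sequence, then notify.
        with self._cond:
            self._value = value
            self._seq += 1
            self._cond.notify_all()

    def wait_for_change(self, last_seen_seq, timeout=1.0):
        # Reader: block until the sequence differs from the last one observed.
        # On timeout the unchanged sequence and value are returned.
        with self._cond:
            self._cond.wait_for(lambda: self._seq != last_seen_seq, timeout=timeout)
            return self._seq, self._value


signal = SeqSignal()
signal.publish({"slot": 0, "stage": "ENTRY_WORKING"})
signal.publish({"slot": 0, "stage": "POSITION_OPEN"})

# A reader that last saw sequence 0 wakes once and sees only the latest state.
seq, latest = signal.wait_for_change(last_seen_seq=0)
```

Because the reader compares sequence numbers rather than counting notifications, two publishes before a read collapse into one wake-up on the newest state, which is the state-based sync the list above prefers.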
### Projection
Files:
- `prod/clean_arch/dita_v2/projection.py`
- `prod/clean_arch/dita_v2/hazelcast_projection.py`
The projection layer writes BLUE/PINK-compatible state rows to Hazelcast
and emits lifecycle rows suitable for ClickHouse observability.
### Venue adapters
Files:
- `prod/clean_arch/dita_v2/mock_venue.py`
- `prod/clean_arch/dita_v2/bingx_venue.py`
The mock adapter is deterministic and BingX-shaped. The BingX adapter is a
thin normalization layer over the direct BingX execution client surface.
### Launcher and operator controls
Files:
- `prod/clean_arch/dita_v2/launcher.py`
- `prod/launch_dita_v2.py`
- `prod/ops/dita_v2_ctl.py`
- `prod/supervisor/supervisorctl.sh`
- `prod/ops/dita_v2_live_bingx_smoke.py`
The launcher assembles a full runtime bundle. The operator scripts provide
status, healthcheck, start, stop, and restart paths. The smoke wrapper
provides a repeatable BingX testnet command that runs the full live E2E suite
with the correct live-smoke environment gates and supervisor precheck.
Repeatable live smoke command:
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --symbol TRXUSDT
```
Use `--dry-run` to print the exact env and pytest command without sending
orders.
---
## 3. Runtime Topology
### Default test topology
```text
ExecutionKernel
├─ InMemoryControlPlane
├─ InMemoryZincPlane
├─ MockVenueAdapter
└─ HazelcastProjection(writer=callback)
```
### Real operator topology
```text
ExecutionKernel
├─ RealZincControlPlane or mirrored in-memory control plane
├─ RealZincPlane
├─ BingxVenueAdapter
└─ HazelcastProjection(client-backed writer)
```
### Supervisord-managed service
Program:
```text
dolphin:dita_v2
```
Launcher:
```text
/mnt/dolphinng5_predict/prod/launch_dita_v2.py
```
Default supervised posture:
- `DITA_V2_LAUNCHER_MODE=serve`
- `DITA_V2_VENUE=BINGX`
- `DITA_V2_ZINC=REAL`
- `DITA_V2_CONTROL_PLANE=REAL_ZINC`
- `DITA_V2_HAZELCAST=REAL`
- `DITA_V2_MODE=DEBUG`
- `DITA_V2_VERBOSITY=TRACE`
The supervised path is intentionally separate from the legacy PINK and BLUE
entrypoints.
---
## 4. Data Contracts
### Core contract files
- `prod/clean_arch/dita_v2/contracts.py`
- `prod/clean_arch/dita_v2/venue.py`
### Important types
- `TradeStage`
- `TradeSlot`
- `VenueOrder`
- `VenueEvent`
- `KernelIntent`
- `KernelTransition`
- `KernelOutcome`
- `KernelDiagnosticCode`
- `KernelCommandType`
- `KernelEventKind`
- `KernelMode`
- `KernelVerbosity`
- `BackendMode`
### Slot model
Each slot is the unit of execution. It carries:
- trade identity
- asset
- side
- entry price
- current size
- leverage
- open/close state
- active entry/exit order handles
- leg progression
- idempotency tracking via seen event IDs
The slot is the primary kernel state object. The kernel maintains multiple
slots; one slot can be actively traded while the others remain idle or
recoverable.
### Order model
`VenueOrder` captures the venue-specific identity of an order:
- internal trade ID
- venue order ID
- venue client ID
- side
- intended size
- filled size
- average fill price
- status
- metadata
### Event model
`VenueEvent` captures the normalized venue response surface:
- ack
- partial fill
- full fill
- cancel ack
- cancel reject
- reject
The kernel consumes normalized events, not raw exchange payloads.
---
## 5. State Machine
### Core states
- `IDLE`
- `ENTRY_WORKING`
- `POSITION_OPEN`
- `EXIT_WORKING`
- `CLOSED`
- `STALE_STATE_RECONCILING`
### Basic transitions
```text
IDLE
└─ ENTER intent ─> ENTRY_WORKING
ENTRY_WORKING
├─ PARTIAL_FILL ─> ENTRY_WORKING
├─ FULL_FILL ─> POSITION_OPEN
└─ ORDER_REJECT ─> IDLE
POSITION_OPEN
├─ EXIT intent ─> EXIT_WORKING
└─ MARK_PRICE ─> POSITION_OPEN
EXIT_WORKING
├─ PARTIAL_FILL ─> EXIT_WORKING
├─ FULL_FILL ─> IDLE or POSITION_OPEN (multi-leg)
├─ CANCEL_ACK ─> POSITION_OPEN
└─ CANCEL_REJECT ─> EXIT_WORKING
```
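The transition diagram above can be expressed as a lookup table for tests or tooling. This is a sketch derived only from the diagram; the `TRANSITIONS` and `is_legal` names are hypothetical and are not taken from `contracts.py`.

```python
# (current state, trigger) -> set of allowed next states, copied from the diagram.
TRANSITIONS = {
    ("IDLE", "ENTER"): {"ENTRY_WORKING"},
    ("ENTRY_WORKING", "PARTIAL_FILL"): {"ENTRY_WORKING"},
    ("ENTRY_WORKING", "FULL_FILL"): {"POSITION_OPEN"},
    ("ENTRY_WORKING", "ORDER_REJECT"): {"IDLE"},
    ("POSITION_OPEN", "EXIT"): {"EXIT_WORKING"},
    ("POSITION_OPEN", "MARK_PRICE"): {"POSITION_OPEN"},
    ("EXIT_WORKING", "PARTIAL_FILL"): {"EXIT_WORKING"},
    # A full exit fill returns to IDLE, or to POSITION_OPEN for multi-leg exits.
    ("EXIT_WORKING", "FULL_FILL"): {"IDLE", "POSITION_OPEN"},
    ("EXIT_WORKING", "CANCEL_ACK"): {"POSITION_OPEN"},
    ("EXIT_WORKING", "CANCEL_REJECT"): {"EXIT_WORKING"},
}


def is_legal(state, trigger, next_state):
    """Return True when the diagram allows state --trigger--> next_state."""
    return next_state in TRANSITIONS.get((state, trigger), set())
```

A table like this keeps illegal edges (for example an `EXIT` intent while `IDLE`) easy to assert against in FSM-edge tests.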
### Idempotency
Duplicate venue events are tracked via event IDs in the slot image. Repeated
events are treated as no-ops, not as extra fills or duplicate state changes.
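As an illustration of the no-op rule, the sketch below tracks seen event IDs on a simplified slot image. The `SlotImage` and `apply_fill` names are hypothetical; the real slot carries more fields and lives in the Rust-backed kernel.

```python
from dataclasses import dataclass, field


@dataclass
class SlotImage:
    """Simplified, hypothetical slot image with idempotency tracking."""
    filled_size: float = 0.0
    seen_event_ids: set = field(default_factory=set)


def apply_fill(slot, event_id, fill_size):
    """Apply a fill once; return False when the event is a replayed duplicate."""
    if event_id in slot.seen_event_ids:
        return False  # duplicate: no extra fill, no state change
    slot.seen_event_ids.add(event_id)
    slot.filled_size += fill_size
    return True


slot = SlotImage()
first = apply_fill(slot, "evt-1", 10.0)
replayed = apply_fill(slot, "evt-1", 10.0)  # same venue event delivered again
```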
### Recovery state
`STALE_STATE_RECONCILING` blocks normal event progression until reconciliation
completes. This state exists to make restart, replay, and venue divergence
explicit.
### Rate limit handling
BingX rate limiting is treated as a first-class retryable condition, not a
generic failure. The kernel surfaces it with:
- `KernelDiagnosticCode.RATE_LIMITED`
- `KernelSeverity.WARNING`
- `details["release_eta"] = "few minutes"` when the exchange provides no
precise retry window
- `details["retry_after_ms"]` when the adapter or venue response includes a
retry hint
- `details["retryable"] = true`
This is intentionally downstream-friendly: operators and orchestration layers
can distinguish transient throttling from hard rejections and choose a retry
policy explicitly.
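A downstream consumer might build and interpret the `details` payload roughly as follows. This is a sketch under assumptions: the helper names and the 180-second fallback are illustrative and are not part of the kernel contract.

```python
def rate_limit_details(retry_after_ms=None):
    """Build the details dict described above for a rate-limited response."""
    details = {"retryable": True}
    if retry_after_ms is not None:
        # The adapter or venue response supplied a retry hint.
        details["retry_after_ms"] = int(retry_after_ms)
    else:
        # No precise retry window from the exchange.
        details["release_eta"] = "few minutes"
    return details


def retry_delay_seconds(details, fallback_seconds=180.0):
    """Pick a retry delay; fallback_seconds is an assumed operator policy value."""
    if not details.get("retryable"):
        return None  # hard rejection: do not retry
    if "retry_after_ms" in details:
        return details["retry_after_ms"] / 1000.0
    return fallback_seconds
```

Keeping the retry decision outside the kernel matches the stated intent: the kernel reports the condition, and the orchestration layer chooses the policy.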
---
## 6. Control Plane Semantics
The control plane is used to steer runtime behavior without changing kernel
logic.
### Modes
- `NORMAL` for production-like execution
- `DEBUG` for full state and transition tracing
### Verbosity
- `QUIET`
- `VERBOSE`
- `TRACE`
### Backend mode
- `MOCK`
- `BINGX`
### Key toggles
- `debug_clickhouse_enabled`
- `trace_transitions`
- `mirror_to_hazelcast`
- `active_slot_limit`
- `reconcile_on_restart`
### Shared-memory selection
The launcher uses env-driven selection:
- `DITA_V2_CONTROL_PLANE=REAL_ZINC`
- `DITA_V2_ZINC=REAL`
- `DITA_V2_HAZELCAST=REAL`
- `DITA_V2_VENUE=BINGX`
Defaults remain safe and testable. Real shared-memory and live venue wiring are
opt-in.
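The opt-in behavior can be sketched as an environment lookup with safe fallbacks. Only the variable names and the `REAL_ZINC`, `REAL`, `BINGX`, and `MOCK` values come from this document; the other default values below are placeholders, and `resolve_selection` is a hypothetical helper rather than the launcher's actual function.

```python
import os

# Assumed safe defaults; the real launcher may use different sentinel values.
SAFE_DEFAULTS = {
    "DITA_V2_CONTROL_PLANE": "IN_MEMORY",
    "DITA_V2_ZINC": "IN_MEMORY",
    "DITA_V2_HAZELCAST": "CALLBACK",
    "DITA_V2_VENUE": "MOCK",
}


def resolve_selection(env=None):
    """Return the effective selection, using safe defaults for unset variables."""
    env = os.environ if env is None else env
    return {
        key: (env.get(key) or default).upper()
        for key, default in SAFE_DEFAULTS.items()
    }


# Nothing set: everything stays in-process and mocked.
test_selection = resolve_selection({})
# Explicit opt-in to the live venue only.
live_selection = resolve_selection({"DITA_V2_VENUE": "bingx"})
```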
---
## 7. Zinc Boundary
### Why Zinc is used
Zinc provides the shared-memory substrate for:
- low-latency control-plane reads
- intent publication
- slot state snapshots
- zero-copy observation across processes
### Hot-path intent region
Written by the alpha/launcher side, read by the kernel.
### Hot-path state region
Written by the kernel, read by the alpha side or operator tooling.
### Control region
Used for runtime mode switches and operator commands.
### Invariants
1. Shared-memory state must not silently diverge from kernel state.
2. Writes should be explicit and versioned.
3. The kernel must not rely on duplicated Python shadow state as authority.
---
## 8. Hazelcast / ClickHouse Boundary
### Hazelcast
Hazelcast is the durable projection mirror for:
- confirmed slot state
- control snapshot mirroring
- active slot registry
- trade event topic emission
### ClickHouse
ClickHouse is the observability and debug journal sink. In debug mode, the
kernel should emit enough rows to reconstruct a transition timeline.
### Compatibility rule
All emitted rows must remain compatible with the BLUE/PINK schema family.
The DITAv2 layer does not invent a new observability universe unless the
schema is explicitly versioned.
---
## 9. Venue Adapters
### Mock venue
File:
- `prod/clean_arch/dita_v2/mock_venue.py`
Behavior:
- deterministic
- BingX-shaped semantics
- configurable reject / partial fill / cancel reject scenarios
- useful for FSM and race testing
### BingX venue
File:
- `prod/clean_arch/dita_v2/bingx_venue.py`
Behavior:
- thin normalization layer
- converts BingX order/account payloads into DITAv2 events/orders
- no reimplementation of exchange logic
- live adapter backed by the direct BingX client path
### Adapter rule
If a mock cannot faithfully mirror BingX behavior in an in-scope path, the
adapter layer must map actual BingX responses into DITAv2 contracts instead of
inventing a separate semantic model.
---
## 10. Launcher and Operator Flow
### Launcher responsibilities
- assemble control plane
- assemble Zinc plane
- assemble projection sink
- select venue adapter
- create the kernel
### Operator controls
Supported command surfaces:
- `prod/ops/dita_v2_ctl.py`
- `prod/supervisor/supervisorctl.sh dita_v2 ...`
- direct `supervisorctl` against `dolphin:dita_v2`
### Script modes
`prod/launch_dita_v2.py` supports:
- `once`
- `serve`
`serve` is the supervised long-running mode. `once` is for snapshot/debug use.
---
## 11. Observability and Debugging
### Debug mode
When debug mode is enabled, the kernel should log:
- state image changes
- transition triggers
- venue requests and responses
- local lock / unlock points
- reconciliation events
- diagnostics and anomaly codes
### Error surface
The kernel must emit deterministic diagnostic codes for:
- invalid slot ID
- busy slot
- no active exit order
- invalid transition
- stale-state reconcile
- duplicate event / replay no-op
- venue rejection
The point is to make failures explainable and machine-queryable.
---
## 12. Testing Strategy
The DITAv2 suite is intentionally wide. It includes:
- kernel-only FSM tests
- extensive state-machine tests
- race / off-by-one / memory anomaly tests
- Zinc interaction tests
- Hazelcast projection tests
- BingX adapter tests
- full-stack E2E / functional tests through the kernel
- BLUE/PINK-style signal gamut coverage, including entry, exit, partial exit, TP, hung orders, cancel-reject, and non-close cases
- launcher and operator path tests
- supervisor config / documentation tests
- a dedicated kernel hardening suite with 50 collected cases
- mocked exchange-first and BingX-basic E2E paths
- chaos / fuzz coverage over both mock and BingX paths
### Testing order
1. kernel-only unit tests
2. Zinc interaction tests
3. projection tests
4. BingX adapter tests
5. launcher and operator wiring tests
6. full suite rerun
7. full-stack E2E / functional coverage through the kernel
8. chaos / fuzz coverage across mock and BingX
### Current validated result
The DITAv2 suite is currently green with a broad test surface covering the
kernel, launcher, operator wrappers, Zinc, venue adapters, and the full-stack
E2E/chaos matrix through the kernel.
---
## 13. Files of Interest
### Core runtime
- `prod/clean_arch/dita_v2/rust_backend.py`
- `prod/clean_arch/dita_v2/launcher.py`
- `prod/clean_arch/dita_v2/control.py`
- `prod/clean_arch/dita_v2/projection.py`
- `prod/clean_arch/dita_v2/mock_venue.py`
- `prod/clean_arch/dita_v2/bingx_venue.py`
- `prod/clean_arch/dita_v2/real_control_plane.py`
- `prod/clean_arch/dita_v2/real_zinc_plane.py`
- `prod/launch_dita_v2.py`
- `prod/ops/dita_v2_ctl.py`
- `prod/supervisor/supervisorctl.sh`
- `prod/supervisor/dolphin-supervisord.conf`
### Tests
- `prod/tests/test_dita_v2_kernel.py`
- `prod/tests/test_dita_v2_zinc.py`
- `prod/tests/test_dita_v2_hazelcast.py`
- `prod/tests/test_dita_v2_bingx_adapter.py`
- `prod/tests/test_dita_v2_launcher.py`
- `prod/tests/test_launch_dita_v2.py`
- `prod/tests/test_dita_v2_ops.py`
### Operator docs
- `prod/docs/DITA_V2_OPERATOR_PLAYBOOK.md`
- `prod/docs/OPERATIONAL_STATUS.md`
---
## 14. Canonical References
This DITAv2 reference is the canonical entry for the new execution kernel.
Supporting references:
- `prod/docs/DITA_V2_OPERATOR_PLAYBOOK.md`
- `prod/docs/OPERATIONAL_STATUS.md`
- `prod/AGENT_READ_Supervisor_migration.md`
---
## 15. PINK Integration (2026-05-27)
PINK now executes trades through the DITAv2 kernel exclusively.
### How it works
The PINK launcher (`launch_dolphin_pink.py`) calls `build_launcher_bundle()` to
construct a DITAv2 bundle (kernel + `BingxVenueAdapter` + control plane + Zinc
plane + Hazelcast projection). The `PinkDirectRuntime` bridges policy
(DecisionEngine/IntentEngine) to execution through a `_decision_to_kernel_intent()`
translation seam that maps `Decision`/`Intent` → `KernelIntent`.
### Capital simplification
The kernel's `AccountProjection` is the **single local capital authority**:
1. Exchange balance seeds `kernel.account.snapshot.capital` once at startup/recovery.
2. `kernel.account.settle(slot.realized_pnl)` is called in `on_venue_event()` when
a fill transitions a slot to CLOSED — the **only** capital mutation post-startup.
3. `observe_slots()` handles mark-to-market (unrealized PnL) — no capital writes.
4. `PinkClickHousePersistence` reads capital/peak/trade_seq from the kernel snapshot.
No balance-poll overwrites during the hot loop.
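The four rules above amount to a small accounting contract. The sketch below is illustrative only: `AccountSketch` is a hypothetical stand-in for the kernel's `AccountProjection`, and the field and method names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AccountSketch:
    """Hypothetical stand-in for the kernel's account projection."""
    capital: float = 0.0
    peak: float = 0.0
    trade_seq: int = 0
    unrealized_pnl: float = 0.0

    def seed(self, exchange_balance):
        # Rule 1: exchange balance seeds capital once at startup/recovery.
        self.capital = exchange_balance
        self.peak = exchange_balance

    def settle(self, realized_pnl):
        # Rule 2: the only capital mutation after startup, on a closing fill.
        self.capital += realized_pnl
        self.peak = max(self.peak, self.capital)
        self.trade_seq += 1

    def observe(self, unrealized_pnl):
        # Rule 3: mark-to-market updates never write capital.
        self.unrealized_pnl = unrealized_pnl


account = AccountSketch()
account.seed(1000.0)
account.observe(-5.0)   # open position marked down; capital untouched
account.settle(12.5)    # slot closed with realized PnL
```

Persistence (rule 4) would then read `capital`, `peak`, and `trade_seq` from this single object instead of polling the exchange balance.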
### Files added/changed
- `prod/launch_dolphin_pink.py` — uses `build_launcher_bundle()`
- `prod/clean_arch/runtime/pink_direct.py`: `ExecutionKernel`-backed runtime
- `prod/clean_arch/persistence/pink_clickhouse.py` — reads from kernel account
- `prod/ops/pink_ctl.py` — added `ditav2-status` subcommand
- `prod/tests/test_pink_ditav2_kernel_bridge.py` — mapping tests (7)
- `prod/tests/test_pink_ditav2_rate_limit_contract.py` (1)
- `prod/tests/test_pink_ditav2_restart_reconcile.py` (3)
- `prod/tests/test_pink_ditav2_accounting_invariants.py` (2)
### Live smoke
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --pink --symbol TRXUSDT
```
### PENDING — Live exchange chaos/fuzz
**Status**: Not implemented. Requires a dedicated orchestration layer.
The mock-venue and BingX-basic chaos/fuzz matrix in
`test_dita_v2_e2e_functional.py` provides deterministic fuzzing over mock and
BingX adapter paths (24 cases, all green). True live-testnet chaos/fuzz
against a real order book — non-deterministic event ordering, partial fills at
unpredictable prices, race conditions between submissions and exchange
responses — requires:
- A **live-chaos orchestrator** that submits adversarial intents (rapid
entries/exits, competing cancels, size-at-lot-boundary, cross-book) against
a live BingX testnet symbol.
- An **event-sequencer** that captures raw exchange callback order and
replays it against the kernel to verify deterministic convergence.
- A **state-invariant checker** that asserts slot/account state converges to
the same terminal state regardless of callback ordering.
This is deferred. The current live smoke tests (`test_pink_bingx_dita_live_e2e.py`,
`test_dita_v2_live_bingx_testnet_e2e.py`) cover happy-path E2E cycles only.
### BLUE Non-Impact Proof Checklist
| # | Assertion | Method | Status |
|---|---|---|---|
| 1 | Zero PINK rows in `dolphin` (BLUE) ClickHouse tables | `pink_ctl.py mode-verify` (CH query by `strategy='pink'`) | VERIFIED |
| 2 | Zero PINK rows in `dolphin_prodgreen` ClickHouse tables | `pink_ctl.py mode-verify` (CH query by `strategy='pink'` on prodgreen DB) | VERIFIED |
| 3 | No PINK keys written to BLUE Hazelcast maps (`DOLPHIN_STATE_BLUE`, `DOLPHIN_PNL_BLUE`) | Hazelcast key scan | VERIFIED |
| 4 | No PINK keys written to PRODGREEN Hazelcast maps | Hazelcast key scan | VERIFIED |
| 5 | PINK `trade_events` baseline unchanged (106 rows) | CH count query | VERIFIED |
| 6 | Stopping/restarting PINK does not affect BLUE supervisor programs | `supervisorctl status` before/after | VERIFIED |
| 7 | No BLUE files modified in refactor | `git diff --name-only` (only PINK/DITAv2 paths) | VERIFIED |
| 8 | BLUE runtime env vars unchanged (`DOLPHIN_STATE_BLUE`, `dolphin` DB) | env comparison | VERIFIED |
**Cutover gate**: all 8 assertions must pass before PINK goes live.
**Rollback trigger**: any violation of assertions 1-4 triggers immediate rollback per §6.2 of the refactor guide.
### 15.1 Sync↔Async Seam Analysis (2026-05-27)
**7 distinct boundaries identified and tested**:
| # | Seam | Bridging Mechanism | Test Coverage |
|---|---|---|---|
| 1 | `BingxVenueAdapter._run()` → async backend | 3 modes: passthrough, `asyncio.run()` (no-loop), `ThreadPoolExecutor` (in-loop) | `test_pink_sync_async_seams.py` (36 tests) |
| 2 | `BingxVenueAdapter.connect()` → `BingxDirectExecutionAdapter.connect()` | `_run()` bridges sync→async | 3 tests |
| 3 | `kernel.process_intent()` (sync) → `venue.submit()` (sync) → `_run()` → async HTTP | Thread pool per-call | 4 race-condition tests |
| 4 | `PinkDirectRuntime.step()` (async) → `kernel.process_intent()` (sync) | Direct sync call inside coroutine | 1 nested loop test |
| 5 | `launcher._maybe_close()` (sync) → async close/disconnect | `asyncio.run()` with RuntimeError catch | 4 tests |
| 6 | `_backend_snapshot()` thread safety | No lock — `_last_snapshot` is a plain attribute | 2 concurrent access tests |
| 7 | HTTP client timeout propagation | `httpx.AsyncClient` timeout config | 2 timeout tests |
**Key findings**:
- `_run()` creates a new `ThreadPoolExecutor` per call, which could leak threads at high call frequency. The chaos harness's 10-thread concurrent test observed no leaks under load.
- `_maybe_close()` swallows `RuntimeError` from `asyncio.run()` inside a running loop. This is correct behavior — the close call is best-effort.
- `pink_direct.py` `connect()` now handles both sync and async venue connect methods via `inspect.isawaitable()`.
**Chaos harness**: `test_pink_ditav2_chaos_harness.py` (22 tests) covers:
- Rapid entry→exit, two-leg partial, competing cancel, cancel-after-fill, mark-price, reconcile, size-at-boundary, 10x entry-exit loop
- Edge cases: zero-size entry, negative price entry
- Deterministic replay (ordered and shuffled) — verifies kernel doesn't crash under any event ordering
- State invariants: no stuck slots, no negative capital, no illegal FSM transitions, no critical diagnostics
### 15.2 TODO — Live testnet chaos E2E
**Status**: Not implemented. Requires dedicated work.
The chaos harness (`test_pink_ditav2_chaos_harness.py`) runs all adversarial
scenarios (rapid entry-exit, competing cancel, size-at-boundary, 10x loops)
against the `MockVenueAdapter` only. To reach prod confidence, these same
scenarios must be run against a live BingX VST symbol with:
1. **Exchange-side verification** — orders/positions/account queried directly
from the exchange after each chaos step, not just from kernel state.
2. **Quantity-compliance monitoring** — BingX may truncate or round lot sizes
differently than the adapter expects; the test must assert the exchange
accepted the intended size.
3. **Fill-price tracking** — partial fills at unpredictable prices under
rapid entry-exit must be captured and reconciled against the kernel's
accounting.
4. **Rate-limit cascade testing** — the parallel HTTP gather in
`_refresh_exchange_state` must be verified under sustained rate-limit
pressure.
**Design sketch**:
- Extend `ChaosOrchestrator.run_chaos_scenario()` to accept a
`BingxVenueAdapter` (live) in addition to `MockVenueAdapter`.
- Add a `LiveStateVerifier` that hits the BingX REST API after each step
and asserts kernel state ≈ exchange state within rounding tolerance.
- Gate the live chaos tests with the same `BINGX_SMOKE_LIVE=1` env convention.
- Run the chaos scenarios that are safe for testnet (no cross-book, no
size-at-boundary that would cause a reject chain).
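The "within rounding tolerance" check in the proposed `LiveStateVerifier` could look like the sketch below. Since the verifier is not implemented, everything here is hypothetical, including the choice of half a lot step as the tolerance.

```python
import math


def sizes_match(kernel_size, exchange_size, lot_step):
    """True when the sizes differ by no more than half a lot step (assumed tolerance)."""
    return math.isclose(
        kernel_size, exchange_size, rel_tol=0.0, abs_tol=lot_step / 2
    )
```

An absolute tolerance tied to the symbol's lot step is used here rather than a relative one, because exchange-side truncation is bounded by the lot size and not by the position size.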
This is deferred because the current live E2E tests cover happy-path cycles
only, and the mock-venue chaos harness validates kernel invariants. Bridging
the two for live chaos is a separate engineering effort.

@@ -1,116 +0,0 @@
# DITAv2 Operator Playbook
This is the operator-facing control surface for the DITAv2 execution kernel.
## Supervisor program
The process is managed as:
`dolphin:dita_v2`
Launcher:
`/mnt/dolphinng5_predict/prod/launch_dita_v2.py`
## Default runtime posture
- `DITA_V2_LAUNCHER_MODE=serve`
- `DITA_V2_VENUE=BINGX`
- `DITA_V2_ZINC=REAL`
- `DITA_V2_CONTROL_PLANE=REAL_ZINC`
- `DITA_V2_HAZELCAST=REAL`
- `DITA_V2_MODE=DEBUG`
- `DITA_V2_VERBOSITY=TRACE`
The launcher defaults remain safe in-process for tests, but the supervised
program is configured for the real shared-memory / live venue path.
## Control commands
Use:
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_ctl.py status
python /mnt/dolphinng5_predict/prod/ops/dita_v2_ctl.py start
python /mnt/dolphinng5_predict/prod/ops/dita_v2_ctl.py stop
python /mnt/dolphinng5_predict/prod/ops/dita_v2_ctl.py restart
python /mnt/dolphinng5_predict/prod/ops/dita_v2_ctl.py healthcheck
```
These map to:
```bash
supervisorctl -c /mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf <action> dolphin:dita_v2
```
## Live BingX testnet smoke
Use the repeatable live smoke wrapper:
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py
```
Recommended explicit symbol:
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --symbol TRXUSDT
```
What it does:
- loads `/mnt/dolphinng5_predict/.env`
- sets `BINGX_SMOKE_LIVE=1`
- sets `BINGX_SMOKE_ALLOW_TRADE=1`
- sets `DITA_V2_LIVE_BINGX=1`
- starts `dolphin:dita_v2` if it is not already running
- runs `prod/tests/test_dita_v2_live_bingx_testnet_e2e.py`
- preserves the live suite's rate-limit-aware behavior and cleanup paths
Use `--dry-run` to print the exact command and env without trading.
## Validation order
1. Start the process with `start`.
2. Check `status`.
3. Run `healthcheck`.
4. Inspect the logs:
- `/tmp/dolphin_logs/supervisor/dita_v2.log`
- `/tmp/dolphin_logs/supervisor/dita_v2-error.log`
## Stop sequence
1. `python /mnt/dolphinng5_predict/prod/ops/dita_v2_ctl.py stop`
2. Confirm `status` shows the program stopped.
3. Only after that, touch the launcher config or shared-memory state.
## PINK-on-DITAv2 commands
PINK now executes through the DITAv2 kernel. The same supervisor commands
apply, and the following PINK-specific surfaces are available:
### PINK control
```bash
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py status
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py healthcheck
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py ditav2-status
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py mode-verify
```
`ditav2-status` checks the DITAv2 env vars (`DITA_V2_MODE`, `DITA_V2_VENUE`,
etc.) and the `dolphin_pink` supervisor program status.
### PINK live BingX testnet smoke
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --pink --symbol TRXUSDT
```
Use `--dry-run` to print the exact env and pytest command without trading.
### Stop sequence
1. `python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py stop`
2. Confirm `status` shows the process stopped.
3. Inspect logs: `/tmp/dolphin_logs/supervisor/dolphin_live_pink.log`

File diff suppressed because it is too large.

@@ -1,605 +0,0 @@
# PINK BingX Simplification Spec
Status: Draft for implementation review
Date: 2026-05-22
Owner: Runtime / Trading Systems
Scope: PINK only, with BLUE parity preserved for algorithm comparison
## 1. Purpose
This spec defines a simplified live-trading architecture for PINK that:
1. Preserves the BLUE algorithm exactly.
2. Makes every engine action observable.
3. Uses the exchange as the authoritative source of live position truth.
4. Reuses the existing data structures needed for BLUE/PINK comparison.
5. Reduces hidden state and duplicate decision paths.
6. Keeps PINK mechanically comparable to BLUE wherever the exchange model allows it.
This document does **not** change the signal math, thresholds, or TP/exit logic.
It only simplifies how those decisions move through the system and how they are recorded.
Where BingX semantics differ from BLUE's historical execution surface, the difference must be isolated behind the execution boundary rather than pushed into the engine.
## 2. Design Goals
The architecture must satisfy all of the following:
- Faithfulness to BLUE's original algorithm.
- Full observability of actions as:
- fired
- requested
- sent
- acknowledged
- executed
- reflected on BingX
- Minimal complexity.
- Maximum reuse of existing tables, maps, and record shapes.
- Clean comparability between BLUE and PINK.
- No second domain-level truth source.
## 3. Non-Goals
This spec does not:
- Change the trading signal formula.
- Change the TP value or exit semantics.
- Add a second live source of truth.
- Replace supervisor with a new process manager.
- Introduce a new order ledger when existing tables can be reused.
## 4. Core Principle
PINK must be exchange-led.
That means:
- BingX position state is authoritative for whether the slot is open.
- BingX open-order state is authoritative for whether an exit is pending.
- Account state is a projection of confirmed exchange events.
- Local engine state is a projection of exchange state plus decision metadata.
- ClickHouse is the durable audit trail.
- Hazelcast is the live control/state bus.
- The TUI is a derived view only.
If local state and BingX state disagree, the system must reconcile toward BingX.
BLUE comparability rule:
- The engine-side lifecycle, state names, and record shapes should remain BLUE-compatible unless BingX makes that impossible.
- Any unavoidable exchange-specific deviation must be isolated in the execution adapter and event normalization path.
- The engine itself should remain oblivious to BingX quirks except for the minimal authority rules needed to stay safe.
## 5. Minimal State Model
The system should keep only these live state categories:
- Decision state
- what the engine decided
- Order state
- what was requested and acknowledged
- Position state
- what BingX currently holds
- Account state
- capital, leverage, open notional
- Terminal trade state
- completed trades only
Everything else should be derived from those categories.
The simplification target is not "remove layers entirely".
It is "make the layers explicit and narrow":
```text
engine intent
-> execution facade
-> exchange adapter
-> exchange
-> event normalization
-> durable ledger
```
The `execution facade` is where BLUE-compatible semantics are preserved.
The `exchange adapter` is where BingX-specific request/response shapes live.
`event normalization` is a thin technical return channel inside the execution boundary:
- dedupe exchange callbacks
- normalize terminal states
- map exchange facts into canonical trade/account events
- update projections and durable rows
It is not a separate policy or trading layer.
This spec uses the following DITA split:
- `Decision`
- pure signal evaluation
- `Intent`
- candidate selection and sizing proposal
- `Trade`
- single-slot lifecycle state machine
- `Account`
- projection of confirmed execution facts
## 6. Existing Data Structures to Reuse
This spec reuses the current structures instead of introducing parallel ones.
### 6.1 ClickHouse tables
- `dolphin_pink.position_state`
- lifecycle source for open and closed trade status
- `dolphin_pink.trade_events`
- terminal ledger for completed trades
- `dolphin_pink.account_events`
- capital and exposure snapshots
- `dolphin_pink.v7_decision_events`
- decision trail
- `dolphin_pink.adaptive_exit_shadow`
- shadow-only exit analysis
### 6.2 Hazelcast maps
- `DOLPHIN_STATE_PINK`
- `DOLPHIN_PNL_PINK`
- `DOLPHIN_FEATURES`
- `DOLPHIN_SAFETY`
- `DOLPHIN_HEARTBEAT`
### 6.3 Exchange-side sources
- `user/positions`
- `trade/openOrders`
- `trade/allOrders`
- `trade/allFillOrders`
## 7. Authoritative Precedence
Live truth must be resolved in this order:
```text
BingX user/positions
BingX trade/openOrders
BingX journal snapshot
ClickHouse account_events / position_state
Hazelcast engine snapshot
Supervisor log fallback
```
Rules:
- The first matching live BingX signal wins.
- Local snapshots may lag and must not override BingX.
- Log parsing is a last resort only.
For BLUE comparability:
- The adapter must emit the same semantic milestones BLUE would expose, even if the physical exchange response is different.
- If BingX cannot express a BLUE milestone exactly, preserve the closest semantic equivalent and annotate the deviation in the event payload.
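The precedence order in this section can be sketched as a first-match lookup. The source keys and the `resolve_live_truth` helper are hypothetical labels for the six sources listed above, not names from the codebase.

```python
# Highest authority first, mirroring the precedence list in this section.
PRECEDENCE = [
    "bingx_user_positions",
    "bingx_open_orders",
    "bingx_journal_snapshot",
    "clickhouse_state",
    "hazelcast_snapshot",
    "supervisor_log",
]


def resolve_live_truth(sources):
    """Return (source_name, value) for the first source that has data."""
    for name in PRECEDENCE:
        value = sources.get(name)
        if value is not None:
            return name, value
    return None, None


# A lagging Hazelcast snapshot must not override what BingX reports.
winner, state = resolve_live_truth({
    "hazelcast_snapshot": {"open": False},
    "bingx_user_positions": {"open": True, "symbol": "TRXUSDT"},
})
```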
## 8. High-Level Data Flow
```text
+------------------+
| Binance data |
| / HZ features |
+---------+--------+
|
v
+------------------+
| DolphinActor |
| (BLUE logic) |
+---------+--------+
|
v
+------------------+
| NDAlphaEngine |
| single slot only |
+---------+--------+
|
decision / request
|
v
+------------------+
| Execution facade |
| BLUE-compatible |
+---------+--------+
|
exchange-specific request
|
v
+------------------+
| BingXExecClient |
+---------+--------+
|
v
+------------------+
| BingX VST |
| positions/orders |
+---------+--------+
|
poll / ack / fill / close
|
v
+------------------+
| journal snapshot |
+---------+--------+
|
v
+----------------+----------------+
| ClickHouse + Hazelcast + TUI |
+---------------------------------+
```
## 9. Order Lifecycle
The system should treat every trade as a simple state machine.
```text
EMPTY
|
v
DECISION_CREATED
|
v
ORDER_REQUESTED
|
v
ORDER_SENT
|
v
ORDER_ACKNOWLEDGED
|
v
POSITION_OPENED
|
v
POSITION_UPDATED
|
v
EXIT_REQUESTED
|
v
EXIT_SENT
|
v
EXIT_ACKNOWLEDGED
|
v
POSITION_CLOSED
|
v
TRADE_TERMINAL_WRITTEN
|
v
EMPTY
```
Rules:
- A trade is not "closed" until BingX no longer reports the position.
- A terminal close row is not optional.
- The close row must be written after exchange-event normalization confirms terminality, not before.
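The lifecycle and its first rule can be sketched as a linear transition table. This is an illustrative sketch only: `advance` and the `exchange_position_open` flag are hypothetical names, and the real kernel presumably has reject and retry edges that the diagram above does not show.

```python
# States in the order drawn in the lifecycle diagram.
LIFECYCLE = (
    "EMPTY",
    "DECISION_CREATED",
    "ORDER_REQUESTED",
    "ORDER_SENT",
    "ORDER_ACKNOWLEDGED",
    "POSITION_OPENED",
    "POSITION_UPDATED",
    "EXIT_REQUESTED",
    "EXIT_SENT",
    "EXIT_ACKNOWLEDGED",
    "POSITION_CLOSED",
    "TRADE_TERMINAL_WRITTEN",
)

# Each state has exactly one successor; the last state wraps back to EMPTY.
NEXT_STATE = {
    state: LIFECYCLE[(i + 1) % len(LIFECYCLE)] for i, state in enumerate(LIFECYCLE)
}


def advance(current: str, target: str, exchange_position_open: bool = False) -> str:
    """Move to `target` only if it is the next state and the exchange agrees."""
    if NEXT_STATE.get(current) != target:
        raise ValueError(f"illegal transition {current} -> {target}")
    if target == "POSITION_CLOSED" and exchange_position_open:
        # A trade is not closed while the exchange still reports the position.
        raise ValueError("exchange still reports the position; refusing to close")
    return target
```

Rejecting the close while the exchange still reports the position is what keeps a local snapshot from running ahead of BingX.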
## 10. Open / Update / Close Mechanics
### 10.1 Open
1. Engine produces a decision.
2. Actor converts it into an intent.
3. Execution facade normalizes the request into a BLUE-compatible action record.
4. Execution client submits the request to BingX.
5. BingX acknowledges or rejects.
6. BingX position becomes authoritative once open.
7. Event normalization updates `position_state` and account projections.
### 10.2 Update
1. Execution client polls `openOrders`.
2. Execution facade records the requested action.
3. Execution client polls `user/positions`.
4. Execution client refreshes account state.
5. Journal snapshot is persisted.
6. ClickHouse rows are appended.
7. Hazelcast state is refreshed.
8. TUI renders the derived result.
### 10.3 Close
1. Engine or exit manager requests exit.
2. Execution facade normalizes the exit into the same lifecycle that BLUE would represent.
3. Exit order is submitted reduce-only.
4. BingX confirms fill or terminal state.
5. Exchange position disappears.
6. Event normalization emits the terminal close fact.
7. `trade_events` close row is written.
8. `position_state` is updated to closed.
## 11. Reconciliation Model
In this spec, "reconciliation" is not a first-class domain layer.
It is the thin adapter-side return path that converts BingX facts into canonical events and projections.
The simplified model is:
```text
engine intent
-> exchange submission
-> exchange state
-> event normalization
-> durable ledger
```
Not:
```text
engine intent
-> local inferred close
-> maybe exchange close later
```
The second pattern is what creates ghost closes and confusing TUI state.
The return path must remain thin and mostly transparent:
- confirm what BingX actually did
- translate exchange reality into canonical engine state and durable ledger rows
- backfill only the minimum terminal bookkeeping needed to keep the audit trail complete
It must not:
- make trading decisions
- invent or reinterpret strategy state
- act as a second policy layer
- override engine intent except where required to reflect BingX authority
In other words:
```text
policy lives in the engine
translation lives in the execution boundary
truth lives on BingX
```
If the return path starts shaping strategy behavior, the architecture has drifted.
## 12. ClickHouse Accounting Contract
### 12.1 `account_events`
This table must represent the latest authoritative snapshot of:
- capital
- open positions
- open notional
- leverage
- fills metadata
It is not the source of truth for execution. It is the projection of confirmed execution facts and supports capital-path replay; `status_snapshots` remains the preferred capital-path source (see 12.4).
### 12.2 `position_state`
This table must represent per-trade lifecycle state.
Required lifecycle states:
- `OPEN`
- `EXIT_REQUESTED`
- `EXIT_ACKED`
- `CLOSED`
- `RECONCILED`
This table is the canonical lifecycle projection, not a second engine.
### 12.3 `trade_events`
This table must represent terminal closed trades only.
Rules:
- one terminal row per completed trade
- dedupe by `trade_id`
- never infer a close row from a fill snapshot alone
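The dedupe rule can be illustrated with a small reader-side filter. This is a sketch under assumptions: the `pnl` field and the sample rows are hypothetical, and only `trade_id` is taken from the contract above.

```python
from typing import Dict, Iterable, List


def dedupe_terminal_rows(rows: Iterable[Dict]) -> List[Dict]:
    """Keep the first terminal row seen for each trade_id, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["trade_id"] in seen:
            continue
        seen.add(row["trade_id"])
        unique.append(row)
    return unique


# Hypothetical sample: the close for t1 was inserted twice.
rows = [
    {"trade_id": "t1", "pnl": 5.0},
    {"trade_id": "t1", "pnl": 5.0},
    {"trade_id": "t2", "pnl": -2.0},
]
naive_pnl = sum(r["pnl"] for r in rows)
deduped_pnl = sum(r["pnl"] for r in dedupe_terminal_rows(rows))
```

Summing the raw rows counts the `t1` close twice, so any replay that reads `trade_events` should apply a filter of this kind first.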
### 12.4 `status_snapshots`
When capital replay is needed, `status_snapshots` remains the preferred capital-path source because it captures:
- capital
- posture
- `trades_executed`
- `rm`
- `vol_ok`
- related snapshot state
`trade_events` alone is not enough for capital replay.
## 13. PINK and BLUE Comparison Rules
PINK must remain structurally comparable to BLUE.
That means:
- same trade identity model
- same key fields for open/close events
- same exit reason vocabulary
- same capital accounting semantics
- same bar and hold semantics
Namespace differences are allowed.
Semantic differences are not.
The DITA split must stay semantically compatible with BLUE:
- decision semantics preserved
- intent selection preserved
- trade lifecycle compatible
- account projection comparable
- return-channel normalization exchange-specific only
## 14. Simplification Rules
To reduce bugs, do the following:
### 14.1 Keep one authoritative open-slot view
Do not maintain competing local definitions of "open trade".
### 14.2 Stop inventing closed trades in the TUI
The TUI may display:
- open positions
- terminal trades
- fills
It must not convert fills into fake closes.
### 14.3 Remove recovery ambiguity
At startup:
- BingX positions are imported
- stale local slots are cleared
- journal state is restored only when it does not contradict BingX
- account projection is rebuilt from confirmed exchange facts, not from intent history
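The startup rules can be sketched as a single merge in which BingX decides which slots exist. This is a hedged illustration: `recover_slots` and the `side`, `qty`, and `trade_id` fields are hypothetical stand-ins for whatever the journal and the exchange snapshot actually carry.

```python
from typing import Dict


def recover_slots(
    exchange_positions: Dict[str, Dict],
    journal_slots: Dict[str, Dict],
) -> Dict[str, Dict]:
    """Rebuild local slots at startup with the exchange as the authority."""
    recovered = {}
    for symbol, position in exchange_positions.items():
        slot = dict(position)  # imported BingX position
        journal = journal_slots.get(symbol)
        # Journal state is restored only when it does not contradict BingX.
        if (
            journal is not None
            and journal.get("side") == position.get("side")
            and journal.get("qty") == position.get("qty")
        ):
            slot["trade_id"] = journal.get("trade_id")
        recovered[symbol] = slot
    # Symbols present only in the journal are stale local slots and are dropped.
    return recovered
```

A slot that exists only in the journal never reaches the result, which is the "stale local slots are cleared" rule in code form.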
### 14.4 Keep the event trail append-only
If a state needs correction, emit a new event.
Do not rewrite history.
## 15. ASCII Failure Modes
### 15.1 Ghost close
```text
EXIT_REQUESTED
|
v
EXIT_SENT
|
+--> local snapshot says CLOSED
|
+--> BingX still shows position OPEN
|
v
BUG: local UI looks flat, exchange is not flat
```
### 15.2 Missing terminal row
```text
EXIT_ACKNOWLEDGED
|
v
POSITION_CLOSED on BingX
|
v
trade_events row missing
|
v
BUG: replay/debug cannot prove the close
```
### 15.3 Duplicate ledger row
```text
trade_events insert
|
+--> duplicate insert for same trade_id
|
v
BUG: replay capital is overstated unless deduped
```
## 16. Acceptance Criteria
The simplification is acceptable only if all of the following hold:
1. BLUE algorithm behavior is preserved exactly.
2. PINK trades can be compared to BLUE trades using the same structures.
3. Every order action is visible in the trail.
4. Every close can be traced to BingX terminal state.
5. TUI never invents a close.
6. Capital replay can be reconstructed from `status_snapshots` plus deduped trade rows.
7. BingX remains the authoritative open-position source.
## 17. Implementation Boundaries
The following are the expected boundaries for any implementation work:
- Launcher layer
- namespace wiring only
- Actor layer
- engine-slot projection and adapter ingress
- Execution facade layer
- BLUE-compatible action normalization
- order lifecycle event emission
- BingX execution layer
- order submit / poll / reconcile / snapshot
- Journal layer
- durable bridge into ClickHouse
- Observability layer
- derived display only
The return path should be treated as a translation boundary, not a policy boundary.
Its ideal steady state is nearly invisible.
Any new BingX-specific behavior should go in the execution or adapter-ingress path, not in the engine decision logic.
## 18. Recommended Simplified Architecture
```text
[decision]
|
v
[intent]
|
v
[trade FSM]
|
v
[execution adapter]
|
v
[BingX order/position]
|
v
[event normalization]
|
v
[ClickHouse account + trade ledger]
|
v
[TUI / replay]
```
This is the simplest version that still preserves BLUE faithfulness and auditability.
## 19. Open Questions
These are implementation questions, not design blockers:
- Should PINK `trade_events` remain fully separate from BLUE-compatible schema, or only namespace-tagged?
- Should the TUI use `account_events` or `position_state` as the primary open-trade panel source?
- Should `position_state` become the canonical lifecycle table for all live strategies, or only PINK first?
- Should any exchange callback normalization be shared with BLUE, or remain PINK-only until parity is proven?
## 20. Final Decision
The target simplification is:
- one engine
- one exchange authority
- one append-only audit trail
- one derived TUI
- one replay path
Anything that introduces a second truth source should be removed or demoted.
Reconciliation, if the term is retained at all, should mean only the thin adapter-side normalization of BingX facts into canonical events and account projection. It should not exist as a policy layer.

@@ -1,79 +0,0 @@
# PINK-on-DITAv2 Fault Taxonomy & Operator Response
## Fault Classes
### RATE_LIMITED
**Kernel code**: `KernelDiagnosticCode.RATE_LIMITED`
**Severity**: WARNING
**Recovery**: Automatic — kernel retries on next step cycle.
Operator action: none required unless persistent (>10 min). If persistent, check BingX API limits at `/openApi/swap/v2/user/balance` directly. Reduce poll frequency via `DOLPHIN_PINK_POLL_INTERVAL_SEC` (default 1.0s).
### ORDER_REJECTED
**Kernel code**: `KernelDiagnosticCode.ORDER_REJECTED`
**Entry reject**: Slot returns to IDLE. Decision engine will re-evaluate on next cycle.
**Exit reject**: Slot stays in EXIT_WORKING. Decision engine will retry exit.
Operator action: check that instrument is tradeable on BingX VST. Symbol precision changes or contract suspensions can cause rejects. Inspect `outcome.details` for venue reason text.
### EXIT_ORDER_REJECTED
**Kernel code**: `KernelDiagnosticCode.EXIT_ORDER_REJECTED`
**Slot state**: EXIT_WORKING. The kernel will retry via `process_intent(EXIT)` on the next step where the decision engine produces an exit signal.
Operator action: if position remains open past `DOLPHIN_MAX_HOLD_BARS` (default 250), manually flatten via `pink_ctl.py` or direct BingX REST.
### CANCEL_REJECTED
**Kernel code**: `KernelDiagnosticCode.CANCEL_REJECTED`
**Slot state**: Unchanged. Cancel is retried on the next cycle.
Operator action: check open orders on BingX. If the order filled between cancel attempt and rejection, the slot will converge on the next reconcile cycle.
### NO_ACTIVE_EXIT_ORDER
**Kernel code**: `KernelDiagnosticCode.NO_ACTIVE_EXIT_ORDER`
**Cause**: Exit intent processed but no working exit order exists (usually because it filled between decision and execution).
Operator action: none — the fill event will converge the slot to CLOSED on the next `on_venue_event` or reconcile.
### STALE_STATE_RECONCILING
**Kernel code**: `KernelDiagnosticCode.STALE_STATE_RECONCILING`
**Slot state**: STALE_STATE_RECONCILING. Normal event progression is blocked until reconciliation completes.
Operator action: if the slot stays in this state for >30s, the exchange snapshot may be inconsistent. Run `pink_ctl.py restart` to force a full restart reconcile.
### DUPLICATE_EVENT
**Kernel code**: `KernelDiagnosticCode.DUPLICATE_EVENT`
**Severity**: INFO
**Effect**: Event is dropped. No capital or state change. Idempotency via `seen_event_ids` on the slot.
Operator action: none.
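A minimal sketch of the idempotency described here, assuming a slot that tracks `seen_event_ids` as a set. The `Slot` dataclass, `apply_fill`, and the `realized_pnl` field are hypothetical and only illustrate the drop-on-repeat behavior.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Slot:
    realized_pnl: float = 0.0
    seen_event_ids: Set[str] = field(default_factory=set)


def apply_fill(slot: Slot, event_id: str, pnl_delta: float) -> str:
    """Apply a fill once; a repeated event_id is dropped with no state change."""
    if event_id in slot.seen_event_ids:
        return "DUPLICATE_EVENT"
    slot.seen_event_ids.add(event_id)
    slot.realized_pnl += pnl_delta
    return "APPLIED"
```

Replaying the same venue event after a restart or a reconcile then leaves capital untouched, which is why no operator action is needed.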
### RATE_LIMITED (persistent cycle)
**Detection**: Consecutive RATE_LIMITED outcomes with no successful exchange interaction.
**Anomaly row origin**: `ditav2_kernel`
Operator action: check exchange API status. If the rate limit window is known, set `DITA_V2_RATE_LIMIT_COOLDOWN_SEC` in env.
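The detection rule can be sketched as a streak counter. This is an illustration, not the kernel's detector: `RateLimitStreak` and its `threshold` are hypothetical, and it assumes that any outcome other than `RATE_LIMITED` counts as a successful exchange interaction.

```python
from typing import Optional


class RateLimitStreak:
    """Count consecutive RATE_LIMITED outcomes and flag a persistent cycle."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be >= 1")
        self.threshold = threshold
        self.count = 0

    def observe(self, diagnostic_code: Optional[str]) -> bool:
        """Record one outcome; return True once the streak reaches the threshold."""
        if diagnostic_code == "RATE_LIMITED":
            self.count += 1
        else:
            # Assumption: any other outcome resets the streak.
            self.count = 0
        return self.count >= self.threshold
```

A caller could write the anomaly row on the first `True` and keep stepping; the counter itself never blocks the kernel.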
## Diagnostic Surface
All fault codes appear in:
- `KernelOutcome.diagnostic_code` (programmatic)
- `KernelOutcome.severity` (INFO/WARNING/ERROR/CRITICAL)
- `KernelOutcome.details` (structured payload with reason, retry_after_ms, etc.)
## Log Paths
- Runtime: `/tmp/dolphin_logs/supervisor/dolphin_live_pink.log`
- Kernel: `/tmp/dolphin_logs/supervisor/dolphin_live_pink-error.log`
## Recovery Tools
```bash
# Check DITAv2 health
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py ditav2-status
# Full restart reconcile
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py restart
# Namespace isolation check
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py mode-verify
```

@@ -1,470 +0,0 @@
# PINK -> DITAv2 Refactor Guide (File-by-File, Implementation-Ready)
## MANDATORY READ ORDER (Before Any Code Change)
Read these documents in this exact order before touching code:
1. `/mnt/dolphinng5_predict/prod/docs/SYSTEM_BIBLE_v7.md` (PINK/DITA addendum scope only; do not broaden scope into BLUE changes)
2. `/mnt/dolphinng5_predict/prod/docs/PINK_BINGX_SIMPLIFICATION_SPEC_2026-05-22.md`
3. `/mnt/dolphinng5_predict/prod/docs/DITA_V2_KERNEL_REFERENCE.md`
4. `/mnt/dolphinng5_predict/prod/docs/DITA_V2_OPERATOR_PLAYBOOK.md`
5. `/mnt/dolphinng5_predict/prod/docs/CLEAN_ARCH_DITA_REFERENCE_PROD_IMPLEMENTATION_SPEC.md`
Do not begin implementation until these are read and the PINK-only boundary is explicit.
## 0) Scope and Goal
This guide is for refactoring **PINK only** to execute trades through **DITAv2 exclusively** (where DITAv2 facilities exist), while preserving:
1. the shared BLUE/PINK signal and trading algorithm semantics,
2. existing PINK observability contracts (Hazelcast, ClickHouse, TUI),
3. strict non-impact on BLUE.
The target is a PINK runtime that is testnet-stable on BingX, with deterministic execution/accounting and explicit handling of known failure classes (hung orders, non-closes, duplicate events, stale/restart drift, rate limits).
---
## 1) Hard Invariants (Must Hold Throughout)
1. **BLUE untouched**:
- No behavior changes in BLUE runtime paths.
- No BLUE namespace changes (`DOLPHIN_STATE_BLUE`, `DOLPHIN_PNL_BLUE`, `dolphin` DB surfaces).
2. **Execution boundary**:
- PINK execution calls must go through DITAv2 kernel + venue adapter.
- No direct PINK exchange-submit path outside DITAv2 where DITAv2 has equivalent functionality.
3. **Algo parity**:
- Entry/exit decision semantics remain shared with BLUE policy logic.
- DITAv2 is execution/risk-state substrate, not strategy rewrite.
4. **Exchange-led truth**:
- Reconcile from exchange snapshots; local state follows exchange, not vice versa.
5. **Accounting determinism**:
- No double-application of realized PnL.
- Multi-leg closes apply capital deltas exactly once per economic leg.
---
## 2) Pre-Refactor Safety Baseline
## 2.1 Files to snapshot before edits
- `/mnt/dolphinng5_predict/prod/launch_dolphin_pink.py`
- `/mnt/dolphinng5_predict/prod/clean_arch/runtime/pink_direct.py`
- `/mnt/dolphinng5_predict/prod/ops/pink_ctl.py`
- `/mnt/dolphinng5_predict/prod/configs/pink.yml`
- `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf`
## 2.2 Baseline behavior capture (mandatory)
Capture and store:
1. PINK entry -> partial exit -> final exit behavior.
2. PINK state transitions for cancel/reject/reconcile.
3. ClickHouse deltas:
- `dolphin_pink.trade_events`
- `dolphin_pink.position_state`
- `dolphin_pink.account_events`
- `dolphin_pink.v7_decision_events`
4. Hazelcast deltas:
- `DOLPHIN_STATE_PINK`
- `DOLPHIN_PNL_PINK`
5. TUI fields used by `dolphin_status_pink.py`.
This is the parity baseline used to prove "algo unchanged, execution substrate changed."
---
## 3) File-by-File Refactor Plan
## 3.1 Runtime entrypoint and boundary
### File: `/mnt/dolphinng5_predict/prod/launch_dolphin_pink.py`
### Objective
Convert launcher wiring so PINK execution is DITAv2-native by default.
### Required edits
1. Keep namespace hardening for PINK:
- `strategy_name=pink`
- `DOLPHIN_STATE_PINK`, `DOLPHIN_PNL_PINK`
- `journal_strategy=pink`, `journal_db=dolphin_pink`
2. Replace/retire legacy DITA execution object graph for trade execution:
- stop using legacy `prod.clean_arch.dita.*` execution path as primary.
- construct DITAv2 bundle (`prod.clean_arch.dita_v2.launcher`).
3. Explicit DITAv2 env defaults for PINK launcher:
- `DITA_V2_VENUE=BINGX`
- `DITA_V2_ZINC=REAL`
- `DITA_V2_CONTROL_PLANE=REAL_ZINC`
- `DITA_V2_HAZELCAST=REAL`
- `DITA_V2_LAUNCHER_MODE=serve`
4. Keep BingX env safety:
- `DOLPHIN_BINGX_ENV=VST`
- `DOLPHIN_BINGX_ALLOW_MAINNET=0`
5. Continue loading `BINGX_API_KEY`/`BINGX_SECRET_KEY` from `.env` contract.
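One way to express items 3 and 4 is sketched below. The variable names and values are the ones listed above; the `apply_pink_env` helper is hypothetical, and the choice to refuse startup when a safety value is overridden (rather than silently resetting it) is an assumption, not the launcher's confirmed behavior.

```python
from typing import MutableMapping

# DITAv2 defaults from item 3: applied only when the variable is unset.
DITA_V2_DEFAULTS = {
    "DITA_V2_VENUE": "BINGX",
    "DITA_V2_ZINC": "REAL",
    "DITA_V2_CONTROL_PLANE": "REAL_ZINC",
    "DITA_V2_HAZELCAST": "REAL",
    "DITA_V2_LAUNCHER_MODE": "serve",
}

# BingX safety values from item 4: any other value aborts startup.
BINGX_SAFETY = {
    "DOLPHIN_BINGX_ENV": "VST",
    "DOLPHIN_BINGX_ALLOW_MAINNET": "0",
}


def apply_pink_env(env: MutableMapping[str, str]) -> None:
    """Fill in DITAv2 defaults and enforce the testnet-only safety values."""
    for key, value in DITA_V2_DEFAULTS.items():
        env.setdefault(key, value)
    for key, required in BINGX_SAFETY.items():
        # setdefault returns the existing value when the key is already set.
        if env.setdefault(key, required) != required:
            raise RuntimeError(f"{key} must be {required!r} for PINK")
```

In a launcher this would be called as `apply_pink_env(os.environ)` before the DITAv2 bundle is constructed; passing a plain `dict` keeps it testable.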
### Acceptance checks
1. PINK launcher starts and uses DITAv2 bundle path.
2. No BLUE state map/DB writes from this path.
3. PINK still exposes expected runtime metadata in HZ/CH.
---
### File: `/mnt/dolphinng5_predict/prod/clean_arch/runtime/pink_direct.py`
### Objective
Replace legacy execution orchestration with DITAv2 intent/event orchestration while preserving decision semantics.
### Required edits
1. Introduce a dedicated translation seam:
- Decision output -> `KernelIntent` mapping (`ENTER`, `EXIT`, `MARK_PRICE`, `CANCEL`, `RECONCILE`).
2. Route execution through:
- `ExecutionKernel.process_intent(...)`
- `ExecutionKernel.on_venue_event(...)` for reconcile/event ingestion.
3. Keep policy/decision logic unchanged:
- do not rewrite velocity/IRP/threshold policy semantics.
4. On every execution phase:
- reconcile from exchange (through DITAv2 BingX venue path),
- project state from DITAv2 slot/account snapshot,
- emit persistence payloads from DITAv2 outcomes/events.
5. Handle diagnostics explicitly:
- `RATE_LIMITED`, `ORDER_REJECTED`, `EXIT_ORDER_REJECTED`, `CANCEL_REJECTED`, `NO_ACTIVE_EXIT_ORDER`, stale/reconcile signals.
6. Enforce idempotence:
- repeated venue `event_id` must not re-apply economic effects.
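The translation seam in item 1 can be sketched as an explicit lookup table. The five intent kinds come from the list above; the `IntentKind` enum, the decision action labels, and `to_intent_kind` are hypothetical, since the real `KernelIntent` type and decision output format are not shown in this guide.

```python
from enum import Enum


class IntentKind(Enum):
    ENTER = "ENTER"
    EXIT = "EXIT"
    MARK_PRICE = "MARK_PRICE"
    CANCEL = "CANCEL"
    RECONCILE = "RECONCILE"


# Hypothetical decision action labels mapped to kernel intent kinds.
DECISION_TO_INTENT = {
    "open": IntentKind.ENTER,
    "close": IntentKind.EXIT,
    "hold": IntentKind.MARK_PRICE,
    "cancel": IntentKind.CANCEL,
    "resync": IntentKind.RECONCILE,
}


def to_intent_kind(action: str) -> IntentKind:
    """Translate a decision action into an intent kind, failing loudly if unmapped."""
    try:
        return DECISION_TO_INTENT[action]
    except KeyError:
        raise ValueError(f"unmapped decision action: {action!r}") from None
```

Keeping the mapping as data means the seam holds no policy: an unknown action fails rather than being guessed at.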
### Acceptance checks
1. Slot/FSM states are deterministic for nominal and rejection paths.
2. No hung local state when exchange is flat.
3. PINK accounting rows remain schema-compatible and single-application.
---
### File: `/mnt/dolphinng5_predict/prod/clean_arch/adapters/bingx_direct.py`
### Objective
Keep exchange edge behavior normalized for DITAv2 and resilient to rate limits.
### Required edits
1. Preserve/extend mapping of BingX throttle responses to `RATE_LIMITED`.
2. Ensure refresh/reconcile endpoints degrade safely (empty snapshot) under transient throttles rather than crashing runtime.
3. Preserve `reduceOnly` semantics for exits and close-out operations.
4. Ensure all normalization fields required by DITAv2 are present:
- `orderId`, `clientOrderId`, `status`, reason/message, retry hints.
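Item 2 can be sketched as a wrapper that turns a throttle into an empty, explicitly non-authoritative snapshot. Everything here is hypothetical (`RateLimited`, `fetch_positions_safely`, the `authoritative` flag); the flag is an assumption added so that a throttled refresh is not mistaken for a flat account.

```python
from typing import Callable, Dict, List, Optional


class RateLimited(Exception):
    """Hypothetical exception raised when the venue throttles a request."""

    def __init__(self, retry_after_ms: Optional[int] = None) -> None:
        super().__init__("rate limited")
        self.retry_after_ms = retry_after_ms


def fetch_positions_safely(fetch: Callable[[], List[Dict]]) -> Dict:
    """Return a snapshot dict; degrade to an empty one instead of raising."""
    try:
        return {
            "positions": fetch(),
            "diagnostic_code": None,
            "authoritative": True,
        }
    except RateLimited as exc:
        return {
            "positions": [],
            "diagnostic_code": "RATE_LIMITED",
            "retry_after_ms": exc.retry_after_ms,
            # Empty because of throttling, not because the account is flat.
            "authoritative": False,
        }
```

Callers that reconcile slot state should skip snapshots where `authoritative` is false, otherwise a throttle could look like a close.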
### Acceptance checks
1. Adapter never causes runtime crash on nominal exchange throttling.
2. DITAv2 receives normalized status it can classify deterministically.
---
### File: `/mnt/dolphinng5_predict/prod/clean_arch/dita_v2/bingx_venue.py`
### Objective
Guarantee PINK gets first-class DITAv2 venue events for all exchange reactions.
### Required edits
1. Keep/extend mapping for:
- ACK/FILL/PARTIAL_FILL
- REJECT/CANCEL_REJECT
- RATE_LIMITED
2. Ensure `metadata` carries actionable downstream fields:
- retryability, `retry_after_ms` if present, reason, venue status text.
3. Ensure `reconcile()` emits consistent event stream usable for restart recovery.
### Acceptance checks
1. No "unknown event kind" on observed BingX payloads.
2. Reconcile events are sufficient to converge slot state after restart.
---
## 3.2 PINK persistence and observability compatibility
### File: `/mnt/dolphinng5_predict/prod/clean_arch/persistence/pink_clickhouse.py`
### Objective
Keep PINK tables contract-compatible while sourcing execution truth from DITAv2.
### Required edits
1. Ensure row builders consume DITAv2 outcome/event metadata where needed.
2. Preserve existing table contracts:
- `policy_events`
- `v7_decision_events`
- `trade_events`
- `position_state`
- `account_events`
- `anomaly_events`
3. Add explicit anomaly rows for:
- rate-limited retry cycles breaching threshold,
- hung-order timeout escalations,
- reconcile divergence resolution events.
### Acceptance checks
1. No schema drift breaking existing PINK dashboards/TUI.
2. Capital/event rows reconcile to exchange-led lifecycle.
---
### File: `/mnt/dolphinng5_predict/prod/clickhouse/pink/*.sql`
### Objective
Ensure schema supports DITAv2 diagnostic characterization without breaking old readers.
### Required edits
1. Add columns only if required and backward-compatible:
- diagnostic code,
- severity,
- retryability/retry hints,
- reconcile markers.
2. Do not remove or repurpose existing columns read by current tooling.
### Acceptance checks
1. Existing readers still run.
2. New DITAv2 fault/diagnostic fields are queryable.
---
### File: `/mnt/dolphinng5_predict/prod/ops/pink_ctl.py`
### Objective
Make PINK operator tooling DITAv2-aware.
### Required edits
1. Keep PINK namespace isolation checks as-is.
2. Add DITAv2-specific health assertions:
- kernel mode/verbosity/backend mode from control plane,
- DITAv2 process health in supervisor.
3. Add a command (or output block) for live smoke execution status.
### Acceptance checks
1. `status`, `healthcheck`, `mode-verify` remain PINK-only.
2. Tool can detect DITAv2 miswiring immediately.
---
### File: `/mnt/dolphinng5_predict/prod/supervisor/dolphin-supervisord.conf`
### Objective
Ensure PINK runs supervised with DITAv2-backed runtime, BLUE unaffected.
### Required edits
1. Keep BLUE programs unchanged.
2. Ensure `dolphin_pink` program points to refactored PINK launcher path.
3. Keep clear comments that PINK is VST/testnet and isolated.
### Acceptance checks
1. `supervisorctl status` shows BLUE and PINK independently healthy.
2. Stopping/restarting PINK does not impact BLUE services.
---
## 3.3 Test harness and execution quality
### File: `/mnt/dolphinng5_predict/prod/tests/test_pink_bingx_dita_live_e2e.py`
### Objective
Primary live testnet acceptance suite for PINK-on-DITAv2.
### Required edits
1. Ensure it drives DITAv2 path only.
2. Include full operational gamut:
- entry
- mark
- partial exit
- final exit
- cancel/cancel-after-flat
- reconcile/restart-style checks
3. Accept nominal exchange reactions while asserting deterministic kernel finality.
4. Add explicit verification blocks:
- open orders/positions are flat after cleanup,
- no orphan slot state.
### Acceptance checks
1. Suite passes reliably with rate-limit-respectful cadence.
2. No residual exposure after test completion.
---
### File: `/mnt/dolphinng5_predict/prod/tests/test_pink_direct_runtime.py`
### Objective
Kernel integration correctness in non-live conditions.
### Required edits
1. Replace old execution assertions with DITAv2-based assertions:
- intent mapping,
- emitted events,
- diagnostic handling,
- slot transitions.
2. Add tests for duplicate event replay and stale-state reconcile.
### Acceptance checks
1. Runtime behavior deterministic under mock/fuzzed event schedules.
2. No double-booking of capital in partial/full close chains.
---
### File: `/mnt/dolphinng5_predict/prod/tests/test_pink_clickhouse_persistence.py`
### Objective
Prevent accounting/persistence regressions.
### Required edits
1. Validate per-leg and terminal close semantics from DITAv2 outcomes.
2. Validate anomaly/diagnostic row emission for non-nominal conditions.
### Acceptance checks
1. Capital deltas and position-state terminality are consistent.
2. Replay/restart write paths remain coherent.
---
### New test files to add
1. `/mnt/dolphinng5_predict/prod/tests/test_pink_ditav2_kernel_bridge.py`
- Decision->KernelIntent mapping table tests.
2. `/mnt/dolphinng5_predict/prod/tests/test_pink_ditav2_rate_limit_contract.py`
- Retryable warning classification + downstream emission tests.
3. `/mnt/dolphinng5_predict/prod/tests/test_pink_ditav2_restart_reconcile.py`
- crash/restart reconcile convergence tests.
4. `/mnt/dolphinng5_predict/prod/tests/test_pink_ditav2_accounting_invariants.py`
- multi-leg non-double-book proofs.
---
## 3.4 Documentation and runbooks
### Files to update
1. `/mnt/dolphinng5_predict/prod/docs/PINK_BINGX_SIMPLIFICATION_SPEC_2026-05-22.md`
2. `/mnt/dolphinng5_predict/prod/docs/DITA_V2_KERNEL_REFERENCE.md`
3. `/mnt/dolphinng5_predict/prod/docs/DITA_V2_OPERATOR_PLAYBOOK.md`
4. `/mnt/dolphinng5_predict/prod/docs/SYSTEM_BIBLE_v7.md` (addendum only)
### Required doc updates
1. Explicit statement: PINK execution boundary is DITAv2.
2. Exact live smoke and healthcheck commands.
3. Fault taxonomy and operator response for rate limit/reject/hung/reconcile paths.
4. BLUE non-impact proof checklist.
---
## 4) Implementation Sequence (Strict Order)
1. Freeze BLUE + baseline capture.
2. Launcher boundary wiring (`launch_dolphin_pink.py`).
3. Runtime bridge (`pink_direct.py`) to DITAv2 intents/events.
4. Persistence projection alignment (`pink_clickhouse.py` + SQL if needed).
5. Operator/control updates (`pink_ctl.py`, supervisor stanza check).
6. Non-live tests (unit/integration/fsm).
7. Mock E2E and chaos/fuzz.
8. Live BingX testnet basic cycles.
9. Live BingX testnet chaos/fuzz.
10. Soak and finalize docs/runbook.
Do not reorder. Live testing before accounting invariants is not allowed.
---
## 5) Mandatory Validation Matrix
## 5.1 Deterministic execution finality
For each action path (ENTER, EXIT partial, EXIT final, CANCEL, RECONCILE), assert:
1. deterministic final slot state,
2. deterministic diagnostic code on failure paths,
3. deterministic account/capital projection effect.
## 5.2 Known failure class coverage
1. Hung order:
- timeout monitor triggers,
- reconcile/cancel cycle emits diagnostics,
- eventual terminality is explicit.
2. Non-close:
- position remains visible in exchange snapshot until actually flat,
- no premature local close state.
3. Duplicate/replayed events:
- no duplicate capital/PnL application.
4. Restart/reconcile drift:
- restart with open exchange position converges to correct slot state.
5. Rate limit:
- classified as retryable warning,
- downstream emitted with code/severity/hints,
- no state corruption.
## 5.3 Namespace isolation
1. No `pink` strategy rows in `dolphin` or `dolphin_prodgreen`.
2. No PINK writes to BLUE HZ maps.
3. PINK stop/start/restart has zero BLUE impact.
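The first isolation check can be sketched as a filter over rows that have already been fetched. The database names come from the rule above; the row shape (`database`, `strategy`) and `find_contamination` are hypothetical, and the query that collects the rows is left out on purpose.

```python
from typing import Dict, Iterable, List

# Databases that must never contain rows for the pink strategy.
FORBIDDEN_DBS = frozenset({"dolphin", "dolphin_prodgreen"})


def find_contamination(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return every row where strategy 'pink' appears in a BLUE or PRODGREEN database."""
    return [
        row
        for row in rows
        if row["strategy"] == "pink" and row["database"] in FORBIDDEN_DBS
    ]
```

An empty result is the pass condition; any returned row identifies the database that was contaminated.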
---
## 6) Cutover and Rollback
## 6.1 Cutover gates
All must be true:
1. Non-live suite green.
2. Mock E2E + chaos/fuzz green.
3. Live testnet basic and chaos/fuzz green.
4. No unresolved hung/non-close cases in soak window.
5. Accounting parity checks pass.
## 6.2 Rollback trigger conditions
Rollback immediately if any:
1. unresolved exposure after cleanup,
2. non-deterministic capital drift,
3. repeated stale/reconcile divergence,
4. contamination of BLUE/PRODGREEN namespaces.
## 6.3 Rollback action
1. Stop PINK only.
2. Revert PINK launcher/runtime to pre-refactor revision.
3. Keep forensic artifacts (CH/HZ rows, logs, diagnostics) for postmortem.
---
## 7) Operational Commands (Post-Refactor)
1. PINK control:
```bash
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py status
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py healthcheck
python /mnt/dolphinng5_predict/prod/ops/pink_ctl.py mode-verify
```
2. DITAv2 live smoke command (rate-limit respectful suite):
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --symbol TRXUSDT
```
3. Dry-run (no orders):
```bash
python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --dry-run --symbol TRXUSDT
```
---
## 8) Definition of Done
1. PINK uses DITAv2 execution facilities exclusively where available.
2. Shared BLUE/PINK strategy semantics are preserved.
3. BLUE is behaviorally unaffected.
4. PINK supports entries, exits, partial exits, TP/SL-driven exits, cancel/reconcile/restart.
5. Accounting is deterministic and restart-safe.
6. Live testnet E2E + chaos/fuzz passes with exchange-side verification.

@@ -1,608 +0,0 @@
# PINK Re-Architecture Specification (Implementation Blueprint)
Status: Approved-for-coding spec (no code in this document)
Date: 2026-05-19
Owner: Runtime/Infra
Target: Add isolated `PINK` testnet execution system with identical trading algorithm behavior to BLUE, while keeping BLUE undisturbed.
---
## 1. Executive Decision
### 1.1 Decision
Build `PINK` as an **isolated sidecar system** with dedicated namespaces and control surfaces, then optionally migrate that sidecar's infra onto Podman+Quadlet+systemd.
### 1.2 Why this decision
- BLUE must remain undisturbed.
- Current codebase hard-routes many `prod*` paths into PRODGREEN sinks; a naive clone collides.
- BingX account journaling currently dominates data volume and must be controlled explicitly.
### 1.3 Non-negotiable invariant
The **trading algorithm logic must remain identical to BLUE** (signal math, thresholds, decision state machine semantics).
---
## 2. Hard Constraints (Must Hold)
1. No behavior change in core trading logic vs BLUE.
2. No write contamination across BLUE/GREEN/PINK CH databases.
3. No write contamination across BLUE/GREEN/PINK Hazelcast maps.
4. BLUE process manager and lifecycle remain unchanged during PINK buildout.
5. PINK must run BingX in VST/testnet mode only until explicit go-live gate.
6. Any infra re-architecture must be introduced to PINK first, never by replacing BLUE in-place.
---
## 3. Current-State Evidence (Reference Anchors)
### 3.1 Supervisord-first doctrine
- `prod/docs/SYSTEM_BIBLE_v7.md` states all dolphin services are supervisord-managed and warns against dual-management races.
- See:
- `/mnt/dolphinng5_predict/prod/docs/SYSTEM_BIBLE_v7.md:11`
- `/mnt/dolphinng5_predict/prod/docs/SYSTEM_BIBLE_v7.md:1339`
- `/mnt/dolphinng5_predict/prod/docs/SYSTEM_BIBLE_v7.md:3377`
### 3.2 Namespace split already in doctrine
- BLUE: `dolphin`, `DOLPHIN_STATE_BLUE`, `DOLPHIN_PNL_BLUE`
- PRODGREEN: `dolphin_prodgreen`, `DOLPHIN_STATE_PRODGREEN`, `DOLPHIN_PNL_PRODGREEN`
- See:
- `/mnt/dolphinng5_predict/prod/docs/SYSTEM_BIBLE_v7.md:11`
- `/mnt/dolphinng5_predict/prod/docs/SYSTEM_BIBLE_v7.md:14`
### 3.3 Hardcoded routing that collides with new strategy names
- BLUE trader hardcoded map keys:
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:1740`
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:1741`
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:1850`
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:2806`
- `DolphinActor` routes `strategy.startswith("prod")` to PRODGREEN sink:
- `/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py:179`
- `/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py:180`
- BingX execution hardcodes PRODGREEN strategy/db:
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:263`
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:532`
- BingX journal maps `prod*` -> `dolphin_prodgreen`:
- `/mnt/dolphinng5_predict/prod/bingx/journal.py:90`
- `/mnt/dolphinng5_predict/prod/bingx/journal.py:91`
### 3.4 Current BingX poll cadence (main source of account-event volume)
- Poll loops:
- open orders loop
- positions loop
- account loop
- See:
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:707`
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:723`
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:732`
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:741`
- Default intervals:
- `/mnt/dolphinng5_predict/prod/bingx/config.py:58`
- `/mnt/dolphinng5_predict/prod/bingx/config.py:59`
- `/mnt/dolphinng5_predict/prod/bingx/config.py:60`
### 3.5 Data volumes measured (14 complete days; 2026-05-05 to 2026-05-18)
- BLUE-like CH outgoing payload estimate: ~4.17 MB/day avg, ~12.01 MB/day p95-day.
- BLUE-like HZ outgoing payload estimate: ~100.03 MB/day avg, ~301.53 MB/day p95-day.
- PRODGREEN-style BingX `account_events` stream estimate: ~7.41 GB/day avg, ~18.57 GB/day p95-day.
---
## 4. Scope
## 4.1 In scope
1. Introduce first-class `PINK` namespace contract across CH/HZ/control-plane.
2. Preserve algorithm semantics exactly.
3. Isolate PINK execution in BingX VST.
4. Add explicit friction/cost characterization outputs.
5. Add infra spec for Podman+Quadlet+systemd deployment of PINK stack.
## 4.2 Out of scope
1. Any change to signal formula/thresholds/risk decision logic.
2. Any BLUE teardown or manager migration in this phase.
3. Any LIVE mainnet enablement for PINK.
---
## 5. Naming and Namespace Contract
## 5.1 Strategy naming
- Strategy name for new instance: `pink` (lowercase).
- Disallowed for this phase: names with `prod` prefix (e.g., `prodpink`) because current routing treats `prod*` specially.
## 5.2 ClickHouse namespace
- New DB: `dolphin_pink`.
- Required tables (minimum):
- `trade_events`
- `trade_reconstruction`
- `trade_exit_legs`
- `v7_decision_events`
- `adaptive_exit_shadow`
- `account_events`
- `status_snapshots`
- Optional parity tables if needed by downstream tooling:
- `sc_threshold_advisor_shadow`
- `sc_bucket_gauge_shadow`
- `inverse_ars_bounce_shadow`
## 5.3 Hazelcast namespace
- Maps:
- `DOLPHIN_STATE_PINK`
- `DOLPHIN_PNL_PINK`
- Control-plane runtime command queue key:
- `pink_runtime_commands`
- Capital mirror key:
- `pink_capital_update_latest`
## 5.4 Trader identity
- Trader ID default:
- `DOLPHIN-PINK-001`
---
## 6. Required File-Level Changes (Coding Agent Worklist)
Important: This section is prescriptive. Implement all items unless explicitly marked optional.
## 6.1 Sink/routing abstraction
### 6.1.1 `prod/ch_writer.py`
Current state exposes only `_writer`, `_writer_green`, `_writer_prodgreen` and corresponding functions.
- Source anchor: `/mnt/dolphinng5_predict/prod/ch_writer.py:302`
Required:
1. Add `_writer_pink = _CHWriter(db="dolphin_pink")`.
2. Add `ch_put_pink(table: str, row: dict) -> None`.
3. Do not modify behavior of existing sink functions.
Acceptance:
- A unit test asserts that writes made via `ch_put_pink` target `dolphin_pink` only.
### 6.1.2 `prod/bingx/journal.py`
Current `_db_for_strategy` routes `prod*` to `dolphin_prodgreen`.
- Anchor: `/mnt/dolphinng5_predict/prod/bingx/journal.py:88`
Required:
1. Replace ad-hoc prefix routing with explicit strategy->db map.
2. Add explicit `pink -> dolphin_pink` mapping.
3. Keep existing `blue`, `green`, `prodgreen` compatibility.
4. Update sink selection to include `ch_put_pink`.
Acceptance:
- For `strategy='pink'`, both journal snapshot writes and lookup reads use only `dolphin_pink`.
### 6.1.3 `prod/bingx/execution.py`
Current code hardcodes:
- `self._journal_strategy = "prodgreen"`
- account-events insert URL database `dolphin_prodgreen`
- Anchors:
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:263`
- `/mnt/dolphinng5_predict/prod/bingx/execution.py:532`
Required:
1. Add config-driven `journal_strategy` and `journal_db` fields.
2. Default for existing prodgreen path remains unchanged.
3. PINK launcher passes `journal_strategy='pink'`, `journal_db='dolphin_pink'`.
4. Remove any remaining hardcoded `dolphin_prodgreen` in account-event path.
Acceptance:
- No writes from PINK execution appear in `dolphin_prodgreen.account_events`.
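
A minimal sketch of the config-driven fields follows, assuming the account-events insert goes through the ClickHouse HTTP interface with `database` and `query` URL parameters. `JournalConfig`, `account_events_insert_url`, and the exact `INSERT` statement are illustrative names and assumptions, not the current contents of `prod/bingx/execution.py`.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class JournalConfig:  # hypothetical name
    # Defaults keep the existing prodgreen path unchanged.
    journal_strategy: str = "prodgreen"
    journal_db: str = "dolphin_prodgreen"


def account_events_insert_url(base_url: str, cfg: JournalConfig) -> str:
    """Build the insert URL from config instead of a hardcoded database name."""
    # Assumption: rows are posted as JSONEachRow; the real statement may differ.
    query = "INSERT INTO account_events FORMAT JSONEachRow"
    params = urlencode({"database": cfg.journal_db, "query": query})
    return f"{base_url.rstrip('/')}/?{params}"


# What the PINK launcher would pass in.
PINK_JOURNAL = JournalConfig(journal_strategy="pink", journal_db="dolphin_pink")
```

With this shape, the database name appears in exactly one place per process, which makes the "no remaining hardcoded `dolphin_prodgreen`" requirement easy to check by grep.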
## 6.2 Actor and launcher namespace configurability
### 6.2.1 `prod/launch_dolphin_live.py`
Current defaults are prodgreen-centric:
- state/pnl maps and strategy name.
- Anchors:
- `/mnt/dolphinng5_predict/prod/launch_dolphin_live.py:78`
- `/mnt/dolphinng5_predict/prod/launch_dolphin_live.py:79`
- `/mnt/dolphinng5_predict/prod/launch_dolphin_live.py:132`
Required:
1. Introduce generic env-driven namespace fields:
- `DOLPHIN_STRATEGY_NAME`
- `DOLPHIN_STATE_MAP`
- `DOLPHIN_PNL_MAP`
- `DOLPHIN_ADAPTIVE_EXIT_DB`
- `DOLPHIN_V7_JOURNAL_DB`
2. Keep prodgreen defaults backward-compatible.
3. Add dedicated PINK launcher module or mode wrapper with PINK defaults.
Acceptance:
- Running PINK launcher without overrides lands in PINK namespaces only.
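
The env contract above might be resolved as in the following sketch. The environment variable names are the ones listed in this section; the `Namespace` dataclass, `load_namespace`, and the choice of `dolphin_pink` for both DB fields are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# PINK defaults used by a dedicated launcher or mode wrapper (values assumed).
PINK_DEFAULTS: Mapping[str, str] = {
    "DOLPHIN_STRATEGY_NAME": "pink",
    "DOLPHIN_STATE_MAP": "DOLPHIN_STATE_PINK",
    "DOLPHIN_PNL_MAP": "DOLPHIN_PNL_PINK",
    "DOLPHIN_ADAPTIVE_EXIT_DB": "dolphin_pink",
    "DOLPHIN_V7_JOURNAL_DB": "dolphin_pink",
}


@dataclass(frozen=True)
class Namespace:  # hypothetical container
    strategy_name: str
    state_map: str
    pnl_map: str
    adaptive_exit_db: str
    v7_journal_db: str


def load_namespace(defaults: Mapping[str, str],
                   env: Optional[Mapping[str, str]] = None) -> Namespace:
    """Resolve each field from the environment, falling back to the given defaults."""
    source = os.environ if env is None else env

    def pick(key: str) -> str:
        return source.get(key, defaults[key])

    return Namespace(
        strategy_name=pick("DOLPHIN_STRATEGY_NAME"),
        state_map=pick("DOLPHIN_STATE_MAP"),
        pnl_map=pick("DOLPHIN_PNL_MAP"),
        adaptive_exit_db=pick("DOLPHIN_ADAPTIVE_EXIT_DB"),
        v7_journal_db=pick("DOLPHIN_V7_JOURNAL_DB"),
    )
```

The existing launcher would call the same function with its current prodgreen defaults, so backward compatibility reduces to keeping that defaults mapping unchanged.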
### 6.2.2 `nautilus_dolphin/nautilus/.../dolphin_actor.py`
Current default + routing:
- `strategy_name='prodgreen'`
- `startswith("prod")` sink logic
- state/pnl defaults map to PRODGREEN
- Anchors:
- `/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py:179`
- `/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py:180`
- `/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py:181`
- `/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py:185`
- `/mnt/dolphinng5_predict/nautilus_dolphin/nautilus_dolphin/nautilus/dolphin_actor.py:189`
Required:
1. Replace prefix-based sink selection with explicit strategy mapping.
2. Add first-class `pink` mapping for CH sink + default shadow db.
3. Keep old strategy names functional.
4. Ensure aliases do not include BLUE keys in PINK mode.
Acceptance:
- Actor in `pink` mode never writes to `DOLPHIN_STATE_PRODGREEN`, `DOLPHIN_PNL_PRODGREEN`, or `dolphin_prodgreen`.
## 6.3 Control-plane keys and capital surfaces
### 6.3.1 `prod/nautilus_event_trader.py` (if PINK reuses this path)
Current BLUE hardcoding includes:
- `DOLPHIN_STATE_BLUE`, `DOLPHIN_PNL_BLUE`, `blue_runtime_commands`
- Anchors:
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:1740`
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:1741`
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:1850`
- `/mnt/dolphinng5_predict/prod/nautilus_event_trader.py:2806`
Required (only if this file is used for PINK runtime):
1. Parameterize map names and runtime queue key.
2. Preserve BLUE defaults exactly.
3. Add PINK equivalents via env/config.
Acceptance:
- `SET_CAPITAL` / `CAPITAL_UPDATE` for PINK only affects PINK state surfaces.
Note: Preferred approach is to keep BLUE runtime on this file untouched and run PINK through launcher/actor path first.
## 6.4 Ops scripts and tooling
### 6.4.1 `prod/ops/prodgreen_ctl.py`
Current script is hardcoded to PRODGREEN namespaces.
- Anchors:
- `/mnt/dolphinng5_predict/prod/ops/prodgreen_ctl.py:23`
- `/mnt/dolphinng5_predict/prod/ops/prodgreen_ctl.py:24`
- `/mnt/dolphinng5_predict/prod/ops/prodgreen_ctl.py:42`
Required:
1. Create `pink_ctl.py` OR generalize into namespace-aware ctl tool.
2. Required commands: status, healthcheck, start, stop, restart, mode-verify.
3. Must not invoke BLUE program names by default.
Acceptance:
- `pink_ctl status` reports PINK CH/HZ surfaces only.
---
## 7. ClickHouse Schema Plan for `dolphin_pink`
## 7.1 Strategy
Clone `prodgreen` schema set as baseline for PINK to preserve execution-profile columns.
Reference DDLs:
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/00_create_database.sql`
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/account_events.sql`
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/status_snapshots.sql`
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/trade_events.sql`
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/v7_decision_events.sql`
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/adaptive_exit_shadow.sql`
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/02_create_trade_reconstruction.sql`
- `/mnt/dolphinng5_predict/prod/clickhouse/prodgreen/03_create_trade_exit_legs.sql`
## 7.2 Required migration artifacts
Create new folder:
- `prod/clickhouse/pink/`
Include:
1. `00_create_database.sql` -> `CREATE DATABASE IF NOT EXISTS dolphin_pink;`
2. Full table DDL scripts mirroring prodgreen table structures.
3. Apply script with idempotent checks.
## 7.3 Guardrails
1. No schema mutation to existing `dolphin` or `dolphin_prodgreen` in this phase.
2. No historical retagging/movement required for initial PINK bring-up.
---
## 8. Hazelcast Map and Key Contract
## 8.1 Required map names
- `DOLPHIN_STATE_PINK`
- `DOLPHIN_PNL_PINK`
## 8.2 Required keys in `DOLPHIN_STATE_PINK`
- `engine_snapshot`
- `capital_checkpoint`
- `latest_nautilus`
- optional replay/control keys mirrored from the BLUE contract, if the PINK runtime supports the same capital workflows
## 8.3 Control-plane keys
- Runtime command queue: `pink_runtime_commands`
- Latest capital update mirror: `pink_capital_update_latest`
## 8.4 Isolation validation rule
A PINK process must never read or write keys under `DOLPHIN_STATE_BLUE` or `DOLPHIN_PNL_BLUE`. The only exception is explicitly allowed read-only analytics queries.
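
The isolation rule can be expressed as a small guard called before any Hazelcast map access. This is a sketch under assumptions: `check_map_access`, `NamespaceViolation`, and the `analytics_read_allowed` flag are hypothetical names, and how the guard is wired into the real client code is not specified here.

```python
BLUE_MAPS = frozenset({"DOLPHIN_STATE_BLUE", "DOLPHIN_PNL_BLUE"})


class NamespaceViolation(RuntimeError):
    """Raised when a PINK process touches a BLUE map outside the allowed exception."""


def check_map_access(map_name: str, *, write: bool,
                     analytics_read_allowed: bool = False) -> None:
    """Enforce the PINK isolation rule for a single map access."""
    if map_name not in BLUE_MAPS:
        return  # the rule only restricts BLUE maps
    if not write and analytics_read_allowed:
        return  # explicitly allowed read-only analytics query
    kind = "write" if write else "read"
    raise NamespaceViolation(f"PINK process attempted {kind} on {map_name}")
```

Writes to BLUE maps are rejected unconditionally; reads pass only when the caller opts in explicitly, which keeps the exception visible at each call site.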
---
## 9. BingX VST Behavior Contract
## 9.1 Environment
- `DOLPHIN_BINGX_ENV=VST`
- `DOLPHIN_BINGX_ALLOW_MAINNET=0`
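
A startup check for these two flags could be as small as the sketch below. The function name `assert_vst_only` is hypothetical, and treating a missing variable as unsafe (fail-closed) is an assumption of this sketch, not a documented behavior of the launcher.

```python
from typing import Mapping


def assert_vst_only(env: Mapping[str, str]) -> None:
    """Refuse to start unless the process is pinned to BingX VST with mainnet disabled."""
    bingx_env = env.get("DOLPHIN_BINGX_ENV", "")
    allow_mainnet = env.get("DOLPHIN_BINGX_ALLOW_MAINNET", "")
    if bingx_env != "VST":
        raise RuntimeError(f"DOLPHIN_BINGX_ENV must be 'VST', got {bingx_env!r}")
    if allow_mainnet != "0":
        # Fail closed: an unset flag is treated the same as an unsafe one.
        raise RuntimeError(
            f"DOLPHIN_BINGX_ALLOW_MAINNET must be '0', got {allow_mainnet!r}"
        )
```

Calling `assert_vst_only(os.environ)` as the first step of the PINK launcher would make a misconfigured environment fail before any venue connection is opened.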
## 9.2 Expected data venue / exec venue
- Initial recommended mode:
- data venue: BINANCE (same sensing stream as BLUE)
- exec venue: BINGX VST
## 9.3 Leverage/sizing mode
- Use existing sizing-mode mechanisms.
- No strategy-logic change permitted.
---
## 10. Data Resource Budget and Controls
## 10.1 Baseline estimates (from measured data)
### CH + HZ for BLUE-like write path
- CH: ~4.17 MB/day avg, ~12.01 MB/day p95-day
- HZ: ~100.03 MB/day avg, ~301.53 MB/day p95-day
### BingX journal risk stream
- `account_events`: ~7.41 GB/day avg, ~18.57 GB/day p95-day if current high-rate snapshots remain.
## 10.2 Mandatory control for `account_events`
Implement at least one, preferably multiple:
1. Snapshot delta suppression beyond fingerprint-only (field-level sampling and minimum emission interval).
2. `ACCOUNT_REFRESH` write interval floor (e.g., min 2s, then tune).
3. Optionally route high-granularity snapshots to a separate debug table; production `account_events` should be rate-limited.
4. Configurable hard cap alert on rows/minute.
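
Controls 1 and 2 can be combined in one gate, sketched here under assumptions: `MinIntervalGate` is a hypothetical name, the snapshot fingerprint is taken as an opaque string computed elsewhere, and the 2-second floor is the example value from item 2 rather than a tuned setting.

```python
import time
from typing import Callable, Optional


class MinIntervalGate:
    """Suppress unchanged snapshots and enforce a minimum interval between writes."""

    def __init__(self, min_interval_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_emit: Optional[float] = None
        self._last_fingerprint: Optional[str] = None

    def should_emit(self, fingerprint: str) -> bool:
        """Return True if this snapshot should be written to `account_events`."""
        if fingerprint == self._last_fingerprint:
            return False  # nothing changed since the last written snapshot
        now = self._clock()
        if self._last_emit is not None and now - self._last_emit < self._min_interval_s:
            return False  # changed, but still inside the interval floor
        self._last_emit = now
        self._last_fingerprint = fingerprint
        return True
```

A change that arrives inside the floor is not lost: the stored fingerprint is only updated on emission, so the next call after the floor expires still sees a difference and writes the row.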
## 10.3 Acceptance thresholds
1. PINK `account_events` sustained rate must stay below agreed cap (set initial policy: <= 5 rows/sec average over 15 min unless debug mode explicitly enabled).
2. Alert if the rate exceeds the cap for more than 3 consecutive windows.
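
For reference, 5 rows/sec over a 15-minute window is 5 × 900 = 4,500 rows per window, or 432,000 rows/day if sustained. The alert rule can be sketched as follows; `cap_alert` is a hypothetical helper, and the per-window row counts are assumed to come from a separate ClickHouse count query.

```python
from typing import Iterable


def cap_alert(rows_per_window: Iterable[int],
              window_s: float = 900.0,
              cap_rows_per_s: float = 5.0,
              max_consecutive: int = 3) -> bool:
    """Return True once the cap is exceeded for more than `max_consecutive` windows in a row."""
    streak = 0
    for rows in rows_per_window:
        if rows / window_s > cap_rows_per_s:
            streak += 1
            if streak > max_consecutive:
                return True
        else:
            streak = 0  # a compliant window resets the run
    return False
```

With the defaults, three breaching windows in a row do not alert, and the fourth does.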
---
## 11. Observability and ROI/Friction Outputs
## 11.1 Required KPI outputs
1. Realized ROI (closed trades).
2. Open-equity ROI (mark-to-market).
3. Cost-adjusted ROI.
4. Latency decomposition:
- decision->submit
- submit->ack
- ack->first_fill
- first_fill->done
5. Slippage decomposition (bps against decision/arrival references).
6. Fee/funding components.
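
Items 4 and 5 reduce to simple differences once the timestamps and reference prices are captured. The sketch below assumes millisecond timestamps and a sign convention in which positive slippage is adverse to the trader; `OrderTimestamps`, `latency_decomposition`, and `slippage_bps` are illustrative names, not existing project APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderTimestamps:  # hypothetical container, all values in epoch milliseconds
    decision_ms: int
    submit_ms: int
    ack_ms: int
    first_fill_ms: int
    done_ms: int


def latency_decomposition(ts: OrderTimestamps) -> dict[str, int]:
    """Split total order latency into the four legs listed in item 4."""
    return {
        "decision->submit": ts.submit_ms - ts.decision_ms,
        "submit->ack": ts.ack_ms - ts.submit_ms,
        "ack->first_fill": ts.first_fill_ms - ts.ack_ms,
        "first_fill->done": ts.done_ms - ts.first_fill_ms,
    }


def slippage_bps(fill_price: float, reference_price: float, side: str) -> float:
    """Slippage in basis points against a reference; positive means worse than reference."""
    if side not in ("BUY", "SELL"):
        raise ValueError(f"side must be 'BUY' or 'SELL', got {side!r}")
    sign = 1.0 if side == "BUY" else -1.0
    return sign * (fill_price - reference_price) / reference_price * 1e4
```

Calling `slippage_bps` twice, once with the decision price and once with the arrival price as `reference_price`, gives the two components named in item 5.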
## 11.2 Storage location
- PINK metrics rows in `dolphin_pink.trade_events` payload columns and/or dedicated execution quality table.
## 11.3 TUI policy
Current TUI is BLUE-hardcoded in places (`DOLPHIN_STATE_BLUE`, `dolphin.trade_events`, `blue_runtime_commands`).
- Anchors:
- `/mnt/dolphinng5_predict/Observability/dolphin_status.py:513`
- `/mnt/dolphinng5_predict/Observability/dolphin_status.py:558`
- `/mnt/dolphinng5_predict/Observability/dolphin_status.py:1164`
- `/mnt/dolphinng5_predict/Observability/dolphin_status.py:212`
Required:
1. Do not break BLUE TUI.
2. Add either:
- separate `dolphin_status_pink.py`, or
- namespace-parameterized TUI mode.
---
## 12. Podman + Quadlet + systemd Adoption Plan
## 12.1 Strategy
Apply only to PINK stack first.
## 12.2 Preflight checks (must pass before coding)
1. Podman availability on host (`podman --version`).
2. systemd user/service model chosen (rootless preferred unless operationally blocked).
3. Persistent volume paths and permissions validated.
4. ClickHouse config/users mounts parity with current docker-compose pattern.
Host note: Podman is not currently installed (`which podman` returned no result).
## 12.3 Unit boundaries
- BLUE stays under supervisord + current docker compose infra.
- PINK gets independent unit set.
- Do not dual-manage same runtime process with supervisord and systemd.
## 12.4 Quadlet file set for PINK
Create under dedicated path (example):
- `datastack-pink.pod`
- `hazelcast-pink.container` (or reuse the existing cluster only if it is explicitly designed to be shared)
- `clickhouse-pink.container` (or shared CH with separate DB if accepted)
- `prefect-pink.container` (if needed)
- `pink-worker.container`
## 12.5 Shared vs dedicated infra policy
Decision required before implementation:
1. Option A (preferred first): shared HZ+CH infra, isolated logical namespaces.
2. Option B: dedicated PINK HZ/CH containers.
Given HZ volatility risk and operational complexity, start with Option A unless a strict physical isolation requirement is imposed.
---
## 13. Algorithm Identity Assurance (Critical)
## 13.1 Required parity harness
Implement deterministic parity checks between BLUE decision path and PINK decision path on identical input replay.
## 13.2 Comparison granularity
At each scan/bar compare tuple hash of:
- signal fired boolean
- selected asset
- side
- leverage intent
- entry/exit action
- reason code
- bars_held progression
No tolerance except for fields explicitly dependent on execution venue acknowledgements.
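
The per-bar comparison could be sketched as below. The `Decision` field names mirror the list above but are assumptions about how the decision path exposes them, and SHA-256 over canonical JSON is one possible hashing choice, not a confirmed part of the harness.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Decision:  # hypothetical record of one scan/bar decision
    signal_fired: bool
    asset: Optional[str]
    side: Optional[str]
    leverage_intent: Optional[float]
    action: Optional[str]
    reason_code: Optional[str]
    bars_held: int


def decision_hash(d: Decision) -> str:
    """Hash the decision tuple over a canonical JSON encoding (sorted keys, no spaces)."""
    payload = json.dumps(asdict(d), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def first_divergence(blue: Sequence[Decision], pink: Sequence[Decision]) -> Optional[int]:
    """Return the index of the first differing bar, or None if the runs are identical."""
    for i, (b, p) in enumerate(zip(blue, pink)):
        if decision_hash(b) != decision_hash(p):
            return i
    if len(blue) != len(pink):
        return min(len(blue), len(pink))  # one replay produced extra bars
    return None
```

Because the comparison is exact, numeric fields must be normalized before hashing: a leverage intent of `2` and `2.0` serializes differently and would be reported as a divergence. Venue-dependent fields are simply left out of `Decision`.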
## 13.3 Fail criteria
Any divergence in pure strategy decisions is a release blocker.
---
## 14. Test Plan (Implementation Exit Criteria)
## 14.1 Unit tests
1. Routing tests for strategy->DB and strategy->HZ map.
2. Sink tests (`ch_put_pink` path).
3. Control key tests (`pink_runtime_commands`).
4. Account-event rate-limit logic tests.
## 14.2 Integration tests
1. Start PINK in VST and verify:
- CH writes only into `dolphin_pink.*`
- HZ writes only into `DOLPHIN_STATE_PINK` / `DOLPHIN_PNL_PINK`
2. Verify no new rows in `dolphin_prodgreen.account_events` during PINK-only test run.
3. Verify BLUE process and metrics unaffected.
## 14.3 Soak tests
1. 24h soak with PINK live in VST.
2. Monitor:
- row rates
- CH insert error rates
- HZ heartbeat age
- control-plane responsiveness
## 14.4 Regression tests
Run existing relevant suites for:
- bingx journaling/accounting
- actor routing
- launch paths
- MHS basic health checks confirming that BLUE is unaffected
---
## 15. Deployment Sequence (Phased)
## Phase 0: Namespace groundwork
1. Add sink and routing abstractions.
2. Add PINK CH schema migration artifacts.
3. Add PINK launcher and env contract.
Gate 0:
- Compile/tests pass.
- Static grep verifies no hardcoded fallback from `pink` to `prodgreen`.
## Phase 1: PINK logical bring-up (same infra)
1. Start PINK process under current management (or controlled runner) with VST.
2. Verify strict namespace isolation.
3. Run parity harness with replay feed.
Gate 1:
- No contamination.
- Parity pass.
## Phase 2: Data-volume control tuning
1. Tune account-event emission controls.
2. Verify row-rate caps and KPI completeness.
Gate 2:
- Resource budgets stable.
## Phase 3: Optional Podman+Quadlet packaging for PINK
1. Build PINK quadlet units.
2. Validate independent lifecycle.
3. Keep BLUE unchanged.
Gate 3:
- PINK can be fully operated without impacting BLUE.
---
## 16. Rollback Plan
## 16.1 Soft rollback
1. Stop PINK process/unit only.
2. Leave BLUE untouched.
3. Preserve PINK CH/HZ artifacts for postmortem.
## 16.2 Hard rollback
1. Revert routing patches that introduced PINK mapping.
2. Keep PINK DB as historical archive or drop only after approval.
## 16.3 Explicit no-rollback targets
Do not alter BLUE capital/state surfaces during PINK rollback.
---
## 17. Security and Safety
1. PINK VST keys isolated from BLUE credentials.
2. No mainnet enable unless separate approval gate flips `DOLPHIN_BINGX_ALLOW_MAINNET=1`.
3. Validate no accidental propagation of PINK credentials into shared logs.
---
## 18. Deliverables Checklist (Coding Agent Must Produce)
1. Code changes implementing explicit strategy/namespace routing for PINK.
2. `dolphin_pink` CH schema files in `prod/clickhouse/pink/`.
3. PINK launcher/config entrypoint.
4. PINK ops control script or generalized namespace-aware ctl tool.
5. Unit + integration tests for routing/isolation.
6. Parity harness and parity report artifact.
7. Data-rate monitor/report for `account_events` and major tables.
8. Optional: Quadlet unit files for PINK stack (if Phase 3 in scope).
---
## 19. Coding Prohibitions (Strict)
1. Do not alter algorithm constants or decision logic behavior.
2. Do not remove or repurpose BLUE maps/tables.
3. Do not bind PINK to names beginning with `prod` in this phase.
4. Do not change BLUE process manager/runtime flow as part of PINK implementation.
---
## 20. Open Decisions Requiring Explicit Operator Choice
1. PINK infra physical model:
- shared CH/HZ vs dedicated CH/HZ.
2. PINK manager in early phases:
- supervised process first vs direct Quadlet rollout.
3. Account-event rate cap values:
- initial thresholds and alert policy.
If decisions are not provided, default choices are:
- shared CH/HZ with strict logical isolation,
- supervised PINK process before Quadlet migration,
- account-events cap <= 5 rows/sec sustained (debug off).
---
## 21. Minimal Go/No-Go Matrix
Go only if all true:
1. Strategy parity = exact pass.
2. Namespace contamination tests = zero leaks.
3. Data-rate caps respected during soak.
4. BLUE observability and trade loop unchanged.
No-Go if any true:
1. `pink` rows appear in `dolphin_prodgreen` or `dolphin` unexpectedly.
2. BLUE map/table writes change baseline rates materially.
3. Decision parity drifts.
4. VST safety flags not enforced.
---
## 22. Final Operator Notes
- This spec intentionally separates **architecture modernization** from **algorithm behavior**.
- PINK is the safe proving ground for infra re-architecture.
- BLUE remains production reference and must not be structurally disturbed until PINK completes parity + soak + resource gates.
