siloqy/prod/docs/DITA_V2_KERNEL_REFERENCE.md
Codex d4b73b236a PINK DITAv2 Sprint 2-3: accounting parity + multi-leg groundwork
Sprint 2 (accounting + observability parity, PINK scope):
- Verified pink_clickhouse.py writes the 8 BLUE-legacy row families at
  matching schema and that capital authority in pink_direct.step() is
  solely kernel.account (no balance-poll overwrite in the hot loop).
- Report: prod/clean_arch/dita_v2/SPRINT2_ACCOUNTING_PARITY.md.

Sprint 3 offline groundwork (no exchange contact):
- Add _write_trade_exit_leg to pink_clickhouse.py: one BLUE-schema-faithful
  trade_exit_legs row per exit leg, with isolated (non-cumulative) per-leg
  deltas tracked via _leg_state (reset on ENTER). Closes the docstring gap.
- New offline suite test_pink_multi_exit_groundwork.py (3 passed):
  * Flaw 4 — two-leg exit closes once, realized accrues per leg, closed
    slot rejects further EXIT (no double-close).
  * Overshoot invariant — a final EXIT requesting more than the remaining
    size CLAMPS (size to 0, no oversell), retiring the Sprint 0 cumulative-
    ratio risk empirically.
  * trade_exit_legs delta + full BLUE column-set assertions.
- Persistence regression after edits: 10 passed.

BLUE untouched: no changes to dolphin.* / DOLPHIN_*_BLUE / nautilus_event_trader.py.
Live VST multi-leg run remains deferred pending explicit authorization.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 19:21:45 +02:00


DITAv2 Kernel Reference

Status: active
Scope: DITAv2 execution kernel, operator launcher, shared-memory control plane, venue adapters, and observability integration.
Primary runtime path: dolphin:dita_v2

This document is the canonical reference for the DITAv2 stack under prod/clean_arch/dita_v2/.

It describes:

  • the execution kernel contract
  • the kernel state model and FSM
  • Zinc / Hazelcast boundaries
  • mock and BingX venue adapters
  • launcher and operator control surfaces
  • debug and replay semantics
  • failure and recovery behavior
  • test strategy and invariants

The DITAv2 stack is intentionally separate from the legacy prod.clean_arch.dita surface. It can be exercised in isolation, with safe defaults for tests and explicit opt-in for real shared-memory and live venue wiring.

Recent hardening additions:

  • direct slot writes now mirror into the Zinc state region immediately
  • the regression surface includes a 50-case hardening suite for diagnostics, duplicate replay, stale-state handling, and Zinc mirroring

1. What DITAv2 Is

DITAv2 is a multi-slot execution kernel for trade lifecycle management. It sits between the alpha layer and the exchange layer.

Its responsibilities are limited to:

  1. receiving intents
  2. mutating slot state
  3. normalizing venue events
  4. projecting account state
  5. emitting deterministic transition and diagnostic records
  6. mirroring confirmed state to durable surfaces

It is not responsible for alpha generation. It does not compute signals. It does not decide entry/exit thesis. Those inputs come from BLUE/PINK or another upstream strategy layer.

Design intent

DITAv2 is built to make execution state:

  • explicit
  • replayable
  • debuggable
  • observable
  • testable at the FSM edge

The goal is to eliminate shadow-state drift between local memory, exchange truth, and durable observability surfaces.


2. Canonical Components

Kernel

File:

  • prod/clean_arch/dita_v2/rust_backend.py
  • prod/clean_arch/dita_v2/_rust_kernel/

The Python-facing ExecutionKernel is backed by a Rust implementation loaded through ctypes. The Python wrapper keeps the public API stable and writes through to the Rust backend on slot mutations and event processing.

Control plane

Files:

  • prod/clean_arch/dita_v2/control.py
  • prod/clean_arch/dita_v2/real_control_plane.py

The control plane holds runtime mode, verbosity, backend selection, slot limits, and debug flags. It supports:

  • NORMAL / DEBUG
  • QUIET / VERBOSE / TRACE
  • MOCK / BINGX
  • mirror-to-Hazelcast toggles
  • restart reconciliation toggles

Zinc plane

Files:

  • prod/clean_arch/dita_v2/zinc_plane.py
  • prod/clean_arch/dita_v2/real_zinc_plane.py

The Zinc plane is the hot-path shared-memory substrate for:

  • intents
  • slot snapshots
  • control snapshots

It follows Zinc's one-shot signal pattern wherever possible:

  • writers publish the latest data and then notify
  • readers wait for a sequence change from the last value they observed
  • state-based sync is preferred over event-count sync
  • the in-memory stand-ins emulate the same notify/wait contract for tests

The in-memory plane is used by default for tests. The real Zinc plane is opt-in and uses the zinc Python adapter over shared memory.

Direct slot mutation is intentionally write-through: the Rust-backed kernel and the Zinc mirror must stay aligned on every _set_slot(), venue event, and reconcile path. The tests assert that a direct slot write is visible in the state region without waiting for a separate flush cycle. The same update path also notifies waiters so cross-process readers can wake on the latest state change instead of polling.
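
The publish-then-notify and wait-for-sequence-change contract described above can be sketched with a small in-memory cell. This is a minimal illustration only: SeqCell and its method names are hypothetical and are not the API of zinc_plane.py or the zinc adapter.

```python
import threading


class SeqCell:
    """Hypothetical in-memory stand-in: latest value plus a sequence counter."""

    def __init__(self):
        self._cond = threading.Condition()
        self._seq = 0
        self._value = None

    def publish(self, value):
        # Writer: store the latest data, bump the sequence, then notify waiters.
        with self._cond:
            self._value = value
            self._seq += 1
            self._cond.notify_all()
            return self._seq

    def wait_for_change(self, last_seen, timeout=1.0):
        # Reader: block until the sequence differs from the last one observed.
        # Returns (seq, value), or None if nothing changed before the timeout.
        with self._cond:
            changed = self._cond.wait_for(lambda: self._seq != last_seen, timeout)
            return (self._seq, self._value) if changed else None


cell = SeqCell()
results = []

# The reader wakes with whichever snapshot is current when it runs.
reader = threading.Thread(target=lambda: results.append(cell.wait_for_change(0)))
reader.start()
cell.publish({"slot": 0, "stage": "ENTRY_WORKING"})
cell.publish({"slot": 0, "stage": "POSITION_OPEN"})
reader.join()

# State-based sync: a late reader sees only the latest state, not two events.
late = cell.wait_for_change(0)
```

Because readers compare sequence numbers rather than count notifications, a reader that misses an intermediate publish still converges on the latest state.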

Projection

Files:

  • prod/clean_arch/dita_v2/projection.py
  • prod/clean_arch/dita_v2/hazelcast_projection.py

The projection layer writes BLUE/PINK-compatible state rows to Hazelcast and emits lifecycle rows suitable for ClickHouse observability.

Venue adapters

Files:

  • prod/clean_arch/dita_v2/mock_venue.py
  • prod/clean_arch/dita_v2/bingx_venue.py

The mock adapter is deterministic and BingX-shaped. The BingX adapter is a thin normalization layer over the direct BingX execution client surface.

Launcher and operator controls

Files:

  • prod/clean_arch/dita_v2/launcher.py
  • prod/launch_dita_v2.py
  • prod/ops/dita_v2_ctl.py
  • prod/supervisor/supervisorctl.sh
  • prod/ops/dita_v2_live_bingx_smoke.py

The launcher assembles a full runtime bundle. The operator scripts provide status, healthcheck, start, stop, and restart paths. The smoke wrapper provides a repeatable BingX testnet command that runs the full live E2E suite with the correct live-smoke environment gates and supervisor precheck.

Repeatable live smoke command:

python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --symbol TRXUSDT

Use --dry-run to print the exact env and pytest command without sending orders.


3. Runtime Topology

Default test topology

ExecutionKernel
  ├─ InMemoryControlPlane
  ├─ InMemoryZincPlane
  ├─ MockVenueAdapter
  └─ HazelcastProjection(writer=callback)

Real operator topology

ExecutionKernel
  ├─ RealZincControlPlane or mirrored in-memory control plane
  ├─ RealZincPlane
  ├─ BingxVenueAdapter
  └─ HazelcastProjection(client-backed writer)

Supervisord-managed service

Program:

dolphin:dita_v2

Launcher:

/mnt/dolphinng5_predict/prod/launch_dita_v2.py

Default supervised posture:

  • DITA_V2_LAUNCHER_MODE=serve
  • DITA_V2_VENUE=BINGX
  • DITA_V2_ZINC=REAL
  • DITA_V2_CONTROL_PLANE=REAL_ZINC
  • DITA_V2_HAZELCAST=REAL
  • DITA_V2_MODE=DEBUG
  • DITA_V2_VERBOSITY=TRACE

The supervised path is intentionally separate from the legacy PINK and BLUE entrypoints.


4. Data Contracts

Core contract files

  • prod/clean_arch/dita_v2/contracts.py
  • prod/clean_arch/dita_v2/venue.py

Important types

  • TradeStage
  • TradeSlot
  • VenueOrder
  • VenueEvent
  • KernelIntent
  • KernelTransition
  • KernelOutcome
  • KernelDiagnosticCode
  • KernelCommandType
  • KernelEventKind
  • KernelMode
  • KernelVerbosity
  • BackendMode

Slot model

Each slot is the unit of execution. It carries:

  • trade identity
  • asset
  • side
  • entry price
  • current size
  • leverage
  • open/close state
  • active entry/exit order handles
  • leg progression
  • idempotency tracking via seen event IDs

The slot is the primary kernel state object. The kernel maintains multiple slots; one slot can be actively traded while the others remain idle or recoverable.

Order model

VenueOrder captures the venue-specific identity of an order:

  • internal trade ID
  • venue order ID
  • venue client ID
  • side
  • intended size
  • filled size
  • average fill price
  • status
  • metadata

Event model

VenueEvent captures the normalized venue response surface:

  • ack
  • partial fill
  • full fill
  • cancel ack
  • cancel reject
  • reject

The kernel consumes normalized events, not raw exchange payloads.


5. State Machine

Core states

  • IDLE
  • ENTRY_WORKING
  • POSITION_OPEN
  • EXIT_WORKING
  • CLOSED
  • STALE_STATE_RECONCILING

Basic transitions

IDLE
  └─ ENTER intent ─> ENTRY_WORKING
ENTRY_WORKING
  ├─ PARTIAL_FILL ─> ENTRY_WORKING
  ├─ FULL_FILL ─> POSITION_OPEN
  └─ ORDER_REJECT ─> IDLE
POSITION_OPEN
  ├─ EXIT intent ─> EXIT_WORKING
  └─ MARK_PRICE ─> POSITION_OPEN
EXIT_WORKING
  ├─ PARTIAL_FILL ─> EXIT_WORKING
  ├─ FULL_FILL ─> IDLE or POSITION_OPEN (multi-leg)
  ├─ CANCEL_ACK ─> POSITION_OPEN
  └─ CANCEL_REJECT ─> EXIT_WORKING
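
The transition diagram above can be expressed as a lookup table. The sketch below uses plain strings rather than the real TradeStage and KernelEventKind types, and the next_stage helper and its remaining_size parameter are hypothetical names introduced for illustration; the Rust-backed kernel may structure this differently.

```python
# (current stage, trigger) -> next stage, taken from the diagram above.
TRANSITIONS = {
    ("IDLE", "ENTER"): "ENTRY_WORKING",
    ("ENTRY_WORKING", "PARTIAL_FILL"): "ENTRY_WORKING",
    ("ENTRY_WORKING", "FULL_FILL"): "POSITION_OPEN",
    ("ENTRY_WORKING", "ORDER_REJECT"): "IDLE",
    ("POSITION_OPEN", "EXIT"): "EXIT_WORKING",
    ("POSITION_OPEN", "MARK_PRICE"): "POSITION_OPEN",
    ("EXIT_WORKING", "PARTIAL_FILL"): "EXIT_WORKING",
    ("EXIT_WORKING", "CANCEL_ACK"): "POSITION_OPEN",
    ("EXIT_WORKING", "CANCEL_REJECT"): "EXIT_WORKING",
}


def next_stage(stage, trigger, remaining_size=0.0):
    """Return the next stage, or None when the transition is not allowed."""
    if (stage, trigger) == ("EXIT_WORKING", "FULL_FILL"):
        # Multi-leg exit: a filled leg that leaves size open returns to
        # POSITION_OPEN; otherwise the slot goes back to IDLE.
        return "POSITION_OPEN" if remaining_size > 0 else "IDLE"
    return TRANSITIONS.get((stage, trigger))


stage = "IDLE"
for trigger in ("ENTER", "PARTIAL_FILL", "FULL_FILL", "EXIT"):
    stage = next_stage(stage, trigger)
```

Returning None for an unknown pair keeps invalid transitions explicit, so the caller can emit a diagnostic instead of silently mutating state.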

Idempotency

Duplicate venue events are tracked via event IDs in the slot image. Repeated events are treated as no-ops, not as extra fills or duplicate state changes.
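
A minimal sketch of this rule, assuming the slot image keeps a set of seen event IDs. SlotImage and apply_fill are hypothetical names, not the kernel's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class SlotImage:
    # Hypothetical reduced slot: current size plus the IDs already applied.
    size: float = 0.0
    seen_event_ids: set = field(default_factory=set)


def apply_fill(slot, event_id, fill_size):
    """Apply a fill once; return False when the event is a replayed duplicate."""
    if event_id in slot.seen_event_ids:
        return False  # no-op: neither an extra fill nor a state change
    slot.seen_event_ids.add(event_id)
    slot.size += fill_size
    return True


slot = SlotImage()
first = apply_fill(slot, "evt-1", 10.0)
replayed = apply_fill(slot, "evt-1", 10.0)  # same event delivered again
```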

Recovery state

STALE_STATE_RECONCILING blocks normal event progression until reconciliation completes. This state exists to make restart, replay, and venue divergence explicit.

Rate limit handling

BingX rate limiting is treated as a first-class retryable condition, not a generic failure. The kernel surfaces it with:

  • KernelDiagnosticCode.RATE_LIMITED
  • KernelSeverity.WARNING
  • details["release_eta"] = "few minutes" when the exchange provides no precise retry window
  • details["retry_after_ms"] when the adapter or venue response includes a retry hint
  • details["retryable"] = true

This is intentionally downstream-friendly: operators and orchestration layers can distinguish transient throttling from hard rejections and choose a retry policy explicitly.
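
As an illustration of how a downstream layer might consume these fields, the sketch below builds a diagnostic as a plain dict and derives a wait time from it. The function names, the dict layout around details, and the 120-second fallback are assumptions; only the details keys and the code and severity names come from the list above.

```python
def rate_limit_diagnostic(retry_after_ms=None):
    """Build a RATE_LIMITED diagnostic as a plain dict (illustrative shape)."""
    details = {"retryable": True}
    if retry_after_ms is not None:
        # The adapter or venue response supplied a retry hint.
        details["retry_after_ms"] = int(retry_after_ms)
    else:
        # No precise retry window was provided by the exchange.
        details["release_eta"] = "few minutes"
    return {"code": "RATE_LIMITED", "severity": "WARNING", "details": details}


def backoff_seconds(diagnostic, fallback=120.0):
    """Return a wait time for retryable throttling, or None for hard failures."""
    details = diagnostic.get("details", {})
    if diagnostic.get("code") != "RATE_LIMITED" or not details.get("retryable"):
        return None
    if "retry_after_ms" in details:
        return details["retry_after_ms"] / 1000.0
    return fallback  # assumed default when only release_eta is present


hinted = backoff_seconds(rate_limit_diagnostic(retry_after_ms=1500))
unhinted = backoff_seconds(rate_limit_diagnostic())
hard = backoff_seconds({"code": "VENUE_REJECTION", "details": {}})
```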


6. Control Plane Semantics

The control plane is used to steer runtime behavior without changing kernel logic.

Modes

  • NORMAL for production-like execution
  • DEBUG for full state and transition tracing

Verbosity

  • QUIET
  • VERBOSE
  • TRACE

Backend mode

  • MOCK
  • BINGX

Key toggles

  • debug_clickhouse_enabled
  • trace_transitions
  • mirror_to_hazelcast
  • active_slot_limit
  • reconcile_on_restart

Shared-memory selection

The launcher uses env-driven selection:

  • DITA_V2_CONTROL_PLANE=REAL_ZINC
  • DITA_V2_ZINC=REAL
  • DITA_V2_HAZELCAST=REAL
  • DITA_V2_VENUE=BINGX

Defaults remain safe and testable. Real shared-memory and live venue wiring are opt-in.
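
A sketch of env-driven selection with safe fallbacks follows. The variable names and the opt-in values are the ones listed above; the default values (IN_MEMORY, CALLBACK, MOCK) and both helper functions are assumptions, since the actual defaults used by launcher.py are not stated here.

```python
import os

# Assumed safe defaults; the real launcher's default strings may differ.
DEFAULTS = {
    "DITA_V2_CONTROL_PLANE": "IN_MEMORY",
    "DITA_V2_ZINC": "IN_MEMORY",
    "DITA_V2_HAZELCAST": "CALLBACK",
    "DITA_V2_VENUE": "MOCK",
}

# Opt-in values from the list above.
LIVE_VALUES = {
    "DITA_V2_CONTROL_PLANE": "REAL_ZINC",
    "DITA_V2_ZINC": "REAL",
    "DITA_V2_HAZELCAST": "REAL",
    "DITA_V2_VENUE": "BINGX",
}


def resolve_selection(env=None):
    """Read each selector from the environment, falling back to the default."""
    env = os.environ if env is None else env
    return {key: env.get(key, default).strip().upper()
            for key, default in DEFAULTS.items()}


def live_components(selection):
    """List the selectors that were explicitly switched to real wiring."""
    return sorted(key for key, value in selection.items()
                  if value == LIVE_VALUES[key])


safe = resolve_selection({})
mixed = resolve_selection({"DITA_V2_VENUE": "bingx", "DITA_V2_ZINC": "REAL"})
```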


7. Zinc Boundary

Why Zinc is used

Zinc provides the shared-memory substrate for:

  • low-latency control-plane reads
  • intent publication
  • slot state snapshots
  • zero-copy observation across processes

Hot-path intent region

Written by the alpha/launcher side, read by the kernel.

Hot-path state region

Written by the kernel, read by the alpha side or operator tooling.

Control region

Used for runtime mode switches and operator commands.

Invariants

  1. Shared-memory state must not silently diverge from kernel state.
  2. Writes should be explicit and versioned.
  3. The kernel must not rely on duplicated Python shadow state as authority.

8. Hazelcast / ClickHouse Boundary

Hazelcast

Hazelcast is the durable projection mirror for:

  • confirmed slot state
  • control snapshot mirroring
  • active slot registry
  • trade event topic emission

ClickHouse

ClickHouse is the observability and debug journal sink. In debug mode, the kernel should emit enough rows to reconstruct a transition timeline.

Compatibility rule

All emitted rows must remain compatible with the BLUE/PINK schema family. The DITAv2 layer does not invent a new observability universe unless the schema is explicitly versioned.


9. Venue Adapters

Mock venue

File:

  • prod/clean_arch/dita_v2/mock_venue.py

Behavior:

  • deterministic
  • BingX-shaped semantics
  • configurable reject / partial fill / cancel reject scenarios
  • useful for FSM and race testing

BingX venue

File:

  • prod/clean_arch/dita_v2/bingx_venue.py

Behavior:

  • thin normalization layer
  • converts BingX order/account payloads into DITAv2 events/orders
  • no reimplementation of exchange logic
  • live adapter backed by the direct BingX client path

Adapter rule

If a mock cannot faithfully mirror BingX behavior in an in-scope path, the adapter layer must map actual BingX responses into DITAv2 contracts instead of inventing a separate semantic model.


10. Launcher and Operator Flow

Launcher responsibilities

  • assemble control plane
  • assemble Zinc plane
  • assemble projection sink
  • select venue adapter
  • create the kernel

Operator controls

Supported command surfaces:

  • prod/ops/dita_v2_ctl.py
  • prod/supervisor/supervisorctl.sh dita_v2 ...
  • direct supervisorctl against dolphin:dita_v2

Script modes

prod/launch_dita_v2.py supports:

  • once
  • serve

serve is the supervised long-running mode. once is for snapshot/debug use.


11. Observability and Debugging

Debug mode

When debug mode is enabled, the kernel should log:

  • state image changes
  • transition triggers
  • venue requests and responses
  • local lock / unlock points
  • reconciliation events
  • diagnostics and anomaly codes

Error surface

The kernel must emit deterministic diagnostic codes for:

  • invalid slot ID
  • busy slot
  • no active exit order
  • invalid transition
  • stale-state reconcile
  • duplicate event / replay no-op
  • venue rejection

The point is to make failures explainable and machine-queryable.


12. Testing Strategy

The DITAv2 suite is intentionally wide. It includes:

  • kernel-only FSM tests
  • extensive state-machine tests
  • race / off-by-one / memory anomaly tests
  • Zinc interaction tests
  • Hazelcast projection tests
  • BingX adapter tests
  • full-stack E2E / functional tests through the kernel
  • BLUE/PINK-style signal gamut coverage, including entry, exit, partial exit, TP, hung orders, cancel-reject, and non-close cases
  • launcher and operator path tests
  • supervisor config / documentation tests
  • a dedicated kernel hardening suite with 50 collected cases
  • mocked exchange-first and BingX-basic E2E paths
  • chaos / fuzz coverage over both mock and BingX paths

Testing order

  1. kernel-only unit tests
  2. Zinc interaction tests
  3. projection tests
  4. BingX adapter tests
  5. launcher and operator wiring tests
  6. full suite rerun
  7. full-stack E2E / functional coverage through the kernel
  8. chaos / fuzz coverage across mock and BingX

Current validated result

The DITAv2 suite is currently green with a broad test surface covering the kernel, launcher, operator wrappers, Zinc, venue adapters, and the full-stack E2E/chaos matrix through the kernel.


13. Files of Interest

Core runtime

  • prod/clean_arch/dita_v2/rust_backend.py
  • prod/clean_arch/dita_v2/launcher.py
  • prod/clean_arch/dita_v2/control.py
  • prod/clean_arch/dita_v2/projection.py
  • prod/clean_arch/dita_v2/mock_venue.py
  • prod/clean_arch/dita_v2/bingx_venue.py
  • prod/clean_arch/dita_v2/real_control_plane.py
  • prod/clean_arch/dita_v2/real_zinc_plane.py
  • prod/launch_dita_v2.py
  • prod/ops/dita_v2_ctl.py
  • prod/supervisor/supervisorctl.sh
  • prod/supervisor/dolphin-supervisord.conf

Tests

  • prod/tests/test_dita_v2_kernel.py
  • prod/tests/test_dita_v2_zinc.py
  • prod/tests/test_dita_v2_hazelcast.py
  • prod/tests/test_dita_v2_bingx_adapter.py
  • prod/tests/test_dita_v2_launcher.py
  • prod/tests/test_launch_dita_v2.py
  • prod/tests/test_dita_v2_ops.py

Operator docs

  • prod/docs/DITA_V2_OPERATOR_PLAYBOOK.md
  • prod/docs/OPERATIONAL_STATUS.md

14. Canonical References

This DITAv2 reference is the canonical entry for the new execution kernel.

Supporting references:

  • prod/docs/DITA_V2_OPERATOR_PLAYBOOK.md
  • prod/docs/OPERATIONAL_STATUS.md
  • prod/AGENT_READ_Supervisor_migration.md

15. PINK Integration (2026-05-27)

PINK now executes trades through the DITAv2 kernel exclusively.

How it works

The PINK launcher (launch_dolphin_pink.py) calls build_launcher_bundle() to construct a DITAv2 bundle (kernel + BingxVenueAdapter + control plane + Zinc plane + Hazelcast projection). PinkDirectRuntime bridges policy (DecisionEngine/IntentEngine) to execution through the _decision_to_kernel_intent() translation seam, which maps Decision/Intent → KernelIntent.

Capital simplification

The kernel's AccountProjection is the single local capital authority:

  1. Exchange balance seeds kernel.account.snapshot.capital once at startup/recovery.
  2. kernel.account.settle(slot.realized_pnl) is called in on_venue_event() when a fill transitions a slot to CLOSED — the only capital mutation post-startup.
  3. observe_slots() handles mark-to-market (unrealized PnL) — no capital writes.
  4. PinkClickHousePersistence reads capital/peak/trade_seq from the kernel snapshot.

No balance-poll overwrites during the hot loop.
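
The four rules above can be illustrated with a reduced account model. This is a sketch under assumptions: the class and field layout are hypothetical, and the peak and trade_seq update rules are guesses at plausible behavior rather than the kernel's actual AccountProjection.

```python
from dataclasses import dataclass


@dataclass
class AccountSnapshot:
    capital: float = 0.0
    peak: float = 0.0
    unrealized_pnl: float = 0.0
    trade_seq: int = 0


class AccountProjectionSketch:
    """Hypothetical stand-in for the kernel's account projection."""

    def __init__(self):
        self.snapshot = AccountSnapshot()

    def seed(self, exchange_balance):
        # Startup/recovery only: the exchange balance seeds capital.
        self.snapshot.capital = exchange_balance
        self.snapshot.peak = max(self.snapshot.peak, exchange_balance)

    def settle(self, realized_pnl):
        # Called when a fill closes a slot: the only capital mutation afterwards.
        self.snapshot.capital += realized_pnl
        self.snapshot.peak = max(self.snapshot.peak, self.snapshot.capital)
        self.snapshot.trade_seq += 1  # assumed: one sequence step per closed trade

    def observe_slots(self, unrealized_pnls):
        # Mark-to-market only: capital is never written here.
        self.snapshot.unrealized_pnl = sum(unrealized_pnls)


account = AccountProjectionSketch()
account.seed(1000.0)
account.observe_slots([5.0, -2.0])
capital_after_mark = account.snapshot.capital
account.settle(12.5)
account.settle(-20.0)
```

Keeping mark-to-market out of capital means a persistence layer can read capital, peak, and trade_seq from one snapshot without reconciling against polled balances.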

Files added/changed

  • prod/launch_dolphin_pink.py — uses build_launcher_bundle()
  • prod/clean_arch/runtime/pink_direct.py — ExecutionKernel-backed runtime
  • prod/clean_arch/persistence/pink_clickhouse.py — reads from kernel account
  • prod/ops/pink_ctl.py — added ditav2-status subcommand
  • prod/tests/test_pink_ditav2_kernel_bridge.py — mapping tests (7)
  • prod/tests/test_pink_ditav2_rate_limit_contract.py (1)
  • prod/tests/test_pink_ditav2_restart_reconcile.py (3)
  • prod/tests/test_pink_ditav2_accounting_invariants.py (2)

Live smoke

python /mnt/dolphinng5_predict/prod/ops/dita_v2_live_bingx_smoke.py --pink --symbol TRXUSDT

PENDING — Live exchange chaos/fuzz

Status: Not implemented. Requires a dedicated orchestration layer.

The mock-venue and BingX-basic chaos/fuzz matrix in test_dita_v2_e2e_functional.py provides deterministic fuzzing over mock and BingX adapter paths (24 cases, all green). True live-testnet chaos/fuzz against a real order book — non-deterministic event ordering, partial fills at unpredictable prices, race conditions between submissions and exchange responses — requires:

  • A live-chaos orchestrator that submits adversarial intents (rapid entries/exits, competing cancels, size-at-lot-boundary, cross-book) against a live BingX testnet symbol.
  • An event-sequencer that captures raw exchange callback order and replays it against the kernel to verify deterministic convergence.
  • A state-invariant checker that asserts slot/account state converges to the same terminal state regardless of callback ordering.

This is deferred. The current live smoke tests (test_pink_bingx_dita_live_e2e.py, test_dita_v2_live_bingx_testnet_e2e.py) cover happy-path E2E cycles only.

BLUE Non-Impact Proof Checklist

| # | Assertion | Method | Status |
|---|-----------|--------|--------|
| 1 | Zero PINK rows in dolphin (BLUE) ClickHouse tables | pink_ctl.py mode-verify (CH query by strategy='pink') | VERIFIED |
| 2 | Zero PINK rows in dolphin_prodgreen ClickHouse tables | pink_ctl.py mode-verify (CH query by strategy='pink' on prodgreen DB) | VERIFIED |
| 3 | No PINK keys written to BLUE Hazelcast maps (DOLPHIN_STATE_BLUE, DOLPHIN_PNL_BLUE) | Hazelcast key scan | VERIFIED |
| 4 | No PINK keys written to PRODGREEN Hazelcast maps | Hazelcast key scan | VERIFIED |
| 5 | PINK trade_events baseline unchanged (106 rows) | CH count query | VERIFIED |
| 6 | Stopping/restarting PINK does not affect BLUE supervisor programs | supervisorctl status before/after | VERIFIED |
| 7 | No BLUE files modified in refactor | git diff --name-only (only PINK/DITAv2 paths) | VERIFIED |
| 8 | BLUE runtime env vars unchanged (DOLPHIN_STATE_BLUE, dolphin DB) | env comparison | VERIFIED |

Cutover gate: all 8 assertions must pass before PINK goes live.

Rollback trigger: any violation of assertions 1-4 triggers immediate rollback per §6.2 of the refactor guide.

15.1 Sync↔Async Seam Analysis (2026-05-27)

Seven distinct sync/async boundaries were identified and tested:

| # | Seam | Bridging Mechanism | Test Coverage |
|---|------|--------------------|---------------|
| 1 | BingxVenueAdapter._run() → async backend | 3 modes: passthrough, asyncio.run() (no-loop), ThreadPoolExecutor (in-loop) | test_pink_sync_async_seams.py (36 tests) |
| 2 | BingxVenueAdapter.connect() → BingxDirectExecutionAdapter.connect() | _run() bridges sync→async | 3 tests |
| 3 | kernel.process_intent() (sync) → venue.submit() (sync) → _run() → async HTTP | Thread pool per-call | 4 race-condition tests |
| 4 | PinkDirectRuntime.step() (async) → kernel.process_intent() (sync) | Direct sync call inside coroutine | 1 nested loop test |
| 5 | launcher._maybe_close() (sync) → async close/disconnect | asyncio.run() with RuntimeError catch | 4 tests |
| 6 | _backend_snapshot() thread safety | No lock; _last_snapshot is a plain attribute | 2 concurrent access tests |
| 7 | HTTP client timeout propagation | httpx.AsyncClient timeout config | 2 timeout tests |

Key findings:

  • In its in-loop mode, _run() creates a new ThreadPoolExecutor on every call, which could leak threads at high call frequency. Mitigation: the chaos harness's 10-thread concurrent test verified no leaks under load.
  • _maybe_close() swallows RuntimeError from asyncio.run() inside a running loop. This is correct behavior — the close call is best-effort.
  • pink_direct.py connect() now handles both sync and async venue connect methods via inspect.isawaitable().
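
The three bridging modes named for seam 1 (passthrough, asyncio.run() with no loop, ThreadPoolExecutor inside a loop) can be sketched as below. run_sync, fetch, and caller are hypothetical names, and this is not the adapter's actual _run() implementation; it only shows one way such a bridge can be written with the standard library.

```python
import asyncio
import inspect
from concurrent.futures import ThreadPoolExecutor


async def _await(awaitable):
    # Wrap any awaitable in a coroutine so asyncio.run() accepts it.
    return await awaitable


def run_sync(value):
    """Return a plain result from either a plain value or an awaitable."""
    if not inspect.isawaitable(value):
        return value  # mode 1: passthrough for already-sync results
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Mode 2: no loop is running in this thread, so start one here.
        return asyncio.run(_await(value))
    # Mode 3: a loop is already running in this thread; run the coroutine on
    # a fresh loop in a worker thread. The "with" block shuts the per-call
    # pool down, so no worker thread outlives the call.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, _await(value)).result()


async def fetch(x):
    await asyncio.sleep(0)
    return x * 2


async def caller():
    # Sync bridge invoked from inside a coroutine (in-loop mode).
    return run_sync(fetch(5))


plain = run_sync(7)
no_loop = run_sync(fetch(3))
in_loop = asyncio.run(caller())
```

Note that the in-loop mode blocks the calling loop's thread until the worker finishes, which is the usual trade-off of calling a sync bridge from inside a coroutine.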

Chaos harness: test_pink_ditav2_chaos_harness.py (22 tests) covers:

  • Rapid entry→exit, two-leg partial, competing cancel, cancel-after-fill, mark-price, reconcile, size-at-boundary, 10x entry-exit loop
  • Edge cases: zero-size entry, negative price entry
  • Deterministic replay (ordered and shuffled) — verifies kernel doesn't crash under any event ordering
  • State invariants: no stuck slots, no negative capital, no illegal FSM transitions, no critical diagnostics

15.2 TODO — Live testnet chaos E2E

Status: Not implemented. Requires dedicated work.

The chaos harness (test_pink_ditav2_chaos_harness.py) runs all adversarial scenarios (rapid entry-exit, competing cancel, size-at-boundary, 10x loops) against the MockVenueAdapter only. To reach prod confidence, these same scenarios must be run against a live BingX VST symbol with:

  1. Exchange-side verification — orders/positions/account queried directly from the exchange after each chaos step, not just from kernel state.
  2. Quantity-compliance monitoring — BingX may truncate or round lot sizes differently than the adapter expects; the test must assert the exchange accepted the intended size.
  3. Fill-price tracking — partial fills at unpredictable prices under rapid entry-exit must be captured and reconciled against the kernel's accounting.
  4. Rate-limit cascade testing — the parallel HTTP gather in _refresh_exchange_state must be verified under sustained rate-limit pressure.

Design sketch:

  • Extend ChaosOrchestrator.run_chaos_scenario() to accept a BingxVenueAdapter (live) in addition to MockVenueAdapter.
  • Add a LiveStateVerifier that hits the BingX REST API after each step and asserts kernel state ≈ exchange state within rounding tolerance.
  • Gate the live chaos tests with the same BINGX_SMOKE_LIVE=1 env convention.
  • Run the chaos scenarios that are safe for testnet (no cross-book, no size-at-boundary that would cause a reject chain).
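
The "kernel state ≈ exchange state within rounding tolerance" check in the design sketch could look roughly like the comparison below. The function name, field names, and tolerance values are all hypothetical, and no BingX API call is shown.

```python
import math


def diff_state(kernel, exchange, tolerances):
    """Return the fields whose kernel and exchange values differ beyond tolerance."""
    return [
        name
        for name, tol in tolerances.items()
        if not math.isclose(kernel[name], exchange[name], rel_tol=0.0, abs_tol=tol)
    ]


# Hypothetical per-field absolute tolerances (for example, half a lot step).
TOLERANCES = {"size": 0.5, "entry_price": 0.00002}

kernel_view = {"size": 100.0, "entry_price": 0.12345}
rounded_by_venue = {"size": 100.0, "entry_price": 0.12346}
truncated_by_venue = {"size": 99.0, "entry_price": 0.12345}

ok = diff_state(kernel_view, rounded_by_venue, TOLERANCES)
mismatch = diff_state(kernel_view, truncated_by_venue, TOLERANCES)
```

Returning the list of mismatched fields, rather than a single boolean, makes a failing chaos step easier to attribute to quantity truncation versus fill-price drift.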

This is deferred because the current live E2E tests cover happy-path cycles only, and the mock-venue chaos harness validates kernel invariants. Bridging the two for live chaos is a separate engineering effort.