From 9128ab799ed9e7eb8e52c3c9b787acf2b20e2514 Mon Sep 17 00:00:00 2001 From: Codex Date: Mon, 1 Jun 2026 16:31:23 +0200 Subject: [PATCH] PINK: centralize all flaw findings in FLAW_ANALYSIS doc Rewrite PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md as the central registry with combined catalog (A+T+E+F+G = 116 flaws), severity distribution, and cross-references to the TRACE doc for deep E, F, G detail. Add reciprocal cross-reference in TRACE doc header. Co-authored-by: CommandCodeBot --- PINK_DITAv2_E2E_TRACE_ANALYSIS.md | 5 + PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md | 714 ++++++++++++++++++++++++ 2 files changed, 719 insertions(+) create mode 100644 PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md diff --git a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md index cbf6bc7..207cd9b 100644 --- a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md +++ b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md @@ -6,6 +6,11 @@ boundary crossing in the PINK execution pipeline. No test execution. **System scope:** 34 active source files, ~12,000 lines across Rust kernel, Python bridge, venue adapter, runtime, and persistence. +> **Central flaw registry:** [PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md](./PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md) +> contains the combined catalog of all 116 flaws (A, T, E, F, G series) with +> severity distribution and cross-references. This file provides the deep E2E +> trace context — read the central registry for the master list. + --- ## E2E Data Flow (One Call) diff --git a/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md b/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md new file mode 100644 index 0000000..1801aae --- /dev/null +++ b/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md @@ -0,0 +1,714 @@ +# PINK DITAv2 — Structural Flaw Analysis (CENTRAL) + +**Analysis date:** 2026-05-31 +**Scope:** Full PINK pipeline — all flaws across all modules. +**Sources:** +- This file (A-series): Detailed writeups for architectural flaws. +- [PINK_DITAv2_E2E_TRACE_ANALYSIS.md](./PINK_DITAv2_E2E_TRACE_ANALYSIS.md) (E, F, G-series): + Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans. + Every E, F, G entry below is a summary only — full detail is in the TRACE doc. + +--- + +## Combined Catalog (All Flaws, All Passes) + +| Pass | Focus | Count | Critical | High | Medium | Low | Info | +|------|-------|-------|----------|------|--------|-----|------| +| A | Architectural (detailed in this file) | 15 | 0 | 2 | 0 | 2 | 11 | +| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 | +| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 | +| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 | +| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 | +| **Total** | | **116** | **5** | **21** | **32** | **40** | **18** | + +--- + +## T-Series: Threading & Atomicity Flaws + +*Full detail in TRACE doc under "Threading & Atomicity" section.* + +| # | Flaw | Layer | Severity | +|---|------|-------|----------| +| T1 | `InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy | Zinc | **Critical** | +| T2 | Thread-unsafe kernel snapshot capture for account | Bridge | **High** | +| T3 | Re-entrant or incorrectly-scoped Rust-kernel handle usage | Bridge | **High** | +| T4 | Consequence: `on_venue_event` PnL settle races | Bridge | **High** | +| T5 | Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls | Zinc | Medium | +| T6 | `_write_region` buffer zero + notify race with concurrent reader | Zinc | Medium | +| T7 | Publication of events in `process_intent` loop not synchronized with persist | Bridge | Medium | +| T8 | `asyncio.run` executor skip in `_run` leads to event-loop stall | Venue | Low | +| T9 | No thread-safe Python↔Rust ownership / lifetime protocol | Bridge | Low | + +--- + +## E-Series: E2E Data-Flow Flaws (Pass 1) + +*Full detail in TRACE doc under "Layer 1" through "Layer 9."* + +| # | Flaw | Layer | Severity | +|---|------|-------|----------| +| E1 | `step()` calls `pump_venue_events()` every cycle unconditionally | Runtime | **High** | +| E2 | `kernel.snapshot()["account"]` returns a fresh dict, not a live view | Bridge | Low | +| E3 | `_decision_to_kernel_intent` drops `order_type` and `limit_price` | Runtime | **High** | +| E4 | `_exit_intent_from_slot` trusts slot.size but slot may be stale | Runtime | **High** | +| E5 | JSON serialization round-trip loses numeric precision | Bridge | Low | +| E6 | `_RustKernelLib` is a global singleton — shared across all kernels | Bridge | Low | +| E7 | ENTER handler silently allows re-entry with same trade_id | Rust | **High** | +| E8 | EXIT handler uses `initial_size` not current size | Rust | **High** | +| E9 | CANCEL handler returns diagnostic even when nothing happened | Rust | Low | +| E10 | `apply_fill` entry branch double-sets `active_entry_order` | Rust | Low | +| E11 | `_legacy_intent()` is a lossy conversion | Venue | Low | +| E12 | `_events_from_submit()` price fallback chain can lose venue price | Venue | Low | +| E13 | `_backend_snapshot()` timeout returns stale data | Venue | Medium | +| E14 | `_events_from_cancel` uses stale `slot_id` from order metadata | Venue | Low | +| E15 | Submit sets leverage via separate HTTP call | Adapter | Medium | +| E16 | `_format_quantity`/`_format_price` may use zero tick/step | Adapter | Medium | +| E17 | Cancel uses truth-based confirmation — can mask real errors | Adapter | Medium | +| E18 | `on_venue_event` settles PnL incrementally — fees never included | Bridge | Medium | +| E19 | `observe_slots` called with ALL slots, not just changed ones | Bridge | Low | +| E20 | `_capital()` reads live from `AccountProjection` — stale row risk | Persistence | Low | +| E21 | `persist_fill_events()` synthesizes fake Decision/Intent | Persistence | Medium | +| E22 | `_write_trade_exit_leg` capital_before uses arithmetic reconstruction | Persistence | Medium | +| E23 | `_write_trade_event` uses entry_price as exit_price | Persistence | Medium | +| E24 | Mock venue always emits fill on `partial_fill_ratio > 0` | Test | Low | +| E25 | Test scenarios use MARKET-only `_si()` helper — no LIMIT tests | Test | Low | +| E26 | Fresh-kernel reconcile tests create second kernel but share venue | Test | Low | + +--- + +## F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3) + +*Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."* + +| # | Flaw | Layer | Severity | +|---|------|-------|----------| +| F1 | CANCEL returns "accepted" before cancel happens — stale diagnostic_code | Bridge | Medium | +| F2 | `_last_settled_pnl` reset before `venue.submit()` — transient window | Bridge | Medium | +| F3 | `_first_invalid_intent_field` allows `leverage=0` and `target_size=0` | Bridge | Low | +| F4 | `outcome.emitted_events` only from venue — Rust kernel events dropped | Bridge | Low | +| F5 | `on_venue_event` redundant FFI read of slot already returned by Rust | Bridge | Low | +| F6 | `process_intent` records pre-venue transitions with `event=None` | Bridge | Info | +| F7 | `reconcile_from_slots` writes ALL slots to projection/zinc | Bridge | Low | +| F8 | `HazelcastRowWriter.put()` synchronous, no error handling — crashes intent | Projection | **Medium** | +| F9 | `RealZincPlane.write_slot()` serializes ALL slots, not just changed one | Zinc | Low | +| F10 | `RealZincPlane` zeros buffer before write — concurrent read sees empty | Zinc | Low | +| F11 | `RealZincPlane._write_region` no partial-write recovery | Zinc | Low | +| F12 | `InMemoryZincPlane` intent_region grows without bound | Zinc | Low | +| F13 | `InMemoryZincPlane` uses non-re-entrant `threading.Condition` | Zinc | Low | +| F14 | `KernelSlotView.__setattr__` round-trips unknown fields — silently dropped | Bridge | Low | +| F15 | `on_venue_event` loop stops on first exception — slot left in partial state | Bridge | **High** | +| F16 | `venue.submit()` returning empty events leaves slot in ORDER_REQUESTED | Bridge | Medium | +| F17 | Cancel truth-based confirmation returns REJECTED for already-cancelled orders | Adapter | Medium | +| F18 | Leverage-set and order-submit failures share error handler | Adapter | Low | +| F19 | `_events_from_submit` stale snapshot fallback → wrong fill detection | Venue | Medium | +| F20 | `__del__` frees Rust handle at unpredictable GC time — no explicit close() | Bridge | **Medium** | +| F21 | `DITAv2LauncherBundle.close()` closes venue before kernel is done | Launcher | Low | +| F22 | Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware | Launcher | **Medium** | +| F23 | `VenueEvent.size` = `intent.target_size` not actual fill | Venue | Info | +| F24 | `asyncio.run()` inside async function in test generator | Test | Low | +| F25 | `_build_fresh_kernel_from_slot` leaks old kernel objects per call | Test | Low | +| F26 | `seen_event_ids` not cleared on re-entry — accumulates across trades | Rust | Low | +| F27 | `RealZincControlPlane.read()` parses Zinc region every call — no caching | Control | Low | +| F28 | `_legacy_intent` hardcodes confidence=1.0, bars_held=0 | Venue | Info | +| F29 | `_slot_to_payload` in real_zinc_plane.py is dead code | Zinc | Info | +| F30 | Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py | Zinc | Low | + +--- + +## G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4) + +*Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."* + +| # | Flaw | Layer | Severity | +|---|------|-------|----------| +| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType enum | Rust | **Critical** | +| G2 | `into_c_string` unwrap() panics on NUL byte in FFI string | Rust | **Critical** | +| G3 | EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition | Rust | **Critical** | +| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** | +| G5 | `realized_pnl` unbounded f64 — overflows to inf at extreme values | Rust | **High** | +| G6 | `mark_price` produces unbounded unrealized_pnl — no result guard | Rust | **High** | +| G7 | ENTER no is_finite() guard on target_size | Rust | **High** | +| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** | +| G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** | +| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** | +| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium | +| G12 | No max leverage cap enforced by kernel | Rust | Medium | +| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium | +| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium | +| G15 | Zero `__post_init__` validators on all 16 config dataclasses (127 fields) | Config | **High** | +| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info | +| G17 | String config fields — Zinc region injection risk | Config | Medium | +| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low | +| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low | +| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low | +| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** | +| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium | +| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium | +| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium | +| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low | +| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low | +| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** | +| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** | +| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** | +| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** | +| G31 | Hazelcast projection never closed | Lifecycle | Medium | +| G32 | _maybe_close() break skips second method | Lifecycle | Low | +| G33 | close() not idempotent for RealZinc components | Lifecycle | Low | +| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low | +| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info | +| G36 | Only one try/finally in entire codebase | Lifecycle | **High** | + +--- + +## A-Series: Architectural Flaws (detailed writeups) + +*These are the original architectural flaws with full analysis.* + +--- + +### Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size + +**Location:** `_rust_kernel/src/lib.rs` lines ~770-780 (EXIT handler in `process_intent`) + +**Severity:** **High** + +**Nature:** Logic error — wrong base for exit-size computation. + +### Downstream effect + +The EXIT handler computes the exit size as `base_size * exit_ratio` where: +```rust +let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size }; +``` + +After partial fills (e.g., two separate MARKET exit legs), `initial_size` is still the +**original** entry size while `slot.size` has been reduced by previous legs. If the +cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the +computed exit size can exceed the remaining position. + +The venue adapter clamps to actual position via `reduceOnly`, but the kernel's _own_ +accounting reduces `slot.size` by the fill size, not by the intended exit size. The +slot can therefore go negative (`slot.size < 0`) if the fill is larger than remaining. + +### Exact trigger + +1. Enter SHORT, size=1.0, `initial_size=1.0`, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0 +2. EXIT leg 0: `exit_size = 1.0 * 0.6 = 0.6`. Fill consumes 0.6. Slot size goes to 0.4. +3. EXIT leg 1: `exit_size = 1.0 * 0.6 = 0.6`. But remaining is 0.4. Requests 0.6. +4. BingX `reduceOnly` clamps fill to 0.4. Slot size goes to 0.0. +5. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`. Slot is already at 0.0. + Kernel returns `NO_OPEN_POSITION` — the final EXIT is rejected because `slot.closed` + was not set by the previous fill (it was a partial close, not terminal). +6. Slot is at size=0.0, `!slot.closed`, no active orders, but `!slot.is_free()` because + `size <= 0.0` is true but `fsm_state != IDLE/CLOSED` — slot is **stuck** in + `POSITION_OPEN` with zero size. + +This is **not** purely a mis-sized ratio problem. With MARKET orders that fill fully, +even correct ratios can leave the slot stuck if the fill price differs from the +intended-size price and the venue adjusts fill quantity. + +### Fix strategy + +Use `slot.size` directly as the base (not `initial_size`): +```rust +let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size); +``` + +This guarantees the exit never requests more than the remaining position, regardless +of cumulative ratio math. The venue still clamps, but the kernel's intent is correct. + +--- + +### Flaw A2: Misleading CANCEL diagnostic code on entry-only slots + +**Location:** `_rust_kernel/src/lib.rs` lines ~798-810 (CANCEL rejection path) + +**Severity:** **Low** + +**Nature:** Diagnostic pollution — wrong error code. + +### Downstream effect + +When a CANCEL intent arrives and **neither** `active_exit_order` nor +`active_entry_order` is cancellable, the kernel returns: +```rust +diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER +``` + +But the reason may be that there's no active entry order either, or the FSM state +doesn't permit cancellation. The diagnostic name suggests an exit-order-specific +problem when the failure is generic "nothing to cancel." + +### Fix + +Change to a generic `NO_ACTIVE_ORDER` diagnostic or `SLOT_IDLE` when the slot is +already in IDLE. `NO_ACTIVE_EXIT_ORDER` is misleading for a slot that has never had +any order. + +--- + +### Flaw A3: Float-accumulated slot.size after partial fills can go negative + +**Location:** `_rust_kernel/src/lib.rs` lines ~1365-1370 (apply_fill exit path) + +**Severity:** **Low** + +**Nature:** Numerical precision edge case. + +### Code path + +```rust +slot.size = (slot.size - fill_size).max(0.0); +``` + +This clamps to zero, which is correct. But if the venue fills *more* than requested +(on BingX, this can happen with market orders where the fill walks the book), the +slot sees `fill_size > intended_size`. The `max(0.0)` prevents negative, but the +slot then reports `size=0.0` with `!closed` and an FSM state that's not IDLE. + +The `is_free()` check requires `size <= 0.0` AND `fsm_state in {IDLE, CLOSED}`. A +slot with `size=0.0` and `fsm_state=POSITION_OPEN` is stuck — no EXIT will be +accepted and no ENTER can start. + +### Trigger + +Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the +book, overshoots). `fill_size=0.8`, `slot.size = (0.6 - 0.8).max(0.0) = 0.0`. +Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), `closed=false`. + +### Fix + +When `slot.size <= 1e-12` after a fill and the slot is in an exit-related state, +force transition to CLOSED/IDLE regardless of leg index: +```rust +if slot.size <= 1e-12 { + slot.closed = true; + slot.fsm_state = TradeStage::CLOSED; + slot.active_exit_order = None; + slot.active_entry_order = None; + return; +} +``` + +--- + +### Flaw A4: Entry price is clobbered by mark_price if called before fill arrives + +**Location:** `_rust_kernel/src/lib.rs` lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch) + +**Severity:** **Medium** + +**Nature:** Accounting accuracy — incorrect PnL base. + +### Code path + +```rust +// In mark_price: +if self.entry_price <= 0.0 { + self.entry_price = price; // Seeds entry_price from mark before fill +} + +// In apply_fill (entry): +if event.price > 0.0 { + slot.entry_price = event.price; // Overwrites with actual fill price +} +``` + +The `mark_price` path seeds `entry_price` from a market price when the slot has no +fill yet. The `apply_fill` entry path correctly overwrites with the actual fill price. +So in the normal flow this is harmless — the fill overwrites the mark. + +**However**, consider this sequence: +1. ENTER intent accepted → slot goes `ORDER_REQUESTED`, `entry_price = 0.0` +2. `runtime.step()` calls `kernel.mark_price(snapshot.symbol, snapshot.price)` → sets `entry_price = 100.0` +3. `on_venue_event(ORDER_ACK)` → `ENTRY_WORKING`, `entry_price` still `100.0` +4. `on_venue_event(PARTIAL_FILL)` → `apply_fill` sets `entry_price = 99.5` (fill price) +5. Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5 + +This is a transient mis-valuation window. It corrects itself on the next `observe_slots` +call, but intra-step readers see wrong unrealized PnL. Not critical because: +- `account.snapshot.unrealized_pnl` uses the slot's `unrealized_pnl`, not the mark +- Realized PnL is computed from actual fill prices +- The window lasts at most one scan cycle (~5s) + +### Fix + +Don't set `entry_price` from `mark_price` when there's no fill: +```rust +fn mark_price(&mut self, price: f64) { + if !price.is_finite() || price <= 0.0 { return; } + // Don't seed entry_price — leave it at 0.0 until a fill arrives + if self.entry_price <= 0.0 || self.size <= 0.0 { + self.unrealized_pnl = 0.0; + return; + } + // ... normal PnL computation +} +``` + +--- + +### Flaw A5: Capital-before computation is arithmetic not snapshot-based + +**Location:** `pink_clickhouse.py` lines ~761-762 (`_write_trade_exit_leg`) and ~822-823 (`_write_trade_event`) + +**Severity:** **High** + +**Nature:** Accounting accuracy — wrong capital_before under multi-slot or intervening events. + +### Code pattern (appears in two places) + +```python +capital_after = self._capital() +capital_before = capital_after - pnl_leg # In _write_trade_exit_leg +capital_before = capital_after - pnl # In _write_trade_event +``` + +This reconstructs `capital_before` by subtracting the current leg's PnL from the +current capital. This is **only correct** if: +1. No other slots settled PnL between this leg and the previous one +2. No capital corrections (reconcile, manual override) happened between legs +3. No fees were deducted between legs + +With multi-slot (PINK configurable `max_slots > 1`), a concurrent trade on slot 1 +that closes between slot 0's exit legs will have its PnL baked into `capital_after`, +making `capital_before = capital_after - pnl_leg` wrong. + +### Fix + +Maintain a per-trade `capital_before_leg` snapshot taken at the moment of the +first fill event for each trade, advancing it by the realized PnL of each leg: +```python +self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg) +self._leg_state[trade_id]["capital_after"] = capital_after +``` + +And use `prev["capital_before"]` for the row, not `capital_after - pnl_leg`. + +--- + +### Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange + +**Location:** `pink_direct.py` lines ~597-630 (`recover_account`) and docstring of `reconcile_account` + +**Severity:** **Medium** + +**Nature:** Operational drift — capital is never verified against exchange truth in hot loop. + +### The gap + +`reconcile_account()` (line 632) has this docstring: +``` +Periodic exchange-led account sync. +Capital is re-seeded from the exchange balance as a guard against long-running drift +``` + +But the actual implementation: +```python +async def reconcile_account(self, ...) -> dict[str, Any]: + return await self.recover_account(...) + +async def recover_account(self, ...) -> dict[str, Any]: + capital = float(self.kernel.account.snapshot.capital or 25000.0) + _reconcile_position_slot(self.kernel, capital, slot_id=0) +``` + +It passes the **kernel's own capital** to `_reconcile_position_slot`, which then +overwrites `kernel.account.snapshot.capital` with... the same value. No exchange +balance poll ever overwrites capital. + +`connect()` at line 224 does the same — it passes `initial_capital` (an env default), +not the exchange balance. The exchange balance is never read for capital seeding +in the current code path. `_reconcile_position_slot` does call +`venue.open_positions()`, but it only reads positions, not capital. + +### Effect + +Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation, +or any other systematic error) accumulates monotonically. There is no mechanism to +detect or correct drift. Over weeks of live trading, the kernel's capital snapshot +can diverge arbitrarily from the exchange's actual balance. + +### Fix + +Either: +1. Make `_reconcile_position_slot` read the exchange balance and use it for + capital reseeding (the docstring claims it does this already), or +2. Add a separate capital-verification path that surfaces the delta between + kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct. + +--- + +### Flaw A7: No fee tracking in kernel accounting + +**Location:** `rust_backend.py` lines ~540-545 (on_venue_event settle), `bingx_direct.py` submit_intent return + +**Severity:** **Medium** + +**Nature:** Accounting accuracy — fees are invisible to capital tracking. + +### Downstream effect + +When a trade closes, the kernel computes: +```rust +realized_pnl = delta * notional +``` + +This is **gross** PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%). +These fees are never subtracted from the kernel's realized PnL. Over 100 trades with +$100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000 +trades at 10x leverage and $50k average notional, the error is $200k. + +The `BingxDirectExecutionAdapter` does return `ExecutionReceipt` with fill data, +but `bingx_venue._events_from_submit()` only reads `price` and `filled_size` — +commission/fee fields are ignored. + +### Fix + +1. Read fee/commission from the BingX ack payload in `_events_from_submit()` +2. Pass fees through `VenueEvent.metadata["fee"]` +3. In the Rust kernel's `apply_fill`, subtract the fee from realized PnL: + ```rust + let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0); + slot.realized_pnl += realized - fee; + ``` + +--- + +### Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input + +**Location:** `_rust_kernel/src/lib.rs` lines ~745-748 + +**Severity:** **Low** + +**Nature:** Silent fallback — corrupt input produces a trade, not a rejection. + +```rust +slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 { + intent.leverage +} else { + 1.0 +}; +``` + +A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of +rejecting the intent. The Python bridge does validate `_first_invalid_intent_field()` +which catches NaN/inf, but it doesn't catch `leverage <= 0.0` (it only checks +`not math.isfinite(value)`). + +### Fix + +Add `leverage <= 0.0` to the Python bridge's invalid-intent check. The Rust kernel +should still have the `1.0` fallback as a defensive measure, but the bridge should +prevent bad leverages from reaching Rust in the first place. + +--- + +### Flaw A9: Mock venue submit condition convoluted — dead code paths + +**Location:** `mock_venue.py` lines ~60-90 + +**Severity:** **Informational** + +**Nature:** Code clarity — confusing condition logic. + +```python +if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit: + events.append(ack_event) +if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0: + # ... fill events +``` + +The condition logic is confusing: +- When `emit_ack_before_fill=True` and `emit_fill_on_submit=True`: both branches run → ACK + fill +- When `emit_ack_before_fill=False` and `emit_fill_on_submit=True`: first branch runs because + `not True = False`, so `False or False = False` → no ACK. Second branch runs → fill only. + This produces a fill without an ACK, which is **not** a realistic venue scenario. +- When `partial_fill_ratio=1.0` (default): second branch runs and emits a `FULL_FILL` event + even when `emit_fill_on_submit=False`, because `0.0 or 1.0 > 0 = True`. + +The partial fill ratio check should be gated on `emit_fill_on_submit`: +```python +should_emit_fill = self.scenario.emit_fill_on_submit or ( + is_entry and self.scenario.entry_partial_fill_ratio > 0 +) or ( + not is_entry and self.scenario.exit_partial_fill_ratio > 0 +) +``` + +--- + +### Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows + +**Location:** `pink_direct.py` lines ~318-374 (`pump_venue_events`), called at line ~436 + +**Severity:** **Medium** + +**Nature:** Operational overhead — unnecessary exchange HTTP calls. + +### The problem + +`step()` calls `pump_venue_events()` **every cycle**, which calls `venue.reconcile()`. +For `BingxVenueAdapter`, `reconcile()` calls `_backend_snapshot()` which does up to 5 +HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow +where orders fill synchronously within `process_intent()`, there are **no** late fills +to drain. + +On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each +`pump_venue_events()` call consumes 5+ of that budget. At a 5-second policy cycle, +this is 60 requests/minute — 60% of the rate budget — just to poll for fills that +don't exist. + +### Fix + +Gate the pump on whether the previous cycle submitted a LIMIT order: +```python +self._has_resting_order = any( + o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED) + for o in kernel.open_orders() +) +if self._has_resting_order: + await self.pump_venue_events(snapshot, market_state=market_state) +``` + +Or add a config flag `async_fill_mode: bool = False`. + +--- + +### Flaw A11: VenueAdapter.submit() blocks the event loop + +**Location:** `bingx_venue.py` lines ~225-233 (`_run`) + +**Severity:** **Medium** + +**Nature:** Runtime safety — synchronous call in async context. + +```python +def _run(self, result: Any) -> Any: + if inspect.isawaitable(result): + try: + asyncio.get_running_loop() + except RuntimeError: + return asyncio.run(result) + pool = self._get_executor() + return pool.submit(asyncio.run, result).result() +``` + +When called from `step()` (which is an async function), `_run` submits the async +`submit_intent()` to a thread pool, runs it with `asyncio.run()`, then calls +`.result()` which blocks the current thread until complete. The BingX HTTP call +can take 1-5 seconds depending on network latency and exchange load. + +During this block, the event loop **cannot** process other async tasks (data feed +updates, health checks, signal processing). In a single-runtime deployment, this +stalls the entire policy cycle. + +### Fix + +Make `process_intent` in `ExecutionKernel` accept an async venue callback, or +make `BingxVenueAdapter` truly async (not sync-with-thread-bridge). For now, +at minimum the PINK runtime should run `step()` in an executor to avoid blocking +the main event loop. + +--- + +### Flaw A12: Stale KernelStateView slot references after reconcile + +**Location:** `rust_backend.py` lines ~350-365 (`KernelStateView.refresh`) + +**Severity:** **Low** + +**Nature:** Stale data — view not rebuilt on reconcile. + +```python +class KernelStateView: + def __init__(self, kernel): + self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)] + # ... + + def refresh(self) -> None: + snapshot = self._kernel._snapshot_backend() + self.active_trade_index = dict(snapshot.get("active_trade_index", {})) + self.venue_order_index = dict(snapshot.get("venue_order_index", {})) + self.client_order_index = dict(snapshot.get("client_order_index", {})) +``` + +`refresh()` updates the index maps but does **not** recreate `self.slots`. The slot +views in `self.slots` are live proxies (they read through `_get_slot` each time), +so slot data is current. But if `max_slots` changes (it shouldn't, but it's mutable) +or if slots are re-indexed by a reconcile, the view list is wrong. + +Not critical because `max_slots` is set at init and never changes, but worth +fixing for robustness. + +--- + +### Flaw A13: `persist_fill_events` uses current price as exit price + +**Location:** `pink_clickhouse.py` lines ~408 + +**Severity:** **Low** + +**Nature:** Historical accuracy — logged price may not match fill price. + +```python +price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list + if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot) +``` + +This correctly reads from the event's price. But `decision.reference_price` at line +417 falls back to this price, which is the fill price. The trade_event row at line +835 uses `exit_price = slot_dict.get("entry_price", ...)` — which is the **entry** +price, not the exit price. The trade_event always shows exit_price == entry_price. + +This means `trade_events` in ClickHouse will never show a realistic exit price +for the persisted trade, breaking any PnL reconstruction that relies on +`(exit_price - entry_price) * size * leverage`. + +--- + +### Flaw A14: `_write_position_state` maps active_leg_index to entry_bar + +**Location:** `pink_clickhouse.py` line ~673 + +**Severity:** **Low** + +**Nature:** Semantic mismatch — wrong field mapping. + +```python +"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0), +``` + +`active_leg_index` is the index into the exit-leg-ratios array (which leg is being +exited next). It has nothing to do with how many bars the position has been held. +When a position opens, `active_leg_index` is 0. After the first exit leg, it +advances to 1. Neither value is a bar count. + +`entry_bar` should be `bars_held` from the intent/decision, or a computed value +from `entry_time` to now. + +--- + +### Flaw A15: `persist_recovery_state` passes account dict as slot dict + +**Location:** `pink_clickhouse.py` lines ~447-460 + +**Severity:** **Low** + +**Nature:** Wrong data — account snapshot used where slot data is expected. + +```python +def persist_recovery_state(self, *, snapshot, acc_dict, ...): + slot_dict = acc_dict or {} # ← acc_dict is an account snapshot, not a slot + self._write_position_state(..., slot_dict={}, ...) # ← correctly uses empty dict + self._write_trade_reconstruction( + snapshot, + trade_id=acc_dict.get("trade_id", "") if acc_dict else "", + # acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key + ) +``` + +The `trade_id` in the trade_reconstruction row will always be `""` because +`acc_dict` comes from `kernel.snapshot()["account"]` which has keys `capital`, +`equity`, `realized_pnl`, etc. — not `trade_id`. This means the recovery +`trade_reconstruction` row has no trade_id linkage.