PINK: centralize all flaw findings in FLAW_ANALYSIS doc

Rewrite PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md as the central registry with combined catalog (A+T+E+F+G = 116 flaws), severity distribution, and cross-references to the TRACE doc for deep E, F, G detail. Add reciprocal cross-reference in TRACE doc header. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 16:31:23 +02:00
parent d9dd54c24e
commit 9128ab799e
2 changed files with 719 additions and 0 deletions
--- a/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md
+++ b/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md
@@ -0,0 +1,714 @@
+# PINK DITAv2 — Structural Flaw Analysis (CENTRAL)
+
+**Analysis date:** 2026-05-31
+**Scope:** Full PINK pipeline — all flaws across all modules.
+**Sources:**
+- This file (A-series): Detailed writeups for architectural flaws.
+- [PINK_DITAv2_E2E_TRACE_ANALYSIS.md](./PINK_DITAv2_E2E_TRACE_ANALYSIS.md) (E, F, G-series):
+  Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans.
+  Every E, F, G entry below is a summary only — full detail is in the TRACE doc.
+
+---
+
+## Combined Catalog (All Flaws, All Passes)
+
+| Pass | Focus | Count | Critical | High | Medium | Low | Info |
+|------|-------|-------|----------|------|--------|-----|------|
+| A | Architectural (detailed in this file) | 15 | 0 | 2 | 0 | 2 | 11 |
+| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
+| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
+| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
+| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
+| **Total** | | **116** | **5** | **21** | **32** | **40** | **18** |
+
+---
+
+## T-Series: Threading & Atomicity Flaws
+
+*Full detail in TRACE doc under "Threading & Atomicity" section.*
+
+| # | Flaw | Layer | Severity |
+|---|------|-------|----------|
+| T1 | `InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy | Zinc | **Critical** |
+| T2 | Thread-unsafe kernel snapshot capture for account | Bridge | **High** |
+| T3 | Re-entrant or incorrectly-scoped Rust-kernel handle usage | Bridge | **High** |
+| T4 | Consequence: `on_venue_event` PnL settle races | Bridge | **High** |
+| T5 | Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls | Zinc | Medium |
+| T6 | `_write_region` buffer zero + notify race with concurrent reader | Zinc | Medium |
+| T7 | Publication of events in `process_intent` loop not synchronized with persist | Bridge | Medium |
+| T8 | `asyncio.run` executor skip in `_run` leads to event-loop stall | Venue | Low |
+| T9 | No thread-safe Python↔Rust ownership / lifetime protocol | Bridge | Low |
+
+---
+
+## E-Series: E2E Data-Flow Flaws (Pass 1)
+
+*Full detail in TRACE doc under "Layer 1" through "Layer 9."*
+
+| # | Flaw | Layer | Severity |
+|---|------|-------|----------|
+| E1 | `step()` calls `pump_venue_events()` every cycle unconditionally | Runtime | **High** |
+| E2 | `kernel.snapshot()["account"]` returns a fresh dict, not a live view | Bridge | Low |
+| E3 | `_decision_to_kernel_intent` drops `order_type` and `limit_price` | Runtime | **High** |
+| E4 | `_exit_intent_from_slot` trusts slot.size but slot may be stale | Runtime | **High** |
+| E5 | JSON serialization round-trip loses numeric precision | Bridge | Low |
+| E6 | `_RustKernelLib` is a global singleton — shared across all kernels | Bridge | Low |
+| E7 | ENTER handler silently allows re-entry with same trade_id | Rust | **High** |
+| E8 | EXIT handler uses `initial_size` not current size | Rust | **High** |
+| E9 | CANCEL handler returns diagnostic even when nothing happened | Rust | Low |
+| E10 | `apply_fill` entry branch double-sets `active_entry_order` | Rust | Low |
+| E11 | `_legacy_intent()` is a lossy conversion | Venue | Low |
+| E12 | `_events_from_submit()` price fallback chain can lose venue price | Venue | Low |
+| E13 | `_backend_snapshot()` timeout returns stale data | Venue | Medium |
+| E14 | `_events_from_cancel` uses stale `slot_id` from order metadata | Venue | Low |
+| E15 | Submit sets leverage via separate HTTP call | Adapter | Medium |
+| E16 | `_format_quantity`/`_format_price` may use zero tick/step | Adapter | Medium |
+| E17 | Cancel uses truth-based confirmation — can mask real errors | Adapter | Medium |
+| E18 | `on_venue_event` settles PnL incrementally — fees never included | Bridge | Medium |
+| E19 | `observe_slots` called with ALL slots, not just changed ones | Bridge | Low |
+| E20 | `_capital()` reads live from `AccountProjection` — stale row risk | Persistence | Low |
+| E21 | `persist_fill_events()` synthesizes fake Decision/Intent | Persistence | Medium |
+| E22 | `_write_trade_exit_leg` capital_before uses arithmetic reconstruction | Persistence | Medium |
+| E23 | `_write_trade_event` uses entry_price as exit_price | Persistence | Medium |
+| E24 | Mock venue always emits fill on `partial_fill_ratio > 0` | Test | Low |
+| E25 | Test scenarios use MARKET-only `_si()` helper — no LIMIT tests | Test | Low |
+| E26 | Fresh-kernel reconcile tests create second kernel but share venue | Test | Low |
+
+---
+
+## F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)
+
+*Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."*
+
+| # | Flaw | Layer | Severity |
+|---|------|-------|----------|
+| F1 | CANCEL returns "accepted" before cancel happens — stale diagnostic_code | Bridge | Medium |
+| F2 | `_last_settled_pnl` reset before `venue.submit()` — transient window | Bridge | Medium |
+| F3 | `_first_invalid_intent_field` allows `leverage=0` and `target_size=0` | Bridge | Low |
+| F4 | `outcome.emitted_events` only from venue — Rust kernel events dropped | Bridge | Low |
+| F5 | `on_venue_event` redundant FFI read of slot already returned by Rust | Bridge | Low |
+| F6 | `process_intent` records pre-venue transitions with `event=None` | Bridge | Info |
+| F7 | `reconcile_from_slots` writes ALL slots to projection/zinc | Bridge | Low |
+| F8 | `HazelcastRowWriter.put()` synchronous, no error handling — crashes intent | Projection | **Medium** |
+| F9 | `RealZincPlane.write_slot()` serializes ALL slots, not just changed one | Zinc | Low |
+| F10 | `RealZincPlane` zeros buffer before write — concurrent read sees empty | Zinc | Low |
+| F11 | `RealZincPlane._write_region` no partial-write recovery | Zinc | Low |
+| F12 | `InMemoryZincPlane` intent_region grows without bound | Zinc | Low |
+| F13 | `InMemoryZincPlane` uses non-re-entrant `threading.Condition` | Zinc | Low |
+| F14 | `KernelSlotView.__setattr__` round-trips unknown fields — silently dropped | Bridge | Low |
+| F15 | `on_venue_event` loop stops on first exception — slot left in partial state | Bridge | **High** |
+| F16 | `venue.submit()` returning empty events leaves slot in ORDER_REQUESTED | Bridge | Medium |
+| F17 | Cancel truth-based confirmation returns REJECTED for already-cancelled orders | Adapter | Medium |
+| F18 | Leverage-set and order-submit failures share error handler | Adapter | Low |
+| F19 | `_events_from_submit` stale snapshot fallback → wrong fill detection | Venue | Medium |
+| F20 | `__del__` frees Rust handle at unpredictable GC time — no explicit close() | Bridge | **Medium** |
+| F21 | `DITAv2LauncherBundle.close()` closes venue before kernel is done | Launcher | Low |
+| F22 | Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware | Launcher | **Medium** |
+| F23 | `VenueEvent.size` = `intent.target_size` not actual fill | Venue | Info |
+| F24 | `asyncio.run()` inside async function in test generator | Test | Low |
+| F25 | `_build_fresh_kernel_from_slot` leaks old kernel objects per call | Test | Low |
+| F26 | `seen_event_ids` not cleared on re-entry — accumulates across trades | Rust | Low |
+| F27 | `RealZincControlPlane.read()` parses Zinc region every call — no caching | Control | Low |
+| F28 | `_legacy_intent` hardcodes confidence=1.0, bars_held=0 | Venue | Info |
+| F29 | `_slot_to_payload` in real_zinc_plane.py is dead code | Zinc | Info |
+| F30 | Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py | Zinc | Low |
+
+---
+
+## G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)
+
+*Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."*
+
+| # | Flaw | Layer | Severity |
+|---|------|-------|----------|
+| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType enum | Rust | **Critical** |
+| G2 | `into_c_string` unwrap() panics on NUL byte in FFI string | Rust | **Critical** |
+| G3 | EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition | Rust | **Critical** |
+| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** |
+| G5 | `realized_pnl` unbounded f64 — overflows to inf at extreme values | Rust | **High** |
+| G6 | `mark_price` produces unbounded unrealized_pnl — no result guard | Rust | **High** |
+| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
+| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
+| G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** |
+| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
+| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
+| G12 | No max leverage cap enforced by kernel | Rust | Medium |
+| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
+| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
+| G15 | Zero `__post_init__` validators on all 16 config dataclasses (127 fields) | Config | **High** |
+| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
+| G17 | String config fields — Zinc region injection risk | Config | Medium |
+| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
+| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
+| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
+| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
+| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
+| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
+| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
+| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
+| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
+| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
+| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
+| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
+| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
+| G31 | Hazelcast projection never closed | Lifecycle | Medium |
+| G32 | _maybe_close() break skips second method | Lifecycle | Low |
+| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
+| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
+| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
+| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |
+
+---
+
+## A-Series: Architectural Flaws (detailed writeups)
+
+*These are the original architectural flaws with full analysis.*
+
+---
+
+### Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size
+
+**Location:** `_rust_kernel/src/lib.rs` lines ~770-780 (EXIT handler in `process_intent`)
+
+**Severity:** **High**
+
+**Nature:** Logic error — wrong base for exit-size computation.
+
+### Downstream effect
+
+The EXIT handler computes the exit size as `base_size * exit_ratio` where:
+```rust
+let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
+```
+
+After partial fills (e.g., two separate MARKET exit legs), `initial_size` is still the
+**original** entry size while `slot.size` has been reduced by previous legs. If the
+cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the
+computed exit size can exceed the remaining position.
+
+The venue adapter clamps to actual position via `reduceOnly`, but the kernel's _own_
+accounting reduces `slot.size` by the fill size, not by the intended exit size. The
+slot can therefore go negative (`slot.size < 0`) if the fill is larger than remaining.
+
+### Exact trigger
+
+1. Enter SHORT, size=1.0, `initial_size=1.0`, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0
+2. EXIT leg 0: `exit_size = 1.0 * 0.6 = 0.6`. Fill consumes 0.6. Slot size goes to 0.4.
+3. EXIT leg 1: `exit_size = 1.0 * 0.6 = 0.6`. But remaining is 0.4. Requests 0.6.
+4. BingX `reduceOnly` clamps fill to 0.4. Slot size goes to 0.0.
+5. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`. Slot is already at 0.0.
+   Kernel returns `NO_OPEN_POSITION` — the final EXIT is rejected because `slot.closed`
+   was not set by the previous fill (it was a partial close, not terminal).
+6. Slot is at size=0.0, `!slot.closed`, no active orders, but `!slot.is_free()` because
+   `size <= 0.0` is true but `fsm_state != IDLE/CLOSED` — slot is **stuck** in
+   `POSITION_OPEN` with zero size.
+
+This is **not** purely a mis-sized ratio problem. With MARKET orders that fill fully,
+even correct ratios can leave the slot stuck if the fill price differs from the
+intended-size price and the venue adjusts fill quantity.
+
+### Fix strategy
+
+Use `slot.size` directly as the base (not `initial_size`):
+```rust
+let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);
+```
+
+This guarantees the exit never requests more than the remaining position, regardless
+of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.
+
+---
+
+### Flaw A2: Misleading CANCEL diagnostic code on entry-only slots
+
+**Location:** `_rust_kernel/src/lib.rs` lines ~798-810 (CANCEL rejection path)
+
+**Severity:** **Low**
+
+**Nature:** Diagnostic pollution — wrong error code.
+
+### Downstream effect
+
+When a CANCEL intent arrives and **neither** `active_exit_order` nor
+`active_entry_order` is cancellable, the kernel returns:
+```rust
+diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER
+```
+
+But the reason may be that there's no active entry order either, or the FSM state
+doesn't permit cancellation. The diagnostic name suggests an exit-order-specific
+problem when the failure is generic "nothing to cancel."
+
+### Fix
+
+Change to a generic `NO_ACTIVE_ORDER` diagnostic or `SLOT_IDLE` when the slot is
+already in IDLE. `NO_ACTIVE_EXIT_ORDER` is misleading for a slot that has never had
+any order.
+
+---
+
+### Flaw A3: Float-accumulated slot.size after partial fills can go negative
+
+**Location:** `_rust_kernel/src/lib.rs` lines ~1365-1370 (apply_fill exit path)
+
+**Severity:** **Low**
+
+**Nature:** Numerical precision edge case.
+
+### Code path
+
+```rust
+slot.size = (slot.size - fill_size).max(0.0);
+```
+
+This clamps to zero, which is correct. But if the venue fills *more* than requested
+(on BingX, this can happen with market orders where the fill walks the book), the
+slot sees `fill_size > intended_size`. The `max(0.0)` prevents negative, but the
+slot then reports `size=0.0` with `!closed` and an FSM state that's not IDLE.
+
+The `is_free()` check requires `size <= 0.0` AND `fsm_state in {IDLE, CLOSED}`. A
+slot with `size=0.0` and `fsm_state=POSITION_OPEN` is stuck — no EXIT will be
+accepted and no ENTER can start.
+
+### Trigger
+
+Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the
+book, overshoots). `fill_size=0.8`, `slot.size = (0.6 - 0.8).max(0.0) = 0.0`.
+Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), `closed=false`.
+
+### Fix
+
+When `slot.size <= 1e-12` after a fill and the slot is in an exit-related state,
+force transition to CLOSED/IDLE regardless of leg index:
+```rust
+if slot.size <= 1e-12 {
+    slot.closed = true;
+    slot.fsm_state = TradeStage::CLOSED;
+    slot.active_exit_order = None;
+    slot.active_entry_order = None;
+    return;
+}
+```
+
+---
+
+### Flaw A4: Entry price is clobbered by mark_price if called before fill arrives
+
+**Location:** `_rust_kernel/src/lib.rs` lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)
+
+**Severity:** **Medium**
+
+**Nature:** Accounting accuracy — incorrect PnL base.
+
+### Code path
+
+```rust
+// In mark_price:
+if self.entry_price <= 0.0 {
+    self.entry_price = price;  // Seeds entry_price from mark before fill
+}
+
+// In apply_fill (entry):
+if event.price > 0.0 {
+    slot.entry_price = event.price;  // Overwrites with actual fill price
+}
+```
+
+The `mark_price` path seeds `entry_price` from a market price when the slot has no
+fill yet. The `apply_fill` entry path correctly overwrites with the actual fill price.
+So in the normal flow this is harmless — the fill overwrites the mark.
+
+**However**, consider this sequence:
+1. ENTER intent accepted → slot goes `ORDER_REQUESTED`, `entry_price = 0.0`
+2. `runtime.step()` calls `kernel.mark_price(snapshot.symbol, snapshot.price)` → sets `entry_price = 100.0`
+3. `on_venue_event(ORDER_ACK)` → `ENTRY_WORKING`, `entry_price` still `100.0`
+4. `on_venue_event(PARTIAL_FILL)` → `apply_fill` sets `entry_price = 99.5` (fill price)
+5. Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5
+
+This is a transient mis-valuation window. It corrects itself on the next `observe_slots`
+call, but intra-step readers see wrong unrealized PnL. Not critical because:
+- `account.snapshot.unrealized_pnl` uses the slot's `unrealized_pnl`, not the mark
+- Realized PnL is computed from actual fill prices
+- The window lasts at most one scan cycle (~5s)
+
+### Fix
+
+Don't set `entry_price` from `mark_price` when there's no fill:
+```rust
+fn mark_price(&mut self, price: f64) {
+    if !price.is_finite() || price <= 0.0 { return; }
+    // Don't seed entry_price — leave it at 0.0 until a fill arrives
+    if self.entry_price <= 0.0 || self.size <= 0.0 {
+        self.unrealized_pnl = 0.0;
+        return;
+    }
+    // ... normal PnL computation
+}
+```
+
+---
+
+### Flaw A5: Capital-before computation is arithmetic not snapshot-based
+
+**Location:** `pink_clickhouse.py` lines ~761-762 (`_write_trade_exit_leg`) and ~822-823 (`_write_trade_event`)
+
+**Severity:** **High**
+
+**Nature:** Accounting accuracy — wrong capital_before under multi-slot or intervening events.
+
+### Code pattern (appears in two places)
+
+```python
+capital_after = self._capital()
+capital_before = capital_after - pnl_leg   # In _write_trade_exit_leg
+capital_before = capital_after - pnl        # In _write_trade_event
+```
+
+This reconstructs `capital_before` by subtracting the current leg's PnL from the
+current capital. This is **only correct** if:
+1. No other slots settled PnL between this leg and the previous one
+2. No capital corrections (reconcile, manual override) happened between legs
+3. No fees were deducted between legs
+
+With multi-slot (PINK configurable `max_slots > 1`), a concurrent trade on slot 1
+that closes between slot 0's exit legs will have its PnL baked into `capital_after`,
+making `capital_before = capital_after - pnl_leg` wrong.
+
+### Fix
+
+Maintain a per-trade `capital_before_leg` snapshot taken at the moment of the
+first fill event for each trade, advancing it by the realized PnL of each leg:
+```python
+self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
+self._leg_state[trade_id]["capital_after"] = capital_after
+```
+
+And use `prev["capital_before"]` for the row, not `capital_after - pnl_leg`.
+
+---
+
+### Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange
+
+**Location:** `pink_direct.py` lines ~597-630 (`recover_account`) and docstring of `reconcile_account`
+
+**Severity:** **Medium**
+
+**Nature:** Operational drift — capital is never verified against exchange truth in hot loop.
+
+### The gap
+
+`reconcile_account()` (line 632) has this docstring:
+```
+Periodic exchange-led account sync.
+Capital is re-seeded from the exchange balance as a guard against long-running drift
+```
+
+But the actual implementation:
+```python
+async def reconcile_account(self, ...) -> dict[str, Any]:
+    return await self.recover_account(...)
+
+async def recover_account(self, ...) -> dict[str, Any]:
+    capital = float(self.kernel.account.snapshot.capital or 25000.0)
+    _reconcile_position_slot(self.kernel, capital, slot_id=0)
+```
+
+It passes the **kernel's own capital** to `_reconcile_position_slot`, which then
+overwrites `kernel.account.snapshot.capital` with... the same value. No exchange
+balance poll ever overwrites capital.
+
+`connect()` at line 224 does the same — it passes `initial_capital` (an env default),
+not the exchange balance. The exchange balance is never read for capital seeding
+in the current code path. `_reconcile_position_slot` does call
+`venue.open_positions()`, but it only reads positions, not capital.
+
+### Effect
+
+Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation,
+or any other systematic error) accumulates monotonically. There is no mechanism to
+detect or correct drift. Over weeks of live trading, the kernel's capital snapshot
+can diverge arbitrarily from the exchange's actual balance.
+
+### Fix
+
+Either:
+1. Make `_reconcile_position_slot` read the exchange balance and use it for
+   capital reseeding (the docstring claims it does this already), or
+2. Add a separate capital-verification path that surfaces the delta between
+   kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.
+
+---
+
+### Flaw A7: No fee tracking in kernel accounting
+
+**Location:** `rust_backend.py` lines ~540-545 (on_venue_event settle), `bingx_direct.py` submit_intent return
+
+**Severity:** **Medium**
+
+**Nature:** Accounting accuracy — fees are invisible to capital tracking.
+
+### Downstream effect
+
+When a trade closes, the kernel computes:
+```rust
+realized_pnl = delta * notional
+```
+
+This is **gross** PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%).
+These fees are never subtracted from the kernel's realized PnL. Over 100 trades with
+$100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000
+trades at 10x leverage and $50k average notional, the error is $200k.
+
+The `BingxDirectExecutionAdapter` does return `ExecutionReceipt` with fill data,
+but `bingx_venue._events_from_submit()` only reads `price` and `filled_size` —
+commission/fee fields are ignored.
+
+### Fix
+
+1. Read fee/commission from the BingX ack payload in `_events_from_submit()`
+2. Pass fees through `VenueEvent.metadata["fee"]`
+3. In the Rust kernel's `apply_fill`, subtract the fee from realized PnL:
+   ```rust
+   let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
+   slot.realized_pnl += realized - fee;
+   ```
+
+---
+
+### Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input
+
+**Location:** `_rust_kernel/src/lib.rs` lines ~745-748
+
+**Severity:** **Low**
+
+**Nature:** Silent fallback — corrupt input produces a trade, not a rejection.
+
+```rust
+slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
+    intent.leverage
+} else {
+    1.0
+};
+```
+
+A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of
+rejecting the intent. The Python bridge does validate `_first_invalid_intent_field()`
+which catches NaN/inf, but it doesn't catch `leverage <= 0.0` (it only checks
+`not math.isfinite(value)`).
+
+### Fix
+
+Add `leverage <= 0.0` to the Python bridge's invalid-intent check. The Rust kernel
+should still have the `1.0` fallback as a defensive measure, but the bridge should
+prevent bad leverages from reaching Rust in the first place.
+
+---
+
+### Flaw A9: Mock venue submit condition convoluted — dead code paths
+
+**Location:** `mock_venue.py` lines ~60-90
+
+**Severity:** **Informational**
+
+**Nature:** Code clarity — confusing condition logic.
+
+```python
+if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
+    events.append(ack_event)
+if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
+    # ... fill events
+```
+
+The condition logic is confusing:
+- When `emit_ack_before_fill=True` and `emit_fill_on_submit=True`: both branches run → ACK + fill
+- When `emit_ack_before_fill=False` and `emit_fill_on_submit=True`: first branch runs because
+  `not True = False`, so `False or False = False` → no ACK. Second branch runs → fill only.
+  This produces a fill without an ACK, which is **not** a realistic venue scenario.
+- When `partial_fill_ratio=1.0` (default): second branch runs and emits a `FULL_FILL` event
+  even when `emit_fill_on_submit=False`, because `0.0 or 1.0 > 0 = True`.
+
+The partial fill ratio check should be gated on `emit_fill_on_submit`:
+```python
+should_emit_fill = self.scenario.emit_fill_on_submit or (
+    is_entry and self.scenario.entry_partial_fill_ratio > 0
+) or (
+    not is_entry and self.scenario.exit_partial_fill_ratio > 0
+)
+```
+
+---
+
+### Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows
+
+**Location:** `pink_direct.py` lines ~318-374 (`pump_venue_events`), called at line ~436
+
+**Severity:** **Medium**
+
+**Nature:** Operational overhead — unnecessary exchange HTTP calls.
+
+### The problem
+
+`step()` calls `pump_venue_events()` **every cycle**, which calls `venue.reconcile()`.
+For `BingxVenueAdapter`, `reconcile()` calls `_backend_snapshot()` which does up to 5
+HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow
+where orders fill synchronously within `process_intent()`, there are **no** late fills
+to drain.
+
+On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each
+`pump_venue_events()` call consumes 5+ of that budget. At a 5-second policy cycle,
+this is 60 requests/minute — 60% of the rate budget — just to poll for fills that
+don't exist.
+
+### Fix
+
+Gate the pump on whether the previous cycle submitted a LIMIT order:
+```python
+self._has_resting_order = any(
+    o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
+    for o in kernel.open_orders()
+)
+if self._has_resting_order:
+    await self.pump_venue_events(snapshot, market_state=market_state)
+```
+
+Or add a config flag `async_fill_mode: bool = False`.
+
+---
+
+### Flaw A11: VenueAdapter.submit() blocks the event loop
+
+**Location:** `bingx_venue.py` lines ~225-233 (`_run`)
+
+**Severity:** **Medium**
+
+**Nature:** Runtime safety — synchronous call in async context.
+
+```python
+def _run(self, result: Any) -> Any:
+    if inspect.isawaitable(result):
+        try:
+            asyncio.get_running_loop()
+        except RuntimeError:
+            return asyncio.run(result)
+        pool = self._get_executor()
+        return pool.submit(asyncio.run, result).result()
+```
+
+When called from `step()` (which is an async function), `_run` submits the async
+`submit_intent()` to a thread pool, runs it with `asyncio.run()`, then calls
+`.result()` which blocks the current thread until complete. The BingX HTTP call
+can take 1-5 seconds depending on network latency and exchange load.
+
+During this block, the event loop **cannot** process other async tasks (data feed
+updates, health checks, signal processing). In a single-runtime deployment, this
+stalls the entire policy cycle.
+
+### Fix
+
+Make `process_intent` in `ExecutionKernel` accept an async venue callback, or
+make `BingxVenueAdapter` truly async (not sync-with-thread-bridge). For now,
+at minimum the PINK runtime should run `step()` in an executor to avoid blocking
+the main event loop.
+
+---
+
+### Flaw A12: Stale KernelStateView slot references after reconcile
+
+**Location:** `rust_backend.py` lines ~350-365 (`KernelStateView.refresh`)
+
+**Severity:** **Low**
+
+**Nature:** Stale data — view not rebuilt on reconcile.
+
+```python
+class KernelStateView:
+    def __init__(self, kernel):
+        self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
+        # ...
+
+    def refresh(self) -> None:
+        snapshot = self._kernel._snapshot_backend()
+        self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
+        self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
+        self.client_order_index = dict(snapshot.get("client_order_index", {}))
+```
+
+`refresh()` updates the index maps but does **not** recreate `self.slots`. The slot
+views in `self.slots` are live proxies (they read through `_get_slot` each time),
+so slot data is current. But if `max_slots` changes (it shouldn't, but it's mutable)
+or if slots are re-indexed by a reconcile, the view list is wrong.
+
+Not critical because `max_slots` is set at init and never changes, but worth
+fixing for robustness.
+
+---
+
+### Flaw A13: `persist_fill_events` uses current price as exit price
+
+**Location:** `pink_clickhouse.py` lines ~408
+
+**Severity:** **Low**
+
+**Nature:** Historical accuracy — logged price may not match fill price.
+
+```python
+price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
+              if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)
+```
+
+This correctly reads from the event's price. But `decision.reference_price` at line
+417 falls back to this price, which is the fill price. The trade_event row at line
+835 uses `exit_price = slot_dict.get("entry_price", ...)` — which is the **entry**
+price, not the exit price. The trade_event always shows exit_price == entry_price.
+
+This means `trade_events` in ClickHouse will never show a realistic exit price
+for the persisted trade, breaking any PnL reconstruction that relies on
+`(exit_price - entry_price) * size * leverage`.
+
+---
+
+### Flaw A14: `_write_position_state` maps active_leg_index to entry_bar
+
+**Location:** `pink_clickhouse.py` line ~673
+
+**Severity:** **Low**
+
+**Nature:** Semantic mismatch — wrong field mapping.
+
+```python
+"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
+```
+
+`active_leg_index` is the index into the exit-leg-ratios array (which leg is being
+exited next). It has nothing to do with how many bars the position has been held.
+When a position opens, `active_leg_index` is 0. After the first exit leg, it
+advances to 1. Neither value is a bar count.
+
+`entry_bar` should be `bars_held` from the intent/decision, or a computed value
+from `entry_time` to now.
+
+---
+
+### Flaw A15: `persist_recovery_state` passes account dict as slot dict
+
+**Location:** `pink_clickhouse.py` lines ~447-460
+
+**Severity:** **Low**
+
+**Nature:** Wrong data — account snapshot used where slot data is expected.
+
+```python
+def persist_recovery_state(self, *, snapshot, acc_dict, ...):
+    slot_dict = acc_dict or {}   # ← acc_dict is an account snapshot, not a slot
+    self._write_position_state(..., slot_dict={}, ...)  # ← correctly uses empty dict
+    self._write_trade_reconstruction(
+        snapshot,
+        trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
+        # acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
+    )
+```
+
+The `trade_id` in the trade_reconstruction row will always be `""` because
+`acc_dict` comes from `kernel.snapshot()["account"]` which has keys `capital`,
+`equity`, `realized_pnl`, etc. — not `trade_id`. This means the recovery
+`trade_reconstruction` row has no trade_id linkage.