PINK: centralize all flaw findings in FLAW_ANALYSIS doc

Rewrite PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md as the central registry with combined catalog (A+T+E+F+G = 116 flaws), severity distribution, and cross-references to the TRACE doc for deep E, F, G detail. Add reciprocal cross-reference in TRACE doc header. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 16:31:23 +02:00
parent d9dd54c24e
commit 9128ab799e
2 changed files with 719 additions and 0 deletions
--- a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
+++ b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
@@ -6,6 +6,11 @@ boundary crossing in the PINK execution pipeline. No test execution.
 **System scope:** 34 active source files, ~12,000 lines across Rust kernel,
 Python bridge, venue adapter, runtime, and persistence.
 > **Central flaw registry:** [PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md](./PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md)
 > contains the combined catalog of all 116 flaws (A, T, E, F, G series) with
 > severity distribution and cross-references. This file provides the deep E2E
 > trace context — read the central registry for the master list.
 ---
 ## E2E Data Flow (One Call)
--- a/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md
+++ b/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md
@@ -0,0 +1,714 @@
 # PINK DITAv2 — Structural Flaw Analysis (CENTRAL)
 **Analysis date:** 2026-05-31
 **Scope:** Full PINK pipeline — all flaws across all modules.
 **Sources:**
 - This file (A-series): Detailed writeups for architectural flaws.
 - [PINK_DITAv2_E2E_TRACE_ANALYSIS.md](./PINK_DITAv2_E2E_TRACE_ANALYSIS.md) (E, F, G-series):
  Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans.
  Every E, F, G entry below is a summary only — full detail is in the TRACE doc.
 ---
 ## Combined Catalog (All Flaws, All Passes)
 | Pass | Focus | Count | Critical | High | Medium | Low | Info |
 |------|-------|-------|----------|------|--------|-----|------|
 | A | Architectural (detailed in this file) | 15 | 0 | 2 | 0 | 2 | 11 |
 | T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
 | E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
 | F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
 | G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
 | **Total** | | **116** | **5** | **21** | **32** | **40** | **18** |
 ---
 ## T-Series: Threading & Atomicity Flaws
 *Full detail in TRACE doc under "Threading & Atomicity" section.*
 | # | Flaw | Layer | Severity |
 |---|------|-------|----------|
 | T1 | `InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy | Zinc | **Critical** |
 | T2 | Thread-unsafe kernel snapshot capture for account | Bridge | **High** |
 | T3 | Re-entrant or incorrectly-scoped Rust-kernel handle usage | Bridge | **High** |
 | T4 | Consequence: `on_venue_event` PnL settle races | Bridge | **High** |
 | T5 | Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls | Zinc | Medium |
 | T6 | `_write_region` buffer zero + notify race with concurrent reader | Zinc | Medium |
 | T7 | Publication of events in `process_intent` loop not synchronized with persist | Bridge | Medium |
 | T8 | `asyncio.run` executor skip in `_run` leads to event-loop stall | Venue | Low |
 | T9 | No thread-safe Python↔Rust ownership / lifetime protocol | Bridge | Low |
 ---
 ## E-Series: E2E Data-Flow Flaws (Pass 1)
 *Full detail in TRACE doc under "Layer 1" through "Layer 9."*
 | # | Flaw | Layer | Severity |
 |---|------|-------|----------|
 | E1 | `step()` calls `pump_venue_events()` every cycle unconditionally | Runtime | **High** |
 | E2 | `kernel.snapshot()["account"]` returns a fresh dict, not a live view | Bridge | Low |
 | E3 | `_decision_to_kernel_intent` drops `order_type` and `limit_price` | Runtime | **High** |
 | E4 | `_exit_intent_from_slot` trusts slot.size but slot may be stale | Runtime | **High** |
 | E5 | JSON serialization round-trip loses numeric precision | Bridge | Low |
 | E6 | `_RustKernelLib` is a global singleton — shared across all kernels | Bridge | Low |
 | E7 | ENTER handler silently allows re-entry with same trade_id | Rust | **High** |
 | E8 | EXIT handler uses `initial_size` not current size | Rust | **High** |
 | E9 | CANCEL handler returns diagnostic even when nothing happened | Rust | Low |
 | E10 | `apply_fill` entry branch double-sets `active_entry_order` | Rust | Low |
 | E11 | `_legacy_intent()` is a lossy conversion | Venue | Low |
 | E12 | `_events_from_submit()` price fallback chain can lose venue price | Venue | Low |
 | E13 | `_backend_snapshot()` timeout returns stale data | Venue | Medium |
 | E14 | `_events_from_cancel` uses stale `slot_id` from order metadata | Venue | Low |
 | E15 | Submit sets leverage via separate HTTP call | Adapter | Medium |
 | E16 | `_format_quantity`/`_format_price` may use zero tick/step | Adapter | Medium |
 | E17 | Cancel uses truth-based confirmation — can mask real errors | Adapter | Medium |
 | E18 | `on_venue_event` settles PnL incrementally — fees never included | Bridge | Medium |
 | E19 | `observe_slots` called with ALL slots, not just changed ones | Bridge | Low |
 | E20 | `_capital()` reads live from `AccountProjection` — stale row risk | Persistence | Low |
 | E21 | `persist_fill_events()` synthesizes fake Decision/Intent | Persistence | Medium |
 | E22 | `_write_trade_exit_leg` capital_before uses arithmetic reconstruction | Persistence | Medium |
 | E23 | `_write_trade_event` uses entry_price as exit_price | Persistence | Medium |
 | E24 | Mock venue always emits fill on `partial_fill_ratio > 0` | Test | Low |
 | E25 | Test scenarios use MARKET-only `_si()` helper — no LIMIT tests | Test | Low |
 | E26 | Fresh-kernel reconcile tests create second kernel but share venue | Test | Low |
 ---
 ## F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)
 *Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."*
 | # | Flaw | Layer | Severity |
 |---|------|-------|----------|
 | F1 | CANCEL returns "accepted" before cancel happens — stale diagnostic_code | Bridge | Medium |
 | F2 | `_last_settled_pnl` reset before `venue.submit()` — transient window | Bridge | Medium |
 | F3 | `_first_invalid_intent_field` allows `leverage=0` and `target_size=0` | Bridge | Low |
 | F4 | `outcome.emitted_events` only from venue — Rust kernel events dropped | Bridge | Low |
 | F5 | `on_venue_event` redundant FFI read of slot already returned by Rust | Bridge | Low |
 | F6 | `process_intent` records pre-venue transitions with `event=None` | Bridge | Info |
 | F7 | `reconcile_from_slots` writes ALL slots to projection/zinc | Bridge | Low |
 | F8 | `HazelcastRowWriter.put()` synchronous, no error handling — crashes intent | Projection | **Medium** |
 | F9 | `RealZincPlane.write_slot()` serializes ALL slots, not just changed one | Zinc | Low |
 | F10 | `RealZincPlane` zeros buffer before write — concurrent read sees empty | Zinc | Low |
 | F11 | `RealZincPlane._write_region` no partial-write recovery | Zinc | Low |
 | F12 | `InMemoryZincPlane` intent_region grows without bound | Zinc | Low |
 | F13 | `InMemoryZincPlane` uses non-re-entrant `threading.Condition` | Zinc | Low |
 | F14 | `KernelSlotView.__setattr__` round-trips unknown fields — silently dropped | Bridge | Low |
 | F15 | `on_venue_event` loop stops on first exception — slot left in partial state | Bridge | **High** |
 | F16 | `venue.submit()` returning empty events leaves slot in ORDER_REQUESTED | Bridge | Medium |
 | F17 | Cancel truth-based confirmation returns REJECTED for already-cancelled orders | Adapter | Medium |
 | F18 | Leverage-set and order-submit failures share error handler | Adapter | Low |
 | F19 | `_events_from_submit` stale snapshot fallback → wrong fill detection | Venue | Medium |
 | F20 | `__del__` frees Rust handle at unpredictable GC time — no explicit close() | Bridge | **Medium** |
 | F21 | `DITAv2LauncherBundle.close()` closes venue before kernel is done | Launcher | Low |
 | F22 | Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware | Launcher | **Medium** |
 | F23 | `VenueEvent.size` = `intent.target_size` not actual fill | Venue | Info |
 | F24 | `asyncio.run()` inside async function in test generator | Test | Low |
 | F25 | `_build_fresh_kernel_from_slot` leaks old kernel objects per call | Test | Low |
 | F26 | `seen_event_ids` not cleared on re-entry — accumulates across trades | Rust | Low |
 | F27 | `RealZincControlPlane.read()` parses Zinc region every call — no caching | Control | Low |
 | F28 | `_legacy_intent` hardcodes confidence=1.0, bars_held=0 | Venue | Info |
 | F29 | `_slot_to_payload` in real_zinc_plane.py is dead code | Zinc | Info |
 | F30 | Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py | Zinc | Low |
 ---
 ## G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)
 *Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."*
 | # | Flaw | Layer | Severity |
 |---|------|-------|----------|
 | G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType enum | Rust | **Critical** |
 | G2 | `into_c_string` unwrap() panics on NUL byte in FFI string | Rust | **Critical** |
 | G3 | EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition | Rust | **Critical** |
 | G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** |
 | G5 | `realized_pnl` unbounded f64 — overflows to inf at extreme values | Rust | **High** |
 | G6 | `mark_price` produces unbounded unrealized_pnl — no result guard | Rust | **High** |
 | G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
 | G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
 | G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** |
 | G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
 | G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
 | G12 | No max leverage cap enforced by kernel | Rust | Medium |
 | G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
 | G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
 | G15 | Zero `__post_init__` validators on all 16 config dataclasses (127 fields) | Config | **High** |
 | G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
 | G17 | String config fields — Zinc region injection risk | Config | Medium |
 | G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
 | G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
 | G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
 | G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
 | G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
 | G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
 | G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
 | G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
 | G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
 | G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
 | G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
 | G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
 | G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
 | G31 | Hazelcast projection never closed | Lifecycle | Medium |
 | G32 | _maybe_close() break skips second method | Lifecycle | Low |
 | G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
 | G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
 | G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
 | G36 | Only one try/finally in entire codebase | Lifecycle | **High** |
 ---
 ## A-Series: Architectural Flaws (detailed writeups)
 *These are the original architectural flaws with full analysis.*
 ---
 ### Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size
 **Location:** `_rust_kernel/src/lib.rs` lines ~770-780 (EXIT handler in `process_intent`)
 **Severity:** **High**
 **Nature:** Logic error — wrong base for exit-size computation.
 ### Downstream effect
 The EXIT handler computes the exit size as `base_size * exit_ratio` where:
 ```rust
 let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
 ```
 After partial fills (e.g., two separate MARKET exit legs), `initial_size` is still the
 **original** entry size while `slot.size` has been reduced by previous legs. If the
 cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the
 computed exit size can exceed the remaining position.
 The venue adapter clamps to actual position via `reduceOnly`, but the kernel's _own_
 accounting reduces `slot.size` by the fill size, not by the intended exit size. The
 slot can therefore go negative (`slot.size < 0`) if the fill is larger than remaining.
 ### Exact trigger
 1. Enter SHORT, size=1.0, `initial_size=1.0`, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0
 2. EXIT leg 0: `exit_size = 1.0 * 0.6 = 0.6`. Fill consumes 0.6. Slot size goes to 0.4.
 3. EXIT leg 1: `exit_size = 1.0 * 0.6 = 0.6`. But remaining is 0.4. Requests 0.6.
 4. BingX `reduceOnly` clamps fill to 0.4. Slot size goes to 0.0.
 5. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`. Slot is already at 0.0.
   Kernel returns `NO_OPEN_POSITION` — the final EXIT is rejected because `slot.closed`
   was not set by the previous fill (it was a partial close, not terminal).
 6. Slot is at size=0.0, `!slot.closed`, no active orders, but `!slot.is_free()` because
   `size <= 0.0` is true but `fsm_state != IDLE/CLOSED` — slot is **stuck** in
   `POSITION_OPEN` with zero size.
 This is **not** purely a mis-sized ratio problem. With MARKET orders that fill fully,
 even correct ratios can leave the slot stuck if the fill price differs from the
 intended-size price and the venue adjusts fill quantity.
 ### Fix strategy
 Use `slot.size` directly as the base (not `initial_size`):
 ```rust
 let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);
 ```
 This guarantees the exit never requests more than the remaining position, regardless
 of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.
 ---
 ### Flaw A2: Misleading CANCEL diagnostic code on entry-only slots
 **Location:** `_rust_kernel/src/lib.rs` lines ~798-810 (CANCEL rejection path)
 **Severity:** **Low**
 **Nature:** Diagnostic pollution — wrong error code.
 ### Downstream effect
 When a CANCEL intent arrives and **neither** `active_exit_order` nor
 `active_entry_order` is cancellable, the kernel returns:
 ```rust
 diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER
 ```
 But the reason may be that there's no active entry order either, or the FSM state
 doesn't permit cancellation. The diagnostic name suggests an exit-order-specific
 problem when the failure is generic "nothing to cancel."
 ### Fix
 Change to a generic `NO_ACTIVE_ORDER` diagnostic or `SLOT_IDLE` when the slot is
 already in IDLE. `NO_ACTIVE_EXIT_ORDER` is misleading for a slot that has never had
 any order.
 ---
 ### Flaw A3: Float-accumulated slot.size after partial fills can go negative
 **Location:** `_rust_kernel/src/lib.rs` lines ~1365-1370 (apply_fill exit path)
 **Severity:** **Low**
 **Nature:** Numerical precision edge case.
 ### Code path
 ```rust
 slot.size = (slot.size - fill_size).max(0.0);
 ```
 This clamps to zero, which is correct. But if the venue fills *more* than requested
 (on BingX, this can happen with market orders where the fill walks the book), the
 slot sees `fill_size > intended_size`. The `max(0.0)` prevents negative, but the
 slot then reports `size=0.0` with `!closed` and an FSM state that's not IDLE.
 The `is_free()` check requires `size <= 0.0` AND `fsm_state in {IDLE, CLOSED}`. A
 slot with `size=0.0` and `fsm_state=POSITION_OPEN` is stuck — no EXIT will be
 accepted and no ENTER can start.
 ### Trigger
 Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the
 book, overshoots). `fill_size=0.8`, `slot.size = (0.6 - 0.8).max(0.0) = 0.0`.
 Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), `closed=false`.
 ### Fix
 When `slot.size <= 1e-12` after a fill and the slot is in an exit-related state,
 force transition to CLOSED/IDLE regardless of leg index:
 ```rust
 if slot.size <= 1e-12 {
    slot.closed = true;
    slot.fsm_state = TradeStage::CLOSED;
    slot.active_exit_order = None;
    slot.active_entry_order = None;
    return;
 }
 ```
 ---
 ### Flaw A4: Entry price is clobbered by mark_price if called before fill arrives
 **Location:** `_rust_kernel/src/lib.rs` lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)
 **Severity:** **Medium**
 **Nature:** Accounting accuracy — incorrect PnL base.
 ### Code path
 ```rust
 // In mark_price:
 if self.entry_price <= 0.0 {
    self.entry_price = price;  // Seeds entry_price from mark before fill
 }
 // In apply_fill (entry):
 if event.price > 0.0 {
    slot.entry_price = event.price;  // Overwrites with actual fill price
 }
 ```
 The `mark_price` path seeds `entry_price` from a market price when the slot has no
 fill yet. The `apply_fill` entry path correctly overwrites with the actual fill price.
 So in the normal flow this is harmless — the fill overwrites the mark.
 **However**, consider this sequence:
 1. ENTER intent accepted → slot goes `ORDER_REQUESTED`, `entry_price = 0.0`
 2. `runtime.step()` calls `kernel.mark_price(snapshot.symbol, snapshot.price)` → sets `entry_price = 100.0`
 3. `on_venue_event(ORDER_ACK)` → `ENTRY_WORKING`, `entry_price` still `100.0`
 4. `on_venue_event(PARTIAL_FILL)` → `apply_fill` sets `entry_price = 99.5` (fill price)
 5. Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5
 This is a transient mis-valuation window. It corrects itself on the next `observe_slots`
 call, but intra-step readers see wrong unrealized PnL. Not critical because:
 - `account.snapshot.unrealized_pnl` uses the slot's `unrealized_pnl`, not the mark
 - Realized PnL is computed from actual fill prices
 - The window lasts at most one scan cycle (~5s)
 ### Fix
 Don't set `entry_price` from `mark_price` when there's no fill:
 ```rust
 fn mark_price(&mut self, price: f64) {
    if !price.is_finite() || price <= 0.0 { return; }
    // Don't seed entry_price — leave it at 0.0 until a fill arrives
    if self.entry_price <= 0.0 || self.size <= 0.0 {
        self.unrealized_pnl = 0.0;
        return;
    }
    // ... normal PnL computation
 }
 ```
 ---
 ### Flaw A5: Capital-before computation is arithmetic not snapshot-based
 **Location:** `pink_clickhouse.py` lines ~761-762 (`_write_trade_exit_leg`) and ~822-823 (`_write_trade_event`)
 **Severity:** **High**
 **Nature:** Accounting accuracy — wrong capital_before under multi-slot or intervening events.
 ### Code pattern (appears in two places)
 ```python
 capital_after = self._capital()
 capital_before = capital_after - pnl_leg   # In _write_trade_exit_leg
 capital_before = capital_after - pnl        # In _write_trade_event
 ```
 This reconstructs `capital_before` by subtracting the current leg's PnL from the
 current capital. This is **only correct** if:
 1. No other slots settled PnL between this leg and the previous one
 2. No capital corrections (reconcile, manual override) happened between legs
 3. No fees were deducted between legs
 With multi-slot (PINK configurable `max_slots > 1`), a concurrent trade on slot 1
 that closes between slot 0's exit legs will have its PnL baked into `capital_after`,
 making `capital_before = capital_after - pnl_leg` wrong.
 ### Fix
 Maintain a per-trade `capital_before_leg` snapshot taken at the moment of the
 first fill event for each trade, advancing it by the realized PnL of each leg:
 ```python
 self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
 self._leg_state[trade_id]["capital_after"] = capital_after
 ```
 And use `prev["capital_before"]` for the row, not `capital_after - pnl_leg`.
 ---
 ### Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange
 **Location:** `pink_direct.py` lines ~597-630 (`recover_account`) and docstring of `reconcile_account`
 **Severity:** **Medium**
 **Nature:** Operational drift — capital is never verified against exchange truth in hot loop.
 ### The gap
 `reconcile_account()` (line 632) has this docstring:
 ```
 Periodic exchange-led account sync.
 Capital is re-seeded from the exchange balance as a guard against long-running drift
 ```
 But the actual implementation:
 ```python
 async def reconcile_account(self, ...) -> dict[str, Any]:
    return await self.recover_account(...)
 async def recover_account(self, ...) -> dict[str, Any]:
    capital = float(self.kernel.account.snapshot.capital or 25000.0)
    _reconcile_position_slot(self.kernel, capital, slot_id=0)
 ```
 It passes the **kernel's own capital** to `_reconcile_position_slot`, which then
 overwrites `kernel.account.snapshot.capital` with... the same value. No exchange
 balance poll ever overwrites capital.
 `connect()` at line 224 does the same — it passes `initial_capital` (an env default),
 not the exchange balance. The exchange balance is never read for capital seeding
 in the current code path. `_reconcile_position_slot` does call
 `venue.open_positions()`, but it only reads positions, not capital.
 ### Effect
 Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation,
 or any other systematic error) accumulates monotonically. There is no mechanism to
 detect or correct drift. Over weeks of live trading, the kernel's capital snapshot
 can diverge arbitrarily from the exchange's actual balance.
 ### Fix
 Either:
 1. Make `_reconcile_position_slot` read the exchange balance and use it for
   capital reseeding (the docstring claims it does this already), or
 2. Add a separate capital-verification path that surfaces the delta between
   kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.
 ---
 ### Flaw A7: No fee tracking in kernel accounting
 **Location:** `rust_backend.py` lines ~540-545 (on_venue_event settle), `bingx_direct.py` submit_intent return
 **Severity:** **Medium**
 **Nature:** Accounting accuracy — fees are invisible to capital tracking.
 ### Downstream effect
 When a trade closes, the kernel computes:
 ```rust
 realized_pnl = delta * notional
 ```
 This is **gross** PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%).
 These fees are never subtracted from the kernel's realized PnL. Over 100 trades with
 $100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000
 trades at 10x leverage and $50k average notional, the error is $200k.
 The `BingxDirectExecutionAdapter` does return `ExecutionReceipt` with fill data,
 but `bingx_venue._events_from_submit()` only reads `price` and `filled_size` —
 commission/fee fields are ignored.
 ### Fix
 1. Read fee/commission from the BingX ack payload in `_events_from_submit()`
 2. Pass fees through `VenueEvent.metadata["fee"]`
 3. In the Rust kernel's `apply_fill`, subtract the fee from realized PnL:
   ```rust
   let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
   slot.realized_pnl += realized - fee;
   ```
 ---
 ### Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input
 **Location:** `_rust_kernel/src/lib.rs` lines ~745-748
 **Severity:** **Low**
 **Nature:** Silent fallback — corrupt input produces a trade, not a rejection.
 ```rust
 slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
    intent.leverage
 } else {
    1.0
 };
 ```
 A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of
 rejecting the intent. The Python bridge does validate `_first_invalid_intent_field()`
 which catches NaN/inf, but it doesn't catch `leverage <= 0.0` (it only checks
 `not math.isfinite(value)`).
 ### Fix
 Add `leverage <= 0.0` to the Python bridge's invalid-intent check. The Rust kernel
 should still have the `1.0` fallback as a defensive measure, but the bridge should
 prevent bad leverages from reaching Rust in the first place.
 ---
 ### Flaw A9: Mock venue submit condition convoluted — dead code paths
 **Location:** `mock_venue.py` lines ~60-90
 **Severity:** **Informational**
 **Nature:** Code clarity — confusing condition logic.
 ```python
 if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
    events.append(ack_event)
 if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
    # ... fill events
 ```
 The condition logic is confusing:
 - When `emit_ack_before_fill=True` and `emit_fill_on_submit=True`: both branches run → ACK + fill
 - When `emit_ack_before_fill=False` and `emit_fill_on_submit=True`: first branch runs because
  `not True = False`, so `False or False = False` → no ACK. Second branch runs → fill only.
  This produces a fill without an ACK, which is **not** a realistic venue scenario.
 - When `partial_fill_ratio=1.0` (default): second branch runs and emits a `FULL_FILL` event
  even when `emit_fill_on_submit=False`, because `0.0 or 1.0 > 0 = True`.
 The partial fill ratio check should be gated on `emit_fill_on_submit`:
 ```python
 should_emit_fill = self.scenario.emit_fill_on_submit or (
    is_entry and self.scenario.entry_partial_fill_ratio > 0
 ) or (
    not is_entry and self.scenario.exit_partial_fill_ratio > 0
 )
 ```
 ---
 ### Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows
 **Location:** `pink_direct.py` lines ~318-374 (`pump_venue_events`), called at line ~436
 **Severity:** **Medium**
 **Nature:** Operational overhead — unnecessary exchange HTTP calls.
 ### The problem
 `step()` calls `pump_venue_events()` **every cycle**, which calls `venue.reconcile()`.
 For `BingxVenueAdapter`, `reconcile()` calls `_backend_snapshot()` which does up to 5
 HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow
 where orders fill synchronously within `process_intent()`, there are **no** late fills
 to drain.
 On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each
 `pump_venue_events()` call consumes 5+ of that budget. At a 5-second policy cycle,
 this is 60 requests/minute — 60% of the rate budget — just to poll for fills that
 don't exist.
 ### Fix
 Gate the pump on whether the previous cycle submitted a LIMIT order:
 ```python
 self._has_resting_order = any(
    o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
    for o in kernel.open_orders()
 )
 if self._has_resting_order:
    await self.pump_venue_events(snapshot, market_state=market_state)
 ```
 Or add a config flag `async_fill_mode: bool = False`.
 ---
 ### Flaw A11: VenueAdapter.submit() blocks the event loop
 **Location:** `bingx_venue.py` lines ~225-233 (`_run`)
 **Severity:** **Medium**
 **Nature:** Runtime safety — synchronous call in async context.
 ```python
 def _run(self, result: Any) -> Any:
    if inspect.isawaitable(result):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(result)
        pool = self._get_executor()
        return pool.submit(asyncio.run, result).result()
 ```
 When called from `step()` (which is an async function), `_run` submits the async
 `submit_intent()` to a thread pool, runs it with `asyncio.run()`, then calls
 `.result()` which blocks the current thread until complete. The BingX HTTP call
 can take 1-5 seconds depending on network latency and exchange load.
 During this block, the event loop **cannot** process other async tasks (data feed
 updates, health checks, signal processing). In a single-runtime deployment, this
 stalls the entire policy cycle.
 ### Fix
 Make `process_intent` in `ExecutionKernel` accept an async venue callback, or
 make `BingxVenueAdapter` truly async (not sync-with-thread-bridge). For now,
 at minimum the PINK runtime should run `step()` in an executor to avoid blocking
 the main event loop.
 ---
 ### Flaw A12: Stale KernelStateView slot references after reconcile
 **Location:** `rust_backend.py` lines ~350-365 (`KernelStateView.refresh`)
 **Severity:** **Low**
 **Nature:** Stale data — view not rebuilt on reconcile.
 ```python
 class KernelStateView:
    def __init__(self, kernel):
        self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
        # ...
    def refresh(self) -> None:
        snapshot = self._kernel._snapshot_backend()
        self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
        self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
        self.client_order_index = dict(snapshot.get("client_order_index", {}))
 ```
 `refresh()` updates the index maps but does **not** recreate `self.slots`. The slot
 views in `self.slots` are live proxies (they read through `_get_slot` each time),
 so slot data is current. But if `max_slots` changes (it shouldn't, but it's mutable)
 or if slots are re-indexed by a reconcile, the view list is wrong.
 Not critical because `max_slots` is set at init and never changes, but worth
 fixing for robustness.
 ---
 ### Flaw A13: `persist_fill_events` uses current price as exit price
 **Location:** `pink_clickhouse.py` lines ~408
 **Severity:** **Low**
 **Nature:** Historical accuracy — logged price may not match fill price.
 ```python
 price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
              if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)
 ```
 This correctly reads from the event's price. But `decision.reference_price` at line
 417 falls back to this price, which is the fill price. The trade_event row at line
 835 uses `exit_price = slot_dict.get("entry_price", ...)` — which is the **entry**
 price, not the exit price. The trade_event always shows exit_price == entry_price.
 This means `trade_events` in ClickHouse will never show a realistic exit price
 for the persisted trade, breaking any PnL reconstruction that relies on
 `(exit_price - entry_price) * size * leverage`.
 ---
 ### Flaw A14: `_write_position_state` maps active_leg_index to entry_bar
 **Location:** `pink_clickhouse.py` line ~673
 **Severity:** **Low**
 **Nature:** Semantic mismatch — wrong field mapping.
 ```python
 "entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
 ```
 `active_leg_index` is the index into the exit-leg-ratios array (which leg is being
 exited next). It has nothing to do with how many bars the position has been held.
 When a position opens, `active_leg_index` is 0. After the first exit leg, it
 advances to 1. Neither value is a bar count.
 `entry_bar` should be `bars_held` from the intent/decision, or a computed value
 from `entry_time` to now.
 ---
 ### Flaw A15: `persist_recovery_state` passes account dict as slot dict
 **Location:** `pink_clickhouse.py` lines ~447-460
 **Severity:** **Low**
 **Nature:** Wrong data — account snapshot used where slot data is expected.
 ```python
 def persist_recovery_state(self, *, snapshot, acc_dict, ...):
    slot_dict = acc_dict or {}   # ← acc_dict is an account snapshot, not a slot
    self._write_position_state(..., slot_dict={}, ...)  # ← correctly uses empty dict
    self._write_trade_reconstruction(
        snapshot,
        trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
        # acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
    )
 ```
 The `trade_id` in the trade_reconstruction row will always be `""` because
 `acc_dict` comes from `kernel.snapshot()["account"]` which has keys `capital`,
 `equity`, `realized_pnl`, etc. — not `trade_id`. This means the recovery
 `trade_reconstruction` row has no trade_id linkage.