siloqy/PINK_DITAv2_E2E_TRACE_ANALYSIS.md

# PINK DITAv2 — End-to-End Trace & Flaw Analysis

**Analysis date:** 2026-05-31
**Method:** Full-trace static analysis — every file, every data path, every
boundary crossing in the PINK execution pipeline. No test execution.
**System scope:** 34 active source files, ~12,000 lines across Rust kernel,
Python bridge, venue adapter, runtime, and persistence.

> **Central flaw registry:** [PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md](./PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md)
> contains the combined catalog of all 116 flaws (A, T, E, F, G series) with
> severity distribution and cross-references. This file provides the deep E2E
> trace context — read the central registry for the master list.

---

## E2E Data Flow (One Call)

Every E2E path in the PINK system traces through this sequence. Each numbered
step below is a site where data crosses a module boundary and can be lost,
mangled, or misinterpreted.

```
PinkDirectRuntime.step()                    # R1: policy cycle entry
  ├─ pump_venue_events()                    # R2: drain async fills
  ├─ kernel.snapshot()["account"]           # R3: read capital
  ├─ kernel.slot(0)                         # R4: read slot state
  ├─ decision_engine.decide()               # R5: policy-layer ENTER/EXIT
  ├─ intent_engine.plan()                   # R6: intent sizing
  ├─ _decision_to_kernel_intent()           # R7: Decision → KernelIntent
  ├─ kernel.process_intent(kernel_intent)   # R8: KERNEL BOUNDARY
  │   ├─ rust_backend._intent_to_payload()  # R8a: KernelIntent → JSON
  │   ├─ _RustKernelLib.process_intent()    # R8b: JSON → C FFI
  │   │   └─ Rust process_intent()          # R8c: FSM mutates TradeSlot
  │   ├─ venue.submit(intent)               # R9: VENUE BOUNDARY
  │   │   ├─ bingx_venue._legacy_intent()   # R9a: KernelIntent → LegacyIntent
  │   │   ├─ BingxDirectExecutionAdapter    # R9b: HTTP POST /trade/order
  │   │   │   .submit_intent()
  │   │   └─ bingx_venue._events_from_submit() # R9c: receipt → VenueEvent[]
  │   └─ on_venue_event(event)              # R10: FEEDBACK BOUNDARY
  │       ├─ _RustKernelLib → Rust FSM      # R10a: C FFI → FSM transition
  │       ├─ account.settle(delta)          # R10b: incremental PnL settlement
  │       └─ persistence writes             # R10c: ClickHouse / Zinc / HZ
  ├─ kernel.snapshot()["account"]           # R11: read final capital
  └─ persistence.persist_step()             # R12: PERSISTENCE BOUNDARY
```

---

## Layer 1: Policy Cycle Entry (pink_direct.py:422)

### E1: `step()` calls `pump_venue_events()` every cycle unconditionally

**pink_direct.py:436**
```python
await self.pump_venue_events(snapshot, market_state=market_state)
```

This is called **before** reading slot/account state for the policy decision.
The pump calls `venue.reconcile()`, which for `BingxVenueAdapter` issues up to
5 HTTP requests (balance, positions, open orders, plus history when
`include_history` is set).

For MARKET-only workflows, no resting orders exist, so `reconcile()` returns
empty events every time. But the HTTP calls still happen. On BingX VST with a
~10 req/s limit and a 5 s policy cycle, this burns 0.6 req/s with history
disabled (3 calls per cycle) and up to 1 req/s with it enabled, just to learn
"nothing changed." Add the actual trade HTTP calls, and the budget is tight.

**Flaw: E1 — unconditional exchange poll wastes rate limit.**
Already documented as A10, but worse when traced E2E: each `pump_venue_events`
calls `venue.reconcile()` → `_backend_snapshot()` → parallel `asyncio.gather`
of 3 HTTP GETs. The `_refresh_exchange_state` at bingx_direct.py:281-352
always fetches balance + positions + openOrders concurrently. Even when
`include_history=False` (which it is for the pump), that's 3 HTTP calls
every policy cycle regardless of whether any orders are resting.

**Severity: Medium.** Wasteful but not destructive on testnet.
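
One way to cut the idle polls is to gate the reconcile on whether anything can have changed. The sketch below is a minimal illustration, not the runtime's code: `ReconcileGate`, `resting_orders`, and `max_idle_s` are hypothetical names, and it assumes the runtime can cheaply tell how many orders it has resting.

```python
from dataclasses import dataclass


@dataclass
class ReconcileGate:
    """Decides whether a policy cycle should poll the exchange (hypothetical helper)."""

    max_idle_s: float = 30.0                 # assumed upper bound between safety polls
    last_poll_ts: float = float("-inf")      # never polled yet

    def should_poll(self, now: float, resting_orders: int) -> bool:
        # Always poll while orders are resting: fills can arrive asynchronously.
        if resting_orders > 0:
            return True
        # Otherwise poll only as an occasional safety net.
        return (now - self.last_poll_ts) >= self.max_idle_s

    def mark_polled(self, now: float) -> None:
        self.last_poll_ts = now


gate = ReconcileGate(max_idle_s=30.0)
polls = 0
for cycle in range(12):                      # 12 cycles at 5 s each = 60 s
    now = cycle * 5.0
    if gate.should_poll(now, resting_orders=0):
        gate.mark_polled(now)
        polls += 1
print(polls)  # 2
```

With a 5 s cycle and no resting orders, this polls twice in a minute instead of twelve times. The trade-off is that externally caused changes (manual closes, liquidations) are noticed up to `max_idle_s` late.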

### E2: `kernel.snapshot()["account"]` returns a fresh dict, not a live view

**pink_direct.py:437**
```python
acc = self.kernel.snapshot()["account"]
```

`ExecutionKernel.snapshot()` at rust_backend.py:740-752 builds a dict from
kernel state at call time. The decision/intent engines then consume this
snapshot. Between the snapshot and `process_intent()` (line 523), another
caller (or the same runtime in a concurrent cycle) could advance the kernel
state, making the decision based on stale capital.

**Flaw: E2 — TOCTOU between capital snapshot and intent execution.**
The `context.capital` read at line 437 is used at line 523 for the ENTER
safety guard (`_unsafe_entry_reason`) and possibly by the decision/intent
engines. If capital changes between these two points (e.g. an async fill
arrives via a concurrent test-HTTP path), the guard uses stale capital.

**Severity: Low** in single-threaded deployment. Critical under concurrency.

---

## Layer 2: Decision/Intent Bridging (pink_direct.py:79-115)

### E3: `_decision_to_kernel_intent` drops `order_type` and `limit_price`

**pink_direct.py:79-115**
```python
def _decision_to_kernel_intent(decision, intent, slot_id=0):
    return KernelIntent(
        ...
        # order_type and limit_price are NOT SET here
    )
```

`KernelIntent` has `order_type="MARKET"` and `limit_price=0.0` as defaults,
so MARKET orders work correctly. But the runtime **never** sets these fields
from the policy layer. If `decision` or `intent` ever carries `order_type`
or `limit_price`, it's silently dropped because the bridge doesn't map them.

**Flaw: E3 — LIMIT support in runtime is dead code.**
The `order_type`/`limit_price` fields in `KernelIntent` and the LIMIT payload
building in `bingx_direct.py` lines 384-398 are unreachable from the runtime.
The only path that can set them is direct `KernelIntent(...)` construction
in tests (`_build_pink_bodies.py` style scenarios). The `_decision_to_kernel_intent`
bridge must be patched when a policy engine needs to emit LIMIT orders.

**Severity: Medium.** Blocks any production path to LIMIT orders.
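
A minimal sketch of what the missing mapping could look like. `KernelIntentSketch` is a stand-in with only the two relevant fields, and `bridge_order_fields` is a hypothetical helper; whether the policy objects expose `order_type` and `limit_price` under these names is an assumption.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class KernelIntentSketch:
    """Stand-in for KernelIntent; only the two fields relevant to E3."""

    order_type: str = "MARKET"
    limit_price: float = 0.0


def bridge_order_fields(decision: Any, intent: Any) -> KernelIntentSketch:
    """Copy order_type/limit_price from the policy objects, preferring the intent."""
    order_type = (
        getattr(intent, "order_type", None)
        or getattr(decision, "order_type", None)
        or "MARKET"
    )
    limit_price = float(
        getattr(intent, "limit_price", None)
        or getattr(decision, "limit_price", None)
        or 0.0
    )
    order_type = str(order_type).upper()
    # Refuse a LIMIT order that would silently carry the 0.0 default price.
    if order_type == "LIMIT" and limit_price <= 0.0:
        raise ValueError("LIMIT intent requires a positive limit_price")
    return KernelIntentSketch(order_type=order_type, limit_price=limit_price)


# Policy objects without the fields fall back to the MARKET defaults.
market = bridge_order_fields(SimpleNamespace(), SimpleNamespace())
# Policy objects that carry the fields are passed through.
limit = bridge_order_fields(
    SimpleNamespace(), SimpleNamespace(order_type="limit", limit_price=0.0834)
)
```

Falling back to the existing defaults keeps MARKET-only policies unchanged, while a LIMIT request with no price fails loudly instead of reaching the venue.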

### E4: `_exit_intent_from_slot` trusts slot.size but slot may be stale

**pink_direct.py:398-420**
```python
def _exit_intent_from_slot(self, kernel_intent):
    try:
        slot_size = float(self.kernel.slot(int(kernel_intent.slot_id)).size or 0.0)
    except Exception:
        slot_size = 0.0
    ...
    exit_size = min(policy_size, slot_size) if policy_ok else slot_size
```

Reads `slot.size` fresh from the Rust kernel at call time, then uses it to
cap the exit size. Between this read and the `process_intent` call that
actually executes the EXIT (line 523), the slot can be modified by a
concurrent cycle or a concurrent `pump_venue_events` call. Within a single
cycle the pump at line 436 (R2) runs before this read, so it cannot
interleave on its own. If a partial fill arrives between the slot read and
the EXIT, the exit size could be wrong.

**Flaw: E4 — TOCTOU between exit sizing and exit execution.**
Same class as E2 but for exit size rather than capital. If a concurrent pump
drains a partial fill between the slot read and R8 (process_intent), the EXIT
requests a size based on the pre-fill remaining size. The kernel caps it at
the actual remaining size, so this is self-correcting, but the intent payload
carries the wrong size in its metadata.

**Severity: Low.** Self-correcting at kernel level.

---

## Layer 3: Kernel Bridge — Rust FSM Entry (rust_backend.py)

### E5: JSON serialization round-trip loses numeric precision

**rust_backend.py:460-485 (`_intent_to_payload`)**

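`KernelIntent` fields like `reference_price`, `target_size`, `leverage` are
Python floats. They're serialized to JSON text, sent through C FFI, parsed
by serde_json into Rust `f64`, then serialized back to JSON, parsed by Python
`json.loads()`. The Python legs and the Rust serialization leg round-trip
exactly, because both emit the shortest decimal string that parses back to
the same double. The leg that can lose precision is the Rust parse: unless
serde_json is built with its `float_roundtrip` feature, its float parser is
best-effort and may be off in the last bit for some inputs:

```python
# Python float 0.1 → JSON text "0.1" (repr emits the shortest round-trip string)
# Rust parse "0.1" → f64: exact only if the parser rounds correctly
# Rust f64 → JSON text: shortest round-trip string again, "0.1" if the parse was exact
# Python json.loads → 0.1, or a neighboring double if the Rust parse was off by one ULP
```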

For prices (TRXUSDT at ~$0.08), a 1e-16 relative error is negligible. PnL
accumulated over thousands of trades at 9x leverage compounds it at worst
linearly: 10,000 trades at one ULP each (about 2.2e-16 relative) is roughly
2e-12 relative, or about 2e-8 dollars on a hypothetical $10,000 of capital.
The `|Δcapital − realized| < 1e-9` assertion in tests would catch gross
errors but not accumulation at this scale.

**Flaw: E5 — JSON serialization precision drift over long runs.**
**Severity: Low.** Not a practical concern for the current deployment scale.
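
The Python legs of this round trip can be checked in isolation. The sketch below pushes a set of finite doubles through `json.dumps` and `json.loads` and counts the values that come back changed. It says nothing about the serde_json parse on the Rust side, which is not exercised here; the sample values are illustrative.

```python
import json
import math
import random
import struct


def random_finite_double(rng: random.Random) -> float:
    """Draw a random finite float64 by sampling raw 64-bit patterns."""
    while True:
        bits = rng.getrandbits(64)
        (value,) = struct.unpack("<d", struct.pack("<Q", bits))
        if math.isfinite(value):
            return value


rng = random.Random(0)
samples = [0.1, 0.0834, 1e-9, 9.0]
samples += [random_finite_double(rng) for _ in range(10_000)]

# A value "drifts" if text serialization followed by parsing changes it.
mismatches = [x for x in samples if json.loads(json.dumps(x)) != x]
print(len(mismatches))  # 0
```

An empty mismatch list is expected on CPython, because `json.dumps` writes floats with `repr` and `json.loads` parses them with correctly rounded `float`. Any drift in the full pipeline therefore has to originate on the other side of the FFI boundary.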

### E6: `_RustKernelLib` is a global singleton — shared across all kernels

**rust_backend.py:40-45**
```python
_RUST: _RustKernelLib | None = None

def _get_rust() -> _RustKernelLib:
    global _RUST
    if _RUST is None:
        _RUST = _RustKernelLib()
    return _RUST
```

The `_RustKernelLib` singleton loads the `.so` shared library once and
provides FFI functions. Each `ExecutionKernel` instance gets its own
`KernelHandle` via `_get_rust().create(max_slots)`. The FFI functions take
the handle as the first argument, so multiple kernels are isolated at the
Rust level.

**However**, the singleton means ALL kernels share the same ctypes function
pointer table, and nothing tracks which handles are still alive. If a second
kernel is created and the first is destroyed, the first kernel's
`KernelHandle` becomes a dangling pointer. Calling any FFI function with the
destroyed kernel's handle is use-after-free.

**Flaw: E6 — No protection against use-after-free on kernel destroy.**
Already documented as T7. Worth re-emphasizing in the E2E trace because the
test infrastructure creates and destroys kernels frequently (fresh-kernel
reconcile tests, each `_build_rb()` call in scenario wrappers).

**Severity: High.** Use-after-free in C FFI is memory corruption.
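
A minimal sketch of a Python-side guard, assuming the raw handle is an integer-like value and that destroy is a single FFI call. `KernelHandleGuard` and the fake `destroy_fn` are hypothetical; the real `_RustKernelLib` API is not shown in this trace.

```python
class KernelHandleGuard:
    """Wraps a raw handle and refuses FFI use after destroy (hypothetical wrapper)."""

    def __init__(self, raw_handle, destroy_fn):
        self._raw = raw_handle
        self._destroy_fn = destroy_fn

    @property
    def raw(self):
        # Every FFI call site reads the handle through this property.
        if self._raw is None:
            raise RuntimeError("kernel handle used after destroy")
        return self._raw

    def destroy(self) -> None:
        # Idempotent: a second destroy must not free the handle twice.
        if self._raw is not None:
            self._destroy_fn(self._raw)
            self._raw = None


freed = []                                  # stands in for the Rust-side free
guard = KernelHandleGuard(raw_handle=1234, destroy_fn=freed.append)
handle_before = guard.raw                   # fine while the kernel is alive
guard.destroy()
guard.destroy()                             # no double free

try:
    guard.raw
    use_after_destroy_blocked = False
except RuntimeError:
    use_after_destroy_blocked = True
```

This turns a silent memory-safety bug into a Python exception at the call site. It does not protect copies of the raw value taken before `destroy()`, so the guard only helps if call sites never cache `raw`.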

---

## Layer 4: Rust Kernel FSM (lib.rs:728)

### E7: ENTER handler silently allows re-entry with same trade_id

**lib.rs:740-745**
```rust
if !slot.is_free() && !slot.trade_id.is_empty() && slot.trade_id != intent.trade_id {
    return SLOT_BUSY;
}
```

If `slot.trade_id == intent.trade_id`, the ENTER is accepted even if the
slot is not free (e.g., POSITION_OPEN with an active position). This is by
design — it lets the same trade_id re-enter after the slot was partially
reconciled or restored from a snapshot. But it also means:

1. EXIT sets `slot.closed=true` and transitions to `CLOSED`
2. A new ENTER with the **same** trade_id re-enters the CLOSED slot
3. The slot resets `slot.closed=false`, `slot.size=0.0`, `slot.initial_size=0.0`
4. Kernel now thinks the trade is new, but the Rust indexes still have the
   old trade_id pointing to slot 0

**Downstream effect:** After a re-entry with the same trade_id, the
`active_trade_index[trade_id]` still correctly points to slot 0. But the
old `VenueOrder` in `client_order_index` and `venue_order_index` is still
present until the new entry fills and creates new orders. A reconcile event
addressed to the old `venue_client_id` could stomp on the new trade.

**Flaw: E7 — Re-entry with same trade_id leaves stale index entries.**
**Severity: Low.** The `rebuild_indexes()` call in `commit_slot()` rebuilds
from scratch, so stale entries are cleared on the first write.

### E8: EXIT handler uses `initial_size` not `current size`

**lib.rs:770-775**
```rust
let exit_ratio = slot.next_exit_ratio();
let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
let exit_size = (base_size * exit_ratio).max(0.0);
```

Already documented as A1. In the E2E trace, this is the single most impactful
execution flaw. A concrete scenario:

1. Enter `size=1.0`, `initial_size=1.0`, `exit_leg_ratios=(0.5, 0.5, 1.0)`
2. EXIT leg 0: requests `1.0 * 0.5 = 0.5`. Slot goes to 0.5.
3. EXIT leg 1: requests `1.0 * 0.5 = 0.5`. Slot goes to 0.0.
   `active_leg_index` advances to 2. `exit_leg_ratios.len()` is 3, so
   `all_legs_done = (2 >= 3) = false`. The slot stays at `POSITION_OPEN`
   with `size=0.0` and `closed=false`.
4. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`, but the slot is at 0.0.
   `slot.is_free()` is false because `fsm_state=POSITION_OPEN` is not in
   `{IDLE, CLOSED}`; the size is not part of that check. The EXIT guard
   `slot.is_free() || slot.closed || size <= 0.0` fires on `size <= 0.0` and
   returns `NO_OPEN_POSITION`. The ENTER guard `!slot.is_free()` blocks
   re-entry.
5. **Slot is stuck forever.** No operation can advance it.

**Severity: High.** Concrete, reproducible, and not caught by any test.
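
The scenario can be replayed with a toy model. `SlotModel` below is not the Rust `TradeSlot`; it only mirrors the sizing rule quoted above, the cap at remaining size, and the EXIT guard from step 4, and it assumes the slot closes only once every leg has run.

```python
from dataclasses import dataclass


@dataclass
class SlotModel:
    """Toy model of the exit sizing described above; not the Rust TradeSlot."""

    size: float
    initial_size: float
    exit_leg_ratios: tuple
    active_leg_index: int = 0
    closed: bool = False

    def exit(self) -> str:
        # EXIT guard from step 4: nothing left to close.
        if self.closed or self.size <= 0.0:
            return "NO_OPEN_POSITION"
        ratio = self.exit_leg_ratios[self.active_leg_index]
        # The flaw: the base is initial_size, not the remaining size.
        base = self.initial_size if self.initial_size > 0.0 else self.size
        requested = max(base * ratio, 0.0)
        filled = min(requested, self.size)   # kernel caps at remaining size
        self.size -= filled
        self.active_leg_index += 1
        # Assumed close rule: only when all legs are done.
        if self.active_leg_index >= len(self.exit_leg_ratios):
            self.closed = True
        return "FILLED"


slot = SlotModel(size=1.0, initial_size=1.0, exit_leg_ratios=(0.5, 0.5, 1.0))
outcomes = [slot.exit() for _ in range(3)]
print(outcomes)  # ['FILLED', 'FILLED', 'NO_OPEN_POSITION']
```

After the three calls the model holds `size=0.0`, `active_leg_index=2`, and `closed=False`: the position is flat but the leg counter can never reach the end of the ratio list.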

### E9: CANCEL handler returns diagnostic even when nothing happened

**lib.rs:795-810**
```rust
if matches!(intent.action, KernelCommandType::CANCEL) {
    let has_cancellable_exit = slot.active_exit_order.is_some();
    let has_cancellable_entry = slot.active_entry_order.is_some()
        && matches!(slot.fsm_state, ENTRY_WORKING | ORDER_REQUESTED | ORDER_SENT | IDLE);
    if !has_cancellable_exit && !has_cancellable_entry {
        return KernelResult {
            outcome: KernelOutcome {
                accepted: false,
                diagnostic_code: NO_ACTIVE_EXIT_ORDER,
                ...
            },
            ...
        };
    }
    return KernelResult {
        outcome: KernelOutcome {
            accepted: true,
            ...
        },
        ...
    };
}
```

Two issues:
1. When **neither** is cancellable, the diagnostic is `NO_ACTIVE_EXIT_ORDER`
   even if the actual reason is "no active entry order either" or "slot is
   already IDLE". The diagnostic is misleading.
2. When at least one IS cancellable, the Rust kernel returns `accepted=true`
   but does **not** mutate the slot at all — it returns immediately with the
   slot as-is. The actual cancel (HTTP call + FSM transition) happens in the
   Python bridge. The Rust kernel's "accept" just means "yes you may try to
   cancel this" — not "the cancel is complete."

This disconnect means: if the Python bridge's `venue.cancel()` fails (HTTP
error), the Rust kernel has already returned `accepted=true` for a cancel
that never happened. The caller sees `accepted=true` but the slot state
hasn't changed.

**Flaw: E9 — Rust CANCEL "accepts" before Python actually cancels.**
**Severity: Medium.** The `outcome.accepted` boolean is misleading for CANCEL.

### E10: `apply_fill` entry branch double-sets `active_entry_order`

**lib.rs:1330-1390**
```rust
// First set — at the top of the entry branch:
slot.active_entry_order = Some(VenueOrder {
    ...
    filled_size: fill_size,
    status: if partial { PARTIALLY_FILLED } else { FILLED },
    ...
});

// ... then later for full fill:
if !partial {
    slot.fsm_state = TradeStage::POSITION_OPEN;
    slot.active_entry_order = Some(VenueOrder {  // SECOND SET
        ...
        filled_size: slot.size,    // uses updated slot.size
        ...
    });
}
```

The entry branch sets `active_entry_order` at the top with `filled_size` from
the event, then for a FULL_FILL, sets it again with `filled_size = slot.size`
(which may have been updated by `slot.initial_size = fill_size` above). The
first VenueOrder's `intended_size` is from the event, the second uses
`slot.size`. Both are correct in isolation, but the double-write is wasteful.

More importantly, for a PARTIAL_FILL entry, the first set is the ONLY set.
If a second PARTIAL_FILL arrives for the same order, the entry branch at
line 1334 checks `slot.active_entry_order.is_some()` which is true (set by
the first partial), but the FSM state is `ENTRY_WORKING` (also set by first
partial). The condition at line 1334-1338 matches `ENTRY_WORKING`, so the
second partial enters the entry branch again. But `fill_size` is the event's
`filled_size` — the **total** filled, not the incremental amount.

**Flaw: E10 — Second PARTIAL_FILL on entry overwrites, doesn't accumulate.**
```rust
let fill_size = if event.filled_size > 0.0 {
    event.filled_size      // ← TOTAL filled, not incremental
} else {
    event.size
}.max(0.0);

slot.active_entry_order = Some(VenueOrder {
    ...
    filled_size: fill_size,  // ← overwrites previous filled_size
    ...
});

slot.initial_size = slot.initial_size.max(fill_size);  // ← OK, uses max
slot.size = fill_size;  // ← OVERWRITES previous size with total
```

On a RESTING LIMIT entry that partially fills in two events:
- Event 1: filled_size=0.3 → slot.size=0.3, entry_order.filled_size=0.3
- Event 2: filled_size=0.7 → slot.size=0.7, entry_order.filled_size=0.7

Both `entry_order.filled_size` and `slot.size` end at 0.7, and the overwrite
is correct because the venue reports cumulative fill, not incremental. At
`bingx_venue._events_from_submit()` line ~480:
```python
filled_size = _row_float(ack_row, "executedQty", ...)
```
This reads `executedQty`, which on BingX is cumulative. So the second event's
`filled_size=0.7` means "total filled across all fills = 0.7", and setting
`slot.size = 0.7` yields the total position size. The size overwrite flagged
as E10 is therefore not a defect.

The price is a different matter: the second fill event overwrites
`slot.entry_price` with the new fill's price. If the first fill was at 0.0834
and the second at 0.0836, the slot's `entry_price` becomes 0.0836 and the
blended average is lost. For a LIMIT entry with two partial fills at
different prices, the entry_price in the slot is the price of the LAST fill,
not the VWAP.

**Flaw: E10a — Entry price on multi-partial entry is last-fill, not VWAP.**
**Severity: Low.** Unrealized PnL computation uses this price. Error is small
for tight spreads.
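
For reference, the blended price can be recovered from cumulative fill events. This is a sketch under one assumption that the trace does not confirm: each event's price is the price of that incremental fill, as in the 0.0834 / 0.0836 example. If the venue instead reports a running average price, that value is already the VWAP. `vwap_from_cumulative` is a hypothetical helper.

```python
def vwap_from_cumulative(fills):
    """Volume-weighted entry price.

    fills: iterable of (cumulative_filled_size, fill_price) in arrival order.
    """
    prev_cum = 0.0
    notional = 0.0
    for cum_size, price in fills:
        increment = cum_size - prev_cum      # size added by this event
        if increment < 0:
            raise ValueError("cumulative filled size must not decrease")
        notional += increment * price
        prev_cum = cum_size
    return notional / prev_cum if prev_cum > 0 else 0.0


# Event 1: 0.3 filled at 0.0834; event 2: cumulative 0.7, last fill at 0.0836.
entry_price = vwap_from_cumulative([(0.3, 0.0834), (0.7, 0.0836)])
print(round(entry_price, 6))  # 0.083514
```

The result sits between the two fill prices, about 0.0000857 below the last-fill value of 0.0836 that the slot currently stores.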

---

## Layer 5: Venue Adapter Boundary (bingx_venue.py)

### E11: `_legacy_intent()` is a lossy conversion

**bingx_venue.py:270-285**
```python
@staticmethod
def _legacy_intent(intent: KernelIntent) -> LegacyIntent:
    action = LegacyDecisionAction.ENTER if intent.action == E.ENTER else ...
    side = LegacyTradeSide.SHORT if intent.side == TS.SHORT else ...
    metadata = dict(intent.metadata)
    metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
    metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
    return LegacyIntent(
        timestamp=intent.timestamp,
        trade_id=intent.trade_id,
        decision_id=intent.intent_id,
        asset=intent.asset,
        action=action,
        side=side,
        reason=intent.reason,
        target_size=float(intent.target_size),
        leverage=float(intent.leverage),
        reference_price=float(intent.reference_price),
        confidence=1.0,           # ← HARDCODED
        bars_held=0,              # ← HARDCODED
        exit_leg_ratios=tuple(intent.exit_leg_ratios or (1.0,)),
        metadata=metadata,
    )
```

`confidence` is always 1.0 and `bars_held` is always 0. The `LegacyIntent`
carries these to `BingxDirectExecutionAdapter.submit_intent()` which ignores
them (it only reads `asset`, `side`, `action`, `target_size`, `leverage`,
and `metadata`). So the hardcoded values don't affect execution — but they
affect the `ExecutionReceipt` and any downstream consumers that might read
`receipt.confidence`.

**Flaw: E11 — Lossy conversion with hardcoded metadata.**
**Severity: Informational.** No downstream consumer reads these fields.

### E12: `_events_from_submit()` price fallback chain can lose venue price

**bingx_venue.py:375-400 (`_events_from_submit`)**
```python
base_event = VenueEvent(
    ...
    price=safe_float(getattr(receipt, "price", 0.0), 0.0),
    ...
)

# ... later for fill event:
fill_price = safe_float(
    _row_float(ack_row, "avgPrice", "ap", "price", "lastFillPrice",
               default=getattr(receipt, "price", 0.0)),
    0.0
)
```

The fill price is read from `ack_row` (the HTTP response dict) first, falling
back to `receipt.price` (the `ExecutionReceipt` field). The `ExecutionReceipt`
price comes from `bingx_direct.py:434`:
```python
fill_price = 0.0
for key in ("avgPrice", "avgFilledPrice", "price", "lastFillPrice", "tradePrice"):
    try: value = float(ack_row.get(key) or 0.0)
    except: value = 0.0
    if value > 0: fill_price = value; break
if fill_price <= 0 and self._state is not None:
    fill_price = next((float(...)) for ... in self._state.open_positions.values() ...)
```

So the price flows: BingX HTTP ack → `ack_row[key]` → `receipt.price` →
`_events_from_submit()` → `fill_price` in VenueEvent.

If `ack_row` has no price field AND `self._state.open_positions` has no matching
position (e.g., first fill on a new entry), `fill_price` stays 0.0. The kernel's
`apply_fill` at lib.rs:1397 checks `if event.price > 0.0` before setting
`entry_price` — so a zero fill price leaves `entry_price` at 0.0. This means:

- The slot's `entry_price` stays 0.0
- `realized_pnl()` at lib.rs:662 checks `if slot.entry_price <= 0.0` → returns 0.0
- **PnL is never computed for this fill**
- Capital never settles

This is very unlikely on BingX VST, which always returns `avgPrice` in order
acknowledgements. But on any venue that doesn't, PnL is silently zeroed.

**Flaw: E12 — Zero fill price → zero entry_price → zero PnL.**
**Severity: Medium.** Silent PnL loss if venue returns no price.
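
A minimal sketch of a lookup that makes the "no price" case explicit instead of letting 0.0 flow into the kernel. `first_positive_price` is a hypothetical helper; the key list is the one quoted from `bingx_direct.py` above.

```python
PRICE_KEYS = ("avgPrice", "avgFilledPrice", "price", "lastFillPrice", "tradePrice")


def first_positive_price(row, keys=PRICE_KEYS):
    """Return the first strictly positive float among row[key], or None."""
    for key in keys:
        try:
            value = float(row.get(key) or 0.0)
        except (TypeError, ValueError):
            continue                         # unparsable field: try the next key
        if value > 0.0:
            return value
    return None                              # caller must handle the missing price


with_price = first_positive_price({"avgPrice": "0", "price": "0.0834"})
without_price = first_positive_price({"orderId": "1"})
garbage_price = first_positive_price({"avgPrice": "n/a"})
```

Returning `None` rather than 0.0 forces the caller to choose a policy, such as deferring the fill event until a reconcile supplies the price, instead of silently recording zero PnL.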

### E13: `_backend_snapshot()` timeout returns stale data

**bingx_venue.py:290-320**
```python
def _backend_snapshot(self, *, include_history=False, timeout_ms=5000.0):
    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
        with self._snap_lock:
            return self._last_snapshot  # ← STALE DATA
```

If the previous snapshot fetch is still in-flight when a new caller arrives,
the timeout returns `self._last_snapshot` — which could be seconds or minutes
old. The caller (e.g., `submit()`) then uses this stale snapshot to compute
`_filled_size_from_snapshots()` — potentially comparing stale "before" data
with fresh "after" data, producing a wrong delta.

**Flaw: E13 — Stale snapshot fallback causes wrong fill-size detection.**
**Severity: Medium.** The `_filled_size_from_snapshots` diff can be wrong.

### E14: `_events_from_cancel` uses stale `slot_id` from order metadata

**bingx_venue.py:485-510**
```python
VenueEvent(
    ...
    slot_id=int(order.metadata.get("slot_id", 0) or 0),
    ...
)
```

The `slot_id` in the CANCEL event comes from the `VenueOrder.metadata` which
was set when the order was created (in Rust FSM's `process_intent` or
`on_venue_event`). If the slot was re-assigned or the kernel's slot count
changed since order creation, this slot_id is wrong. The Rust kernel's
`resolve_slot()` at lib.rs:610-624 would use the event's `slot_id` (the
stale one) and find the wrong slot.

**Flaw: E14 — Cancel event carries stale slot_id from order creation.**
**Severity: Low.** Slots are stable and never renumbered.

---

## Layer 6: BingX Direct Adapter (bingx_direct.py)

### E15: Submit sets leverage via separate HTTP call

**bingx_direct.py:376-379**
```python
await self._client.signed_post(
    "/openApi/swap/v2/trade/leverage",
    {"symbol": symbol, "side": "BOTH", "leverage": leverage},
)
```

This is a POST to set exchange leverage **before** each order. If this call
fails (rate limit, network error), the exception at line 417 sets
`status = "RATE_LIMITED"` and returns a rejection — the order is NOT
submitted. But the error handling at line 417 catches `BingxHttpError` for
the leverage call AND the order call with the same handler. If the leverage
call fails with a non-rate-limit error (e.g., `400 Bad Request` for invalid
symbol), the status is `"REJECTED"` and no order is placed. This is correct
behavior — but the error message doesn't distinguish "leverage set failed"
from "order submission failed."

**Flaw: E15 — Leverage-set failure and order failure share error handler.**
**Severity: Low.** Correct behavior, poor diagnostics.
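
One way to keep the shared handler while improving diagnostics is to tag each failure with the stage that raised it. The sketch below is synchronous for brevity, whereas the adapter's calls are awaited; `StageError` and `run_stage` are hypothetical names, and the failing call is simulated.

```python
class StageError(Exception):
    """Carries the name of the submit stage that failed (hypothetical type)."""

    def __init__(self, stage: str, cause: BaseException):
        super().__init__(f"{stage} failed: {cause}")
        self.stage = stage
        self.cause = cause


def run_stage(stage: str, fn):
    """Run one stage and re-raise any failure tagged with the stage name."""
    try:
        return fn()
    except Exception as exc:
        raise StageError(stage, exc) from exc


def failing_leverage_call():
    raise ConnectionError("simulated network error")


try:
    run_stage("set_leverage", failing_leverage_call)
    run_stage("submit_order", lambda: {"status": "OK"})   # never reached
except StageError as err:
    failed_stage = err.stage                 # "set_leverage"
    failure_message = str(err)
```

A single `except StageError` block can still map the failure to `RATE_LIMITED` or `REJECTED`, and the stage name can be included in the rejection reason.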

### E16: `_format_quantity` and `_format_price` use `_instrument_step`/`_instrument_tick` — both may be zero

**bingx_direct.py:234-268**
```python
def _instrument_step(self, asset):
    instrument = self._resolve_instrument(asset)
    if instrument is not None:
        try: return Decimal(str(instrument.size_increment.as_decimal()))
        except: pass
    return Decimal("0.001")  # fallback

def _format_quantity(self, asset, quantity):
    step = self._instrument_step(asset)
    if step <= 0:
        return str(max(0.0, quantity))
    ...
```

If `_resolve_instrument` returns None (asset not in provider), `step=0.001`
and `tick=0.01`. These defaults are correct for most USDT perpetuals on
BingX VST, but may be wrong for non-standard symbols. The format functions
still produce a valid string — just possibly with wrong precision.

More concerning: `_resolve_instrument` at line 211-226 tries three lookup
strategies and iterates all instruments on the third. This iteration is O(n)
in the number of instruments and happens on EVERY `submit_intent()` call.
With 540 instruments, this is ~0.5ms, which is acceptable. But
`_instrument_step` and `_instrument_tick` each call `_resolve_instrument`
independently, so `submit_intent()` calls it three times: once for quantity,
once for price, and once for `_instrument_venue_symbol` at line 358. That is
up to three full-instrument-list iterations per order.

**Flaw: E16 — Instrument resolution called 3x per order with O(n) scan.**
**Severity: Low.** Performance, not correctness.
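
A per-asset cache removes the repeated scans. The sketch below is illustrative only: `InstrumentResolver` is hypothetical, instruments are modelled as plain dicts with a `symbol` key, and only the linear fallback scan is reproduced.

```python
class InstrumentResolver:
    """Caches asset -> instrument lookups (hypothetical; models only the O(n) scan)."""

    def __init__(self, instruments):
        self._instruments = list(instruments)
        self._cache = {}
        self.scans = 0                       # counts full-list iterations

    def resolve(self, asset):
        if asset in self._cache:
            return self._cache[asset]
        self.scans += 1
        found = next(
            (inst for inst in self._instruments if inst["symbol"] == asset), None
        )
        self._cache[asset] = found           # misses are cached too
        return found


instruments = [{"symbol": f"SYM{i}USDT", "step": "0.001"} for i in range(540)]
resolver = InstrumentResolver(instruments)

# Three lookups per order: venue symbol, quantity step, price tick.
for _ in range(3):
    instrument = resolver.resolve("SYM539USDT")
print(resolver.scans)  # 1
```

Because misses are cached as well, the cache has to be cleared whenever the instrument provider reloads, or a newly listed symbol would keep resolving to `None`.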

### E17: Cancel uses truth-based confirmation — can mask real errors

**bingx_direct.py:474-498**
```python
still_open = True
try:
    oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
    ...
    still_open = (venue_order_id in ids) if venue_order_id else (venue_client_id in cids)
except Exception:
    still_open = None

if still_open is False:
    return {"status": "CANCELED", ...}
if str(delete_resp.get("status", "")).upper() in {"CANCELED", "CANCELLED", "SUCCESS", "OK"}:
    return {"status": "CANCELED", ...}
return {"status": delete_resp.get("status", "REJECTED"), ...}
```

The cancel logic:
1. DELETE the order on BingX
2. GET open orders to verify
3. If the order is no longer open, return CANCELED
4. If the DELETE response says CANCELED, return CANCELED
5. Otherwise return REJECTED

If step 2's GET fails (network error, rate limit), `still_open=None`.
Then step 4 checks the DELETE response. If the DELETE also returned an error
(e.g., "order not found" because it was already cancelled by another caller),
`status` is `"ERROR"` or `"not found"` — neither matches `"CANCELED"`.
The cancel is reported as `REJECTED` even though the order IS cancelled.

The `bingx_venue._events_from_cancel()` then emits `CANCEL_REJECT` instead
of `CANCEL_ACK`. The Rust kernel handles `CANCEL_REJECT` at lib.rs:1218:
```rust
KernelEventKind::CANCEL_REJECT => {
    if slot.fsm_state == TradeStage::EXIT_WORKING {
        slot.fsm_state = TradeStage::EXIT_WORKING;  // no-op
    }
    diagnostic_code = KernelDiagnosticCode::CANCEL_REJECTED;
}
```

The slot stays in its current state (e.g., `EXIT_WORKING`) with no active order
(the exchange has no record of it). The slot is stuck until a manual reconcile.

**Flaw: E17 — Cancel can return false REJECTED for already-cancelled orders.**
**Severity: Medium.** Leads to stuck slot requiring manual intervention.
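
The failure mode comes from collapsing "could not verify" into `REJECTED`. A minimal sketch of a three-way classification follows; the `UNKNOWN` status and `classify_cancel` are hypothetical, and the accepted status strings are the ones quoted from `bingx_direct.py` above.

```python
OK_STATUSES = {"CANCELED", "CANCELLED", "SUCCESS", "OK"}


def classify_cancel(still_open, delete_status):
    """Classify a cancel attempt.

    still_open: True or False from the verification GET, or None if it failed.
    delete_status: status string from the DELETE response (may be None).
    """
    if still_open is False:
        return "CANCELED"                    # exchange confirms the order is gone
    if str(delete_status or "").upper() in OK_STATUSES:
        return "CANCELED"
    if still_open is None:
        return "UNKNOWN"                     # unverified: re-check, do not reject
    return "REJECTED"                        # verified still open


results = [
    classify_cancel(False, "ERROR"),
    classify_cancel(None, "ok"),
    classify_cancel(None, "ERROR"),
    classify_cancel(True, "ERROR"),
]
print(results)  # ['CANCELED', 'CANCELED', 'UNKNOWN', 'REJECTED']
```

With an explicit `UNKNOWN`, the venue adapter could retry the open-orders check before emitting `CANCEL_REJECT`, so an order that was already cancelled elsewhere does not leave the slot in `EXIT_WORKING`.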

---

## Layer 7: Fill Feedback Loop (rust_backend.py on_venue_event)

### E18: `on_venue_event` settles PnL incrementally — but fees are never included

**rust_backend.py:530-545**
```python
incremental_pnl = slot.realized_pnl - self._last_settled_pnl.get(slot.slot_id, 0.0)
if abs(incremental_pnl) > 1e-12:
    self.account.settle(incremental_pnl)
    self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
```

The Rust kernel's `apply_fill` computes realized PnL as:
```rust
let realized = Self::realized_pnl(slot, event.price, fill_size);
slot.realized_pnl += realized;
```

No fee subtraction. No commission reading from the event. The `VenueEvent`
could carry fee data via `metadata["fee"]` or `raw_payload["commission"]`,
but the Rust kernel doesn't read it and the Python bridge doesn't extract it.

Over the 142 live test scenarios on VST (where fees are 0 or negligible),
this is invisible. On live mainnet with exchange fees of 0.02-0.04%, the
ledger overstates capital by the fees paid, an error that grows linearly with
traded notional and is never corrected.

**Flaw: E18 — PnL settlement ignores fees.**
Already documented as A7. In the E2E trace, the gap is specifically here:
`VenueEvent.price` is used for `realized_pnl()` but `VenueEvent.metadata`
(which could carry `commission` from the venue) is never read.

**Severity: Medium** (grows with trade volume).
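
To give the drift a scale, the arithmetic can be worked through for an illustrative load. The notional and trade count below are assumptions chosen for the example, and the fee rate is the upper figure quoted above; `cumulative_fee_drift` is a hypothetical helper.

```python
def cumulative_fee_drift(notional_per_side, fee_rate, round_trips):
    """Fees the ledger never subtracts: entry plus exit side per round trip."""
    return 2 * notional_per_side * fee_rate * round_trips


# Assumed load: 1,000 round trips of $1,000 notional per side at 0.04% per side.
drift = cumulative_fee_drift(
    notional_per_side=1_000.0, fee_rate=0.0004, round_trips=1_000
)
print(round(drift, 2))  # 800.0
```

Under these assumptions the ledger's capital ends about $800 above the exchange balance, and the gap keeps widening at $0.80 per round trip.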

### E19: `observe_slots` called with ALL slots, not just changed ones

**rust_backend.py:538-545**
```python
slots = [self._get_slot(i) for i in range(self.max_slots)]
self.account.observe_slots(slots)
```

Every `on_venue_event` call re-reads ALL slots from the Rust kernel (N FFI
calls) and calls `observe_slots` with the full list. With `max_slots=10`,
this is 10 FFI round-trips per venue event. Each round-trip serializes a
TradeSlot to JSON, passes through C FFI, parses on the Rust side, serializes
the result, passes back, and parses on the Python side. For a multi-leg EXIT
that emits 3 venue events (ACK + PARTIAL + FULL), that's 3 × 10 = 30 slot
reads per process_intent call.

**Flaw: E19 — Full-slot-list read on every event is N×FFI overhead.**
**Severity: Low** (performance). Not a correctness issue.

---

## Layer 8: Persistence Boundary (pink_clickhouse.py)

### E20: `_capital()` reads live from `AccountProjection` — stale row risk

**pink_clickhouse.py:199-200**
```python
def _capital(self) -> float:
    return float(self.account.snapshot.capital or 0.0)
```

Every row writer calls `_capital()` at write time to get the current capital.
But `persist_result()` is called AFTER `kernel.process_intent()` returns —
at which point the account has already been settled. The `account_events`,
`position_state`, and `trade_events` rows all record the SAME capital value
(the post-settle value). `capital_before` is then reconstructed by
subtracting PnL (already documented as A5).

The effect: all ClickHouse rows for a single `process_intent()` call show
identical `capital` / `account_capital` / `portfolio_capital` values, because
they're all written within the same Python call stack with no intervening
events. This is correct for single-threaded operation — all rows reflect
POST-trade state. But it means ClickHouse querying for "capital before trade"
must use `capital_after - pnl`, which is the wrong formula under multi-slot.

**Flaw: E20 — All persistence rows write post-trade capital, not pre-trade.**
Already documented as A5 from the capital_before angle.

**Severity: High** for multi-slot accounting reconstruction.

### E21: `persist_fill_events()` synthesizes fake Decision/Intent

**pink_clickhouse.py:383-435**
```python
def persist_fill_events(self, *, snapshot, events, slot_dict, market_state):
    ...
    decision = Decision(
        timestamp=ts, decision_id=trade_id or "async", asset=asset,
        action=action, side=side, reason="ASYNC_FILL",
        confidence=0.0, velocity_divergence=0.0, irp_alignment=0.0,
        reference_price=price, target_size=cur_size, leverage=leverage,
        ...
    )
    intent = Intent(
        timestamp=ts, trade_id=trade_id, decision_id=trade_id or "async",
        ...
    )
```

The async fill pump (called by `pump_venue_events`) constructs fake
Decision/Intent objects because there's no real policy decision backing an
async fill — it just arrived from the exchange. These synthetic objects have:
- `decision_id = trade_id` (or `"async"` if trade_id is empty), so
  `decision_id` and `trade_id` are the same string whenever trade_id is set
- `confidence=0.0`, `velocity_divergence=0.0`, `irp_alignment=0.0`
- `target_size = cur_size` (the remaining size after the fill, not the
  size that was filled)

These are written to `policy_events`, `trade_reconstruction`, and
`trade_events` with the same row shapes as real policy-driven fills. Any
ClickHouse query that joins `policy_events` to `trade_events` on
`decision_id` will find matching rows (both set to `trade_id`), but the
policy_events row's `target_size` is the POST-fill size, not the pre-fill
size. A replay system that reconstructs position from `policy_events` →
`trade_reconstruction` would see incorrect sizing.

**Flaw: E21 — Async fill persistence uses synthetic decision with wrong data.**
**Severity: Medium.** Misleading historical records.

### E22: `_write_trade_exit_leg` capital_before uses arithmetic reconstruction

**pink_clickhouse.py:761-762**
```python
capital_after = self._capital()
capital_before = capital_after - pnl_leg
```

Already documented as A5. In the E2E trace, the specific path is:
1. Slot 0 exit leg fills → `_capital()` returns capital AFTER settlement
   (because the kernel's `on_venue_event` already called `account.settle`)
2. `capital_before = capital_after - pnl_leg` reconstructs pre-leg capital

If slot 1 also settled between the leg fill and the persistence write
(possible in a multi-threaded or otherwise concurrent scenario), `capital_after`
includes slot 1's PnL, and `capital_before` is wrong by exactly slot 1's contribution.

**Severity: High** for multi-slot.
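
The drift can be reproduced with plain arithmetic. The sketch below is a toy model, not the PINK accounting code: the `Account` class and the PnL figures are hypothetical, and the only assumption is that every settlement mutates one shared capital figure.

```python
class Account:
    """Toy shared-capital account (hypothetical, for illustration only)."""

    def __init__(self, capital: float) -> None:
        self.capital = capital

    def settle(self, pnl: float) -> None:
        self.capital += pnl


account = Account(25_000.0)

# Slot 0 exit leg settles; the true pre-leg capital is captured here.
true_before_slot0 = account.capital
pnl_slot0 = 40.0
account.settle(pnl_slot0)

# Slot 1 settles before slot 0's persistence write runs.
pnl_slot1 = -15.0
account.settle(pnl_slot1)

# Persistence for slot 0 now reconstructs capital_before arithmetically.
capital_after = account.capital
reconstructed_before = capital_after - pnl_slot0

# The reconstruction is off by exactly slot 1's contribution.
error = reconstructed_before - true_before_slot0
```

Capturing the capital figure at settle time and handing it to the writer, instead of re-deriving it afterwards, would sidestep the reconstruction entirely.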

### E23: `_write_trade_event` uses `slot_dict.get("entry_price")` as exit_price

**pink_clickhouse.py:813-815**
```python
entry_price = _safe_float(slot_dict.get("entry_price", 0.0), ...)
exit_price = _safe_float(slot_dict.get("entry_price", 0.0), ...)  # ← SAME FIELD
```

Already documented as A13. The `exit_price` is set to `entry_price` from
the same slot dict field. The BingX ack payload does contain the fill price,
but it's not propagated to the slot dict's `entry_price` for exit fills —
the slot's `entry_price` is set during entry fill and remains unchanged
during exit. The exit fill price is only on the `VenueEvent`, which is not
passed through to `_write_trade_event`.

The `trade_events` row in ClickHouse always shows `exit_price == entry_price`,
making PnL reconstruction from `(exit_price - entry_price) × size × lev`
impossible. The `pnl` field IS correct (it's `slot.realized_pnl`), but only
the summary is accurate — the component prices are wrong.

**Severity: Low.** `pnl` is correct, only the decomposed price is wrong.

---

## Layer 9: Test Infrastructure

### E24: `MockVenueAdapter.submit()` always emits fill on `partial_fill_ratio > 0`

**mock_venue.py:60-90**
```python
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
    fill_ratio = max(0.0, min(1.0, float(effective_ratio)))
    ...
    if is_entry:
        effective_ratio = self.scenario.entry_partial_fill_ratio if \
            self.scenario.entry_partial_fill_ratio != 1.0 else \
            self.scenario.partial_fill_ratio
    else:
        effective_ratio = self.scenario.exit_partial_fill_ratio ...
```

The default `MockVenueScenario()` has `partial_fill_ratio=1.0`. So every
`submit()` call on a default mock emits a FULL_FILL event immediately.
This means mock-venue tests always test the "order fills instantly" path —
they never test resting orders, partial fills, or async fills.

Any test that relies on the mock venue is testing a subset of real venue
behavior. The mock never produces:
- DELAYED fills (fill arrives on a later `reconcile()` call)
- PARTIAL fills with subsequent fills
- Partial fills during entry (entry fills partially, then more later)
- Mixed entry/exit partial behavior

**Flaw: E24 — Mock venue always fills synchronously — never tests async path.**
**Severity: Medium.** The `pump_venue_events()` path has never been exercised
with the mock venue.
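
One way to exercise the async path would be a test double that acknowledges on `submit()` and releases fills only on later `reconcile()` calls. The sketch below is a minimal illustration under that assumption: `DelayedFillVenue`, its constructor argument, and the plain-dict events are all hypothetical and do not mirror the real `MockVenueAdapter` or `VenueEvent` types.

```python
from collections import deque


class DelayedFillVenue:
    """Hypothetical test double: ACK on submit, fills released on reconcile()."""

    def __init__(self, fill_ratios):
        # e.g. (0.4, 0.6): 40% on the first reconcile, the rest on the second
        self._fill_ratios = tuple(fill_ratios)
        self._pending = deque()

    def submit(self, order_id, size):
        last = len(self._fill_ratios) - 1
        for i, ratio in enumerate(self._fill_ratios):
            kind = "FULL_FILL" if i == last else "PARTIAL_FILL"
            self._pending.append(
                {"order_id": order_id, "kind": kind, "filled_size": size * ratio}
            )
        # Nothing fills synchronously: the caller only sees an ACK.
        return [{"order_id": order_id, "kind": "ACK", "filled_size": 0.0}]

    def reconcile(self):
        """Release at most one queued fill per call, like a polling cycle."""
        return [self._pending.popleft()] if self._pending else []


venue = DelayedFillVenue(fill_ratios=(0.4, 0.6))
submit_events = venue.submit("t-1", size=10.0)
first = venue.reconcile()
second = venue.reconcile()
third = venue.reconcile()
```

A scenario built this way leaves the slot in a working state after `submit()`, so the fill can only reach the kernel through the pump.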

### E25: Test scenarios use MARKET-only `_si()` helper — no LIMIT tests

**gen_live_tests.py and _gen_test.py**

The `_si()` helper constructs a `KernelIntent` with `order_type="MARKET"` and
`limit_price=0.0` (the defaults). All 157 live test scenarios use `_si()`.
The 3 "LIMIT" scenarios (`limit_does_not_fill`, `limit_immediate_fill`) use
`reference_price=0.0` and `target_size=-0.001` respectively — they test
**intent validation**, not actual LIMIT order submission.

There is **zero** live-test coverage of:
- Submitting a LIMIT order that rests on the book
- A resting LIMIT being cancelled
- A resting LIMIT receiving a partial fill then a subsequent fill
- An async fill arriving via `pump_venue_events()`

The Rust kernel's `PARTIAL_FILL` event handling and the Python bridge's
`on_venue_event`, incremental settle, and async pump have never been exercised
on a live exchange.

**Flaw: E25 — Zero live tests for LIMIT/resting/async-fill paths.**
**Severity: High.** The partial-fill code path is untested in production.

### E26: Fresh-kernel reconcile tests create second kernel but share venue

**gen_live_tests.py** (fresh_kernel_reconcile_entry body)
```python
fresh = _build_fresh_kernel_from_slot(slot_data, ic=cb)
k2 = fresh.runtime.kernel
```

The `_build_fresh_kernel_from_slot` function creates a new `PinkDirectRuntime`
with a new `ExecutionKernel`. But the **venue adapter** is shared or
re-created with the same BingX backend. Two kernels making concurrent HTTP
calls to BingX through shared or separate venue adapters is exactly the
multi-threaded scenario that triggers T1 (Rust kernel UB) — except the tests
are sequential, not concurrent, so they don't trigger it.

The fresh kernel does NOT restore the venue state (open orders, positions).
The fresh kernel has a blank venue adapter state — it can't know about
previous LIMIT orders resting on the exchange. This is correct for MARKET-only
tests (no resting orders) but would fail for LIMIT tests.

**Flaw: E26 — Fresh-kernel reconcile doesn't restore venue state.**
**Severity: Medium** (would break LIMIT scenarios).

---

## Summary: Critical E2E Flaw Chain

The most dangerous E2E scenario is a **LIMIT order with partial fills** on
a live exchange:

```
1. Policy emits LIMIT ENTER                       [E3: can't happen — bridge drops order_type]
2. KernelIntent with order_type="LIMIT"            [dead code path from step 1]
3. bingx_direct.submit_intent builds LIMIT payload [works if reached]
4. BingX accepts LIMIT, returns ACK with no fill   [VenueEvent.price may be 0]
5. FSM transitions to ENTRY_WORKING                [correct]
6. RESTING LIMIT sits on book                      [no further kernel events]
7. Next policy cycle: pump_venue_events()           [E1: expensive HTTP calls]
8. Reconciled venue has no fill events              [nothing to drain]
9. Repeated cycles with no progress                 [wasteful but safe]
10. Eventually BingX fills partially               [VenueEvent arrives]
11. apply_fill PARTIAL_FILL entry branch runs       [E10: entry_price = last fill, not VWAP]
12. on_venue_event settles incremental PnL          [E18: fees not included]
13. persistence writes                              [E20/E21/E22/E23: wrong capital_before, exit_price]
14. Remaining LIMIT still rests on book             [continues to step 7]
15. Eventually full fill or cancel                  [E17: cancel can return false REJECTED]
```

**None of steps 4-15 have live test coverage.**

---

## Complete Flaw Catalog (All Layers)

| # | Flaw | Layer | Step | Severity |
|---|------|-------|------|----------|
| E1 | Unconditional pump_venue_events wastes rate limit | Runtime | R2 | Medium |
| E2 | TOCTOU between capital snapshot and intent | Runtime | R3→R8 | Medium |
| E3 | Runtime bridge drops order_type/limit_price | Bridging | R7 | **Medium** |
| E4 | TOCTOU between exit sizing and execution | Runtime | R8 | Low |
| E5 | JSON precision drift over long runs | Bridge | R8a→R8c | Low |
| E6 | Global FFI singleton no guard vs use-after-free | Bridge | R8b | **High** |
| E7 | Same-trade-id re-entry leaves stale index entries | Rust | R8c | Low |
| E8 | EXIT uses initial_size not remaining size | Rust | R8c | **High** |
| E9 | CANCEL "accepted" before cancel actually happens | Rust | R8c | Medium |
| E10 | Entry price on multi-partial fill = last fill, not VWAP | Rust | R10a | Low |
| E11 | _legacy_intent hardcodes confidence/bars_held | Venue | R9a | Info |
| E12 | Zero fill price → zero PnL | Venue | R9c | Medium |
| E13 | Stale snapshot fallback causes wrong fill delta | Venue | R9c | Medium |
| E14 | Cancel event carries stale slot_id | Venue | R9c | Low |
| E15 | Leverage-set failure and order failure share handler | Adapter | R9b | Low |
| E16 | Instrument resolution 3x per order, O(n) scan | Adapter | R9b | Low |
| E17 | Cancel returns false REJECTED for already-cancelled | Adapter | R9b | Medium |
| E18 | PnL settlement ignores fees | Bridge | R10b | **Medium** |
| E19 | Full-slot-list read on every event = N×FFI overhead | Bridge | R10b | Low |
| E20 | All persistence rows write post-trade capital | Persistence | R12 | **High** |
| E21 | Async fill uses synthetic Decision with wrong size | Persistence | R12 | Medium |
| E22 | capital_before arithmetic reconstruction wrong | Persistence | R12 | **High** |
| E23 | trade_events exit_price = entry_price | Persistence | R12 | Low |
| E24 | Mock venue always fills synchronously | Test | — | Medium |
| E25 | Zero live tests for LIMIT/async-fill paths | Test | — | **High** |
| E26 | Fresh-kernel reconcile doesn't restore venue | Test | — | Medium |

**Total: 26 E2E flaws (5 High, 11 Medium, 9 Low, 1 Info)**

The five High-severity flaws in the E2E trace (E20 and E22 are listed together):
- **E6**: Global FFI singleton + `__del__` use-after-free — memory corruption risk
- **E8**: Exit-size overshoot — slot can get stuck (A1)
- **E20/E22**: Post-trade capital in all persistence rows + arithmetic
  capital_before — ClickHouse records are misleading for accounting
- **E25**: No LIMIT/async-fill test coverage — partial-fill path is production
  code with zero live validation

---

## PASS 3 — NEW FINDINGS (Deepest E2E Trace)

### F1: `process_intent` CANCEL returns "accepted" before the cancel happens — caller gets wrong `outcome.state`

**File:** `rust_backend.py:595-614`

The CANCEL path:
1. Calls `self.venue.cancel(order)` → HTTP DELETE → returns `VenueEvent[]`
2. For each event, calls `self.on_venue_event(event)` → Rust FSM transition
3. Assembles `final_outcome` from the Rust kernel's **pre-venue-event** slot state

```python
outcome = _outcome_from_payload(result["outcome"])  # Rust CANCEL accepts (slot NOT mutated yet)
# ... venue.cancel() ...
# ... on_venue_event() for each event (now slot IS mutated) ...
final_slot = self._get_slot(outcome.slot_id)         # Re-reads post-mutation state
final_outcome = KernelOutcome(
    accepted=outcome.accepted,        # TRUE — from Rust's pre-event accept
    state=final_slot.fsm_state,       # IDLE — from post-event state
    diagnostic_code=outcome.diagnostic_code,  # "OK" — from Rust's pre-event accept
)
```

For ENTER/EXIT, the same pattern exists — the Rust kernel's `outcome` is
pre-venue. But for CANCEL the disconnect is worst: Rust returns `accepted=true`
with the slot still in `ENTRY_WORKING`, and only the subsequent
`on_venue_event(CANCEL_ACK)` transitions to `IDLE`.

**Fix:** The diagnostic code should be reconciled with the actual venue outcome,
not taken from the pre-venue Rust outcome.

**Severity: Medium**

### F2: `_last_settled_pnl` reset before `venue.submit()` — transient window

**File:** `rust_backend.py:597-604`

```python
if intent.action == KernelCommandType.ENTER and outcome.accepted:
    self._last_settled_pnl[intent.slot_id] = 0.0   # reset HERE
# ... venue.submit() called below ...
```

If `venue.submit()` fails (HTTP error, rate limit), the ENTER was accepted by
the Rust FSM but no venue order was placed. The slot is stuck in
`ORDER_REQUESTED`. If the caller retries the same ENTER, `_last_settled_pnl`
is 0.0 from the first attempt — correct for a new trade.

**Residual risk:** If the previous trade on this slot had realized PnL that was
never settled, resetting to 0.0 would lose it. Incremental settle makes that
case unreachable in practice, so the reset is safe; the real exposure is the
slot stall in `ORDER_REQUESTED` after a failed submit.

**Severity: Medium** (retry-safe, but exposes slot-stall)

### F3: `_first_invalid_intent_field` allows `leverage=0` and `target_size=0`

**File:** `rust_backend.py:295-316`

The guard catches NaN/Inf and negative `target_size`. Does NOT catch:
- `leverage=0` or negative (Rust silently falls back to 1.0)
- `target_size=0` (submits zero-quantity order to BingX)
- `reference_price=0` (mark_price ignores non-positive)
- `limit_price=0` with `order_type="LIMIT"` (BingX rejects price=0)

The zero-target-size case: a direct `process_intent(EXIT, target_size=0.0)`
computes `exit_size = 0`, submits MARKET order with quantity=0 to BingX,
which may return an error or silent no-op.

**Severity: Low** (runtime's `_exit_intent_from_slot` prevents for EXIT; direct
kernel API users can trigger it)
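
A stricter guard could look like the sketch below. It is an assumption-laden illustration, not the existing `_first_invalid_intent_field`: the function name, its keyword signature, and the string action/order-type values are hypothetical, and whether zero is a legitimate sentinel for any of these fields is not established by this trace. `reference_price` is only checked for finiteness here because the trace says non-positive values are ignored downstream.

```python
import math
from typing import Optional


def first_invalid_field(
    *, action: str, order_type: str, target_size: float,
    leverage: float, reference_price: float, limit_price: float,
) -> Optional[str]:
    """Return the name of the first invalid field, or None if all pass.

    Hypothetical stricter variant of the guard described above.
    """
    numeric = {
        "target_size": target_size, "leverage": leverage,
        "reference_price": reference_price, "limit_price": limit_price,
    }
    for name, value in numeric.items():
        if not math.isfinite(value):      # rejects NaN and +/-Inf
            return name
    if action in ("ENTER", "EXIT") and target_size <= 0.0:
        return "target_size"              # zero-quantity orders never leave
    if leverage <= 0.0:
        return "leverage"                 # no silent fallback to 1.0
    if order_type == "LIMIT" and limit_price <= 0.0:
        return "limit_price"
    return None


zero_exit = first_invalid_field(
    action="EXIT", order_type="MARKET", target_size=0.0,
    leverage=1.0, reference_price=100.0, limit_price=0.0,
)
valid_enter = first_invalid_field(
    action="ENTER", order_type="MARKET", target_size=0.5,
    leverage=3.0, reference_price=100.0, limit_price=0.0,
)
```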

### F4: `outcome.emitted_events` only contains venue events — Rust kernel's events silently dropped

**File:** `rust_backend.py:641-652`

```python
final_outcome = KernelOutcome(
    emitted_events=tuple(emitted_events),  # only from venue.submit()
)
```

The Rust kernel's `KernelOutcome` struct has `emitted_events` — currently always
empty because the Rust FSM never sets it. If a future change adds Rust-side
event emission, those events are silently dropped: `final_outcome` only uses
the Python-side list.

**Severity: Low** (no Rust-emitted events exist today)

### F5: `on_venue_event` does redundant FFI read of slot already returned by Rust

**File:** `rust_backend.py:698-706`

```python
def on_venue_event(self, event):
    result = _get_rust().on_venue_event(...)
    outcome = _outcome_from_payload(result["outcome"])
    slot_payload = result.get("slot")
    slot = _slot_from_payload(slot_payload) if slot_payload else self._get_slot(...)
    # ...
    current = self._get_slot(slot.slot_id)  # REDUNDANT — slot already has this data!
    self.projection.write_slot(current)
```

Line 706 re-reads `current` from the backend even though `slot` (from the
Rust result) already has the exact same data. Each redundant FFI read is
JSON serialize → C FFI → Rust serialize → C FFI → Python parse — ~100μs.
With 2-3 events per process_intent and 10 slots, ~3ms wasted per cycle.

**Severity: Low** (performance)

### F6: `_record_transitions` in `process_intent` records pre-venue transitions with `event=None`

**File:** `rust_backend.py:708, 650`

```python
# process_intent line 650:
self._record_transitions(outcome.transitions, final_slot, None)  # event=None

# on_venue_event line 708:
self._record_transitions(outcome.transitions, slot, event)  # event attached
```

Venue-event transitions ARE recorded individually inside each
`on_venue_event` call (line 708). The journal has all transitions. But the
pre-venue transitions (from Rust FSM before venue call) have `event=None`
attached — no event context for the journal reader.

**Severity: Informational** (diagnostic inconvenience only)

### F7: `reconcile_from_slots` writes ALL slots to projection/zinc, not just reconciled ones

**File:** `rust_backend.py:718-733`

```python
for current in slots:          # iterates ALL max_slots
    self.projection.write_slot(current)   # writes unchanged slots too
    self.zinc_plane.write_slot(current)
```

After reconcile, ALL slots are written to projection and Zinc, even if the
reconcile only modified one slot. Slots 1-9 are serialized and written with
their unchanged state. Wasteful but harmless.

Also: Rust kernel's `reconcile_slots_json` silently ignores `slot_id` out of
range — no error returned. Caller sees `accepted=true` even if no slots were
reconciled.

**Severity: Low**

### F8: `HazelcastRowWriter.put()` is synchronous with no error handling — Hazelcast failure crashes the intent

**File:** `hazelcast_projection.py:30-48`

```python
class HazelcastRowWriter:
    def __call__(self, name, row):
        if name.endswith("trade_events"):
            self.client.get_topic(name).publish(json.dumps(row, ...))
            return
        self.client.get_map(name).put(key, json_safe(row))  # synchronous, no try/except
```

No try/except. Hazelcast `put()` is synchronous — blocks until the cluster
acknowledges. If Hazelcast is down, under load, or partitioned, this:

1. Blocks the calling thread (which holds the Rust kernel handle — no other
   operation can proceed)
2. Raises an exception that propagates through `_set_slot()` → `process_intent()`
   → crashes the entire intent

**Severity: Medium** (Hazelcast failure in hot path stalls execution)
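
A minimal containment sketch, assuming the writer is an arbitrary `callable(name, row)`: the wrapper below logs and counts projection failures instead of letting them propagate into `process_intent()`. `GuardedRowWriter` and its `failed_writes` counter are hypothetical names. Note that this only addresses the exception path; it does not bound how long a synchronous `put()` may block, which would need a timeout or a background queue.

```python
import logging

logger = logging.getLogger("projection")


class GuardedRowWriter:
    """Hypothetical wrapper: projection failures are logged and counted, not raised."""

    def __init__(self, inner):
        self._inner = inner          # any callable(name, row)
        self.failed_writes = 0

    def __call__(self, name, row):
        try:
            self._inner(name, row)
            return True
        except Exception:
            # Projection is a side channel; the intent must not die with it.
            self.failed_writes += 1
            logger.exception("projection write failed for %s", name)
            return False


def broken_writer(name, row):
    raise ConnectionError("cluster unreachable (simulated)")


writer = GuardedRowWriter(broken_writer)
ok = writer("slots", {"slot_id": 0})
```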

### F9: `RealZincPlane.write_slot()` serializes ALL slots, not just the changed one

**File:** `real_zinc_plane.py:205-212`

```python
def write_slot(self, slot):
    with self._lock:
        self._slot_cache[int(slot.slot_id)] = slot
        payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
        self._write_region(self.state_region, self._state_seq, payload)
```

Every single-slot write serializes ALL `slot_count` slots (default 10) to JSON.
With VenueOrder metadata, each slot payload can be ~1-5KB → 10-50KB per write.
This is written to Zinc shared memory on every `process_intent()` and
`on_venue_event()` call.

`InMemoryZincPlane` does NOT have this problem — it only stores the one slot.

**Severity: Low** (performance + Zinc shared-memory capacity waste)

### F10: `RealZincPlane.write_slot` zeros buffer before write — concurrent read sees empty data

**File:** `real_zinc_plane.py:255-263`

```python
def _write_region(self, region, seq, payload):
    buf = region.as_buffer()
    view = memoryview(buf)
    view[:] = b"\x00" * len(view)     # Zeros the buffer
    view[: len(packet)] = packet       # Writes packet
    region.notify()
```

Between the zero and the write, any concurrent reader sees zeros or a truncated
packet. `_decode_packet` checks `size <= len(buf) - 16` — a partially-written
packet fails validation and returns `{}`. The reader (e.g., another thread
calling `read_slots()`) gets an empty result.

Window is microseconds but it exists. No version guard — reader always returns
whatever is in the region.

**Severity: Low** (brief window, no corruption — just empty results)
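
A sequence-counter (seqlock-style) protocol is one common way to give readers a version guard. The sketch below illustrates the idea on a plain `bytearray`. The 12-byte header layout is hypothetical and is not the real `_encode_packet` format, which this trace does not show; pure Python also gives no cross-process memory-ordering guarantees, so treat it as a protocol illustration only.

```python
import json
import struct

HEADER = struct.Struct("<QI")  # sequence counter, payload length


def write_region(buf: bytearray, payload: dict) -> None:
    data = json.dumps(payload).encode("utf-8")
    if HEADER.size + len(data) > len(buf):
        raise ValueError("payload does not fit in region")
    seq, _ = HEADER.unpack_from(buf, 0)
    struct.pack_into("<Q", buf, 0, seq + 1)        # odd: write in progress
    buf[HEADER.size:HEADER.size + len(data)] = data
    HEADER.pack_into(buf, 0, seq + 2, len(data))   # even: publish new length


def read_region(buf: bytearray, max_retries: int = 100):
    for _ in range(max_retries):
        seq_before, length = HEADER.unpack_from(buf, 0)
        if seq_before % 2:             # writer active, try again
            continue
        data = bytes(buf[HEADER.size:HEADER.size + length])
        seq_after, _ = HEADER.unpack_from(buf, 0)
        if seq_before == seq_after:    # no write overlapped the copy
            return json.loads(data) if length else {}
    return None                        # distinct from "empty region"


region = bytearray(256)
write_region(region, {"slots": [{"slot_id": 0}]})
snapshot = read_region(region)

# Simulate a reader arriving mid-write: the sequence counter is odd.
mid_write = bytearray(region)
struct.pack_into("<Q", mid_write, 0, 3)
blocked = read_region(mid_write, max_retries=5)
```

With this shape a reader can tell "writer was busy" (`None`) apart from "region is empty" (`{}`), which the zero-then-write approach cannot.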

### F11: `RealZincPlane._write_region` has no partial-write recovery

**File:** `real_zinc_plane.py:255-263`

If `_encode_packet` raises (JSON serialization error), the method raises before
writing — region retains previous content. Safe.

If `view[:] = b"\x00"` fails (memory error), the region is partially zeroed.
Not recoverable. No fallback.

**Severity: Low** (memory errors are extremely rare)

### F12: `InMemoryZincPlane` intent_region grows without bound

**File:** `zinc_plane.py:83-85`

```python
def publish_intent(self, intent):
    self.intent_region.append(intent)   # unbounded growth
```

`self.intent_region` is `List[KernelIntent]` — grows on every `publish_intent`
call. Over thousands of policy cycles, this grows without bound.

`RealZincPlane.publish_intent()` limits to last 512 entries in shared memory,
but its `self._intent_cache` (in-memory) also grows without bound.

**Severity: Low** (memory leak — ~MB/day)
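
Bounding the in-memory history is straightforward with `collections.deque`. The sketch below is illustrative: `BoundedIntentLog` is a hypothetical stand-in for the list-backed region, and the 512 default simply mirrors the shared-memory window mentioned above.

```python
from collections import deque

MAX_INTENTS = 512  # mirrors the shared-memory window mentioned above


class BoundedIntentLog:
    """Hypothetical replacement for an unbounded intent list."""

    def __init__(self, maxlen: int = MAX_INTENTS) -> None:
        self._intents = deque(maxlen=maxlen)

    def publish_intent(self, intent) -> None:
        self._intents.append(intent)  # oldest entry is evicted once full

    def recent(self):
        return list(self._intents)


log = BoundedIntentLog(maxlen=3)
for i in range(5):
    log.publish_intent(f"intent-{i}")
kept = log.recent()
```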

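### F13: `InMemoryZincPlane` `threading.Condition` re-entrancy depends on the underlying lock

**File:** `zinc_plane.py:41-43`

```python
_signal: threading.Condition = field(default_factory=threading.Condition)
```

A `threading.Condition` is only as re-entrant as the lock it wraps. Constructed
with no argument, as here, Python backs it with an `RLock`, so a same-thread
callback into `publish_intent` while the condition is held does not deadlock.
The deadlock appears only if a plain `threading.Lock` is ever passed in.

**Severity: Low** (no current code path triggers this, but the guarantee is
implicit in the default constructor and easy to lose in a refactor)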
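
Whether a given condition tolerates same-thread re-entry can be checked directly with the standard library. In the sketch below, `can_reenter` is a helper name introduced here; it probes a second acquire with a non-blocking call so that a non-re-entrant lock reports failure instead of hanging.

```python
import threading


def can_reenter(cond: threading.Condition) -> bool:
    """Probe whether the thread already holding `cond` can acquire it again."""
    with cond:
        # Non-blocking so a non-re-entrant lock returns False instead of hanging.
        acquired = cond.acquire(blocking=False)
        if acquired:
            cond.release()
        return acquired


default_cond = threading.Condition()                     # wraps an RLock
plain_lock_cond = threading.Condition(threading.Lock())  # wraps a plain Lock

default_is_reentrant = can_reenter(default_cond)
plain_is_reentrant = can_reenter(plain_lock_cond)
```

The result depends entirely on which lock the condition was built with, so a unit test of this shape would catch a refactor that swaps the underlying lock.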

### F14: `KernelSlotView.__setattr__` round-trips unknown fields through Rust — silently dropped

**File:** `rust_backend.py:370-395`

If a new field is added to Python's `TradeSlot` that Rust's `TradeSlot` doesn't
know about, `slot.to_dict()` includes it. `_set_slot` serializes to JSON and sends
it to Rust, which deserializes with `#[serde(default)]`. That attribute only fills
in missing fields; unknown fields are dropped because serde ignores them unless
`deny_unknown_fields` is set. The round-trip loses data without warning.

The reverse: if Rust adds a field that Python doesn't know about,
`_slot_from_payload` ignores unknown keys. Also silently dropped.

**Severity: Low** (fields must be added to both sides atomically; no guard)

### F15: `on_venue_event` loop in `process_intent` stops on first exception — slot left in partial state

**File:** `rust_backend.py:599-610`

```python
for event in emitted_events:
    evt_outcome = self.on_venue_event(event)  # NO TRY/EXCEPT
```

If `self.on_venue_event(event)` raises (FFI error, null pointer, OOM), the loop
stops. Events after the failing event are never processed. The slot is in a
partial state — some events applied, some not.

**Concrete scenario:** ACK arrives first → applied. FULL_FILL arrives second
→ FFI error, exception raised. Slot is stuck in `ENTRY_WORKING` with `size=0`.
Next `process_intent(EXIT)` returns `NO_OPEN_POSITION`. **No recovery path exists.**

**Severity: High** — single exception during fill feedback leaves slot
unrecoverable. Zero defense in depth.
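
A first layer of defence could be a loop that stops at the failing event but reports what was left unprocessed, so the caller can retry or flag the slot for reconcile. The sketch below assumes events must be applied in order; `apply_events` and its return shape are hypothetical, and `handler` merely stands in for `on_venue_event`.

```python
def apply_events(events, handler):
    """Apply events in order; on failure, report what was left unprocessed.

    `handler` stands in for on_venue_event (hypothetical signature).
    Returns (applied, pending, error); pending starts at the failing event.
    """
    events = list(events)
    for index, event in enumerate(events):
        try:
            handler(event)
        except Exception as exc:  # broad on purpose: FFI failures vary
            return events[:index], events[index:], exc
    return events, [], None


def flaky_handler(event):
    # Simulates the scenario above: ACK applies, FULL_FILL hits an FFI error.
    if event == "FULL_FILL":
        raise RuntimeError("simulated FFI error")


applied, pending, error = apply_events(["ACK", "FULL_FILL"], flaky_handler)
```

Preserving the `pending` tail keeps ordering intact; skipping the failed event and applying later ones could leave the FSM in a state no real event sequence produces.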

### F16: `venue.submit()` returning empty events leaves slot in `ORDER_REQUESTED`

**File:** `rust_backend.py:599-610`

If `venue.submit()` returns `[]` (venue rejected order with no response, or
internal error), the `for` loop doesn't run. No `on_venue_event` is called.
Slot stays in Rust's pre-venue state (`ORDER_REQUESTED`).

`final_outcome` has `accepted=true, state=ORDER_REQUESTED, emitted_events=[]`.
Caller sees "successful" but no exchange order exists. Slot stuck in
`ORDER_REQUESTED` until `pump_venue_events()` or manual reconcile.

**Severity: Medium** — silent slot stall with no error indication.

### F17: Cancel truth-based confirmation returns `REJECTED` for already-cancelled orders on GET failure

**File:** `bingx_direct.py:474-498`

```python
try:
    oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
    still_open = (venue_order_id in ids)
except Exception:
    still_open = None  # GET failed

if still_open is False:
    return {"status": "CANCELED", ...}
# still_open is None (GET failed) or True (order still on book)
# Falls through to DELETE response check
```

If the DELETE succeeded but the verification GET failed (network blip, rate limit
on the verification endpoint), `still_open=None`. The code then checks the DELETE
response. If the DELETE returned an ambiguous error (e.g., "order not found"
because it was already cancelled by another path), the status is "ERROR" —
reported as REJECTED even though the order IS cancelled.

The `bingx_venue._events_from_cancel()` emits `CANCEL_REJECT`. The Rust FSM
handles `CANCEL_REJECT` as a no-op — slot stays in `EXIT_WORKING` with no
active order. Stuck until `pump_venue_events()` or manual reconcile.
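
A three-way cancel outcome could be modelled as below. This is a sketch under stated assumptions: `CancelResult`, `classify_cancel`, and the `delete_ok` flag are hypothetical names, and the rule that a clean DELETE response is trusted when verification is unavailable is a design choice made for the example, not something the adapter is known to do.

```python
from enum import Enum
from typing import Optional


class CancelResult(Enum):
    CANCELED = "CANCELED"      # verified absent, or DELETE succeeded cleanly
    UNKNOWN = "UNKNOWN"        # verification unavailable; needs reconcile
    STILL_OPEN = "STILL_OPEN"  # verified present on the book


def classify_cancel(still_open: Optional[bool], delete_ok: bool) -> CancelResult:
    """Map (verification GET result, DELETE result) to a three-way outcome."""
    if still_open is False:
        return CancelResult.CANCELED
    if still_open is True:
        return CancelResult.STILL_OPEN
    # GET failed: trust a clean DELETE, otherwise stay undecided.
    return CancelResult.CANCELED if delete_ok else CancelResult.UNKNOWN


# The case described above: DELETE ambiguous, verification GET failed.
ambiguous = classify_cancel(still_open=None, delete_ok=False)
```

An `UNKNOWN` result would let the venue layer schedule a reconcile instead of emitting `CANCEL_REJECT` for an order that may already be gone.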

**Severity: Medium.** The cancel result needs three states rather than two:
"definitely cancelled," "probably cancelled," and "definitely not cancelled."

### F18: Leverage-set and order-submit failures share error handler — poor diagnostics

**File:** `bingx_direct.py:376-417`

```python
await self._client.signed_post("/openApi/swap/v2/trade/leverage", ...)  # step A
# ...
ack_payload = await self._client.signed_post("/openApi/swap/v2/trade/order", payload)  # step B
```

If step A fails (400 for invalid symbol), the exception handler at line 417
catches `BingxHttpError` and returns REJECTED. No way for the caller to know
whether the leverage set failed or the order submission failed — both go through
the same handler. The error message just says "REJECTED."

Also: if step A succeeds and step B fails, leverage was changed on the exchange
but no order was placed. System state unchanged (leverage changes don't affect
capital), but diagnostics are poor.

**Severity: Low** (correct behavior, poor diagnostics)

### F19: `_events_from_submit` stale snapshot fallback → wrong fill detection

**File:** `bingx_venue.py:375-400`

`_filled_size_from_snapshots()` diffs position quantity before and after
submit. The "before" snapshot comes from `_backend_snapshot()` which can
return stale data (E13). A stale "before" against a fresh "after" produces
a wrong diff — could be negative, zero, or larger than reality.

This wrong diff propagates to `emitted_events` — the `PARTIAL_FILL` or
`FULL_FILL` event has wrong `filled_size`. The Rust kernel's `apply_fill`
uses this wrong `filled_size` to set `slot.size`. Capital settles on the
wrong delta.

**Severity: Medium** — wrong fill size propagates to kernel state and PnL.

### F20: `__del__` frees Rust handle at unpredictable GC time — no explicit `close()`

**File:** `rust_backend.py:558-566`

```python
def __del__(self):
    backend = getattr(self, "_backend", None)
    if backend is not None:
        try: _get_rust().destroy(backend)
        except: pass
```

`ExecutionKernel` has no `close()` method. The Rust `KernelHandle` is only
freed by `__del__`, which runs at an unpredictable time: whenever the last
reference is dropped or a garbage-collection pass reclaims the object, on
whichever thread triggers it. If any code holds a stale reference to
`self._backend`, the pointer dangles when the kernel is collected.

`DITAv2LauncherBundle.close()` calls `_maybe_close` on venue, zinc, and
control plane — but NOT on kernel (which has no `close()` or `disconnect()`).
The kernel is leaked until GC.

**Severity: Medium** — reliance on `__del__` for critical C resource cleanup.
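
Deterministic cleanup is usually expressed as an idempotent `close()` plus context-manager support, with `__del__` kept only as a safety net. The sketch below shows that shape; `KernelHandleOwner` and the injected `create`/`destroy` callables are hypothetical stand-ins for the FFI create and destroy calls.

```python
class KernelHandleOwner:
    """Hypothetical owner of a native handle with deterministic cleanup."""

    def __init__(self, create, destroy):
        self._backend = None         # set first so close() is always safe
        self._destroy = destroy
        self._backend = create()

    def close(self):
        # Swap to None before destroying so a second close is a no-op.
        backend, self._backend = self._backend, None
        if backend is not None:
            self._destroy(backend)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def __del__(self):
        # Safety net only; normal shutdown should go through close().
        try:
            self.close()
        except Exception:
            pass


destroyed = []
with KernelHandleOwner(create=lambda: 42, destroy=destroyed.append) as owner:
    pass
owner.close()  # already closed: must not destroy twice
```

With a `close()` in place, a bundle-level shutdown could release the kernel explicitly and in a defined order relative to the venue and Zinc plane.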

### F21: `DITAv2LauncherBundle.close()` closes venue before kernel is done with it

**File:** `launcher.py:90-95`

```python
def close(self):
    _maybe_close(self.venue)       # Closes HTTP client
    _maybe_close(self.zinc_plane)  # Closes Zinc regions
```

If the kernel is mid-`process_intent` in another thread (hypothetical —
single-threaded in practice), `venue.submit()` would fail because the HTTP
client is already closed. No ordering enforcement.

**Severity: Low** (single-threaded deployment)

### F22: Silent fallback from real Zinc/Hazelcast to in-memory on error — operator unaware

**File:** `control.py:210-217`, `launcher.py:175-185`, `projection.py:30-40`

```python
def build_control_plane(...):
    if real_requested:
        try:
            return RealZincControlPlane(...)
        except Exception:
            pass  # SILENT — operator never knows
    return ZincControlPlane(snapshot=snapshot)
```

Three places have this pattern. An operator who configures `DITA_V2_ZINC=REAL`
and Zinc isn't available gets in-memory storage without any warning, error, or
log. The `ZincPlane` protocol has no introspection method to check if it's
real or in-memory.

The same applies to Hazelcast projection and the venue adapter.

**Severity: Medium** — configuration errors are silently masked.
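
A louder fallback could log the failure, report whether the real backend is in use, and optionally refuse to fall back at all. The sketch below is a generic illustration: `build_with_fallback`, its `strict` flag, and the `(plane, is_real)` return shape are hypothetical and not part of the existing builders.

```python
import logging

logger = logging.getLogger("dita.launcher")


def build_with_fallback(name, real_factory, fallback_factory, *,
                        real_requested, strict=False):
    """Return (plane, is_real). Hypothetical helper, not the existing builder."""
    if real_requested:
        try:
            return real_factory(), True
        except Exception:
            if strict:
                raise  # operator asked for REAL: fail loudly
            logger.warning(
                "%s: real backend unavailable, using in-memory fallback",
                name, exc_info=True,
            )
    return fallback_factory(), False


def broken_zinc():
    raise OSError("zinc unavailable (simulated)")


plane, is_real = build_with_fallback(
    "zinc", broken_zinc, dict, real_requested=True
)
```

Returning `is_real` gives callers the introspection hook that the protocol currently lacks, without changing the plane objects themselves.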

### F23: `VenueEvent.size` = `intent.target_size` not actual fill — wrong for multi-leg EXIT

**File:** `bingx_venue.py:410-420`

```python
base_event = VenueEvent(
    size=float(intent.target_size or 0.0),  # target, not fill
)
```

For an EXIT leg, `intent.target_size` is the intended exit size. The ACK
event's `size` reflects the target, not the actual fill. For fully-filled
MARKET orders, `target == fill` so it's invisible. For partially-filled
LIMIT orders, `size` on the ACK is wrong.

The fill event later has `filled_size` from the venue's `executedQty`, so
the downstream kernel uses the correct fill size. The ACK's `size` is
unused by the kernel (the kernel uses `filled_size` for PnL computation).

**Severity: Informational** (unused by kernel)

### F24: `asyncio.run()` inside async function in test generator — nested event loops

**File:** `_build_pink_extended.py:75-81`

```python
def _check_open_orders(c, vs):
    r = __import__('asyncio').run(c._request_json("GET", ...))
```

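`asyncio.run()` is reached from inside an `async def` context (the test body is
async). Standard `asyncio.run()` does not nest: when a loop is already running
on the calling thread it raises `RuntimeError` rather than suspending the outer
loop. For this helper to work, it must in fact be reached from synchronous code,
from a worker thread, or under a loop-patching shim; which of these applies is
not visible in this trace. Called directly from a coroutine, it fails.

**Severity: Low** (works in practice)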
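
The behaviour of `asyncio.run()` under a running loop can be confirmed with the standard library alone. In the sketch below, `fetch_open_orders` is a hypothetical stand-in for the HTTP call; the first test body tries to nest `asyncio.run()`, the second simply awaits the coroutine.

```python
import asyncio


async def fetch_open_orders():
    """Hypothetical stand-in for the HTTP request."""
    return ["order-1"]


async def body_with_nested_run():
    coro = fetch_open_orders()
    try:
        asyncio.run(coro)          # a loop is already running on this thread
    except RuntimeError as exc:
        coro.close()               # avoid a "never awaited" warning
        return f"RuntimeError: {exc}"
    return "ran"


async def body_with_await():
    return await fetch_open_orders()


nested_result = asyncio.run(body_with_nested_run())
awaited_result = asyncio.run(body_with_await())
```

Making the helper itself `async def` and awaiting it from the test body removes the question of which loop is running.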

### F25: `_build_fresh_kernel_from_slot` leaks old kernel objects per call

**File:** `_build_pink_extended.py:95-108`

```python
def _build_fresh_kernel_from_slot(slot_data, ic=25000.0):
    cfg = _build_config(ic)
    b = build_launcher_bundle(venue_mode="BINGX", ...)  # NEW bundle, OLD not closed
    k = b.kernel
    return RB(runtime=Shim(k), config=cfg)
```

Each call creates a new launcher bundle (new kernel, new Rust handle, new HTTP
client, new Zinc plane) without closing the old one. Called 4 times across the
fresh-kernel test bodies. Leaks ~50MB per call (Rust lib, HTTP connections).

**Severity: Low** (test infrastructure only)

### F26: `seen_event_ids` not cleared on re-entry — event IDs accumulate across trades

**File:** `lib.rs:672-683`

When a slot re-enters (new ENTER after previous EXIT), the Rust kernel resets
most fields (lib.rs:740-765) but does NOT clear `seen_event_ids`. The new
trade inherits the previous trade's event history up to `MAX_SEEN_EVENT_IDS`
(256). After 256 events across multiple trades, old IDs are drained.

For MARKET trading (2-4 events per trade), the 256-entry window spans roughly
64-128 trades (256 / 4 to 256 / 2) before old IDs drain. For LIMIT trading
(many partial fills), it could be as few as 5-10 trades.

**Fix:** `slot.seen_event_ids.clear()` on ENTER.

**Severity: Low** (event ID collision across trades is astronomically unlikely)

### F27: `RealZincControlPlane.read()` parses Zinc region every call — no caching

**File:** `real_control_plane.py:88-94`

```python
def read(self):
    payload = _decode_packet(self.region.as_buffer())  # JSON parse every call
    control = payload.get("control")
    self._snapshot = KernelControlSnapshot(**control)   # reconstruct every call
    return self._snapshot
```

Called by `ExecutionKernel.control` property on every `process_intent()`.
Each call re-constructs a `KernelControlSnapshot` from dict — allocating
new objects for every field. ~50μs per call. A simple cached-until-modified
pattern would eliminate all parses between writes.

**Severity: Low** (performance)
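
A cached-until-modified reader might look like the sketch below. It assumes the region exposes a cheap write counter that can be read without decoding the payload; that assumption is not confirmed for the control-plane region, and `CachedControlReader` with its injected `read_seq` and `decode` callables is hypothetical.

```python
class CachedControlReader:
    """Hypothetical cache keyed on a region sequence number."""

    def __init__(self, read_seq, decode):
        self._read_seq = read_seq    # cheap: returns the region's write counter
        self._decode = decode        # expensive: parses and builds the snapshot
        self._cached_seq = None
        self._snapshot = None

    def read(self):
        seq = self._read_seq()
        if seq != self._cached_seq:  # region changed since the last parse
            self._snapshot = self._decode()
            self._cached_seq = seq
        return self._snapshot


state = {"seq": 1, "decodes": 0}


def decode():
    state["decodes"] += 1
    return {"mode": "LIVE", "seq": state["seq"]}


reader = CachedControlReader(lambda: state["seq"], decode)
first = reader.read()
second = reader.read()   # served from cache, no decode
state["seq"] = 2         # simulate a control-plane write
third = reader.read()    # counter changed, decoded again
```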

### F28: `_legacy_intent` hardcodes `confidence=1.0` and `bars_held=0`

**File:** `bingx_venue.py:270-285`

These fields are in `LegacyIntent` but unused by `submit_intent()` (which
only reads `asset`, `side`, `action`, `target_size`, `leverage`, `metadata`).
The downstream ClickHouse rows use the policy-layer `Intent`, not `LegacyIntent`,
so the hardcoded values don't reach persistence.

Only propagates through the venue adapter's internal chain. No consumer reads
them today.

**Severity: Informational**

### F29: `_slot_to_payload` in `real_zinc_plane.py` is dead code

**File:** `real_zinc_plane.py:57-59`

```python
def _slot_to_payload(slot):
    data = slot.to_dict()
    return data
```

Defined, never called anywhere in the file. All slot serialization calls
`slot.to_dict()` directly.

**Severity: Informational**

### F30: Duplicate `_slot_from_payload` in `real_zinc_plane.py` and `rust_backend.py`

**File:** `real_zinc_plane.py:62-112`, `rust_backend.py:270-310`

Two nearly identical implementations. The `real_zinc_plane` version manually
constructs `VenueOrder` objects (lines 63-88) with different defaults
(e.g., fallback to slot `size` if `intended_size` missing). The `rust_backend`
version delegates to `_order_from_payload` with all-default fallbacks.

If fields are added to `TradeSlot` or `VenueOrder`, both must be updated.

**Severity: Low** (code duplication risk)

---

## Complete Flaw Catalog

### All-Passes Combined

| Family | Focus | Count | Critical | High | Medium | Low | Info |
|--------|-------|-------|----------|------|--------|-----|------|
| A | Architectural (old 13, now superseded) | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 5 | 11 | 9 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| **Total** | | **80** | **1** | **11** | **22** | **30** | **16** |

### Most Dangerous Single Flaw: F15

An exception in `on_venue_event()` during the fill-feedback loop stops the
chain mid-apply. The ACK applied but the FILL didn't. Slot in `ENTRY_WORKING`
with no position. **No retry mechanism, no recovery path.** The slot is stuck
forever until manual intervention. Zero defense in depth — no try/except, no
undo, no validation that the slot reached a consistent state.

This is the single highest-impact E2E flaw because it requires no concurrency,
no race condition, no unusual market conditions — just a transient FFI error
during normal operation.

---

## PASS 4 — SYSTEMATIC DOMAIN SCANS (Config, Rust, Persistence, Lifecycle)

### Rust Kernel — Numeric & FSM Invariants

#### G1: EXIT_RESIDUAL action is entirely missing from Rust KernelCommandType

**File:** `_rust_kernel/src/lib.rs`

```rust
string_enum! {
    enum KernelCommandType {
        ENTER, EXIT, MARK_PRICE, RECONCILE, CONTROL, CANCEL,
    }
}
```

Six variants. **No `EXIT_RESIDUAL`.** If any caller submits an intent with `action = "EXIT_RESIDUAL"`, the string_enum deserializer fails — serde returns `INVALID_INTENT_PARSE`. Even if deserialization worked, there's no branch to handle residual-position cleanup. Any position with remaining size after partial exit legs has **no way to trigger a clean-up exit** via the intent system.

The Python `KernelCommandType` enum (contracts.py) does have `EXIT_RESIDUAL`, translated to `"EXIT_RESIDUAL"` string by `_intent_to_payload`. This string hits Rust's string_enum → parse error → `INVALID_INTENT_PARSE`.

**Fix:** Add `EXIT_RESIDUAL` variant to Rust enum + match arm that skips the `NO_OPEN_POSITION` guard for residual-sized positions.

**Severity: Critical**
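
A cheap way to catch this class of drift is a parity check between the two enums. The sketch below assumes the Python member list (only `EXIT_RESIDUAL` is confirmed by the text above; the other members are stand-ins) and hard-codes the six Rust variants from the excerpt.

```python
from enum import Enum


class KernelCommandType(Enum):  # stand-in for the enum in contracts.py
    ENTER = "ENTER"
    EXIT = "EXIT"
    EXIT_RESIDUAL = "EXIT_RESIDUAL"
    MARK_PRICE = "MARK_PRICE"
    RECONCILE = "RECONCILE"
    CONTROL = "CONTROL"
    CANCEL = "CANCEL"


# The six variants listed in the Rust string_enum! excerpt.
RUST_COMMAND_TYPES = frozenset(
    {"ENTER", "EXIT", "MARK_PRICE", "RECONCILE", "CONTROL", "CANCEL"}
)


def unsupported_by_rust(py_enum=KernelCommandType, rust_variants=RUST_COMMAND_TYPES):
    """Return Python actions that the Rust deserializer would reject."""
    return sorted(m.value for m in py_enum if m.value not in rust_variants)
```

Run as a unit test, a non-empty result fails the build before a mismatched action can reach the FFI boundary.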

#### G2: `into_c_string` uses `unwrap()` — panics on interior NUL byte

**File:** `_rust_kernel/src/lib.rs:1477`

```rust
fn into_c_string(value: &str) -> *mut c_char {
    CString::new(value).unwrap().into_raw()
}
```

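`CString::new()` returns `Err` if the string contains a NUL (`'\0'`) byte, and `.unwrap()` turns that into a panic at the C FFI boundary that **takes down the entire process**. Note that `serde_json::to_string()` escapes control characters inside string values (a NUL becomes `\u0000`), so JSON produced by serde_json should not itself contain a raw NUL. The exposure is any string that reaches `into_c_string` without passing through that serializer and carries a user-controlled field from `KernelIntent`, `VenueEvent`, or `TradeSlot`.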

Triggered by every FFI call that returns a string:
- `dita_kernel_process_intent_json`
- `dita_kernel_on_venue_event_json`
- `dita_kernel_reconcile_slots_json`
- `dita_kernel_snapshot_json`
- `dita_kernel_get_slot_json`

**Fix:** Replace `.unwrap()` with a fallible conversion such as `CString::new(value).map_or(ptr::null_mut(), CString::into_raw)`, or feed the failure through `invalid_intent_cstring`.

**Severity: Critical**

#### G3: `process_intent` EXIT hardcodes `prev_state = POSITION_OPEN` unconditionally

**File:** `_rust_kernel/src/lib.rs:842-890`

```rust
slot.fsm_state = TradeStage::EXIT_REQUESTED;        // unconditional override
let transition = self.transition(
    &slot,
    TradeStage::POSITION_OPEN,                        // always POSITION_OPEN
    slot.fsm_state.clone(),
    "EXIT_INTENT",
);
```

Three problems:

(a) **Transition prev_state is a lie.** If the slot was in `EXIT_WORKING`, `EXIT_SENT`, `EXIT_REQUESTED`, or `POSITION_PARTIALLY_CLOSED`, the transition record says `POSITION_OPEN` — wrong.

(b) **Backward transition.** If the slot is `EXIT_WORKING` and a new EXIT intent arrives, `fsm_state` is set to `EXIT_REQUESTED` — a backward transition from `EXIT_WORKING` → `EXIT_REQUESTED`. This corrupts the FSM.

(c) **No state guard.** EXIT should only be allowed from `POSITION_OPEN`, `EXIT_WORKING` (for additional legs), or `POSITION_PARTIALLY_CLOSED`. Currently any state that passes `!is_free() && !closed && size > 0` can transition to `EXIT_REQUESTED`.

**Fix:** Check actual FSM state before allowing EXIT, log actual prev_state, guard against backward transitions.

**Severity: Critical**

#### G4: `consume_exit_leg` advances beyond last valid index — stale `all_legs_done` variable

**File:** `_rust_kernel/src/lib.rs:1420-1435`

```rust
let all_legs_done = slot.active_leg_index >= slot.exit_leg_ratios.len(); // (A)
let should_close = (slot.size <= 1e-12 || (!partial && all_legs_done));  // (B)

if !partial {
    slot.consume_exit_leg();  // (C) — advances active_leg_index POST (A)
}

if should_close && slot.size <= 1e-12 {         // (D) — close
} else if !partial && !all_legs_done {           // (E) — stale! uses (A) not post-advance index
```

On the last leg (`active_leg_index = len - 1`):
- (A): `all_legs_done = false` (pre-advance)
- (C): advances to `len` (exhausted)
- (E): `!partial && !false` = true → enters `POSITION_OPEN` instead of examining `should_close` with post-advance index

The `all_legs_done` variable is captured **before** `consume_exit_leg` advances the index. Branch (E) should use the post-advance index to correctly detect exhaustion.

After exhaustion, `next_exit_ratio()` returns `1.0` (out-of-bounds `unwrap_or(1.0)`) — silently tries to exit remaining size as 100% instead of detecting completion.

**Severity: Critical**
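
The ordering problem is easier to see in a small model. The sketch below is written in Python for brevity and only mirrors steps (A), (C), (D), and (E) from the excerpt; the function name, the `post_advance` switch, and the `"OTHER"` label for the branches not shown are all illustrative.

```python
def classify_after_leg(active_leg_index, n_legs, size, partial, *, post_advance):
    """Model of the branch selection around consume_exit_leg().

    Returns (branch, new_index). post_advance=False reproduces the stale
    flag; post_advance=True recomputes it after the index moves.
    """
    all_legs_done = active_leg_index >= n_legs            # (A) pre-advance
    if not partial:
        active_leg_index += 1                             # (C) consume_exit_leg()
        if post_advance:
            all_legs_done = active_leg_index >= n_legs    # recomputed after (C)
    if size <= 1e-12:
        return "CLOSE", active_leg_index                  # (D)
    if not partial and not all_legs_done:
        return "POSITION_OPEN", active_leg_index          # (E)
    return "OTHER", active_leg_index                      # branches not in the excerpt
```

On the last of three legs with size remaining, the stale flag selects `POSITION_OPEN` while the recomputed flag does not, which is the divergence described above.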

#### G5: `realized_pnl` uses unbounded f64 — overflows to inf at extreme values

**File:** `_rust_kernel/src/lib.rs:648-656`

```rust
let notional = exit_size * slot.entry_price * slot.leverage.max(1.0);
delta * notional
```

No `is_finite()` check on intermediate products. At `exit_price=1e200`, `entry_price=1e-200`: `delta` = `(1e200 - 1e-200) / 1e-200` ≈ `1e400` → `inf`. The resulting `inf` is stored in `slot.realized_pnl`, corrupting all future PnL tracking.

Subnormals: `entry_price=5e-324` (the smallest subnormal) makes the division overflow to `inf` for any ordinary exit price, because the quotient exceeds `f64::MAX` (about `1.8e308`).

**Fix:** Add `is_finite()` guards on both prices and cap intermediate products.

**Severity: High**
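
The overflow is plain IEEE-754 double arithmetic, so it can be reproduced outside the kernel. The sketch below restates the two quoted expressions in Python and adds the suggested guards; the function signature and the choice to return `None` are assumptions, and side handling is omitted.

```python
import math


def realized_pnl(exit_price, entry_price, exit_size, leverage):
    """Guarded PnL: returns None instead of a non-finite result."""
    inputs = (exit_price, entry_price, exit_size, leverage)
    if not all(math.isfinite(x) for x in inputs) or entry_price <= 0.0:
        return None
    delta = (exit_price - entry_price) / entry_price      # overflows to inf at extremes
    notional = exit_size * entry_price * max(leverage, 1.0)
    pnl = delta * notional
    return pnl if math.isfinite(pnl) else None            # guard the result as well
```

Checking the result as well as the inputs matters here: every input in the `1e200` / `1e-200` case is finite, and only the intermediate quotient is not.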

#### G6: `mark_price` produces unbounded `unrealized_pnl`

**File:** `_rust_kernel/src/lib.rs:384-399`

```rust
self.unrealized_pnl = delta * self.size * self.entry_price * self.leverage;
// No is_finite() check on result
```

If any of `delta`, `size`, `entry_price`, or `leverage` is extreme, the product overflows to `inf`, and nothing checks the result before it is stored in `unrealized_pnl`. The only guard is the `price <= 0.0` check on the input; the computation chain itself is unguarded.

Also: at line 388, `self.entry_price = price` runs on any `mark_price` call for a position whose `entry_price <= 0.0`. For a fresh slot this is the intended behavior: a zero entry price is initialized to the current market price on the first mark after open. If the slot is reused without resetting `entry_price`, however, the condition is false and the prior trade's entry price bleeds into the new position's unrealized PnL.

**Severity: High**

#### G7: `process_intent` ENTER — no `is_finite()` guard on `target_size`

**File:** `_rust_kernel/src/lib.rs:806-807`

```rust
intended_size: intent.target_size.max(0.0),
```

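`f64::INFINITY.max(0.0)` returns `inf`, so the clamp bounds the value from below only. `f64::NAN.max(0.0)` returns `0.0`, because `f64::max` ignores a NaN operand, so a NaN size is silently turned into a zero-size order instead of being rejected. On the wire, Python's `json.dumps` emits `Infinity` and `NaN` by default, but these are not valid JSON tokens and serde_json's default parser rejects them, so a non-finite value that slips past the Python-side `_first_invalid_intent_field` guard (F3) should fail at deserialization on the JSON path. Any route into `target_size` that bypasses that parse, or a lenient deserializer, lets `inf` propagate into `intended_size` in `VenueOrder` and corrupt all fill calculations.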

Similarly, `reference_price` is never validated for finiteness before being stored in `VenueOrder.metadata`.

**Severity: High**
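
On the Python side the standard encoder can be made strict. The sketch below shows the default behavior of `json.dumps` and a strict variant using `allow_nan=False`; `encode_intent_payload` is a hypothetical helper, not the existing `_intent_to_payload`.

```python
import json

# Default behavior: non-finite floats are emitted as bare tokens.
default_text = json.dumps({"target_size": float("inf")})   # '{"target_size": Infinity}'


def encode_intent_payload(payload: dict) -> bytes:
    """Serialize strictly: NaN/Infinity raise ValueError instead of being emitted."""
    return json.dumps(payload, allow_nan=False, separators=(",", ":")).encode("utf-8")
```

Failing at encode time gives the caller a Python exception with a stack trace, rather than a parse diagnostic coming back from the kernel.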

#### G8: `reconcile_slots_json` — no dedup or bounds validation

**File:** `_rust_kernel/src/lib.rs:1668-1675`

```rust
for slot in slots {
    if slot.slot_id < core.slots.len() {
        core.slots[slot.slot_id] = slot.clone();
    }
}
```

Two slots with the same `slot_id`: the **second overwrites the first** silently. A slot with `slot_id >= core.slots.len()`: **silently dropped** — no error, no diagnostic. Caller sees `accepted=true` even if some/all slots were not applied.

**Severity: High**
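
Until the kernel reports these cases itself, the caller could pre-validate the batch. A minimal sketch, assuming each slot is a dict with an integer `slot_id` and that the caller knows the kernel's slot count (`max_slots`); the helper name is hypothetical.

```python
from collections import Counter


def validate_reconcile_slots(slots, max_slots):
    """Return a list of problems; an empty list means the batch is safe to send."""
    problems = []
    ids = [int(s["slot_id"]) for s in slots]
    for slot_id, n in sorted(Counter(ids).items()):
        if n > 1:
            problems.append(f"slot_id {slot_id} appears {n} times")
    for slot_id in sorted(set(ids)):
        if not 0 <= slot_id < max_slots:
            problems.append(f"slot_id {slot_id} out of range [0, {max_slots})")
    return problems
```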

#### G9: `exchange_order_id` propagation uses wrong order target

**File:** `_rust_kernel/src/lib.rs:1110-1125`

```rust
let target = if slot.active_entry_order.is_some() {
    slot.active_entry_order.as_mut()
} else {
    slot.active_exit_order.as_mut()
};
```

If an **entry** order exists (even if fully filled) and an **exit** fill event arrives, the code updates the entry order's `venue_order_id` instead of the exit order's. The exit order's `venue_order_id` stays empty. Any subsequent `CANCEL` intent on the exit order fails because `active_exit_order.venue_order_id` is empty — the venue can't match the cancel.

**Fix:** Disambiguate by matching `venue_client_id`, or clear `active_entry_order` when entry is complete.

**Severity: High**

#### G10: CANCEL diagnostic code says NO_ACTIVE_EXIT_ORDER for entry cancel too

**File:** `_rust_kernel/src/lib.rs:966-1005`

```rust
if !has_cancellable_exit && !has_cancellable_entry {
    return KernelResult {
        diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER, // always says exit
        details: json!({"reason": "NO_ACTIVE_EXIT_ORDER"}),
    };
}
```

When neither exit nor entry is cancellable, the diagnostic returns `NO_ACTIVE_EXIT_ORDER` regardless of which order was the target. If the user wanted to cancel an entry order that's not in a cancellable state, the diagnostic is misleading.

**Fix:** Separate diagnostic codes: `NO_ACTIVE_EXIT_ORDER`, `NO_ACTIVE_ENTRY_ORDER`, `ENTRY_NOT_CANCELLABLE`.

**Severity: High**

#### G11: `apply_fill` entry-fill overwrites `active_entry_order.intended_size` with `slot.size`

**File:** `_rust_kernel/src/lib.rs:1363-1377`

On FULL_FILL entry, `slot.active_entry_order` is entirely replaced with a new `VenueOrder` where `intended_size = slot.size` (the fill amount) instead of the original intended size. The original intended size (which could be larger than fill size for partial fills) is lost.

If a duplicate fill event arrives (dedup fails due to missing event_id), the second fill would use `slot.size` as the basis for further fills — wrong values.

**Severity: Medium**

#### G12: `leverage` unbounded after `is_finite()` — no maximum cap

**File:** `_rust_kernel/src/lib.rs:778`

```rust
slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
    intent.leverage  // 1e100 accepted here
} else { 1.0 };
```

`leverage = 1e100` passes `is_finite()`. It feeds into `realized_pnl()` as `slot.leverage.max(1.0) = 1e100`, producing `notional = exit_size * entry_price * 1e100`, and the same factor multiplies `unrealized_pnl` in `mark_price` (G6), so both PnL figures can become arbitrarily large.

No maximum leverage cap enforced anywhere — the exchange-level cap (`DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP`) exists in `BingxExecClientConfig` but is **never passed to the Rust kernel**.

**Severity: Medium**

#### G13: `resolve_slot` fallback returns `unwrap_or(0)` — can misroute events

**File:** `_rust_kernel/src/lib.rs:623`

```rust
self.slots.first().map(|slot| slot.slot_id).unwrap_or(0)
```

When no slot matches the event (`slot_id` out of range or all slot filters fail), returns `slot_id` of the **first slot** (which may be 0 or any value). No diagnostic emitted — caller sees slot state change with no idea the event was misrouted.

**Severity: Medium**

#### G14: `commit_slot` silently ignores out-of-bounds slot_id

**File:** `_rust_kernel/src/lib.rs:595-600`

```rust
fn commit_slot(&mut self, slot: TradeSlot) {
    if slot.slot_id < self.slots.len() {
        self.slots[slot_id] = slot;
    }
    // else: silently dropped — no error returned
}
```

Mutations to out-of-bounds slot are silently discarded. Can happen if `slot.slot_id` is corrupted via `set_slot_from_json` causing index mismatch between `slot.slot_id` and the actual slot position.

**Severity: Medium**

---

### Configuration & Validation Chain

#### G15: Zero `__post_init__` validators on all config dataclasses

Every config dataclass in the system has zero field-level validation:

| Dataclass | Fields | Validators |
|-----------|--------|------------|
| `KernelControlSnapshot` | 16 | **0** |
| `ControlUpdate` | 16 | **0** |
| `KernelIntent` | 19 | **0** |
| `TradeSlot` | 22 | **0** |
| `VenueOrder` | 8 | **0** |
| `VenueEvent` | 18 | **0** |
| `KernelTransition` | 11 | **0** |
| `KernelOutcome` | 8 | **0** |
| `AccountSnapshot` | 9 | **0** |
| **Total** | **127** | **0** |

The only validation in the entire chain:
- `_first_invalid_intent_field()` — finiteness guard at Python→Rust FFI boundary (not a dataclass validator)
- Rust `leverage = if is_finite && > 0.0 { val } else { 1.0 }` — post-hoc clamp
- Rust `KernelCore::new(max_slots.max(1))` — floor only, no ceiling
- `launcher.py:143`: `max(1, int(...))` for `active_slot_limit` — floor only

**No `__post_init__` exists anywhere. No bounds check on any field except the two floor-only guards.**

**Severity: High**
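
For reference, field-level validation on a dataclass is a few lines of `__post_init__`. The sketch below uses a trimmed-down, hypothetical stand-in for `KernelIntent`; the class name, the field subset, and the `125.0` leverage cap are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentLimits:  # hypothetical stand-in, not the real KernelIntent
    target_size: float
    leverage: float = 1.0
    max_leverage: float = 125.0  # assumed cap; the real limit is venue-specific

    def __post_init__(self):
        for name in ("target_size", "leverage"):
            value = getattr(self, name)
            if not isinstance(value, (int, float)) or not math.isfinite(value):
                raise ValueError(f"{name} must be a finite number, got {value!r}")
        if self.target_size < 0.0:
            raise ValueError("target_size must be >= 0")
        if not 0.0 < self.leverage <= self.max_leverage:
            raise ValueError(f"leverage must be in (0, {self.max_leverage}]")
```

Rejecting at construction time means an invalid object never exists, so downstream code (including the FFI payload builder) does not need to re-check the same fields.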

#### G16: `DITA_V2_DEBUG_CLICKHOUSE` defaults to `True` when env var is unset

**File:** `launcher.py:133`

```python
debug = _env_bool("DITA_V2_DEBUG_CLICKHOUSE", True)
```

`_env_bool` (launcher.py:75) returns `default` when the env var is unset. So `debug = True` by default. Every runtime writes debug traces to ClickHouse by default. `DITA_V2_DEBUG_CLICKHOUSE=False` is required to disable it.

This is not a bug per se, but it means debug ClickHouse writes are **on by default**, adding ~10 ClickHouse insertions per process_intent call (every transition + position state + trade event) that most production deployments may not want.

**Severity: Informational**

#### G17: String config fields have no charset/length validation — Zinc region injection risk

**File:** `control.py:31-53`, `real_zinc_plane.py:30`

`runtime_namespace`, `strategy_namespace`, `event_namespace`, `actor_name`, `exec_venue`, `data_venue`, `ledger_authority` are all free-form strings with no validation. They're used as:

1. **Zinc shared memory region names**: `self.prefix + "." + namespace + "." + kind` — an attacker-controlled namespace could collide with other processes' Zinc regions
2. **ClickHouse table names**: `DOLPHIN_BINGX_JOURNAL_STRATEGY` is used as a table suffix — SQL injection risk in ClickHouse journal
3. **Hazelcast map names**: Same injection risk via `event_namespace`

**Severity: Medium**
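
An allowlist check at config load time closes all three paths at once. A minimal sketch, assuming namespaces only ever need ASCII letters, digits, and underscores; the pattern, the 63-character limit, and the helper name are illustrative choices, not existing project rules.

```python
import re

_NAMESPACE_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,62}")


def validate_namespace(value: str, field: str = "namespace") -> str:
    """Return value unchanged if it is a safe identifier, else raise ValueError."""
    if not isinstance(value, str) or _NAMESPACE_RE.fullmatch(value) is None:
        raise ValueError(f"{field} must match {_NAMESPACE_RE.pattern}, got {value!r}")
    return value
```

Because `.` is excluded, a validated namespace cannot add or remove segments in a dotted region name, and quotes or semicolons never reach a table or map name.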

#### G18: `exit_leg_ratios` no sum-to-1 validation

`KernelIntent.exit_leg_ratios` and `TradeSlot.exit_leg_ratios` are tuple/list of floats. No validator ensures they sum to approximately 1.0. Ratios summing to 0.5 leave the position partially closed forever (residual can't be exited because `next_exit_ratio()` returns `1.0` after exhaustion, exiting 100% of remaining — which may exceed the intended residual).

**Severity: Low**

#### G19: `RealZincControlPlane.read()` has no sequence check — torn-read risk

**File:** `real_control_plane.py:88-94`

```python
def read(self):
    payload = _decode_packet(self.region.as_buffer())
    control = payload.get("control")
    if not isinstance(control, dict):
        return self._snapshot
    self._snapshot = KernelControlSnapshot(**control)
    return self._snapshot
```

The binary packet has a 64-bit sequence number but `read()` **never checks it**. Between the zero-write and packet-write in `_write_region`, a reader sees an empty buffer → `_decode_packet` fails → falls back to `self._snapshot` (stale). Between the packet-write and `struct.pack` header (order depends on implementation), a reader sees a partial write with wrong size → `_decode_packet` fails.

No checksum on the wire format: `struct.pack("!QQ", seq, len) + json_bytes`. A torn write produces garbage that `json.loads` may or may not parse successfully.

**Severity: Low**
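
A reader can reject torn or repeated packets if the header carries a checksum and the reader tracks the last sequence it accepted. The sketch below extends the `!QQ` header described above with a CRC32 field; that extra field, the function names, and the `last_seq` parameter are assumptions, not the current wire format.

```python
import json
import struct
import zlib

_HEADER = struct.Struct("!QQI")  # sequence, payload length, CRC32 (added field)


def encode_packet(seq: int, payload: dict) -> bytes:
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return _HEADER.pack(seq, len(body), zlib.crc32(body)) + body


def decode_packet(buf, last_seq: int):
    """Return (seq, payload), or None if the packet is torn, corrupt, or not newer."""
    if len(buf) < _HEADER.size:
        return None
    seq, length, crc = _HEADER.unpack_from(buf)
    body = bytes(buf[_HEADER.size:_HEADER.size + length])
    if len(body) != length or zlib.crc32(body) != crc or seq <= last_seq:
        return None
    return seq, json.loads(body)
```

A `None` result lets the caller keep its previous snapshot deliberately, instead of falling back to it as a side effect of a parse failure.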

#### G20: `DOLPHIN_BINGX_JOURNAL_STRATEGY`/`_DB` — ClickHouse SQL injection risk

**File:** `launcher.py:202-203`

```python
"DOLPHIN_BINGX_JOURNAL_STRATEGY": os.environ.get("DOLPHIN_BINGX_JOURNAL_STRATEGY", ""),
"DOLPHIN_BINGX_JOURNAL_DB": os.environ.get("DOLPHIN_BINGX_JOURNAL_DB", ""),
```

These are used as ClickHouse table and database name suffixes in `pink_clickhouse.py`. An attacker who can set env vars can inject SQL via semicolons or quotes in the table name. ClickHouse supports `INSERT INTO db.table FORMAT JSONEachRow` — a table name like `positions; DROP TABLE ...;` could be destructive.

**Severity: Low** (requires env var control, which implies broader access)

---

### Persistence Schema Alignment

#### G21: `entry_price` used as `exit_price` in `trade_events` — data loss

**File:** `pink_clickhouse.py (outside workspace)`

The `_write_trade_event` function maps `entry_price` from `slot.to_dict()` to both the `entry_price` and `exit_price` columns. The actual exit fill price (available on the `VenueEvent` object) is **never written** to the `exit_price` column.

**Result:** Every `trade_events` row has `exit_price == entry_price`. The `exit_price` column is a dead column — always contains the entry price, never the actual fill.

**Severity: High** — data loss to DB for the most important trade metric.

#### G22: `active_leg_index` → `entry_bar` semantic mis-mapping

**File:** `pink_clickhouse.py (outside workspace)`

```python
"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
```

`active_leg_index` tracks the exit-leg-ratios cursor (which leg of a multi-leg exit we're on), not a bar count. Its value is `0` at position open and `1` after the first exit leg; neither represents bars held. **The `entry_bar` column stores the wrong concept.**

**Severity: Medium** — column contains semantically meaningless data.

#### G23: `capital_before` arithmetic reconstruction absorbs cross-slot PnL

**File:** `pink_clickhouse.py (outside workspace)`

```python
capital_before = capital_after - pnl_leg
```

`capital_before` is reconstructed by subtracting the current leg's PnL from the current capital. In a multi-slot system, other slots' PnL changes between legs are absorbed into `capital_before`. The column is **always wrong** in multi-slot scenarios because `capital_after` reflects total PnL from all slots, not just the leg being recorded.

**Severity: Medium** — wrong `capital_before` for multi-slot trading.
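
A small worked example makes the error concrete. The numbers and the event ordering below are illustrative only; the single line inside the function is the reconstruction quoted above.

```python
def reconstructed_capital_before(capital_after: float, pnl_leg: float) -> float:
    """The reconstruction quoted above."""
    return capital_after - pnl_leg


# Two slots sharing one account, starting capital 1000.0.
capital = 1000.0
capital += 15.0                        # slot 1 realizes +15
true_before_slot0_leg = capital        # 1015.0: capital when slot 0's leg fills
capital += 40.0                        # slot 0's leg realizes +40
capital += -5.0                        # slot 1 realizes -5 before the row is written
derived = reconstructed_capital_before(capital, 40.0)   # 1010.0, not 1015.0
```

The 5.0 gap is exactly the other slot's PnL, which is why the error only appears when more than one slot is active.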

#### G24: Recovery `trade_reconstruction` always has `trade_id=""`

**File:** `pink_clickhouse.py (outside workspace)`

The `persist_recovery_state` function passes `kernel.snapshot()["account"]` (an account dict with keys `capital, equity, realized_pnl, ...`) where a slot dict is expected. The `trade_id` key **does not exist** on the account dict. The `recovery_state` row always has `trade_id=""`.

**Severity: Medium** — recovery data is not associable with any trade.

#### G25: `seen_event_ids`, `exit_leg_ratios`, `VenueOrder`, `metadata` not in flat ClickHouse tables

These fields are:
- Present on the Python `TradeSlot` ✅
- Transmitted through Zinc shared memory ✅
- Stored in Hazelcast ✅
- Stored in ClickHouse `dita_kernel_debug` (full JSON) ✅
- **NOT extracted** into main ClickHouse flat tables `position_state`, `trade_events`, `trade_exit_legs` ❌

Data exists at the source, travels through the pipeline, hits the debug journal — but is lost in the main analytical tables.

**Severity: Low** (data exists in debug journal if needed for reconstruction)

#### G26: `_safe_float` silently converts NaN/None/Inf to 0.0

**File:** `utils.py:15`

```python
def _safe_float(v, default=0.0):
    try:
        f = float(v)
        if not math.isfinite(f):
            return default
        return f
    except (TypeError, ValueError, OverflowError):
        return default
```

Used in multiple ClickHouse writers. Silently converts `NaN`/`Inf`/parsing errors to `0.0`. No diagnostic emitted when a non-finite value reaches the persistence layer — data silently zeroed.

**Severity: Low** (safe default but silent corruption)
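
Keeping the safe default while making the substitution visible is a small change. A minimal sketch; the logger name and the extra `field` argument are assumptions added for illustration.

```python
import logging
import math

log = logging.getLogger("pink.persistence")  # logger name is an assumption


def safe_float(v, default=0.0, *, field="?"):
    """Like _safe_float, but reports every substitution."""
    try:
        f = float(v)
    except (TypeError, ValueError, OverflowError):
        log.warning("non-numeric %s=%r replaced with %r", field, v, default)
        return default
    if not math.isfinite(f):
        log.warning("non-finite %s=%r replaced with %r", field, v, default)
        return default
    return f
```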

---

### Lifecycle & Resource Management

#### G27: `build_launcher_bundle` has no exception safety — prior resources leak

**File:** `launcher.py:264-300`

```python
def build_launcher_bundle(...):
    control_plane = _build_control_plane(...)
    projection = build_projection(...)
    zinc_plane = _build_zinc_plane(...)
    venue = _build_venue(...)
    kernel = ExecutionKernel(...)  # ← if THIS fails, everything above leaks
```

If any step after the first raises, all previously built resources leak:
- `RealZincPlane` created → `_build_venue()` fails → 3 shared memory regions orphaned
- `RealZincControlPlane` created → `_build_zinc_plane()` fails → 1 shared memory region orphaned
- `BingxVenueAdapter` created → `ExecutionKernel.__init__()` fails → HTTP connection leaked

**No `try/finally` anywhere in the builder.** The init order is also optimized for forward construction, not backward cleanup.

**Severity: High** — shared memory leak on any build failure.
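
The standard-library tool for this pattern is `contextlib.ExitStack`. The sketch below is a generic illustration rather than a rewrite of the launcher: `build_bundle` and its `factories` argument are hypothetical, and it assumes every built resource exposes `close()`.

```python
from contextlib import ExitStack


def build_bundle(factories):
    """Build resources in order; close the ones already built if a later step fails.

    `factories` is an ordered list of zero-argument callables returning
    objects with close(), standing in for the _build_* helpers.
    """
    with ExitStack() as stack:
        built = []
        for factory in factories:
            resource = factory()
            stack.callback(resource.close)   # registered right after creation
            built.append(resource)
        stack.pop_all()                      # success: the caller now owns cleanup
        return built
```

If any factory raises, the stack unwinds in reverse construction order, so the last resource built is the first one closed.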

#### G28: `RealZincPlane` and `RealZincControlPlane` have no `__del__`

When `close()` is not called (exception in builder, forgotten cleanup, GC during shutdown), the shared memory regions opened by `RealZincPlane` (3 regions) and `RealZincControlPlane` (1 region) are **orphaned on the OS**. They persist in `/dev/shm/` (or platform equivalent) until system reboot.

Python's `__del__` is unreliable (not called on SIGKILL, not called if the object is part of a cycle without a GC run), but its absence means even normal garbage collection can't clean up.

**Severity: High** — shared memory leaks.

#### G29: Zero signal handlers — no cleanup on SIGTERM/SIGINT

```bash
$ grep -rn "signal\|SIGTERM\|SIGINT\|atexit" *.py  # ZERO matches
```

When SIGTERM or SIGINT arrives:
1. SIGTERM takes the default OS action and kills the process immediately; SIGINT raises `KeyboardInterrupt`, which nothing in the codebase catches
2. No `DITAv2LauncherBundle.close()` is called
3. No `ExecutionKernel.__del__` is called (CPython may run GC on normal exit but not reliably)
4. All shared memory (RealZincPlane, RealZincControlPlane) is orphaned
5. In-flight BingX HTTP calls are interrupted mid-stream
6. Rust kernel handle is leaked

**Severity: High**
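
A minimal way to get cleanup on SIGTERM is to convert the signal into a normal interpreter exit and register the close with `atexit`. The sketch below assumes a `bundle` object with `close()`; `install_shutdown_hooks` is a hypothetical helper, and it must be called from the main thread because `signal.signal` requires that.

```python
import atexit
import signal
import sys


def install_shutdown_hooks(bundle):
    """Route SIGTERM through normal interpreter exit so close() runs once."""
    state = {"closed": False}

    def close_once():
        if not state["closed"]:
            state["closed"] = True
            bundle.close()

    def on_sigterm(signum, frame):
        sys.exit(128 + signum)       # raises SystemExit; atexit handlers then run

    atexit.register(close_once)
    signal.signal(signal.SIGTERM, on_sigterm)
    return close_once
```

This does nothing for SIGKILL, which cannot be handled, so orphaned shared-memory regions still need a startup-time sweep.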

#### G30: `ExecutionKernel` has no `close()` — relies on `__del__` for Rust handle cleanup

`ExecutionKernel` has `__del__` which calls `_get_rust().destroy(backend)`. No `close()` method. `DITAv2LauncherBundle.close()` never touches the kernel — the Rust handle is only freed by GC at unpredictable time.

If any code holds a stale `_backend` pointer, the handle dangles when GC runs. If `__del__` is suppressed (e.g., during interpreter shutdown with cyclic references), the Rust handle leaks permanently.

**Fix:** Add `close()` to `ExecutionKernel`, call it from `DITAv2LauncherBundle.close()`.

**Severity: High**
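
A handle wrapper with an explicit, idempotent `close()` could look like the sketch below. `KernelHandle` is a hypothetical class, not the existing `ExecutionKernel`, and `destroy` stands in for `_get_rust().destroy`.

```python
class KernelHandle:
    """Owns a native handle and releases it exactly once."""

    def __init__(self, backend, destroy):
        self._backend = backend
        self._destroy = destroy

    @property
    def closed(self) -> bool:
        return self._backend is None

    def close(self) -> None:
        # Null the reference first so a second call is a no-op.
        backend, self._backend = self._backend, None
        if backend is not None:
            self._destroy(backend)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False                 # never swallow exceptions

    def __del__(self):
        try:
            self.close()             # last-resort fallback only
        except Exception:
            pass
```

The same swap-then-release shape makes double `close()` safe for any component, and the `__enter__`/`__exit__` pair gives callers a `with` form for free.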

#### G31: `projection` (Hazelcast) never closed

`build_projection()` returns a `HazelcastProjection` which holds a Hazelcast client connection. No `close()` or `disconnect()` method exists on the projection, projector, or row writer. `DITAv2LauncherBundle.close()` doesn't touch the projection. The Hazelcast client connection leaks on shutdown.

**Severity: Medium**

#### G32: `_maybe_close()` only calls the first method found — `break` skips the second

**File:** `launcher.py:233-243`

```python
for method_name in ("close", "disconnect"):
    method = getattr(obj, method_name, None)
    if method is None:
        continue
    try:
        result = method()
    except TypeError:
        continue
    if inspect.isawaitable(result):
        try:
            asyncio.run(result)
        except RuntimeError:
            pass
    break  # ← ONLY calls the FIRST found method, never both
```

If an object has both `close()` and `disconnect()`, only `close()` is called. `disconnect()` is silently skipped. Also: `asyncio.run(result)` silently swallows `RuntimeError` when a running event loop exists — the coroutine is **never executed**.

Currently no object has both, but the pattern is fragile.

**Severity: Low**
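
A variant that calls every shutdown method and refuses to drop an async result silently might look like this. It is a sketch under assumptions: `close_all` is a hypothetical replacement, and raising when a loop is already running is one design choice among several.

```python
import asyncio
import inspect


async def _await(awaitable):
    return await awaitable


def close_all(obj, method_names=("close", "disconnect")):
    """Call every shutdown method that exists; return the names that ran."""
    ran = []
    for name in method_names:
        method = getattr(obj, name, None)
        if not callable(method):
            continue
        result = method()
        if inspect.isawaitable(result):
            try:
                asyncio.get_running_loop()
            except RuntimeError:
                asyncio.run(_await(result))      # no loop running: block until done
            else:
                if inspect.iscoroutine(result):
                    result.close()               # avoid a "never awaited" warning
                raise RuntimeError(
                    f"{name}() is async; await it from the running event loop"
                )
        ran.append(name)
    return ran
```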

#### G33: `close()` is not idempotent for RealZinc components

`RealZincPlane.close()` and `RealZincControlPlane.close()` call their Zinc region's `close()` method. If called twice, the second call operates on an already-closed region and will likely crash inside the shared-memory code.

References are not nulled after close: `DITAv2LauncherBundle.close()` calls `_maybe_close()` on `self.venue`, `self.zinc_plane`, and `self.control_plane` but leaves the attributes pointing at the closed objects, so a double `close()` is unsafe.

**Severity: Low**

#### G34: No context manager on `DITAv2LauncherBundle`

`DITAv2LauncherBundle` has no `__enter__`/`__exit__`. Users must manually call `close()`. No `with` pattern exists anywhere in the source for lifecycle management. No `__del__` fallback on the bundle either.

**Severity: Low** (ergonomic, not a leak source if caller follows the pattern)

#### G35: `BingxVenueAdapter.connect()` exists but is never called by the launcher

`BingxDirectExecutionAdapter` has a `connect()` method that initializes the lifetime HTTP client, and `BingxVenueAdapter.connect()` calls `_call_backend("connect")`. Neither is called in `build_launcher_bundle()` or `_build_venue()`. If `submit_intent()` needs a connected client, it must be initializing one lazily, which leaves the explicit connect path as dead code.

**Severity: Informational**

#### G36: Only one `try/finally` in the entire codebase

The only `try/finally` is `_RustKernelLib._take_string()` (rust_backend.py:140-143) which frees the Rust C string. All other resource management uses `try/except` with no `finally`.

No cleanup is guaranteed on exception:
- `build_launcher_bundle()` — no cleanup on failure
- `process_intent()` — no cleanup of partial slot state on venue event exception
- `on_venue_event()` — no cleanup on FFI failure
- `_set_slot()` — no cleanup on projection or Zinc write failure

**Severity: High** (across all layers)

---

## Pass 4 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType | Rust | **Critical** |
| G2 | `into_c_string` unwrap() panics on NUL byte | Rust | **Critical** |
| G3 | EXIT hardcodes prev_state=POSITION_OPEN, allows backward FSM transition | Rust | **Critical** |
| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** |
| G5 | `realized_pnl` unbounded f64 overflow to inf | Rust | **High** |
| G6 | `mark_price` unbounded unrealized_pnl — no result guard | Rust | **High** |
| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
| G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** |
| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
| G12 | No max leverage cap enforced by kernel | Rust | Medium |
| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
| G15 | Zero `__post_init__` validators on all config dataclasses | Config | **High** |
| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
| G17 | String config fields — Zinc region injection risk | Config | Medium |
| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
| G31 | Hazelcast projection never closed | Lifecycle | Medium |
| G32 | _maybe_close() break skips second method | Lifecycle | Low |
| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |

### Pass 4 Severity Distribution

| Severity | Count |
|----------|-------|
| **Critical** | 4 (G1, G2, G3, G4) |
| **High** | 13 (G5-G10, G15, G21, G27, G28, G29, G30, G36) |
| Medium | 9 (G11-G14, G17, G22, G23, G24, G31) |
| Low | 8 (G18, G19, G20, G25, G26, G32, G33, G34) |
| Info | 2 (G16, G35) |

### Combined Catalog (All 4 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 13 | 9 | 8 | 2 |
| **Total** | | **116** | **5** | **23** | **30** | **40** | **18** |

---

## PASS 5 — EDGE DOMAINS (Dependencies, Error Handling, Types, Contracts)

### H1: No Python dependency declaration files exist in workspace

**Files:** workspace root

Zero `requirements.txt`, `setup.py`, `setup.cfg`, `pyproject.toml`, `Pipfile`, or `poetry.lock` anywhere. All Python package dependencies are entirely implicit — determined by what's installed in the runtime environment. No reproducible installs, no version pinning, no audit trail.

The Rust side does have `Cargo.toml` + `Cargo.lock` — but all 4 direct Rust deps use open ranges (`"0.4"`, `"0.2"`, `"1"`, `"1"`).

**Severity: Critical**

### H2: Rust kernel compiled from source on every cold start via subprocess

**File:** `rust_backend.py:60-72`

```python
def _ensure_library() -> Path:
    path = _library_path()
    if not path.exists():
        _build_library()  # cargo build --release
    return path

def _build_library():
    subprocess.run(
        ["cargo", "build", "--release", ...],
        check=True,        # no timeout!
    )
```

When no built library is present (a fresh host or container), the first load takes 3-10 minutes of Rust compilation and requires the Rust toolchain in production. `subprocess.run()` is called without `timeout=`, so if `cargo` hangs (network, disk, lock contention), the Python process hangs indefinitely. There is no prebuilt binary distribution.

**Severity: Critical**
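
Bounding the build is a one-argument change to `subprocess.run`. The sketch below wraps it so both a hang and a failed build surface as a clear error; `build_library`, its parameters, and the 900-second default are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def build_library(cmd, cwd: Path, timeout_s: float = 900.0) -> None:
    """Run the build with a hard deadline; raise a clear error on hang or failure."""
    try:
        subprocess.run(
            cmd, cwd=cwd, check=True, timeout=timeout_s,
            capture_output=True, text=True,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"kernel build exceeded {timeout_s:.0f}s: {cmd}") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"kernel build failed ({exc.returncode}): {exc.stderr}"
        ) from exc
```

A timeout only limits the damage; shipping a prebuilt library removes the toolchain dependency altogether.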

### H3: Zero logging — every swallowed error is invisible

The entire codebase has zero use of Python's `logging` module, `print()`, or `warnings.warn()` for error reporting. Every `except: pass`, `except Exception: pass`, and `return default` silently discards the error. **There is no mechanism to detect, alert, or diagnose production failures.**

All `try/except: pass` sites found:

| # | File:Line | What's Hidden |
|---|-----------|---------------|
| 1 | `bingx_venue.py:51` | `float()` conversion failure on any API field value |
| 2 | `bingx_venue.py:133` | regex match failure in rate-limit parsing |
| 3 | `bingx_venue.py:136` | int/float conversion of retry_after |
| 4 | `bingx_venue.py:325` | slot lookup failure during cancel asset resolution |
| 5 | `bingx_venue.py:350` | BingXHttpError in cancel — network error looks like rejection |
| 6 | `control.py:213` | RealZincControlPlane construction failure |
| 7 | `launcher.py:187` | RealZincPlane construction failure |
| 8 | `launcher.py:119` | malformed env var for active_slot_limit |
| 9 | `launcher.py:243` | asyncio.run() RuntimeError in _maybe_close |
| 10 | `launcher.py:277` | RealZincControlPlane fallback in build_control_plane |
| 11 | `real_control_plane.py:97` | region.wait() exception — timeout and error both return False |
| 12 | `real_control_plane.py:112` | region.notify() exception — writer thinks broadcast succeeded |
| 13 | `real_zinc_plane.py:31` | Zinc SharedRegion import failure |
| 14 | `projection.py:87` | HazelcastRowWriter import failure |
| 15 | `rust_backend.py:102` | __del__ exception in Rust kernel destroy |
| 16 | `bingx_venue.py:55` | `_row_float` tries 5+ key fallbacks, each failing silently |

**Severity: Critical**

### H4: `_row_float` rejects zero as a valid value — `or` pattern treats 0 as missing

**File:** `bingx_venue.py:47-55`

```python
def _row_float(row, *keys, default=0.0):
    for key in keys:
        try:
            value = float(row.get(key) or 0.0)  # `or 0.0` treats 0 as missing
        except Exception:
            continue
        if value == value and value not in (float("inf"), float("-inf")) and value != 0.0:
            return value                         # explicitly rejects 0.0
    return default
```

Two bugs: (a) `except Exception: continue` swallows ALL conversion errors, and (b) `value != 0.0` explicitly rejects zero as a valid return value. A legitimate zero price, zero filled quantity, or zero position amount causes `_row_float` to skip that key and search further. If ALL keys return 0, the default `0.0` is returned — indistinguishable from "none of the keys existed."

Called by every BingX API response parser: `_position_qty()`, `_position_price()`, `_venue_order_from_row()`, `_event_from_row()`, `_fill_event_from_row()`, `_events_from_submit()`, `_events_from_cancel()`, `_filled_size_from_snapshots()`. None of them can tell whether a returned `0.0` is a genuine zero or a missing field.
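
A minimal sketch of one possible fix, assuming callers can be changed to handle `None`. `row_float_opt` is a hypothetical name, not a function in `bingx_venue.py`; it treats zero as a valid value, reports "no usable key" as `None`, and narrows the exception clause to conversion errors only.

```python
import math
from typing import Any, Mapping, Optional


def row_float_opt(row: Mapping[str, Any], *keys: str) -> Optional[float]:
    """Return the first finite float found under `keys`, or None if none is usable.

    Hypothetical replacement for `_row_float`: 0.0 is returned as a real value,
    and a missing field is reported as None instead of 0.0.
    """
    for key in keys:
        raw = row.get(key)
        if raw is None or raw == "":
            continue  # key absent or empty: try the next alias
        try:
            value = float(raw)
        except (TypeError, ValueError, OverflowError):
            continue  # unparsable field; a real fix would also log this
        if math.isfinite(value):
            return value  # zero is accepted here
    return None
```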

**Severity: High**

### H5: `_backend_snapshot` timeout returns stale data with no signal to callers

**File:** `bingx_venue.py:242-251`

```python
def _backend_snapshot(self, *, timeout_ms=5000.0):
    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
        with self._snap_lock:
            return self._last_snapshot    # STALE — could be hours old
```

When the snapshot-fetch condition times out, returns `self._last_snapshot` — initialized to `None` and only updated on successful fetches. First timeout returns `None`. All callers (`cancel()`, `open_orders()`, `open_positions()`, `reconcile()`, `submit()`) access `.open_orders`, `.open_positions` immediately — crash with `AttributeError: 'NoneType' object has no attribute 'open_orders'`.

Even after the first fetch succeeds, subsequent timeouts return the last-good snapshot which could be arbitrarily stale. No caller timestamps, version-checks, or requests a refresh.
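
One way to make staleness visible, sketched under assumptions: stamp each snapshot with a monotonic fetch time and make callers state the maximum age they accept. `StampedSnapshot`, `SnapshotUnavailable`, and `require_fresh` are hypothetical names, not part of `bingx_venue.py`.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


class SnapshotUnavailable(RuntimeError):
    """Raised when no snapshot, or only a too-old snapshot, is available."""


@dataclass(frozen=True)
class StampedSnapshot:
    payload: Any
    fetched_at: float  # time.monotonic() value recorded when the fetch succeeded


def require_fresh(snapshot: Optional[StampedSnapshot], *, max_age_s: float,
                  now: Optional[float] = None) -> Any:
    """Return the payload only if a snapshot exists and is recent enough."""
    if snapshot is None:
        raise SnapshotUnavailable("no snapshot has been fetched yet")
    current = time.monotonic() if now is None else now
    age = current - snapshot.fetched_at
    if age > max_age_s:
        raise SnapshotUnavailable(
            f"snapshot is {age:.1f}s old (limit {max_age_s:.1f}s)")
    return snapshot.payload
```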

**Severity: High**

### H6: All enum-from-raw-string sites crash on unknown value — zero fallback

**Files:** `rust_backend.py:250-386`, `real_zinc_plane.py:70-106`

Every site that reconstructs a Python enum from a string received from the Rust kernel:

```python
side=TradeSide(str(payload.get("side", TradeSide.FLAT.value)))
status=VenueOrderStatus(str(payload.get("status", VenueOrderStatus.NEW.value)))
fsm_state=TradeStage(str(payload.get("fsm_state", TradeStage.IDLE.value)))
kind=KernelEventKind(str(row.get("kind", KernelEventKind.ORDER_ACK.value)))
```

If the Rust kernel introduces a new enum variant (e.g., `TradeStage::ENTRY_REJECTED`) not in the Python `TradeStage` enum, `TradeStage("ENTRY_REJECTED")` raises `ValueError` with zero fallback. Crashes `_outcome_from_payload()` and takes down the kernel's event processing loop.

17 sites total across `rust_backend.py` and `real_zinc_plane.py`. No try/except, no mapping, no fallback on any of them.
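
A sketch of a shared helper that the 17 sites could route through, assuming each enum gains an explicit sentinel member. The `TradeStage` class below is a stand-in for the real one, and its `UNKNOWN` member is hypothetical. Callers would still need to decide how to treat the sentinel (for example, halting the slot rather than assuming `IDLE`).

```python
from enum import Enum
from typing import Type, TypeVar

E = TypeVar("E", bound=Enum)


class TradeStage(Enum):  # stand-in for the real enum; members are illustrative
    IDLE = "IDLE"
    ENTRY_WORKING = "ENTRY_WORKING"
    UNKNOWN = "UNKNOWN"  # hypothetical sentinel, not present in the source


def parse_enum(enum_cls: Type[E], raw: object, fallback: E) -> E:
    """Build an enum member from a raw value, returning `fallback` if it is unknown."""
    try:
        return enum_cls(str(raw))
    except ValueError:
        # A real fix would log the unexpected value here.
        return fallback
```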

**Severity: High**

### H7: `_legacy_intent` reads `getattr(intent, "order_type", "MARKET")` — always defaults to MARKET

**File:** `bingx_venue.py:282-285`

```python
metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
```

`order_type` and `limit_price` are NOT fields on `KernelIntent` (contracts.py). They only exist in `intent.metadata` as `metadata["order_type"]` if set by the caller. `getattr(intent, "order_type", "MARKET")` checks the dataclass field — not the metadata dict — so it ALWAYS returns `"MARKET"`.

Even when the PINK runtime produces a LIMIT intent (LIMIT_DECISION → `metadata["order_type"] = "LIMIT"`), the legacy adapter converts it to MARKET because it reads the wrong source. Every LIMIT order is submitted as MARKET.

Similarly, `limit_price` is always `0.0` — any limit price from the metadata dict is lost.
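
The difference between the two lookups can be sketched as follows. `KernelIntentStub` is a stand-in for `KernelIntent`, `order_params` is a hypothetical helper, and the `"limit_price"` metadata key is an assumption (the source only confirms `metadata["order_type"]`).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class KernelIntentStub:  # stand-in: only `metadata` matters for this sketch
    intent_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def order_params(intent: Any) -> Tuple[str, float]:
    """Read order type and limit price from the metadata dict, not from attributes."""
    meta = getattr(intent, "metadata", None) or {}
    order_type = str(meta.get("order_type", "MARKET")).upper()
    limit_price = float(meta.get("limit_price") or 0.0)  # key name is an assumption
    if order_type == "LIMIT" and limit_price <= 0.0:
        raise ValueError("LIMIT intent without a positive limit_price")
    return order_type, limit_price


limit_intent = KernelIntentStub("i1", {"order_type": "LIMIT", "limit_price": 101.5})
# The attribute lookup described above ignores metadata and yields the default:
attribute_view = getattr(limit_intent, "order_type", "MARKET")
# The metadata lookup preserves the caller's request:
metadata_view = order_params(limit_intent)
```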

**Severity: High**

### H8: `_venue_event_status_from_row` silently maps unknown venue status to ACKED

**File:** `bingx_venue.py:83-96`

```python
def _venue_event_status_from_row(status: str) -> VenueEventStatus:
    normalized = _normalize_status(status)
    # ... checks known statuses ...
    return VenueEventStatus.ACKED  # fallthrough for anything unknown
```

If BingX introduces a new status (`"SUSPENDED"`, `"PENDING_CANCEL"`, `"EXPIRED"`), it doesn't match any known mapping and silently returns `ACKED`. The kernel treats a suspended/cancelled/expired order as acknowledged — dangerous misclassification.
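
A sketch of an explicit mapping with a non-ACKED fallthrough. The enum is a stand-in, its `UNKNOWN` member is hypothetical, and the status strings in the table are illustrative only, since the real list of known statuses is elided in the excerpt above.

```python
from enum import Enum


class VenueEventStatus(Enum):  # stand-in; UNKNOWN is a hypothetical addition
    ACKED = "ACKED"
    FILLED = "FILLED"
    CANCELED = "CANCELED"
    UNKNOWN = "UNKNOWN"


# Illustrative entries only; the real table must list every status the venue documents.
_STATUS_MAP = {
    "NEW": VenueEventStatus.ACKED,
    "FILLED": VenueEventStatus.FILLED,
    "CANCELED": VenueEventStatus.CANCELED,
}


def status_from_row(status: object) -> VenueEventStatus:
    """Map a raw venue status to an event status; unknown input is never ACKED."""
    normalized = str(status or "").strip().upper()
    return _STATUS_MAP.get(normalized, VenueEventStatus.UNKNOWN)
```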

**Severity: High**

### H9: `RealZincPlane.write_slot()` — slot written to `slot_id >= slot_count` is invisible

**File:** `real_zinc_plane.py:206-210`

```python
def write_slot(self, slot):
    with self._lock:
        self._slot_cache[int(slot.slot_id)] = slot
        payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
```

`_slot_cache` is a plain dict — accepts any key. But `read_slots()` only reads 0..slot_count-1. Writing to `slot_id >= slot_count` stores the slot in the cache but it's **never serialized or read back**. No error.
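
A minimal sketch of a bounds-checked cache, assuming that raising on an out-of-range `slot_id` is acceptable to callers. `SlotCache` is a hypothetical class used only to illustrate the check; it is not the `RealZincPlane` implementation.

```python
from typing import Any, Dict, List, Optional


class SlotCache:
    """Holds at most `slot_count` slots and rejects ids that would never be serialized."""

    def __init__(self, slot_count: int) -> None:
        self._slot_count = int(slot_count)
        self._slots: Dict[int, Any] = {}

    def put(self, slot_id: Any, slot: Any) -> None:
        idx = int(slot_id)
        if not 0 <= idx < self._slot_count:
            raise IndexError(
                f"slot_id {idx} is outside 0..{self._slot_count - 1}")
        self._slots[idx] = slot

    def serializable(self) -> List[Optional[Any]]:
        # Every id accepted by put() is guaranteed to appear in this list.
        return [self._slots.get(i) for i in range(self._slot_count)]
```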

**Severity: High**

### H10: `RealZincControlPlane.read()` has no atomicity with concurrent `update()`

**File:** `real_control_plane.py:70-77`

`_write_region()` zero-fills the buffer then writes the packet. If `read()` interleaves between zero-fill and write, it sees a partially-zeroed buffer → `_decode_packet` returns `{}` → returns stale `self._snapshot` with no observable error. No lock, no sequence check, no atomic read.

The same bug exists in `RealZincPlane.read_slots()` (real_zinc_plane.py:220-230) — reads shared memory while a concurrent `write_slot()` is in progress.

**Severity: High**

### H11: `_RustKernelLib` lazily initialized with race condition

**File:** `rust_backend.py:187-190`

```python
_RUST: _RustKernelLib | None = None

def _get_rust():
    global _RUST
    if _RUST is None:
        _RUST = _RustKernelLib()  # no lock — two threads can both create
    return _RUST
```

No threading lock. Two concurrent calls to `_get_rust()` (possible via `BingxVenueAdapter`'s thread pool) can create two `_RustKernelLib` objects. The `_RustKernelLib()` constructor runs `_ensure_library()` which runs `subprocess.run(["cargo", "build", ...], check=True)` — concurrent `cargo build` can corrupt the build directory.
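
A sketch of lock-protected lazy initialization (double-checked locking). `LazySingleton` is a hypothetical wrapper; in the real module the factory would be `_RustKernelLib`, which is assumed here to be safe to construct exactly once.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Create the wrapped object at most once, even under concurrent first use."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._ready = False

    def get(self) -> T:
        if not self._ready:              # fast path: no lock once initialized
            with self._lock:
                if not self._ready:      # re-check: another thread may have won
                    self._value = self._factory()
                    self._ready = True
        return self._value  # type: ignore[return-value]
```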

**Severity: High**

### H12: `ExecutionKernel.__del__` can deadlock or use-after-free

**File:** `rust_backend.py:527-531`

```python
def __del__(self):
    backend = getattr(self, "_backend", None)
    if backend is not None:
        try:
            _get_rust().destroy(backend)  # accesses module singleton
        except Exception:
            pass
```

`_get_rust()` accesses the module-level `_RUST` singleton, which may already be destroyed if the module's garbage collection runs before the instance's. The destroy call happens outside any lock — one thread's destructor could destroy the Rust kernel while another thread is still using it. Use-after-free.

**Severity: High**

### H13: `MirroredControlPlane` missing protocol methods

**File:** `control.py:171-184`

`ControlPlane` protocol defines `wait()` and `notify()`. `MirroredControlPlane` inherits from nothing and only implements `read()`, `update()`, and `mirror()`. Calling `plane.wait()` on a `MirroredControlPlane` raises `AttributeError`.

**Severity: Medium**

### H14: `TradeSlot.remaining_size()` and `VenueOrder.remaining_size()` — same name, different semantics

**Files:** `contracts.py:207-208`, `contracts.py:143-145`

```python
# TradeSlot:
def remaining_size(self) -> float:
    return max(0.0, float(self.size))  # open position size

# VenueOrder:
def remaining_size(self) -> float:
    return max(0.0, self.intended_size - self.filled_size)  # unfilled order qty
```

Same method name, completely different semantics. `TradeSlot.remaining_size()` returns the current open position size. `VenueOrder.remaining_size()` returns the unfilled order quantity. A caller using `slot.remaining_size()` to check whether an order is fully filled gets the position size instead, which tracks entries and exits rather than the order's outstanding quantity.

**Severity: Medium**

### H15: `_maybe_close()` — `asyncio.run()` RuntimeError silently swallowed for coroutines

**File:** `launcher.py:233-243`

```python
if inspect.isawaitable(result):
    try:
        asyncio.run(result)
    except RuntimeError:
        pass  # SILENT — coroutine never executed
```

When `_maybe_close` is called from an async context (and it is: `DITAv2LauncherBundle.close()` is used in async test code), `asyncio.run()` raises `RuntimeError("asyncio.run() cannot be called from a running event loop")`. The exception is swallowed, the coroutine is never awaited, and the close/disconnect never happens.

Also: `break` after calling the first found method means if an object has both `close()` and `disconnect()`, `disconnect()` is never called.
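
A sketch of how the awaitable case could be handled without swallowing the error, assuming async callers are willing to await a returned task. `finish_close` and `_await_it` are hypothetical names, not part of `launcher.py`.

```python
import asyncio
import inspect
from typing import Any, Optional


async def _await_it(awaitable: Any) -> Any:
    # asyncio.run() needs a coroutine, so wrap arbitrary awaitables.
    return await awaitable


def finish_close(result: Any) -> "Optional[asyncio.Future]":
    """Complete a close() result.

    Returns None when the close has already finished, or a task that the
    (async) caller must await when an event loop is already running.
    """
    if not inspect.isawaitable(result):
        return None  # plain synchronous close(): nothing left to do
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        asyncio.run(_await_it(result))  # no loop running: safe to block here
        return None
    # A loop is running in this thread: schedule instead of calling asyncio.run().
    return asyncio.ensure_future(result)
```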

**Severity: Medium**

### H16: `_build_launcher_bundle` imports `BingxDirectExecutionAdapter` inside function — import-time side effect is safe but lazy loading masks errors

**File:** `launcher.py:254`

```python
def _build_venue(...):
    from prod.clean_arch.adapters.bingx_direct import BingxDirectExecutionAdapter
```

Import inside function — safe, lazy, no side effects. But if the `bingx_direct` module has an import error (missing dependency, version mismatch), it only surfaces at bundle construction time, not at process start. A misconfigured production deployment would fail on the first trade, not on boot.

**Severity: Informational**

### H17: `load_dotenv()` at module level — import-time filesystem I/O and env mutation

**File:** `launcher.py:49-51`

```python
load_dotenv(PROJECT_ROOT / ".env")  # executes on module import
```

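Runs on every import of `launcher.py`: reads the filesystem and mutates the process environment. Hard to isolate in tests, because any variable the test setup leaves unset is populated from `.env` at import time (with python-dotenv's default `override=False`, variables that are already set are kept rather than overwritten). Also: if `.env` doesn't exist, `load_dotenv()` silently does nothing, so missing config is invisible.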

**Severity: Medium**

### H18: `_run()` in `BingxVenueAdapter` — `asyncio.run()` thread-pool bridge blocks on every call

**File:** `bingx_venue.py:225-233`

```python
def _run(self, result):
    if inspect.isawaitable(result):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(result)
        pool = self._get_executor()
        return pool.submit(asyncio.run, result).result()  # BLOCKS
```

Every call to `_run()` that receives an awaitable blocks the calling thread via `.result()`. The BingX HTTP call inside `submit_intent()` can take 1-5 seconds. During this block, the event loop cannot process other tasks. In a single-runtime deployment, this stalls the entire policy cycle.

**Severity: Medium**

### H19: `HazelcastClientLike` protocol has zero concrete implementations in workspace

**File:** `hazelcast_projection.py:13-15`

```python
class HazelcastClientLike(Protocol):
    def get_map(self, name: str): ...
    def get_topic(self, name: str): ...
```

Used as a type hint. No code in the workspace creates an object that satisfies this protocol. The Hazelcast client comes from an external package. If the external API changes, the protocol silently drifts — no compilation check.

**Severity: Low**

### H20: `_decode_packet` in RealZinc — no bound check on `size` beyond `> len(buf)-16`

**Files:** `real_control_plane.py:50-52`, `real_zinc_plane.py:70-81`

```python
seq, size = struct.unpack_from("!QQ", buf, 0)
if size <= 0 or size > len(buf) - 16:
    return {}
payload = bytes(buf[16 : 16 + size]).decode("utf-8")  # can raise UnicodeDecodeError
out = json.loads(payload)  # can raise ValueError
```

If shared memory contains a corrupted `size` field within bounds, `.decode()` or `json.loads()` raises — uncaught by callers. A single corrupted byte in shared memory crashes the kernel.
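
A sketch of a decoder that contains both failure modes, assuming the `!QQ` header layout shown above and that returning `{}` (the existing convention for a bad `size`) is acceptable as long as the corruption is logged. `decode_packet` here is an illustrative rewrite, not the code in either file.

```python
import json
import logging
import struct
from typing import Any, Dict

log = logging.getLogger(__name__)
HEADER = struct.Struct("!QQ")  # (seq, size), as in the excerpt above


def decode_packet(buf: Any) -> Dict[str, Any]:
    """Decode one packet; return {} and log on any form of corruption."""
    if len(buf) < HEADER.size:
        return {}
    seq, size = HEADER.unpack_from(buf, 0)
    if size == 0 or size > len(buf) - HEADER.size:
        return {}
    raw = bytes(buf[HEADER.size:HEADER.size + size])
    try:
        out = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError) as exc:
        log.warning("corrupt packet seq=%d size=%d: %s", seq, size, exc)
        return {}
    if not isinstance(out, dict):
        log.warning("packet seq=%d is valid JSON but not an object", seq)
        return {}
    return out
```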

**Severity: Low**

### H21: All Rust crate features enabled by default — `wasm-bindgen` compiled into native shared library

**File:** `_rust_kernel/Cargo.toml`, transitive through `chrono` → `iana-time-zone` → `js-sys` → `wasm-bindgen`

The Rust kernel is a native `.so`/`.dylib` but chrono's `iana-time-zone` pulls in `js-sys` and `wasm-bindgen` (WebAssembly support) even on native Linux. Larger binary, longer compile times. `cc` crate pulled in for `iana-time-zone-haiku` which only compiles on Haiku OS.

**Severity: Low**

### H22: `socket.getaddrinfo` monkey-patch in test generator code

**File:** `gen2.py:295-298`

Monkey-patches Python stdlib `socket.getaddrinfo` to force IPv4 as a workaround for IPv6 resolution failure in the deployment environment. If copied to production code, would break IPv6 connectivity.

**Severity: Low**

---

## Pass 5 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| H1 | No Python dependency files (requirements.txt, pyproject.toml, etc.) | Build | **Critical** |
| H2 | Rust kernel compiled from source on every cold start — no prebuilt binary | Build | **Critical** |
| H3 | Zero logging — 16+ silent except:pass sites, no error observability | All | **Critical** |
| H4 | `_row_float` rejects zero as valid, `except Exception: continue` swallows all | Venue | **High** |
| H5 | `_backend_snapshot` timeout returns stale data/None — callers crash | Venue | **High** |
| H6 | All enum-from-raw-string sites crash on unknown variant (17 sites) | Bridge | **High** |
| H7 | `_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET | Venue | **High** |
| H8 | Unknown venue status silently mapped to ACKED | Venue | **High** |
| H9 | `RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost | Zinc | **High** |
| H10 | `RealZincControlPlane.read()` no atomicity with concurrent `update()` | Control | **High** |
| H11 | `_RustKernelLib` lazy init with race condition — concurrent cargo build | Bridge | **High** |
| H12 | `ExecutionKernel.__del__` use-after-free on Rust handle | Bridge | **High** |
| H13 | `MirroredControlPlane` missing protocol methods (wait/notify) | Control | Medium |
| H14 | `TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics | Contracts | Medium |
| H15 | `_maybe_close` asyncio.run RuntimeError silently swallowed | Launcher | Medium |
| H16 | Lazy import of bingx_direct masks config errors until first trade | Build | Info |
| H17 | `load_dotenv()` at module level — import-time I/O side effect | Launcher | Medium |
| H18 | `_run()` blocks event loop on every HTTP call via thread pool | Venue | Medium |
| H19 | `HazelcastClientLike` protocol has zero concrete implementations | Projection | Low |
| H20 | `_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM | Zinc | Low |
| H21 | `wasm-bindgen` compiled into native library unnecessarily | Build | Low |
| H22 | `socket.getaddrinfo` monkey-patch in test code | Test | Low |

### Pass 5 Severity Distribution

| Severity | Count |
|----------|-------|
| **Critical** | 3 (H1, H2, H3) |
| **High** | 9 (H4-H12) |
| Medium | 5 (H13, H14, H15, H17, H18) |
| Low | 4 (H19, H20, H21, H22) |
| Info | 1 (H16) |

### Combined Catalog (All 5 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| **Total** | | **138** | **8** | **30** | **37** | **44** | **19** |

---

## PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY

### I1: Entry `apply_fill` sets `slot.size = fill_size` — multiple partial fills overwrite instead of accumulating

**File:** `_rust_kernel/src/lib.rs:798`

```rust
// Entry fill path in apply_fill:
slot.size = fill_size;          // DIRECT ASSIGNMENT
slot.initial_size = slot.initial_size.max(fill_size);  // max, not sum
```

If a single entry order receives multiple partial fills (e.g., LIMIT order on the book):
- Fill #1: `fill_size = 0.5` → `slot.size = 0.5`, `initial_size = max(0, 0.5) = 0.5`
- Fill #2: `fill_size = 0.3` → `slot.size = 0.3`, `initial_size = max(0.5, 0.3) = 0.5`

After both fills, the actual position is 0.8 but `slot.size` reports 0.3. The position is under-counted by 0.5 — 62.5% error.

The exit path correctly does `slot.size = (slot.size - fill_size).max(0.0)` (subtractive). The entry path should accumulate: `slot.size += fill_size`.

This only manifests with LIMIT orders that receive multiple partial fills over time, a scenario entirely absent from tests (I6).
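
The arithmetic above can be replayed in a few lines of Python. This models only the two assignments quoted from `apply_fill`; `replay_entry_fills` is a hypothetical helper, and treating `initial_size` as the high-water mark of the accumulated size is an assumption about the intended fix.

```python
from typing import Iterable, Tuple


def replay_entry_fills(fills: Iterable[float], *, accumulate: bool) -> Tuple[float, float]:
    """Return (size, initial_size) after applying entry fills in order."""
    size = 0.0
    initial_size = 0.0
    for fill_size in fills:
        if accumulate:
            size += fill_size                        # proposed: running total
            initial_size = max(initial_size, size)   # assumption: high-water mark
        else:
            size = fill_size                         # current: last fill wins
            initial_size = max(initial_size, fill_size)
    return size, initial_size


fills = [0.5, 0.3]
current_size, _ = replay_entry_fills(fills, accumulate=False)
fixed_size, _ = replay_entry_fills(fills, accumulate=True)
shortfall = (fixed_size - current_size) / fixed_size
print(f"current={current_size:.1f} fixed={fixed_size:.1f} shortfall={shortfall:.1%}")
# prints: current=0.3 fixed=0.8 shortfall=62.5%
```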

**Severity: Critical**

### I2: `exit_ratio = 0.0` creates zero-size exit order — slot stuck in EXIT_REQUESTED

**File:** `_rust_kernel/src/lib.rs:467-469`

```rust
let exit_ratio = slot.next_exit_ratio();         // returns 0.0 from exit_leg_ratios=[0.0, ...]
let base_size = if slot.initial_size > 0.0 { ... } else { slot.size };
let exit_size = (base_size * exit_ratio).max(0.0); // = 0.0
```

When `exit_leg_ratios` contains `0.0` in any position, `exit_size = 0.0`. The zero-size exit order is submitted to the venue (`intended_size = 0`). On the fill side, `realized_pnl()` returns 0.0 (guarded by `exit_size <= 0.0`), and `slot.size` is unchanged. The slot stays in `EXIT_REQUESTED` with no means to advance — the leg is consumed but nothing happened. Subsequent exits may eventually handle this, but the zero-size leg is a wasted FSM transition that leaves the slot in a confusing intermediate state.

Also: `NaN` in `exit_leg_ratios` (from `clamp(0.0, 1.0)` not guarding NaN, though serde_json rejects NaN) would produce the same zero-size exit behavior.

**Severity: Medium**

### I3: `entry_price` inconsistency — Python uses falsy check, Rust uses `<= 0.0`

**File:** `contracts.py:88-98` (Python), `_rust_kernel/src/lib.rs:227-228` (Rust)

```python
# Python TradeSlot.mark_price():
self.entry_price = self.entry_price or price   # falsy — keeps -0.5, 0.0 replaced

# Rust TradeSlot::mark_price():
if self.entry_price <= 0.0 { self.entry_price = price; }  // catches -0.5, replaces it
```

If `entry_price` is negative (possible only via `set_slot_json` direct injection — not from normal trading), Python keeps it and computes `unrealized_pnl` with wrong sign. Rust replaces it. The Python-side `mark_price` is only called from `ExecutionKernel.mark_price()` in rust_backend.py:LOW-1, which never writes back to the Rust kernel — so the Python-side calculation is purely local and the inconsistency has no effect on the Rust kernel's canonical state. However, the `observe_slots` call after `mark_price` re-reads from the Rust kernel, which recomputes PnL correctly. The Python-side mark_price is effectively wasted computation that never feeds back.

**Severity: Informational**

### I4: No Rust unit tests for 99% of kernel functionality

**File:** `_rust_kernel/src/lib.rs:1731-1765`

Only 1 Rust test exists: `enter_then_ack_fill` — creates a 2-slot kernel, submits ENTER, sends ACK, asserts state transitions. 

**Not tested in Rust:**
- EXIT, CANCEL, MARK_PRICE, RECONCILE, CONTROL actions
- Any FILL event (PARTIAL, FULL)
- CANCEL_ACK, CANCEL_REJECT, ORDER_REJECT
- RATE_LIMITED handling
- Multi-leg exits
- `consume_exit_leg` edge cases
- `realized_pnl()` formula with boundary values
- `mark_price()` with extreme values
- `resolve_slot()` fallback path
- `reconcile_slots_json` dedup/overflow
- Any C FFI boundary function
- Any serde deserialization failure
- Null pointer handling

No `#[cfg(test)]` module exists — the single test is inline. No Rust integration tests (`tests/` directory).

**Severity: High**

### I5: `MockVenueScenario` rejection flags exist but zero tests use them

**File:** `mock_venue.py:23-35`

```python
@dataclass
class MockVenueScenario:
    reject_entries: bool = False
    reject_exits: bool = False
    cancel_reject: bool = False
```

Three boolean flags to simulate venue rejection of orders. Not a single test in `test_flaws.py` sets any of them to `True`. The `ORDER_REJECT` handler in the Rust kernel's `on_venue_event` exists (lib.rs lines ~1440-1460) but is never exercised by any test.

Similarly, `entry_partial_fill_ratio` and `exit_partial_fill_ratio` exist on `MockVenueScenario` but only one test (`test_cancel_entry_with_partial_fill`) uses partial fills at all — and it only checks `size > 0`, not the full capital-accrual chain.

**Severity: High**

### I6: No LIMIT order test through the full kernel path

The test suite has zero LIMIT orders. The Rust kernel has no reachable LIMIT-specific logic; in effect, all orders are MARKET. The generated live tests have `limit_does_not_fill` and `limit_immediate_fill` scenario placeholders, but:
- `limit_does_not_fill` uses `reference_price=0.0` (not a real LIMIT order)
- `limit_immediate_fill` uses `target_size=-0.001` (negative size → clamped to 0.0)

Neither scenario actually submits a LIMIT order with `order_type="LIMIT"` and a non-zero `limit_price`. The `_legacy_intent` bug (H7) would convert any LIMIT attempt to MARKET anyway.

The only LIMIT-related code is the Rust kernel's `if intent.order_type == "LIMIT"` branches (lib.rs:503, 1584) which are compile-time dead code — `KernelIntent` doesn't have an `order_type` field that serde would populate.

**Severity: High**

### I7: Three weak/vacuous assertions in `test_flaws.py`

**File:** `test_flaws.py`

1. **Line 512:** `assert order.metadata.get("asset") is not None or order.metadata.get("slot_id") is not None` — mock venue always sets both, this can never fail.

2. **Line 700:** `test_pnl_warning_on_unsettled_reentry` — titled to assert a warning is raised but only checks `r.accepted`. Never checks `diagnostic_code` or verifies the warning was issued.

3. **Line 318:** `assert slot.active_entry_order is None or slot.active_entry_order.status == VenueOrderStatus.FILLED` — the `or` allows two different scenarios to pass, reducing diagnostic power.

**Severity: Low**

### I8: `slot.size = fill_size` entry path has no overfill guard

**File:** `_rust_kernel/src/lib.rs:798`

Already noted in I1 — entry fill sets `slot.size` directly to `fill_size`. Unlike exit fill which has `(slot.size - fill_size).max(0.0)`, there's no guard against entry overfill (venue fills more than the intended order size). For MARKET orders this is fine (one fill per order), but for LIMIT orders with multiple partial fills, the accumulated fill could exceed `initial_size`.

**Severity: Low** (only relevant with LIMIT + partial fills, which don't exist in the codebase)

### I9: No crash durability — slot state is pure in-memory until step 7 of process_intent

**File:** `rust_backend.py:470-560`

The `process_intent` sequence:
1. validate → 2. Rust FSM → 3. venue.submit() → 4. on_venue_event() → 5. projection → 6. zinc_plane

If the process crashes between steps 2-5, the slot state accumulated in the Rust kernel's in-memory `KernelCore` is **completely lost**. The Rust kernel has no WAL, no journal, no persistent store. On restart, `ExecutionKernel.__init__` creates a fresh `KernelCore` with all slots IDLE.

The crash between step 3 and step 5 is the most dangerous: the exchange has an open order/position, but the kernel has no record of it. On restart:
- The Rust kernel sees `slot.fsm_state = IDLE`
- The Zinc slot cache may or may not have the pre-crash state (depends on timing)
- No code on restart loads Zinc state back into the Rust kernel (I14)
- The exchange order lives until it fills (unexpected position) or is manually cancelled

**Concrete example:** `venue.submit()` sends POST to BingX and the order is placed. The HTTP response arrives. `on_venue_event(ORDER_ACK)` transitions the slot to `ENTRY_WORKING`. The process crashes between returning from `on_venue_event` and `zinc_plane.write_slot()`. On restart the slot is IDLE, there is no active entry order, and `_last_settled_pnl` is reset, while the exchange still has a live working order. The next `process_intent(ENTER)` is not rejected with `SLOT_BUSY`: the fresh kernel doesn't know the order exists, sees the slot as IDLE, and allows a new ENTER. The old order then fills on the exchange → double position.

**Severity: Critical**

### I10: `seen_event_ids` lost on restart — events replayed after restart are double-processed

**File:** `_rust_kernel/src/lib.rs:672-683`

`seen_event_ids` is per-slot, per-`KernelCore` instance: purely in-process memory. On restart with a fresh `KernelCore`, every slot has `seen_event_ids = Vec::new()`. If events are replayed (from `pump_venue_events()` calling `venue.reconcile()` which re-fetches exchange state):

1. Original run: order fills → `FULL_FILL` with `event_id = "EV-00000042"` → processed, slot → `POSITION_OPEN`
2. Crash
3. Restart: fresh `KernelCore`, `seen_event_ids` empty
4. `pump_venue_events()` fetches same exchange state → new `VenueEvent` objects with new event IDs (adapter's `_event_seq` resets)
5. Rust kernel sees these as novel events — processes them again
6. Position is double-booked, PnL double-settled

The `bingx_venue._event_seq` is an instance-level `itertools.count()` starting from 1. On adapter restart, it resets — so the new event IDs won't match the old ones anyway. Dedup is fundamentally impossible across restarts.

**Severity: Critical**

### I11: No idempotency key (`newClientOrderId`) sent to BingX

**File:** `bingx_venue.py:282-285`, `bingx_direct.py` (external)

BingX supports `newClientOrderId` for order idempotency — sending the same ID twice returns the original order status instead of creating a duplicate. The DITAv2 kernel passes `intent.intent_id` as `decision_id` to the legacy adapter, but there's no guarantee this maps to `newClientOrderId` in the BingX payload.

If the HTTP POST to `/trade/order` times out before the response is read:
1. The order was placed on the exchange
2. `_call_backend` raises a `BingxHttpError` (or similar network exception)
3. `process_intent()` propagates the exception — no retry
4. Next cycle: caller may retry with a new `intent_id`
5. Second POST creates a **second order** on the exchange — duplicate position

Without a client-order-id that persists across retries, the system can create duplicate orders on network timeouts. The exchange has no way to deduplicate.
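
A sketch of a deterministic client order ID, assuming retries can reuse the same logical key (which, per step 4 above, means not minting a new `intent_id` on retry). `client_order_id`, the `pink` prefix, and the 32-character limit are all assumptions; the venue's actual field name and length limit must be checked against its API documentation.

```python
import hashlib


def client_order_id(logical_key: str, *, prefix: str = "pink", max_len: int = 32) -> str:
    """Derive a stable order ID from a key that stays the same across retries."""
    digest = hashlib.sha256(logical_key.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest}"[:max_len]


first_attempt = client_order_id("decision-42/slot-0/ENTER")
retry_attempt = client_order_id("decision-42/slot-0/ENTER")
other_order = client_order_id("decision-43/slot-0/ENTER")
```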

**Severity: High**

### I12: No graceful degradation for ANY subsystem

Every subsystem failure mode examined:

| Subsystem | Failure | Current behavior |
|-----------|---------|-----------------|
| Zinc SHM init | Corrupted region, OOM | Silent fallback to InMemoryZincPlane (no operator signal) |
| Zinc SHM write | Region overflow, write error | Unhandled exception → kernel crashes |
| Hazelcast write | Cluster unavailable | `.put()` raises → unhandled exception → kernel crashes |
| ClickHouse journal | Sink failure | Exception propagates (no try/except in callers) |
| BingX HTTP | Timeout, rate limit | Exception or REJECTED → slot stuck in ORDER_REQUESTED |
| Rust kernel | Null pointer from FFI | `_take_string` raises RuntimeError → kernel crash |
| Memory pressure | OOM | Process killed by the OS. No signal handlers are installed. |

**No subsystem has a graceful degradation path.** No circuit breaker, no retry queue, no fallback to log-only mode, no offline/cached trading mode. Every failure (except the two init-time silent fallbacks) crashes the current kernel operation.

**Severity: High**

### I13: Stray venue event can reactivate a CLOSED slot — no guard

**File:** `_rust_kernel/src/lib.rs:625+`

The `on_venue_event` function has no guard for closed slots:

```rust
fn on_venue_event(&mut self, event: VenueEvent) -> KernelResult {
    // ... resolve slot, check duplicates ...
    // NO: if slot.closed { return ... }
    let prev_state = slot.fsm_state.clone();
    match event.kind {
        SOME_EVENT_KIND => { /* transitions regardless of closed state */ }
    }
}
```

If a stray venue event arrives for a CLOSED slot:
- `ORDER_ACK` → sets `ENTRY_WORKING` — slot re-opens from CLOSED
- `FULL_FILL` → `apply_fill` runs → `slot.size = fill_size`, `fsm_state = POSITION_OPEN`
- `ORDER_REJECT` → clears `trade_id`, `asset`, sets `IDLE` — actually benign reset

A CLOSED slot should be a terminal state that rejects all events. Currently CANCEL_ACK is harmless on a closed slot and ORDER_REJECT merely resets it to IDLE, but ORDER_ACK and fill events can revive a dead position.

**Severity: High**

### I14: No `reconcile_from_slots` call on startup — Zinc state never loaded into Rust kernel

**Files:** `rust_backend.py:435-465` (init), `real_zinc_plane.py:95-115` (init)

On restart:
1. `RealZincPlane.__init__` reads state from Zinc shared memory into `_slot_cache`
2. `ExecutionKernel.__init__` creates fresh `KernelCore` — all slots IDLE
3. `KernelStateView(self)` reads from the fresh kernel
4. `account.observe_slots([self._get_slot(i) for i in range(max_slots)])` — all slots IDLE

Steps 3 and 4 read from the Rust kernel, NOT from Zinc. The Zinc `_slot_cache` populated in step 1 is **never loaded into the Rust kernel**. The `reconcile_on_restart` flag exists in `KernelControlSnapshot` (default `True`) but is never checked anywhere in `ExecutionKernel.__init__` or the launcher.

The system always starts with a blank state even when durable shared memory state exists.

**Severity: High**

### I15: CANCEL_REJECT doesn't clear `active_exit_order` — slot stuck in EXIT_WORKING

**File:** `_rust_kernel/src/lib.rs:1165-1175`

```rust
KernelEventKind::CANCEL_REJECT => {
    if slot.fsm_state == TradeStage::EXIT_WORKING {
        // stays EXIT_WORKING — no state transition
        // active_exit_order remains attached
    }
    diagnostic_code = KernelDiagnosticCode::CANCEL_REJECTED;
}
```

When the exchange rejects a cancel (typically because the order was already filled or no longer exists), the slot stays in `EXIT_WORKING` with `active_exit_order` still attached. Every subsequent CANCEL attempt hits the same path — the exchange returns "order not found," the kernel sees `CANCEL_REJECT`, and the slot is stuck forever.

If the order was already filled (CANCEL_REJECT means "can't cancel, no longer open"), the slot should check the actual position size and potentially transition to `POSITION_OPEN` or `CLOSED` depending on fill status.

**Severity: Medium**

### I16: Zinc shared memory — world-readable/writable by same-machine processes

**Files:** `real_control_plane.py`, `real_zinc_plane.py`

The Zinc shared memory regions are created with these names:
```python
self.region_name = f"{base}_intent"       # e.g., "dita_v2_intent"
self.state_name = f"{base}_state"          # "dita_v2_state"
self.control_name = f"{base}_control"      # "dita_v2_control"
```

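Region names are predictable (prefix defaults to `"dita_v2"`). The `SharedRegion` uses POSIX `shm_open`, so the effective permissions depend on the creation mode and the umask (typically `0644` or `0600`). Any same-machine process that those permissions admit can do the following (every process running as the same user can do both; under `0644`, processes of other users can at least read):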
- **Read**: Open the region → `as_buffer()` → `_decode_packet()` → read all slot state, PnL, open orders, control settings
- **Write**: Open the region → forge a packet (`struct.pack("!QQ", seq, len) + json_bytes`) → overwrite slot state, inject fake intents, modify control plane

No access control, no encryption, no integrity check (HMAC/signature) on the wire format. The sequence number is the only ordering mechanism, and it's trivially predictable.
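
An integrity check could be layered onto the existing wire format roughly as follows. This is a sketch under assumptions: it reuses the `!QQ` header, appends an HMAC-SHA256 tag, and leaves key distribution out of scope. `seal` and `open_sealed` are hypothetical names. It provides tamper detection only, not confidentiality, so it does not address the read exposure.

```python
import hashlib
import hmac
import json
import struct
from typing import Any, Dict, Optional

HEADER = struct.Struct("!QQ")            # (seq, size), as in the existing format
TAG_LEN = hashlib.sha256().digest_size   # 32-byte tag appended after the body


def seal(seq: int, payload: Dict[str, Any], key: bytes) -> bytes:
    """Serialize a packet and append an HMAC over header + body."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    header = HEADER.pack(seq, len(body))
    tag = hmac.new(key, header + body, hashlib.sha256).digest()
    return header + body + tag


def open_sealed(packet: bytes, key: bytes) -> Optional[Dict[str, Any]]:
    """Return the payload if the tag verifies, otherwise None."""
    if len(packet) < HEADER.size + TAG_LEN:
        return None
    _seq, size = HEADER.unpack_from(packet, 0)
    end = HEADER.size + size
    if end + TAG_LEN > len(packet):
        return None
    expected = hmac.new(key, packet[:end], hashlib.sha256).digest()
    if not hmac.compare_digest(packet[end:end + TAG_LEN], expected):
        return None  # forged or corrupted packet
    return json.loads(packet[HEADER.size:end].decode("utf-8"))
```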

**Severity: High**

### I17: `KernelSlotView` exposes full slot state via unrestricted `__getattr__`/`__setattr__`

**File:** `rust_backend.py:411-460`

```python
class KernelSlotView:
    def __getattr__(self, name):
        slot = self._snapshot()
        return getattr(slot, name)         # read ANY field

    def __setattr__(self, name, value):
        setattr(slot, name, value)
        self._kernel._set_slot(slot)       # write ANY field — bypasses FSM
```

Any code with a `KernelSlotView` reference can:
- Read all slot fields: `trade_id`, `size`, `entry_price`, `unrealized_pnl`, `realized_pnl`, `seen_event_ids`, `metadata`
- Write all slot fields: `slot_view.realized_pnl = -9999999` — directly manipulates PnL figures flowing into capital settlement

The `_set_slot` call writes through to the Rust kernel without any FSM validation. The entire kernel state is exposed through mutable Python objects with zero access control.

**Severity: High**

### I18: `sys.path.insert(0, ...)` at import time in two production files and three test/build scripts

**Files:** `real_control_plane.py:14`, `real_zinc_plane.py:22`, `test_flaws.py:13`, `_build_pink_bodies.py:2`, `_gen_test.py:3`

```python
# real_control_plane.py, real_zinc_plane.py — at MODULE LEVEL:
sys.path.insert(0, str(_ZINC_ADAPTER_PATH))

# test_flaws.py, _build_pink_bodies.py, _gen_test.py — at MODULE LEVEL:
sys.path.insert(0, '/mnt/dolphinng5_predict')
```

`sys.path.insert(0, ...)` gives the injected path highest import priority. An attacker with filesystem write access to the inserted path can create a malicious module that shadows a legitimate import (e.g., `zinc.py`, `utils.py`, `typing.py`). When any subsequent `from X import Y` runs, the attacker's module loads with the full privileges of the kernel process.

The production files use a relative path resolution (`Path(__file__).resolve().parents[3] / "zinc" / "adapters" / "python"`), while the test files use a hardcoded absolute path (`'/mnt/dolphinng5_predict'`). Both patterns are dangerous.
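
One way to avoid mutating `sys.path` at all is to load the needed file under an explicit name. The sketch below uses only the standard `importlib` machinery. `load_module_from` is a hypothetical helper, and it assumes the dependency is a single file; a package with relative imports would need additional handling.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from(path: Path, name: str) -> ModuleType:
    """Import exactly one file under an explicit name, leaving sys.path untouched."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module          # register before exec, as the import system does
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)     # do not leave a half-initialized module behind
        raise
    return module
```

Because no directory gains import priority, a stray `utils.py` dropped next to the target file cannot shadow unrelated imports elsewhere in the process.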

**Severity: High**

### I19: `pump_venue_events` re-fetches exchange state that can produce phantom position events

**File:** `bingx_venue.py:395-415`

`pump_venue_events` calls `reconcile()`, which calls `_backend_snapshot()` to fetch current positions and open orders from the exchange. The `_events_from_snapshot` method then diffs the current snapshot against the last-known snapshot to produce events:

```python
def _events_from_snapshot(self, before, after):
    for symbol, current_pos in after.open_positions.items():
        prev_pos = before.open_positions.get(symbol)
        if current_pos and (not prev_pos or abs(prev_pos.position_amount) < 1e-12):
            # This looks like a new position — emit event
```

If `before` is stale (for example, returned by the `_backend_snapshot` timeout path), the diff can produce spurious events. A position that already exists on the exchange (for instance, one opened before a crash) but is absent from the stale snapshot looks "new" to the diff, which emits an entry fill event. The Rust kernel then processes that event as a fresh entry, doubling the position. This compounds with I10 (`seen_event_ids` lost on restart).
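
A staleness guard on the baseline is one way to keep the diff from emitting phantom entries. The sketch below assumes each snapshot records when its fetch completed. `Snapshot`, `fetched_at`, `new_position_symbols`, and the `max_age_s` threshold are hypothetical and not part of `bingx_venue.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical snapshot shape: symbol -> signed position amount, plus fetch time."""
    open_positions: dict[str, float] = field(default_factory=dict)
    fetched_at: float = 0.0          # e.g. time.monotonic() when the fetch completed


def new_position_symbols(before: Snapshot, after: Snapshot, *,
                         max_age_s: float = 10.0, eps: float = 1e-12) -> list[str]:
    """Return symbols that look newly opened, or [] if `before` is too old to trust."""
    if after.fetched_at - before.fetched_at > max_age_s:
        return []                    # stale baseline: skip the diff and re-baseline instead
    opened = []
    for symbol, amount in after.open_positions.items():
        prev = before.open_positions.get(symbol, 0.0)
        if abs(amount) >= eps and abs(prev) < eps:
            opened.append(symbol)
    return opened
```

When the guard trips, the caller would adopt `after` as the new baseline without emitting events, so the next cycle diffs against fresh data.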

**Severity: High**

### I20: `exit_leg_ratios` no guard against empty list — `next_exit_ratio` returns 1.0

**File:** `contracts.py:196-198`

```python
def next_exit_ratio(self) -> float:
    if self.active_leg_index < len(self.exit_leg_ratios):
        return self.exit_leg_ratios[self.active_leg_index]
    return 1.0
```

The dataclass default `(1.0,)` normally prevents an empty `exit_leg_ratios`, but that default lives only in the dataclass, so an explicitly supplied empty sequence is not rejected. In that case `next_exit_ratio()` returns `1.0`, which is the same as "exit everything". The `consume_exit_leg` step then advances `active_leg_index` to `min(1, 1) = 1`, and `all_legs_done = active_leg_index >= exit_leg_ratios.len()` evaluates to `1 >= 0 = true`, so the slot closes. The empty-ratios edge case is silently handled with `unwrap_or(1.0)`, which happens to be correct but is undocumented.
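
The fallback could be made explicit rather than incidental. The sketch below is a hypothetical stand-in (`ExitLegs` is not the real contracts class) that keeps `next_exit_ratio` as quoted above and normalizes an empty tuple to the documented default at construction time.

```python
from dataclasses import dataclass


@dataclass
class ExitLegs:
    """Hypothetical stand-in for the slot fields that `next_exit_ratio` reads."""
    exit_leg_ratios: tuple[float, ...] = (1.0,)
    active_leg_index: int = 0

    def __post_init__(self) -> None:
        # Make the empty-tuple fallback explicit instead of relying on the index check.
        if not self.exit_leg_ratios:
            self.exit_leg_ratios = (1.0,)

    def next_exit_ratio(self) -> float:
        if self.active_leg_index < len(self.exit_leg_ratios):
            return self.exit_leg_ratios[self.active_leg_index]
        return 1.0   # past the last leg: exit whatever remains
```

The observable behavior is unchanged (an empty input still yields a single full exit), but the rule is now stated in one place and can be covered by a test.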

**Severity: Informational**

### I21: No test for rate-limited events — `RATE_LIMITED` kernel path is dead code

**Files:** `_rust_kernel/src/lib.rs` (event handler), `mock_venue.py` (`MockVenueScenario` has no rate-limit flag)

The Rust kernel has a handler for `KernelEventKind::RATE_LIMITED` (lib.rs lines ~1480-1500). The event flows through the Python bridge's `process_intent()` rate-limit detection (rust_backend.py:585-593). But `MockVenueScenario` has no flag to emit rate-limited events. The only path to trigger `RATE_LIMITED` is from the real BingX adapter — which requires live exchange connectivity.

The entire RATE_LIMITED code path — in both Python and Rust — is untested in CI. Any bug in this path only surfaces in production under rate-limit conditions.
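
A scenario flag along the following lines would let CI reach the path without live exchange connectivity. This is a sketch only: `RateLimitScenario`, `MockVenue`, `EventKind`, and `rate_limit_first_n` are hypothetical names and do not reflect the real `MockVenueScenario` fields or event types.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    ACK = "ACK"
    RATE_LIMITED = "RATE_LIMITED"


@dataclass
class RateLimitScenario:
    """Hypothetical scenario flag: the first `rate_limit_first_n` submits are throttled."""
    rate_limit_first_n: int = 0


class MockVenue:
    def __init__(self, scenario: RateLimitScenario) -> None:
        self._scenario = scenario
        self._submits = 0

    def submit(self, intent_id: str) -> list[tuple[EventKind, str]]:
        """Return the venue events for one submit as (kind, intent_id) pairs."""
        self._submits += 1
        if self._submits <= self._scenario.rate_limit_first_n:
            return [(EventKind.RATE_LIMITED, intent_id)]
        return [(EventKind.ACK, intent_id)]
```

A test would feed the throttled events into the kernel and assert on the resulting slot state and diagnostic, then confirm that a later retry is accepted.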

**Severity: Medium**

### I22: Thread pool for `_run` — `max_workers=3` shared across ALL adapter instances

**File:** `bingx_venue.py:236-245`

```python
@classmethod
def _get_executor(cls):
    if cls._EXECUTOR is None:
        with cls._EXECUTOR_LOCK:
            if cls._EXECUTOR is None:
                cls._EXECUTOR = ThreadPoolExecutor(max_workers=3, ...)
    return cls._EXECUTOR
```

The executor is a class-level singleton: all `BingxVenueAdapter` instances share the same 3-thread pool. With the runtime's `step()` calling `submit()` (1 thread) + `_backend_snapshot` (potentially another thread for open orders) + `cancel()` (1 thread in parallel), all 3 threads are consumed. A fourth concurrent call queues behind them, and its caller blocks at `.result()` until a worker frees up. With no timeout on that wait, a single hung HTTP call can freeze the entire event loop.

The pool is never shut down. If a `BingxVenueAdapter` is destroyed, the threads remain running (zombie workers). No `close()`/`disconnect()` path shuts down the executor.
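
A per-instance pool with a bounded wait and an explicit shutdown path addresses both points. The sketch below uses only `concurrent.futures` from the standard library. `OwnedExecutorAdapter`, `run`, and the timeout value are hypothetical and not the existing adapter API.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class OwnedExecutorAdapter:
    """Hypothetical adapter that owns its pool, bounds every wait, and can shut down."""

    def __init__(self, max_workers: int = 3, call_timeout_s: float = 5.0) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers,
                                            thread_name_prefix="venue")
        self._call_timeout_s = call_timeout_s

    def run(self, fn, *args):
        """Run fn in the pool; raise TimeoutError instead of blocking forever."""
        future = self._executor.submit(fn, *args)
        try:
            return future.result(timeout=self._call_timeout_s)
        except FutureTimeout:
            future.cancel()   # only succeeds if the call has not started yet
            raise TimeoutError(f"venue call exceeded {self._call_timeout_s}s") from None

    def close(self) -> None:
        """Stop accepting work and join the worker threads."""
        self._executor.shutdown(wait=True)

    def __enter__(self):
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```

The timeout only unblocks the caller. A call that has already started keeps its worker until it returns, so the underlying HTTP client still needs its own request timeout.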

**Severity: Medium**

---

## Pass 6 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| I1 | Entry `apply_fill` multiple partial fills overwrite size instead of accumulating | Rust | **Critical** |
| I2 | Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED | Rust | Medium |
| I3 | entry_price inconsistency — Python falsy vs Rust `<= 0.0` gate | Bridge | Info |
| I4 | Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer | Rust | **High** |
| I5 | MockVenueScenario rejection flags exist but zero tests use them | Test | **High** |
| I6 | No LIMIT order test through full kernel path | Test | **High** |
| I7 | Three weak/vacuous assertions in test_flaws.py | Test | Low |
| I8 | Entry overfill no guard | Rust | Low |
| I9 | No crash durability — slot state pure in-memory until step 7 of process_intent | Bridge | **Critical** |
| I10 | seen_event_ids lost on restart — events double-processed | Rust | **Critical** |
| I11 | No idempotency key sent to BingX — lost response creates duplicate orders | Venue | **High** |
| I12 | No graceful degradation for ANY subsystem | All | **High** |
| I13 | Stray venue event can reactivate CLOSED slot — no guard | Rust | **High** |
| I14 | No reconcile_from_slots call on startup — Zinc state never loaded into kernel | Restart | **High** |
| I15 | CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING | Rust | Medium |
| I16 | Zinc shared memory world-readable/writable by same-machine processes | Zinc | **High** |
| I17 | KernelSlotView unrestricted getattr/setattr — bypasses all FSM guards | Bridge | **High** |
| I18 | sys.path.insert(0) at import time in 2 production files and 3 test/build scripts — malicious module loading | Build | **High** |
| I19 | pump_venue_events stale snapshot diff produces phantom position events | Venue | **High** |
| I20 | exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented) | Contracts | Info |
| I21 | RATE_LIMITED code path in both Python and Rust is completely untested | All | Medium |
| I22 | Thread pool max_workers=3 shared across all adapter instances — never shut down | Venue | Medium |

### Pass 6 Severity Distribution

| Severity | Count |
|----------|-------|
| **Critical** | 3 (I1, I9, I10) |
| **High** | 11 (I4, I5, I6, I11, I12, I13, I14, I16, I17, I18, I19) |
| Medium | 4 (I2, I15, I21, I22) |
| Low | 2 (I7, I8) |
| Info | 2 (I3, I20) |

### Combined Catalog (All 6 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| **Total** | | **160** | **11** | **41** | **41** | **46** | **21** |
 								---
 								## Layer 1: Policy Cycle Entry (pink_direct.py:422)
 								### E1: `step()` calls `pump_venue_events()` every cycle unconditionally
 								**pink_direct.py:436**
 								```python
 								await self.pump_venue_events(snapshot, market_state=market_state)
 								```
 								This is called **before** reading slot/account state for the policy decision.
 								The pump calls `venue.reconcile()` which for `BingxVenueAdapter` does 5 HTTP
 								requests (balance, positions, open orders, plus history if `include_history`).
 								For MARKET-only workflows, no resting orders exist, so `reconcile()` returns
 								empty events every time. But the HTTP calls still happen. On BingX VST with
 								~10 req/s limit and a 5s policy cycle, this burns 1 req/s just to learn
 								"nothing changed." Add the actual trade HTTP calls, and the budget is tight.
 								**Flaw: E1 — unconditional exchange poll wastes rate limit.**
 								Already documented as A10, but worse when traced E2E: each `pump_venue_events`
 								calls `venue.reconcile()` → `_backend_snapshot()` → parallel `asyncio.gather`
 								of 3 HTTP GETs. The `_refresh_exchange_state` at bingx_direct.py:281-352
 								always fetches balance + positions + openOrders concurrently. Even when
 								`include_history=False` (which it is for the pump), that's 3 HTTP calls
 								every policy cycle regardless of whether any orders are resting.
 								**Severity: Medium.** Wasteful but not destructive on testnet.
 								### E2: `kernel.snapshot()["account"]` returns a fresh dict, not a live view
 								**pink_direct.py:437**
 								```python
 								acc = self.kernel.snapshot()["account"]
 								```
 								`ExecutionKernel.snapshot()` at rust_backend.py:740-752 builds a dict from
 								kernel state at call time. The decision/intent engines then consume this
 								snapshot. Between the snapshot and `process_intent()` (line 523), another
 								caller (or the same runtime in a concurrent cycle) could advance the kernel
 								state, making the decision based on stale capital.
 								**Flaw: E2 — TOCTOU between capital snapshot and intent execution.**
 								The `context.capital` read at line 437 is used at line 523 for the ENTER
 								safety guard (`_unsafe_entry_reason`) and possibly by the decision/intent
 								engines. If capital changes between these two points (e.g. an async fill
 								arrives via a concurrent test-HTTP path), the guard uses stale capital.
 								**Severity: Low** in single-threaded deployment. Critical under concurrency.
 								---
 								## Layer 2: Decision/Intent Bridging (pink_direct.py:79-115)
 								### E3: `_decision_to_kernel_intent` drops `order_type` and `limit_price`
 								**pink_direct.py:79-115**
 								```python
 								def _decision_to_kernel_intent(decision, intent, slot_id=0):
 								    return KernelIntent(
 								        ...
 								        # order_type and limit_price are NOT SET here
 								    )
 								```
 								`KernelIntent` has `order_type="MARKET"` and `limit_price=0.0` as defaults,
 								so MARKET orders work correctly. But the runtime **never** sets these fields
 								from the policy layer. If `decision` or `intent` ever carries `order_type`
 								or `limit_price`, it's silently dropped because the bridge doesn't map them.
 								**Flaw: E3 — LIMIT support in runtime is dead code.**
 								The `order_type`/`limit_price` fields in `KernelIntent` and the LIMIT payload
 								building in `bingx_direct.py` lines 384-398 are unreachable from the runtime.
 								The only path that can set them is direct `KernelIntent(...)` construction
 								in tests (`_build_pink_bodies.py` style scenarios). The `_decision_to_kernel_intent`
 								bridge must be patched when a policy engine needs to emit LIMIT orders.
 								**Severity: Medium.** Blocks any production path to LIMIT orders.
 								### E4: `_exit_intent_from_slot` trusts slot.size but slot may be stale
 								**pink_direct.py:398-420**
 								```python
 								def _exit_intent_from_slot(self, kernel_intent):
 								    try:
 								        slot_size = float(self.kernel.slot(int(kernel_intent.slot_id)).size or 0.0)
 								    except Exception:
 								        slot_size = 0.0
 								    ...
 								    exit_size = min(policy_size, slot_size) if policy_ok else slot_size
 								```
 								Reads `slot.size` fresh from the Rust kernel at call time, then uses it to
 								cap the exit size. Between this read and the `process_intent` call that
 								actually executes the EXIT (line 523), the slot can be modified by
 								`pump_venue_events` (line 436) or a concurrent cycle. If a partial fill
 								arrived between the slot read and the EXIT, the exit size could be wrong.
 								**Flaw: E4 — TOCTOU between exit sizing and exit execution.**
 								Same class as E2 but for exit size rather than capital. If the pump drained
 								a partial fill between R4 (slot read) and R8 (process_intent), the EXIT
 								requests a size based on pre-pump remaining size. The kernel caps it at
 								actual remaining, so this is self-correcting — but the intent payload has
 								wrong metadata.
 								**Severity: Low.** Self-correcting at kernel level.
 								---
 								## Layer 3: Kernel Bridge — Rust FSM Entry (rust_backend.py)
 								### E5: JSON serialization round-trip loses numeric precision
 								**rust_backend.py:460-485 (`_intent_to_payload`)**
 								`KernelIntent` fields like `reference_price`, `target_size`, `leverage` are
 								Python floats. They're serialized to JSON text, sent through C FFI, parsed
 								by serde_json into Rust `f64`, then serialized back to JSON, parsed by Python
 								`json.loads()`. Each serialization step can introduce precision loss:
 								```python
 								# Python float → JSON: 0.1 → "0.1" → Rust f64: 0.10000000000000000555
 								# Rust f64 → JSON: → serde_json may print "0.10000000000000001"
 								# Python json.loads → 0.10000000000000001
 								```
 								For prices (TRXUSDT at ~$0.08), a 1e-16 relative error is negligible. For
 								PnL accumulation over thousands of trades at 9x leverage, the error can grow
 								to cents or dollars. The `|Δcapital − realized| < 1e-9` assertion in tests
 								would catch gross errors but not sub-cent accumulation.
 								**Flaw: E5 — JSON serialization precision drift over long runs.**
 								**Severity: Low.** Not a practical concern for the current deployment scale.
 								### E6: `_RustKernelLib` is a global singleton — shared across all kernels
 								**rust_backend.py:40-45**
 								```python
 								_RUST: _RustKernelLib | None = None
 								def _get_rust() -> _RustKernelLib:
 								    global _RUST
 								    if _RUST is None:
 								        _RUST = _RustKernelLib()
 								    return _RUST
 								```
 								The `_RustKernelLib` singleton loads the `.so` shared library once and
 								provides FFI functions. Each `ExecutionKernel` instance gets its own
 								`KernelHandle` via `_get_rust().create(max_slots)`. The FFI functions take
 								the handle as the first argument, so multiple kernels are isolated at the
 								Rust level.
 								**However**, the singleton means ALL kernels share the same ctypes function
 								pointer table. If a second kernel is created and the first is destroyed,
 								`KernelHandle` of the first becomes a dangling pointer. Calling any FFI
 								function on the destroyed kernel's handle is use-after-free.
 								**Flaw: E6 — No protection against use-after-free on kernel destroy.**
 								Already documented as T7. Worth re-emphasizing in the E2E trace because the
 								test infrastructure creates and destroys kernels frequently (fresh-kernel
 								reconcile tests, each `_build_rb()` call in scenario wrappers).
 								**Severity: High.** Use-after-free in C FFI is memory corruption.
 								---
 								## Layer 4: Rust Kernel FSM (lib.rs:728)
 								### E7: ENTER handler silently allows re-entry with same trade_id
 								**lib.rs:740-745**
 								```rust
 								if !slot.is_free() && !slot.trade_id.is_empty() && slot.trade_id != intent.trade_id {
 								    return SLOT_BUSY;
 								}
 								```
 								If `slot.trade_id == intent.trade_id`, the ENTER is accepted even if the
 								slot is not free (e.g., POSITION_OPEN with an active position). This is by
 								design — it lets the same trade_id re-enter after the slot was partially
 								reconciled or restored from a snapshot. But it also means:
 . EXIT sets `slot.closed=true` and transitions to `CLOSED`
 . A new ENTER with the **same** trade_id re-enters the CLOSED slot
 . The slot resets `slot.closed=false`, `slot.size=0.0`, `slot.initial_size=0.0`
 . Kernel now thinks the trade is new, but the Rust indexes still have the
 								   old trade_id pointing to slot 0
 								**Downstream effect:** After a re-entry with the same trade_id, the
 								`active_trade_index[trade_id]` still correctly points to slot 0. But the
 								old `VenueOrder` in `client_order_index` and `venue_order_index` is still
 								present until the new entry fills and creates new orders. A reconcile event
 								addressed to the old `venue_client_id` could stomp on the new trade.
 								**Flaw: E7 — Re-entry with same trade_id leaves stale index entries.**
 								**Severity: Low.** The `rebuild_indexes()` call in `commit_slot()` rebuilds
 								from scratch, so stale entries are cleared on the first write.
 								### E8: EXIT handler uses `initial_size` not `current size`
 								**lib.rs:770-775**
 								```rust
 								let exit_ratio = slot.next_exit_ratio();
 								let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
 								let exit_size = (base_size * exit_ratio).max(0.0);
 								```
 								Already documented as A1. In the E2E trace, this is the single most impactful
 								execution flaw. A concrete scenario:
 . Enter `size=1.0`, `initial_size=1.0`, `exit_leg_ratios=(0.5, 0.5, 1.0)`
 . EXIT leg 0: requests `1.0 * 0.5 = 0.5`. Slot goes to 0.5.
 . EXIT leg 1: requests `1.0 * 0.5 = 0.5`. Slot goes to 0.0.
 								   `active_leg_index` advances to 2. `all_legs_done = (2 >= 3) = false`.
 								   But wait — `exit_leg_ratios.len()` is 3: [0.5, 0.5, 1.0]. So
 								   `all_legs_done = (2 >= 3) = false`. The slot stays at `POSITION_OPEN`,
 								   `size=0.0`, `!closed`.
 . EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`. Slot is at 0.0.
 								   `slot.is_free()`: `fsm_state=POSITION_OPEN`, not in `{IDLE, CLOSED}`.
 								   `slot.size <= 0.0` is true. But `!slot.is_free()` returns true because
 								   of the FSM state check, not the size check. The ENTER guard `!slot.is_free()`
 								   blocks re-entry. The EXIT guard `slot.is_free() || slot.closed || size <= 0.0`
 								   triggers — returns `NO_OPEN_POSITION`.
 . **Slot is stuck forever.** No operation can advance it.
 								**Severity: High.** Concrete, reproducible, and not caught by any test.
 								### E9: CANCEL handler returns diagnostic even when nothing happened
 								**lib.rs:795-810**
 								```rust
 								if matches!(intent.action, KernelCommandType::CANCEL) {
 								    let has_cancellable_exit = slot.active_exit_order.is_some();
 								    let has_cancellable_entry = slot.active_entry_order.is_some()
 								        && matches!(slot.fsm_state, ENTRY_WORKING | ORDER_REQUESTED | ORDER_SENT | IDLE);
 								    if !has_cancellable_exit && !has_cancellable_entry {
 								        return KernelResult {
 								            outcome: KernelOutcome {
 								                accepted: false,
 								                diagnostic_code: NO_ACTIVE_EXIT_ORDER,
 								                ...
 								            },
 								            ...
 								        };
 								    }
 								    return KernelResult {
 								        outcome: KernelOutcome {
 								            accepted: true,
 								            ...
 								        },
 								        ...
 								    };
 								}
 								```
 								Two issues:
 . When **neither** is cancellable, the diagnostic is `NO_ACTIVE_EXIT_ORDER`
 								   even if the actual reason is "no active entry order either" or "slot is
 								   already IDLE". The diagnostic is misleading.
 . When at least one IS cancellable, the Rust kernel returns `accepted=true`
 								   but does **not** mutate the slot at all — it returns immediately with the
 								   slot as-is. The actual cancel (HTTP call + FSM transition) happens in the
 								   Python bridge. The Rust kernel's "accept" just means "yes you may try to
 								   cancel this" — not "the cancel is complete."
 								This disconnect means: if the Python bridge's `venue.cancel()` fails (HTTP
 								error), the Rust kernel has already returned `accepted=true` for a cancel
 								that never happened. The caller sees `accepted=true` but the slot state
 								hasn't changed.
 								**Flaw: E9 — Rust CANCEL "accepts" before Python actually cancels.**
 								**Severity: Medium.** The `outcome.accepted` boolean is misleading for CANCEL.
 								### E10: `apply_fill` entry branch double-sets `active_entry_order`
 								**lib.rs:1330-1390**
 								```rust
 								// First set — at the top of the entry branch:
 								slot.active_entry_order = Some(VenueOrder {
 								    ...
 								    filled_size: fill_size,
 								    status: if partial { PARTIALLY_FILLED } else { FILLED },
 								    ...
 								});
 								// ... then later for full fill:
 								if !partial {
 								    slot.fsm_state = TradeStage::POSITION_OPEN;
 								    slot.active_entry_order = Some(VenueOrder {  // SECOND SET
 								        ...
 								        filled_size: slot.size,    // uses updated slot.size
 								        ...
 								    });
 								}
 								```
 								The entry branch sets `active_entry_order` at the top with `filled_size` from
 								the event, then for a FULL_FILL, sets it again with `filled_size = slot.size`
 								(which may have been updated by `slot.initial_size = fill_size` above). The
 								first VenueOrder's `intended_size` is from the event, the second uses
 								`slot.size`. Both are correct in isolation, but the double-write is wasteful.
 								More importantly, for a PARTIAL_FILL entry, the first set is the ONLY set.
 								If a second PARTIAL_FILL arrives for the same order, the entry branch at
 								line 1334 checks `slot.active_entry_order.is_some()` which is true (set by
 								the first partial), but the FSM state is `ENTRY_WORKING` (also set by first
 								partial). The condition at line 1334-1338 matches `ENTRY_WORKING`, so the
 								second partial enters the entry branch again. But `fill_size` is the event's
 								`filled_size` — the **total** filled, not the incremental amount.
 								**Flaw: E10 — Second PARTIAL_FILL on entry overwrites, doesn't accumulate.**
 								```rust
 								let fill_size = if event.filled_size > 0.0 {
 								    event.filled_size      // ← TOTAL filled, not incremental
 								} else {
 								    event.size
 								}.max(0.0);
 								slot.active_entry_order = Some(VenueOrder {
 								    ...
 								    filled_size: fill_size,  // ← overwrites previous filled_size
 								    ...
 								});
 								slot.initial_size = slot.initial_size.max(fill_size);  // ← OK, uses max
 								slot.size = fill_size;  // ← OVERWRITES previous size with total
 								```
 								On a RESTING LIMIT entry that partially fills in two events:
 								- Event 1: filled_size=0.3 → slot.size=0.3, entry_order.filled_size=0.3
 								- Event 2: filled_size=0.7 → slot.size=0.7, entry_order.filled_size=0.7
 								The `filled_size` on the VenueOrder correctly reflects cumulative fill
 								(0.7), but `slot.size` jumps from 0.3 to 0.7 — the increment is 0.4, which
 								is correct because `fill_size` IS the cumulative fill (0.7). Actually this
 								is correct — the venue sends cumulative filled_size, not incremental. Let
 								me re-verify: at `bingx_venue._events_from_submit()` line ~480:
 								```python
 								filled_size = _row_float(ack_row, "executedQty", ...)
 								```
 								This reads `executedQty` which on BingX IS cumulative. So the second event's
 								`filled_size=0.7` means "total filled across all fills = 0.7." The kernel
 								sets `slot.size = 0.7` which is the total position size. This is correct.
 								But the second fill event has `slot.entry_price` overwritten by the new
 								fill's price. If the first fill was at 0.0834 and the second at 0.0836, the
 								slot's `entry_price` becomes 0.0836 — losing the blended average. For a LIMIT
 								entry with two partial fills at different prices, the entry_price in the slot
 								is the price of the LAST fill, not the VWAP.
 								**Flaw: E10a — Entry price on multi-partial entry is last-fill, not VWAP.**
 								**Severity: Low.** Unrealized PnL computation uses this price. Error is small
 								for tight spreads.
 								---
 								## Layer 5: Venue Adapter Boundary (bingx_venue.py)
 								### E11: `_legacy_intent()` is a lossy conversion
 								**bingx_venue.py:270-285**
 								```python
 								@staticmethod
 								def _legacy_intent(intent: KernelIntent) -> LegacyIntent:
 								    action = LegacyDecisionAction.ENTER if intent.action == E.ENTER else ...
 								    side = LegacyTradeSide.SHORT if intent.side == TS.SHORT else ...
 								    metadata = dict(intent.metadata)
 								    metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
 								    metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
 								    return LegacyIntent(
 								        timestamp=intent.timestamp,
 								        trade_id=intent.trade_id,
 								        decision_id=intent.intent_id,
 								        asset=intent.asset,
 								        action=action,
 								        side=side,
 								        reason=intent.reason,
 								        target_size=float(intent.target_size),
 								        leverage=float(intent.leverage),
 								        reference_price=float(intent.reference_price),
 								        confidence=1.0,           # ← HARDCODED
 								        bars_held=0,              # ← HARDCODED
 								        exit_leg_ratios=tuple(intent.exit_leg_ratios or (1.0,)),
 								        metadata=metadata,
 								    )
 								```
 								`confidence` is always 1.0 and `bars_held` is always 0. The `LegacyIntent`
 								carries these to `BingxDirectExecutionAdapter.submit_intent()` which ignores
 								them (it only reads `asset`, `side`, `action`, `target_size`, `leverage`,
 								and `metadata`). So the hardcoded values don't affect execution — but they
 								affect the `ExecutionReceipt` and any downstream consumers that might read
 								`receipt.confidence`.
 								**Flaw: E11 — Lossy conversion with hardcoded metadata.**
 								**Severity: Informational.** No downstream consumer reads these fields.
 								### E12: `_events_from_submit()` price fallback chain can lose venue price
 								**bingx_venue.py:375-400 (`_events_from_submit`)**
 								```python
 								base_event = VenueEvent(
 								    ...
 								    price=safe_float(getattr(receipt, "price", 0.0), 0.0),
 								    ...
 								)
 								# ... later for fill event:
 								fill_price = safe_float(
 								    _row_float(ack_row, "avgPrice", "ap", "price", "lastFillPrice",
 								               default=getattr(receipt, "price", 0.0)),
 .0
 								)
 								```
 								The fill price is read from `ack_row` (the HTTP response dict) first, falling
 								back to `receipt.price` (the `ExecutionReceipt` field). The `executionReceipt`
 								price comes from `bingx_direct.py:434`:
 								```python
 								fill_price = 0.0
 								for key in ("avgPrice", "avgFilledPrice", "price", "lastFillPrice", "tradePrice"):
 								    try: value = float(ack_row.get(key) or 0.0)
 								    except: value = 0.0
 								    if value > 0: fill_price = value; break
 								if fill_price <= 0 and self._state is not None:
 								    fill_price = next((float(...)) for ... in self._state.open_positions.values() ...)
 								```
 								So the price flows: BingX HTTP ack → `ack_row[key]` → `receipt.price` →
 								`_events_from_submit()` → `fill_price` in VenueEvent.
 								If `ack_row` has no price field AND `self._state.open_positions` has no matching
 								position (e.g., first fill on a new entry), `fill_price` stays 0.0. The kernel's
 								`apply_fill` at lib.rs:1397 checks `if event.price > 0.0` before setting
 								`entry_price` — so a zero fill price leaves `entry_price` at 0.0. This means:
 								- The slot's `entry_price` stays 0.0
 								- `realized_pnl()` at lib.rs:662 checks `if slot.entry_price <= 0.0` → returns 0.0
 								- **PnL is never computed for this fill**
 								- Capital never settles
 								This is very unlikely on BingX VST, which always returns `avgPrice` in order
 								acknowledgements. But on any venue that doesn't, PnL is silently zeroed.
 								**Flaw: E12 — Zero fill price → zero entry_price → zero PnL.**
 								**Severity: Medium.** Silent PnL loss if venue returns no price.
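One way to keep a missing price from silently becoming `entry_price = 0.0` is to make the absence explicit. The sketch below mirrors the key order quoted from `bingx_direct.py` but returns `None` instead of `0.0` when no positive price exists. `resolve_fill_price` and its `fallback` parameter are hypothetical names, not part of the codebase.

```python
import math
from typing import Mapping, Optional

# Same key order as the adapter snippet quoted above.
PRICE_KEYS = ("avgPrice", "avgFilledPrice", "price", "lastFillPrice", "tradePrice")


def resolve_fill_price(ack_row: Mapping[str, object], fallback: float = 0.0) -> Optional[float]:
    """Return the first finite, strictly positive price, or None if none exists."""
    for key in PRICE_KEYS:
        try:
            value = float(ack_row.get(key) or 0.0)
        except (TypeError, ValueError):
            continue  # unparsable field: try the next key
        if math.isfinite(value) and value > 0.0:
            return value
    # Hypothetical fallback, e.g. a price taken from an existing open position.
    if math.isfinite(fallback) and fallback > 0.0:
        return fallback
    return None
```

A caller that receives `None` can then hold the fill for reconciliation or raise a diagnostic, rather than forwarding a zero price into `apply_fill`.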
 								### E13: `_backend_snapshot()` timeout returns stale data
 								**bingx_venue.py:290-320**
 								```python
 								def _backend_snapshot(self, *, include_history=False, timeout_ms=5000.0):
 								    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
 								        with self._snap_lock:
 								            return self._last_snapshot  # ← STALE DATA
 								```
 								If the previous snapshot fetch is still in-flight when a new caller arrives,
 								the timeout returns `self._last_snapshot` — which could be seconds or minutes
 								old. The caller (e.g., `submit()`) then uses this stale snapshot to compute
 								`_filled_size_from_snapshots()` — potentially comparing stale "before" data
 								with fresh "after" data, producing a wrong delta.
 								**Flaw: E13 — Stale snapshot fallback causes wrong fill-size detection.**
 								**Severity: Medium.** The `_filled_size_from_snapshots` diff can be wrong.
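A minimal sketch of how the fallback could be made visible to callers, assuming a hypothetical `SnapshotCache` wrapper (none of these names exist in `bingx_venue.py`). The cached snapshot is returned together with a `fresh` flag and its capture time, so a caller such as `submit()` could refuse to diff a stale "before" against a fresh "after".

```python
import threading
import time
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TimedSnapshot:
    data: Any
    taken_at: float  # time.monotonic() at capture
    fresh: bool      # False when returned because the wait timed out


class SnapshotCache:
    """Hypothetical wrapper: tags every returned snapshot with its freshness."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._data: Any = None
        self._taken_at = 0.0

    def begin_fetch(self) -> None:
        # A fetch is in flight; readers must wait or accept a stale copy.
        self._ready.clear()

    def publish(self, data: Any) -> None:
        with self._lock:
            self._data = data
            self._taken_at = time.monotonic()
        self._ready.set()

    def get(self, timeout_s: float) -> TimedSnapshot:
        fresh = self._ready.wait(timeout=timeout_s)
        with self._lock:
            return TimedSnapshot(self._data, self._taken_at, fresh)
```

With this shape, the fill-size diff can be skipped (or retried) whenever either side of the comparison has `fresh=False`.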
 								### E14: `_events_from_cancel` uses stale `slot_id` from order metadata
 								**bingx_venue.py:485-510**
 								```python
 								VenueEvent(
 								    ...
 								    slot_id=int(order.metadata.get("slot_id", 0) or 0),
 								    ...
 								)
 								```
 								The `slot_id` in the CANCEL event comes from the `VenueOrder.metadata` which
 								was set when the order was created (in Rust FSM's `process_intent` or
 								`on_venue_event`). If the slot was re-assigned or the kernel's slot count
 								changed since order creation, this slot_id is wrong. The Rust kernel's
 								`resolve_slot()` at lib.rs:610-624 would use the event's `slot_id` (the
 								stale one) and find the wrong slot.
 								**Flaw: E14 — Cancel event carries stale slot_id from order creation.**
 								**Severity: Low.** Slots are stable and never renumbered.
 								---
 								## Layer 6: BingX Direct Adapter (bingx_direct.py)
 								### E15: Submit sets leverage via separate HTTP call
 								**bingx_direct.py:376-379**
 								```python
 								await self._client.signed_post(
 								    "/openApi/swap/v2/trade/leverage",
 								    {"symbol": symbol, "side": "BOTH", "leverage": leverage},
 								)
 								```
 								This is a POST to set exchange leverage **before** each order. If this call
 								fails (rate limit, network error), the exception at line 417 sets
 								`status = "RATE_LIMITED"` and returns a rejection — the order is NOT
 								submitted. But the error handling at line 417 catches `BingxHttpError` for
 								the leverage call AND the order call with the same handler. If the leverage
 								call fails with a non-rate-limit error (e.g., `400 Bad Request` for invalid
 								symbol), the status is `"REJECTED"` and no order is placed. This is correct
 								behavior — but the error message doesn't distinguish "leverage set failed"
 								from "order submission failed."
 								**Flaw: E15 — Leverage-set failure and order failure share error handler.**
 								**Severity: Low.** Correct behavior, poor diagnostics.
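The diagnostic gap could be closed by tagging each failure with the phase that produced it. A minimal synchronous sketch, assuming hypothetical callables `set_leverage` and `place_order` and a stand-in `VenueCallError`; the real adapter is async and raises `BingxHttpError`, and the `phase` field is an invented name.

```python
from typing import Any, Callable, Dict


class VenueCallError(Exception):
    """Stand-in for the adapter's HTTP error type (assumption)."""


def submit_with_phase(
    set_leverage: Callable[[], Any],
    place_order: Callable[[], Any],
) -> Dict[str, Any]:
    """Run both venue calls and record which phase failed."""
    try:
        set_leverage()
    except VenueCallError as exc:
        # No order was sent; the rejection says so explicitly.
        return {"status": "REJECTED", "phase": "LEVERAGE_SET", "error": str(exc)}
    try:
        ack = place_order()
    except VenueCallError as exc:
        return {"status": "REJECTED", "phase": "ORDER_SUBMIT", "error": str(exc)}
    return {"status": "ACCEPTED", "phase": "ORDER_SUBMIT", "ack": ack}
```

The order is still withheld when the leverage call fails, so behavior is unchanged; only the rejection record gains a field.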
 								### E16: `_format_quantity` and `_format_price` use `_instrument_step`/`_instrument_tick` — both may be zero
 								**bingx_direct.py:234-268**
 								```python
 								def _instrument_step(self, asset):
 								    instrument = self._resolve_instrument(asset)
 								    if instrument is not None:
 								        try: return Decimal(str(instrument.size_increment.as_decimal()))
 								        except: pass
 								    return Decimal("0.001")  # fallback
 								def _format_quantity(self, asset, quantity):
 								    step = self._instrument_step(asset)
 								    if step <= 0:
 								        return str(max(0.0, quantity))
 								    ...
 								```
 								If `_resolve_instrument` returns None (asset not in provider), `step=0.001`
 								and `tick=0.01`. These defaults are correct for most USDT perpetuals on
 								BingX VST, but may be wrong for non-standard symbols. The format functions
 								still produce a valid string — just possibly with wrong precision.
 								More concerning: `_resolve_instrument` at line 211-226 tries three lookup
 								strategies and iterates all instruments on the third. This iteration is O(n)
 								in the number of instruments and happens on EVERY `submit_intent()` call.
 								With 540 instruments, this is ~0.5ms — acceptable. But `_instrument_step`
 								and `_instrument_tick` each call `_resolve_instrument` independently, so
 								`submit_intent()` calls it three times: once for quantity, once for price,
 								and once for `_instrument_venue_symbol` at line 358. That is up to three
 								full-instrument-list iterations per order.
 								**Flaw: E16 — Instrument resolution called 3x per order with O(n) scan.**
 								**Severity: Low.** Performance, not correctness.
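Resolving the instrument once and reusing the result would remove the repeated scans. The sketch below assumes a hypothetical `InstrumentSpec` record and `InstrumentSpecCache` class wrapping whatever lookup `_resolve_instrument` performs. The fallback step `0.001` and tick `0.01` are the defaults described above; unresolved assets are deliberately not cached so a later provider load is still picked up.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class InstrumentSpec:
    venue_symbol: str
    step: Decimal  # quantity increment
    tick: Decimal  # price increment


class InstrumentSpecCache:
    """Hypothetical per-asset cache around an O(n) instrument resolver."""

    def __init__(self, resolver: Callable[[str], Optional[InstrumentSpec]]) -> None:
        self._resolver = resolver
        self._cache: Dict[str, InstrumentSpec] = {}

    def get(self, asset: str) -> InstrumentSpec:
        spec = self._cache.get(asset)
        if spec is not None:
            return spec
        resolved = self._resolver(asset)
        if resolved is None:
            # Unknown asset: return the documented defaults without caching them.
            return InstrumentSpec(asset, Decimal("0.001"), Decimal("0.01"))
        self._cache[asset] = resolved
        return resolved
```

`submit_intent()` could then call `get()` once and read symbol, step, and tick from the same object.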
 								### E17: Cancel uses truth-based confirmation — can mask real errors
 								**bingx_direct.py:474-498**
 								```python
 								still_open = True
 								try:
 								    oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
 								    ...
 								    still_open = (venue_order_id in ids) if venue_order_id else (venue_client_id in cids)
 								except Exception:
 								    still_open = None
 								if still_open is False:
 								    return {"status": "CANCELED", ...}
 								if str(delete_resp.get("status", "")).upper() in {"CANCELED", "CANCELLED", "SUCCESS", "OK"}:
 								    return {"status": "CANCELED", ...}
 								return {"status": delete_resp.get("status", "REJECTED"), ...}
 								```
 								The cancel logic:
 								1. DELETE the order on BingX
 								2. GET open orders to verify
 								3. If the order is no longer open, return CANCELED
 								4. If the DELETE response says CANCELED, return CANCELED
 								5. Otherwise return REJECTED
 								If step 2's GET fails (network error, rate limit), `still_open=None`.
 								Then step 4 checks the DELETE response. If the DELETE also returned an error
 								(e.g., "order not found" because it was already cancelled by another caller),
 								`status` is `"ERROR"` or `"not found"` — neither matches `"CANCELED"`.
 								The cancel is reported as `REJECTED` even though the order IS cancelled.
 								The `bingx_venue._events_from_cancel()` then emits `CANCEL_REJECT` instead
 								of `CANCEL_ACK`. The Rust kernel handles `CANCEL_REJECT` at lib.rs:1218:
 								```rust
 								KernelEventKind::CANCEL_REJECT => {
 								    if slot.fsm_state == TradeStage::EXIT_WORKING {
 								        slot.fsm_state = TradeStage::EXIT_WORKING;  // no-op
 								    }
 								    diagnostic_code = KernelDiagnosticCode::CANCEL_REJECTED;
 								}
 								```
 								The slot stays in its current state (e.g., `EXIT_WORKING`) with no active order
 								(the exchange has no record of it). The slot is stuck until a manual reconcile.
 								**Flaw: E17 — Cancel can return false REJECTED for already-cancelled orders.**
 								**Severity: Medium.** Leads to stuck slot requiring manual intervention.
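The false rejection comes from collapsing "verification failed" into "cancel failed". A minimal sketch of a three-way classification, assuming a hypothetical `classify_cancel` helper and a hypothetical `UNKNOWN` status that the venue layer would treat as "reconcile before emitting `CANCEL_REJECT`":

```python
from typing import Optional

CANCELED_STATUSES = {"CANCELED", "CANCELLED", "SUCCESS", "OK"}


def classify_cancel(still_open: Optional[bool], delete_status: str) -> str:
    """Map the (GET verification, DELETE status) pair to a cancel outcome.

    still_open is False/True when the open-orders GET succeeded,
    and None when that GET itself failed.
    """
    if still_open is False:
        return "CANCELED"  # truth-based: the order is gone from the book
    if delete_status.strip().upper() in CANCELED_STATUSES:
        return "CANCELED"  # the DELETE response confirms it
    if still_open is None:
        # Neither call gave a usable answer; do not claim a rejection.
        return "UNKNOWN"
    return "REJECTED"      # verified still open and DELETE not confirmed
```

Only the `still_open is None` branch differs from the logic quoted above: it reports uncertainty instead of `REJECTED`, so the kernel is not told a cancel failed when the order may already be gone.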
 								---
 								## Layer 7: Fill Feedback Loop (rust_backend.py on_venue_event)
 								### E18: `on_venue_event` settles PnL incrementally — but fees are never included
 								**rust_backend.py:530-545**
 								```python
 								incremental_pnl = slot.realized_pnl - self._last_settled_pnl.get(slot.slot_id, 0.0)
 								if abs(incremental_pnl) > 1e-12:
 								    self.account.settle(incremental_pnl)
 								    self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
 								```
 								The Rust kernel's `apply_fill` computes realized PnL as:
 								```rust
 								let realized = Self::realized_pnl(slot, event.price, fill_size);
 								slot.realized_pnl += realized;
 								```
 								No fee subtraction. No commission reading from the event. The `VenueEvent`
 								could carry fee data via `metadata["fee"]` or `raw_payload["commission"]`,
 								but the Rust kernel doesn't read it and the Python bridge doesn't extract it.
 								Over the 142 live test scenarios on VST (where fees are 0 or negligible),
 								this is invisible. On live mainnet with exchange fees of 0.02-0.04%, the
 								cumulative error is unbounded.
 								**Flaw: E18 — PnL settlement ignores fees.**
 								Already documented as A7. In the E2E trace, the gap is specifically here:
 								`VenueEvent.price` is used for `realized_pnl()` but `VenueEvent.metadata`
 								(which could carry `commission` from the venue) is never read.
 								**Severity: Medium** (grows with trade volume).
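A sketch of what a fee-aware incremental settle could look like, assuming the cumulative fee for a slot can be read from the venue event (for example from `metadata["fee"]`, which is not confirmed to be populated). `FeeAwareSettler` and `on_fill` are hypothetical names; the tracked quantity is net PnL rather than gross.

```python
from typing import Dict


class FeeAwareSettler:
    """Hypothetical variant of the incremental settle that nets out fees."""

    def __init__(self, capital: float) -> None:
        self.capital = capital
        self._last_settled_net: Dict[int, float] = {}

    def on_fill(self, slot_id: int, realized_pnl: float, cumulative_fees: float) -> float:
        """Settle the change in net PnL (gross minus fees) since the last call."""
        net = realized_pnl - cumulative_fees
        incremental = net - self._last_settled_net.get(slot_id, 0.0)
        if abs(incremental) > 1e-12:
            self.capital += incremental
            self._last_settled_net[slot_id] = net
        return incremental
```

As a worked case: at 0.04% per side on 10,000 USDT notional, each fill costs 4 USDT. Starting from 1,000 USDT, the entry fill settles -4, and an exit with 25 USDT gross PnL and 8 USDT cumulative fees settles +21, leaving 1,017 USDT. The fee-blind path would report 1,025 USDT.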
 								### E19: `observe_slots` called with ALL slots, not just changed ones
 								**rust_backend.py:538-545**
 								```python
 								slots = [self._get_slot(i) for i in range(self.max_slots)]
 								self.account.observe_slots(slots)
 								```
 								Every `on_venue_event` call re-reads ALL slots from the Rust kernel (N FFI
 								calls) and calls `observe_slots` with the full list. With `max_slots=10`,
 								this is 10 FFI round-trips per venue event. Each round-trip serializes a
 								TradeSlot to JSON, passes through C FFI, parses on the Rust side, serializes
 								the result, passes back, and parses on the Python side. For a multi-leg EXIT
 								with 3 venue events (ACK + PARTIAL + FULL), that's 3 × 10 = 30 slot reads per
 								`process_intent` call.
 								**Flaw: E19 — Full-slot-list read on every event is N×FFI overhead.**
 								**Severity: Low** (performance). Not a correctness issue.
 								---
 								## Layer 8: Persistence Boundary (pink_clickhouse.py)
 								### E20: `_capital()` reads live from `AccountProjection` — stale row risk
 								**pink_clickhouse.py:199-200**
 								```python
 								def _capital(self) -> float:
 								    return float(self.account.snapshot.capital or 0.0)
 								```
 								Every row writer calls `_capital()` at write time to get the current capital.
 								But `persist_result()` is called AFTER `kernel.process_intent()` returns —
 								at which point the account has already been settled. The `account_events`,
 								`position_state`, and `trade_events` rows all record the SAME capital value
 								(the post-settle value). `capital_before` is then reconstructed by
 								subtracting PnL (already documented as A5).
 								The effect: all ClickHouse rows for a single `process_intent()` call show
 								identical `capital` / `account_capital` / `portfolio_capital` values, because
 								they're all written within the same Python call stack with no intervening
 								events. This is correct for single-threaded operation — all rows reflect
 								POST-trade state. But it means ClickHouse querying for "capital before trade"
 								must use `capital_after - pnl`, which is the wrong formula under multi-slot.
 								**Flaw: E20 — All persistence rows write post-trade capital, not pre-trade.**
 								Already documented as A5 from the capital_before angle.
 								**Severity: High** for multi-slot accounting reconstruction.
 								### E21: `persist_fill_events()` synthesizes fake Decision/Intent
 								**pink_clickhouse.py:383-435**
 								```python
 								def persist_fill_events(self, *, snapshot, events, slot_dict, market_state):
 								    ...
 								    decision = Decision(
 								        timestamp=ts, decision_id=trade_id or "async", asset=asset,
 								        action=action, side=side, reason="ASYNC_FILL",
 								        confidence=0.0, velocity_divergence=0.0, irp_alignment=0.0,
 								        reference_price=price, target_size=cur_size, leverage=leverage,
 								        ...
 								    )
 								    intent = Intent(
 								        timestamp=ts, trade_id=trade_id, decision_id=trade_id or "async",
 								        ...
 								    )
 								```
 								The async fill pump (called by `pump_venue_events`) constructs fake
 								Decision/Intent objects because there's no real policy decision backing an
 								async fill — it just arrived from the exchange. These synthetic objects have:
 								- `decision_id = trade_id` (or `"async"` if trade_id is empty)
 								- `decision_id` and `trade_id` are the same string
 								- `confidence=0.0`, `velocity_divergence=0.0`, `irp_alignment=0.0`
 								- `target_size = cur_size` (the remaining size after the fill, not the
 								  size that was filled)
 								These are written to `policy_events`, `trade_reconstruction`, and
 								`trade_events` with the same row shapes as real policy-driven fills. Any
 								ClickHouse query that joins `policy_events` to `trade_events` on
 								`decision_id` will find matching rows (both set to `trade_id`), but the
 								policy_events row's `target_size` is the POST-fill size, not the pre-fill
 								size. A replay system that reconstructs position from `policy_events` →
 								`trade_reconstruction` would see incorrect sizing.
 								**Flaw: E21 — Async fill persistence uses synthetic decision with wrong data.**
 								**Severity: Medium.** Misleading historical records.
 								### E22: `_write_trade_exit_leg` capital_before uses arithmetic reconstruction
 								**pink_clickhouse.py:761-762**
 								```python
 								capital_after = self._capital()
 								capital_before = capital_after - pnl_leg
 								```
 								Already documented as A5. In the E2E trace, the specific path is:
 								1. Slot 0 exit leg fills → `_capital()` returns capital AFTER settlement
 								   (because the kernel's `on_venue_event` already called `account.settle`)
 								2. `capital_before = capital_after - pnl_leg` reconstructs pre-leg capital
 								If slot 1 also settled between the leg fill and the persistence write
 								(possible in multi-threaded or concurrent scenario), `capital_after` includes
 								slot 1's PnL, and `capital_before` is wrong by exactly slot 1's contribution.
 								**Severity: High** for multi-slot.
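The size of the error can be shown with a small worked example. The sketch assumes a hypothetical `SettleLedger` that records `capital_before` at settle time, and compares it with the arithmetic reconstruction when slot 1 settles between slot 0's leg fill and its persistence write.

```python
from typing import Dict, List


class SettleLedger:
    """Hypothetical ledger that captures capital_before when settling."""

    def __init__(self, capital: float) -> None:
        self.capital = capital
        self.rows: List[Dict[str, float]] = []

    def settle(self, slot_id: int, pnl: float) -> None:
        before = self.capital
        self.capital += pnl
        self.rows.append({"slot_id": slot_id, "pnl": pnl,
                          "capital_before": before, "capital_after": self.capital})


def reconstructed_before(capital_now: float, pnl_leg: float) -> float:
    """The arithmetic used at persistence time: capital_after - pnl_leg."""
    return capital_now - pnl_leg


ledger = SettleLedger(1000.0)
ledger.settle(slot_id=0, pnl=12.0)   # slot 0 exit leg fills
ledger.settle(slot_id=1, pnl=-5.0)   # slot 1 settles before slot 0's row is written

recorded = ledger.rows[0]["capital_before"]           # 1000.0
rebuilt = reconstructed_before(ledger.capital, 12.0)  # 1007.0 - 12.0 = 995.0
```

The rebuilt value is off by -5.0, which is exactly slot 1's PnL. Capturing the value at settle time avoids the dependency on write ordering.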
 								### E23: `_write_trade_event` uses `slot_dict.get("entry_price")` as exit_price
 								**pink_clickhouse.py:813-815**
 								```python
 								entry_price = _safe_float(slot_dict.get("entry_price", 0.0), ...)
 								exit_price = _safe_float(slot_dict.get("entry_price", 0.0), ...)  # ← SAME FIELD
 								```
 								Already documented as A13. The `exit_price` is set to `entry_price` from
 								the same slot dict field. The BingX ack payload does contain the fill price,
 								but it's not propagated to the slot dict's `entry_price` for exit fills —
 								the slot's `entry_price` is set during entry fill and remains unchanged
 								during exit. The exit fill price is only on the `VenueEvent`, which is not
 								passed through to `_write_trade_event`.
 								The `trade_events` row in ClickHouse always shows `exit_price == entry_price`,
 								making PnL reconstruction from `(exit_price - entry_price) × size × lev`
 								impossible. The `pnl` field IS correct (it's `slot.realized_pnl`), but only
 								the summary is accurate — the component prices are wrong.
 								**Severity: Low.** `pnl` is correct, only the decomposed price is wrong.
 								---
 								## Layer 9: Test Infrastructure
 								### E24: `MockVenueAdapter.submit()` always emits fill on `partial_fill_ratio > 0`
 								**mock_venue.py:60-90**
 								```python
 								if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
 								    fill_ratio = max(0.0, min(1.0, float(effective_ratio)))
 								    ...
 								    if is_entry:
 								        effective_ratio = self.scenario.entry_partial_fill_ratio if \
 								            self.scenario.entry_partial_fill_ratio != 1.0 else \
 								            self.scenario.partial_fill_ratio
 								    else:
 								        effective_ratio = self.scenario.exit_partial_fill_ratio ...
 								```
 								The default `MockVenueScenario()` has `partial_fill_ratio=1.0`. So every
 								`submit()` call on a default mock emits a FULL_FILL event immediately.
 								This means mock-venue tests always test the "order fills instantly" path —
 								they never test resting orders, partial fills, or async fills.
 								Any test that relies on the mock venue is testing a subset of real venue
 								behavior. The mock never produces:
 								- DELAYED fills (fill arrives on a later `reconcile()` call)
 								- PARTIAL fills with subsequent fills
 								- Partial fills during entry (entry fills partially, then more later)
 								- Mixed entry/exit partial behavior
 								**Flaw: E24 — Mock venue always fills synchronously — never tests async path.**
 								**Severity: Medium.** The `pump_venue_events()` path has never been exercised
 								with the mock venue.
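A mock that defers fills would let tests reach the async path. A minimal sketch, assuming a hypothetical `DeferredFillVenue` whose events are plain dicts (the real `MockVenueAdapter` emits `VenueEvent` objects and has a different interface):

```python
from collections import deque
from typing import Deque, Dict, List, Sequence


class DeferredFillVenue:
    """Hypothetical mock: submit() only ACKs; fills arrive on later reconcile() calls."""

    def __init__(self, fill_ratios: Sequence[float]) -> None:
        if abs(sum(fill_ratios) - 1.0) > 1e-9:
            raise ValueError("fill_ratios must sum to 1.0")
        self._ratios = tuple(fill_ratios)
        self._pending: Deque[Dict[str, object]] = deque()

    def submit(self, order_id: str, size: float) -> List[Dict[str, object]]:
        last = len(self._ratios) - 1
        for index, ratio in enumerate(self._ratios):
            kind = "FULL_FILL" if index == last else "PARTIAL_FILL"
            self._pending.append({"kind": kind, "order_id": order_id, "size": size * ratio})
        return [{"kind": "ACK", "order_id": order_id}]

    def reconcile(self) -> List[Dict[str, object]]:
        # One queued fill per call, so each fill lands in a separate policy cycle.
        return [self._pending.popleft()] if self._pending else []
```

With `DeferredFillVenue((0.4, 0.6))`, `submit()` returns only the ACK, the next `reconcile()` yields a `PARTIAL_FILL` for 40% of the size, and the one after yields the closing `FULL_FILL`.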
 								### E25: Test scenarios use MARKET-only `_si()` helper — no LIMIT tests
 								**gen_live_tests.py and _gen_test.py**
 								The `_si()` helper constructs a `KernelIntent` with `order_type="MARKET"` and
 								`limit_price=0.0` (the defaults). All 157 live test scenarios use `_si()`.
 								The 3 "LIMIT" scenarios (`limit_does_not_fill`, `limit_immediate_fill`) use
 								`reference_price=0.0` and `target_size=-0.001` respectively — they test
 								**intent validation**, not actual LIMIT order submission.
 								There is **zero** live-test coverage of:
 								- Submitting a LIMIT order that rests on the book
 								- A resting LIMIT being cancelled
 								- A resting LIMIT receiving a partial fill then a subsequent fill
 								- An async fill arriving via `pump_venue_events()`
 								The Rust kernel's `PARTIAL_FILL` event handling and the Python bridge's
 								`on_venue_event` + incremental settle + async pump have never been exercised
 								on a live exchange.
 								**Flaw: E25 — Zero live tests for LIMIT/resting/async-fill paths.**
 								**Severity: High.** The partial-fill code path is untested in production.
 								### E26: Fresh-kernel reconcile tests create second kernel but share venue
 								**gen_live_tests.py** (fresh_kernel_reconcile_entry body)
 								```python
 								fresh = _build_fresh_kernel_from_slot(slot_data, ic=cb)
 								k2 = fresh.runtime.kernel
 								```
 								The `_build_fresh_kernel_from_slot` function creates a new `PinkDirectRuntime`
 								with a new `ExecutionKernel`. But the **venue adapter** is shared or
 								re-created with the same BingX backend. Two kernels making concurrent HTTP
 								calls to BingX through shared or separate venue adapters is exactly the
 								multi-threaded scenario that triggers T1 (Rust kernel UB) — except the tests
 								are sequential, not concurrent, so they don't trigger it.
 								The fresh kernel does NOT restore the venue state (open orders, positions).
 								The fresh kernel has a blank venue adapter state — it can't know about
 								previous LIMIT orders resting on the exchange. This is correct for MARKET-only
 								tests (no resting orders) but would fail for LIMIT tests.
 								**Flaw: E26 — Fresh-kernel reconcile doesn't restore venue state.**
 								**Severity: Medium** (would break LIMIT scenarios).
 								---
 								## Summary: Critical E2E Flaw Chain
 								The most dangerous E2E scenario is a **LIMIT order with partial fills** on
 								a live exchange:
 								```
 								 1. Policy emits LIMIT ENTER                         [E3: can't happen; bridge drops order_type]
 								 2. KernelIntent with order_type="LIMIT"             [dead code path from step 1]
 								 3. bingx_direct.submit_intent builds LIMIT payload  [works if reached]
 								 4. BingX accepts LIMIT, returns ACK with no fill    [VenueEvent.price may be 0]
 								 5. FSM transitions to ENTRY_WORKING                 [correct]
 								 6. RESTING LIMIT sits on book                       [no further kernel events]
 								 7. Next policy cycle: pump_venue_events()           [E1: expensive HTTP calls]
 								 8. Reconciled venue has no fill events              [nothing to drain]
 								 9. Repeated cycles with no progress                 [wasteful but safe]
 								10. Eventually BingX fills partially                 [VenueEvent arrives]
 								11. apply_fill PARTIAL_FILL entry branch runs        [E10: entry_price = last fill, not VWAP]
 								12. on_venue_event settles incremental PnL           [E18: fees not included]
 								13. persistence writes                               [E20/E21/E22/E23: wrong capital_before, exit_price]
 								14. Remaining LIMIT still rests on book              [continues to step 7]
 								15. Eventually full fill or cancel                   [E17: cancel can return false REJECTED]
 								```
 								**None of steps 4-15 have live test coverage.**
 								---
 								## Complete Flaw Catalog (All Layers)
 								| # | Flaw | Layer | Step | Severity |
 								|---|------|-------|------|----------|
 								| E1 | Unconditional pump_venue_events wastes rate limit | Runtime | R2 | Medium |
 								| E2 | TOCTOU between capital snapshot and intent | Runtime | R3→R8 | Medium |
 								| E3 | Runtime bridge drops order_type/limit_price | Bridging | R7 | **Medium** |
 								| E4 | TOCTOU between exit sizing and execution | Runtime | R8 | Low |
 								| E5 | JSON precision drift over long runs | Bridge | R8a→R8c | Low |
 								| E6 | Global FFI singleton no guard vs use-after-free | Bridge | R8b | **High** |
 								| E7 | Same-trade-id re-entry leaves stale index entries | Rust | R8c | Low |
 								| E8 | EXIT uses initial_size not remaining size | Rust | R8c | **High** |
 								| E9 | CANCEL "accepted" before cancel actually happens | Rust | R8c | Medium |
 								| E10 | Entry price on multi-partial fill = last fill, not VWAP | Rust | R10a | Low |
 								| E11 | _legacy_intent hardcodes confidence/bars_held | Venue | R9a | Info |
 								| E12 | Zero fill price → zero PnL | Venue | R9c | Medium |
 								| E13 | Stale snapshot fallback causes wrong fill delta | Venue | R9c | Medium |
 								| E14 | Cancel event carries stale slot_id | Venue | R9c | Low |
 								| E15 | Leverage-set failure and order failure share handler | Adapter | R9b | Low |
 								| E16 | Instrument resolution 3x per order, O(n) scan | Adapter | R9b | Low |
 								| E17 | Cancel returns false REJECTED for already-cancelled | Adapter | R9b | Medium |
 								| E18 | PnL settlement ignores fees | Bridge | R10b | **Medium** |
 								| E19 | Full-slot-list read on every event = N×FFI overhead | Bridge | R10b | Low |
 								| E20 | All persistence rows write post-trade capital | Persistence | R12 | **High** |
 								| E21 | Async fill uses synthetic Decision with wrong size | Persistence | R12 | Medium |
 								| E22 | capital_before arithmetic reconstruction wrong | Persistence | R12 | **High** |
 								| E23 | trade_events exit_price = entry_price | Persistence | R12 | Low |
 								| E24 | Mock venue always fills synchronously | Test | — | Medium |
 								| E25 | Zero live tests for LIMIT/async-fill paths | Test | — | **High** |
 								| E26 | Fresh-kernel reconcile doesn't restore venue | Test | — | Medium |
 								**Total: 26 E2E flaws (5 High, 11 Medium, 9 Low, 1 Info)**
 								The High-severity flaws in the E2E trace (E20 and E22 are listed together
 								because both stem from A5):
 								- **E6**: Global FFI singleton + `__del__` use-after-free — memory corruption risk
 								- **E8**: Exit-size overshoot — slot can get stuck (A1)
 								- **E20/E22**: Post-trade capital in all persistence rows + arithmetic
 								  capital_before — ClickHouse records are misleading for accounting
 								- **E25**: No LIMIT/async-fill test coverage — partial-fill path is production
 								  code with zero live validation
 								---
 								## PASS 3 — NEW FINDINGS (Deepest E2E Trace)
 								### F1: `process_intent` CANCEL returns "accepted" before the cancel happens — caller gets wrong `outcome.state`
 								**File:** `rust_backend.py:595-614`
 								The CANCEL path:
 								1. Calls `self.venue.cancel(order)` → HTTP DELETE → returns `VenueEvent[]`
 								2. For each event, calls `self.on_venue_event(event)` → Rust FSM transition
 								3. Assembles `final_outcome` from the Rust kernel's **pre-venue-event** slot state
 								```python
 								outcome = _outcome_from_payload(result["outcome"])  # Rust CANCEL accepts (slot NOT mutated yet)
 								# ... venue.cancel() ...
 								# ... on_venue_event() for each event (now slot IS mutated) ...
 								final_slot = self._get_slot(outcome.slot_id)         # Re-reads post-mutation state
 								final_outcome = KernelOutcome(
 								    accepted=outcome.accepted,        # TRUE — from Rust's pre-event accept
 								    state=final_slot.fsm_state,       # IDLE — from post-event state
 								    diagnostic_code=outcome.diagnostic_code,  # "OK" — from Rust's pre-event accept
 								)
 								```
 								For ENTER/EXIT, the same pattern exists — the Rust kernel's `outcome` is
 								pre-venue. But for CANCEL the disconnect is worst: Rust returns `accepted=true`
 								with the slot still in `ENTRY_WORKING`, and only the subsequent
 								`on_venue_event(CANCEL_ACK)` transitions to `IDLE`.
 								**Fix:** The diagnostic code should be reconciled with the actual venue outcome,
 								not taken from the pre-venue Rust outcome.
 								**Severity: Medium**
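A sketch of the reconciliation this fix implies, assuming a hypothetical helper `reconcile_cancel_outcome` that receives the kinds of the venue events produced by `venue.cancel()`. `CANCEL_PENDING` is an invented diagnostic code for the case where no terminal venue event has arrived; it does not exist in the kernel today.

```python
from typing import Sequence, Tuple


def reconcile_cancel_outcome(
    pre_venue_code: str,
    venue_event_kinds: Sequence[str],
) -> Tuple[bool, str]:
    """Derive (accepted, diagnostic_code) from what the venue actually reported."""
    if "CANCEL_REJECT" in venue_event_kinds:
        # The venue refused; the pre-venue "accepted" must not survive.
        return False, "CANCEL_REJECTED"
    if "CANCEL_ACK" in venue_event_kinds:
        return True, pre_venue_code
    # No terminal venue event yet: the cancel was requested, not confirmed.
    return True, "CANCEL_PENDING"  # hypothetical code
```

`final_outcome` would then take `accepted` and `diagnostic_code` from this pair, so they describe the same point in time as `state`.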
 								### F2: `_last_settled_pnl` reset before `venue.submit()` — transient window
 								**File:** `rust_backend.py:597-604`
 								```python
 								if intent.action == KernelCommandType.ENTER and outcome.accepted:
 								    self._last_settled_pnl[intent.slot_id] = 0.0   # reset HERE
 								# ... venue.submit() called below ...
 								```
 								If `venue.submit()` fails (HTTP error, rate limit), the ENTER was accepted by
 								the Rust FSM but no venue order was placed. The slot is stuck in
 								`ORDER_REQUESTED`. If the caller retries the same ENTER, `_last_settled_pnl`
 								is 0.0 from the first attempt — correct for a new trade.
 								**Real risk:** If the previous trade on this slot had realized PnL that was
 								never settled (impossible with incremental settle, but hypothetically), resetting
 								to 0.0 loses that PnL. In practice, incremental settle makes this safe.
 								**Severity: Medium** (retry-safe, but exposes slot-stall)
 								### F3: `_first_invalid_intent_field` allows `leverage=0` and `target_size=0`
 								**File:** `rust_backend.py:295-316`
 								The guard catches NaN/Inf and negative `target_size`. Does NOT catch:
 								- `leverage=0` or negative (Rust silently falls back to 1.0)
 								- `target_size=0` (submits zero-quantity order to BingX)
 								- `reference_price=0` (mark_price ignores non-positive)
 								- `limit_price=0` with `order_type="LIMIT"` (BingX rejects price=0)
 								The zero-target-size case: a direct `process_intent(EXIT, target_size=0.0)`
 								computes `exit_size = 0`, submits MARKET order with quantity=0 to BingX,
 								which may return an error or silent no-op.
 								**Severity: Low** (runtime's `_exit_intent_from_slot` prevents for EXIT; direct
 								kernel API users can trigger it)
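A stricter guard covering the four gaps above might look like the following. This is a sketch under assumptions: `first_invalid_field` is a hypothetical standalone function, not the signature of `_first_invalid_intent_field`, and whether zero values should be rejected for every action type is a policy choice the source does not settle.

```python
import math
from typing import Optional


def first_invalid_field(
    *,
    target_size: float,
    leverage: float,
    reference_price: float,
    order_type: str = "MARKET",
    limit_price: float = 0.0,
) -> Optional[str]:
    """Return the name of the first invalid field, or None if all pass."""
    fields = {
        "target_size": target_size,
        "leverage": leverage,
        "reference_price": reference_price,
        "limit_price": limit_price,
    }
    for name, value in fields.items():
        if not math.isfinite(value):  # rejects NaN and +/-Inf
            return name
    if target_size <= 0.0:
        return "target_size"
    if leverage <= 0.0:
        return "leverage"
    if reference_price <= 0.0:
        return "reference_price"
    if order_type == "LIMIT" and limit_price <= 0.0:
        return "limit_price"
    return None
```

Rejecting at this layer means the caller sees the offending field name instead of a venue-side error or a silent fallback.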
 								### F4: `outcome.emitted_events` only contains venue events — Rust kernel's events silently dropped
 								**File:** `rust_backend.py:641-652`
 								```python
 								final_outcome = KernelOutcome(
 								    emitted_events=tuple(emitted_events),  # only from venue.submit()
 								)
 								```
 								The Rust kernel's `KernelOutcome` struct has `emitted_events` — currently always
 								empty because the Rust FSM never sets it. If a future change adds Rust-side
 								event emission, those events are silently dropped: `final_outcome` only uses
 								the Python-side list.
 								**Severity: Low** (no Rust-emitted events exist today)
 								### F5: `on_venue_event` does redundant FFI read of slot already returned by Rust
 								**File:** `rust_backend.py:698-706`
 								```python
 								def on_venue_event(self, event):
 								    result = _get_rust().on_venue_event(...)
 								    outcome = _outcome_from_payload(result["outcome"])
 								    slot_payload = result.get("slot")
 								    slot = _slot_from_payload(slot_payload) if slot_payload else self._get_slot(...)
 								    # ...
 								    current = self._get_slot(slot.slot_id)  # REDUNDANT — slot already has this data!
 								    self.projection.write_slot(current)
 								```
 								Line 706 re-reads `current` from the backend even though `slot` (from the
 								Rust result) already has the exact same data. Each redundant FFI read is
 								JSON serialize → C FFI → Rust serialize → C FFI → Python parse — ~100μs.
 								With 2-3 events per process_intent and 10 slots, ~3ms wasted per cycle.
 								**Severity: Low** (performance)
 								### F6: `_record_transitions` in `process_intent` records pre-venue transitions with `event=None`
 								**File:** `rust_backend.py:708, 650`
 								```python
 								# process_intent line 650:
 								self._record_transitions(outcome.transitions, final_slot, None)  # event=None
 								# on_venue_event line 708:
 								self._record_transitions(outcome.transitions, slot, event)  # event attached
 								```
 								Venue-event transitions ARE recorded individually inside each
 								`on_venue_event` call (line 708). The journal has all transitions. But the
 								pre-venue transitions (from Rust FSM before venue call) have `event=None`
 								attached — no event context for the journal reader.
 								**Severity: Informational** (diagnostic inconvenience only)
 								### F7: `reconcile_from_slots` writes ALL slots to projection/zinc, not just reconciled ones
 								**File:** `rust_backend.py:718-733`
 								```python
 								for current in slots:          # iterates ALL max_slots
 								    self.projection.write_slot(current)   # writes unchanged slots too
 								    self.zinc_plane.write_slot(current)
 								```
 								After reconcile, ALL slots are written to projection and Zinc, even if the
 								reconcile only modified one slot. Slots 1-9 are serialized and written with
 								their unchanged state. Wasteful but harmless.
 								Also: Rust kernel's `reconcile_slots_json` silently ignores `slot_id` out of
 								range — no error returned. Caller sees `accepted=true` even if no slots were
 								reconciled.
 								**Severity: Low**
 								### F8: `HazelcastRowWriter.put()` is synchronous with no error handling — Hazelcast failure crashes the intent
 								**File:** `hazelcast_projection.py:30-48`
 								```python
 								class HazelcastRowWriter:
 								    def __call__(self, name, row):
 								        if name.endswith("trade_events"):
 								            self.client.get_topic(name).publish(json.dumps(row, ...))
 								            return
 								        self.client.get_map(name).put(key, json_safe(row))  # synchronous, no try/except
 								```
 								No try/except. Hazelcast `put()` is synchronous — blocks until the cluster
 								acknowledges. If Hazelcast is down, under load, or partitioned, this:
1. Blocks the calling thread (which holds the Rust kernel handle, so no other
   operation can proceed)
2. Raises an exception that propagates through `_set_slot()` → `process_intent()`
   and crashes the entire intent
 								**Severity: Medium** (Hazelcast failure in hot path stalls execution)
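One way to keep a projection failure from aborting the intent is to wrap the writer so that exceptions are counted and logged instead of propagated. The sketch below is only an illustration: `GuardedRowWriter` and its `failures` counter are hypothetical names, not part of `hazelcast_projection.py`, and the wrapper does nothing about the blocking problem (a slow `put()` still stalls the caller).

```python
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("projection")  # hypothetical logger name


class GuardedRowWriter:
    """Wraps a row-writer callable so a projection failure cannot abort the caller."""

    def __init__(self, inner: Callable[[str, Dict[str, Any]], None]) -> None:
        self._inner = inner
        self.failures = 0  # exposed so a health check can alert on it

    def __call__(self, name: str, row: Dict[str, Any]) -> bool:
        try:
            self._inner(name, row)
            return True
        except Exception:
            # Record the failure and keep the intent path alive.
            self.failures += 1
            log.exception("projection write failed for %s", name)
            return False
```

Whether a dropped projection row is acceptable depends on who consumes it; the return value lets the caller decide whether to retry or reconcile later.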
 								### F9: `RealZincPlane.write_slot()` serializes ALL slots, not just the changed one
**File:** `real_zinc_plane.py:205-212`
 								```python
 								def write_slot(self, slot):
 								    with self._lock:
 								        self._slot_cache[int(slot.slot_id)] = slot
 								        payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
 								        self._write_region(self.state_region, self._state_seq, payload)
 								```
 								Every single-slot write serializes ALL `slot_count` slots (default 10) to JSON.
 								With VenueOrder metadata, each slot payload can be ~1-5KB → 10-50KB per write.
 								This is written to Zinc shared memory on every `process_intent()` and
 								`on_venue_event()` call.
 								`InMemoryZincPlane` does NOT have this problem — it only stores the one slot.
 								**Severity: Low** (performance + Zinc shared-memory capacity waste)
 								### F10: `RealZincPlane.write_slot` zeros buffer before write — concurrent read sees empty data
**File:** `real_zinc_plane.py:255-263`
 								```python
 								def _write_region(self, region, seq, payload):
 								    buf = region.as_buffer()
 								    view = memoryview(buf)
 								    view[:] = b"\x00" * len(view)     # Zeros the buffer
 								    view[: len(packet)] = packet       # Writes packet
 								    region.notify()
 								```
 								Between the zero and the write, any concurrent reader sees zeros or a truncated
 								packet. `_decode_packet` checks `size <= len(buf) - 16` — a partially-written
 								packet fails validation and returns `{}`. The reader (e.g., another thread
 								calling `read_slots()`) gets an empty result.
 								Window is microseconds but it exists. No version guard — reader always returns
 								whatever is in the region.
 								**Severity: Low** (brief window, no corruption — just empty results)
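A version guard of the kind this finding says is missing could look like a seqlock: the writer marks the region "in progress" with an odd sequence number, and the reader retries until it sees the same even number before and after copying. The sketch below assumes a made-up header layout (`<QI`: sequence, payload length) rather than Zinc's real packet format, and pure Python gives no memory-ordering guarantees across processes, so treat it as an illustration of the protocol only.

```python
import struct
from typing import Optional

HEADER = struct.Struct("<QI")  # assumed layout: sequence counter, payload length


def write_region(buf: bytearray, payload: bytes) -> None:
    """Write payload without zeroing; an odd sequence marks a write in progress."""
    if len(payload) > len(buf) - HEADER.size:
        raise ValueError("payload larger than region")
    seq, _ = HEADER.unpack_from(buf, 0)
    writing = seq | 1  # force odd, even after a crashed writer
    HEADER.pack_into(buf, 0, writing, 0)
    buf[HEADER.size:HEADER.size + len(payload)] = payload
    HEADER.pack_into(buf, 0, writing + 1, len(payload))  # even: stable


def read_region(buf: bytearray, retries: int = 100) -> Optional[bytes]:
    """Return a consistent payload copy, or None if no stable read was possible."""
    for _ in range(retries):
        seq_before, size = HEADER.unpack_from(buf, 0)
        if seq_before % 2:
            continue  # writer is mid-update
        data = bytes(buf[HEADER.size:HEADER.size + size])
        seq_after, _ = HEADER.unpack_from(buf, 0)
        if seq_before == seq_after:
            return data
    return None
```

A reader that gets `None` knows the read failed, which is distinguishable from a genuinely empty region.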
 								### F11: `RealZincPlane._write_region` has no partial-write recovery
**File:** `real_zinc_plane.py:255-263`
 								If `_encode_packet` raises (JSON serialization error), the method raises before
 								writing — region retains previous content. Safe.
 								If `view[:] = b"\x00"` fails (memory error), the region is partially zeroed.
 								Not recoverable. No fallback.
 								**Severity: Low** (memory errors are extremely rare)
 								### F12: `InMemoryZincPlane` intent_region grows without bound
**File:** `zinc_plane.py:83-85`
 								```python
 								def publish_intent(self, intent):
 								    self.intent_region.append(intent)   # unbounded growth
 								```
 								`self.intent_region` is `List[KernelIntent]` — grows on every `publish_intent`
 								call. Over thousands of policy cycles, this grows without bound.
 								`RealZincPlane.publish_intent()` limits to last 512 entries in shared memory,
 								but its `self._intent_cache` (in-memory) also grows without bound.
 								**Severity: Low** (memory leak — ~MB/day)
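A bounded buffer is a simple way to cap this growth. The sketch below uses `collections.deque` with `maxlen`; `BoundedIntentLog` is a hypothetical stand-in for the intent region, and the 512 limit is borrowed from the shared-memory limit mentioned above rather than from any in-memory setting.

```python
from collections import deque
from typing import Any, Deque, List


class BoundedIntentLog:
    """Keeps only the most recent `maxlen` intents; older ones are discarded."""

    def __init__(self, maxlen: int = 512) -> None:
        self._items: Deque[Any] = deque(maxlen=maxlen)

    def publish(self, intent: Any) -> None:
        # deque drops the oldest entry automatically once maxlen is reached.
        self._items.append(intent)

    def snapshot(self) -> List[Any]:
        return list(self._items)
```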
### F13: `InMemoryZincPlane` re-entrancy depends on the lock behind `threading.Condition`
**File:** `zinc_plane.py:41-43`
```python
_signal: threading.Condition = field(default_factory=threading.Condition)
```
A default-constructed `threading.Condition()` wraps an `RLock`, so the same
thread can re-acquire it. The deadlock risk applies only if the condition is
ever built around a plain `threading.Lock`: in that case, any code path that
calls back into `publish_intent` while holding the condition's lock deadlocks.
**Severity: Low** (no current code path triggers this, but it's a landmine)
 								### F14: `KernelSlotView.__setattr__` round-trips unknown fields through Rust — silently dropped
**File:** `rust_backend.py:370-395`
 								If a new field is added to Python's `TradeSlot` that Rust's `TradeSlot` doesn't
 								know about, `slot.to_dict()` includes it. `_set_slot` serializes to JSON, sends
 								to Rust, which deserializes with `#[serde(default)]` — unknown fields are
 								silently dropped. The round-trip loses data without warning.
 								The reverse: if Rust adds a field that Python doesn't know about,
 								`_slot_from_payload` ignores unknown keys. Also silently dropped.
 								**Severity: Low** (fields must be added to both sides atomically; no guard)
 								### F15: `on_venue_event` loop in `process_intent` stops on first exception — slot left in partial state
**File:** `rust_backend.py:599-610`
 								```python
 								for event in emitted_events:
 								    evt_outcome = self.on_venue_event(event)  # NO TRY/EXCEPT
 								```
 								If `self.on_venue_event(event)` raises (FFI error, null pointer, OOM), the loop
 								stops. Events after the failing event are never processed. The slot is in a
 								partial state — some events applied, some not.
 								**Concrete scenario:** ACK arrives first → applied. FULL_FILL arrives second
 								→ FFI error, exception raised. Slot is stuck in `ENTRY_WORKING` with `size=0`.
 								Next `process_intent(EXIT)` returns `NO_OPEN_POSITION`. **No recovery path exists.**
 								**Severity: High** — single exception during fill feedback leaves slot
 								unrecoverable. Zero defense in depth.
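A minimal sketch of a guarded feedback loop follows. It still stops at the first failing event, because fill events are order-dependent, but it returns the failed event, the error, and the unapplied tail so the caller can retry or force a reconcile. `FeedbackResult` and `apply_venue_events` are hypothetical names; `apply_one` stands in for `self.on_venue_event`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class FeedbackResult:
    applied: List[Any] = field(default_factory=list)   # outcomes of applied events
    failed_event: Optional[Any] = None                  # event that raised, if any
    error: Optional[BaseException] = None
    pending: List[Any] = field(default_factory=list)   # events never attempted

    @property
    def complete(self) -> bool:
        return self.error is None


def apply_venue_events(events: Sequence[Any],
                       apply_one: Callable[[Any], Any]) -> FeedbackResult:
    """Apply events in order; on failure, report what was and was not applied."""
    result = FeedbackResult()
    for index, event in enumerate(events):
        try:
            result.applied.append(apply_one(event))
        except Exception as exc:
            result.failed_event = event
            result.error = exc
            result.pending = list(events[index + 1:])
            break
    return result
```

With this shape, the ACK-then-failed-FULL_FILL scenario above yields `complete == False` with the fill in `failed_event`, which gives a recovery path something to act on.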
 								### F16: `venue.submit()` returning empty events leaves slot in `ORDER_REQUESTED`
**File:** `rust_backend.py:599-610`
 								If `venue.submit()` returns `[]` (venue rejected order with no response, or
 								internal error), the `for` loop doesn't run. No `on_venue_event` is called.
 								Slot stays in Rust's pre-venue state (`ORDER_REQUESTED`).
 								`final_outcome` has `accepted=true, state=ORDER_REQUESTED, emitted_events=[]`.
 								Caller sees "successful" but no exchange order exists. Slot stuck in
 								`ORDER_REQUESTED` until `pump_venue_events()` or manual reconcile.
 								**Severity: Medium** — silent slot stall with no error indication.
 								### F17: Cancel truth-based confirmation returns `REJECTED` for already-cancelled orders on GET failure
**File:** `bingx_direct.py:474-498`
 								```python
 								try:
 								    oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
 								    still_open = (venue_order_id in ids)
 								except Exception:
 								    still_open = None  # GET failed
 								if still_open is False:
 								    return {"status": "CANCELED", ...}
 								# still_open is None (GET failed) or True (order still on book)
 								# Falls through to DELETE response check
 								```
 								If the DELETE succeeded but the verification GET failed (network blip, rate limit
 								on the verification endpoint), `still_open=None`. The code then checks the DELETE
 								response. If the DELETE returned an ambiguous error (e.g., "order not found"
 								because it was already cancelled by another path), the status is "ERROR" —
 								reported as REJECTED even though the order IS cancelled.
 								The `bingx_venue._events_from_cancel()` emits `CANCEL_REJECT`. The Rust FSM
 								handles `CANCEL_REJECT` as a no-op — slot stays in `EXIT_WORKING` with no
 								active order. Stuck until `pump_venue_events()` or manual reconcile.
**Severity: Medium** (needs a three-way result: "definitely cancelled,"
"probably cancelled," and "definitely not cancelled")
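The three-way result could be expressed as an enum plus a small classifier over the two signals the adapter already has: the verification GET (`still_open`) and the DELETE response. This is a sketch under assumptions: `CancelTruth`, `classify_cancel`, and the `delete_ok` / `order_not_found` flags are hypothetical, and mapping the "nothing confirmed" case to "not cancelled" is a conservative choice, not something the source prescribes.

```python
from enum import Enum
from typing import Optional


class CancelTruth(Enum):
    CANCELLED = "definitely cancelled"
    PROBABLY_CANCELLED = "probably cancelled"
    NOT_CANCELLED = "definitely not cancelled"


def classify_cancel(still_open: Optional[bool],
                    delete_ok: bool,
                    order_not_found: bool) -> CancelTruth:
    """still_open: result of the open-orders check, or None if that GET failed."""
    if still_open is False:
        return CancelTruth.CANCELLED       # venue confirms the order is gone
    if still_open is True:
        return CancelTruth.NOT_CANCELLED   # venue confirms it is still on the book
    # Verification failed: fall back to what the DELETE response said.
    if delete_ok or order_not_found:
        return CancelTruth.PROBABLY_CANCELLED
    # Nothing confirms the cancel; conservatively treat the order as live.
    return CancelTruth.NOT_CANCELLED
```

`PROBABLY_CANCELLED` would then map to a re-check rather than to `CANCEL_REJECT`.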
 								### F18: Leverage-set and order-submit failures share error handler — poor diagnostics
**File:** `bingx_direct.py:376-417`
 								```python
 								await self._client.signed_post("/openApi/swap/v2/trade/leverage", ...)  # step A
 								# ...
 								ack_payload = await self._client.signed_post("/openApi/swap/v2/trade/order", payload)  # step B
 								```
 								If step A fails (400 for invalid symbol), the exception handler at line 417
 								catches `BingxHttpError` and returns REJECTED. No way for the caller to know
 								whether the leverage set failed or the order submission failed — both go through
 								the same handler. The error message just says "REJECTED."
 								Also: if step A succeeds and step B fails, leverage was changed on the exchange
 								but no order was placed. System state unchanged (leverage changes don't affect
 								capital), but diagnostics are poor.
 								**Severity: Low** (correct behavior, poor diagnostics)
 								### F19: `_events_from_submit` stale snapshot fallback → wrong fill detection
**File:** `bingx_venue.py:375-400`
 								`_filled_size_from_snapshots()` diffs position quantity before and after
 								submit. The "before" snapshot comes from `_backend_snapshot()` which can
 								return stale data (E13). A stale "before" against a fresh "after" produces
 								a wrong diff — could be negative, zero, or larger than reality.
 								This wrong diff propagates to `emitted_events` — the `PARTIAL_FILL` or
 								`FULL_FILL` event has wrong `filled_size`. The Rust kernel's `apply_fill`
 								uses this wrong `filled_size` to set `slot.size`. Capital settles on the
 								wrong delta.
 								**Severity: Medium** — wrong fill size propagates to kernel state and PnL.
 								### F20: `__del__` frees Rust handle at unpredictable GC time — no explicit `close()`
**File:** `rust_backend.py:558-566`
 								```python
 								def __del__(self):
 								    backend = getattr(self, "_backend", None)
 								    if backend is not None:
 								        try: _get_rust().destroy(backend)
 								        except: pass
 								```
`ExecutionKernel` has no `close()` method. The Rust `KernelHandle` is only
freed by `__del__`, which CPython runs at an unpredictable time, in whichever
thread drops the last reference or triggers a collection, and which is not
guaranteed to run at interpreter exit. If any code holds a stale copy of
`self._backend`, the pointer dangles once the kernel is collected.
 								`DITAv2LauncherBundle.close()` calls `_maybe_close` on venue, zinc, and
 								control plane — but NOT on kernel (which has no `close()` or `disconnect()`).
 								The kernel is leaked until GC.
 								**Severity: Medium** — reliance on `__del__` for critical C resource cleanup.
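An explicit, idempotent `close()` with `__del__` kept only as a safety net is the usual pattern for owning a native handle. The sketch below is generic: `OwnedHandle` is a hypothetical class, and `destroy` stands in for whatever call frees the Rust handle (the excerpt above uses `_get_rust().destroy`).

```python
from typing import Any, Callable, Optional


class OwnedHandle:
    """Owns a native handle and frees it exactly once."""

    def __init__(self, handle: Any, destroy: Callable[[Any], None]) -> None:
        self._handle: Optional[Any] = handle
        self._destroy = destroy

    @property
    def closed(self) -> bool:
        return self._handle is None

    def close(self) -> None:
        # Clear the reference first so a second call (or __del__) is a no-op.
        handle, self._handle = self._handle, None
        if handle is not None:
            self._destroy(handle)

    def __enter__(self) -> "OwnedHandle":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()

    def __del__(self) -> None:
        # Safety net only; deterministic cleanup should go through close().
        try:
            self.close()
        except Exception:
            pass
```

With a `close()` in place, a bundle-level `close()` can release the kernel before the venue and Zinc plane it depends on.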
 								### F21: `DITAv2LauncherBundle.close()` closes venue before kernel is done with it
**File:** `launcher.py:90-95`
 								```python
 								def close(self):
 								    _maybe_close(self.venue)       # Closes HTTP client
 								    _maybe_close(self.zinc_plane)  # Closes Zinc regions
 								```
 								If the kernel is mid-`process_intent` in another thread (hypothetical —
 								single-threaded in practice), `venue.submit()` would fail because the HTTP
 								client is already closed. No ordering enforcement.
 								**Severity: Low** (single-threaded deployment)
 								### F22: Silent fallback from real Zinc/Hazelcast to in-memory on error — operator unaware
 								**File:** `control.py:210-217`, `launcher.py:175-185`, `projection.py:30-40`
 								```python
 								def build_control_plane(...):
 								    if real_requested:
 								        try:
 								            return RealZincControlPlane(...)
 								        except Exception:
 								            pass  # SILENT — operator never knows
 								    return ZincControlPlane(snapshot=snapshot)
 								```
 								Three places have this pattern. An operator who configures `DITA_V2_ZINC=REAL`
 								and Zinc isn't available gets in-memory storage without any warning, error, or
 								log. The `ZincPlane` protocol has no introspection method to check if it's
 								real or in-memory.
 								The same applies to Hazelcast projection and the venue adapter.
 								**Severity: Medium** — configuration errors are silently masked.
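A fallback that logs and reports which backend was actually built removes the silent masking. The sketch below is a generic helper, not the project's `build_control_plane`: `build_with_fallback`, the `"REAL"` / `"IN_MEMORY"` labels, and the `strict` flag are assumptions introduced for illustration.

```python
import logging
from typing import Any, Callable, Tuple

log = logging.getLogger("dita.launcher")  # hypothetical logger name


def build_with_fallback(real_factory: Callable[[], Any],
                        fallback_factory: Callable[[], Any],
                        *,
                        real_requested: bool,
                        strict: bool = False) -> Tuple[Any, str]:
    """Return (backend, mode) so callers can see which implementation is live."""
    if real_requested:
        try:
            return real_factory(), "REAL"
        except Exception:
            if strict:
                raise  # fail fast when the operator explicitly asked for REAL
            log.warning("real backend unavailable, falling back to in-memory",
                        exc_info=True)
    return fallback_factory(), "IN_MEMORY"
```

Returning the mode alongside the object gives the operator the introspection hook the protocol currently lacks.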
 								### F23: `VenueEvent.size` = `intent.target_size` not actual fill — wrong for multi-leg EXIT
**File:** `bingx_venue.py:410-420`
 								```python
 								base_event = VenueEvent(
 								    size=float(intent.target_size or 0.0),  # target, not fill
 								)
 								```
 								For an EXIT leg, `intent.target_size` is the intended exit size. The ACK
 								event's `size` reflects the target, not the actual fill. For fully-filled
 								MARKET orders, `target == fill` so it's invisible. For partially-filled
 								LIMIT orders, `size` on the ACK is wrong.
 								The fill event later has `filled_size` from the venue's `executedQty`, so
 								the downstream kernel uses the correct fill size. The ACK's `size` is
 								unused by the kernel (the kernel uses `filled_size` for PnL computation).
 								**Severity: Informational** (unused by kernel)
 								### F24: `asyncio.run()` inside async function in test generator — nested event loops
 								**File:** `_build_pink_extended.py:75-81`
 								```python
 								def _check_open_orders(c, vs):
 								    r = __import__('asyncio').run(c._request_json("GET", ...))
 								```
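`asyncio.run()` is called from a helper that runs inside an `async def` context
(the test body is async). `asyncio.run()` always creates a new event loop, and
it raises `RuntimeError` if another loop is already running in the same thread,
so this helper only succeeds when it executes outside the running pytest-asyncio
loop (for example, in a worker thread). Awaiting the coroutine directly avoids
the nested loop.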
 								**Severity: Low** (works in practice)
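The difference between awaiting a coroutine and calling `asyncio.run()` from inside a running loop can be shown in a few lines. In this sketch `fetch_open_orders` is a hypothetical stand-in for the HTTP request, and `body` plays the role of the async test body.

```python
import asyncio
from typing import List, Tuple


async def fetch_open_orders() -> List[dict]:
    await asyncio.sleep(0)  # stand-in for the real HTTP round trip
    return []


async def nested_run_fails() -> bool:
    """Returns True if asyncio.run() refuses to start inside a running loop."""
    coro = fetch_open_orders()
    try:
        asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning
        return True
    return False


async def body() -> Tuple[List[dict], bool]:
    orders = await fetch_open_orders()   # preferred: await on the current loop
    return orders, await nested_run_fails()


result = asyncio.run(body())  # the only asyncio.run() call sits at the top level
```

Making the helper `async` and awaiting it from the test body keeps everything on one loop.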
 								### F25: `_build_fresh_kernel_from_slot` leaks old kernel objects per call
**File:** `_build_pink_extended.py:95-108`
 								```python
 								def _build_fresh_kernel_from_slot(slot_data, ic=25000.0):
 								    cfg = _build_config(ic)
 								    b = build_launcher_bundle(venue_mode="BINGX", ...)  # NEW bundle, OLD not closed
 								    k = b.kernel
 								    return RB(runtime=Shim(k), config=cfg)
 								```
 								Each call creates a new launcher bundle (new kernel, new Rust handle, new HTTP
 								client, new Zinc plane) without closing the old one. Called 4 times across the
 								fresh-kernel test bodies. Leaks ~50MB per call (Rust lib, HTTP connections).
 								**Severity: Low** (test infrastructure only)
 								### F26: `seen_event_ids` not cleared on re-entry — event IDs accumulate across trades
 								**File:** `lib.rs:672-683`
 								When a slot re-enters (new ENTER after previous EXIT), the Rust kernel resets
 								most fields (lib.rs:740-765) but does NOT clear `seen_event_ids`. The new
 								trade inherits the previous trade's event history up to `MAX_SEEN_EVENT_IDS`
 								(256). After 256 events across multiple trades, old IDs are drained.
For MARKET trading (2-4 events per trade), this takes roughly 64-128 trades
before draining. For LIMIT trading (many partial fills), it could be 5-10 trades.
 								**Fix:** `slot.seen_event_ids.clear()` on ENTER.
 								**Severity: Low** (event ID collision across trades is astronomically unlikely)
 								### F27: `RealZincControlPlane.read()` parses Zinc region every call — no caching
**File:** `real_control_plane.py:88-94`
 								```python
 								def read(self):
 								    payload = _decode_packet(self.region.as_buffer())  # JSON parse every call
 								    control = payload.get("control")
 								    self._snapshot = KernelControlSnapshot(**control)   # reconstruct every call
 								    return self._snapshot
 								```
 								Called by `ExecutionKernel.control` property on every `process_intent()`.
 								Each call re-constructs a `KernelControlSnapshot` from dict — allocating
 								new objects for every field. ~50μs per call. A simple cached-until-modified
 								pattern would eliminate all parses between writes.
 								**Severity: Low** (performance)
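A cached-until-modified reader can be sketched as follows, assuming the region exposes a cheap-to-read sequence number that changes on every write. That counter is an assumption here, as are the names `SequenceCachedReader`, `read_seq`, and `decode`; the real control region may need a different change signal.

```python
from typing import Any, Callable, Optional


class SequenceCachedReader:
    """Re-decodes the payload only when the region's sequence number changes."""

    def __init__(self, read_seq: Callable[[], int],
                 decode: Callable[[], Any]) -> None:
        self._read_seq = read_seq
        self._decode = decode
        self._cached_seq: Optional[int] = None
        self._value: Any = None

    def read(self) -> Any:
        seq = self._read_seq()
        if seq != self._cached_seq:
            # Payload changed (or first read): parse once and remember the seq.
            self._value = self._decode()
            self._cached_seq = seq
        return self._value
```

If a write lands between reading the sequence and decoding, the reader returns the newer payload and simply re-decodes on the next call, so the cache never serves data older than the sequence it recorded.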
 								### F28: `_legacy_intent` hardcodes `confidence=1.0` and `bars_held=0`
 								**File:** `bingx_venue.py:270-285`
 								These fields are in `LegacyIntent` but unused by `submit_intent()` (which
 								only reads `asset`, `side`, `action`, `target_size`, `leverage`, `metadata`).
 								The downstream ClickHouse rows use the policy-layer `Intent`, not `LegacyIntent`,
 								so the hardcoded values don't reach persistence.
 								Only propagates through the venue adapter's internal chain. No consumer reads
 								them today.
 								**Severity: Informational**
 								### F29: `_slot_to_payload` in `real_zinc_plane.py` is dead code
**File:** `real_zinc_plane.py:57-59`
 								```python
 								def _slot_to_payload(slot):
 								    data = slot.to_dict()
 								    return data
 								```
 								Defined, never called anywhere in the file. All slot serialization calls
 								`slot.to_dict()` directly.
 								**Severity: Informational**
 								### F30: Duplicate `_slot_from_payload` in `real_zinc_plane.py` and `rust_backend.py`
**File:** `real_zinc_plane.py:62-112`, `rust_backend.py:270-310`
 								Two nearly identical implementations. The `real_zinc_plane` version manually
 								constructs `VenueOrder` objects (lines 63-88) with different defaults
 								(e.g., fallback to slot `size` if `intended_size` missing). The `rust_backend`
 								version delegates to `_order_from_payload` with all-default fallbacks.
 								If fields are added to `TradeSlot` or `VenueOrder`, both must be updated.
 								**Severity: Low** (code duplication risk)
 								---
 								## Complete Flaw Catalog
 								### All-Passes Combined
 								| Family | Focus | Count | Critical | High | Medium | Low | Info |
 								|--------|-------|-------|----------|------|--------|-----|------|
 								| A | Architectural (old 13, now superseded) | 15 | 0 | 2 | 0 | 2 | 11 |
 								| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
 								| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
 								| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
 								| **Total** | | **80** | **1** | **10** | **21** | **32** | **16** |
 								### Most Dangerous Single Flaw: F15
 								An exception in `on_venue_event()` during the fill-feedback loop stops the
 								chain mid-apply. The ACK applied but the FILL didn't. Slot in `ENTRY_WORKING`
with no position. **No retry mechanism, no recovery path.** The slot stays stuck
until manual intervention. There is zero defense in depth: no try/except, no
undo, no validation that the slot reached a consistent state.
 								This is the single highest-impact E2E flaw because it requires no concurrency,
 								no race condition, no unusual market conditions — just a transient FFI error
 								during normal operation.
 								---
 								## PASS 4 — SYSTEMATIC DOMAIN SCANS (Config, Rust, Persistence, Lifecycle)
 								### Rust Kernel — Numeric & FSM Invariants
 								#### G1: EXIT_RESIDUAL action is entirely missing from Rust KernelCommandType
 								**File:** `_rust_kernel/src/lib.rs`
 								```rust
 								string_enum! {
 								    enum KernelCommandType {
 								        ENTER, EXIT, MARK_PRICE, RECONCILE, CONTROL, CANCEL,
 								    }
 								}
 								```
 								Six variants. **No `EXIT_RESIDUAL`.** If any caller submits an intent with `action = "EXIT_RESIDUAL"`, the string_enum deserializer fails — serde returns `INVALID_INTENT_PARSE`. Even if deserialization worked, there's no branch to handle residual-position cleanup. Any position with remaining size after partial exit legs has **no way to trigger a clean-up exit** via the intent system.
 								The Python `KernelCommandType` enum (contracts.py) does have `EXIT_RESIDUAL`, translated to `"EXIT_RESIDUAL"` string by `_intent_to_payload`. This string hits Rust's string_enum → parse error → `INVALID_INTENT_PARSE`.
 								**Fix:** Add `EXIT_RESIDUAL` variant to Rust enum + match arm that skips the `NO_OPEN_POSITION` guard for residual-sized positions.
 								**Severity: Critical**
 								#### G2: `into_c_string` uses `unwrap()` — panics on interior NUL byte
 								**File:** `_rust_kernel/src/lib.rs:1477`
 								```rust
 								fn into_c_string(value: &str) -> *mut c_char {
 								    CString::new(value).unwrap().into_raw()
 								}
 								```
`CString::new()` returns `Err` if the string contains a NUL (`'\0'`) byte, and `.unwrap()` then panics at the C FFI boundary. `serde_json::to_string()` escapes control characters inside string values (a NUL is emitted as `\u0000`), so the exposure is any string that reaches `into_c_string` without going through JSON serialization; if one contains a NUL byte, this **panics the entire process**.
 								Triggered by every FFI call that returns a string:
 								- `dita_kernel_process_intent_json`
 								- `dita_kernel_on_venue_event_json`
 								- `dita_kernel_reconcile_slots_json`
 								- `dita_kernel_snapshot_json`
 								- `dita_kernel_get_slot_json`
**Fix:** Replace `.unwrap().into_raw()` with `.map(CString::into_raw).unwrap_or_else(|_| ptr::null_mut())` so the error maps to a null pointer, or feed through `invalid_intent_cstring`.
 								**Severity: Critical**
 								#### G3: `process_intent` EXIT hardcodes `prev_state = POSITION_OPEN` unconditionally
 								**File:** `_rust_kernel/src/lib.rs:842-890`
 								```rust
 								slot.fsm_state = TradeStage::EXIT_REQUESTED;        // unconditional override
 								let transition = self.transition(
 								    &slot,
 								    TradeStage::POSITION_OPEN,                        // always POSITION_OPEN
 								    slot.fsm_state.clone(),
 								    "EXIT_INTENT",
 								);
 								```
 								Three problems:
 								(a) **Transition prev_state is a lie.** If the slot was in `EXIT_WORKING`, `EXIT_SENT`, `EXIT_REQUESTED`, or `POSITION_PARTIALLY_CLOSED`, the transition record says `POSITION_OPEN` — wrong.
 								(b) **Backward transition.** If the slot is `EXIT_WORKING` and a new EXIT intent arrives, `fsm_state` is set to `EXIT_REQUESTED` — a backward transition from `EXIT_WORKING` → `EXIT_REQUESTED`. This corrupts the FSM.
 								(c) **No state guard.** EXIT should only be allowed from `POSITION_OPEN`, `EXIT_WORKING` (for additional legs), or `POSITION_PARTIALLY_CLOSED`. Currently any state that passes `!is_free() && !closed && size > 0` can transition to `EXIT_REQUESTED`.
 								**Fix:** Check actual FSM state before allowing EXIT, log actual prev_state, guard against backward transitions.
 								**Severity: Critical**
 								#### G4: `consume_exit_leg` advances beyond last valid index — stale `all_legs_done` variable
 								**File:** `_rust_kernel/src/lib.rs:1420-1435`
 								```rust
 								let all_legs_done = slot.active_leg_index >= slot.exit_leg_ratios.len(); // (A)
 								let should_close = (slot.size <= 1e-12 || (!partial && all_legs_done));  // (B)
 								if !partial {
 								    slot.consume_exit_leg();  // (C) — advances active_leg_index POST (A)
 								}
 								if should_close && slot.size <= 1e-12 {         // (D) — close
 								} else if !partial && !all_legs_done {           // (E) — stale! uses (A) not post-advance index
 								```
 								On the last leg (`active_leg_index = len - 1`):
 								- (A): `all_legs_done = false` (pre-advance)
 								- (C): advances to `len` (exhausted)
 								- (E): `!partial && !false` = true → enters `POSITION_OPEN` instead of examining `should_close` with post-advance index
 								The `all_legs_done` variable is captured **before** `consume_exit_leg` advances the index. Branch (E) should use the post-advance index to correctly detect exhaustion.
 								After exhaustion, `next_exit_ratio()` returns `1.0` (out-of-bounds `unwrap_or(1.0)`) — silently tries to exit remaining size as 100% instead of detecting completion.
 								**Severity: Critical**
 								#### G5: `realized_pnl` uses unbounded f64 — overflows to inf at extreme values
 								**File:** `_rust_kernel/src/lib.rs:648-656`
 								```rust
 								let notional = exit_size * slot.entry_price * slot.leverage.max(1.0);
 								delta * notional
 								```
 								No `is_finite()` check on intermediate products. At `exit_price=1e200`, `entry_price=1e-200`: `delta` = `(1e200 - 1e-200) / 1e-200` ≈ `1e400` → `inf`. The resulting `inf` is stored in `slot.realized_pnl`, corrupting all future PnL tracking.
Subnormals: with `entry_price=5e-324` (the smallest subnormal), dividing any `exit_price` above roughly `1e-15` overflows to `inf`.
 								**Fix:** Add `is_finite()` guards on both prices and cap intermediate products.
 								**Severity: High**
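The overflow and a possible guard can be modelled in Python, whose `float` is the same IEEE 754 double as Rust's `f64`. This is a model of the arithmetic quoted above, not the kernel's code: the function name, the long-side-only formula, and the choice to return `None` on bad input are assumptions.

```python
import math
from typing import Optional


def guarded_realized_pnl(entry_price: float, exit_price: float,
                         exit_size: float, leverage: float) -> Optional[float]:
    """Long-side PnL with finiteness checks on inputs and on the result."""
    inputs = (entry_price, exit_price, exit_size, leverage)
    if not all(math.isfinite(x) for x in inputs) or entry_price <= 0.0:
        return None  # reject instead of storing NaN/inf
    delta = (exit_price - entry_price) / entry_price
    notional = exit_size * entry_price * max(leverage, 1.0)
    pnl = delta * notional
    # Finite inputs can still overflow in the intermediate products.
    return pnl if math.isfinite(pnl) else None
```

In the Rust kernel the equivalent would be `is_finite()` checks before the value is written to `slot.realized_pnl`.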
 								#### G6: `mark_price` produces unbounded `unrealized_pnl`
 								**File:** `_rust_kernel/src/lib.rs:384-399`
 								```rust
 								self.unrealized_pnl = delta * self.size * self.entry_price * self.leverage;
 								// No is_finite() check on result
 								```
If any of `delta`, `size`, `entry_price`, or `leverage` is extreme, the product overflows to `inf`. There is no guard on the result, so `inf` stays in `unrealized_pnl` until the next successful mark. The only protection is the `price <= 0.0` guard on input; nothing guards the computation chain.
Also: line 388 assigns `self.entry_price = price` whenever `mark_price` runs on a position whose `entry_price <= 0.0`. Backfilling a zero entry price with the current market price on the first mark after open is correct, but the same guard means that if a slot is reused (re-entry without resetting `entry_price`), the prior trade's nonzero entry price is kept and bleeds into unrealized PnL.
 								**Severity: High**
 								#### G7: `process_intent` ENTER — no `is_finite()` guard on `target_size`
 								**File:** `_rust_kernel/src/lib.rs:806-807`
 								```rust
 								intended_size: intent.target_size.max(0.0),
 								```
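`f64::INFINITY.max(0.0)` returns `inf`, so `.max(0.0)` is not a finiteness guard. `f64::NAN.max(0.0)` returns `0.0`, because `f64::max` returns the non-NaN operand, so a NaN is silently coerced to zero rather than rejected. Note that stock `serde_json` rejects the bare `NaN` and `Infinity` tokens that Python's `json.dumps` emits by default, so whether a non-finite `target_size` can actually cross the JSON boundary depends on how the payload is parsed and should be verified. If the Python-side `_first_invalid_intent_field` guard is bypassed (F3: it allows these through) and the value does reach Rust, `inf` propagates into `intended_size` in `VenueOrder`, corrupting all fill calculations.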
 								Similarly, `reference_price` is never validated for finiteness before being stored in `VenueOrder.metadata`.
 								**Severity: High**
 								#### G8: `reconcile_slots_json` — no dedup or bounds validation
 								**File:** `_rust_kernel/src/lib.rs:1668-1675`
 								```rust
 								for slot in slots {
 								    if slot.slot_id < core.slots.len() {
 								        core.slots[slot.slot_id] = slot.clone();
 								    }
 								}
 								```
 								Two slots with the same `slot_id`: the **second overwrites the first** silently. A slot with `slot_id >= core.slots.len()`: **silently dropped** — no error, no diagnostic. Caller sees `accepted=true` even if some/all slots were not applied.
 								**Severity: High**
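Until the kernel reports these cases itself, the caller could pre-validate the reconcile payload. The sketch below checks the two conditions described above on a list of slot dicts; the function name and the error labels are hypothetical, not existing diagnostic codes.

```python
from typing import Any, Dict, List, Sequence, Tuple


def validate_reconcile_slots(slots: Sequence[Dict[str, Any]],
                             max_slots: int) -> List[Tuple[int, str]]:
    """Return (position, reason) for every slot the kernel would drop or overwrite."""
    errors: List[Tuple[int, str]] = []
    seen = set()
    for position, slot in enumerate(slots):
        slot_id = slot.get("slot_id")
        valid_int = isinstance(slot_id, int) and not isinstance(slot_id, bool)
        if not valid_int or not 0 <= slot_id < max_slots:
            errors.append((position, "SLOT_ID_OUT_OF_RANGE"))
        elif slot_id in seen:
            errors.append((position, "DUPLICATE_SLOT_ID"))
        else:
            seen.add(slot_id)
    return errors
```

An empty result means every slot in the payload maps to a distinct, in-range index.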
 								#### G9: `exchange_order_id` propagation uses wrong order target
 								**File:** `_rust_kernel/src/lib.rs:1110-1125`
 								```rust
 								let target = if slot.active_entry_order.is_some() {
 								    slot.active_entry_order.as_mut()
 								} else {
 								    slot.active_exit_order.as_mut()
 								};
 								```
 								If an **entry** order exists (even if fully filled) and an **exit** fill event arrives, the code updates the entry order's `venue_order_id` instead of the exit order's. The exit order's `venue_order_id` stays empty. Any subsequent `CANCEL` intent on the exit order fails because `active_exit_order.venue_order_id` is empty — the venue can't match the cancel.
 								**Fix:** Disambiguate by matching `venue_client_id`, or clear `active_entry_order` when entry is complete.
 								**Severity: High**
 								#### G10: CANCEL diagnostic code says NO_ACTIVE_EXIT_ORDER for entry cancel too
 								**File:** `_rust_kernel/src/lib.rs:966-1005`
 								```rust
 								if !has_cancellable_exit && !has_cancellable_entry {
 								    return KernelResult {
 								        diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER, // always says exit
 								        details: json!({"reason": "NO_ACTIVE_EXIT_ORDER"}),
 								    };
 								}
 								```
 								When neither exit nor entry is cancellable, the diagnostic returns `NO_ACTIVE_EXIT_ORDER` regardless of which order was the target. If the user wanted to cancel an entry order that's not in a cancellable state, the diagnostic is misleading.
 								**Fix:** Separate diagnostic codes: `NO_ACTIVE_EXIT_ORDER`, `NO_ACTIVE_ENTRY_ORDER`, `ENTRY_NOT_CANCELLABLE`.
 								**Severity: High**
 								#### G11: `apply_fill` entry-fill overwrites `active_entry_order.intended_size` with `slot.size`
**File:** `_rust_kernel/src/lib.rs:1363-1377`
 								On FULL_FILL entry, `slot.active_entry_order` is entirely replaced with a new `VenueOrder` where `intended_size = slot.size` (the fill amount) instead of the original intended size. The original intended size (which could be larger than fill size for partial fills) is lost.
 								If a duplicate fill event arrives (dedup fails due to missing event_id), the second fill would use `slot.size` as the basis for further fills — wrong values.
 								**Severity: Medium**
 								#### G12: `leverage` unbounded after `is_finite()` — no maximum cap
 								**File:** `_rust_kernel/src/lib.rs:778`
 								```rust
 								slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
 								    intent.leverage  // 1e100 accepted here
 								} else { 1.0 };
 								```
 								`leverage = 1e100` passes `is_finite()`. Feeds into `realized_pnl()` as `slot.leverage.max(1.0) = 1e100`, producing `notional = exit_size * entry_price * 1e100`. Makes `unrealized_pnl` arbitrarily large.
 								No maximum leverage cap enforced anywhere — the exchange-level cap (`DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP`) exists in `BingxExecClientConfig` but is **never passed to the Rust kernel**.
 								**Severity: Medium**
 								#### G13: `resolve_slot` fallback returns `unwrap_or(0)` — can misroute events
 								**File:** `_rust_kernel/src/lib.rs:623`
 								```rust
 								self.slots.first().map(|slot| slot.slot_id).unwrap_or(0)
 								```
 								When no slot matches the event (`slot_id` out of range or all slot filters fail), returns `slot_id` of the **first slot** (which may be 0 or any value). No diagnostic emitted — caller sees slot state change with no idea the event was misrouted.
 								**Severity: Medium**
 								#### G14: `commit_slot` silently ignores out-of-bounds slot_id
**File:** `_rust_kernel/src/lib.rs:595-600`
 								```rust
 								fn commit_slot(&mut self, slot: TradeSlot) {
 								    if slot.slot_id < self.slots.len() {
 								        self.slots[slot_id] = slot;
 								    }
 								    // else: silently dropped — no error returned
 								}
 								```
 								Mutations to out-of-bounds slot are silently discarded. Can happen if `slot.slot_id` is corrupted via `set_slot_from_json` causing index mismatch between `slot.slot_id` and the actual slot position.
 								**Severity: Medium**
 								---
 								### Configuration & Validation Chain
 								#### G15: Zero `__post_init__` validators on all config dataclasses
 								Every config dataclass in the system has zero field-level validation:
 								| Dataclass | Fields | Validators |
 								|-----------|--------|------------|
 								| `KernelControlSnapshot` | 16 | **0** |
 								| `ControlUpdate` | 16 | **0** |
 								| `KernelIntent` | 19 | **0** |
 								| `TradeSlot` | 22 | **0** |
 								| `VenueOrder` | 8 | **0** |
 								| `VenueEvent` | 18 | **0** |
 								| `KernelTransition` | 11 | **0** |
 								| `KernelOutcome` | 8 | **0** |
 								| `AccountSnapshot` | 9 | **0** |
 								| **Total** | **127** | **0** |
 								The only validation in the entire chain:
 								- `_first_invalid_intent_field()` — finiteness guard at Python→Rust FFI boundary (not a dataclass validator)
- Rust `leverage = if is_finite && > 0.0 { val } else { 1.0 }`: falls back to `1.0` for non-finite or non-positive input; no upper bound
 								- Rust `KernelCore::new(max_slots.max(1))` — floor only, no ceiling
 								- `launcher.py:143`: `max(1, int(...))` for `active_slot_limit` — floor only
 								**No `__post_init__` exists anywhere. No bounds check on any field except the two floor-only guards.**
 								**Severity: High**
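A minimal sketch of what field-level validation could look like. `ValidatedIntent` and `MAX_LEVERAGE` are hypothetical names introduced for illustration; the real `KernelIntent` has 19 fields that are not shown here, and the appropriate cap would have to come from configuration.

```python
import math
from dataclasses import dataclass

MAX_LEVERAGE = 125.0  # hypothetical cap; the real value would come from config


@dataclass(frozen=True)
class ValidatedIntent:
    """Illustrative stand-in for KernelIntent, reduced to two fields."""

    target_size: float
    leverage: float = 1.0

    def __post_init__(self) -> None:
        # Reject NaN/Inf and out-of-range values at construction time,
        # before the object can reach the FFI boundary.
        if not math.isfinite(self.target_size) or self.target_size < 0.0:
            raise ValueError(f"target_size out of range: {self.target_size!r}")
        if not math.isfinite(self.leverage) or not (0.0 < self.leverage <= MAX_LEVERAGE):
            raise ValueError(f"leverage out of range: {self.leverage!r}")
```

Raising at construction keeps the failure next to the code that produced the bad value, rather than surfacing later as an implausible PnL figure.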
 								#### G16: `DITA_V2_DEBUG_CLICKHOUSE` defaults to `True` when env var is unset
 								**File:** `launcher.py:133`
 								```python
 								debug = _env_bool("DITA_V2_DEBUG_CLICKHOUSE", True)
 								```
 								`_env_bool` (launcher.py:75) returns `default` when the env var is unset. So `debug = True` by default. Every runtime writes debug traces to ClickHouse by default. `DITA_V2_DEBUG_CLICKHOUSE=False` is required to disable it.
 								This is not a bug per se, but it means debug ClickHouse writes are **on by default**, adding ~10 ClickHouse insertions per process_intent call (every transition + position state + trade event) that most production deployments may not want.
 								**Severity: Informational**
 								#### G17: String config fields have no charset/length validation — Zinc region injection risk
 								**File:** `control.py:31-53`, `real_zinc_plane.py:30`
 								`runtime_namespace`, `strategy_namespace`, `event_namespace`, `actor_name`, `exec_venue`, `data_venue`, `ledger_authority` are all free-form strings with no validation. They're used as:
1. **Zinc shared memory region names**: `self.prefix + "." + namespace + "." + kind`. An attacker-controlled namespace could collide with other processes' Zinc regions.
2. **ClickHouse table names**: `DOLPHIN_BINGX_JOURNAL_STRATEGY` is used as a table suffix, a SQL injection risk in the ClickHouse journal.
3. **Hazelcast map names**: same injection risk via `event_namespace`.
 								**Severity: Medium**
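One possible guard is an allow-list check applied to every string that ends up in a region, table, or map name. The sketch below assumes a policy of ASCII identifiers up to 64 characters; both the policy and the helper name `require_identifier` are illustrative, not taken from the source.

```python
import re

# Assumed policy: letter or underscore first, then letters, digits,
# underscores; at most 64 characters in total.
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def require_identifier(value: str, field: str) -> str:
    """Return value unchanged if it is a safe identifier, else raise ValueError."""
    if not isinstance(value, str) or _IDENT_RE.fullmatch(value) is None:
        raise ValueError(f"{field} is not a safe identifier: {value!r}")
    return value
```

Because dots, semicolons, quotes, and whitespace are all rejected, the same check would cover the Zinc, ClickHouse, and Hazelcast cases listed above.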
 								#### G18: `exit_leg_ratios` no sum-to-1 validation
`KernelIntent.exit_leg_ratios` and `TradeSlot.exit_leg_ratios` are tuple/list of floats. No validator ensures they sum to approximately 1.0. Ratios summing to 0.5 leave half the position open once the listed legs are consumed. After exhaustion, `next_exit_ratio()` returns `1.0`, so the next exit closes 100% of the remainder, which may exceed the size the caller intended for that leg.
 								**Severity: Low**
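A sketch of a sum-to-1 check, assuming the ratios are fractions of the original position size (the interpretation implied by the flaw title). The function name and tolerance are illustrative.

```python
import math
from typing import Sequence, Tuple


def validate_exit_leg_ratios(ratios: Sequence[float], *, tol: float = 1e-9) -> Tuple[float, ...]:
    """Return ratios as a tuple if each lies in (0, 1] and they sum to 1 within tol."""
    out = tuple(float(r) for r in ratios)
    if not out:
        raise ValueError("exit_leg_ratios must not be empty")
    if any(not math.isfinite(r) or r <= 0.0 or r > 1.0 for r in out):
        raise ValueError(f"each ratio must be in (0, 1]: {out!r}")
    total = math.fsum(out)  # fsum avoids accumulating rounding error
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"ratios sum to {total!r}, expected 1.0")
    return out
```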
 								#### G19: `RealZincControlPlane.read()` has no sequence check — torn-read risk
**File:** `real_control_plane.py:88-94`
 								```python
 								def read(self):
 								    payload = _decode_packet(self.region.as_buffer())
 								    control = payload.get("control")
 								    if not isinstance(control, dict):
 								        return self._snapshot
 								    self._snapshot = KernelControlSnapshot(**control)
 								    return self._snapshot
 								```
 								The binary packet has a 64-bit sequence number but `read()` **never checks it**. Between the zero-write and packet-write in `_write_region`, a reader sees an empty buffer → `_decode_packet` fails → falls back to `self._snapshot` (stale). Between the packet-write and `struct.pack` header (order depends on implementation), a reader sees a partial write with wrong size → `_decode_packet` fails.
 								No checksum on the wire format: `struct.pack("!QQ", seq, len) + json_bytes`. A torn write produces garbage that `json.loads` may or may not parse successfully.
 								**Severity: Low**
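A sketch of a reader that rejects torn and stale packets. It keeps the `seq` and `len` header fields described above and adds a CRC32 field, which is an assumed extension to the wire format, not something the source implements. The function names are illustrative.

```python
import json
import struct
import zlib
from typing import Optional, Tuple

# seq, payload length, CRC32 of payload (the CRC field is an assumed extension)
_HEADER = struct.Struct("!QQI")


def encode_packet(seq: int, payload: dict) -> bytes:
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return _HEADER.pack(seq, len(body), zlib.crc32(body)) + body


def decode_packet(buf: bytes, last_seq: int) -> Optional[Tuple[int, dict]]:
    """Return (seq, payload) for a complete, newer packet; None for torn or stale data."""
    if len(buf) < _HEADER.size:
        return None
    seq, length, crc = _HEADER.unpack_from(buf, 0)
    body = bytes(buf[_HEADER.size:_HEADER.size + length])
    # Reject packets that are not newer, are truncated, or fail the checksum.
    if seq <= last_seq or len(body) != length or zlib.crc32(body) != crc:
        return None
    return seq, json.loads(body)
```

A caller would keep the last accepted `seq` and treat `None` as "retry or keep the previous snapshot", which at least makes the stale fallback an explicit decision.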
 								#### G20: `DOLPHIN_BINGX_JOURNAL_STRATEGY`/`_DB` — ClickHouse SQL injection risk
 								**File:** `launcher.py:202-203`
 								```python
 								"DOLPHIN_BINGX_JOURNAL_STRATEGY": os.environ.get("DOLPHIN_BINGX_JOURNAL_STRATEGY", ""),
 								"DOLPHIN_BINGX_JOURNAL_DB": os.environ.get("DOLPHIN_BINGX_JOURNAL_DB", ""),
 								```
 								These are used as ClickHouse table and database name suffixes in `pink_clickhouse.py`. An attacker who can set env vars can inject SQL via semicolons or quotes in the table name. ClickHouse supports `INSERT INTO db.table FORMAT JSONEachRow` — a table name like `positions; DROP TABLE ...;` could be destructive.
 								**Severity: Low** (requires env var control, which implies broader access)
 								---
 								### Persistence Schema Alignment
 								#### G21: `entry_price` used as `exit_price` in `trade_events` — data loss
 								**File:** `pink_clickhouse.py (outside workspace)`
 								The `_write_trade_event` function maps `entry_price` from `slot.to_dict()` to both the `entry_price` and `exit_price` columns. The actual exit fill price (available on the `VenueEvent` object) is **never written** to the `exit_price` column.
 								**Result:** Every `trade_events` row has `exit_price == entry_price`. The `exit_price` column is a dead column — always contains the entry price, never the actual fill.
 								**Severity: High** — data loss to DB for the most important trade metric.
 								#### G22: `active_leg_index` → `entry_bar` semantic mis-mapping
 								**File:** `pink_clickhouse.py (outside workspace)`
 								```python
 								"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
 								```
`active_leg_index` tracks the exit-leg-ratios cursor (which leg of a multi-leg exit we're on), not a bar count. The value is `0` at position open and `1` after the first exit leg; neither represents bars held. **The `entry_bar` column stores the wrong concept.**
 								**Severity: Medium** — column contains semantically meaningless data.
 								#### G23: `capital_before` arithmetic reconstruction absorbs cross-slot PnL
 								**File:** `pink_clickhouse.py (outside workspace)`
 								```python
 								capital_before = capital_after - pnl_leg
 								```
 								`capital_before` is reconstructed by subtracting the current leg's PnL from the current capital. In a multi-slot system, other slots' PnL changes between legs are absorbed into `capital_before`. The column is **always wrong** in multi-slot scenarios because `capital_after` reflects total PnL from all slots, not just the leg being recorded.
 								**Severity: Medium** — wrong `capital_before` for multi-slot trading.
 								#### G24: Recovery `trade_reconstruction` always has `trade_id=""`
 								**File:** `pink_clickhouse.py (outside workspace)`
 								The `persist_recovery_state` function passes `kernel.snapshot()["account"]` (an account dict with keys `capital, equity, realized_pnl, ...`) where a slot dict is expected. The `trade_id` key **does not exist** on the account dict. The `recovery_state` row always has `trade_id=""`.
 								**Severity: Medium** — recovery data is not associable with any trade.
 								#### G25: `seen_event_ids`, `exit_leg_ratios`, `VenueOrder`, `metadata` not in flat ClickHouse tables
 								These fields are:
 								- Present on the Python `TradeSlot` ✅
 								- Transmitted through Zinc shared memory ✅
 								- Stored in Hazelcast ✅
 								- Stored in ClickHouse `dita_kernel_debug` (full JSON) ✅
 								- **NOT extracted** into main ClickHouse flat tables `position_state`, `trade_events`, `trade_exit_legs` ❌
 								Data exists at the source, travels through the pipeline, hits the debug journal — but is lost in the main analytical tables.
 								**Severity: Low** (data exists in debug journal if needed for reconstruction)
 								#### G26: `_safe_float` silently converts NaN/None/Inf to 0.0
 								**File:** `utils.py:15`
 								```python
 								def _safe_float(v, default=0.0):
 								    try:
 								        f = float(v)
 								        if not math.isfinite(f):
 								            return default
 								        return f
 								    except (TypeError, ValueError, OverflowError):
 								        return default
 								```
 								Used in multiple ClickHouse writers. Silently converts `NaN`/`Inf`/parsing errors to `0.0`. No diagnostic emitted when a non-finite value reaches the persistence layer — data silently zeroed.
 								**Severity: Low** (safe default but silent corruption)
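A sketch of the same conversion with a diagnostic attached. The logger name and the `field` parameter are assumptions added for illustration; the fallback behaviour is otherwise unchanged from `_safe_float`.

```python
import logging
import math

log = logging.getLogger("pink.persistence")  # hypothetical logger name


def safe_float_logged(v, field: str, default: float = 0.0) -> float:
    """Like _safe_float, but warns whenever the default is substituted."""
    try:
        f = float(v)
    except (TypeError, ValueError, OverflowError):
        log.warning("non-numeric value for %s: %r; using %r", field, v, default)
        return default
    if not math.isfinite(f):
        log.warning("non-finite value for %s: %r; using %r", field, v, default)
        return default
    return f
```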
 								---
 								### Lifecycle & Resource Management
 								#### G27: `build_launcher_bundle` has no exception safety — prior resources leak
**File:** `launcher.py:264-300`
 								```python
 								def build_launcher_bundle(...):
 								    control_plane = _build_control_plane(...)
 								    projection = build_projection(...)
 								    zinc_plane = _build_zinc_plane(...)
 								    venue = _build_venue(...)
 								    kernel = ExecutionKernel(...)  # ← if THIS fails, everything above leaks
 								```
 								If any step after the first raises, all previously built resources leak:
 								- `RealZincPlane` created → `_build_venue()` fails → 3 shared memory regions orphaned
 								- `RealZincControlPlane` created → `_build_zinc_plane()` fails → 1 shared memory region orphaned
 								- `BingxVenueAdapter` created → `ExecutionKernel.__init__()` fails → HTTP connection leaked
 								**No `try/finally` anywhere in the builder.** The init order is also optimized for forward construction, not backward cleanup.
 								**Severity: High** — shared memory leak on any build failure.
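One way to make the builder exception-safe is `contextlib.ExitStack`: register each resource's `close()` as it is built, and cancel the cleanup only once every step has succeeded. The sketch below uses a hypothetical `build_bundle` that takes a list of zero-argument factories as stand-ins for `_build_control_plane`, `_build_zinc_plane`, and the rest.

```python
from contextlib import ExitStack


def build_bundle(factories):
    """Build resources in order; close the ones already built if a later step fails.

    `factories` is a list of zero-argument callables, each returning an object
    with a close() method (hypothetical stand-ins for the real builders).
    """
    with ExitStack() as stack:
        built = []
        for factory in factories:
            resource = factory()
            stack.callback(resource.close)  # runs in reverse order on failure
            built.append(resource)
        stack.pop_all()  # success: cancel the cleanup, caller now owns the resources
        return built
```

If the third factory raises, the second and then the first resource are closed before the exception propagates, so a failed build leaves nothing open.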
 								#### G28: `RealZincPlane` and `RealZincControlPlane` have no `__del__`
 								When `close()` is not called (exception in builder, forgotten cleanup, GC during shutdown), the shared memory regions opened by `RealZincPlane` (3 regions) and `RealZincControlPlane` (1 region) are **orphaned on the OS**. They persist in `/dev/shm/` (or platform equivalent) until system reboot.
 								Python's `__del__` is unreliable (not called on SIGKILL, not called if the object is part of a cycle without a GC run), but its absence means even normal garbage collection can't clean up.
 								**Severity: High** — shared memory leaks.
 								#### G29: Zero signal handlers — no cleanup on SIGTERM/SIGINT
 								```bash
 								$ grep -rn "signal\|SIGTERM\|SIGINT\|atexit" *.py  # ZERO matches
 								```
 								When SIGTERM or SIGINT arrives:
1. For SIGTERM, the default OS action terminates the process immediately with no Python-level cleanup; for SIGINT, Python raises `KeyboardInterrupt` in the main thread, and cleanup runs only if a caller catches it
2. No `DITAv2LauncherBundle.close()` is called
3. No `ExecutionKernel.__del__` is called (CPython may run GC on normal exit but not reliably)
4. All shared memory (RealZincPlane, RealZincControlPlane) is orphaned
5. In-flight BingX HTTP calls are interrupted mid-stream
6. Rust kernel handle is leaked
 								**Severity: High**
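A minimal sketch of routing SIGTERM through normal interpreter exit so that cleanup runs. `install_shutdown_hooks` is a hypothetical helper, and `close_fn` is assumed to be idempotent (for example, a bundle-level `close()`).

```python
import atexit
import signal
import sys


def install_shutdown_hooks(close_fn) -> None:
    """Route SIGTERM through normal interpreter exit so close_fn runs.

    Must be called from the main thread. SIGKILL still cannot be intercepted.
    """
    atexit.register(close_fn)

    def _on_sigterm(signum, frame):
        # SystemExit unwinds the stack and lets atexit handlers run.
        sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, _on_sigterm)
```

SIGINT needs no extra handler here, because an unhandled `KeyboardInterrupt` also ends in normal interpreter shutdown, which runs `atexit` callbacks.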
 								#### G30: `ExecutionKernel` has no `close()` — relies on `__del__` for Rust handle cleanup
 								`ExecutionKernel` has `__del__` which calls `_get_rust().destroy(backend)`. No `close()` method. `DITAv2LauncherBundle.close()` never touches the kernel — the Rust handle is only freed by GC at unpredictable time.
 								If any code holds a stale `_backend` pointer, the handle dangles when GC runs. If `__del__` is suppressed (e.g., during interpreter shutdown with cyclic references), the Rust handle leaks permanently.
 								**Fix:** Add `close()` to `ExecutionKernel`, call it from `DITAv2LauncherBundle.close()`.
 								**Severity: High**
 								#### G31: `projection` (Hazelcast) never closed
 								`build_projection()` returns a `HazelcastProjection` which holds a Hazelcast client connection. No `close()` or `disconnect()` method exists on the projection, projector, or row writer. `DITAv2LauncherBundle.close()` doesn't touch the projection. The Hazelcast client connection leaks on shutdown.
 								**Severity: Medium**
 								#### G32: `_maybe_close()` only calls the first method found — `break` skips the second
**File:** `launcher.py:233-243`
 								```python
 								for method_name in ("close", "disconnect"):
 								    method = getattr(obj, method_name, None)
 								    if method is None:
 								        continue
 								    try:
 								        result = method()
 								    except TypeError:
 								        continue
 								    if inspect.isawaitable(result):
 								        try:
 								            asyncio.run(result)
 								        except RuntimeError:
 								            pass
 								    break  # ← ONLY calls the FIRST found method, never both
 								```
If an object has both `close()` and `disconnect()`, only `close()` is called; `disconnect()` is silently skipped. Also, when an event loop is already running, `asyncio.run(result)` raises `RuntimeError`, which the `except` clause swallows, so the coroutine is **never executed**.
 								Currently no object has both, but the pattern is fragile.
 								**Severity: Low**
 								#### G33: `close()` is not idempotent for RealZinc components
`RealZincPlane.close()` and `RealZincControlPlane.close()` call their Zinc region's `close()` method. If called twice, the second call operates on an already-closed region and will likely crash inside Zinc's shared memory code.
References are not nulled after close: `DITAv2LauncherBundle.close()` calls `_maybe_close()`, which leaves `self.venue`, `self.zinc_plane`, and `self.control_plane` pointing at the closed objects. Double `close()` is unsafe.
 								**Severity: Low**
 								#### G34: No context manager on `DITAv2LauncherBundle`
 								`DITAv2LauncherBundle` has no `__enter__`/`__exit__`. Users must manually call `close()`. No `with` pattern exists anywhere in the source for lifecycle management. No `__del__` fallback on the bundle either.
 								**Severity: Low** (ergonomic, not a leak source if caller follows the pattern)
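The idempotence and context-manager gaps (G30, G33, G34) could be addressed by one small pattern. `ClosingBundle` below is a hypothetical wrapper, not the real `DITAv2LauncherBundle`; it assumes every held resource exposes a synchronous `close()`.

```python
class ClosingBundle:
    """Hypothetical wrapper showing idempotent close() plus `with` support."""

    def __init__(self, *resources):
        self._resources = list(resources)
        self._closed = False

    def close(self) -> None:
        if self._closed:              # second call is a no-op
            return
        self._closed = True
        errors = []
        while self._resources:        # reverse construction order
            resource = self._resources.pop()
            try:
                resource.close()
            except Exception as exc:  # keep closing the rest, report afterwards
                errors.append(exc)
        if errors:
            raise errors[0]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # never suppress the caller's exception
```

Popping each resource before closing it drops the reference, so a later `close()` cannot reach an already-closed region.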
 								#### G35: `BingxVenueAdapter.connect()` exists but is never called by the launcher
`BingxDirectExecutionAdapter` has a `connect()` method that initializes the lifetime HTTP client. `BingxVenueAdapter` has a `connect()` that calls `_call_backend("connect")`. Neither is called in `build_launcher_bundle()` or `_build_venue()`. If the adapter's `submit_intent()` relies on a connected client, it must initialize lazily; the explicit connect path is dead code.
 								**Severity: Informational**
 								#### G36: Only one `try/finally` in the entire codebase
 								The only `try/finally` is `_RustKernelLib._take_string()` (rust_backend.py:140-143) which frees the Rust C string. All other resource management uses `try/except` with no `finally`.
 								No cleanup is guaranteed on exception:
 								- `build_launcher_bundle()` — no cleanup on failure
 								- `process_intent()` — no cleanup of partial slot state on venue event exception
 								- `on_venue_event()` — no cleanup on FFI failure
 								- `_set_slot()` — no cleanup on projection or Zinc write failure
 								**Severity: High** (across all layers)
 								---
 								## Pass 4 Summary
 								| # | Flaw | Layer | Severity |
 								|---|------|-------|----------|
 								| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType | Rust | **Critical** |
 								| G2 | `into_c_string` unwrap() panics on NUL byte | Rust | **Critical** |
 								| G3 | EXIT hardcodes prev_state=POSITION_OPEN, allows backward FSM transition | Rust | **Critical** |
 								| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** |
 								| G5 | `realized_pnl` unbounded f64 overflow to inf | Rust | **High** |
 								| G6 | `mark_price` unbounded unrealized_pnl — no result guard | Rust | **High** |
 								| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
 								| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
 								| G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** |
 								| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
 								| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
 								| G12 | No max leverage cap enforced by kernel | Rust | Medium |
 								| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
 								| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
 								| G15 | Zero `__post_init__` validators on all config dataclasses | Config | **High** |
 								| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
 								| G17 | String config fields — Zinc region injection risk | Config | Medium |
 								| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
 								| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
 								| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
 								| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
 								| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
 								| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
 								| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
 								| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
 								| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
 								| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
 								| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
 								| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
 								| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
 								| G31 | Hazelcast projection never closed | Lifecycle | Medium |
 								| G32 | _maybe_close() break skips second method | Lifecycle | Low |
 								| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
 								| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
 								| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
 								| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |
 								### Pass 4 Severity Distribution
| Severity | Count |
|----------|-------|
| **Critical** | 4 (G1, G2, G3, G4) |
| **High** | 13 (G5-G10, G15, G21, G27, G28, G29, G30, G36) |
| Medium | 9 (G11-G14, G17, G22, G23, G24, G31) |
| Low | 8 (G18, G19, G20, G25, G26, G32, G33, G34) |
| Info | 2 (G16, G35) |
### Combined Catalog (All 4 Passes)
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 13 | 9 | 8 | 2 |
| **Total** | | **116** | **5** | **23** | **30** | **40** | **18** |
PINK: E2E trace analysis — Pass 5 edge domains (H1-H22)

Fifth pass covering dependency management (no Python lockfile, Rust compiled
from source), error handling observability (zero logging, 16+ silent swallows),
type safety (17 enum-from-string crash sites, _legacy_intent always MARKET),
and protocol contracts (MirroredControlPlane missing methods, RealZinc read
atomicity, __del__ use-after-free). 22 new flaws. Combined catalog now 138.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

2026-06-01 17:42:58 +02:00
 								---
 								## PASS 5 — EDGE DOMAINS (Dependencies, Error Handling, Types, Contracts)
 								### H1: No Python dependency declaration files exist in workspace
 								**Files:** workspace root
 								Zero `requirements.txt`, `setup.py`, `setup.cfg`, `pyproject.toml`, `Pipfile`, or `poetry.lock` anywhere. All Python package dependencies are entirely implicit — determined by what's installed in the runtime environment. No reproducible installs, no version pinning, no audit trail.
 								The Rust side does have `Cargo.toml` + `Cargo.lock` — but all 4 direct Rust deps use open ranges (`"0.4"`, `"0.2"`, `"1"`, `"1"`).
 								**Severity: Critical**
 								### H2: Rust kernel compiled from source on every cold start via subprocess
 								**File:** `rust_backend.py:60-72`
 								```python
 								def _ensure_library() -> Path:
 								    path = _library_path()
 								    if not path.exists():
 								        _build_library()  # cargo build --release
 								    return path
 								def _build_library():
 								    subprocess.run(
 								        ["cargo", "build", "--release", ...],
 								        check=True,        # no timeout!
 								    )
 								```
 								First load takes 3-10 minutes (Rust compilation). Requires Rust toolchain in production. `subprocess.run()` has no `timeout=` — if `cargo` hangs (network, disk, lock contention), the Python process hangs indefinitely. No prebuilt binary distribution.
 								**Severity: Critical**
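A sketch of bounding the build step with a timeout. `run_build` and the 900-second default are illustrative choices; the actual `cargo` arguments are elided in the source and are not reconstructed here.

```python
import subprocess


def run_build(cmd: list, timeout_s: float = 900.0) -> None:
    """Run a build command, failing loudly on non-zero exit or timeout."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        raise RuntimeError(f"build timed out after {timeout_s}s: {cmd!r}") from exc
```

This bounds only the hang; it does not remove the need for a Rust toolchain in production, which a prebuilt binary would address.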
 								### H3: Zero logging — every swallowed error is invisible
 								The entire codebase has zero use of Python's `logging` module, `print()`, or `warnings.warn()` for error reporting. Every `except: pass`, `except Exception: pass`, and `return default` silently discards the error. **There is no mechanism to detect, alert, or diagnose production failures.**
 								All `try/except: pass` sites found:
 								| # | File:Line | What's Hidden |
 								|---|-----------|---------------|
 								| 1 | `bingx_venue.py:51` | `float()` conversion failure on any API field value |
 								| 2 | `bingx_venue.py:133` | regex match failure in rate-limit parsing |
 								| 3 | `bingx_venue.py:136` | int/float conversion of retry_after |
 								| 4 | `bingx_venue.py:325` | slot lookup failure during cancel asset resolution |
 								| 5 | `bingx_venue.py:350` | BingXHttpError in cancel — network error looks like rejection |
 								| 6 | `control.py:213` | RealZincControlPlane construction failure |
 								| 7 | `launcher.py:187` | RealZincPlane construction failure |
 								| 8 | `launcher.py:119` | malformed env var for active_slot_limit |
 								| 9 | `launcher.py:243` | asyncio.run() RuntimeError in _maybe_close |
 								| 10 | `launcher.py:277` | RealZincControlPlane fallback in build_control_plane |
 								| 11 | `real_control_plane.py:97` | region.wait() exception — timeout and error both return False |
 								| 12 | `real_control_plane.py:112` | region.notify() exception — writer thinks broadcast succeeded |
 								| 13 | `real_zinc_plane.py:31` | Zinc SharedRegion import failure |
 								| 14 | `projection.py:87` | HazelcastRowWriter import failure |
 								| 15 | `rust_backend.py:102` | __del__ exception in Rust kernel destroy |
 								| 16 | `bingx_venue.py:55` | `_row_float` tries 5+ key fallbacks, each failing silently |
 								**Severity: Critical**
 								### H4: `_row_float` rejects zero as a valid value — `or` pattern treats 0 as missing
 								**File:** `bingx_venue.py:47-55`
 								```python
 								def _row_float(row, *keys, default=0.0):
 								    for key in keys:
 								        try:
 								            value = float(row.get(key) or 0.0)  # `or 0.0` treats 0 as missing
 								        except Exception:
 								            continue
 								        if value == value and value not in (float("inf"), float("-inf")) and value != 0.0:
 								            return value                         # explicitly rejects 0.0
 								    return default
 								```
 								Two bugs: (a) `except Exception: continue` swallows ALL conversion errors, and (b) `value != 0.0` explicitly rejects zero as a valid return value. A legitimate zero price, zero filled quantity, or zero position amount causes `_row_float` to skip that key and search further. If ALL keys return 0, the default `0.0` is returned — indistinguishable from "none of the keys existed."
 								Called by every single BingX API response parser: `_position_qty()`, `_position_price()`, `_venue_order_from_row()`, `_event_from_row()`, `_fill_event_from_row()`, `_events_from_submit()`, `_events_from_cancel()`, `_filled_size_from_snapshots()`. None verify the returned 0.0 is real vs. missing-vs-zero.
 								**Severity: High**
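A sketch of a parser that separates "missing" from "zero" by returning `None` when no key yields a value. `row_float_or_none` is a hypothetical replacement; whether callers should fall through to the next key on an explicit zero is a design decision this sketch answers with "no", which may not match every BingX field.

```python
import math
from typing import Any, Mapping, Optional


def row_float_or_none(row: Mapping[str, Any], *keys: str) -> Optional[float]:
    """Return the first finite float found under keys, or None if no key yields one.

    Zero is a valid result; absence is reported as None, not 0.0.
    """
    for key in keys:
        raw = row.get(key)
        if raw is None or raw == "":
            continue  # key missing or empty: try the next alias
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # unparseable: try the next alias
        if math.isfinite(value):
            return value
    return None
```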
 								### H5: `_backend_snapshot` timeout returns stale data with no signal to callers
**File:** `bingx_venue.py:242-251`
 								```python
 								def _backend_snapshot(self, *, timeout_ms=5000.0):
 								    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
 								        with self._snap_lock:
 								            return self._last_snapshot    # STALE — could be hours old
 								```
 								When the snapshot-fetch condition times out, returns `self._last_snapshot` — initialized to `None` and only updated on successful fetches. First timeout returns `None`. All callers (`cancel()`, `open_orders()`, `open_positions()`, `reconcile()`, `submit()`) access `.open_orders`, `.open_positions` immediately — crash with `AttributeError: 'NoneType' object has no attribute 'open_orders'`.
 								Even after the first fetch succeeds, subsequent timeouts return the last-good snapshot which could be arbitrarily stale. No caller timestamps, version-checks, or requests a refresh.
 								**Severity: High**
 								### H6: All enum-from-raw-string sites crash on unknown value — zero fallback
 								**Files:** `rust_backend.py:250-386`, `real_zinc_plane.py:70-106`
 								Every site that reconstructs a Python enum from a string received from the Rust kernel:
 								```python
 								side=TradeSide(str(payload.get("side", TradeSide.FLAT.value)))
 								status=VenueOrderStatus(str(payload.get("status", VenueOrderStatus.NEW.value)))
 								fsm_state=TradeStage(str(payload.get("fsm_state", TradeStage.IDLE.value)))
 								kind=KernelEventKind(str(row.get("kind", KernelEventKind.ORDER_ACK.value)))
 								```
 								If the Rust kernel introduces a new enum variant (e.g., `TradeStage::ENTRY_REJECTED`) not in the Python `TradeStage` enum, `TradeStage("ENTRY_REJECTED")` raises `ValueError` with zero fallback. Crashes `_outcome_from_payload()` and takes down the kernel's event processing loop.
17 sites total across `rust_backend.py` and `real_zinc_plane.py`. No try/except, no mapping, no fallback on any of them.
 								**Severity: High**
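A sketch of a single conversion helper that the enum-from-string sites could share. `parse_enum`, `UnknownEnumValue`, and the two-member `Stage` enum are illustrative stand-ins; the full member list of the real `TradeStage` is not shown in this document.

```python
from enum import Enum
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


class Stage(Enum):  # stand-in for TradeStage
    IDLE = "IDLE"
    POSITION_OPEN = "POSITION_OPEN"


class UnknownEnumValue(ValueError):
    """Raised with context when the peer sends a variant Python does not know."""


def parse_enum(enum_cls: Type[E], raw: object, *, field: str,
               default: Optional[E] = None) -> E:
    """Convert raw to enum_cls; use default if given, else raise a descriptive error."""
    try:
        return enum_cls(str(raw))
    except ValueError:
        if default is not None:
            return default
        raise UnknownEnumValue(
            f"{field}: unknown {enum_cls.__name__} value {raw!r}"
        ) from None
```

Whether an unknown variant should fall back or raise depends on the field: a default may be tolerable for display-only values, while an unknown FSM state is probably safer as a hard, well-described error.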
 								### H7: `_legacy_intent` reads `getattr(intent, "order_type", "MARKET")` — always defaults to MARKET
**File:** `bingx_venue.py:282-285`
 								```python
 								metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
 								metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
 								```
 								`order_type` and `limit_price` are NOT fields on `KernelIntent` (contracts.py). They only exist in `intent.metadata` as `metadata["order_type"]` if set by the caller. `getattr(intent, "order_type", "MARKET")` checks the dataclass field — not the metadata dict — so it ALWAYS returns `"MARKET"`.
Even when the PINK runtime produces a LIMIT intent (LIMIT_DECISION → `metadata["order_type"] = "LIMIT"`), the legacy adapter converts it to MARKET because it reads the wrong source. Every LIMIT order is submitted as MARKET.
 								Similarly, `limit_price` is always `0.0` — any limit price from the metadata dict is lost.
 								**Severity: High**
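A sketch of reading both values from the metadata dict instead of the dataclass attributes. The helper name is hypothetical, and the `"limit_price"` metadata key is an assumption; the source only confirms `metadata["order_type"]`.

```python
import math
from typing import Any, Mapping, Tuple


def order_type_and_price(metadata: Mapping[str, Any]) -> Tuple[str, float]:
    """Read order_type and limit_price from intent.metadata, defaulting to MARKET."""
    order_type = str(metadata.get("order_type") or "MARKET").upper()
    try:
        # "limit_price" as a metadata key is assumed for this sketch.
        limit_price = float(metadata.get("limit_price") or 0.0)
    except (TypeError, ValueError):
        limit_price = 0.0
    if order_type == "LIMIT" and not (math.isfinite(limit_price) and limit_price > 0.0):
        # A LIMIT order without a usable price is a caller error, not a MARKET order.
        raise ValueError("LIMIT intent requires a positive limit_price in metadata")
    return order_type, limit_price
```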
 								### H8: `_venue_event_status_from_row` silently maps unknown venue status to ACKED
**File:** `bingx_venue.py:83-96`
 								```python
 								def _venue_event_status_from_row(status: str) -> VenueEventStatus:
 								    normalized = _normalize_status(status)
 								    # ... checks known statuses ...
 								    return VenueEventStatus.ACKED  # fallthrough for anything unknown
 								```
 								If BingX introduces a new status (`"SUSPENDED"`, `"PENDING_CANCEL"`, `"EXPIRED"`), it doesn't match any known mapping and silently returns `ACKED`. The kernel treats a suspended/cancelled/expired order as acknowledged — dangerous misclassification.
 								**Severity: High**
 								### H9: `RealZincPlane.write_slot()` — slot written to `slot_id >= slot_count` is invisible
**File:** `real_zinc_plane.py:206-210`
 								```python
 								def write_slot(self, slot):
 								    with self._lock:
 								        self._slot_cache[int(slot.slot_id)] = slot
 								        payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
 								```
 								`_slot_cache` is a plain dict — accepts any key. But `read_slots()` only reads 0..slot_count-1. Writing to `slot_id >= slot_count` stores the slot in the cache but it's **never serialized or read back**. No error.
 								**Severity: High**
 								### H10: `RealZincControlPlane.read()` has no atomicity with concurrent `update()`
**File:** `real_control_plane.py:70-77`
 								`_write_region()` zero-fills the buffer then writes the packet. If `read()` interleaves between zero-fill and write, it sees a partially-zeroed buffer → `_decode_packet` returns `{}` → returns stale `self._snapshot` with no observable error. No lock, no sequence check, no atomic read.
 								The same bug exists in `RealZincPlane.read_slots()` (real_zinc_plane.py:220-230) — reads shared memory while a concurrent `write_slot()` is in progress.
 								**Severity: High**
 								### H11: `_RustKernelLib` lazily initialized with race condition
**File:** `rust_backend.py:187-190`
 								```python
 								_RUST: _RustKernelLib | None = None
 								def _get_rust():
 								    global _RUST
 								    if _RUST is None:
 								        _RUST = _RustKernelLib()  # no lock — two threads can both create
 								    return _RUST
 								```
 								No threading lock. Two concurrent calls to `_get_rust()` (possible via `BingxVenueAdapter`'s thread pool) can create two `_RustKernelLib` objects. The `_RustKernelLib()` constructor runs `_ensure_library()` which runs `subprocess.run(["cargo", "build", ...], check=True)` — concurrent `cargo build` can corrupt the build directory.
 								**Severity: High**
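The usual remedy is double-checked locking around the lazy initialisation. The sketch below is generic: `get_singleton` and its `factory` argument are illustrative, standing in for `_get_rust()` and the `_RustKernelLib` constructor.

```python
import threading

_LOCK = threading.Lock()
_INSTANCE = None


def get_singleton(factory):
    """Return the process-wide instance, creating it at most once across threads."""
    global _INSTANCE
    if _INSTANCE is None:              # fast path, no lock once initialised
        with _LOCK:
            if _INSTANCE is None:      # re-check under the lock
                _INSTANCE = factory()
    return _INSTANCE
```

Threads that lose the race block on the lock until the winner's constructor returns, so a slow constructor (such as one that triggers a build) runs exactly once.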
 								### H12: `ExecutionKernel.__del__` can deadlock or use-after-free
**File:** `rust_backend.py:527-531`
 								```python
 								def __del__(self):
 								    backend = getattr(self, "_backend", None)
 								    if backend is not None:
 								        try:
 								            _get_rust().destroy(backend)  # accesses module singleton
 								        except Exception:
 								            pass
 								```
 								`_get_rust()` accesses the module-level `_RUST` singleton, which may already be destroyed if the module's garbage collection runs before the instance's. The destroy call happens outside any lock — one thread's destructor could destroy the Rust kernel while another thread is still using it. Use-after-free.
 								**Severity: High**
 								### H13: `MirroredControlPlane` missing protocol methods
**File:** `control.py:171-184`
 								`ControlPlane` protocol defines `wait()` and `notify()`. `MirroredControlPlane` inherits from nothing and only implements `read()`, `update()`, and `mirror()`. Calling `plane.wait()` on a `MirroredControlPlane` raises `AttributeError`.
 								**Severity: Medium**
 								### H14: `TradeSlot.remaining_size()` and `VenueOrder.remaining_size()` — same name, different semantics
**Files:** `contracts.py:207-208`, `contracts.py:143-145`
 								```python
 								# TradeSlot:
 								def remaining_size(self) -> float:
 								    return max(0.0, float(self.size))  # open position size
 								# VenueOrder:
 								def remaining_size(self) -> float:
 								    return max(0.0, self.intended_size - self.filled_size)  # unfilled order qty
 								```
Same method name, completely different semantics. `TradeSlot.remaining_size()` returns the current open position size. `VenueOrder.remaining_size()` returns the unfilled order quantity. A caller using `slot.remaining_size()` to check whether an order is fully filled gets the position size instead, which tracks entries and exits, not the unfilled remainder of any single order.
 								**Severity: Medium**
 								### H15: `_maybe_close()` — `asyncio.run()` RuntimeError silently swallowed for coroutines
**File:** `launcher.py:233-243`
 								```python
 								if inspect.isawaitable(result):
 								    try:
 								        asyncio.run(result)
 								    except RuntimeError:
 								        pass  # SILENT — coroutine never executed
 								```
When `_maybe_close` is called from an async context (which it is: `DITAv2LauncherBundle.close()` is used in async test code), `asyncio.run()` raises `RuntimeError("asyncio.run() cannot be called from a running event loop")`. The exception is swallowed, the coroutine is never awaited, and the close/disconnect never happens.
 								Also: `break` after calling the first found method means if an object has both `close()` and `disconnect()`, `disconnect()` is never called.
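A minimal sketch of a close helper that does not drop the awaitable when a loop is already running. The names `close_awaitable` and `_drive` are hypothetical; the idea is to use `asyncio.run()` only when no loop is running and otherwise return a task for the caller to await.

```python
import asyncio
import inspect


async def _drive(awaitable):
    # Wrap any awaitable in a coroutine so asyncio.run/create_task accept it.
    return await awaitable


def close_awaitable(result):
    """Run or schedule the result of a close()/disconnect() call.

    Returns an asyncio.Task when called inside a running loop (the caller
    should await it), otherwise None.
    """
    if not inspect.isawaitable(result):
        return None
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        asyncio.run(_drive(result))  # no loop running: drive it to completion
        return None
    return loop.create_task(_drive(result))  # loop running: schedule instead
```

In an async caller the returned task still has to be awaited (or tracked), otherwise the close may not finish before shutdown.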
 								**Severity: Medium**
 								### H16: `_build_launcher_bundle` imports `BingxDirectExecutionAdapter` inside function — import-time side effect is safe but lazy loading masks errors
**File:** `launcher.py:254`
 								```python
 								def _build_venue(...):
 								    from prod.clean_arch.adapters.bingx_direct import BingxDirectExecutionAdapter
 								```
 								Import inside function — safe, lazy, no side effects. But if the `bingx_direct` module has an import error (missing dependency, version mismatch), it only surfaces at bundle construction time, not at process start. A misconfigured production deployment would fail on the first trade, not on boot.
 								**Severity: Informational**
 								### H17: `load_dotenv()` at module level — import-time filesystem I/O and env mutation
**File:** `launcher.py:49-51`
 								```python
 								load_dotenv(PROJECT_ROOT / ".env")  # executes on module import
 								```
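Runs on import of `launcher.py`: it reads the filesystem and mutates the process environment. This is hard to mock in tests, because variables a test expects to be unset are populated from `.env` as a side effect of the import (assuming python-dotenv's default `override=False`, variables that are already set are kept rather than overwritten). Also: if `.env` doesn't exist, `load_dotenv()` silently does nothing, so missing config is invisible.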
 								**Severity: Medium**
 								### H18: `_run()` in `BingxVenueAdapter` — `asyncio.run()` thread-pool bridge blocks on every call
**File:** `bingx_venue.py:225-233`
 								```python
 								def _run(self, result):
 								    if inspect.isawaitable(result):
 								        try:
 								            asyncio.get_running_loop()
 								        except RuntimeError:
 								            return asyncio.run(result)
 								        pool = self._get_executor()
 								        return pool.submit(asyncio.run, result).result()  # BLOCKS
 								```
 								Every call to `_run()` that receives an awaitable blocks the calling thread via `.result()`. The BingX HTTP call inside `submit_intent()` can take 1-5 seconds. During this block, the event loop cannot process other tasks. In a single-runtime deployment, this stalls the entire policy cycle.
 								**Severity: Medium**
 								### H19: `HazelcastClientLike` protocol has zero concrete implementations in workspace
**File:** `hazelcast_projection.py:13-15`
 								```python
 								class HazelcastClientLike(Protocol):
 								    def get_map(self, name: str): ...
 								    def get_topic(self, name: str): ...
 								```
Used as a type hint. No code in the workspace creates an object that satisfies this protocol. The Hazelcast client comes from an external package. If the external API changes, the protocol silently drifts, because nothing in the workspace type-checks a concrete client against it.
 								**Severity: Low**
 								### H20: `_decode_packet` in RealZinc — no bound check on `size` beyond `> len(buf)-16`
**Files:** `real_control_plane.py:50-52`, `real_zinc_plane.py:70-81`
 								```python
 								seq, size = struct.unpack_from("!QQ", buf, 0)
 								if size <= 0 or size > len(buf) - 16:
 								    return {}
 								payload = bytes(buf[16 : 16 + size]).decode("utf-8")  # can raise UnicodeDecodeError
 								out = json.loads(payload)  # can raise ValueError
 								```
 								If shared memory contains a corrupted `size` field within bounds, `.decode()` or `json.loads()` raises — uncaught by callers. A single corrupted byte in shared memory crashes the kernel.
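A minimal sketch of a decoder that treats a corrupted payload like an out-of-range `size`, returning `{}` instead of raising. The function name `decode_packet_safe` is hypothetical, and whether to also log or count such failures is left open.

```python
import json
import struct

HEADER = struct.Struct("!QQ")  # sequence number, payload size (as in the excerpt)


def decode_packet_safe(buf) -> dict:
    """Decode one packet; return {} on any malformed input."""
    if len(buf) < HEADER.size:
        return {}
    _seq, size = HEADER.unpack_from(buf, 0)
    if size == 0 or size > len(buf) - HEADER.size:
        return {}
    raw = bytes(buf[HEADER.size : HEADER.size + size])
    try:
        out = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Corrupted bytes or invalid JSON: same result as a bad size field.
        return {}
    return out if isinstance(out, dict) else {}
```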
 								**Severity: Low**
 								### H21: All Rust crate features enabled by default — `wasm-bindgen` compiled into native shared library
 								**File:** `_rust_kernel/Cargo.toml`, transitive through `chrono` → `iana-time-zone` → `js-sys` → `wasm-bindgen`
 								The Rust kernel is a native `.so`/`.dylib` but chrono's `iana-time-zone` pulls in `js-sys` and `wasm-bindgen` (WebAssembly support) even on native Linux. Larger binary, longer compile times. `cc` crate pulled in for `iana-time-zone-haiku` which only compiles on Haiku OS.
 								**Severity: Low**
 								### H22: `socket.getaddrinfo` monkey-patch in test generator code
**File:** `gen2.py:295-298`
 								Monkey-patches Python stdlib `socket.getaddrinfo` to force IPv4 as a workaround for IPv6 resolution failure in the deployment environment. If copied to production code, would break IPv6 connectivity.
 								**Severity: Low**
 								---
 								## Pass 5 Summary
 								| # | Flaw | Layer | Severity |
 								|---|------|-------|----------|
 								| H1 | No Python dependency files (requirements.txt, pyproject.toml, etc.) | Build | **Critical** |
 								| H2 | Rust kernel compiled from source on every cold start — no prebuilt binary | Build | **Critical** |
 								| H3 | Zero logging — 16+ silent except:pass sites, no error observability | All | **Critical** |
 								| H4 | `_row_float` rejects zero as valid, `except Exception: continue` swallows all | Venue | **High** |
 								| H5 | `_backend_snapshot` timeout returns stale data/None — callers crash | Venue | **High** |
 								| H6 | All enum-from-raw-string sites crash on unknown variant (17 sites) | Bridge | **High** |
 								| H7 | `_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET | Venue | **High** |
 								| H8 | Unknown venue status silently mapped to ACKED | Venue | **High** |
 								| H9 | `RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost | Zinc | **High** |
 								| H10 | `RealZincControlPlane.read()` no atomicity with concurrent `update()` | Control | **High** |
 								| H11 | `_RustKernelLib` lazy init with race condition — concurrent cargo build | Bridge | **High** |
 								| H12 | `ExecutionKernel.__del__` use-after-free on Rust handle | Bridge | **High** |
 								| H13 | `MirroredControlPlane` missing protocol methods (wait/notify) | Control | Medium |
 								| H14 | `TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics | Contracts | Medium |
 								| H15 | `_maybe_close` asyncio.run RuntimeError silently swallowed | Launcher | Medium |
 								| H16 | Lazy import of bingx_direct masks config errors until first trade | Build | Info |
 								| H17 | `load_dotenv()` at module level — import-time I/O side effect | Launcher | Medium |
 								| H18 | `_run()` blocks event loop on every HTTP call via thread pool | Venue | Medium |
 								| H19 | `HazelcastClientLike` protocol has zero concrete implementations | Projection | Low |
 								| H20 | `_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM | Zinc | Low |
 								| H21 | `wasm-bindgen` compiled into native library unnecessarily | Build | Low |
 								| H22 | `socket.getaddrinfo` monkey-patch in test code | Test | Low |
 								### Pass 5 Severity Distribution
 								| Severity | Count |
 								|----------|-------|
 								| **Critical** | 3 (H1, H2, H3) |
 								| **High** | 9 (H4-H12) |
 								| Medium | 5 (H13, H14, H15, H17, H18) |
 								| Low | 4 (H19, H20, H21, H22) |
 								| Info | 1 (H16) |
 								### Combined Catalog (All 5 Passes)
 								| Pass | Focus | Count | Critical | High | Medium | Low | Info |
 								|------|-------|-------|----------|------|--------|-----|------|
 								| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
 								| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
 								| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
 								| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
 								| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
 								| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
 								| **Total** | | **138** | **8** | **30** | **37** | **44** | **19** |
 								---
 								## PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY
 								### I1: Entry `apply_fill` sets `slot.size = fill_size` — multiple partial fills overwrite instead of accumulating
 								**File:** `_rust_kernel/src/lib.rs:798`
 								```rust
 								// Entry fill path in apply_fill:
 								slot.size = fill_size;          // DIRECT ASSIGNMENT
 								slot.initial_size = slot.initial_size.max(fill_size);  // max, not sum
 								```
 								If a single entry order receives multiple partial fills (e.g., LIMIT order on the book):
 								- Fill #1: `fill_size = 0.5` → `slot.size = 0.5`, `initial_size = max(0, 0.5) = 0.5`
 								- Fill #2: `fill_size = 0.3` → `slot.size = 0.3`, `initial_size = max(0.5, 0.3) = 0.5`
 								After both fills, the actual position is 0.8 but `slot.size` reports 0.3. The position is under-counted by 0.5 — 62.5% error.
 								The exit path correctly does `slot.size = (slot.size - fill_size).max(0.0)` (subtractive). The entry path should accumulate: `slot.size += fill_size`.
 								This only manifests with LIMIT orders that receive multiple partial fills over time — a scenario entirely absent from tests (I7).
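The arithmetic above can be reproduced with a small Python model of the two strategies. This is a sketch, not the kernel code: `SlotModel` and both functions are hypothetical, and the way the accumulating variant updates `initial_size` is an assumption about the intended semantics.

```python
from dataclasses import dataclass


@dataclass
class SlotModel:
    size: float = 0.0
    initial_size: float = 0.0


def apply_entry_fill_overwrite(slot: SlotModel, fill_size: float) -> None:
    # Mirrors the behavior described above: direct assignment.
    slot.size = fill_size
    slot.initial_size = max(slot.initial_size, fill_size)


def apply_entry_fill_accumulate(slot: SlotModel, fill_size: float) -> None:
    # Accumulating variant: each partial fill adds to the position.
    slot.size += fill_size
    slot.initial_size = max(slot.initial_size, slot.size)  # assumed semantics


buggy, fixed = SlotModel(), SlotModel()
for fill in (0.5, 0.3):
    apply_entry_fill_overwrite(buggy, fill)
    apply_entry_fill_accumulate(fixed, fill)

print(buggy.size, round(fixed.size, 10))
```

With the two fills from the example, the overwrite model ends at the size of the last fill while the accumulating model ends at the sum of both.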
 								**Severity: Critical**
 								### I2: `exit_ratio = 0.0` creates zero-size exit order — slot stuck in EXIT_REQUESTED
 								**File:** `_rust_kernel/src/lib.rs:467-469`
 								```rust
 								let exit_ratio = slot.next_exit_ratio();         // returns 0.0 from exit_leg_ratios=[0.0, ...]
 								let base_size = if slot.initial_size > 0.0 { ... } else { slot.size };
 								let exit_size = (base_size * exit_ratio).max(0.0); // = 0.0
 								```
 								When `exit_leg_ratios` contains `0.0` in any position, `exit_size = 0.0`. The zero-size exit order is submitted to the venue (`intended_size = 0`). On the fill side, `realized_pnl()` returns 0.0 (guarded by `exit_size <= 0.0`), and `slot.size` is unchanged. The slot stays in `EXIT_REQUESTED` with no means to advance — the leg is consumed but nothing happened. Subsequent exits may eventually handle this, but the zero-size leg is a wasted FSM transition that leaves the slot in a confusing intermediate state.
 								Also: `NaN` in `exit_leg_ratios` (from `clamp(0.0, 1.0)` not guarding NaN, though serde_json rejects NaN) would produce the same zero-size exit behavior.
 								**Severity: Medium**
 								### I3: `entry_price` inconsistency — Python uses falsy check, Rust uses `<= 0.0`
 								**File:** `contracts.py:88-98` (Python), `_rust_kernel/src/lib.rs:227-228` (Rust)
 								```python
 								# Python TradeSlot.mark_price():
 								self.entry_price = self.entry_price or price   # falsy — keeps -0.5, 0.0 replaced
 								# Rust TradeSlot::mark_price():
 								if self.entry_price <= 0.0 { self.entry_price = price; }  // catches -0.5, replaces it
 								```
 								If `entry_price` is negative (possible only via `set_slot_json` direct injection — not from normal trading), Python keeps it and computes `unrealized_pnl` with wrong sign. Rust replaces it. The Python-side `mark_price` is only called from `ExecutionKernel.mark_price()` in rust_backend.py:LOW-1, which never writes back to the Rust kernel — so the Python-side calculation is purely local and the inconsistency has no effect on the Rust kernel's canonical state. However, the `observe_slots` call after `mark_price` re-reads from the Rust kernel, which recomputes PnL correctly. The Python-side mark_price is effectively wasted computation that never feeds back.
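The difference between the two gates can be shown with a pair of pure functions. Both names are hypothetical; they model only the conditional, not the surrounding `mark_price` logic.

```python
def entry_price_python_style(entry_price: float, price: float) -> float:
    # Falsy check: only 0.0 (or None) is replaced; negatives are kept.
    return entry_price or price


def entry_price_rust_style(entry_price: float, price: float) -> float:
    # `<= 0.0` check: zero and negatives are both replaced.
    return price if entry_price <= 0.0 else entry_price
```

Both agree for zero and for positive values; they diverge only for a negative `entry_price`.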
 								**Severity: Informational**
 								### I4: No Rust unit tests for 99% of kernel functionality
 								**File:** `_rust_kernel/src/lib.rs:1731-1765`
 								Only 1 Rust test exists: `enter_then_ack_fill` — creates a 2-slot kernel, submits ENTER, sends ACK, asserts state transitions.
 								**Not tested in Rust:**
 								- EXIT, CANCEL, MARK_PRICE, RECONCILE, CONTROL actions
 								- Any FILL event (PARTIAL, FULL)
 								- CANCEL_ACK, CANCEL_REJECT, ORDER_REJECT
 								- RATE_LIMITED handling
 								- Multi-leg exits
 								- `consume_exit_leg` edge cases
 								- `realized_pnl()` formula with boundary values
 								- `mark_price()` with extreme values
 								- `resolve_slot()` fallback path
 								- `reconcile_slots_json` dedup/overflow
 								- Any C FFI boundary function
 								- Any serde deserialization failure
 								- Null pointer handling
 								No `#[cfg(test)]` module exists — the single test is inline. No Rust integration tests (`tests/` directory).
 								**Severity: High**
 								### I5: `MockVenueScenario` rejection flags exist but zero tests use them
 								**File:** `mock_venue.py:23-35`
 								```python
 								@dataclass
 								class MockVenueScenario:
 								    reject_entries: bool = False
 								    reject_exits: bool = False
 								    cancel_reject: bool = False
 								```
 								Three boolean flags to simulate venue rejection of orders. Not a single test in `test_flaws.py` sets any of them to `True`. The `ORDER_REJECT` handler in the Rust kernel's `on_venue_event` exists (lib.rs lines ~1440-1460) but is never exercised by any test.
 								Similarly, `entry_partial_fill_ratio` and `exit_partial_fill_ratio` exist on `MockVenueScenario` but only one test (`test_cancel_entry_with_partial_fill`) uses partial fills at all — and it only checks `size > 0`, not the full capital-accrual chain.
 								**Severity: High**
 								### I6: No LIMIT order test through the full kernel path
The test suite has zero LIMIT orders. The Rust kernel contains no reachable LIMIT-specific logic, so every order is effectively MARKET. The generated live tests have `limit_does_not_fill` and `limit_immediate_fill` scenario placeholders, but:
 								- `limit_does_not_fill` uses `reference_price=0.0` (not a real LIMIT order)
 								- `limit_immediate_fill` uses `target_size=-0.001` (negative size → clamped to 0.0)
 								Neither scenario actually submits a LIMIT order with `order_type="LIMIT"` and a non-zero `limit_price`. The `_legacy_intent` bug (H7) would convert any LIMIT attempt to MARKET anyway.
The only LIMIT-related code is the Rust kernel's `if intent.order_type == "LIMIT"` branches (lib.rs:503, 1584), which are dead code in practice: `KernelIntent` doesn't have an `order_type` field that serde would populate.
 								**Severity: High**
 								### I7: Three weak/vacuous assertions in `test_flaws.py`
 								**File:** `test_flaws.py`
1. **Line 512:** `assert order.metadata.get("asset") is not None or order.metadata.get("slot_id") is not None`. The mock venue always sets both, so this can never fail.
2. **Line 700:** `test_pnl_warning_on_unsettled_reentry` is titled to assert that a warning is raised but only checks `r.accepted`. It never checks `diagnostic_code` or verifies that the warning was issued.
3. **Line 318:** `assert slot.active_entry_order is None or slot.active_entry_order.status == VenueOrderStatus.FILLED`. The `or` allows two different scenarios to pass, reducing diagnostic power.
 								**Severity: Low**
### I8: Entry fill path (`slot.size = fill_size`) has no overfill guard
**File:** `_rust_kernel/src/lib.rs:798`
Already noted in I1: the entry fill sets `slot.size` directly to `fill_size`. Unlike the exit fill, which clamps with `(slot.size - fill_size).max(0.0)`, the entry path has no guard against overfill (the venue filling more than the intended order size). For MARKET orders this is fine (one fill per order), but for LIMIT orders with multiple partial fills, the accumulated fill could exceed `initial_size`.
 								**Severity: Low** (only relevant with LIMIT + partial fills, which don't exist in the codebase)
 								### I9: No crash durability — slot state is pure in-memory until step 7 of process_intent
 								**File:** `rust_backend.py:470-560`
 								The `process_intent` sequence:
1. validate → 2. Rust FSM → 3. venue.submit() → 4. on_venue_event() → 5. projection → 6. zinc_plane
 								If the process crashes between steps 2-5, the slot state accumulated in the Rust kernel's in-memory `KernelCore` is **completely lost**. The Rust kernel has no WAL, no journal, no persistent store. On restart, `ExecutionKernel.__init__` creates a fresh `KernelCore` with all slots IDLE.
 								The crash between step 3 and step 5 is the most dangerous: the exchange has an open order/position, but the kernel has no record of it. On restart:
- The Rust kernel sees `slot.fsm_state = IDLE`
 								- The Zinc slot cache may or may not have the pre-crash state (depends on timing)
 								- No code on restart loads Zinc state back into the Rust kernel (I14)
 								- The exchange order lives until it fills (unexpected position) or is manually cancelled
**Concrete example:** `venue.submit()` sends a POST to BingX and the order is placed. The HTTP response arrives, and `on_venue_event(ORDER_ACK)` transitions the slot to `ENTRY_WORKING`. The process crashes between returning from `on_venue_event` and `zinc_plane.write_slot()`. On restart the slot is IDLE, there is no active entry order, and `_last_settled_pnl` is reset, while the exchange still has a live working entry order. The next `process_intent(ENTER)` is not rejected with `SLOT_BUSY`: the fresh kernel doesn't know the order exists, sees the slot as IDLE, and allows a new ENTER. When the old order fills on the exchange, the result is a double position.
 								**Severity: Critical**
 								### I10: `seen_event_ids` lost on restart — events replayed after restart are double-processed
 								**File:** `_rust_kernel/src/lib.rs:672-683`
`seen_event_ids` is per-slot, per-`KernelCore` instance, purely in-process memory. On restart with a fresh `KernelCore`, every slot has `seen_event_ids = Vec::new()`. If events are replayed (from `pump_venue_events()` calling `venue.reconcile()`, which re-fetches exchange state):
1. Original run: order fills → `FULL_FILL` with `event_id = "EV-00000042"` → processed, slot → `POSITION_OPEN`
2. Crash
3. Restart: fresh `KernelCore`, `seen_event_ids` empty
4. `pump_venue_events()` fetches same exchange state → new `VenueEvent` objects with new event IDs (adapter's `_event_seq` resets)
5. Rust kernel sees these as novel events and processes them again
6. Position is double-booked, PnL double-settled
 								The `bingx_venue._event_seq` is an instance-level `itertools.count()` starting from 1. On adapter restart, it resets — so the new event IDs won't match the old ones anyway. Dedup is fundamentally impossible across restarts.
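A minimal sketch of one way to make event IDs reproducible across restarts: derive them from exchange-side facts instead of a process-local counter. The function name and its three inputs are assumptions; it presumes the adapter can see an exchange order ID and a cumulative filled quantity for each event.

```python
import hashlib


def stable_event_id(exchange_order_id: str, kind: str, cumulative_filled: str) -> str:
    """Derive an event ID that is identical for the same exchange-side fact.

    All three inputs are assumed to be available from the venue payload;
    cumulative_filled is passed as the exchange's own string to avoid
    float formatting differences.
    """
    raw = f"{exchange_order_id}|{kind}|{cumulative_filled}".encode("utf-8")
    return "EV-" + hashlib.sha256(raw).hexdigest()[:16]
```

Stable IDs are only a precondition. Deduplication across restarts would also require `seen_event_ids` (or an equivalent) to be persisted and reloaded.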
 								**Severity: Critical**
 								### I11: No idempotency key (`newClientOrderId`) sent to BingX
 								**File:** `bingx_venue.py:282-285`, `bingx_direct.py` (external)
 								BingX supports `newClientOrderId` for order idempotency — sending the same ID twice returns the original order status instead of creating a duplicate. The DITAv2 kernel passes `intent.intent_id` as `decision_id` to the legacy adapter, but there's no guarantee this maps to `newClientOrderId` in the BingX payload.
 								If the HTTP POST to `/trade/order` times out before the response is read:
1. The order was placed on the exchange
2. `_call_backend` raises a `BingxHttpError` (or similar network exception)
3. `process_intent()` propagates the exception, with no retry
4. Next cycle: caller may retry with a new `intent_id`
5. Second POST creates a **second order** on the exchange, producing a duplicate position
 								Without a client-order-id that persists across retries, the system can create duplicate orders on network timeouts. The exchange has no way to deduplicate.
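A minimal sketch of the idea, assuming the retry reuses the original `intent_id`: derive the client order ID deterministically from it, so a retried POST carries the same key. `client_order_id_for` and `FakeVenue` are hypothetical, and the ID format accepted by BingX is not verified here.

```python
import uuid


def client_order_id_for(intent_id: str) -> str:
    # uuid5 is deterministic: the same intent_id always yields the same ID.
    return uuid.uuid5(uuid.NAMESPACE_URL, f"dita-v2/intent/{intent_id}").hex


class FakeVenue:
    """Stand-in exchange that deduplicates on client order ID."""

    def __init__(self) -> None:
        self.orders: dict = {}

    def place(self, client_order_id: str, size: float) -> dict:
        # A repeated ID returns the existing order instead of creating one.
        return self.orders.setdefault(
            client_order_id, {"size": size, "status": "NEW"}
        )


venue = FakeVenue()
first = venue.place(client_order_id_for("INT-42"), 0.5)
retry = venue.place(client_order_id_for("INT-42"), 0.5)  # retry after timeout
```

The retry resolves to the same order object, so the fake venue holds one order rather than two.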
 								**Severity: High**
 								### I12: No graceful degradation for ANY subsystem
 								Every subsystem failure mode examined:
 								| Subsystem | Failure | Current behavior |
 								|-----------|---------|-----------------|
 								| Zinc SHM init | Corrupted region, OOM | Silent fallback to InMemoryZincPlane (no operator signal) |
 								| Zinc SHM write | Region overflow, write error | Unhandled exception → kernel crashes |
 								| Hazelcast write | Cluster unavailable | `.put()` raises → unhandled exception → kernel crashes |
 								| ClickHouse journal | Sink failure | Exception propagates (no try/except in callers) |
 								| BingX HTTP | Timeout, rate limit | Exception or REJECTED → slot stuck in ORDER_REQUESTED |
 								| Rust kernel | Null pointer from FFI | `_take_string` raises RuntimeError → kernel crash |
| Memory pressure | OOM | Process killed by the OS kernel. No signal handlers are registered. |
 								**No subsystem has a graceful degradation path.** No circuit breaker, no retry queue, no fallback to log-only mode, no offline/cached trading mode. Every failure (except the two init-time silent fallbacks) crashes the current kernel operation.
 								**Severity: High**
 								### I13: Stray venue event can reactivate a CLOSED slot — no guard
 								**File:** `_rust_kernel/src/lib.rs:625+`
 								The `on_venue_event` function has no guard for closed slots:
 								```rust
 								fn on_venue_event(&mut self, event: VenueEvent) -> KernelResult {
 								    // ... resolve slot, check duplicates ...
 								    // NO: if slot.closed { return ... }
 								    let prev_state = slot.fsm_state.clone();
 								    match event.kind {
 								        SOME_EVENT_KIND => { /* transitions regardless of closed state */ }
 								    }
 								}
 								```
 								If a stray venue event arrives for a CLOSED slot:
 								- `ORDER_ACK` → sets `ENTRY_WORKING` — slot re-opens from CLOSED
 								- `FULL_FILL` → `apply_fill` runs → `slot.size = fill_size`, `fsm_state = POSITION_OPEN`
 								- `ORDER_REJECT` → clears `trade_id`, `asset`, sets `IDLE` — actually benign reset
A CLOSED slot should be a terminal state that rejects all events. Currently `CANCEL_ACK` is harmless on a closed slot and `ORDER_REJECT` amounts to a benign reset; `ORDER_ACK` and fill events can revive a dead position.
 								**Severity: High**
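The missing guard in I13 can be modeled in a few lines of Python. This is a sketch of the intended behavior, not the Rust handler: `SlotModel`, the string states, and the `IGNORED_SLOT_CLOSED` diagnostic are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlotModel:
    fsm_state: str = "IDLE"
    closed: bool = False
    size: float = 0.0


def on_venue_event(slot: SlotModel, kind: str, fill_size: float = 0.0) -> str:
    if slot.closed:
        # Terminal state: drop the event before any transition runs.
        return "IGNORED_SLOT_CLOSED"
    if kind == "ORDER_ACK":
        slot.fsm_state = "ENTRY_WORKING"
    elif kind == "FULL_FILL":
        slot.size = fill_size
        slot.fsm_state = "POSITION_OPEN"
    return "OK"
```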
 								### I14: No `reconcile_from_slots` call on startup — Zinc state never loaded into Rust kernel
 								**Files:** `rust_backend.py:435-465` (init), `real_zinc_plane.py:95-115` (init)
 								On restart:
1. `RealZincPlane.__init__` reads state from Zinc shared memory into `_slot_cache`
2. `ExecutionKernel.__init__` creates fresh `KernelCore` (all slots IDLE)
3. `KernelStateView(self)` reads from the fresh kernel
4. `account.observe_slots([self._get_slot(i) for i in range(max_slots)])` (all slots IDLE)
Steps 3 and 4 read from the Rust kernel, NOT from Zinc. The Zinc `_slot_cache` populated in step 1 is **never loaded into the Rust kernel**. The `reconcile_on_restart` flag exists in `KernelControlSnapshot` (default `True`) but is never checked anywhere in `ExecutionKernel.__init__` or the launcher.
 								The system always starts with a blank state even when durable shared memory state exists.
 								**Severity: High**
 								### I15: CANCEL_REJECT doesn't clear `active_exit_order` — slot stuck in EXIT_WORKING
 								**File:** `_rust_kernel/src/lib.rs:1165-1175`
 								```rust
 								KernelEventKind::CANCEL_REJECT => {
 								    if slot.fsm_state == TradeStage::EXIT_WORKING {
 								        // stays EXIT_WORKING — no state transition
 								        // active_exit_order remains attached
 								    }
 								    diagnostic_code = KernelDiagnosticCode::CANCEL_REJECTED;
 								}
 								```
 								When the exchange rejects a cancel (typically because the order was already filled or no longer exists), the slot stays in `EXIT_WORKING` with `active_exit_order` still attached. Every subsequent CANCEL attempt hits the same path — the exchange returns "order not found," the kernel sees `CANCEL_REJECT`, and the slot is stuck forever.
 								If the order was already filled (CANCEL_REJECT means "can't cancel, no longer open"), the slot should check the actual position size and potentially transition to `POSITION_OPEN` or `CLOSED` depending on fill status.
 								**Severity: Medium**
 								### I16: Zinc shared memory — world-readable/writable by same-machine processes
 								**Files:** `real_control_plane.py`, `real_zinc_plane.py`
 								The Zinc shared memory regions are created with these names:
 								```python
 								self.region_name = f"{base}_intent"       # e.g., "dita_v2_intent"
 								self.state_name = f"{base}_state"          # "dita_v2_state"
 								self.control_name = f"{base}_control"      # "dita_v2_control"
 								```
Region names are predictable (prefix defaults to `"dita_v2"`). The `SharedRegion` uses POSIX `shm_open`; the effective permissions depend on umask (typically `0644` or `0600`). Depending on the resulting mode, any process running as the same user (and, for reads under `0644`, any local user) can:
 								- **Read**: Open the region → `as_buffer()` → `_decode_packet()` → read all slot state, PnL, open orders, control settings
 								- **Write**: Open the region → forge a packet (`struct.pack("!QQ", seq, len) + json_bytes`) → overwrite slot state, inject fake intents, modify control plane
 								No access control, no encryption, no integrity check (HMAC/signature) on the wire format. The sequence number is the only ordering mechanism, and it's trivially predictable.
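A minimal sketch of an integrity check on the same `!QQ` framing, appending an HMAC-SHA256 tag over header and payload. The function names and the shared `key` are assumptions; how the key would be provisioned is not addressed, and a process that can read the key can still forge packets.

```python
import hashlib
import hmac
import json
import struct

HEADER = struct.Struct("!QQ")  # sequence number, payload size
TAG_LEN = hashlib.sha256().digest_size


def encode_signed(seq: int, payload: dict, key: bytes) -> bytes:
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    header = HEADER.pack(seq, len(body))
    tag = hmac.new(key, header + body, hashlib.sha256).digest()
    return header + body + tag


def decode_signed(buf: bytes, key: bytes) -> dict:
    """Return the payload, or {} if framing or the HMAC tag is invalid."""
    if len(buf) < HEADER.size + TAG_LEN:
        return {}
    _seq, size = HEADER.unpack_from(buf, 0)
    end = HEADER.size + size
    if size == 0 or end + TAG_LEN > len(buf):
        return {}
    expected = hmac.new(key, bytes(buf[:end]), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, bytes(buf[end : end + TAG_LEN])):
        return {}  # forged or corrupted packet
    return json.loads(bytes(buf[HEADER.size : end]).decode("utf-8"))
```

Because the tag covers the header, a forged sequence number is rejected along with a forged payload.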
 								**Severity: High**
 								### I17: `KernelSlotView` exposes full slot state via unrestricted `__getattr__`/`__setattr__`
 								**File:** `rust_backend.py:411-460`
 								```python
 								class KernelSlotView:
 								    def __getattr__(self, name):
 								        slot = self._snapshot()
 								        return getattr(slot, name)         # read ANY field
 								    def __setattr__(self, name, value):
 								        setattr(slot, name, value)
 								        self._kernel._set_slot(slot)       # write ANY field — bypasses FSM
 								```
 								Any code with a `KernelSlotView` reference can:
 								- Read all slot fields: `trade_id`, `size`, `entry_price`, `unrealized_pnl`, `realized_pnl`, `seen_event_ids`, `metadata`
 								- Write all slot fields: `slot_view.realized_pnl = -9999999` — directly manipulates PnL figures flowing into capital settlement
 								The `_set_slot` call writes through to the Rust kernel without any FSM validation. The entire kernel state is exposed through mutable Python objects with zero access control.
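A minimal sketch of a read-only alternative, assuming callers only need to observe slot fields. `ReadOnlySlotView` and its `snapshot_fn` parameter are hypothetical; mutations would have to go through an explicit, FSM-validated kernel call instead.

```python
from types import SimpleNamespace


class ReadOnlySlotView:
    """Expose slot fields for reading; reject every attribute write."""

    def __init__(self, snapshot_fn) -> None:
        # Bypass our own __setattr__ to store the snapshot callable.
        object.__setattr__(self, "_snapshot_fn", snapshot_fn)

    def __getattr__(self, name):
        # Only called for names not found on the view itself.
        return getattr(self._snapshot_fn(), name)

    def __setattr__(self, name, value):
        raise AttributeError(f"slot view is read-only; cannot set {name!r}")


# Usage with a stand-in snapshot object:
state = SimpleNamespace(size=0.5, realized_pnl=1.25)
view = ReadOnlySlotView(lambda: state)
```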
 								**Severity: High**
### I18: `sys.path.insert(0, ...)` at import time in two production files and three test/generator files
 								**Files:** `real_control_plane.py:14`, `real_zinc_plane.py:22`, `test_flaws.py:13`, `_build_pink_bodies.py:2`, `_gen_test.py:3`
 								```python
 								# real_control_plane.py, real_zinc_plane.py — at MODULE LEVEL:
 								sys.path.insert(0, str(_ZINC_ADAPTER_PATH))
 								# test_flaws.py, _build_pink_bodies.py, _gen_test.py — at MODULE LEVEL:
 								sys.path.insert(0, '/mnt/dolphinng5_predict')
 								```
 								`sys.path.insert(0, ...)` gives the injected path highest import priority. An attacker with filesystem write access to the inserted path can create a malicious module that shadows a legitimate import (e.g., `zinc.py`, `utils.py`, `typing.py`). When any subsequent `from X import Y` runs, the attacker's module loads with the full privileges of the kernel process.
 								The production files use a relative path resolution (`Path(__file__).resolve().parents[3] / "zinc" / "adapters" / "python"`), while the test files use a hardcoded absolute path (`'/mnt/dolphinng5_predict'`). Both patterns are dangerous.
 								**Severity: High**
 								### I19: `pump_venue_events` re-fetches exchange state that can produce phantom position events
 								**File:** `bingx_venue.py:395-415`
`reconcile()` calls `_backend_snapshot()`, which fetches current positions and open orders from the exchange. The `_events_from_snapshot` method diffs the current snapshot against the last-known snapshot to produce events:
 								```python
 								def _events_from_snapshot(self, before, after):
 								    for symbol, current_pos in after.open_positions.items():
 								        prev_pos = before.open_positions.get(symbol)
 								        if current_pos and (not prev_pos or abs(prev_pos.position_amount) < 1e-12):
 								            # This looks like a new position — emit event
 								```
 								If `before` is stale (from `_backend_snapshot` timeout), the diff can produce spurious events. A position that existed before the crash is absent from the stale snapshot → the diff sees it as "new" → emits an entry fill event → Rust kernel processes it as a fresh enter → double position. This compounds with I10 (seen_event_ids lost on restart).
 								**Severity: High**
 								### I20: `exit_leg_ratios` no guard against empty list — `next_exit_ratio` returns 1.0
 								**File:** `contracts.py:196-198`
 								```python
 								def next_exit_ratio(self) -> float:
 								    if self.active_leg_index < len(self.exit_leg_ratios):
 								        return self.exit_leg_ratios[self.active_leg_index]
 								    return 1.0
 								```
If `exit_leg_ratios` is empty (the dataclass default `(1.0,)` normally prevents this, but only that default guards against it), `next_exit_ratio()` returns `1.0`, which means "exit everything". On the Rust side, `consume_exit_leg` then advances `active_leg_index`, and `all_legs_done = active_leg_index >= exit_leg_ratios.len()` evaluates to true for an empty list, so the slot closes. The empty-ratios edge case is silently handled there with `unwrap_or(1.0)`, which happens to be correct but is undocumented.
 								**Severity: Informational**
 								### I21: No test for rate-limited events — `RATE_LIMITED` kernel path is dead code
**File:** `_rust_kernel/src/lib.rs` (event handler), `mock_venue.py` (`MockVenueScenario` has no rate_limit flag)
 								The Rust kernel has a handler for `KernelEventKind::RATE_LIMITED` (lib.rs lines ~1480-1500). The event flows through the Python bridge's `process_intent()` rate-limit detection (rust_backend.py:585-593). But `MockVenueScenario` has no flag to emit rate-limited events. The only path to trigger `RATE_LIMITED` is from the real BingX adapter — which requires live exchange connectivity.
 								The entire RATE_LIMITED code path — in both Python and Rust — is untested in CI. Any bug in this path only surfaces in production under rate-limit conditions.
 								**Severity: Medium**
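The gap could be closed with a mock-side flag. The sketch below shows the shape of such a flag and a test that exercises it; `FakeVenue`, `FakeEvent`, and `rate_limit_next` are hypothetical names, not the real `MockVenueScenario` API, and a real test would feed the returned events into the kernel rather than inspect them directly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeEvent:
    kind: str       # e.g. "ACK" or "RATE_LIMITED"
    order_id: str


@dataclass
class FakeVenue:
    """Hypothetical mock venue with a one-shot rate-limit flag."""
    rate_limit_next: bool = False

    def submit(self, order_id: str) -> List[FakeEvent]:
        if self.rate_limit_next:
            self.rate_limit_next = False  # only the next submit is throttled
            return [FakeEvent("RATE_LIMITED", order_id)]
        return [FakeEvent("ACK", order_id)]


def test_rate_limited_then_ack() -> None:
    venue = FakeVenue(rate_limit_next=True)
    first = venue.submit("o-1")
    second = venue.submit("o-1")  # retry after the throttled attempt
    assert [e.kind for e in first] == ["RATE_LIMITED"]
    assert [e.kind for e in second] == ["ACK"]


test_rate_limited_then_ack()
```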
 								### I22: Thread pool for `_run` — `max_workers=3` shared across ALL adapter instances
 								**File:** `bingx_venue.py:236-245`
 								```python
 								@classmethod
 								def _get_executor(cls):
 								    if cls._EXECUTOR is None:
 								        with cls._EXECUTOR_LOCK:
 								            if cls._EXECUTOR is None:
 								                cls._EXECUTOR = ThreadPoolExecutor(max_workers=3, ...)
 								    return cls._EXECUTOR
 								```
 								Class-level singleton: all `BingxVenueAdapter` instances share the same 3-thread pool. When the runtime's `step()` calls `submit()` (1 thread), `_backend_snapshot` (potentially another thread for open orders), and `cancel()` (1 thread in parallel), all 3 threads are consumed. A fourth concurrent call is queued, and its caller blocks at `.result()` until a worker frees up. If the in-flight calls hang, that wait is indefinite and freezes the entire event loop.
 								The pool is never shut down. If a `BingxVenueAdapter` is destroyed, the threads remain running (zombie workers). No `close()`/`disconnect()` path shuts down the executor.
 								**Severity: Medium**
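A minimal sketch of a shared pool with an explicit shutdown path and a bounded wait, under stated assumptions: `SharedPool`, `get_executor`, and `shutdown` are hypothetical names, and `thread_name_prefix="venue"` is an illustrative argument, not the one elided in the excerpt above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class SharedPool:
    """Hypothetical holder for a class-level executor with an explicit shutdown."""
    _EXECUTOR: Optional[ThreadPoolExecutor] = None
    _EXECUTOR_LOCK = threading.Lock()

    @classmethod
    def get_executor(cls) -> ThreadPoolExecutor:
        # Take the lock unconditionally; simpler than double-checked locking.
        with cls._EXECUTOR_LOCK:
            if cls._EXECUTOR is None:
                cls._EXECUTOR = ThreadPoolExecutor(
                    max_workers=3, thread_name_prefix="venue"
                )
            return cls._EXECUTOR

    @classmethod
    def shutdown(cls) -> None:
        """Stop the workers; a later get_executor() call builds a new pool."""
        with cls._EXECUTOR_LOCK:
            executor, cls._EXECUTOR = cls._EXECUTOR, None
        if executor is not None:
            executor.shutdown(wait=True)


future = SharedPool.get_executor().submit(lambda: 21 * 2)
result = future.result(timeout=5.0)  # bounded wait instead of a bare .result()
SharedPool.shutdown()
print(result)
```

Passing `timeout=` to `Future.result()` raises `concurrent.futures.TimeoutError` instead of blocking forever, which gives the caller a chance to degrade rather than freeze. Whether a per-adapter pool or a shared one is preferable depends on how many adapters run concurrently.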
 								---
 								## Pass 6 Summary
 								| # | Flaw | Layer | Severity |
 								|---|------|-------|----------|
 								| I1 | Entry `apply_fill` multiple partial fills overwrite size instead of accumulating | Rust | **Critical** |
 								| I2 | Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED | Rust | Medium |
 								| I3 | entry_price inconsistency — Python falsy vs Rust `<= 0.0` gate | Bridge | Info |
 								| I4 | Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer | Rust | **High** |
 								| I5 | MockVenueScenario rejection flags exist but zero tests use them | Test | **High** |
 								| I6 | No LIMIT order test through full kernel path | Test | **High** |
 								| I7 | Three weak/vacuous assertions in test_flaws.py | Test | Low |
 								| I8 | Entry overfill no guard | Rust | Low |
 								| I9 | No crash durability — slot state pure in-memory until step 7 of process_intent | Bridge | **Critical** |
 								| I10 | seen_event_ids lost on restart — events double-processed | Rust | **Critical** |
 								| I11 | No idempotency key sent to BingX — lost response creates duplicate orders | Venue | **High** |
 								| I12 | No graceful degradation for ANY subsystem | All | **High** |
 								| I13 | Stray venue event can reactivate CLOSED slot — no guard | Rust | **High** |
 								| I14 | No reconcile_from_slots call on startup — Zinc state never loaded into kernel | Restart | **High** |
 								| I15 | CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING | Rust | Medium |
 								| I16 | Zinc shared memory world-readable/writable by same-machine processes | Zinc | **High** |
 								| I17 | KernelSlotView unrestricted getattr/setattr — bypasses all FSM guards | Bridge | **High** |
 								| I18 | sys.path.insert(0) at import time in 3 production files — malicious module loading | Build | **High** |
 								| I19 | pump_venue_events stale snapshot diff produces phantom position events | Venue | **High** |
 								| I20 | exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented) | Contracts | Info |
 								| I21 | RATE_LIMITED code path in both Python and Rust is completely untested | All | Medium |
 								| I22 | Thread pool max_workers=3 shared across all adapter instances — never shut down | Venue | Medium |
 								### Pass 6 Severity Distribution
 								| Severity | Count |
 								|----------|-------|
 								| **Critical** | 3 (I1, I9, I10) |
 								| **High** | 11 (I4, I5, I6, I11, I12, I13, I14, I16, I17, I18, I19) |
 								| Medium | 4 (I2, I15, I21, I22) |
 								| Low | 2 (I7, I8) |
 								| Info | 2 (I3, I20) |
 								### Combined Catalog (All 6 Passes)
 								| Pass | Focus | Count | Critical | High | Medium | Low | Info |
 								|------|-------|-------|----------|------|--------|-----|------|
 								| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
 								| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
 								| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
 								| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
 								| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
 								| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
 								| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
 								| **Total** | | **160** | **11** | **41** | **41** | **46** | **21** |