siloqy/PINK_DITAv2_E2E_TRACE_ANALYSIS.md

# PINK DITAv2 — End-to-End Trace & Flaw Analysis

**Analysis date:** 2026-05-31
**Method:** Full-trace static analysis — every file, every data path, every
boundary crossing in the PINK execution pipeline. No test execution.
**System scope:** 34 active source files, ~12,000 lines across Rust kernel,
Python bridge, venue adapter, runtime, and persistence.

> **Central flaw registry:** [PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md](./PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md)
> contains the combined catalog of all 116 flaws (A, T, E, F, G series) with
> severity distribution and cross-references. This file provides the deep E2E
> trace context — read the central registry for the master list.

---

## E2E Data Flow (One Call)

Every E2E path in the PINK system traces through this sequence. Each numbered
step below is a site where data crosses a module boundary and can be lost,
mangled, or misinterpreted.

```
PinkDirectRuntime.step()                    # R1: policy cycle entry
  ├─ pump_venue_events()                    # R2: drain async fills
  ├─ kernel.snapshot()["account"]           # R3: read capital
  ├─ kernel.slot(0)                         # R4: read slot state
  ├─ decision_engine.decide()               # R5: policy-layer ENTER/EXIT
  ├─ intent_engine.plan()                   # R6: intent sizing
  ├─ _decision_to_kernel_intent()           # R7: Decision → KernelIntent
  ├─ kernel.process_intent(kernel_intent)   # R8: KERNEL BOUNDARY
  │   ├─ rust_backend._intent_to_payload()  # R8a: KernelIntent → JSON
  │   ├─ _RustKernelLib.process_intent()    # R8b: JSON → C FFI
  │   │   └─ Rust process_intent()          # R8c: FSM mutates TradeSlot
  │   ├─ venue.submit(intent)               # R9: VENUE BOUNDARY
  │   │   ├─ bingx_venue._legacy_intent()   # R9a: KernelIntent → LegacyIntent
  │   │   ├─ BingxDirectExecutionAdapter    # R9b: HTTP POST /trade/order
  │   │   │   .submit_intent()
  │   │   └─ bingx_venue._events_from_submit() # R9c: receipt → VenueEvent[]
  │   └─ on_venue_event(event)              # R10: FEEDBACK BOUNDARY
  │       ├─ _RustKernelLib → Rust FSM      # R10a: C FFI → FSM transition
  │       ├─ account.settle(delta)          # R10b: incremental PnL settlement
  │       └─ persistence writes             # R10c: ClickHouse / Zinc / HZ
  ├─ kernel.snapshot()["account"]           # R11: read final capital
  └─ persistence.persist_step()             # R12: PERSISTENCE BOUNDARY
```

---

## Layer 1: Policy Cycle Entry (pink_direct.py:422)

### E1: `step()` calls `pump_venue_events()` every cycle unconditionally

**pink_direct.py:436**
```python
await self.pump_venue_events(snapshot, market_state=market_state)
```

This is called **before** reading slot/account state for the policy decision.
The pump calls `venue.reconcile()`, which for `BingxVenueAdapter` issues up to
5 HTTP requests (balance, positions, open orders, plus history if
`include_history` is set).

For MARKET-only workflows, no resting orders exist, so `reconcile()` returns
empty events every time. But the HTTP calls still happen. On BingX VST with a
~10 req/s limit and a 5s policy cycle, this burns up to 1 req/s just to learn
"nothing changed." Add the actual trade HTTP calls, and the budget is tight.

**Flaw: E1 — unconditional exchange poll wastes rate limit.**
Already documented as A10, but worse when traced E2E: each `pump_venue_events`
calls `venue.reconcile()` → `_backend_snapshot()` → parallel `asyncio.gather`
of 3 HTTP GETs. The `_refresh_exchange_state` at bingx_direct.py:281-352
always fetches balance + positions + openOrders concurrently. Even when
`include_history=False` (which it is for the pump), that's 3 HTTP calls
every policy cycle regardless of whether any orders are resting.

**Severity: Medium.** Wasteful but not destructive on testnet.
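
The polling cost can be put in numbers with a small calculation. This is only a sketch: `poll_rate` and `RATE_LIMIT` are illustrative names, and the inputs are the figures quoted in this section (3 calls per cycle without history, up to 5 with it, a 5s cycle, and a limit of roughly 10 req/s).

```python
def poll_rate(calls_per_cycle: int, cycle_seconds: float) -> float:
    """Average requests per second spent on reconcile polling."""
    return calls_per_cycle / cycle_seconds


RATE_LIMIT = 10.0  # req/s, the approximate VST limit quoted in this section

for calls in (3, 5):
    rate = poll_rate(calls, 5.0)
    share = rate / RATE_LIMIT
    print(f"{calls} calls/cycle -> {rate:.1f} req/s ({share:.0%} of the limit)")
```

Under these assumptions the idle poll alone consumes between 6% and 10% of the request budget before any order traffic.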

### E2: `kernel.snapshot()["account"]` returns a fresh dict, not a live view

**pink_direct.py:437**
```python
acc = self.kernel.snapshot()["account"]
```

`ExecutionKernel.snapshot()` at rust_backend.py:740-752 builds a dict from
kernel state at call time. The decision/intent engines then consume this
snapshot. Between the snapshot and `process_intent()` (line 523), another
caller (or the same runtime in a concurrent cycle) could advance the kernel
state, making the decision based on stale capital.

**Flaw: E2 — TOCTOU between capital snapshot and intent execution.**
The `context.capital` read at line 437 is used at line 523 for the ENTER
safety guard (`_unsafe_entry_reason`) and possibly by the decision/intent
engines. If capital changes between these two points (e.g. an async fill
arrives via a concurrent test-HTTP path), the guard uses stale capital.

**Severity: Low** in single-threaded deployment. Critical under concurrency.

---

## Layer 2: Decision/Intent Bridging (pink_direct.py:79-115)

### E3: `_decision_to_kernel_intent` drops `order_type` and `limit_price`

**pink_direct.py:79-115**
```python
def _decision_to_kernel_intent(decision, intent, slot_id=0):
    return KernelIntent(
        ...
        # order_type and limit_price are NOT SET here
    )
```

`KernelIntent` has `order_type="MARKET"` and `limit_price=0.0` as defaults,
so MARKET orders work correctly. But the runtime **never** sets these fields
from the policy layer. If `decision` or `intent` ever carries `order_type`
or `limit_price`, it's silently dropped because the bridge doesn't map them.

**Flaw: E3 — LIMIT support in runtime is dead code.**
The `order_type`/`limit_price` fields in `KernelIntent` and the LIMIT payload
building in `bingx_direct.py` lines 384-398 are unreachable from the runtime.
The only path that can set them is direct `KernelIntent(...)` construction
in tests (`_build_pink_bodies.py` style scenarios). The `_decision_to_kernel_intent`
bridge must be patched when a policy engine needs to emit LIMIT orders.

**Severity: Medium.** Blocks any production path to LIMIT orders.

### E4: `_exit_intent_from_slot` trusts slot.size but slot may be stale

**pink_direct.py:398-420**
```python
def _exit_intent_from_slot(self, kernel_intent):
    try:
        slot_size = float(self.kernel.slot(int(kernel_intent.slot_id)).size or 0.0)
    except Exception:
        slot_size = 0.0
    ...
    exit_size = min(policy_size, slot_size) if policy_ok else slot_size
```

Reads `slot.size` fresh from the Rust kernel at call time, then uses it to
cap the exit size. Between this read and the `process_intent` call that
actually executes the EXIT (line 523), the slot can be modified by
`pump_venue_events` (line 436) or a concurrent cycle. If a partial fill
arrived between the slot read and the EXIT, the exit size could be wrong.

**Flaw: E4 — TOCTOU between exit sizing and exit execution.**
Same class as E2 but for exit size rather than capital. If the pump drained
a partial fill between R4 (slot read) and R8 (process_intent), the EXIT
requests a size based on pre-pump remaining size. The kernel caps it at
actual remaining, so this is self-correcting — but the intent payload has
wrong metadata.

**Severity: Low.** Self-correcting at kernel level.

---

## Layer 3: Kernel Bridge — Rust FSM Entry (rust_backend.py)

### E5: JSON serialization round-trip loses numeric precision

**rust_backend.py:460-485 (`_intent_to_payload`)**

`KernelIntent` fields like `reference_price`, `target_size`, `leverage` are
Python floats. They're serialized to JSON text, sent through C FFI, parsed
by serde_json into Rust `f64`, then serialized back to JSON, parsed by Python
`json.loads()`. Each serialization step can introduce precision loss:

```python
# Python float → JSON: 0.1 → "0.1" → Rust f64: 0.10000000000000000555
# Rust f64 → JSON: → serde_json may print "0.10000000000000001"
# Python json.loads → 0.10000000000000001
```

For prices (TRXUSDT at ~$0.08), a 1e-16 relative error is negligible. For
PnL accumulation over thousands of trades at 9x leverage, the error can grow
to cents or dollars. The `|Δcapital − realized| < 1e-9` assertion in tests
would catch gross errors but not sub-cent accumulation.

**Flaw: E5 — JSON serialization precision drift over long runs.**
**Severity: Low.** Not a practical concern for the current deployment scale.
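
As a sanity check on the Python half of this round trip, the sketch below verifies that `json.dumps` followed by `json.loads` returns the identical float, and that the two decimal spellings in the comment block above denote the same double. It says nothing about the Rust half: whether `serde_json` parses every float to the nearest `f64` depends on how the crate is built (for example its `float_roundtrip` feature), which this trace did not verify. The sample values are illustrative.

```python
import json
import random


def roundtrips(x: float) -> bool:
    """True if x survives Python's json.dumps -> json.loads unchanged."""
    return json.loads(json.dumps(x)) == x


# "0.1" and "0.10000000000000001" are two spellings of the same double.
same_double = float("0.1") == float("0.10000000000000001")

rng = random.Random(0)  # fixed seed so the check is repeatable
samples = [rng.uniform(0.0, 1.0) for _ in range(10_000)] + [0.1, 0.0834, 9.0]
all_exact = all(roundtrips(x) for x in samples)

print(same_double, all_exact)
```

If the Rust side also parses to the nearest double, the full round trip is lossless and any remaining drift comes from floating-point arithmetic on PnL, not from serialization.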

### E6: `_RustKernelLib` is a global singleton — shared across all kernels

**rust_backend.py:40-45**
```python
_RUST: _RustKernelLib | None = None

def _get_rust() -> _RustKernelLib:
    global _RUST
    if _RUST is None:
        _RUST = _RustKernelLib()
    return _RUST
```

The `_RustKernelLib` singleton loads the `.so` shared library once and
provides FFI functions. Each `ExecutionKernel` instance gets its own
`KernelHandle` via `_get_rust().create(max_slots)`. The FFI functions take
the handle as the first argument, so multiple kernels are isolated at the
Rust level.

**However**, the singleton means ALL kernels share the same ctypes function
pointer table. If a second kernel is created and the first is destroyed,
`KernelHandle` of the first becomes a dangling pointer. Calling any FFI
function on the destroyed kernel's handle is use-after-free.

**Flaw: E6 — No protection against use-after-free on kernel destroy.**
Already documented as T7. Worth re-emphasizing in the E2E trace because the
test infrastructure creates and destroys kernels frequently (fresh-kernel
reconcile tests, each `_build_rb()` call in scenario wrappers).

**Severity: High.** Use-after-free in C FFI is memory corruption.

---

## Layer 4: Rust Kernel FSM (lib.rs:728)

### E7: ENTER handler silently allows re-entry with same trade_id

**lib.rs:740-745**
```rust
if !slot.is_free() && !slot.trade_id.is_empty() && slot.trade_id != intent.trade_id {
    return SLOT_BUSY;
}
```

If `slot.trade_id == intent.trade_id`, the ENTER is accepted even if the
slot is not free (e.g., POSITION_OPEN with an active position). This is by
design — it lets the same trade_id re-enter after the slot was partially
reconciled or restored from a snapshot. But it also means:

1. EXIT sets `slot.closed=true` and transitions to `CLOSED`
2. A new ENTER with the **same** trade_id re-enters the CLOSED slot
3. The slot resets `slot.closed=false`, `slot.size=0.0`, `slot.initial_size=0.0`
4. Kernel now thinks the trade is new, but the Rust indexes still have the
   old trade_id pointing to slot 0

**Downstream effect:** After a re-entry with the same trade_id, the
`active_trade_index[trade_id]` still correctly points to slot 0. But the
old `VenueOrder` in `client_order_index` and `venue_order_index` is still
present until the new entry fills and creates new orders. A reconcile event
addressed to the old `venue_client_id` could stomp on the new trade.

**Flaw: E7 — Re-entry with same trade_id leaves stale index entries.**
**Severity: Low.** The `rebuild_indexes()` call in `commit_slot()` rebuilds
from scratch, so stale entries are cleared on the first write.

### E8: EXIT handler uses `initial_size` not `current size`

**lib.rs:770-775**
```rust
let exit_ratio = slot.next_exit_ratio();
let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
let exit_size = (base_size * exit_ratio).max(0.0);
```

Already documented as A1. In the E2E trace, this is the single most impactful
execution flaw. A concrete scenario:

1. Enter with `size=1.0`, `initial_size=1.0`, `exit_leg_ratios=(0.5, 0.5, 1.0)`.
2. EXIT leg 0: requests `1.0 * 0.5 = 0.5`. Slot size goes to 0.5.
3. EXIT leg 1: requests `1.0 * 0.5 = 0.5`. Slot size goes to 0.0.
   `active_leg_index` advances to 2. Because `exit_leg_ratios.len()` is 3,
   `all_legs_done = (2 >= 3) = false`, so the slot stays at `POSITION_OPEN`
   with `size=0.0` and `closed=false`.
4. EXIT leg 2 (ratio 1.0): would request `1.0 * 1.0 = 1.0`, but the slot is
   already at 0.0. `slot.is_free()` is false because it checks the FSM state
   (`POSITION_OPEN` is not in `{IDLE, CLOSED}`), not the size, so the ENTER
   guard `!slot.is_free()` blocks re-entry. The EXIT guard
   `slot.is_free() || slot.closed || size <= 0.0` fires on `size <= 0.0` and
   returns `NO_OPEN_POSITION`.
5. **Slot is stuck forever.** No operation can advance it.

**Severity: High.** Concrete, reproducible, and not caught by any test.
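
The scenario above can be replayed with a toy model. This is a sketch of the behavior as described in this section, not a port of `lib.rs`: `SlotModel` is a hypothetical stand-in, every leg is assumed to fill fully and immediately, and `enter()` models an ENTER carrying a new trade_id.

```python
from dataclasses import dataclass


@dataclass
class SlotModel:
    """Toy stand-in for TradeSlot with only the fields the scenario needs."""
    size: float
    initial_size: float
    ratios: tuple
    leg: int = 0
    state: str = "POSITION_OPEN"
    closed: bool = False

    def is_free(self) -> bool:
        # Looks at FSM state only, never at size.
        return self.state in {"IDLE", "CLOSED"}

    def exit(self) -> str:
        # EXIT guard as described in step 4.
        if self.is_free() or self.closed or self.size <= 0.0:
            return "NO_OPEN_POSITION"
        base = self.initial_size if self.initial_size > 0.0 else self.size
        requested = max(base * self.ratios[self.leg], 0.0)
        self.size -= min(requested, self.size)  # capped at remaining size
        self.leg += 1
        if self.leg >= len(self.ratios):  # all_legs_done
            self.state, self.closed = "CLOSED", True
        return "ACCEPTED"

    def enter(self) -> str:
        # ENTER guard for a new trade_id.
        return "ACCEPTED" if self.is_free() else "SLOT_BUSY"


slot = SlotModel(size=1.0, initial_size=1.0, ratios=(0.5, 0.5, 1.0))
results = [slot.exit() for _ in range(3)]
print(results, slot.state, slot.size, slot.enter())
```

In the model, the first two exits are accepted, the third is rejected with `NO_OPEN_POSITION`, and the slot remains `POSITION_OPEN` at size 0.0 while rejecting new entries.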

### E9: CANCEL handler returns diagnostic even when nothing happened

**lib.rs:795-810**
```rust
if matches!(intent.action, KernelCommandType::CANCEL) {
    let has_cancellable_exit = slot.active_exit_order.is_some();
    let has_cancellable_entry = slot.active_entry_order.is_some()
        && matches!(slot.fsm_state, ENTRY_WORKING | ORDER_REQUESTED | ORDER_SENT | IDLE);
    if !has_cancellable_exit && !has_cancellable_entry {
        return KernelResult {
            outcome: KernelOutcome {
                accepted: false,
                diagnostic_code: NO_ACTIVE_EXIT_ORDER,
                ...
            },
            ...
        };
    }
    return KernelResult {
        outcome: KernelOutcome {
            accepted: true,
            ...
        },
        ...
    };
}
```

Two issues:
1. When **neither** is cancellable, the diagnostic is `NO_ACTIVE_EXIT_ORDER`
   even if the actual reason is "no active entry order either" or "slot is
   already IDLE". The diagnostic is misleading.
2. When at least one IS cancellable, the Rust kernel returns `accepted=true`
   but does **not** mutate the slot at all — it returns immediately with the
   slot as-is. The actual cancel (HTTP call + FSM transition) happens in the
   Python bridge. The Rust kernel's "accept" just means "yes you may try to
   cancel this" — not "the cancel is complete."

This disconnect means: if the Python bridge's `venue.cancel()` fails (HTTP
error), the Rust kernel has already returned `accepted=true` for a cancel
that never happened. The caller sees `accepted=true` but the slot state
hasn't changed.

**Flaw: E9 — Rust CANCEL "accepts" before Python actually cancels.**
**Severity: Medium.** The `outcome.accepted` boolean is misleading for CANCEL.

### E10: `apply_fill` entry branch double-sets `active_entry_order`

**lib.rs:1330-1390**
```rust
// First set — at the top of the entry branch:
slot.active_entry_order = Some(VenueOrder {
    ...
    filled_size: fill_size,
    status: if partial { PARTIALLY_FILLED } else { FILLED },
    ...
});

// ... then later for full fill:
if !partial {
    slot.fsm_state = TradeStage::POSITION_OPEN;
    slot.active_entry_order = Some(VenueOrder {  // SECOND SET
        ...
        filled_size: slot.size,    // uses updated slot.size
        ...
    });
}
```

The entry branch sets `active_entry_order` at the top with `filled_size` from
the event, then for a FULL_FILL, sets it again with `filled_size = slot.size`
(which may have been updated by `slot.initial_size = fill_size` above). The
first VenueOrder's `intended_size` is from the event, the second uses
`slot.size`. Both are correct in isolation, but the double-write is wasteful.

More importantly, for a PARTIAL_FILL entry, the first set is the ONLY set.
If a second PARTIAL_FILL arrives for the same order, the entry branch at
line 1334 checks `slot.active_entry_order.is_some()` which is true (set by
the first partial), but the FSM state is `ENTRY_WORKING` (also set by first
partial). The condition at line 1334-1338 matches `ENTRY_WORKING`, so the
second partial enters the entry branch again. But `fill_size` is the event's
`filled_size` — the **total** filled, not the incremental amount.

**Flaw: E10 — Second PARTIAL_FILL on entry overwrites, doesn't accumulate.**
```rust
let fill_size = if event.filled_size > 0.0 {
    event.filled_size      // ← TOTAL filled, not incremental
} else {
    event.size
}.max(0.0);

slot.active_entry_order = Some(VenueOrder {
    ...
    filled_size: fill_size,  // ← overwrites previous filled_size
    ...
});

slot.initial_size = slot.initial_size.max(fill_size);  // ← OK, uses max
slot.size = fill_size;  // ← OVERWRITES previous size with total
```

On a RESTING LIMIT entry that partially fills in two events:
- Event 1: filled_size=0.3 → slot.size=0.3, entry_order.filled_size=0.3
- Event 2: filled_size=0.7 → slot.size=0.7, entry_order.filled_size=0.7

At first glance `slot.size = fill_size` looks like it discards the first
partial. It does not: `fill_size` is the venue's cumulative fill, so the jump
from 0.3 to 0.7 is the correct 0.4 increment. The cumulative value originates
in `bingx_venue._events_from_submit()` (line ~480):
```python
filled_size = _row_float(ack_row, "executedQty", ...)
```
This reads `executedQty`, which on BingX is cumulative. The second event's
`filled_size=0.7` therefore means "total filled across all fills = 0.7." The
kernel sets `slot.size = 0.7`, the total position size, and the VenueOrder's
`filled_size` reflects the same cumulative value. The size overwrite is
correct.

The real problem is the price. The second fill event also overwrites
`slot.entry_price` with the new fill's price. If the first fill was at 0.0834
and the second at 0.0836, the slot's `entry_price` becomes 0.0836, losing the
blended average. For a LIMIT entry with two partial fills at different prices,
the entry_price in the slot is the price of the LAST fill, not the VWAP.

**Flaw: E10a — Entry price on multi-partial entry is last-fill, not VWAP.**
**Severity: Low.** Unrealized PnL computation uses this price. Error is small
for tight spreads.
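
For reference, a blended entry price can be derived from cumulative fill events as sketched below. `vwap_from_cumulative` is a hypothetical helper, and it assumes each event's price is the price of that event's increment. If the venue already reports a cumulative average price, no blending is needed.

```python
def vwap_from_cumulative(fills):
    """Blend an entry price from (cumulative_filled, fill_price) events.

    Each event carries the total filled so far, so the increment is the
    difference from the previous cumulative value.
    """
    size, avg = 0.0, 0.0
    for cumulative, price in fills:
        increment = cumulative - size
        if increment <= 0.0 or price <= 0.0:
            continue  # duplicate or unpriced event: keep the current average
        avg = (avg * size + price * increment) / cumulative
        size = cumulative
    return size, avg


# The two partial fills from the example above.
size, entry_price = vwap_from_cumulative([(0.3, 0.0834), (0.7, 0.0836)])
print(size, round(entry_price, 6))
```

With the two fills from the example, the blended price lands between 0.0834 and 0.0836, weighted toward the larger second increment.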

---

## Layer 5: Venue Adapter Boundary (bingx_venue.py)

### E11: `_legacy_intent()` is a lossy conversion

**bingx_venue.py:270-285**
```python
@staticmethod
def _legacy_intent(intent: KernelIntent) -> LegacyIntent:
    action = LegacyDecisionAction.ENTER if intent.action == E.ENTER else ...
    side = LegacyTradeSide.SHORT if intent.side == TS.SHORT else ...
    metadata = dict(intent.metadata)
    metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
    metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
    return LegacyIntent(
        timestamp=intent.timestamp,
        trade_id=intent.trade_id,
        decision_id=intent.intent_id,
        asset=intent.asset,
        action=action,
        side=side,
        reason=intent.reason,
        target_size=float(intent.target_size),
        leverage=float(intent.leverage),
        reference_price=float(intent.reference_price),
        confidence=1.0,           # ← HARDCODED
        bars_held=0,              # ← HARDCODED
        exit_leg_ratios=tuple(intent.exit_leg_ratios or (1.0,)),
        metadata=metadata,
    )
```

`confidence` is always 1.0 and `bars_held` is always 0. The `LegacyIntent`
carries these to `BingxDirectExecutionAdapter.submit_intent()` which ignores
them (it only reads `asset`, `side`, `action`, `target_size`, `leverage`,
and `metadata`). So the hardcoded values don't affect execution — but they
affect the `ExecutionReceipt` and any downstream consumers that might read
`receipt.confidence`.

**Flaw: E11 — Lossy conversion with hardcoded metadata.**
**Severity: Informational.** No downstream consumer reads these fields.

### E12: `_events_from_submit()` price fallback chain can lose venue price

**bingx_venue.py:375-400 (`_events_from_submit`)**
```python
base_event = VenueEvent(
    ...
    price=safe_float(getattr(receipt, "price", 0.0), 0.0),
    ...
)

# ... later for fill event:
fill_price = safe_float(
    _row_float(ack_row, "avgPrice", "ap", "price", "lastFillPrice",
               default=getattr(receipt, "price", 0.0)),
    0.0
)
```

The fill price is read from `ack_row` (the HTTP response dict) first, falling
back to `receipt.price` (the `ExecutionReceipt` field). The `ExecutionReceipt`
price comes from `bingx_direct.py:434`:
```python
fill_price = 0.0
for key in ("avgPrice", "avgFilledPrice", "price", "lastFillPrice", "tradePrice"):
    try: value = float(ack_row.get(key) or 0.0)
    except: value = 0.0
    if value > 0: fill_price = value; break
if fill_price <= 0 and self._state is not None:
    fill_price = next((float(...)) for ... in self._state.open_positions.values() ...)
```

So the price flows: BingX HTTP ack → `ack_row[key]` → `receipt.price` →
`_events_from_submit()` → `fill_price` in VenueEvent.

If `ack_row` has no price field AND `self._state.open_positions` has no matching
position (e.g., first fill on a new entry), `fill_price` stays 0.0. The kernel's
`apply_fill` at lib.rs:1397 checks `if event.price > 0.0` before setting
`entry_price` — so a zero fill price leaves `entry_price` at 0.0. This means:

- The slot's `entry_price` stays 0.0
- `realized_pnl()` at lib.rs:662 checks `if slot.entry_price <= 0.0` → returns 0.0
- **PnL is never computed for this fill**
- Capital never settles

This is very unlikely on BingX VST, which always returns `avgPrice` in order
acknowledgements. But on any venue that doesn't, PnL is silently zeroed.

**Flaw: E12 — Zero fill price → zero entry_price → zero PnL.**
**Severity: Medium.** Silent PnL loss if venue returns no price.
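
The key-scan half of the fallback chain can be isolated as a small pure function, which makes the zero-price outcome easy to see and to test. This is a sketch: `first_positive_price` and `PRICE_KEYS` are illustrative names, the key list is copied from the excerpt above, and the open-positions fallback is deliberately left out.

```python
PRICE_KEYS = ("avgPrice", "avgFilledPrice", "price", "lastFillPrice", "tradePrice")


def first_positive_price(ack_row: dict, keys: tuple = PRICE_KEYS) -> float:
    """Return the first strictly positive price among keys, else 0.0."""
    for key in keys:
        try:
            value = float(ack_row.get(key) or 0.0)
        except (TypeError, ValueError):
            value = 0.0  # unparsable field: treat as missing
        if value > 0.0:
            return value
    return 0.0


print(first_positive_price({"avgPrice": "0.0834"}))
print(first_positive_price({"orderId": "123", "status": "FILLED"}))
```

A caller that sees 0.0 here has an unpriced fill. Surfacing that case explicitly, instead of forwarding it to the kernel, is one way to avoid the silent zero-PnL path.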

### E13: `_backend_snapshot()` timeout returns stale data

**bingx_venue.py:290-320**
```python
def _backend_snapshot(self, *, include_history=False, timeout_ms=5000.0):
    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
        with self._snap_lock:
            return self._last_snapshot  # ← STALE DATA
```

If the previous snapshot fetch is still in-flight when a new caller arrives,
the timeout returns `self._last_snapshot` — which could be seconds or minutes
old. The caller (e.g., `submit()`) then uses this stale snapshot to compute
`_filled_size_from_snapshots()` — potentially comparing stale "before" data
with fresh "after" data, producing a wrong delta.

**Flaw: E13 — Stale snapshot fallback causes wrong fill-size detection.**
**Severity: Medium.** The `_filled_size_from_snapshots` diff can be wrong.

### E14: `_events_from_cancel` uses stale `slot_id` from order metadata

**bingx_venue.py:485-510**
```python
VenueEvent(
    ...
    slot_id=int(order.metadata.get("slot_id", 0) or 0),
    ...
)
```

The `slot_id` in the CANCEL event comes from the `VenueOrder.metadata` which
was set when the order was created (in Rust FSM's `process_intent` or
`on_venue_event`). If the slot was re-assigned or the kernel's slot count
changed since order creation, this slot_id is wrong. The Rust kernel's
`resolve_slot()` at lib.rs:610-624 would use the event's `slot_id` (the
stale one) and find the wrong slot.

**Flaw: E14 — Cancel event carries stale slot_id from order creation.**
**Severity: Low.** Slots are stable and never renumbered.

---

## Layer 6: BingX Direct Adapter (bingx_direct.py)

### E15: Submit sets leverage via separate HTTP call

**bingx_direct.py:376-379**
```python
await self._client.signed_post(
    "/openApi/swap/v2/trade/leverage",
    {"symbol": symbol, "side": "BOTH", "leverage": leverage},
)
```

This is a POST to set exchange leverage **before** each order. If this call
fails (rate limit, network error), the exception handler at line 417 sets
`status = "RATE_LIMITED"` and returns a rejection; the order is NOT
submitted. The same handler catches `BingxHttpError` for both the leverage
call and the order call. If the leverage call fails with a non-rate-limit
error (e.g., `400 Bad Request` for an invalid symbol), the status is
`"REJECTED"` and no order is placed. This is correct behavior, but the error
message doesn't distinguish "leverage set failed" from "order submission
failed."

**Flaw: E15 — Leverage-set failure and order failure share error handler.**
**Severity: Low.** Correct behavior, poor diagnostics.

### E16: `_format_quantity` and `_format_price` use `_instrument_step`/`_instrument_tick` — both may be zero

**bingx_direct.py:234-268**
```python
def _instrument_step(self, asset):
    instrument = self._resolve_instrument(asset)
    if instrument is not None:
        try: return Decimal(str(instrument.size_increment.as_decimal()))
        except: pass
    return Decimal("0.001")  # fallback

def _format_quantity(self, asset, quantity):
    step = self._instrument_step(asset)
    if step <= 0:
        return str(max(0.0, quantity))
    ...
```

If `_resolve_instrument` returns None (asset not in provider), `step=0.001`
and `tick=0.01`. These defaults are correct for most USDT perpetuals on
BingX VST, but may be wrong for non-standard symbols. The format functions
still produce a valid string — just possibly with wrong precision.

More concerning: `_resolve_instrument` at line 211-226 tries three lookup
strategies and iterates all instruments on the third. This iteration is O(n)
in the number of instruments and happens on EVERY `submit_intent()` call.
With 540 instruments, this is ~0.5ms, which is acceptable. But
`_instrument_step` and `_instrument_tick` each call `_resolve_instrument`
independently, so `submit_intent()` calls it three times: once for quantity,
once for price, and once for `_instrument_venue_symbol` at line 358. That is
three full-instrument-list iterations per order.

**Flaw: E16 — Instrument resolution called 3x per order with O(n) scan.**
**Severity: Low.** Performance, not correctness.
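
One way to collapse the repeated scans is to memoize the lookup per asset. The sketch below is illustrative only: `CachedResolver`, `linear_scan`, and the synthetic instrument list are hypothetical and stand in for `_resolve_instrument` and the real provider. Cached entries would need invalidating whenever the instrument list is refreshed.

```python
class CachedResolver:
    """Memoizes a slow lookup so repeated calls for one asset scan once."""

    def __init__(self, resolve):
        self._resolve = resolve  # the underlying O(n) lookup
        self._cache = {}
        self.scans = 0           # counts real lookups, for illustration

    def __call__(self, asset):
        if asset not in self._cache:  # also caches a None (miss) result
            self.scans += 1
            self._cache[asset] = self._resolve(asset)
        return self._cache[asset]

    def invalidate(self):
        """Drop all cached entries, e.g. after an instrument refresh."""
        self._cache.clear()


# Synthetic instrument table the size mentioned above.
instruments = [{"symbol": f"SYM{i}-USDT", "step": "0.001"} for i in range(540)]


def linear_scan(asset):
    return next((row for row in instruments if row["symbol"] == asset), None)


resolver = CachedResolver(linear_scan)
for _ in range(3):  # quantity, price, and venue-symbol lookups for one order
    found = resolver("SYM539-USDT")
print(resolver.scans, found["step"])
```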

### E17: Cancel uses truth-based confirmation — can mask real errors

**bingx_direct.py:474-498**
```python
still_open = True
try:
    oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
    ...
    still_open = (venue_order_id in ids) if venue_order_id else (venue_client_id in cids)
except Exception:
    still_open = None

if still_open is False:
    return {"status": "CANCELED", ...}
if str(delete_resp.get("status", "")).upper() in {"CANCELED", "CANCELLED", "SUCCESS", "OK"}:
    return {"status": "CANCELED", ...}
return {"status": delete_resp.get("status", "REJECTED"), ...}
```

The cancel logic:
1. DELETE the order on BingX
2. GET open orders to verify
3. If the order is no longer open, return CANCELED
4. If the DELETE response says CANCELED, return CANCELED
5. Otherwise return REJECTED

If step 2's GET fails (network error, rate limit), `still_open=None`.
Then step 4 checks the DELETE response. If the DELETE also returned an error
(e.g., "order not found" because it was already cancelled by another caller),
`status` is `"ERROR"` or `"not found"` — neither matches `"CANCELED"`.
The cancel is reported as `REJECTED` even though the order IS cancelled.

The `bingx_venue._events_from_cancel()` then emits `CANCEL_REJECT` instead
of `CANCEL_ACK`. The Rust kernel handles `CANCEL_REJECT` at lib.rs:1218:
```rust
KernelEventKind::CANCEL_REJECT => {
    if slot.fsm_state == TradeStage::EXIT_WORKING {
        slot.fsm_state = TradeStage::EXIT_WORKING;  // no-op
    }
    diagnostic_code = KernelDiagnosticCode::CANCEL_REJECTED;
}
```

The slot stays in its current state (e.g., `EXIT_WORKING`) with no active order
(the exchange has no record of it). The slot is stuck until a manual reconcile.

**Flaw: E17 — Cancel can return false REJECTED for already-cancelled orders.**
**Severity: Medium.** Leads to stuck slot requiring manual intervention.
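
The decision order summarized above can be written as a pure function, which makes the failing combination explicit. This is a sketch under stated assumptions: `classify_cancel` mirrors the five-step summary (returning `REJECTED` for any unconfirmed case), and `classify_cancel_tristate` is a hypothetical alternative that reports `UNKNOWN` when neither call confirmed anything. Neither is the adapter's actual code.

```python
from typing import Optional

CANCEL_OK = {"CANCELED", "CANCELLED", "SUCCESS", "OK"}


def classify_cancel(still_open: Optional[bool], delete_status: str) -> str:
    """Mirror of the five-step summary above."""
    if still_open is False:
        return "CANCELED"
    if delete_status.upper() in CANCEL_OK:
        return "CANCELED"
    return "REJECTED"


def classify_cancel_tristate(still_open: Optional[bool], delete_status: str) -> str:
    """Hypothetical variant: an unverifiable cancel is UNKNOWN, not REJECTED."""
    if still_open is None and delete_status.upper() not in CANCEL_OK:
        return "UNKNOWN"
    return classify_cancel(still_open, delete_status)


# Verification GET failed and DELETE reported an error for an order that
# was in fact already gone.
print(classify_cancel(None, "ERROR"))
print(classify_cancel_tristate(None, "ERROR"))
```

A distinct `UNKNOWN` outcome would let the caller schedule a reconcile instead of emitting `CANCEL_REJECT` for an order the exchange no longer holds.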

---

## Layer 7: Fill Feedback Loop (rust_backend.py on_venue_event)

### E18: `on_venue_event` settles PnL incrementally — but fees are never included

**rust_backend.py:530-545**
```python
incremental_pnl = slot.realized_pnl - self._last_settled_pnl.get(slot.slot_id, 0.0)
if abs(incremental_pnl) > 1e-12:
    self.account.settle(incremental_pnl)
    self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
```

The Rust kernel's `apply_fill` computes realized PnL as:
```rust
let realized = Self::realized_pnl(slot, event.price, fill_size);
slot.realized_pnl += realized;
```

No fee subtraction. No commission reading from the event. The `VenueEvent`
could carry fee data via `metadata["fee"]` or `raw_payload["commission"]`,
but the Rust kernel doesn't read it and the Python bridge doesn't extract it.

Over the 142 live test scenarios on VST (where fees are 0 or negligible),
this is invisible. On live mainnet with exchange fees of 0.02-0.04%, the
cumulative error is unbounded.

**Flaw: E18 — PnL settlement ignores fees.**
Already documented as A7. In the E2E trace, the gap is specifically here:
`VenueEvent.price` is used for `realized_pnl()` but `VenueEvent.metadata`
(which could carry `commission` from the venue) is never read.

**Severity: Medium** (grows with trade volume).
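
To give a sense of scale, the sketch below nets a fee out of an incremental settlement and estimates cumulative fee drag. It is an illustration only: `settle_increment` and `fee_drag` are hypothetical helpers, the 1,000 USDT notional and trade counts are made-up inputs, and the 0.04% rate is the upper end of the range quoted above.

```python
def settle_increment(realized_pnl: float, last_settled: float, fee: float = 0.0):
    """Return (delta_to_settle, new_last_settled), net of an optional fee."""
    gross = realized_pnl - last_settled
    return gross - fee, realized_pnl


def fee_drag(notional: float, fee_rate: float, fills: int) -> float:
    """Total fees paid over a number of fills at a flat rate."""
    return notional * fee_rate * fills


# One exit leg: gross +2.5 realized, 0.4 commission (illustrative numbers).
delta, settled = settle_increment(12.5, 10.0, fee=0.4)

# 1,000 round trips = 2,000 fills at 0.04% on 1,000 USDT notional each.
drag = fee_drag(1000.0, 0.0004, 2000)
print(round(delta, 6), settled, round(drag, 2))
```

Under these assumptions the unrecorded fees reach several hundred USDT, which is the gap between kernel capital and exchange balance that a fee-blind settlement would accumulate.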

### E19: `observe_slots` called with ALL slots, not just changed ones

**rust_backend.py:538-545**
```python
slots = [self._get_slot(i) for i in range(self.max_slots)]
self.account.observe_slots(slots)
```

Every `on_venue_event` call re-reads ALL slots from the Rust kernel (N FFI
calls) and calls `observe_slots` with the full list. With `max_slots=10`,
this is 10 FFI round-trips per venue event. Each round-trip serializes a
TradeSlot to JSON, passes through C FFI, parses on the Rust side, serializes
the result, passes back, and parses on the Python side. For a multi-leg EXIT
with 3 fills (ACK + PARTIAL + FULL), that's 3 × 10 = 30 slot reads per
process_intent call.

**Flaw: E19 — Full-slot-list read on every event is N×FFI overhead.**
**Severity: Low** (performance). Not a correctness issue.

---

## Layer 8: Persistence Boundary (pink_clickhouse.py)

### E20: `_capital()` reads live from `AccountProjection` — stale row risk

**pink_clickhouse.py:199-200**
```python
def _capital(self) -> float:
    return float(self.account.snapshot.capital or 0.0)
```

Every row writer calls `_capital()` at write time to get the current capital.
But `persist_result()` is called AFTER `kernel.process_intent()` returns —
at which point the account has already been settled. The `account_events`,
`position_state`, and `trade_events` rows all record the SAME capital value
(the post-settle value). `capital_before` is then reconstructed by
subtracting PnL (already documented as A5).

The effect: all ClickHouse rows for a single `process_intent()` call show
identical `capital` / `account_capital` / `portfolio_capital` values, because
they're all written within the same Python call stack with no intervening
events. This is correct for single-threaded operation — all rows reflect
POST-trade state. But it means ClickHouse querying for "capital before trade"
must use `capital_after - pnl`, which is the wrong formula under multi-slot.

**Flaw: E20 — All persistence rows write post-trade capital, not pre-trade.**
Already documented as A5 from the capital_before angle.

**Severity: High** for multi-slot accounting reconstruction.

### E21: `persist_fill_events()` synthesizes fake Decision/Intent

**pink_clickhouse.py:383-435**
```python
def persist_fill_events(self, *, snapshot, events, slot_dict, market_state):
    ...
    decision = Decision(
        timestamp=ts, decision_id=trade_id or "async", asset=asset,
        action=action, side=side, reason="ASYNC_FILL",
        confidence=0.0, velocity_divergence=0.0, irp_alignment=0.0,
        reference_price=price, target_size=cur_size, leverage=leverage,
        ...
    )
    intent = Intent(
        timestamp=ts, trade_id=trade_id, decision_id=trade_id or "async",
        ...
    )
```

The async fill pump (called by `pump_venue_events`) constructs fake
Decision/Intent objects because there's no real policy decision backing an
async fill — it just arrived from the exchange. These synthetic objects have:
- `decision_id = trade_id` (or `"async"` if trade_id is empty), so
  `decision_id` and `trade_id` are the same string
- `confidence=0.0`, `velocity_divergence=0.0`, `irp_alignment=0.0`
- `target_size = cur_size` (the remaining size after the fill, not the
  size that was filled)

These are written to `policy_events`, `trade_reconstruction`, and
`trade_events` with the same row shapes as real policy-driven fills. Any
ClickHouse query that joins `policy_events` to `trade_events` on
`decision_id` will find matching rows (both set to `trade_id`), but the
policy_events row's `target_size` is the POST-fill size, not the pre-fill
size. A replay system that reconstructs position from `policy_events` →
`trade_reconstruction` would see incorrect sizing.

**Flaw: E21 — Async fill persistence uses synthetic decision with wrong data.**
**Severity: Medium.** Misleading historical records.

### E22: `_write_trade_exit_leg` capital_before uses arithmetic reconstruction

**pink_clickhouse.py:761-762**
```python
capital_after = self._capital()
capital_before = capital_after - pnl_leg
```

Already documented as A5. In the E2E trace, the specific path is:
1. Slot 0 exit leg fills → `_capital()` returns capital AFTER settlement
   (because the kernel's `on_venue_event` already called `account.settle`)
2. `capital_before = capital_after - pnl_leg` reconstructs pre-leg capital

If slot 1 also settled between the leg fill and the persistence write
(possible in a multi-threaded or otherwise concurrent scenario), `capital_after`
includes slot 1's PnL, and `capital_before` is wrong by exactly slot 1's contribution.

**Severity: High** for multi-slot.
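
The size of the error can be checked with plain arithmetic. The sketch below is illustrative only: `Account` is a hypothetical stand-in for the kernel's account object and the PnL figures are made up. It shows that when a second slot settles between the leg fill and the persistence write, the reconstructed `capital_before` is off by exactly that slot's PnL.

```python
class Account:
    """Hypothetical stand-in for the kernel account; only tracks capital."""

    def __init__(self, capital: float) -> None:
        self.capital = capital

    def settle(self, pnl: float) -> None:
        self.capital += pnl


account = Account(25_000.0)

# Slot 0 exit leg settles. The true pre-leg capital is captured for comparison.
true_capital_before = account.capital
pnl_leg_slot0 = 120.0
account.settle(pnl_leg_slot0)

# Slot 1 settles before the persistence write for slot 0 runs.
pnl_slot1 = -45.0
account.settle(pnl_slot1)

# Persistence write for slot 0, using the arithmetic reconstruction.
capital_after = account.capital
capital_before = capital_after - pnl_leg_slot0

# The reconstruction error equals slot 1's contribution.
error = capital_before - true_capital_before
```

Capturing the pre-settlement capital at the moment of settlement, rather than reconstructing it later, would avoid the dependency on write ordering.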

### E23: `_write_trade_event` uses `slot_dict.get("entry_price")` as exit_price

**pink_clickhouse.py:813-815**
```python
entry_price = _safe_float(slot_dict.get("entry_price", 0.0), ...)
exit_price = _safe_float(slot_dict.get("entry_price", 0.0), ...)  # ← SAME FIELD
```

Already documented as A13. The `exit_price` is set to `entry_price` from
the same slot dict field. The BingX ack payload does contain the fill price,
but it's not propagated to the slot dict's `entry_price` for exit fills —
the slot's `entry_price` is set during entry fill and remains unchanged
during exit. The exit fill price is only on the `VenueEvent`, which is not
passed through to `_write_trade_event`.

The `trade_events` row in ClickHouse always shows `exit_price == entry_price`,
making PnL reconstruction from `(exit_price - entry_price) × size × lev`
impossible. The `pnl` field IS correct (it's `slot.realized_pnl`), but only
the summary is accurate — the component prices are wrong.

**Severity: Low.** `pnl` is correct, only the decomposed price is wrong.

---

## Layer 9: Test Infrastructure

### E24: `MockVenueAdapter.submit()` always emits fill on `partial_fill_ratio > 0`

**mock_venue.py:60-90**
```python
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
    fill_ratio = max(0.0, min(1.0, float(effective_ratio)))
    ...
    if is_entry:
        effective_ratio = self.scenario.entry_partial_fill_ratio if \
            self.scenario.entry_partial_fill_ratio != 1.0 else \
            self.scenario.partial_fill_ratio
    else:
        effective_ratio = self.scenario.exit_partial_fill_ratio ...
```

The default `MockVenueScenario()` has `partial_fill_ratio=1.0`. So every
`submit()` call on a default mock emits a FULL_FILL event immediately.
This means mock-venue tests always test the "order fills instantly" path —
they never test resting orders, partial fills, or async fills.

Any test that relies on the mock venue is testing a subset of real venue
behavior. The mock never produces:
- DELAYED fills (fill arrives on a later `reconcile()` call)
- PARTIAL fills with subsequent fills
- Partial fills during entry (entry fills partially, then more later)
- Mixed entry/exit partial behavior

**Flaw: E24 — Mock venue always fills synchronously — never tests async path.**
**Severity: Medium.** The `pump_venue_events()` path has never been exercised
with the mock venue.
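
A mock that acknowledges on submit and only fills on a later reconcile call would exercise the async path. The following is a minimal sketch under stated assumptions: `FakeEvent`, `DeferredFillMock`, and the `fill_schedule` parameter are hypothetical names, not part of `mock_venue.py`, and the real `VenueEvent` carries more fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeEvent:
    """Hypothetical event shape; the real VenueEvent has more fields."""
    kind: str            # "ACK", "PARTIAL_FILL" or "FULL_FILL"
    order_id: str
    filled_size: float = 0.0


@dataclass
class DeferredFillMock:
    """Mock venue that acknowledges on submit and fills only on reconcile()."""
    fill_schedule: List[float]  # fraction of the order filled per reconcile call
    _pending: Dict[str, dict] = field(default_factory=dict, init=False, repr=False)

    def submit(self, order_id: str, size: float) -> List[FakeEvent]:
        self._pending[order_id] = {"size": size, "filled": 0.0, "step": 0}
        return [FakeEvent("ACK", order_id)]  # no fill event on submit

    def reconcile(self) -> List[FakeEvent]:
        events = []
        for order_id, st in list(self._pending.items()):
            if st["step"] >= len(self.fill_schedule):
                continue  # schedule exhausted: order keeps resting
            delta = st["size"] * self.fill_schedule[st["step"]]
            st["filled"] += delta
            st["step"] += 1
            done = st["filled"] >= st["size"] - 1e-12
            kind = "FULL_FILL" if done else "PARTIAL_FILL"
            events.append(FakeEvent(kind, order_id, delta))
            if done:
                del self._pending[order_id]
        return events


venue = DeferredFillMock(fill_schedule=[0.4, 0.6])
on_submit = venue.submit("o-1", size=1.0)   # ACK only
first = venue.reconcile()                   # partial fill of 0.4
second = venue.reconcile()                  # remaining 0.6, order complete
third = venue.reconcile()                   # nothing left to report
```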

### E25: Test scenarios use MARKET-only `_si()` helper — no LIMIT tests

**gen_live_tests.py and _gen_test.py**

The `_si()` helper constructs a `KernelIntent` with `order_type="MARKET"` and
`limit_price=0.0` (the defaults). All 157 live test scenarios use `_si()`.
The 3 "LIMIT" scenarios (`limit_does_not_fill`, `limit_immediate_fill`) use
`reference_price=0.0` and `target_size=-0.001` respectively — they test
**intent validation**, not actual LIMIT order submission.

There is **zero** live-test coverage of:
- Submitting a LIMIT order that rests on the book
- A resting LIMIT being cancelled
- A resting LIMIT receiving a partial fill then a subsequent fill
- An async fill arriving via `pump_venue_events()`

The Rust kernel's `PARTIAL_FILL` event handling and the Python bridge's
`on_venue_event` + incremental settle + async pump have never been exercised
on a live exchange.

**Flaw: E25 — Zero live tests for LIMIT/resting/async-fill paths.**
**Severity: High.** The partial-fill code path is untested in production.

### E26: Fresh-kernel reconcile tests create second kernel but share venue

**gen_live_tests.py** (fresh_kernel_reconcile_entry body)
```python
fresh = _build_fresh_kernel_from_slot(slot_data, ic=cb)
k2 = fresh.runtime.kernel
```

The `_build_fresh_kernel_from_slot` function creates a new `PinkDirectRuntime`
with a new `ExecutionKernel`. But the **venue adapter** is shared or
re-created with the same BingX backend. Two kernels calling BingX through
shared or separate venue adapters would be the multi-threaded scenario that
triggers T1 (Rust kernel UB) if they ran concurrently. The tests run
sequentially, so they do not trigger it.

The fresh kernel does NOT restore the venue state (open orders, positions).
The fresh kernel has a blank venue adapter state — it can't know about
previous LIMIT orders resting on the exchange. This is correct for MARKET-only
tests (no resting orders) but would fail for LIMIT tests.

**Flaw: E26 — Fresh-kernel reconcile doesn't restore venue state.**
**Severity: Medium** (would break LIMIT scenarios).

---

## Summary: Critical E2E Flaw Chain

The most dangerous E2E scenario is a **LIMIT order with partial fills** on
a live exchange:

```
1. Policy emits LIMIT ENTER                       [E3: can't happen — bridge drops order_type]
2. KernelIntent with order_type="LIMIT"            [dead code path from step 1]
3. bingx_direct.submit_intent builds LIMIT payload [works if reached]
4. BingX accepts LIMIT, returns ACK with no fill   [VenueEvent.price may be 0]
5. FSM transitions to ENTRY_WORKING                [correct]
6. RESTING LIMIT sits on book                      [no further kernel events]
7. Next policy cycle: pump_venue_events()           [E1: expensive HTTP calls]
8. Reconciled venue has no fill events              [nothing to drain]
9. Repeated cycles with no progress                 [wasteful but safe]
10. Eventually BingX fills partially               [VenueEvent arrives]
11. apply_fill PARTIAL_FILL entry branch runs       [E10: entry_price = last fill, not VWAP]
12. on_venue_event settles incremental PnL          [E18: fees not included]
13. persistence writes                              [E20/E21/E22/E23: wrong capital_before, exit_price]
14. Remaining LIMIT still rests on book             [continues to step 7]
15. Eventually full fill or cancel                  [E17: cancel can return false REJECTED]
```

**None of steps 4-15 have live test coverage.**

---

## Complete Flaw Catalog (All Layers)

| # | Flaw | Layer | Step | Severity |
|---|------|-------|------|----------|
| E1 | Unconditional pump_venue_events wastes rate limit | Runtime | R2 | Medium |
| E2 | TOCTOU between capital snapshot and intent | Runtime | R3→R8 | Medium |
| E3 | Runtime bridge drops order_type/limit_price | Bridging | R7 | **Medium** |
| E4 | TOCTOU between exit sizing and execution | Runtime | R8 | Low |
| E5 | JSON precision drift over long runs | Bridge | R8a→R8c | Low |
| E6 | Global FFI singleton no guard vs use-after-free | Bridge | R8b | **High** |
| E7 | Same-trade-id re-entry leaves stale index entries | Rust | R8c | Low |
| E8 | EXIT uses initial_size not remaining size | Rust | R8c | **High** |
| E9 | CANCEL "accepted" before cancel actually happens | Rust | R8c | Medium |
| E10 | Entry price on multi-partial fill = last fill, not VWAP | Rust | R10a | Low |
| E11 | _legacy_intent hardcodes confidence/bars_held | Venue | R9a | Info |
| E12 | Zero fill price → zero PnL | Venue | R9c | Medium |
| E13 | Stale snapshot fallback causes wrong fill delta | Venue | R9c | Medium |
| E14 | Cancel event carries stale slot_id | Venue | R9c | Low |
| E15 | Leverage-set failure and order failure share handler | Adapter | R9b | Low |
| E16 | Instrument resolution 3x per order, O(n) scan | Adapter | R9b | Low |
| E17 | Cancel returns false REJECTED for already-cancelled | Adapter | R9b | Medium |
| E18 | PnL settlement ignores fees | Bridge | R10b | **Medium** |
| E19 | Full-slot-list read on every event = N×FFI overhead | Bridge | R10b | Low |
| E20 | All persistence rows write post-trade capital | Persistence | R12 | **High** |
| E21 | Async fill uses synthetic Decision with wrong size | Persistence | R12 | Medium |
| E22 | capital_before arithmetic reconstruction wrong | Persistence | R12 | **High** |
| E23 | trade_events exit_price = entry_price | Persistence | R12 | Low |
| E24 | Mock venue always fills synchronously | Test | — | Medium |
| E25 | Zero live tests for LIMIT/async-fill paths | Test | — | **High** |
| E26 | Fresh-kernel reconcile doesn't restore venue | Test | — | Medium |

**Total: 26 E2E flaws (5 High, 11 Medium, 9 Low, 1 Info)**

The five High-severity flaws in the E2E trace (E20 and E22 are grouped):
- **E6**: Global FFI singleton + `__del__` use-after-free — memory corruption risk
- **E8**: Exit-size overshoot — slot can get stuck (A1)
- **E20/E22**: Post-trade capital in all persistence rows + arithmetic
  capital_before — ClickHouse records are misleading for accounting
- **E25**: No LIMIT/async-fill test coverage — partial-fill path is production
  code with zero live validation

---

## PASS 3 — NEW FINDINGS (Deepest E2E Trace)

### F1: `process_intent` CANCEL returns "accepted" before the cancel happens — caller gets wrong `outcome.state`

**File:** `rust_backend.py:595-614`

The CANCEL path:
1. Calls `self.venue.cancel(order)` → HTTP DELETE → returns `VenueEvent[]`
2. For each event, calls `self.on_venue_event(event)` → Rust FSM transition
3. Assembles `final_outcome` by combining the Rust kernel's **pre-venue-event**
   `accepted` and `diagnostic_code` with the **post-event** slot state

```python
outcome = _outcome_from_payload(result["outcome"])  # Rust CANCEL accepts (slot NOT mutated yet)
# ... venue.cancel() ...
# ... on_venue_event() for each event (now slot IS mutated) ...
final_slot = self._get_slot(outcome.slot_id)         # Re-reads post-mutation state
final_outcome = KernelOutcome(
    accepted=outcome.accepted,        # TRUE — from Rust's pre-event accept
    state=final_slot.fsm_state,       # IDLE — from post-event state
    diagnostic_code=outcome.diagnostic_code,  # "OK" — from Rust's pre-event accept
)
```

For ENTER/EXIT, the same pattern exists — the Rust kernel's `outcome` is
pre-venue. But for CANCEL the disconnect is worst: Rust returns `accepted=true`
with the slot still in `ENTRY_WORKING`, and only the subsequent
`on_venue_event(CANCEL_ACK)` transitions to `IDLE`.

**Fix:** The diagnostic code should be reconciled with the actual venue outcome,
not taken from the pre-venue Rust outcome.

**Severity: Medium**

### F2: `_last_settled_pnl` reset before `venue.submit()` — transient window

**File:** `rust_backend.py:597-604`

```python
if intent.action == KernelCommandType.ENTER and outcome.accepted:
    self._last_settled_pnl[intent.slot_id] = 0.0   # reset HERE
# ... venue.submit() called below ...
```

If `venue.submit()` fails (HTTP error, rate limit), the ENTER was accepted by
the Rust FSM but no venue order was placed. The slot is stuck in
`ORDER_REQUESTED`. If the caller retries the same ENTER, `_last_settled_pnl`
is 0.0 from the first attempt — correct for a new trade.

**Real risk:** If the previous trade on this slot had realized PnL that was
never settled (impossible with incremental settle, but hypothetically), resetting
to 0.0 loses that PnL. In practice, incremental settle makes this safe.

**Severity: Medium** (retry-safe, but exposes slot-stall)

### F3: `_first_invalid_intent_field` allows `leverage=0` and `target_size=0`

**File:** `rust_backend.py:295-316`

The guard catches NaN/Inf and negative `target_size`. Does NOT catch:
- `leverage=0` or negative (Rust silently falls back to 1.0)
- `target_size=0` (submits zero-quantity order to BingX)
- `reference_price=0` (mark_price ignores non-positive)
- `limit_price=0` with `order_type="LIMIT"` (BingX rejects price=0)

The zero-target-size case: a direct `process_intent(EXIT, target_size=0.0)`
computes `exit_size = 0`, submits MARKET order with quantity=0 to BingX,
which may return an error or silent no-op.

**Severity: Low** (runtime's `_exit_intent_from_slot` prevents for EXIT; direct
kernel API users can trigger it)
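
A stricter guard could reject these cases before they reach the FFI boundary. The sketch below is a hypothetical variant, not the existing `_first_invalid_intent_field`: the function name, parameter list, and the decision to reject a non-positive `reference_price` are all assumptions that would need to match the real `KernelIntent` contract.

```python
import math
from typing import Optional


def first_invalid_field(
    target_size: float,
    leverage: float,
    reference_price: float,
    order_type: str = "MARKET",
    limit_price: float = 0.0,
) -> Optional[str]:
    """Return the name of the first invalid field, or None if all pass.

    Hypothetical stricter guard; which fields must be positive is an assumption.
    """
    numeric = {
        "target_size": target_size,
        "leverage": leverage,
        "reference_price": reference_price,
        "limit_price": limit_price,
    }
    # NaN and +/-Inf are rejected first, for every numeric field.
    for name, value in numeric.items():
        if not math.isfinite(value):
            return name
    if target_size <= 0.0:
        return "target_size"
    if leverage <= 0.0:
        return "leverage"
    if reference_price <= 0.0:
        return "reference_price"
    # A LIMIT order needs a positive price; MARKET orders ignore limit_price.
    if order_type == "LIMIT" and limit_price <= 0.0:
        return "limit_price"
    return None
```

For example, `first_invalid_field(0.0, 1.0, 100.0)` returns `"target_size"`, and a LIMIT intent with `limit_price=0.0` returns `"limit_price"`.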

### F4: `outcome.emitted_events` only contains venue events — Rust kernel's events silently dropped

**File:** `rust_backend.py:641-652`

```python
final_outcome = KernelOutcome(
    emitted_events=tuple(emitted_events),  # only from venue.submit()
)
```

The Rust kernel's `KernelOutcome` struct has `emitted_events` — currently always
empty because the Rust FSM never sets it. If a future change adds Rust-side
event emission, those events are silently dropped: `final_outcome` only uses
the Python-side list.

**Severity: Low** (no Rust-emitted events exist today)

### F5: `on_venue_event` does redundant FFI read of slot already returned by Rust

**File:** `rust_backend.py:698-706`

```python
def on_venue_event(self, event):
    result = _get_rust().on_venue_event(...)
    outcome = _outcome_from_payload(result["outcome"])
    slot_payload = result.get("slot")
    slot = _slot_from_payload(slot_payload) if slot_payload else self._get_slot(...)
    # ...
    current = self._get_slot(slot.slot_id)  # REDUNDANT — slot already has this data!
    self.projection.write_slot(current)
```

Line 706 re-reads `current` from the backend even though `slot` (from the
Rust result) already has the exact same data. Each redundant FFI read is
JSON serialize → C FFI → Rust serialize → C FFI → Python parse — ~100μs.
With 2-3 events per process_intent and 10 slots, ~3ms wasted per cycle.

**Severity: Low** (performance)

### F6: `_record_transitions` in `process_intent` records pre-venue transitions with `event=None`

**File:** `rust_backend.py:708, 650`

```python
# process_intent line 650:
self._record_transitions(outcome.transitions, final_slot, None)  # event=None

# on_venue_event line 708:
self._record_transitions(outcome.transitions, slot, event)  # event attached
```

Venue-event transitions ARE recorded individually inside each
`on_venue_event` call (line 708). The journal has all transitions. But the
pre-venue transitions (from Rust FSM before venue call) have `event=None`
attached — no event context for the journal reader.

**Severity: Informational** (diagnostic inconvenience only)

### F7: `reconcile_from_slots` writes ALL slots to projection/zinc, not just reconciled ones

**File:** `rust_backend.py:718-733`

```python
for current in slots:          # iterates ALL max_slots
    self.projection.write_slot(current)   # writes unchanged slots too
    self.zinc_plane.write_slot(current)
```

After reconcile, ALL slots are written to projection and Zinc, even if the
reconcile only modified one slot. Slots 1-9 are serialized and written with
their unchanged state. Wasteful but harmless.

Also: Rust kernel's `reconcile_slots_json` silently ignores `slot_id` out of
range — no error returned. Caller sees `accepted=true` even if no slots were
reconciled.

**Severity: Low**

### F8: `HazelcastRowWriter.put()` is synchronous with no error handling — Hazelcast failure crashes the intent

**File:** `hazelcast_projection.py:30-48`

```python
class HazelcastRowWriter:
    def __call__(self, name, row):
        if name.endswith("trade_events"):
            self.client.get_topic(name).publish(json.dumps(row, ...))
            return
        self.client.get_map(name).put(key, json_safe(row))  # synchronous, no try/except
```

No try/except. Hazelcast `put()` is synchronous — blocks until the cluster
acknowledges. If Hazelcast is down, under load, or partitioned, this:

1. Blocks the calling thread (which holds the Rust kernel handle — no other
   operation can proceed)
2. Raises an exception that propagates through `_set_slot()` → `process_intent()`
   → crashes the entire intent

**Severity: Medium** (Hazelcast failure in hot path stalls execution)
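
One way to keep a projection failure from propagating into `process_intent()` is to wrap the writer. This is a minimal sketch: `GuardedRowWriter` is a hypothetical wrapper, and `inner` stands for any callable with the `(name, row)` shape shown above. It addresses the exception path only; a blocking `put()` would still need a timeout or an off-thread queue.

```python
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("projection")


class GuardedRowWriter:
    """Wraps a row writer so projection failures are logged, not raised.

    Hypothetical wrapper; `inner` is any callable(name, row).
    """

    def __init__(self, inner: Callable[[str, Dict[str, Any]], None]) -> None:
        self._inner = inner
        self.failures = 0  # exposed so callers can alert on repeated failures

    def __call__(self, name: str, row: Dict[str, Any]) -> bool:
        try:
            self._inner(name, row)
            return True
        except Exception:
            self.failures += 1
            log.exception("projection write failed for %s", name)
            return False


def broken_writer(name: str, row: Dict[str, Any]) -> None:
    """Simulates an unreachable cluster."""
    raise ConnectionError("cluster unreachable")


writer = GuardedRowWriter(broken_writer)
ok = writer("slots", {"slot_id": 0})  # returns False instead of raising
```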

### F9: `RealZincPlane.write_slot()` serializes ALL slots, not just the changed one

**File:** `real_zinc_plane.py:205-212`

```python
def write_slot(self, slot):
    with self._lock:
        self._slot_cache[int(slot.slot_id)] = slot
        payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
        self._write_region(self.state_region, self._state_seq, payload)
```

Every single-slot write serializes ALL `slot_count` slots (default 10) to JSON.
With VenueOrder metadata, each slot payload can be ~1-5KB → 10-50KB per write.
This is written to Zinc shared memory on every `process_intent()` and
`on_venue_event()` call.

`InMemoryZincPlane` does NOT have this problem — it only stores the one slot.

**Severity: Low** (performance + Zinc shared-memory capacity waste)

### F10: `RealZincPlane.write_slot` zeros buffer before write — concurrent read sees empty data

**File:** `real_zinc_plane.py:255-263`

```python
def _write_region(self, region, seq, payload):
    buf = region.as_buffer()
    view = memoryview(buf)
    view[:] = b"\x00" * len(view)     # Zeros the buffer
    view[: len(packet)] = packet       # Writes packet
    region.notify()
```

Between the zero and the write, any concurrent reader sees zeros or a truncated
packet. `_decode_packet` checks `size <= len(buf) - 16` — a partially-written
packet fails validation and returns `{}`. The reader (e.g., another thread
calling `read_slots()`) gets an empty result.

Window is microseconds but it exists. No version guard — reader always returns
whatever is in the region.

**Severity: Low** (brief window, no corruption — just empty results)
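
A sequence-guarded write is one common way to close this window without zeroing the buffer. The sketch below illustrates the protocol on a plain `bytearray`; the 16-byte header layout, `write_region`, and `read_region` are assumptions and do not mirror the real Zinc packet format. Python gives no cross-process memory-ordering guarantees, so this shows the idea rather than a drop-in fix.

```python
import json
import struct
from typing import Optional

HEADER = struct.Struct("<QQ")  # sequence counter, payload length (assumed layout)


def write_region(buf: bytearray, payload: dict) -> None:
    """Seqlock-style write: sequence is odd while writing, even when stable."""
    data = json.dumps(payload).encode("utf-8")
    if HEADER.size + len(data) > len(buf):
        raise ValueError("payload does not fit in region")
    seq, _ = HEADER.unpack_from(buf, 0)
    writing = seq | 1                                   # force odd: write in progress
    struct.pack_into("<Q", buf, 0, writing)
    buf[HEADER.size:HEADER.size + len(data)] = data     # overwrite in place, no zeroing
    HEADER.pack_into(buf, 0, writing + 1, len(data))    # publish length, even sequence


def read_region(buf: bytearray, retries: int = 3) -> Optional[dict]:
    """Return the payload, or None if no stable snapshot was observed."""
    for _ in range(retries):
        seq_before, size = HEADER.unpack_from(buf, 0)
        if seq_before % 2 == 1 or size == 0:
            continue                                    # writer active or region empty
        data = bytes(buf[HEADER.size:HEADER.size + size])
        seq_after, _ = HEADER.unpack_from(buf, 0)
        if seq_after == seq_before:                     # nothing changed mid-read
            return json.loads(data)
    return None


region = bytearray(256)
write_region(region, {"slots": [1, 2]})
snapshot = read_region(region)
```

A reader that sees an odd sequence, or a sequence that changed during the copy, retries instead of returning an empty result, so it can distinguish "write in progress" from "no data".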

### F11: `RealZincPlane._write_region` has no partial-write recovery

**File:** `real_zinc_plane.py:255-263`

If `_encode_packet` raises (JSON serialization error), the method raises before
writing — region retains previous content. Safe.

If `view[:] = b"\x00"` fails (memory error), the region is partially zeroed.
Not recoverable. No fallback.

**Severity: Low** (memory errors are extremely rare)

### F12: `InMemoryZincPlane` intent_region grows without bound

**File:** `zinc_plane.py:83-85`

```python
def publish_intent(self, intent):
    self.intent_region.append(intent)   # unbounded growth
```

`self.intent_region` is `List[KernelIntent]` — grows on every `publish_intent`
call. Over thousands of policy cycles, this grows without bound.

`RealZincPlane.publish_intent()` limits to last 512 entries in shared memory,
but its `self._intent_cache` (in-memory) also grows without bound.

**Severity: Low** (memory leak — ~MB/day)
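
A bounded container removes the growth. The sketch below uses `collections.deque` with `maxlen`; `BoundedIntentLog` is a hypothetical class, and the 512 default simply mirrors the shared-memory limit mentioned above.

```python
from collections import deque

MAX_INTENTS = 512  # mirrors the 512-entry shared-memory limit; otherwise an assumption


class BoundedIntentLog:
    """Hypothetical replacement for an unbounded intent list."""

    def __init__(self, maxlen: int = MAX_INTENTS) -> None:
        self._items = deque(maxlen=maxlen)

    def publish_intent(self, intent) -> None:
        # Once maxlen is reached, the oldest entry is evicted automatically.
        self._items.append(intent)

    def recent(self) -> list:
        return list(self._items)


intent_log = BoundedIntentLog(maxlen=3)
for i in range(5):
    intent_log.publish_intent(i)
kept = intent_log.recent()  # only the three most recent intents remain
```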

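### F13: `InMemoryZincPlane` relies on `threading.Condition`'s default lock for re-entrancy

**File:** `zinc_plane.py:41-43`

```python
_signal: threading.Condition = field(default_factory=threading.Condition)
```

`threading.Condition()` constructed without a `lock` argument wraps an
`RLock`, so a code path that calls back into `publish_intent` on the same
thread while holding the condition does not deadlock. That safety is
implicit: if the condition is ever constructed around a plain
`threading.Lock`, the same re-entrant call deadlocks.

**Severity: Low** (no current code path re-enters, and the default lock
tolerates it; the dependency is undocumented)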

### F14: `KernelSlotView.__setattr__` round-trips unknown fields through Rust — silently dropped

**File:** `rust_backend.py:370-395`

If a new field is added to Python's `TradeSlot` that Rust's `TradeSlot` doesn't
know about, `slot.to_dict()` includes it. `_set_slot` serializes to JSON and
sends it to Rust, whose serde-derived deserializer ignores unknown fields unless
`deny_unknown_fields` is set (`#[serde(default)]` only fills in missing ones).
The round-trip loses data without warning.

The reverse: if Rust adds a field that Python doesn't know about,
`_slot_from_payload` ignores unknown keys. Also silently dropped.

**Severity: Low** (fields must be added to both sides atomically; no guard)
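
A cheap guard is to compare the keys sent across the boundary with the keys that come back. The helper below is a hypothetical sketch (`dropped_fields` does not exist in `rust_backend.py`) and assumes both payloads are flat dicts at the top level.

```python
from typing import Any, Dict, Set


def dropped_fields(sent: Dict[str, Any], returned: Dict[str, Any]) -> Set[str]:
    """Keys present in the payload sent across the boundary but absent on return."""
    return set(sent) - set(returned)


# Python side knows about `new_field`; the round-tripped payload does not.
sent_payload = {"slot_id": 0, "size": 1.0, "new_field": "x"}
returned_payload = {"slot_id": 0, "size": 1.0}
lost = dropped_fields(sent_payload, returned_payload)
```

Running such a check once at startup, or in a unit test against a fully populated slot, would surface schema drift between the two sides.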

### F15: `on_venue_event` loop in `process_intent` stops on first exception — slot left in partial state

**File:** `rust_backend.py:599-610`

```python
for event in emitted_events:
    evt_outcome = self.on_venue_event(event)  # NO TRY/EXCEPT
```

If `self.on_venue_event(event)` raises (FFI error, null pointer, OOM), the loop
stops. Events after the failing event are never processed. The slot is in a
partial state — some events applied, some not.

**Concrete scenario:** ACK arrives first → applied. FULL_FILL arrives second
→ FFI error, exception raised. Slot is stuck in `ENTRY_WORKING` with `size=0`.
Next `process_intent(EXIT)` returns `NO_OPEN_POSITION`. **No recovery path exists.**

**Severity: High** — single exception during fill feedback leaves slot
unrecoverable. Zero defense in depth.
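
One defensive shape for this loop is to stop at the first failure but hand the unapplied events back to the caller, so the slot can be flagged for reconcile or retry. The sketch below is hypothetical: `apply_events` and its return shape are assumptions, and `handler` stands in for `on_venue_event`.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def apply_events(
    events: Iterable[object],
    handler: Callable[[object], object],
) -> Tuple[List[object], List[object], Optional[Exception]]:
    """Apply events in order; on the first failure, stop and return the rest.

    Returns (applied, unapplied, error). Order is preserved so a later fill
    is never applied ahead of a failed earlier event.
    """
    pending = list(events)
    for index, event in enumerate(pending):
        try:
            handler(event)
        except Exception as exc:  # broad on purpose: FFI failures vary in type
            return pending[:index], pending[index:], exc
    return pending, [], None


def flaky_handler(event: object) -> None:
    """Simulates an FFI error on the fill event."""
    if event == "FULL_FILL":
        raise RuntimeError("ffi error")


applied, unapplied, error = apply_events(["ACK", "FULL_FILL"], flaky_handler)
```

With `unapplied` in hand, the caller can record that the slot is inconsistent and trigger a reconcile instead of leaving it silently stuck.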

### F16: `venue.submit()` returning empty events leaves slot in `ORDER_REQUESTED`

**File:** `rust_backend.py:599-610`

If `venue.submit()` returns `[]` (venue rejected order with no response, or
internal error), the `for` loop doesn't run. No `on_venue_event` is called.
Slot stays in Rust's pre-venue state (`ORDER_REQUESTED`).

`final_outcome` has `accepted=true, state=ORDER_REQUESTED, emitted_events=[]`.
Caller sees "successful" but no exchange order exists. Slot stuck in
`ORDER_REQUESTED` until `pump_venue_events()` or manual reconcile.

**Severity: Medium** — silent slot stall with no error indication.

### F17: Cancel truth-based confirmation returns `REJECTED` for already-cancelled orders on GET failure

**File:** `bingx_direct.py:474-498`

```python
try:
    oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
    still_open = (venue_order_id in ids)
except Exception:
    still_open = None  # GET failed

if still_open is False:
    return {"status": "CANCELED", ...}
# still_open is None (GET failed) or True (order still on book)
# Falls through to DELETE response check
```

If the DELETE succeeded but the verification GET failed (network blip, rate limit
on the verification endpoint), `still_open=None`. The code then checks the DELETE
response. If the DELETE returned an ambiguous error (e.g., "order not found"
because it was already cancelled by another path), the status is "ERROR" —
reported as REJECTED even though the order IS cancelled.

The `bingx_venue._events_from_cancel()` emits `CANCEL_REJECT`. The Rust FSM
handles `CANCEL_REJECT` as a no-op — slot stays in `EXIT_WORKING` with no
active order. Stuck until `pump_venue_events()` or manual reconcile.
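
The ambiguity could be represented explicitly with a three-valued result. The sketch below is hypothetical: `CancelStatus` and `classify_cancel` are illustrative names, and treating "GET failed, DELETE succeeded" as confirmed follows the fall-through described above rather than any documented BingX guarantee.

```python
from enum import Enum
from typing import Optional


class CancelStatus(Enum):
    CONFIRMED = "CONFIRMED"        # order is known to be off the book
    UNCONFIRMED = "UNCONFIRMED"    # could not verify and the DELETE was ambiguous
    STILL_OPEN = "STILL_OPEN"      # verification shows the order on the book


def classify_cancel(still_open: Optional[bool], delete_ok: bool) -> CancelStatus:
    """Map the verification GET result and DELETE outcome to a cancel status."""
    if still_open is False:
        return CancelStatus.CONFIRMED
    if still_open is True:
        return CancelStatus.STILL_OPEN
    # still_open is None: the verification GET failed, fall back to the DELETE result.
    return CancelStatus.CONFIRMED if delete_ok else CancelStatus.UNCONFIRMED


# Verification GET failed and the DELETE returned an ambiguous error.
status = classify_cancel(still_open=None, delete_ok=False)
```

An `UNCONFIRMED` result could then schedule a follow-up verification rather than being reported to the kernel as `CANCEL_REJECT`.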

**Severity: Medium.** The cancel result needs three states rather than two:
"definitely cancelled," "probably cancelled," and "definitely not cancelled."

### F18: Leverage-set and order-submit failures share error handler — poor diagnostics

**File:** `bingx_direct.py:376-417`

```python
await self._client.signed_post("/openApi/swap/v2/trade/leverage", ...)  # step A
# ...
ack_payload = await self._client.signed_post("/openApi/swap/v2/trade/order", payload)  # step B
```

If step A fails (400 for invalid symbol), the exception handler at line 417
catches `BingxHttpError` and returns REJECTED. No way for the caller to know
whether the leverage set failed or the order submission failed — both go through
the same handler. The error message just says "REJECTED."

Also: if step A succeeds and step B fails, leverage was changed on the exchange
but no order was placed. System state unchanged (leverage changes don't affect
capital), but diagnostics are poor.

**Severity: Low** (correct behavior, poor diagnostics)

### F19: `_events_from_submit` stale snapshot fallback → wrong fill detection

**File:** `bingx_venue.py:375-400`

`_filled_size_from_snapshots()` diffs position quantity before and after
submit. The "before" snapshot comes from `_backend_snapshot()` which can
return stale data (E13). A stale "before" against a fresh "after" produces
a wrong diff — could be negative, zero, or larger than reality.

This wrong diff propagates to `emitted_events` — the `PARTIAL_FILL` or
`FULL_FILL` event has wrong `filled_size`. The Rust kernel's `apply_fill`
uses this wrong `filled_size` to set `slot.size`. Capital settles on the
wrong delta.

**Severity: Medium** — wrong fill size propagates to kernel state and PnL.

### F20: `__del__` frees Rust handle at unpredictable GC time — no explicit `close()`

**File:** `rust_backend.py:558-566`

```python
def __del__(self):
    backend = getattr(self, "_backend", None)
    if backend is not None:
        try: _get_rust().destroy(backend)
        except: pass
```

`ExecutionKernel` has no `close()` method. The Rust `KernelHandle` is only
freed by `__del__`, which runs at an unpredictable time on whichever thread
drops the last reference or triggers a garbage-collection pass (CPython has
no dedicated GC thread). If any code holds a stale copy of `self._backend`,
that pointer dangles once the kernel is collected.

`DITAv2LauncherBundle.close()` calls `_maybe_close` on venue, zinc, and
control plane — but NOT on kernel (which has no `close()` or `disconnect()`).
The kernel is leaked until GC.

**Severity: Medium** — reliance on `__del__` for critical C resource cleanup.
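
Deterministic cleanup usually takes the form of an idempotent `close()` plus context-manager support, with `__del__` kept only as a safety net. The sketch below is hypothetical: `KernelHandle` here is a Python illustration, and `destroy` stands in for whatever call frees the native handle.

```python
class KernelHandle:
    """Hypothetical owner of a native handle with deterministic cleanup."""

    def __init__(self, backend, destroy) -> None:
        self._backend = backend
        self._destroy = destroy  # callable that frees the native handle

    def close(self) -> None:
        # Swap the handle out first so a second call is a no-op.
        backend, self._backend = self._backend, None
        if backend is not None:
            self._destroy(backend)

    def __enter__(self) -> "KernelHandle":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()

    def __del__(self) -> None:
        # Safety net only; close() is the primary path.
        try:
            self.close()
        except Exception:
            pass


freed = []
with KernelHandle(backend=7, destroy=freed.append) as kernel:
    pass              # handle is freed when the block exits
kernel.close()        # second close does nothing
```

With a `close()` in place, a bundle-level `close()` could release the kernel before the venue and Zinc plane, making the teardown order explicit.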

### F21: `DITAv2LauncherBundle.close()` closes venue before kernel is done with it

**File:** `launcher.py:90-95`

```python
def close(self):
    _maybe_close(self.venue)       # Closes HTTP client
    _maybe_close(self.zinc_plane)  # Closes Zinc regions
```

If the kernel is mid-`process_intent` in another thread (hypothetical —
single-threaded in practice), `venue.submit()` would fail because the HTTP
client is already closed. No ordering enforcement.

**Severity: Low** (single-threaded deployment)

### F22: Silent fallback from real Zinc/Hazelcast to in-memory on error — operator unaware

**File:** `control.py:210-217`, `launcher.py:175-185`, `projection.py:30-40`

```python
def build_control_plane(...):
    if real_requested:
        try:
            return RealZincControlPlane(...)
        except Exception:
            pass  # SILENT — operator never knows
    return ZincControlPlane(snapshot=snapshot)
```

Three places have this pattern. An operator who configures `DITA_V2_ZINC=REAL`
and Zinc isn't available gets in-memory storage without any warning, error, or
log. The `ZincPlane` protocol has no introspection method to check if it's
real or in-memory.

The same applies to Hazelcast projection and the venue adapter.

**Severity: Medium** — configuration errors are silently masked.
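
The fallback itself can stay while becoming visible. The sketch below is a hypothetical helper (`build_with_fallback` is not an existing function) that logs the failure with its traceback and returns a flag the caller can expose for introspection.

```python
import logging
from typing import Callable, Tuple, TypeVar

log = logging.getLogger("dita.launcher")
T = TypeVar("T")


def build_with_fallback(
    name: str,
    build_real: Callable[[], T],
    build_fallback: Callable[[], T],
) -> Tuple[T, bool]:
    """Return (component, is_real); log a warning when falling back."""
    try:
        return build_real(), True
    except Exception:
        log.warning(
            "%s: real backend unavailable, using in-memory fallback",
            name,
            exc_info=True,  # keep the traceback so the root cause is visible
        )
        return build_fallback(), False


def unavailable_backend() -> dict:
    """Simulates a real backend that cannot be opened."""
    raise OSError("region not found")


plane, is_real = build_with_fallback("zinc", unavailable_backend, dict)
```

Whether a configured `REAL` mode should fall back at all, or fail hard at startup, is a policy decision this sketch does not settle.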

### F23: `VenueEvent.size` = `intent.target_size` not actual fill — wrong for multi-leg EXIT

**File:** `bingx_venue.py:410-420`

```python
base_event = VenueEvent(
    size=float(intent.target_size or 0.0),  # target, not fill
)
```

For an EXIT leg, `intent.target_size` is the intended exit size. The ACK
event's `size` reflects the target, not the actual fill. For fully-filled
MARKET orders, `target == fill` so it's invisible. For partially-filled
LIMIT orders, `size` on the ACK is wrong.

The fill event later has `filled_size` from the venue's `executedQty`, so
the downstream kernel uses the correct fill size. The ACK's `size` is
unused by the kernel (the kernel uses `filled_size` for PnL computation).

**Severity: Informational** (unused by kernel)

### F24: `asyncio.run()` inside async function in test generator — nested event loops

**File:** `_build_pink_extended.py:75-81`

```python
def _check_open_orders(c, vs):
    r = __import__('asyncio').run(c._request_json("GET", ...))
```

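`asyncio.run()` is reported as being called INSIDE an `async def` context
(the test body is async). `asyncio.run()` does not nest: when an event loop
is already running on the current thread it raises `RuntimeError` rather than
suspending the outer loop. If these tests pass, the helper must be reached
from a context with no running loop (a synchronous test body or a worker
thread), which should be confirmed against the generated test code.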

**Severity: Low** (works in practice)
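
The standard-library behavior is easy to check in isolation. The sketch below uses only `asyncio`; `fetch` and `caller` are hypothetical stand-ins for the request coroutine and the async test body.

```python
import asyncio


async def fetch() -> int:
    """Stand-in for the HTTP request coroutine."""
    return 42


async def caller() -> str:
    """Stand-in for an async test body that calls asyncio.run() internally."""
    coro = fetch()
    try:
        asyncio.run(coro)        # a loop is already running on this thread
    except RuntimeError as exc:
        coro.close()             # avoid a "never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(caller())  # outer loop; the inner call raises RuntimeError
```

Inside an async test body, `await c._request_json(...)` is the direct replacement; from synchronous code, `asyncio.run()` is fine as long as no loop is running on that thread.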

### F25: `_build_fresh_kernel_from_slot` leaks old kernel objects per call

**File:** `_build_pink_extended.py:95-108`

```python
def _build_fresh_kernel_from_slot(slot_data, ic=25000.0):
    cfg = _build_config(ic)
    b = build_launcher_bundle(venue_mode="BINGX", ...)  # NEW bundle, OLD not closed
    k = b.kernel
    return RB(runtime=Shim(k), config=cfg)
```

Each call creates a new launcher bundle (new kernel, new Rust handle, new HTTP
client, new Zinc plane) without closing the old one. Called 4 times across the
fresh-kernel test bodies. Leaks ~50MB per call (Rust lib, HTTP connections).

**Severity: Low** (test infrastructure only)

### F26: `seen_event_ids` not cleared on re-entry — event IDs accumulate across trades

**File:** `lib.rs:672-683`

When a slot re-enters (new ENTER after previous EXIT), the Rust kernel resets
most fields (lib.rs:740-765) but does NOT clear `seen_event_ids`. The new
trade inherits the previous trade's event history up to `MAX_SEEN_EVENT_IDS`
(256). After 256 events across multiple trades, old IDs are drained.

For MARKET trading (2-4 events per trade), this takes ~64-128 trades
(256 / 4 to 256 / 2) before draining. For LIMIT trading (many partial fills),
it could be 5-10 trades.

**Fix:** `slot.seen_event_ids.clear()` on ENTER.

**Severity: Low** (event ID collision across trades is astronomically unlikely)

### F27: `RealZincControlPlane.read()` parses Zinc region every call — no caching

**File:** `real_control_plane.py:88-94`

```python
def read(self):
    payload = _decode_packet(self.region.as_buffer())  # JSON parse every call
    control = payload.get("control")
    self._snapshot = KernelControlSnapshot(**control)   # reconstruct every call
    return self._snapshot
```

Called by `ExecutionKernel.control` property on every `process_intent()`.
Each call re-constructs a `KernelControlSnapshot` from dict — allocating
new objects for every field. ~50μs per call. A simple cached-until-modified
pattern would eliminate all parses between writes.

**Severity: Low** (performance)
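
A cached-until-modified reader can be sketched as follows. `CachedReader` and `read_raw` are hypothetical names; this version detects change by comparing raw bytes, which assumes the comparison is cheaper than a JSON parse. Where the region exposes a sequence number, comparing that instead would be cheaper still.

```python
import json
from typing import Callable, Optional


class CachedReader:
    """Re-parses only when the raw region bytes change (hypothetical helper)."""

    def __init__(self, read_raw: Callable[[], bytes]) -> None:
        self._read_raw = read_raw
        self._last_raw: Optional[bytes] = None
        self._last_value: Optional[dict] = None
        self.parses = 0  # counts actual JSON parses, for inspection

    def read(self) -> dict:
        raw = self._read_raw()
        if raw != self._last_raw:
            self._last_value = json.loads(raw)
            self._last_raw = raw
            self.parses += 1
        # The cached object is shared between calls; treat it as read-only.
        return self._last_value


state = {"raw": b'{"control": {"mode": "RUN"}}'}
reader = CachedReader(lambda: state["raw"])
reader.read()
reader.read()                      # served from cache, no second parse
parses_before_change = reader.parses
state["raw"] = b'{"control": {"mode": "HALT"}}'
mode_after_change = reader.read()["control"]["mode"]
```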

### F28: `_legacy_intent` hardcodes `confidence=1.0` and `bars_held=0`

**File:** `bingx_venue.py:270-285`

These fields are in `LegacyIntent` but unused by `submit_intent()` (which
only reads `asset`, `side`, `action`, `target_size`, `leverage`, `metadata`).
The downstream ClickHouse rows use the policy-layer `Intent`, not `LegacyIntent`,
so the hardcoded values don't reach persistence.

Only propagates through the venue adapter's internal chain. No consumer reads
them today.

**Severity: Informational**

### F29: `_slot_to_payload` in `real_zinc_plane.py` is dead code

**File:** `real_zinc_plane.py:57-59`

```python
def _slot_to_payload(slot):
    data = slot.to_dict()
    return data
```

Defined, never called anywhere in the file. All slot serialization calls
`slot.to_dict()` directly.

**Severity: Informational**

### F30: Duplicate `_slot_from_payload` in `real_zinc_plane.py` and `rust_backend.py`

**File:** `real_zinc_plane.py:62-112`, `rust_backend.py:270-310`

Two nearly identical implementations. The `real_zinc_plane` version manually
constructs `VenueOrder` objects (lines 63-88) with different defaults
(e.g., fallback to slot `size` if `intended_size` missing). The `rust_backend`
version delegates to `_order_from_payload` with all-default fallbacks.

If fields are added to `TradeSlot` or `VenueOrder`, both must be updated.

**Severity: Low** (code duplication risk)

---

## Complete Flaw Catalog

### All-Passes Combined

| Family | Focus | Count | Critical | High | Medium | Low | Info |
|--------|-------|-------|----------|------|--------|-----|------|
| A | Architectural (old 13, now superseded) | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 5 | 11 | 9 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| **Total** | | **80** | **1** | **11** | **22** | **30** | **16** |

### Most Dangerous Single Flaw: F15

An exception in `on_venue_event()` during the fill-feedback loop stops the
chain mid-apply. The ACK applied but the FILL didn't. Slot in `ENTRY_WORKING`
with no position. **No retry mechanism, no recovery path.** The slot is stuck
until manual intervention. Zero defense in depth: no try/except, no
undo, no validation that the slot reached a consistent state.

This is the single highest-impact E2E flaw because it requires no concurrency,
no race condition, no unusual market conditions — just a transient FFI error
during normal operation.

---

## PASS 4 — SYSTEMATIC DOMAIN SCANS (Config, Rust, Persistence, Lifecycle)

### Rust Kernel — Numeric & FSM Invariants

#### G1: EXIT_RESIDUAL action is entirely missing from Rust KernelCommandType

**File:** `_rust_kernel/src/lib.rs`

```rust
string_enum! {
    enum KernelCommandType {
        ENTER, EXIT, MARK_PRICE, RECONCILE, CONTROL, CANCEL,
    }
}
```

Six variants. **No `EXIT_RESIDUAL`.** If any caller submits an intent with `action = "EXIT_RESIDUAL"`, the string_enum deserializer fails — serde returns `INVALID_INTENT_PARSE`. Even if deserialization worked, there's no branch to handle residual-position cleanup. Any position with remaining size after partial exit legs has **no way to trigger a clean-up exit** via the intent system.

The Python `KernelCommandType` enum (contracts.py) does have `EXIT_RESIDUAL`, translated to `"EXIT_RESIDUAL"` string by `_intent_to_payload`. This string hits Rust's string_enum → parse error → `INVALID_INTENT_PARSE`.

**Fix:** Add `EXIT_RESIDUAL` variant to Rust enum + match arm that skips the `NO_OPEN_POSITION` guard for residual-sized positions.

**Severity: Critical**

#### G2: `into_c_string` uses `unwrap()` — panics on interior NUL byte

**File:** `_rust_kernel/src/lib.rs:1477`

```rust
fn into_c_string(value: &str) -> *mut c_char {
    CString::new(value).unwrap().into_raw()
}
```

`CString::new()` returns `Err` if the string contains a NUL (`'\0'`) byte. `.unwrap()` panics at the C FFI boundary. If any `serde_json::to_string()` output (e.g., user-controlled string in `KernelIntent`, `VenueEvent`, or `TradeSlot`) contains a NUL byte, this **panics the entire process**.

Triggered by every FFI call that returns a string:
- `dita_kernel_process_intent_json`
- `dita_kernel_on_venue_event_json`
- `dita_kernel_reconcile_slots_json`
- `dita_kernel_snapshot_json`
- `dita_kernel_get_slot_json`

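**Fix:** Replace `.unwrap()` with a fallible conversion such as `CString::new(value).map(CString::into_raw).unwrap_or(ptr::null_mut())` (the Python caller must then treat a null pointer as an error), or feed the failure through `invalid_intent_cstring`.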

**Severity: Critical**

#### G3: `process_intent` EXIT hardcodes `prev_state = POSITION_OPEN` unconditionally

**File:** `_rust_kernel/src/lib.rs:842-890`

```rust
slot.fsm_state = TradeStage::EXIT_REQUESTED;        // unconditional override
let transition = self.transition(
    &slot,
    TradeStage::POSITION_OPEN,                        // always POSITION_OPEN
    slot.fsm_state.clone(),
    "EXIT_INTENT",
);
```

Three problems:

(a) **Transition prev_state is a lie.** If the slot was in `EXIT_WORKING`, `EXIT_SENT`, `EXIT_REQUESTED`, or `POSITION_PARTIALLY_CLOSED`, the transition record says `POSITION_OPEN` — wrong.

(b) **Backward transition.** If the slot is `EXIT_WORKING` and a new EXIT intent arrives, `fsm_state` is set to `EXIT_REQUESTED` — a backward transition from `EXIT_WORKING` → `EXIT_REQUESTED`. This corrupts the FSM.

(c) **No state guard.** EXIT should only be allowed from `POSITION_OPEN`, `EXIT_WORKING` (for additional legs), or `POSITION_PARTIALLY_CLOSED`. Currently any state that passes `!is_free() && !closed && size > 0` can transition to `EXIT_REQUESTED`.

**Fix:** Check actual FSM state before allowing EXIT, log actual prev_state, guard against backward transitions.

**Severity: Critical**

#### G4: `consume_exit_leg` advances beyond last valid index — stale `all_legs_done` variable

**File:** `_rust_kernel/src/lib.rs:1420-1435`

```rust
let all_legs_done = slot.active_leg_index >= slot.exit_leg_ratios.len(); // (A)
let should_close = (slot.size <= 1e-12 || (!partial && all_legs_done));  // (B)

if !partial {
    slot.consume_exit_leg();  // (C) — advances active_leg_index POST (A)
}

if should_close && slot.size <= 1e-12 {         // (D) — close
} else if !partial && !all_legs_done {           // (E) — stale! uses (A) not post-advance index
```

On the last leg (`active_leg_index = len - 1`):
- (A): `all_legs_done = false` (pre-advance)
- (C): advances to `len` (exhausted)
- (E): `!partial && !false` = true → enters `POSITION_OPEN` instead of examining `should_close` with post-advance index

The `all_legs_done` variable is captured **before** `consume_exit_leg` advances the index. Branch (E) should use the post-advance index to correctly detect exhaustion.

After exhaustion, `next_exit_ratio()` returns `1.0` (out-of-bounds `unwrap_or(1.0)`) — silently tries to exit remaining size as 100% instead of detecting completion.
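
The ordering problem can be reproduced in a few lines. The following is a simplified Python model of steps (A), (C), and (E) only; it ignores `slot.size` and partial fills, and the two function names are hypothetical. It is meant to show the effect of evaluating the flag before versus after the advance, not to mirror the Rust code exactly.

```python
def outcome_with_stale_flag(active_leg_index: int, n_legs: int) -> str:
    """Mirrors the excerpt: the flag is computed before the index advances."""
    all_legs_done = active_leg_index >= n_legs   # (A) pre-advance
    active_leg_index += 1                        # (C) consume_exit_leg()
    return "LEGS_EXHAUSTED" if all_legs_done else "POSITION_OPEN"  # (E)


def outcome_with_fresh_flag(active_leg_index: int, n_legs: int) -> str:
    """Same step, but exhaustion is evaluated after the advance."""
    active_leg_index += 1
    all_legs_done = active_leg_index >= n_legs
    return "LEGS_EXHAUSTED" if all_legs_done else "POSITION_OPEN"


n_legs = 3
last_leg = n_legs - 1
stale = outcome_with_stale_flag(last_leg, n_legs)
fresh = outcome_with_fresh_flag(last_leg, n_legs)
```

On the last leg the stale variant reports `POSITION_OPEN` while the fresh variant reports exhaustion; on earlier legs the two agree.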

**Severity: Critical**

#### G5: `realized_pnl` uses unbounded f64 — overflows to inf at extreme values

**File:** `_rust_kernel/src/lib.rs:648-656`

```rust
let notional = exit_size * slot.entry_price * slot.leverage.max(1.0);
delta * notional
```

No `is_finite()` check on intermediate products. At `exit_price=1e200`, `entry_price=1e-200`: `delta` = `(1e200 - 1e-200) / 1e-200` ≈ `1e400` → `inf`. The resulting `inf` is stored in `slot.realized_pnl`, corrupting all future PnL tracking.

Subnormals: `entry_price=5e-324` (the smallest positive subnormal) makes the division overflow to `inf` for any ordinary price difference, because even `1.0 / 5e-324` exceeds `f64::MAX`.

**Fix:** Add `is_finite()` guards on both prices and cap intermediate products.
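
Python floats are IEEE 754 doubles like Rust's `f64`, so the overflow and a possible guard can be illustrated in Python. This is a sketch only: `realized_pnl_checked` and its `side` parameter are hypothetical, and the formula is reconstructed from the excerpt above rather than copied from `lib.rs`.

```python
import math


def realized_pnl_checked(entry_price: float, exit_price: float,
                         exit_size: float, leverage: float,
                         side: float = 1.0) -> float:
    """Compute leg PnL, rejecting non-finite inputs and non-finite results."""
    inputs = (entry_price, exit_price, exit_size, leverage)
    if not all(math.isfinite(x) for x in inputs) or entry_price <= 0.0:
        raise ValueError(f"invalid PnL inputs: {inputs!r}")
    delta = side * (exit_price - entry_price) / entry_price
    notional = exit_size * entry_price * max(leverage, 1.0)
    pnl = delta * notional
    if not math.isfinite(pnl):  # guard the result, not just the inputs
        raise ValueError("PnL overflowed to a non-finite value")
    return pnl
```

The result check matters because every input in the `1e200` / `1e-200` example is itself finite; only the intermediate quotient overflows.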

**Severity: High**

#### G6: `mark_price` produces unbounded `unrealized_pnl`

**File:** `_rust_kernel/src/lib.rs:384-399`

```rust
self.unrealized_pnl = delta * self.size * self.entry_price * self.leverage;
// No is_finite() check on result
```

If any of `delta`, `size`, `entry_price`, or `leverage` is extreme, the product overflows to `inf`. No result guard. `inf` stored in `unrealized_pnl` forever. Capped only by the `price <= 0.0` guard on input — no guard on the computation chain.

Also: line 388 assigns `self.entry_price = price` whenever `mark_price` is called on a position whose `entry_price <= 0.0`. Backfilling a stale-zero entry price with the first mark after open is correct. The gap is slot reuse: if a re-entry does not reset `entry_price`, the old entry price from the prior trade bleeds into unrealized PnL.

**Severity: High**

#### G7: `process_intent` ENTER — no `is_finite()` guard on `target_size`

**File:** `_rust_kernel/src/lib.rs:806-807`

```rust
intended_size: intent.target_size.max(0.0),
```

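`f64::INFINITY.max(0.0)` returns `inf`, so the clamp does not stop infinite sizes. `f64::max` ignores NaN and returns the other operand, so `f64::NAN.max(0.0)` returns `0.0`: a NaN is masked as a zero-size order instead of being rejected. Stock `serde_json` rejects the bare tokens `NaN` and `Infinity` (they are not valid JSON, although Python's `json.dumps` emits them by default), so a non-finite value in the payload normally surfaces as a parse error rather than reaching this line. That leaves the Python-side `_first_invalid_intent_field` guard as the only explicit finiteness check, and F3 shows it can be bypassed. Any route that delivers a non-finite `target_size` to the kernel without strict JSON parsing puts `inf` into `intended_size` in `VenueOrder`, corrupting all fill calculations.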

Similarly, `reference_price` is never validated for finiteness before being stored in `VenueOrder.metadata`.

**Severity: High**

#### G8: `reconcile_slots_json` — no dedup or bounds validation

**File:** `_rust_kernel/src/lib.rs:1668-1675`

```rust
for slot in slots {
    if slot.slot_id < core.slots.len() {
        core.slots[slot.slot_id] = slot.clone();
    }
}
```

Two slots with the same `slot_id`: the **second overwrites the first** silently. A slot with `slot_id >= core.slots.len()`: **silently dropped** — no error, no diagnostic. Caller sees `accepted=true` even if some/all slots were not applied.

**Severity: High**

#### G9: `exchange_order_id` propagation uses wrong order target

**File:** `_rust_kernel/src/lib.rs:1110-1125`

```rust
let target = if slot.active_entry_order.is_some() {
    slot.active_entry_order.as_mut()
} else {
    slot.active_exit_order.as_mut()
};
```

If an **entry** order exists (even if fully filled) and an **exit** fill event arrives, the code updates the entry order's `venue_order_id` instead of the exit order's. The exit order's `venue_order_id` stays empty. Any subsequent `CANCEL` intent on the exit order fails because `active_exit_order.venue_order_id` is empty — the venue can't match the cancel.

**Fix:** Disambiguate by matching `venue_client_id`, or clear `active_entry_order` when entry is complete.

**Severity: High**

#### G10: CANCEL diagnostic code says NO_ACTIVE_EXIT_ORDER for entry cancel too

**File:** `_rust_kernel/src/lib.rs:966-1005`

```rust
if !has_cancellable_exit && !has_cancellable_entry {
    return KernelResult {
        diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER, // always says exit
        details: json!({"reason": "NO_ACTIVE_EXIT_ORDER"}),
    };
}
```

When neither exit nor entry is cancellable, the diagnostic returns `NO_ACTIVE_EXIT_ORDER` regardless of which order was the target. If the user wanted to cancel an entry order that's not in a cancellable state, the diagnostic is misleading.

**Fix:** Separate diagnostic codes: `NO_ACTIVE_EXIT_ORDER`, `NO_ACTIVE_ENTRY_ORDER`, `ENTRY_NOT_CANCELLABLE`.

**Severity: High**

#### G11: `apply_fill` entry-fill overwrites `active_entry_order.intended_size` with `slot.size`

**File:** `_rust_kernel/src/lib.rs:1363-1377`

On FULL_FILL entry, `slot.active_entry_order` is entirely replaced with a new `VenueOrder` where `intended_size = slot.size` (the fill amount) instead of the original intended size. The original intended size (which could be larger than fill size for partial fills) is lost.

If a duplicate fill event arrives (dedup fails due to missing event_id), the second fill would use `slot.size` as the basis for further fills — wrong values.

**Severity: Medium**

#### G12: `leverage` unbounded after `is_finite()` — no maximum cap

**File:** `_rust_kernel/src/lib.rs:778`

```rust
slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
    intent.leverage  // 1e100 accepted here
} else { 1.0 };
```

`leverage = 1e100` passes `is_finite()`. It feeds into `realized_pnl()` as `slot.leverage.max(1.0) = 1e100`, producing `notional = exit_size * entry_price * 1e100`, and into `mark_price` (G6), so both realized and unrealized PnL can become arbitrarily large.

No maximum leverage cap enforced anywhere — the exchange-level cap (`DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP`) exists in `BingxExecClientConfig` but is **never passed to the Rust kernel**.

**Severity: Medium**

#### G13: `resolve_slot` fallback returns `unwrap_or(0)` — can misroute events

**File:** `_rust_kernel/src/lib.rs:623`

```rust
self.slots.first().map(|slot| slot.slot_id).unwrap_or(0)
```

When no slot matches the event (`slot_id` out of range or all slot filters fail), returns `slot_id` of the **first slot** (which may be 0 or any value). No diagnostic emitted — caller sees slot state change with no idea the event was misrouted.

**Severity: Medium**

#### G14: `commit_slot` silently ignores out-of-bounds slot_id

**File:** `_rust_kernel/src/lib.rs:595-600`

```rust
fn commit_slot(&mut self, slot: TradeSlot) {
    if slot.slot_id < self.slots.len() {
        self.slots[slot_id] = slot;
    }
    // else: silently dropped — no error returned
}
```

Mutations to out-of-bounds slot are silently discarded. Can happen if `slot.slot_id` is corrupted via `set_slot_from_json` causing index mismatch between `slot.slot_id` and the actual slot position.

**Severity: Medium**

---

### Configuration & Validation Chain

#### G15: Zero `__post_init__` validators on all config dataclasses

Every config dataclass in the system has zero field-level validation:

| Dataclass | Fields | Validators |
|-----------|--------|------------|
| `KernelControlSnapshot` | 16 | **0** |
| `ControlUpdate` | 16 | **0** |
| `KernelIntent` | 19 | **0** |
| `TradeSlot` | 22 | **0** |
| `VenueOrder` | 8 | **0** |
| `VenueEvent` | 18 | **0** |
| `KernelTransition` | 11 | **0** |
| `KernelOutcome` | 8 | **0** |
| `AccountSnapshot` | 9 | **0** |
| **Total** | **127** | **0** |

The only validation in the entire chain:
- `_first_invalid_intent_field()` — finiteness guard at Python→Rust FFI boundary (not a dataclass validator)
- Rust `leverage = if is_finite && > 0.0 { val } else { 1.0 }` — post-hoc clamp
- Rust `KernelCore::new(max_slots.max(1))` — floor only, no ceiling
- `launcher.py:143`: `max(1, int(...))` for `active_slot_limit` — floor only

**No `__post_init__` exists anywhere. No bounds check on any field except the two floor-only guards.**
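
As an illustration of what field-level validation could look like, the sketch below adds a `__post_init__` to a small dataclass. `IntentFields` is a hypothetical three-field subset, not the real `KernelIntent`, and `MAX_LEVERAGE = 125.0` is an assumed cap chosen only for the example. It also covers the ratio-sum check discussed under G18.

```python
import math
from dataclasses import dataclass

MAX_LEVERAGE = 125.0  # assumed cap, for illustration only


@dataclass(frozen=True)
class IntentFields:
    """Hypothetical subset of KernelIntent with construction-time checks."""
    target_size: float
    leverage: float = 1.0
    exit_leg_ratios: tuple = (1.0,)

    def __post_init__(self):
        for name in ("target_size", "leverage"):
            value = getattr(self, name)
            if not isinstance(value, (int, float)) or not math.isfinite(value):
                raise ValueError(f"{name} must be a finite number, got {value!r}")
        if self.target_size < 0.0:
            raise ValueError("target_size must be non-negative")
        if not 0.0 < self.leverage <= MAX_LEVERAGE:
            raise ValueError(f"leverage must be in (0, {MAX_LEVERAGE}]")
        ratios = self.exit_leg_ratios
        if not ratios or any(not (0.0 < r <= 1.0) for r in ratios):
            raise ValueError("each exit leg ratio must be in (0, 1]")
        if not math.isclose(sum(ratios), 1.0, abs_tol=1e-9):
            raise ValueError("exit_leg_ratios must sum to 1.0")
```

Validating at construction means an invalid intent fails where it is created, before it can reach the FFI boundary.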

**Severity: High**

#### G16: `DITA_V2_DEBUG_CLICKHOUSE` defaults to `True` when env var is unset

**File:** `launcher.py:133`

```python
debug = _env_bool("DITA_V2_DEBUG_CLICKHOUSE", True)
```

`_env_bool` (launcher.py:75) returns `default` when the env var is unset. So `debug = True` by default. Every runtime writes debug traces to ClickHouse by default. `DITA_V2_DEBUG_CLICKHOUSE=False` is required to disable it.

This is not a bug per se, but it means debug ClickHouse writes are **on by default**, adding ~10 ClickHouse insertions per process_intent call (every transition + position state + trade event) that most production deployments may not want.

**Severity: Informational**

#### G17: String config fields have no charset/length validation — Zinc region injection risk

**File:** `control.py:31-53`, `real_zinc_plane.py:30`

`runtime_namespace`, `strategy_namespace`, `event_namespace`, `actor_name`, `exec_venue`, `data_venue`, `ledger_authority` are all free-form strings with no validation. They're used as:

1. **Zinc shared memory region names**: `self.prefix + "." + namespace + "." + kind` — an attacker-controlled namespace could collide with other processes' Zinc regions
2. **ClickHouse table names**: `DOLPHIN_BINGX_JOURNAL_STRATEGY` is used as a table suffix — SQL injection risk in ClickHouse journal
3. **Hazelcast map names**: Same injection risk via `event_namespace`
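
A conservative mitigation for all three uses is an allowlist check applied before a config string is interpolated into a region, table, or map name. The helper below is a sketch; `validated_identifier` is a hypothetical name, and the 64-character limit and character set are assumptions, not limits taken from Zinc, ClickHouse, or Hazelcast.

```python
import re

# Letters, digits, underscore; must not start with a digit; at most 64 chars.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def validated_identifier(value: str, field_name: str) -> str:
    """Return value unchanged if it is a safe identifier, else raise ValueError."""
    if not isinstance(value, str) or _IDENTIFIER.fullmatch(value) is None:
        raise ValueError(f"{field_name} is not a safe identifier: {value!r}")
    return value
```

Rejecting dots, semicolons, quotes, and whitespace outright is simpler to reason about than trying to escape them per backend.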

**Severity: Medium**

#### G18: `exit_leg_ratios` no sum-to-1 validation

`KernelIntent.exit_leg_ratios` and `TradeSlot.exit_leg_ratios` are tuple/list of floats. No validator ensures they sum to approximately 1.0. Ratios summing to 0.5 leave the position partially closed forever (residual can't be exited because `next_exit_ratio()` returns `1.0` after exhaustion, exiting 100% of remaining — which may exceed the intended residual).

**Severity: Low**

#### G19: `RealZincControlPlane.read()` has no sequence check — torn-read risk

**File:** `real_control_plane.py:88-94`

```python
def read(self):
    payload = _decode_packet(self.region.as_buffer())
    control = payload.get("control")
    if not isinstance(control, dict):
        return self._snapshot
    self._snapshot = KernelControlSnapshot(**control)
    return self._snapshot
```

The binary packet has a 64-bit sequence number but `read()` **never checks it**. Between the zero-write and packet-write in `_write_region`, a reader sees an empty buffer → `_decode_packet` fails → falls back to `self._snapshot` (stale). Between the packet-write and `struct.pack` header (order depends on implementation), a reader sees a partial write with wrong size → `_decode_packet` fails.

No checksum on the wire format: `struct.pack("!QQ", seq, len) + json_bytes`. A torn write produces garbage that `json.loads` may or may not parse successfully.
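
A reader can detect torn or stale packets if the header carries a checksum and the sequence number is actually compared. The sketch below assumes a modified wire format with a CRC32 appended to the header (`!QQI` instead of `!QQ`); that extension, and the names `encode_packet` and `decode_packet`, are hypothetical and not part of the current format.

```python
import json
import struct
import zlib

# seq, payload length, CRC32 of payload. The CRC field is an assumed extension.
HEADER = struct.Struct("!QQI")


def encode_packet(seq: int, payload: dict) -> bytes:
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(seq, len(body), zlib.crc32(body)) + body


def decode_packet(buf: bytes, last_seq: int):
    """Return (seq, payload), or None if torn, corrupt, or not newer than last_seq."""
    if len(buf) < HEADER.size:
        return None
    seq, length, crc = HEADER.unpack_from(buf)
    body = bytes(buf[HEADER.size:HEADER.size + length])
    if len(body) != length or zlib.crc32(body) != crc:
        return None  # partial write or garbage header
    if seq <= last_seq:
        return None  # nothing new since the last successful read
    return seq, json.loads(body)
```

A `None` result lets the caller keep its previous snapshot deliberately, instead of falling back to it as a side effect of a failed parse.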

**Severity: Low**

#### G20: `DOLPHIN_BINGX_JOURNAL_STRATEGY`/`_DB` — ClickHouse SQL injection risk

**File:** `launcher.py:202-203`

```python
"DOLPHIN_BINGX_JOURNAL_STRATEGY": os.environ.get("DOLPHIN_BINGX_JOURNAL_STRATEGY", ""),
"DOLPHIN_BINGX_JOURNAL_DB": os.environ.get("DOLPHIN_BINGX_JOURNAL_DB", ""),
```

These are used as ClickHouse table and database name suffixes in `pink_clickhouse.py`. An attacker who can set env vars can inject SQL via semicolons or quotes in the table name. ClickHouse supports `INSERT INTO db.table FORMAT JSONEachRow` — a table name like `positions; DROP TABLE ...;` could be destructive.

**Severity: Low** (requires env var control, which implies broader access)

---

### Persistence Schema Alignment

#### G21: `entry_price` used as `exit_price` in `trade_events` — data loss

**File:** `pink_clickhouse.py (outside workspace)`

The `_write_trade_event` function maps `entry_price` from `slot.to_dict()` to both the `entry_price` and `exit_price` columns. The actual exit fill price (available on the `VenueEvent` object) is **never written** to the `exit_price` column.

**Result:** Every `trade_events` row has `exit_price == entry_price`. The `exit_price` column is a dead column — always contains the entry price, never the actual fill.

**Severity: High** — data loss to DB for the most important trade metric.

#### G22: `active_leg_index` → `entry_bar` semantic mis-mapping

**File:** `pink_clickhouse.py (outside workspace)`

```python
"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
```

`active_leg_index` tracks the exit-leg-ratios cursor (which leg of a multi-leg exit we're on), not a bar count. The value `0` at position open and `1` after the first exit leg — neither value represents bars held. **The `entry_bar` column stores the wrong concept.**

**Severity: Medium** — column contains semantically meaningless data.

#### G23: `capital_before` arithmetic reconstruction absorbs cross-slot PnL

**File:** `pink_clickhouse.py (outside workspace)`

```python
capital_before = capital_after - pnl_leg
```

`capital_before` is reconstructed by subtracting the current leg's PnL from the current capital. In a multi-slot system, other slots' PnL changes between legs are absorbed into `capital_before`. The column is **always wrong** in multi-slot scenarios because `capital_after` reflects total PnL from all slots, not just the leg being recorded.

**Severity: Medium** — wrong `capital_before` for multi-slot trading.

#### G24: Recovery `trade_reconstruction` always has `trade_id=""`

**File:** `pink_clickhouse.py (outside workspace)`

The `persist_recovery_state` function passes `kernel.snapshot()["account"]` (an account dict with keys `capital, equity, realized_pnl, ...`) where a slot dict is expected. The `trade_id` key **does not exist** on the account dict. The `recovery_state` row always has `trade_id=""`.

**Severity: Medium** — recovery data is not associable with any trade.

#### G25: `seen_event_ids`, `exit_leg_ratios`, `VenueOrder`, `metadata` not in flat ClickHouse tables

These fields are:
- Present on the Python `TradeSlot` ✅
- Transmitted through Zinc shared memory ✅
- Stored in Hazelcast ✅
- Stored in ClickHouse `dita_kernel_debug` (full JSON) ✅
- **NOT extracted** into main ClickHouse flat tables `position_state`, `trade_events`, `trade_exit_legs` ❌

Data exists at the source, travels through the pipeline, hits the debug journal — but is lost in the main analytical tables.

**Severity: Low** (data exists in debug journal if needed for reconstruction)

#### G26: `_safe_float` silently converts NaN/None/Inf to 0.0

**File:** `utils.py:15`

```python
def _safe_float(v, default=0.0):
    try:
        f = float(v)
        if not math.isfinite(f):
            return default
        return f
    except (TypeError, ValueError, OverflowError):
        return default
```

Used in multiple ClickHouse writers. Silently converts `NaN`/`Inf`/parsing errors to `0.0`. No diagnostic emitted when a non-finite value reaches the persistence layer — data silently zeroed.
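
A variant that keeps the safe default but records each substitution might look like the sketch below. The function name `safe_float_logged`, the `field` parameter, and the logger name are assumptions for illustration.

```python
import logging
import math

log = logging.getLogger("pink.persistence")  # logger name is an assumption


def safe_float_logged(v, default=0.0, *, field="?"):
    """Like _safe_float, but emits a warning whenever the default is substituted."""
    try:
        f = float(v)
    except (TypeError, ValueError, OverflowError):
        log.warning("field %s: unparseable value %r replaced by %r", field, v, default)
        return default
    if not math.isfinite(f):
        log.warning("field %s: non-finite value %r replaced by %r", field, f, default)
        return default
    return f
```

The persisted value is unchanged, but a zeroed `exit_price` now leaves a trace that can be alerted on.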

**Severity: Low** (safe default but silent corruption)

---

### Lifecycle & Resource Management

#### G27: `build_launcher_bundle` has no exception safety — prior resources leak

**File:** `launcher.py:264-300`

```python
def build_launcher_bundle(...):
    control_plane = _build_control_plane(...)
    projection = build_projection(...)
    zinc_plane = _build_zinc_plane(...)
    venue = _build_venue(...)
    kernel = ExecutionKernel(...)  # ← if THIS fails, everything above leaks
```

If any step after the first raises, all previously built resources leak:
- `RealZincPlane` created → `_build_venue()` fails → 3 shared memory regions orphaned
- `RealZincControlPlane` created → `_build_zinc_plane()` fails → 1 shared memory region orphaned
- `BingxVenueAdapter` created → `ExecutionKernel.__init__()` fails → HTTP connection leaked

**No `try/finally` anywhere in the builder.** The init order is also optimized for forward construction, not backward cleanup.
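
The standard-library `contextlib.ExitStack` is one way to get exception-safe construction with reverse-order cleanup. The sketch below uses a hypothetical `Resource` class in place of the real planes, venue, and kernel, and `build_bundle` is an illustrative name, not the launcher's API.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical stand-in for a plane, venue, or kernel."""

    def __init__(self, name, log, fail=False):
        if fail:
            raise RuntimeError(f"{name} failed to build")
        self.name, self.log = name, log

    def close(self):
        self.log.append(f"closed {self.name}")


def build_bundle(log, fail_at=None):
    """Build resources in order; on failure, close those already built (LIFO)."""
    with ExitStack() as stack:
        built = {}
        for name in ("control_plane", "zinc_plane", "venue", "kernel"):
            resource = Resource(name, log, fail=(name == fail_at))
            stack.callback(resource.close)
            built[name] = resource
        # Success: hand the pending cleanups to the caller instead of running them.
        return built, stack.pop_all()
```

On success the caller owns the returned stack and calls its `close()` at shutdown; on failure the `with` block unwinds and closes whatever was already built.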

**Severity: High** — shared memory leak on any build failure.

#### G28: `RealZincPlane` and `RealZincControlPlane` have no `__del__`

When `close()` is not called (exception in builder, forgotten cleanup, GC during shutdown), the shared memory regions opened by `RealZincPlane` (3 regions) and `RealZincControlPlane` (1 region) are **orphaned on the OS**. They persist in `/dev/shm/` (or platform equivalent) until system reboot.

Python's `__del__` is unreliable (not called on SIGKILL, not called if the object is part of a cycle without a GC run), but its absence means even normal garbage collection can't clean up.

**Severity: High** — shared memory leaks.

#### G29: Zero signal handlers — no cleanup on SIGTERM/SIGINT

```bash
$ grep -rn "signal\|SIGTERM\|SIGINT\|atexit" *.py  # ZERO matches
```

When SIGTERM or SIGINT arrives:
1. On SIGTERM the default disposition terminates the process immediately; on SIGINT Python raises `KeyboardInterrupt`, which no code path catches to run cleanup
2. No `DITAv2LauncherBundle.close()` is called
3. No `ExecutionKernel.__del__` is called (CPython may run GC on normal exit but not reliably)
4. All shared memory (RealZincPlane, RealZincControlPlane) is orphaned
5. In-flight BingX HTTP calls are interrupted mid-stream
6. Rust kernel handle is leaked

**Severity: High**

#### G30: `ExecutionKernel` has no `close()` — relies on `__del__` for Rust handle cleanup

`ExecutionKernel` has `__del__` which calls `_get_rust().destroy(backend)`. No `close()` method. `DITAv2LauncherBundle.close()` never touches the kernel — the Rust handle is only freed by GC at unpredictable time.

If any code holds a stale `_backend` pointer, the handle dangles when GC runs. If `__del__` is suppressed (e.g., during interpreter shutdown with cyclic references), the Rust handle leaks permanently.

**Fix:** Add `close()` to `ExecutionKernel`, call it from `DITAv2LauncherBundle.close()`.
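
A sketch of an explicit, idempotent `close()` with context-manager support follows. `KernelHandle` is a hypothetical wrapper, and the `destroy` callable stands in for `_get_rust().destroy`; the real `ExecutionKernel` internals are not shown in this analysis.

```python
class KernelHandle:
    """Hypothetical owner of a native handle with a deterministic close()."""

    def __init__(self, destroy):
        self._destroy = destroy      # stand-in for the Rust destroy call
        self._handle = object()      # stand-in for the opaque Rust pointer

    def close(self):
        # Swap the handle out first so a second close() is a no-op.
        handle = getattr(self, "_handle", None)
        self._handle = None
        if handle is not None:
            self._destroy(handle)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def __del__(self):
        self.close()  # last-resort fallback only; callers should close explicitly
```

The same swap-then-release shape also addresses the double-close concern in G33 and gives G34 its `with` support.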

**Severity: High**

#### G31: `projection` (Hazelcast) never closed

`build_projection()` returns a `HazelcastProjection` which holds a Hazelcast client connection. No `close()` or `disconnect()` method exists on the projection, projector, or row writer. `DITAv2LauncherBundle.close()` doesn't touch the projection. The Hazelcast client connection leaks on shutdown.

**Severity: Medium**

#### G32: `_maybe_close()` only calls the first method found — `break` skips the second

**File:** `launcher.py:233-243`

```python
for method_name in ("close", "disconnect"):
    method = getattr(obj, method_name, None)
    if method is None:
        continue
    try:
        result = method()
    except TypeError:
        continue
    if inspect.isawaitable(result):
        try:
            asyncio.run(result)
        except RuntimeError:
            pass
    break  # ← ONLY calls the FIRST found method, never both
```

If an object has both `close()` and `disconnect()`, only `close()` is called. `disconnect()` is silently skipped. Also: `asyncio.run(result)` silently swallows `RuntimeError` when a running event loop exists — the coroutine is **never executed**.

Currently no object has both, but the pattern is fragile.
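
A less fragile variant calls every cleanup method that exists and refuses to drop an async result silently. This is a sketch under assumptions: `close_all` and `_await` are hypothetical names, and raising when a loop is already running is one possible policy, not the launcher's current behavior.

```python
import asyncio
import inspect


async def _await(awaitable):
    return await awaitable


def close_all(obj, method_names=("close", "disconnect")):
    """Call every cleanup method that exists on obj; return the names called."""
    called = []
    for name in method_names:
        method = getattr(obj, name, None)
        if not callable(method):
            continue
        result = method()
        if inspect.isawaitable(result):
            try:
                asyncio.get_running_loop()
            except RuntimeError:
                asyncio.run(_await(result))  # no loop is running in this thread
            else:
                if inspect.iscoroutine(result):
                    result.close()  # avoid a "never awaited" warning
                raise RuntimeError(f"{name}() is async; await it from the running loop")
        called.append(name)
    return called
```

Returning the list of methods actually called gives the caller something to log or assert on.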

**Severity: Low**

#### G33: `close()` is not idempotent for RealZinc components

`RealZincPlane.close()` and `RealZincControlPlane.close()` call their Zinc region's `close()` method. If called twice, the second call operates on an already-closed region and will likely crash inside Zinc's shared-memory code.

References are not nulled after close: `DITAv2LauncherBundle.close()` calls `_maybe_close()` on its components but never sets `self.venue`, `self.zinc_plane`, or `self.control_plane` to `None`. A double `close()` is therefore unsafe.

**Severity: Low**

#### G34: No context manager on `DITAv2LauncherBundle`

`DITAv2LauncherBundle` has no `__enter__`/`__exit__`. Users must manually call `close()`. No `with` pattern exists anywhere in the source for lifecycle management. No `__del__` fallback on the bundle either.

**Severity: Low** (ergonomic, not a leak source if caller follows the pattern)

#### G35: `BingxVenueAdapter.connect()` exists but is never called by the launcher

`BingxDirectExecutionAdapter` has a `connect()` method that initializes the lifetime HTTP client. `BingxVenueAdapter` has a `connect()` that calls `_call_backend("connect")`. Neither is called in `build_launcher_bundle()` or `_build_venue()`. If the adapter's `submit_intent()` relies on a connected client, it initializes lazily, so the explicit `connect()` path is dead code.

**Severity: Informational**

#### G36: Only one `try/finally` in the entire codebase

The only `try/finally` is `_RustKernelLib._take_string()` (rust_backend.py:140-143) which frees the Rust C string. All other resource management uses `try/except` with no `finally`.

No cleanup is guaranteed on exception:
- `build_launcher_bundle()` — no cleanup on failure
- `process_intent()` — no cleanup of partial slot state on venue event exception
- `on_venue_event()` — no cleanup on FFI failure
- `_set_slot()` — no cleanup on projection or Zinc write failure

**Severity: High** (across all layers)

---

## Pass 4 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType | Rust | **Critical** |
| G2 | `into_c_string` unwrap() panics on NUL byte | Rust | **Critical** |
| G3 | EXIT hardcodes prev_state=POSITION_OPEN, allows backward FSM transition | Rust | **Critical** |
| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** |
| G5 | `realized_pnl` unbounded f64 overflow to inf | Rust | **High** |
| G6 | `mark_price` unbounded unrealized_pnl — no result guard | Rust | **High** |
| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
| G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** |
| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
| G12 | No max leverage cap enforced by kernel | Rust | Medium |
| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
| G15 | Zero `__post_init__` validators on all config dataclasses | Config | **High** |
| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
| G17 | String config fields — Zinc region injection risk | Config | Medium |
| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
| G31 | Hazelcast projection never closed | Lifecycle | Medium |
| G32 | _maybe_close() break skips second method | Lifecycle | Low |
| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |

### Pass 4 Severity Distribution

| Severity | Count |
|----------|-------|
| **Critical** | 4 (G1, G2, G3, G4) |
| **High** | 13 (G5-G10, G15, G21, G27, G28, G29, G30, G36) |
| Medium | 9 (G11-G14, G17, G22, G23, G24, G31) |
| Low | 8 (G18, G19, G20, G25, G26, G32, G33, G34) |
| Info | 2 (G16, G35) |

### Combined Catalog (All 4 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 13 | 9 | 8 | 2 |
| **Total** | | **116** | **5** | **23** | **30** | **40** | **18** |

---

## PASS 5 — EDGE DOMAINS (Dependencies, Error Handling, Types, Contracts)

### H1: No Python dependency declaration files exist in workspace

**Files:** workspace root

Zero `requirements.txt`, `setup.py`, `setup.cfg`, `pyproject.toml`, `Pipfile`, or `poetry.lock` anywhere. All Python package dependencies are entirely implicit — determined by what's installed in the runtime environment. No reproducible installs, no version pinning, no audit trail.

The Rust side does have `Cargo.toml` + `Cargo.lock` — but all 4 direct Rust deps use open ranges (`"0.4"`, `"0.2"`, `"1"`, `"1"`).

**Severity: Critical**

### H2: Rust kernel compiled from source on every cold start via subprocess

**File:** `rust_backend.py:60-72`

```python
def _ensure_library() -> Path:
    path = _library_path()
    if not path.exists():
        _build_library()  # cargo build --release
    return path

def _build_library():
    subprocess.run(
        ["cargo", "build", "--release", ...],
        check=True,        # no timeout!
    )
```

First load takes 3-10 minutes (Rust compilation). Requires Rust toolchain in production. `subprocess.run()` has no `timeout=` — if `cargo` hangs (network, disk, lock contention), the Python process hangs indefinitely. No prebuilt binary distribution.
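
Bounding the build is a small change on its own, independent of shipping a prebuilt binary. The sketch below wraps `subprocess.run` with a timeout and converts both failure modes into one error type; `run_build` is a hypothetical helper, and the command and timeout are supplied by the caller rather than taken from `rust_backend.py`.

```python
import subprocess
import sys


def run_build(cmd, timeout_s):
    """Run a build command, failing fast on a hang or a non-zero exit code."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"build timed out after {timeout_s}s") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(f"build failed with exit code {exc.returncode}") from exc


# Usage with a trivial command; a real call would pass the cargo invocation.
run_build([sys.executable, "-c", "pass"], timeout_s=30)
```

`subprocess.run` kills the child when the timeout expires, so a hung `cargo` no longer blocks the Python process indefinitely.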

**Severity: Critical**

### H3: Zero logging — every swallowed error is invisible

The entire codebase has zero use of Python's `logging` module, `print()`, or `warnings.warn()` for error reporting. Every `except: pass`, `except Exception: pass`, and `return default` silently discards the error. **There is no mechanism to detect, alert, or diagnose production failures.**

All `try/except: pass` sites found:

| # | File:Line | What's Hidden |
|---|-----------|---------------|
| 1 | `bingx_venue.py:51` | `float()` conversion failure on any API field value |
| 2 | `bingx_venue.py:133` | regex match failure in rate-limit parsing |
| 3 | `bingx_venue.py:136` | int/float conversion of retry_after |
| 4 | `bingx_venue.py:325` | slot lookup failure during cancel asset resolution |
| 5 | `bingx_venue.py:350` | BingXHttpError in cancel — network error looks like rejection |
| 6 | `control.py:213` | RealZincControlPlane construction failure |
| 7 | `launcher.py:187` | RealZincPlane construction failure |
| 8 | `launcher.py:119` | malformed env var for active_slot_limit |
| 9 | `launcher.py:243` | asyncio.run() RuntimeError in _maybe_close |
| 10 | `launcher.py:277` | RealZincControlPlane fallback in build_control_plane |
| 11 | `real_control_plane.py:97` | region.wait() exception — timeout and error both return False |
| 12 | `real_control_plane.py:112` | region.notify() exception — writer thinks broadcast succeeded |
| 13 | `real_zinc_plane.py:31` | Zinc SharedRegion import failure |
| 14 | `projection.py:87` | HazelcastRowWriter import failure |
| 15 | `rust_backend.py:102` | __del__ exception in Rust kernel destroy |
| 16 | `bingx_venue.py:55` | `_row_float` tries 5+ key fallbacks, each failing silently |

**Severity: Critical**

### H4: `_row_float` rejects zero as a valid value — `or` pattern treats 0 as missing

**File:** `bingx_venue.py:47-55`

```python
def _row_float(row, *keys, default=0.0):
    for key in keys:
        try:
            value = float(row.get(key) or 0.0)  # `or 0.0` treats 0 as missing
        except Exception:
            continue
        if value == value and value not in (float("inf"), float("-inf")) and value != 0.0:
            return value                         # explicitly rejects 0.0
    return default
```

Two bugs: (a) `except Exception: continue` swallows ALL conversion errors, and (b) `value != 0.0` explicitly rejects zero as a valid return value. A legitimate zero price, zero filled quantity, or zero position amount causes `_row_float` to skip that key and search further. If ALL keys return 0, the default `0.0` is returned — indistinguishable from "none of the keys existed."

Called by every single BingX API response parser: `_position_qty()`, `_position_price()`, `_venue_order_from_row()`, `_event_from_row()`, `_fill_event_from_row()`, `_events_from_submit()`, `_events_from_cancel()`, `_filled_size_from_snapshots()`. None of them can tell whether a returned 0.0 is a real zero or a missing field.
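
One possible shape for a stricter helper is sketched below: it returns `None` for "missing" and accepts `0.0` as a legitimate value. The name `row_float_strict` and the example key `positionAmt` are assumptions for illustration, not names taken from the codebase.

```python
import math
from typing import Any, Mapping, Optional


def row_float_strict(row: Mapping[str, Any], *keys: str) -> Optional[float]:
    """Return the first finite float found under `keys`, or None if none is usable.

    Unlike the original helper, 0.0 is a valid result and None means "missing".
    """
    for key in keys:
        raw = row.get(key)
        if raw is None or raw == "":
            continue  # key absent or empty: try the next alias
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # unparseable: a real implementation would log this
        if math.isfinite(value):
            return value  # zero is returned like any other finite value
    return None


# "positionAmt" is a hypothetical field name used only for this example.
qty = row_float_strict({"positionAmt": "0"}, "positionAmt")
```

Callers then have to handle `None` explicitly, which forces the missing-versus-zero decision to be made at each call site instead of being hidden in the helper.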

**Severity: High**

### H5: `_backend_snapshot` timeout returns stale data with no signal to callers

**File:** `bingx_venue.py:242-251`

```python
def _backend_snapshot(self, *, timeout_ms=5000.0):
    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
        with self._snap_lock:
            return self._last_snapshot    # STALE — could be hours old
```

When the snapshot-fetch condition times out, returns `self._last_snapshot` — initialized to `None` and only updated on successful fetches. First timeout returns `None`. All callers (`cancel()`, `open_orders()`, `open_positions()`, `reconcile()`, `submit()`) access `.open_orders`, `.open_positions` immediately — crash with `AttributeError: 'NoneType' object has no attribute 'open_orders'`.

Even after the first fetch succeeds, subsequent timeouts return the last-good snapshot which could be arbitrarily stale. No caller timestamps, version-checks, or requests a refresh.
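
A hedged sketch of one way to make staleness visible: stamp each snapshot with a monotonic timestamp and raise when nothing fresh enough is available. `SnapshotCache`, `SnapshotUnavailable`, and the `max_age_s` parameter are hypothetical names, and the readiness handling is simplified compared to the adapter's actual fetch loop.

```python
import threading
import time
from dataclasses import dataclass
from typing import Any, Optional


class SnapshotUnavailable(RuntimeError):
    """Raised when no snapshot exists or the cached one is too old."""


@dataclass
class _Stamped:
    value: Any
    fetched_at: float  # time.monotonic() at the moment of the fetch


class SnapshotCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._last: Optional[_Stamped] = None

    def publish(self, value: Any) -> None:
        """Called by the fetcher after a successful snapshot fetch."""
        with self._lock:
            self._last = _Stamped(value, time.monotonic())
        self._ready.set()

    def get(self, *, timeout_s: float = 5.0, max_age_s: float = 10.0) -> Any:
        """Return the snapshot, or raise instead of handing back None or stale data."""
        self._ready.wait(timeout=timeout_s)
        with self._lock:
            stamped = self._last
        if stamped is None:
            raise SnapshotUnavailable("no snapshot has been fetched yet")
        age = time.monotonic() - stamped.fetched_at
        if age > max_age_s:
            raise SnapshotUnavailable(f"snapshot is {age:.1f}s old")
        return stamped.value
```

With this shape, callers never see `None`, and the maximum tolerated age becomes an explicit per-call decision.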

**Severity: High**

### H6: All enum-from-raw-string sites crash on unknown value — zero fallback

**Files:** `rust_backend.py:250-386`, `real_zinc_plane.py:70-106`

Every site that reconstructs a Python enum from a string received from the Rust kernel:

```python
side=TradeSide(str(payload.get("side", TradeSide.FLAT.value)))
status=VenueOrderStatus(str(payload.get("status", VenueOrderStatus.NEW.value)))
fsm_state=TradeStage(str(payload.get("fsm_state", TradeStage.IDLE.value)))
kind=KernelEventKind(str(row.get("kind", KernelEventKind.ORDER_ACK.value)))
```

If the Rust kernel introduces a new enum variant (e.g., `TradeStage::ENTRY_REJECTED`) not in the Python `TradeStage` enum, `TradeStage("ENTRY_REJECTED")` raises `ValueError` with zero fallback. Crashes `_outcome_from_payload()` and takes down the kernel's event processing loop.

17 sites total across `rust_backend.py` and `real_zinc_plane.py`. No try/except, no mapping, no fallback on any of them.
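
A small tolerant parser could be shared by all of these sites. The sketch below assumes a sentinel member (`UNKNOWN`) is added to each enum, which is not present in the source; `TradeStage` here is a reduced stand-in, and `parse_enum` is a hypothetical helper name.

```python
import logging
from enum import Enum
from typing import Type, TypeVar

logger = logging.getLogger(__name__)
E = TypeVar("E", bound=Enum)


class TradeStage(Enum):  # reduced stand-in for the real enum
    IDLE = "IDLE"
    ENTRY_WORKING = "ENTRY_WORKING"
    UNKNOWN = "UNKNOWN"  # hypothetical sentinel member, not in the source


def parse_enum(enum_cls: Type[E], raw: object, fallback: E) -> E:
    """Map a raw string to an enum member, logging and falling back on unknowns."""
    try:
        return enum_cls(str(raw))
    except ValueError:
        logger.error("unknown %s value %r; using %s", enum_cls.__name__, raw, fallback)
        return fallback


stage = parse_enum(TradeStage, "ENTRY_REJECTED", TradeStage.UNKNOWN)
```

Whether an unknown variant should degrade to a sentinel or halt the affected slot is a policy decision; the point of the sketch is only that the decision is made in one place and leaves a log record.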

**Severity: High**

### H7: `_legacy_intent` reads `getattr(intent, "order_type", "MARKET")` — always defaults to MARKET

**File:** `bingx_venue.py:282-285`

```python
metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
```

`order_type` and `limit_price` are NOT fields on `KernelIntent` (contracts.py). They only exist in `intent.metadata` as `metadata["order_type"]` if set by the caller. `getattr(intent, "order_type", "MARKET")` checks the dataclass field — not the metadata dict — so it ALWAYS returns `"MARKET"`.

Even when the PINK runtime produces a LIMIT intent (LIMIT_DECISION → `metadata["order_type"] = "LIMIT"`), the legacy adapter converts it to MARKET because it reads the wrong source. Every LIMIT order is submitted as MARKET.

Similarly, `limit_price` is always `0.0` — any limit price from the metadata dict is lost.
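
A sketch of reading both values from the metadata dict instead of the dataclass attributes is shown below. `KernelIntent` is a reduced stand-in, the metadata key `limit_price` is assumed by analogy with `order_type`, and `order_type_and_price` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class KernelIntent:  # reduced stand-in; the real class lives in contracts.py
    intent_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def order_type_and_price(intent: KernelIntent) -> Tuple[str, float]:
    """Read order type and limit price from intent.metadata, not from attributes."""
    meta = intent.metadata or {}
    order_type = str(meta.get("order_type", "MARKET")).upper()
    limit_price = float(meta.get("limit_price") or 0.0)  # assumed metadata key
    if order_type == "LIMIT" and limit_price <= 0.0:
        # Refuse to silently downgrade a priceless LIMIT intent to MARKET.
        raise ValueError(f"LIMIT intent {intent.intent_id} has no positive limit_price")
    return order_type, limit_price


limit = order_type_and_price(
    KernelIntent("i2", {"order_type": "LIMIT", "limit_price": 101.5})
)
```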

**Severity: High**

### H8: `_venue_event_status_from_row` silently maps unknown venue status to ACKED

**File:** `bingx_venue.py:83-96`

```python
def _venue_event_status_from_row(status: str) -> VenueEventStatus:
    normalized = _normalize_status(status)
    # ... checks known statuses ...
    return VenueEventStatus.ACKED  # fallthrough for anything unknown
```

If BingX introduces a new status (`"SUSPENDED"`, `"PENDING_CANCEL"`, `"EXPIRED"`), it doesn't match any known mapping and silently returns `ACKED`. The kernel treats a suspended/cancelled/expired order as acknowledged — dangerous misclassification.
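
One way to avoid the silent fallthrough is an explicit lookup table with a distinct result for anything unrecognized. In the sketch below the `UNKNOWN` member and the status strings in `_KNOWN` are illustrative assumptions; they are not taken from the adapter or from the BingX API.

```python
from enum import Enum


class VenueEventStatus(Enum):  # reduced stand-in for the real enum
    ACKED = "ACKED"
    FILLED = "FILLED"
    CANCELED = "CANCELED"
    UNKNOWN = "UNKNOWN"  # hypothetical member, not in the source


# Illustrative mapping only; the real set of venue status strings may differ.
_KNOWN = {
    "NEW": VenueEventStatus.ACKED,
    "PENDING": VenueEventStatus.ACKED,
    "FILLED": VenueEventStatus.FILLED,
    "CANCELED": VenueEventStatus.CANCELED,
    "CANCELLED": VenueEventStatus.CANCELED,
}


def status_from_row(status: str) -> VenueEventStatus:
    """Map a venue status string; anything unrecognized becomes UNKNOWN, not ACKED."""
    normalized = str(status or "").strip().upper()
    return _KNOWN.get(normalized, VenueEventStatus.UNKNOWN)
```

Downstream code can then treat `UNKNOWN` as a signal to reconcile against the exchange instead of assuming the order is live.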

**Severity: High**

### H9: `RealZincPlane.write_slot()` — slot written to `slot_id >= slot_count` is invisible

**File:** `real_zinc_plane.py:206-210`

```python
def write_slot(self, slot):
    with self._lock:
        self._slot_cache[int(slot.slot_id)] = slot
        payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
```

`_slot_cache` is a plain dict — accepts any key. But `read_slots()` only reads 0..slot_count-1. Writing to `slot_id >= slot_count` stores the slot in the cache but it's **never serialized or read back**. No error.
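
A bounds check at write time would turn the silent loss into an immediate error. The sketch below uses a reduced stand-in class (`SlotCache`) with plain dicts as slots; it illustrates the guard only and omits locking and serialization.

```python
from typing import Dict, List


class SlotCache:  # reduced stand-in for the slot bookkeeping in RealZincPlane
    def __init__(self, slot_count: int) -> None:
        self._slot_count = slot_count
        self._slots: Dict[int, dict] = {}

    def write_slot(self, slot_id: int, slot: dict) -> None:
        sid = int(slot_id)
        if not 0 <= sid < self._slot_count:
            # Fail loudly instead of caching a slot that is never serialized.
            raise IndexError(f"slot_id {sid} outside 0..{self._slot_count - 1}")
        self._slots[sid] = slot

    def read_slots(self) -> List[dict]:
        # Only indices 0..slot_count-1 are ever read back.
        return [self._slots.get(i, {}) for i in range(self._slot_count)]
```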

**Severity: High**

### H10: `RealZincControlPlane.read()` has no atomicity with concurrent `update()`

**File:** `real_control_plane.py:70-77`

`_write_region()` zero-fills the buffer then writes the packet. If `read()` interleaves between zero-fill and write, it sees a partially-zeroed buffer → `_decode_packet` returns `{}` → returns stale `self._snapshot` with no observable error. No lock, no sequence check, no atomic read.

The same bug exists in `RealZincPlane.read_slots()` (real_zinc_plane.py:220-230) — reads shared memory while a concurrent `write_slot()` is in progress.
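
A seqlock-style protocol is one common way to detect torn reads on a single-writer region. The sketch below reuses the `!QQ` header layout but changes its meaning so that an odd sequence number marks a write in progress; that convention is an assumption, not the current wire format. It runs on a plain `bytearray` and ignores cross-process memory-ordering concerns, so treat it as an illustration of the check rather than a drop-in fix.

```python
import json
import struct

HEADER = struct.Struct("!QQ")  # (seq, size), same field layout as the packets above


def write_packet(buf: bytearray, payload: dict) -> None:
    body = json.dumps(payload).encode("utf-8")
    if len(body) > len(buf) - HEADER.size:
        raise ValueError("payload too large for region")
    seq, _ = HEADER.unpack_from(buf, 0)
    odd = seq + 1 if seq % 2 == 0 else seq + 2
    HEADER.pack_into(buf, 0, odd, 0)               # odd seq: write in progress
    buf[HEADER.size:HEADER.size + len(body)] = body
    HEADER.pack_into(buf, 0, odd + 1, len(body))   # even seq: contents stable


def read_packet(buf: bytearray, retries: int = 3):
    """Return (payload, seq), or None if no consistent read was possible."""
    for _ in range(retries):
        seq1, size = HEADER.unpack_from(buf, 0)
        if seq1 % 2 == 1:
            continue  # writer is mid-update; try again
        body = bytes(buf[HEADER.size:HEADER.size + size])
        seq2, _ = HEADER.unpack_from(buf, 0)
        if seq1 == seq2:  # header unchanged while the body was copied
            return (json.loads(body) if size else {}), seq1
    return None
```

Returning `None` on an inconsistent read lets the caller distinguish "could not read" from "region is empty", which the current `{}` result does not.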

**Severity: High**

### H11: `_RustKernelLib` lazily initialized with race condition

**File:** `rust_backend.py:187-190`

```python
_RUST: _RustKernelLib | None = None

def _get_rust():
    global _RUST
    if _RUST is None:
        _RUST = _RustKernelLib()  # no lock — two threads can both create
    return _RUST
```

No threading lock. Two concurrent calls to `_get_rust()` (possible via `BingxVenueAdapter`'s thread pool) can create two `_RustKernelLib` objects. The `_RustKernelLib()` constructor runs `_ensure_library()` which runs `subprocess.run(["cargo", "build", ...], check=True)` — concurrent `cargo build` can corrupt the build directory.

**Severity: High**

### H12: `ExecutionKernel.__del__` can deadlock or use-after-free

**File:** `rust_backend.py:527-531`

```python
def __del__(self):
    backend = getattr(self, "_backend", None)
    if backend is not None:
        try:
            _get_rust().destroy(backend)  # accesses module singleton
        except Exception:
            pass
```

`_get_rust()` accesses the module-level `_RUST` singleton, which may already be destroyed if the module's garbage collection runs before the instance's. The destroy call happens outside any lock — one thread's destructor could destroy the Rust kernel while another thread is still using it. Use-after-free.

**Severity: High**

### H13: `MirroredControlPlane` missing protocol methods

**File:** `control.py:171-184`

`ControlPlane` protocol defines `wait()` and `notify()`. `MirroredControlPlane` inherits from nothing and only implements `read()`, `update()`, and `mirror()`. Calling `plane.wait()` on a `MirroredControlPlane` raises `AttributeError`.

**Severity: Medium**

### H14: `TradeSlot.remaining_size()` and `VenueOrder.remaining_size()` — same name, different semantics

**Files:** `contracts.py:207-208`, `contracts.py:143-145`

```python
# TradeSlot:
def remaining_size(self) -> float:
    return max(0.0, float(self.size))  # open position size

# VenueOrder:
def remaining_size(self) -> float:
    return max(0.0, self.intended_size - self.filled_size)  # unfilled order qty
```

Same method name, completely different semantics. `TradeSlot.remaining_size()` returns the current open position size. `VenueOrder.remaining_size()` returns the unfilled order quantity. A caller using `slot.remaining_size()` to check whether an order is fully filled gets the position size, which does not track an order's fills; it changes with entry/exit.

**Severity: Medium**

### H15: `_maybe_close()` — `asyncio.run()` RuntimeError silently swallowed for coroutines

**File:** `launcher.py:233-243`

```python
if inspect.isawaitable(result):
    try:
        asyncio.run(result)
    except RuntimeError:
        pass  # SILENT — coroutine never executed
```

When `_maybe_close` is called from an async context (which it is: `DITAv2LauncherBundle.close()` is used in async test code), `asyncio.run()` raises `RuntimeError("asyncio.run() cannot be called from a running event loop")`. The exception is swallowed, the coroutine is never awaited, and the close/disconnect never happens.

Also: `break` after calling the first found method means if an object has both `close()` and `disconnect()`, `disconnect()` is never called.
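
A sketch of a close helper that works in both contexts is given below: without a running loop it runs the awaitable to completion, and inside a running loop it schedules it and hands the task back so the caller can await it. `close_result` and `_await` are hypothetical names.

```python
import asyncio
import inspect


async def _await(awaitable):
    """Wrap any awaitable in a coroutine so asyncio.run() accepts it."""
    return await awaitable


def close_result(result):
    """Run or schedule an awaitable close result; return a Task if scheduled."""
    if not inspect.isawaitable(result):
        return None  # synchronous close already finished
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        asyncio.run(_await(result))       # no loop running: run to completion
        return None
    return asyncio.ensure_future(result)  # loop running: caller should await this


closed = []


async def _close():
    closed.append("closed")


close_result(_close())  # called from synchronous code


async def _main():
    task = close_result(_close())  # called from inside a running loop
    await task


asyncio.run(_main())
```

The important change is that the in-loop case no longer discards the coroutine; the caller receives a task and is responsible for awaiting it.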

**Severity: Medium**

### H16: `_build_launcher_bundle` imports `BingxDirectExecutionAdapter` inside function — import-time side effect is safe but lazy loading masks errors

**File:** `launcher.py:254`

```python
def _build_venue(...):
    from prod.clean_arch.adapters.bingx_direct import BingxDirectExecutionAdapter
```

Import inside function — safe, lazy, no side effects. But if the `bingx_direct` module has an import error (missing dependency, version mismatch), it only surfaces at bundle construction time, not at process start. A misconfigured production deployment would fail on the first trade, not on boot.

**Severity: Informational**

### H17: `load_dotenv()` at module level — import-time filesystem I/O and env mutation

**File:** `launcher.py:49-51`

```python
load_dotenv(PROJECT_ROOT / ".env")  # executes on module import
```

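Runs on every import of `launcher.py`: reads the filesystem and mutates the process environment. This is hard to isolate in tests. Assuming this is python-dotenv's `load_dotenv` with its default `override=False`, variables already set by test setup are preserved, but any variable a test deliberately leaves unset is repopulated from `.env` at import time. Also: if `.env` doesn't exist, `load_dotenv()` silently does nothing, so missing config is invisible.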

**Severity: Medium**

### H18: `_run()` in `BingxVenueAdapter` — `asyncio.run()` thread-pool bridge blocks on every call

**File:** `bingx_venue.py:225-233`

```python
def _run(self, result):
    if inspect.isawaitable(result):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(result)
        pool = self._get_executor()
        return pool.submit(asyncio.run, result).result()  # BLOCKS
```

Every call to `_run()` that receives an awaitable blocks the calling thread via `.result()`. The BingX HTTP call inside `submit_intent()` can take 1-5 seconds. During this block, the event loop cannot process other tasks. In a single-runtime deployment, this stalls the entire policy cycle.

**Severity: Medium**

### H19: `HazelcastClientLike` protocol has zero concrete implementations in workspace

**File:** `hazelcast_projection.py:13-15`

```python
class HazelcastClientLike(Protocol):
    def get_map(self, name: str): ...
    def get_topic(self, name: str): ...
```

Used as a type hint. No code in the workspace creates an object that satisfies this protocol. The Hazelcast client comes from an external package. If the external API changes, the protocol silently drifts — no compilation check.

**Severity: Low**

### H20: `_decode_packet` in RealZinc — no bound check on `size` beyond `> len(buf)-16`

**Files:** `real_control_plane.py:50-52`, `real_zinc_plane.py:70-81`

```python
seq, size = struct.unpack_from("!QQ", buf, 0)
if size <= 0 or size > len(buf) - 16:
    return {}
payload = bytes(buf[16 : 16 + size]).decode("utf-8")  # can raise UnicodeDecodeError
out = json.loads(payload)  # can raise ValueError
```

If shared memory contains a corrupted `size` field within bounds, `.decode()` or `json.loads()` raises — uncaught by callers. A single corrupted byte in shared memory crashes the kernel.
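
A defensive variant of the decoder could catch both failure modes and log them. The sketch below keeps the `{}` return convention of the original and adds a minimum-length check; the function name `decode_packet` without the leading underscore marks it as illustrative.

```python
import json
import logging
import struct

logger = logging.getLogger(__name__)


def decode_packet(buf) -> dict:
    """Decode a (seq, size, json) packet; return {} on any malformed input."""
    if len(buf) < 16:
        return {}
    _seq, size = struct.unpack_from("!QQ", buf, 0)
    if size <= 0 or size > len(buf) - 16:
        return {}
    try:
        out = json.loads(bytes(buf[16:16 + size]).decode("utf-8"))
    except (UnicodeDecodeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so both cases land here.
        logger.warning("corrupted packet payload: %s", exc)
        return {}
    return out if isinstance(out, dict) else {}


body = b'{"k": 1}'
good = struct.pack("!QQ", 1, len(body)) + body
bad = struct.pack("!QQ", 1, 2) + b"\xff\xfe"
```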

**Severity: Low**

### H21: All Rust crate features enabled by default — `wasm-bindgen` compiled into native shared library

**File:** `_rust_kernel/Cargo.toml`, transitive through `chrono` → `iana-time-zone` → `js-sys` → `wasm-bindgen`

The Rust kernel is a native `.so`/`.dylib` but chrono's `iana-time-zone` pulls in `js-sys` and `wasm-bindgen` (WebAssembly support) even on native Linux. Larger binary, longer compile times. `cc` crate pulled in for `iana-time-zone-haiku` which only compiles on Haiku OS.

**Severity: Low**

### H22: `socket.getaddrinfo` monkey-patch in test generator code

**File:** `gen2.py:295-298`

Monkey-patches Python stdlib `socket.getaddrinfo` to force IPv4 as a workaround for IPv6 resolution failure in the deployment environment. If copied to production code, would break IPv6 connectivity.

**Severity: Low**

---

## Pass 5 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| H1 | No Python dependency files (requirements.txt, pyproject.toml, etc.) | Build | **Critical** |
| H2 | Rust kernel compiled from source on every cold start — no prebuilt binary | Build | **Critical** |
| H3 | Zero logging — 16+ silent except:pass sites, no error observability | All | **Critical** |
| H4 | `_row_float` rejects zero as valid, `except Exception: continue` swallows all | Venue | **High** |
| H5 | `_backend_snapshot` timeout returns stale data/None — callers crash | Venue | **High** |
| H6 | All enum-from-raw-string sites crash on unknown variant (17 sites) | Bridge | **High** |
| H7 | `_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET | Venue | **High** |
| H8 | Unknown venue status silently mapped to ACKED | Venue | **High** |
| H9 | `RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost | Zinc | **High** |
| H10 | `RealZincControlPlane.read()` no atomicity with concurrent `update()` | Control | **High** |
| H11 | `_RustKernelLib` lazy init with race condition — concurrent cargo build | Bridge | **High** |
| H12 | `ExecutionKernel.__del__` use-after-free on Rust handle | Bridge | **High** |
| H13 | `MirroredControlPlane` missing protocol methods (wait/notify) | Control | Medium |
| H14 | `TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics | Contracts | Medium |
| H15 | `_maybe_close` asyncio.run RuntimeError silently swallowed | Launcher | Medium |
| H16 | Lazy import of bingx_direct masks config errors until first trade | Build | Info |
| H17 | `load_dotenv()` at module level — import-time I/O side effect | Launcher | Medium |
| H18 | `_run()` blocks event loop on every HTTP call via thread pool | Venue | Medium |
| H19 | `HazelcastClientLike` protocol has zero concrete implementations | Projection | Low |
| H20 | `_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM | Zinc | Low |
| H21 | `wasm-bindgen` compiled into native library unnecessarily | Build | Low |
| H22 | `socket.getaddrinfo` monkey-patch in test code | Test | Low |

### Pass 5 Severity Distribution

| Severity | Count |
|----------|-------|
| **Critical** | 3 (H1, H2, H3) |
| **High** | 9 (H4-H12) |
| Medium | 5 (H13, H14, H15, H17, H18) |
| Low | 4 (H19, H20, H21, H22) |
| Info | 1 (H16) |

### Combined Catalog (All 5 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| **Total** | | **138** | **8** | **30** | **37** | **44** | **19** |

---

## PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY

### I1: Entry `apply_fill` sets `slot.size = fill_size` — multiple partial fills overwrite instead of accumulating

**File:** `_rust_kernel/src/lib.rs:798`

```rust
// Entry fill path in apply_fill:
slot.size = fill_size;          // DIRECT ASSIGNMENT
slot.initial_size = slot.initial_size.max(fill_size);  // max, not sum
```

If a single entry order receives multiple partial fills (e.g., LIMIT order on the book):
- Fill #1: `fill_size = 0.5` → `slot.size = 0.5`, `initial_size = max(0, 0.5) = 0.5`
- Fill #2: `fill_size = 0.3` → `slot.size = 0.3`, `initial_size = max(0.5, 0.3) = 0.5`

After both fills, the actual position is 0.8 but `slot.size` reports 0.3. The position is under-counted by 0.5 — 62.5% error.

The exit path correctly does `slot.size = (slot.size - fill_size).max(0.0)` (subtractive). The entry path should accumulate: `slot.size += fill_size`.

This only manifests with LIMIT orders that receive multiple partial fills over time, a scenario entirely absent from tests (I5, I6).
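
The two-fill example above can be replayed in a few lines. The sketch models the slot in Python purely for illustration (the real logic is in Rust) and assumes `fill_size` is the incremental size of each fill, as the example does. How `initial_size` should be tracked in the accumulating version is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Slot:  # reduced Python stand-in for the Rust TradeSlot
    size: float = 0.0
    initial_size: float = 0.0


def apply_entry_fill_overwrite(slot: Slot, fill_size: float) -> None:
    slot.size = fill_size  # behaviour described above: direct assignment
    slot.initial_size = max(slot.initial_size, fill_size)


def apply_entry_fill_accumulate(slot: Slot, fill_size: float) -> None:
    slot.size += fill_size  # proposed: sum the partial fills
    slot.initial_size = max(slot.initial_size, slot.size)  # assumption


buggy, fixed = Slot(), Slot()
for fill in (0.5, 0.3):
    apply_entry_fill_overwrite(buggy, fill)
    apply_entry_fill_accumulate(fixed, fill)

print(buggy.size, fixed.size)
```

After both fills the overwriting version holds the size of the last fill only, while the accumulating version holds the sum of the two.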

**Severity: Critical**

### I2: `exit_ratio = 0.0` creates zero-size exit order — slot stuck in EXIT_REQUESTED

**File:** `_rust_kernel/src/lib.rs:467-469`

```rust
let exit_ratio = slot.next_exit_ratio();         // returns 0.0 from exit_leg_ratios=[0.0, ...]
let base_size = if slot.initial_size > 0.0 { ... } else { slot.size };
let exit_size = (base_size * exit_ratio).max(0.0); // = 0.0
```

When `exit_leg_ratios` contains `0.0` in any position, `exit_size = 0.0`. The zero-size exit order is submitted to the venue (`intended_size = 0`). On the fill side, `realized_pnl()` returns 0.0 (guarded by `exit_size <= 0.0`), and `slot.size` is unchanged. The slot stays in `EXIT_REQUESTED` with no means to advance — the leg is consumed but nothing happened. Subsequent exits may eventually handle this, but the zero-size leg is a wasted FSM transition that leaves the slot in a confusing intermediate state.

Also: `NaN` in `exit_leg_ratios` (from `clamp(0.0, 1.0)` not guarding NaN, though serde_json rejects NaN) would produce the same zero-size exit behavior.

**Severity: Medium**

### I3: `entry_price` inconsistency — Python uses falsy check, Rust uses `<= 0.0`

**File:** `contracts.py:88-98` (Python), `_rust_kernel/src/lib.rs:227-228` (Rust)

```python
# Python TradeSlot.mark_price():
self.entry_price = self.entry_price or price   # falsy — keeps -0.5, 0.0 replaced

# Rust TradeSlot::mark_price():
if self.entry_price <= 0.0 { self.entry_price = price; }  // catches -0.5, replaces it
```

If `entry_price` is negative (possible only via `set_slot_json` direct injection — not from normal trading), Python keeps it and computes `unrealized_pnl` with wrong sign. Rust replaces it. The Python-side `mark_price` is only called from `ExecutionKernel.mark_price()` in rust_backend.py:LOW-1, which never writes back to the Rust kernel — so the Python-side calculation is purely local and the inconsistency has no effect on the Rust kernel's canonical state. However, the `observe_slots` call after `mark_price` re-reads from the Rust kernel, which recomputes PnL correctly. The Python-side mark_price is effectively wasted computation that never feeds back.

**Severity: Informational**

### I4: No Rust unit tests for 99% of kernel functionality

**File:** `_rust_kernel/src/lib.rs:1731-1765`

Only 1 Rust test exists: `enter_then_ack_fill` — creates a 2-slot kernel, submits ENTER, sends ACK, asserts state transitions. 

**Not tested in Rust:**
- EXIT, CANCEL, MARK_PRICE, RECONCILE, CONTROL actions
- Any FILL event (PARTIAL, FULL)
- CANCEL_ACK, CANCEL_REJECT, ORDER_REJECT
- RATE_LIMITED handling
- Multi-leg exits
- `consume_exit_leg` edge cases
- `realized_pnl()` formula with boundary values
- `mark_price()` with extreme values
- `resolve_slot()` fallback path
- `reconcile_slots_json` dedup/overflow
- Any C FFI boundary function
- Any serde deserialization failure
- Null pointer handling

No `#[cfg(test)]` module exists — the single test is inline. No Rust integration tests (`tests/` directory).

**Severity: High**

### I5: `MockVenueScenario` rejection flags exist but zero tests use them

**File:** `mock_venue.py:23-35`

```python
@dataclass
class MockVenueScenario:
    reject_entries: bool = False
    reject_exits: bool = False
    cancel_reject: bool = False
```

Three boolean flags to simulate venue rejection of orders. Not a single test in `test_flaws.py` sets any of them to `True`. The `ORDER_REJECT` handler in the Rust kernel's `on_venue_event` exists (lib.rs lines ~1440-1460) but is never exercised by any test.

Similarly, `entry_partial_fill_ratio` and `exit_partial_fill_ratio` exist on `MockVenueScenario` but only one test (`test_cancel_entry_with_partial_fill`) uses partial fills at all — and it only checks `size > 0`, not the full capital-accrual chain.

**Severity: High**

### I6: No LIMIT order test through the full kernel path

The test suite has zero LIMIT orders, and in practice every order the Rust kernel handles is MARKET. The generated live tests have `limit_does_not_fill` and `limit_immediate_fill` scenario placeholders, but:
- `limit_does_not_fill` uses `reference_price=0.0` (not a real LIMIT order)
- `limit_immediate_fill` uses `target_size=-0.001` (negative size → clamped to 0.0)

Neither scenario actually submits a LIMIT order with `order_type="LIMIT"` and a non-zero `limit_price`. The `_legacy_intent` bug (H7) would convert any LIMIT attempt to MARKET anyway.

The only LIMIT-related code is the Rust kernel's `if intent.order_type == "LIMIT"` branches (lib.rs:503, 1584), which are unreachable in practice: the Python `KernelIntent` has no `order_type` field (H7), so no `"LIMIT"` value ever reaches the kernel through serde.

**Severity: High**

### I7: Three weak/vacuous assertions in `test_flaws.py`

**File:** `test_flaws.py`

1. **Line 512:** `assert order.metadata.get("asset") is not None or order.metadata.get("slot_id") is not None` — mock venue always sets both, this can never fail.

2. **Line 700:** `test_pnl_warning_on_unsettled_reentry` — titled to assert a warning is raised but only checks `r.accepted`. Never checks `diagnostic_code` or verifies the warning was issued.

3. **Line 318:** `assert slot.active_entry_order is None or slot.active_entry_order.status == VenueOrderStatus.FILLED` — the `or` allows two different scenarios to pass, reducing diagnostic power.

**Severity: Low**

### I8: Entry fill `slot.size = fill_size` has no overfill guard

**File:** `_rust_kernel/src/lib.rs:798`

Already noted in I1 — entry fill sets `slot.size` directly to `fill_size`. Unlike exit fill which has `(slot.size - fill_size).max(0.0)`, there's no guard against entry overfill (venue fills more than the intended order size). For MARKET orders this is fine (one fill per order), but for LIMIT orders with multiple partial fills, the accumulated fill could exceed `initial_size`.

**Severity: Low** (only relevant with LIMIT + partial fills, which don't exist in the codebase)

### I9: No crash durability — slot state is purely in-memory until the Zinc write at the end of process_intent

**File:** `rust_backend.py:470-560`

The `process_intent` sequence:
1. validate → 2. Rust FSM → 3. venue.submit() → 4. on_venue_event() → 5. projection → 6. zinc_plane

If the process crashes between steps 2-5, the slot state accumulated in the Rust kernel's in-memory `KernelCore` is **completely lost**. The Rust kernel has no WAL, no journal, no persistent store. On restart, `ExecutionKernel.__init__` creates a fresh `KernelCore` with all slots IDLE.

The crash between step 3 and step 5 is the most dangerous: the exchange has an open order/position, but the kernel has no record of it. On restart:
- The Rust kernel sees `slot.fsm_state = IDLE`
- The Zinc slot cache may or may not have the pre-crash state (depends on timing)
- No code on restart loads Zinc state back into the Rust kernel (I14)
- The exchange order lives until it fills (unexpected position) or is manually cancelled

**Concrete example:** `venue.submit()` sends a POST to BingX and the order is placed. The HTTP response arrives and `on_venue_event(ORDER_ACK)` transitions the slot to `ENTRY_WORKING`. The process crashes between returning from `on_venue_event` and `zinc_plane.write_slot()`. On restart the slot is IDLE, there is no active entry order, and `_last_settled_pnl` is reset, while the exchange still has a live working order. The fresh kernel does not know that order exists, so the next `process_intent(ENTER)` is not rejected with `SLOT_BUSY`: the slot looks IDLE and a new ENTER is allowed. When the old order fills on the exchange, the result is a double position.

**Severity: Critical**

### I10: `seen_event_ids` lost on restart — events replayed after restart are double-processed

**File:** `_rust_kernel/src/lib.rs:672-683`

`seen_event_ids` is per-slot, per-`KernelCore` instance, and lives purely in process memory. On restart with a fresh `KernelCore`, every slot has `seen_event_ids = Vec::new()`. If events are replayed (from `pump_venue_events()` calling `venue.reconcile()` which re-fetches exchange state):

1. Original run: order fills → `FULL_FILL` with `event_id = "EV-00000042"` → processed, slot → `POSITION_OPEN`
2. Crash
3. Restart: fresh `KernelCore`, `seen_event_ids` empty
4. `pump_venue_events()` fetches same exchange state → new `VenueEvent` objects with new event IDs (adapter's `_event_seq` resets)
5. Rust kernel sees these as novel events — processes them again
6. Position is double-booked, PnL double-settled

The `bingx_venue._event_seq` is an instance-level `itertools.count()` starting from 1. On adapter restart, it resets — so the new event IDs won't match the old ones anyway. Dedup is fundamentally impossible across restarts.

**Severity: Critical**

### I11: No idempotency key (`newClientOrderId`) sent to BingX

**File:** `bingx_venue.py:282-285`, `bingx_direct.py` (external)

BingX supports `newClientOrderId` for order idempotency — sending the same ID twice returns the original order status instead of creating a duplicate. The DITAv2 kernel passes `intent.intent_id` as `decision_id` to the legacy adapter, but there's no guarantee this maps to `newClientOrderId` in the BingX payload.

If the HTTP POST to `/trade/order` times out before the response is read:
1. The order was placed on the exchange
2. `_call_backend` raises a `BingxHttpError` (or similar network exception)
3. `process_intent()` propagates the exception — no retry
4. Next cycle: caller may retry with a new `intent_id`
5. Second POST creates a **second order** on the exchange — duplicate position

Without a client-order-id that persists across retries, the system can create duplicate orders on network timeouts. The exchange has no way to deduplicate.
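
One way to obtain an ID that survives retries is to derive it deterministically from the intent. The sketch below hashes `intent_id` into a fixed-length token; the `pink` prefix and the 24-character digest length are arbitrary choices, and the venue's actual length and character-set limits are not checked here.

```python
import hashlib


def client_order_id(intent_id: str, prefix: str = "pink") -> str:
    """Derive a stable client order ID from an intent ID (hypothetical scheme)."""
    digest = hashlib.sha256(intent_id.encode("utf-8")).hexdigest()[:24]
    return f"{prefix}-{digest}"


first_attempt = client_order_id("intent-0001")
retry_attempt = client_order_id("intent-0001")  # same intent, same ID
```

This only helps if a retry after a timeout reuses the original `intent_id` instead of minting a new one, which is the opposite of step 4 above.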

**Severity: High**

### I12: No graceful degradation for ANY subsystem

Every subsystem failure mode examined:

| Subsystem | Failure | Current behavior |
|-----------|---------|-----------------|
| Zinc SHM init | Corrupted region, OOM | Silent fallback to InMemoryZincPlane (no operator signal) |
| Zinc SHM write | Region overflow, write error | Unhandled exception → kernel crashes |
| Hazelcast write | Cluster unavailable | `.put()` raises → unhandled exception → kernel crashes |
| ClickHouse journal | Sink failure | Exception propagates (no try/except in callers) |
| BingX HTTP | Timeout, rate limit | Exception or REJECTED → slot stuck in ORDER_REQUESTED |
| Rust kernel | Null pointer from FFI | `_take_string` raises RuntimeError → kernel crash |
| Memory pressure | OOM | Process killed by the OS. No signal handlers are installed. |

**No subsystem has a graceful degradation path.** No circuit breaker, no retry queue, no fallback to log-only mode, no offline/cached trading mode. Every failure (except the two init-time silent fallbacks) crashes the current kernel operation.

**Severity: High**

### I13: Stray venue event can reactivate a CLOSED slot — no guard

**File:** `_rust_kernel/src/lib.rs:625+`

The `on_venue_event` function has no guard for closed slots:

```rust
fn on_venue_event(&mut self, event: VenueEvent) -> KernelResult {
    // ... resolve slot, check duplicates ...
    // NO: if slot.closed { return ... }
    let prev_state = slot.fsm_state.clone();
    match event.kind {
        SOME_EVENT_KIND => { /* transitions regardless of closed state */ }
    }
}
```

If a stray venue event arrives for a CLOSED slot:
- `ORDER_ACK` → sets `ENTRY_WORKING` — slot re-opens from CLOSED
- `FULL_FILL` → `apply_fill` runs → `slot.size = fill_size`, `fsm_state = POSITION_OPEN`
- `ORDER_REJECT` → clears `trade_id`, `asset`, sets `IDLE` — actually benign reset

A CLOSED slot should be a terminal state that rejects all events. Currently only CANCEL_ACK and the `ORDER_REJECT` reset are harmless on a closed slot; `ORDER_ACK` and fill events can revive a dead position.

**Severity: High**

### I14: No `reconcile_from_slots` call on startup — Zinc state never loaded into Rust kernel

**Files:** `rust_backend.py:435-465` (init), `real_zinc_plane.py:95-115` (init)

On restart:
1. `RealZincPlane.__init__` reads state from Zinc shared memory into `_slot_cache`
2. `ExecutionKernel.__init__` creates fresh `KernelCore` — all slots IDLE
3. `KernelStateView(self)` reads from the fresh kernel
4. `account.observe_slots([self._get_slot(i) for i in range(max_slots)])` — all slots IDLE

Steps 3 and 4 read from the Rust kernel, NOT from Zinc. The Zinc `_slot_cache` populated in step 1 is **never loaded into the Rust kernel**. The `reconcile_on_restart` flag exists in `KernelControlSnapshot` (default `True`) but is never checked anywhere in `ExecutionKernel.__init__` or the launcher.

The system always starts with a blank state even when durable shared memory state exists.

**Severity: High**

### I15: CANCEL_REJECT doesn't clear `active_exit_order` — slot stuck in EXIT_WORKING

**File:** `_rust_kernel/src/lib.rs:1165-1175`

```rust
KernelEventKind::CANCEL_REJECT => {
    if slot.fsm_state == TradeStage::EXIT_WORKING {
        // stays EXIT_WORKING — no state transition
        // active_exit_order remains attached
    }
    diagnostic_code = KernelDiagnosticCode::CANCEL_REJECTED;
}
```

When the exchange rejects a cancel (typically because the order was already filled or no longer exists), the slot stays in `EXIT_WORKING` with `active_exit_order` still attached. Every subsequent CANCEL attempt hits the same path — the exchange returns "order not found," the kernel sees `CANCEL_REJECT`, and the slot is stuck forever.

If the order was already filled (CANCEL_REJECT means "can't cancel, no longer open"), the slot should check the actual position size and potentially transition to `POSITION_OPEN` or `CLOSED` depending on fill status.

**Severity: Medium**

### I16: Zinc shared memory — world-readable/writable by same-machine processes

**Files:** `real_control_plane.py`, `real_zinc_plane.py`

The Zinc shared memory regions are created with these names:
```python
self.region_name = f"{base}_intent"       # e.g., "dita_v2_intent"
self.state_name = f"{base}_state"          # "dita_v2_state"
self.control_name = f"{base}_control"      # "dita_v2_control"
```

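Region names are predictable (prefix defaults to `"dita_v2"`). The `SharedRegion` uses POSIX `shm_open`; the effective permissions are the requested mode masked by the umask (typically `0644` or `0600`). Depending on those permissions, any process running as the same user (and, for reads under `0644`, any local user) can: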
- **Read**: Open the region → `as_buffer()` → `_decode_packet()` → read all slot state, PnL, open orders, control settings
- **Write**: Open the region → forge a packet (`struct.pack("!QQ", seq, len) + json_bytes`) → overwrite slot state, inject fake intents, modify control plane

No access control, no encryption, no integrity check (HMAC/signature) on the wire format. The sequence number is the only ordering mechanism, and it's trivially predictable.
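
An integrity tag is one way to make forged packets detectable. The sketch below appends an HMAC-SHA256 tag over the existing `(seq, size, json)` layout; the function names `seal` and `open_sealed` and the trailing-tag layout are assumptions. Key distribution and replay protection are out of scope here.

```python
import hashlib
import hmac
import json
import struct

TAG_LEN = 32  # SHA-256 digest size in bytes


def seal(seq: int, payload: dict, key: bytes) -> bytes:
    """Build header + body + HMAC tag (hypothetical layout)."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    header = struct.pack("!QQ", seq, len(body))
    tag = hmac.new(key, header + body, hashlib.sha256).digest()
    return header + body + tag


def open_sealed(packet: bytes, key: bytes) -> dict:
    """Verify the tag before decoding; raise ValueError on any mismatch."""
    if len(packet) < 16 + TAG_LEN:
        raise ValueError("truncated packet")
    _seq, size = struct.unpack_from("!QQ", packet, 0)
    body_end = 16 + size
    if body_end + TAG_LEN > len(packet):
        raise ValueError("truncated packet")
    expected = hmac.new(key, packet[:body_end], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, packet[body_end:body_end + TAG_LEN]):
        raise ValueError("bad packet signature")
    return json.loads(packet[16:body_end])


key = b"k" * 32  # placeholder key for the example only
packet = seal(7, {"slot": 0}, key)
```

A process without the key can still read the region, so this addresses forgery only; confidentiality would need restrictive permissions or encryption in addition.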

**Severity: High**

### I17: `KernelSlotView` exposes full slot state via unrestricted `__getattr__`/`__setattr__`

**File:** `rust_backend.py:411-460`

```python
class KernelSlotView:
    def __getattr__(self, name):
        slot = self._snapshot()
        return getattr(slot, name)         # read ANY field

    def __setattr__(self, name, value):
        slot = self._snapshot()            # fetch current slot state
        setattr(slot, name, value)
        self._kernel._set_slot(slot)       # write ANY field — bypasses FSM
```

Any code with a `KernelSlotView` reference can:
- Read all slot fields: `trade_id`, `size`, `entry_price`, `unrealized_pnl`, `realized_pnl`, `seen_event_ids`, `metadata`
- Write all slot fields: `slot_view.realized_pnl = -9999999` — directly manipulates PnL figures flowing into capital settlement

The `_set_slot` call writes through to the Rust kernel without any FSM validation. The entire kernel state is exposed through mutable Python objects with zero access control.

**Severity: High**

### I18: `sys.path.insert(0, ...)` at import time in two production files and three test/generator scripts

**Files:** `real_control_plane.py:14`, `real_zinc_plane.py:22`, `test_flaws.py:13`, `_build_pink_bodies.py:2`, `_gen_test.py:3`

```python
# real_control_plane.py, real_zinc_plane.py — at MODULE LEVEL:
sys.path.insert(0, str(_ZINC_ADAPTER_PATH))

# test_flaws.py, _build_pink_bodies.py, _gen_test.py — at MODULE LEVEL:
sys.path.insert(0, '/mnt/dolphinng5_predict')
```

`sys.path.insert(0, ...)` gives the injected path highest import priority. An attacker with filesystem write access to the inserted path can create a malicious module that shadows a legitimate import (e.g., `zinc.py`, `utils.py`, `typing.py`). When any subsequent `from X import Y` runs, the attacker's module loads with the full privileges of the kernel process.

The production files use a relative path resolution (`Path(__file__).resolve().parents[3] / "zinc" / "adapters" / "python"`), while the test files use a hardcoded absolute path (`'/mnt/dolphinng5_predict'`). Both patterns are dangerous.

**Severity: High**

### I19: `pump_venue_events` re-fetches exchange state that can produce phantom position events

**File:** `bingx_venue.py:395-415`

`reconcile()` calls `_backend_snapshot()`, which fetches current positions and open orders from the exchange. The `_events_from_snapshot` method diffs the current snapshot against the last-known snapshot to produce events:

```python
def _events_from_snapshot(self, before, after):
    for symbol, current_pos in after.open_positions.items():
        prev_pos = before.open_positions.get(symbol)
        if current_pos and (not prev_pos or abs(prev_pos.position_amount) < 1e-12):
            # This looks like a new position — emit event
```

If `before` is stale (from `_backend_snapshot` timeout), the diff can produce spurious events. A position that existed before the crash is absent from the stale snapshot → the diff sees it as "new" → emits an entry fill event → Rust kernel processes it as a fresh enter → double position. This compounds with I10 (seen_event_ids lost on restart).

**Severity: High**

### I20: `exit_leg_ratios` no guard against empty list — `next_exit_ratio` returns 1.0

**File:** `contracts.py:196-198`

```python
def next_exit_ratio(self) -> float:
    if self.active_leg_index < len(self.exit_leg_ratios):
        return self.exit_leg_ratios[self.active_leg_index]
    return 1.0
```

The dataclass default is `(1.0,)`, so an empty `exit_leg_ratios` arises only when a caller passes one explicitly. In that case `next_exit_ratio()` returns `1.0`, which is the same as "exit everything". `consume_exit_leg` then advances `active_leg_index` to `min(1, 1) = 1`, and the Rust check `all_legs_done = active_leg_index >= exit_leg_ratios.len()` evaluates to `1 >= 0`, which is true, so the slot closes. The Rust side handles the empty-ratios edge case with `unwrap_or(1.0)`. The resulting behavior happens to be correct, but it is undocumented.

**Severity: Informational**

### I21: No test for rate-limited events — `RATE_LIMITED` kernel path is dead code

**File:** `_rust_kernel/src/lib.rs` (event handler), `MockVenueScenario.mock_venue.py` (no rate_limit flag)

The Rust kernel has a handler for `KernelEventKind::RATE_LIMITED` (lib.rs lines ~1480-1500). The event flows through the Python bridge's `process_intent()` rate-limit detection (rust_backend.py:585-593). But `MockVenueScenario` has no flag to emit rate-limited events. The only path to trigger `RATE_LIMITED` is from the real BingX adapter — which requires live exchange connectivity.

The entire RATE_LIMITED code path — in both Python and Rust — is untested in CI. Any bug in this path only surfaces in production under rate-limit conditions.

**Severity: Medium**

### I22: Thread pool for `_run` — `max_workers=3` shared across ALL adapter instances

**File:** `bingx_venue.py:236-245`

```python
@classmethod
def _get_executor(cls):
    if cls._EXECUTOR is None:
        with cls._EXECUTOR_LOCK:
            if cls._EXECUTOR is None:
                cls._EXECUTOR = ThreadPoolExecutor(max_workers=3, ...)
    return cls._EXECUTOR
```

The executor is a class-level singleton: all `BingxVenueAdapter` instances share the same 3-thread pool. With the runtime's `step()` calling `submit()` (1 thread) + `_backend_snapshot` (potentially another thread for open orders) + `cancel()` (1 thread in parallel), all 3 threads are consumed. A fourth concurrent call queues behind them, and its caller blocks at `.result()` until a worker frees up. If the in-flight calls hang, the caller blocks indefinitely, freezing the entire event loop.

The pool is never shut down. If a `BingxVenueAdapter` is destroyed, the threads remain running (zombie workers). No `close()`/`disconnect()` path shuts down the executor.

**Severity: Medium**

---

## Pass 6 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| I1 | Entry `apply_fill` multiple partial fills overwrite size instead of accumulating | Rust | **Critical** |
| I2 | Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED | Rust | Medium |
| I3 | entry_price inconsistency — Python falsy vs Rust `<= 0.0` gate | Bridge | Info |
| I4 | Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer | Rust | **High** |
| I5 | MockVenueScenario rejection flags exist but zero tests use them | Test | **High** |
| I6 | No LIMIT order test through full kernel path | Test | **High** |
| I7 | Three weak/vacuous assertions in test_flaws.py | Test | Low |
| I8 | Entry overfill no guard | Rust | Low |
| I9 | No crash durability — slot state pure in-memory until step 7 of process_intent | Bridge | **Critical** |
| I10 | seen_event_ids lost on restart — events double-processed | Rust | **Critical** |
| I11 | No idempotency key sent to BingX — lost response creates duplicate orders | Venue | **High** |
| I12 | No graceful degradation for ANY subsystem | All | **High** |
| I13 | Stray venue event can reactivate CLOSED slot — no guard | Rust | **High** |
| I14 | No reconcile_from_slots call on startup — Zinc state never loaded into kernel | Restart | **High** |
| I15 | CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING | Rust | Medium |
| I16 | Zinc shared memory world-readable/writable by same-machine processes | Zinc | **High** |
| I17 | KernelSlotView unrestricted getattr/setattr — bypasses all FSM guards | Bridge | **High** |
| I18 | sys.path.insert(0) at import time in 2 production files and 3 test/generator scripts: malicious module loading | Build | **High** |
| I19 | pump_venue_events stale snapshot diff produces phantom position events | Venue | **High** |
| I20 | exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented) | Contracts | Info |
| I21 | RATE_LIMITED code path in both Python and Rust is completely untested | All | Medium |
| I22 | Thread pool max_workers=3 shared across all adapter instances — never shut down | Venue | Medium |

### Pass 6 Severity Distribution

| Severity | Count |
|----------|-------|
| **Critical** | 3 (I1, I9, I10) |
| **High** | 11 (I4, I5, I6, I11, I12, I13, I14, I16, I17, I18, I19) |
| Medium | 4 (I2, I15, I21, I22) |
| Low | 2 (I7, I8) |
| Info | 2 (I3, I20) |

### Combined Catalog (All 6 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| **Total** | | **160** | **11** | **41** | **41** | **46** | **21** |

---

## PASS 7 — TEST INFRA, DATA FEED, RUST DEEPER, ENV PARSING, CONNECTIONS

### J1: Test `_flatten` helper submits wrong direction for LONG positions

**File:** `_build_pink_extended.py` (patch), `gen2.py:399-412`, `gen_live_tests.py:155-169`

Every instance of `_flatten` in the codebase submits a SHORT exit regardless of the actual position direction:

```python
def _flatten(k, symbol, price, label):
    _exit(k, symbol, price, slot_id=0)    # _exit creates a SHORT exit
    # ... if still not free, tries LONG exit
```

`_exit` calls `_si(k, EXIT, ..., "SHORT", ...)`. If the open position is LONG, this SHORT exit is actually an **enter short** — a new position opening, not a flatten. Only after the first attempt fails does it try a LONG exit. This can double the position instead of flattening it.

No test in the suite has ever hit this, because no test reaches the `_verify` step with a position open in the other direction (LONG). The code is nonetheless fundamentally wrong: it assumes all positions are SHORT.

**Severity: Medium**

### J2: Test `_check_slot_accounting` double-counts unrealized PnL

**File:** `_build_pink_extended.py` (patched into generated file)

```python
total_rp = sum(k.slot(i).realized_pnl for i in range(k.max_slots))
total_up = sum(k.slot(i).unrealized_pnl for i in range(k.max_slots))
expected = start_cap + total_rp + total_up
actual = k.account.snapshot.capital
assert abs(actual - expected) < 0.01
```

The test asserts the identity `capital = start_cap + Σrealized_pnl + Σunrealized_pnl`. The kernel's `account.snapshot.capital` is updated by `account.settle(realized_pnl)`, which adds realized PnL only, so capital does NOT include unrealized PnL: `capital = start_cap + Σrealized_pnl`.

The test adds unrealized PnL on top, so `expected = actual + Σunrealized_pnl`, and the assertion fails whenever the summed unrealized PnL reaches the `0.01` tolerance in absolute value. The test passes only when `unrealized ≈ 0`: for closed positions, or when `mark_price` has never been called (which is always the case, per J4).

**This assertion produces false failures for every test with an open position.** The only reason it doesn't trigger is that `mark_price` is never called, so `slot.unrealized_pnl` is always 0. Silent near-miss.
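
A small worked example of the two quantities involved, assuming (as described above) that settlement adds realized PnL only. The helper names `expected_capital` and `equity` and the numbers are illustrative, not taken from the test generators:

```python
def expected_capital(start_cap, realized_pnls):
    """Settled capital: starting capital plus realized PnL only."""
    return start_cap + sum(realized_pnls)


def equity(capital, unrealized_pnls):
    """Mark-to-market equity: settled capital plus open unrealized PnL."""
    return capital + sum(unrealized_pnls)


start_cap = 1000.0
realized = [10.0, 2.5]      # per-slot realized PnL
unrealized = [3.0, -1.0]    # per-slot unrealized PnL, nonzero once marking works
capital = 1012.5            # what settlement would report

# Corrected identity: capital is compared against realized PnL only.
assert abs(capital - expected_capital(start_cap, realized)) < 0.01

# The original identity also adds unrealized PnL and is off by its sum (2.0 here).
old_expected = start_cap + sum(realized) + sum(unrealized)
assert abs(capital - old_expected) >= 0.01
```

If the test also wants to cover open positions, comparing `equity(...)` against a separately reported equity figure keeps the two concepts apart.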

**Severity: Medium**

### J3: `_build_live_snapshot` uses `time.time()` (float) as timestamp — downstream expects datetime

**File:** `gen_live_tests.py:81`

```python
def _build_live_snapshot(client, symbol, interval=None):
    # ...
    return MarketSnapshot(
        timestamp=time.time(),   # ← float (Unix epoch seconds)
        ...
    )
```

By contrast, `gen2.py:352` and `_gen_test.py:138` correctly use `datetime.now(timezone.utc)` (a timezone-aware datetime). If any downstream code calls `.isoformat()` or `.strftime()` on the snapshot's timestamp, it crashes with `AttributeError: 'float' object has no attribute 'isoformat'`.

This function is used by the newer `_run` harness in the generated live-test file. Whether the crash manifests depends on what `MarketSnapshot` and `PinkDirectRuntime.step()` do with the timestamp field.
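
A sketch of a boundary normalizer that would make either input shape safe, assuming float inputs are Unix epoch seconds as in the excerpt above. `as_utc_datetime` is a hypothetical helper, not an existing function in the codebase:

```python
from datetime import datetime, timezone


def as_utc_datetime(ts):
    """Coerce epoch seconds or an aware datetime into an aware UTC datetime."""
    if isinstance(ts, datetime):
        if ts.tzinfo is None:
            # Refuse to guess the zone of a naive datetime.
            raise ValueError("naive datetime: timezone is ambiguous")
        return ts.astimezone(timezone.utc)
    if isinstance(ts, (int, float)):
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    raise TypeError(f"unsupported timestamp type: {type(ts).__name__}")
```

Calling this once where the snapshot is constructed keeps downstream `.isoformat()` calls valid regardless of which generator produced the snapshot.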

**Severity: High**

### J4: `ExecutionKernel.mark_price()` exists but is never called — no periodic mark-to-market

**File:** `rust_backend.py:667-672`

```python
def mark_price(self, asset: str, price: float) -> None:
    for slot in self.state.slots:
        if slot.asset == asset and slot.is_open():
            slot.mark_price(price)
    self.account.observe_slots(...)
```

This method exists on `ExecutionKernel` but has **zero callers** in the entire codebase. Unrealized PnL is never updated outside of `process_intent` and `on_venue_event` (which only compute realized PnL). The `slot.unrealized_pnl` field stays at its initial value (0) unless `mark_price` is called externally.

The `AccountProjection.observe_slots()` (account.py:53-66) reads `slot.unrealized_pnl` and reports it — but since nothing ever updates it, unrealized PnL is always 0 in the account snapshot.

This means the unrealized PnL figure reported by `kernel.snapshot()["account"]["unrealized_pnl"]` is **always zero for open positions**: the system has no live mark-to-market.

**Severity: High**

### J5: All VenueEvent timestamps use local machine clock, not exchange timestamp

**File:** `bingx_venue.py` (7 locations)

Every VenueEvent constructed in the venue adapter uses the local machine's clock:

```python
VenueEvent(
    timestamp=datetime.now(timezone.utc),  # local clock, not exchange
    ...
)
```

This includes:
- `_events_from_submit()` (lines 370, 390) — with `getattr(receipt, "timestamp", ...)` fallback that still uses local clock
- `_events_from_cancel()` (lines 455, 480)
- `_event_from_row()` (line 546)
- `_fill_event_from_row()` (line 570)

The exchange's HTTP response includes timestamps (`transactTime`, `updateTime`) that are authoritative. These are available in the raw response dict (stored in `raw_payload`) but are never extracted as the event timestamp. Clock skew between the local machine and the exchange is invisible — event timestamps may be ahead of or behind exchange time.

**Severity: Medium**

### J6: No monotonic timestamp verification anywhere in the system

No code path in the entire codebase checks whether a new timestamp is >= the previous one for the same asset/slot:

- `process_intent()` — no comparison between intent timestamp and slot's `last_event_time`
- `on_venue_event()` — no check that event timestamp >= previous events
- `TradeSlot.last_event_time` is stored but never validated for monotonicity
- `VenueEvent` timestamps from `pump_venue_events()` are never compared with event history

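With NTP clock adjustments or VM clock drift, timestamps can go backwards. Daylight saving time changes would matter only for naive local-time timestamps; the `VenueEvent` timestamps shown in J5 are UTC-aware. The system has no detection or guard.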

**Severity: Low**

### J7: `rebuild_indexes()` silently overwrites duplicate `trade_id` — last slot wins, first becomes invisible

**File:** `_rust_kernel/src/lib.rs:571-596`

```rust
fn rebuild_indexes(&mut self) {
    for slot in &self.slots {
        if !slot.trade_id.is_empty() {
            self.active_trade_index.insert(slot.trade_id.clone(), slot.slot_id);
            // ↑ HashMap::insert overwrites — no duplicate check
        }
    }
}
```

If two slots happen to have the same `trade_id` (not prevented by any invariant check), the index maps to the **last** slot with that trade_id. The first slot becomes invisible to `resolve_slot()`'s trade_id-based fallback. Any venue event for that trade_id with an unspecified or negative `slot_id` always resolves to the last slot.

The `process_intent` ENTER handler checks `slot.trade_id != intent.trade_id` to prevent overwriting a different trade on the same slot — but there's no global uniqueness check across all slots.
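
The missing invariant can be sketched as follows. This is written in Python for consistency with the other examples in this document; the real index lives in the Rust kernel, and `build_trade_index` with its `(slot_id, trade_id)` input shape is a hypothetical simplification:

```python
def build_trade_index(slots):
    """Build trade_id -> slot_id, refusing duplicates instead of overwriting.

    `slots` is an iterable of (slot_id, trade_id) pairs; empty trade_ids
    are skipped, mirroring the `is_empty()` check in the Rust excerpt.
    """
    index = {}
    for slot_id, trade_id in slots:
        if not trade_id:
            continue
        if trade_id in index:
            raise ValueError(
                f"trade_id {trade_id!r} held by slots {index[trade_id]} and {slot_id}"
            )
        index[trade_id] = slot_id
    return index
```

Whether a duplicate should abort the rebuild or only be reported is a policy decision; the point of the sketch is that the collision becomes visible.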

**Severity: High**

### J8: `resolve_slot()` falls back to slot 0 when all indexes miss — stray event corrupts slot 0

**File:** `_rust_kernel/src/lib.rs:606-622`

```rust
fn resolve_slot(&self, event: &VenueEvent) -> usize {
    // ... try by slot_id, trade_id, venue_order_id, client_order_id ...
    self.slots.first().map(|slot| slot.slot_id).unwrap_or(0)
}
```

When a venue event has:
- `slot_id = -1` (negative — can't be used as usize)
- Empty `trade_id` (trade not found on new kernel after restart)
- Empty `venue_order_id` and `venue_client_id`

...the event is routed to **slot 0** regardless of which slot it was intended for. If slot 0 is in the middle of a trade, the stray event (e.g., a stale ORDER_ACK from a pre-crash order) overwrites slot 0's state. Combined with I10 (seen_event_ids lost on restart), this is a concrete crash-recovery failure path.
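
A sketch of a resolver that reports "no match" instead of defaulting to slot 0. It is written in Python for illustration only; the dict-based event, the two index arguments, and the function signature are assumptions, and in Rust the analogous change would be an `Option<usize>` return type:

```python
from typing import Optional


def resolve_slot(event: dict, by_trade: dict, by_order: dict,
                 n_slots: int) -> Optional[int]:
    """Return the matching slot id, or None when every lookup misses."""
    slot_id = event.get("slot_id", -1)
    if isinstance(slot_id, int) and 0 <= slot_id < n_slots:
        return slot_id
    for key, index in (("trade_id", by_trade), ("venue_order_id", by_order)):
        value = event.get(key) or ""
        if value in index:
            return index[value]
    return None  # caller decides: drop, quarantine, or raise
```

The caller can then route unresolved events to a quarantine list rather than applying them to an unrelated slot.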

**Severity: High**

### J9: `dita_kernel_get_slot_json` and `dita_kernel_snapshot_json` return null with no diagnostic

**File:** `_rust_kernel/src/lib.rs` (FFI exports)

The intent/event processing paths (`process_intent_json`, `on_venue_event_json`) have **two layers** of error handling — parse errors produce a structured `invalid_intent_cstring()` diagnostic JSON, and serialization errors also produce diagnostics.

But the slot/snapshot read functions return bare null pointers:

```rust
// dita_kernel_get_slot_json (line 1608):
Err(_) => ptr::null_mut()    // ← no diagnostic

// dita_kernel_snapshot_json (line 1765):
Err(_) => ptr::null_mut()    // ← no diagnostic
```

The Python caller (`_RustKernelLib.get_slot_json`, line 164) checks `if not raw: raise IndexError(...)` — so null is caught, but the IndexError provides no detail about why it failed. If `snapshot()` returns null (serialization failure with f64 NaN/Inf in some slot), the Python code gets a bare IndexError or RuntimeError with no diagnostic.

**Severity: Medium**

### J10: Two processes with same `DITA_V2_PREFIX` corrupt shared Zinc memory

**File:** `real_zinc_plane.py:79-82`, `launcher.py:302`

```python
# launcher.py:
resolved_prefix = (prefix or os.environ.get("DITA_V2_PREFIX", "dita_v2")).strip() or "dita_v2"

# real_zinc_plane.py:
self.intent_name = f"{base}_intent"   # e.g., "dita_v2_intent"
self.state_name = f"{base}_state"      # "dita_v2_state"
self.control_name = f"{base}_control"  # "dita_v2_control"
```

Two processes on the same machine with the same prefix will:
1. Attach to the same named shared memory regions
2. Overwrite each other's slot state, intents, and control settings
3. Race on concurrent writes — last writer wins with no coordination
4. One process's `create=True` conflicts with another's — `SharedRegion.create()` may fail

There is no prefix uniqueness validation, no PID suffix, no UUID, no lock file, no access control. The prefix defaults to `"dita_v2"` — trivially guessable.

**Severity: High**

### J11: `load_dotenv()` only runs when `launcher.py` is imported — env vars unset for other module paths

**File:** `launcher.py:49-51`, `control.py:205`, `projection.py:71`

```python
# launcher.py (at module level):
load_dotenv(PROJECT_ROOT / ".env")  # only runs on `import launcher`

# control.py (at function call time):
raw = os.environ.get("DITA_V2_CONTROL_PLANE")  # reads env var

# projection.py (at function call time):
raw = os.environ.get("DITA_V2_HAZELCAST")      # reads env var
```

If any code imports `from .control import build_control_plane` directly (without first importing `launcher.py`), `load_dotenv()` has not run. The `.env` file is never loaded. Env vars that should have been set from `.env` are absent.

This creates an ordering dependency: module import order determines whether config files are loaded. Different import paths can produce different runtime behavior.

**Severity: Medium**

### J12: `BINGX_API_KEY`/`BINGX_SECRET_KEY` passed as `None` with no validation — fails at HTTP time

**File:** `launcher.py:195-196`

```python
api_key=os.environ.get("BINGX_API_KEY"),      # None if unset
secret_key=os.environ.get("BINGX_SECRET_KEY"), # None if unset
```

When keys are unset, `None` is passed to `BingxExecClientConfig` and then to the HTTP client. No validation occurs at config/build time. The system:

1. Successfully builds a full `DITAv2LauncherBundle` with `None` keys
2. Creates an `ExecutionKernel`
3. The first trade's `venue.submit()` call sends an HTTP request to BingX with empty auth
4. BingX returns 401 — cryptic `"signature verification failed"` error

This is a **late failure** — the operator has no indication of misconfiguration until the first trade attempt. Fast failure at launcher time would catch this.

Also: `gen_live_tests.py:116-117` and `gen2.py:320` use bracket access `os.environ["BINGX_API_KEY"]` which crashes with `KeyError` if the var is missing — an inconsistent pattern (crash immediately vs fail at HTTP time).
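
A minimal fail-fast sketch, assuming the launcher is the right place to validate. `require_env` is a hypothetical helper; the optional `environ` parameter exists only to make it testable without touching the real process environment:

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return a non-empty env var value or fail at configuration time."""
    env = os.environ if environ is None else environ
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; refusing to build the venue adapter"
        )
    return value
```

Using the same helper in the launcher and in the generators would also remove the inconsistency between `os.environ.get(...)` and bracket access.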

**Severity: Medium**

### J13: API credentials never masked in error messages or tracebacks

**File:** `launcher.py:195-196`, `bingx_venue.py` (through config object)

Credentials flow through:
1. `os.environ.get("BINGX_API_KEY")` → `BingxExecClientConfig(api_key=...)`
2. `BingxExecClientConfig` → `BingxDirectExecutionAdapter.__init__(config)`
3. Config object stored as Python attribute — accessible via `repr()`, `str()`, error tracebacks

No code masks, redacts, truncates, or otherwise protects the API key or secret key. If an exception propagates and the traceback includes the config object (through local variables, frame inspection, or exception chaining), the credentials are exposed in logs.

The generated live-test code also embeds credentials literally:
```python
client = BingxHttpClient(api_key="<ACTUAL_KEY>", secret="<ACTUAL_SECRET>", ...)
```

When test files are checked into version control (even temporarily), credentials are at risk.
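
One way to keep secrets out of `repr()` output is to exclude them from the generated representation. `VenueCredentials` below is a hypothetical container, not the existing `BingxExecClientConfig`:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VenueCredentials:
    """Hypothetical credential holder whose repr never includes secrets."""
    api_key: str = field(repr=False)
    secret_key: str = field(repr=False)

    def key_hint(self) -> str:
        """Short hint for logs: the last 4 characters of the API key only."""
        return "***" + self.api_key[-4:] if len(self.api_key) >= 12 else "***"
```

This covers `repr()` and `str()` of the object itself. It does not protect against code that formats the raw fields directly, so log call sites still need review.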

**Severity: High**

### J14: `_env_bool` treats empty-string var as `False` while unset returns `default` — inconsistent

**File:** `launcher.py:84-88`

```python
def _env_bool(name: str, default: bool = False) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default           # unset → uses caller's default
    return str(raw).strip().lower() in {"1", "true", "yes", "on"}
    # empty/whitespace → "" → False
```

| Env Var State | `_env_bool(name, True)` returns |
|---|---|
| Unset (key absent) | `True` (caller's default) |
| Set to `""` (empty) | `False` (empty not in truthy set) |
| Set to `"  "` (whitespace) | `False` |

Setting `DITA_V2_DEBUG_CLICKHOUSE=""` (intending "don't set, use default") actually forces it to `False`, **overriding** the default. And setting `DITA_V2_DEBUG_CLICKHOUSE=" "` (whitespace accidentally) does the same. The operator would need to know that empty and whitespace are treated as explicit falsy values, not as "unset."
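
A sketch of a stricter parser that treats empty and whitespace-only values as unset and rejects unknown tokens. The falsy token set and the `environ` parameter are assumptions added for illustration; the current `_env_bool` has neither:

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}   # assumed set, not in the original helper


def env_bool(name: str, default: bool = False, environ=None) -> bool:
    """Parse a boolean env var; empty or whitespace-only counts as unset."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    token = raw.strip().lower()
    if token in _TRUTHY:
        return True
    if token in _FALSY:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean")
```

Raising on unknown tokens is a design choice: it turns a typo such as `ture` into a startup error instead of a silent `False`.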

**Severity: Low**

### J15: `gen2.py` and `_gen_test.py` both write to the same output file — last writer wins

**Files:** `gen2.py` and `_gen_test.py`

Both generators write to:
```python
OUTPUT = "/mnt/dolphinng5_predict/prod/tests/test_pink_bingx_dita_live_e2e.py"
```

`_gen_test.py` is more complete (includes `_inspect_outcome`, `_assert_accepted`, `_check_slot_accounting`, `_build_fresh_kernel_from_slot` helpers). `gen2.py` is simpler (no helpers). The last file to execute determines what the test file contains.

If `gen2.py` runs last, the helpers from `_build_pink_extended.py` and `_build_pink_bodies.py` are lost — their patches to the generated file become stale updates to a now-overwritten file. The `_check_slot_accounting` assertions in 14 body functions silently become dead code.

**Severity: Medium**

### J16: Shim test bridge has no `step()`, `decision_engine`, `intent_engine` — zero fidelity to production runtime

**File:** (generated in `_build_rb` sections across all test generators)

```python
class Shim:
    def __init__(self, k): self.kernel = k
    async def connect(self, ic=0): self.kernel.venue.connect()
    async def disconnect(self):
        try: self.kernel.venue.disconnect()
        except: pass
```

The Shim provides none of `PinkDirectRuntime`'s capabilities:
- No `step()` method — tests call `k.process_intent()` directly
- No `data_feed` — tests must provide prices manually
- No `decision_engine` — tests construct intents manually
- No `intent_engine` — no intent sizing/validation
- No lifecycle beyond connect/disconnect

The test suite effectively tests `ExecutionKernel` in isolation, not the full runtime pipeline. Any bug in the decision→intent→kernel→fill→persist chain that passes through `step()` is invisible to these tests.

**Severity: High**

---

## Pass 7 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| J1 | `_flatten` submits wrong direction for LONG positions | Test | Medium |
| J2 | `_check_slot_accounting` double-counts unrealized PnL | Test | Medium |
| J3 | `_build_live_snapshot` timestamp is float vs datetime — type crash risk | Data Feed | **High** |
| J4 | `ExecutionKernel.mark_price()` never called — no mark-to-market | Bridge | **High** |
| J5 | All VenueEvent timestamps use local clock, not exchange timestamp | Venue | Medium |
| J6 | No monotonic timestamp verification anywhere | All | Low |
| J7 | `rebuild_indexes()` overwrites duplicate trade_id — last wins, first invisible | Rust | **High** |
| J8 | `resolve_slot()` falls back to slot 0 — stray event corrupts slot 0 | Rust | **High** |
| J9 | `get_slot_json`/`snapshot_json` return null with no diagnostic | Rust | Medium |
| J10 | Two processes with same DITA_V2_PREFIX corrupt shared Zinc memory | Zinc | **High** |
| J11 | `load_dotenv()` only runs on launcher.py import — ordering dependency | Config | Medium |
| J12 | BINGX_API_KEY passed None with no validation — fails at HTTP time | Config | Medium |
| J13 | API credentials never masked in error messages or tracebacks | Config | **High** |
| J14 | `_env_bool` inconsistent: empty string = False vs unset = default | Config | Low |
| J15 | gen2.py and _gen_test.py write to same output — last writer wins | Test | Medium |
| J16 | Shim test bridge lacks step(), decision_engine — zero runtime fidelity | Test | **High** |

### Pass 7 Severity

| Severity | Count |
|----------|-------|
| **High** | 7 (J3, J4, J7, J8, J10, J13, J16) |
| Medium | 7 (J1, J2, J5, J9, J11, J12, J15) |
| Low | 2 (J6, J14) |

### Combined Catalog (All 7 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| **Total** | | **176** | **11** | **48** | **48** | **48** | **21** |

---

## PASS 8 — OBSERVABILITY, MEMORY, TIME, DEAD CODE, MODULE INIT

### K1: Zero stdout/stderr output — system is completely silent

No production code path emits any stdout or stderr output. Zero `print()`, zero `logging` output, zero `warnings.warn()`. The system runs with zero operator-visible evidence of being alive. If Hazelcast and ClickHouse are both disabled, the system is a black box — no logs, no metrics, no health checks, no output of any kind.

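`logging` is imported in exactly one file (`bingx_user_stream.py`), and no root logger is configured anywhere. Without `logging.basicConfig()` or an explicit handler, records below `WARNING` are discarded; `WARNING` and above reach stderr only through Python's unformatted last-resort handler.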

**Severity: Critical**

### K2: No health check endpoint, no metrics, no monitoring surface

There are zero:
- HTTP health check endpoints (`/health`, `/ready`)
- Prometheus metrics endpoints
- Statsd/Graphite reporters
- Periodic heartbeats
- Liveness/readiness probes
- Process manager integration (no systemd unit, no supervisor config, no container healthcheck)

The only monitoring surface is programmatic — calling `kernel.snapshot()` from Python code with access to the same `ExecutionKernel` instance. For cross-process monitoring, the operator must write custom code to read Zinc shared memory regions and parse the undocumented JSON packets.

**Severity: Critical**

### K3: Failed trades produce no notification — error exists only in return value

`process_intent()` returns `KernelOutcome(accepted=False, diagnostic_code=..., details=...)` but:
- No log line is written for the failure
- No stdout/stderr output
- The failure is not persisted to any durable store (unless debug_clickhouse is enabled and sink is configured)
- If the caller (strategy/algo) doesn't inspect the return value, the failure is completely invisible
- There is no alert mechanism, no error counter, no dead-letter queue

**Severity: High**

### K4: Exception tracebacks not captured in production — all `except:` blocks swallow silently

Every `except Exception: pass` and `except Exception: continue` in the codebase discards the full Python traceback. There is no logging infrastructure to capture it. When an exception occurs:
- `launcher.py:187`: RealZincPlane init failure → traceback lost
- `rust_backend.py:102`: `__del__` exception → traceback lost
- `bingx_venue.py:51`: `_row_float` conversion failure → traceback lost
- `bingx_venue.py:325`: slot lookup failure → traceback lost
- `bingx_venue.py:350`: cancel HTTP error → traceback lost
- `control.py:213`: control plane fallback → traceback lost
- All real_control_plane.py try/except blocks → traceback lost

The only exception information that survives is the final exception message in `BingxHttpError` (converted to a dict) and Rust kernel diagnostic codes (structured JSON). Full Python tracebacks are invisible.
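
A sketch of the alternative to a silent `pass`: narrow the exception types and record the traceback before falling back. The logger name `pink.venue` and the `row_float` helper are hypothetical, loosely modelled on the `_row_float` site listed above:

```python
import logging

log = logging.getLogger("pink.venue")   # hypothetical logger name


def row_float(row: dict, key: str, default: float = 0.0) -> float:
    """Parse a float field, logging the full traceback on failure."""
    try:
        return float(row[key])
    except (KeyError, TypeError, ValueError):
        # logging.exception records the message at ERROR level with the traceback.
        log.exception("could not parse %r from venue row", key)
        return default
```

The fallback value is unchanged, so behavior is the same as before; the difference is that the failure leaves evidence once a handler is configured.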

**Severity: High**

### K5: ~85+ Python objects allocated per `process_intent()` call — 36 TradeSlot copies via JSON round-trip

Every `_get_slot()` call does a full JSON serialization (Rust) → C FFI → JSON parse (Python) → new `TradeSlot` dataclass. A single ENTER intent with 2 venue events results in approximately:
- 36 `TradeSlot` instances from repeated `_get_slot()` calls (state refresh, observe_slots, projection writes)
- 4 `VenueEvent` instances
- 3 `KernelOutcome` instances
- ~30 dicts for serialization payloads
- ~4 `KernelTransition` instances

No caching exists — every `_get_slot()` call goes through the full FFI round-trip. Multiple calls within the same `process_intent()` invocation fetch the same slot data multiple times.

**Severity: Medium**

### K6: Circular reference cycle `Kernel` → `StateView` → `SlotView` → `Kernel` — prevents refcount GC

```python
# KernelStateView and KernelSlotView both hold strong references:
self._kernel = kernel     # strong reference
```

This forms `ExecutionKernel → state → slots[]._kernel → ExecutionKernel`. Python's refcounting cannot free this cycle — it depends on the generational GC. The `__del__` method on `ExecutionKernel` (which destroys the Rust `KernelHandle`) fires at an unpredictable time, potentially long after the last explicit reference to the kernel is dropped.
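
A sketch of breaking the cycle with a weak back-reference, so that dropping the last external reference to the kernel lets reference counting reclaim it. `Kernel` and `SlotView` here are simplified stand-ins for `ExecutionKernel` and `KernelSlotView`:

```python
import weakref


class SlotView:
    """View that holds only a weak reference back to its kernel."""

    def __init__(self, kernel):
        self._kernel_ref = weakref.ref(kernel)

    @property
    def kernel(self):
        kernel = self._kernel_ref()
        if kernel is None:
            raise ReferenceError("kernel has been destroyed")
        return kernel


class Kernel:
    """Owns its view strongly; the view does not own the kernel."""

    def __init__(self):
        self.view = SlotView(self)
```

An explicit `close()` method on the kernel that releases the Rust handle deterministically would be a complementary measure, since `__del__` timing remains an implementation detail of the interpreter.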

**Severity: High**

### K7: `MemoryKernelJournal` silently drops transitions after 10,000 rows — no warning, no rollover

```python
def record(self, row):
    if len(self.rows) < self.capture_limit:  # capture_limit = 10,000
        self.rows.append(dict(row))
    # else: silently no-op — every subsequent transition is lost
```

After 10,000 transitions, `record()` becomes a no-op. No error, no warning, no FIFO eviction, no rollover to disk. In a production system with 10+ transitions per trade and 100+ trades/day, the journal dies in ~10 days. At that point, all field debugging/troubleshooting capability is silently lost.

Each row holds a full `slot.to_dict()` (~1 KB) plus event/control payloads. The 10,000 rows retain ~10-15 MB permanently.
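
A sketch of a bounded journal that keeps the most recent rows and counts evictions, so that loss is at least observable. `RingJournal` is a hypothetical replacement, not the existing `MemoryKernelJournal`:

```python
from collections import deque


class RingJournal:
    """Keeps the newest `capture_limit` rows and counts what was evicted."""

    def __init__(self, capture_limit: int = 10_000):
        self.rows = deque(maxlen=capture_limit)
        self.dropped = 0

    def record(self, row) -> None:
        if len(self.rows) == self.rows.maxlen:
            self.dropped += 1          # the oldest row is about to be evicted
        self.rows.append(dict(row))
```

Keeping the newest rows rather than the oldest is a design choice: for field debugging, the transitions leading up to an incident are usually the ones needed.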

**Severity: High**

### K8: `RealZincPlane._intent_cache` Python list unbounded — only shared memory write is bounded

```python
# real_zinc_plane.py:189-191
self._intent_cache.append(row)
self._write_region(self.intent_region, self._intent_seq, {"items": self._intent_cache[-512:]})
```

The shared memory write limits to the last 512 entries. But the Python `_intent_cache` list grows unbounded — every intent ever published remains in memory forever. After 1M intents: ~1M dict objects, ~500 MB+ of Python memory.

Note: `InMemoryZincPlane.intent_region` has the same unbounded growth (already documented as F12).

**Severity: High**

### K9: `_backend_snapshot` timeout uses wall-clock `threading.Event.wait()` — NTP can truncate/extend

```python
def _backend_snapshot(self, *, timeout_ms=5000.0):
    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):  # wall clock!
        return self._last_snapshot  # stale data
```

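`threading.Event.wait(timeout)` can depend on the system wall clock: on Unix, CPython before 3.11 (or any build whose C library lacks `sem_clockwait()`) measures lock timeouts against `CLOCK_REALTIME`. On such builds, if NTP adjusts the clock: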
- **Forward** (e.g., +2 seconds): the timeout is truncated to ~3 seconds — spurious timeout, stale snapshot returned
- **Backward** (e.g., -2 seconds): the timeout extends to ~7 seconds — caller blocks longer than expected

The correct pattern is `time.monotonic()` with a deadline loop — which `InMemoryControlPlane.wait()` already uses correctly. The `_backend_snapshot` timeout is the single highest-impact site because it controls whether the venue adapter returns fresh or stale exchange state.
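
The deadline-loop pattern can be sketched as follows. This is an illustration only: `wait_with_deadline` and `slice_s` are hypothetical names, and the short wait slices are an assumption that limits how much a single clock adjustment can distort any one wait.

```python
import threading
import time


def wait_with_deadline(event, timeout_ms, slice_s=0.05):
    """Wait for `event`, measuring the overall budget with time.monotonic()."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return event.is_set()
        # The remaining budget is re-derived from the monotonic clock on every pass.
        if event.wait(timeout=min(remaining, slice_s)):
            return True


ready = threading.Event()
start = time.monotonic()
timed_out_result = wait_with_deadline(ready, timeout_ms=100)
elapsed = time.monotonic() - start

ready.set()
ready_result = wait_with_deadline(ready, timeout_ms=100)
```

The first call returns `False` only after the monotonic deadline has passed; the second returns `True` immediately because the event is already set.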

**Severity: High**

### K10: `RealZincControlPlane.wait()` uses wall clock — no monotonic guarantee

```python
# real_control_plane.py:126-130
def wait(self, timeout_ms=1000):
    try:
        return bool(self.region.wait(timeout_ms))  # wall clock
    except Exception:
        return False
```

The `SharedRegion.wait()` implementation (external) uses wall clock. Same NTP sensitivity as K9, though lower impact (controls shared memory synchronization, not exchange data freshness).

**Severity: Medium**

### K11: `exchange_ts` falls back to local `time.time()` when exchange timestamp `E` is missing

```python
# bingx_user_stream.py:278
ts = int(frame.get("E") or time.time() * 1000)  # local clock fallback
```

When the exchange's WebSocket frame lacks the `E` (event time) field, the code substitutes the local machine's wall clock. Two problems:
1. Local clock may differ from exchange clock by seconds or minutes (VM drift)
2. `time.time()` is wall-clock — subject to NTP backward jumps

Events that lack `E` will have timestamps from a different clock source than events that have `E`. This creates ordering paradoxes in any downstream consumer that sorts by timestamp.
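
One mitigation is to record which clock produced each timestamp, so consumers can avoid ordering events across sources. A minimal sketch, assuming a hypothetical helper `event_timestamp` and the labels `"exchange"` and `"local"` (neither is part of `bingx_user_stream.py`):

```python
import time


def event_timestamp(frame, now_ms=None):
    """Return (timestamp_ms, source) so mixed clock sources stay distinguishable."""
    raw = frame.get("E")
    if raw is not None:
        return int(raw), "exchange"
    # Fallback: local wall clock, explicitly tagged as such.
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return int(now_ms), "local"


with_e = event_timestamp({"E": 1717142400000})
without_e = event_timestamp({}, now_ms=1717142400500)
```

A consumer that sorts by timestamp can then compare only within one source, or flag `"local"` rows as approximate.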

**Severity: Medium**

### K12: No monotonic timestamp verification anywhere in the system

Zero code paths check whether timestamps progress forward:
- `process_intent()` — no comparison between intent timestamp and slot's `last_event_time`
- `on_venue_event()` — no check that event timestamp >= previous events
- `AccountProjectionV2._build()` — no monotonicity check on `ReconcileResult.ts`
- Rust kernel — `last_event_time = Some(event.timestamp)` stored but never validated

NTP backward jumps, clock skew, or VM migration all can produce decreasing timestamps. The system has no detection, no guard, no warning log.
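
A guard of the missing kind could look like the sketch below. `TimestampGuard` and its fields are hypothetical; the sketch only detects and counts regressions, and the policy for a regressed event (reject, log, clamp) is left open.

```python
import logging

log = logging.getLogger("ts_guard")  # hypothetical logger name


class TimestampGuard:
    """Tracks the highest timestamp seen and flags any event that goes backwards."""

    def __init__(self):
        self.last_ts = None
        self.regressions = 0

    def check(self, ts):
        if self.last_ts is not None and ts < self.last_ts:
            self.regressions += 1
            log.warning("timestamp regressed: %s < %s", ts, self.last_ts)
            return False
        self.last_ts = ts
        return True


guard = TimestampGuard()
results = [guard.check(ts) for ts in (1000, 1005, 1003, 1005, 1010)]
```

Only the third event (1003 after 1005) is flagged; an equal timestamp is treated as non-decreasing and passes.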

**Severity: Medium**

### K13: `ControlPlane.wait()` and `notify()` have zero callers across all implementations — dead protocol surface

The `ControlPlane` protocol defines `wait(timeout_ms=1000)` and `notify()`. Both are implemented by `InMemoryControlPlane`, `ZincControlPlane`, and `RealZincControlPlane`. But **zero callers exist** in production code:
- `ExecutionKernel` never calls `self.control_plane.wait()` or `.notify()`
- `launcher.py` never calls them
- No test exercises them

Both protocol methods have real implementations (including monotonic clock logic in `InMemoryControlPlane`), so this is ~40 lines of dead-but-maintained code.

Similarly, 7 `ZincPlane` methods have zero callers: the six wait/notify methods (`wait_on_intent`, `notify_intent`, `wait_on_state`, `notify_state`, `wait_on_control`, `notify_control`) plus `read_slots`. They are dead protocol surface as well.

**Severity: Informational**

### K14: `AccountProjection.to_account_event()` has zero callers

```python
# account.py:86
def to_account_event(self, metadata=None):
    ...
```

Defined, never called anywhere in production code or tests. Dead code.

**Severity: Informational**

### K15: `HazelcastProjector` entire class dead — zero callers

```python
# hazelcast_projection.py:18-48
class HazelcastProjector:
    def publish_slot(self, slot): ...
    def publish_event(self, event_type, payload): ...
```

Both methods have zero callers anywhere in the codebase. The class can never be constructed from any production code path. The actively-used projection class is `HazelcastRowWriter`.

**Severity: Informational**

### K16: `_order_to_payload()` dead code

```python
# rust_backend.py:220
def _order_to_payload(order):
    ...
```

Defined, never called. Serializing a `VenueOrder` to dict is done inline in `TradeSlot.to_dict()` (contracts.py:127-134), not via this function.

**Severity: Informational**

### K17: `MirroredControlPlane` entire class dead — never constructed

```python
# control.py:171-184
class MirroredControlPlane:
    def __init__(self, inner, mirror_sink=None): ...
```

`build_control_plane()` never returns a `MirroredControlPlane`. The class can only be constructed if someone explicitly instantiates it — no code path does. Similarly, `KernelJournal` protocol is never used as a type annotation outside `journal.py`.

**Severity: Informational**

### K18: 12 of 20 `TradeStage` variants never matched in Rust FSM logic

Defined in the Rust `string_enum!` but never matched in any `process_intent` or `on_venue_event` arm:
`DECISION_CREATED`, `INTENT_CREATED`, `ORDER_SENT`, `ORDER_ACKED`, `ORDER_REJECTED`, `POSITION_OPENED`, `EXIT_SENT`, `EXIT_ACKED`, `EXIT_REJECTED`, `POSITION_PARTIALLY_CLOSED`, `POSITION_CLOSED`, `TRADE_TERMINAL_WRITTEN`

Only 8 variants are used in FSM logic: `IDLE`, `ORDER_REQUESTED`, `ENTRY_WORKING`, `POSITION_OPEN`, `EXIT_REQUESTED`, `EXIT_WORKING`, `CLOSED`, `STALE_STATE_RECONCILING`. The other 12 are serialization-only: they exist in the enum but the kernel never transitions a slot to them.

**Severity: Low**

### K19: Unused imports in `projection.py` and `hazelcast_projection.py`

`projection.py` imports `AccountProjection`, `TradeStage`, `datetime`, `Iterable`, `List` — none used.
`hazelcast_projection.py` imports `KernelTransition`, `TradeSlot`, `KernelControlSnapshot`, `_transition_row` — none used.

These are carryovers from earlier code versions. They add no runtime cost (import is cached after first load) but indicate stale code structure.

**Severity: Informational**

### K20: `sys.path` mutation on import — importing the package appends Zinc path globally

Both `real_control_plane.py:13-15` and `real_zinc_plane.py:22-24` do:

```python
if _ZINC_ADAPTER_PATH.exists() and str(_ZINC_ADAPTER_PATH) not in sys.path:
    sys.path.append(str(_ZINC_ADAPTER_PATH))
```

This fires at module import time as a side effect of importing `__init__.py` (through the chain: `__init__` → `launcher` → `real_control_plane`/`real_zinc_plane`). It modifies the process-global `sys.path`, which persists for the entire process lifetime. If the Zinc adapter path shadows or conflicts with other modules, the consequences are global and hard to debug.

**Severity: Medium**

### K21: `load_dotenv()` runs at module import time — mutates `os.environ` as side effect

```python
# launcher.py:49-51 (at module level)
PROJECT_ROOT = Path(__file__).resolve().parents[3]
load_dotenv(PROJECT_ROOT / ".env")
```

This fires on `import launcher` (which happens via `__init__.py`) and mutates `os.environ` process-globally. Assuming this is python-dotenv's `load_dotenv` with its default `override=False`, variables already present in the environment are kept, but every variable a test leaves unset is filled in from `.env`, so a test that relies on a variable being absent must delete it after the import. Also, if `.env` doesn't exist, `load_dotenv()` silently does nothing, so the effect of importing the package depends on filesystem state.

**Severity: Medium**

### K22: `ControlPlane` protocol not in `__init__.py.__all__`

```python
# __init__.py (__all__)
"ControlPlane" not in __all__  # ← hidden from star imports
```

`from prod.clean_arch.dita_v2 import *` exports 44 names but does NOT include `ControlPlane` (the main interface type). Concrete implementations (`InMemoryControlPlane`, `RealZincControlPlane`, etc.) are all exported. The protocol class itself is hidden.

**Severity: Informational**

### K23: `KernelSlotView.__getattr__` makes a ctypes call per attribute access — no caching

```python
# rust_backend.py:422-426
def __getattr__(self, name):
    slot = self._snapshot()              # FFI round-trip every time
    if hasattr(slot, name):
        return getattr(slot, name)
    raise AttributeError(name)
```

Every attribute access on a `KernelSlotView` (e.g., `slot.size`, `slot.fsm_state`, `slot.trade_id`) does a full JSON round-trip to the Rust kernel. The `_snapshot()` method calls `self._kernel._get_slot(self._slot_id)` which calls `_get_rust().get_slot_json()` → Rust serializes slot to JSON → Python parses → creates new `TradeSlot` → attribute is read from the new object.

Accessing 5 fields on a `KernelSlotView` does 5 FFI round-trips. There is no caching of the deserialized `TradeSlot` between accesses.
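
One possible mitigation is to fetch the slot once and read several fields from that single snapshot. The sketch below uses a stand-in kernel to count round-trips; `FakeKernel` and `CachedSlotView.refresh()` are hypothetical, and only `_get_slot` mirrors a name from the text.

```python
from types import SimpleNamespace


class FakeKernel:
    """Stand-in for the Rust-backed kernel; counts simulated FFI round-trips."""

    def __init__(self):
        self.calls = 0

    def _get_slot(self, slot_id):
        self.calls += 1
        return SimpleNamespace(slot_id=slot_id, size=0.001, fsm_state="IDLE", trade_id="t1")


class CachedSlotView:
    """Hypothetical view: one fetch per refresh(), attribute reads hit the cache."""

    def __init__(self, kernel, slot_id):
        self._kernel = kernel
        self._slot_id = slot_id
        self._cached = None

    def refresh(self):
        self._cached = self._kernel._get_slot(self._slot_id)
        return self

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the slot fields.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._cached is None:
            self.refresh()
        return getattr(self._cached, name)


kernel = FakeKernel()
view = CachedSlotView(kernel, 0).refresh()
fields = (view.size, view.fsm_state, view.trade_id)
```

Three field reads cost one round-trip here. The trade-off is staleness: the caller has to call `refresh()` whenever it needs the current kernel state.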

**Severity: Medium**

---

## Pass 8 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| K1 | Zero stdout/stderr — system completely silent | All | **Critical** |
| K2 | No health check, metrics, or monitoring surface | All | **Critical** |
| K3 | Failed trades produce no notification — error in return value only | Bridge | **High** |
| K4 | Exception tracebacks not captured — all except:pass swallow silently | All | **High** |
| K5 | ~85+ Python objects per process_intent — 36 TradeSlot copies via FFI | Bridge | Medium |
| K6 | Circular ref cycle Kernel→StateView→SlotView→Kernel — delays __del__ | Bridge | **High** |
| K7 | MemoryKernelJournal silently drops transitions after 10K rows | Journal | **High** |
| K8 | RealZincPlane._intent_cache unbounded Python list growth | Zinc | **High** |
| K9 | _backend_snapshot timeout uses wall clock — NTP truncates/extends | Venue | **High** |
| K10 | RealZincControlPlane.wait() uses wall clock — no monotonic | Control | Medium |
| K11 | exchange_ts fallback to local time.time() when E missing | Stream | Medium |
| K12 | No monotonic timestamp verification anywhere | All | Medium |
| K13 | ControlPlane.wait()/notify() — zero callers across all impls | Control | Info |
| K14 | AccountProjection.to_account_event() — zero callers | Account | Info |
| K15 | HazelcastProjector entire class dead | Projection | Info |
| K16 | _order_to_payload() dead code | Bridge | Info |
| K17 | MirroredControlPlane entire class dead — never constructed | Control | Info |
| K18 | 12 of 20 TradeStage variants never matched in Rust FSM | Rust | Low |
| K19 | Unused imports in projection.py and hazelcast_projection.py | Projection | Info |
| K20 | sys.path mutation on import — global side effect | Config | Medium |
| K21 | load_dotenv() at module import time — mutates os.environ globally | Config | Medium |
| K22 | ControlPlane protocol not exported in __all__ | Config | Info |
| K23 | KernelSlotView.__getattr__ makes FFI call per attribute access | Bridge | Medium |

### Pass 8 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 2 (K1, K2) |
| **High** | 6 (K3, K4, K6, K7, K8, K9) |
| Medium | 7 (K5, K10, K11, K12, K20, K21, K23) |
| Low | 1 (K18) |
| Info | 7 (K13, K14, K15, K16, K17, K19, K22) |

### Combined Catalog (All 8 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 6 | 7 | 1 | 7 |
| **Total** | | **199** | **13** | **54** | **55** | **49** | **28** |

---

## PASS 9 — CONTRACTS, EXCHANGE EVENTS, NETWORK, FFI, BACKUP DIFFS

### L1: `KernelOutcome(accepted=True, diagnostic_code=INVALID_INTENT)` is parseable — no invariant check

**File:** `rust_backend.py:388-402`

```python
def _outcome_from_payload(payload):
    return KernelOutcome(
        accepted=bool(payload.get("accepted", False)),
        diagnostic_code=KernelDiagnosticCode(str(payload.get("diagnostic_code", "OK"))),
    )
```

No validation that `accepted=True` implies `diagnostic_code=OK`, or that `accepted=False` implies a non-OK diagnostic code. If the Rust kernel ever returns contradictory values (e.g., `{"accepted": true, "diagnostic_code": "INVALID_INTENT"}`), Python silently accepts them. The default for both `KernelOutcome.diagnostic_code` and `_outcome_from_payload` fallback is `OK` — an `accepted=False` with no explicit `diagnostic_code` would silently show `OK`.

Similarly, `KernelTransition` has no FSM validation — any `(prev_state, next_state)` pair is accepted, even impossible transitions like `IDLE → POSITION_CLOSED`.
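
A parse-time check of the outcome invariant could reject contradictory payloads. A minimal sketch, using plain strings in place of `KernelDiagnosticCode` and hypothetical helpers `parse_outcome` and `is_rejected_payload`; treating a missing `diagnostic_code` as an error instead of defaulting to `OK` is an assumption about the desired behaviour.

```python
def parse_outcome(payload):
    """Return (accepted, diagnostic_code) or raise ValueError on a bad payload."""
    if "accepted" not in payload or "diagnostic_code" not in payload:
        raise ValueError(f"incomplete outcome payload: {sorted(payload)}")
    accepted = bool(payload["accepted"])
    code = str(payload["diagnostic_code"])
    # Invariant: accepted is True exactly when the diagnostic code is "OK".
    if accepted != (code == "OK"):
        raise ValueError(f"contradictory outcome: accepted={accepted}, code={code}")
    return accepted, code


def is_rejected_payload(payload):
    """True if parse_outcome refuses the payload."""
    try:
        parse_outcome(payload)
    except ValueError:
        return True
    return False
```

For example, `{"accepted": True, "diagnostic_code": "INVALID_INTENT"}` is refused, and so is `{"accepted": False}` with no code at all. The same idea extends to `KernelTransition` with a table of allowed `(prev_state, next_state)` pairs.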

**Severity: Medium**

### L2: `VenueEvent.filled_size > VenueEvent.size` possible — `_fill_event_from_row` uses different source fields

**File:** `bingx_venue.py:530-531`

```python
size=abs(_row_float(row, "executedQty", "z", "lastFilledQty", default=0.0)),
filled_size=abs(_row_float(row, "lastFilledQty", "l", "z", default=0.0)),
```

`size` comes from `executedQty` (cumulative) while `filled_size` comes from `lastFilledQty` (incremental). If `lastFilledQty > executedQty` (exchange-side rounding, partial fill of a partially-cancelled order), `filled_size > size`. The Rust kernel's `apply_fill` uses `event.filled_size` for PnL and position adjustment — an oversized fill could over-count position reduction.

Also: `VenueOrder.filled_size > intended_size` possible via `_venue_order_from_row()` (line 157-163) when the exchange reports `executedQty > origQty`.

**Severity: Medium**

### L3: `VenueEvent.price=0` can reach the kernel from multiple paths

**File:** `bingx_venue.py:495` (via `_row_float` default 0.0), `mock_venue.py:180` (via `0.0` when `reference_price=0`), `rust_backend.py:411` (via outcome default 0.0)

The Rust kernel's `realized_pnl()` guards against `entry_price <= 0.0` and `exit_size <= 0.0`, but `exit_price=0` in a fill event produces `delta = (0 - entry) / entry = -1.0`. For LONG: PnL = -1.0 * notional → -100% of position. A zero-price fill event would register as a total loss.

The `mark_price()` function guards against `price <= 0`, so unrealized PnL is safe. But realized PnL from a zero-price fill is not guarded.
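
The arithmetic can be checked with a small model. This is a Python illustration of the formula quoted above for a LONG position, not the Rust code: both function names are hypothetical, and taking notional as `entry_price * exit_size` is an assumption.

```python
def realized_pnl_model(entry_price, exit_price, exit_size):
    """Model of the described guards: entry_price and exit_size are checked, exit_price is not."""
    if entry_price <= 0.0 or exit_size <= 0.0:
        return 0.0
    delta = (exit_price - entry_price) / entry_price
    notional = entry_price * exit_size  # assumption for illustration
    return delta * notional


def realized_pnl_guarded(entry_price, exit_price, exit_size):
    """Same model with the missing guard: a non-positive fill price is refused."""
    if exit_price <= 0.0:
        raise ValueError("fill price must be positive")
    return realized_pnl_model(entry_price, exit_price, exit_size)


loss = realized_pnl_model(entry_price=100.0, exit_price=0.0, exit_size=2.0)
```

With an entry at 100.0 and size 2.0, a zero-price fill yields a PnL of -200.0, the full notional. The guarded variant raises instead of booking it.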

**Severity: High**

### L4: `BingxUserStream` — `available_margin` set to `cw` (cross wallet balance) instead of `crossWalletBalance - usedMargin`

**File:** `bingx_user_stream.py:336`

```python
available_margin=cw   # cw = cross wallet balance, NOT available margin
```

In BingX's `ACCOUNT_UPDATE` frame, `"cw"` is the cross wallet balance (total equity), not the available margin. Available margin = `crossWalletBalance - usedMargin`. The `ExchangeEvent.available_margin` field receives the wrong value. This flows into the dual-ledger accounting's `EBlock.available_margin` — if used for reconcile rules, the exchange-side `available_margin` is overstated.

**Severity: High**

### L5: `BingxUserStream` — `wallet_balance` silently defaults to 0 when `"wb"` is absent

**File:** `bingx_user_stream.py:334`

```python
wallet = _safe_float(usdt_bal.get("wb") or usdt_bal.get("walletBalance"))
```

If neither `"wb"` nor `"walletBalance"` exists in the USDT balance object (possible for some account types or frame formats), the `or` expression evaluates to `None` and `_safe_float(None)` returns `0.0`. The exchange wallet balance is silently zeroed, making the E-side of the dual-ledger reconciliation see `wallet_balance=0` when the actual balance is positive. This always produces an ERROR reconcile status (R1: capital >> 0 vs wallet=0).
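
A parser that distinguishes "absent" from "zero" would let the reconcile step skip the frame instead of comparing against a fabricated `0.0`. A sketch under assumptions: `parse_wallet_balance` is hypothetical and returns `None` when neither key carries a usable value.

```python
def parse_wallet_balance(usdt_bal):
    """Return the wallet balance as float, or None if the frame does not carry one."""
    for key in ("wb", "walletBalance"):
        value = usdt_bal.get(key)
        if value is None:
            continue
        try:
            return float(value)
        except (TypeError, ValueError):
            continue  # unparseable value: try the next key
    return None
```

Here `{}` gives `None`, while `{"wb": "0"}` gives a genuine `0.0`, so a real zero balance is still reported as such.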

**Severity: High**

### L6: `BingxUserStream` — `_keepalive_loop` has no stop mechanism — runs forever on old listen key after rotation

**File:** `bingx_user_stream.py:394-405`

```python
async def _keepalive_loop(self, listen_key):
    while True:
        await asyncio.sleep(self._keepalive_secs)
        await self._http.signed_put_raw(...)
```

The keepalive loop is an `asyncio.Task` with no stop signal. When the 24h rotation creates a new listen key, the old keepalive task keeps sending PUT requests to the old (now-deleted) listen key. BingX returns errors for keepalive on deleted keys. The delete path wraps its call in `with suppress(Exception)`, but the keepalive path has no handling at all: if `signed_put_raw` returns the error without raising, the loop continues indefinitely; if it raises, the task dies with an exception that nobody retrieves.
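
A stop mechanism can be as small as keeping a handle to the task and cancelling it on rotation. The sketch below assumes Python 3.8+ (where `asyncio.CancelledError` is not an `Exception` subclass); `KeepaliveManager`, `rotate()`, `stop()`, and the injected `send_keepalive` callable are hypothetical stand-ins for the stream's own HTTP call.

```python
import asyncio
import logging
from contextlib import suppress

log = logging.getLogger("keepalive")  # hypothetical logger name


class KeepaliveManager:
    """Hypothetical owner of the keepalive task: at most one loop runs at a time."""

    def __init__(self, send_keepalive, interval_secs):
        self._send = send_keepalive  # async callable taking the listen key
        self._interval = interval_secs
        self._task = None

    async def _loop(self, listen_key):
        while True:
            await asyncio.sleep(self._interval)
            try:
                await self._send(listen_key)
            except Exception:
                # Report the failure instead of letting the task die unnoticed.
                log.exception("keepalive failed")

    async def stop(self):
        if self._task is not None:
            self._task.cancel()
            with suppress(asyncio.CancelledError):
                await self._task
            self._task = None

    async def rotate(self, listen_key):
        await self.stop()  # the loop for the previous key ends here
        self._task = asyncio.create_task(self._loop(listen_key))


async def _demo():
    calls = []

    async def fake_send(key):
        calls.append(key)

    mgr = KeepaliveManager(fake_send, interval_secs=0.01)
    await mgr.rotate("old-key")
    await asyncio.sleep(0.05)
    await mgr.rotate("new-key")
    old_calls = calls.count("old-key")
    await asyncio.sleep(0.05)
    await mgr.stop()
    return calls, old_calls


calls, old_calls = asyncio.run(_demo())
```

After `rotate("new-key")` no further keepalives are sent for `old-key`, which is the property the current loop lacks.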

**Severity: Medium**

### L7: `BingxUserStream` — `event_id` from `frame.get("i")` can be integer `0`, which is falsy in the `or` chain, so a random UUID is generated

**File:** `bingx_user_stream.py:283`

```python
event_id = str(frame.get("i") or frame.get("event_id") or uuid.uuid4().hex)
```

If `frame.get("i")` returns integer `0` (valid event ID in some BingX frames), the `or` chain treats it as falsy before `str()` is ever applied, so evaluation falls through to `frame.get("event_id")` and then to `uuid.uuid4().hex`, losing the real event ID. Event dedup downstream sees a random UUID instead of the exchange's ID.
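
An explicit presence check avoids treating integer `0` as missing. A minimal sketch; `pick_event_id` is a hypothetical helper, and treating an empty string as absent is an assumption.

```python
import uuid


def pick_event_id(frame):
    """Use the first key whose value is present; integer 0 counts as present."""
    for key in ("i", "event_id"):
        value = frame.get(key)
        if value is not None and value != "":
            return str(value)
    # Nothing usable in the frame: fall back to a random id.
    return uuid.uuid4().hex
```

`pick_event_id({"i": 0})` returns `"0"`, so downstream dedup sees the exchange's own ID.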

**Severity: Medium**

### L8: BingX test URLs hardcoded in test generators — wrong environment if system targets LIVE

**Files:** `gen_live_tests.py:70,77`, `gen2.py:135`

```python
"https://open-api-vst.bingx.com/openApi/swap/v2/user/positions"
"https://open-api-vst.bingx.com/openApi/swap/v2/quote/price"
```

Hardcoded `vst` (testnet) URLs. The production `launcher.py` path selects VST vs LIVE via `BingxEnvironment` and `DOLPHIN_BINGX_ENV`, but the test generators hardcode VST. If the system is configured for LIVE and these tests run, they hit the wrong exchange environment.

**Severity: Medium**

### L9: No proxy support — cannot be deployed behind corporate proxy

No code parses `HTTP_PROXY`, `HTTPS_PROXY`, `SOCKS_PROXY` or passes proxy configuration to `aiohttp.TCPConnector` or `ClientSession`. The `aiohttp.ClientSession` in `bingx_user_stream.py` is created without any proxy parameter. Deployment behind a corporate proxy or SOCKS proxy requires code changes.

**Severity: Low** (deployment constraint, not a correctness bug)

### L10: 5-minute DNS cache TTL in WebSocket adapter — stale IPs on infrastructure change

**File:** `bingx_user_stream.py:425`

```python
aiohttp.TCPConnector(limit=4, ttl_dns_cache=300)  # 300 seconds = 5 minutes
```

`ttl_dns_cache=300` makes aiohttp cache DNS results for 5 minutes. If BingX changes server IPs during an infrastructure migration or failover, connections opened through the same connector keep using the stale IPs for up to 5 minutes. The connector is recreated on each WS reconnect, so a reconnect starts with an empty cache. The exposure is the case where the system does not reconnect and keeps the WS alive: DNS changes then go undetected for up to 5 minutes.

**Severity: Low**

### L11: `getattr(intent, "limit_price", 0.0)` reads from dataclass field, not metadata dict — metadata-only values are lost

**File:** `bingx_venue.py:267`

```python
metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
```

`limit_price` is a real field on `KernelIntent` (contracts.py:257, added in this version, default `0.0`). If the policy layer sets `intent.limit_price = 10.50`, `getattr` returns `10.50` and the line is correct. The `or 0.0` only maps `None` (or any other falsy value) to `0.0` before `float()` is applied.

The gap is the other route: the `_legacy_intent` function (identical to H7) never checks `intent.metadata.get("limit_price")`. If a caller passes `limit_price` via the metadata dict only, the dataclass field keeps its default and `_limit_price` becomes `0.0`.

**Severity: Low**

### L12: Backup diff — Rust kernel added 428 lines including entire dual-ledger accounting, 14+ bug fixes

Comparing the backup `rust_kernel_src/lib.rs` (1614 lines) against current `_rust_kernel/src/lib.rs` (2042 lines) reveals:

**Bugs fixed between backup and current:**
- CANCEL now works on entry orders (backup only checked exit orders)
- Partial fills now accumulate (backup overwrote `filled_size`)
- Stale venue events on closed slots now rejected (TERMINAL_STATE guard, I13 fix)
- CANCEL_ACK properly resets entry orders to IDLE
- EXIT transition captures actual `prev_state` instead of hardcoded `POSITION_OPEN`
- `into_c_string` sanitizes NUL bytes instead of panicking (G2 fix)
- Null-string FFI returns diagnostic JSON instead of null pointer
- `invalid_intent_cstring()` helper returns structured diagnostics
- Reconcile validates slot invariants before applying

**New Rust features:**
- `AccountState` dual-ledger struct with K-value vs E-fact reconcile rules
- `on_account_event()` FFI for account-level events
- `set_seed_capital()` FFI
- `INVALID_INTENT` diagnostic code

**Critical finding: The backup still has the entry-fill size overwrite bug (I1), the backward EXIT prev_state bug (G3), and the CANCEL-only-exit-order bug (G10).** These were all fixed in the current code. The backup represents a pre-fix state that would double-settle PnL on partial fills.

**Severity: Informational**

### L13: `_build_full_runtime` in gen_live_tests.py is never called — dead code

**File:** `gen_live_tests.py:148-161`

```python
def _build_full_runtime(initial_capital):
    # Creates HazelcastDataFeed, DecisionEngine, IntentEngine, PinkDirectRuntime
    # ... but never called by any test function
```

This function wires the full production pipeline: `HazelcastDataFeed` + `PinkDirectRuntime` + `DecisionEngine` + `IntentEngine`. But every test function calls `_build_runtime_bundle()` instead, which returns a `_RuntimeShim` with zero fidelity (J16). The real `PinkDirectRuntime` — with `step()`, data feed, decision engine, intent engine — is never instantiated in any test.

Also: `hz_client=build_projection(...)` passes a `HazelcastProjection` (write-side wrapper) where a Hazelcast client object should go — type mismatch.

**Severity: High**

### L14: `BingxUserStream` — `listenKeyExpired` raises RuntimeError instead of clean return — triggers full reconnect

**File:** `bingx_user_stream.py:273`

```python
if frame.get("e") == "listenKeyExpired":
    raise RuntimeError("listenKeyExpired")
```

When the exchange sends `listenKeyExpired`, the code raises `RuntimeError` inside the `_consume()` async generator. This propagates to the outer `subscribe()` loop's `try/except`, which treats it as a connection failure — delays, creates a new listen key, reconnects. The proper behavior is to yield an `ExchangeEvent(kind=RECONNECTED)` and return cleanly, letting the caller handle the rotation without backoff delay.

**Severity: Medium**

### L15: `BingxUserStream` — `_delete_listen_key` suppresses all exceptions — leaked keys on auth failures

**File:** `bingx_user_stream.py:413-416`

```python
async def _delete_listen_key(self, listen_key):
    with suppress(Exception):
        await self._http.signed_delete_raw(...)
```

If the DELETE call fails (invalid signature, expired key, network error), the exception is swallowed. The old listen key remains active on BingX, wasting server resources. Over days of operation with unhandled auth failures, leaked listen keys accumulate server-side.

**Severity: Low**

### L16: Backup diff — `venue_order_id` propagation logic has ambiguous target selection

**File:** `_rust_kernel/src/lib.rs:1110-1125` (current code)

```rust
if !event.venue_order_id.is_empty() {
    let target = if slot.active_entry_order.is_some() {
        slot.active_entry_order.as_mut()
    } else {
        slot.active_exit_order.as_mut()
    };
```

If an entry order exists (even if fully filled and the slot is in `POSITION_OPEN`), ANY incoming event's `venue_order_id` propagates to the entry order — even if the event is for the exit order. The `active_entry_order` status might be `FILLED` but it's still `Some(...)`, so the exit event's ID goes to the wrong order.

**Severity: Medium**

---

## Pass 9 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| L1 | `KernelOutcome(accepted=True, diag=INVALID_INTENT)` parseable — no invariant check | Bridge | Medium |
| L2 | `VenueEvent.filled_size > size` possible via different source fields | Venue | Medium |
| L3 | `VenueEvent.price=0` reaches kernel — zero-price fill = 100% loss PnL | Venue | **High** |
| L4 | `available_margin` set to cross-wallet balance, not available margin | Stream | **High** |
| L5 | `wallet_balance` defaults to 0 when `"wb"` absent — E-side reconcile always ERROR | Stream | **High** |
| L6 | `_keepalive_loop` no stop mechanism — runs on old key after rotation | Stream | Medium |
| L7 | `event_id` integer `0` is falsy in the `or` chain → random UUID generated | Stream | Medium |
| L8 | Hardcoded VST URLs in test generators — wrong env if LIVE configured | Test | Medium |
| L9 | No proxy support — can't deploy behind corporate proxy | Network | Low |
| L10 | 5-minute DNS cache TTL — stale IPs on infrastructure change | Network | Low |
| L11 | `limit_price` getattr reads dataclass field, not metadata dict | Venue | Low |
| L12 | Backup diff: 14+ bug fixes, 428 lines added including dual-ledger accounting | Rust | Info |
| L13 | `_build_full_runtime` dead — real pipeline never tested | Test | **High** |
| L14 | `listenKeyExpired` raises RuntimeError instead of clean yield | Stream | Medium |
| L15 | `_delete_listen_key` suppresses all exceptions — leaked server keys | Stream | Low |
| L16 | `venue_order_id` target selection ambiguous when entry order exists | Rust | Medium |

### Pass 9 Severity

| Severity | Count |
|----------|-------|
| **High** | 4 (L3, L4, L5, L13) |
| Medium | 7 (L1, L2, L6, L7, L8, L14, L16) |
| Low | 4 (L9, L10, L11, L15) |
| Info | 1 (L12) |

### Combined Catalog (All 9 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 6 | 7 | 1 | 7 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 7 | 4 | 1 |
| **Total** | | **215** | **13** | **58** | **62** | **53** | **29** |

---

## PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT

### M1: ENTER transition hardcodes `prev_state = IDLE` — every non-IDLE entry corrupts the audit trail

**File:** `_rust_kernel/src/lib.rs:1117`

```rust
let transition = self.transition(
    &slot,
    TradeStage::IDLE,           // HARDCODED — lies about actual prev_state
    slot.fsm_state.clone(),
    "ENTER_INTENT",
);
```

When a slot is entered from `CLOSED` (re-entry) or from any other state that passed the `is_free()` or same-trade bypass, the transition record claims `prev_state = IDLE`. This is **always wrong unless the slot was genuinely IDLE**. Every ENTER transition in the journal for a re-entered slot or a slot coming from CLOSED records an impossible transition (`CLOSED → ORDER_REQUESTED` recorded as `IDLE → ORDER_REQUESTED`).

This corrupts any downstream FSM analysis, journal audit, or trade-lifecycle reconstruction that relies on accurate `prev_state` values.

**Severity: Critical**

### M2: CANCEL intent creates no transition record — invisible in audit log

**File:** `_rust_kernel/src/lib.rs:1287-1305`

The CANCEL branch in `process_intent` returns a `KernelResult` with no call to `self.transition()`. Every other intent (ENTER, EXIT, MARK_PRICE, RECONCILE) records a transition. CANCEL operations — including accepted cancels — are **invisible in the transition audit log**.

Additionally, CANCEL returns `accepted = true` but never mutates the slot's `fsm_state`. The slot stays in whatever state it was in. The caller sees `accepted = true` with no visible effect.

**Severity: Critical**

### M3: `_mk_intent` test helper drops `order_type`/`limit_price` into `metadata` instead of setting proper fields

**File:** `test_flaws.py:43`

```python
def _mk_intent(action, trade_id="t1", size=0.001, price=100.0, slot_id=0, **kw):
    return KernelIntent(
        ...
        metadata=kw,   # order_type="LIMIT" goes into metadata dict, not the dataclass field!
    )
```

`KernelIntent` has dedicated fields `order_type: str = "MARKET"` and `limit_price: float = 0.0` (contracts.py:274-275), but `_mk_intent` passes `**kw` as `metadata=kw`. So `_mk_intent(order_type="LIMIT")` produces `intent.order_type == "MARKET"` (the default) while `intent.metadata["order_type"] == "LIMIT"`.

The Flaw 6 tests in `test_flaws.py` that verify `order_type`/`limit_price` preservation through `_legacy_intent` pass for the **wrong reason** — they check `legacy.metadata.get("order_type")` which finds the value in the passthrough metadata, not because `_legacy_intent` correctly reads `intent.order_type`. If the production code changes and the test helper isn't fixed, the tests silently become false positives.
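
A helper that routes keywords to the dataclass fields they name would make those tests exercise the real path. The sketch below uses a reduced stand-in for `KernelIntent`: apart from `order_type`, `limit_price`, and `metadata`, the field list is an assumption taken from the helper's signature.

```python
from dataclasses import dataclass, field, fields


@dataclass
class KernelIntent:
    """Reduced stand-in for the real dataclass; only the fields relevant here."""
    action: str
    trade_id: str = "t1"
    size: float = 0.001
    price: float = 100.0
    slot_id: int = 0
    order_type: str = "MARKET"
    limit_price: float = 0.0
    metadata: dict = field(default_factory=dict)


_INTENT_FIELDS = {f.name for f in fields(KernelIntent)} - {"metadata"}


def _mk_intent(action, **kw):
    """Keywords naming dataclass fields set those fields; the rest go to metadata."""
    direct = {k: kw.pop(k) for k in list(kw) if k in _INTENT_FIELDS}
    return KernelIntent(action=action, metadata=kw, **direct)


intent = _mk_intent("ENTER", order_type="LIMIT", limit_price=10.5, note="x")
```

With this routing, `intent.order_type` is `"LIMIT"` and only `note` lands in `metadata`, so an assertion on `legacy.metadata` can no longer pass through the passthrough dict alone.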

**Severity: High**

### M4: `test_cancel_entry_with_partial_fill` never sends a CANCEL — misnamed vacuous test

**File:** `test_flaws.py:161-172`

```python
def test_cancel_entry_with_partial_fill(self):
    k = _fresh_kernel(scenario=MockVenueScenario(partial_fill_ratio=0.5))
    k.process_intent(_mk_intent(action=E.ENTER, trade_id="ce4", size=0.002))
    slot_after = k._get_slot(0)
    assert slot_after.size > 0, "Should have partial fill"
```

The test is named "Cancel entry with partial fill" and belongs to `TestFlaw1EntryCancel`, but **no CANCEL intent is ever sent**. It only verifies that a partial fill occurred, so it is vacuous for its stated purpose.

The same pattern affects Flaw 9 tests — `test_cancel_uses_slot_asset_not_trade_id` and `test_mock_venue_cancel_event_has_asset` both have "cancel" in their names but never call any cancel function.

**Severity: High**

### M5: Flaw 7 tests (`test_entry_exit_different_ratios`, `test_per_action_type_ratios`) never send EXIT

**File:** `test_flaws.py` Flaw 7 test class

Both tests set `exit_partial_fill_ratio` on the mock venue scenario but only ever process an ENTER intent. The `exit_partial_fill_ratio` is configured but never exercised. The tests verify entry partial fill behavior only — they don't test what their titles and class name claim.

**Severity: Medium**

### M6: `test_dedup_window_accepts_many_events` uses wrong constant — actual=256, flaw claims 64; only 70 events sent

**File:** `test_flaws.py:536-555`

The Flaw 10 tests reference a 64-event dedup window, but the actual Rust constant is `MAX_SEEN_EVENT_IDS = 256` (lib.rs:8). The test sends 70 events and asserts `>= 70`. Since `70 < 256`, no eviction occurs. The test passes trivially regardless of whether the old-64-bound flaw exists. To meaningfully test eviction, >256 events would be needed.

Similarly, `test_dedup_eviction_does_not_accept_old_event` sends only 70 events then checks for dedup — with a 256-entry window, the first event is never evicted. The test verifies basic dedup (non-evicted), not eviction behavior.
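
The number of events needed to reach the eviction boundary can be shown with a small Python model. `DedupWindow` is hypothetical, and oldest-first (FIFO) eviction is an assumption about the Rust side. What the kernel should do with a replay of an evicted ID is a separate question that the model does not settle.

```python
from collections import OrderedDict

MAX_SEEN_EVENT_IDS = 256  # value quoted from lib.rs in the text


class DedupWindow:
    """Python model of a bounded dedup set; FIFO eviction is an assumption."""

    def __init__(self, limit=MAX_SEEN_EVENT_IDS):
        self.limit = limit
        self._seen = OrderedDict()

    def accept(self, event_id):
        if event_id in self._seen:
            return False  # duplicate inside the window
        self._seen[event_id] = True
        if len(self._seen) > self.limit:
            self._seen.popitem(last=False)  # evict the oldest id
        return True


window = DedupWindow()
accepted = sum(window.accept(f"ev-{n}") for n in range(MAX_SEEN_EVENT_IDS + 1))
replay_of_evicted = window.accept("ev-0")   # ev-0 was pushed out by ev-256
replay_of_recent = window.accept("ev-256")  # still inside the window
```

Eviction only becomes observable with event 257. With 70 events, as in the current tests, `ev-0` is still in the window and only plain dedup is exercised.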

**Severity: Medium**

### M7: `test_outcome_state_matches_actual_slot` is tautological — compares value with itself

**File:** `test_flaws.py:200-210`

```python
result = k.process_intent(_mk_intent(action=E.ENTER, trade_id="oc1"))
slot = k._get_slot(0)
assert result.state == slot.fsm_state, ...  # assertion message elided
```

`result.state` is set from `final_slot.fsm_state` (which comes from `self._get_slot(outcome.slot_id)` inside `process_intent`). The test then calls `k._get_slot(0)` again. Both read from the same Rust backend — they **must** be equal by construction. This test proves nothing; it's a tautology.

**Severity: Low**

### M8: ORDER_ACK silent fallthrough when no active order — accepts event with no effect

**File:** `_rust_kernel/src/lib.rs:1476-1498`

When `on_venue_event` receives an `ORDER_ACK` for a slot with neither `active_entry_order` nor `active_exit_order` (shouldn't happen normally, but possible after a reconcile or race), the match arm executes **no branch**. The state is unchanged, `diagnostic_code` stays `OK`, and `accepted = true`. The event is silently accepted with no effect — no diagnostic, no warning.

The same bug exists for `CANCEL_ACK` (line 1545): if no matching active order exists, the event is silently accepted with no state change and `OK` diagnostic.

**Severity: Medium**

### M9: ORDER_REJECT on POSITION_OPEN with stale entry order destroys the position

**File:** `_rust_kernel/src/lib.rs:1499-1530`

```rust
KernelEventKind::ORDER_REJECT => {
    if slot.active_entry_order.is_some() && slot.fsm_state != TradeStage::POSITION_OPEN {
        // clear entry, wipe trade data, set IDLE
    } else if slot.active_exit_order.is_some() {
        // clear exit order only, set POSITION_OPEN
    } else {
        // no match — reset to IDLE
    }
}
```

If a slot is in `POSITION_OPEN` (position active) but `active_entry_order` is still `Some` (stale — didn't get cleared on fill), the entry-reject guard `fsm_state != POSITION_OPEN` prevents the entry path. It falls to the exit check. If no exit order, the final `else` branch fires — resetting the slot to **IDLE** and destroying the open position and all trade data.
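
The fall-through can be reproduced with a reduced Python model of the branch structure quoted above. This is a sketch only: `SlotModel` and `on_order_reject` are hypothetical stand-ins, and the assumption that the final branch zeroes `size` follows the "wipe trade data" comments rather than the actual Rust body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotModel:
    """Hypothetical, reduced stand-in for the Rust TradeSlot."""
    fsm_state: str
    active_entry_order: Optional[str] = None
    active_exit_order: Optional[str] = None
    size: float = 0.0


def on_order_reject(slot):
    """Mirror the three-way ORDER_REJECT branch; return which branch ran."""
    if slot.active_entry_order is not None and slot.fsm_state != "POSITION_OPEN":
        slot.active_entry_order = None
        slot.size = 0.0
        slot.fsm_state = "IDLE"
        return "entry"
    elif slot.active_exit_order is not None:
        slot.active_exit_order = None
        slot.fsm_state = "POSITION_OPEN"
        return "exit"
    else:
        slot.size = 0.0          # assumed: trade data wiped on reset
        slot.fsm_state = "IDLE"
        return "fallthrough"


# Open position with a stale entry order and no exit order.
stale = SlotModel(fsm_state="POSITION_OPEN", active_entry_order="e1", size=0.5)
branch = on_order_reject(stale)
```

In the model, the stale-entry slot skips both guarded branches and lands in the reset branch, which is the position-destroying path described above.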

**Severity: Critical**

### M10: No aggregation of any metric — trade count, success/fail, latency all zero

**File:** entire codebase

The following metrics are completely impossible to obtain from the current system:

| Metric | Why unavailable |
|--------|----------------|
| Total trades processed | `trade_seq` declared on `AccountSnapshot` but never incremented anywhere |
| Succeeded vs failed trades | No aggregation of `KernelDiagnosticCode` outcomes |
| PnL per individual trade | `slot.realized_pnl` is overwritten on slot reuse — no per-trade persistence |
| Slippage (fill vs intended price) | Data exists transiently but no computed metric |
| API calls per minute | No call counters anywhere in the venue adapter |
| `process_intent` latency | Zero timing instrumentation — no `time.monotonic()` in kernel path |
| Process memory usage | No memory tracking of any kind |
| Deduplicated vs fresh event count | Dedup detection exists but is never counted |

The `AccountSnapshot.trade_seq` field (account.py:27) is declared as `trade_seq: int = 0` but **never assigned** — no code path ever sets it above 0. It's a dead field.
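
As an illustration of how little is needed to close part of this gap, the sketch below wraps a call with `time.monotonic()` and counts outcomes by diagnostic code. `KernelMetrics` is a hypothetical helper, and it assumes the wrapped call returns an object with a `diagnostic_code` attribute.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KernelMetrics:
    """Hypothetical in-process counters; not part of the PINK codebase."""
    outcomes: Counter = field(default_factory=Counter)
    latencies_s: list = field(default_factory=list)

    def timed(self, fn, *args, **kwargs):
        """Call fn, record its wall-clock latency and its diagnostic code."""
        start = time.monotonic()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.outcomes["EXCEPTION"] += 1
            raise
        finally:
            self.latencies_s.append(time.monotonic() - start)
        code = getattr(result, "diagnostic_code", "UNKNOWN")
        self.outcomes[str(code)] += 1
        return result

    @property
    def total(self):
        """Number of calls observed, successful or not."""
        return sum(self.outcomes.values())
```

A call site would look like `metrics.timed(kernel.process_intent, intent)`; per-trade PnL and slippage would still need their own persistence.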

**Severity: High**

### M11: Flaw 6 tests pass via metadata passthrough, not via `_legacy_intent` field logic

**File:** `test_flaws.py` Flaw 6 tests

The two Flaw 6 tests verify that `_legacy_intent` preserves `order_type` and `limit_price`. They pass because `_mk_intent(order_type="LIMIT")` puts the value into `intent.metadata`, and `_legacy_intent` copies `intent.metadata` into `legacy.metadata` verbatim. The tests check `legacy.metadata.get("order_type")` which finds the value in the passthrough — **not** because `_legacy_intent` reads `intent.order_type` correctly.

`_legacy_intent` actually reads `getattr(intent, "order_type", "MARKET")` which returns `"MARKET"` (the default, since `_mk_intent` put the value in metadata, not the field), and sets `legacy.metadata["_order_type"] = "MARKET"`. The assertion passes via the wrong code path. If `_legacy_intent` stopped reading `intent.order_type` entirely, the tests would still pass as long as `intent.metadata` is passed through.

**Severity: High**

### M12: No retry or fallback for ClickHouse INSERT failures

Evidence across all persistence paths: every `sink(table, row)` call in `pink_clickhouse.py` is unprotected. If ClickHouse is unreachable, slow, or returns an error, the exception propagates unhandled through `persist_step()` → `step()`. No retry, no backoff, no fallback, no queue, no error reporting to `anomaly_events`.

This means a transient ClickHouse outage (common in cloud deployments) crashes the entire policy cycle. The slot state in the Rust kernel may be lost as the exception unwinds.
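
One possible shape for a guarded sink call is sketched below. `sink_with_retry` is a hypothetical wrapper; the attempt count, the delays, and the choice to catch every `Exception` are assumptions for illustration, not a recommendation for specific values.

```python
import time


def sink_with_retry(sink, table, row, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Try sink(table, row) up to `attempts` times with exponential backoff.

    Returns True on success and False once every attempt has failed, so the
    caller can log or queue the row instead of crashing the policy cycle.
    """
    for attempt in range(attempts):
        try:
            sink(table, row)
            return True
        except Exception:
            # Broad catch is an assumption; a real sink would narrow this
            # to connection and timeout errors.
            if attempt + 1 < attempts:
                sleep(base_delay_s * (2 ** attempt))
    return False
```

The `sleep` parameter exists only so tests can substitute a recorder for `time.sleep`. A `False` return is the point where a fallback queue or an `anomaly_events` record would be written.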

**Severity: High**

### M13: `AccountSnapshot.trade_seq` declared but never incremented — dead field

**File:** `account.py:27`

```python
@dataclass
class AccountSnapshot:
    ...
    trade_seq: int = 0
```

This field is part of the `AccountSnapshot` dataclass. It's initialized to 0 and **never assigned or incremented** anywhere in the entire codebase. Every snapshot from `kernel.snapshot()["account"]` returns `trade_seq: 0`. Despite being a standard field in every persistence row, it's always `0` — making it impossible to order trades chronologically by sequence number from any persisted data.

**Severity: Medium**

### M14: `test_reentry_after_full_close_no_pnl_loss` uses absurdly loose 50% bound

**File:** `test_flaws.py:686-706`

```python
assert abs(cap_after_second - cap_before) < cap_before * 0.5
```

Allows a 50% capital deviation (12,500 USDT on 25,000). The actual PnL from the test's tiny trades (~0.02 USDT) is orders of magnitude smaller. A bug that silently leaked 10,000 USDT of PnL would pass this test. The bound provides no meaningful verification.

Also: the test never checks `diagnostic_code` for the warning it claims to test (already documented as I7 weakness).
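
The difference between the 50% bound and an absolute tolerance is easy to show with numbers. In this sketch, `loose_check` restates the existing assertion, while `tight_check` and its 0.05 USDT tolerance are hypothetical.

```python
import math


def loose_check(cap_before, cap_after):
    """The existing bound: any deviation under 50% of capital passes."""
    return abs(cap_after - cap_before) < cap_before * 0.5


def tight_check(cap_before, cap_after, expected_pnl, abs_tol=0.05):
    """Pass only when capital moved by the expected PnL, within abs_tol USDT."""
    return math.isclose(
        cap_after, cap_before + expected_pnl, rel_tol=0.0, abs_tol=abs_tol
    )
```

With `cap_before = 25000.0`, a 10,000 USDT leak passes `loose_check` and fails `tight_check`.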

**Severity: Low**

### M15: `test_reconcile_rejects_position_open_with_zero_size` passes even if reconcile silently ignores bad data

**File:** `test_flaws.py:568-585`

```python
result = k.reconcile_from_slots([bad_slot])
slot = k._get_slot(0)
assert slot.fsm_state != TradeStage.POSITION_OPEN or slot.size > 0
```

The assertion was true **before** calling reconcile (slot starts IDLE with size=0). The test never checks `result.accepted == False` or verifies the diagnostic code. If `reconcile_from_slots` silently ignores the bad slot and returns `accepted=True`, the test still passes — it only proves the slot wasn't in POSITION_OPEN _after_ reconcile, which was already true.

The same structural weakness exists in `test_reconcile_rejects_idle_with_nonzero_size`.

**Severity: Low**

### M16: No built-in metric for active slot count, event throughput, or memory usage

The following operational metrics cannot be obtained without writing custom code:

- **Active slot count**: `len([s for s in kernel.state.slots if not s.is_free()])` — requires Python access to the `ExecutionKernel` object. No `active_slot_count` property exists.
- **Total event count**: No counter. The journal tracks individual transitions but there's no `total_events_processed: int` anywhere.
- **Memory usage**: No `tracemalloc`, no `psutil`, no RSS polling. Nothing.
- **Runtime uptime**: No `start_time` or `uptime()` method anywhere.

**Severity: Medium**

### M17: M4 duplicate — test_cancel_uses_slot_asset_not_trade_id and test_mock_venue_cancel_event_has_asset never call cancel

**File:** `test_flaws.py` Flaw 9 class

Both tests verify that an entry order's metadata contains an `asset` key. They never call `scenario.cancel()` or `k.process_intent(action=CANCEL)`. Despite their names and class (`TestFlaw9CancelSymbolFallback`), they test **metadata preservation on entry**, not cancel behavior.

**Severity: High**

### M18: `_decision_to_kernel_intent` drops `order_type` and `limit_price` — LIMIT orders unreachable from the runtime

**File:** `pink_direct.py:79-115` (inferred from E2E trace)

The bridge function converts a `Decision` to a `KernelIntent`. It sets `timestamp`, `intent_id`, `trade_id`, `asset`, `side`, `action`, `reference_price`, `target_size`, `leverage`, `exit_leg_ratios`, `reason`, and `metadata`. It does **NOT** set `order_type` or `limit_price` — both default to `"MARKET"` and `0.0`.

Even if the `DecisionEngine` produced a LIMIT decision with a limit price, the runtime has no path to express it. The entire LIMIT-order pipeline is dead code from the runtime — LIMIT orders can only be set via direct `KernelIntent(...)` construction in tests, which is itself broken (M3).

**Severity: High**

---

## Pass 10 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | **Critical** |
| M2 | CANCEL creates no transition record — invisible in audit log | Rust | **Critical** |
| M3 | `_mk_intent` drops order_type/limit_price into metadata, not proper field | Test | **High** |
| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | **High** |
| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium |
| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium |
| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low |
| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium |
| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position | Rust | **Critical** |
| M10 | No aggregation of trade count, success/fail, latency — all zero | All | **High** |
| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | **High** |
| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | **High** |
| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium |
| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low |
| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low |
| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium |
| M17 | Flaw 9 tests named for cancel but never call cancel | Test | **High** |
| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | **High** |

### Pass 10 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 3 (M1, M2, M9) |
| **High** | 7 (M3, M4, M10, M11, M12, M17, M18) |
| Medium | 5 (M5, M6, M8, M13, M16) |
| Low | 3 (M7, M14, M15) |

### Combined Catalog (All 10 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| **Total** | | **233** | **16** | **66** | **68** | **56** | **27** |

---

## PASS 11 — ASYNC/SYNC SEAMS, LOCK ANALYSIS, THREADING

### N1: Rust kernel `with_handle_mut` has zero synchronization — `&mut KernelCore` from raw pointer, UB on concurrent FFI

**File:** `_rust_kernel/src/lib.rs:2042`

```rust
fn with_handle_mut<F, R>(handle: *mut KernelHandle, f: F) -> Result<R, String>
where
    F: FnOnce(&mut KernelCore) -> Result<R, String>,
{
    // Safety: single-threaded; caller holds exclusive access for the duration.
    let core = unsafe { &mut (*handle).core };   // raw ptr → &mut
```

The comment says "single-threaded" but provides **zero enforcement**: no `Mutex`, no `RwLock`, no atomic flag, no thread-local constraints, no `!Send`/`!Sync` marker on `KernelCore`. The `unsafe` block converts a raw pointer to a `&mut` reference, which under Rust's aliasing rules must be **exclusive**. Two simultaneous `&mut` references to the same data are **undefined behavior** (data race, torn reads, LLVM miscompilation).

The `ctypes` FFI mechanism releases the GIL during the Rust call (`Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS`). Two Python threads can call any two `dita_kernel_*` functions simultaneously — one in `process_intent` (writing slot state), another in `snapshot_json` (reading). Both produce `&mut KernelCore`. This is a **compiler-level UB**, not just a logical race.

**Trigger scenario:** Thread A calls `process_intent()` (ENTRY fill → mutates slot). Thread B calls `on_venue_event()` (exit fill → mutates slot). The GIL is released during both Rust FFI calls. Both get `&mut KernelCore`. The Rust compiler can reorder, elide, or speculate any memory operation. Slot data becomes corrupted, PnL doubles, or the process segfaults.
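
Until the Rust side enforces exclusivity itself, the Python bridge could serialize every FFI call behind one lock. The following is a minimal sketch of that idea; `LockedKernel` is a hypothetical wrapper, and it only helps if every caller goes through it.

```python
import threading


class LockedKernel:
    """Hypothetical wrapper serializing every method call on a wrapped object."""

    def __init__(self, inner):
        self._inner = inner
        self._lock = threading.RLock()

    def __getattr__(self, name):
        # Only reached for names not defined on the wrapper itself.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            # One caller at a time reaches the wrapped (non-thread-safe) method.
            with self._lock:
                return attr(*args, **kwargs)

        return locked
```

This trades concurrency for safety: calls queue behind the lock instead of overlapping inside the kernel. It does not remove the unsound `&mut` construction on the Rust side.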

**Severity: Critical** — undefined behavior, no enforcement, no mitigation.

### N2: `_run()` has two completely different code paths depending on event loop state — runtime branch, not design decision

**File:** `bingx_venue.py:225-238`

```python
def _run(self, result):
    if inspect.isawaitable(result):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(result)           # Path A: no loop → direct run
        pool = self._get_executor()
        return pool.submit(asyncio.run, result).result()  # Path B: loop → pool + block
    return result
```

Path A (no event loop running): `asyncio.run(result)` — creates a new event loop, runs the coroutine, closes it. All on the same thread. Correct for sync contexts.

Path B (event loop running): `pool.submit(asyncio.run, result).result()` — submits to a 3-thread pool, each worker creates yet ANOTHER event loop via `asyncio.run()`, then blocks the calling thread with `.result()`.

The `asyncio.get_running_loop()` check is a **runtime probe**: the code doesn't know from its design whether it's in an async context. The same logical operation (run a coroutine) has two completely different implementations. Path B is an anti-pattern (a thread hop plus an event loop created and destroyed per call); Path A is the appropriate form for sync contexts.

This is the root cause of the entire async/sync seam problem — the architecture never committed to being async or sync.

**Severity: Critical**

### N3: `_run()` Path B blocks the event loop thread for every venue HTTP operation

**File:** `bingx_venue.py:236`

```python
return pool.submit(asyncio.run, result).result()  # BLOCKS calling thread
```

When called from within a running event loop (all live tests, any async deployment), `.result()` **blocks the event loop thread** until the thread pool worker completes. During this block:
- No WS messages can be received from the `BingxUserStream`
- No keepalive tasks can run
- No timer-based events can fire
- The event loop is **stuck**

If the thread pool is exhausted (3 concurrent HTTP calls — e.g., `_backend_snapshot` from `submit()` which calls it twice plus `cancel()` which calls it three times), the 4th call blocks at `.result()` **indefinitely** — the work item is queued but no worker is free. This is a **stuck-process scenario** where the entire system freezes.

The event loop thread is blocked on `.result()`, which means it cannot process the WS events that might contain the fill for the order it just submitted. If the exchange fills instantly, the WS message arrives before `.result()` returns. The WS data then sits in the OS kernel's TCP receive buffer, unprocessed, until `process_intent` completes and the event loop can schedule the WS reader again. This delay can cause stale fills, missed state transitions, or WS timeouts.

**Severity: Critical**

### N4: `asyncio.run()` called repeatedly inside thread pool — creates/destroys event loops per call, documented anti-pattern

**File:** `bingx_venue.py:236`

```python
return pool.submit(asyncio.run, result).result()
```

Each call to `asyncio.run()` creates a new `SelectorEventLoop`, runs it, then closes it. Doing this repeatedly for every HTTP call is a documented CPython anti-pattern:
- Each loop allocation costs memory (selector, callbacks, timeout queue)
- Each loop destruction leaves loop-internal objects for GC
- Over many calls (hundreds of trades), this creates GC pressure and memory fragmentation
- The `asyncio.run()` documentation describes it as a main entry point that should ideally be called only once; repeated work belongs on a long-lived loop

Path A (no event loop) has the same issue — `asyncio.run()` is called per-`_run()` invocation.

The total cost: each `process_intent()` may call `_run()` 3-4 times (`_backend_snapshot` ×2 + `submit_intent` + optionally `cancel`). Each `_run()` creates/destroys an event loop. With 10 trades/min, that's 30-40 event loop creations/destructions per minute.
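
A long-lived loop on a dedicated thread avoids the per-call create/destroy cycle. The sketch below shows one way to build it with `asyncio.run_coroutine_threadsafe`; `LoopThread` is a hypothetical helper, not something present in `bingx_venue.py`.

```python
import asyncio
import threading
from typing import Optional


class LoopThread:
    """Hypothetical helper: one long-lived event loop on a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout: Optional[float] = None):
        """Run coro on the shared loop and block the caller for its result."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        """Stop the loop, wait for its thread, then release loop resources."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

Every call reuses the same loop, so no loop is created or destroyed per HTTP operation. The caller still blocks in `run()`, so this addresses loop churn only; calling `run()` from the loop's own thread would deadlock.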

**Severity: Critical**

### N5: `_snapshot_ready` Event cascading re-fetch — N concurrent callers produce N overlapping HTTP calls

**File:** `bingx_venue.py:258-274`

```python
def _backend_snapshot(self, ...):
    if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
        return self._last_snapshot  # stale
    self._snapshot_ready.clear()
    try:
        snapshot = self._call_backend("refresh_state", ...)  # HTTP call
    except Exception:
        self._snapshot_ready.set()
        raise
    with self._snap_lock:
        self._last_snapshot = snapshot
    self._snapshot_ready.set()
```

When `_snapshot_ready.set()` fires at the end, ALL threads waiting on `.wait()` wake up. Each one proceeds to `clear()` and start a **new** HTTP call — even though a fresh snapshot was just written. With N concurrent callers to `_backend_snapshot`, this produces N overlapping `refresh_state` HTTP calls instead of N-1 callers reading the just-received result.

On BingX VST (rate limit ~10 req/s), 3 overlapping `refresh_state` calls (each doing 5 parallel sub-requests) issue 15 requests against a budget of roughly 10 per second. The calls overlap and cascade, wasting rate-limit capacity on redundant work.
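
The usual remedy for this pattern is single-flight: the first caller fetches, and callers that arrive during the fetch wait for and share that result. A minimal sketch follows; `SingleFlight` and `_Flight` are hypothetical names, and `fetch` stands in for the `refresh_state` call.

```python
import threading


class _Flight:
    """State shared by one leader and the followers waiting on its fetch."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class SingleFlight:
    """Hypothetical single-flight wrapper: concurrent callers share one fetch."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self._flight = None

    def get(self):
        with self._lock:
            flight = self._flight
            leader = flight is None
            if leader:
                flight = self._flight = _Flight()
        if leader:
            try:
                flight.value = self._fetch()   # the only expensive call
            except Exception as exc:
                flight.error = exc
            finally:
                with self._lock:
                    self._flight = None        # next caller starts a new fetch
                flight.done.set()
        else:
            flight.done.wait()                 # follower: reuse the leader's result
        if flight.error is not None:
            raise flight.error
        return flight.value
```

With N concurrent callers this issues one fetch instead of N. A caller arriving after the flight completes starts a fresh fetch, so results are never older than the call that requested them.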

**Severity: High**

### N6: `BingxUserStream.close()` does not cancel pending tasks — keepalive/rotation tasks continue after close

**File:** `bingx_user_stream.py:160-169`

```python
async def close(self) -> None:
    self._closed.set()
    if self._session is not None and not self._session.closed:
        await self._session.close()
```

`close()` sets the `_closed` event and closes the aiohttp session. It does **not** cancel the `keepalive_task` or `rotation_task` created inside `subscribe()`. These tasks are only cancelled in the `finally` block of `subscribe()`. If `close()` is called while nobody is iterating the `subscribe()` generator (or if iteration is blocked in `_consume()`), those tasks **keep running** until:
- The event loop shuts down (automatic task cancellation)
- The subscribe generator is garbage collected
- An exception occurs in the WS reader

During this window, the keepalive loop continues sending PUT requests to the (now potentially deleted) listen key. The rotation task continues its 23h50m sleep. Both are zombie tasks with no cleanup path.
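
One way to give `close()` a cleanup path is for the stream object to own its tasks. The sketch below illustrates the pattern; `StreamSketch` is a hypothetical class and is not a drop-in change to `BingxUserStream`.

```python
import asyncio


class StreamSketch:
    """Hypothetical stream that keeps a registry of its background tasks."""

    def __init__(self):
        self._tasks = set()
        self.closed = False

    def spawn(self, coro):
        """Start a background task and track it until it finishes."""
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def close(self):
        """Cancel every tracked task and wait for the cancellations to land."""
        self.closed = True
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Because the registry lives on the object rather than in local variables of `subscribe()`, `close()` can cancel the tasks even when nobody is iterating the generator. The same registry also gives a place to count live tasks.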

**Severity: Medium**

### N7: Live test architecture forces worst-case `_run()` behavior for every operation

**File:** `gen_live_tests.py`, `gen2.py`, `_gen_test.py` (all test generators)

The live tests use this pattern:

```python
def test_pink_ditav2_xxx(_live_client) -> None:
    ...
    result = asyncio.run(_run_scenario(bundle, _live_client, body_fn, name, ic))
```

Each test is a **synchronous** function that calls `asyncio.run()`. Inside the resulting event loop, every call to `k.process_intent()` triggers **Path B** of `_run()` — the pool-submit-`.result()` path. The test architecture forces the architecture's slowest, most thread-expensive code path for every single intent.

Every HTTP call: creates a new event loop on a pool thread → blocks the main event loop thread → blocks WS processing → wastes pool slots. Even for trivial mock-venue tests that don't need HTTP at all, the architecture still goes through the same `_run()` → pool → `.result()` path because the mock venue also returns awaitables.

**Severity: Medium**

### N8: `BingxUserStream subscribe()` creates new tasks on every reconnect — rapid reconnect causes task churn

**File:** `bingx_user_stream.py:100-120`

```python
async def subscribe(self):
    while not self._closed.is_set():
        ...
        keepalive_task = asyncio.create_task(self._keepalive_loop(listen_key))
        rotation_task = asyncio.create_task(self._rotation_sentinel())
        ...
        async for event in self._consume(listen_key, rotation_task):
            yield event
```

Each iteration of the reconnect loop creates new `keepalive_task` and `rotation_task`, then cancels the previous ones in the `finally` block. If the connection drops every few seconds (unstable WS), tasks are created and cancelled in rapid succession. Cancellation can race with task creation: a task cancelled before its first step never runs any of its coroutine body, so setup or cleanup code inside that body is skipped.

Also: no rate limiting on the reconnect loop beyond the `delay_ms` exponential backoff. If the WS repeatedly fails immediately after connection, the loop creates/destroys tasks in a tight cycle.

**Severity: Medium**

### N9: No `asyncio.all_tasks()` or task accounting anywhere — leaked tasks undetectable

No code in the entire workspace calls `asyncio.all_tasks()` or maintains a task registry. If a task is leaked (cancellation not propagated, generator not cleaned up), there is:
- No way to detect it programmatically
- No warning log
- No metrics
- No `__del__` fallback

Combined with N6 (tasks not cancelled on close) and N8 (task churn on reconnect), leaked tasks accumulate silently. Each leaked task holds references to its coroutine frame, which may hold references to `aiohttp.ClientSession`, websocket connections, and other resources.

**Severity: Low**

### N10: `_snap_lock` / `_snapshot_ready` pattern has no reader-side protection on `_last_snapshot`

**File:** `bingx_venue.py:258-274`

The `_snap_lock` protects `_last_snapshot` only during writes (lines 269-271). The fallback path (timeout at lines 260-262) also reads `_last_snapshot` under `_snap_lock`. The `_call_backend` call at line 266 is **outside** the lock: the snapshot is fetched without holding `_snap_lock`, which is correct (don't hold a lock across HTTP). However, the time between releasing the lock and reacquiring it for the write (line 269) means another thread could also be writing `_last_snapshot` concurrently. The `_snap_lock` ensures only one write at a time, but threads can still overwrite each other's `_last_snapshot`. This is the intended behavior (last writer wins for staleness purposes), not a correctness bug.

**Severity: Informational**

---

## Pass 11 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| N1 | Rust kernel `with_handle_mut` zero synchronization — `&mut` from raw ptr, UB on concurrent FFI | Rust | **Critical** |
| N2 | `_run()` has two completely different code paths — runtime branch, not design decision | Venue | **Critical** |
| N3 | `_run()` path B blocks event loop thread for every venue HTTP operation | Venue | **Critical** |
| N4 | `asyncio.run()` called repeatedly — creates/destroys event loops per call, documented anti-pattern | Venue | **Critical** |
| N5 | `_snapshot_ready` cascading re-fetch — N callers produce N overlapping HTTP calls | Venue | **High** |
| N6 | `BingxUserStream.close()` doesn't cancel pending tasks — zombie keepalive/rotation after close | Stream | Medium |
| N7 | Live test architecture forces worst-case `_run()` path for every operation | Test | Medium |
| N8 | `subscribe()` reconnect creates new tasks per iteration — rapid reconnect causes task churn | Stream | Medium |
| N9 | No `asyncio.all_tasks()` or task accounting — leaked tasks undetectable | All | Low |
| N10 | `_snap_lock`/`_snapshot_ready` no reader-side protection (informational) | Venue | Info |

### Pass 11 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 4 (N1, N2, N3, N4) |
| **High** | 1 (N5) |
| Medium | 3 (N6, N7, N8) |
| Low | 1 (N9) |
| Info | 1 (N10) |

### Combined Catalog (All 11 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| **Total** | | **243** | **20** | **67** | **71** | **57** | **28** |

---

## PASS 12 — SYNC/ASYNC WIDER SCOPE (launcher, generators, streams, FFI, tests)

### O1: `_maybe_close()` calls `asyncio.run()` without checking for a running event loop — close/disconnect silently skipped

**File:** `launcher.py:270-274`

```python
def _maybe_close(obj):
    ...
    if inspect.isawaitable(result):
        try:
            asyncio.run(result)
        except RuntimeError:
            pass  # SILENT — coroutine never executed
```

When `_maybe_close()` is called from any context that already has a running event loop (which includes all async tests, any `async def main()` orchestrator, or any code path that imports and runs `DITAv2LauncherBundle` inside an async context), `asyncio.run(result)` raises `RuntimeError: asyncio.run() cannot be called from a running event loop`. The `except RuntimeError: pass` swallows it — the close/disconnect method **never executes**.

Affected resources when called from async context:
- `RealZincPlane.close()` — never called → 3 shared memory regions leaked
- `RealZincControlPlane.close()` — never called → 1 shared memory region leaked
- `BingxVenueAdapter` has neither `close()` nor `disconnect()` — N/A
- `InMemoryZincPlane` has no close — N/A

The `DITAv2LauncherBundle.close()` method calls `_maybe_close(self.venue)`, `_maybe_close(self.zinc_plane)`, `_maybe_close(self.control_plane)` — if any of these have async close/disconnect methods, they're all silently skipped when called from async context.

This means: in any async deployment (which is the only deployment pattern — tests, and presumably production via `asyncio.run()` at top level), **shared memory regions are never explicitly closed**. They rely on process exit cleanup.
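
A variant that never drops the close coroutine could probe for a running loop first and schedule the work there. This is a sketch under assumptions: `maybe_close` and `_drain` are hypothetical names, and it only looks for a `close` attribute, whereas the real helper may also try `disconnect`.

```python
import asyncio
import inspect


async def _drain(awaitable):
    """Wrap any awaitable in a coroutine so asyncio.run() and create_task() accept it."""
    return await awaitable


def maybe_close(obj):
    """Call obj.close() and make sure an awaitable result actually runs.

    Returns None when the close completed before returning, or an
    asyncio.Task when called inside a running loop (the caller should await it).
    """
    close = getattr(obj, "close", None)
    if close is None:
        return None
    result = close()
    if not inspect.isawaitable(result):
        return None
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        asyncio.run(_drain(result))      # no loop in this thread: run to completion
        return None
    return loop.create_task(_drain(result))
```

The key difference from the quoted code is that the `RuntimeError` is caught around the loop probe, not around `asyncio.run()`, so the coroutine is always executed on one path or the other.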

**Severity: High**

### O2: `async def connect()` shims in all test generators call sync `venue.connect()` without `await` — misleading pattern

**Files:** `gen_live_tests.py:143-146`, `gen2.py:332-333`, `_gen_test.py:70` (via Shim/Shim pattern)

```python
# All three test harnesses have this pattern:
async def connect(self, initial_capital=0):
    self.kernel.venue.connect()          # sync method, no await
```

`BingxVenueAdapter.connect()` (bingx_venue.py:301) is a **sync** `def` that returns `bool`. It internally calls `self._run(result())` which under a running event loop submits to the thread pool and blocks with `.result()`. The `async def connect()` wrapper is misleading — it's `async` but immediately calls a sync method that will **block the event loop** for the HTTP round-trip duration.

The caller's perspective: `await runtime.connect()` should yield the event loop. Instead, it blocks until the BingX HTTP call inside `connect()` completes (via `_run()`'s thread pool path).

**Severity: Medium**

### O3: `gen_live_tests.py:171` — `_contract_rows(client)` NOT awaited in `async def _pick_live_symbol` — silent failure

**File:** `gen_live_tests.py:171`

```python
async def _pick_live_symbol(client):
    rows = _contract_rows(client)  # MISSING await! _contract_rows is async def
    ...
    pos_rows = [r for r in rows if ...]
```

`_contract_rows` is `async def` (line 69). Without `await`, `rows` is a **coroutine object**, not the actual data. The subsequent iteration `for r in rows` then raises `TypeError: 'coroutine' object is not iterable`, because native coroutine objects do not support iteration on any Python 3 version.

This function is called from `_run_scenario` (line 260) and `_run_pink_live_roundtrip` (line 297). If either path reaches `_pick_live_symbol`, it crashes with `TypeError`. This bug may not have manifested in practice if the code paths that call `_pick_live_symbol` are rarely exercised or if the test generator's output file hasn't been regenerated recently.
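
The failure mode is easy to reproduce in isolation. In this sketch, `contract_rows` is a hypothetical stand-in that returns made-up rows; only the missing `await` mirrors the code above.

```python
import asyncio


async def contract_rows():
    """Stand-in for an async fetch; the symbols are illustrative only."""
    return [{"symbol": "BTC-USDT"}, {"symbol": "ETH-USDT"}]


async def pick_without_await():
    rows = contract_rows()               # coroutine object, not a list
    try:
        return [r["symbol"] for r in rows]
    except TypeError as exc:
        rows.close()                     # avoid the "never awaited" RuntimeWarning
        return f"TypeError: {exc}"


async def pick_with_await():
    rows = await contract_rows()         # the one-word fix
    return [r["symbol"] for r in rows]
```

Running `asyncio.run(pick_without_await())` returns the `TypeError` text instead of symbols, while the awaited version returns the list.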

**Severity: High**

### O4: `test_exchange_event_seam_parity.py` uses deprecated `asyncio.get_event_loop().run_until_complete()`

**File:** `test_exchange_event_seam_parity.py:243,264`

```python
snap = asyncio.get_event_loop().run_until_complete(mock.account_snapshot())  # line 243
asyncio.get_event_loop().run_until_complete(asyncio.wait_for(_collect(), timeout=2.0))  # line 264
```

Relying on `asyncio.get_event_loop()` to create a loop is **deprecated**: in Python 3.12+ it emits a `DeprecationWarning` when no current event loop is set. In that case it creates a new loop and sets it as the current event loop, which can cause subtle issues when multiple event loops are active. The modern pattern is `asyncio.run()`.

These are the only two places in the workspace that use the deprecated `get_event_loop().run_until_complete()` pattern.

**Severity: Medium**

### O5: `_run()` thread pool has no timeout on `.result()` — if backend hangs, calling thread hangs forever

**File:** `bingx_venue.py:236`

```python
return pool.submit(asyncio.run, result).result()  # NO timeout
```

`concurrent.futures.Future.result()` has an optional `timeout` parameter. None is set here. If the thread pool worker hangs (e.g., the `asyncio.run()` call in the worker gets stuck on a never-responding HTTP request, a deadlocked coroutine, or an infinite loop), the calling thread blocks **forever** on `.result()`.

If the calling thread is the event loop thread (Path B), the entire event loop is frozen indefinitely. No WS messages, no keepalive tasks, no timer events. The system is completely dead.

The `_backend_snapshot()` method has a 5-second timeout for its `threading.Event.wait()`, but the actual `_call_backend("refresh_state", ...)` that runs inside the thread pool has no timeout. The HTTP client (`BingxHttpClient`) may have its own default timeout (`aiohttp`'s default total timeout is 300 seconds), but there's no fallback if it hangs beyond that.
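
Bounding the wait is a small change at the call site. The sketch below shows the mechanism with a hypothetical `result_or_timeout` helper; the timeout value a real deployment should use is not something this analysis can determine.

```python
import concurrent.futures


def result_or_timeout(pool, fn, timeout_s):
    """Submit fn to pool and wait at most timeout_s seconds for its result."""
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the work item has not started yet; a worker
        # that is already running keeps running until fn returns.
        future.cancel()
        raise
```

The timeout frees the calling thread but not the worker, so a hung request still occupies a pool slot. The coroutine inside the worker needs its own deadline (for example `asyncio.wait_for`) to release it.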

**Severity: High**

### O6: MockVenueAdapter never exercises the thread-pool bridge — all CI tests use mock venue, bridge untested

**Files:** `mock_venue.py` vs `bingx_venue.py`

`MockVenueAdapter.submit()` is pure sync — it does `return self._events_from_submit(...)` with no awaitables, no thread pools. `BingxVenueAdapter.submit()` is a sync-bridge that goes through `_run()` → `pool.submit(asyncio.run, ...).result()`.

All 35+ tests in `test_flaws.py` use `MockVenueAdapter`. All generated live tests use `BingxVenueAdapter` but are rarely executed (require live exchange credentials and API key env vars). The thread-pool bridge — including:
- Thread creation and lifecycle
- `asyncio.run()` inside pool workers
- Event loop per HTTP call
- Thread pool exhaustion handling
- Exception propagation through `.result()`

— is **never exercised in CI**. If the bridge has a bug (e.g., the `asyncio.run()` inside the pool worker corrupts shared state, or thread-safety issues in `aiohttp`), it surfaces only in production.

**Severity: Medium**

### O7: `BingxUserStream._keepalive_loop` and `_rotation_sentinel` are fire-and-forget tasks — unhandled exceptions silently lost

**File:** `bingx_user_stream.py:105-112`

```python
keepalive_task = asyncio.create_task(self._keepalive_loop(listen_key), name="lk_keepalive")
rotation_task = asyncio.create_task(self._rotation_sentinel(), name="lk_rotation")
```

Both are created with `create_task()` and tracked for later cancellation, but **not supervised during normal operation**. If `_keepalive_loop` raises an exception that's not caught by its internal `try/except` (e.g., an `asyncio.CancelledError` variant, or a `RuntimeError` from the HTTP layer), the exception is stored in the `Task` object. If `.result()` or `.exception()` is never called on that `Task`, asyncio reports `"Task exception was never retrieved"` through the loop's exception handler, and only once the `Task` is garbage collected. That is a log line, not structured error handling.

`_rotation_sentinel` has no exception handling in its body — it just does `await asyncio.sleep(secs)` and returns. It can't raise an exception unless the event loop is shut down during its sleep (in which case `CancelledError` is raised, which is properly handled in the `finally` block).
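
A minimal sketch of supervising such tasks with `add_done_callback`, so a failure is retrieved and recorded as soon as the task finishes. `supervise`, the `failures` list, and the logger name are hypothetical; how the stream should react (reconnect, escalate) is left open.

```python
import asyncio
import logging

log = logging.getLogger("user_stream")  # hypothetical logger name


def supervise(task, failures):
    """Attach a callback that retrieves and records the task's exception."""

    def _on_done(t):
        if t.cancelled():
            return  # normal shutdown path
        exc = t.exception()  # retrieving it marks the exception as handled
        if exc is not None:
            log.error("task %s failed: %r", t.get_name(), exc)
            failures.append((t.get_name(), exc))

    task.add_done_callback(_on_done)
    return task


async def _demo():
    failures = []

    async def broken_keepalive():
        raise RuntimeError("listen key refresh failed")

    supervise(asyncio.create_task(broken_keepalive(), name="lk_keepalive"), failures)
    await asyncio.sleep(0.01)  # let the task run and the callback fire
    return failures


demo_failures = asyncio.run(_demo())
```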

**Severity: Low**

### O8: `KernelSlotView.__getattr__` makes a ctypes call per attribute — each read triggers Rust FFI and is not cached

**File:** `rust_backend.py:422-426`

```python
def __getattr__(self, name: str) -> Any:
    slot = self._snapshot()   # FFI call → Rust serialize → JSON parse → TradeSlot
    if hasattr(slot, name):
        return getattr(slot, name)
    raise AttributeError(name)
```

Every attribute access on a `KernelSlotView` — including `slot.size`, `slot.fsm_state`, `slot.trade_id`, `slot.active_entry_order`, etc. — does a full JSON round-trip to the Rust kernel:
1. Python calls `_get_rust().get_slot_json(self._backend, slot_id)`
2. ctypes calls Rust `dita_kernel_get_slot_json`
3. Rust serializes the entire `TradeSlot` to a JSON string
4. ctypes returns the C string pointer
5. Python calls `_take_string(raw)` → `text.decode("utf-8")`
6. Python calls `json.loads(text)` → dict
7. `_slot_from_payload(dict)` → new `TradeSlot` dataclass
8. `getattr(slot, name)` → read the one field from the new object

Accessing 5 fields on a `KernelSlotView` (e.g., `slot.size`, `slot.fsm_state`, `slot.entry_price`, `slot.active_entry_order`, `slot.trade_id`) does 5 FFI round-trips. The deserialized `TradeSlot` is created and immediately discarded for each access.

The `_snapshot()` method (line 435) calls `self._kernel._get_slot(self._slot_id)` which does the full FFI round-trip. There is no caching of the deserialized `TradeSlot` between successive accesses. This is an N+1 performance issue — accessing N fields costs N FFI calls instead of 1.
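
The cost difference can be sketched with a counter in place of the FFI call. `CountingKernel`, `SlotView`, `TradeSlotSnapshot`, and the `read()` method are hypothetical stand-ins; only the per-attribute `__getattr__` behaviour mirrors the code above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeSlotSnapshot:  # hypothetical subset of TradeSlot fields
    size: float
    fsm_state: str
    trade_id: str


class CountingKernel:
    """Stand-in for the FFI-backed kernel; counts simulated round-trips."""

    def __init__(self):
        self.ffi_calls = 0

    def _get_slot(self, slot_id):
        self.ffi_calls += 1
        return TradeSlotSnapshot(size=0.5, fsm_state="OPEN", trade_id="t-1")


class SlotView:
    def __init__(self, kernel, slot_id):
        self._kernel = kernel
        self._slot_id = slot_id

    def __getattr__(self, name):
        # Behaviour described above: one round-trip per attribute read.
        return getattr(self._kernel._get_slot(self._slot_id), name)

    def read(self):
        # Alternative: one round-trip, then plain attribute access on the copy.
        return self._kernel._get_slot(self._slot_id)


kernel = CountingKernel()
view = SlotView(kernel, 0)

_ = (view.size, view.fsm_state, view.trade_id)
per_attribute_calls = kernel.ffi_calls

snap = view.read()
_ = (snap.size, snap.fsm_state, snap.trade_id)
snapshot_calls = kernel.ffi_calls - per_attribute_calls
```

Reading three fields costs three simulated round-trips through `__getattr__` and one through `read()`. A single copy also gives callers a consistent view, since all fields come from the same kernel state.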

**Severity: Medium**

### O9: `DITAv2LauncherBundle` has no `__del__` — bundle that's garbage collected leaks its entire resource tree

**File:** `launcher.py:64-95`

```python
@dataclass
class DITAv2LauncherBundle:
    kernel: ExecutionKernel
    control_plane: ControlPlane
    projection: HazelcastProjection
    zinc_plane: ZincPlane
    venue: VenueAdapter

    def close(self) -> None:
        _maybe_close(self.venue)
        _maybe_close(self.zinc_plane)
        _maybe_close(self.control_plane)
```

No `__del__` method. If a bundle is garbage collected without an explicit `close()` call:
- The Rust kernel's `KernelHandle` is freed by `ExecutionKernel.__del__` (if GC runs)
- If `RealZincPlane` was in use, its `close()` is never called → 3 shared memory regions leaked
- If `RealZincControlPlane` was in use, its `close()` is never called → 1 shared memory region leaked
- The projection (Hazelcast) client connection is never closed
- The venue adapter's thread pool executor is never shut down

If the bundle is created and dropped in a loop (e.g., per-test setup/teardown), shared memory regions accumulate until the system runs out of `/dev/shm/` space.
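
A sketch of one possible safety net, using `weakref.finalize` rather than `__del__` plus context-manager support. `Bundle`, `Resource`, and `_close_all` are hypothetical; the real bundle is a dataclass with named fields, and the close order shown here is an assumption.

```python
import gc
import weakref


class Resource:
    """Hypothetical closable resource (stands in for a venue or zinc plane)."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def close(self):
        self._log.append(self.name)


def _close_all(resources):
    for res in resources:
        try:
            res.close()
        except Exception:
            pass  # keep closing the remaining resources


class Bundle:
    def __init__(self, *resources):
        # The callback must not reference self, or the bundle would stay alive.
        self._finalizer = weakref.finalize(self, _close_all, resources)

    def close(self):
        self._finalizer()  # a finalizer runs at most once, so this is idempotent

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


closed = []
with Bundle(Resource("venue", closed), Resource("zinc_plane", closed)) as bundle:
    pass
bundle.close()  # second close is a no-op

leaked = []
dropped = Bundle(Resource("control_plane", leaked))
del dropped  # never closed explicitly
gc.collect()
```

Explicit `close()` or a `with` block remains the primary path; the finalizer only covers bundles that are dropped without it.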

**Severity: Medium**

### O10: ExecutionKernel has no `close()` — `__del__` is the only cleanup path for the Rust handle

**File:** `rust_backend.py:519-525`

```python
def __del__(self) -> None:
    backend = getattr(self, "_backend", None)
    if backend is not None:
        try:
            _get_rust().destroy(backend)
        except Exception:
            pass
```

No `close()` method exists on `ExecutionKernel`. The `DITAv2LauncherBundle.close()` doesn't touch the kernel (it calls `_maybe_close` on venue, zinc_plane, and control_plane only). The Rust `_backend` handle is only freed when `__del__` runs during garbage collection.

If the kernel is part of a reference cycle (K3/K6 — `Kernel → KernelStateView → KernelSlotView → Kernel`), `__del__` may be delayed indefinitely until the cycle GC runs. During that delay, the Rust `KernelHandle` is alive but unreachable — its memory is leaked until GC.

**Severity: Medium**

### O11: `KernelSlotView.__setattr__` triggers 5 side effects including durable writes — undocumented

**File:** `rust_backend.py:428-453`

```python
def __setattr__(self, name: str, value: Any) -> None:
    ...
    slot = self._snapshot()
    setattr(slot, name, value)
    self._kernel._set_slot(slot)  # triggers: Rust FFI write + state refresh
                                 #           + account.observe_slots
                                 #           + projection.write_slot
                                 #           + zinc_plane.write_slot
```

Setting any attribute on a `KernelSlotView` — even something trivial like `slot.some_metadata_field = "test"` — triggers 5 side effects: Rust FFI write to the kernel, `KernelStateView.refresh()`, `account.observe_slots()`, `projection.write_slot()`, and `zinc_plane.write_slot()`. The method name `__setattr__` gives no indication that setting a field triggers durable writes across multiple persistence layers.

There is no read-only view that prevents accidental mutation. Any code that holds a `KernelSlotView` reference and assigns a field bypasses all FSM guards and directly mutates the Rust kernel state.
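
A minimal sketch of what a read-only view could look like: it copies the fields once and rejects assignment instead of writing through. `ReadOnlySlot` and the two-field `TradeSlot` are hypothetical illustrations, not existing classes.

```python
from dataclasses import dataclass


@dataclass
class TradeSlot:  # hypothetical two-field subset
    size: float
    fsm_state: str


class ReadOnlySlot:
    """Immutable copy of a slot's fields; assignment raises instead of writing."""

    __slots__ = ("_fields",)

    def __init__(self, slot):
        # Bypass our own __setattr__ for the one internal attribute.
        object.__setattr__(self, "_fields", dict(vars(slot)))

    def __getattr__(self, name):
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        raise AttributeError(f"read-only slot view: cannot set {name!r}")


slot = TradeSlot(size=0.5, fsm_state="OPEN")
view = ReadOnlySlot(slot)
try:
    view.size = 99.0
    write_blocked = False
except AttributeError:
    write_blocked = True
```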

**Severity: Medium**

---

## Pass 12 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| O1 | `_maybe_close()` asyncio.run without loop guard — close/disconnect silently skipped from async context | Launcher | **High** |
| O2 | `async def connect()` shims call sync `venue.connect()` without await — blocking pattern | Test | Medium |
| O3 | `_contract_rows(client)` NOT awaited in `_pick_live_symbol` — silent coroutine iteration crash | Test | **High** |
| O4 | `test_exchange_event_seam_parity.py` uses deprecated `get_event_loop().run_until_complete()` | Test | Medium |
| O5 | `_run()` thread pool `.result()` has no timeout — backend hang freezes process indefinitely | Venue | **High** |
| O6 | MockVenueAdapter never exercises thread-pool bridge — bridge untested in CI | Venue | Medium |
| O7 | `_keepalive_loop`/`_rotation_sentinel` fire-and-forget tasks — exceptions silently lost | Stream | Low |
| O8 | `KernelSlotView.__getattr__` makes N FFI calls for N attribute accesses — no caching | Bridge | Medium |
| O9 | `DITAv2LauncherBundle` no `__del__` — GC'd bundle leaks entire resource tree | Launcher | Medium |
| O10 | `ExecutionKernel` no `close()` — Rust handle only freed by unpredictable `__del__` | Bridge | Medium |
| O11 | `KernelSlotView.__setattr__` triggers 5 persistence side effects — read-only view missing | Bridge | Medium |

### Pass 12 Severity

| Severity | Count |
|----------|-------|
| **High** | 3 (O1, O3, O5) |
| Medium | 7 (O2, O4, O6, O8, O9, O10, O11) |
| Low | 1 (O7) |

### Combined Catalog (All 12 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| **Total** | | **254** | **20** | **70** | **78** | **58** | **28** |

---

## PASS 13 — FFI BOUNDARY SAFETY, DANGLING POINTERS, COVERAGE GAPS

### P1: `dita_kernel_destroy` double-free UB — Python does not null `handle.value` after destroy

**File:** `rust_backend.py:145-148`, `_rust_kernel/src/lib.rs:2081-2088`

```python
# Python destroy():
def destroy(self, handle):
    if handle and handle.value:
        self.lib.dita_kernel_destroy(handle)  # handle.value NOT nulled
```

```rust
// Rust dita_kernel_destroy:
pub extern "C" fn dita_kernel_destroy(handle: *mut KernelHandle) {
    if !handle.is_null() {
        unsafe { drop(Box::from_raw(handle)); }
    }
}
```

If `destroy()` is called twice on the same handle:
1. First call: `Box::from_raw(handle)` frees the memory. Python's `handle.value` still points to the old (now dangling) memory.
2. Second call: `handle and handle.value` is `True` (dangling but non-null). Passes to Rust.
3. Rust: `!handle.is_null()` is `True`. `Box::from_raw(handle)` on a dangling pointer is **undefined behavior** — heap corruption, use-after-free, or silent data corruption.

**Trigger scenarios:**
- `ExecutionKernel.__del__` calls `destroy()` if `_backend` is non-null. If user code also calls `destroy()` explicitly (no such code today, but the method is public), double-free.
- If `__del__` runs during interpreter shutdown after the `_RustKernelLib` CDLL object is partially finalized, the `self.lib` attribute might be `None`, which raises `AttributeError` rather than double-freeing; if the CDLL is still alive, the double-free proceeds.
- Test code that creates/destroys kernels in a loop (fresh_kernel pattern) could trigger double-destroy if two wrapper objects end up holding the same handle. CPython itself calls `__del__` at most once per object (PEP 442), so the cycle collector does not finalize a single `ExecutionKernel` twice.

**Fix:** Python `destroy()` should set `handle.value = None` after calling, or use a `_destroyed` flag.
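
A sketch of the nulling fix, assuming the handle is a `ctypes.c_void_p` (the excerpt only shows that it has a `.value`). `FakeLib` and `RustKernelLibSketch` are stand-ins; the address used below is never dereferenced.

```python
import ctypes


class FakeLib:
    """Stand-in for the loaded CDLL; counts how often destroy is reached."""

    def __init__(self):
        self.destroy_calls = 0

    def dita_kernel_destroy(self, handle):
        self.destroy_calls += 1


class RustKernelLibSketch:
    def __init__(self, lib):
        self.lib = lib

    def destroy(self, handle):
        if handle is None or not handle.value:
            return  # already destroyed, or never created
        self.lib.dita_kernel_destroy(handle)
        handle.value = None  # NULL the Python-side pointer after the free


fake = FakeLib()
rust = RustKernelLibSketch(fake)
handle = ctypes.c_void_p(0xDEADBEEF)  # fake address, used only as a token

rust.destroy(handle)
rust.destroy(handle)  # second call returns before reaching the library
```

This only protects callers that share the same `c_void_p` object. A copy of the raw address held elsewhere would still be dangling.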

**Severity: Critical** — undefined behavior on any double-destroy path.

### P2: `CStr::from_ptr(payload)` without null guard in multiple FFI exports

**File:** `_rust_kernel/src/lib.rs` — `dita_kernel_set_exchange_config_json`, `dita_kernel_calibrate_fee_json`, `dita_kernel_on_account_event_json`

```rust
pub extern "C" fn dita_kernel_set_exchange_config_json(handle: *mut KernelHandle, payload: *const c_char) -> i32 {
    let payload = unsafe { CStr::from_ptr(payload) };  // NO NULL CHECK — UB if payload is null
    let payload_str = payload.to_str().map_err(|_| -1)?;
    ...
}
```

Three FFI functions call `CStr::from_ptr(payload)` directly on the raw `*const c_char` parameter **without** checking for null first. If a null pointer is passed (from Python ctypes passing `None`, or from a bug in a future caller), this reads from memory address 0 — **segfault or undefined behavior**.

The existing helper `cstr_to_string()` (line 1500) correctly checks for null:
```rust
fn cstr_to_string(ptr: *const c_char) -> Result<String, String> {
    if ptr.is_null() { return Err("NULL_POINTER".to_string()); }
    unsafe { CStr::from_ptr(ptr) }
        .to_str().map(|s| s.to_string()).map_err(|e| e.to_string())
}
```

But these three FFI functions bypass it. Only `dita_kernel_set_exchange_config_json` is called from Python; `calibrate_fee` and `on_account_event` are newer functions.

**Fix:** Use `cstr_to_string()` or add an explicit `if payload.is_null() { return -1; }` guard.

**Severity: High** — null pointer dereference on any call with a null payload.

### P3: `_check_open_orders` calls `asyncio.run()` from within async `_verify()` — RuntimeError in live test execution

**File:** `_gen_test.py:104`, `_build_pink_extended.py:75-78`

```python
def _check_open_orders(c, vs):
    r = __import__('asyncio').run(c._request_json("GET", "/openApi/swap/v2/trade/openOrders", ...))
```

This is a **sync** `def` that calls `asyncio.run()`. It is called from `_verify()` which is `async def` (inside the generated test file). When `_verify` runs inside `asyncio.run(_run(...))`, there is a **running event loop**. `_check_open_orders` calls `asyncio.run(...)` which detects the running loop and raises `RuntimeError: asyncio.run() cannot be called from a running event loop`.

The same pattern exists in `_build_pink_extended.py:75-78` in the patched version of `_check_open_orders`.

**Fix:** Make `_check_open_orders` `async def` and use `await` instead of `asyncio.run()`.
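
The failure and the fix can be reproduced without the exchange. In this sketch `FakeClient` is a hypothetical stand-in that returns a canned payload; the endpoint path is the one quoted above.

```python
import asyncio

PATH = "/openApi/swap/v2/trade/openOrders"


class FakeClient:
    """Hypothetical stand-in for the HTTP client."""

    async def _request_json(self, method, path, params=None):
        await asyncio.sleep(0)
        return {"data": {"orders": []}}


def check_open_orders_sync(c):
    # Original shape: asyncio.run() from code that is already inside a loop.
    coro = c._request_json("GET", PATH)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # demo hygiene: avoid a "never awaited" warning
        raise


async def check_open_orders(c):
    # Fixed shape: stay on the caller's event loop.
    return await c._request_json("GET", PATH)


async def verify(c):
    try:
        check_open_orders_sync(c)
        sync_error = None
    except RuntimeError as exc:
        sync_error = str(exc)
    return sync_error, await check_open_orders(c)


sync_error, orders = asyncio.run(verify(FakeClient()))
```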

**Severity: High** — any live test that calls `_verify` (which all live tests do via `_run`) will crash.

### P4: `into_c_string` replaces NUL bytes with `"\\u0000"` — produces invalid JSON

**File:** `_rust_kernel/src/lib.rs:2006-2013`

```rust
fn into_c_string(value: &str) -> *mut c_char {
    match CString::new(value) {
        Ok(cs) => cs.into_raw(),
        Err(_) => {
            let sanitized = value.replace('\0', "\\u0000");  // literal backslash-u-0-0-0-0
            CString::new(sanitized).unwrap_or_else(|_| CString::new("").unwrap()).into_raw()
        }
    }
}
```

When a string contains an interior NUL byte (`\0`), `into_c_string` replaces it with the 6-character ASCII sequence `\u0000` (backslash, `u`, four zeros; written `"\\u0000"` in Rust source). If this string is a JSON payload (which it always is for `process_intent_json`, `on_venue_event_json`, etc.), the sanitized string is **not valid JSON**. Python's `json.loads()` in `_take_string` receives invalid JSON and raises `json.JSONDecodeError`.

This is a data integrity issue: a NUL byte in an intent field (which shouldn't happen in normal use but could come from a malformed exchange response) causes the entire intent to fail with a `JSONDecodeError` rather than a clean `INVALID_INTENT` diagnostic.

Note: The NUL-byte panic (G2) was fixed by adding this sanitizer, but the sanitizer produces invalid JSON, trading a crash for a different failure mode.

**Fix:** Strip NUL bytes entirely (`.replace('\0', "")`) before JSON construction, or reject the intent with `invalid_intent_cstring` if NUL bytes are detected.

**Severity: Medium**

### P5: `reconcile_slots_json` returns null on serialize failure — inconsistent with intent/venue error paths

**File:** `_rust_kernel/src/lib.rs:2258`

```rust
// reconcile_slots_json unwrap_or:
.with_handle_mut(handle, |core| ...)
.unwrap_or(ptr::null_mut())  // returns null — NO diagnostic
```

When `reconcile_slots_json` encounters a parse or serialize failure, it returns `ptr::null_mut()`. Python's `_take_string` raises `RuntimeError("Rust kernel returned null string")` — an unhandled exception.

Compare with `process_intent_json` and `on_venue_event_json` which use:
```rust
.map_err(|e| invalid_intent_cstring("INVALID_INTENT_PARSE", &e))
.unwrap_or_else(|ptr| ptr)  // returns structured diagnostic JSON
```

The reconcile and snapshot paths return bare null — no diagnostic, no structured error, no way for the Python side to distinguish "parse error" from "serialize error" from "null handle."

The same issue affects `dita_kernel_snapshot_json` (line 2269).

**Severity: Medium**

### P6: `_get_rust()` TOCTOU race on first call — concurrent threads both see `_RUST is None`

**File:** `rust_backend.py:271-275`

```python
def _get_rust():
    global _RUST
    if _RUST is None:
        _RUST = _RustKernelLib()  # two threads can both enter here
    return _RUST
```

Two threads calling `_get_rust()` simultaneously on first access both see `_RUST is None`. Both enter the `if` block. Both call `_RustKernelLib()` which:
1. Calls `_ensure_library()` which runs `subprocess.run(["cargo", "build", "--release", ...], check=True)` — **two concurrent cargo builds** can corrupt the build directory.
2. Calls `ctypes.CDLL(path)` — loads the shared library twice. The second CDLL object is assigned to `_RUST` (overwriting the first), which is then GC'd, but the Rust library's global state may have been initialized twice.

**Fix:** Use a module-level lock.

**Severity: High**

### P7: `KernelHandle` has no `!Send`/`!Sync` — but ctypes FFI bypasses all Rust ownership rules

**File:** `_rust_kernel/src/lib.rs`

`KernelHandle` and `KernelCore` have no explicit `unsafe impl Send` or `unsafe impl Sync`, and nothing in the finding suggests a marker field that opts out of them. `Send` and `Sync` are auto traits: the compiler implements them whenever every field is itself `Send`/`Sync`. `HashMap<String, Value>` qualifies, because `serde_json::Value` is both `Send` and `Sync`, so unless some other field opts out, both types are auto-`Send`/`Sync`.

The real issue: even if Rust correctly determined `KernelHandle` is `!Send` and `!Sync`, the `*mut KernelHandle` pointer passed across FFI has no type-system enforcement. Python's ctypes calls `dita_kernel_process_intent_json(handle, ...)` which immediately converts the raw pointer to `&mut KernelCore` via `unsafe { &mut (*handle).core }`. The Rust compiler cannot enforce ownership rules across the FFI boundary.

This means: **the Rust kernel's thread-safety design relies entirely on the Python side never calling FFI from multiple threads simultaneously.** There is no mechanism in either language to enforce this.

**Severity: Informational** — documenting the existing design constraint (already covered in N1, but worth noting the Send/Sync aspect).

### P8: `dita_kernel_destroy` not called from bundle close — no explicit Rust handle cleanup path

**Files:** `launcher.py:83-95`, `rust_backend.py`

`DITAv2LauncherBundle.close()` calls:
```python
_maybe_close(self.venue)
_maybe_close(self.zinc_plane)
_maybe_close(self.control_plane)
```

It does **not** call anything on `self.kernel`. `ExecutionKernel` has no `close()` method (O10). The Rust `_backend` handle is only freed when `__del__` runs during garbage collection.

The bundle holds `self.kernel` as a strong reference. As long as the bundle is alive, the kernel is alive. When the bundle is GC'd (or goes out of scope), the kernel's refcount may drop to zero, triggering `__del__`. But if the kernel has a reference cycle (K6: `Kernel → StateView → SlotView → Kernel`), `__del__` is delayed until the cyclic garbage collector runs.

**Fix:** Add `ExecutionKernel.close()` and call it from `DITAv2LauncherBundle.close()`.
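
A sketch of what that could look like, with `__del__` kept only as a fallback. `KernelSketch`, `BundleSketch`, and `FakeRust` are hypothetical; closing the kernel last is an assumption, made so other components can still read kernel state while they shut down.

```python
class FakeRust:
    """Stand-in for the object returned by _get_rust()."""

    def __init__(self):
        self.destroyed = []

    def destroy(self, backend):
        self.destroyed.append(backend)


class KernelSketch:
    def __init__(self, rust, backend):
        self._rust = rust
        self._backend = backend

    def close(self):
        # Swap first, so a repeated or re-entrant call sees None.
        backend, self._backend = self._backend, None
        if backend is not None:
            self._rust.destroy(backend)

    def __del__(self):
        try:
            self.close()  # fallback only; explicit close() is the main path
        except Exception:
            pass


class BundleSketch:
    def __init__(self, kernel, closables):
        self.kernel = kernel
        self.closables = closables

    def close(self):
        for component in self.closables:
            component.close()
        self.kernel.close()  # assumption: kernel goes last


rust = FakeRust()
bundle = BundleSketch(KernelSketch(rust, "handle-1"), closables=[])
bundle.close()
bundle.close()  # idempotent
del bundle      # __del__ finds nothing left to destroy
```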

**Severity: Medium**

### P9: `ExecutionKernel.__del__` accesses module-level `_RUST` — NameError during shutdown

**File:** `rust_backend.py:519-525`

```python
def __del__(self):
    backend = getattr(self, "_backend", None)
    if backend is not None:
        try:
            _get_rust().destroy(backend)   # accesses module-level _RUST
        except Exception:
            pass
```

During Python interpreter shutdown, the interpreter clears module globals **before** calling `__del__` on remaining objects. If `ExecutionKernel` survives to shutdown, `_get_rust()` accesses the module-level `_RUST` variable which may have already been set to `None` (module globals cleared). This raises `TypeError: 'NoneType' object is not callable` (when `_RUST is None` and `_get_rust()` tries to call `_RustKernelLib()`), or `NameError` if the variable itself has been deleted from the module namespace.

The `except Exception: pass` catches this, but the Rust handle is never destroyed — it leaks.

**Severity: Low** — caught by except, only at shutdown, Rust kernel handle is lost but process is exiting.

### P10: `_check_open_orders` in `_gen_test.py` has redundant `asyncio.run` already covered in P3 — different location, same pattern

Already covered in P3. Same root cause in `_build_pink_extended.py:75-78`. No additional finding.

---

## Pass 13 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| P1 | `dita_kernel_destroy` double-free UB — Python doesn't null handle.value | Bridge | **Critical** |
| P2 | `CStr::from_ptr(payload)` without null guard in 3 FFI exports | Rust | **High** |
| P3 | `_check_open_orders` calls `asyncio.run()` from async `_verify` — RuntimeError | Test | **High** |
| P4 | `into_c_string` NUL sanitizer produces invalid JSON — json.loads fails | Rust | Medium |
| P5 | `reconcile_slots_json`/`snapshot_json` return null on failure — no diagnostic | Rust | Medium |
| P6 | `_get_rust()` TOCTOU race — concurrent cargo build corruption | Bridge | **High** |
| P7 | `KernelHandle` no Send/Sync — FFI bypasses Rust ownership rules | Rust | Info |
| P8 | No explicit Rust handle destroy path from bundle.close() | Launcher | Medium |
| P9 | `__del__` accesses module `_RUST` during shutdown — NameError leak | Bridge | Low |

### Pass 13 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 1 (P1) |
| **High** | 3 (P2, P3, P6) |
| Medium | 3 (P4, P5, P8) |
| Low | 1 (P9) |
| Info | 1 (P7) |

### Combined Catalog (All 13 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| **Total** | | **263** | **21** | **73** | **81** | **59** | **29** |

---

## PASS 14 — SERDE EDGE CASES, BACKUP DIFFS, MARKET DATA/TIMESTAMPS

### Q1: `datetime.fromisoformat()` in Python < 3.11 cannot parse Rust `Z`-suffix timestamps

**Files:** `rust_backend.py:215,241,260`, `real_zinc_plane.py:95,122`

```python
entry_time=datetime.fromisoformat(payload["entry_time"]) if payload.get("entry_time") else None
```

When Rust's `chrono::DateTime<Utc>` serializes a timestamp via serde, it produces the RFC 3339 format with `Z` suffix: `"2026-05-31T12:00:00Z"`. Python's `datetime.fromisoformat()`, until Python 3.11, **cannot parse the `Z` suffix**. Support arrived in 3.11, which widened `fromisoformat()` to accept most ISO 8601 formats. On Python 3.10 and earlier, `fromisoformat("2026-05-31T12:00:00Z")` raises `ValueError`.

This affects every `_slot_from_payload()` and `_event_to_payload()` round-trip. The Rust kernel's `entry_time`, `last_event_time`, and all `VenueEvent/KernelTransition` timestamps are serialized with `Z`. If the Python runtime is < 3.11, every deserialization of any timestamp from the Rust kernel will crash with `ValueError`.

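The codebase uses `str | None` annotations together with `from __future__ import annotations`. The future import has existed since 3.7 and defers annotation evaluation, so these do not by themselves require 3.10, but the project appears to target 3.10+ in practice. If the environment is exactly 3.10, every FFI call that returns a timestamp will fail.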
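
A minimal sketch of a version-independent parser, assuming the only incompatibility is the trailing `Z`. `parse_rfc3339_utc` is a hypothetical helper name; it is not in the codebase.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_rfc3339_utc(text: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as '2026-05-31T12:00:00Z' on older Pythons too."""
    if not text:
        return None
    if text.endswith(("Z", "z")):
        # fromisoformat() before 3.11 accepts a numeric offset, not 'Z'.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


parsed = parse_rfc3339_utc("2026-05-31T12:00:00Z")
```

Before 3.11, `fromisoformat()` also accepts only 3 or 6 fractional-second digits. Whether the kernel ever emits more is not established here; if it does, the fraction would need truncating as well.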

**Severity: High**

### Q2: No `#[serde(deny_unknown_fields)]` on any Rust struct — misspelled field names cause silent no-op

**File:** `_rust_kernel/src/lib.rs` — all structs

None of the Rust kernel's serializable structs (`KernelIntent`, `VenueEvent`, `TradeSlot`, `KernelOutcome`, `AccountState`) use `#[serde(deny_unknown_fields)]`. When the Python side sends JSON with a misspelled field (e.g., `"slotid"` instead of `"slot_id"`, `"tradeid"` instead of `"trade_id"`), serde silently ignores it. The struct is deserialized with the default value or `Option::None` for that field.

For required fields without `#[serde(default)]`, a missing field causes an error (serde returns `Err`). But for optional/defaulted fields, a typo produces a **silent no-op** — the field value from the Python side is silently dropped, and the default value is used instead. No error, no warning.

**Trigger example:** A correctly spelled field such as `"entry_price"` round-trips without issue. But if someone adds a new field to `KernelIntent` in Python before the Rust struct has it, the field is dropped entirely and the round-trip silently loses data.

**Severity: Medium**

### Q3: `indexmap` dependency added with `features = ["serde"]` — new transitive dependency chain

**File:** `_rust_kernel/Cargo.toml` (current vs backup)

```toml
indexmap = { version = "2", features = ["serde"] }
```

This is a new dependency in the current code (not in backup). `indexmap` provides `IndexMap<K, V, S>` and `IndexSet<T, S>`, hash containers that keep insertion order by storing entries in a `Vec` alongside a hash index. It adds ~10+ transitive crates (hashbrown, equivalent, etc.) to the build. Used for `AccountState::seen_account_event_ids: IndexSet<String>`, an account-level dedup set that supports ordered iteration and LRU eviction at 1024 entries.

Not a bug, but a significant increase in the dependency graph. The `IndexSet` with `serde` feature enables `seen_account_event_ids` to be serialized in `KernelFullSnapshot` for crash recovery. The LRU eviction at 1024 entries means account-level event dedup survives across save/restore cycles.

**Severity: Informational**

### Q4: Backup vs current — `on_venue_event` TERMINAL_STATE guard and `venue_order_id` propagation are the largest functional changes

**Files:** `_rust_kernel/src/lib.rs` (current), `_backup_20260530/rust_kernel_src/lib.rs` (backup)

Comparing the two Rust kernels, the current version adds:

1. **TERMINAL_STATE guard** (~28 lines): Prevents stale venue events from reactivating closed slots. Backup had no guard — a `FULL_FILL` arriving on a `CLOSED` slot would re-open the position.

2. **Venue order ID propagation** (~20 lines): Before entering the FSM match block, the current kernel enriches the working order with `venue_order_id` and `venue_client_id` from incoming events. Essential for LIMIT order cancel tracking. Backup had no such enrichment.

3. **CANCEL_ACK entry-order handling** (~20 lines): Backup only handled exit-order cancellation. Current correctly resets entry-order state to `IDLE` on `CANCEL_ACK`, clearing `trade_id`, `asset`, `side`, `size`, and PnL.

4. **`apply_fill` incremental accumulation** (~15 lines changed): Backup overwrote `filled_size` on every fill. Current accumulates `prev_filled + fill_size`. This is a **critical fix** — without it, multiple partial fills would report only the latest fill size.

5. **`with_handle_mut` `catch_unwind` guard** (~15 lines): Backup had no panic protection at the FFI boundary. Current wraps every FFI entry in `catch_unwind`. If Rust panics, the guard catches it and returns an error result instead of unwinding across the FFI boundary (which is UB).

6. **`process_intent` CAPITAL_FROZEN exit-early guard**: Added before the main FSM logic — if capital is frozen, all intents return `CAPITAL_FROZEN` diagnostic.

**These are bug fixes on top of the backup version — the backup represents a pre-fix state with ~6 serious bugs that have since been corrected.**
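
Item 4 is easiest to see with numbers. The sketch below transcribes the two behaviours as described into Python; `Order` and both function names are hypothetical, and the real logic lives in the Rust `apply_fill`.

```python
from dataclasses import dataclass


@dataclass
class Order:  # hypothetical minimal shape
    size: float
    filled_size: float = 0.0


def apply_fill_overwrite(order, fill_size):
    # Backup behaviour as described: each fill replaces the running total.
    order.filled_size = fill_size


def apply_fill_accumulate(order, fill_size):
    # Current behaviour as described: prev_filled + fill_size.
    order.filled_size = order.filled_size + fill_size


fills = [0.25, 0.25, 0.5]  # three partial fills of a 1.0-size order

old = Order(size=1.0)
new = Order(size=1.0)
for fill in fills:
    apply_fill_overwrite(old, fill)
    apply_fill_accumulate(new, fill)
```

After the three fills, the overwrite variant reports only the last fill (0.5) while the accumulating variant reports the full 1.0.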

**Severity: Informational**

### Q5: `MarketSnapshot.timestamp` type is inconsistent — `time.time()` float vs `datetime` in the same file

**File:** `gen_live_tests.py:82` vs `gen_live_tests.py:169`

```python
# _build_live_snapshot (line 82):
MarketSnapshot(timestamp=time.time(), ...)    # float

# _snap helper (line 169):
MarketSnapshot(timestamp=datetime.now(timezone.utc), ...)  # datetime
```

Both construct `MarketSnapshot` in the same file. One for the `_build_live_snapshot` path (used in `_run_pink_live_roundtrip`), one for the `_snap` helper path (used in `_run_pink_live_recovery` and `_run_scenario`). Any code that reads `snap.timestamp` must handle both `float` and `datetime` — or crashes with `AttributeError` trying to call `.isoformat()` on a float.

This is a type mismatch in the same test infrastructure. Depending on which test path executes, the operator sees different timestamp types and may not notice the inconsistency.
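
Until both constructors agree on one type, readers of `snap.timestamp` could normalize at the point of use. A minimal sketch, where `as_utc_datetime` is a hypothetical helper and treating naive datetimes as UTC is an assumption:

```python
from datetime import datetime, timezone
from typing import Union


def as_utc_datetime(ts: Union[float, int, datetime]) -> datetime:
    """Accept an epoch float or a datetime and return an aware UTC datetime."""
    if isinstance(ts, datetime):
        if ts.tzinfo is None:
            return ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return ts.astimezone(timezone.utc)
    return datetime.fromtimestamp(float(ts), tz=timezone.utc)


from_float = as_utc_datetime(0.0)
from_datetime = as_utc_datetime(datetime(2026, 5, 31, 12, 0, tzinfo=timezone.utc))
```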

**Severity: High**

### Q6: `datetime.fromisoformat()` cannot parse Rust `Z`-suffix timestamps on Python 3.10 — same root cause as Q1, applies to all serialized Rust timestamps

Same analysis as Q1. This is pervasive — every `VenueEvent.timestamp`, `KernelTransition.timestamp`, `TradeSlot.entry_time`, and `TradeSlot.last_event_time` deserialized from the Rust kernel will crash on Python < 3.11. The fix is either to upgrade to Python 3.11+, or to add a `str.replace("Z", "+00:00")` before calling `fromisoformat()`.

All 5 call sites in `rust_backend.py` and `real_zinc_plane.py` are affected.

**Severity: High**

### Q7: No upper-bound price validation — `reference_price = 1e300` passes all guards

**File:** `rust_backend.py:390`, `_rust_kernel/src/lib.rs` `mark_price`

The Python-side `_first_invalid_intent_field()` checks `math.isfinite(value)` for `reference_price`. A value of `1e300` passes (it's finite). The Rust-side `mark_price()` checks `!price.is_finite() || price <= 0.0` — `1e300` passes.

When this extreme price is used in `realized_pnl()`:
```rust
let notional = exit_size * slot.entry_price * slot.leverage.max(1.0);
delta * notional
```

With `entry_price = 1e300` and a modest `exit_size = 0.001`, `notional = 1e297` — which is within f64 range (max ~1.8e308). But `delta = (exit - 1e300) / 1e300 ≈ -1.0` (for exit=0), and PnL = `-1.0 * 1e297 = -1e297` — a completely nonsensical loss number that corrupts the account.

No upper bound exists on any price field in the system. There's no configurable `MAX_PRICE` or per-market sanity check.
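
The arithmetic above can be checked directly. The sketch transcribes the quoted formula into Python for a long position; `MAX_PRICE` and `price_in_bounds` are hypothetical, since the source defines no such ceiling.

```python
import math


def realized_pnl(entry_price, exit_price, exit_size, leverage=1.0):
    """Python transcription of the formula quoted above (long side assumed)."""
    delta = (exit_price - entry_price) / entry_price
    notional = exit_size * entry_price * max(leverage, 1.0)
    return delta * notional


price = 1e300
passes_python_guard = math.isfinite(price)
passes_rust_guard = not (not math.isfinite(price) or price <= 0.0)
pnl = realized_pnl(entry_price=price, exit_price=1.0, exit_size=0.001)

MAX_PRICE = 1e9  # hypothetical ceiling, for illustration only


def price_in_bounds(p):
    return math.isfinite(p) and 0.0 < p <= MAX_PRICE
```

Both existing guards accept `1e300`, and the resulting PnL is on the order of `-1e297`. A per-market bound would be more meaningful than one global constant.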

**Severity: Medium**

### Q8: `_first_invalid_intent_field()` does not reject `reference_price <= 0` or `target_size == 0`

**File:** `rust_backend.py:395-410`

```python
scalar_checks = (
    ("target_size", float(intent.target_size if intent.target_size is not None else 0.0)),
    ("reference_price", float(intent.reference_price if intent.reference_price is not None else 0.0)),
    ("leverage", float(intent.leverage if intent.leverage is not None else 0.0)),
    ("limit_price", float(getattr(intent, "limit_price", 0.0) or 0.0)),
)
for name, value in scalar_checks:
    if not math.isfinite(value):
        return (name, value)
# Then only checks target_size < 0:
size = float(intent.target_size if intent.target_size is not None else 0.0)
if size < 0.0:
    return ("target_size", size)
```

The guard catches NaN/Inf and negative `target_size`. It does NOT catch:
- `reference_price = 0` (a price should never be zero)
- `reference_price < 0` (negative price — should never happen)
- `target_size = 0` (zero-quantity order — waste of a `process_intent` call)
- `leverage = 0` (Rust silently falls back to 1.0)

A `reference_price = 0` passes through, and the Rust kernel's `mark_price` silently skips it (returns early on `price <= 0.0`). The intent is processed as if no price was provided.
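
A sketch of a stricter guard covering the four gaps listed above. It assumes zero is never a legitimate `target_size`, `reference_price`, or `leverage`, and that `limit_price == 0.0` means "not set"; both assumptions would need checking against intents such as cancels. The function name mirrors the original but is a separate illustration.

```python
import math
from types import SimpleNamespace


def first_invalid_intent_field(intent):
    """Return (name, value) for the first invalid scalar, or None if all pass."""
    checks = (
        # (field name, must be strictly positive?)
        ("target_size", True),
        ("reference_price", True),
        ("leverage", True),
        ("limit_price", False),  # assumption: 0.0 means "no limit price"
    )
    for name, must_be_positive in checks:
        raw = getattr(intent, name, None)
        value = float(raw) if raw is not None else 0.0
        if not math.isfinite(value) or value < 0.0:
            return (name, value)
        if must_be_positive and value == 0.0:
            return (name, value)
    return None


def make_intent(**overrides):
    # Demo helper: a well-formed intent with selected fields overridden.
    fields = dict(target_size=1.0, reference_price=100.0, leverage=2.0, limit_price=0.0)
    fields.update(overrides)
    return SimpleNamespace(**fields)
```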

**Severity: Low**

### Q9: Rust `Utc::now()` and Python `datetime.now(timezone.utc)` timestamps can diverge within the same process

**File:** `_rust_kernel/src/lib.rs` (transition timestamp), Python `bingx_venue.py` (event timestamp)

When the kernel processes an `on_venue_event`:
- The event carries `timestamp = Python's datetime.now(timezone.utc)` 
- The kernel's `transition()` method uses `event.timestamp` if present, else falls back to `Utc::now()` (Rust side)

Both clocks ultimately read the same system clock (Python's `datetime.now()` and `chrono`'s `Utc::now()` both go through the OS wall clock), so timestamps taken at nearly the same instant should agree to within microseconds. The problem is an architectural asymmetry: **some transitions get Python-sourced timestamps and others get Rust-sourced timestamps**.

A specific case: the TERMINAL_STATE guard (Q4 item 1) records its transition using `event.timestamp` (Python source). The CANCEL branch in `process_intent` is a counter-example of a different kind, since it creates no transition at all (flaw M2). Where transitions do exist, they mix clock sources.

**Severity: Low**

### Q10: `threading.Event.wait(timeout)` uses platform-dependent clock — CLOCK_REALTIME on some platforms, affected by NTP jumps

**File:** `bingx_venue.py:259`

```python
if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
    return self._last_snapshot  # stale data
```

`threading.Event.wait(timeout)` is implemented differently across platforms and Python versions. On some platforms (notably older glibc), `pthread_cond_timedwait` uses `CLOCK_REALTIME` (wall clock). If an NTP correction jumps the wall clock forward by even 1 second during the wait, the timeout expires 1 second early. If it jumps backward, the wait extends by 1 second.

The `_backend_snapshot` method is the single most important timeout in the system — it controls whether the venue adapter returns fresh or stale exchange state. A premature timeout (NTP forward jump) causes a stale snapshot to be used for order submission, potentially causing wrong position sizing or duplicate orders.

**Fix:** Use `time.monotonic()` with a deadline loop around `Event.wait()` — exactly what `InMemoryControlPlane.wait()` already does correctly (control.py:131-138).
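A deadline loop of that kind might look like the sketch below. The helper name and the `slice_s` parameter are hypothetical, and this is not the code in `control.py`. It protects against premature expiry only: if an individual `Event.wait()` call returns early, the loop re-checks the monotonic deadline and keeps waiting. On a platform whose underlying wait uses the wall clock, a backward jump could still lengthen a single slice.

```python
import threading
import time


def wait_with_monotonic_deadline(
    event: threading.Event, timeout_s: float, slice_s: float = 0.05
) -> bool:
    """Wait for `event`, bounding the total wait by time.monotonic()."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0.0:
            return event.is_set()
        # If a slice returns early (for example after a forward wall-clock
        # jump), the loop re-checks the monotonic deadline and waits again.
        if event.wait(timeout=min(remaining, slice_s)):
            return True
```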

**Severity: Medium**

### Q11: Backup `_on_venue_event` had no `STALE_STATE_RECONCILING` guard — current added it

**File:** `_backup_20260530/rust_kernel_src/lib.rs` vs current `_rust_kernel/src/lib.rs`

In the backup, the only `STALE_STATE_RECONCILING` check was in `process_intent` (reconcile branch); `on_venue_event` had none. The current version also checks it in `on_venue_event`: when the slot is in `STALE_STATE_RECONCILING`, only `RECONCILE` events are accepted, and every other event kind returns a `STALE_STATE_RECONCILE` diagnostic.

This is a safety improvement — prevents stray fills/acks from modifying a slot that's being reconciled.

**Severity: Informational**

### Q12: 5 of 5 timestamp deserialization sites use `datetime.fromisoformat()` — all fail on Python < 3.11 with Rust `Z` suffix

Covered in Q1 and Q6. Listing all sites: `rust_backend.py:215,241,260` and `real_zinc_plane.py:95,122`. Same root cause.
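One way to make all five sites tolerant of the `Z` suffix is a shared parsing helper. The sketch below uses a hypothetical function name and assumes that naive timestamps should be read as UTC. It only normalizes the suffix: before Python 3.11, `fromisoformat()` is also strict about the fractional-second format, so nanosecond-precision output would likely need truncation as well.

```python
from datetime import datetime, timezone


def parse_rfc3339_utc(text: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' before Python 3.11."""
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps coming from the kernel are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```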

**Severity: High**

---

## Pass 14 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| Q1 | `fromisoformat()` can't parse Rust `Z` suffix on Python < 3.11 — crashes every timestamp deserialization | Bridge | **High** |
| Q2 | No `#[serde(deny_unknown_fields)]` — misspelled fields silently default | Rust | Medium |
| Q3 | `indexmap` new dependency in current code (informational) | Rust | Info |
| Q4 | Backup diff: 6 critical bug fixes between backup and current (informational) | Rust | Info |
| Q5 | `MarketSnapshot.timestamp` type inconsistent — float vs datetime in same file | Data Feed | **High** |
| Q6 | `fromisoformat()` Z-suffix fail on all 5 timestamp deserialization sites | Bridge | **High** |
| Q7 | No upper-bound price validation — 1e300 passes all guards | Bridge | Medium |
| Q8 | `_first_invalid_intent_field` does not reject zero/negative price or zero size | Bridge | Low |
| Q9 | Rust/Python clock sources diverge — transition timestamps mixed source | Rust | Low |
| Q10 | `threading.Event.wait()` uses platform-dependent clock — NTP jump risk | Venue | Medium |
| Q11 | Backup had no `STALE_STATE_RECONCILING` guard in `on_venue_event` (info) | Rust | Info |
| Q12 | All 5 `fromisoformat()` sites fail on Python < 3.11 (duplicate of Q1) | Bridge | **High** |

### Pass 14 Severity

| Severity | Count |
|----------|-------|
| **High** | 4 (Q1, Q5, Q6, Q12) |
| Medium | 3 (Q2, Q7, Q10) |
| Low | 2 (Q8, Q9) |
| Info | 3 (Q3, Q4, Q11) |

### Combined Catalog (All 14 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| **Total** | | **275** | **21** | **77** | **84** | **61** | **32** |

---

## PASS 15 — RESOURCE LEAKS, TRUST BOUNDARIES, SECURITY

### R1: `BingxVenueAdapter` `ThreadPoolExecutor` never shut down — 3 threads leak for process lifetime

**File:** `bingx_venue.py:194-208`

```python
class BingxVenueAdapter(VenueAdapter):
    _EXECUTOR: concurrent.futures.ThreadPoolExecutor | None = None
    _EXECUTOR_LOCK: threading.Lock = threading.Lock()

    @classmethod
    def _get_executor(cls) -> concurrent.futures.ThreadPoolExecutor:
        if cls._EXECUTOR is None:
            with cls._EXECUTOR_LOCK:
                if cls._EXECUTOR is None:
                    cls._EXECUTOR = concurrent.futures.ThreadPoolExecutor(
                        max_workers=3, thread_name_prefix="bingx_adapter",
                    )
        return cls._EXECUTOR
```

The `ThreadPoolExecutor` is a class-level singleton with **no shutdown path**. No `close()` method, no `atexit` handler, no classmethod for cleanup. The 3 worker threads persist for the entire process lifetime.

`ThreadPoolExecutor` does not rely on `__del__` for cleanup. In CPython, idle workers are woken by a weak-reference callback when the executor object is garbage-collected, and since Python 3.9 the remaining worker threads are joined by an interpreter-exit hook registered in `concurrent.futures.thread`. Because `_EXECUTOR` is a class variable, the executor is never collected while the class is alive, so the only cleanup that ever runs is the implicit join at interpreter shutdown. That join delays process exit until outstanding work completes, and nothing lets a test or a restart path release the pool earlier.

**Trigger:** Every call to `BingxVenueAdapter.submit()` (line 371), `cancel()` (line 435), or `snapshot()` (line 509) submits to this executor. Once all three workers have been spawned, the `bingx_adapter`-prefixed threads stay alive for the rest of the process, each holding a reserved stack (commonly 8 MB of virtual address space on Linux, about 24 MB in total) with no clean reclamation path.
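An explicit release path could be a classmethod that swaps the singleton out under the lock and shuts it down. In the sketch below, `AdapterExecutorOwner` and `shutdown_executor` are hypothetical names standing in for the executor-owning part of the adapter; where the launcher would call it is an assumption.

```python
import concurrent.futures
import threading
from typing import Optional


class AdapterExecutorOwner:
    """Hypothetical stand-in for the executor-owning part of the adapter."""

    _EXECUTOR: Optional[concurrent.futures.ThreadPoolExecutor] = None
    _EXECUTOR_LOCK = threading.Lock()

    @classmethod
    def _get_executor(cls) -> concurrent.futures.ThreadPoolExecutor:
        with cls._EXECUTOR_LOCK:
            if cls._EXECUTOR is None:
                cls._EXECUTOR = concurrent.futures.ThreadPoolExecutor(
                    max_workers=3, thread_name_prefix="bingx_adapter"
                )
            return cls._EXECUTOR

    @classmethod
    def shutdown_executor(cls, wait: bool = True) -> None:
        """Release the shared pool; the next _get_executor() call recreates it."""
        with cls._EXECUTOR_LOCK:
            executor, cls._EXECUTOR = cls._EXECUTOR, None
        if executor is not None:
            # Shut down outside the lock so waiting on workers cannot block getters.
            executor.shutdown(wait=wait)
```

Taking the lock on every `_get_executor()` call trades a small cost for simplicity; it avoids a getter observing the pool while it is being torn down.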

**Severity: High**

### R2: `BingxVenueAdapter` has no `close()` method — backend `BingxDirectExecutionAdapter` HTTP client unreleasable

**File:** `bingx_venue.py` (no close method), `venue.py` (protocol), `launcher.py:262-264`

The `VenueAdapter` protocol (`venue.py`) does not define `close()` or `disconnect()`. `BingxVenueAdapter` holds a `_backend: BingxDirectExecutionAdapter` which in turn holds an HTTP client session. The launcher's `_maybe_close()` tries `.close()` then `.disconnect()` but gets `AttributeError` (caught by `except Exception: pass`).

This means:
- The `BingxDirectExecutionAdapter`'s `aiohttp.ClientSession` is never closed
- The underlying `TCPConnector` connection pool remains open
- Any HTTP keep-alive connections to BingX remain open until OS timeout
- No clean teardown path exists at any level of the venue stack

The `BingxUserStream` (WebSocket handler) has a `close()` method and proper task cancellation. The venue adapter (the synchronous REST path) has none. Asymmetric design.

**Severity: High**

### R3: `real_zinc_plane._intent_cache` grows unboundedly — memory proportional to total lifetime intents

**File:** `real_zinc_plane.py:157,202`

```python
# Line 157 (__init__):
self._intent_cache: List[Dict[str, Any]] = []

# Line 201-203 (publish_intent):
self._intent_cache.append(row)
self._write_region(self.intent_region, self._intent_seq, {"items": self._intent_cache[-512:]})
```

Every call to `publish_intent()` appends one dict to `_intent_cache`. Only the last 512 items are written to shared memory, but **the cache list itself is never trimmed**. Over a 24-hour session at 1 intent/second, this grows to 86,400 dicts — approximately 50-100 MB of memory for the cache alone (each dict contains timestamp, intent_id, asset, side, size, price, etc.).

The `-512` slice on write is a dead giveaway that the developer knew only the last 512 items were relevant — but forgot to trim the source list. The fix is `self._intent_cache = self._intent_cache[-512:]` after the append.

**Compare with:** `account.py`'s `seen_account_event_ids` (Rust side, capped at 1024 via `IndexSet` LRU), and `journal.py`'s `MemoryKernelJournal` (capped at 10,000). Every other cache in the system has a bound; this one doesn't.
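As an alternative to re-slicing the list on every append, the bound can live in the container itself. A minimal sketch, assuming a hypothetical `IntentCache` wrapper rather than the plane's actual attribute:

```python
from collections import deque
from typing import Any, Deque, Dict, List


class IntentCache:
    """Keeps only the most recent `limit` intent rows."""

    def __init__(self, limit: int = 512) -> None:
        self._rows: Deque[Dict[str, Any]] = deque(maxlen=limit)

    def append(self, row: Dict[str, Any]) -> None:
        # Once the deque is full, appending evicts the oldest row.
        self._rows.append(row)

    def items(self) -> List[Dict[str, Any]]:
        """Snapshot of the retained rows, oldest first."""
        return list(self._rows)
```

With this shape the write path can publish `cache.items()` directly, and memory stays proportional to the limit rather than to session length.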

**Severity: High**

### R4: `RealZincPlane`/`RealZincControlPlane` partial-construction leak — `SharedRegion` never cleaned up on init failure

**File:** `real_zinc_plane.py:161-176`, `real_control_plane.py:72-83`

```python
# real_zinc_plane.py __init__:
self.intent_region = SharedRegion.create(f"{prefix}_intent", 65536)
self.state_region = SharedRegion.create(f"{prefix}_state", 65536)    # if this fails...
self.control_region = SharedRegion.create(f"{prefix}_control", 4096)  # ...or this
```

If `SharedRegion.create()` succeeds for `intent_region` but fails for `state_region` (e.g., out of shared memory, permission denied, name collision), the constructor raises. The already-created `intent_region` has **no cleanup path** — `close()` is never called because the caller never gets a valid object reference. The shared memory segment leaks until the OS cleans it on process exit (or reboot on some systems).

Same pattern in `RealZincControlPlane.__init__` with 2 regions.
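The all-or-nothing construction can be sketched with `contextlib.ExitStack`. `FakeRegion` below is a hypothetical stand-in for `SharedRegion` used only to make the example runnable; a real shared memory segment may also need to be unlinked, not just closed.

```python
from contextlib import ExitStack
from typing import Dict, List


class FakeRegion:
    """Hypothetical stand-in for SharedRegion, used only to show the pattern."""

    closed: List[str] = []

    def __init__(self, name: str) -> None:
        self.name = name

    @classmethod
    def create(cls, name: str, size: int) -> "FakeRegion":
        if size <= 0:
            raise OSError(f"cannot create {name}")
        return cls(name)

    def close(self) -> None:
        FakeRegion.closed.append(self.name)


def create_regions(specs: Dict[str, int]) -> Dict[str, FakeRegion]:
    """Create every region or none: earlier regions are closed if a later one fails."""
    with ExitStack() as stack:
        regions: Dict[str, FakeRegion] = {}
        for name, size in specs.items():
            region = FakeRegion.create(name, size)
            stack.callback(region.close)  # runs only if we leave via an exception
            regions[name] = region
        stack.pop_all()  # success: drop the cleanups, the caller now owns the regions
        return regions
```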

**Severity: Medium**

### R5: `BingxUserStream` `ClientSession` has no `__del__` fallback — connection pool leak if `close()` not called

**File:** `bingx_user_stream.py:229-230,433-436`

```python
async def close(self) -> None:
    self._closed.set()
    if self._session is not None and not self._session.closed:
        await self._session.close()
```

The `aiohttp.ClientSession` (created in `_get_session()` with `TCPConnector(limit=4)`) is only closed when `close()` is explicitly called. There is no `__del__`, `__aenter__`, or `__aexit__`. If a caller abandons the `BingxUserStream` object without calling `close()` — or if `close()` is never called because an exception occurs before the call — the TCP connection pool (4 connections to BingX) leaks.

During the reconnect loop (`subscribe()`), if `self._session.closed` is detected between retries, a new session and connector are created — the old connector's connections are released by `ClientSession.close()` which is called in the retry path. So the reconnect path itself is clean. But the top-level cleanup depends entirely on external discipline.

**Severity: Medium**

### R6: `test_alpha_blue_untouched_g7.py` — two `open()` calls without context manager, file descriptors leak

**File:** `test_alpha_blue_untouched_g7.py:31,63`

```python
src = open("/mnt/dolphinng5_predict/prod/clean_arch/dita_v2/gen2.py").read()  # line 31
src = open(full).read()                                                        # line 63
```

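Both calls open a file and chain `.read()` without ever closing the handle explicitly. On CPython, reference counting usually closes the descriptor as soon as the temporary file object is dropped, so the practical symptom is a `ResourceWarning`. On interpreters without reference counting, or when the object is kept alive by a traceback, the descriptor stays open until garbage collection, and a large suite can then approach the per-process limit (commonly 1024 on Linux, 256 on macOS).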

**Severity: Low** (test code, non-production)

### R7: All exchange REST/WS data parsed without schema validation — exchange controls all field values

**Files:** `bingx_venue.py:60-74,80-88,96-121,151-186`, `bingx_user_stream.py:267-379`

The system has a single trust boundary for exchange data: all BingX REST API responses and WebSocket frames are parsed without schema validation. Key entry points:

- **`bingx_user_stream.py:267`**: `json.loads(text)` on raw WebSocket frame — any valid JSON structure is accepted. No schema validation before field access with `.get()`.
- **`bingx_venue.py:60-74`**: `_row_text()` extracts string values from exchange response dicts with no sanitization beyond `.strip()`.
- **`bingx_venue.py:80-88`**: `_row_float()` — catches `ValueError` on float parse but does not filter NaN/Inf (these pass `float()` fine).
- **`bingx_venue.py:297-301`**: `_rate_limit_retry_after_ms()` parses exchange error message with `re.search(r"unblocked after (\d+)", msg)` — exchange controls the error message content.
- **`bingx_venue.py:338-340`**: `cancel()` exception handler includes `str(exc)` directly in the response event dict.

An exchange sending crafted responses could inject:
- Arbitrary strings into `reason`/`msg` fields (propagated to journal/ClickHouse)
- Non-numeric values that fail `float()` only on consumption
- Enormous lists in snapshot responses (OOM risk — no size limit on `snapshot.open_orders` iteration)
- NaN/Inf in price/size fields (pass through `float()` — Rust kernel `is_finite()` check on kernel side catches some but not all)
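The NaN/Inf gap in particular can be closed at the parsing helper. The sketch below is not the project's `_row_float()`: the function name is hypothetical, and returning `None` for every rejected value is an assumed policy.

```python
import math
from typing import Any, Mapping, Optional


def row_finite_float(row: Mapping[str, Any], key: str) -> Optional[float]:
    """Return row[key] as a finite float, or None if missing, malformed, NaN, or Inf."""
    raw = row.get(key)
    if raw is None or isinstance(raw, bool):
        return None  # bools would otherwise convert to 0.0 / 1.0
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None
```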

**Severity: Critical** — exchange controls all inbound data with no schema validation, and data flows to ClickHouse journaling and the Rust kernel memory.

### R8: Shared memory JSON deserialization without integrity check

**Files:** `real_zinc_plane.py:127-128`, `real_control_plane.py:60-61`

```python
# real_zinc_plane.py
def _decode_packet(self, payload: bytes) -> dict:
    return json.loads(payload)

# real_control_plane.py
payload = region.read()
if payload:
    data = json.loads(payload)
```

Both the Zinc plane and control plane deserialize JSON from shared memory **without any integrity check** (no HMAC, no checksum, no signature). Any process that can open the shared memory segment (on Linux, an object under `/dev/shm`, whose access is governed by the mode it was created with) can:

1. Inject arbitrary `KernelIntent` objects — the control plane reads intents from `intent_region` and dispatches them to `process_intent()`. An attacker could submit fake intents with malicious parameters.
2. Inject fake events into the event stream via the control plane region.
3. Corrupt slot state via the state region.

The shared memory segments are named by the `DITA_V2_PREFIX` env var (default `dita_v2`). On a shared system, any process running as the same user can read/write these segments.
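An integrity check could be added by framing each payload with an HMAC tag. This is a minimal sketch with hypothetical function names and frame layout (32-byte tag followed by the JSON body). It assumes the key is provisioned where an attacker cannot read it, which a same-user process may defeat, and it does not address replay of old but validly signed packets.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def encode_packet(payload: Dict[str, Any], key: bytes) -> bytes:
    """Serialize payload and prefix it with an HMAC-SHA256 tag over the body."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def decode_packet(packet: bytes, key: bytes) -> Dict[str, Any]:
    """Verify the tag before parsing; raise ValueError on any mismatch."""
    if len(packet) < TAG_LEN:
        raise ValueError("packet shorter than HMAC tag")
    tag, body = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: packet rejected")
    return json.loads(body)
```

Verifying before `json.loads` means a tampered packet never reaches the parser or the kernel.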

**Severity: High**

### R9: `restore_state()` deserializes arbitrary JSON into full kernel state — no provenance tracking

**File:** `rust_backend.py:293-296` (Python), `_rust_kernel/src/lib.rs:934-968` (Rust)

```python
# Python:
def restore_state(self, json_str: str) -> bool:
    result = self.lib.dita_kernel_restore_state_json(self._backend, json_str.encode("utf-8"))
    return result == 0

# Rust:
pub extern "C" fn dita_kernel_restore_state_json(handle: *mut KernelHandle, payload: *const c_char) -> i32 {
    let payload = ...; // parse to KernelFullSnapshot
    let core = unsafe { &mut (*handle).core };
    core.restore_full_snapshot(&payload)  // overwrites ALL kernel state
}
```

`dita_kernel_restore_state_json` overwrites the entire kernel state — all slots, account balances, fee configuration, seen_event_ids, and capital_frozen flag — from a single JSON string. The method is public on `ExecutionKernel.restore_state()` with **no authentication, no authorization check, and no call stack validation**.

The JSON string can come from:
- The `DITAv2LauncherBundle` (restart path)
- A file read from disk
- An attacker who gains access to the Python runtime (e.g., via shared memory injection, R8)

Once restored, the kernel accepts the state as truth. There is no `restore_state` counter or version chain to prevent replay of old snapshots.

**Severity: Critical**

### R10: `DOLPHIN_BINGX_ENV` + `DOLPHIN_BINGX_ALLOW_MAINNET` — mainnet switch via env var

**File:** `launcher.py:189-190`

```python
DOLPHIN_BINGX_ENV = os.environ.get("DOLPHIN_BINGX_ENV", "VST")  # VST = testnet
DOLPHIN_BINGX_ALLOW_MAINNET = os.environ.get("DOLPHIN_BINGX_ALLOW_MAINNET", "false").lower() in ("true", "1", "yes")
```

Setting `DOLPHIN_BINGX_ENV=LIVE` + `DOLPHIN_BINGX_ALLOW_MAINNET=true` switches from testnet to production mainnet BingX. The `DOLPHIN_BINGX_ALLOW_MAINNET` check exists specifically as a safety gate, but both values are read from the same environment as every other config value, so whoever can set one can set the other. The second variable adds no independent confirmation.

An attacker with access to set env vars (container breakout, CI/CD injection, shared hosting) could:
1. Redirect all trades to mainnet
2. Use real capital instead of testnet funds
3. Cost real money on every trade

**Severity: High**

### R11: `.env` file loaded from project root — secrets exposure risk

**File:** `launcher.py:23,51`

```python
from dotenv import load_dotenv
...
load_dotenv(PROJECT_ROOT / ".env")
```

The `.env` file is loaded from `PROJECT_ROOT`, which is `Path(__file__).resolve().parents[3]`: the fourth ancestor of the launcher file, three levels above the directory that contains it. On a shared development machine or CI runner, this file is:

1. World-readable if not explicitly chmod'd (default umask creates files 644)
2. Accessible to any process running as the same user
3. Often committed to version control accidentally (no `.gitignore` guarantee)
4. Visible in Docker layer history if included in the build context

The `.env` file contains `BINGX_API_KEY` and `BINGX_SECRET_KEY` — the exchange credentials. On a shared system, every user with read access can extract these keys.

**Severity: High**

### R12: Unvalidated `int()` on env vars — `DOLPHIN_BINGX_RECV_WINDOW_MS` could accept extreme values

**File:** `launcher.py:191-193`

```python
recv_window_ms = int(os.environ.get("DOLPHIN_BINGX_RECV_WINDOW_MS", "5000"))
default_leverage = int(os.environ.get("DOLPHIN_BINGX_DEFAULT_LEVERAGE", "1"))
exchange_leverage_cap = int(os.environ.get("DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP", "3"))
```

Three env vars are directly passed to `int()` with only a string default — **no bounds checking**. An attacker setting `DOLPHIN_BINGX_RECV_WINDOW_MS=2147483647` could set the exchange recv window to ~24 days, allowing replay attacks on signed requests. An attacker setting `DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP=1000` could allow 1000x leverage on the exchange.
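Bounds checking can be centralized in one helper. In this sketch the function name `env_int` is hypothetical and the limits shown in the usage note are illustrative assumptions, not values taken from BingX documentation.

```python
import os
from typing import Mapping, Optional


def env_int(
    name: str,
    default: int,
    lo: int,
    hi: int,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Read an integer setting and fail loudly if it is malformed or out of range."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None:
        return default
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f"{name}={raw!r} is not an integer") from None
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} is outside the allowed range [{lo}, {hi}]")
    return value
```

A launcher might then call, for example, `env_int("DOLPHIN_BINGX_RECV_WINDOW_MS", 5000, 1, 60_000)` and refuse to start on failure rather than silently clamping.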

**Severity: Medium**

### R13: `BingxUserStream` `listenKey` from exchange response used in WebSocket URL — MITM injection surface

**File:** `bingx_user_stream.py:230,398`

```python
# Line 230:
url = f"{self._ws_url}?listenKey={listen_key}"

# Line 398:
listen_key = resp.get("listenKey", "")  # from exchange POST /openApi/user/auth/userDataStream
```

The `listenKey` comes from the BingX REST API response (`POST /openApi/user/auth/userDataStream`). It is used directly in the WebSocket connection URL **with no encoding or validation**. The listenKey is a short opaque string (looks like a UUID), but:

- If an attacker can MITM the REST response (DNS spoofing, proxy, etc.), they control the listenKey value
- A malicious listenKey with URL metacharacters (`&`, `=`, `#`) could inject query parameters into the WebSocket URL
- The `listenKey` is BingX's session authentication mechanism — once an attacker controls it, they can hijack the user data stream

The fix is `urllib.parse.urlencode({"listenKey": listen_key})` but the current code uses an f-string.
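A sketch of the encoded form, assuming a hypothetical `build_ws_url` helper and a base URL that carries no query string of its own:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_ws_url(ws_url: str, listen_key: str) -> str:
    """Attach listenKey as a single, percent-encoded query parameter."""
    if not listen_key:
        raise ValueError("empty listenKey")
    return f"{ws_url}?{urlencode({'listenKey': listen_key})}"
```

With encoding in place, metacharacters in the key stay inside the parameter value instead of starting a new parameter or a fragment.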

**Severity: High**

### R14: `mock_venue._exchange_event_queue` unbounded growth — event enqueue rate can exceed consumption rate

**File:** `mock_venue.py:220,230`

```python
# _queue_exchange_event:
self._exchange_event_queue.append(event)

# subscribe (generator):
if self._exchange_event_queue:
    yield self._exchange_event_queue.pop(0)
```

The mock venue's event queue is consumed one event at a time via a generator in `subscribe()`. If `queue_exchange_event()` is called faster than the consumer calls `next()` on the generator (which happens on every `_on_exchange_event()` callback), the list grows unboundedly. In test scenarios with rapid fire events, this can exhaust memory.

Not a production risk (mock is test-only), but the unbounded growth pattern is worth noting.

**Severity: Low** (test code)

---

## Pass 15 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| R1 | `ThreadPoolExecutor` never shut down — 3 threads leak | Venue | **High** |
| R2 | `BingxVenueAdapter` no `close()` — backend HTTP client unreleasable | Venue | **High** |
| R3 | `real_zinc_plane._intent_cache` grows unboundedly | Plane | **High** |
| R4 | `RealZincPlane`/`ControlPlane` partial-construction SharedRegion leak | Plane | Medium |
| R5 | `BingxUserStream.ClientSession` no `__del__` — connection pool leak | Venue | Medium |
| R6 | `test_alpha_blue_untouched_g7.py` open() without context manager | Test | Low |
| R7 | All exchange REST/WS data parsed without schema validation | Venue | **Critical** |
| R8 | Shared memory JSON deserialization without integrity check | Plane | **High** |
| R9 | `restore_state()` deserializes arbitrary JSON — full kernel takeover | Bridge | **Critical** |
| R10 | `DOLPHIN_BINGX_ENV` + `ALLOW_MAINNET` mainnet switch via env | Config | **High** |
| R11 | `.env` file loaded from project root — secrets exposure | Config | **High** |
| R12 | Unvalidated `int()` on env vars — recv_window, leverage extremes | Config | Medium |
| R13 | `listenKey` from exchange in WS URL f-string — MITM injection | Venue | **High** |
| R14 | `mock_venue._exchange_event_queue` unbounded growth | Test | Low |

### Pass 15 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 2 (R7, R9) |
| **High** | 7 (R1, R2, R3, R8, R10, R11, R13) |
| Medium | 3 (R4, R5, R12) |
| Low | 2 (R6, R14) |

### Combined Catalog (All 15 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 7 | 3 | 2 | 0 |
| **Total** | | **289** | **23** | **84** | **87** | **63** | **32** |

---

## PASS 16 — ERROR HANDLING, ARITHMETIC STABILITY, TEST INFRASTRUCTURE

### S1: `realized_pnl()` and `mark_price()` NaN bypasses `<= 0.0` guard — NaN PnL corrupts `k_realized_pnl`

**Files:** `_rust_kernel/src/lib.rs:1121-1129` (realized_pnl), `_rust_kernel/src/lib.rs:400-408` (mark_price), `contracts.py:193-200` (Python mark_price)

IEEE 754 semantics: every ordered comparison involving `NaN` evaluates to `false` (only `!=` is true). The guard `if slot.entry_price <= 0.0 || exit_size <= 0.0` is therefore **bypassed by NaN**: a NaN `entry_price` or `exit_size` fails both tests and falls through to the arithmetic.

**Rust `realized_pnl()` (line 1121-1129):**
```rust
if slot.entry_price <= 0.0 || exit_size <= 0.0 { return 0.0; }  // NaN passes
let mut delta = (exit_price - slot.entry_price) / slot.entry_price;  // NaN / NaN = NaN
delta * notional  // NaN → corrupts k_realized_pnl via +=
```

**Rust `mark_price()` (line 400-408):**
```rust
if self.entry_price <= 0.0 || self.size <= 0.0 { return; }  // NaN passes
self.unrealized_pnl = delta * self.size * self.entry_price * self.leverage;  // stores NaN
```

**Python `mark_price()` (contracts.py:193-200):** Same pattern — `if self.entry_price <= 0 or self.size <= 0` passes NaN, produces NaN PnL.

Once NaN enters `k_realized_pnl` or `unrealized_pnl`, every subsequent arithmetic operation propagates NaN: `k_capital`, `available_capital`, margin checks, reconcile deltas. The kernel enters a dead state where all financial computations produce NaN.

**Trigger paths for NaN entry_price:**
1. `set_slot_json()` bypasses process_intent — can set arbitrary slot fields
2. An `INVALID_ORDER_PARSE` event that produces `entry_price` = NaN in exchange data
3. A divide-by-zero in a prior computation (extremely unlikely but theoretically possible)

**Fix:** Guard each operand with `!x.is_finite() || x <= 0.0` (Rust) or `not math.isfinite(x) or x <= 0` (Python) at all three sites.
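The difference between the two guards can be shown on the Python side. The functions below are hypothetical free-standing versions of the `mark_price` arithmetic quoted above, with the sign handling for short positions omitted. Returning `0.0` on non-finite input simply mirrors the existing early return; raising might be the better policy.

```python
import math


def unrealized_pnl_unguarded(
    entry_price: float, size: float, mark: float, leverage: float = 1.0
) -> float:
    if entry_price <= 0 or size <= 0:  # NaN slips through: NaN <= 0 is False
        return 0.0
    delta = (mark - entry_price) / entry_price
    return delta * size * entry_price * leverage


def unrealized_pnl_guarded(
    entry_price: float, size: float, mark: float, leverage: float = 1.0
) -> float:
    values = (entry_price, size, mark, leverage)
    # Check finiteness first, then the sign, so NaN/Inf can never reach the math.
    if not all(math.isfinite(v) for v in values) or entry_price <= 0 or size <= 0:
        return 0.0
    delta = (mark - entry_price) / entry_price
    return delta * size * entry_price * leverage
```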

**Severity: Critical**

### S2: MockVenue `_exchange_event_queue` property has check-then-act race — silently drops events

**File:** `mock_venue.py:228-232`

```python
@property
def _exchange_event_queue(self) -> list:
    if not hasattr(self, "_exeq"):
        object.__setattr__(self, "_exeq", [])
    return self._exeq
```

`hasattr` + `object.__setattr__` is a classic TOCTOU race. If `queue_exchange_event()` (called from sync test code) and `subscribe()` (async generator started on event loop thread) interleave:
1. Thread A calls `hasattr` → `False`
2. Thread B calls `hasattr` → `False`
3. Thread A calls `object.__setattr__` → creates `_exeq = []`
4. Thread B calls `object.__setattr__` → **overwrites** `_exeq` with a second empty list, orphaning Thread A's list
5. Thread A appends its event to the orphaned list it already holds
6. Thread B returns the new list, in which Thread A's append is invisible

Even after initialization, the same list is subject to `list.append()` vs `list.pop(0)` without synchronization: a `pop(0)` that races with another consumer can hit an empty list and raise `IndexError`, which crashes the consumer.

**Fix:** Use `threading.Lock` around queue access, or use `collections.deque` with `self._exeq` initialized in `__post_init__`.

**Severity: Critical**

### S3: No FSM-specific test files — `test_kernel_fsm.py` and `test_kernel_fsm_recovery.py` do not exist

**Files:** missing — no `test_kernel_fsm.py` or `test_kernel_fsm_recovery.py` anywhere in workspace

The kernel's FSM is the core of the system, with states `IDLE → ORDER_REQUESTED → ORDER_SENT → ENTRY_WORKING → POSITION_OPEN → EXIT_REQUESTED → EXIT_WORKING → CLOSED` and additional states `STALE_STATE_RECONCILING`, `INTERVENTION_REQUIRED`, `TRADE_TERMINAL_WRITTEN`.

**Transition coverage:**

| Transition | Status |
|-----------|--------|
| `IDLE → ENTRY_WORKING` via ENTER intent | ✅ tested incidentally via test_flaws |
| `ENTRY_WORKING → POSITION_OPEN` via fill | ✅ tested incidentally |
| `POSITION_OPEN → EXIT_REQUESTED` via EXIT | ✅ partial |
| `POSITION_OPEN → IDLE` via exit cancel | ⚠️ single test only |
| `POSITION_OPEN → POSITION_OPEN` via partial exit | ❌ NOT tested |
| `STALE_STATE_RECONCILING → *` | ❌ NOT tested |
| `INTERVENTION_REQUIRED → *` | ❌ NOT tested |
| `TRADE_TERMINAL_WRITTEN → *` | ❌ NOT tested |
| All error transitions (RATE_LIMITED, INVALID_INTENT) | ❌ NOT tested |
| FSM timeout transitions | ❌ NOT tested (no timer exists) |
| Concurrent intent processing (two EXIT intents same slot) | ❌ NOT tested |

The only FSM testing is incidental through `test_flaws.py` — which tests specific flaw behaviors, not FSM correctness.

**Severity: Critical**

### S4: Generated tests use `await asyncio.sleep(0.8)` assuming fast mock venue — flaky false positives on slow CI

**Files:** `_gen_test.py` (all generated bodies), `gen2.py` (all generated bodies), `gen_live_tests.py` (all generated bodies)

Every generated test body follows this pattern:
```python
r = _si(k, E.ENTER, tid, sym, "LONG", p, 0.001); await asyncio.sleep(0.8)
r = _si(k, E.EXIT, tid, sym, "FLAT", p, 0.001); await asyncio.sleep(0.8)
r = _si(k, E.CANCEL, tid, sym, "FLAT", p, 0.001); await asyncio.sleep(0.8)
```

The `0.8` second sleep assumes the mock venue fills, cancels, and processes in under 0.8 s. On a loaded CI system (virtualization, resource contention), the mock venue may take longer. The test then:
- Submits `EXIT` against a slot still in `ENTRY_WORKING`, so the intent is rejected with `SLOT_BUSY`
- Checks `r.accepted` (or generated assertions) and gets `False`
- **Fails, but not because the system is buggy**: it fails because the sleep was too short

This is a **timing-dependent false positive** pattern (a spurious failure). The mock venue delivers events through `subscribe()`, which is driven by `_on_exchange_event_callback`, which in turn is triggered by `intent.apply()`. In tests, `DITAv2LauncherBundle._run()` calls `intent.apply()`, which calls `process_intent`, which calls `_on_exchange_event_callback`; if the venue has not yet yielded the fill event from `subscribe()`, the slot is not updated.

The fix is to await an event condition (e.g., `slot.fsm_state == POSITION_OPEN`) instead of using sleep.
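A condition-based wait can be as small as a polling helper with a monotonic deadline. `wait_until` below is a hypothetical name; how the generated tests would read slot state (for instance via `kernel.slot(0)`) is an assumption left to the caller's predicate.

```python
import asyncio
import time
from typing import Callable


async def wait_until(
    predicate: Callable[[], bool], timeout_s: float = 5.0, poll_s: float = 0.01
) -> None:
    """Poll `predicate` until it is true; raise TimeoutError after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        await asyncio.sleep(poll_s)  # yield so the venue/kernel tasks can progress
```

A generous timeout costs nothing on fast machines because the helper returns as soon as the condition holds, which is the property a fixed `sleep(0.8)` lacks.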

**Severity: Critical**

### S5: `bingx_venue._rate_limit_retry_after_ms()` returns 0 on any parse failure — instant retry with no backoff

**File:** `bingx_venue.py:169-184`

```python
@staticmethod
def _rate_limit_retry_after_ms(msg: str) -> int:
    try:
        # Checks multiple response fields for retry-after hint
        ...
        m = re.search(r"unblocked after (\d+)", msg)  # regex on exchange error message
        if m: return max(0, int(float(m.group(1))))  # integer parse
        return 0  # no retry-after found → default to 0
    except Exception:  # catches ANY failure in the try block
        return 0  # returns 0 = INSTANT RETRY
```

If the regex does not match, the function falls through to `return 0`; if `int()` fails or any other exception occurs, the `except` block also returns `0`. Either way the result means "retry immediately." This defeats the purpose of rate-limit detection. Every parse failure produces a retry storm rather than a safe default (e.g., `5000` ms).

**Specific failure paths:**
- Exchange returns a new/bilingual rate-limit message format → regex misses → returns 0
- `int(float(raw_retry))` on a non-numeric string → `ValueError` → caught → returns 0
- `float()` on a value with locale-specific decimal (e.g., European `,`) → `ValueError` → returns 0

**Fix:** Default to a safe backoff (e.g., `5000` ms) in the `except` block. Log the parse failure for debugging.
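A sketch of the safer default follows. The function is a hypothetical simplification that handles only the regex path from the quoted code, interprets the captured number the same way that code does, and assumes it is called only for responses already classified as rate-limited, so a missing hint also falls back to the default.

```python
import logging
import re

log = logging.getLogger("bingx_venue_sketch")
DEFAULT_RETRY_AFTER_MS = 5000  # assumption: conservative fallback, tune per venue
_UNBLOCK_RE = re.compile(r"unblocked after (\d+)")


def rate_limit_retry_after_ms(msg: str, default_ms: int = DEFAULT_RETRY_AFTER_MS) -> int:
    """Extract the retry hint from a rate-limit message, or fall back to default_ms."""
    try:
        match = _UNBLOCK_RE.search(msg)
        if match is None:
            return default_ms  # unknown message format: back off, do not retry at once
        return max(0, int(match.group(1)))
    except (TypeError, ValueError) as exc:
        log.warning("could not parse retry hint from %r: %s", msg, exc)
        return default_ms
```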

**Severity: High**

### S6: Venue adapter detects rate limits but enforces zero backoff — retry storm reaches exchange

**File:** `bingx_venue.py:384-386,471`

When `submit()` or `cancel()` receives a rate-limited response, the adapter:
1. Extracts `retry_after_ms` from the response ✅
2. Tags the event with `RATE_LIMITED` status and `retry_after_ms` ✅
3. Returns the event to the kernel, which marks it `retryable:true` ✅
4. **Does NOT enforce the backoff delay** ❌ — the caller must decide when to retry

If the caller (the algo or scheduler) ignores `retry_after_ms` and resubmits immediately, the adapter does not block or queue the request. The rate-limited request reaches the exchange again, potentially getting another 429, which wastes bandwidth and exchange quota.

No circuit breaker, no request queue, no automatic backoff at the venue adapter level. The adapter is purely passive — it reports rate limits but does not enforce them.

**Fix:** Add a `_last_rate_limit_time` and `_last_rate_limit_delay` on the adapter. If a request arrives before `last_rate_limit_time + retry_after_ms`, queue it or return `RATE_LIMITED` immediately without calling the exchange.
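The bookkeeping described in the fix can be isolated in a small gate object. `RateLimitGate` is a hypothetical name; the sketch only tracks the blocked window and leaves the choice between queueing and returning `RATE_LIMITED` to the adapter.

```python
import math
import threading
import time
from typing import Callable


class RateLimitGate:
    """Remembers the latest venue backoff and reports how long callers must wait."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._blocked_until = 0.0

    def note_rate_limited(self, retry_after_ms: int) -> None:
        """Record a rate-limit response; overlapping windows keep the later deadline."""
        until = self._clock() + max(0, retry_after_ms) / 1000.0
        with self._lock:
            self._blocked_until = max(self._blocked_until, until)

    def remaining_ms(self) -> int:
        """Milliseconds left in the backoff window, or 0 if requests may proceed."""
        with self._lock:
            remaining = self._blocked_until - self._clock()
        return max(0, math.ceil(remaining * 1000.0))
```

Before each exchange call the adapter could check `gate.remaining_ms()` and short-circuit when it is non-zero. The injectable clock keeps the gate testable without sleeping.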

**Severity: High**

### S7: `capital_epsilon = 1e-4` (0.0001 USDT) too tight for f64 precision — false WARN classifications

**File:** `account.py:224`

```python
capital_epsilon: float = 1e-4  # 0.0001 USDT — extremely tight
```

At 25k USDT capital, f64 carries ~15-16 significant decimal digits. The unit in the last place (ULP) at 25k is ~3.6e-12, so each addition contributes at most half an ULP (~1.8e-12 USDT) of rounding error. Accumulated over 100 PnL additions this is ~sqrt(N) × ULP/2 ≈ 2e-11 USDT in the typical case and N × ULP/2 ≈ 2e-10 USDT in the worst case, both far below 1e-4.

The problem is therefore not f64 rounding within a single path: the R1 and R2 reconcile deltas compare `k.capital` vs `e.wallet_balance`, which come from **different computation paths** (kernel fold vs exchange aggregation). Where the two paths round or quantize differently, the delta can exceed 1e-4 even on perfectly correct state, and the `abs(k.capital - e.wallet_balance) < 1e-4` test produces `WARN` on the third or fourth fill at typical sizes.

At $1M capital the ULP is ~1.2e-10, so 100 additions accumulate at most ~6e-9 USDT of f64 rounding, still orders of magnitude below 1e-4. The margin is consumed by differences between the two computation paths, not by floating-point precision itself.

**Fix:** Increase to at least `1e-3` (0.001 USDT) or make it configurable per-asset.
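
The f64 magnitudes can be sanity-checked with `math.ulp()` (Python 3.9+). The helper `rounding_budget` below is illustrative only; it assumes every intermediate value stays near the stated capital and bounds each operation by half a ULP:

```python
import math


def rounding_budget(capital: float, n_ops: int) -> float:
    """Worst-case accumulated f64 rounding error for n_ops additions near capital."""
    # Each correctly rounded addition is off by at most half a ULP of the result.
    return n_ops * math.ulp(capital) / 2.0


for capital in (25_000.0, 1_000_000.0):
    print(
        f"capital={capital:>11.1f}  ulp={math.ulp(capital):.3e}  "
        f"budget(100 ops)={rounding_budget(capital, 100):.3e}"
    )
```

If the printed budgets are far below `capital_epsilon`, any larger observed delta has to come from how the two sources round or aggregate, not from f64 itself.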

**Severity: High**

### S8: Generated tests use module-level `asyncio.run()` — leaks pending tasks on Python 3.12+

**Files:** `test_flaws.py` (all test functions), `test_exchange_event_seam_parity.py` (all test functions), all generated test files

Each test function calls `asyncio.run()` to execute async kernel operations within a sync test:
```python
def test_something(self):
    asyncio.run(self._run_test())  # creates event loop, runs, closes
```

`asyncio.run()` creates a fresh event loop on every call, cancels any tasks still pending when the coroutine returns, and then closes the loop. It raises `RuntimeError: asyncio.run() cannot be called from a running event loop` whenever it is invoked while a loop is already running in the same thread, for example if these tests are ever driven from an async fixture or an async-aware runner.

All test functions call `asyncio.run()` directly. If a test leaves a task pending (e.g., a timeout that's not properly awaited), that task is cancelled at loop shutdown rather than completed, and any object bound to the closed loop and shared across tests can fail in the next test with errors such as `RuntimeError: Event loop is closed`.

**Fix:** Use `pytest-asyncio` with `@pytest.mark.asyncio` and `async def test_method`, or add `try/finally` with task cancellation.
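
Where adding `pytest-asyncio` is not desirable, the standard library's `unittest.IsolatedAsyncioTestCase` is a comparable option; this is a swapped-in alternative, not what the generated tests currently use. A minimal sketch, in which `fake_kernel_op` is a hypothetical stand-in for an async kernel operation:

```python
import asyncio
import unittest


class KernelOpsTest(unittest.IsolatedAsyncioTestCase):
    async def test_background_task_is_awaited(self) -> None:
        async def fake_kernel_op() -> str:  # hypothetical stand-in
            await asyncio.sleep(0)
            return "ok"

        # The task is awaited inside the test, so nothing is left pending
        # when the per-test event loop is torn down.
        task = asyncio.create_task(fake_kernel_op())
        self.assertEqual(await task, "ok")


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KernelOpsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```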

**Severity: High**

### S9: `_build_pink_extended.py` and `_build_pink_bodies.py` use `str.replace()` patching — silently does nothing on format change

**Files:** `_build_pink_extended.py` (all), `_build_pink_bodies.py` (all)

Both scripts modify `test_pink_bingx_dita_live_e2e.py` in-place using `str.replace()` and `str.find()` index math:
```python
content = content.replace(old_imports, new_imports)
content = content.replace(old_build, new_build)
idx = content.find(old_body)
content = content[:idx] + new_body + content[idx+len(old_body):]
```

If the generated file's whitespace or ordering changes (e.g., Python version updates, import sorting), `str.replace()` silently does nothing — the old string is not found, so no replacement occurs. The file is written back **unchanged**. Since this is a build-time preprocess step, there's no test that validates the patched output.

The index-based insertion is even more fragile — if the generated file's structure shifts by even 1 character (blank line added, comment changed), the index math inserts new code at the wrong position, producing syntactically broken import/assert blocks.

**Fix:** Parse the generated file as AST and insert/modify nodes, or use a template engine with well-defined insertion points.
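
Short of a full AST rewrite, the silent no-op can at least be turned into a hard failure. A minimal sketch, assuming each patch anchor is expected to occur exactly once; the helper name `replace_exactly_once` and its `label` parameter are hypothetical:

```python
def replace_exactly_once(content: str, old: str, new: str, label: str) -> str:
    """Replace old with new, failing loudly unless old occurs exactly once."""
    count = content.count(old)
    if count != 1:
        raise ValueError(
            f"patch anchor {label!r} matched {count} times, expected exactly 1"
        )
    return content.replace(old, new, 1)
```

Used in place of the bare `content.replace(old_imports, new_imports)` calls, a format change in the generated file would then abort the build step instead of writing the file back unchanged.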

**Severity: High**

### S10: `bingx_user_stream._consume()` has no per-message timeout — silent WS hang blocks forever

**File:** `bingx_user_stream.py:251-270`

```python
async def _consume(self, ws: aiohttp.ClientWebSocketResponse) -> AsyncIterator[dict]:
    async for msg in ws:  # no timeout on individual message read
        if msg.type == aiohttp.WSMsgType.TEXT:
            yield json.loads(msg.data)
        elif msg.type == aiohttp.WSMsgType.CLOSED:
            break
```

The `async for msg in ws:` loop blocks until the next message arrives. If the WebSocket connection silently drops (no CLOSE frame, no TCP RST), the loop blocks until the TCP keepalive timeout — which can be **2 hours** on some Linux configurations.

No application-level heartbeat, no ping/pong timer, no `asyncio.wait_for()` wrapper. The BingX listenKey keepalive (every 30 min) is HTTP-based, not WS-based, so it doesn't detect a WS-level silence.

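**Fix:** Apply a timeout to each message read, e.g. `asyncio.wait_for(ws.receive(), timeout=60)` in an explicit loop (`asyncio.wait_for()` wraps an awaitable, not an `async for` statement), or implement WS ping/pong.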
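
The per-message timeout can be sketched independently of aiohttp by wrapping any async iterator. The helper name `iter_with_idle_timeout` is hypothetical, and the caller is assumed to treat the timeout as a signal to reconnect:

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def iter_with_idle_timeout(
    source: AsyncIterator[T], idle_timeout_s: float
) -> AsyncIterator[T]:
    """Yield items from source; raise asyncio.TimeoutError if it goes silent."""
    iterator = source.__aiter__()
    while True:
        try:
            # The timeout applies to each individual read, not the whole stream.
            item = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout_s)
        except StopAsyncIteration:
            return
        yield item
```

With this shape, `_consume()` would iterate over `iter_with_idle_timeout(ws, 60)` and let the surrounding reconnect logic handle `asyncio.TimeoutError`.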

**Severity: High**

### S11: `bingx_venue._run()` blocks `ThreadPoolExecutor` thread with no timeout — backend HTTP hang freezes the adapter

**File:** `bingx_venue.py:202-209`

```python
def _run(self, result: Awaitable) -> Any:
    loop = ...
    if loop is None:  # no running loop
        return asyncio.run(result)  # blocks until HTTP completes or TCP timeout
    else:
        pool = self._get_executor()
        fut = pool.submit(asyncio.run, result)
        return fut.result()  # BLOCKS FOREVER — no timeout argument
```

`fut.result()` with no timeout argument blocks until the future completes. If the HTTP call hangs (BingX server never responds, TCP half-open), the `ThreadPoolExecutor` thread blocks indefinitely. Since the pool has only 3 threads, 3 hung HTTP calls consume all worker threads, and all subsequent adapter operations (submit, cancel, snapshot) **hang forever** because no threads are available.

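This is a denial-of-service condition on the adapter: if the BingX API becomes unresponsive, three hung calls are enough to lock it up completely.

**Fix:** Use `fut.result(timeout=30)` and handle `TimeoutError` with a fallback event. Because a timed-out `fut.result()` does not release the worker thread, also bound the HTTP call itself (for example with `asyncio.wait_for()` inside the submitted coroutine).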
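
A minimal sketch of the two-level deadline, assuming the work is a coroutine that responds to cancellation. The function name `run_with_deadline` and the one-second grace period are hypothetical:

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine


def run_with_deadline(
    pool: concurrent.futures.Executor,
    coro: Coroutine[Any, Any, Any],
    timeout_s: float,
) -> Any:
    async def _bounded() -> Any:
        # Inner deadline: cancels the coroutine so the worker thread is released.
        return await asyncio.wait_for(coro, timeout=timeout_s)

    fut = pool.submit(asyncio.run, _bounded())
    # Outer deadline with a grace period, so the inner timeout normally fires first.
    return fut.result(timeout=timeout_s + 1.0)
```

Callers should catch both `asyncio.TimeoutError` and `concurrent.futures.TimeoutError`; these are distinct classes before Python 3.11.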

**Severity: High**

### S12: `bingx_venue._rate_limit_retry_after_ms()` regex depends on exchange error message format — non-portable to other exchanges

**File:** `bingx_venue.py:176`

```python
m = re.search(r"unblocked after (\d+)", msg)
```

The regex looks for the English phrase `"unblocked after <number>"` in the exchange error message. If BingX changes their message format, localizes messages (Chinese exchange — could return Chinese text), or updates the wording, the regex silently returns 0 (caught by `except Exception`).

Additionally, the phrase `"unblocked"` is specific to BingX's rate-limit error wording. If the adapter is later extended to support other exchanges, this regex needs to be parameterized.

**Fix:** Prefer the numeric retry-after header field (response header `Retry-After`) rather than parsing the error message body.

**Severity: Medium**

### S13: `bingx_venue._row_float()` silently skips malformed rows — missing fields produce silent continue

**File:** `bingx_venue.py:51-56`

```python
@staticmethod
def _row_float(row: dict, keys: tuple[str, ...]) -> float:
    for k in keys:
        v = row.get(k)
        if v is not None and v != 0.0:  # also filters out 0.0 values!
            try:
                return float(v)
            except Exception:
                continue  # silently skip; next key tried
    return 0.0
```

Two issues:
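1. `v != 0.0` filters out legitimate zero values. If an exchange response has a numeric `"origQty": 0` for a cancelled order, `_row_float` skips it and tries the next key, which may be a different field with a non-matching value. A string `"0"` is not filtered (in Python `"0" != 0.0` is true), so the behavior also depends on whether the exchange encodes the number as a JSON string or a JSON number.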
2. `except Exception: continue` silently skips `ValueError`, `TypeError`, and any other parsing error. No log, no diagnostic. A corrupted exchange response produces `0.0` with no trace.
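
One possible shape for a stricter parser is sketched below. It is an assumption-laden illustration rather than the adapter's code: the function name `row_float`, the logger name, and the choice to return `None` (instead of `0.0`) when nothing is usable are all hypothetical:

```python
import logging
import math
from typing import Any, Mapping, Optional, Sequence

log = logging.getLogger("bingx_venue")  # hypothetical logger name


def row_float(row: Mapping[str, Any], keys: Sequence[str]) -> Optional[float]:
    """Return the first parseable finite value among keys, or None."""
    for key in keys:
        if key not in row or row[key] is None:
            continue
        try:
            value = float(row[key])
        except (TypeError, ValueError):
            log.warning("unparseable %s=%r in venue row", key, row[key])
            continue
        if math.isfinite(value):
            return value  # zero is a legitimate value and is returned as-is
        log.warning("non-finite %s=%r in venue row", key, row[key])
    return None
```

Returning `None` lets the caller distinguish "field absent or corrupt" from a genuine zero quantity.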

**Severity: Medium**

### S14: `bingx_user_stream` reconnection backoff lacks jitter — thundering herd when multiple clients reconnect simultaneously

**File:** `bingx_user_stream.py:133-138`

```python
delay_ms = min(self._reconnect_delay_ms * 2, self._reconnect_max_ms)
```

Pure exponential backoff with **no jitter**. If multiple `BingxUserStream` instances (for different symbols or accounts) disconnect simultaneously (e.g., BingX WS maintenance), their reconnection attempts synchronize. Each retries at exactly the same intervals, creating a thundering herd against the BingX WebSocket endpoint.

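**Fix:** Add random jitter: `delay_ms = min(base * 2, max_ms) * (0.5 + random.random())`. Note that this form can overshoot `max_ms` by up to 50%; clamp the result if the cap must be strict.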
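
A variant that keeps the cap strict is sketched below: the delay is drawn uniformly from the upper half of the capped window. The function name `jittered_delay_ms` and the injectable `rng` are hypothetical:

```python
import random
from typing import Optional


def jittered_delay_ms(
    base_ms: int, max_ms: int, rng: Optional[random.Random] = None
) -> int:
    """Doubled backoff with jitter, never exceeding max_ms."""
    rng = rng or random.Random()
    ceiling = min(base_ms * 2, max_ms)
    # Spread reconnects across [ceiling / 2, ceiling] so clients desynchronize.
    return int(rng.uniform(ceiling / 2, ceiling))
```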

**Severity: Medium**

### S15: `_venue_event_status_from_row()` falls back to ACKED for unrecognized statuses — masks new rejection types

**File:** `bingx_venue.py:85-101`

```python
@staticmethod
def _venue_event_status_from_row(row: dict) -> VenueEventStatus:
    status = (row.get("status") or "").strip().upper()
    if status == "NEW": return VenueEventStatus.ACKED
    elif status == "CANCELED": return VenueEventStatus.CANCELED
    elif status == "FILLED": return VenueEventStatus.FILLED
    elif status == "PARTIALLY_FILLED": return VenueEventStatus.PARTIALLY_FILLED
    elif status == "REJECTED": return VenueEventStatus.REJECTED
    elif status in ("EXPIRED", "EXPIRED") : return VenueEventStatus.EXPIRED
    else:
        return VenueEventStatus.ACKED  # fallback — unknown → ACKED (dangerous!)
```

If BingX introduces a new status (e.g., `"DEACTIVATED"`, `"PENDING_CANCEL"`, `"SUSPENDED"`), it maps to `ACKED` — which the kernel interprets as "order acknowledged by exchange and working." This could cause the kernel to believe an order is active when it's actually suspended, leading to:
- No cancel sent (kernel thinks order is working and waiting for fill)
- Premature exit intent submission (order not actually active)
- Incorrect slot FSM state

The fallback should be `REJECTED` (conservative — assume the worst) or should log a warning and escalate.

**Fix:** Change fallback to `VenueEventStatus.REJECTED` or log an error for unknown statuses.
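
A minimal sketch of an explicit mapping that logs and fails closed on unknown statuses. The `VenueEventStatus` enum here is a local stand-in mirroring the members named above, and `status_from_row` is a hypothetical name:

```python
import logging
from enum import Enum
from typing import Any, Mapping

log = logging.getLogger("bingx_venue")  # hypothetical logger name


class VenueEventStatus(Enum):  # local stand-in for the real enum
    ACKED = "ACKED"
    CANCELED = "CANCELED"
    FILLED = "FILLED"
    PARTIALLY_FILLED = "PARTIALLY_FILLED"
    REJECTED = "REJECTED"
    EXPIRED = "EXPIRED"


_STATUS_MAP = {
    "NEW": VenueEventStatus.ACKED,
    "CANCELED": VenueEventStatus.CANCELED,
    "FILLED": VenueEventStatus.FILLED,
    "PARTIALLY_FILLED": VenueEventStatus.PARTIALLY_FILLED,
    "REJECTED": VenueEventStatus.REJECTED,
    "EXPIRED": VenueEventStatus.EXPIRED,
}


def status_from_row(row: Mapping[str, Any]) -> VenueEventStatus:
    raw = str(row.get("status") or "").strip().upper()
    status = _STATUS_MAP.get(raw)
    if status is None:
        # Unknown status: record it and assume the order is not working.
        log.error("unrecognized venue status %r; treating as REJECTED", raw)
        return VenueEventStatus.REJECTED
    return status
```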

**Severity: Medium**

### S16: `gen2.py` generates `except: pass` in test code — swallows KeyboardInterrupt and SystemExit

**File:** `gen2.py:335` (embedded in generated test template)

```python
try:
    bundle.close()
except:
    pass  # bare except — catches KeyboardInterrupt, SystemExit
```

The generated test files contain bare `except: pass` blocks in cleanup code. A bare `except` also catches `KeyboardInterrupt` and `SystemExit`, so a Ctrl+C that arrives while `bundle.close()` is running is swallowed and the suite keeps running instead of stopping. In the worst case the process has to be terminated externally.

Same pattern in `_build_pink_extended.py` templates and other generated test builders.

**Fix:** Use `except Exception:` in generated code.

**Severity: Medium**

---

## Pass 16 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| S1 | `realized_pnl()`/`mark_price()` NaN bypasses `<=0.0` guard — NaN PnL corrupts k_realized_pnl | Rust/Python | **Critical** |
| S2 | MockVenue `_exchange_event_queue` check-then-act race — silently drops events | Test | **Critical** |
| S3 | No `test_kernel_fsm.py` or `test_kernel_fsm_recovery.py` exists | Test | **Critical** |
| S4 | Generated tests use `await asyncio.sleep(0.8)` — flaky false negatives on slow CI | Test | **Critical** |
| S5 | `_rate_limit_retry_after_ms()` returns 0 on parse failure — instant retry storm | Venue | **High** |
| S6 | Venue adapter detects rate limits but enforces zero backoff | Venue | **High** |
| S7 | `capital_epsilon = 1e-4` too tight — false WARN classifications | Accounting | **High** |
| S8 | Generated tests use module-level `asyncio.run()` — leaks tasks on Python 3.12+ | Test | **High** |
| S9 | `str.replace()` patching silently does nothing on format change | Build | **High** |
| S10 | `_consume()` no per-message WS timeout — silent hang blocks forever | Venue | **High** |
| S11 | `_run()` blocks pool thread with no timeout — 3 hung calls lock adapter | Venue | **High** |
| S12 | Rate-limit regex depends on exchange message format — non-portable | Venue | Medium |
| S13 | `_row_float()` silently skips malformed rows, filters zero values | Venue | Medium |
| S14 | Reconnection backoff lacks jitter — thundering herd risk | Venue | Medium |
| S15 | `_venue_event_status_from_row()` falls back to ACKED — masks new rejections | Venue | Medium |
| S16 | `except: pass` in generated test code — swallows KeyboardInterrupt | Test | Medium |

### Pass 16 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 4 (S1, S2, S3, S4) |
| **High** | 7 (S5, S6, S7, S8, S9, S10, S11) |
| Medium | 5 (S12, S13, S14, S15, S16) |

### Combined Catalog (All 16 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| **Total** | | **305** | **27** | **90** | **92** | **63** | **33** |

---

## PASS 17 — UNSAFE REVIEW, DEAD CODE/BACKUP DEBRIS, BUILD/PLANE PROTOCOLS

### T1: `catch_unwind` + `AssertUnwindSafe` on `&mut KernelCore` — partially mutated heap state persists after caught panic, no rollback

**File:** `_rust_kernel/src/lib.rs:2057-2071`

```rust
fn with_handle_mut<F, R>(handle: *mut KernelHandle, f: F) -> Result<R, String>
where F: FnOnce(&mut KernelCore) -> Result<R, String>,
{
    let core = unsafe { &mut (*handle).core };
    match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| f(core))) {
        Ok(result) => result,
        Err(panic_payload) => {
            let msg = ...;
            eprintln!("[KERNEL PANIC caught at FFI boundary] {msg}");
            Err(msg)  // Partially mutated KernelCore still live in heap Box<KernelHandle>
        }
    }
}
```

`catch_unwind` prevents Rust panics from unwinding across the FFI boundary (which would be UB). But the `KernelCore` behind the raw pointer is **mutated in-place** on the heap. When a panic occurs mid-mutation:

1. `f(core)` calls some kernel function like `process_intent()` or `apply_fill()`
2. The function panics partway through — e.g., `k_realized_pnl` was incremented but `event_seq` was not bumped; `slots[i]` was replaced but `rebuild_indexes()` was not called
3. `catch_unwind` catches the panic, returns `Err(msg)` to the Python caller
4. The `KernelCore` on the heap retains the **partially applied state**
5. The next FFI call operates on this corrupted state — `k_capital = seed + realized_pnl - fees_paid` is computed with mismatched values
6. The code comment acknowledges this: "the slot/account mutation that panicked may be partially applied"
7. The mitigations (reconcile WARN/ERROR → capital frozen) only work if the corruption is detectable — if the panic corrupts `slot.seen_event_ids` such that dedup fails, duplicate fills can process

`AssertUnwindSafe` on `&mut KernelCore` is sound for memory safety (after panic, the reference is still valid, just the data is inconsistent — no use-after-free, no double-free). But it is **logically unsound** — data invariants are violated, and the recovery path relies on a downstream reconcile to detect the issue, which may not catch all corruption patterns.

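**Trigger paths:** Any panic inside `process_intent()`, `on_venue_event()`, `reconcile_slots()`, `apply_fill()`, or `save_full_snapshot()` while mutating `KernelCore`. A panic inside `HashMap::insert()` is extremely rare (e.g., capacity overflow or a panicking `Hash`/`Eq` implementation; allocation failure aborts the process by default and is never seen by `catch_unwind`), but it would leave the map in an unspecified, though still memory-safe, state.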

**Severity: High**

### T2: Empty backup directory `_backup_20260530_105512/` and stale `tea_debug.log` (0 bytes)

**Files:** `_backup_20260530_105512/` (empty directory), `tea_debug.log` (0 bytes)

`_backup_20260530_105512/` is a **completely empty directory** — zero files. Its sibling `_backup_20260530/` contains 22 source files and a `rust_kernel_src/` subdirectory. The `_105512` variant was created during an earlier backup attempt but never populated.

`tea_debug.log` is a **0-byte empty file** in the workspace root. No code writes to it. It's a stale artifact — likely a log file that was opened but never written to, or a debugging aid that was never used.

Both should be deleted to avoid confusion.

**Severity: Low**

### T3: `HazelcastRowWriter.__call__` uses bare `json.dumps(row, default=str)` — Enums and datetimes serialize as Python `str()` representations

**File:** `hazelcast_projection.py:60-63`

```python
def __call__(self, name: str, row: dict[str, Any]) -> None:
    if name.endswith("trade_events"):
        self.client.get_topic(name).publish(
            json.dumps(row, ensure_ascii=False, sort_keys=True, default=str)
        )
```

The `default=str` fallback serializes plain `Enum` values as `"TradeSide.SHORT"` (Python's `str()` format) instead of `"SHORT"` (the `.value`). Timezone-aware datetimes become `"2026-01-01 00:00:00+00:00"` (Python `str()` format, space-separated) instead of `"2026-01-01T00:00:00+00:00"` (ISO 8601). Downstream Hazelcast consumers expecting standard formats get unexpected strings.

Compare with `HazelcastProjector.publish_event()` (line 38) which correctly uses `json_safe(payload)` before `json.dumps()`:
```python
self.writer(self.trade_events_topic, json_safe(row))  # uses json_safe() first
```

The inconsistency: `HazelcastProjector` correctly serializes via `json_safe()`, but `HazelcastRowWriter.__call__` (used directly elsewhere) does not. Any code path that calls `HazelcastRowWriter` directly — rather than through `HazelcastProjector` — produces malformed output.
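
The difference is easy to reproduce in isolation. In the sketch below, `TradeSide` is a local stand-in for the real enum and `wire_default` is a hypothetical replacement for `default=str`; the actual `json_safe()` helper may behave differently:

```python
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class TradeSide(Enum):  # local stand-in for the real enum
    LONG = "LONG"
    SHORT = "SHORT"


def wire_default(obj: Any) -> Any:
    """json.dumps fallback that emits enum values and ISO 8601 timestamps."""
    if isinstance(obj, Enum):
        return obj.value
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


row = {"side": TradeSide.SHORT, "ts": datetime(2026, 1, 1, tzinfo=timezone.utc)}
loose = json.dumps(row, sort_keys=True, default=str)
strict = json.dumps(row, sort_keys=True, default=wire_default)
```

Raising `TypeError` for unknown types, rather than falling back to `str()`, surfaces new unserializable fields at publish time instead of shipping their Python representation downstream.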

**Severity: High**

### T4: `real_zinc_plane._slot_from_payload()` uses `payload["last_event_time"]` direct key access — crashes with `KeyError` if key missing

**File:** `real_zinc_plane.py:116,133`

```python
entry_time=datetime.fromisoformat(payload["entry_time"]) if payload.get("entry_time") else None,
# ... yet at line 133:
last_event_time=datetime.fromisoformat(payload["last_event_time"])  # NO .get() guard!
```

Line 116 uses `payload.get("entry_time")` — correct. Line 133 uses `payload["last_event_time"]` — **missing `.get()`**, crashes with `KeyError` if the key is absent.

Compare with `rust_backend.py:396-402` (the equivalent function):
```python
entry_time=datetime.fromisoformat(payload["entry_time"]) if payload.get("entry_time") else None,
last_event_time=datetime.fromisoformat(payload["last_event_time"]) if payload.get("last_event_time") else None,
```

Both fields use `.get()` in `rust_backend.py`. The `real_zinc_plane.py` version has a **copy-paste error** where the guard on `last_event_time` was omitted. If any slot is deserialized via the shared memory path (RealZincPlane) and lacks a `last_event_time` (e.g., a fresh slot that hasn't received a venue event yet), this crashes.
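
Routing every optional timestamp through one helper removes the chance of omitting the guard on a single field. A minimal sketch; the helper name `optional_datetime` is hypothetical:

```python
from datetime import datetime
from typing import Any, Mapping, Optional


def optional_datetime(payload: Mapping[str, Any], key: str) -> Optional[datetime]:
    """Parse an ISO 8601 string at key, or return None if absent or empty."""
    raw = payload.get(key)
    return datetime.fromisoformat(raw) if raw else None
```

Both `entry_time` and `last_event_time` would then be built as `optional_datetime(payload, "...")` in either deserializer.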

**Severity: High**

### T5: `_build_pink_bodies.py` uses `str.index("]")` to find SCENARIOS list close bracket — corrupts list if any entry contains `]`

**File:** `_build_pink_bodies.py:214`

```python
close_bracket = with_bodies.index("]", scenarios_open)
final = with_bodies[:close_bracket] + "\n" + param_block + "\n" + with_bodies[close_bracket:]
```

`str.index("]")` finds the **first** `]` character after `scenarios_open`. If any SCENARIOS entry contains a `]`, whether inside a string literal (e.g., a message that mentions `slots[0]`, or a format string) or as part of a nested list, the split lands inside the entry, truncating it and injecting the new `param_block` mid-entry.

The corruption is caught only if the truncation produces unparseable code. If the result happens to be valid (but semantically wrong) Python, the build succeeds with silently corrupted test data.

**Fix:** Use `ast` module to parse the list, or count bracket depth.
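
A sketch of the `ast`-based lookup, assuming the list is a plain module-level assignment (`SCENARIOS = [...]`, not an annotated assignment) and the file uses `\n` line endings. The function name `list_close_index` is hypothetical:

```python
import ast


def list_close_index(source: str, name: str = "SCENARIOS") -> int:
    """Return the index in source of the closing bracket of the list bound to name."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        if not any(isinstance(t, ast.Name) and t.id == name for t in node.targets):
            continue
        value = node.value
        if not isinstance(value, ast.List):
            raise ValueError(f"{name} is not a list literal")
        lines = source.split("\n")
        line = lines[value.end_lineno - 1]
        # end_col_offset is a UTF-8 byte offset just past the closing bracket.
        col_chars = len(line.encode("utf-8")[: value.end_col_offset].decode("utf-8"))
        line_start = sum(len(l) + 1 for l in lines[: value.end_lineno - 1])
        return line_start + col_chars - 1
    raise ValueError(f"no assignment to {name} found")
```

The returned index can replace `close_bracket` directly, and a missing or non-list `SCENARIOS` fails the build instead of patching at an arbitrary position.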

**Severity: High**

### T6: `VenueAdapter` protocol missing `connect()`/`disconnect()` — `AttributeError` at runtime

**File:** `venue.py` (protocol), `_build_pink_extended.py:31-32` (caller)

```python
# _build_pink_extended.py — Shim class:
async def connect(self, initial_capital=0):
    self.kernel.venue.connect()  # assumes VenueAdapter has connect()

async def disconnect(self):
    try:
        self.kernel.venue.disconnect()  # assumes VenueAdapter has disconnect()
    except:
        pass
```

`VenueAdapter` (defined in `venue.py` as a `Protocol`) defines `submit()`, `cancel()`, `snapshot()`, `subscribe()`, `open_positions()`, and `reconcile()` — but **not** `connect()` or `disconnect()`.

`MockVenueAdapter` has both methods (mock_venue.py:160-166). `BingxVenueAdapter` does **not** have them — calling `connect()` on a `BingxVenueAdapter` raises `AttributeError`.

The `Shim` class in `_build_pink_extended.py` is used for live-test infrastructure. If a live test runs with a venue that lacks `connect()`/`disconnect()`, the error is swallowed by the bare `except: pass` in `disconnect()`, but `connect()` propagates uncaught.

**Fix:** Add `connect()`/`disconnect()` to the `VenueAdapter` protocol, or add them as no-ops on `BingxVenueAdapter`.
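
Until the protocol is extended, the shim could fail fast with a clear message instead of an `AttributeError` deep in a test run. A minimal sketch using `typing.runtime_checkable`; the protocol name `ConnectableVenue` and the helper `require_lifecycle` are hypothetical:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ConnectableVenue(Protocol):
    """Lifecycle methods the shim relies on."""

    def connect(self) -> None: ...

    def disconnect(self) -> None: ...


def require_lifecycle(venue: object) -> None:
    # isinstance() on a runtime-checkable Protocol only checks that the
    # methods exist, not their signatures.
    if not isinstance(venue, ConnectableVenue):
        raise TypeError(
            f"{type(venue).__name__} lacks connect()/disconnect() required by the shim"
        )
```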

**Severity: High**

### T7: `real_control_plane.py` and `real_zinc_plane.py` shared memory writes are non-atomic — reader sees partial state

**Files:** `real_control_plane.py:110-114`, `real_zinc_plane.py:252-253`

```python
# real_control_plane.py _write_region:
view[:len(packet)] = packet                      # writes new packet
if len(view) > len(packet):
    view[len(packet):] = b"\x00" * (len(view) - len(packet))  # zeroes tail

# real_zinc_plane.py _write_region:
view[:] = b"\x00" * len(view)                    # full zero (visible-zero window)
view[:len(packet)] = packet                       # writes packet
```

Both implementations write the shared memory buffer in **multiple non-atomic operations**. A reader process that reads between these operations sees:
- **`real_control_plane.py`**: The new header with stale tail from a previous larger packet → `_decode_packet()` may return stale data or parse failure
- **`real_zinc_plane.py`**: All zeros → `_decode_packet()` returns `{}` (empty dict) or parse failure

The visible-zero window in `real_zinc_plane.py` is particularly dangerous — if a reader reads the zeroed buffer, all slot states appear empty, which could trigger a spurious reconcile or incorrect position tracking.

**Fix:** Either:
1. Narrow the window by writing the whole region in a single slice assignment (packet plus zero padding built in one buffer); this removes the visible-zero state but is still not atomic for a concurrent reader
2. Use a sequence number in the header that the reader validates (sequence odd while writing, even when complete)
3. Use an explicit "writing" flag byte set before and cleared after the write
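
Option 2 can be sketched as a simple sequence-lock layout. This is an illustration under stated assumptions: a single writer, a hypothetical header of an 8-byte sequence number followed by a 4-byte payload length, and no claim about memory-ordering guarantees across processes:

```python
import struct
from typing import Optional

_SEQ = struct.Struct("<Q")   # sequence number: odd while a write is in progress
_LEN = struct.Struct("<I")   # payload length
_HDR = _SEQ.size + _LEN.size


def write_region(view, packet: bytes) -> None:
    if _HDR + len(packet) > len(view):
        raise ValueError("packet too large for region")
    (seq,) = _SEQ.unpack_from(view, 0)
    odd = seq + 1 if seq % 2 == 0 else seq   # reuse odd value after a crashed write
    _SEQ.pack_into(view, 0, odd)             # mark "write in progress"
    _LEN.pack_into(view, _SEQ.size, len(packet))
    view[_HDR:_HDR + len(packet)] = packet
    _SEQ.pack_into(view, 0, odd + 1)         # publish: sequence is even again


def read_region(view) -> Optional[bytes]:
    """Return a consistent payload, or None if a write overlapped the read."""
    (before,) = _SEQ.unpack_from(view, 0)
    if before % 2:
        return None
    (length,) = _LEN.unpack_from(view, _SEQ.size)
    if _HDR + length > len(view):
        return None
    data = bytes(view[_HDR:_HDR + length])
    (after,) = _SEQ.unpack_from(view, 0)
    return data if after == before else None
```

A reader that gets `None` retries after a short pause instead of interpreting a half-written or zeroed buffer as empty slot state.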

**Severity: High**

### T8: `real_zinc_plane._slot_from_payload()` reconstructs `internal_trade_id` from slot's `trade_id` instead of order's own — data loss on round-trip

**File:** `real_zinc_plane.py:92,106`

```python
active_entry_order = VenueOrder(
    internal_trade_id=str(payload.get("trade_id", "")),  # uses SLOT's trade_id
    ...
)
```

`TradeSlot.to_dict()` serializes the order's own `internal_trade_id` inside the `"active_entry_order"` sub-dict. But `_slot_from_payload()` ignores the per-order value and uses the slot-level `trade_id` instead.

If a slot has multiple orders (e.g., an entry order with `trade_id="abc"` and an exit order with `trade_id="def"`), the slot-level `trade_id` is the **current trade's ID** — which may match one of the orders. But after a CANCEL_ACK that clears the entry order, the slot `trade_id` may be empty or changed. The reconstructed order always gets the slot's `trade_id`, losing the distinction between entry-order and exit-order trade IDs.

This only affects the shared-memory round-trip (RealZincPlane). The FFI path (`rust_backend.py`) correctly uses the order's serialized `internal_trade_id`.

**Severity: Medium**

### T9: `_slot_from_payload()` duplicated between `real_zinc_plane.py` and `rust_backend.py` — double maintenance burden

**Files:** `real_zinc_plane.py:83-138`, `rust_backend.py:379-402`

The slot deserialization function `_slot_from_payload()` (or equivalent inline code) exists in **two separate files** with nearly identical logic. The `real_zinc_plane.py` version is a 55-line function; the `rust_backend.py` version is inline in `_slot_from_payload()`.

Both deserialize `TradeSlot` from the same `to_dict()` output format. Any schema change (field added, removed, renamed, or type-changed) must be updated in both places. T4 (missing `.get()` on `last_event_time`) and T8 (`internal_trade_id` from wrong source) are direct consequences of this duplication — the bug exists in one copy but not the other.

**Fix:** Extract shared `_slot_from_payload()` into `contracts.py` (or `utils.py`).

**Severity: Medium**

### T10: `_build_pink_extended.py` string index math finds first `finally:` — could match nested `try/finally` inside function body

**File:** `_build_pink_extended.py:117-119`

```python
idx = content.index(old_run_pat)
run_end = content.index("    finally:", idx)  # finds FIRST "finally:" — could be nested!
run_end = content.index("\n\n", run_end) + 2  # boundary detection for function end
```

The search for `"    finally:"` finds the **first** occurrence after `idx`. If the `_run()` function body (or any function it calls, like `_si()` or `_verify()`) contains a nested `try/finally` block — or if the function contains the word "finally:" in a string or comment — the index points to the wrong location. The `"\n\n"` search then terminates inside the function body, producing a truncated replacement that generates syntactically broken output.

The generated `test_pink_bingx_dita_live_e2e.py` is patched with index math that has no validation. A malformed patch silently produces a non-functional test file (syntax error only caught at test import time).

**Fix:** Parse the function boundaries using `ast` module or use a well-defined sentinel comment (e.g., `# END _run`) as anchor points.

**Severity: Medium**

### T11: No workspace-root `.gitignore` — `__pycache__`, backup dirs, context files, build artifacts untracked

**File:** (missing — should be `dita_v2/.gitignore`)

The only `.gitignore` in the workspace is inside `_rust_kernel/` (covers `/target`). There is **no `.gitignore` at the workspace root** (`dita_v2/`). This means:

- `__pycache__/` directories (29 `.pyc` files present) are ignored only if the developer's global git config happens to exclude them
- `_backup_20260530/` and its 22 source files are visible to git and may already be indexed
- `_backup_20260530_105512/` is empty, so git does not report it yet, but it will show up as soon as anything is written into it
- `Codex_CONTEXT_RESTORE__*.txt` context files are visible
- `tea_debug.log` is visible
- Any `.pyc` files that end up in the index cause merge conflicts

`git status` reports 2004 untracked (`??`) entries, many of which a proper `.gitignore` would exclude.

**Severity: Low**

### T12: `projection.py` lazy import failure silently swallowed — caller gets `writer=None` with no diagnostic

**File:** `projection.py:75-77`

```python
try:
    from .hazelcast_projection import HazelcastRowWriter
    writer = HazelcastRowWriter(client)
except Exception:  # catches import errors, constructor errors, everything
    writer = None
```

If the `hazelcast_projection` module has a syntax error, `HazelcastRowWriter` doesn't exist, or the constructor raises, the exception is swallowed without recording its type or message. The caller gets a `HazelcastProjection` with `writer=None`. The `write_transition()` and `write_control()` methods check `if not self.writer:` and return early, so every subsequent Hazelcast write is dropped with no per-write log, error, or diagnostic.

The only trace is the `"Hazelcast unavailable — fallback active"` log message, which is printed once, for the **first** import attempt. If the module is later fixed (e.g., a missing dependency is installed), the stale `writer=None` persists because the import is not retried.

**Severity: Medium**

### T13: `Codex_CONTEXT_RESTORE__*.txt` and other AI context files in workspace root — debris

**Files:** `Codex_CONTEXT_RESTORE__2026-06-02-130508-*.txt`, other `.md` analysis documents

The workspace root contains AI-assistant context restore files and 6+ Markdown flaw analysis documents (`PINK_DITAv2_E2E_TRACE_ANALYSIS.md`, `PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md`, `PINK_DITAv2_THREADING_ATOMICITY.md`, etc.). These are analysis artifacts, not source code.

While the flaw documents are intentional project records, the `Codex_CONTEXT_RESTORE__*.txt` files are ephemeral AI context dumps that should not be in version control. They contain session state information that is meaningless outside the AI session.

**Severity: Low**

### T14: `_backup_20260530/` contains 22 live source files — risk of stale import confusion

**File:** `_backup_20260530/` (22 Python files including `rust_backend.py`, `launcher.py`, `bingx_venue.py`, etc.)

The backup directory contains full copies of all Python source files from May 30. If a developer runs `import` from within the `dita_v2` directory, the backup directory's `__init__.py` makes it a valid Python package. An accidental `from _backup_20260530 import rust_backend` would load the **old** code instead of the current implementation — silently, with no warning.

The backup `rust_backend.py` lacks the Rust FFI integration, has no `_first_invalid_intent_field()`, and uses the old `ExecutionKernel` class. Accidentally importing from the backup would produce hard-to-diagnose errors (missing methods, wrong behavior).

**Fix:** Rename backup directories to names that are not valid Python identifiers (e.g., `backup-20260530`; dropping only the leading underscore still leaves an importable name), move them outside the package tree, or add an `__init__.py` that raises `ImportError` with a clear message.

**Severity: Medium**

---

## Pass 17 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| T1 | `catch_unwind` + `AssertUnwindSafe` — partially mutated state persists, no rollback | Rust | **High** |
| T2 | Empty backup dir `_backup_20260530_105512/` and stale `tea_debug.log` | Repo | Low |
| T3 | `HazelcastRowWriter` uses bare `json.dumps(row, default=str)` — Enums/datetimes wrong format | Bridge | **High** |
| T4 | `real_zinc_plane._slot_from_payload()` direct key access `payload["last_event_time"]` — `KeyError` crash | Plane | **High** |
| T5 | `_build_pink_bodies.py` `str.index("]")` finds first `]` — corrupts SCENARIOS list | Build | **High** |
| T6 | `VenueAdapter` protocol missing `connect()`/`disconnect()` — `AttributeError` at runtime | Venue | **High** |
| T7 | Shared memory writes non-atomic — visible-zero window, stale tail exposes partial state | Plane | **High** |
| T8 | `_slot_from_payload()` reconstructs `internal_trade_id` from slot's trade_id — order-level data loss | Plane | Medium |
| T9 | `_slot_from_payload()` duplicated in two files — double maintenance burden, schema drift risk | Plane | Medium |
| T10 | `_build_pink_extended.py` `str.index("finally:")` finds first match — nested try/finally mismatch | Build | Medium |
| T11 | No workspace-root `.gitignore` — `__pycache__`, backup dirs, debris untracked | Repo | Low |
| T12 | `projection.py` lazy import failure silently swallowed — `writer=None` drops all Hazelcast writes | Bridge | Medium |
| T13 | `Codex_CONTEXT_RESTORE__*.txt` AI context files in workspace root — debris | Repo | Low |
| T14 | `_backup_20260530/` is a valid Python package — accidental old-code import risk | Repo | Medium |

### Pass 17 Severity

| Severity | Count |
|----------|-------|
| **High** | 6 (T1, T3, T4, T5, T6, T7) |
| Medium | 5 (T8, T9, T10, T12, T14) |
| Low | 3 (T2, T11, T13) |

### Combined Catalog (All 17 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| **Total** | | **319** | **27** | **95** | **97** | **67** | **33** |

---

## PASS 18 — RUST TEST GAPS, ACCOUNTING RECONCILIATION BUGS, FFI TYPE MISMATCHES

### U1: Rust `KernelIntent` has no `order_type` or `limit_price` fields — Python sends them, serde silently drops them

**Files:** `rust_backend.py:375-377` (Python sends), `_rust_kernel/src/lib.rs:439-456` (Rust receives)

Python's `_intent_to_payload()` serializes two fields that the Rust `KernelIntent` struct does not have:

```python
# Python sends (rust_backend.py:375-377):
"order_type": getattr(intent, "order_type", "MARKET"),
"limit_price": float(getattr(intent, "limit_price", 0.0) or 0.0),
```

```rust
// Rust receives (lib.rs:439-456) — no order_type or limit_price fields:
struct KernelIntent {
    timestamp: DateTime<Utc>,
    intent_id: String,
    trade_id: String,
    slot_id: i64,
    asset: String,
    side: TradeSide,
    action: KernelCommandType,
    reference_price: f64,
    target_size: f64,
    leverage: f64,
    exit_leg_ratios: Vec<f64>,
    reason: String,
    metadata: Map<String, Value>,
    stage: TradeStage,
    // order_type — MISSING
    // limit_price — MISSING
}
```

Serde's default behavior ignores unknown fields during deserialization (the struct would need `#[serde(deny_unknown_fields)]` to reject them). Both `order_type` and `limit_price` are therefore transmitted across the FFI boundary and **silently thrown away**; no Rust logic can act on them because the struct has nowhere to store them. The Python `KernelIntent` dataclass declares them with defaults (`"MARKET"`, `0.0`), and the Python-side `_first_invalid_intent_field()` uses them (it checks `limit_price` for NaN), but the Rust kernel never sees them.

**Impact:** If the Rust kernel needs `order_type` to distinguish MARKET from LIMIT orders (as realistic exchange interaction requires), it cannot: the field exists in Python but is never delivered. The Python side suggests the fields were meant to be added to Rust and the addition was never completed.
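
A contract check that compares the keys Python serializes against the fields Rust declares would surface this kind of silent drop. The following is a minimal sketch, not project code: `RUST_KERNEL_INTENT_FIELDS` is copied by hand from the struct quoted above, and `dropped_payload_keys` and `example_payload` are hypothetical names introduced for illustration.

```python
# Field names copied by hand from the Rust `KernelIntent` struct quoted above.
# Assumption: the struct uses default serde naming (no rename attributes).
RUST_KERNEL_INTENT_FIELDS = frozenset({
    "timestamp", "intent_id", "trade_id", "slot_id", "asset", "side",
    "action", "reference_price", "target_size", "leverage",
    "exit_leg_ratios", "reason", "metadata", "stage",
})


def dropped_payload_keys(payload: dict) -> list:
    """Return payload keys for which the Rust struct has no field.

    With serde's default settings these keys are ignored on deserialization.
    """
    return sorted(set(payload) - RUST_KERNEL_INTENT_FIELDS)


# Hypothetical payload shaped like a subset of `_intent_to_payload()` output.
example_payload = {
    "intent_id": "i-1",
    "trade_id": "t-1",
    "slot_id": 0,
    "order_type": "LIMIT",
    "limit_price": 101.5,
}

print(dropped_payload_keys(example_payload))
```

Run as a unit test against the real payload builder, a check of this shape would fail as soon as Python starts sending a key that Rust does not declare.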

**Severity: High**

### U2: Rust `VenueEventStatus` deserializer expects `"CANCEL_REJECTED"` — Python sends `"CANCELED_REJECTED"`, deserialization fails

**Files:** `_rust_kernel/src/lib.rs:269-278` (Rust deserializer), Python `contracts.py` `VenueEventStatus` enum

The Rust `VenueEventStatus` custom deserializer has a typo in one of its string literals:

```rust
// Rust deserializer (lib.rs:269-278):
"CANCEL_REJECTED" => Ok(VenueEventStatus::CANCELED_REJECTED),
// ^ literal lacks "ED": the expected string is "CANCELED_REJECTED"
```

The string literal `"CANCEL_REJECTED"` is missing the `ED` after `CANCEL`. The enum variant is correctly named `CANCELED_REJECTED`, and Python's `VenueEventStatus.CANCELED_REJECTED.value` produces `"CANCELED_REJECTED"`.

When Python sends a `VenueEvent` with `status=CANCELED_REJECTED`, the JSON contains `"status": "CANCELED_REJECTED"`. Rust's deserializer tries to match `"CANCELED_REJECTED"` against the string `"CANCEL_REJECTED"` — which fails. Serde returns an error: `"invalid VenueEventStatus: CANCELED_REJECTED"`. The `on_venue_event()` returns an error diagnostic.

**Impact:** Any venue event with status `CANCELED_REJECTED` fails deserialization on the Rust side. The event is rejected with `INVALID_EVENT_PARSE` instead of being processed normally. Cancel-rejected events are important FSM signals (they tell the kernel that the exchange refused a cancel), yet they are dropped with only an error diagnostic rather than driving an FSM transition.

**Note on usage:** The mock venue does not produce `CANCELED_REJECTED` events in the current test suite. The BingX live venue adapter (`bingx_venue.py`) maps exchange cancel responses but may use a different status string. The bug is dormant until a live exchange returns a cancel-rejected status, at which point the venue event is dropped with only an error diagnostic.
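
The mismatch can be reproduced in isolation. This sketch mirrors only the one match arm quoted above; the other literals accepted by the Rust deserializer are not shown in this report and are omitted. `rust_would_accept` is a hypothetical helper, not a project function.

```python
# Assumption: only the misspelled literal is known from the excerpt above.
RUST_ACCEPTED_STATUS_LITERALS = {"CANCEL_REJECTED"}

# The value that Python's VenueEventStatus.CANCELED_REJECTED.value produces.
PYTHON_STATUS_VALUE = "CANCELED_REJECTED"


def rust_would_accept(status: str) -> bool:
    """Mimic the string match in the Rust deserializer for this one variant."""
    return status in RUST_ACCEPTED_STATUS_LITERALS


print(rust_would_accept(PYTHON_STATUS_VALUE))  # the string Python sends
print(rust_would_accept("CANCEL_REJECTED"))    # the string Rust expects
```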

**Severity: High**

### U3: R2 reconciliation compares cumulative `k.realized_pnl` against single-last-fill `e.last_fill_realized_pnl` — broken after 2+ fills

**File:** `account.py:459-473`

```python
# K side (line 296):
self._k_realized += safe_float(realized_pnl, 0.0)  # ACCUMULATES all fills

# E side (line 299):
self._e_last_fill_realized_pnl = e_fill_realized_pnl  # ONLY the LAST fill

# R2 comparison (line 460):
delta_r2 = abs(k.realized_pnl - e.last_fill_realized_pnl)
```

After the first fill:
- `k.realized_pnl = fill_1_pnl` (e.g., 10.0)
- `e.last_fill_realized_pnl = fill_1_pnl` (10.0)
- `delta_r2 = 0` ✅ OK

After the second fill:
- `k.realized_pnl = fill_1_pnl + fill_2_pnl` (e.g., 10.0 + 15.0 = 25.0)
- `e.last_fill_realized_pnl = fill_2_pnl` (15.0 — only the last fill)
- `delta_r2 = |25.0 - 15.0| = 10.0`
- With `realized_rounding = 0.05`, `10.0 > 0.05` → **ERROR**

From the second fill onward, R2 fires ERROR whenever the earlier fills' realized PnL sums to more than `realized_rounding`, because the K accumulator includes all fills while the E value holds only the most recent one. This is a fundamental design flaw: K and E track realized PnL differently (K accumulates, E replaces), and the comparison is apples-to-oranges.

**Fix:** Either `e.last_fill_realized_pnl` must be accumulated (add each new fill to the previous total) on the Python side, or R2 should compare only the per-fill delta (which would require storing per-fill values on both sides).
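
The two-fill example can be replayed numerically. The sketch below uses the figures from the example above and models the first proposed fix (accumulating on the E side); the variable names are illustrative and do not correspond to attributes in `account.py`.

```python
REALIZED_ROUNDING = 0.05  # tolerance quoted in the example above

fills = [10.0, 15.0]  # realized PnL per fill, from the example above

k_realized = 0.0             # K side: accumulates every fill
e_last_fill_realized = 0.0   # E side as written: keeps only the last fill
e_cumulative_realized = 0.0  # E side under the accumulate-on-E fix

for pnl in fills:
    k_realized += pnl
    e_last_fill_realized = pnl
    e_cumulative_realized += pnl

delta_current = abs(k_realized - e_last_fill_realized)
delta_fixed = abs(k_realized - e_cumulative_realized)

print(delta_current, delta_current > REALIZED_ROUNDING)
print(delta_fixed, delta_fixed > REALIZED_ROUNDING)
```

With accumulation on both sides the delta stays at zero for any number of fills; the as-written comparison diverges by the sum of all fills except the last.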

**Severity: Critical**

### U4: R4 reconciliation compares `k.open_notional` against `e.used_margin` — fundamentally different quantities

**File:** `account.py:488-498`

```python
# K side (line 411):
open_notional += abs(slot.size) * mark  # Σ |qty| · mark_price

# E side (line 370):
self._e_used_margin = _safe(...)  # exchange-reported used margin

# R4 comparison (line 490):
delta_notional = abs(k.open_notional - e.used_margin)
```

**K open_notional** = sum of absolute position notional values: `|size| × mark_price`. This is a gross market exposure measure.

**E used_margin** = exchange-reported margin used: `Σ notional / leverage`, possibly adjusted for cross-margin, risk weighting, position tiers, and maintenance margin requirements. This is a margin requirement measure.

For a 100 USDT position at 10x leverage:
- K open_notional = |size| × mark = **100 USDT**
- E used_margin = 100 / 10 = **10 USDT** (approximately, ignoring cross-margin effects)
- delta = |100 - 10| = **90 USDT**
- With `capital_epsilon = 1e-4` → `90 > 1e-4` → instantly exceeds even the WARN band

R4 produces ERROR on every position with leverage > 1x. It can pass only at 1x leverage, where notional and used_margin are approximately equal. This makes R4 **completely broken as designed**.

**Fix:** R4 should compare like-with-like: either compare K open_notional against E open_notional (if available from exchange), or compare K used_margin against E used_margin. The current comparison of notional vs margin is meaningless.
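
The arithmetic above can be checked with a simplified margin model. This is a sketch under stated assumptions: `approx_used_margin` treats margin as `notional / leverage` and ignores cross-margin, position tiers, and maintenance margin, so it is not the exchange's formula. Both helper names are hypothetical.

```python
def open_notional(size: float, mark: float) -> float:
    """Gross exposure as K computes it: |size| * mark."""
    return abs(size) * mark


def approx_used_margin(notional: float, leverage: float) -> float:
    """Simplified margin model: notional / leverage (leverage floored at 1)."""
    return notional / max(1.0, leverage)


CAPITAL_EPSILON = 1e-4  # tolerance quoted above

k_notional = open_notional(size=2.0, mark=50.0)           # 100 USDT
e_margin = approx_used_margin(k_notional, leverage=10.0)  # 10 USDT

# Comparison as written in R4: notional against margin.
delta_as_written = abs(k_notional - e_margin)

# Like-for-like: derive a K-side margin with the same simplified model.
k_margin = approx_used_margin(k_notional, leverage=10.0)
delta_like_for_like = abs(k_margin - e_margin)

print(delta_as_written, delta_as_written > CAPITAL_EPSILON)
print(delta_like_for_like, delta_like_for_like > CAPITAL_EPSILON)
```

In this toy the like-for-like delta is zero by construction; against a real exchange it would reflect only genuine differences in margin rules, which is the signal R4 is meant to carry.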

**Severity: Critical**

### U5: R3 skipped when `len(e.positions) == 0` — K has open positions but E reports none, silent false negative

**File:** `account.py:478`

```python
if len(e.positions) > 0:  # guard — only run R3 when E reports positions
    if k.open_positions != len(e_pos_map):
        return "ERROR", ...
```

If the exchange reports zero positions (because of an exchange-side position clear, a connection loss, or initial state) while K has open positions (slots in `POSITION_OPEN`), R3 is **entirely skipped**. The position count mismatch is not detected.

This is asymmetric: if K=0 and E=1 (E has a position K doesn't know about), R3 fires because `len(e.positions)=1 > 0` and `k.open_positions=0 != 1`. But if K=1 and E=0, the guard prevents detection. The position that exists only in K's view is invisible to reconciliation.

**Scenario:** The exchange liquidates a position (or it expires) and sends no explicit event. K still thinks the position is open. The reconciler is called with E reporting 0 positions. R3 is skipped. R1 capital divergence may catch it eventually (if the trade had PnL), but if PnL is zero, R1 delta is also zero → **no detection at all**.
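
A symmetric check is sketched below. It assumes the caller can distinguish "the exchange returned no data" (`None`) from "the exchange reported zero positions" (`0`); that distinction, the function name, and the `(level, message)` return shape are assumptions for illustration, not the existing `account.py` interface.

```python
from typing import Optional, Tuple


def check_position_count(k_open: int, e_count: Optional[int]) -> Tuple[str, str]:
    """Compare position counts in both directions.

    e_count is None when no exchange position data is available at all,
    and an integer (possibly 0) when the exchange reported a position list.
    """
    if e_count is None:
        return ("WARN", "no exchange position data available")
    if k_open != e_count:
        return ("ERROR", f"position count mismatch: K={k_open} E={e_count}")
    return ("OK", "")


print(check_position_count(1, 0))     # K open, E flat: now detected
print(check_position_count(0, 1))     # E open, K flat: detected as before
print(check_position_count(1, None))  # no data: flagged, not silently skipped
```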

**Severity: High**

### U6: `on_venue_event` and `apply_fill` have no NaN/Inf guards on incoming venue event fields — NaN price/size propagate unchecked

**File:** `_rust_kernel/src/lib.rs` — `on_venue_event()`, `apply_fill()`

The `apply_fill()` function uses `event.price`, `event.size`, `event.filled_size` directly in arithmetic without finiteness checks:

```rust
// apply_fill (approximately):
slot.entry_price = event.price;           // NaN → stored directly
let realized = realized_pnl(&slot, event.price, fill_size);  // NaN → NaN PnL
slot.realized_pnl += realized;             // NaN accumulates
slot.size = (slot.size - fill_size).max(0.0);  // NaN - x = NaN, .max(0.0) = 0.0 (safe for size)
```

The `realized_pnl()` function (line 1121) guards `entry_price <= 0.0` but **NaN passes through** (IEEE 754: `NaN <= 0.0` is `false`). The `mark_price()` function (line 395) guards `price.is_finite()` on input but `entry_price` can be set to NaN by `apply_fill` before `mark_price` is called.

**Trigger path:** If a venue event arrives with `price = NaN` (corrupted exchange data, deserialization of malformed JSON, or a bug in the venue adapter), `apply_fill` stores NaN into `entry_price`. Every subsequent `realized_pnl()` and `mark_price()` call produces NaN. Once NaN enters `slot.realized_pnl`, it propagates to `k_realized_pnl` → `k_capital` → all financial computations.

The Rust-side `apply_fill_settled()` (line 761) has `if realized_pnl.is_finite()` — but this only guards the `account` level, not the slot level. The slot-level `realized_pnl` has already been corrupted.
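
The comparison behavior that lets NaN through is language-independent IEEE 754 semantics, so it can be shown in Python. The sketch below demonstrates why a `<= 0.0` guard does not trip on NaN and what an explicit finiteness check looks like; `is_valid_fill_price` is a hypothetical helper, and the Rust equivalent would use `f64::is_finite`.

```python
import math

nan = float("nan")

# Guard style quoted for realized_pnl(): `entry_price <= 0.0`.
# Every ordered comparison with NaN is False, so this guard does not trip.
nan_trips_le_guard = nan <= 0.0


def is_valid_fill_price(price: float) -> bool:
    """Accept only finite, strictly positive prices."""
    return math.isfinite(price) and price > 0.0


print(nan_trips_le_guard)
print(is_valid_fill_price(nan),
      is_valid_fill_price(float("inf")),
      is_valid_fill_price(101.5))
```

Applying a check of this shape at the top of the fill path, before any slot field is written, would keep non-finite values out of slot-level state as well as account-level state.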

**Severity: Critical**

### U7: Rust kernel has zero tests for ORDER_REJECT, PARTIAL_FILL, TERMINAL_STATE guard, STALE_STATE_RECONCILING guard, RATE_LIMITED, or MARK_PRICE transitions

**File:** `_rust_kernel/src/lib.rs` — `mod tests` (32 tests total)

The 32 Rust unit tests cover ExchangeFeeConfig (7), AccountState fill_settled (5), predicted→settled (2), JSON dispatch (3), dedup (3), snapshot save/restore (4), capital_frozen (4), and one FSM transition test (`enter_then_ack_fill`).

**FSM transitions with ZERO Rust test coverage:**

| Transition | Rust test | Python FFI test | Overall |
|-----------|-----------|-----------------|---------|
| ORDER_REJECT → IDLE | ❌ | ❌ | **Uncovered** |
| PARTIAL_FILL → ENTRY_WORKING / EXIT_WORKING | ❌ | ❌ (no explicit FSM state check) | **Uncovered** |
| FULL_FILL → POSITION_OPEN / CLOSED | ❌ | ✅ (test_flaws.py) | Python only |
| CANCEL_ACK → IDLE (entry) | ❌ | ✅ (TestFlaw2) | Python only |
| CANCEL_ACK → POSITION_OPEN (exit) | ❌ | ✅ (TestFlaw2) | Python only |
| CANCEL_REJECT → POSITION_OPEN | ❌ | ✅ (TestI15) | Python only |
| TERMINAL_STATE guard | ❌ | ❌ | **Uncovered** |
| STALE_STATE_RECONCILING guard | ❌ | ❌ | **Uncovered** |
| RATE_LIMITED response | ❌ | ❌ | **Uncovered** |
| MARK_PRICE (unrealized_pnl update) | ❌ | ❌ | **Uncovered** |
| SLOT_BUSY rejection | ❌ | ✅ (incidental) | Weak |
| ENTER with capital frozen | ❌ | ✅ (TestCapitalFrozen) | Python only |
| EXIT on IDLE slot | ❌ | ✅ (incidental) | Weak |
| Multi-leg exit FSM | ❌ | ✅ (TestFlaw4) | Python only |

The most critical gap: **ORDER_REJECT** — if the exchange rejects an entry order, the kernel should transition back to IDLE (or emit a diagnostic). This path exists in the Rust code (line ~1525: `KernelEventKind::ORDER_REJECT` match arm) but has **zero test coverage** in Rust or Python.

**Severity: High**

### U8: `safe_float()` in `utils.py` returns NaN/Inf instead of default — contradictory behavior with `_safe()` in `account.py`

**File:** `utils.py:13-19`, `account.py:229-233`

```python
# utils.py safe_float():
def safe_float(value, default=0.0):
    try:
        out = float(value)
    except Exception:
        return default
    if not math.isfinite(out):
        return out       # BUG: returns NaN/Inf unchanged!
    return out

# account.py _safe():
def _safe(v, default=0.0):
    try:
        f = float(v)
        return f if math.isfinite(f) else default  # CORRECT: returns default
    except:
        return default
```

`safe_float()` returns the non-finite value (NaN/Inf) when encountered, while `_safe()` correctly returns the default. Both functions serve the same purpose (safe float conversion) but have **opposite behavior** for non-finite inputs.

`safe_float()` is used in `AccountProjectionV1.observe_slots()` (line 56) which feeds data into the legacy reconciliation path. If an exchange returns NaN for a price or size, `safe_float()` propagates NaN into the accounting state rather than defaulting to 0.0.

**Fix:** `safe_float()` should return `default` when the value is non-finite, matching `_safe()`'s behavior.
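
A corrected version might look like the following sketch. It keeps the original signature and the broad `except Exception` from `utils.py`, and changes only the non-finite branch.

```python
import math


def safe_float(value, default=0.0):
    """Convert to float; return `default` for unparsable or non-finite input."""
    try:
        out = float(value)
    except Exception:
        return default
    # NaN, +Inf, and -Inf are treated the same as a failed conversion.
    return out if math.isfinite(out) else default


print(safe_float("2.5"), safe_float("nan"), safe_float(float("inf"), 1.5))
```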

**Severity: Medium**

### U9: `_scan_slots()` uses `slot.metadata.get("leverage")` not `slot.leverage` — wrong leverage source for used_margin computation

**File:** `account.py:414-416`

```python
metadata = slot.metadata or {}
lev = max(1.0, _safe(metadata.get("leverage", slot.leverage if hasattr(slot, "leverage") else 1.0), 1.0))
```

The used_margin computation reads leverage from `slot.metadata.get("leverage")` rather than `slot.leverage`. The slot's own `.leverage` field is only used as a fallback if metadata doesn't have the key.

The Rust kernel sets `slot.leverage` from the intent's leverage field during `process_intent()` (line ~1258). It does NOT write leverage into `slot.metadata`. The metadata is populated from `intent.metadata` which is a generic dict that may or may not contain a "leverage" key.

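**Result:** Metadata wins whenever it carries a `"leverage"` key, so a stale or unrelated value in `intent.metadata["leverage"]` overrides the leverage the kernel actually applied. As quoted, the fallback to `slot.leverage` applies only when the key is absent and the Python slot object exposes a `leverage` attribute; otherwise `lev` falls back to the default `1.0`. In either failure case the used_margin computation is wrong for any slot whose real leverage differs from the value `_scan_slots` picked.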

**This affects R5 (used_margin comparison).** If the slot has 10x leverage but `_scan_slots` computes with 1x, `k.used_margin` is 10x larger than it should be, producing a false R5 ERROR.
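
One possible resolution order is sketched below: prefer the kernel-set `slot.leverage`, fall back to metadata, then to `1.0`. `resolve_leverage` is a hypothetical helper, and the slots are modeled with `SimpleNamespace` rather than the project's slot class. Values below 1.0 are treated as unset here, which is a design choice of this sketch.

```python
import math
from types import SimpleNamespace


def resolve_leverage(slot) -> float:
    """Prefer slot.leverage; fall back to metadata['leverage'], then 1.0."""
    metadata = getattr(slot, "metadata", None) or {}
    candidates = [getattr(slot, "leverage", None), metadata.get("leverage")]
    for raw in candidates:
        try:
            lev = float(raw)
        except (TypeError, ValueError):
            continue
        if math.isfinite(lev) and lev >= 1.0:
            return lev
    return 1.0


kernel_set = SimpleNamespace(leverage=10.0, metadata={"leverage": 3.0})
metadata_only = SimpleNamespace(leverage=0.0, metadata={"leverage": 5.0})
neither = SimpleNamespace(metadata=None)

print(resolve_leverage(kernel_set),
      resolve_leverage(metadata_only),
      resolve_leverage(neither))
```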

**Severity: Medium**

### U10: `AccountState` serializes `k_fees_paid` but Rust manually injects JSON key `"k_net_fees"` — two keys for same value

**File:** `_rust_kernel/src/lib.rs:1144-1148` (Rust), `rust_backend.py:907` (Python)

Rust's `on_account_event()` manually injects an additional JSON key into the account event result:
```rust
obj.insert("k_net_fees".to_string(), json!(self.account.k_fees_paid));
```

The serde-serialized `AccountState` already contains the field `k_fees_paid` (from `#[derive(Serialize)]`). This creates two JSON keys (`"k_net_fees"` and `"k_fees_paid"`) holding the same value.

Python's `ExecutionKernel.snapshot()` reads `"k_fees_paid"` from the deserialized account data — it never reads `"k_net_fees"`. The injected key is dead data on the wire. This is not a functional bug but represents confusion about which key names are canonical.

**Severity: Low**

### U11: 10+ `AccountState` fields transmitted across FFI but never read by Python — wasted bandwidth, confusion risk

**File:** `_rust_kernel/src/lib.rs` (Rust serializes), `rust_backend.py` (Python reads)

Fields serialized by Rust's `AccountState` serde and transmitted across FFI but **never read** by Python (8 top-level fields, 12 when the `fee_config` subfields are counted individually):

- `seed_capital` — initial capital, available from config
- `k_taker_fees` — individual taker fee bucket
- `k_maker_fees` — individual maker fee bucket
- `k_maker_rebates` — individual rebate bucket
- `fee_config` — entire `ExchangeFeeConfig` struct (5 subfields: calibration_ratio, taker_rate, maker_rate, etc.)
- `last_predicted_fee` — most recent predicted fee value
- `last_calibration_ratio` — most recent calibration ratio
- `seen_account_event_ids` — entire `IndexSet<String>` (1024 entries at capacity)

These fields are serialized to JSON, sent across the FFI boundary via CString, and **silently discarded** by Python. The most wasteful is `seen_account_event_ids` — a 1024-element set of strings that is transmitted on every snapshot read but never used on the Python side.

**Severity: Low**

### U12: `_order_from_payload()` overwrites `internal_trade_id` with enclosing slot's `trade_id` — loses order-level distinction

**File:** `rust_backend.py:302-310,334-335`

```python
def _order_from_payload(payload: dict, trade_id: str) -> VenueOrder:
    return VenueOrder(
        internal_trade_id=trade_id,  # OVERWRITES with slot's trade_id
        ...
    )

# Called as (line 334):
active_exit_order=_order_from_payload(order_dict, trade_id=str(payload.get("trade_id", ""))),
```

The `_order_from_payload()` function takes `trade_id` from the caller and uses it as the order's `internal_trade_id`. The JSON payload's own `internal_trade_id` field is **ignored**. If Rust's `TradeSlot.to_dict()` serializes the order with its own `internal_trade_id` (which may differ from the slot's `trade_id` — e.g., a slot that was re-entered after a cancel), the per-order ID is silently replaced with the slot-level ID.

This affects the Python-side `VenueOrder.internal_trade_id` field — it will always match the slot's `trade_id` rather than the order's original ID. Any Python code that relies on `internal_trade_id` to distinguish between entry and exit orders, or to track orders across cancel/re-enter cycles, gets wrong data.

**Note:** T8 covers the same bug in `real_zinc_plane.py`; this entry covers the FFI path in `rust_backend.py`. Both code paths are affected.
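
A version that keeps the order's own ID when the payload carries one could look like this sketch. `VenueOrderSketch` is a stand-in with only the field under discussion, not the project's `VenueOrder`, and whether the Rust payload actually includes `internal_trade_id` is an assumption carried over from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class VenueOrderSketch:
    """Stand-in for VenueOrder; only the field discussed here."""
    internal_trade_id: str


def order_from_payload(payload: dict, slot_trade_id: str) -> VenueOrderSketch:
    """Keep the order's own ID when present; else use the slot's trade_id."""
    own_id = str(payload.get("internal_trade_id") or "")
    return VenueOrderSketch(internal_trade_id=own_id or slot_trade_id)


print(order_from_payload({"internal_trade_id": "ord-7"}, "t-1"))
print(order_from_payload({}, "t-1"))
```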

**Severity: Medium**

### U13: Reconciliation has no independent third reference — any divergence affecting both K and E equally is invisible

**File:** `account.py:442-510`

Every R-rule compares K (kernel-computed state) against E (exchange-reported state). If both K and E share a common error source — stale mark price, wrong position count, outdated wallet balance — the delta is small and reconciliation reports OK.

**Specific blind spots:**
1. **R1**: If both `k.capital` and `e.wallet_balance` are wrong by the same amount in the same direction (e.g., both show 9,800 but true capital is 10,000), `abs(9800 - 9800) = 0` → OK
2. **R4**: If both K open_notional and E used_margin are inflated by the same stale mark price, the delta is small → OK (theoretical only: per U4 these are fundamentally different quantities)
3. **R3**: If both K and E report 2 positions but the positions have wrong sizes or entry prices, the count matches → OK (no per-position size or price comparison)

No cross-check against a third data source (independent market data feed, broker API, or blockchain) exists.

**Severity: Medium**

### U14: `lot_step` declared in `ReconcileConfig` but never used anywhere — dead config field

**File:** `account.py:220`

```python
@dataclass
class ReconcileConfig:
    ...
    lot_step: float = 0.001   # DEAD — never referenced in _classify or _scan_slots
```

The `lot_step` field is declared with a default of `0.001` but is never read by any code path. It was intended for per-position quantity comparison (R6 mentioned in a comment) but that rule was never implemented. A developer configuring reconciliation might set `lot_step` expecting it to affect behavior, but it has no effect.

**Severity: Low**

---

## Pass 18 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| U1 | `order_type`/`limit_price` sent to Rust, no serde fields — silently dropped | FFI | **High** |
| U2 | Rust `VenueEventStatus` expects `"CANCEL_REJECTED"` (typo) — `"CANCELED_REJECTED"` fails | Rust | **High** |
| U3 | R2 compares cumulative K realized vs single-last-fill E realized — broken after 2nd fill | Accounting | **Critical** |
| U4 | R4 compares K open_notional vs E used_margin — fundamentally different quantities | Accounting | **Critical** |
| U5 | R3 skipped when `len(e.positions)==0` — K has positions but E reports none, silent | Accounting | **High** |
| U6 | `on_venue_event`/`apply_fill` no NaN guards on venue event price/size — NaN propagates | Rust | **Critical** |
| U7 | Zero Rust tests for ORDER_REJECT, PARTIAL_FILL, TERMINAL_STATE, etc. | Test | **High** |
| U8 | `safe_float()` returns NaN/Inf instead of default — contradicts `_safe()` | Bridge | Medium |
| U9 | `_scan_slots` uses `metadata.get("leverage")` not `slot.leverage` — wrong leverage source | Accounting | Medium |
| U10 | Rust injects `"k_net_fees"` key alongside serde's `k_fees_paid` — duplicate key | Bridge | Low |
| U11 | 10+ AccountState fields transmitted across FFI but never read by Python | FFI | Low |
| U12 | `_order_from_payload()` overwrites `internal_trade_id` with slot's `trade_id` | Bridge | Medium |
| U13 | No independent third reference — symmetrical K=E errors invisible | Accounting | Medium |
| U14 | `lot_step` declared in ReconcileConfig but never used — dead field | Accounting | Low |

### Pass 18 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 3 (U3, U4, U6) |
| **High** | 4 (U1, U2, U5, U7) |
| Medium | 4 (U8, U9, U12, U13) |
| Low | 3 (U10, U11, U14) |

### Combined Catalog (All 18 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| **Total** | | **333** | **30** | **99** | **101** | **70** | **33** |

---

## PASS 19 — STARTUP/SHUTDOWN LIFECYCLE, RUST KERNEL SUBTLETIES, GENERATED TEST INFRA

### V1: `DITAv2LauncherBundle.close()` never calls `kernel.close()` — Rust kernel handle always leaks via `__del__`

**File:** `launcher.py:40-42`

```python
def close(self) -> None:
    _maybe_close(self.venue)
    _maybe_close(self.zinc_plane)
    _maybe_close(self.control_plane)
    # NOTE: self.kernel is never closed!
```

The bundle's `close()` method closes the venue, zinc plane, and control plane — but **not the kernel**. The `ExecutionKernel` Rust handle (`self._backend`) is only freed when Python's garbage collector calls `__del__`, which is non-deterministic.

In a `with` block or explicit `bundle.close()` pattern, the Rust handle survives until the `ExecutionKernel` object's refcount drops to zero (usually when the bundle goes out of scope). If the caller holds a reference to the kernel (e.g., `k = bundle.kernel` for inspection), the handle lives indefinitely.

Compare with `_build_rb()` in test code (gen2.py:326-338) which creates a `Shim` that has a `close` method calling `bundle.close()` and `bundle.kernel.close()`. The production `DITAv2LauncherBundle` has no equivalent.
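
A bundle `close()` that also releases the kernel could be sketched as follows. `BundleSketch` is a hypothetical stand-in for `DITAv2LauncherBundle`, and the close order (venue first, kernel last) is an assumption chosen so that nothing can call into an already-destroyed handle.

```python
class BundleSketch:
    """Hypothetical stand-in for DITAv2LauncherBundle; not the project's class."""

    def __init__(self, venue, zinc_plane, control_plane, kernel):
        self.venue = venue
        self.zinc_plane = zinc_plane
        self.control_plane = control_plane
        self.kernel = kernel
        self._closed = False

    def close(self) -> None:
        """Close every component once; the kernel goes last."""
        if self._closed:
            return
        self._closed = True
        first_error = None
        parts = (self.venue, self.zinc_plane, self.control_plane, self.kernel)
        for part in parts:
            closer = getattr(part, "close", None)
            if not callable(closer):
                continue
            try:
                closer()
            except Exception as exc:  # keep closing the remaining parts
                first_error = first_error or exc
        if first_error is not None:
            raise first_error

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()
```

Because `close()` is idempotent, it is safe to call both from a `with` block and from an explicit shutdown path.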

**Severity: Critical**

### V2: `BingxVenueAdapter` has no `close()` or `disconnect()` — ThreadPoolExecutor, HTTP client sessions, connections never released

**File:** `bingx_venue.py` (entire file)

`BingxVenueAdapter` is the synchronous REST adapter to BingX. It has no `close()`, `disconnect()`, or any cleanup method. The `_maybe_close()` in the launcher tries `.close()` → `AttributeError` (caught), then `.disconnect()` → `AttributeError` (caught). **Nothing happens.**

Leaked resources:
- **ThreadPoolExecutor** (class-level `_EXECUTOR`, 3 threads) — never shut down
- **BingxDirectExecutionAdapter backend** — holds an `aiohttp.ClientSession` with TCP connection pool (`limit=4`) — never closed
- Any in-flight HTTP connections — abandoned at process exit

This compounds R1 and R2 from Pass 15 and is the single largest resource leak in the system.

**Severity: Critical**

### V3: `process_intent` ENTER path does NOT clear `seen_event_ids` — old trade's dedup set pollutes new trade

**File:** `_rust_kernel/src/lib.rs:1260-1290` (ENTER path)

```rust
// ENTER path — slot reuse:
slot.trade_id = intent.trade_id.clone();
slot.asset = intent.asset.clone();
slot.entry_price = 0.0;
slot.size = 0.0;
slot.initial_size = 0.0;
slot.unrealized_pnl = 0.0;
slot.realized_pnl = 0.0;
slot.active_leg_index = 0;
slot.active_entry_order = None;
slot.active_exit_order = None;
slot.close_reason.clear();
slot.closed = false;
slot.last_event_time = None;
// 🔴 seen_event_ids is NOT cleared — survives from old trade
slot.fsm_state = TradeStage::ORDER_REQUESTED;
```

When a slot is reused by a new ENTER intent, the `seen_event_ids` vector from the **previous trade** survives. If any event_id from the new trade collides with one observed by the old trade, the event is rejected as `DUPLICATE_EVENT` and silently dropped.

**Example:**
1. Trade A in slot 0 receives events `evt-001`, `evt-002`, `evt-003`. All stored in `seen_event_ids`.
2. Trade A completes. Slot 0 returns to `IDLE`.
3. Trade B enters on slot 0. `seen_event_ids` still contains `evt-001`, `evt-002`, `evt-003`.
4. Exchange sends `evt-001` for Trade B (a legitimate new event with a reused event_id — possible on exchanges that recycle IDs daily).
5. Rust kernel sees `evt-001` in `seen_event_ids` → rejects as `DUPLICATE_EVENT`. The fill/ack is lost.

The probability of collision depends on the exchange's event ID scheme. If event IDs are UUIDs, the risk is negligible. If event IDs are counters that reset periodically without a date component (e.g., `"001"` issued again each day), a collision is **guaranteed** after every reset; date-prefixed IDs such as `"20260601-001"` do not collide.

**Fix:** Add `slot.seen_event_ids.clear();` in the ENTER path.
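
The effect of the fix can be modeled in a few lines. `SlotSketch` is a hypothetical Python model of the dedup behavior, using a set where the Rust slot uses a vector; it is not a port of `TradeSlot`.

```python
class SlotSketch:
    """Toy model of per-slot event dedup with a reset on slot reuse."""

    def __init__(self):
        self.trade_id = ""
        self.seen_event_ids = set()

    def enter(self, trade_id: str) -> None:
        """Reuse the slot for a new trade and clear per-trade dedup state."""
        self.trade_id = trade_id
        self.seen_event_ids.clear()

    def accept_event(self, event_id: str) -> bool:
        """Return False for a duplicate within the current trade."""
        if event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event_id)
        return True


slot = SlotSketch()
slot.enter("trade-A")
first = slot.accept_event("evt-001")
repeat = slot.accept_event("evt-001")
slot.enter("trade-B")
after_reuse = slot.accept_event("evt-001")
print(first, repeat, after_reuse)
```

Clearing on ENTER trades one risk for another: a late duplicate from the previous trade would no longer be recognized. If venue events carry a trade identifier, keying the dedup set on the pair of trade ID and event ID would cover both cases.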

**Severity: High**

### V4: Three generators (`gen2.py`, `gen_live_tests.py`, `_gen_test.py`) all write to same output file — last writer wins, incompatible prologues

**Files:** `gen2.py:431-432`, `gen_live_tests.py:680-681`, `_gen_test.py:1233-1234`

All three generator scripts write to the same file:
```
/mnt/dolphinng5_predict/prod/tests/test_pink_bingx_dita_live_e2e.py
```

Each produces **different prologues** with incompatible import styles, helper function signatures, and test runner implementations:
- `gen2.py`: Uses `RB`/`Shim` tuple, `_si()` helper, `_flatten()` helper
- `gen_live_tests.py`: Uses `_RuntimeBundle`/`_RuntimeShim` dataclass, `_run_scenario()` runner, `_flatten_via_kernel_intent()` helper
- `_gen_test.py`: Uses scenario-style `B()` body definitions, single parametrized `test_pink_ditav2` function

The 68 named body functions are **identically named** across gen2.py and gen_live_tests.py, but their signatures differ (`(k, symbol, p)` vs `(bundle, client, symbol, snap)`). Running gen2.py then gen_live_tests.py produces a file where the bodies don't match the runner: the file compiles (Python sees valid functions) but silently tests nothing meaningful.

`_build_pink_extended.py` and `_build_pink_bodies.py` then mutate the same file in-place with `str.replace()` — which silently does nothing if the expected format doesn't match.

**Severity: Critical**

### V5: Generated tests are triple env-gated — never run in CI, making them dead code

**File:** `gen2.py` (generated output), `gen_live_tests.py` (generated output), `_gen_test.py` (generated output)

Every generated test file has three `pytest.skip()` guards at the top:
```python
if not os.environ.get("BINGX_SMOKE_LIVE"): pytest.skip("...")
if not os.environ.get("BINGX_SMOKE_ALLOW_TRADE"): pytest.skip("...")
if not os.environ.get("PINK_DITA_E2E"): pytest.skip("...")
```

These three env vars are **never set in CI**. The CI pipeline has no step that sets them. As a result:
- 68 test functions from gen2.py → always skipped
- 70 test functions from gen_live_tests.py → always skipped
- 157 body scenarios from _gen_test.py → always skipped
- ~295 combined test scenarios → **zero executed in CI**

Only the mock-venue tests (`test_flaws.py`, `test_account_core_v2.py`, `test_account_reconcile_faults.py`, `test_exchange_event_seam_parity.py`, `test_kernel_reliability.py`, `test_kernel_fee_friction.py`, `test_pink_clickhouse_phase4.py`, `test_alpha_blue_untouched_g7.py`) actually run in CI.

**Severity: Critical**

### V6: `ExecutionKernel.close()` destroys Rust handle immediately — no drain of in-flight intents, no wait for pending FFI calls, no state flush

**File:** `rust_backend.py:320-330`

```python
def close(self) -> None:
    backend = self._backend
    if backend is not None:
        self._backend = None
        try:
            _get_rust().destroy(backend)  # immediately frees Rust kernel memory
        except Exception:
            pass
```

`destroy(backend)` calls `dita_kernel_destroy(handle)` which calls `drop(Box::from_raw(handle))`, freeing the entire `KernelHandle` heap allocation including `KernelCore`, all slots, all account state.

If `process_intent()` or `on_venue_event()` has been called from another thread and is mid-execution in Rust when `close()` fires:
1. Rust's state machine is destroyed mid-transition — dangling `self` reference
2. Any HTTP calls (venue.submit()) already in-flight complete, but their results are never processed
3. The Rust `Box::from_raw` calls `drop()` while another thread may be holding a reference to `core` — **use-after-free UB**
4. `_last_settled_pnl` dict is orphaned
5. The zinc plane, projection, and account state are all inconsistent with the destroyed kernel

There is no cancellation token, no drain of the pending-call queue, and no state flush before destroy.
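
One way to keep `close()` from racing an in-flight call is to serialize both on a single lock. The sketch below is a minimal illustration with hypothetical names (`GuardedHandle`, `call`); it does not model the real FFI binding and does not address state flushing.

```python
import threading


class GuardedHandle:
    """Hypothetical wrapper that serializes calls and close() on one lock."""

    def __init__(self, destroy):
        self._destroy = destroy  # callable that frees the native handle
        self._lock = threading.Lock()
        self._closed = False

    def call(self, fn, *args):
        """Run fn while holding the lock; refuse once the handle is closed."""
        with self._lock:
            if self._closed:
                raise RuntimeError("kernel handle is closed")
            return fn(*args)

    def close(self) -> None:
        """Wait for any in-flight call, then destroy exactly once."""
        with self._lock:
            if self._closed:
                return
            self._closed = True
            self._destroy()
```

The lock is not reentrant, so a function passed to `call` must not call `close()` on the same handle. Holding a lock across a blocking FFI call also stalls other callers, which may or may not be acceptable for this runtime.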

**Severity: Critical**

### V7: `_last_settled_pnl` dict accessed from both `process_intent` and `on_venue_event` without locks — thread-unsafe

**File:** `rust_backend.py:440,475`

```python
# process_intent (line 440):
self._last_settled_pnl[intent.slot_id] = 0.0

# on_venue_event (line 475):
incremental_pnl = slot.realized_pnl - self._last_settled_pnl.get(slot.slot_id, 0.0)
self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
```

`process_intent()` (called from the async event loop or scheduler) and `on_venue_event()` (called from the venue event stream callback) both read and write `_last_settled_pnl` without any synchronization primitive. If these run on different threads, the dict can experience:
- **Lost update**: Two concurrent writes to the same `slot_id` — one overwrites the other
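- **Inconsistent dict view**: Under CPython's GIL a single dict read or write is atomic, so the dict itself is not corrupted, but the read-then-write pair in `on_venue_event` is not atomic, and any code iterating the dict while another thread inserts a key raises `RuntimeError: dictionary changed size during iteration`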
- **Incorrect PnL settlement**: The incremental PnL calculation uses stale values

In the current single-threaded async architecture, this isn't triggered. But the lack of protection means any future multi-threaded usage introduces a data race.
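
If multi-threaded use is ever introduced, the read-modify-write on the settled-PnL map can be made atomic with a lock. `SettledPnlTracker` below is a hypothetical holder written for illustration; it is not part of `rust_backend.py`.

```python
import threading


class SettledPnlTracker:
    """Hypothetical per-slot settled-PnL store guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = {}

    def reset(self, slot_id: int) -> None:
        """Called on a new intent: start the slot's settled PnL at zero."""
        with self._lock:
            self._last[slot_id] = 0.0

    def settle(self, slot_id: int, realized_pnl: float) -> float:
        """Return PnL realized since the last settle and record the new total."""
        with self._lock:
            incremental = realized_pnl - self._last.get(slot_id, 0.0)
            self._last[slot_id] = realized_pnl
            return incremental


tracker = SettledPnlTracker()
tracker.reset(0)
print(tracker.settle(0, 10.0), tracker.settle(0, 25.0))
```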

**Severity: Medium**

### V8: `#[serde(default)]` on `leverage: f64` defaults to 0.0 — `mark_price()` uses leverage directly without `.max(1.0)`, silent accounting error

**File:** `_rust_kernel/src/lib.rs:400-408`

```rust
fn mark_price(&mut self, price: f64) {
    // ...
    if self.entry_price <= 0.0 || self.size <= 0.0 { return; }
    let mut delta = (price - self.entry_price) / self.entry_price;
    if self.side == TradeSide::SHORT { delta = -delta; }
    self.unrealized_pnl = delta * self.size * self.entry_price * self.leverage;  // uses leverage directly
}
```

The `TradeSlot` struct has `#[serde(default)] leverage: f64` which defaults to `0.0` when deserialized from JSON without a `leverage` field.

`mark_price()` uses `self.leverage` directly in `unrealized_pnl` computation — **no `.max(1.0)` guard**. If leverage is `0.0` (from missing JSON field, `set_slot_json`, or snapshot restore without leverage), `unrealized_pnl` is always `0.0` regardless of price movement.

Compare with `realized_pnl()` which correctly guards:
```rust
let notional = exit_size * slot.entry_price * slot.leverage.max(1.0);  // correct
```

The `process_intent` ENTER path handles this (sets leverage to 1.0 if ≤ 0), but `set_slot_json` and `restore_full_snapshot` can bypass this and store leverage=0.0 directly into the slot.

**Impact:** Any slot restored from a snapshot that predates the `leverage` field (or saved by a version without it) gets `leverage=0.0` silently. `unrealized_pnl` is always 0.0 — the operator sees no mark-to-market PnL even though the position has moved. This PnL is only realized on close (via `realized_pnl` which correctly uses `.max(1.0)`), so the total PnL is correct but the intra-period mark appears flat — a silent accounting error.
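The effect can be reproduced with a Python mirror of the `mark_price()` arithmetic quoted above. This is an illustrative sketch only: the function name, the `guard` switch, and the sample prices are assumptions, and the real computation lives in Rust.

```python
def unrealized_pnl(entry_price, size, leverage, mark, short=False, guard=False):
    """Python mirror of the mark_price() formula; guard=True applies max(leverage, 1.0)."""
    if entry_price <= 0.0 or size <= 0.0:
        return 0.0
    delta = (mark - entry_price) / entry_price
    if short:
        delta = -delta
    lev = max(leverage, 1.0) if guard else leverage
    return delta * size * entry_price * lev


# A long of size 2 entered at 100 and marked at 110, restored with leverage=0.0:
flat = unrealized_pnl(100.0, 2.0, 0.0, 110.0)                 # unguarded: no PnL shown
guarded = unrealized_pnl(100.0, 2.0, 0.0, 110.0, guard=True)  # guarded: roughly 20
```

With the guard, the mark agrees with what `realized_pnl()` would settle on close for the same price, which is the consistency the unguarded path lacks.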

**Severity: Medium**

### V9: No `conftest.py`, no `pytest.ini`, no `asyncio_mode` configuration — test discovery relies on default pytest behavior

**File:** (missing — no pytest configuration in workspace)

The workspace has **zero pytest configuration files**. No `conftest.py`, `pytest.ini`, `setup.cfg`, `pyproject.toml` with pytest settings. This means:

- **`asyncio_mode`** is a `pytest-asyncio` option rather than a pytest core setting, and current `pytest-asyncio` releases default it to `"strict"`. All test files use `asyncio.run()` inline rather than `@pytest.mark.asyncio`. In strict mode, `async def` test functions are **not run** unless explicitly marked. Since all current tests are `sync def` wrapping `asyncio.run()`, this works, but only as long as that specific pattern holds.
- No **timeout** configuration — a hanging test blocks the entire suite until killed
- No **test filtering** markers (no `slow`, `live`, `offline` markers)
- No **shared fixtures** — each test file repeats the `_build_rb()` setup pattern
- No **cache** or **rerun** configuration — flaky tests fail the CI suite

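If a developer adds a new test using `async def` + `await` (the natural pattern for async code) without `@pytest.mark.asyncio`, the test body never executes. Depending on the pytest version this may surface only as a skip with a warning, which is easy to miss in CI output, so a broken code path can go unnoticed.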

**Severity: High**

### V10: `kernel.close()` has `except Exception: pass` — silently swallows all destroy errors

**File:** `rust_backend.py:325`

```python
try:
    _get_rust().destroy(backend)
except Exception:
    pass  # silently swallows RuntimeError, OSError, ctypes errors
```

If `dita_kernel_destroy` fails (e.g., a Rust panic caught by `catch_unwind` and returned as an error, or the shared library was unloaded), the exception is silently consumed. The caller believes the kernel was closed successfully, but the handle may still be allocated. This is a leak path that's invisible to monitoring.

**Severity: Low**

### V11: `build_launcher_bundle()` has no cleanup for partially-created components — kernel OOM orphans venue, zinc, control plane

**File:** `launcher.py:163-223`

The build order is:
1. Control plane ✅ built
2. Projection ✅ built
3. Zinc plane ✅ built
4. Venue ✅ built
5. **ExecutionKernel** — if this fails (OOM, cargo build fail, CDLL load), components 1-4 are all orphaned

There's no `try/finally` around the sequence. If `ExecutionKernel.__init__()` raises, the four already-built components (control plane, projection, zinc plane, venue), plus any partially initialized kernel state, leak. No cleanup code exists.

**Fix:** Use a context manager or `try/finally` to close created components if a later one fails.
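A minimal sketch of that fix using `contextlib.ExitStack`. `build_bundle` and the `(name, factory)` pairs are hypothetical stand-ins for the launcher's real build steps, and every component is assumed to expose `close()`.

```python
from contextlib import ExitStack


def build_bundle(factories):
    """Build components in order; close the ones already built if a later one fails.

    `factories` is a list of (name, zero-argument callable) pairs.
    """
    with ExitStack() as stack:
        built = {}
        for name, factory in factories:
            component = factory()
            stack.callback(component.close)   # registered only after a successful build
            built[name] = component
        # Everything built: cancel the pending cleanup and hand ownership to the caller.
        stack.pop_all()
        return built
```

On failure the callbacks run in reverse build order, so the venue would be closed before the zinc plane, projection, and control plane.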

**Severity: Medium**

### V12: `KernelResult` clones entire kernel state (all slots + indexes + AccountState) on every FFI call — performance issue

**File:** `_rust_kernel/src/lib.rs:1030-1050` (snapshot method)

Every FFI call to `process_intent()` or `on_venue_event()` returns a `KernelResult` containing a full `snapshot()`:

```rust
fn snapshot(&self) -> KernelSnapshot {
    KernelSnapshot {
        slots: self.slots.clone(),                           // O(n) clone ALL TradeSlots
        active_trade_index: self.active_trade_index.clone(), // clone entire HashMap
        venue_order_index: self.venue_order_index.clone(),
        client_order_index: self.client_order_index.clone(),
        account: self.account.clone(),                       // clone entire AccountState
    }
}
```

For a kernel with 10 slots, each with metadata, seen_event_ids (1024 IDs), nested VenueOrders, and a full AccountState with fee_config, this produces **thousands of heap allocations per FFI call**. At 10 intents/second, this is tens of thousands of allocations/second.

The Python side only reads `KernelResult.outcome` and `KernelResult.slot` (the single affected slot) from each response; the `snapshot` field is never used. It is generated in Rust, serialized, transmitted, parsed in full by `json.loads`, and then **silently discarded** on every call.

**Severity: Medium** (performance, not correctness)

### V13: `_build_rb()` leaks bundle on any post-creation failure — Shim construction failure orphans kernel and venue

**File:** `gen2.py:326-338`, `_gen_test.py:69-80`

```python
def _build_rb(ic=25000.0, max_slots=1):
    cfg = _build_config(ic)
    b = build_launcher_bundle(...)  # bundle created with active kernel, venue, etc.
    k = b.kernel
    k.account.snapshot.capital = ic  # <-- if this raises (e.g., AttributeError)
    ...  # Shim construction, etc.
    return RB(runtime=Shim(k), config=cfg)
```

If any line after `build_launcher_bundle()` raises, the bundle `b` (with live kernel, venue connections, zinc plane) is leaked. No `try/finally` to call `b.close()`. In test code this is acceptable (process exits soon), but if test suites grow large, accumulated leaked bundles exhaust kernel slots or file descriptors.

**Severity: Low**

### V14: `_maybe_close` uses `break` after first method match — only tries `close` OR `disconnect`, never both

**File:** `launcher.py:67`

```python
for method_name in ("close", "disconnect"):
    method = getattr(obj, method_name, None)
    if callable(method):
        ...
        break  # <-- breaks after first successful match — never tries the second method
```

If an object has both `close()` and `disconnect()`, only `close()` is called. The `disconnect()` fallback is never reached if `close()` exists. This is correct for objects like `RealZincPlane` (which has `close()` but no `disconnect()`), but for objects with both methods (possible future adapters), the second method is silently skipped.

**Severity: Low**

---

## Pass 19 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| V1 | `DITAv2LauncherBundle.close()` never calls `kernel.close()` — Rust handle leaks via `__del__` | Launcher | **Critical** |
| V2 | `BingxVenueAdapter` no `close()`/`disconnect()` — ThreadPoolExecutor/HTTP never release | Venue | **Critical** |
| V3 | `process_intent` ENTER doesn't clear `seen_event_ids` — old dedup pollutes new trade | Rust | **High** |
| V4 | 3 generators write same output file — last writer wins, incompatible prologues | Test | **Critical** |
| V5 | Generated tests triple env-gated — never run in CI, dead code | Test | **Critical** |
| V6 | `kernel.close()` destroys Rust handle immediately — no drain, no flush, use-after-free risk | Bridge | **Critical** |
| V7 | `_last_settled_pnl` dict accessed from process_intent and on_venue_event without locks | Bridge | Medium |
| V8 | `#[serde(default)] leverage: f64` default 0.0 — mark_price uses directly no `.max(1.0)` | Rust | Medium |
| V9 | No `conftest.py`, no `pytest.ini`, no `asyncio_mode` — test discovery fragile | Test | **High** |
| V10 | `kernel.close()` `except Exception: pass` — silently swallows destroy errors | Bridge | Low |
| V11 | `build_launcher_bundle()` no cleanup on partial failure — OOM orphans 4 components | Launcher | Medium |
| V12 | `KernelResult` clones entire kernel state every FFI call — wasted allocations | Rust | Medium |
| V13 | `_build_rb()` leaks bundle on post-creation failure | Test | Low |
| V14 | `_maybe_close` breaks after first method — never tries both close and disconnect | Launcher | Low |

### Pass 19 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 5 (V1, V2, V4, V5, V6) |
| **High** | 2 (V3, V9) |
| Medium | 4 (V7, V8, V11, V12) |
| Low | 3 (V10, V13, V14) |

### Combined Catalog (All 19 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
| **Total** | | **347** | **35** | **101** | **105** | **73** | **33** |

---

## PASS 20 — CONFIGURATION MANAGEMENT, MATH SIGN CONVENTIONS, BINGX PROTOCOL

### W1: `int()` on three env vars (`RECV_WINDOW_MS`, `DEFAULT_LEVERAGE`, `EXCHANGE_LEVERAGE_CAP`) — `ValueError` uncaught, immediate crash on non-numeric input

**File:** `launcher.py:189-191`

```python
recv_window_ms = int(os.environ.get("DOLPHIN_BINGX_RECV_WINDOW_MS", "5000"))
default_leverage = int(os.environ.get("DOLPHIN_BINGX_DEFAULT_LEVERAGE", "1"))
exchange_leverage_cap = int(os.environ.get("DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP", "3"))
```

Three consecutive `int()` calls on raw env var strings with **no try/except**. If any of these env vars is set to a non-numeric value (e.g., `DOLPHIN_BINGX_RECV_WINDOW_MS=abc` from a typo in Docker config), `int("abc")` raises `ValueError` which propagates uncaught through `build_bingx_exec_client_config()` → `build_launcher_bundle()` → crashes the process.

Compare with `DITA_V2_ACTIVE_SLOT_LIMIT` (launcher.py:140-144), which wraps `int()` in `try/except Exception: pass`. The slot limit parsing cannot crash the process (although it discards the error silently, see W14); these three can.
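A hedged sketch of a shared parser that neither crashes nor stays silent. `env_int` is a hypothetical helper, and the bounds in the usage line (including the 60 000 ms cap) are illustrative values, not limits documented by BingX or used by `launcher.py`.

```python
import logging
import os

log = logging.getLogger(__name__)


def env_int(name, default, minimum=None, maximum=None, environ=None):
    """Parse an integer env var; log and fall back or clamp instead of raising."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw.strip())
    except ValueError:
        log.warning("%s=%r is not an integer; using default %d", name, raw, default)
        return default
    if minimum is not None and value < minimum:
        log.warning("%s=%d below minimum %d; clamping", name, value, minimum)
        value = minimum
    if maximum is not None and value > maximum:
        log.warning("%s=%d above maximum %d; clamping", name, value, maximum)
        value = maximum
    return value


# Illustrative usage; the bounds are assumptions.
recv_window_ms = env_int("DOLPHIN_BINGX_RECV_WINDOW_MS", 5000, minimum=1, maximum=60000)
```

Whether a bad value should fall back with a warning or abort startup with a clear message is a policy choice; the sketch picks the former, and raising a descriptive error in the `except` branch would implement the latter.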

**Severity: Critical**

### W2: `DITA_V2_PREFIX` default `"dita_v2"` — multi-process with `ZINC=REAL` causes silent shared-memory data corruption

**File:** `launcher.py:311`

```python
resolved_prefix = (prefix or os.environ.get("DITA_V2_PREFIX", "dita_v2")).strip()
```

The prefix defaults to `"dita_v2"` if not set. When two processes on the same machine use `DITA_V2_ZINC=REAL` (or one process restarts without cleaning shared memory), both try to `SharedRegion.create("dita_v2_intent", ...)`, `"dita_v2_state"`, `"dita_v2_control"`.

On Linux, shared memory segments (`/dev/shm/`) persist until explicitly unlinked or the system reboots. A second process:
- Gets `EEXIST` from `SharedRegion.create()`, which `real_zinc_plane.py`'s `__init__` does NOT handle (no `try/except`)
- Or, if the region already exists with a different size, gets a mismatch error
- Or, if the region is simply opened (not created), shares it with the first process: both read and write the same memory, and **simultaneous writes corrupt the state**

The `_write_region` functions are non-atomic (T7), so two processes writing concurrently see partial updates.

**Severity: Critical**

### W3: Funding sign convention opposite between Python V2 `apply_funding()` and Rust `apply_funding_fee()` — same raw exchange value produces opposite capital effect

**Files:** `_rust_kernel/src/lib.rs:839-841` (Rust), `account.py:299` (Python V2)

```rust
// Rust: amount > 0 = received (capital ↑)
self.k_funding_net -= amount;
// If amount = -3.75 (paid out): k_funding_net -= (-3.75) = k_funding_net + 3.75
// k_capital = seed + realized - fees - k_funding_net = seed + realized - fees - (+3.75) = capital ↓
```

```python
# Python V2: amount > 0 = paid out (capital ↓)
self._k_funding += amount
self._k_capital = self._seed + self._k_realized - self._k_fees - self._k_funding
# If amount = -3.75 (paid out): _k_funding += (-3.75) = -3.75
# k_capital = seed + realized - fees - (-3.75) = seed + realized - fees + 3.75 = capital ↑ WRONG
```

Both use `k_capital = seed + realized - fees - funding`. But:
- **Rust**: A funding **payment** (capital decreases) is represented as a **negative** `amount`. `k_funding_net -= (-3.75) = +3.75`, then `capital - 3.75` = capital decreases. Correct.
- **Python V2**: A funding **payment** (capital decreases) is represented as a **positive** `amount`. `_k_funding += 3.75`, then `capital - 3.75` = capital decreases. Also correct for its own convention, but with the opposite sign convention.

**The same raw exchange value** (e.g., `funding_amount = -3.75` from BingX WS showing a funding cost):

| System | Input | `k_funding` | k_capital effect | Correct? |
|--------|-------|-------------|-----------------|----------|
| Rust | `-3.75` | `funding_net = 0 - (-3.75) = +3.75` | `capital - 3.75` (decrease) | ✅ |
| Python V2 | `-3.75` | `_k_funding = 0 + (-3.75) = -3.75` | `capital - (-3.75) = capital + 3.75` (increase) | ❌ WRONG |

The parity test (`test_exchange_event_seam_parity.py:426`) compares the WS path against the Poll path. Both use Python V2 `apply_funding()` with the same convention, so they match each other, but **both apply the funding with the wrong sign**. The Rust kernel produces the correctly signed value.
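The table's arithmetic can be checked directly. The sketch below mirrors the two update rules quoted above with a seed of 1000 and no realized PnL or fees; the function names and `to_python_v2_amount` are hypothetical, and the adapter shows only one possible place to normalize the sign.

```python
def rust_capital(seed, realized, fees, raw_amount):
    funding_net = 0.0
    funding_net -= raw_amount          # Rust convention: amount > 0 means received
    return seed + realized - fees - funding_net


def python_v2_capital(seed, realized, fees, amount):
    funding = 0.0
    funding += amount                  # Python V2 convention: amount > 0 means paid
    return seed + realized - fees - funding


def to_python_v2_amount(raw_amount):
    """Hypothetical adapter: flip the exchange sign before calling apply_funding()."""
    return -raw_amount


RAW = -3.75                            # exchange reports a funding cost
rust = rust_capital(1000.0, 0.0, 0.0, RAW)
py_raw = python_v2_capital(1000.0, 0.0, 0.0, RAW)
py_fixed = python_v2_capital(1000.0, 0.0, 0.0, to_python_v2_amount(RAW))
```

Here `rust` and `py_fixed` both come out at 996.25, while `py_raw` comes out at 1003.75, matching the two table rows.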

**Severity: Critical**

### W4: `BingxUserStream` `listenKeyExpired` frames silently swallowed — `continue` at line 272 skips the expiry check at line 275, dead code

**File:** `bingx_user_stream.py:272-276`

```python
# Line 272-276 — the main WS message dispatch:
kind = frame.get("e", "")
if kind in self._NORMALISE_MAP:
    yield self._NORMALISE_MAP[kind](frame)
else:
    continue  # <-- UNKNOWN event type → continue

# Line 275: THIS LINE IS NEVER REACHED for listenKeyExpired
if kind == "listenKeyExpired":
    raise RuntimeError("listenKeyExpired")
```

The `else: continue` at line 272 skips the `listenKeyExpired` check at line 275. When BingX sends `{"e": "listenKeyExpired"}`, the dispatch:
1. Check `kind in self._NORMALISE_MAP` — `"listenKeyExpired"` is NOT in the map
2. Falls to `else: continue` — skips the rest of the loop body
3. Line 275 is **dead code** — never reaches the `raise RuntimeError`

The stream stays connected with a dead listenKey. The keepalive loop (which runs independently) keeps sending PUT keepalive requests to the dead key. The 24-hour rotation timer eventually fires, but in the meantime (potentially hours), all WS events are silently lost.

The `raise RuntimeError("listenKeyExpired")` at line 276 was clearly intended to trigger a reconnect, but the `continue` before it makes it unreachable.
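A minimal sketch of a dispatch order that keeps the expiry check reachable. `dispatch` and its parameters are hypothetical stand-ins for the loop body in `bingx_user_stream.py`, and how the caller reconnects after the exception is out of scope here.

```python
def dispatch(frames, normalise_map):
    """Yield normalised events; raise on listenKeyExpired so the caller can reconnect."""
    for frame in frames:
        kind = frame.get("e", "")
        # Check the expiry frame first: it is not in the normalise map.
        if kind == "listenKeyExpired":
            raise RuntimeError("listenKeyExpired")
        handler = normalise_map.get(kind)
        if handler is None:
            continue                   # genuinely unknown event type
        yield handler(frame)
```

Placing the control-frame check ahead of the map lookup means that adding or removing entries in the map cannot make the check unreachable again.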

**Severity: Critical**

### W5: `int()` on `DOLPHIN_BINGX_RECV_WINDOW_MS` with no bounds check — extreme values can enable replay attacks

**File:** `launcher.py:189`

```python
recv_window_ms = int(os.environ.get("DOLPHIN_BINGX_RECV_WINDOW_MS", "5000"))
```

The recv window is used in HMAC-signed BingX requests as the `recvWindow` parameter. It defines the timestamp tolerance for signed requests — how far off the request timestamp can be from the server's clock.

A value like `86400000` (24 hours) means any signed request is valid for 24 hours from its timestamp. An attacker who intercepts a signed request can replay it for an entire day.

There's no upper bound. The code only clamps `max(1, recv_window_ms)` — so the minimum is 1ms but the maximum is unbounded.

**Severity: High**

### W6: `DITA_V2_ACTIVE_SLOT_LIMIT` stored in control snapshot but never enforced by Rust kernel — dead config value

**File:** `launcher.py:140-144` (read and stored), `_rust_kernel/src/lib.rs` (never checked)

```python
# launcher.py: stored in control snapshot
raw = os.environ.get("DITA_V2_ACTIVE_SLOT_LIMIT")
if raw is not None:
    fields["active_slot_limit"] = max(1, int(str(raw).strip()))
```

The env var is read, parsed, clamped, and stored into the control plane's `ControlSnapshot`. But the Rust kernel allocates `max_slots` slots at construction (from `ExecutionKernel.__init__`'s `max_slots` parameter) and **never reads** the `active_slot_limit` from the control snapshot.

The `active_slot_limit` field is written to projections (`hazelcast_projection.py:41` writes `control.as_dict()` which includes `active_slot_limit`) and visible in the control plane state, but the Rust kernel never limits slot usage based on it. An algorithm could send ENTER intents to any slot up to `max_slots` regardless of the configured limit.

**Severity: High**

### W7: No fill/trade history fetched during WS reconnect gap-backfill — fills during disconnect window permanently lost

**File:** `bingx_user_stream.py:117-121`

```python
try:
    snap = await self.account_snapshot()
    yield snap
except Exception as exc:
    log.warning("bingx_user_stream: gap-backfill REST failed: %s", exc)
```

`account_snapshot()` (lines 153-219) fetches:
- `GET /openApi/swap/v3/user/balance` — wallet balance, available margin
- `GET /openApi/swap/v2/user/positions` — open positions with entry price, qty

It does **NOT** fetch:
- `GET /openApi/swap/v2/trade/fill/history` — fill history during the gap

If a LIMIT order filled during the reconnect window (e.g., a resting limit order that was placed before disconnect and filled while the WS was down), the fill event is permanently lost. The balance snapshot shows the *result* (changed wallet balance), but no fill event with `price`, `qty`, `fee`, `realized_pnl` is emitted. The kernel processes only an `ACCOUNT_UPDATE`, missing the individual fill details.

Additionally, funding fee events accrued during the disconnect are invisible. The balance reflects them, but no `FUNDING_FEE` event arrives. The kernel's `k_funding_net` drifts until the next explicit funding event.

**Severity: High**

### W8: `BingxVenueAdapter` rate limit detection fails on HTTP 429 without matching message body — `_rate_limit_retry_after_ms` returns 0, instant retry

**File:** `bingx_venue.py:169-183`

The rate limit detection has three paths:
1. **Response header** (`retryAfter`, `retry_after_ms`, `retryAfterMs`) — extracted from response dict
2. **Error message regex** — `re.search(r"unblocked after (\d+)", msg)` on the exchange error text
3. **Return 0** — everything else

If BingX returns HTTP 429 with a message body that doesn't contain the phrase `"unblocked after"` (e.g., a generic `"too many requests"` or a localized message), the regex misses and returns 0. The caller then retries **immediately**, burning more rate limit quota.

The `BingxHttpError` catch at line 316-317 catches all HTTP errors (including 429) and converts them to `{"status": "REJECTED", ...}` — the `REJECTED` tag prevents the kernel from recognizing it as `RATE_LIMITED`. The rate-limit detection is entirely dependent on the error message body format, not the HTTP status code.

**Fix:** Check for HTTP 429 status code first, then fall back to message parsing.
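A hedged sketch of that fix: the explicit fields and the message regex supply the wait duration when available, and the status code alone is enough to classify the response as a rate limit. `retry_after_ms` as written here, its signature, and `DEFAULT_BACKOFF_MS` are assumptions rather than the adapter's actual code.

```python
import re

DEFAULT_BACKOFF_MS = 1000  # assumed fallback; not a value taken from the adapter


def retry_after_ms(status_code, payload, message):
    """Return a backoff in milliseconds; 0 means the response is not a rate limit."""
    hint = 0
    for key in ("retryAfter", "retry_after_ms", "retryAfterMs"):
        try:
            hint = max(int(payload.get(key)), 0)
        except (TypeError, ValueError):
            continue                   # field missing or not numeric
        if hint:
            break
    if not hint:
        match = re.search(r"unblocked after (\d+)", message or "")
        if match:
            hint = int(match.group(1))
    if hint:
        return hint
    # No usable hint: the status code alone still identifies a rate limit.
    return DEFAULT_BACKOFF_MS if status_code == 429 else 0
```

With this ordering a 429 carrying an unrecognized message body yields a non-zero backoff instead of an immediate retry.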

**Severity: High**

### W9: `DITA_V2_CONTROL_PLANE=REAL_ZINC` silently falls back to in-memory on any exception — operator thinks they have persistence but don't

**File:** `control.py:205-212`

```python
if env_choice in {"REAL", "REAL_ZINC", "SHARED", "SHARED_MEM"}:
    try:
        from .real_control_plane import RealZincControlPlane
        plane = RealZincControlPlane(prefix=prefix, create=True)
    except Exception:
        pass  # <-- silent fallback, no log
return ZincControlPlane(snapshot=snapshot)  # in-memory fallback
```

If `RealZincControlPlane()` raises (Zinc library not installed, shared memory creation fails, permission denied), the exception is silently swallowed and the function returns the in-memory `ZincControlPlane`. No log, no warning, no diagnostic. The operator configured a persistent shared-memory control plane but gets an ephemeral in-memory one.

Compare with `build_launcher_bundle()`'s `_build_zinc_plane()` (launcher.py:122-125) which also silently falls back — same pattern.
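A minimal sketch of a fallback that is at least visible, with an opt-in strict mode. `build_control_plane`, the factory parameters, and the `strict` flag are hypothetical; the real function in `control.py` takes different arguments.

```python
import logging

log = logging.getLogger(__name__)


def build_control_plane(real_factory, memory_factory, requested_real, strict=False):
    """Build the requested control plane; never fall back without a log record."""
    if requested_real:
        try:
            return real_factory()
        except Exception:
            if strict:
                raise                  # operator asked for persistence: fail fast
            log.exception(
                "control plane: REAL requested but unavailable; using in-memory fallback"
            )
    return memory_factory()
```

The same shape would apply to `_build_zinc_plane()`, since it follows the same silent-fallback pattern.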

**Severity: High**

### W10: `BingxVenueAdapter` HTTP error handling maps ALL `BingxHttpError` to `"REJECTED"` — cannot distinguish "order not found" from "exchange is down"

**File:** `bingx_venue.py:316-317`

```python
except BingxHttpError as exc:
    response = {"status": "REJECTED", "msg": str(exc), ...}
```

Every `BingxHttpError` is mapped to `status="REJECTED"`. A 500 Internal Server Error, a 403 forbidden, a 404 order-not-found — all become "REJECTED". The Rust kernel treats "REJECTED" as a specific FSM signal. It cannot distinguish "this cancel was rejected because the order doesn't exist" (harmless, order already cancelled) from "the exchange is returning 500 errors" (system-wide failure, should halt trading).

**Impact:** If BingX has a transient 500 error, every submit/cancel in that window returns "REJECTED". The kernel may interpret this as genuine order rejections and transition the FSM to CLOSED or trigger cancels, even though the orders may have actually gone through.
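One possible shape for a finer-grained mapping is sketched below. Only `REJECTED` and `RATE_LIMITED` appear in this analysis; `AUTH_FAILED`, `VENUE_UNAVAILABLE`, and `UNKNOWN` are hypothetical labels that the kernel would need matching handling for, and `classify_http_error` is not an existing function.

```python
def classify_http_error(status_code):
    """Map an HTTP status to a venue status tag (labels partly hypothetical)."""
    if status_code == 429:
        return "RATE_LIMITED"
    if status_code in (401, 403):
        return "AUTH_FAILED"           # credentials or permissions, not the order
    if 500 <= status_code < 600:
        return "VENUE_UNAVAILABLE"     # order outcome unknown; reconcile before acting
    if 400 <= status_code < 500:
        return "REJECTED"              # the request itself was refused
    return "UNKNOWN"
```

The important distinction is the 5xx branch: the order may or may not have been accepted, so treating it as a definite rejection is the unsafe case described above.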

**Severity: High**

### W11: `DOLPHIN_BINGX_API_KEY` accessed via bracket `os.environ["BINGX_API_KEY"]` in generated tests — `KeyError` crash if unset; inconsistent with launcher's `.get()` which silently passes `None`

**Files:** `gen2.py:320`, `gen_live_tests.py:116-117`, `_gen_test.py:60`

```python
# Generated test code (all three generators):
BINGX_API_KEY = os.environ["BINGX_API_KEY"]     # bracket access — KeyError if unset
BINGX_SECRET_KEY = os.environ["BINGX_SECRET_KEY"]
```

```python
# launcher.py:195-196 — different access pattern:
api_key=os.environ.get("BINGX_API_KEY"),          # .get() — returns None if unset
secret_key=os.environ.get("BINGX_SECRET_KEY"),
```

The test generators use **bracket access** (`os.environ["KEY"]`) which raises `KeyError` instantly if the env var is missing. The launcher uses **`.get()` access** (`os.environ.get("KEY")`) which silently returns `None`.

This means:
- Running generated tests without env vars → `KeyError` crash at module import time
- Running the launcher without env vars → `None` silently passed to `BingxExecClientConfig` → delayed failure at the first HTTP call (confusing 401)

Two different failure modes for the same missing configuration. The launcher should validate `None` immediately.

**Severity: High**

### W12: `MockVenueScenario` has no `rate_limit` flag — entire RATE_LIMITED code path untested in CI

**File:** `mock_venue.py:27-35`

```python
@dataclass
class MockVenueScenario:
    reject_entries: bool = False
    reject_exits: bool = False
    reject_cancels: bool = False
    all_fills_partial: bool = False
    # NOTE: no rate_limit field
```

The `MockVenueScenario` dataclass has flags for rejection and partial fill simulation but **no `rate_limit` flag**. The entire `RATE_LIMITED` code path — in the Python adapter (`bingx_venue.py:384-396` detects retry-after, tags event as RATE_LIMITED) and in the Rust kernel (the FSM match arm for `KernelEventKind::RATE_LIMITED`) — has zero simulation in mock venue tests.

Only a live BingX connection can trigger the rate-limit path, and the live tests are triple env-gated (V5) and never run in CI.

**Severity: Medium**

### W13: `_rate_limit_retry_after_ms` regex uses English phrase `"unblocked after"` — non-portable, fails on localized exchange messages

**File:** `bingx_venue.py:177`

```python
m = re.search(r"unblocked after (\d+)", msg)
```

The regex relies on the English phrase `"unblocked after"` in the exchange's error message. BingX serves users in several languages. If the error response is localized (e.g., to Chinese, `"解封后(\d+)毫秒"`), or if BingX changes its English wording, the regex silently misses and returns 0 (instant retry).

**Fix:** Prioritize the `Retry-After` HTTP response header or JSON field `retryAfter`/`retry_after_ms` over parsing the error message body.

**Severity: Medium**

### W14: `DITA_V2_ACTIVE_SLOT_LIMIT` invalid values silently discarded with no logging — operator sets `"abc"`, gets default with no warning

**File:** `launcher.py:140-144`

```python
raw = os.environ.get("DITA_V2_ACTIVE_SLOT_LIMIT")
if raw is not None:
    try:
        fields["active_slot_limit"] = max(1, int(str(raw).strip()))
    except Exception:
        pass  # no log, no warning
```

If the operator sets `DITA_V2_ACTIVE_SLOT_LIMIT=abc`, the `int()` raises `ValueError`, the `except` swallows it, and the field is never written to `fields`. The slot limit silently uses the control plane default (10). No log, no warning, no error — the operator thinks they configured a limit but the config was silently ignored.

**Severity: Medium**

---

## Pass 20 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| W1 | `int()` on 3 env vars uncaught `ValueError` — non-numeric input crashes process | Config | **Critical** |
| W2 | `DITA_V2_PREFIX` default `"dita_v2"` — multi-process shared memory corruption | Config | **Critical** |
| W3 | Funding sign opposite Python V2 vs Rust — same raw value opposite capital effect | Accounting | **Critical** |
| W4 | `listenKeyExpired` frames silently swallowed — `continue` skips expiry check, dead code | Venue | **Critical** |
| W5 | `RECV_WINDOW_MS` no upper bound — extreme values enable replay attacks | Config | **High** |
| W6 | `ACTIVE_SLOT_LIMIT` stored but never enforced by Rust kernel — dead config | Config | **High** |
| W7 | No fill history fetched during WS reconnect gap-backfill — fills permanently lost | Venue | **High** |
| W8 | Rate limit detection fails on HTTP 429 without matching message — returns 0 instant retry | Venue | **High** |
| W9 | `CONTROL_PLANE=REAL_ZINC` silently falls back to in-memory — no persistence | Config | **High** |
| W10 | All `BingxHttpError` mapped to "REJECTED" — can't distinguish errors from real rejections | Venue | **High** |
| W11 | `os.environ["KEY"]` bracket access in tests vs `.get()` in launcher — inconsistent | Test | **High** |
| W12 | `MockVenueScenario` no `rate_limit` flag — RATE_LIMITED path untested in CI | Test | Medium |
| W13 | Rate-limit regex uses English phrase `"unblocked after"` — non-portable | Venue | Medium |
| W14 | Invalid `ACTIVE_SLOT_LIMIT` values silently discarded — no log, no warning | Config | Medium |

### Pass 20 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 4 (W1, W2, W3, W4) |
| **High** | 7 (W5, W6, W7, W8, W9, W10, W11) |
| Medium | 3 (W12, W13, W14) |

### Combined Catalog (All 20 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
| **Total** | | **361** | **39** | **108** | **108** | **73** | **33** |

---

## PASS 21 — RUST BUILD/DEPS, PYTHON PACKAGING, SHARED MEMORY PROTOCOL

### X1: Critical — No ABI compatibility check on Rust `.so` load — stale/wrong binary can crash or silently corrupt state

**File:** `rust_backend.py:86-92`

```python
path = _ensure_library()
self.lib = ctypes.CDLL(str(path))
```

The Python code loads whatever `.so`/`.dylib` exists at the computed path with **zero verification**. Problems:

1. **No Rust version check**: If the `.so` was built with a different Rust compiler version that changed struct layout, data is silently corrupted.
2. **No recompile-on-version-mismatch**: If `Cargo.lock` is updated, the old `.so` is used until manually deleted.
3. **No hash/checksum**: No mechanism to detect stale binary, wrong branch, or tampering.
4. **No `#[repr(C)]` on internal types**: Only `KernelHandle` has `#[repr(C)]`. Serde JSON is the FFI wire format, which is type-safe, but the `Box::from_raw(handle)` in `dita_kernel_destroy` assumes exact same memory layout.
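Point 3 could be addressed with a digest check before the library is loaded. The sketch below assumes the expected SHA-256 is recorded somewhere at build time (for example in a sidecar file); that mechanism, and the helper names `file_sha256` and `verify_library`, are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_library(path, expected_sha256):
    """Return the path if its digest matches; raise before anything is loaded."""
    actual = file_sha256(path)
    if actual != expected_sha256.lower():
        raise RuntimeError(f"kernel library {path} has unexpected SHA-256 {actual}")
    return Path(path)
```

A caller would then load the library with `ctypes.CDLL(str(verify_library(path, expected)))`. A digest detects a stale or tampered binary but says nothing about interface compatibility, which would need a version symbol exported by the library itself.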

**Severity: Critical**

### X2: Critical — `real_zinc_plane._write_region()` zeroes entire buffer before writing — visible all-zero window, inconsistent with real_control_plane

**File:** `real_zinc_plane.py:258-260`

```python
# real_zinc_plane.py — zero THEN write:
view[:] = b"\x00" * len(view)          # Zero entire 1MB buffer
view[:len(packet)] = packet            # Then write packet
```

```python
# real_control_plane.py — write THEN zero tail:
view[:len(packet)] = packet            # Write packet first
view[len(packet):] = b"\x00" * (len(view) - len(packet))  # Then zero tail
```

Two different implementations for the same operation. The zinc plane zeros the full buffer (1MB allocation and memcpy) **before** writing the packet. During the window between zero and write, a concurrent reader sees all zeros → `_decode_packet` returns `{}` (empty dict). Reader gets stale/wrong state.

The control plane correctly writes the packet first then zeros the tail — minimizing the visible window and avoiding the wasteful full-buffer zero.

Additionally, the full-buffer zero `b"\x00" * 1MB` allocates and copies 1MB for every write, even though the packet is typically <1KB. Performance issue.
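A sketch of a write that avoids both the all-zero window and the full-buffer allocation by clearing only the bytes left over from the previous, longer packet. `write_packet` and the `previous_len` bookkeeping are hypothetical; the write is still not atomic, so a reader can observe a partially written packet unless a sequence or length check is added.

```python
def write_packet(view, packet, previous_len):
    """Write `packet` at the start of `view`, then clear only stale trailing bytes.

    Returns the new packet length so the caller can pass it to the next write.
    """
    if len(packet) > len(view):
        raise ValueError("packet larger than region")
    view[:len(packet)] = packet        # new data lands first
    stale = previous_len - len(packet)
    if stale > 0:
        # Only the remainder of the previous packet needs zeroing.
        view[len(packet):previous_len] = bytes(stale)
    return len(packet)
```

When attaching to a region whose previous contents are unknown, passing `len(view)` as `previous_len` for the first write clears the whole tail once.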

**Severity: Critical**

### X3: Critical — No `requirements.txt`, `setup.py`, or `pyproject.toml` — zero Python dependency declarations

**File:** (missing — workspace root)

The workspace has **no Python dependency management files at all**. No `requirements.txt`, `setup.py`, `setup.cfg`, `pyproject.toml`, `Pipfile`, or `poetry.lock`.

**Undocumented external dependencies:**
- `aiohttp` — used by `bingx_user_stream.py`
- `requests` — used by `gen_live_tests.py`
- `python-dotenv` — used by `launcher.py`
- `pytest` — used by all test files and generators
- `zinc` (SharedRegion C extension) — used by `real_zinc_plane.py`, `real_control_plane.py`
- `prod.bingx.*` — 3+ modules outside workspace
- `prod.clean_arch.*` — 5+ modules outside workspace

Without a requirements file:
- No pinned versions → build non-reproducible
- A fresh machine has no manifest to `pip install` from, so the code runs against whatever packages happen to be present
- Version conflicts between environments cause silent behavior changes
- CI cannot install dependencies deterministically

**Severity: Critical**

### X4: High — `RealZincControlPlane.update()` has no thread lock — concurrent calls corrupt sequence number and shared memory

**File:** `real_control_plane.py:98-99`

```python
# No lock on RealZincControlPlane (unlike RealZincPlane which has self._lock)
def update(self) -> None:
    self._seq += 1                       # race: two threads read 5, both write 6
    self._write_region(self._seq, self._snapshot.as_dict())  # race: both write seq=6
```

`RealZincPlane` (real_zinc_plane.py:154) has a `threading.Lock` and uses `with self._lock:` around all write operations. `RealZincControlPlane` has **no lock**. If two threads call `update()` simultaneously:

1. Both read `self._seq = 5`, both increment to `6`, both write with `seq=6` → one write is lost
2. Both call `_write_region` simultaneously → concurrent writes to shared memory → data corruption
3. The sequence number under-counts: two calls advance it `5→6` instead of `5→7`, and only one write is visible to readers
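
The fix can be sketched as a single lock around the increment and the write. `LockedPublisher` and its `write` callback are hypothetical stand-ins for `RealZincControlPlane` and `_write_region`.

```python
import threading
from typing import Any, Callable, Dict


class LockedPublisher:
    """Serializes the sequence increment and the region write under one lock."""

    def __init__(self, write: Callable[[int, Dict[str, Any]], None]) -> None:
        self._write = write
        self._seq = 0
        self._lock = threading.Lock()

    def update(self, snapshot: Dict[str, Any]) -> int:
        with self._lock:
            self._seq += 1                      # read-modify-write is now exclusive
            self._write(self._seq, snapshot)    # no second writer can interleave
            return self._seq
```

Holding the lock across the write, not only the increment, is what keeps two packets from interleaving in the region.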

**Severity: High**

### X5: High — `libc` declared in `Cargo.toml` but never used — dead dependency

**File:** `_rust_kernel/Cargo.toml:8`

```toml
[dependencies]
libc = "0.2"
```

The `libc` crate is declared as a dependency but `grep 'libc' src/lib.rs` returns **zero matches**. The code uses `std::ffi::{c_char, CStr, CString}` from the standard library (`std::ffi::c_char` has been stable since Rust 1.64), not `libc::c_char`.

Not harmful at runtime (compiler optimizes it out), but:
- Dead dependency to maintain (version bumps, audit)
- Adds to supply chain attack surface
- Indicates refactoring residue from an earlier version that used `libc` types directly

**Severity: High**

### X6: High — 5 test files use hardcoded `sys.path.insert(0, "/mnt/dolphinng5_predict")` — non-portable, environment-specific path

**Files:** `test_flaws.py:13`, `test_account_core_v2.py:16`, `test_account_reconcile_faults.py:15`, `test_alpha_blue_untouched_g7.py:13`, `test_exchange_event_seam_parity.py` (similar)

Every mock-venue test file prepends `/mnt/dolphinng5_predict` to `sys.path` using a **hardcoded absolute path**. This path is specific to the current deployment machine. On any other machine, these tests fail with `ModuleNotFoundError` for `prod.*` imports.

`real_zinc_plane.py:13-14` also adds a Zinc adapter path, using `Path(__file__).resolve().parents[3]`. This is relative (better) but assumes a rigid directory structure.
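
One portable alternative, sketched under assumptions: search upward from the test file for the directory that contains the `prod` package, with an environment variable as an explicit override. The helper name `add_repo_root` and the variable `DITA_REPO_ROOT` are hypothetical.

```python
import os
import sys
from pathlib import Path


def add_repo_root(start: Path, env_var: str = "DITA_REPO_ROOT", marker: str = "prod") -> Path:
    """Put the directory containing `marker/` on sys.path and return it.

    An explicit override in env_var wins; otherwise the parents of
    `start` are searched from the nearest one upward.
    """
    override = os.environ.get(env_var)
    if override:
        root = Path(override).resolve()
    else:
        candidates = Path(start).resolve().parents
        root = next((p for p in candidates if (p / marker).is_dir()), None)
        if root is None:
            raise RuntimeError(f"no parent of {start} contains {marker}/")
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

A test file would call `add_repo_root(Path(__file__))` in place of the hardcoded `sys.path.insert`. This removes the machine-specific path without depending on a fixed directory depth.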

**Severity: High**

### X7: High — Shared memory `_decode_packet()` has no try/except on `json.loads` — partial body read causes unhandled `JSONDecodeError`, crashes reader

**File:** `real_zinc_plane.py:120-130`, `real_control_plane.py:53-63`

```python
def _decode_packet(buf: memoryview) -> Dict[str, Any]:
    if len(buf) < 16: return {}
    seq, size = struct.unpack_from("!QQ", buf, 0)
    if size <= 0 or size > len(buf) - 16: return {}
    payload = bytes(buf[16 : 16 + size]).decode("utf-8")
    out = json.loads(payload)       # NO try/except — crash on partial body
    if isinstance(out, dict): out["_seq"] = seq
    return out
```

If a reader reads the shared memory at the exact moment when the 16-byte header is written but the JSON body is partially written (or not yet written), `json.loads()` receives truncated data and raises `json.JSONDecodeError`. This is **not caught** — the exception propagates up through all read paths:

- `RealZincPlane.read_slots()` → crash
- `RealZincPlane.read_intents()` → crash
- `RealZincPlane.read_control()` → crash
- `RealZincControlPlane.read()` → crash
- `RealZincPlane.__init__` open path → crash during init

The header size check (`size > len(buf) - 16`) prevents reading beyond buffer bounds, but it doesn't prevent reading incomplete body data. The writer writes header+body in a single memcpy, which is not atomic with respect to a concurrent reader. On x86-64, where stores generally become visible in program order, the torn-read window is narrow; on ARM's weaker memory model the stores can be observed in any order, which widens it.
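
A defensive variant of the decoder, as a sketch. It keeps the 16-byte `!QQ` header from the snippet above and treats any undecodable body as "no packet". Returning `{}` for a non-dict payload is a choice made here, not the current behavior, and `decode_packet_safe` is a hypothetical name.

```python
import json
import struct
from typing import Any, Dict

HEADER = struct.Struct("!QQ")  # seq, body size (same layout as the original)


def decode_packet_safe(buf) -> Dict[str, Any]:
    """Decode one packet; return {} for short, empty, torn, or non-dict data."""
    if len(buf) < HEADER.size:
        return {}
    seq, size = HEADER.unpack_from(buf, 0)
    if size == 0 or size > len(buf) - HEADER.size:
        return {}
    body = bytes(buf[HEADER.size : HEADER.size + size])
    try:
        out = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}                      # torn or partial body: treat as no packet
    if not isinstance(out, dict):
        return {}
    out["_seq"] = seq
    return out
```

A caller that receives `{}` can keep its last good state and retry on the next poll, which is safer than letting the exception escape from `read_slots()` or `__init__`.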

**Severity: High**

### X8: High — `ExchangeEvent` and `ExchangeEventKind` not exported from `__init__.py` — package API inconsistency

**File:** `__init__.py:44-88`

The `__init__.py` exports 45+ names from 12 sub-modules but does **not** export `ExchangeEvent`, `ExchangeEventKind`, or `ExchangePosition`. Consumers import them directly via the raw module path:
```python
from prod.clean_arch.dita_v2.exchange_event import ExchangeEvent
```

This is a package hygiene violation. Type checkers and IDE autocomplete cannot resolve these types from the package root. If the module is restructured (e.g., `exchange_event.py` renamed to `seam.py`), every direct import fails with `ModuleNotFoundError`, and there is no package-level re-export to absorb the rename.

**Severity: High**

### X9: Medium — No MSRV (`rust-version`) in `Cargo.toml`, no `rust-toolchain.toml` — builds differ per Rust version

**File:** `_rust_kernel/Cargo.toml`

```toml
[package]
edition = "2021"
# NO rust-version field
```

No `rust-toolchain.toml`, no CI config to pin a Rust version. `cargo build` uses whatever Rust version is on the builder. Cross-machine, cross-developer, cross-deployment builds can produce different binaries.

The code uses `std::ffi::c_char` (stabilized in Rust 1.64), so building with <1.64 fails. Any version >=1.64 could still produce slightly different codegen. More importantly, if a `.so` built with one Rust version is loaded by a Python process that expects the build from another, the ABI may differ.

**Severity: Medium**

### X10: Medium — RealZincPlane and RealZincControlPlane both use `{prefix}_control` region name — collision when both are REAL

**Files:** `real_zinc_plane.py:153`, `real_control_plane.py:72`

```python
# real_zinc_plane.py:
self.control_region = SharedRegion.create(f"{base}_control", 4096)

# real_control_plane.py (via region_name):
self.region = SharedRegion.create(f"{base}_control", ...)
```

When both `DITA_V2_ZINC=REAL` and `DITA_V2_CONTROL_PLANE=REAL` are set, the launcher creates both `RealZincPlane(prefix="dita_v2")` and `RealZincControlPlane(prefix="dita_v2")`. Both create/open a shared memory region named `"dita_v2_control"`. They write different payload structures to the same region — one overwrites the other's data.

**Severity: Medium**

### X11: Medium — Sequence number (`_seq`) is decoded and injected into output dict but never read by any consumer — transmitted waste

**Files:** `real_zinc_plane.py:128`, `real_control_plane.py:61`

```python
out["_seq"] = seq  # written to output dict
```

The sequence number is packed into the 16-byte header, transmitted, decoded, and injected into the output dict — but **no consumer ever reads `"_seq"`**:
- `RealZincPlane.read_slots()` reads `payload.get("slots", [])` — ignores `_seq`
- `RealZincPlane.read_intents()` reads `payload.get("items", [])` — ignores `_seq`
- `RealZincControlPlane.read()` reads `payload.get("control")` — ignores `_seq`

No gap detection, no staleness check, no ordering verification. The sequence number is dead data on the wire.
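
What a consumer of `_seq` could look like, as a sketch. `SeqTracker` and its status strings are hypothetical; the only input taken from the source is that each decoded payload carries an integer `_seq`.

```python
from typing import Optional


class SeqTracker:
    """Classifies each observed sequence number against the last one seen."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def observe(self, seq: int) -> str:
        prev = self.last
        if prev is None:
            status = "first"
        elif seq == prev:
            status = "stale"        # same packet read again
        elif seq == prev + 1:
            status = "ok"
        elif seq > prev:
            status = "gap"          # at least one packet was never seen
        else:
            status = "regressed"    # writer restarted or region was overwritten
        if prev is None or seq > prev:
            self.last = seq
        return status
```

A reader would call `tracker.observe(payload["_seq"])` after each decode and log or act on anything other than `"ok"`.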

**Severity: Medium**

### X12: Medium — `_maybe_close()` uses `ThreadPoolExecutor` + `result(timeout=10.0)` — `TimeoutError` swallowed, coroutine left stranded

**File:** `launcher.py:63-65`

```python
pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
fut = pool.submit(asyncio.run, result)
try:
    fut.result(timeout=10.0)       # TimeoutError if >10s
except Exception:
    pass                           # catches TimeoutError but coroutine still running
```

If the async `close()`/`disconnect()` coroutine takes longer than 10 seconds, `fut.result(timeout=10.0)` raises `TimeoutError`. The `except Exception: pass` catches it, but the coroutine is **still running** in the thread pool. When it eventually completes, it may still mutate state (a `self._closed` event or similar attribute) on an object the caller has already discarded.

A new `ThreadPoolExecutor(1)` is created on every `_maybe_close` call and never shut down: there is no `with` block and no `.shutdown()` call. Closing several components in sequence therefore leaks one executor per call.
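
One way to bound the close without stranding the coroutine, sketched under assumptions: the timeout moves inside the event loop via `asyncio.wait_for`, which cancels the coroutine on expiry, and the executor is scoped with `with`. `close_with_timeout` is a hypothetical name. Unlike the original, this sketch lets exceptions other than the timeout propagate.

```python
import asyncio
import concurrent.futures
from typing import Awaitable


def close_with_timeout(coro: Awaitable[None], timeout: float = 10.0) -> bool:
    """Run coro on a private loop in a worker thread; cancel it after timeout.

    Returns True if the coroutine finished, False if it was cancelled.
    """
    async def _bounded() -> None:
        # wait_for cancels the inner task before raising the timeout error
        await asyncio.wait_for(coro, timeout)

    # the with-block shuts the executor down once the worker has returned
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        try:
            pool.submit(asyncio.run, _bounded()).result()
            return True
        except asyncio.TimeoutError:
            return False
```

A coroutine that suppresses cancellation would still block here, so this bounds well-behaved `close()` implementations only.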

**Severity: Medium**

### X13: Medium — `__init__.py` re-exports 45 names from 12 modules — flat namespace risks naming collisions

**File:** `__init__.py:44-88`

The `__init__.py` flattens all imports into a single namespace. Examples:
- `BingxVenueAdapter` (from `.bingx_venue`) and `MockVenueAdapter` (from `.mock_venue`) — no collision
- `RealZincPlane` (from `.real_zinc_plane`) and `RealZincUnavailable` (from both `.real_zinc_plane` and `.real_control_plane` via alias) — **the alias `RealZincControlUnavailable` avoids this but shows the risk**

If any two sub-modules export the same name, the second import silently overwrites the first. No warning is raised.
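
A collision check of this kind could run in a unit test. The sketch below assumes the export lists are available as plain data; the helper name `find_export_collisions` is hypothetical, and the sample input mirrors the `RealZincUnavailable` case described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def find_export_collisions(exports: Mapping[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each name exported by more than one module to the modules exporting it."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for module, names in exports.items():
        for name in names:
            owners[name].append(module)
    return {name: mods for name, mods in owners.items() if len(mods) > 1}


collisions = find_export_collisions({
    ".real_zinc_plane": ["RealZincPlane", "RealZincUnavailable"],
    ".real_control_plane": ["RealZincControlPlane", "RealZincUnavailable"],
})
```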

**Severity: Medium**

### X14: Medium — `real_control_plane.close()` is not idempotent, no `_closed` guard — double-close depends on C extension behavior

**File:** `real_control_plane.py:85-86`, `real_zinc_plane.py:187-190`

```python
# Both implementations:
def close(self) -> None:
    self.intent_region.close()    # or self.region.close()
    # no _closed flag, no guard
```

Neither `RealZincPlane.close()` nor `RealZincControlPlane.close()` has a `_closed` guard. Calling `close()` twice calls `SharedRegion.close()` twice on the same region. The Zinc library's C extension behavior on double-close is unknown — it could segfault (use-after-free pattern common with C extensions) or silently return successfully. No Python-side protection.

Additionally, `close()` does not clear Python-level caches (`_slot_cache`, `_intent_cache`, `_control_cache`). After closing, stale data is still accessible from the cache.
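
A sketch of the guard, with hypothetical names throughout: `GuardedPlane` stands in for either plane class, `regions` for its `SharedRegion` handles, and `caches` for the dict caches named above.

```python
from typing import Any, Dict, Iterable, List


class GuardedPlane:
    """close() runs its body once; later calls are no-ops."""

    def __init__(self, regions: Iterable[Any], caches: Iterable[Dict[Any, Any]]) -> None:
        self._regions: List[Any] = list(regions)
        self._caches: List[Dict[Any, Any]] = list(caches)
        self._closed = False

    def close(self) -> None:
        if self._closed:
            return                  # second call never reaches the C extension
        self._closed = True
        for cache in self._caches:
            cache.clear()           # drop stale Python-side state
        for region in self._regions:
            region.close()
```

Setting the flag before closing the regions means a failure partway through is not retried; whether that is the right trade-off depends on how `SharedRegion.close()` fails.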

**Severity: Medium**

---

## Pass 21 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| X1 | No ABI compatibility check on Rust `.so` load — stale binary corrupts silently | Bridge | **Critical** |
| X2 | `real_zinc_plane._write_region()` zeroes entire buffer before write — visible all-zero window | Plane | **Critical** |
| X3 | No `requirements.txt`/`setup.py`/`pyproject.toml` — zero Python dependency declarations | Build | **Critical** |
| X4 | `RealZincControlPlane.update()` no thread lock — concurrent calls corrupt seq and shared memory | Plane | **High** |
| X5 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | **High** |
| X6 | 5 test files use hardcoded `sys.path.insert(0, "/mnt/dolphinng5_predict")` — non-portable | Test | **High** |
| X7 | `_decode_packet()` no try/except on `json.loads` — partial body read crashes reader | Plane | **High** |
| X8 | `ExchangeEvent`/`ExchangeEventKind` not exported from `__init__.py` — package API inconsistency | Bridge | **High** |
| X9 | No MSRV or `rust-toolchain.toml` — builds differ per Rust version | Rust | Medium |
| X10 | `RealZincPlane` and `RealZincControlPlane` collide on `{prefix}_control` region name | Plane | Medium |
| X11 | Sequence number decoded but never read by any consumer — dead data on wire | Plane | Medium |
| X12 | `_maybe_close()` `fut.result(timeout=10.0)` — `TimeoutError` leaves coroutine stranded, executor leaks | Launcher | Medium |
| X13 | `__init__.py` flat re-exports 45 names — naming collision risk | Bridge | Medium |
| X14 | `close()` not idempotent on RealZincPlane/RealZincControlPlane — double-close risk | Plane | Medium |

### Pass 21 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 3 (X1, X2, X3) |
| **High** | 5 (X4, X5, X6, X7, X8) |
| Medium | 6 (X9, X10, X11, X12, X13, X14) |

### Combined Catalog (All 21 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
| **Total** | | **375** | **42** | **113** | **114** | **73** | **33** |

---

## PASS 22 — SERDE FIELD-BY-FIELD ROUND-TRIP, MOCK VENUE FIDELITY GAPS, PROTOCOL CONSISTENCY

### Y1: `TradeSlot.exit_leg_ratios` in Rust kernel — serde default `[]` vs struct default `vec![1.0]` — mismatch produces wrong ratio on JSON-deserialized slots

**File:** `_rust_kernel/src/lib.rs:346-400` (TradeSlot)

```rust
// serde default — when field is missing from incoming JSON:
#[serde(default)] exit_leg_ratios: Vec<f64>
// → Default::default() produces [] (empty vec)

// struct Default impl:
impl Default for TradeSlot { ... exit_leg_ratios: vec![1.0] ... }
```

When a `TradeSlot` is deserialized from JSON without an `exit_leg_ratios` field:
- `serde(default)` gives `[]` (empty)
- The struct's own `Default` gives `vec![1.0]`

The `next_exit_ratio()` function uses `.unwrap_or(1.0)` so an empty vec returns `1.0` — same as `[1.0]` would at `active_leg_index=0`. But if `active_leg_index` were ever non-zero during deserialization (possible with manually constructed or restored slots), the empty vec would silently return `1.0` for every subsequent leg instead of the correct ratio.

**Trigger scenario:** A slot restored from a snapshot that was saved with `active_leg_index=2` and 3 leg ratios, but the JSON has no `exit_leg_ratios` field (e.g., from an older version). `active_leg_index=2`, `exit_leg_ratios=[]` → `exit_leg_ratios.get(2) = None` → `unwrap_or(1.0)` → wrong ratio for leg 3.
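
The trigger scenario can be reproduced with a small Python model of the lookup. This models the described `.get(index)` plus `unwrap_or(1.0)` behavior and is not the kernel code; the three ratios are invented for illustration.

```python
from typing import Sequence


def next_exit_ratio(exit_leg_ratios: Sequence[float], active_leg_index: int) -> float:
    """Model of the described lookup: element at the index, else 1.0."""
    if 0 <= active_leg_index < len(exit_leg_ratios):
        return exit_leg_ratios[active_leg_index]
    return 1.0


saved = next_exit_ratio([0.5, 0.3, 0.2], 2)   # ratios present in the snapshot
restored = next_exit_ratio([], 2)             # field missing, serde default []
```

Here `saved` is `0.2` while `restored` is `1.0`: the third leg would exit the whole remaining position instead of its configured share.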

**Severity: Medium**

### Y2: `KernelIntent.slot_id` is `i64` while `TradeSlot.slot_id` and `KernelTransition.slot_id` are `usize` — signed/unsigned inconsistency within Rust

**File:** `_rust_kernel/src/lib.rs:419,346,470`

```rust
pub struct KernelIntent {
    pub slot_id: i64,       // signed — can be negative
    ...
}

pub struct TradeSlot {
    pub slot_id: usize,     // unsigned
    ...
}

pub struct KernelTransition {
    pub slot_id: usize,     // unsigned
    ...
}
```

`KernelIntent` (input from Python) uses `i64`, while `TradeSlot` and `KernelTransition` use `usize`. The kernel guards with `if slot_id < 0` before casting to `usize` in `resolve_slot()`, so a negative `slot_id` is correctly rejected. But this inconsistency means:
- A large `slot_id` from Python (> 2^63-1) can't be represented in `i64` → serde error or truncation
- A very large `slot_id` in a `TradeSlot` (> 2^63-1 on 64-bit) survives in `KernelResult.outcome.slot_id` (`usize`, returned to Python as an int) but cannot be represented in `VenueEvent.slot_id` (`i64`, overflow risk)

Python `int` is arbitrary precision, so it can send any value. The `i64` vs `usize` inconsistency means the boundary between input and internal types has a silent truncation risk for unrealistic slot counts (>9 quintillion).
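
A Python-side range check would close the gap before the value reaches serde. This is a sketch; `checked_slot_id` is a hypothetical helper, not an existing bridge function.

```python
I64_MAX = 2**63 - 1


def checked_slot_id(slot_id: int) -> int:
    """Return slot_id if it fits both the i64 and usize fields, else raise."""
    if isinstance(slot_id, bool) or not isinstance(slot_id, int):
        raise TypeError(f"slot_id must be int, got {type(slot_id).__name__}")
    if not 0 <= slot_id <= I64_MAX:
        raise ValueError(f"slot_id {slot_id} outside 0..{I64_MAX}")
    return slot_id
```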

**Severity: Low**

### Y3: `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` — third-party JSON producers without `stage` field get wrong default

**File:** `_rust_kernel/src/lib.rs:419` (KernelIntent), `contracts.py:225` (Python KernelIntent)

```rust
// Rust serde default — when stage is absent from incoming JSON:
#[serde(default)] stage: TradeStage,
// → Default::default() → TradeStage::IDLE
```

```python
# Python dataclass default:
stage: TradeStage = TradeStage.INTENT_CREATED   # different!
```

The Python `_intent_to_payload()` always explicitly writes `"stage": intent.stage.value`, so this mismatch never triggers in normal operation. But if a third-party JSON producer (future REST API, another kernel, a test helper) sends a `KernelIntent` without a `stage` field, Rust interprets it as `IDLE` instead of `INTENT_CREATED`.

The kernel's FSM logic doesn't use `stage` for any decision — it's only recorded in transitions and the outcome. So functionally both `IDLE` and `INTENT_CREATED` produce identical behavior. But a monitoring dashboard that displays `stage=IDLE` on a brand-new intent would be confusing.

**Severity: Low**

### Y4: `AccountState` injects JSON key `"k_net_fees"` as duplicate of serde-serialized `k_fees_paid` — two names for same value

**File:** `_rust_kernel/src/lib.rs:1088-1092`

```rust
// Inside on_account_event — manually injects a duplicate key:
obj.insert("k_net_fees".to_string(), json!(self.account.k_fees_paid));
```

The `AccountState` struct already has `k_fees_paid: f64` which is serialized by `#[derive(Serialize)]`. Then the `on_account_event` handler manually injects `"k_net_fees"` with the same value from `self.account.k_fees_paid`.

Python reads `"k_fees_paid"` (rust_backend.py:907) — never reads `"k_net_fees"`. The injected key is dead data on the wire. If Python were ever changed to look for `"k_net_fees"`, it would find the same value — but the dual naming creates confusion about which key is canonical.

**Severity: Low**

### Y5: `await asyncio.sleep(0.8)` in every generated test body — timing-dependent false negatives on slow CI, and false positives when fills arrive late

**File:** (all generated test bodies across `gen2.py`, `gen_live_tests.py`, `_gen_test.py`)

Every generated test body follows the pattern:
```python
_si(k, E.ENTER, tid, sym, "LONG", p, 0.001); await asyncio.sleep(0.8)
```

The `0.8` second sleep assumes the venue produces the fill event within 0.8 seconds:
- The mock venue processes fills synchronously in the same event-loop iteration, so `0.8` is always enough
- A live exchange with real latency, or a loaded CI system, can deliver the fill after `0.8s` → the EXIT intent hits `SLOT_BUSY` → test fails

This is flaw S4, not a separate finding. It is recorded here to note the scale: **all ~295 generated test scenarios** across all three generators use this pattern.
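
A deadline-based wait is the usual replacement for a fixed sleep. The sketch below is generic; `wait_until` and the `predicate` callback are hypothetical, and a generated test would pass a check on the slot state as the predicate.

```python
import asyncio
import time
from typing import Callable


async def wait_until(predicate: Callable[[], bool],
                     timeout: float = 5.0,
                     interval: float = 0.02) -> bool:
    """Poll predicate until it is true or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)
```

With the synchronous mock this returns on the first poll, and with a slow venue it waits up to `timeout` instead of failing at a fixed 0.8 seconds.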

**Severity: Critical** (already logged as S4)

### Y6: MockVenueAdapter has no `rate_limit` flag — RATE_LIMITED code path in both Python bridge and Rust kernel has zero test coverage

**File:** `mock_venue.py:27-35` (MockVenueScenario)

```python
@dataclass(frozen=True)
class MockVenueScenario:
    reject_entries: bool = False
    reject_exits: bool = False
    reject_cancels: bool = False
    all_fills_partial: bool = False
    # NOTE: no rate_limit field
```

The `MockVenueScenario` dataclass has flags for rejection and partial fill simulation but **no `rate_limit` flag**. The real adapter (`BingxVenueAdapter`) produces `RATE_LIMITED` venue events from three code paths:
1. `_events_from_submit()` — when receipt status is `"RATE_LIMITED"` or `"THROTTLED"`
2. `_events_from_cancel()` — same status check
3. `_http_error_status()` — maps HTTP 429, 5xx, transport errors to RATE_LIMITED

The Rust kernel has a full FSM path for `RATE_LIMITED`: `accepted=false, diagnostic_code=RATE_LIMITED`. This path has never fired in any test. Adding a `rate_limit: bool = False` flag to `MockVenueScenario` and a corresponding branch in `submit()`/`cancel()` would cost ~10 lines and enable testing the entire RATE_LIMITED pipeline.
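
The proposed flag could look roughly like this. The sketch is standalone: `ScenarioSketch` copies the four fields shown above and adds the hypothetical `rate_limit`, and plain dicts stand in for `VenueEvent`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScenarioSketch:
    reject_entries: bool = False
    reject_exits: bool = False
    reject_cancels: bool = False
    all_fills_partial: bool = False
    rate_limit: bool = False          # hypothetical new flag


def submit_sketch(scenario: ScenarioSketch, trade_id: str) -> List[Dict[str, str]]:
    """Return a RATE_LIMITED event instead of an ack when the flag is set."""
    if scenario.rate_limit:
        return [{"kind": "RATE_LIMITED", "trade_id": trade_id}]
    return [{"kind": "ORDER_ACK", "trade_id": trade_id}]
```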

**Severity: High** (already logged as W12, but worth noting the specific gap)

### Y7: MockVenueAdapter `reconcile()` returns `[]` always — cannot simulate late fills, stale orders, or exchange-state divergence

**File:** `mock_venue.py:150-155`

```python
def reconcile(self) -> List[VenueEvent]:
    return []  # always returns empty
```

The real adapter's `reconcile()` calls `_backend_snapshot()` which fetches the full exchange state (open orders, fills, balance) and compares against known state. This is how late fills are detected — an exchange response includes fills that occurred during a WS disconnect window.

The mock's `reconcile()` returns nothing. Any code path that depends on `reconcile()` to discover fills (flaw B1: "no fill history fetched during WS reconnect gap-backfill") is completely untested with the mock.
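
A mock can simulate late fills with a queue that `reconcile()` drains. This is a sketch: `ReconcilingMock` and `queue_late_event` are hypothetical, and dicts stand in for `VenueEvent`.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class ReconcilingMock:
    """Holds events a test wants to surface only on the next reconcile()."""

    def __init__(self) -> None:
        self._late: Deque[Dict[str, Any]] = deque()

    def queue_late_event(self, event: Dict[str, Any]) -> None:
        self._late.append(event)

    def reconcile(self) -> List[Dict[str, Any]]:
        events = list(self._late)     # oldest first
        self._late.clear()            # each event is delivered once
        return events
```

A test would queue a fill, simulate the disconnect window, then assert that the kernel applies the fill after `reconcile()`.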

**Severity: High**

### Y8: MockVenueAdapter emits exactly one fill per `submit()` — cannot test multi-partial-fill accumulation for LIMIT orders

**File:** `mock_venue.py:87-120`

```python
def submit(self, intent: Intent) -> List[VenueEvent]:
    ...
    if scenario.emit_fill_on_submit or fill_ratio > 0:
        events.append(self._event_from_order(..., kind=FULL_FILL if ratio >= 1.0 else PARTIAL_FILL, ...))
    return events  # at most one fill event
```

The mock emits **at most one** fill event per `submit()` call. Real exchange behavior (especially for LIMIT orders) involves:
1. ORDER_ACK: order accepted by exchange
2. PARTIAL_FILL: first portion fills at limit price
3. PARTIAL_FILL: second portion fills at a better price
4. FULL_FILL: remainder fills

The Rust kernel's `apply_fill` has a full incremental accumulation path (`prev_filled + fill_size`). This path is only tested with a single fill per submit. The multi-partial-fill lifecycle is completely untested.
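
A scripted multi-fill sequence for a mock, sketched with a Python model of the `prev_filled + fill_size` accumulation. The function name, the dict event shape, and the completion tolerance are assumptions.

```python
from typing import Any, Dict, List, Sequence


def fill_events(order_size: float, fractions: Sequence[float]) -> List[Dict[str, Any]]:
    """Turn fractions of order_size into PARTIAL_FILL / FULL_FILL events."""
    events: List[Dict[str, Any]] = []
    filled = 0.0
    for frac in fractions:
        fill_size = order_size * frac
        filled = filled + fill_size               # prev_filled + fill_size
        done = filled >= order_size - 1e-12       # tolerance is an assumption
        events.append({
            "kind": "FULL_FILL" if done else "PARTIAL_FILL",
            "fill_size": fill_size,
            "filled": filled,
        })
    return events


events = fill_events(1.0, [0.5, 0.25, 0.25])
```

Feeding these events to the kernel one `on_venue_event()` call at a time would exercise the accumulation path with more than one fill per order.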

**Severity: High**

### Y9: MockVenueAdapter has no `connect()` method — if `VenueAdapter` protocol gains this requirement (flaw T6), mock fails at runtime

**File:** `mock_venue.py` (entire file — no `connect()` method)

```python
class MockVenueAdapter:
    # No connect() method defined anywhere
```

Flaw T6 identifies that `VenueAdapter` protocol is missing `connect()`/`disconnect()`. The test infrastructure (`_build_pink_extended.py`'s `Shim` class) calls `self.kernel.venue.connect()`. If `BingxVenueAdapter` gets a `connect()` method (as it should, per flaw V2 fix), but `MockVenueAdapter` does not, any test using the mock venue will raise `AttributeError` at `connect()` time.
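
A structural check can catch the missing method at test-collection time instead of at `connect()` time. The sketch uses `typing.Protocol` with `runtime_checkable`; the protocol and both mock classes are hypothetical stand-ins.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class ConnectableVenue(Protocol):
    def connect(self) -> None: ...
    def disconnect(self) -> None: ...


class MockWithoutConnect:
    def submit(self, intent: Any) -> List[Any]:
        return []


class MockWithConnect(MockWithoutConnect):
    def connect(self) -> None:
        return None

    def disconnect(self) -> None:
        return None


missing = isinstance(MockWithoutConnect(), ConnectableVenue)   # False
present = isinstance(MockWithConnect(), ConnectableVenue)      # True
```

`isinstance` against a runtime-checkable protocol only verifies that the methods exist, not their signatures, so it complements a static type check and does not replace one.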

**Severity: High**

### Y10: Unused `libc` crate declared in `Cargo.toml` — dead dependency with zero code references

**File:** `_rust_kernel/Cargo.toml:8`, `_rust_kernel/src/lib.rs`

```toml
[dependencies]
libc = "0.2"
```

`grep 'libc' src/lib.rs` returns **zero matches**. The code uses `std::ffi::{c_char, CStr, CString}` from the Rust standard library (`std::ffi::c_char` has been stable since Rust 1.64). The `libc` crate is not imported, not used, and serves no purpose.

This is dead weight:
- Adds to compile time and dependency graph
- Version bumps need maintenance
- Theoretical supply-chain risk (crate could be compromised)
- Indicates refactoring residue from an earlier version that used `libc::c_char` directly

**Severity: Medium** (already logged as X5, where it is rated High)

### Y11: MockVenueAdapter has no `close()` method — bundle lifecycle cleanup can't properly release mock resources

**File:** `mock_venue.py` (entire file — no `close()` method)

Even though `MockVenueAdapter` has no actual resources to release (no thread pool, no HTTP connections, no shared memory), the `DITAv2LauncherBundle.close()` method calls `_maybe_close(self.venue)` which tries `obj.close()` → `AttributeError` (caught), then `obj.disconnect()` → `AttributeError` (caught).

The error is silently swallowed (flaw V2 fix), but the missing method means:
- If a future version adds resources to `MockVenueAdapter` (e.g., a mock thread pool for testing thread safety), the leak won't be detected
- The mock can't be used to test the bundle lifecycle cleanup path
- The only `_maybe_close` path the mock exercises is the swallowed-`AttributeError` path, which leaves no trace

**Severity: Medium**

### Y12: MockVenueAdapter fills are synchronous — `process_intent()` returns with fill event already in `emitted_events` — masks async timing bugs

**File:** `mock_venue.py:87-120`

The mock's `submit()` returns a list of `VenueEvent`s that includes the fill event in the **same call**. The Python bridge appends these to the `KernelResult.outcome.emitted_events` list. The kernel returns from `process_intent()` with the fill already applied to the slot.

With a real exchange (or even a realistic mock), fills arrive via a separate `on_venue_event()` call — potentially hundreds of milliseconds after `process_intent()` returns. This means:
- The kernel makes FSM decisions in `on_venue_event()` that the mock never exercises (e.g., the TERMINAL_STATE guard, SLOT_BUSY check for concurrent intents)
- Tests that check `emitted_events` count from `process_intent()` will see different numbers with mock vs real
- Timing-dependent race conditions between `process_intent()` and `on_venue_event()` (like the `_last_settled_pnl` dict access) are never exercised

**Severity: Medium**

### Y13: `IndexSet<String>` in `AccountState` serializes as JSON array — LRU eviction order preserved through round-trip but fragile

**File:** `_rust_kernel/src/lib.rs:830-870` (AccountState)

The `seen_account_event_ids` field in `AccountState` is `IndexSet<String>`. It serializes as a JSON array `["evt-001", "evt-002", ...]` with insertion order preserved (oldest first). The LRU eviction does `shift_remove_index(0)` to evict the oldest entry.

The round-trip (Rust → JSON → file → JSON → Rust) preserves insertion order because serde's `IndexSet` support uses iteration order for serialization and insertion order for deserialization. But:
- If a human or script edits the JSON and reorders the entries, the LRU ordering is corrupted
- If Python modifies the array before sending it back (which it doesn't currently), the ordering breaks
- The JSON array format doesn't encode "this is an LRU ordered set" semantics — it looks like an ordinary list

**Severity: Low**

### Y14: `_slot_from_payload()` in `rust_backend.py` and `real_zinc_plane.py` implement parallel but slightly different deserialization — schema drift risk

**Files:** `rust_backend.py:379-402`, `real_zinc_plane.py:83-138`

Both files implement `TradeSlot` deserialization from the same JSON format produced by `TradeSlot.to_dict()`. The `rust_backend.py` version is inline in `_slot_from_payload`; the `real_zinc_plane.py` version is a standalone function.

**Known differences:**
| Aspect | rust_backend.py | real_zinc_plane.py |
|--------|-----------------|---------------------|
| `entry_time` | `.get("entry_time")` with fallback ✅ | `.get("entry_time")` with fallback ✅ |
| `last_event_time` | `.get("last_event_time")` with fallback ✅ | **`payload["last_event_time"]` — direct key access, crashes on missing** ❌ (T4) |
| `internal_trade_id` | Overwritten with slot's `trade_id` ❌ (U12) | Overwritten with slot's `trade_id` ❌ (T8) |
| Error handling | Returns `None` slot on parse error | Returns default slot on missing key |

These differences mean the same `TradeSlot` JSON can produce different Python objects depending on which deserialization path is used — the FFI path or the shared memory path.

**Severity: Medium**

---

## Pass 22 Summary

| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| Y1 | `exit_leg_ratios` serde default `[]` vs struct default `vec![1.0]` — wrong ratio on restore | Rust | Medium |
| Y2 | `KernelIntent.slot_id` is `i64` — inconsistent with `TradeSlot`/`KernelTransition` `usize` | Rust | Low |
| Y3 | `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` | Rust | Low |
| Y4 | `AccountState` injects `"k_net_fees"` as duplicate of serde's `k_fees_paid` | Rust | Low |
| Y5 | `asyncio.sleep(0.8)` in ~295 generated test bodies — timing-dependent false results | Test | **Critical** |
| Y6 | MockVenueAdapter no `rate_limit` flag — RATE_LIMITED path untested | Test | **High** |
| Y7 | MockVenueAdapter `reconcile()` returns `[]` always — late fills untestable | Test | **High** |
| Y8 | MockVenueAdapter emits one fill per submit — multi-partial-fill untestable | Test | **High** |
| Y9 | MockVenueAdapter no `connect()` — runtime error if protocol gains it | Test | **High** |
| Y10 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | Medium |
| Y11 | MockVenueAdapter no `close()` — lifecycle cleanup untestable | Test | Medium |
| Y12 | MockVenueAdapter fills synchronous — masks async timing bugs | Test | Medium |
| Y13 | `IndexSet` JSON array doesn't encode LRU semantics — fragile on manual edit | Rust | Low |
| Y14 | `_slot_from_payload()` duplicated in two files with different behavior | Bridge | Medium |

### Pass 22 Severity

| Severity | Count |
|----------|-------|
| **Critical** | 1 (Y5 — confirming S4 scale) |
| **High** | 4 (Y6, Y7, Y8, Y9) |
| Medium | 5 (Y1, Y10, Y11, Y12, Y14) |
| Low | 4 (Y2, Y3, Y4, Y13) |

### Combined Catalog (All 22 Passes)

| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
| Y | Pass 22 (Serde Round-Trip/Mock Fidelity/Protocol) | 14 | 1 | 4 | 5 | 4 | 0 |
| **Total** | | **389** | **43** | **117** | **119** | **77** | **33** |