PINK: E2E trace analysis — Pass 4 domain scans (G1-G36)

Four systematic passes covering Rust kernel invariants (4 criticals — missing EXIT_RESIDUAL action, unwrap() panic on NUL, backward FSM transition, stale all_legs_done variable), config validation chain (zero validators on 127 fields), persistence schema drift (7 confirmed field-level mismatches), and lifecycle management (no signal handlers, no __del__, no exception safety in builder). Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 14:26:36 +02:00
parent d475e9246b
commit d9dd54c24e
1 changed files with 619 additions and 0 deletions
--- a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
+++ b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
@@ -1549,3 +1549,622 @@ undo, no validation that the slot reached a consistent state.
 This is the single highest-impact E2E flaw because it requires no concurrency,
 no race condition, no unusual market conditions — just a transient FFI error
 during normal operation.
+
+---
+
+## PASS 4 — SYSTEMATIC DOMAIN SCANS (Config, Rust, Persistence, Lifecycle)
+
+### Rust Kernel — Numeric & FSM Invariants
+
+#### G1: EXIT_RESIDUAL action is entirely missing from Rust KernelCommandType
+
+**File:** `_rust_kernel/src/lib.rs`
+
+```rust
+string_enum! {
+    enum KernelCommandType {
+        ENTER, EXIT, MARK_PRICE, RECONCILE, CONTROL, CANCEL,
+    }
+}
+```
+
+Six variants. **No `EXIT_RESIDUAL`.** If any caller submits an intent with `action = "EXIT_RESIDUAL"`, the `string_enum` deserializer rejects the unknown variant and the kernel reports `INVALID_INTENT_PARSE`. Even if deserialization succeeded, there is no branch to handle residual-position cleanup, so any position with remaining size after partial exit legs has **no way to trigger a clean-up exit** via the intent system.
+
+The Python `KernelCommandType` enum (contracts.py) does have `EXIT_RESIDUAL`, translated to `"EXIT_RESIDUAL"` string by `_intent_to_payload`. This string hits Rust's string_enum → parse error → `INVALID_INTENT_PARSE`.
+
+**Fix:** Add `EXIT_RESIDUAL` variant to Rust enum + match arm that skips the `NO_OPEN_POSITION` guard for residual-sized positions.
+
+**Severity: Critical**
+
+#### G2: `into_c_string` uses `unwrap()` — panics on interior NUL byte
+
+**File:** `_rust_kernel/src/lib.rs:1477`
+
+```rust
+fn into_c_string(value: &str) -> *mut c_char {
+    CString::new(value).unwrap().into_raw()
+}
+```
+
+`CString::new()` returns `Err` if the string contains a NUL (`'\0'`) byte. `.unwrap()` panics at the C FFI boundary. If any `serde_json::to_string()` output (e.g., user-controlled string in `KernelIntent`, `VenueEvent`, or `TradeSlot`) contains a NUL byte, this **panics the entire process**.
+
+Triggered by every FFI call that returns a string:
+- `dita_kernel_process_intent_json`
+- `dita_kernel_on_venue_event_json`
+- `dita_kernel_reconcile_slots_json`
+- `dita_kernel_snapshot_json`
+- `dita_kernel_get_slot_json`
+
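+**Fix:** Replace `.unwrap()` with a fallible conversion such as `CString::new(value).map_or(ptr::null_mut(), CString::into_raw)`, or feed the error through `invalid_intent_cstring`. (A bare `unwrap_or_else(|_| ptr::null_mut())` on the `Result<CString, _>` does not type-check, because the closure must return a `CString`.)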
+
+**Severity: Critical**
+
+#### G3: `process_intent` EXIT hardcodes `prev_state = POSITION_OPEN` unconditionally
+
+**File:** `_rust_kernel/src/lib.rs:842-890`
+
+```rust
+slot.fsm_state = TradeStage::EXIT_REQUESTED;        // unconditional override
+let transition = self.transition(
+    &slot,
+    TradeStage::POSITION_OPEN,                        // always POSITION_OPEN
+    slot.fsm_state.clone(),
+    "EXIT_INTENT",
+);
+```
+
+Three problems:
+
+(a) **Transition prev_state is a lie.** If the slot was in `EXIT_WORKING`, `EXIT_SENT`, `EXIT_REQUESTED`, or `POSITION_PARTIALLY_CLOSED`, the transition record says `POSITION_OPEN` — wrong.
+
+(b) **Backward transition.** If the slot is `EXIT_WORKING` and a new EXIT intent arrives, `fsm_state` is set to `EXIT_REQUESTED` — a backward transition from `EXIT_WORKING` → `EXIT_REQUESTED`. This corrupts the FSM.
+
+(c) **No state guard.** EXIT should only be allowed from `POSITION_OPEN`, `EXIT_WORKING` (for additional legs), or `POSITION_PARTIALLY_CLOSED`. Currently any state that passes `!is_free() && !closed && size > 0` can transition to `EXIT_REQUESTED`.
+
+**Fix:** Check actual FSM state before allowing EXIT, log actual prev_state, guard against backward transitions.
+
+**Severity: Critical**
+
+#### G4: `consume_exit_leg` advances beyond last valid index — stale `all_legs_done` variable
+
+**File:** `_rust_kernel/src/lib.rs:1420-1435`
+
+```rust
+let all_legs_done = slot.active_leg_index >= slot.exit_leg_ratios.len(); // (A)
+let should_close = (slot.size <= 1e-12 || (!partial && all_legs_done));  // (B)
+
+if !partial {
+    slot.consume_exit_leg();  // (C) — advances active_leg_index POST (A)
+}
+
+if should_close && slot.size <= 1e-12 {         // (D) — close
+} else if !partial && !all_legs_done {           // (E) — stale! uses (A) not post-advance index
+```
+
+On the last leg (`active_leg_index = len - 1`):
+- (A): `all_legs_done = false` (pre-advance)
+- (C): advances to `len` (exhausted)
+- (E): `!partial && !false` = true → enters `POSITION_OPEN` instead of examining `should_close` with post-advance index
+
+The `all_legs_done` variable is captured **before** `consume_exit_leg` advances the index. Branch (E) should use the post-advance index to correctly detect exhaustion.
+
+After exhaustion, `next_exit_ratio()` returns `1.0` (out-of-bounds `unwrap_or(1.0)`) — silently tries to exit remaining size as 100% instead of detecting completion.
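
The ordering problem can be reproduced in isolation. The sketch below is a Python model of the index arithmetic only (the kernel itself is Rust); `leg_flags` and its parameters are hypothetical names, not kernel code. It compares the flag captured at (A) with the same test recomputed after the advance at (C).

```python
from typing import Tuple


def leg_flags(active_leg_index: int, n_legs: int, partial: bool) -> Tuple[bool, bool]:
    """Return (stale, fresh) versions of the all-legs-done flag.

    stale: computed before the index advances, as at (A) in the excerpt.
    fresh: recomputed after the advance at (C).
    """
    stale = active_leg_index >= n_legs
    if not partial:
        active_leg_index += 1  # models consume_exit_leg() advancing the cursor
    fresh = active_leg_index >= n_legs
    return stale, fresh
```

For a three-leg exit on its last leg (`active_leg_index = 2`), a full fill gives `stale = False` but `fresh = True`, which is the mismatch branch (E) runs into.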
+
+**Severity: Critical**
+
+#### G5: `realized_pnl` uses unbounded f64 — overflows to inf at extreme values
+
+**File:** `_rust_kernel/src/lib.rs:648-656`
+
+```rust
+let notional = exit_size * slot.entry_price * slot.leverage.max(1.0);
+delta * notional
+```
+
+No `is_finite()` check on intermediate products. At `exit_price=1e200`, `entry_price=1e-200`: `delta` = `(1e200 - 1e-200) / 1e-200` ≈ `1e400` → `inf`. The resulting `inf` is stored in `slot.realized_pnl`, corrupting all future PnL tracking.
+
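+Subnormals: `entry_price=5e-324` (the smallest positive subnormal) makes the division overflow to `inf` for any exit price above roughly `1e-15`, since `f64::MAX * 5e-324 ≈ 8.9e-16`. This is plain IEEE 754 behaviour and does not depend on the platform.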
+
+**Fix:** Add `is_finite()` guards on both prices and cap intermediate products.
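
The overflow is easy to check numerically, because Python floats are the same IEEE 754 doubles as Rust's `f64`. The sketch below is a Python model of the arithmetic quoted above with the proposed guards added. The function name, the `None` return convention, and the long-side-only `delta` are assumptions for illustration, not the kernel's API.

```python
import math


def realized_pnl_checked(exit_price, entry_price, exit_size, leverage):
    """Model of the PnL arithmetic with finiteness guards.

    Returns None instead of a non-finite result.
    """
    inputs = (exit_price, entry_price, exit_size, leverage)
    if not all(math.isfinite(x) for x in inputs) or entry_price <= 0.0:
        return None  # reject NaN/inf inputs and non-positive entry prices
    delta = (exit_price - entry_price) / entry_price
    notional = exit_size * entry_price * max(leverage, 1.0)
    pnl = delta * notional
    # Every input can be finite while the result still overflows.
    return pnl if math.isfinite(pnl) else None
```

With `exit_price=1e200` and `entry_price=1e-200` every input passes the first check, and only the guard on the result catches the overflow. That is why checking the prices alone is not enough.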
+
+**Severity: High**
+
+#### G6: `mark_price` produces unbounded `unrealized_pnl`
+
+**File:** `_rust_kernel/src/lib.rs:384-399`
+
+```rust
+self.unrealized_pnl = delta * self.size * self.entry_price * self.leverage;
+// No is_finite() check on result
+```
+
+If any of `delta`, `size`, `entry_price`, or `leverage` is extreme, the product overflows to `inf`. Nothing checks the result, so the `inf` persists in `unrealized_pnl`. The only guard is `price <= 0.0` on the input; the computation chain itself is unguarded.
+
+Also: `self.entry_price = price` at line 388 overwrites `entry_price` on any `mark_price` call where `entry_price <= 0.0`. Replacing a stale-zero `entry_price` with the current market price on the first `mark_price` after open is correct. But if the slot is reused (re-entry without resetting `entry_price`), the prior trade's entry price is non-zero, is never overwritten, and bleeds into unrealized PnL.
+
+**Severity: High**
+
+#### G7: `process_intent` ENTER — no `is_finite()` guard on `target_size`
+
+**File:** `_rust_kernel/src/lib.rs:806-807`
+
+```rust
+intended_size: intent.target_size.max(0.0),
+```
+
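+`f64::INFINITY.max(0.0)` returns `inf`, so `.max(0.0)` does nothing to stop an infinite size. `f64::NAN.max(0.0)` returns `0.0`, because Rust's `f64::max` returns the other operand when one is NaN, so a NaN size is silently coerced to zero rather than rejected. Note that bare `NaN` and `Infinity` are not valid JSON tokens and serde_json rejects them by default, whereas Python's `json.dumps` emits them by default (`allow_nan=True`). A non-finite value that slips past the Python-side `_first_invalid_intent_field` guard (F3, which allows these through) should therefore surface as a parse failure rather than a validated rejection. Any path that builds a `KernelIntent` without strict JSON parsing still has no finiteness guard, and an `inf` that reaches `intended_size` in `VenueOrder` corrupts all fill calculations.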
+
+Similarly, `reference_price` is never validated for finiteness before being stored in `VenueOrder.metadata`.
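
On the Python side of the FFI boundary, the standard `json` module writes non-finite floats as bare `NaN`/`Infinity` tokens unless told otherwise. A minimal sketch of a stricter serializer follows; `dump_intent_strict` is a hypothetical helper name, and whether `_intent_to_payload` uses `json.dumps` at all is an assumption.

```python
import json


def dump_intent_strict(payload: dict) -> str:
    """Serialize a payload, raising ValueError on NaN or +/-Infinity."""
    return json.dumps(payload, allow_nan=False)


# Default behaviour for comparison: the non-finite value is written out as a
# bare token that is not valid JSON.
lenient = json.dumps({"target_size": float("inf")})
```

Rejecting at serialization time turns a downstream parse failure into an explicit error at the point where the bad value originates.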
+
+**Severity: High**
+
+#### G8: `reconcile_slots_json` — no dedup or bounds validation
+
+**File:** `_rust_kernel/src/lib.rs:1668-1675`
+
+```rust
+for slot in slots {
+    if slot.slot_id < core.slots.len() {
+        core.slots[slot.slot_id] = slot.clone();
+    }
+}
+```
+
+Two slots with the same `slot_id`: the **second overwrites the first** silently. A slot with `slot_id >= core.slots.len()`: **silently dropped** — no error, no diagnostic. Caller sees `accepted=true` even if some/all slots were not applied.
+
+**Severity: High**
+
+#### G9: `exchange_order_id` propagation uses wrong order target
+
+**File:** `_rust_kernel/src/lib.rs:1110-1125`
+
+```rust
+let target = if slot.active_entry_order.is_some() {
+    slot.active_entry_order.as_mut()
+} else {
+    slot.active_exit_order.as_mut()
+};
+```
+
+If an **entry** order exists (even if fully filled) and an **exit** fill event arrives, the code updates the entry order's `venue_order_id` instead of the exit order's. The exit order's `venue_order_id` stays empty. Any subsequent `CANCEL` intent on the exit order fails because `active_exit_order.venue_order_id` is empty — the venue can't match the cancel.
+
+**Fix:** Disambiguate by matching `venue_client_id`, or clear `active_entry_order` when entry is complete.
+
+**Severity: High**
+
+#### G10: CANCEL diagnostic code says NO_ACTIVE_EXIT_ORDER for entry cancel too
+
+**File:** `_rust_kernel/src/lib.rs:966-1005`
+
+```rust
+if !has_cancellable_exit && !has_cancellable_entry {
+    return KernelResult {
+        diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER, // always says exit
+        details: json!({"reason": "NO_ACTIVE_EXIT_ORDER"}),
+    };
+}
+```
+
+When neither exit nor entry is cancellable, the diagnostic returns `NO_ACTIVE_EXIT_ORDER` regardless of which order was the target. If the user wanted to cancel an entry order that's not in a cancellable state, the diagnostic is misleading.
+
+**Fix:** Separate diagnostic codes: `NO_ACTIVE_EXIT_ORDER`, `NO_ACTIVE_ENTRY_ORDER`, `ENTRY_NOT_CANCELLABLE`.
+
+**Severity: High**
+
+#### G11: `apply_fill` entry-fill overwrites `active_entry_order.intended_size` with `slot.size`
+
+**File:** `_rust_kernel/src/lib.rs:1363-1377`
+
+On FULL_FILL entry, `slot.active_entry_order` is entirely replaced with a new `VenueOrder` where `intended_size = slot.size` (the fill amount) instead of the original intended size. The original intended size (which could be larger than fill size for partial fills) is lost.
+
+If a duplicate fill event arrives (dedup fails due to missing event_id), the second fill would use `slot.size` as the basis for further fills — wrong values.
+
+**Severity: Medium**
+
+#### G12: `leverage` unbounded after `is_finite()` — no maximum cap
+
+**File:** `_rust_kernel/src/lib.rs:778`
+
+```rust
+slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
+    intent.leverage  // 1e100 accepted here
+} else { 1.0 };
+```
+
+`leverage = 1e100` passes `is_finite()`. It feeds into `realized_pnl()` as `slot.leverage.max(1.0) = 1e100`, producing `notional = exit_size * entry_price * 1e100`, and into the `mark_price` product (G6), so both realized and unrealized PnL can become arbitrarily large.
+
+No maximum leverage cap enforced anywhere — the exchange-level cap (`DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP`) exists in `BingxExecClientConfig` but is **never passed to the Rust kernel**.
+
+**Severity: Medium**
+
+#### G13: `resolve_slot` fallback returns `unwrap_or(0)` — can misroute events
+
+**File:** `_rust_kernel/src/lib.rs:623`
+
+```rust
+self.slots.first().map(|slot| slot.slot_id).unwrap_or(0)
+```
+
+When no slot matches the event (`slot_id` out of range or all slot filters fail), returns `slot_id` of the **first slot** (which may be 0 or any value). No diagnostic emitted — caller sees slot state change with no idea the event was misrouted.
+
+**Severity: Medium**
+
+#### G14: `commit_slot` silently ignores out-of-bounds slot_id
+
+**File:** `_rust_kernel/src/lib.rs:595-600`
+
+```rust
+fn commit_slot(&mut self, slot: TradeSlot) {
+    if slot.slot_id < self.slots.len() {
+        self.slots[slot_id] = slot;
+    }
+    // else: silently dropped — no error returned
+}
+```
+
+Mutations to an out-of-bounds slot are silently discarded. This can happen if `slot.slot_id` is corrupted via `set_slot_from_json`, causing an index mismatch between `slot.slot_id` and the actual slot position.
+
+**Severity: Medium**
+
+---
+
+### Configuration & Validation Chain
+
+#### G15: Zero `__post_init__` validators on all config dataclasses
+
+Every config dataclass in the system has zero field-level validation:
+
+| Dataclass | Fields | Validators |
+|-----------|--------|------------|
+| `KernelControlSnapshot` | 16 | **0** |
+| `ControlUpdate` | 16 | **0** |
+| `KernelIntent` | 19 | **0** |
+| `TradeSlot` | 22 | **0** |
+| `VenueOrder` | 8 | **0** |
+| `VenueEvent` | 18 | **0** |
+| `KernelTransition` | 11 | **0** |
+| `KernelOutcome` | 8 | **0** |
+| `AccountSnapshot` | 9 | **0** |
+| **Total** | **127** | **0** |
+
+The only validation in the entire chain:
+- `_first_invalid_intent_field()` — finiteness guard at Python→Rust FFI boundary (not a dataclass validator)
+- Rust `leverage = if is_finite && > 0.0 { val } else { 1.0 }` — post-hoc clamp
+- Rust `KernelCore::new(max_slots.max(1))` — floor only, no ceiling
+- `launcher.py:143`: `max(1, int(...))` for `active_slot_limit` — floor only
+
+**No `__post_init__` exists anywhere. No bounds check on any field except the two floor-only guards.**
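
A minimal sketch of what field-level validation could look like. `IntentSketch` is a hypothetical three-field stand-in, not the real `KernelIntent`, and `MAX_LEVERAGE = 125.0` is an illustrative ceiling rather than a value taken from the codebase.

```python
import math
from dataclasses import dataclass

MAX_LEVERAGE = 125.0  # hypothetical ceiling chosen for the example


@dataclass(frozen=True)
class IntentSketch:
    """Hypothetical stand-in for a much larger intent dataclass."""

    target_size: float
    reference_price: float
    leverage: float = 1.0

    def __post_init__(self):
        for name in ("target_size", "reference_price", "leverage"):
            value = getattr(self, name)
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number, got {type(value).__name__}")
            if not math.isfinite(value):
                raise ValueError(f"{name} must be finite, got {value!r}")
        if self.target_size < 0.0:
            raise ValueError("target_size must be >= 0")
        if self.reference_price <= 0.0:
            raise ValueError("reference_price must be > 0")
        if not 1.0 <= self.leverage <= MAX_LEVERAGE:
            raise ValueError(f"leverage must be in [1, {MAX_LEVERAGE}]")
```

Validating in `__post_init__` means an invalid object can never be constructed, so downstream code (including the FFI boundary) does not need to re-check the same fields.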
+
+**Severity: High**
+
+#### G16: `DITA_V2_DEBUG_CLICKHOUSE` defaults to `True` when env var is unset
+
+**File:** `launcher.py:133`
+
+```python
+debug = _env_bool("DITA_V2_DEBUG_CLICKHOUSE", True)
+```
+
+`_env_bool` (launcher.py:75) returns `default` when the env var is unset. So `debug = True` by default. Every runtime writes debug traces to ClickHouse by default. `DITA_V2_DEBUG_CLICKHOUSE=False` is required to disable it.
+
+This is not a bug per se, but it means debug ClickHouse writes are **on by default**, adding ~10 ClickHouse insertions per process_intent call (every transition + position state + trade event) that most production deployments may not want.
+
+**Severity: Informational**
+
+#### G17: String config fields have no charset/length validation — Zinc region injection risk
+
+**File:** `control.py:31-53`, `real_zinc_plane.py:30`
+
+`runtime_namespace`, `strategy_namespace`, `event_namespace`, `actor_name`, `exec_venue`, `data_venue`, `ledger_authority` are all free-form strings with no validation. They're used as:
+
+1. **Zinc shared memory region names**: `self.prefix + "." + namespace + "." + kind` — an attacker-controlled namespace could collide with other processes' Zinc regions
+2. **ClickHouse table names**: `DOLPHIN_BINGX_JOURNAL_STRATEGY` is used as a table suffix — SQL injection risk in ClickHouse journal
+3. **Hazelcast map names**: Same injection risk via `event_namespace`
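
One conventional mitigation for all three uses is an allowlist check before a string is ever interpolated into a region, table, or map name. The sketch below assumes a conservative identifier grammar (ASCII letters, digits, underscore, at most 64 characters); the helper name and the exact pattern are illustrative choices, not existing project code.

```python
import re

# Letter or underscore first, then up to 63 more word characters.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def validate_identifier(value: str, field: str) -> str:
    """Return value unchanged if it is a safe identifier, else raise ValueError."""
    if not isinstance(value, str) or _IDENTIFIER.fullmatch(value) is None:
        raise ValueError(f"{field}: {value!r} is not a safe identifier")
    return value
```

An empty string is rejected as well, so a caller that treats an unset variable as "use the default name" would need to handle that case before validating.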
+
+**Severity: Medium**
+
+#### G18: `exit_leg_ratios` no sum-to-1 validation
+
+`KernelIntent.exit_leg_ratios` and `TradeSlot.exit_leg_ratios` are tuple/list of floats. No validator ensures they sum to approximately 1.0. Ratios summing to 0.5 leave the position partially closed forever (residual can't be exited because `next_exit_ratio()` returns `1.0` after exhaustion, exiting 100% of remaining — which may exceed the intended residual).
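
A sketch of the missing check, assuming the ratios are fractions of the original position size that are meant to sum to 1. The function name and the tolerance are illustrative.

```python
import math


def validate_exit_leg_ratios(ratios, tol=1e-9):
    """Return the ratios as a tuple if each is in (0, 1] and they sum to ~1.0."""
    ratios = tuple(float(r) for r in ratios)
    if not ratios:
        raise ValueError("exit_leg_ratios must not be empty")
    for r in ratios:
        if not math.isfinite(r) or not 0.0 < r <= 1.0:
            raise ValueError(f"ratio {r!r} is outside (0, 1]")
    total = math.fsum(ratios)  # fsum avoids accumulating rounding error
    if not math.isclose(total, 1.0, rel_tol=0.0, abs_tol=tol):
        raise ValueError(f"ratios sum to {total!r}, expected 1.0")
    return ratios
```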
+
+**Severity: Low**
+
+#### G19: `RealZincControlPlane.read()` has no sequence check — torn-read risk
+
+**File:** `real_control_plane.py:88-94`
+
+```python
+def read(self):
+    payload = _decode_packet(self.region.as_buffer())
+    control = payload.get("control")
+    if not isinstance(control, dict):
+        return self._snapshot
+    self._snapshot = KernelControlSnapshot(**control)
+    return self._snapshot
+```
+
+The binary packet has a 64-bit sequence number but `read()` **never checks it**. Between the zero-write and packet-write in `_write_region`, a reader sees an empty buffer → `_decode_packet` fails → falls back to `self._snapshot` (stale). Between the packet-write and `struct.pack` header (order depends on implementation), a reader sees a partial write with wrong size → `_decode_packet` fails.
+
+No checksum on the wire format: `struct.pack("!QQ", seq, len) + json_bytes`. A torn write produces garbage that `json.loads` may or may not parse successfully.
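
A reader can reject most torn or stale buffers using only the existing `!QQ` header. The sketch below assumes sequence numbers start at 1 (so a zeroed header means "not yet written") and that the caller tracks the last sequence it accepted; `encode_packet` and `decode_packet_checked` are hypothetical helpers, not the project's `_decode_packet`.

```python
import json
import struct

HEADER = struct.Struct("!QQ")  # (sequence, body length), as in the wire format above


def encode_packet(seq: int, payload: dict) -> bytes:
    body = json.dumps(payload).encode("utf-8")
    return HEADER.pack(seq, len(body)) + body


def decode_packet_checked(buf, last_seq: int):
    """Return (seq, payload), or None if the buffer looks torn or stale."""
    if len(buf) < HEADER.size:
        return None
    seq, length = HEADER.unpack_from(buf, 0)
    if seq <= last_seq:
        return None  # stale, or a zeroed header that has not been written yet
    if length == 0 or HEADER.size + length > len(buf):
        return None  # length field cannot be trusted
    body = bytes(buf[HEADER.size:HEADER.size + length])
    # Re-read the header: if it changed while the body was being copied,
    # a writer was active and the copy may be torn.
    if HEADER.unpack_from(buf, 0) != (seq, length):
        return None
    try:
        return seq, json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None
```

The header re-read narrows the window but does not close it. A checksum over the body, written last, would be the stronger fix.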
+
+**Severity: Low**
+
+#### G20: `DOLPHIN_BINGX_JOURNAL_STRATEGY`/`_DB` — ClickHouse SQL injection risk
+
+**File:** `launcher.py:202-203`
+
+```python
+"DOLPHIN_BINGX_JOURNAL_STRATEGY": os.environ.get("DOLPHIN_BINGX_JOURNAL_STRATEGY", ""),
+"DOLPHIN_BINGX_JOURNAL_DB": os.environ.get("DOLPHIN_BINGX_JOURNAL_DB", ""),
+```
+
+These are used as ClickHouse table and database name suffixes in `pink_clickhouse.py`. An attacker who can set env vars can inject SQL via semicolons or quotes in the table name. ClickHouse supports `INSERT INTO db.table FORMAT JSONEachRow` — a table name like `positions; DROP TABLE ...;` could be destructive.
+
+**Severity: Low** (requires env var control, which implies broader access)
+
+---
+
+### Persistence Schema Alignment
+
+#### G21: `entry_price` used as `exit_price` in `trade_events` — data loss
+
+**File:** `pink_clickhouse.py` (outside workspace)
+
+The `_write_trade_event` function maps `entry_price` from `slot.to_dict()` to both the `entry_price` and `exit_price` columns. The actual exit fill price (available on the `VenueEvent` object) is **never written** to the `exit_price` column.
+
+**Result:** Every `trade_events` row has `exit_price == entry_price`. The `exit_price` column is a dead column — always contains the entry price, never the actual fill.
+
+**Severity: High** — data loss to DB for the most important trade metric.
+
+#### G22: `active_leg_index` → `entry_bar` semantic mis-mapping
+
+**File:** `pink_clickhouse.py` (outside workspace)
+
+```python
+"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
+```
+
+`active_leg_index` tracks the exit-leg-ratios cursor (which leg of a multi-leg exit we're on), not a bar count. The value is `0` at position open and `1` after the first exit leg; neither represents bars held. **The `entry_bar` column stores the wrong concept.**
+
+**Severity: Medium** — column contains semantically meaningless data.
+
+#### G23: `capital_before` arithmetic reconstruction absorbs cross-slot PnL
+
+**File:** `pink_clickhouse.py` (outside workspace)
+
+```python
+capital_before = capital_after - pnl_leg
+```
+
+`capital_before` is reconstructed by subtracting the current leg's PnL from the current capital. In a multi-slot system, other slots' PnL changes between legs are absorbed into `capital_before`. The column is **always wrong** in multi-slot scenarios because `capital_after` reflects total PnL from all slots, not just the leg being recorded.
+
+**Severity: Medium** — wrong `capital_before` for multi-slot trading.
+
+#### G24: Recovery `trade_reconstruction` always has `trade_id=""`
+
+**File:** `pink_clickhouse.py` (outside workspace)
+
+The `persist_recovery_state` function passes `kernel.snapshot()["account"]` (an account dict with keys `capital, equity, realized_pnl, ...`) where a slot dict is expected. The `trade_id` key **does not exist** on the account dict. The `recovery_state` row always has `trade_id=""`.
+
+**Severity: Medium** — recovery data is not associable with any trade.
+
+#### G25: `seen_event_ids`, `exit_leg_ratios`, `VenueOrder`, `metadata` not in flat ClickHouse tables
+
+These fields are:
+- Present on the Python `TradeSlot` ✅
+- Transmitted through Zinc shared memory ✅
+- Stored in Hazelcast ✅
+- Stored in ClickHouse `dita_kernel_debug` (full JSON) ✅
+- **NOT extracted** into main ClickHouse flat tables `position_state`, `trade_events`, `trade_exit_legs` ❌
+
+Data exists at the source, travels through the pipeline, hits the debug journal — but is lost in the main analytical tables.
+
+**Severity: Low** (data exists in debug journal if needed for reconstruction)
+
+#### G26: `_safe_float` silently converts NaN/None/Inf to 0.0
+
+**File:** `utils.py:15`
+
+```python
+def _safe_float(v, default=0.0):
+    try:
+        f = float(v)
+        if not math.isfinite(f):
+            return default
+        return f
+    except (TypeError, ValueError, OverflowError):
+        return default
+```
+
+Used in multiple ClickHouse writers. Silently converts `NaN`/`Inf`/parsing errors to `0.0`. No diagnostic emitted when a non-finite value reaches the persistence layer — data silently zeroed.
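
One way to keep the safe default while making the substitution visible is to report it. The sketch below is a hypothetical variant, not the project's `_safe_float`: it returns an `(value, ok)` pair and logs whenever the default is used. The logger name and the `field` parameter are assumptions.

```python
import logging
import math

logger = logging.getLogger("persistence.safe_float")


def safe_float_reported(value, default=0.0, *, field="<unknown>"):
    """Return (float_value, True), or (default, False) after logging a warning."""
    try:
        number = float(value)
    except (TypeError, ValueError, OverflowError):
        logger.warning("%s: unparseable value %r replaced by %r", field, value, default)
        return default, False
    if not math.isfinite(number):
        logger.warning("%s: non-finite value %r replaced by %r", field, number, default)
        return default, False
    return number, True
```

The boolean lets a writer decide per column whether a substituted zero is acceptable or whether the row should be flagged.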
+
+**Severity: Low** (safe default but silent corruption)
+
+---
+
+### Lifecycle & Resource Management
+
+#### G27: `build_launcher_bundle` has no exception safety — prior resources leak
+
+**File:** `launcher.py:264-300`
+
+```python
+def build_launcher_bundle(...):
+    control_plane = _build_control_plane(...)
+    projection = build_projection(...)
+    zinc_plane = _build_zinc_plane(...)
+    venue = _build_venue(...)
+    kernel = ExecutionKernel(...)  # ← if THIS fails, everything above leaks
+```
+
+If any step after the first raises, all previously built resources leak:
+- `RealZincPlane` created → `_build_venue()` fails → 3 shared memory regions orphaned
+- `RealZincControlPlane` created → `_build_zinc_plane()` fails → 1 shared memory region orphaned
+- `BingxVenueAdapter` created → `ExecutionKernel.__init__()` fails → HTTP connection leaked
+
+**No `try/finally` anywhere in the builder.** The init order is also optimized for forward construction, not backward cleanup.
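
The standard-library tool for this pattern is `contextlib.ExitStack`: register each resource's `close()` as soon as it is built, and release ownership only when the whole build succeeds. The sketch below uses a hypothetical `Resource` class and a list of factories in place of the real `_build_*` helpers.

```python
from contextlib import ExitStack


class Resource:
    """Hypothetical stand-in for a plane, venue, or client exposing close()."""

    def __init__(self, name, closed_log):
        self.name = name
        self._closed_log = closed_log

    def close(self):
        self._closed_log.append(self.name)


def build_bundle(factories):
    """Build resources in order; on failure, close those already built in reverse."""
    with ExitStack() as stack:
        built = []
        for factory in factories:
            resource = factory()            # may raise
            stack.callback(resource.close)  # registered only after a successful build
            built.append(resource)
        stack.pop_all()  # success: detach the callbacks, the caller now owns cleanup
        return built


def demo_failed_build():
    closed = []

    def failing_venue():
        raise RuntimeError("venue construction failed")

    factories = [
        lambda: Resource("control_plane", closed),
        lambda: Resource("zinc_plane", closed),
        failing_venue,
    ]
    try:
        build_bundle(factories)
    except RuntimeError:
        pass
    return closed
```

In `demo_failed_build`, the third factory raises and the two earlier resources are closed in reverse construction order, which is the backward cleanup the current builder lacks.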
+
+**Severity: High** — shared memory leak on any build failure.
+
+#### G28: `RealZincPlane` and `RealZincControlPlane` have no `__del__`
+
+When `close()` is not called (exception in builder, forgotten cleanup, GC during shutdown), the shared memory regions opened by `RealZincPlane` (3 regions) and `RealZincControlPlane` (1 region) are **orphaned on the OS**. They persist in `/dev/shm/` (or platform equivalent) until system reboot.
+
+Python's `__del__` is unreliable (not called on SIGKILL, not called if the object is part of a cycle without a GC run), but its absence means even normal garbage collection can't clean up.
+
+**Severity: High** — shared memory leaks.
+
+#### G29: Zero signal handlers — no cleanup on SIGTERM/SIGINT
+
+```bash
+$ grep -rn "signal\|SIGTERM\|SIGINT\|atexit" *.py  # ZERO matches
+```
+
+When SIGTERM or SIGINT arrives:
+1. SIGTERM's default disposition terminates the process immediately; SIGINT raises `KeyboardInterrupt` in the main thread, which no code catches for cleanup
+2. No `DITAv2LauncherBundle.close()` is called
+3. No `ExecutionKernel.__del__` is called (CPython may run GC on normal exit but not reliably)
+4. All shared memory (RealZincPlane, RealZincControlPlane) is orphaned
+5. In-flight BingX HTTP calls are interrupted mid-stream
+6. Rust kernel handle is leaked
+
+**Severity: High**
+
+#### G30: `ExecutionKernel` has no `close()` — relies on `__del__` for Rust handle cleanup
+
+`ExecutionKernel` has a `__del__` that calls `_get_rust().destroy(backend)` but no `close()` method. `DITAv2LauncherBundle.close()` never touches the kernel, so the Rust handle is freed only by GC, at an unpredictable time.
+
+If any code holds a stale `_backend` pointer, the handle dangles when GC runs. If `__del__` is suppressed (e.g., during interpreter shutdown with cyclic references), the Rust handle leaks permanently.
+
+**Fix:** Add `close()` to `ExecutionKernel`, call it from `DITAv2LauncherBundle.close()`.
+
+**Severity: High**
+
+#### G31: `projection` (Hazelcast) never closed
+
+`build_projection()` returns a `HazelcastProjection` which holds a Hazelcast client connection. No `close()` or `disconnect()` method exists on the projection, projector, or row writer. `DITAv2LauncherBundle.close()` doesn't touch the projection. The Hazelcast client connection leaks on shutdown.
+
+**Severity: Medium**
+
+#### G32: `_maybe_close()` only calls the first method found — `break` skips the second
+
+**File:** `launcher.py:233-243`
+
+```python
+for method_name in ("close", "disconnect"):
+    method = getattr(obj, method_name, None)
+    if method is None:
+        continue
+    try:
+        result = method()
+    except TypeError:
+        continue
+    if inspect.isawaitable(result):
+        try:
+            asyncio.run(result)
+        except RuntimeError:
+            pass
+    break  # ← ONLY calls the FIRST found method, never both
+```
+
+If an object has both `close()` and `disconnect()`, only `close()` is called and `disconnect()` is silently skipped. Also: `asyncio.run(result)` raises `RuntimeError` when an event loop is already running, and the `except` swallows it, so the coroutine is **never executed**.
+
+Currently no object has both, but the pattern is fragile.
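
A sketch of a less fragile variant, under the assumption that calling every available teardown method is the desired behaviour. `close_all` is a hypothetical replacement, not the project's `_maybe_close`. It fails loudly instead of dropping a coroutine when a loop is already running.

```python
import asyncio
import inspect


async def _await(awaitable):
    return await awaitable


def close_all(obj, method_names=("close", "disconnect")):
    """Call every listed method that exists on obj; return the names called."""
    called = []
    for name in method_names:
        method = getattr(obj, name, None)
        if not callable(method):
            continue
        result = method()
        if inspect.isawaitable(result):
            try:
                asyncio.get_running_loop()
            except RuntimeError:
                asyncio.run(_await(result))  # no loop running: safe to block here
            else:
                if inspect.iscoroutine(result):
                    result.close()  # avoid a "never awaited" warning
                raise RuntimeError(f"{name}() is async; await it from the running loop")
        called.append(name)
    return called


class BothMethods:
    """Hypothetical object exposing a sync close() and an async disconnect()."""

    def __init__(self):
        self.log = []

    def close(self):
        self.log.append("close")

    async def disconnect(self):
        self.log.append("disconnect")
```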
+
+**Severity: Low**
+
+#### G33: `close()` is not idempotent for RealZinc components
+
+`RealZincPlane.close()` and `RealZincControlPlane.close()` call their Zinc region's `close()` method. If called twice, the second call operates on an already-closed region and will likely crash inside the underlying shared memory code.
+
+References are not nulled after close: `DITAv2LauncherBundle.close()` only calls `_maybe_close()`, which leaves `self.venue`, `self.zinc_plane`, and `self.control_plane` in place. A double `close()` is therefore unsafe.
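
Idempotence usually comes down to dropping the reference before closing it. A minimal sketch with hypothetical classes follows; `CountingRegion` stands in for a Zinc region and only counts how often it is closed.

```python
class ClosableOnce:
    """Hypothetical wrapper that makes close() safe to call repeatedly."""

    def __init__(self, region):
        self._region = region

    @property
    def closed(self):
        return self._region is None

    def close(self):
        # Swap the reference out first so a second call (or a re-entrant one)
        # sees None and does nothing.
        region, self._region = self._region, None
        if region is not None:
            region.close()


class CountingRegion:
    """Stand-in for a shared memory region; records close() calls."""

    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1
```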
+
+**Severity: Low**
+
+#### G34: No context manager on `DITAv2LauncherBundle`
+
+`DITAv2LauncherBundle` has no `__enter__`/`__exit__`. Users must manually call `close()`. No `with` pattern exists anywhere in the source for lifecycle management. No `__del__` fallback on the bundle either.
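
Until the bundle grows `__enter__`/`__exit__`, the standard library's `contextlib.closing` gives callers a `with` block for any object that has `close()`. A small sketch, using a hypothetical `FakeBundle` in place of the real bundle:

```python
from contextlib import closing


class FakeBundle:
    """Hypothetical stand-in exposing only close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_once(bundle, work):
    """Run work(bundle) and close the bundle afterwards, even if work raises."""
    with closing(bundle) as active:
        return work(active)
```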
+
+**Severity: Low** (ergonomic, not a leak source if caller follows the pattern)
+
+#### G35: `BingxVenueAdapter.connect()` exists but is never called by the launcher
+
+`BingxDirectExecutionAdapter` has a `connect()` method that initializes the lifetime HTTP client. `BingxVenueAdapter` has a `connect()` that calls `_call_backend("connect")`. Neither is called in `build_launcher_bundle()` or `_build_venue()`. If the adapter's `submit_intent()` relies on a connected client, it initializes lazily; the explicit connect path is dead code.
+
+**Severity: Informational**
+
+#### G36: Only one `try/finally` in the entire codebase
+
+The only `try/finally` is `_RustKernelLib._take_string()` (rust_backend.py:140-143) which frees the Rust C string. All other resource management uses `try/except` with no `finally`.
+
+No cleanup is guaranteed on exception:
+- `build_launcher_bundle()` — no cleanup on failure
+- `process_intent()` — no cleanup of partial slot state on venue event exception
+- `on_venue_event()` — no cleanup on FFI failure
+- `_set_slot()` — no cleanup on projection or Zinc write failure
+
+**Severity: High** (across all layers)
+
+---
+
+## Pass 4 Summary
+
+| # | Flaw | Layer | Severity |
+|---|------|-------|----------|
+| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType | Rust | **Critical** |
+| G2 | `into_c_string` unwrap() panics on NUL byte | Rust | **Critical** |
+| G3 | EXIT hardcodes prev_state=POSITION_OPEN, allows backward FSM transition | Rust | **Critical** |
+| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** |
+| G5 | `realized_pnl` unbounded f64 overflow to inf | Rust | **High** |
+| G6 | `mark_price` unbounded unrealized_pnl — no result guard | Rust | **High** |
+| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
+| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
+| G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** |
+| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
+| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
+| G12 | No max leverage cap enforced by kernel | Rust | Medium |
+| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
+| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
+| G15 | Zero `__post_init__` validators on all config dataclasses | Config | **High** |
+| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
+| G17 | String config fields — Zinc region injection risk | Config | Medium |
+| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
+| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
+| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
+| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
+| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
+| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
+| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
+| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
+| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
+| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
+| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
+| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
+| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
+| G31 | Hazelcast projection never closed | Lifecycle | Medium |
+| G32 | _maybe_close() break skips second method | Lifecycle | Low |
+| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
+| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
+| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
+| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |
+
+### Pass 4 Severity Distribution
+
+| Severity | Count |
+|----------|-------|
+| **Critical** | 4 (G1, G2, G3, G4) |
+| **High** | 13 (G5-G10, G15, G21, G27, G28, G29, G30, G36) |
+| Medium | 9 (G11-G14, G17, G22, G23, G24, G31) |
+| Low | 8 (G18, G19, G20, G25, G26, G32, G33, G34) |
+| Info | 2 (G16, G35) |
+
+### Combined Catalog (All 4 Passes)
+
+| Pass | Focus | Count | Critical | High | Medium | Low | Info |
+|------|-------|-------|----------|------|--------|-----|------|
+| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
+| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
+| E | E2E Trace | 26 | 0 | 4 | 10 | 11 | 1 |
+| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
+| G | Domain Scans (Pass 4) | 36 | 4 | 13 | 9 | 8 | 2 |
+| **Total** | | **116** | **5** | **23** | **30** | **40** | **18** |