PINK: E2E trace analysis — Pass 19 lifecycle/Rust subtleties/test infra (V1-V14)

Nineteenth pass: DITAv2LauncherBundle.close() never calls kernel.close() Rust
handle leaks via __del__ (V1 Critical), BingxVenueAdapter no close/disconnect
ThreadPoolExecutor/HTTP never release (V2 Critical), 3 generators write same
output file last writer wins incompatible prologues (V4 Critical), generated
tests triple env-gated never run in CI dead code (V5 Critical), kernel.close()
destroys Rust handle immediately no drain no flush UAF risk (V6 Critical),
process_intent ENTER doesn't clear seen_event_ids old dedup pollutes new trade
(V3 High), no conftest/pytest.ini/asyncio_mode test discovery fragile (V9 High),
#[serde(default)] leverage:0.0 mark_price no .max(1.0) silent accounting error
(V8 Medium). 347 total flaws across 19 passes.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
This commit is contained in:
Codex
2026-06-02 16:34:58 +02:00
parent 94078ee8fe
commit ded4b59891
3 changed files with 1502 additions and 10 deletions

View File

@@ -6776,3 +6776,362 @@ The `lot_step` field is declared with a default of `0.001` but is never read by
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| **Total** | | **333** | **30** | **99** | **96** | **64** | **37** |
---
## PASS 19 — STARTUP/SHUTDOWN LIFECYCLE, RUST KERNEL SUBTLETIES, GENERATED TEST INFRA
### V1: `DITAv2LauncherBundle.close()` never calls `kernel.close()` — Rust kernel handle always leaks via `__del__`
**File:** `launcher.py:40-42`
```python
def close(self) -> None:
_maybe_close(self.venue)
_maybe_close(self.zinc_plane)
_maybe_close(self.control_plane)
# NOTE: self.kernel is never closed!
```
The bundle's `close()` method closes the venue, zinc plane, and control plane — but **not the kernel**. The `ExecutionKernel` Rust handle (`self._backend`) is only freed when Python's garbage collector calls `__del__`, which is non-deterministic.
In a `with` block or explicit `bundle.close()` pattern, the Rust handle survives until the `ExecutionKernel` object's refcount drops to zero (usually when the bundle goes out of scope). If the caller holds a reference to the kernel (e.g., `k = bundle.kernel` for inspection), the handle lives indefinitely.
Compare with `_build_rb()` in test code (gen2.py:326-338) which creates a `Shim` that has a `close` method calling `bundle.close()` and `bundle.kernel.close()`. The production `DITAv2LauncherBundle` has no equivalent.
**Severity: Critical**
### V2: `BingxVenueAdapter` has no `close()` or `disconnect()` — ThreadPoolExecutor, HTTP client sessions, connections never released
**File:** `bingx_venue.py` (entire file)
`BingxVenueAdapter` is the synchronous REST adapter to BingX. It has no `close()`, `disconnect()`, or any cleanup method. The `_maybe_close()` in the launcher tries `.close()` → `AttributeError` (caught), then `.disconnect()` → `AttributeError` (caught). **Nothing happens.**
Leaked resources:
- **ThreadPoolExecutor** (class-level `_EXECUTOR`, 3 threads) — never shut down
- **BingxDirectExecutionAdapter backend** — holds an `aiohttp.ClientSession` with TCP connection pool (`limit=4`) — never closed
- Any in-flight HTTP connections — abandoned at process exit
This is a compounding of R1 and R2 from Pass 15. It's the single largest resource leak in the system.
**Severity: Critical**
### V3: `process_intent` ENTER path does NOT clear `seen_event_ids` — old trade's dedup set pollutes new trade
**File:** `_rust_kernel/src/lib.rs:1260-1290` (ENTER path)
```rust
// ENTER path — slot reuse:
slot.trade_id = intent.trade_id.clone();
slot.asset = intent.asset.clone();
slot.entry_price = 0.0;
slot.size = 0.0;
slot.initial_size = 0.0;
slot.unrealized_pnl = 0.0;
slot.realized_pnl = 0.0;
slot.active_leg_index = 0;
slot.active_entry_order = None;
slot.active_exit_order = None;
slot.close_reason.clear();
slot.closed = false;
slot.last_event_time = None;
// 🔴 seen_event_ids is NOT cleared — survives from old trade
slot.fsm_state = TradeStage::ORDER_REQUESTED;
```
When a slot is reused by a new ENTER intent, the `seen_event_ids` vector from the **previous trade** survives. If any event_id from the new trade collides with one observed by the old trade, the event is rejected as `DUPLICATE_EVENT` and silently dropped.
**Example:**
1. Trade A in slot 0 receives events `evt-001`, `evt-002`, `evt-003`. All stored in `seen_event_ids`.
2. Trade A completes. Slot 0 returns to `IDLE`.
3. Trade B enters on slot 0. `seen_event_ids` still contains `evt-001`, `evt-002`, `evt-003`.
4. Exchange sends `evt-001` for Trade B (a legitimate new event with a reused event_id — possible on exchanges that recycle IDs daily).
5. Rust kernel sees `evt-001` in `seen_event_ids` → rejects as `DUPLICATE_EVENT`. The fill/ack is lost.
The probability of collision depends on the exchange's event ID scheme. If event IDs are UUIDs, the risk is negligible. If event IDs are daily counters (e.g., `"20260601-001"` → `"20260602-001"`), the collision risk is **guaranteed** on every day boundary.
**Fix:** Add `slot.seen_event_ids.clear();` in the ENTER path.
**Severity: High**
### V4: Three generators (`gen2.py`, `gen_live_tests.py`, `_gen_test.py`) all write to same output file — last writer wins, incompatible prologues
**Files:** `gen2.py:431-432`, `gen_live_tests.py:680-681`, `_gen_test.py:1233-1234`
All three generator scripts write to the same file:
```
/mnt/dolphinng5_predict/prod/tests/test_pink_bingx_dita_live_e2e.py
```
Each produces **different prologues** with incompatible import styles, helper function signatures, and test runner implementations:
- `gen2.py`: Uses `RB`/`Shim` tuple, `_si()` helper, `_flatten()` helper
- `gen_live_tests.py`: Uses `_RuntimeBundle`/`_RuntimeShim` dataclass, `_run_scenario()` runner, `_flatten_via_kernel_intent()` helper
- `_gen_test.py`: Uses scenario-style `B()` body definitions, single parametrized `test_pink_ditav2` function
The 68 named body functions are **identical** across gen2.py and gen_live_tests.py but their signatures differ (`(k, symbol, p)` vs `(bundle, client, symbol, snap)`). Running gen2.py then gen_live_tests.py produces a file where the bodies don't match the runner — the file compiles (Python sees valid functions) but silently tests nothing meaningful.
`_build_pink_extended.py` and `_build_pink_bodies.py` then mutate the same file in-place with `str.replace()` — which silently does nothing if the expected format doesn't match.
**Severity: Critical**
### V5: Generated tests are triple env-gated — never run in CI, making them dead code
**File:** `gen2.py` (generated output), `gen_live_tests.py` (generated output), `_gen_test.py` (generated output)
Every generated test file has three `pytest.skip()` guards at the top:
```python
if not os.environ.get("BINGX_SMOKE_LIVE"): pytest.skip("...")
if not os.environ.get("BINGX_SMOKE_ALLOW_TRADE"): pytest.skip("...")
if not os.environ.get("PINK_DITA_E2E"): pytest.skip("...")
```
These three env vars are **never set in CI**. The CI pipeline has no step that sets them. As a result:
- 68 test functions from gen2.py → always skipped
- 70 test functions from gen_live_tests.py → always skipped
- 157 body scenarios from _gen_test.py → always skipped
- ~295 combined test scenarios → **zero executed in CI**
Only the mock-venue tests (`test_flaws.py`, `test_account_core_v2.py`, `test_account_reconcile_faults.py`, `test_exchange_event_seam_parity.py`, `test_kernel_reliability.py`, `test_kernel_fee_friction.py`, `test_pink_clickhouse_phase4.py`, `test_alpha_blue_untouched_g7.py`) actually run in CI.
**Severity: Critical**
### V6: `ExecutionKernel.close()` destroys Rust handle immediately — no drain of in-flight intents, no wait for pending FFI calls, no state flush
**File:** `rust_backend.py:320-330`
```python
def close(self) -> None:
backend = self._backend
if backend is not None:
self._backend = None
try:
_get_rust().destroy(backend) # immediately frees Rust kernel memory
except Exception:
pass
```
`destroy(backend)` calls `dita_kernel_destroy(handle)` which calls `drop(Box::from_raw(handle))`, freeing the entire `KernelHandle` heap allocation including `KernelCore`, all slots, all account state.
If `process_intent()` or `on_venue_event()` has been called from another thread and is mid-execution in Rust when `close()` fires:
1. Rust's state machine is destroyed mid-transition — dangling `self` reference
2. Any HTTP calls (venue.submit()) already in-flight complete, but their results are never processed
3. The Rust `Box::from_raw` calls `drop()` while another thread may be holding a reference to `core` — **use-after-free UB**
4. `_last_settled_pnl` dict is orphaned
5. The zinc plane, projection, and account state are all inconsistent with the destroyed kernel
There is no cancel/token mechanism, no pending-call queue drain, and no state flush before destroy.
**Severity: Critical**
### V7: `_last_settled_pnl` dict accessed from both `process_intent` and `on_venue_event` without locks — thread-unsafe
**File:** `rust_backend.py:440,475`
```python
# process_intent (line 440):
self._last_settled_pnl[intent.slot_id] = 0.0
# on_venue_event (line 475):
incremental_pnl = slot.realized_pnl - self._last_settled_pnl.get(slot.slot_id, 0.0)
self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
```
`process_intent()` (called from the async event loop or scheduler) and `on_venue_event()` (called from the venue event stream callback) both read and write `_last_settled_pnl` without any synchronization primitive. If these run on different threads, the dict can experience:
- **Lost update**: Two concurrent writes to the same `slot_id` — one overwrites the other
- **Dict corruption**: Python dict isn't thread-safe for concurrent writes — can produce `KeyError` on iteration or silently drop entries
- **Incorrect PnL settlement**: The incremental PnL calculation uses stale values
In the current single-threaded async architecture, this isn't triggered. But the lack of protection means any future multi-threaded usage introduces a data race.
**Severity: Medium**
### V8: `#[serde(default)]` on `leverage: f64` defaults to 0.0 — `mark_price()` uses leverage directly without `.max(1.0)`, silent accounting error
**File:** `_rust_kernel/src/lib.rs:400-408`
```rust
fn mark_price(&mut self, price: f64) {
// ...
if self.entry_price <= 0.0 || self.size <= 0.0 { return; }
let mut delta = (price - self.entry_price) / self.entry_price;
if self.side == TradeSide::SHORT { delta = -delta; }
self.unrealized_pnl = delta * self.size * self.entry_price * self.leverage; // uses leverage directly
}
```
The `TradeSlot` struct has `#[serde(default)] leverage: f64` which defaults to `0.0` when deserialized from JSON without a `leverage` field.
`mark_price()` uses `self.leverage` directly in `unrealized_pnl` computation — **no `.max(1.0)` guard**. If leverage is `0.0` (from missing JSON field, `set_slot_json`, or snapshot restore without leverage), `unrealized_pnl` is always `0.0` regardless of price movement.
Compare with `realized_pnl()` which correctly guards:
```rust
let notional = exit_size * slot.entry_price * slot.leverage.max(1.0); // correct
```
The `process_intent` ENTER path handles this (sets leverage to 1.0 if ≤ 0), but `set_slot_json` and `restore_full_snapshot` can bypass this and store leverage=0.0 directly into the slot.
**Impact:** Any slot restored from a snapshot that predates the `leverage` field (or saved by a version without it) gets `leverage=0.0` silently. `unrealized_pnl` is always 0.0 — the operator sees no mark-to-market PnL even though the position has moved. This PnL is only realized on close (via `realized_pnl` which correctly uses `.max(1.0)`), so the total PnL is correct but the intra-period mark appears flat — a silent accounting error.
**Severity: Medium**
### V9: No `conftest.py`, no `pytest.ini`, no `asyncio_mode` configuration — test discovery relies on default pytest behavior
**File:** (missing — no pytest configuration in workspace)
The workspace has **zero pytest configuration files**. No `conftest.py`, `pytest.ini`, `setup.cfg`, `pyproject.toml` with pytest settings. This means:
- **`asyncio_mode`** defaults to `"strict"` (pytest 8.x+), but all test files use `asyncio.run()` inline rather than `@pytest.mark.asyncio`. In strict mode, `async def` test functions are **not discovered** unless explicitly marked. Since all current tests are `sync def` wrapping `asyncio.run()`, this works — but the discovery relies on this specific pattern holding.
- No **timeout** configuration — a hanging test blocks the entire suite until killed
- No **test filtering** markers (no `slow`, `live`, `offline` markers)
- No **shared fixtures** — each test file repeats the `_build_rb()` setup pattern
- No **cache** or **rerun** configuration — flaky tests fail the CI suite
If a developer adds a new test using `async def` + `await` (the natural pattern for async code) without `@pytest.mark.asyncio`, the test silently doesn't run — a false negative.
**Severity: High**
### V10: `kernel.close()` has `except Exception: pass` — silently swallows all destroy errors
**File:** `rust_backend.py:325`
```python
try:
_get_rust().destroy(backend)
except Exception:
pass # silently swallows RuntimeError, OSError, ctypes errors
```
If `dita_kernel_destroy` fails (e.g., segfault caught by `catch_unwind` returning an error, or the shared library was unloaded), the exception is silently consumed. The caller believes the kernel was closed successfully — but the handle may still be allocated. This is a leak path that's invisible to monitoring.
**Severity: Low**
### V11: `build_launcher_bundle()` has no cleanup for partially-created components — kernel OOM orphans venue, zinc, control plane
**File:** `launcher.py:163-223`
The build order is:
1. Control plane ✅ built
2. Projection ✅ built
3. Zinc plane ✅ built
4. Venue ✅ built
5. **ExecutionKernel** — if this fails (OOM, cargo build fail, CDLL load), components 1-4 are all orphaned
There's no `try/finally` around the sequence. If `ExecutionKernel.__init__()` raises, the five already-built components (control plane, projection, zinc plane, venue, and any partially-initialized kernel state) leak. No cleanup code exists.
**Fix:** Use a context manager or `try/finally` to close created components if a later one fails.
**Severity: Medium**
### V12: `KernelResult` clones entire kernel state (all slots + indexes + AccountState) on every FFI call — performance issue
**File:** `_rust_kernel/src/lib.rs:1030-1050` (snapshot method)
Every FFI call to `process_intent()` or `on_venue_event()` returns a `KernelResult` containing a full `snapshot()`:
```rust
fn snapshot(&self) -> KernelSnapshot {
KernelSnapshot {
slots: self.slots.clone(), // O(n) clone ALL TradeSlots
active_trade_index: self.active_trade_index.clone(), // clone entire HashMap
venue_order_index: self.venue_order_index.clone(),
client_order_index: self.client_order_index.clone(),
account: self.account.clone(), // clone entire AccountState
}
}
```
For a kernel with 10 slots, each with metadata, seen_event_ids (1024 IDs), nested VenueOrders, and a full AccountState with fee_config, this produces **thousands of heap allocations per FFI call**. At 10 intents/second, this is tens of thousands of allocations/second.
The Python side only reads `KernelResult.outcome` and `KernelResult.slot` (the single affected slot) from each response — **the full snapshot is never used**. The `snapshot` field is transmitted and decoded by Python (`json.loads` parses all of it) but the caller only accesses `outcome` and `slot`. The snapshot data is generated, serialized, transmitted, deserialized, and **silently discarded** on every call.
**Severity: Medium** (performance, not correctness)
### V13: `_build_rb()` leaks bundle on any post-creation failure — Shim construction failure orphans kernel and venue
**File:** `gen2.py:326-338`, `_gen_test.py:69-80`
```python
def _build_rb(ic=25000.0, max_slots=1):
cfg = _build_config(ic)
b = build_launcher_bundle(...) # bundle created with active kernel, venue, etc.
k = b.kernel
k.account.snapshot.capital = ic # <-- if this raises (e.g., AttributeError)
... # Shim construction, etc.
return RB(runtime=Shim(k), config=cfg)
```
If any line after `build_launcher_bundle()` raises, the bundle `b` (with live kernel, venue connections, zinc plane) is leaked. No `try/finally` to call `b.close()`. In test code this is acceptable (process exits soon), but if test suites grow large, accumulated leaked bundles exhaust kernel slots or file descriptors.
**Severity: Low**
### V14: `_maybe_close` uses `break` after first method match — only tries `close` OR `disconnect`, never both
**File:** `launcher.py:67`
```python
for method_name in ("close", "disconnect"):
method = getattr(obj, method_name, None)
if callable(method):
...
break # <-- breaks after first successful match — never tries the second method
```
If an object has both `close()` and `disconnect()`, only `close()` is called. The `disconnect()` fallback is never reached if `close()` exists. This is correct for objects like `RealZincPlane` (which has `close()` but no `disconnect()`), but for objects with both methods (possible future adapters), the second method is silently skipped.
**Severity: Low**
---
## Pass 19 Summary
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| V1 | `DITAv2LauncherBundle.close()` never calls `kernel.close()` — Rust handle leaks via `__del__` | Launcher | **Critical** |
| V2 | `BingxVenueAdapter` no `close()`/`disconnect()` — ThreadPoolExecutor/HTTP never release | Venue | **Critical** |
| V3 | `process_intent` ENTER doesn't clear `seen_event_ids` — old dedup pollutes new trade | Rust | **High** |
| V4 | 3 generators write same output file — last writer wins, incompatible prologues | Test | **Critical** |
| V5 | Generated tests triple env-gated — never run in CI, dead code | Test | **Critical** |
| V6 | `kernel.close()` destroys Rust handle immediately — no drain, no flush, use-after-free risk | Bridge | **Critical** |
| V7 | `_last_settled_pnl` dict accessed from process_intent and on_venue_event without locks | Bridge | Medium |
| V8 | `#[serde(default)] leverage: f64` default 0.0 — mark_price uses directly no `.max(1.0)` | Rust | Medium |
| V9 | No `conftest.py`, no `pytest.ini`, no `asyncio_mode` — test discovery fragile | Test | **High** |
| V10 | `kernel.close()` `except Exception: pass` — silently swallows destroy errors | Bridge | Low |
| V11 | `build_launcher_bundle()` no cleanup on partial failure — OOM orphans 4 components | Launcher | Medium |
| V12 | `KernelResult` clones entire kernel state every FFI call — wasted allocations | Rust | Medium |
| V13 | `_build_rb()` leaks bundle on post-creation failure | Test | Low |
| V14 | `_maybe_close` breaks after first method — never tries both close and disconnect | Launcher | Low |
### Pass 19 Severity
| Severity | Count |
|----------|-------|
| **Critical** | 5 (V1, V2, V4, V5, V6) |
| **High** | 2 (V3, V9) |
| Medium | 4 (V7, V8, V11, V12) |
| Low | 3 (V10, V13, V14) |
### Combined Catalog (All 19 Passes)
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
| **Total** | | **347** | **35** | **101** | **100** | **64** | **37** |