PINK: E2E trace analysis — Pass 22 serde round-trip/mock fidelity/protocol (Y1-Y14)

Twenty-second pass: asyncio.sleep(0.8) in ~295 generated test bodies flaky (Y5 Critical), MockVenueAdapter no rate_limit flag RATE_LIMITED path untested (Y6 High), reconcile() returns [] always late fills untestable (Y7 High), emits one fill per submit multi-partial-fill untestable (Y8 High), no connect() runtime error if protocol gains it (Y9 High), exit_leg_ratios serde default [] vs struct default vec[1.0] wrong ratio on restore (Y1 Medium), libc dead dep (Y10 Medium), no close() (Y11 Medium), synchronous fills masks timing bugs (Y12 Medium), _slot_from_payload duplicated two files different behavior (Y14 Medium). 389 total flaws across 22 passes. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 18:39:49 +02:00
parent 09db2e694b
commit 13822d5bfa
2 changed files with 349 additions and 2 deletions
--- a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
+++ b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
@@ -7789,3 +7789,320 @@ Additionally, `close()` does not clear Python-level caches (`_slot_cache`, `_int
 | W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
 | X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
 | **Total** | | **375** | **42** | **113** | **109** | **64** | **37** |
+
+---
+
+## PASS 22 — SERDE FIELD-BY-FIELD ROUND-TRIP, MOCK VENUE FIDELITY GAPS, PROTOCOL CONSISTENCY
+
+### Y1: `VenueEvent` in Rust kernel `exit_leg_ratios` field — serde default `[]` vs struct default `vec![1.0]` — mismatch produces wrong ratio on JSON-deserialized slots
+
+**File:** `_rust_kernel/src/lib.rs:346-400` (TradeSlot)
+
+```rust
+// serde default — when field is missing from incoming JSON:
+#[serde(default)] exit_leg_ratios: Vec<f64>
+// → Default::default() produces [] (empty vec)
+
+// struct Default impl:
+impl Default for TradeSlot { ... exit_leg_ratios: vec![1.0] ... }
+```
+
+When a `TradeSlot` is deserialized from JSON without an `exit_leg_ratios` field:
+- `serde(default)` gives `[]` (empty)
+- The struct's own `Default` gives `vec![1.0]`
+
+The `next_exit_ratio()` function uses `.unwrap_or(1.0)` so an empty vec returns `1.0` — same as `[1.0]` would at `active_leg_index=0`. But if `active_leg_index` were ever non-zero during deserialization (possible with manually constructed or restored slots), the empty vec would silently return `1.0` for every subsequent leg instead of the correct ratio.
+
+**Trigger scenario:** A slot restored from a snapshot that was saved with `active_leg_index=2` and 3 leg ratios, but the JSON has no `exit_leg_ratios` field (e.g., from an older version). `active_leg_index=2`, `exit_leg_ratios=[]` → `exit_leg_ratios.get(2) = None` → `unwrap_or(1.0)` → wrong ratio for leg 3.
+
+**Severity: Medium**
+
+### Y2: `KernelIntent.slot_id` is `i64` while `TradeSlot.slot_id` and `KernelTransition.slot_id` are `usize` — signed/unsigned inconsistency within Rust
+
+**File:** `_rust_kernel/src/lib.rs:419,346,470`
+
+```rust
+pub struct KernelIntent {
+    pub slot_id: i64,       // signed — can be negative
+    ...
+}
+
+pub struct TradeSlot {
+    pub slot_id: usize,     // unsigned
+    ...
+}
+
+pub struct KernelTransition {
+    pub slot_id: usize,     // unsigned
+    ...
+}
+```
+
+`KernelIntent` (input from Python) uses `i64`, while `TradeSlot` and `KernelTransition` use `usize`. The kernel guards with `if slot_id < 0` before casting to `usize` in `resolve_slot()`, so a negative `slot_id` is correctly rejected. But this inconsistency means:
+- A large `slot_id` from Python (> 2^63-1) can't be represented in `i64` → serde error or truncation
+- A very large `slot_id` in a `TradeSlot` (> 2^63-1 on 64-bit) can't be represented in `KernelResult.outcome.slot_id` (which is `usize` → back to Python as int, fine) or `VenueEvent.slot_id` (`i64` — overflow risk)
+
+Python `int` is arbitrary precision, so it can send any value. The `i64` vs `usize` inconsistency means the boundary between input and internal types has a silent truncation risk for unrealistic slot counts (>9 quintillion).
+
+**Severity: Low**
+
+### Y3: `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` — third-party JSON producers without `stage` field get wrong default
+
+**File:** `_rust_kernel/src/lib.rs:419` (KernelIntent), `contracts.py:225` (Python KernelIntent)
+
+```rust
+// Rust serde default — when stage is absent from incoming JSON:
+#[serde(default)] stage: TradeStage,
+// → Default::default() → TradeStage::IDLE
+```
+
+```python
+# Python dataclass default:
+stage: TradeStage = TradeStage.INTENT_CREATED   # different!
+```
+
+The Python `_intent_to_payload()` always explicitly writes `"stage": intent.stage.value`, so this mismatch never triggers in normal operation. But if a third-party JSON producer (future REST API, another kernel, a test helper) sends a `KernelIntent` without a `stage` field, Rust interprets it as `IDLE` instead of `INTENT_CREATED`.
+
+The kernel's FSM logic doesn't use `stage` for any decision — it's only recorded in transitions and the outcome. So functionally both `IDLE` and `INTENT_CREATED` produce identical behavior. But a monitoring dashboard that displays `stage=IDLE` on a brand-new intent would be confusing.
+
+**Severity: Low**
+
+### Y4: `AccountState` injects JSON key `"k_net_fees"` as duplicate of serde-serialized `k_fees_paid` — two names for same value
+
+**File:** `_rust_kernel/src/lib.rs:1088-1092`
+
+```rust
+// Inside on_account_event — manually injects a duplicate key:
+obj.insert("k_net_fees".to_string(), json!(self.account.k_fees_paid));
+```
+
+The `AccountState` struct already has `k_fees_paid: f64` which is serialized by `#[derive(Serialize)]`. Then the `on_account_event` handler manually injects `"k_net_fees"` with the same value from `self.account.k_fees_paid`.
+
+Python reads `"k_fees_paid"` (rust_backend.py:907) — never reads `"k_net_fees"`. The injected key is dead data on the wire. If Python were ever changed to look for `"k_net_fees"`, it would find the same value — but the dual naming creates confusion about which key is canonical.
+
+**Severity: Low**
+
+### Y5: `await asyncio.sleep(0.8)` in every generated test body — timing-dependent false negatives on slow CI, and false positives when fills arrive late
+
+**File:** (all generated test bodies across `gen2.py`, `gen_live_tests.py`, `_gen_test.py`)
+
+Every generated test body follows the pattern:
+```python
+_si(k, E.ENTER, tid, sym, "LONG", p, 0.001); await asyncio.sleep(0.8)
+```
+
+The `0.8` second sleep assumes the mock venue (or live exchange) produces the fill event within 0.8 seconds. On a loaded CI system:
+- Mock venue processes fills synchronously in the same event-loop iteration, so `0.8` is always enough
+- But live exchange with real latency → the fill may arrive after `0.8s` → the EXIT intent hits `SLOT_BUSY` → test fails
+
+This is flaw S4 but the actual number of affected tests: **all ~295 generated test scenarios** across all three generators use this pattern. Not a separate finding, just noting the scale.
+
+**Severity: Critical** (already logged as S4)
+
+### Y6: MockVenueAdapter has no `rate_limit` flag — RATE_LIMITED code path in both Python bridge and Rust kernel has zero test coverage
+
+**File:** `mock_venue.py:27-35` (MockVenueScenario)
+
+```python
+@dataclass(frozen=True)
+class MockVenueScenario:
+    reject_entries: bool = False
+    reject_exits: bool = False
+    reject_cancels: bool = False
+    all_fills_partial: bool = False
+    # NOTE: no rate_limit field
+```
+
+The `MockVenueScenario` dataclass has flags for rejection and partial fill simulation but **no `rate_limit` flag**. The real adapter (`BingxVenueAdapter`) produces `RATE_LIMITED` venue events from three code paths:
+1. `_events_from_submit()` — when receipt status is `"RATE_LIMITED"` or `"THROTTLED"`
+2. `_events_from_cancel()` — same status check
+3. `_http_error_status()` — maps HTTP 429, 5xx, transport errors to RATE_LIMITED
+
+The Rust kernel has a full FSM path for `RATE_LIMITED`: `accepted=false, diagnostic_code=RATE_LIMITED`. This path has never fired in any test. Adding a `rate_limit: bool = False` flag to `MockVenueScenario` and a corresponding branch in `submit()`/`cancel()` would cost ~10 lines and enable testing the entire RATE_LIMITED pipeline.
+
+**Severity: High** (already logged as W12, but worth noting the specific gap)
+
+### Y7: MockVenueAdapter `reconcile()` returns `[]` always — cannot simulate late fills, stale orders, or exchange-state divergence
+
+**File:** `mock_venue.py:150-155`
+
+```python
+def reconcile(self) -> List[VenueEvent]:
+    return []  # always returns empty
+```
+
+The real adapter's `reconcile()` calls `_backend_snapshot()` which fetches the full exchange state (open orders, fills, balance) and compares against known state. This is how late fills are detected — an exchange response includes fills that occurred during a WS disconnect window.
+
+The mock's `reconcile()` returns nothing. Any code path that depends on `reconcile()` to discover fills (flaw B1: "no fill history fetched during WS reconnect gap-backfill") is completely untested with the mock.
+
+**Severity: High**
+
+### Y8: MockVenueAdapter emits exactly one fill per `submit()` — cannot test multi-partial-fill accumulation for LIMIT orders
+
+**File:** `mock_venue.py:87-120`
+
+```python
+def submit(self, intent: Intent) -> List[VenueEvent]:
+    ...
+    if scenario.emit_fill_on_submit or fill_ratio > 0:
+        events.append(self._event_from_order(..., kind=FULL_FILL if ratio >= 1.0 else PARTIAL_FILL, ...))
+    return events  # at most one fill event
+```
+
+The mock emits **exactly one** fill event per `submit()` call. Real exchange behavior (especially for LIMIT orders) involves:
+1. ORDER_ACK — order accepted by exchange
+2. PARTIAL_FILL — first 50% fills at limit price
+3. PARTIAL_FILL — second 50% fills at a better price
+4. FULL_FILL — remaining fills
+
+The Rust kernel's `apply_fill` has a full incremental accumulation path (`prev_filled + fill_size`). This path is only tested with a single fill per submit. The multi-partial-fill lifecycle is completely untested.
+
+**Severity: High**
+
+### Y9: MockVenueAdapter has no `connect()` method — if `VenueAdapter` protocol gains this requirement (flaw T6), mock fails at runtime
+
+**File:** `mock_venue.py` (entire file — no `connect()` method)
+
+```python
+class MockVenueAdapter:
+    # No connect() method defined anywhere
+```
+
+Flaw T6 identifies that `VenueAdapter` protocol is missing `connect()`/`disconnect()`. The test infrastructure (`_build_pink_extended.py`'s `Shim` class) calls `self.kernel.venue.connect()`. If `BingxVenueAdapter` gets a `connect()` method (as it should, per flaw V2 fix), but `MockVenueAdapter` does not, any test using the mock venue will raise `AttributeError` at `connect()` time.
+
+**Severity: High**
+
+### Y10: Unused `libc` crate declared in `Cargo.toml` — dead dependency with zero code references
+
+**File:** `_rust_kernel/Cargo.toml:8`, `_rust_kernel/src/lib.rs`
+
+```toml
+[dependencies]
+libc = "0.2"
+```
+
+`grep 'libc' src/lib.rs` returns **zero matches**. The code uses `std::ffi::{c_char, CStr, CString}` from the Rust standard library (stable since Rust 1.64). The `libc` crate is not imported, not used, and serves no purpose.
+
+This is dead weight:
+- Adds to compile time and dependency graph
+- Version bumps need maintenance
+- Theoretical supply-chain risk (crate could be compromised)
+- Indicates refactoring residue from an earlier version that used `libc::c_char` directly
+
+**Severity: Medium**
+
+### Y11: MockVenueAdapter has no `close()` method — bundle lifecycle cleanup can't properly release mock resources
+
+**File:** `mock_venue.py` (entire file — no `close()` method)
+
+Even though `MockVenueAdapter` has no actual resources to release (no thread pool, no HTTP connections, no shared memory), the `DITAv2LauncherBundle.close()` method calls `_maybe_close(self.venue)` which tries `obj.close()` → `AttributeError` (caught), then `obj.disconnect()` → `AttributeError` (caught).
+
+The error is silently swallowed (flaw V2 fix), but the missing method means:
+- If a future version adds resources to `MockVenueAdapter` (e.g., a mock thread pool for testing thread safety), the leak won't be detected
+- The mock can't be used to test the bundle lifecycle cleanup path
+- `_maybe_close` exception handling is exercised but silently
+
+**Severity: Medium**
+
+### Y12: MockVenueAdapter fills are synchronous — `process_intent()` returns with fill event already in `emitted_events` — masks async timing bugs
+
+**File:** `mock_venue.py:87-120`
+
+The mock's `submit()` returns a list of `VenueEvent`s that includes the fill event in the **same call**. The Python bridge appends these to the `KernelResult.outcome.emitted_events` list. The kernel returns from `process_intent()` with the fill already applied to the slot.
+
+With a real exchange (or even a realistic mock), fills arrive via a separate `on_venue_event()` call — potentially hundreds of milliseconds after `process_intent()` returns. This means:
+- The kernel makes FSM decisions in `on_venue_event()` that the mock never exercises (e.g., the TERMINAL_STATE guard, SLOT_BUSY check for concurrent intents)
+- Tests that check `emitted_events` count from `process_intent()` will see different numbers with mock vs real
+- Timing-dependent race conditions between `process_intent()` and `on_venue_event()` (like the `_last_settled_pnl` dict access) are never exercised
+
+**Severity: Medium**
+
+### Y13: `IndexSet<String>` in `AccountState` serializes as JSON array — LRU eviction order preserved through round-trip but fragile
+
+**File:** `_rust_kernel/src/lib.rs:830-870` (AccountState)
+
+The `seen_account_event_ids` field in `AccountState` is `IndexSet<String>`. It serializes as a JSON array `["evt-001", "evt-002", ...]` with insertion order preserved (oldest first). The LRU eviction does `shift_remove_index(0)` to evict the oldest entry.
+
+The round-trip (Rust → JSON → file → JSON → Rust) preserves insertion order because serde's `IndexSet` support uses iteration order for serialization and insertion order for deserialization. But:
+- If a human or script edits the JSON and reorders the entries, the LRU ordering is corrupted
+- If Python modifies the array before sending it back (which it doesn't currently), the ordering breaks
+- The JSON array format doesn't encode "this is an LRU ordered set" semantics — it looks like an ordinary list
+
+**Severity: Low**
+
+### Y14: `_slot_from_payload()` in `rust_backend.py` and `real_zinc_plane.py` implement parallel but slightly different deserialization — schema drift risk
+
+**Files:** `rust_backend.py:379-402`, `real_zinc_plane.py:83-138`
+
+Both files implement `TradeSlot` deserialization from the same JSON format produced by `TradeSlot.to_dict()`. The `rust_backend.py` version is inline in `_slot_from_payload`; the `real_zinc_plane.py` version is a standalone function.
+
+**Known differences:**
+| Aspect | rust_backend.py | real_zinc_plane.py |
+|--------|-----------------|---------------------|
+| `entry_time` | `.get("entry_time")` with fallback ✅ | `.get("entry_time")` with fallback ✅ |
+| `last_event_time` | `.get("last_event_time")` with fallback ✅ | **`payload["last_event_time"]` — direct key access, crashes on missing** ❌ (T4) |
+| `internal_trade_id` | Overwritten with slot's `trade_id` ❌ (U12) | Overwritten with slot's `trade_id` ❌ (T8) |
+| Error handling | Returns `None` slot on parse error | Returns default slot on missing key |
+
+These differences mean the same `TradeSlot` JSON can produce different Python objects depending on which deserialization path is used — the FFI path or the shared memory path.
+
+**Severity: Medium**
+
+---
+
+## Pass 22 Summary
+
+| # | Flaw | Layer | Severity |
+|---|------|-------|----------|
+| Y1 | `exit_leg_ratios` serde default `[]` vs struct default `vec![1.0]` — wrong ratio on restore | Rust | Medium |
+| Y2 | `KernelIntent.slot_id` is `i64` — inconsistent with `TradeSlot`/`KernelTransition` `usize` | Rust | Low |
+| Y3 | `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` | Rust | Low |
+| Y4 | `AccountState` injects `"k_net_fees"` as duplicate of serde's `k_fees_paid` | Rust | Low |
+| Y5 | `asyncio.sleep(0.8)` in ~295 generated test bodies — timing-dependent false results | Test | **Critical** |
+| Y6 | MockVenueAdapter no `rate_limit` flag — RATE_LIMITED path untested | Test | **High** |
+| Y7 | MockVenueAdapter `reconcile()` returns `[]` always — late fills untestable | Test | **High** |
+| Y8 | MockVenueAdapter emits one fill per submit — multi-partial-fill untestable | Test | **High** |
+| Y9 | MockVenueAdapter no `connect()` — runtime error if protocol gains it | Test | **High** |
+| Y10 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | Medium |
+| Y11 | MockVenueAdapter no `close()` — lifecycle cleanup untestable | Test | Medium |
+| Y12 | MockVenueAdapter fills synchronous — masks async timing bugs | Test | Medium |
+| Y13 | `IndexSet` JSON array doesn't encode LRU semantics — fragile on manual edit | Rust | Low |
+| Y14 | `_slot_from_payload()` duplicated in two files with different behavior | Bridge | Medium |
+
+### Pass 22 Severity
+
+| Severity | Count |
+|----------|-------|
+| **Critical** | 1 (Y5 — confirming S4 scale) |
+| **High** | 4 (Y6, Y7, Y8, Y9) |
+| Medium | 5 (Y1, Y10, Y11, Y12, Y14) |
+| Low | 4 (Y2, Y3, Y4, Y13) |
+
+### Combined Catalog (All 22 Passes)
+
+| Pass | Focus | Count | Critical | High | Medium | Low | Info |
+|------|-------|-------|----------|------|--------|-----|------|
+| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
+| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
+| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
+| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
+| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
+| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
+| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
+| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
+| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
+| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
+| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
+| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
+| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
+| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
+| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
+| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
+| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
+| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
+| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
+| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
+| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
+| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
+| Y | Pass 22 (Serde Round-Trip/Mock Fidelity/Protocol) | 14 | 1 | 4 | 5 | 4 | 0 |
+| **Total** | | **389** | **43** | **117** | **114** | **64** | **37** |