PINK: E2E trace analysis — Pass 22 serde round-trip/mock fidelity/protocol (Y1-Y14)

Twenty-second pass: asyncio.sleep(0.8) in ~295 generated test bodies flaky (Y5
Critical), MockVenueAdapter no rate_limit flag RATE_LIMITED path untested (Y6
High), reconcile() returns [] always late fills untestable (Y7 High), emits
one fill per submit multi-partial-fill untestable (Y8 High), no connect()
runtime error if protocol gains it (Y9 High), exit_leg_ratios serde default []
vs struct default vec[1.0] wrong ratio on restore (Y1 Medium), libc dead dep
(Y10 Medium), no close() (Y11 Medium), synchronous fills masks timing bugs
(Y12 Medium), _slot_from_payload duplicated two files different behavior (Y14
Medium). 389 total flaws across 22 passes.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
This commit is contained in:
Codex
2026-06-02 18:39:49 +02:00
parent 09db2e694b
commit 13822d5bfa
2 changed files with 349 additions and 2 deletions

View File

@@ -7789,3 +7789,320 @@ Additionally, `close()` does not clear Python-level caches (`_slot_cache`, `_int
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
| **Total** | | **375** | **42** | **113** | **109** | **64** | **37** |
---
## PASS 22 — SERDE FIELD-BY-FIELD ROUND-TRIP, MOCK VENUE FIDELITY GAPS, PROTOCOL CONSISTENCY
### Y1: `VenueEvent` in Rust kernel `exit_leg_ratios` field — serde default `[]` vs struct default `vec![1.0]` — mismatch produces wrong ratio on JSON-deserialized slots
**File:** `_rust_kernel/src/lib.rs:346-400` (TradeSlot)
```rust
// serde default — when field is missing from incoming JSON:
#[serde(default)] exit_leg_ratios: Vec<f64>
// → Default::default() produces [] (empty vec)
// struct Default impl:
impl Default for TradeSlot { ... exit_leg_ratios: vec![1.0] ... }
```
When a `TradeSlot` is deserialized from JSON without an `exit_leg_ratios` field:
- `serde(default)` gives `[]` (empty)
- The struct's own `Default` gives `vec![1.0]`
The `next_exit_ratio()` function uses `.unwrap_or(1.0)` so an empty vec returns `1.0` — same as `[1.0]` would at `active_leg_index=0`. But if `active_leg_index` were ever non-zero during deserialization (possible with manually constructed or restored slots), the empty vec would silently return `1.0` for every subsequent leg instead of the correct ratio.
**Trigger scenario:** A slot restored from a snapshot that was saved with `active_leg_index=2` and 3 leg ratios, but the JSON has no `exit_leg_ratios` field (e.g., from an older version). `active_leg_index=2`, `exit_leg_ratios=[]` → `exit_leg_ratios.get(2) = None` → `unwrap_or(1.0)` → wrong ratio for leg 3.
**Severity: Medium**
### Y2: `KernelIntent.slot_id` is `i64` while `TradeSlot.slot_id` and `KernelTransition.slot_id` are `usize` — signed/unsigned inconsistency within Rust
**File:** `_rust_kernel/src/lib.rs:419,346,470`
```rust
pub struct KernelIntent {
pub slot_id: i64, // signed — can be negative
...
}
pub struct TradeSlot {
pub slot_id: usize, // unsigned
...
}
pub struct KernelTransition {
pub slot_id: usize, // unsigned
...
}
```
`KernelIntent` (input from Python) uses `i64`, while `TradeSlot` and `KernelTransition` use `usize`. The kernel guards with `if slot_id < 0` before casting to `usize` in `resolve_slot()`, so a negative `slot_id` is correctly rejected. But this inconsistency means:
- A large `slot_id` from Python (> 2^63-1) can't be represented in `i64` → serde error or truncation
- A very large `slot_id` in a `TradeSlot` (> 2^63-1 on 64-bit) can't be represented in `KernelResult.outcome.slot_id` (which is `usize` → back to Python as int, fine) or `VenueEvent.slot_id` (`i64` — overflow risk)
Python `int` is arbitrary precision, so it can send any value. The `i64` vs `usize` inconsistency means the boundary between input and internal types has a silent truncation risk for unrealistic slot counts (>9 quintillion).
**Severity: Low**
### Y3: `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` — third-party JSON producers without `stage` field get wrong default
**File:** `_rust_kernel/src/lib.rs:419` (KernelIntent), `contracts.py:225` (Python KernelIntent)
```rust
// Rust serde default — when stage is absent from incoming JSON:
#[serde(default)] stage: TradeStage,
// → Default::default() → TradeStage::IDLE
```
```python
# Python dataclass default:
stage: TradeStage = TradeStage.INTENT_CREATED # different!
```
The Python `_intent_to_payload()` always explicitly writes `"stage": intent.stage.value`, so this mismatch never triggers in normal operation. But if a third-party JSON producer (future REST API, another kernel, a test helper) sends a `KernelIntent` without a `stage` field, Rust interprets it as `IDLE` instead of `INTENT_CREATED`.
The kernel's FSM logic doesn't use `stage` for any decision — it's only recorded in transitions and the outcome. So functionally both `IDLE` and `INTENT_CREATED` produce identical behavior. But a monitoring dashboard that displays `stage=IDLE` on a brand-new intent would be confusing.
**Severity: Low**
### Y4: `AccountState` injects JSON key `"k_net_fees"` as duplicate of serde-serialized `k_fees_paid` — two names for same value
**File:** `_rust_kernel/src/lib.rs:1088-1092`
```rust
// Inside on_account_event — manually injects a duplicate key:
obj.insert("k_net_fees".to_string(), json!(self.account.k_fees_paid));
```
The `AccountState` struct already has `k_fees_paid: f64` which is serialized by `#[derive(Serialize)]`. Then the `on_account_event` handler manually injects `"k_net_fees"` with the same value from `self.account.k_fees_paid`.
Python reads `"k_fees_paid"` (rust_backend.py:907) — never reads `"k_net_fees"`. The injected key is dead data on the wire. If Python were ever changed to look for `"k_net_fees"`, it would find the same value — but the dual naming creates confusion about which key is canonical.
**Severity: Low**
### Y5: `await asyncio.sleep(0.8)` in every generated test body — timing-dependent false negatives on slow CI, and false positives when fills arrive late
**File:** (all generated test bodies across `gen2.py`, `gen_live_tests.py`, `_gen_test.py`)
Every generated test body follows the pattern:
```python
_si(k, E.ENTER, tid, sym, "LONG", p, 0.001); await asyncio.sleep(0.8)
```
The `0.8` second sleep assumes the mock venue (or live exchange) produces the fill event within 0.8 seconds. On a loaded CI system:
- Mock venue processes fills synchronously in the same event-loop iteration, so `0.8` is always enough
- But live exchange with real latency → the fill may arrive after `0.8s` → the EXIT intent hits `SLOT_BUSY` → test fails
This is flaw S4 but the actual number of affected tests: **all ~295 generated test scenarios** across all three generators use this pattern. Not a separate finding, just noting the scale.
**Severity: Critical** (already logged as S4)
### Y6: MockVenueAdapter has no `rate_limit` flag — RATE_LIMITED code path in both Python bridge and Rust kernel has zero test coverage
**File:** `mock_venue.py:27-35` (MockVenueScenario)
```python
@dataclass(frozen=True)
class MockVenueScenario:
reject_entries: bool = False
reject_exits: bool = False
reject_cancels: bool = False
all_fills_partial: bool = False
# NOTE: no rate_limit field
```
The `MockVenueScenario` dataclass has flags for rejection and partial fill simulation but **no `rate_limit` flag**. The real adapter (`BingxVenueAdapter`) produces `RATE_LIMITED` venue events from three code paths:
1. `_events_from_submit()` — when receipt status is `"RATE_LIMITED"` or `"THROTTLED"`
2. `_events_from_cancel()` — same status check
3. `_http_error_status()` — maps HTTP 429, 5xx, transport errors to RATE_LIMITED
The Rust kernel has a full FSM path for `RATE_LIMITED`: `accepted=false, diagnostic_code=RATE_LIMITED`. This path has never fired in any test. Adding a `rate_limit: bool = False` flag to `MockVenueScenario` and a corresponding branch in `submit()`/`cancel()` would cost ~10 lines and enable testing the entire RATE_LIMITED pipeline.
**Severity: High** (already logged as W12, but worth noting the specific gap)
### Y7: MockVenueAdapter `reconcile()` returns `[]` always — cannot simulate late fills, stale orders, or exchange-state divergence
**File:** `mock_venue.py:150-155`
```python
def reconcile(self) -> List[VenueEvent]:
return [] # always returns empty
```
The real adapter's `reconcile()` calls `_backend_snapshot()` which fetches the full exchange state (open orders, fills, balance) and compares against known state. This is how late fills are detected — an exchange response includes fills that occurred during a WS disconnect window.
The mock's `reconcile()` returns nothing. Any code path that depends on `reconcile()` to discover fills (flaw B1: "no fill history fetched during WS reconnect gap-backfill") is completely untested with the mock.
**Severity: High**
### Y8: MockVenueAdapter emits exactly one fill per `submit()` — cannot test multi-partial-fill accumulation for LIMIT orders
**File:** `mock_venue.py:87-120`
```python
def submit(self, intent: Intent) -> List[VenueEvent]:
...
if scenario.emit_fill_on_submit or fill_ratio > 0:
events.append(self._event_from_order(..., kind=FULL_FILL if ratio >= 1.0 else PARTIAL_FILL, ...))
return events # at most one fill event
```
The mock emits **exactly one** fill event per `submit()` call. Real exchange behavior (especially for LIMIT orders) involves:
1. ORDER_ACK — order accepted by exchange
2. PARTIAL_FILL — first 50% fills at limit price
3. PARTIAL_FILL — second 50% fills at a better price
4. FULL_FILL — remaining fills
The Rust kernel's `apply_fill` has a full incremental accumulation path (`prev_filled + fill_size`). This path is only tested with a single fill per submit. The multi-partial-fill lifecycle is completely untested.
**Severity: High**
### Y9: MockVenueAdapter has no `connect()` method — if `VenueAdapter` protocol gains this requirement (flaw T6), mock fails at runtime
**File:** `mock_venue.py` (entire file — no `connect()` method)
```python
class MockVenueAdapter:
# No connect() method defined anywhere
```
Flaw T6 identifies that `VenueAdapter` protocol is missing `connect()`/`disconnect()`. The test infrastructure (`_build_pink_extended.py`'s `Shim` class) calls `self.kernel.venue.connect()`. If `BingxVenueAdapter` gets a `connect()` method (as it should, per flaw V2 fix), but `MockVenueAdapter` does not, any test using the mock venue will raise `AttributeError` at `connect()` time.
**Severity: High**
### Y10: Unused `libc` crate declared in `Cargo.toml` — dead dependency with zero code references
**File:** `_rust_kernel/Cargo.toml:8`, `_rust_kernel/src/lib.rs`
```toml
[dependencies]
libc = "0.2"
```
`grep 'libc' src/lib.rs` returns **zero matches**. The code uses `std::ffi::{c_char, CStr, CString}` from the Rust standard library (stable since Rust 1.64). The `libc` crate is not imported, not used, and serves no purpose.
This is dead weight:
- Adds to compile time and dependency graph
- Version bumps need maintenance
- Theoretical supply-chain risk (crate could be compromised)
- Indicates refactoring residue from an earlier version that used `libc::c_char` directly
**Severity: Medium**
### Y11: MockVenueAdapter has no `close()` method — bundle lifecycle cleanup can't properly release mock resources
**File:** `mock_venue.py` (entire file — no `close()` method)
Even though `MockVenueAdapter` has no actual resources to release (no thread pool, no HTTP connections, no shared memory), the `DITAv2LauncherBundle.close()` method calls `_maybe_close(self.venue)` which tries `obj.close()` → `AttributeError` (caught), then `obj.disconnect()` → `AttributeError` (caught).
The error is silently swallowed (flaw V2 fix), but the missing method means:
- If a future version adds resources to `MockVenueAdapter` (e.g., a mock thread pool for testing thread safety), the leak won't be detected
- The mock can't be used to test the bundle lifecycle cleanup path
- `_maybe_close` exception handling is exercised but silently
**Severity: Medium**
### Y12: MockVenueAdapter fills are synchronous — `process_intent()` returns with fill event already in `emitted_events` — masks async timing bugs
**File:** `mock_venue.py:87-120`
The mock's `submit()` returns a list of `VenueEvent`s that includes the fill event in the **same call**. The Python bridge appends these to the `KernelResult.outcome.emitted_events` list. The kernel returns from `process_intent()` with the fill already applied to the slot.
With a real exchange (or even a realistic mock), fills arrive via a separate `on_venue_event()` call — potentially hundreds of milliseconds after `process_intent()` returns. This means:
- The kernel makes FSM decisions in `on_venue_event()` that the mock never exercises (e.g., the TERMINAL_STATE guard, SLOT_BUSY check for concurrent intents)
- Tests that check `emitted_events` count from `process_intent()` will see different numbers with mock vs real
- Timing-dependent race conditions between `process_intent()` and `on_venue_event()` (like the `_last_settled_pnl` dict access) are never exercised
**Severity: Medium**
### Y13: `IndexSet<String>` in `AccountState` serializes as JSON array — LRU eviction order preserved through round-trip but fragile
**File:** `_rust_kernel/src/lib.rs:830-870` (AccountState)
The `seen_account_event_ids` field in `AccountState` is `IndexSet<String>`. It serializes as a JSON array `["evt-001", "evt-002", ...]` with insertion order preserved (oldest first). The LRU eviction does `shift_remove_index(0)` to evict the oldest entry.
The round-trip (Rust → JSON → file → JSON → Rust) preserves insertion order because serde's `IndexSet` support uses iteration order for serialization and insertion order for deserialization. But:
- If a human or script edits the JSON and reorders the entries, the LRU ordering is corrupted
- If Python modifies the array before sending it back (which it doesn't currently), the ordering breaks
- The JSON array format doesn't encode "this is an LRU ordered set" semantics — it looks like an ordinary list
**Severity: Low**
### Y14: `_slot_from_payload()` in `rust_backend.py` and `real_zinc_plane.py` implement parallel but slightly different deserialization — schema drift risk
**Files:** `rust_backend.py:379-402`, `real_zinc_plane.py:83-138`
Both files implement `TradeSlot` deserialization from the same JSON format produced by `TradeSlot.to_dict()`. The `rust_backend.py` version is inline in `_slot_from_payload`; the `real_zinc_plane.py` version is a standalone function.
**Known differences:**
| Aspect | rust_backend.py | real_zinc_plane.py |
|--------|-----------------|---------------------|
| `entry_time` | `.get("entry_time")` with fallback ✅ | `.get("entry_time")` with fallback ✅ |
| `last_event_time` | `.get("last_event_time")` with fallback ✅ | **`payload["last_event_time"]` — direct key access, crashes on missing** ❌ (T4) |
| `internal_trade_id` | Overwritten with slot's `trade_id` ❌ (U12) | Overwritten with slot's `trade_id` ❌ (T8) |
| Error handling | Returns `None` slot on parse error | Returns default slot on missing key |
These differences mean the same `TradeSlot` JSON can produce different Python objects depending on which deserialization path is used — the FFI path or the shared memory path.
**Severity: Medium**
---
## Pass 22 Summary
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| Y1 | `exit_leg_ratios` serde default `[]` vs struct default `vec![1.0]` — wrong ratio on restore | Rust | Medium |
| Y2 | `KernelIntent.slot_id` is `i64` — inconsistent with `TradeSlot`/`KernelTransition` `usize` | Rust | Low |
| Y3 | `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` | Rust | Low |
| Y4 | `AccountState` injects `"k_net_fees"` as duplicate of serde's `k_fees_paid` | Rust | Low |
| Y5 | `asyncio.sleep(0.8)` in ~295 generated test bodies — timing-dependent false results | Test | **Critical** |
| Y6 | MockVenueAdapter no `rate_limit` flag — RATE_LIMITED path untested | Test | **High** |
| Y7 | MockVenueAdapter `reconcile()` returns `[]` always — late fills untestable | Test | **High** |
| Y8 | MockVenueAdapter emits one fill per submit — multi-partial-fill untestable | Test | **High** |
| Y9 | MockVenueAdapter no `connect()` — runtime error if protocol gains it | Test | **High** |
| Y10 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | Medium |
| Y11 | MockVenueAdapter no `close()` — lifecycle cleanup untestable | Test | Medium |
| Y12 | MockVenueAdapter fills synchronous — masks async timing bugs | Test | Medium |
| Y13 | `IndexSet` JSON array doesn't encode LRU semantics — fragile on manual edit | Rust | Low |
| Y14 | `_slot_from_payload()` duplicated in two files with different behavior | Bridge | Medium |
### Pass 22 Severity
| Severity | Count |
|----------|-------|
| **Critical** | 1 (Y5 — confirming S4 scale) |
| **High** | 4 (Y6, Y7, Y8, Y9) |
| Medium | 5 (Y1, Y10, Y11, Y12, Y14) |
| Low | 4 (Y2, Y3, Y4, Y13) |
### Combined Catalog (All 22 Passes)
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
| Y | Pass 22 (Serde Round-Trip/Mock Fidelity/Protocol) | 14 | 1 | 4 | 5 | 4 | 0 |
| **Total** | | **389** | **43** | **117** | **114** | **64** | **37** |