PINK: E2E trace analysis — Pass 10 runtime/test bugs/FSM/persistence/metrics (M1-M18)

Tenth pass: ENTER transition always says prev_state=IDLE (M1 Critical), CANCEL creates no transition record (M2 Critical), ORDER_REJECT on POSITION_OPEN with stale entry order destroys position (M9 Critical), _mk_intent test helper drops order_type/limit_price into metadata not proper field (M3 High), four test/s that claim to test cancel but never cancel (M4, M17), no metric aggregation for trade count/latency/slippage (M10 High), no ClickHouse INSERT retry (M12 High), _decision_to_kernel_intent drops order_type/limit_price making LIMIT orders dead from the runtime (M18 High). 233 total flaws. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 00:03:41 +02:00
parent b3b28bb44a
commit 81fe1d6d25
2 changed files with 324 additions and 1 deletions
--- a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
+++ b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md
@@ -4012,3 +4012,298 @@ If an entry order exists (even if fully filled and the slot is in `POSITION_OPEN
 | K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
 | L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
 | **Total** | | **215** | **13** | **59** | **63** | **53** | **27** |
+
+---
+
+## PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT
+
+### M1: ENTER transition hardcodes `prev_state = IDLE` — every non-IDLE entry corrupts the audit trail
+
+**File:** `_rust_kernel/src/lib.rs:1117`
+
+```rust
+let transition = self.transition(
+    &slot,
+    TradeStage::IDLE,           // HARDCODED — lies about actual prev_state
+    slot.fsm_state.clone(),
+    "ENTER_INTENT",
+);
+```
+
+When a slot is entered from `CLOSED` (re-entry) or from any other state that passed the `is_free()` or same-trade bypass, the transition record claims `prev_state = IDLE`. This is **always wrong unless the slot was genuinely IDLE**. Every ENTER transition in the journal for a re-entered slot or a slot coming from CLOSED records an impossible transition (`CLOSED → ORDER_REQUESTED` recorded as `IDLE → ORDER_REQUESTED`).
+
+This corrupts any downstream FSM analysis, journal audit, or trade-lifecycle reconstruction that relies on accurate `prev_state` values.
+
+**Severity: Critical**
+
+### M2: CANCEL intent creates no transition record — invisible in audit log
+
+**File:** `_rust_kernel/src/lib.rs:1287-1305`
+
+The CANCEL branch in `process_intent` returns a `KernelResult` with no call to `self.transition()`. Every other intent (ENTER, EXIT, MARK_PRICE, RECONCILE) records a transition. CANCEL operations — including accepted cancels — are **invisible in the transition audit log**.
+
+Additionally, CANCEL returns `accepted = true` but never mutates the slot's `fsm_state`. The slot stays in whatever state it was in. The caller sees `accepted = true` with no visible effect.
+
+**Severity: Critical**
+
+### M3: `_mk_intent` test helper drops `order_type`/`limit_price` into `metadata` instead of setting proper fields
+
+**File:** `test_flaws.py:43`
+
+```python
+def _mk_intent(action, trade_id="t1", size=0.001, price=100.0, slot_id=0, **kw):
+    return KernelIntent(
+        ...
+        metadata=kw,   # order_type="LIMIT" goes into metadata dict, not the dataclass field!
+    )
+```
+
+`KernelIntent` has dedicated fields `order_type: str = "MARKET"` and `limit_price: float = 0.0` (contracts.py:274-275), but `_mk_intent` passes `**kw` as `metadata=kw`. So `_mk_intent(order_type="LIMIT")` produces `intent.order_type == "MARKET"` (the default) while `intent.metadata["order_type"] == "LIMIT"`.
+
+The Flaw 6 tests in `test_flaws.py` that verify `order_type`/`limit_price` preservation through `_legacy_intent` pass for the **wrong reason** — they check `legacy.metadata.get("order_type")` which finds the value in the passthrough metadata, not because `_legacy_intent` correctly reads `intent.order_type`. If the production code changes and the test helper isn't fixed, the tests silently become false positives.
+
+**Severity: High**
+
+### M4: `test_cancel_entry_with_partial_fill` never sends a CANCEL — misnamed vacuous test
+
+**File:** `test_flaws.py:161-172`
+
+```python
+def test_cancel_entry_with_partial_fill(self):
+    k = _fresh_kernel(scenario=MockVenueScenario(partial_fill_ratio=0.5))
+    k.process_intent(_mk_intent(action=E.ENTER, trade_id="ce4", size=0.002))
+    slot_after = k._get_slot(0)
+    assert slot_after.size > 0, "Should have partial fill"
+```
+
+Named "Cancel entry with partial fill," belongs to `TestFlaw1EntryCancel` — but **no CANCEL intent is ever sent**. It only verifies that a partial fill occurred. The test is completely vacuous for its stated purpose.
+
+The same pattern affects Flaw 9 tests — `test_cancel_uses_slot_asset_not_trade_id` and `test_mock_venue_cancel_event_has_asset` both have "cancel" in their names but never call any cancel function.
+
+**Severity: High**
+
+### M5: Flaw 7 tests (`test_entry_exit_different_ratios`, `test_per_action_type_ratios`) never send EXIT
+
+**File:** `test_flaws.py` Flaw 7 test class
+
+Both tests set `exit_partial_fill_ratio` on the mock venue scenario but only ever process an ENTER intent. The `exit_partial_fill_ratio` is configured but never exercised. The tests verify entry partial fill behavior only — they don't test what their titles and class name claim.
+
+**Severity: Medium**
+
+### M6: `test_dedup_window_accepts_many_events` uses wrong constant — actual=256, flaw claims 64-only 70 events sent
+
+**File:** `test_flaws.py:536-555`
+
+The Flaw 10 tests reference a 64-event dedup window, but the actual Rust constant is `MAX_SEEN_EVENT_IDS = 256` (lib.rs:8). The test sends 70 events and asserts `>= 70`. Since `70 < 256`, no eviction occurs. The test passes trivially regardless of whether the old-64-bound flaw exists. To meaningfully test eviction, >256 events would be needed.
+
+Similarly, `test_dedup_eviction_does_not_accept_old_event` sends only 70 events then checks for dedup — with a 256-entry window, the first event is never evicted. The test verifies basic dedup (non-evicted), not eviction behavior.
+
+**Severity: Medium**
+
+### M7: `test_outcome_state_matches_actual_slot` is tautological — compares value with itself
+
+**File:** `test_flaws.py:200-210`
+
+```python
+result = k.process_intent(_mk_intent(action=E.ENTER, trade_id="oc1"))
+slot = k._get_slot(0)
+assert result.state == slot.fsm_state,
+```
+
+`result.state` is set from `final_slot.fsm_state` (which comes from `self._get_slot(outcome.slot_id)` inside `process_intent`). The test then calls `k._get_slot(0)` again. Both read from the same Rust backend — they **must** be equal by construction. This test proves nothing; it's a tautology.
+
+**Severity: Low**
+
+### M8: ORDER_ACK silent fallthrough when no active order — accepts event with no effect
+
+**File:** `_rust_kernel/src/lib.rs:1476-1498`
+
+When `on_venue_event` receives an `ORDER_ACK` for a slot with neither `active_entry_order` nor `active_exit_order` (shouldn't happen normally, but possible after a reconcile or race), the match arm executes **no branch**. The state is unchanged, `diagnostic_code` stays `OK`, and `accepted = true`. The event is silently accepted with no effect — no diagnostic, no warning.
+
+The same bug exists for `CANCEL_ACK` (line 1545): if no matching active order exists, the event is silently accepted with no state change and `OK` diagnostic.
+
+**Severity: Medium**
+
+### M9: ORDER_REJECT on POSITION_OPEN with stale entry order destroys the position
+
+**File:** `_rust_kernel/src/lib.rs:1499-1530`
+
+```rust
+KernelEventKind::ORDER_REJECT => {
+    if slot.active_entry_order.is_some() && slot.fsm_state != TradeStage::POSITION_OPEN {
+        // clear entry, wipe trade data, set IDLE
+    } else if slot.active_exit_order.is_some() {
+        // clear exit order only, set POSITION_OPEN
+    } else {
+        // no match — reset to IDLE
+    }
+}
+```
+
+If a slot is in `POSITION_OPEN` (position active) but `active_entry_order` is still `Some` (stale — didn't get cleared on fill), the entry-reject guard `fsm_state != POSITION_OPEN` prevents the entry path. It falls to the exit check. If no exit order, the final `else` branch fires — resetting the slot to **IDLE** and destroying the open position and all trade data.
+
+**Severity: Critical**
+
+### M10: No aggregation of any metric — trade count, success/fail, latency all zero
+
+**File:** entire codebase
+
+The following metrics are completely impossible to obtain from the current system:
+
+| Metric | Why unavailable |
+|--------|----------------|
+| Total trades processed | `trade_seq` declared on `AccountSnapshot` but never incremented anywhere |
+| Succeeded vs failed trades | No aggregation of `KernelDiagnosticCode` outcomes |
+| PnL per individual trade | `slot.realized_pnl` is overwritten on slot reuse — no per-trade persistence |
+| Slippage (fill vs intended price) | Data exists transiently but no computed metric |
+| API calls per minute | No call counters anywhere in the venue adapter |
+| `process_intent` latency | Zero timing instrumentation — no `time.monotonic()` in kernel path |
+| Process memory usage | No memory tracking of any kind |
+| Deduplicated vs fresh event count | Dedup detection exists but is never counted |
+
+The `AccountSnapshot.trade_seq` field (account.py:27) is declared as `trade_seq: int = 0` but **never assigned** — no code path ever sets it above 0. It's a dead field.
+
+**Severity: High**
+
+### M11: Flaw 6 tests pass via metadata passthrough, not via `_legacy_intent` field logic
+
+**File:** `test_flaws.py` Flaw 6 tests
+
+The two Flaw 6 tests verify that `_legacy_intent` preserves `order_type` and `limit_price`. They pass because `_mk_intent(order_type="LIMIT")` puts the value into `intent.metadata`, and `_legacy_intent` copies `intent.metadata` into `legacy.metadata` verbatim. The tests check `legacy.metadata.get("order_type")` which finds the value in the passthrough — **not** because `_legacy_intent` reads `intent.order_type` correctly.
+
+`_legacy_intent` actually reads `getattr(intent, "order_type", "MARKET")` which returns `"MARKET"` (the default, since `_mk_intent` put it in metadata not the field), and sets `legacy.metadata["_order_type"] = "MARKET"`. The assertion passes via the wrong code path. If `_legacy_intent` stopped copying metadata entirely, the tests would still pass as long as `intent.metadata` is passed through.
+
+**Severity: High**
+
+### M12: No retry or fallback for ClickHouse INSERT failures
+
+Evidence across all persistence paths: every `sink(table, row)` call in `pink_clickhouse.py` is unprotected. If ClickHouse is unreachable, slow, or returns an error, the exception propagates unhandled through `persist_step()` → `step()`. No retry, no backoff, no fallback, no queue, no error reporting to `anomaly_events`.
+
+This means a transient ClickHouse outage (common in cloud deployments) crashes the entire policy cycle. The slot state in the Rust kernel may be lost as the exception unwinds.
+
+**Severity: High**
+
+### M13: `AccountSnapshot.trade_seq` declared but never incremented — dead field
+
+**File:** `account.py:27`
+
+```python
+@dataclass
+class AccountSnapshot:
+    ...
+    trade_seq: int = 0
+```
+
+This field is part of the `AccountSnapshot` dataclass. It's initialized to 0 and **never assigned or incremented** anywhere in the entire codebase. Every snapshot from `kernel.snapshot()["account"]` returns `trade_seq: 0`. Despite being a standard field in every persistence row, it's always `0` — making it impossible to order trades chronologically by sequence number from any persisted data.
+
+**Severity: Medium**
+
+### M14: `test_reentry_after_full_close_no_pnl_loss` uses absurdly loose 50% bound
+
+**File:** `test_flaws.py:686-706`
+
+```python
+assert abs(cap_after_second - cap_before) < cap_before * 0.5
+```
+
+Allows a 50% capital deviation (12,500 USDT on 25,000). The actual PnL from the test's tiny trades (~0.02 USDT) is orders of magnitude smaller. A bug that silently leaked 10,000 USDT of PnL would pass this test. The bound provides no meaningful verification.
+
+Also: the test never checks `diagnostic_code` for the warning it claims to test (already documented as I7 weakness).
+
+**Severity: Low**
+
+### M15: `test_reconcile_rejects_position_open_with_zero_size` passes even if reconcile silently ignores bad data
+
+**File:** `test_flaws.py:568-585`
+
+```python
+result = k.reconcile_from_slots([bad_slot])
+slot = k._get_slot(0)
+assert slot.fsm_state != TradeStage.POSITION_OPEN or slot.size > 0
+```
+
+The assertion was true **before** calling reconcile (slot starts IDLE with size=0). The test never checks `result.accepted == False` or verifies the diagnostic code. If `reconcile_from_slots` silently ignores the bad slot and returns `accepted=True`, the test still passes — it only proves the slot wasn't in POSITION_OPEN _after_ reconcile, which was already true.
+
+The same structural weakness exists in `test_reconcile_rejects_idle_with_nonzero_size`.
+
+**Severity: Low**
+
+### M16: No built-in metric for active slot count, event throughput, or memory usage
+
+The following operational metrics cannot be obtained without writing custom code:
+
+- **Active slot count**: `len([s for s in kernel.state.slots if not s.is_free()])` — requires Python access to the `ExecutionKernel` object. No `active_slot_count` property exists.
+- **Total event count**: No counter. The journal tracks individual transitions but there's no `total_events_processed: int` anywhere.
+- **Memory usage**: No `tracemalloc`, no `psutil`, no RSS polling. Nothing.
+- **Runtime uptime**: No `start_time` or `uptime()` method anywhere.
+
+**Severity: Medium**
+
+### M17: M4 duplicate — test_cancel_uses_slot_asset_not_trade_id and test_mock_venue_cancel_event_has_asset never call cancel
+
+**File:** `test_flaws.py` Flaw 9 class
+
+Both tests verify that an entry order's metadata contains an `asset` key. They never call `scenario.cancel()` or `k.process_intent(action=CANCEL)`. Despite their names and class (`TestFlaw9CancelSymbolFallback`), they test **metadata preservation on entry**, not cancel behavior.
+
+**Severity: High**
+
+### M18: `_decision_to_kernel_intent` drops `order_type` and `limit_price` — LIMIT orders unreachable from the runtime
+
+**File:** `pink_direct.py:79-115` (inferred from E2E trace)
+
+The bridge function converts a `Decision` to a `KernelIntent`. It sets `timestamp`, `intent_id`, `trade_id`, `asset`, `side`, `action`, `reference_price`, `target_size`, `leverage`, `exit_leg_ratios`, `reason`, and `metadata`. It does **NOT** set `order_type` or `limit_price` — both default to `"MARKET"` and `0.0`.
+
+Even if the `DecisionEngine` produced a LIMIT decision with a limit price, the runtime has no path to express it. The entire LIMIT-order pipeline is dead code from the runtime — LIMIT orders can only be set via direct `KernelIntent(...)` construction in tests, which is itself broken (M3).
+
+**Severity: High**
+
+---
+
+## Pass 10 Summary
+
+| # | Flaw | Layer | Severity |
+|---|------|-------|----------|
+| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | **Critical** |
+| M2 | CANCEL creates no transition record — invisible in audit log | Rust | **Critical** |
+| M3 | `_mk_intent` drops order_type/limit_price into metadata, not proper field | Test | **High** |
+| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | **High** |
+| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium |
+| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium |
+| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low |
+| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium |
+| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position | Rust | **Critical** |
+| M10 | No aggregation of trade count, success/fail, latency — all zero | All | **High** |
+| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | **High** |
+| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | **High** |
+| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium |
+| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low |
+| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low |
+| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium |
+| M17 | Flaw 9 tests named for cancel but never call cancel | Test | **High** |
+| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | **High** |
+
+### Pass 10 Severity
+
+| Severity | Count |
+|----------|-------|
+| **Critical** | 3 (M1, M2, M9) |
+| **High** | 7 (M3, M4, M10, M11, M12, M17, M18) |
+| Medium | 5 (M5, M6, M8, M13, M16) |
+| Low | 3 (M7, M14, M15) |
+
+### Combined Catalog (All 10 Passes)
+
+| Pass | Focus | Count | Critical | High | Medium | Low | Info |
+|------|-------|-------|----------|------|--------|-----|------|
+| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
+| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
+| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
+| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
+| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
+| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
+| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
+| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
+| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
+| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
+| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
+| **Total** | | **233** | **16** | **66** | **68** | **56** | **27** |