PINK: E2E trace analysis — Pass 10 runtime/test bugs/FSM/persistence/metrics (M1-M18)
Tenth pass: ENTER transition always says prev_state=IDLE (M1 Critical), CANCEL creates no transition record (M2 Critical), ORDER_REJECT on POSITION_OPEN with stale entry order destroys position (M9 Critical), _mk_intent test helper drops order_type/limit_price into metadata not proper field (M3 High), four test/s that claim to test cancel but never cancel (M4, M17), no metric aggregation for trade count/latency/slippage (M10 High), no ClickHouse INSERT retry (M12 High), _decision_to_kernel_intent drops order_type/limit_price making LIMIT orders dead from the runtime (M18 High). 233 total flaws. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
This commit is contained in:
@@ -4012,3 +4012,298 @@ If an entry order exists (even if fully filled and the slot is in `POSITION_OPEN
|
||||
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
|
||||
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
|
||||
| **Total** | | **215** | **13** | **59** | **63** | **53** | **27** |
|
||||
|
||||
---
|
||||
|
||||
## PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT
|
||||
|
||||
### M1: ENTER transition hardcodes `prev_state = IDLE` — every non-IDLE entry corrupts the audit trail
|
||||
|
||||
**File:** `_rust_kernel/src/lib.rs:1117`
|
||||
|
||||
```rust
|
||||
let transition = self.transition(
|
||||
&slot,
|
||||
TradeStage::IDLE, // HARDCODED — lies about actual prev_state
|
||||
slot.fsm_state.clone(),
|
||||
"ENTER_INTENT",
|
||||
);
|
||||
```
|
||||
|
||||
When a slot is entered from `CLOSED` (re-entry) or from any other state that passed the `is_free()` or same-trade bypass, the transition record claims `prev_state = IDLE`. This is **always wrong unless the slot was genuinely IDLE**. Every ENTER transition in the journal for a re-entered slot or a slot coming from CLOSED records an impossible transition (`CLOSED → ORDER_REQUESTED` recorded as `IDLE → ORDER_REQUESTED`).
|
||||
|
||||
This corrupts any downstream FSM analysis, journal audit, or trade-lifecycle reconstruction that relies on accurate `prev_state` values.
|
||||
|
||||
**Severity: Critical**
|
||||
|
||||
### M2: CANCEL intent creates no transition record — invisible in audit log
|
||||
|
||||
**File:** `_rust_kernel/src/lib.rs:1287-1305`
|
||||
|
||||
The CANCEL branch in `process_intent` returns a `KernelResult` with no call to `self.transition()`. Every other intent (ENTER, EXIT, MARK_PRICE, RECONCILE) records a transition. CANCEL operations — including accepted cancels — are **invisible in the transition audit log**.
|
||||
|
||||
Additionally, CANCEL returns `accepted = true` but never mutates the slot's `fsm_state`. The slot stays in whatever state it was in. The caller sees `accepted = true` with no visible effect.
|
||||
|
||||
**Severity: Critical**
|
||||
|
||||
### M3: `_mk_intent` test helper drops `order_type`/`limit_price` into `metadata` instead of setting proper fields
|
||||
|
||||
**File:** `test_flaws.py:43`
|
||||
|
||||
```python
|
||||
def _mk_intent(action, trade_id="t1", size=0.001, price=100.0, slot_id=0, **kw):
|
||||
return KernelIntent(
|
||||
...
|
||||
metadata=kw, # order_type="LIMIT" goes into metadata dict, not the dataclass field!
|
||||
)
|
||||
```
|
||||
|
||||
`KernelIntent` has dedicated fields `order_type: str = "MARKET"` and `limit_price: float = 0.0` (contracts.py:274-275), but `_mk_intent` passes `**kw` as `metadata=kw`. So `_mk_intent(order_type="LIMIT")` produces `intent.order_type == "MARKET"` (the default) while `intent.metadata["order_type"] == "LIMIT"`.
|
||||
|
||||
The Flaw 6 tests in `test_flaws.py` that verify `order_type`/`limit_price` preservation through `_legacy_intent` pass for the **wrong reason** — they check `legacy.metadata.get("order_type")` which finds the value in the passthrough metadata, not because `_legacy_intent` correctly reads `intent.order_type`. If the production code changes and the test helper isn't fixed, the tests silently become false positives.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### M4: `test_cancel_entry_with_partial_fill` never sends a CANCEL — misnamed vacuous test
|
||||
|
||||
**File:** `test_flaws.py:161-172`
|
||||
|
||||
```python
|
||||
def test_cancel_entry_with_partial_fill(self):
|
||||
k = _fresh_kernel(scenario=MockVenueScenario(partial_fill_ratio=0.5))
|
||||
k.process_intent(_mk_intent(action=E.ENTER, trade_id="ce4", size=0.002))
|
||||
slot_after = k._get_slot(0)
|
||||
assert slot_after.size > 0, "Should have partial fill"
|
||||
```
|
||||
|
||||
Named "Cancel entry with partial fill," belongs to `TestFlaw1EntryCancel` — but **no CANCEL intent is ever sent**. It only verifies that a partial fill occurred. The test is completely vacuous for its stated purpose.
|
||||
|
||||
The same pattern affects Flaw 9 tests — `test_cancel_uses_slot_asset_not_trade_id` and `test_mock_venue_cancel_event_has_asset` both have "cancel" in their names but never call any cancel function.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### M5: Flaw 7 tests (`test_entry_exit_different_ratios`, `test_per_action_type_ratios`) never send EXIT
|
||||
|
||||
**File:** `test_flaws.py` Flaw 7 test class
|
||||
|
||||
Both tests set `exit_partial_fill_ratio` on the mock venue scenario but only ever process an ENTER intent. The `exit_partial_fill_ratio` is configured but never exercised. The tests verify entry partial fill behavior only — they don't test what their titles and class name claim.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### M6: `test_dedup_window_accepts_many_events` uses wrong constant — actual=256, flaw claims 64-only 70 events sent
|
||||
|
||||
**File:** `test_flaws.py:536-555`
|
||||
|
||||
The Flaw 10 tests reference a 64-event dedup window, but the actual Rust constant is `MAX_SEEN_EVENT_IDS = 256` (lib.rs:8). The test sends 70 events and asserts `>= 70`. Since `70 < 256`, no eviction occurs. The test passes trivially regardless of whether the old-64-bound flaw exists. To meaningfully test eviction, >256 events would be needed.
|
||||
|
||||
Similarly, `test_dedup_eviction_does_not_accept_old_event` sends only 70 events then checks for dedup — with a 256-entry window, the first event is never evicted. The test verifies basic dedup (non-evicted), not eviction behavior.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### M7: `test_outcome_state_matches_actual_slot` is tautological — compares value with itself
|
||||
|
||||
**File:** `test_flaws.py:200-210`
|
||||
|
||||
```python
|
||||
result = k.process_intent(_mk_intent(action=E.ENTER, trade_id="oc1"))
|
||||
slot = k._get_slot(0)
|
||||
assert result.state == slot.fsm_state,
|
||||
```
|
||||
|
||||
`result.state` is set from `final_slot.fsm_state` (which comes from `self._get_slot(outcome.slot_id)` inside `process_intent`). The test then calls `k._get_slot(0)` again. Both read from the same Rust backend — they **must** be equal by construction. This test proves nothing; it's a tautology.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### M8: ORDER_ACK silent fallthrough when no active order — accepts event with no effect
|
||||
|
||||
**File:** `_rust_kernel/src/lib.rs:1476-1498`
|
||||
|
||||
When `on_venue_event` receives an `ORDER_ACK` for a slot with neither `active_entry_order` nor `active_exit_order` (shouldn't happen normally, but possible after a reconcile or race), the match arm executes **no branch**. The state is unchanged, `diagnostic_code` stays `OK`, and `accepted = true`. The event is silently accepted with no effect — no diagnostic, no warning.
|
||||
|
||||
The same bug exists for `CANCEL_ACK` (line 1545): if no matching active order exists, the event is silently accepted with no state change and `OK` diagnostic.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### M9: ORDER_REJECT on POSITION_OPEN with stale entry order destroys the position
|
||||
|
||||
**File:** `_rust_kernel/src/lib.rs:1499-1530`
|
||||
|
||||
```rust
|
||||
KernelEventKind::ORDER_REJECT => {
|
||||
if slot.active_entry_order.is_some() && slot.fsm_state != TradeStage::POSITION_OPEN {
|
||||
// clear entry, wipe trade data, set IDLE
|
||||
} else if slot.active_exit_order.is_some() {
|
||||
// clear exit order only, set POSITION_OPEN
|
||||
} else {
|
||||
// no match — reset to IDLE
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
If a slot is in `POSITION_OPEN` (position active) but `active_entry_order` is still `Some` (stale — didn't get cleared on fill), the entry-reject guard `fsm_state != POSITION_OPEN` prevents the entry path. It falls to the exit check. If no exit order, the final `else` branch fires — resetting the slot to **IDLE** and destroying the open position and all trade data.
|
||||
|
||||
**Severity: Critical**
|
||||
|
||||
### M10: No aggregation of any metric — trade count, success/fail, latency all zero
|
||||
|
||||
**File:** entire codebase
|
||||
|
||||
The following metrics are completely impossible to obtain from the current system:
|
||||
|
||||
| Metric | Why unavailable |
|
||||
|--------|----------------|
|
||||
| Total trades processed | `trade_seq` declared on `AccountSnapshot` but never incremented anywhere |
|
||||
| Succeeded vs failed trades | No aggregation of `KernelDiagnosticCode` outcomes |
|
||||
| PnL per individual trade | `slot.realized_pnl` is overwritten on slot reuse — no per-trade persistence |
|
||||
| Slippage (fill vs intended price) | Data exists transiently but no computed metric |
|
||||
| API calls per minute | No call counters anywhere in the venue adapter |
|
||||
| `process_intent` latency | Zero timing instrumentation — no `time.monotonic()` in kernel path |
|
||||
| Process memory usage | No memory tracking of any kind |
|
||||
| Deduplicated vs fresh event count | Dedup detection exists but is never counted |
|
||||
|
||||
The `AccountSnapshot.trade_seq` field (account.py:27) is declared as `trade_seq: int = 0` but **never assigned** — no code path ever sets it above 0. It's a dead field.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### M11: Flaw 6 tests pass via metadata passthrough, not via `_legacy_intent` field logic
|
||||
|
||||
**File:** `test_flaws.py` Flaw 6 tests
|
||||
|
||||
The two Flaw 6 tests verify that `_legacy_intent` preserves `order_type` and `limit_price`. They pass because `_mk_intent(order_type="LIMIT")` puts the value into `intent.metadata`, and `_legacy_intent` copies `intent.metadata` into `legacy.metadata` verbatim. The tests check `legacy.metadata.get("order_type")` which finds the value in the passthrough — **not** because `_legacy_intent` reads `intent.order_type` correctly.
|
||||
|
||||
`_legacy_intent` actually reads `getattr(intent, "order_type", "MARKET")` which returns `"MARKET"` (the default, since `_mk_intent` put it in metadata not the field), and sets `legacy.metadata["_order_type"] = "MARKET"`. The assertion passes via the wrong code path. If `_legacy_intent` stopped copying metadata entirely, the tests would still pass as long as `intent.metadata` is passed through.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### M12: No retry or fallback for ClickHouse INSERT failures
|
||||
|
||||
Evidence across all persistence paths: every `sink(table, row)` call in `pink_clickhouse.py` is unprotected. If ClickHouse is unreachable, slow, or returns an error, the exception propagates unhandled through `persist_step()` → `step()`. No retry, no backoff, no fallback, no queue, no error reporting to `anomaly_events`.
|
||||
|
||||
This means a transient ClickHouse outage (common in cloud deployments) crashes the entire policy cycle. The slot state in the Rust kernel may be lost as the exception unwinds.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### M13: `AccountSnapshot.trade_seq` declared but never incremented — dead field
|
||||
|
||||
**File:** `account.py:27`
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class AccountSnapshot:
|
||||
...
|
||||
trade_seq: int = 0
|
||||
```
|
||||
|
||||
This field is part of the `AccountSnapshot` dataclass. It's initialized to 0 and **never assigned or incremented** anywhere in the entire codebase. Every snapshot from `kernel.snapshot()["account"]` returns `trade_seq: 0`. Despite being a standard field in every persistence row, it's always `0` — making it impossible to order trades chronologically by sequence number from any persisted data.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### M14: `test_reentry_after_full_close_no_pnl_loss` uses absurdly loose 50% bound
|
||||
|
||||
**File:** `test_flaws.py:686-706`
|
||||
|
||||
```python
|
||||
assert abs(cap_after_second - cap_before) < cap_before * 0.5
|
||||
```
|
||||
|
||||
Allows a 50% capital deviation (12,500 USDT on 25,000). The actual PnL from the test's tiny trades (~0.02 USDT) is orders of magnitude smaller. A bug that silently leaked 10,000 USDT of PnL would pass this test. The bound provides no meaningful verification.
|
||||
|
||||
Also: the test never checks `diagnostic_code` for the warning it claims to test (already documented as I7 weakness).
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### M15: `test_reconcile_rejects_position_open_with_zero_size` passes even if reconcile silently ignores bad data
|
||||
|
||||
**File:** `test_flaws.py:568-585`
|
||||
|
||||
```python
|
||||
result = k.reconcile_from_slots([bad_slot])
|
||||
slot = k._get_slot(0)
|
||||
assert slot.fsm_state != TradeStage.POSITION_OPEN or slot.size > 0
|
||||
```
|
||||
|
||||
The assertion was true **before** calling reconcile (slot starts IDLE with size=0). The test never checks `result.accepted == False` or verifies the diagnostic code. If `reconcile_from_slots` silently ignores the bad slot and returns `accepted=True`, the test still passes — it only proves the slot wasn't in POSITION_OPEN _after_ reconcile, which was already true.
|
||||
|
||||
The same structural weakness exists in `test_reconcile_rejects_idle_with_nonzero_size`.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### M16: No built-in metric for active slot count, event throughput, or memory usage
|
||||
|
||||
The following operational metrics cannot be obtained without writing custom code:
|
||||
|
||||
- **Active slot count**: `len([s for s in kernel.state.slots if not s.is_free()])` — requires Python access to the `ExecutionKernel` object. No `active_slot_count` property exists.
|
||||
- **Total event count**: No counter. The journal tracks individual transitions but there's no `total_events_processed: int` anywhere.
|
||||
- **Memory usage**: No `tracemalloc`, no `psutil`, no RSS polling. Nothing.
|
||||
- **Runtime uptime**: No `start_time` or `uptime()` method anywhere.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### M17: M4 duplicate — test_cancel_uses_slot_asset_not_trade_id and test_mock_venue_cancel_event_has_asset never call cancel
|
||||
|
||||
**File:** `test_flaws.py` Flaw 9 class
|
||||
|
||||
Both tests verify that an entry order's metadata contains an `asset` key. They never call `scenario.cancel()` or `k.process_intent(action=CANCEL)`. Despite their names and class (`TestFlaw9CancelSymbolFallback`), they test **metadata preservation on entry**, not cancel behavior.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### M18: `_decision_to_kernel_intent` drops `order_type` and `limit_price` — LIMIT orders unreachable from the runtime
|
||||
|
||||
**File:** `pink_direct.py:79-115` (inferred from E2E trace)
|
||||
|
||||
The bridge function converts a `Decision` to a `KernelIntent`. It sets `timestamp`, `intent_id`, `trade_id`, `asset`, `side`, `action`, `reference_price`, `target_size`, `leverage`, `exit_leg_ratios`, `reason`, and `metadata`. It does **NOT** set `order_type` or `limit_price` — both default to `"MARKET"` and `0.0`.
|
||||
|
||||
Even if the `DecisionEngine` produced a LIMIT decision with a limit price, the runtime has no path to express it. The entire LIMIT-order pipeline is dead code from the runtime — LIMIT orders can only be set via direct `KernelIntent(...)` construction in tests, which is itself broken (M3).
|
||||
|
||||
**Severity: High**
|
||||
|
||||
---
|
||||
|
||||
## Pass 10 Summary
|
||||
|
||||
| # | Flaw | Layer | Severity |
|
||||
|---|------|-------|----------|
|
||||
| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | **Critical** |
|
||||
| M2 | CANCEL creates no transition record — invisible in audit log | Rust | **Critical** |
|
||||
| M3 | `_mk_intent` drops order_type/limit_price into metadata, not proper field | Test | **High** |
|
||||
| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | **High** |
|
||||
| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium |
|
||||
| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium |
|
||||
| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low |
|
||||
| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium |
|
||||
| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position | Rust | **Critical** |
|
||||
| M10 | No aggregation of trade count, success/fail, latency — all zero | All | **High** |
|
||||
| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | **High** |
|
||||
| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | **High** |
|
||||
| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium |
|
||||
| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low |
|
||||
| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low |
|
||||
| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium |
|
||||
| M17 | Flaw 9 tests named for cancel but never call cancel | Test | **High** |
|
||||
| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | **High** |
|
||||
|
||||
### Pass 10 Severity
|
||||
|
||||
| Severity | Count |
|
||||
|----------|-------|
|
||||
| **Critical** | 3 (M1, M2, M9) |
|
||||
| **High** | 7 (M3, M4, M10, M11, M12, M17, M18) |
|
||||
| Medium | 5 (M5, M6, M8, M13, M16) |
|
||||
| Low | 3 (M7, M14, M15) |
|
||||
|
||||
### Combined Catalog (All 10 Passes)
|
||||
|
||||
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|
||||
|------|-------|-------|----------|------|--------|-----|------|
|
||||
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
|
||||
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
|
||||
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
|
||||
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
|
||||
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
|
||||
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
|
||||
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
|
||||
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
|
||||
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
|
||||
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
|
||||
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
|
||||
| **Total** | | **233** | **16** | **66** | **68** | **56** | **27** |
|
||||
|
||||
Reference in New Issue
Block a user