diff --git a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md index 3fa2c5f..9eb8487 100644 --- a/PINK_DITAv2_E2E_TRACE_ANALYSIS.md +++ b/PINK_DITAv2_E2E_TRACE_ANALYSIS.md @@ -4012,3 +4012,298 @@ If an entry order exists (even if fully filled and the slot is in `POSITION_OPEN | K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 | | L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 | | **Total** | | **215** | **13** | **59** | **63** | **53** | **27** | + +--- + +## PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT + +### M1: ENTER transition hardcodes `prev_state = IDLE` — every non-IDLE entry corrupts the audit trail + +**File:** `_rust_kernel/src/lib.rs:1117` + +```rust +let transition = self.transition( + &slot, + TradeStage::IDLE, // HARDCODED — lies about actual prev_state + slot.fsm_state.clone(), + "ENTER_INTENT", +); +``` + +When a slot is entered from `CLOSED` (re-entry) or from any other state that passed the `is_free()` or same-trade bypass, the transition record claims `prev_state = IDLE`. This is **always wrong unless the slot was genuinely IDLE**. Every ENTER transition in the journal for a re-entered slot or a slot coming from CLOSED records an impossible transition (`CLOSED → ORDER_REQUESTED` recorded as `IDLE → ORDER_REQUESTED`). + +This corrupts any downstream FSM analysis, journal audit, or trade-lifecycle reconstruction that relies on accurate `prev_state` values. + +**Severity: Critical** + +### M2: CANCEL intent creates no transition record — invisible in audit log + +**File:** `_rust_kernel/src/lib.rs:1287-1305` + +The CANCEL branch in `process_intent` returns a `KernelResult` with no call to `self.transition()`. Every other intent (ENTER, EXIT, MARK_PRICE, RECONCILE) records a transition. CANCEL operations — including accepted cancels — are **invisible in the transition audit log**. + +Additionally, CANCEL returns `accepted = true` but never mutates the slot's `fsm_state`. The slot stays in whatever state it was in. The caller sees `accepted = true` with no visible effect. + +**Severity: Critical** + +### M3: `_mk_intent` test helper drops `order_type`/`limit_price` into `metadata` instead of setting proper fields + +**File:** `test_flaws.py:43` + +```python +def _mk_intent(action, trade_id="t1", size=0.001, price=100.0, slot_id=0, **kw): + return KernelIntent( + ... + metadata=kw, # order_type="LIMIT" goes into metadata dict, not the dataclass field! + ) +``` + +`KernelIntent` has dedicated fields `order_type: str = "MARKET"` and `limit_price: float = 0.0` (contracts.py:274-275), but `_mk_intent` passes `**kw` as `metadata=kw`. So `_mk_intent(order_type="LIMIT")` produces `intent.order_type == "MARKET"` (the default) while `intent.metadata["order_type"] == "LIMIT"`. + +The Flaw 6 tests in `test_flaws.py` that verify `order_type`/`limit_price` preservation through `_legacy_intent` pass for the **wrong reason** — they check `legacy.metadata.get("order_type")` which finds the value in the passthrough metadata, not because `_legacy_intent` correctly reads `intent.order_type`. If the production code changes and the test helper isn't fixed, the tests silently become false positives. + +**Severity: High** + +### M4: `test_cancel_entry_with_partial_fill` never sends a CANCEL — misnamed vacuous test + +**File:** `test_flaws.py:161-172` + +```python +def test_cancel_entry_with_partial_fill(self): + k = _fresh_kernel(scenario=MockVenueScenario(partial_fill_ratio=0.5)) + k.process_intent(_mk_intent(action=E.ENTER, trade_id="ce4", size=0.002)) + slot_after = k._get_slot(0) + assert slot_after.size > 0, "Should have partial fill" +``` + +Named "Cancel entry with partial fill," belongs to `TestFlaw1EntryCancel` — but **no CANCEL intent is ever sent**. It only verifies that a partial fill occurred. The test is completely vacuous for its stated purpose. + +The same pattern affects Flaw 9 tests — `test_cancel_uses_slot_asset_not_trade_id` and `test_mock_venue_cancel_event_has_asset` both have "cancel" in their names but never call any cancel function. + +**Severity: High** + +### M5: Flaw 7 tests (`test_entry_exit_different_ratios`, `test_per_action_type_ratios`) never send EXIT + +**File:** `test_flaws.py` Flaw 7 test class + +Both tests set `exit_partial_fill_ratio` on the mock venue scenario but only ever process an ENTER intent. The `exit_partial_fill_ratio` is configured but never exercised. The tests verify entry partial fill behavior only — they don't test what their titles and class name claim. + +**Severity: Medium** + +### M6: `test_dedup_window_accepts_many_events` uses wrong constant — actual=256, flaw claims 64-only 70 events sent + +**File:** `test_flaws.py:536-555` + +The Flaw 10 tests reference a 64-event dedup window, but the actual Rust constant is `MAX_SEEN_EVENT_IDS = 256` (lib.rs:8). The test sends 70 events and asserts `>= 70`. Since `70 < 256`, no eviction occurs. The test passes trivially regardless of whether the old-64-bound flaw exists. To meaningfully test eviction, >256 events would be needed. + +Similarly, `test_dedup_eviction_does_not_accept_old_event` sends only 70 events then checks for dedup — with a 256-entry window, the first event is never evicted. The test verifies basic dedup (non-evicted), not eviction behavior. + +**Severity: Medium** + +### M7: `test_outcome_state_matches_actual_slot` is tautological — compares value with itself + +**File:** `test_flaws.py:200-210` + +```python +result = k.process_intent(_mk_intent(action=E.ENTER, trade_id="oc1")) +slot = k._get_slot(0) +assert result.state == slot.fsm_state, +``` + +`result.state` is set from `final_slot.fsm_state` (which comes from `self._get_slot(outcome.slot_id)` inside `process_intent`). The test then calls `k._get_slot(0)` again. Both read from the same Rust backend — they **must** be equal by construction. This test proves nothing; it's a tautology. + +**Severity: Low** + +### M8: ORDER_ACK silent fallthrough when no active order — accepts event with no effect + +**File:** `_rust_kernel/src/lib.rs:1476-1498` + +When `on_venue_event` receives an `ORDER_ACK` for a slot with neither `active_entry_order` nor `active_exit_order` (shouldn't happen normally, but possible after a reconcile or race), the match arm executes **no branch**. The state is unchanged, `diagnostic_code` stays `OK`, and `accepted = true`. The event is silently accepted with no effect — no diagnostic, no warning. + +The same bug exists for `CANCEL_ACK` (line 1545): if no matching active order exists, the event is silently accepted with no state change and `OK` diagnostic. + +**Severity: Medium** + +### M9: ORDER_REJECT on POSITION_OPEN with stale entry order destroys the position + +**File:** `_rust_kernel/src/lib.rs:1499-1530` + +```rust +KernelEventKind::ORDER_REJECT => { + if slot.active_entry_order.is_some() && slot.fsm_state != TradeStage::POSITION_OPEN { + // clear entry, wipe trade data, set IDLE + } else if slot.active_exit_order.is_some() { + // clear exit order only, set POSITION_OPEN + } else { + // no match — reset to IDLE + } +} +``` + +If a slot is in `POSITION_OPEN` (position active) but `active_entry_order` is still `Some` (stale — didn't get cleared on fill), the entry-reject guard `fsm_state != POSITION_OPEN` prevents the entry path. It falls to the exit check. If no exit order, the final `else` branch fires — resetting the slot to **IDLE** and destroying the open position and all trade data. + +**Severity: Critical** + +### M10: No aggregation of any metric — trade count, success/fail, latency all zero + +**File:** entire codebase + +The following metrics are completely impossible to obtain from the current system: + +| Metric | Why unavailable | +|--------|----------------| +| Total trades processed | `trade_seq` declared on `AccountSnapshot` but never incremented anywhere | +| Succeeded vs failed trades | No aggregation of `KernelDiagnosticCode` outcomes | +| PnL per individual trade | `slot.realized_pnl` is overwritten on slot reuse — no per-trade persistence | +| Slippage (fill vs intended price) | Data exists transiently but no computed metric | +| API calls per minute | No call counters anywhere in the venue adapter | +| `process_intent` latency | Zero timing instrumentation — no `time.monotonic()` in kernel path | +| Process memory usage | No memory tracking of any kind | +| Deduplicated vs fresh event count | Dedup detection exists but is never counted | + +The `AccountSnapshot.trade_seq` field (account.py:27) is declared as `trade_seq: int = 0` but **never assigned** — no code path ever sets it above 0. It's a dead field. + +**Severity: High** + +### M11: Flaw 6 tests pass via metadata passthrough, not via `_legacy_intent` field logic + +**File:** `test_flaws.py` Flaw 6 tests + +The two Flaw 6 tests verify that `_legacy_intent` preserves `order_type` and `limit_price`. They pass because `_mk_intent(order_type="LIMIT")` puts the value into `intent.metadata`, and `_legacy_intent` copies `intent.metadata` into `legacy.metadata` verbatim. The tests check `legacy.metadata.get("order_type")` which finds the value in the passthrough — **not** because `_legacy_intent` reads `intent.order_type` correctly. + +`_legacy_intent` actually reads `getattr(intent, "order_type", "MARKET")` which returns `"MARKET"` (the default, since `_mk_intent` put it in metadata not the field), and sets `legacy.metadata["_order_type"] = "MARKET"`. The assertion passes via the wrong code path. If `_legacy_intent` stopped copying metadata entirely, the tests would still pass as long as `intent.metadata` is passed through. + +**Severity: High** + +### M12: No retry or fallback for ClickHouse INSERT failures + +Evidence across all persistence paths: every `sink(table, row)` call in `pink_clickhouse.py` is unprotected. If ClickHouse is unreachable, slow, or returns an error, the exception propagates unhandled through `persist_step()` → `step()`. No retry, no backoff, no fallback, no queue, no error reporting to `anomaly_events`. + +This means a transient ClickHouse outage (common in cloud deployments) crashes the entire policy cycle. The slot state in the Rust kernel may be lost as the exception unwinds. + +**Severity: High** + +### M13: `AccountSnapshot.trade_seq` declared but never incremented — dead field + +**File:** `account.py:27` + +```python +@dataclass +class AccountSnapshot: + ... + trade_seq: int = 0 +``` + +This field is part of the `AccountSnapshot` dataclass. It's initialized to 0 and **never assigned or incremented** anywhere in the entire codebase. Every snapshot from `kernel.snapshot()["account"]` returns `trade_seq: 0`. Despite being a standard field in every persistence row, it's always `0` — making it impossible to order trades chronologically by sequence number from any persisted data. + +**Severity: Medium** + +### M14: `test_reentry_after_full_close_no_pnl_loss` uses absurdly loose 50% bound + +**File:** `test_flaws.py:686-706` + +```python +assert abs(cap_after_second - cap_before) < cap_before * 0.5 +``` + +Allows a 50% capital deviation (12,500 USDT on 25,000). The actual PnL from the test's tiny trades (~0.02 USDT) is orders of magnitude smaller. A bug that silently leaked 10,000 USDT of PnL would pass this test. The bound provides no meaningful verification. + +Also: the test never checks `diagnostic_code` for the warning it claims to test (already documented as I7 weakness). + +**Severity: Low** + +### M15: `test_reconcile_rejects_position_open_with_zero_size` passes even if reconcile silently ignores bad data + +**File:** `test_flaws.py:568-585` + +```python +result = k.reconcile_from_slots([bad_slot]) +slot = k._get_slot(0) +assert slot.fsm_state != TradeStage.POSITION_OPEN or slot.size > 0 +``` + +The assertion was true **before** calling reconcile (slot starts IDLE with size=0). The test never checks `result.accepted == False` or verifies the diagnostic code. If `reconcile_from_slots` silently ignores the bad slot and returns `accepted=True`, the test still passes — it only proves the slot wasn't in POSITION_OPEN _after_ reconcile, which was already true. + +The same structural weakness exists in `test_reconcile_rejects_idle_with_nonzero_size`. + +**Severity: Low** + +### M16: No built-in metric for active slot count, event throughput, or memory usage + +The following operational metrics cannot be obtained without writing custom code: + +- **Active slot count**: `len([s for s in kernel.state.slots if not s.is_free()])` — requires Python access to the `ExecutionKernel` object. No `active_slot_count` property exists. +- **Total event count**: No counter. The journal tracks individual transitions but there's no `total_events_processed: int` anywhere. +- **Memory usage**: No `tracemalloc`, no `psutil`, no RSS polling. Nothing. +- **Runtime uptime**: No `start_time` or `uptime()` method anywhere. + +**Severity: Medium** + +### M17: M4 duplicate — test_cancel_uses_slot_asset_not_trade_id and test_mock_venue_cancel_event_has_asset never call cancel + +**File:** `test_flaws.py` Flaw 9 class + +Both tests verify that an entry order's metadata contains an `asset` key. They never call `scenario.cancel()` or `k.process_intent(action=CANCEL)`. Despite their names and class (`TestFlaw9CancelSymbolFallback`), they test **metadata preservation on entry**, not cancel behavior. + +**Severity: High** + +### M18: `_decision_to_kernel_intent` drops `order_type` and `limit_price` — LIMIT orders unreachable from the runtime + +**File:** `pink_direct.py:79-115` (inferred from E2E trace) + +The bridge function converts a `Decision` to a `KernelIntent`. It sets `timestamp`, `intent_id`, `trade_id`, `asset`, `side`, `action`, `reference_price`, `target_size`, `leverage`, `exit_leg_ratios`, `reason`, and `metadata`. It does **NOT** set `order_type` or `limit_price` — both default to `"MARKET"` and `0.0`. + +Even if the `DecisionEngine` produced a LIMIT decision with a limit price, the runtime has no path to express it. The entire LIMIT-order pipeline is dead code from the runtime — LIMIT orders can only be set via direct `KernelIntent(...)` construction in tests, which is itself broken (M3). + +**Severity: High** + +--- + +## Pass 10 Summary + +| # | Flaw | Layer | Severity | +|---|------|-------|----------| +| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | **Critical** | +| M2 | CANCEL creates no transition record — invisible in audit log | Rust | **Critical** | +| M3 | `_mk_intent` drops order_type/limit_price into metadata, not proper field | Test | **High** | +| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | **High** | +| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium | +| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium | +| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low | +| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium | +| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position | Rust | **Critical** | +| M10 | No aggregation of trade count, success/fail, latency — all zero | All | **High** | +| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | **High** | +| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | **High** | +| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium | +| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low | +| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low | +| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium | +| M17 | Flaw 9 tests named for cancel but never call cancel | Test | **High** | +| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | **High** | + +### Pass 10 Severity + +| Severity | Count | +|----------|-------| +| **Critical** | 3 (M1, M2, M9) | +| **High** | 7 (M3, M4, M10, M11, M12, M17, M18) | +| Medium | 5 (M5, M6, M8, M13, M16) | +| Low | 3 (M7, M14, M15) | + +### Combined Catalog (All 10 Passes) + +| Pass | Focus | Count | Critical | High | Medium | Low | Info | +|------|-------|-------|----------|------|--------|-----|------| +| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 | +| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 | +| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 | +| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 | +| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 | +| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 | +| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 | +| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 | +| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 | +| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 | +| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 | +| **Total** | | **233** | **16** | **66** | **68** | **56** | **27** | diff --git a/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md b/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md index e9866a6..2fb5c66 100644 --- a/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md +++ b/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md @@ -24,7 +24,8 @@ | J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 | | K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 | | L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 | -| **Total** | | **215** | **13** | **59** | **63** | **53** | **27** | +| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 | +| **Total** | | **233** | **16** | **66** | **68** | **56** | **27** | --- @@ -278,6 +279,33 @@ --- +## M-Series: Runtime, Test Bugs, FSM Audit, Persistence, Measurement (Pass 10) + +*Full detail in TRACE doc under "PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT."* + +| # | Flaw | Layer | Severity | +|---|------|-------|----------| +| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | **Critical** | +| M2 | CANCEL creates no transition record — invisible in audit log | Rust | **Critical** | +| M3 | `_mk_intent` drops order_type/limit_price into metadata, not proper field | Test | **High** | +| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | **High** | +| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium | +| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium | +| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low | +| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium | +| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position | Rust | **Critical** | +| M10 | No aggregation of trade count, success/fail, latency — all zero | All | **High** | +| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | **High** | +| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | **High** | +| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium | +| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low | +| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low | +| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium | +| M17 | Flaw 9 tests named for cancel but never call cancel | Test | **High** | +| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | **High** | + +--- + ## H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5) *Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."*