PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
# PINK DITAv2 — Structural Flaw Analysis (CENTRAL)
**Analysis date:** 2026-05-31
2026-06-02 18:04:33 +02:00
**Last updated:** 2026-06-02 (flaw fix pass 4 — W10 closed; 17 total fixed)
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
**Scope:** Full PINK pipeline — all flaws across all modules.
2026-06-02 16:34:58 +02:00
> **Fix notation:** Rows marked **✅ FIXED `<sha>`** are verified-fixed with a test commit on branch `exp/pink-ditav2-sprint0-20260530`.
> Unfixed flaws remain as originally described.
### Fixes applied (2026-06-02)
| Flaw | Commit | What changed |
|------|--------|--------------|
| I1 — apply_fill partial fill accumulation | `c87ca78` | `slot.size += fill_size` (was `=` ); all partial fills accumulate |
| I10 — seen_event_ids lost on restart | `c87ca78` + `3ca154e` | IndexSet dedup in AccountState; KernelFullSnapshot persists per-slot seen_event_ids; startup reconcile wired (I14) |
| I13 — stray event reactivates CLOSED slot | `c87ca78` | `slot.closed` guard returns TERMINAL_STATE before FSM branch |
| I14 — no Zinc restore on startup | `3ca154e` | `__init__` now calls `reconcile_from_slots(zinc_live)` for any non-idle Zinc slots |
| I15 — CANCEL_REJECT leaves slot in EXIT_WORKING | `9a8d1b9` | Clears `active_exit_order` , transitions to `POSITION_OPEN` |
| O1 — `_maybe_close()` silent skip from async context | `338811e` | Routes to thread-pool executor when a running loop is detected |
| O5 — `_run()` no timeout → process hang | `338811e` | `Future.result(timeout=_BACKEND_TIMEOUT_S)` (default 30 s); raises `TimeoutError` |
| O10 — no `close()` on ExecutionKernel | `3ca154e` | `close()` nulls `_backend` to prevent double-free; `__enter__` /`__exit__` added |
| N1 — `with_handle_mut` zero sync (partial) | `c87ca78` | `catch_unwind` at FFI boundary; concurrent-call UB mitigated by Python GIL |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
**Sources:**
- This file (A-series): Detailed writeups for architectural flaws.
- [PINK_DITAv2_E2E_TRACE_ANALYSIS.md ](./PINK_DITAv2_E2E_TRACE_ANALYSIS.md ) (E, F, G-series):
Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans.
Every E, F, G entry below is a summary only — full detail is in the TRACE doc.
---
## Combined Catalog (All Flaws, All Passes)
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural (detailed in this file) | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
2026-06-01 23:11:15 +02:00
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
2026-06-02 00:03:41 +02:00
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
2026-06-02 08:00:50 +02:00
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
2026-06-02 09:27:25 +02:00
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
2026-06-02 11:06:18 +02:00
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
2026-06-02 12:00:22 +02:00
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
2026-06-02 12:54:02 +02:00
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
PINK: E2E trace analysis — Pass 16 error handling/arithmetic/test infra (S1-S16)
Sixteenth pass: realized_pnl/mark_price NaN bypasses <=0 guard (S1 Critical),
MockVenue _exchange_event_queue check-then-act race drops events (S2 Critical),
no test_kernel_fsm.py exists (S3 Critical), generated tests use asyncio.sleep(0.8)
flaky on slow CI (S4 Critical), _rate_limit_retry_after_ms returns 0 on parse
failure instant retry storm (S5 High), venue adapter detects rate limits but
enforces zero backoff (S6 High), capital_epsilon=1e-4 too tight false WARN (S7
High), tests use asyncio.run() leaks tasks on 3.12+ (S8 High), str.replace()
patching silently does nothing (S9 High), WS _consume no per-message timeout (S10
High), _run blocks pool thread with no timeout lock adapter (S11 High).
305 total flaws across 16 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 13:32:53 +02:00
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
2026-06-02 14:10:49 +02:00
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
2026-06-02 14:47:36 +02:00
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
2026-06-02 16:34:58 +02:00
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
2026-06-02 18:04:33 +02:00
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
2026-06-02 18:39:49 +02:00
| Y | Pass 22 (Serde Round-Trip/Mock Fidelity/Protocol) | 14 | 1 | 4 | 5 | 4 | 0 |
| **Total** | | **389** | **43** | **117** | **114** | **64** | **37** |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
---
## T-Series: Threading & Atomicity Flaws
*Full detail in TRACE doc under "Threading & Atomicity" section.*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| T1 | `InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy | Zinc | **Critical** |
| T2 | Thread-unsafe kernel snapshot capture for account | Bridge | **High** |
| T3 | Re-entrant or incorrectly-scoped Rust-kernel handle usage | Bridge | **High** |
| T4 | Consequence: `on_venue_event` PnL settle races | Bridge | **High** |
| T5 | Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls | Zinc | Medium |
| T6 | `_write_region` buffer zero + notify race with concurrent reader | Zinc | Medium |
| T7 | Publication of events in `process_intent` loop not synchronized with persist | Bridge | Medium |
| T8 | `asyncio.run` executor skip in `_run` leads to event-loop stall | Venue | Low |
| T9 | No thread-safe Python↔Rust ownership / lifetime protocol | Bridge | Low |
---
## E-Series: E2E Data-Flow Flaws (Pass 1)
*Full detail in TRACE doc under "Layer 1" through "Layer 9."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| E1 | `step()` calls `pump_venue_events()` every cycle unconditionally | Runtime | **High** |
| E2 | `kernel.snapshot()["account"]` returns a fresh dict, not a live view | Bridge | Low |
| E3 | `_decision_to_kernel_intent` drops `order_type` and `limit_price` | Runtime | **High** |
| E4 | `_exit_intent_from_slot` trusts slot.size but slot may be stale | Runtime | **High** |
| E5 | JSON serialization round-trip loses numeric precision | Bridge | Low |
| E6 | `_RustKernelLib` is a global singleton — shared across all kernels | Bridge | Low |
| E7 | ENTER handler silently allows re-entry with same trade_id | Rust | **High** |
| E8 | EXIT handler uses `initial_size` not current size | Rust | **High** |
| E9 | CANCEL handler returns diagnostic even when nothing happened | Rust | Low |
| E10 | `apply_fill` entry branch double-sets `active_entry_order` | Rust | Low |
| E11 | `_legacy_intent()` is a lossy conversion | Venue | Low |
| E12 | `_events_from_submit()` price fallback chain can lose venue price | Venue | Low |
| E13 | `_backend_snapshot()` timeout returns stale data | Venue | Medium |
| E14 | `_events_from_cancel` uses stale `slot_id` from order metadata | Venue | Low |
| E15 | Submit sets leverage via separate HTTP call | Adapter | Medium |
| E16 | `_format_quantity` /`_format_price` may use zero tick/step | Adapter | Medium |
| E17 | Cancel uses truth-based confirmation — can mask real errors | Adapter | Medium |
| E18 | `on_venue_event` settles PnL incrementally — fees never included | Bridge | Medium |
| E19 | `observe_slots` called with ALL slots, not just changed ones | Bridge | Low |
| E20 | `_capital()` reads live from `AccountProjection` — stale row risk | Persistence | Low |
| E21 | `persist_fill_events()` synthesizes fake Decision/Intent | Persistence | Medium |
| E22 | `_write_trade_exit_leg` capital_before uses arithmetic reconstruction | Persistence | Medium |
| E23 | `_write_trade_event` uses entry_price as exit_price | Persistence | Medium |
| E24 | Mock venue always emits fill on `partial_fill_ratio > 0` | Test | Low |
| E25 | Test scenarios use MARKET-only `_si()` helper — no LIMIT tests | Test | Low |
| E26 | Fresh-kernel reconcile tests create second kernel but share venue | Test | Low |
---
## F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)
*Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| F1 | CANCEL returns "accepted" before cancel happens — stale diagnostic_code | Bridge | Medium |
| F2 | `_last_settled_pnl` reset before `venue.submit()` — transient window | Bridge | Medium |
| F3 | `_first_invalid_intent_field` allows `leverage=0` and `target_size=0` | Bridge | Low |
| F4 | `outcome.emitted_events` only from venue — Rust kernel events dropped | Bridge | Low |
| F5 | `on_venue_event` redundant FFI read of slot already returned by Rust | Bridge | Low |
| F6 | `process_intent` records pre-venue transitions with `event=None` | Bridge | Info |
| F7 | `reconcile_from_slots` writes ALL slots to projection/zinc | Bridge | Low |
| F8 | `HazelcastRowWriter.put()` synchronous, no error handling — crashes intent | Projection | **Medium** |
| F9 | `RealZincPlane.write_slot()` serializes ALL slots, not just changed one | Zinc | Low |
| F10 | `RealZincPlane` zeros buffer before write — concurrent read sees empty | Zinc | Low |
| F11 | `RealZincPlane._write_region` no partial-write recovery | Zinc | Low |
| F12 | `InMemoryZincPlane` intent_region grows without bound | Zinc | Low |
| F13 | `InMemoryZincPlane` uses non-re-entrant `threading.Condition` | Zinc | Low |
| F14 | `KernelSlotView.__setattr__` round-trips unknown fields — silently dropped | Bridge | Low |
| F15 | `on_venue_event` loop stops on first exception — slot left in partial state | Bridge | **High** |
| F16 | `venue.submit()` returning empty events leaves slot in ORDER_REQUESTED | Bridge | Medium |
| F17 | Cancel truth-based confirmation returns REJECTED for already-cancelled orders | Adapter | Medium |
| F18 | Leverage-set and order-submit failures share error handler | Adapter | Low |
| F19 | `_events_from_submit` stale snapshot fallback → wrong fill detection | Venue | Medium |
| F20 | `__del__` frees Rust handle at unpredictable GC time — no explicit close() | Bridge | **Medium** |
| F21 | `DITAv2LauncherBundle.close()` closes venue before kernel is done | Launcher | Low |
| F22 | Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware | Launcher | **Medium** |
| F23 | `VenueEvent.size` = `intent.target_size` not actual fill | Venue | Info |
| F24 | `asyncio.run()` inside async function in test generator | Test | Low |
| F25 | `_build_fresh_kernel_from_slot` leaks old kernel objects per call | Test | Low |
| F26 | `seen_event_ids` not cleared on re-entry — accumulates across trades | Rust | Low |
| F27 | `RealZincControlPlane.read()` parses Zinc region every call — no caching | Control | Low |
| F28 | `_legacy_intent` hardcodes confidence=1.0, bars_held=0 | Venue | Info |
| F29 | `_slot_to_payload` in real_zinc_plane.py is dead code | Zinc | Info |
| F30 | Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py | Zinc | Low |
---
## G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)
*Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType enum | Rust | **Critical** |
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
| G2 | `into_c_string` unwrap() panics on NUL byte in FFI string — ** ✅ FIXED `c87ca78` ** | Rust | **Critical** |
| G3 | EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition — ** ✅ FIXED `c87ca78` ** | Rust | **Critical** |
| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg — ** ✅ FIXED `fb03300` ** | Rust | **Critical** |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
| G5 | `realized_pnl` unbounded f64 — overflows to inf at extreme values | Rust | **High** |
| G6 | `mark_price` produces unbounded unrealized_pnl — no result guard | Rust | **High** |
| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
| G9 | `exchange_order_id` update targets wrong order — exit cancel broken — ** ✅ FIXED `fb03300` ** | Rust | **High** |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
| G12 | No max leverage cap enforced by kernel | Rust | Medium |
| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
| G15 | Zero `__post_init__` validators on all 16 config dataclasses (127 fields) | Config | **High** |
| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
| G17 | String config fields — Zinc region injection risk | Config | Medium |
| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
| G31 | Hazelcast projection never closed | Lifecycle | Medium |
| G32 | _maybe_close() break skips second method | Lifecycle | Low |
| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |
---
## I-Series: Math, Tests, Concurrency, Recovery, Security (Pass 6)
*Full detail in TRACE doc under "PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
2026-06-02 16:34:58 +02:00
| I1 | Entry `apply_fill` multiple partial fills overwrite size instead of accumulating — ** ✅ FIXED `c87ca78` ** | Rust | **Critical** |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
| I2 | Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED | Rust | Medium |
| I3 | entry_price inconsistency — Python falsy vs Rust `<= 0.0` gate | Bridge | Info |
| I4 | Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer | Rust | **High** |
| I5 | MockVenueScenario rejection flags exist but zero tests use them | Test | **High** |
| I6 | No LIMIT order test through full kernel path | Test | **High** |
| I7 | Three weak/vacuous assertions in test_flaws.py | Test | Low |
| I8 | Entry overfill no guard | Rust | Low |
| I9 | No crash durability — slot state pure in-memory until step 7 of process_intent | Bridge | **Critical** |
2026-06-02 16:34:58 +02:00
| I10 | seen_event_ids lost on restart — events double-processed — ** ✅ FIXED `c87ca78` + `3ca154e` ** (IndexSet dedup + snapshot; startup restore wired) | Rust | **Critical** |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
| I11 | No idempotency key sent to BingX — lost response creates duplicate orders | Venue | **High** |
| I12 | No graceful degradation for ANY subsystem | All | **High** |
2026-06-02 16:34:58 +02:00
| I13 | Stray venue event can reactivate CLOSED slot — no guard — ** ✅ FIXED `c87ca78` ** | Rust | **High** |
| I14 | No reconcile_from_slots call on startup — Zinc state never loaded into kernel — ** ✅ FIXED `3ca154e` ** | Restart | **High** |
| I15 | CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING — ** ✅ FIXED `9a8d1b9` ** | Rust | Medium |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
| I16 | Zinc shared memory world-readable/writable by same-machine processes | Zinc | **High** |
| I17 | KernelSlotView unrestricted getattr/setattr — bypasses all FSM guards | Bridge | **High** |
| I18 | sys.path.insert(0) at import time in 3 production files — malicious module loading | Build | **High** |
| I19 | pump_venue_events stale snapshot diff produces phantom position events | Venue | **High** |
| I20 | exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented) | Contracts | Info |
| I21 | RATE_LIMITED code path in both Python and Rust is completely untested | All | Medium |
| I22 | Thread pool max_workers=3 shared across all adapter instances — never shut down | Venue | Medium |
---
## J-Series: Test Infra, Data Feed, Rust Deeper, Env Parsing, Connections (Pass 7)
*Full detail in TRACE doc under "PASS 7 — TEST INFRA, DATA FEED, RUST DEEPER, ENV PARSING, CONNECTIONS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| J1 | `_flatten` submits wrong direction for LONG positions | Test | Medium |
| J2 | `_check_slot_accounting` double-counts unrealized PnL | Test | Medium |
| J3 | `_build_live_snapshot` timestamp is float vs datetime — type crash risk | Data Feed | **High** |
| J4 | `ExecutionKernel.mark_price()` never called — no mark-to-market | Bridge | **High** |
| J5 | All VenueEvent timestamps use local clock, not exchange timestamp | Venue | Medium |
| J6 | No monotonic timestamp verification anywhere | All | Low |
| J7 | `rebuild_indexes()` overwrites duplicate trade_id — last wins | Rust | **High** |
| J8 | `resolve_slot()` falls back to slot 0 — stray event corrupts slot 0 | Rust | **High** |
| J9 | `get_slot_json` /`snapshot_json` return null with no diagnostic | Rust | Medium |
| J10 | Two processes with same DITA_V2_PREFIX corrupt shared Zinc memory | Zinc | **High** |
| J11 | `load_dotenv()` only runs on launcher.py import — ordering dependency | Config | Medium |
| J12 | BINGX_API_KEY passed None with no validation — fails at HTTP time | Config | Medium |
| J13 | API credentials never masked in error messages or tracebacks | Config | **High** |
| J14 | `_env_bool` inconsistent: empty string = False vs unset = default | Config | Low |
| J15 | gen2.py and _gen_test.py write to same output — last writer wins | Test | Medium |
| J16 | Shim test bridge lacks step(), decision_engine — zero runtime fidelity | Test | **High** |
---
## K-Series: Observability, Memory, Time, Dead Code, Module Init (Pass 8)
*Full detail in TRACE doc under "PASS 8 — OBSERVABILITY, MEMORY, TIME, DEAD CODE, MODULE INIT."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| K1 | Zero stdout/stderr — system completely silent | All | **Critical** |
| K2 | No health check, metrics, or monitoring surface | All | **Critical** |
| K3 | Failed trades produce no notification — error in return value only | Bridge | **High** |
| K4 | Exception tracebacks not captured — all except:pass swallow silently | All | **High** |
| K5 | ~85+ Python objects per process_intent — 36 TradeSlot copies via FFI | Bridge | Medium |
| K6 | Circular ref cycle Kernel→StateView→SlotView→Kernel — delays __del__ | Bridge | **High** |
| K7 | MemoryKernelJournal silently drops transitions after 10K rows | Journal | **High** |
| K8 | RealZincPlane._intent_cache unbounded Python list growth | Zinc | **High** |
| K9 | _backend_snapshot timeout uses wall clock — NTP truncates/extends | Venue | **High** |
| K10 | RealZincControlPlane.wait() uses wall clock — no monotonic | Control | Medium |
| K11 | exchange_ts fallback to local time.time() when E missing | Stream | Medium |
| K12 | No monotonic timestamp verification anywhere | All | Medium |
| K13 | ControlPlane.wait()/notify() — zero callers across all impls | Control | Info |
| K14 | AccountProjection.to_account_event() — zero callers | Account | Info |
| K15 | HazelcastProjector entire class dead | Projection | Info |
| K16 | _order_to_payload() dead code | Bridge | Info |
| K17 | MirroredControlPlane entire class dead — never constructed | Control | Info |
| K18 | 12 of 20 TradeStage variants never matched in Rust FSM | Rust | Low |
| K19 | Unused imports in projection.py and hazelcast_projection.py | Projection | Info |
| K20 | sys.path mutation on import — global side effect | Config | Medium |
| K21 | load_dotenv() at module import time — mutates os.environ globally | Config | Medium |
| K22 | ControlPlane protocol not exported in __all__ | Config | Info |
| K23 | KernelSlotView.__getattr__ makes FFI call per attribute access | Bridge | Medium |
---
2026-06-01 23:11:15 +02:00
## L-Series: Contracts, Exchange Events, Network, FFI, Backup Diffs (Pass 9)
*Full detail in TRACE doc under "PASS 9 — CONTRACTS, EXCHANGE EVENTS, NETWORK, FFI, BACKUP DIFFS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| L1 | `KernelOutcome(accepted=True, diag=INVALID_INTENT)` parseable — no invariant check | Bridge | Medium |
| L2 | `VenueEvent.filled_size > size` possible via different source fields | Venue | Medium |
| L3 | `VenueEvent.price=0` reaches kernel — zero-price fill = 100% loss PnL | Venue | **High** |
| L4 | `available_margin` set to cross-wallet balance, not available margin | Stream | **High** |
| L5 | `wallet_balance` defaults to 0 when `"wb"` absent — E-side reconcile always ERROR | Stream | **High** |
| L6 | `_keepalive_loop` no stop mechanism — runs on old key after rotation | Stream | Medium |
| L7 | `event_id` integer 0 → `str(0)` falsy on `or` → random UUID generated | Stream | Medium |
| L8 | Hardcoded VST URLs in test generators — wrong env if LIVE configured | Test | Medium |
| L9 | No proxy support — can't deploy behind corporate proxy | Network | Low |
| L10 | 5-minute DNS cache TTL — stale IPs on infrastructure change | Network | Low |
| L11 | `limit_price` getattr reads dataclass field, not metadata dict | Venue | Low |
| L12 | Backup diff: 14+ critical bugs fixed, 428-line dual-ledger accounting added | Rust | Info |
| L13 | `_build_full_runtime` dead — real pipeline never tested | Test | **High** |
| L14 | `listenKeyExpired` raises RuntimeError instead of clean yield | Stream | Medium |
| L15 | `_delete_listen_key` suppresses all exceptions — leaked server keys | Stream | Low |
| L16 | `venue_order_id` target selection ambiguous when entry order exists | Rust | Medium |
---
2026-06-02 00:03:41 +02:00
## M-Series: Runtime, Test Bugs, FSM Audit, Persistence, Measurement (Pass 10)
*Full detail in TRACE doc under "PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | **Critical** |
| M2 | CANCEL creates no transition record — invisible in audit log | Rust | **Critical** |
| M3 | `_mk_intent` drops order_type/limit_price into metadata, not proper field | Test | **High** |
| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | **High** |
| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium |
| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium |
| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low |
| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium |
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position — ** ✅ FIXED `fb03300` ** | Rust | **Critical** |
2026-06-02 00:03:41 +02:00
| M10 | No aggregation of trade count, success/fail, latency — all zero | All | **High** |
| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | **High** |
| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | **High** |
| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium |
| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low |
| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low |
| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium |
| M17 | Flaw 9 tests named for cancel but never call cancel | Test | **High** |
| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | **High** |
---
2026-06-02 08:00:50 +02:00
## N-Series: Async/Sync Seams, Lock Analysis, Threading (Pass 11)
*Full detail in TRACE doc under "PASS 11 — ASYNC/SYNC SEAMS, LOCK ANALYSIS, THREADING."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
2026-06-02 16:34:58 +02:00
| N1 | Rust kernel `with_handle_mut` zero sync — `&mut` from raw ptr, UB on concurrent FFI — *mitigated by Python GIL (single-threaded caller); catch_unwind added `c87ca78`* | Rust | **Critical** |
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
### Fixes applied (2026-06-02 pass 2)
| Flaw | Commit | What changed |
|------|--------|--------------|
| G2 — `into_c_string` NUL panic | `c87ca78` | NUL bytes stripped/escaped; `CString::new` error path handled |
| G3 — EXIT backward FSM transition | `c87ca78` | EXIT no longer hardcodes `prev_state=POSITION_OPEN` ; uses real slot state |
| G4 — stale `all_legs_done` in exit leg | `fb03300` | Renamed pre-consume var; else-if guard reads fresh `active_leg_index` after consume |
| M9 — ORDER_REJECT nukes POSITION_OPEN | `fb03300` | Spurious reject (no matching order) no longer resets fsm_state; only entry-phase rejects → IDLE |
| G9 — venue_order_id targets wrong order | `fb03300` | Routes by FSM state: exit-phase events update exit order, not stale entry order |
| H6 — unknown enum variant crashes bridge | `fb03300` | `_safe_enum()` helper returns configurable default on unknown variants |
2026-06-02 08:00:50 +02:00
| N2 | `_run()` has two completely different code paths — runtime branch, not design | Venue | **Critical** |
| N3 | `_run()` path B blocks event loop thread for every venue HTTP operation | Venue | **Critical** |
| N4 | `asyncio.run()` called repeatedly — creates/destroys event loops per call | Venue | **Critical** |
| N5 | `_snapshot_ready` cascading re-fetch — N callers produce N overlapping HTTP | Venue | **High** |
| N6 | `BingxUserStream.close()` doesn't cancel pending tasks | Stream | Medium |
| N7 | Live test architecture forces worst-case `_run()` path for every operation | Test | Medium |
| N8 | `subscribe()` reconnect creates new tasks per iteration | Stream | Medium |
| N9 | No `asyncio.all_tasks()` or task accounting — leaked tasks undetectable | All | Low |
| N10 | `_snap_lock` no reader-side protection (informational) | Venue | Info |
2026-06-02 18:04:33 +02:00
### Fixes applied (2026-06-02 pass 3)
| Flaw | Commit | What changed |
|------|--------|--------------|
| V1 — `LauncherBundle.close()` missing `kernel.close()` | `8d9762c` | `self.kernel.close()` wired into bundle teardown; Rust handle deterministically destroyed |
| V2 — `BingxVenueAdapter` no `close()` | `8d9762c` | `close()` added; shuts down class-level `ThreadPoolExecutor` + delegates to `backend.close()` |
| V3 — `seen_event_ids` not cleared on slot reuse | `8d9762c` | `slot.seen_event_ids.clear()` added to ENTER handler in Rust kernel; fill dedup no longer pollutes across trades |
2026-06-02 18:39:49 +02:00
### Fixes applied (2026-06-02 pass 4)
| Flaw | Commit | What changed |
|------|--------|--------------|
| W10 — `BingxHttpError` blindly mapped to "REJECTED" | `e90d542` | `_http_error_status()` helper: 429/5xx/transport → RATE_LIMITED; 4xx → REJECTED |
2026-06-02 08:00:50 +02:00
---
2026-06-02 09:27:25 +02:00
## O-Series: Sync/Async Wider Scope (Launcher, Generators, Streams, FFI, Tests) (Pass 12)
*Full detail in TRACE doc under "PASS 12 — SYNC/ASYNC WIDER SCOPE."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
2026-06-02 16:34:58 +02:00
| O1 | `_maybe_close()` asyncio.run without loop guard — close skipped from async context — ** ✅ FIXED `338811e` ** | Launcher | **High** |
2026-06-02 09:27:25 +02:00
| O2 | `async def connect()` shims call sync venue.connect() without await — blocking | Test | Medium |
| O3 | `_contract_rows(client)` NOT awaited — `_pick_live_symbol` iterates coroutine = crash | Test | **High** |
| O4 | Deprecated `get_event_loop().run_until_complete()` in test file | Test | Medium |
2026-06-02 16:34:58 +02:00
| O5 | `_run()` thread pool .result() no timeout — backend hang freezes process — ** ✅ FIXED `338811e` ** | Venue | **High** |
2026-06-02 09:27:25 +02:00
| O6 | MockVenueAdapter never exercises thread-pool bridge — untested in CI | Venue | Medium |
| O7 | `_keepalive_loop` /`_rotation_sentinel` fire-and-forget — exceptions silently lost | Stream | Low |
| O8 | `KernelSlotView.__getattr__` N FFI calls for N fields — no caching | Bridge | Medium |
| O9 | `DITAv2LauncherBundle` no `__del__` — GC'd bundle leaks resource tree | Launcher | Medium |
2026-06-02 16:34:58 +02:00
| O10 | `ExecutionKernel` no `close()` — Rust handle only freed by unpredictable __del__ — ** ✅ FIXED `3ca154e` ** | Bridge | Medium |
2026-06-02 09:27:25 +02:00
| O11 | `KernelSlotView.__setattr__` triggers 5 persistence side effects — no read-only view | Bridge | Medium |
---
2026-06-02 11:06:18 +02:00
## P-Series: FFI Safety, Dangling Pointers, Coverage Gaps (Pass 13)
*Full detail in TRACE doc under "PASS 13 — FFI BOUNDARY SAFETY, DANGLING POINTERS, COVERAGE GAPS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| P1 | `dita_kernel_destroy` double-free UB — Python doesn't null handle.value | Bridge | **Critical** |
| P2 | `CStr::from_ptr(payload)` without null guard in 3 FFI exports | Rust | **High** |
| P3 | `_check_open_orders` calls `asyncio.run()` from async `_verify` — RuntimeError | Test | **High** |
| P4 | `into_c_string` NUL sanitizer produces invalid JSON — json.loads fails | Rust | Medium |
| P5 | `reconcile_slots_json` /`snapshot_json` return null on failure — no diagnostic | Rust | Medium |
| P6 | `_get_rust()` TOCTOU race — concurrent cargo build corruption | Bridge | **High** |
| P7 | `KernelHandle` no Send/Sync — FFI bypasses Rust ownership rules | Rust | Info |
| P8 | No explicit Rust handle destroy path from bundle.close() | Launcher | Medium |
| P9 | `__del__` accesses module `_RUST` during shutdown — NameError leak | Bridge | Low |
---
2026-06-02 12:00:22 +02:00
## Q-Series: Serde Edges, Backup Diffs, Market Data/Timestamps (Pass 14)
*Full detail in TRACE doc under "PASS 14 — SERDE EDGE CASES, BACKUP DIFFS, MARKET DATA/TIMESTAMPS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| Q1 | `fromisoformat()` can't parse Rust `Z` suffix on Python < 3.11 | Bridge | **High** |
| Q2 | No `#[serde(deny_unknown_fields)]` — misspelled fields silently default | Rust | Medium |
| Q3 | `indexmap` new dependency in current code | Rust | Info |
| Q4 | Backup diff: 6 critical bug fixes between backup and current | Rust | Info |
| Q5 | `MarketSnapshot.timestamp` type inconsistent — float vs datetime in same file | Data Feed | **High** |
| Q6 | `fromisoformat()` Z-suffix fail on all 5 timestamp deserialization sites | Bridge | **High** |
| Q7 | No upper-bound price validation — 1e300 passes all guards | Bridge | Medium |
| Q8 | `_first_invalid_intent_field` does not reject zero/negative price or zero size | Bridge | Low |
| Q9 | Rust/Python clock sources diverge — transition timestamps mixed source | Rust | Low |
| Q10 | `threading.Event.wait()` uses platform-dependent clock — NTP jump risk | Venue | Medium |
| Q11 | Backup had no STALE_STATE_RECONCILING guard in on_venue_event (info) | Rust | Info |
| Q12 | All 5 fromisoformat() sites fail on Python < 3.11 ( duplicate of Q1 ) | Bridge | **High** |
---
2026-06-02 12:54:02 +02:00
## R-Series: Resource Leaks, Trust Boundaries, Security (Pass 15)
*Full detail in TRACE doc under "PASS 15 — RESOURCE LEAKS, TRUST BOUNDARIES, SECURITY."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| R1 | `ThreadPoolExecutor` never shut down — 3 threads leak | Venue | **High** |
| R2 | `BingxVenueAdapter` no `close()` — backend HTTP client unreleasable | Venue | **High** |
| R3 | `real_zinc_plane._intent_cache` grows unboundedly | Plane | **High** |
| R4 | `RealZincPlane` /`ControlPlane` partial-construction SharedRegion leak | Plane | Medium |
| R5 | `BingxUserStream.ClientSession` no `__del__` — connection pool leak | Venue | Medium |
| R6 | `test_alpha_blue_untouched_g7.py` open() without context manager | Test | Low |
| R7 | All exchange REST/WS data parsed without schema validation | Venue | **Critical** |
| R8 | Shared memory JSON deserialization without integrity check | Plane | **High** |
| R9 | `restore_state()` deserializes arbitrary JSON — full kernel takeover | Bridge | **Critical** |
| R10 | `DOLPHIN_BINGX_ENV` + `ALLOW_MAINNET` mainnet switch via env | Config | **High** |
| R11 | `.env` file loaded from project root — secrets exposure | Config | **High** |
| R12 | Unvalidated `int()` on env vars — recv_window, leverage extremes | Config | Medium |
| R13 | `listenKey` from exchange in WS URL f-string — MITM injection | Venue | **High** |
| R14 | `mock_venue._exchange_event_queue` unbounded growth | Test | Low |
---
PINK: E2E trace analysis — Pass 16 error handling/arithmetic/test infra (S1-S16)
Sixteenth pass: realized_pnl/mark_price NaN bypasses <=0 guard (S1 Critical),
MockVenue _exchange_event_queue check-then-act race drops events (S2 Critical),
no test_kernel_fsm.py exists (S3 Critical), generated tests use asyncio.sleep(0.8)
flaky on slow CI (S4 Critical), _rate_limit_retry_after_ms returns 0 on parse
failure instant retry storm (S5 High), venue adapter detects rate limits but
enforces zero backoff (S6 High), capital_epsilon=1e-4 too tight false WARN (S7
High), tests use asyncio.run() leaks tasks on 3.12+ (S8 High), str.replace()
patching silently does nothing (S9 High), WS _consume no per-message timeout (S10
High), _run blocks pool thread with no timeout lock adapter (S11 High).
305 total flaws across 16 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 13:32:53 +02:00
## S-Series: Error Handling, Arithmetic Stability, Test Infrastructure (Pass 16)
*Full detail in TRACE doc under "PASS 16 — ERROR HANDLING, ARITHMETIC STABILITY, TEST INFRASTRUCTURE."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| S1 | `realized_pnl()` /`mark_price()` NaN bypasses `<=0.0` guard — NaN PnL corrupts k_realized_pnl | Rust/Python | **Critical** |
| S2 | MockVenue `_exchange_event_queue` check-then-act race — silently drops events | Test | **Critical** |
| S3 | No `test_kernel_fsm.py` or `test_kernel_fsm_recovery.py` exists | Test | **Critical** |
| S4 | Generated tests use `await asyncio.sleep(0.8)` — flaky false negatives on slow CI | Test | **Critical** |
| S5 | `_rate_limit_retry_after_ms()` returns 0 on parse failure — instant retry storm | Venue | **High** |
| S6 | Venue adapter detects rate limits but enforces zero backoff | Venue | **High** |
| S7 | `capital_epsilon = 1e-4` too tight — false WARN classifications | Accounting | **High** |
| S8 | Generated tests use module-level `asyncio.run()` — leaks tasks on Python 3.12+ | Test | **High** |
| S9 | `str.replace()` patching silently does nothing on format change | Build | **High** |
| S10 | `_consume()` no per-message WS timeout — silent hang blocks forever | Venue | **High** |
| S11 | `_run()` blocks pool thread with no timeout — 3 hung calls lock adapter | Venue | **High** |
| S12 | Rate-limit regex depends on exchange message format — non-portable | Venue | Medium |
| S13 | `_row_float()` silently skips malformed rows, filters zero values | Venue | Medium |
| S14 | Reconnection backoff lacks jitter — thundering herd risk | Venue | Medium |
| S15 | `_venue_event_status_from_row()` falls back to ACKED — masks new rejections | Venue | Medium |
| S16 | `except: pass` in generated test code — swallows KeyboardInterrupt | Test | Medium |
---
2026-06-02 14:10:49 +02:00
## T-Series: Unsafe Review, Dead Code/Backup Debris, Build/Plane Protocols (Pass 17)
*Full detail in TRACE doc under "PASS 17 — UNSAFE REVIEW, DEAD CODE/BACKUP DEBRIS, BUILD/PLANE PROTOCOLS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| T1 | `catch_unwind` + `AssertUnwindSafe` — partially mutated state persists, no rollback | Rust | **High** |
| T2 | Empty backup dir `_backup_20260530_105512/` and stale `tea_debug.log` (0 bytes) | Repo | Low |
| T3 | `HazelcastRowWriter` uses bare `json.dumps(row, default=str)` — Enums/datetimes wrong format | Bridge | **High** |
| T4 | `real_zinc_plane._slot_from_payload()` direct key access `payload["last_event_time"]` — KeyError | Plane | **High** |
| T5 | `_build_pink_bodies.py` `str.index("]")` finds first `]` — corrupts SCENARIOS list | Build | **High** |
| T6 | `VenueAdapter` protocol missing `connect()` /`disconnect()` — AttributeError at runtime | Venue | **High** |
| T7 | Shared memory writes non-atomic — visible-zero window exposes partial state | Plane | **High** |
| T8 | `_slot_from_payload()` reconstructs internal_trade_id from slot trade_id — order data loss | Plane | Medium |
| T9 | `_slot_from_payload()` duplicated in two files — schema drift risk | Plane | Medium |
| T10 | `str.index("finally:")` finds first match — nested try/finally mismatch | Build | Medium |
| T11 | No workspace-root `.gitignore` — __pycache__ , backup dirs, debris untracked | Repo | Low |
| T12 | `projection.py` lazy import failure silently swallowed — writer=None drops all writes | Bridge | Medium |
| T13 | `Codex_CONTEXT_RESTORE__*.txt` AI context files in root — debris | Repo | Low |
| T14 | `_backup_20260530/` is a valid Python package — accidental old-code import risk | Repo | Medium |
---
2026-06-02 14:47:36 +02:00
## U-Series: Rust Test Gaps, Accounting Reconciliation Bugs, FFI Type Mismatches (Pass 18)
*Full detail in TRACE doc under "PASS 18 — RUST TEST GAPS, ACCOUNTING RECONCILIATION BUGS, FFI TYPE MISMATCHES."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| U1 | `order_type` /`limit_price` sent to Rust, no serde fields — silently dropped | FFI | **High** |
| U2 | Rust `VenueEventStatus` expects `"CANCEL_REJECTED"` (typo) — `"CANCELED_REJECTED"` fails deserialization | Rust | **High** |
| U3 | R2 compares cumulative K realized vs single-last-fill E realized — broken after 2nd+ fill | Accounting | **Critical** |
| U4 | R4 compares K open_notional vs E used_margin — fundamentally different quantities | Accounting | **Critical** |
| U5 | R3 skipped when `len(e.positions)==0` — K has positions but E reports none, silent | Accounting | **High** |
| U6 | `on_venue_event` /`apply_fill` no NaN guards on venue event price/size — NaN propagates | Rust | **Critical** |
| U7 | Zero Rust tests for ORDER_REJECT, PARTIAL_FILL, TERMINAL_STATE guard, etc. | Test | **High** |
| U8 | `safe_float()` returns NaN/Inf instead of default — contradicts `_safe()` | Bridge | Medium |
| U9 | `_scan_slots` uses `metadata.get("leverage")` not `slot.leverage` — wrong leverage | Accounting | Medium |
| U10 | Rust injects `"k_net_fees"` key alongside serde's `k_fees_paid` — duplicate | Bridge | Low |
| U11 | 10+ AccountState fields transmitted across FFI but never read by Python | FFI | Low |
| U12 | `_order_from_payload()` overwrites `internal_trade_id` with slot's trade_id | Bridge | Medium |
| U13 | No independent third reference — symmetrical K=E errors invisible | Accounting | Medium |
| U14 | `lot_step` declared in ReconcileConfig but never used — dead field | Accounting | Low |
---
2026-06-02 16:34:58 +02:00
## V-Series: Startup/Shutdown Lifecycle, Rust Kernel Subtleties, Generated Test Infra (Pass 19)
*Full detail in TRACE doc under "PASS 19 — STARTUP/SHUTDOWN LIFECYCLE, RUST KERNEL SUBTLETIES, GENERATED TEST INFRA."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
2026-06-02 18:04:33 +02:00
| V1 | `DITAv2LauncherBundle.close()` never calls `kernel.close()` — Rust handle leaks via `__del__` — ** ✅ FIXED `8d9762c` ** | Launcher | **Critical** |
| V2 | `BingxVenueAdapter` no `close()` /`disconnect()` — ThreadPoolExecutor/HTTP never release — ** ✅ FIXED `8d9762c` ** | Venue | **Critical** |
| V3 | `process_intent` ENTER doesn't clear `seen_event_ids` — old dedup pollutes new trade — ** ✅ FIXED `8d9762c` ** | Rust | **High** |
2026-06-02 16:34:58 +02:00
| V4 | 3 generators write same output file — last writer wins, incompatible prologues | Test | **Critical** |
| V5 | Generated tests triple env-gated — never run in CI, dead code | Test | **Critical** |
| V6 | `kernel.close()` destroys Rust handle immediately — no drain, no flush, UAF risk | Bridge | **Critical** |
| V7 | `_last_settled_pnl` dict accessed from process_intent and on_venue_event without locks | Bridge | Medium |
| V8 | `#[serde(default)] leverage: f64` default 0.0 — mark_price uses directly no .max(1.0) | Rust | Medium |
| V9 | No `conftest.py` , no `pytest.ini` , no `asyncio_mode` — test discovery fragile | Test | **High** |
| V10 | `kernel.close()` `except Exception: pass` — silently swallows destroy errors | Bridge | Low |
| V11 | `build_launcher_bundle()` no cleanup on partial failure — OOM orphans 4 components | Launcher | Medium |
| V12 | `KernelResult` clones entire kernel state every FFI call — wasted allocations | Rust | Medium |
| V13 | `_build_rb()` leaks bundle on post-creation failure | Test | Low |
| V14 | `_maybe_close` breaks after first method — never tries both close and disconnect | Launcher | Low |
---
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
## W-Series: Configuration Management, Math Sign Conventions, BingX Protocol (Pass 20)
*Full detail in TRACE doc under "PASS 20 — CONFIGURATION MANAGEMENT, MATH SIGN CONVENTIONS, BINGX PROTOCOL."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| W1 | `int()` on 3 env vars uncaught `ValueError` — non-numeric input crashes process | Config | **Critical** |
| W2 | `DITA_V2_PREFIX` default `"dita_v2"` — multi-process shared memory corruption | Config | **Critical** |
| W3 | Funding sign opposite Python V2 vs Rust — same raw value opposite capital effect | Accounting | **Critical** |
| W4 | `listenKeyExpired` frames silently swallowed — `continue` skips expiry check, dead code | Venue | **Critical** |
| W5 | `RECV_WINDOW_MS` no upper bound — extreme values enable replay attacks | Config | **High** |
| W6 | `ACTIVE_SLOT_LIMIT` stored but never enforced by Rust kernel — dead config | Config | **High** |
| W7 | No fill history fetched during WS reconnect gap-backfill — fills permanently lost | Venue | **High** |
| W8 | Rate limit detection fails on HTTP 429 without matching message — returns 0 instant retry | Venue | **High** |
| W9 | `CONTROL_PLANE=REAL_ZINC` silently falls back to in-memory — no persistence | Config | **High** |
2026-06-02 18:39:49 +02:00
| W10 | All `BingxHttpError` mapped to "REJECTED" — can't distinguish errors from real rejections — ** ✅ FIXED `e90d542` ** | Venue | **High** |
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
| W11 | `os.environ["KEY"]` bracket access in tests vs `.get()` in launcher — inconsistent | Test | **High** |
| W12 | `MockVenueScenario` no `rate_limit` flag — RATE_LIMITED path untested in CI | Test | Medium |
| W13 | Rate-limit regex uses English phrase `"unblocked after"` — non-portable | Venue | Medium |
| W14 | Invalid `ACTIVE_SLOT_LIMIT` values silently discarded — no log, no warning | Config | Medium |
---
2026-06-02 18:04:33 +02:00
## X-Series: Rust Build/Deps, Python Packaging, Shared Memory Protocol (Pass 21)
*Full detail in TRACE doc under "PASS 21 — RUST BUILD/DEPS, PYTHON PACKAGING, SHARED MEMORY PROTOCOL."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| X1 | No ABI compatibility check on Rust `.so` load — stale binary corrupts silently | Bridge | **Critical** |
| X2 | `real_zinc_plane._write_region()` zeroes entire buffer before write — visible all-zero window | Plane | **Critical** |
| X3 | No `requirements.txt` /`setup.py` /`pyproject.toml` — zero Python dependency declarations | Build | **Critical** |
| X4 | `RealZincControlPlane.update()` no thread lock — concurrent calls corrupt seq and shared memory | Plane | **High** |
| X5 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | **High** |
| X6 | 5 test files use hardcoded `sys.path.insert(0, "/mnt/dolphinng5_predict")` — non-portable | Test | **High** |
| X7 | `_decode_packet()` no try/except on `json.loads` — partial body read crashes reader | Plane | **High** |
| X8 | `ExchangeEvent` /`ExchangeEventKind` not exported from `__init__.py` | Bridge | **High** |
| X9 | No MSRV or `rust-toolchain.toml` — builds differ per Rust version | Rust | Medium |
| X10 | RealZincPlane and RealZincControlPlane collide on `{prefix}_control` region name | Plane | Medium |
| X11 | Sequence number decoded but never read by any consumer — dead data on wire | Plane | Medium |
| X12 | `_maybe_close()` `fut.result(timeout=10.0)` — TimeoutError strands coroutine | Launcher | Medium |
| X13 | `__init__.py` flat re-exports 45 names — naming collision risk | Bridge | Medium |
| X14 | `close()` not idempotent on RealZincPlane/RealZincControlPlane | Plane | Medium |
---
2026-06-02 18:39:49 +02:00
## Y-Series: Serde Round-Trip, Mock Fidelity, Protocol Consistency (Pass 22)
*Full detail in TRACE doc under "PASS 22 — SERDE FIELD-BY-FIELD ROUND-TRIP, MOCK VENUE FIDELITY GAPS, PROTOCOL CONSISTENCY."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| Y1 | `exit_leg_ratios` serde default `[]` vs struct default `vec![1.0]` — wrong ratio on restore | Rust | Medium |
| Y2 | `KernelIntent.slot_id` is `i64` — inconsistent with `TradeSlot` /`KernelTransition` `usize` | Rust | Low |
| Y3 | `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` | Rust | Low |
| Y4 | `AccountState` injects `"k_net_fees"` as duplicate of serde's `k_fees_paid` | Rust | Low |
| Y5 | `asyncio.sleep(0.8)` in ~295 generated test bodies — timing-dependent flaky | Test | **Critical** |
| Y6 | MockVenueAdapter no `rate_limit` flag — RATE_LIMITED path untested | Test | **High** |
| Y7 | MockVenueAdapter `reconcile()` returns `[]` always — late fills untestable | Test | **High** |
| Y8 | MockVenueAdapter emits one fill per submit — multi-partial-fill untestable | Test | **High** |
| Y9 | MockVenueAdapter no `connect()` — runtime error if protocol gains it | Test | **High** |
| Y10 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | Medium |
| Y11 | MockVenueAdapter no `close()` — lifecycle cleanup untestable | Test | Medium |
| Y12 | MockVenueAdapter fills synchronous — masks async timing bugs | Test | Medium |
| Y13 | `IndexSet` JSON array doesn't encode LRU semantics — fragile on manual edit | Rust | Low |
| Y14 | `_slot_from_payload()` duplicated in two files with different behavior | Bridge | Medium |
---
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
## H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)
*Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| H1 | No Python dependency files (requirements.txt, pyproject.toml, etc.) | Build | **Critical** |
| H2 | Rust kernel compiled from source on every cold start — no prebuilt binary | Build | **Critical** |
| H3 | Zero logging — 16+ silent except:pass sites, no error observability | All | **Critical** |
| H4 | `_row_float` rejects zero as valid, `except Exception: continue` swallows all | Venue | **High** |
| H5 | `_backend_snapshot` timeout returns stale data/None — callers crash | Venue | **High** |
PINK: E2E trace analysis — Pass 20 config/math signs/BingX protocol (W1-W14)
Twentieth pass: int() on 3 env vars uncaught ValueError (W1 Critical),
DITA_V2_PREFIX default "dita_v2" multi-process shared memory corruption (W2
Critical), funding sign opposite Python V2 vs Rust same raw value opposite
capital effect (W3 Critical), listenKeyExpired frames silently swallowed
continue skips expiry check dead code (W4 Critical), RECV_WINDOW_MS no upper
bound replay attacks (W5 High), ACTIVE_SLOT_LIMIT stored never enforced by
Rust kernel (W6 High), no fill history fetched during WS reconnect gap-backfill
fills lost (W7 High), rate limit detection fails on HTTP 429 no matching
message instant retry (W8 High), CONTROL_PLANE=REAL_ZINC silently falls back
to in-memory (W9 High), all BingxHttpError mapped to REJECTED can't distinguish
errors (W10 High), os.environ bracket access vs .get() inconsistent (W11 High).
361 total flaws across 20 passes.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 17:13:21 +02:00
| H6 | All enum-from-raw-string sites crash on unknown variant (17 sites) — ** ✅ FIXED `fb03300` ** (Python bridge sites) | Bridge | **High** |
PINK: E2E trace analysis — Pass 8 observability/memory/time/dead code (K1-K23)
Eighth pass: system emits zero stdout/stderr, no health check or metrics (K1/K2 Critical),
failed trades invisible if caller ignores return value (K3), exception tracebacks all
swallowed (K4), circular ref cycle delays Rust handle destruction (K6), MemoryKernelJournal
silent data loss after 10K transitions (K7), RealZincPlane._intent_cache unbounded (K8),
_backend_snapshot timeout uses wall clock (K9), sys.path mutation on import (K20),
load_dotenv at import time (K21), 23 new flaws. 199 total.
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-01 21:07:26 +02:00
| H7 | `_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET | Venue | **High** |
| H8 | Unknown venue status silently mapped to ACKED | Venue | **High** |
| H9 | `RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost | Zinc | **High** |
| H10 | `RealZincControlPlane.read()` no atomicity with concurrent `update()` | Control | **High** |
| H11 | `_RustKernelLib` lazy init with race condition — concurrent cargo build | Bridge | **High** |
| H12 | `ExecutionKernel.__del__` use-after-free on Rust handle | Bridge | **High** |
| H13 | `MirroredControlPlane` missing protocol methods (wait/notify) | Control | Medium |
| H14 | `TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics | Contracts | Medium |
| H15 | `_maybe_close` asyncio.run RuntimeError silently swallowed | Launcher | Medium |
| H16 | Lazy import of bingx_direct masks config errors until first trade | Build | Info |
| H17 | `load_dotenv()` at module level — import-time I/O side effect | Launcher | Medium |
| H18 | `_run()` blocks event loop on every HTTP call via thread pool | Venue | Medium |
| H19 | `HazelcastClientLike` protocol has zero concrete implementations | Projection | Low |
| H20 | `_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM | Zinc | Low |
| H21 | `wasm-bindgen` compiled into native library unnecessarily | Build | Low |
| H22 | `socket.getaddrinfo` monkey-patch in test code | Test | Low |
---
## A-Series: Architectural Flaws (detailed writeups)
*These are the original architectural flaws with full analysis.*
---
### Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size
**Location:** `_rust_kernel/src/lib.rs` lines ~770-780 (EXIT handler in `process_intent` )
**Severity:** **High**
**Nature:** Logic error — wrong base for exit-size computation.
### Downstream effect
The EXIT handler computes the exit size as `base_size * exit_ratio` where:
```rust
let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
```
After partial fills (e.g., two separate MARKET exit legs), `initial_size` is still the
**original** entry size while `slot.size` has been reduced by previous legs. If the
cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the
computed exit size can exceed the remaining position.
The venue adapter clamps to actual position via `reduceOnly` , but the kernel's _own_
accounting reduces `slot.size` by the fill size, not by the intended exit size. The
slot can therefore go negative (`slot.size < 0` ) if the fill is larger than remaining.
### Exact trigger
1. Enter SHORT, size=1.0, `initial_size=1.0` , ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0
2. EXIT leg 0: `exit_size = 1.0 * 0.6 = 0.6` . Fill consumes 0.6. Slot size goes to 0.4.
3. EXIT leg 1: `exit_size = 1.0 * 0.6 = 0.6` . But remaining is 0.4. Requests 0.6.
4. BingX `reduceOnly` clamps fill to 0.4. Slot size goes to 0.0.
5. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0` . Slot is already at 0.0.
Kernel returns `NO_OPEN_POSITION` — the final EXIT is rejected because `slot.closed`
was not set by the previous fill (it was a partial close, not terminal).
6. Slot is at size=0.0, `!slot.closed` , no active orders, but `!slot.is_free()` because
`size <= 0.0` is true but `fsm_state != IDLE/CLOSED` — slot is **stuck** in
`POSITION_OPEN` with zero size.
This is **not** purely a mis-sized ratio problem. With MARKET orders that fill fully,
even correct ratios can leave the slot stuck if the fill price differs from the
intended-size price and the venue adjusts fill quantity.
### Fix strategy
Use `slot.size` directly as the base (not `initial_size` ):
```rust
let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);
```
This guarantees the exit never requests more than the remaining position, regardless
of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.
---
### Flaw A2: Misleading CANCEL diagnostic code on entry-only slots
**Location:** `_rust_kernel/src/lib.rs` lines ~798-810 (CANCEL rejection path)
**Severity:** **Low**
**Nature:** Diagnostic pollution — wrong error code.
### Downstream effect
When a CANCEL intent arrives and **neither** `active_exit_order` nor
`active_entry_order` is cancellable, the kernel returns:
```rust
diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER
```
But the reason may be that there's no active entry order either, or the FSM state
doesn't permit cancellation. The diagnostic name suggests an exit-order-specific
problem when the failure is generic "nothing to cancel."
### Fix
Change to a generic `NO_ACTIVE_ORDER` diagnostic or `SLOT_IDLE` when the slot is
already in IDLE. `NO_ACTIVE_EXIT_ORDER` is misleading for a slot that has never had
any order.
---
### Flaw A3: Float-accumulated slot.size after partial fills can go negative
**Location:** `_rust_kernel/src/lib.rs` lines ~1365-1370 (apply_fill exit path)
**Severity:** **Low**
**Nature:** Numerical precision edge case.
### Code path
```rust
slot.size = (slot.size - fill_size).max(0.0);
```
This clamps to zero, which is correct. But if the venue fills *more* than requested
(on BingX, this can happen with market orders where the fill walks the book), the
slot sees `fill_size > intended_size` . The `max(0.0)` prevents negative, but the
slot then reports `size=0.0` with `!closed` and an FSM state that's not IDLE.
The `is_free()` check requires `size <= 0.0` AND `fsm_state in {IDLE, CLOSED}` . A
slot with `size=0.0` and `fsm_state=POSITION_OPEN` is stuck — no EXIT will be
accepted and no ENTER can start.
### Trigger
Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the
book, overshoots). `fill_size=0.8` , `slot.size = (0.6 - 0.8).max(0.0) = 0.0` .
Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), `closed=false` .
### Fix
When `slot.size <= 1e-12` after a fill and the slot is in an exit-related state,
force transition to CLOSED/IDLE regardless of leg index:
```rust
if slot.size < = 1e-12 {
slot.closed = true;
slot.fsm_state = TradeStage::CLOSED;
slot.active_exit_order = None;
slot.active_entry_order = None;
return;
}
```
---
### Flaw A4: Entry price is clobbered by mark_price if called before fill arrives
**Location:** `_rust_kernel/src/lib.rs` lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)
**Severity:** **Medium**
**Nature:** Accounting accuracy — incorrect PnL base.
### Code path
```rust
// In mark_price:
if self.entry_price < = 0.0 {
self.entry_price = price; // Seeds entry_price from mark before fill
}
// In apply_fill (entry):
if event.price > 0.0 {
slot.entry_price = event.price; // Overwrites with actual fill price
}
```
The `mark_price` path seeds `entry_price` from a market price when the slot has no
fill yet. The `apply_fill` entry path correctly overwrites with the actual fill price.
So in the normal flow this is harmless — the fill overwrites the mark.
**However**, consider this sequence:
1. ENTER intent accepted → slot goes `ORDER_REQUESTED` , `entry_price = 0.0`
2. `runtime.step()` calls `kernel.mark_price(snapshot.symbol, snapshot.price)` → sets `entry_price = 100.0`
3. `on_venue_event(ORDER_ACK)` → `ENTRY_WORKING` , `entry_price` still `100.0`
4. `on_venue_event(PARTIAL_FILL)` → `apply_fill` sets `entry_price = 99.5` (fill price)
5. Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5
This is a transient mis-valuation window. It corrects itself on the next `observe_slots`
call, but intra-step readers see wrong unrealized PnL. Not critical because:
- `account.snapshot.unrealized_pnl` uses the slot's `unrealized_pnl` , not the mark
- Realized PnL is computed from actual fill prices
- The window lasts at most one scan cycle (~5s)
### Fix
Don't set `entry_price` from `mark_price` when there's no fill:
```rust
fn mark_price(& mut self, price: f64) {
if !price.is_finite() || price < = 0.0 { return; }
// Don't seed entry_price — leave it at 0.0 until a fill arrives
if self.entry_price < = 0.0 || self.size < = 0.0 {
self.unrealized_pnl = 0.0;
return;
}
// ... normal PnL computation
}
```
---
### Flaw A5: Capital-before computation is arithmetic not snapshot-based
**Location:** `pink_clickhouse.py` lines ~761-762 (`_write_trade_exit_leg` ) and ~822-823 (`_write_trade_event` )
**Severity:** **High**
**Nature:** Accounting accuracy — wrong capital_before under multi-slot or intervening events.
### Code pattern (appears in two places)
```python
capital_after = self._capital()
capital_before = capital_after - pnl_leg # In _write_trade_exit_leg
capital_before = capital_after - pnl # In _write_trade_event
```
This reconstructs `capital_before` by subtracting the current leg's PnL from the
current capital. This is **only correct** if:
1. No other slots settled PnL between this leg and the previous one
2. No capital corrections (reconcile, manual override) happened between legs
3. No fees were deducted between legs
With multi-slot (PINK configurable `max_slots > 1` ), a concurrent trade on slot 1
that closes between slot 0's exit legs will have its PnL baked into `capital_after` ,
making `capital_before = capital_after - pnl_leg` wrong.
### Fix
Maintain a per-trade `capital_before_leg` snapshot taken at the moment of the
first fill event for each trade, advancing it by the realized PnL of each leg:
```python
self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
self._leg_state[trade_id]["capital_after"] = capital_after
```
And use `prev["capital_before"]` for the row, not `capital_after - pnl_leg` .
---
### Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange
**Location:** `pink_direct.py` lines ~597-630 (`recover_account` ) and docstring of `reconcile_account`
**Severity:** **Medium**
**Nature:** Operational drift — capital is never verified against exchange truth in hot loop.
### The gap
`reconcile_account()` (line 632) has this docstring:
```
Periodic exchange-led account sync.
Capital is re-seeded from the exchange balance as a guard against long-running drift
```
But the actual implementation:
```python
async def reconcile_account(self, ...) -> dict[str, Any]:
return await self.recover_account(...)
async def recover_account(self, ...) -> dict[str, Any]:
capital = float(self.kernel.account.snapshot.capital or 25000.0)
_reconcile_position_slot(self.kernel, capital, slot_id=0)
```
It passes the **kernel's own capital** to `_reconcile_position_slot` , which then
overwrites `kernel.account.snapshot.capital` with... the same value. No exchange
balance poll ever overwrites capital.
`connect()` at line 224 does the same — it passes `initial_capital` (an env default),
not the exchange balance. The exchange balance is never read for capital seeding
in the current code path. `_reconcile_position_slot` does call
`venue.open_positions()` , but it only reads positions, not capital.
### Effect
Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation,
or any other systematic error) accumulates monotonically. There is no mechanism to
detect or correct drift. Over weeks of live trading, the kernel's capital snapshot
can diverge arbitrarily from the exchange's actual balance.
### Fix
Either:
1. Make `_reconcile_position_slot` read the exchange balance and use it for
capital reseeding (the docstring claims it does this already), or
2. Add a separate capital-verification path that surfaces the delta between
kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.
---
### Flaw A7: No fee tracking in kernel accounting
**Location:** `rust_backend.py` lines ~540-545 (on_venue_event settle), `bingx_direct.py` submit_intent return
**Severity:** **Medium**
**Nature:** Accounting accuracy — fees are invisible to capital tracking.
### Downstream effect
When a trade closes, the kernel computes:
```rust
realized_pnl = delta * notional
```
This is **gross** PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%).
These fees are never subtracted from the kernel's realized PnL. Over 100 trades with
$100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000
trades at 10x leverage and $50k average notional, the error is $200k.
The `BingxDirectExecutionAdapter` does return `ExecutionReceipt` with fill data,
but `bingx_venue._events_from_submit()` only reads `price` and `filled_size` —
commission/fee fields are ignored.
### Fix
1. Read fee/commission from the BingX ack payload in `_events_from_submit()`
2. Pass fees through `VenueEvent.metadata["fee"]`
3. In the Rust kernel's `apply_fill` , subtract the fee from realized PnL:
```rust
let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
slot.realized_pnl += realized - fee;
```
---
### Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input
**Location:** `_rust_kernel/src/lib.rs` lines ~745-748
**Severity:** **Low**
**Nature:** Silent fallback — corrupt input produces a trade, not a rejection.
```rust
slot.leverage = if intent.leverage.is_finite() & & intent.leverage > 0.0 {
intent.leverage
} else {
1.0
};
```
A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of
rejecting the intent. The Python bridge does validate `_first_invalid_intent_field()`
which catches NaN/inf, but it doesn't catch `leverage <= 0.0` (it only checks
`not math.isfinite(value)` ).
### Fix
Add `leverage <= 0.0` to the Python bridge's invalid-intent check. The Rust kernel
should still have the `1.0` fallback as a defensive measure, but the bridge should
prevent bad leverages from reaching Rust in the first place.
---
### Flaw A9: Mock venue submit condition convoluted — dead code paths
**Location:** `mock_venue.py` lines ~60-90
**Severity:** **Informational**
**Nature:** Code clarity — confusing condition logic.
```python
if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
events.append(ack_event)
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
# ... fill events
```
The condition logic is confusing:
- When `emit_ack_before_fill=True` and `emit_fill_on_submit=True` : both branches run → ACK + fill
- When `emit_ack_before_fill=False` and `emit_fill_on_submit=True` : first branch runs because
`not True = False` , so `False or False = False` → no ACK. Second branch runs → fill only.
This produces a fill without an ACK, which is **not** a realistic venue scenario.
- When `partial_fill_ratio=1.0` (default): second branch runs and emits a `FULL_FILL` event
even when `emit_fill_on_submit=False` , because `0.0 or 1.0 > 0 = True` .
The partial fill ratio check should be gated on `emit_fill_on_submit` :
```python
should_emit_fill = self.scenario.emit_fill_on_submit or (
is_entry and self.scenario.entry_partial_fill_ratio > 0
) or (
not is_entry and self.scenario.exit_partial_fill_ratio > 0
)
```
---
### Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows
**Location:** `pink_direct.py` lines ~318-374 (`pump_venue_events` ), called at line ~436
**Severity:** **Medium**
**Nature:** Operational overhead — unnecessary exchange HTTP calls.
### The problem
`step()` calls `pump_venue_events()` **every cycle** , which calls `venue.reconcile()` .
For `BingxVenueAdapter` , `reconcile()` calls `_backend_snapshot()` which does up to 5
HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow
where orders fill synchronously within `process_intent()` , there are **no** late fills
to drain.
On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each
`pump_venue_events()` call consumes 5+ of that budget. At a 5-second policy cycle,
this is 60 requests/minute — 60% of the rate budget — just to poll for fills that
don't exist.
### Fix
Gate the pump on whether the previous cycle submitted a LIMIT order:
```python
self._has_resting_order = any(
o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
for o in kernel.open_orders()
)
if self._has_resting_order:
await self.pump_venue_events(snapshot, market_state=market_state)
```
Or add a config flag `async_fill_mode: bool = False` .
---
### Flaw A11: VenueAdapter.submit() blocks the event loop
**Location:** `bingx_venue.py` lines ~225-233 (`_run` )
**Severity:** **Medium**
**Nature:** Runtime safety — synchronous call in async context.
```python
def _run(self, result: Any) -> Any:
if inspect.isawaitable(result):
try:
asyncio.get_running_loop()
except RuntimeError:
return asyncio.run(result)
pool = self._get_executor()
return pool.submit(asyncio.run, result).result()
```
When called from `step()` (which is an async function), `_run` submits the async
`submit_intent()` to a thread pool, runs it with `asyncio.run()` , then calls
`.result()` which blocks the current thread until complete. The BingX HTTP call
can take 1-5 seconds depending on network latency and exchange load.
During this block, the event loop **cannot** process other async tasks (data feed
updates, health checks, signal processing). In a single-runtime deployment, this
stalls the entire policy cycle.
### Fix
Make `process_intent` in `ExecutionKernel` accept an async venue callback, or
make `BingxVenueAdapter` truly async (not sync-with-thread-bridge). For now,
at minimum the PINK runtime should run `step()` in an executor to avoid blocking
the main event loop.
---
### Flaw A12: Stale KernelStateView slot references after reconcile
**Location:** `rust_backend.py` lines ~350-365 (`KernelStateView.refresh` )
**Severity:** **Low**
**Nature:** Stale data — view not rebuilt on reconcile.
```python
class KernelStateView:
def __init__ (self, kernel):
self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
# ...
def refresh(self) -> None:
snapshot = self._kernel._snapshot_backend()
self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
self.client_order_index = dict(snapshot.get("client_order_index", {}))
```
`refresh()` updates the index maps but does **not** recreate `self.slots` . The slot
views in `self.slots` are live proxies (they read through `_get_slot` each time),
so slot data is current. But if `max_slots` changes (it shouldn't, but it's mutable)
or if slots are re-indexed by a reconcile, the view list is wrong.
Not critical because `max_slots` is set at init and never changes, but worth
fixing for robustness.
---
### Flaw A13: `persist_fill_events` uses current price as exit price
**Location:** `pink_clickhouse.py` lines ~408
**Severity:** **Low**
**Nature:** Historical accuracy — logged price may not match fill price.
```python
price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)
```
This correctly reads from the event's price. But `decision.reference_price` at line
417 falls back to this price, which is the fill price. The trade_event row at line
835 uses `exit_price = slot_dict.get("entry_price", ...)` — which is the **entry**
price, not the exit price. The trade_event always shows exit_price == entry_price.
This means `trade_events` in ClickHouse will never show a realistic exit price
for the persisted trade, breaking any PnL reconstruction that relies on
`(exit_price - entry_price) * size * leverage` .
---
### Flaw A14: `_write_position_state` maps active_leg_index to entry_bar
**Location:** `pink_clickhouse.py` line ~673
**Severity:** **Low**
**Nature:** Semantic mismatch — wrong field mapping.
```python
"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
```
`active_leg_index` is the index into the exit-leg-ratios array (which leg is being
exited next). It has nothing to do with how many bars the position has been held.
When a position opens, `active_leg_index` is 0. After the first exit leg, it
advances to 1. Neither value is a bar count.
`entry_bar` should be `bars_held` from the intent/decision, or a computed value
from `entry_time` to now.
---
### Flaw A15: `persist_recovery_state` passes account dict as slot dict
**Location:** `pink_clickhouse.py` lines ~447-460
**Severity:** **Low**
**Nature:** Wrong data — account snapshot used where slot data is expected.
```python
def persist_recovery_state(self, *, snapshot, acc_dict, ...):
slot_dict = acc_dict or {} # ← acc_dict is an account snapshot, not a slot
self._write_position_state(..., slot_dict={}, ...) # ← correctly uses empty dict
self._write_trade_reconstruction(
snapshot,
trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
# acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
)
```
The `trade_id` in the trade_reconstruction row will always be `""` because
`acc_dict` comes from `kernel.snapshot()["account"]` which has keys `capital` ,
`equity` , `realized_pnl` , etc. — not `trade_id` . This means the recovery
`trade_reconstruction` row has no trade_id linkage.