Twenty-second pass: asyncio.sleep(0.8) in ~295 generated test bodies flaky (Y5 Critical), MockVenueAdapter no rate_limit flag RATE_LIMITED path untested (Y6 High), reconcile() returns [] always late fills untestable (Y7 High), emits one fill per submit multi-partial-fill untestable (Y8 High), no connect() runtime error if protocol gains it (Y9 High), exit_leg_ratios serde default [] vs struct default vec[1.0] wrong ratio on restore (Y1 Medium), libc dead dep (Y10 Medium), no close() (Y11 Medium), synchronous fills masks timing bugs (Y12 Medium), _slot_from_payload duplicated two files different behavior (Y14 Medium). 389 total flaws across 22 passes. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
1212 lines
66 KiB
Markdown
1212 lines
66 KiB
Markdown
# PINK DITAv2 — Structural Flaw Analysis (CENTRAL)
|
|
|
|
**Analysis date:** 2026-05-31
|
|
**Last updated:** 2026-06-02 (flaw fix pass 4 — W10 closed; 17 total fixed)
|
|
**Scope:** Full PINK pipeline — all flaws across all modules.
|
|
|
|
> **Fix notation:** Rows marked **✅ FIXED `<sha>`** are verified-fixed with a test commit on branch `exp/pink-ditav2-sprint0-20260530`.
|
|
> Unfixed flaws remain as originally described.
|
|
|
|
### Fixes applied (2026-06-02)
|
|
|
|
| Flaw | Commit | What changed |
|
|
|------|--------|--------------|
|
|
| I1 — apply_fill partial fill accumulation | `c87ca78` | `slot.size += fill_size` (was `=`); all partial fills accumulate |
|
|
| I10 — seen_event_ids lost on restart | `c87ca78` + `3ca154e` | IndexSet dedup in AccountState; KernelFullSnapshot persists per-slot seen_event_ids; startup reconcile wired (I14) |
|
|
| I13 — stray event reactivates CLOSED slot | `c87ca78` | `slot.closed` guard returns TERMINAL_STATE before FSM branch |
|
|
| I14 — no Zinc restore on startup | `3ca154e` | `__init__` now calls `reconcile_from_slots(zinc_live)` for any non-idle Zinc slots |
|
|
| I15 — CANCEL_REJECT leaves slot in EXIT_WORKING | `9a8d1b9` | Clears `active_exit_order`, transitions to `POSITION_OPEN` |
|
|
| O1 — `_maybe_close()` silent skip from async context | `338811e` | Routes to thread-pool executor when a running loop is detected |
|
|
| O5 — `_run()` no timeout → process hang | `338811e` | `Future.result(timeout=_BACKEND_TIMEOUT_S)` (default 30 s); raises `TimeoutError` |
|
|
| O10 — no `close()` on ExecutionKernel | `3ca154e` | `close()` nulls `_backend` to prevent double-free; `__enter__`/`__exit__` added |
|
|
| N1 — `with_handle_mut` zero sync (partial) | `c87ca78` | `catch_unwind` at FFI boundary; concurrent-call UB mitigated by Python GIL |
|
|
|
|
**Sources:**
|
|
- This file (A-series): Detailed writeups for architectural flaws.
|
|
- [PINK_DITAv2_E2E_TRACE_ANALYSIS.md](./PINK_DITAv2_E2E_TRACE_ANALYSIS.md) (E, F, G-series):
|
|
Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans.
|
|
Every E, F, G entry below is a summary only — full detail is in the TRACE doc.
|
|
|
|
---
|
|
|
|
## Combined Catalog (All Flaws, All Passes)
|
|
|
|
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|
|
|------|-------|-------|----------|------|--------|-----|------|
|
|
| A | Architectural (detailed in this file) | 15 | 0 | 2 | 0 | 2 | 11 |
|
|
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
|
|
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
|
|
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
|
|
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
|
|
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
|
|
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
|
|
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
|
|
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
|
|
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
|
|
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
|
|
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
|
|
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
|
|
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
|
|
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
|
|
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
|
|
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
|
|
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
|
|
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
|
|
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
|
|
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
|
|
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
|
|
| Y | Pass 22 (Serde Round-Trip/Mock Fidelity/Protocol) | 14 | 1 | 4 | 5 | 4 | 0 |
|
|
| **Total** | | **389** | **43** | **117** | **114** | **64** | **37** |
|
|
|
|
---
|
|
|
|
## T-Series: Threading & Atomicity Flaws
|
|
|
|
*Full detail in TRACE doc under "Threading & Atomicity" section.*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| T1 | `InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy | Zinc | **Critical** |
|
|
| T2 | Thread-unsafe kernel snapshot capture for account | Bridge | **High** |
|
|
| T3 | Re-entrant or incorrectly-scoped Rust-kernel handle usage | Bridge | **High** |
|
|
| T4 | Consequence: `on_venue_event` PnL settle races | Bridge | **High** |
|
|
| T5 | Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls | Zinc | Medium |
|
|
| T6 | `_write_region` buffer zero + notify race with concurrent reader | Zinc | Medium |
|
|
| T7 | Publication of events in `process_intent` loop not synchronized with persist | Bridge | Medium |
|
|
| T8 | `asyncio.run` executor skip in `_run` leads to event-loop stall | Venue | Low |
|
|
| T9 | No thread-safe Python↔Rust ownership / lifetime protocol | Bridge | Low |
|
|
|
|
---
|
|
|
|
## E-Series: E2E Data-Flow Flaws (Pass 1)
|
|
|
|
*Full detail in TRACE doc under "Layer 1" through "Layer 9."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| E1 | `step()` calls `pump_venue_events()` every cycle unconditionally | Runtime | **High** |
|
|
| E2 | `kernel.snapshot()["account"]` returns a fresh dict, not a live view | Bridge | Low |
|
|
| E3 | `_decision_to_kernel_intent` drops `order_type` and `limit_price` | Runtime | **High** |
|
|
| E4 | `_exit_intent_from_slot` trusts slot.size but slot may be stale | Runtime | **High** |
|
|
| E5 | JSON serialization round-trip loses numeric precision | Bridge | Low |
|
|
| E6 | `_RustKernelLib` is a global singleton — shared across all kernels | Bridge | Low |
|
|
| E7 | ENTER handler silently allows re-entry with same trade_id | Rust | **High** |
|
|
| E8 | EXIT handler uses `initial_size` not current size | Rust | **High** |
|
|
| E9 | CANCEL handler returns diagnostic even when nothing happened | Rust | Low |
|
|
| E10 | `apply_fill` entry branch double-sets `active_entry_order` | Rust | Low |
|
|
| E11 | `_legacy_intent()` is a lossy conversion | Venue | Low |
|
|
| E12 | `_events_from_submit()` price fallback chain can lose venue price | Venue | Low |
|
|
| E13 | `_backend_snapshot()` timeout returns stale data | Venue | Medium |
|
|
| E14 | `_events_from_cancel` uses stale `slot_id` from order metadata | Venue | Low |
|
|
| E15 | Submit sets leverage via separate HTTP call | Adapter | Medium |
|
|
| E16 | `_format_quantity`/`_format_price` may use zero tick/step | Adapter | Medium |
|
|
| E17 | Cancel uses truth-based confirmation — can mask real errors | Adapter | Medium |
|
|
| E18 | `on_venue_event` settles PnL incrementally — fees never included | Bridge | Medium |
|
|
| E19 | `observe_slots` called with ALL slots, not just changed ones | Bridge | Low |
|
|
| E20 | `_capital()` reads live from `AccountProjection` — stale row risk | Persistence | Low |
|
|
| E21 | `persist_fill_events()` synthesizes fake Decision/Intent | Persistence | Medium |
|
|
| E22 | `_write_trade_exit_leg` capital_before uses arithmetic reconstruction | Persistence | Medium |
|
|
| E23 | `_write_trade_event` uses entry_price as exit_price | Persistence | Medium |
|
|
| E24 | Mock venue always emits fill on `partial_fill_ratio > 0` | Test | Low |
|
|
| E25 | Test scenarios use MARKET-only `_si()` helper — no LIMIT tests | Test | Low |
|
|
| E26 | Fresh-kernel reconcile tests create second kernel but share venue | Test | Low |
|
|
|
|
---
|
|
|
|
## F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)
|
|
|
|
*Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| F1 | CANCEL returns "accepted" before cancel happens — stale diagnostic_code | Bridge | Medium |
|
|
| F2 | `_last_settled_pnl` reset before `venue.submit()` — transient window | Bridge | Medium |
|
|
| F3 | `_first_invalid_intent_field` allows `leverage=0` and `target_size=0` | Bridge | Low |
|
|
| F4 | `outcome.emitted_events` only from venue — Rust kernel events dropped | Bridge | Low |
|
|
| F5 | `on_venue_event` redundant FFI read of slot already returned by Rust | Bridge | Low |
|
|
| F6 | `process_intent` records pre-venue transitions with `event=None` | Bridge | Info |
|
|
| F7 | `reconcile_from_slots` writes ALL slots to projection/zinc | Bridge | Low |
|
|
| F8 | `HazelcastRowWriter.put()` synchronous, no error handling — crashes intent | Projection | **Medium** |
|
|
| F9 | `RealZincPlane.write_slot()` serializes ALL slots, not just changed one | Zinc | Low |
|
|
| F10 | `RealZincPlane` zeros buffer before write — concurrent read sees empty | Zinc | Low |
|
|
| F11 | `RealZincPlane._write_region` no partial-write recovery | Zinc | Low |
|
|
| F12 | `InMemoryZincPlane` intent_region grows without bound | Zinc | Low |
|
|
| F13 | `InMemoryZincPlane` uses non-re-entrant `threading.Condition` | Zinc | Low |
|
|
| F14 | `KernelSlotView.__setattr__` round-trips unknown fields — silently dropped | Bridge | Low |
|
|
| F15 | `on_venue_event` loop stops on first exception — slot left in partial state | Bridge | **High** |
|
|
| F16 | `venue.submit()` returning empty events leaves slot in ORDER_REQUESTED | Bridge | Medium |
|
|
| F17 | Cancel truth-based confirmation returns REJECTED for already-cancelled orders | Adapter | Medium |
|
|
| F18 | Leverage-set and order-submit failures share error handler | Adapter | Low |
|
|
| F19 | `_events_from_submit` stale snapshot fallback → wrong fill detection | Venue | Medium |
|
|
| F20 | `__del__` frees Rust handle at unpredictable GC time — no explicit close() | Bridge | **Medium** |
|
|
| F21 | `DITAv2LauncherBundle.close()` closes venue before kernel is done | Launcher | Low |
|
|
| F22 | Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware | Launcher | **Medium** |
|
|
| F23 | `VenueEvent.size` = `intent.target_size` not actual fill | Venue | Info |
|
|
| F24 | `asyncio.run()` inside async function in test generator | Test | Low |
|
|
| F25 | `_build_fresh_kernel_from_slot` leaks old kernel objects per call | Test | Low |
|
|
| F26 | `seen_event_ids` not cleared on re-entry — accumulates across trades | Rust | Low |
|
|
| F27 | `RealZincControlPlane.read()` parses Zinc region every call — no caching | Control | Low |
|
|
| F28 | `_legacy_intent` hardcodes confidence=1.0, bars_held=0 | Venue | Info |
|
|
| F29 | `_slot_to_payload` in real_zinc_plane.py is dead code | Zinc | Info |
|
|
| F30 | Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py | Zinc | Low |
|
|
|
|
---
|
|
|
|
## G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)
|
|
|
|
*Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType enum | Rust | **Critical** |
|
|
| G2 | `into_c_string` unwrap() panics on NUL byte in FFI string — **✅ FIXED `c87ca78`** | Rust | **Critical** |
|
|
| G3 | EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition — **✅ FIXED `c87ca78`** | Rust | **Critical** |
|
|
| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg — **✅ FIXED `fb03300`** | Rust | **Critical** |
|
|
| G5 | `realized_pnl` unbounded f64 — overflows to inf at extreme values | Rust | **High** |
|
|
| G6 | `mark_price` produces unbounded unrealized_pnl — no result guard | Rust | **High** |
|
|
| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
|
|
| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
|
|
| G9 | `exchange_order_id` update targets wrong order — exit cancel broken — **✅ FIXED `fb03300`** | Rust | **High** |
|
|
| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
|
|
| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
|
|
| G12 | No max leverage cap enforced by kernel | Rust | Medium |
|
|
| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
|
|
| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
|
|
| G15 | Zero `__post_init__` validators on all 16 config dataclasses (127 fields) | Config | **High** |
|
|
| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
|
|
| G17 | String config fields — Zinc region injection risk | Config | Medium |
|
|
| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
|
|
| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
|
|
| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
|
|
| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
|
|
| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
|
|
| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
|
|
| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
|
|
| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
|
|
| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
|
|
| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
|
|
| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
|
|
| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
|
|
| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
|
|
| G31 | Hazelcast projection never closed | Lifecycle | Medium |
|
|
| G32 | _maybe_close() break skips second method | Lifecycle | Low |
|
|
| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
|
|
| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
|
|
| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
|
|
| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |
|
|
|
|
---
|
|
|
|
## I-Series: Math, Tests, Concurrency, Recovery, Security (Pass 6)
|
|
|
|
*Full detail in TRACE doc under "PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| I1 | Entry `apply_fill` multiple partial fills overwrite size instead of accumulating — **✅ FIXED `c87ca78`** | Rust | **Critical** |
|
|
| I2 | Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED | Rust | Medium |
|
|
| I3 | entry_price inconsistency — Python falsy vs Rust `<= 0.0` gate | Bridge | Info |
|
|
| I4 | Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer | Rust | **High** |
|
|
| I5 | MockVenueScenario rejection flags exist but zero tests use them | Test | **High** |
|
|
| I6 | No LIMIT order test through full kernel path | Test | **High** |
|
|
| I7 | Three weak/vacuous assertions in test_flaws.py | Test | Low |
|
|
| I8 | Entry overfill no guard | Rust | Low |
|
|
| I9 | No crash durability — slot state pure in-memory until step 7 of process_intent | Bridge | **Critical** |
|
|
| I10 | seen_event_ids lost on restart — events double-processed — **✅ FIXED `c87ca78` + `3ca154e`** (IndexSet dedup + snapshot; startup restore wired) | Rust | **Critical** |
|
|
| I11 | No idempotency key sent to BingX — lost response creates duplicate orders | Venue | **High** |
|
|
| I12 | No graceful degradation for ANY subsystem | All | **High** |
|
|
| I13 | Stray venue event can reactivate CLOSED slot — no guard — **✅ FIXED `c87ca78`** | Rust | **High** |
|
|
| I14 | No reconcile_from_slots call on startup — Zinc state never loaded into kernel — **✅ FIXED `3ca154e`** | Restart | **High** |
|
|
| I15 | CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING — **✅ FIXED `9a8d1b9`** | Rust | Medium |
|
|
| I16 | Zinc shared memory world-readable/writable by same-machine processes | Zinc | **High** |
|
|
| I17 | KernelSlotView unrestricted getattr/setattr — bypasses all FSM guards | Bridge | **High** |
|
|
| I18 | sys.path.insert(0) at import time in 3 production files — malicious module loading | Build | **High** |
|
|
| I19 | pump_venue_events stale snapshot diff produces phantom position events | Venue | **High** |
|
|
| I20 | exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented) | Contracts | Info |
|
|
| I21 | RATE_LIMITED code path in both Python and Rust is completely untested | All | Medium |
|
|
| I22 | Thread pool max_workers=3 shared across all adapter instances — never shut down | Venue | Medium |
|
|
|
|
---
|
|
|
|
## J-Series: Test Infra, Data Feed, Rust Deeper, Env Parsing, Connections (Pass 7)
|
|
|
|
*Full detail in TRACE doc under "PASS 7 — TEST INFRA, DATA FEED, RUST DEEPER, ENV PARSING, CONNECTIONS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| J1 | `_flatten` submits wrong direction for LONG positions | Test | Medium |
|
|
| J2 | `_check_slot_accounting` double-counts unrealized PnL | Test | Medium |
|
|
| J3 | `_build_live_snapshot` timestamp is float vs datetime — type crash risk | Data Feed | **High** |
|
|
| J4 | `ExecutionKernel.mark_price()` never called — no mark-to-market | Bridge | **High** |
|
|
| J5 | All VenueEvent timestamps use local clock, not exchange timestamp | Venue | Medium |
|
|
| J6 | No monotonic timestamp verification anywhere | All | Low |
|
|
| J7 | `rebuild_indexes()` overwrites duplicate trade_id — last wins | Rust | **High** |
|
|
| J8 | `resolve_slot()` falls back to slot 0 — stray event corrupts slot 0 | Rust | **High** |
|
|
| J9 | `get_slot_json`/`snapshot_json` return null with no diagnostic | Rust | Medium |
|
|
| J10 | Two processes with same DITA_V2_PREFIX corrupt shared Zinc memory | Zinc | **High** |
|
|
| J11 | `load_dotenv()` only runs on launcher.py import — ordering dependency | Config | Medium |
|
|
| J12 | BINGX_API_KEY passed None with no validation — fails at HTTP time | Config | Medium |
|
|
| J13 | API credentials never masked in error messages or tracebacks | Config | **High** |
|
|
| J14 | `_env_bool` inconsistent: empty string = False vs unset = default | Config | Low |
|
|
| J15 | gen2.py and _gen_test.py write to same output — last writer wins | Test | Medium |
|
|
| J16 | Shim test bridge lacks step(), decision_engine — zero runtime fidelity | Test | **High** |
|
|
|
|
---
|
|
|
|
## K-Series: Observability, Memory, Time, Dead Code, Module Init (Pass 8)
|
|
|
|
*Full detail in TRACE doc under "PASS 8 — OBSERVABILITY, MEMORY, TIME, DEAD CODE, MODULE INIT."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| K1 | Zero stdout/stderr — system completely silent | All | **Critical** |
|
|
| K2 | No health check, metrics, or monitoring surface | All | **Critical** |
|
|
| K3 | Failed trades produce no notification — error in return value only | Bridge | **High** |
|
|
| K4 | Exception tracebacks not captured — all except:pass swallow silently | All | **High** |
|
|
| K5 | ~85+ Python objects per process_intent — 36 TradeSlot copies via FFI | Bridge | Medium |
|
|
| K6 | Circular ref cycle Kernel→StateView→SlotView→Kernel — delays __del__ | Bridge | **High** |
|
|
| K7 | MemoryKernelJournal silently drops transitions after 10K rows | Journal | **High** |
|
|
| K8 | RealZincPlane._intent_cache unbounded Python list growth | Zinc | **High** |
|
|
| K9 | _backend_snapshot timeout uses wall clock — NTP truncates/extends | Venue | **High** |
|
|
| K10 | RealZincControlPlane.wait() uses wall clock — no monotonic | Control | Medium |
|
|
| K11 | exchange_ts fallback to local time.time() when E missing | Stream | Medium |
|
|
| K12 | No monotonic timestamp verification anywhere | All | Medium |
|
|
| K13 | ControlPlane.wait()/notify() — zero callers across all impls | Control | Info |
|
|
| K14 | AccountProjection.to_account_event() — zero callers | Account | Info |
|
|
| K15 | HazelcastProjector entire class dead | Projection | Info |
|
|
| K16 | _order_to_payload() dead code | Bridge | Info |
|
|
| K17 | MirroredControlPlane entire class dead — never constructed | Control | Info |
|
|
| K18 | 12 of 20 TradeStage variants never matched in Rust FSM | Rust | Low |
|
|
| K19 | Unused imports in projection.py and hazelcast_projection.py | Projection | Info |
|
|
| K20 | sys.path mutation on import — global side effect | Config | Medium |
|
|
| K21 | load_dotenv() at module import time — mutates os.environ globally | Config | Medium |
|
|
| K22 | ControlPlane protocol not exported in __all__ | Config | Info |
|
|
| K23 | KernelSlotView.__getattr__ makes FFI call per attribute access | Bridge | Medium |
|
|
|
|
---
|
|
|
|
## L-Series: Contracts, Exchange Events, Network, FFI, Backup Diffs (Pass 9)
|
|
|
|
*Full detail in TRACE doc under "PASS 9 — CONTRACTS, EXCHANGE EVENTS, NETWORK, FFI, BACKUP DIFFS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| L1 | `KernelOutcome(accepted=True, diag=INVALID_INTENT)` parseable — no invariant check | Bridge | Medium |
|
|
| L2 | `VenueEvent.filled_size > size` possible via different source fields | Venue | Medium |
|
|
| L3 | `VenueEvent.price=0` reaches kernel — zero-price fill = 100% loss PnL | Venue | **High** |
|
|
| L4 | `available_margin` set to cross-wallet balance, not available margin | Stream | **High** |
|
|
| L5 | `wallet_balance` defaults to 0 when `"wb"` absent — E-side reconcile always ERROR | Stream | **High** |
|
|
| L6 | `_keepalive_loop` no stop mechanism — runs on old key after rotation | Stream | Medium |
|
|
| L7 | `event_id` integer 0 → `str(0)` falsy on `or` → random UUID generated | Stream | Medium |
|
|
| L8 | Hardcoded VST URLs in test generators — wrong env if LIVE configured | Test | Medium |
|
|
| L9 | No proxy support — can't deploy behind corporate proxy | Network | Low |
|
|
| L10 | 5-minute DNS cache TTL — stale IPs on infrastructure change | Network | Low |
|
|
| L11 | `limit_price` getattr reads dataclass field, not metadata dict | Venue | Low |
|
|
| L12 | Backup diff: 14+ critical bugs fixed, 428-line dual-ledger accounting added | Rust | Info |
|
|
| L13 | `_build_full_runtime` dead — real pipeline never tested | Test | **High** |
|
|
| L14 | `listenKeyExpired` raises RuntimeError instead of clean yield | Stream | Medium |
|
|
| L15 | `_delete_listen_key` suppresses all exceptions — leaked server keys | Stream | Low |
|
|
| L16 | `venue_order_id` target selection ambiguous when entry order exists | Rust | Medium |
|
|
|
|
---
|
|
|
|
## M-Series: Runtime, Test Bugs, FSM Audit, Persistence, Measurement (Pass 10)
|
|
|
|
*Full detail in TRACE doc under "PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | **Critical** |
|
|
| M2 | CANCEL creates no transition record — invisible in audit log | Rust | **Critical** |
|
|
| M3 | `_mk_intent` drops order_type/limit_price into metadata, not proper field | Test | **High** |
|
|
| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | **High** |
|
|
| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium |
|
|
| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium |
|
|
| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low |
|
|
| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium |
|
|
| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position — **✅ FIXED `fb03300`** | Rust | **Critical** |
|
|
| M10 | No aggregation of trade count, success/fail, latency — all zero | All | **High** |
|
|
| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | **High** |
|
|
| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | **High** |
|
|
| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium |
|
|
| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low |
|
|
| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low |
|
|
| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium |
|
|
| M17 | Flaw 9 tests named for cancel but never call cancel | Test | **High** |
|
|
| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | **High** |
|
|
|
|
---
|
|
|
|
## N-Series: Async/Sync Seams, Lock Analysis, Threading (Pass 11)
|
|
|
|
*Full detail in TRACE doc under "PASS 11 — ASYNC/SYNC SEAMS, LOCK ANALYSIS, THREADING."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| N1 | Rust kernel `with_handle_mut` zero sync — `&mut` from raw ptr, UB on concurrent FFI — *mitigated by Python GIL (single-threaded caller); catch_unwind added `c87ca78`* | Rust | **Critical** |
|
|
|
|
### Fixes applied (2026-06-02 pass 2)
|
|
|
|
| Flaw | Commit | What changed |
|
|
|------|--------|--------------|
|
|
| G2 — `into_c_string` NUL panic | `c87ca78` | NUL bytes stripped/escaped; `CString::new` error path handled |
|
|
| G3 — EXIT backward FSM transition | `c87ca78` | EXIT no longer hardcodes `prev_state=POSITION_OPEN`; uses real slot state |
|
|
| G4 — stale `all_legs_done` in exit leg | `fb03300` | Renamed pre-consume var; else-if guard reads fresh `active_leg_index` after consume |
|
|
| M9 — ORDER_REJECT nukes POSITION_OPEN | `fb03300` | Spurious reject (no matching order) no longer resets fsm_state; only entry-phase rejects → IDLE |
|
|
| G9 — venue_order_id targets wrong order | `fb03300` | Routes by FSM state: exit-phase events update exit order, not stale entry order |
|
|
| H6 — unknown enum variant crashes bridge | `fb03300` | `_safe_enum()` helper returns configurable default on unknown variants |
|
|
| N2 | `_run()` has two completely different code paths — runtime branch, not design | Venue | **Critical** |
|
|
| N3 | `_run()` path B blocks event loop thread for every venue HTTP operation | Venue | **Critical** |
|
|
| N4 | `asyncio.run()` called repeatedly — creates/destroys event loops per call | Venue | **Critical** |
|
|
| N5 | `_snapshot_ready` cascading re-fetch — N callers produce N overlapping HTTP | Venue | **High** |
|
|
| N6 | `BingxUserStream.close()` doesn't cancel pending tasks | Stream | Medium |
|
|
| N7 | Live test architecture forces worst-case `_run()` path for every operation | Test | Medium |
|
|
| N8 | `subscribe()` reconnect creates new tasks per iteration | Stream | Medium |
|
|
| N9 | No `asyncio.all_tasks()` or task accounting — leaked tasks undetectable | All | Low |
|
|
| N10 | `_snap_lock` no reader-side protection (informational) | Venue | Info |
|
|
|
|
### Fixes applied (2026-06-02 pass 3)
|
|
|
|
| Flaw | Commit | What changed |
|
|
|------|--------|--------------|
|
|
| V1 — `LauncherBundle.close()` missing `kernel.close()` | `8d9762c` | `self.kernel.close()` wired into bundle teardown; Rust handle deterministically destroyed |
|
|
| V2 — `BingxVenueAdapter` no `close()` | `8d9762c` | `close()` added; shuts down class-level `ThreadPoolExecutor` + delegates to `backend.close()` |
|
|
| V3 — `seen_event_ids` not cleared on slot reuse | `8d9762c` | `slot.seen_event_ids.clear()` added to ENTER handler in Rust kernel; fill dedup no longer pollutes across trades |
|
|
|
|
### Fixes applied (2026-06-02 pass 4)
|
|
|
|
| Flaw | Commit | What changed |
|
|
|------|--------|--------------|
|
|
| W10 — `BingxHttpError` blindly mapped to "REJECTED" | `e90d542` | `_http_error_status()` helper: 429/5xx/transport → RATE_LIMITED; 4xx → REJECTED |
|
|
|
|
---
|
|
|
|
## O-Series: Sync/Async Wider Scope (Launcher, Generators, Streams, FFI, Tests) (Pass 12)
|
|
|
|
*Full detail in TRACE doc under "PASS 12 — SYNC/ASYNC WIDER SCOPE."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| O1 | `_maybe_close()` asyncio.run without loop guard — close skipped from async context — **✅ FIXED `338811e`** | Launcher | **High** |
|
|
| O2 | `async def connect()` shims call sync venue.connect() without await — blocking | Test | Medium |
|
|
| O3 | `_contract_rows(client)` NOT awaited — `_pick_live_symbol` iterates coroutine = crash | Test | **High** |
|
|
| O4 | Deprecated `get_event_loop().run_until_complete()` in test file | Test | Medium |
|
|
| O5 | `_run()` thread pool .result() no timeout — backend hang freezes process — **✅ FIXED `338811e`** | Venue | **High** |
|
|
| O6 | MockVenueAdapter never exercises thread-pool bridge — untested in CI | Venue | Medium |
|
|
| O7 | `_keepalive_loop`/`_rotation_sentinel` fire-and-forget — exceptions silently lost | Stream | Low |
|
|
| O8 | `KernelSlotView.__getattr__` N FFI calls for N fields — no caching | Bridge | Medium |
|
|
| O9 | `DITAv2LauncherBundle` no `__del__` — GC'd bundle leaks resource tree | Launcher | Medium |
|
|
| O10 | `ExecutionKernel` no `close()` — Rust handle only freed by unpredictable __del__ — **✅ FIXED `3ca154e`** | Bridge | Medium |
|
|
| O11 | `KernelSlotView.__setattr__` triggers 5 persistence side effects — no read-only view | Bridge | Medium |
|
|
|
|
---
|
|
|
|
## P-Series: FFI Safety, Dangling Pointers, Coverage Gaps (Pass 13)
|
|
|
|
*Full detail in TRACE doc under "PASS 13 — FFI BOUNDARY SAFETY, DANGLING POINTERS, COVERAGE GAPS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| P1 | `dita_kernel_destroy` double-free UB — Python doesn't null handle.value | Bridge | **Critical** |
|
|
| P2 | `CStr::from_ptr(payload)` without null guard in 3 FFI exports | Rust | **High** |
|
|
| P3 | `_check_open_orders` calls `asyncio.run()` from async `_verify` — RuntimeError | Test | **High** |
|
|
| P4 | `into_c_string` NUL sanitizer produces invalid JSON — json.loads fails | Rust | Medium |
|
|
| P5 | `reconcile_slots_json`/`snapshot_json` return null on failure — no diagnostic | Rust | Medium |
|
|
| P6 | `_get_rust()` TOCTOU race — concurrent cargo build corruption | Bridge | **High** |
|
|
| P7 | `KernelHandle` no Send/Sync — FFI bypasses Rust ownership rules | Rust | Info |
|
|
| P8 | No explicit Rust handle destroy path from bundle.close() | Launcher | Medium |
|
|
| P9 | `__del__` accesses module `_RUST` during shutdown — NameError leak | Bridge | Low |
|
|
|
|
---
|
|
|
|
## Q-Series: Serde Edges, Backup Diffs, Market Data/Timestamps (Pass 14)
|
|
|
|
*Full detail in TRACE doc under "PASS 14 — SERDE EDGE CASES, BACKUP DIFFS, MARKET DATA/TIMESTAMPS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| Q1 | `fromisoformat()` can't parse Rust `Z` suffix on Python < 3.11 | Bridge | **High** |
|
|
| Q2 | No `#[serde(deny_unknown_fields)]` — misspelled fields silently default | Rust | Medium |
|
|
| Q3 | `indexmap` new dependency in current code | Rust | Info |
|
|
| Q4 | Backup diff: 6 critical bug fixes between backup and current | Rust | Info |
|
|
| Q5 | `MarketSnapshot.timestamp` type inconsistent — float vs datetime in same file | Data Feed | **High** |
|
|
| Q6 | `fromisoformat()` Z-suffix fail on all 5 timestamp deserialization sites | Bridge | **High** |
|
|
| Q7 | No upper-bound price validation — 1e300 passes all guards | Bridge | Medium |
|
|
| Q8 | `_first_invalid_intent_field` does not reject zero/negative price or zero size | Bridge | Low |
|
|
| Q9 | Rust/Python clock sources diverge — transition timestamps mixed source | Rust | Low |
|
|
| Q10 | `threading.Event.wait()` uses platform-dependent clock — NTP jump risk | Venue | Medium |
|
|
| Q11 | Backup had no STALE_STATE_RECONCILING guard in on_venue_event (info) | Rust | Info |
|
|
| Q12 | All 5 fromisoformat() sites fail on Python < 3.11 (duplicate of Q1) | Bridge | **High** |
|
|
|
|
---
|
|
|
|
## R-Series: Resource Leaks, Trust Boundaries, Security (Pass 15)
|
|
|
|
*Full detail in TRACE doc under "PASS 15 — RESOURCE LEAKS, TRUST BOUNDARIES, SECURITY."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| R1 | `ThreadPoolExecutor` never shut down — 3 threads leak | Venue | **High** |
|
|
| R2 | `BingxVenueAdapter` no `close()` — backend HTTP client unreleasable | Venue | **High** |
|
|
| R3 | `real_zinc_plane._intent_cache` grows unboundedly | Plane | **High** |
|
|
| R4 | `RealZincPlane`/`ControlPlane` partial-construction SharedRegion leak | Plane | Medium |
|
|
| R5 | `BingxUserStream.ClientSession` no `__del__` — connection pool leak | Venue | Medium |
|
|
| R6 | `test_alpha_blue_untouched_g7.py` open() without context manager | Test | Low |
|
|
| R7 | All exchange REST/WS data parsed without schema validation | Venue | **Critical** |
|
|
| R8 | Shared memory JSON deserialization without integrity check | Plane | **High** |
|
|
| R9 | `restore_state()` deserializes arbitrary JSON — full kernel takeover | Bridge | **Critical** |
|
|
| R10 | `DOLPHIN_BINGX_ENV` + `ALLOW_MAINNET` mainnet switch via env | Config | **High** |
|
|
| R11 | `.env` file loaded from project root — secrets exposure | Config | **High** |
|
|
| R12 | Unvalidated `int()` on env vars — recv_window, leverage extremes | Config | Medium |
|
|
| R13 | `listenKey` from exchange in WS URL f-string — MITM injection | Venue | **High** |
|
|
| R14 | `mock_venue._exchange_event_queue` unbounded growth | Test | Low |
|
|
|
|
---
|
|
|
|
## S-Series: Error Handling, Arithmetic Stability, Test Infrastructure (Pass 16)
|
|
|
|
*Full detail in TRACE doc under "PASS 16 — ERROR HANDLING, ARITHMETIC STABILITY, TEST INFRASTRUCTURE."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| S1 | `realized_pnl()`/`mark_price()` NaN bypasses `<=0.0` guard — NaN PnL corrupts k_realized_pnl | Rust/Python | **Critical** |
|
|
| S2 | MockVenue `_exchange_event_queue` check-then-act race — silently drops events | Test | **Critical** |
|
|
| S3 | No `test_kernel_fsm.py` or `test_kernel_fsm_recovery.py` exists | Test | **Critical** |
|
|
| S4 | Generated tests use `await asyncio.sleep(0.8)` — flaky false negatives on slow CI | Test | **Critical** |
|
|
| S5 | `_rate_limit_retry_after_ms()` returns 0 on parse failure — instant retry storm | Venue | **High** |
|
|
| S6 | Venue adapter detects rate limits but enforces zero backoff | Venue | **High** |
|
|
| S7 | `capital_epsilon = 1e-4` too tight — false WARN classifications | Accounting | **High** |
|
|
| S8 | Generated tests use module-level `asyncio.run()` — leaks tasks on Python 3.12+ | Test | **High** |
|
|
| S9 | `str.replace()` patching silently does nothing on format change | Build | **High** |
|
|
| S10 | `_consume()` no per-message WS timeout — silent hang blocks forever | Venue | **High** |
|
|
| S11 | `_run()` blocks pool thread with no timeout — 3 hung calls lock adapter | Venue | **High** |
|
|
| S12 | Rate-limit regex depends on exchange message format — non-portable | Venue | Medium |
|
|
| S13 | `_row_float()` silently skips malformed rows, filters zero values | Venue | Medium |
|
|
| S14 | Reconnection backoff lacks jitter — thundering herd risk | Venue | Medium |
|
|
| S15 | `_venue_event_status_from_row()` falls back to ACKED — masks new rejections | Venue | Medium |
|
|
| S16 | `except: pass` in generated test code — swallows KeyboardInterrupt | Test | Medium |
|
|
|
|
---
|
|
|
|
## T-Series: Unsafe Review, Dead Code/Backup Debris, Build/Plane Protocols (Pass 17)
|
|
|
|
*Full detail in TRACE doc under "PASS 17 — UNSAFE REVIEW, DEAD CODE/BACKUP DEBRIS, BUILD/PLANE PROTOCOLS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| T1 | `catch_unwind` + `AssertUnwindSafe` — partially mutated state persists, no rollback | Rust | **High** |
|
|
| T2 | Empty backup dir `_backup_20260530_105512/` and stale `tea_debug.log` (0 bytes) | Repo | Low |
|
|
| T3 | `HazelcastRowWriter` uses bare `json.dumps(row, default=str)` — Enums/datetimes wrong format | Bridge | **High** |
|
|
| T4 | `real_zinc_plane._slot_from_payload()` direct key access `payload["last_event_time"]` — KeyError | Plane | **High** |
|
|
| T5 | `_build_pink_bodies.py` `str.index("]")` finds first `]` — corrupts SCENARIOS list | Build | **High** |
|
|
| T6 | `VenueAdapter` protocol missing `connect()`/`disconnect()` — AttributeError at runtime | Venue | **High** |
|
|
| T7 | Shared memory writes non-atomic — visible-zero window exposes partial state | Plane | **High** |
|
|
| T8 | `_slot_from_payload()` reconstructs internal_trade_id from slot trade_id — order data loss | Plane | Medium |
|
|
| T9 | `_slot_from_payload()` duplicated in two files — schema drift risk | Plane | Medium |
|
|
| T10 | `str.index("finally:")` finds first match — nested try/finally mismatch | Build | Medium |
|
|
| T11 | No workspace-root `.gitignore` — __pycache__, backup dirs, debris untracked | Repo | Low |
|
|
| T12 | `projection.py` lazy import failure silently swallowed — writer=None drops all writes | Bridge | Medium |
|
|
| T13 | `Codex_CONTEXT_RESTORE__*.txt` AI context files in root — debris | Repo | Low |
|
|
| T14 | `_backup_20260530/` is a valid Python package — accidental old-code import risk | Repo | Medium |
|
|
|
|
---
|
|
|
|
## U-Series: Rust Test Gaps, Accounting Reconciliation Bugs, FFI Type Mismatches (Pass 18)
|
|
|
|
*Full detail in TRACE doc under "PASS 18 — RUST TEST GAPS, ACCOUNTING RECONCILIATION BUGS, FFI TYPE MISMATCHES."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| U1 | `order_type`/`limit_price` sent to Rust, no serde fields — silently dropped | FFI | **High** |
|
|
| U2 | Rust `VenueEventStatus` expects `"CANCEL_REJECTED"` (typo) — `"CANCELED_REJECTED"` fails deserialization | Rust | **High** |
|
|
| U3 | R2 compares cumulative K realized vs single-last-fill E realized — broken after 2nd+ fill | Accounting | **Critical** |
|
|
| U4 | R4 compares K open_notional vs E used_margin — fundamentally different quantities | Accounting | **Critical** |
|
|
| U5 | R3 skipped when `len(e.positions)==0` — K has positions but E reports none, silent | Accounting | **High** |
|
|
| U6 | `on_venue_event`/`apply_fill` no NaN guards on venue event price/size — NaN propagates | Rust | **Critical** |
|
|
| U7 | Zero Rust tests for ORDER_REJECT, PARTIAL_FILL, TERMINAL_STATE guard, etc. | Test | **High** |
|
|
| U8 | `safe_float()` returns NaN/Inf instead of default — contradicts `_safe()` | Bridge | Medium |
|
|
| U9 | `_scan_slots` uses `metadata.get("leverage")` not `slot.leverage` — wrong leverage | Accounting | Medium |
|
|
| U10 | Rust injects `"k_net_fees"` key alongside serde's `k_fees_paid` — duplicate | Bridge | Low |
|
|
| U11 | 10+ AccountState fields transmitted across FFI but never read by Python | FFI | Low |
|
|
| U12 | `_order_from_payload()` overwrites `internal_trade_id` with slot's trade_id | Bridge | Medium |
|
|
| U13 | No independent third reference — symmetrical K=E errors invisible | Accounting | Medium |
|
|
| U14 | `lot_step` declared in ReconcileConfig but never used — dead field | Accounting | Low |
|
|
|
|
---
|
|
|
|
## V-Series: Startup/Shutdown Lifecycle, Rust Kernel Subtleties, Generated Test Infra (Pass 19)
|
|
|
|
*Full detail in TRACE doc under "PASS 19 — STARTUP/SHUTDOWN LIFECYCLE, RUST KERNEL SUBTLETIES, GENERATED TEST INFRA."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| V1 | `DITAv2LauncherBundle.close()` never calls `kernel.close()` — Rust handle leaks via `__del__` — **✅ FIXED `8d9762c`** | Launcher | **Critical** |
|
|
| V2 | `BingxVenueAdapter` no `close()`/`disconnect()` — ThreadPoolExecutor/HTTP never release — **✅ FIXED `8d9762c`** | Venue | **Critical** |
|
|
| V3 | `process_intent` ENTER doesn't clear `seen_event_ids` — old dedup pollutes new trade — **✅ FIXED `8d9762c`** | Rust | **High** |
|
|
| V4 | 3 generators write same output file — last writer wins, incompatible prologues | Test | **Critical** |
|
|
| V5 | Generated tests triple env-gated — never run in CI, dead code | Test | **Critical** |
|
|
| V6 | `kernel.close()` destroys Rust handle immediately — no drain, no flush, UAF risk | Bridge | **Critical** |
|
|
| V7 | `_last_settled_pnl` dict accessed from process_intent and on_venue_event without locks | Bridge | Medium |
|
|
| V8 | `#[serde(default)] leverage: f64` default 0.0 — mark_price uses directly no .max(1.0) | Rust | Medium |
|
|
| V9 | No `conftest.py`, no `pytest.ini`, no `asyncio_mode` — test discovery fragile | Test | **High** |
|
|
| V10 | `kernel.close()` `except Exception: pass` — silently swallows destroy errors | Bridge | Low |
|
|
| V11 | `build_launcher_bundle()` no cleanup on partial failure — OOM orphans 4 components | Launcher | Medium |
|
|
| V12 | `KernelResult` clones entire kernel state every FFI call — wasted allocations | Rust | Medium |
|
|
| V13 | `_build_rb()` leaks bundle on post-creation failure | Test | Low |
|
|
| V14 | `_maybe_close` breaks after first method — never tries both close and disconnect | Launcher | Low |
|
|
|
|
---
|
|
|
|
## W-Series: Configuration Management, Math Sign Conventions, BingX Protocol (Pass 20)
|
|
|
|
*Full detail in TRACE doc under "PASS 20 — CONFIGURATION MANAGEMENT, MATH SIGN CONVENTIONS, BINGX PROTOCOL."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| W1 | `int()` on 3 env vars uncaught `ValueError` — non-numeric input crashes process | Config | **Critical** |
|
|
| W2 | `DITA_V2_PREFIX` default `"dita_v2"` — multi-process shared memory corruption | Config | **Critical** |
|
|
| W3 | Funding sign opposite Python V2 vs Rust — same raw value opposite capital effect | Accounting | **Critical** |
|
|
| W4 | `listenKeyExpired` frames silently swallowed — `continue` skips expiry check, dead code | Venue | **Critical** |
|
|
| W5 | `RECV_WINDOW_MS` no upper bound — extreme values enable replay attacks | Config | **High** |
|
|
| W6 | `ACTIVE_SLOT_LIMIT` stored but never enforced by Rust kernel — dead config | Config | **High** |
|
|
| W7 | No fill history fetched during WS reconnect gap-backfill — fills permanently lost | Venue | **High** |
|
|
| W8 | Rate limit detection fails on HTTP 429 without matching message — returns 0 instant retry | Venue | **High** |
|
|
| W9 | `CONTROL_PLANE=REAL_ZINC` silently falls back to in-memory — no persistence | Config | **High** |
|
|
| W10 | All `BingxHttpError` mapped to "REJECTED" — can't distinguish errors from real rejections — **✅ FIXED `e90d542`** | Venue | **High** |
|
|
| W11 | `os.environ["KEY"]` bracket access in tests vs `.get()` in launcher — inconsistent | Test | **High** |
|
|
| W12 | `MockVenueScenario` no `rate_limit` flag — RATE_LIMITED path untested in CI | Test | Medium |
|
|
| W13 | Rate-limit regex uses English phrase `"unblocked after"` — non-portable | Venue | Medium |
|
|
| W14 | Invalid `ACTIVE_SLOT_LIMIT` values silently discarded — no log, no warning | Config | Medium |
|
|
|
|
---
|
|
|
|
## X-Series: Rust Build/Deps, Python Packaging, Shared Memory Protocol (Pass 21)
|
|
|
|
*Full detail in TRACE doc under "PASS 21 — RUST BUILD/DEPS, PYTHON PACKAGING, SHARED MEMORY PROTOCOL."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| X1 | No ABI compatibility check on Rust `.so` load — stale binary corrupts silently | Bridge | **Critical** |
|
|
| X2 | `real_zinc_plane._write_region()` zeroes entire buffer before write — visible all-zero window | Plane | **Critical** |
|
|
| X3 | No `requirements.txt`/`setup.py`/`pyproject.toml` — zero Python dependency declarations | Build | **Critical** |
|
|
| X4 | `RealZincControlPlane.update()` no thread lock — concurrent calls corrupt seq and shared memory | Plane | **High** |
|
|
| X5 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | **High** |
|
|
| X6 | 5 test files use hardcoded `sys.path.insert(0, "/mnt/dolphinng5_predict")` — non-portable | Test | **High** |
|
|
| X7 | `_decode_packet()` no try/except on `json.loads` — partial body read crashes reader | Plane | **High** |
|
|
| X8 | `ExchangeEvent`/`ExchangeEventKind` not exported from `__init__.py` | Bridge | **High** |
|
|
| X9 | No MSRV or `rust-toolchain.toml` — builds differ per Rust version | Rust | Medium |
|
|
| X10 | RealZincPlane and RealZincControlPlane collide on `{prefix}_control` region name | Plane | Medium |
|
|
| X11 | Sequence number decoded but never read by any consumer — dead data on wire | Plane | Medium |
|
|
| X12 | `_maybe_close()` `fut.result(timeout=10.0)` — TimeoutError strands coroutine | Launcher | Medium |
|
|
| X13 | `__init__.py` flat re-exports 45 names — naming collision risk | Bridge | Medium |
|
|
| X14 | `close()` not idempotent on RealZincPlane/RealZincControlPlane | Plane | Medium |
|
|
|
|
---
|
|
|
|
## Y-Series: Serde Round-Trip, Mock Fidelity, Protocol Consistency (Pass 22)
|
|
|
|
*Full detail in TRACE doc under "PASS 22 — SERDE FIELD-BY-FIELD ROUND-TRIP, MOCK VENUE FIDELITY GAPS, PROTOCOL CONSISTENCY."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| Y1 | `exit_leg_ratios` serde default `[]` vs struct default `vec![1.0]` — wrong ratio on restore | Rust | Medium |
|
|
| Y2 | `KernelIntent.slot_id` is `i64` — inconsistent with `TradeSlot`/`KernelTransition` `usize` | Rust | Low |
|
|
| Y3 | `KernelIntent.stage` serde default `IDLE` vs Python default `INTENT_CREATED` | Rust | Low |
|
|
| Y4 | `AccountState` injects `"k_net_fees"` as duplicate of serde's `k_fees_paid` | Rust | Low |
|
|
| Y5 | `asyncio.sleep(0.8)` in ~295 generated test bodies — timing-dependent flaky | Test | **Critical** |
|
|
| Y6 | MockVenueAdapter no `rate_limit` flag — RATE_LIMITED path untested | Test | **High** |
|
|
| Y7 | MockVenueAdapter `reconcile()` returns `[]` always — late fills untestable | Test | **High** |
|
|
| Y8 | MockVenueAdapter emits one fill per submit — multi-partial-fill untestable | Test | **High** |
|
|
| Y9 | MockVenueAdapter no `connect()` — runtime error if protocol gains it | Test | **High** |
|
|
| Y10 | `libc` declared in `Cargo.toml` but never used — dead dependency | Rust | Medium |
|
|
| Y11 | MockVenueAdapter no `close()` — lifecycle cleanup untestable | Test | Medium |
|
|
| Y12 | MockVenueAdapter fills synchronous — masks async timing bugs | Test | Medium |
|
|
| Y13 | `IndexSet` JSON array doesn't encode LRU semantics — fragile on manual edit | Rust | Low |
|
|
| Y14 | `_slot_from_payload()` duplicated in two files with different behavior | Bridge | Medium |
|
|
|
|
---
|
|
|
|
## H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)
|
|
|
|
*Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."*
|
|
|
|
| # | Flaw | Layer | Severity |
|
|
|---|------|-------|----------|
|
|
| H1 | No Python dependency files (requirements.txt, pyproject.toml, etc.) | Build | **Critical** |
|
|
| H2 | Rust kernel compiled from source on every cold start — no prebuilt binary | Build | **Critical** |
|
|
| H3 | Zero logging — 16+ silent except:pass sites, no error observability | All | **Critical** |
|
|
| H4 | `_row_float` rejects zero as valid, `except Exception: continue` swallows all | Venue | **High** |
|
|
| H5 | `_backend_snapshot` timeout returns stale data/None — callers crash | Venue | **High** |
|
|
| H6 | All enum-from-raw-string sites crash on unknown variant (17 sites) — **✅ FIXED `fb03300`** (Python bridge sites) | Bridge | **High** |
|
|
| H7 | `_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET | Venue | **High** |
|
|
| H8 | Unknown venue status silently mapped to ACKED | Venue | **High** |
|
|
| H9 | `RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost | Zinc | **High** |
|
|
| H10 | `RealZincControlPlane.read()` no atomicity with concurrent `update()` | Control | **High** |
|
|
| H11 | `_RustKernelLib` lazy init with race condition — concurrent cargo build | Bridge | **High** |
|
|
| H12 | `ExecutionKernel.__del__` use-after-free on Rust handle | Bridge | **High** |
|
|
| H13 | `MirroredControlPlane` missing protocol methods (wait/notify) | Control | Medium |
|
|
| H14 | `TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics | Contracts | Medium |
|
|
| H15 | `_maybe_close` asyncio.run RuntimeError silently swallowed | Launcher | Medium |
|
|
| H16 | Lazy import of bingx_direct masks config errors until first trade | Build | Info |
|
|
| H17 | `load_dotenv()` at module level — import-time I/O side effect | Launcher | Medium |
|
|
| H18 | `_run()` blocks event loop on every HTTP call via thread pool | Venue | Medium |
|
|
| H19 | `HazelcastClientLike` protocol has zero concrete implementations | Projection | Low |
|
|
| H20 | `_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM | Zinc | Low |
|
|
| H21 | `wasm-bindgen` compiled into native library unnecessarily | Build | Low |
|
|
| H22 | `socket.getaddrinfo` monkey-patch in test code | Test | Low |
|
|
|
|
---
|
|
|
|
## A-Series: Architectural Flaws (detailed writeups)
|
|
|
|
*These are the original architectural flaws with full analysis.*
|
|
|
|
---
|
|
|
|
### Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size
|
|
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~770-780 (EXIT handler in `process_intent`)
|
|
|
|
**Severity:** **High**
|
|
|
|
**Nature:** Logic error — wrong base for exit-size computation.
|
|
|
|
### Downstream effect
|
|
|
|
The EXIT handler computes the exit size as `base_size * exit_ratio` where:
|
|
```rust
|
|
let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
|
|
```
|
|
|
|
After partial fills (e.g., two separate MARKET exit legs), `initial_size` is still the
|
|
**original** entry size while `slot.size` has been reduced by previous legs. If the
|
|
cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the
|
|
computed exit size can exceed the remaining position.
|
|
|
|
The venue adapter clamps to actual position via `reduceOnly`, but the kernel's _own_
|
|
accounting reduces `slot.size` by the fill size, not by the intended exit size. The
|
|
slot can therefore go negative (`slot.size < 0`) if the fill is larger than remaining.
|
|
|
|
### Exact trigger
|
|
|
|
1. Enter SHORT, size=1.0, `initial_size=1.0`, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0
|
|
2. EXIT leg 0: `exit_size = 1.0 * 0.6 = 0.6`. Fill consumes 0.6. Slot size goes to 0.4.
|
|
3. EXIT leg 1: `exit_size = 1.0 * 0.6 = 0.6`. But remaining is 0.4. Requests 0.6.
|
|
4. BingX `reduceOnly` clamps fill to 0.4. Slot size goes to 0.0.
|
|
5. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`. Slot is already at 0.0.
|
|
Kernel returns `NO_OPEN_POSITION` — the final EXIT is rejected because `slot.closed`
|
|
was not set by the previous fill (it was a partial close, not terminal).
|
|
6. Slot is at size=0.0, `!slot.closed`, no active orders, but `!slot.is_free()` because
|
|
`size <= 0.0` is true but `fsm_state != IDLE/CLOSED` — slot is **stuck** in
|
|
`POSITION_OPEN` with zero size.
|
|
|
|
This is **not** purely a mis-sized ratio problem. With MARKET orders that fill fully,
|
|
even correct ratios can leave the slot stuck if the fill price differs from the
|
|
intended-size price and the venue adjusts fill quantity.
|
|
|
|
### Fix strategy
|
|
|
|
Use `slot.size` directly as the base (not `initial_size`):
|
|
```rust
|
|
let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);
|
|
```
|
|
|
|
This guarantees the exit never requests more than the remaining position, regardless
|
|
of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.
|
|
|
|
---
|
|
|
|
### Flaw A2: Misleading CANCEL diagnostic code on entry-only slots
|
|
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~798-810 (CANCEL rejection path)
|
|
|
|
**Severity:** **Low**
|
|
|
|
**Nature:** Diagnostic pollution — wrong error code.
|
|
|
|
### Downstream effect
|
|
|
|
When a CANCEL intent arrives and **neither** `active_exit_order` nor
|
|
`active_entry_order` is cancellable, the kernel returns:
|
|
```rust
|
|
diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER
|
|
```
|
|
|
|
But the reason may be that there's no active entry order either, or the FSM state
|
|
doesn't permit cancellation. The diagnostic name suggests an exit-order-specific
|
|
problem when the failure is generic "nothing to cancel."
|
|
|
|
### Fix
|
|
|
|
Change to a generic `NO_ACTIVE_ORDER` diagnostic or `SLOT_IDLE` when the slot is
|
|
already in IDLE. `NO_ACTIVE_EXIT_ORDER` is misleading for a slot that has never had
|
|
any order.
|
|
|
|
---
|
|
|
|
### Flaw A3: Float-accumulated slot.size after partial fills can go negative
|
|
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~1365-1370 (apply_fill exit path)
|
|
|
|
**Severity:** **Low**
|
|
|
|
**Nature:** Numerical precision edge case.
|
|
|
|
### Code path
|
|
|
|
```rust
|
|
slot.size = (slot.size - fill_size).max(0.0);
|
|
```
|
|
|
|
This clamps to zero, which is correct. But if the venue fills *more* than requested
|
|
(on BingX, this can happen with market orders where the fill walks the book), the
|
|
slot sees `fill_size > intended_size`. The `max(0.0)` prevents negative, but the
|
|
slot then reports `size=0.0` with `!closed` and an FSM state that's not IDLE.
|
|
|
|
The `is_free()` check requires `size <= 0.0` AND `fsm_state in {IDLE, CLOSED}`. A
|
|
slot with `size=0.0` and `fsm_state=POSITION_OPEN` is stuck — no EXIT will be
|
|
accepted and no ENTER can start.
|
|
|
|
### Trigger
|
|
|
|
Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the
|
|
book, overshoots). `fill_size=0.8`, `slot.size = (0.6 - 0.8).max(0.0) = 0.0`.
|
|
Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), `closed=false`.
|
|
|
|
### Fix
|
|
|
|
When `slot.size <= 1e-12` after a fill and the slot is in an exit-related state,
|
|
force transition to CLOSED/IDLE regardless of leg index:
|
|
```rust
|
|
if slot.size <= 1e-12 {
|
|
slot.closed = true;
|
|
slot.fsm_state = TradeStage::CLOSED;
|
|
slot.active_exit_order = None;
|
|
slot.active_entry_order = None;
|
|
return;
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### Flaw A4: Entry price is clobbered by mark_price if called before fill arrives
|
|
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)
|
|
|
|
**Severity:** **Medium**
|
|
|
|
**Nature:** Accounting accuracy — incorrect PnL base.
|
|
|
|
### Code path
|
|
|
|
```rust
|
|
// In mark_price:
|
|
if self.entry_price <= 0.0 {
|
|
self.entry_price = price; // Seeds entry_price from mark before fill
|
|
}
|
|
|
|
// In apply_fill (entry):
|
|
if event.price > 0.0 {
|
|
slot.entry_price = event.price; // Overwrites with actual fill price
|
|
}
|
|
```
|
|
|
|
The `mark_price` path seeds `entry_price` from a market price when the slot has no
|
|
fill yet. The `apply_fill` entry path correctly overwrites with the actual fill price.
|
|
So in the normal flow this is harmless — the fill overwrites the mark.
|
|
|
|
**However**, consider this sequence:
|
|
1. ENTER intent accepted → slot goes `ORDER_REQUESTED`, `entry_price = 0.0`
|
|
2. `runtime.step()` calls `kernel.mark_price(snapshot.symbol, snapshot.price)` → sets `entry_price = 100.0`
|
|
3. `on_venue_event(ORDER_ACK)` → `ENTRY_WORKING`, `entry_price` still `100.0`
|
|
4. `on_venue_event(PARTIAL_FILL)` → `apply_fill` sets `entry_price = 99.5` (fill price)
|
|
5. Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5
|
|
|
|
This is a transient mis-valuation window. It corrects itself on the next `observe_slots`
|
|
call, but intra-step readers see wrong unrealized PnL. Not critical because:
|
|
- `account.snapshot.unrealized_pnl` uses the slot's `unrealized_pnl`, not the mark
|
|
- Realized PnL is computed from actual fill prices
|
|
- The window lasts at most one scan cycle (~5s)
|
|
|
|
### Fix
|
|
|
|
Don't set `entry_price` from `mark_price` when there's no fill:
|
|
```rust
|
|
fn mark_price(&mut self, price: f64) {
|
|
if !price.is_finite() || price <= 0.0 { return; }
|
|
// Don't seed entry_price — leave it at 0.0 until a fill arrives
|
|
if self.entry_price <= 0.0 || self.size <= 0.0 {
|
|
self.unrealized_pnl = 0.0;
|
|
return;
|
|
}
|
|
// ... normal PnL computation
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### Flaw A5: Capital-before computation is arithmetic not snapshot-based
|
|
|
|
**Location:** `pink_clickhouse.py` lines ~761-762 (`_write_trade_exit_leg`) and ~822-823 (`_write_trade_event`)
|
|
|
|
**Severity:** **High**
|
|
|
|
**Nature:** Accounting accuracy — wrong capital_before under multi-slot or intervening events.
|
|
|
|
### Code pattern (appears in two places)
|
|
|
|
```python
|
|
capital_after = self._capital()
|
|
capital_before = capital_after - pnl_leg # In _write_trade_exit_leg
|
|
capital_before = capital_after - pnl # In _write_trade_event
|
|
```
|
|
|
|
This reconstructs `capital_before` by subtracting the current leg's PnL from the
|
|
current capital. This is **only correct** if:
|
|
1. No other slots settled PnL between this leg and the previous one
|
|
2. No capital corrections (reconcile, manual override) happened between legs
|
|
3. No fees were deducted between legs
|
|
|
|
With multi-slot (PINK configurable `max_slots > 1`), a concurrent trade on slot 1
|
|
that closes between slot 0's exit legs will have its PnL baked into `capital_after`,
|
|
making `capital_before = capital_after - pnl_leg` wrong.
|
|
|
|
### Fix
|
|
|
|
Maintain a per-trade `capital_before_leg` snapshot taken at the moment of the
|
|
first fill event for each trade, advancing it by the realized PnL of each leg:
|
|
```python
|
|
self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
|
|
self._leg_state[trade_id]["capital_after"] = capital_after
|
|
```
|
|
|
|
And use `prev["capital_before"]` for the row, not `capital_after - pnl_leg`.
|
|
|
|
---
|
|
|
|
### Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange
|
|
|
|
**Location:** `pink_direct.py` lines ~597-630 (`recover_account`) and docstring of `reconcile_account`
|
|
|
|
**Severity:** **Medium**
|
|
|
|
**Nature:** Operational drift — capital is never verified against exchange truth in hot loop.
|
|
|
|
### The gap
|
|
|
|
`reconcile_account()` (line 632) has this docstring:
|
|
```
|
|
Periodic exchange-led account sync.
|
|
Capital is re-seeded from the exchange balance as a guard against long-running drift
|
|
```
|
|
|
|
But the actual implementation:
|
|
```python
|
|
async def reconcile_account(self, ...) -> dict[str, Any]:
|
|
return await self.recover_account(...)
|
|
|
|
async def recover_account(self, ...) -> dict[str, Any]:
|
|
capital = float(self.kernel.account.snapshot.capital or 25000.0)
|
|
_reconcile_position_slot(self.kernel, capital, slot_id=0)
|
|
```
|
|
|
|
It passes the **kernel's own capital** to `_reconcile_position_slot`, which then
|
|
overwrites `kernel.account.snapshot.capital` with... the same value. No exchange
|
|
balance poll ever overwrites capital.
|
|
|
|
`connect()` at line 224 does the same — it passes `initial_capital` (an env default),
|
|
not the exchange balance. The exchange balance is never read for capital seeding
|
|
in the current code path. `_reconcile_position_slot` does call
|
|
`venue.open_positions()`, but it only reads positions, not capital.
|
|
|
|
### Effect
|
|
|
|
Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation,
|
|
or any other systematic error) accumulates monotonically. There is no mechanism to
|
|
detect or correct drift. Over weeks of live trading, the kernel's capital snapshot
|
|
can diverge arbitrarily from the exchange's actual balance.
|
|
|
|
### Fix
|
|
|
|
Either:
|
|
1. Make `_reconcile_position_slot` read the exchange balance and use it for
|
|
capital reseeding (the docstring claims it does this already), or
|
|
2. Add a separate capital-verification path that surfaces the delta between
|
|
kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.
|
|
|
|
---
|
|
|
|
### Flaw A7: No fee tracking in kernel accounting
|
|
|
|
**Location:** `rust_backend.py` lines ~540-545 (on_venue_event settle), `bingx_direct.py` submit_intent return
|
|
|
|
**Severity:** **Medium**
|
|
|
|
**Nature:** Accounting accuracy — fees are invisible to capital tracking.
|
|
|
|
### Downstream effect
|
|
|
|
When a trade closes, the kernel computes:
|
|
```rust
|
|
realized_pnl = delta * notional
|
|
```
|
|
|
|
This is **gross** PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%).
|
|
These fees are never subtracted from the kernel's realized PnL. Over 100 trades with
|
|
$100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000
|
|
trades at 10x leverage and $50k average notional, the error is $200k.
|
|
|
|
The `BingxDirectExecutionAdapter` does return `ExecutionReceipt` with fill data,
|
|
but `bingx_venue._events_from_submit()` only reads `price` and `filled_size` —
|
|
commission/fee fields are ignored.
|
|
|
|
### Fix
|
|
|
|
1. Read fee/commission from the BingX ack payload in `_events_from_submit()`
|
|
2. Pass fees through `VenueEvent.metadata["fee"]`
|
|
3. In the Rust kernel's `apply_fill`, subtract the fee from realized PnL:
|
|
```rust
|
|
let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
|
|
slot.realized_pnl += realized - fee;
|
|
```
|
|
|
|
---
|
|
|
|
### Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input
|
|
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~745-748
|
|
|
|
**Severity:** **Low**
|
|
|
|
**Nature:** Silent fallback — corrupt input produces a trade, not a rejection.
|
|
|
|
```rust
|
|
slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
|
|
intent.leverage
|
|
} else {
|
|
1.0
|
|
};
|
|
```
|
|
|
|
A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of
|
|
rejecting the intent. The Python bridge does validate `_first_invalid_intent_field()`
|
|
which catches NaN/inf, but it doesn't catch `leverage <= 0.0` (it only checks
|
|
`not math.isfinite(value)`).
|
|
|
|
### Fix
|
|
|
|
Add `leverage <= 0.0` to the Python bridge's invalid-intent check. The Rust kernel
|
|
should still have the `1.0` fallback as a defensive measure, but the bridge should
|
|
prevent bad leverages from reaching Rust in the first place.
|
|
|
|
---
|
|
|
|
### Flaw A9: Mock venue submit condition convoluted — dead code paths
|
|
|
|
**Location:** `mock_venue.py` lines ~60-90
|
|
|
|
**Severity:** **Informational**
|
|
|
|
**Nature:** Code clarity — confusing condition logic.
|
|
|
|
```python
|
|
if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
|
|
events.append(ack_event)
|
|
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
|
|
# ... fill events
|
|
```
|
|
|
|
The condition logic is confusing:
|
|
- When `emit_ack_before_fill=True` and `emit_fill_on_submit=True`: both branches run → ACK + fill
|
|
- When `emit_ack_before_fill=False` and `emit_fill_on_submit=True`: first branch runs because
|
|
`not True = False`, so `False or False = False` → no ACK. Second branch runs → fill only.
|
|
This produces a fill without an ACK, which is **not** a realistic venue scenario.
|
|
- When `partial_fill_ratio=1.0` (default): second branch runs and emits a `FULL_FILL` event
|
|
even when `emit_fill_on_submit=False`, because `0.0 or 1.0 > 0 = True`.
|
|
|
|
The partial fill ratio check should be gated on `emit_fill_on_submit`:
|
|
```python
|
|
should_emit_fill = self.scenario.emit_fill_on_submit or (
|
|
is_entry and self.scenario.entry_partial_fill_ratio > 0
|
|
) or (
|
|
not is_entry and self.scenario.exit_partial_fill_ratio > 0
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
### Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows
|
|
|
|
**Location:** `pink_direct.py` lines ~318-374 (`pump_venue_events`), called at line ~436
|
|
|
|
**Severity:** **Medium**
|
|
|
|
**Nature:** Operational overhead — unnecessary exchange HTTP calls.
|
|
|
|
### The problem
|
|
|
|
`step()` calls `pump_venue_events()` **every cycle**, which calls `venue.reconcile()`.
|
|
For `BingxVenueAdapter`, `reconcile()` calls `_backend_snapshot()` which does up to 5
|
|
HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow
|
|
where orders fill synchronously within `process_intent()`, there are **no** late fills
|
|
to drain.
|
|
|
|
On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each
|
|
`pump_venue_events()` call consumes 5+ of that budget. At a 5-second policy cycle,
|
|
this is 60 requests/minute — 60% of the rate budget — just to poll for fills that
|
|
don't exist.
|
|
|
|
### Fix
|
|
|
|
Gate the pump on whether the previous cycle submitted a LIMIT order:
|
|
```python
|
|
self._has_resting_order = any(
|
|
o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
|
|
for o in kernel.open_orders()
|
|
)
|
|
if self._has_resting_order:
|
|
await self.pump_venue_events(snapshot, market_state=market_state)
|
|
```
|
|
|
|
Or add a config flag `async_fill_mode: bool = False`.
|
|
|
|
---
|
|
|
|
### Flaw A11: VenueAdapter.submit() blocks the event loop
|
|
|
|
**Location:** `bingx_venue.py` lines ~225-233 (`_run`)
|
|
|
|
**Severity:** **Medium**
|
|
|
|
**Nature:** Runtime safety — synchronous call in async context.
|
|
|
|
```python
|
|
def _run(self, result: Any) -> Any:
|
|
if inspect.isawaitable(result):
|
|
try:
|
|
asyncio.get_running_loop()
|
|
except RuntimeError:
|
|
return asyncio.run(result)
|
|
pool = self._get_executor()
|
|
return pool.submit(asyncio.run, result).result()
|
|
```
|
|
|
|
When called from `step()` (which is an async function), `_run` submits the async
|
|
`submit_intent()` to a thread pool, runs it with `asyncio.run()`, then calls
|
|
`.result()` which blocks the current thread until complete. The BingX HTTP call
|
|
can take 1-5 seconds depending on network latency and exchange load.
|
|
|
|
During this block, the event loop **cannot** process other async tasks (data feed
|
|
updates, health checks, signal processing). In a single-runtime deployment, this
|
|
stalls the entire policy cycle.
|
|
|
|
### Fix
|
|
|
|
Make `process_intent` in `ExecutionKernel` accept an async venue callback, or
|
|
make `BingxVenueAdapter` truly async (not sync-with-thread-bridge). For now,
|
|
at minimum the PINK runtime should run `step()` in an executor to avoid blocking
|
|
the main event loop.
|
|
|
|
---
|
|
|
|
### Flaw A12: Stale KernelStateView slot references after reconcile
|
|
|
|
**Location:** `rust_backend.py` lines ~350-365 (`KernelStateView.refresh`)
|
|
|
|
**Severity:** **Low**
|
|
|
|
**Nature:** Stale data — view not rebuilt on reconcile.
|
|
|
|
```python
|
|
class KernelStateView:
|
|
def __init__(self, kernel):
|
|
self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
|
|
# ...
|
|
|
|
def refresh(self) -> None:
|
|
snapshot = self._kernel._snapshot_backend()
|
|
self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
|
|
self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
|
|
self.client_order_index = dict(snapshot.get("client_order_index", {}))
|
|
```
|
|
|
|
`refresh()` updates the index maps but does **not** recreate `self.slots`. The slot
|
|
views in `self.slots` are live proxies (they read through `_get_slot` each time),
|
|
so slot data is current. But if `max_slots` changes (it shouldn't, but it's mutable)
|
|
or if slots are re-indexed by a reconcile, the view list is wrong.
|
|
|
|
Not critical because `max_slots` is set at init and never changes, but worth
|
|
fixing for robustness.
|
|
|
|
---
|
|
|
|
### Flaw A13: `persist_fill_events` uses current price as exit price
|
|
|
|
**Location:** `pink_clickhouse.py` lines ~408
|
|
|
|
**Severity:** **Low**
|
|
|
|
**Nature:** Historical accuracy — logged price may not match fill price.
|
|
|
|
```python
|
|
price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
|
|
if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)
|
|
```
|
|
|
|
This correctly reads from the event's price. But `decision.reference_price` at line
|
|
417 falls back to this price, which is the fill price. The trade_event row at line
|
|
835 uses `exit_price = slot_dict.get("entry_price", ...)` — which is the **entry**
|
|
price, not the exit price. The trade_event always shows exit_price == entry_price.
|
|
|
|
This means `trade_events` in ClickHouse will never show a realistic exit price
|
|
for the persisted trade, breaking any PnL reconstruction that relies on
|
|
`(exit_price - entry_price) * size * leverage`.
|
|
|
|
---
|
|
|
|
### Flaw A14: `_write_position_state` maps active_leg_index to entry_bar
|
|
|
|
**Location:** `pink_clickhouse.py` line ~673
|
|
|
|
**Severity:** **Low**
|
|
|
|
**Nature:** Semantic mismatch — wrong field mapping.
|
|
|
|
```python
|
|
"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
|
|
```
|
|
|
|
`active_leg_index` is the index into the exit-leg-ratios array (which leg is being
|
|
exited next). It has nothing to do with how many bars the position has been held.
|
|
When a position opens, `active_leg_index` is 0. After the first exit leg, it
|
|
advances to 1. Neither value is a bar count.
|
|
|
|
`entry_bar` should be `bars_held` from the intent/decision, or a computed value
|
|
from `entry_time` to now.
|
|
|
|
---
|
|
|
|
### Flaw A15: `persist_recovery_state` passes account dict as slot dict
|
|
|
|
**Location:** `pink_clickhouse.py` lines ~447-460
|
|
|
|
**Severity:** **Low**
|
|
|
|
**Nature:** Wrong data — account snapshot used where slot data is expected.
|
|
|
|
```python
|
|
def persist_recovery_state(self, *, snapshot, acc_dict, ...):
|
|
slot_dict = acc_dict or {} # ← acc_dict is an account snapshot, not a slot
|
|
self._write_position_state(..., slot_dict={}, ...) # ← correctly uses empty dict
|
|
self._write_trade_reconstruction(
|
|
snapshot,
|
|
trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
|
|
# acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
|
|
)
|
|
```
|
|
|
|
The `trade_id` in the trade_reconstruction row will always be `""` because
|
|
`acc_dict` comes from `kernel.snapshot()["account"]` which has keys `capital`,
|
|
`equity`, `realized_pnl`, etc. — not `trade_id`. This means the recovery
|
|
`trade_reconstruction` row has no trade_id linkage.
|