Thirteenth pass: dita_kernel_destroy double-free UB — Python doesn't null handle.value (P1 Critical), CStr::from_ptr(payload) without null guard in 3 FFI exports (P2 High), _check_open_orders asyncio.run from async _verify crashes live tests (P3 High), _get_rust() TOCTOU race concurrent cargo build (P6 High), into_c_string NUL sanitizer produces invalid JSON (P4 Medium), reconcile/snapshot_json null on failure no diagnostic (P5 Medium). 263 total flaws across 13 passes. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
46 KiB
PINK DITAv2 — Structural Flaw Analysis (CENTRAL)
Analysis date: 2026-05-31 Scope: Full PINK pipeline — all flaws across all modules. Sources:
- This file (A-series): Detailed writeups for architectural flaws.
- PINK_DITAv2_E2E_TRACE_ANALYSIS.md (E, F, G-series): Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans. Every E, F, G entry below is a summary only — full detail is in the TRACE doc.
Combined Catalog (All Flaws, All Passes)
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|---|---|---|---|---|---|---|---|
| A | Architectural (detailed in this file) | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Total | 263 | 21 | 73 | 76 | 64 | 29 |
T-Series: Threading & Atomicity Flaws
Full detail in TRACE doc under "Threading & Atomicity" section.
| # | Flaw | Layer | Severity |
|---|---|---|---|
| T1 | InMemoryZincPlane thread-Condition deadlock from slot update re-entrancy |
Zinc | Critical |
| T2 | Thread-unsafe kernel snapshot capture for account | Bridge | High |
| T3 | Re-entrant or incorrectly-scoped Rust-kernel handle usage | Bridge | High |
| T4 | Consequence: on_venue_event PnL settle races |
Bridge | High |
| T5 | Access to shared _state_seq / _slot_cache in RealZincPlane from multiple kernel calls |
Zinc | Medium |
| T6 | _write_region buffer zero + notify race with concurrent reader |
Zinc | Medium |
| T7 | Publication of events in process_intent loop not synchronized with persist |
Bridge | Medium |
| T8 | asyncio.run executor skip in _run leads to event-loop stall |
Venue | Low |
| T9 | No thread-safe Python↔Rust ownership / lifetime protocol | Bridge | Low |
E-Series: E2E Data-Flow Flaws (Pass 1)
Full detail in TRACE doc under "Layer 1" through "Layer 9."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| E1 | step() calls pump_venue_events() every cycle unconditionally |
Runtime | High |
| E2 | kernel.snapshot()["account"] returns a fresh dict, not a live view |
Bridge | Low |
| E3 | _decision_to_kernel_intent drops order_type and limit_price |
Runtime | High |
| E4 | _exit_intent_from_slot trusts slot.size but slot may be stale |
Runtime | High |
| E5 | JSON serialization round-trip loses numeric precision | Bridge | Low |
| E6 | _RustKernelLib is a global singleton — shared across all kernels |
Bridge | Low |
| E7 | ENTER handler silently allows re-entry with same trade_id | Rust | High |
| E8 | EXIT handler uses initial_size not current size |
Rust | High |
| E9 | CANCEL handler returns diagnostic even when nothing happened | Rust | Low |
| E10 | apply_fill entry branch double-sets active_entry_order |
Rust | Low |
| E11 | _legacy_intent() is a lossy conversion |
Venue | Low |
| E12 | _events_from_submit() price fallback chain can lose venue price |
Venue | Low |
| E13 | _backend_snapshot() timeout returns stale data |
Venue | Medium |
| E14 | _events_from_cancel uses stale slot_id from order metadata |
Venue | Low |
| E15 | Submit sets leverage via separate HTTP call | Adapter | Medium |
| E16 | _format_quantity/_format_price may use zero tick/step |
Adapter | Medium |
| E17 | Cancel uses truth-based confirmation — can mask real errors | Adapter | Medium |
| E18 | on_venue_event settles PnL incrementally — fees never included |
Bridge | Medium |
| E19 | observe_slots called with ALL slots, not just changed ones |
Bridge | Low |
| E20 | _capital() reads live from AccountProjection — stale row risk |
Persistence | Low |
| E21 | persist_fill_events() synthesizes fake Decision/Intent |
Persistence | Medium |
| E22 | _write_trade_exit_leg capital_before uses arithmetic reconstruction |
Persistence | Medium |
| E23 | _write_trade_event uses entry_price as exit_price |
Persistence | Medium |
| E24 | Mock venue always emits fill on partial_fill_ratio > 0 |
Test | Low |
| E25 | Test scenarios use MARKET-only _si() helper — no LIMIT tests |
Test | Low |
| E26 | Fresh-kernel reconcile tests create second kernel but share venue | Test | Low |
F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)
Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| F1 | CANCEL returns "accepted" before cancel happens — stale diagnostic_code | Bridge | Medium |
| F2 | _last_settled_pnl reset before venue.submit() — transient window |
Bridge | Medium |
| F3 | _first_invalid_intent_field allows leverage=0 and target_size=0 |
Bridge | Low |
| F4 | outcome.emitted_events only from venue — Rust kernel events dropped |
Bridge | Low |
| F5 | on_venue_event redundant FFI read of slot already returned by Rust |
Bridge | Low |
| F6 | process_intent records pre-venue transitions with event=None |
Bridge | Info |
| F7 | reconcile_from_slots writes ALL slots to projection/zinc |
Bridge | Low |
| F8 | HazelcastRowWriter.put() synchronous, no error handling — crashes intent |
Projection | Medium |
| F9 | RealZincPlane.write_slot() serializes ALL slots, not just changed one |
Zinc | Low |
| F10 | RealZincPlane zeros buffer before write — concurrent read sees empty |
Zinc | Low |
| F11 | RealZincPlane._write_region no partial-write recovery |
Zinc | Low |
| F12 | InMemoryZincPlane intent_region grows without bound |
Zinc | Low |
| F13 | InMemoryZincPlane uses non-re-entrant threading.Condition |
Zinc | Low |
| F14 | KernelSlotView.__setattr__ round-trips unknown fields — silently dropped |
Bridge | Low |
| F15 | on_venue_event loop stops on first exception — slot left in partial state |
Bridge | High |
| F16 | venue.submit() returning empty events leaves slot in ORDER_REQUESTED |
Bridge | Medium |
| F17 | Cancel truth-based confirmation returns REJECTED for already-cancelled orders | Adapter | Medium |
| F18 | Leverage-set and order-submit failures share error handler | Adapter | Low |
| F19 | _events_from_submit stale snapshot fallback → wrong fill detection |
Venue | Medium |
| F20 | __del__ frees Rust handle at unpredictable GC time — no explicit close() |
Bridge | Medium |
| F21 | DITAv2LauncherBundle.close() closes venue before kernel is done |
Launcher | Low |
| F22 | Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware | Launcher | Medium |
| F23 | VenueEvent.size = intent.target_size not actual fill |
Venue | Info |
| F24 | asyncio.run() inside async function in test generator |
Test | Low |
| F25 | _build_fresh_kernel_from_slot leaks old kernel objects per call |
Test | Low |
| F26 | seen_event_ids not cleared on re-entry — accumulates across trades |
Rust | Low |
| F27 | RealZincControlPlane.read() parses Zinc region every call — no caching |
Control | Low |
| F28 | _legacy_intent hardcodes confidence=1.0, bars_held=0 |
Venue | Info |
| F29 | _slot_to_payload in real_zinc_plane.py is dead code |
Zinc | Info |
| F30 | Duplicate _slot_from_payload in real_zinc_plane.py and rust_backend.py |
Zinc | Low |
G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)
Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType enum | Rust | Critical |
| G2 | into_c_string unwrap() panics on NUL byte in FFI string |
Rust | Critical |
| G3 | EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition | Rust | Critical |
| G4 | consume_exit_leg stale all_legs_done variable — wrong branch after last leg |
Rust | Critical |
| G5 | realized_pnl unbounded f64 — overflows to inf at extreme values |
Rust | High |
| G6 | mark_price produces unbounded unrealized_pnl — no result guard |
Rust | High |
| G7 | ENTER no is_finite() guard on target_size | Rust | High |
| G8 | reconcile_slots_json no dedup or bounds validation |
Rust | High |
| G9 | exchange_order_id update targets wrong order — exit cancel broken |
Rust | High |
| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | High |
| G11 | apply_fill overwrites intended_size with slot.size |
Rust | Medium |
| G12 | No max leverage cap enforced by kernel | Rust | Medium |
| G13 | resolve_slot fallback returns unwrap_or(0) — misroutes events |
Rust | Medium |
| G14 | commit_slot silently ignores out-of-bounds slot_id |
Rust | Medium |
| G15 | Zero __post_init__ validators on all 16 config dataclasses (127 fields) |
Config | High |
| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
| G17 | String config fields — Zinc region injection risk | Config | Medium |
| G18 | exit_leg_ratios no sum-to-1 validation |
Config | Low |
| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | High |
| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | High |
| G28 | RealZincPlane/RealZincControlPlane no del — SHM orphaned | Lifecycle | High |
| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | High |
| G30 | ExecutionKernel has no close() — relies on del for Rust handle | Lifecycle | High |
| G31 | Hazelcast projection never closed | Lifecycle | Medium |
| G32 | _maybe_close() break skips second method | Lifecycle | Low |
| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
| G36 | Only one try/finally in entire codebase | Lifecycle | High |
I-Series: Math, Tests, Concurrency, Recovery, Security (Pass 6)
Full detail in TRACE doc under "PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| I1 | Entry apply_fill multiple partial fills overwrite size instead of accumulating |
Rust | Critical |
| I2 | Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED | Rust | Medium |
| I3 | entry_price inconsistency — Python falsy vs Rust <= 0.0 gate |
Bridge | Info |
| I4 | Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer | Rust | High |
| I5 | MockVenueScenario rejection flags exist but zero tests use them | Test | High |
| I6 | No LIMIT order test through full kernel path | Test | High |
| I7 | Three weak/vacuous assertions in test_flaws.py | Test | Low |
| I8 | Entry overfill no guard | Rust | Low |
| I9 | No crash durability — slot state pure in-memory until step 7 of process_intent | Bridge | Critical |
| I10 | seen_event_ids lost on restart — events double-processed | Rust | Critical |
| I11 | No idempotency key sent to BingX — lost response creates duplicate orders | Venue | High |
| I12 | No graceful degradation for ANY subsystem | All | High |
| I13 | Stray venue event can reactivate CLOSED slot — no guard | Rust | High |
| I14 | No reconcile_from_slots call on startup — Zinc state never loaded into kernel | Restart | High |
| I15 | CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING | Rust | Medium |
| I16 | Zinc shared memory world-readable/writable by same-machine processes | Zinc | High |
| I17 | KernelSlotView unrestricted getattr/setattr — bypasses all FSM guards | Bridge | High |
| I18 | sys.path.insert(0) at import time in 3 production files — malicious module loading | Build | High |
| I19 | pump_venue_events stale snapshot diff produces phantom position events | Venue | High |
| I20 | exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented) | Contracts | Info |
| I21 | RATE_LIMITED code path in both Python and Rust is completely untested | All | Medium |
| I22 | Thread pool max_workers=3 shared across all adapter instances — never shut down | Venue | Medium |
J-Series: Test Infra, Data Feed, Rust Deeper, Env Parsing, Connections (Pass 7)
Full detail in TRACE doc under "PASS 7 — TEST INFRA, DATA FEED, RUST DEEPER, ENV PARSING, CONNECTIONS."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| J1 | _flatten submits wrong direction for LONG positions |
Test | Medium |
| J2 | _check_slot_accounting double-counts unrealized PnL |
Test | Medium |
| J3 | _build_live_snapshot timestamp is float vs datetime — type crash risk |
Data Feed | High |
| J4 | ExecutionKernel.mark_price() never called — no mark-to-market |
Bridge | High |
| J5 | All VenueEvent timestamps use local clock, not exchange timestamp | Venue | Medium |
| J6 | No monotonic timestamp verification anywhere | All | Low |
| J7 | rebuild_indexes() overwrites duplicate trade_id — last wins |
Rust | High |
| J8 | resolve_slot() falls back to slot 0 — stray event corrupts slot 0 |
Rust | High |
| J9 | get_slot_json/snapshot_json return null with no diagnostic |
Rust | Medium |
| J10 | Two processes with same DITA_V2_PREFIX corrupt shared Zinc memory | Zinc | High |
| J11 | load_dotenv() only runs on launcher.py import — ordering dependency |
Config | Medium |
| J12 | BINGX_API_KEY passed None with no validation — fails at HTTP time | Config | Medium |
| J13 | API credentials never masked in error messages or tracebacks | Config | High |
| J14 | _env_bool inconsistent: empty string = False vs unset = default |
Config | Low |
| J15 | gen2.py and _gen_test.py write to same output — last writer wins | Test | Medium |
| J16 | Shim test bridge lacks step(), decision_engine — zero runtime fidelity | Test | High |
K-Series: Observability, Memory, Time, Dead Code, Module Init (Pass 8)
Full detail in TRACE doc under "PASS 8 — OBSERVABILITY, MEMORY, TIME, DEAD CODE, MODULE INIT."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| K1 | Zero stdout/stderr — system completely silent | All | Critical |
| K2 | No health check, metrics, or monitoring surface | All | Critical |
| K3 | Failed trades produce no notification — error in return value only | Bridge | High |
| K4 | Exception tracebacks not captured — all except:pass swallow silently | All | High |
| K5 | ~85+ Python objects per process_intent — 36 TradeSlot copies via FFI | Bridge | Medium |
| K6 | Circular ref cycle Kernel→StateView→SlotView→Kernel — delays del | Bridge | High |
| K7 | MemoryKernelJournal silently drops transitions after 10K rows | Journal | High |
| K8 | RealZincPlane._intent_cache unbounded Python list growth | Zinc | High |
| K9 | _backend_snapshot timeout uses wall clock — NTP truncates/extends | Venue | High |
| K10 | RealZincControlPlane.wait() uses wall clock — no monotonic | Control | Medium |
| K11 | exchange_ts fallback to local time.time() when E missing | Stream | Medium |
| K12 | No monotonic timestamp verification anywhere | All | Medium |
| K13 | ControlPlane.wait()/notify() — zero callers across all impls | Control | Info |
| K14 | AccountProjection.to_account_event() — zero callers | Account | Info |
| K15 | HazelcastProjector entire class dead | Projection | Info |
| K16 | _order_to_payload() dead code | Bridge | Info |
| K17 | MirroredControlPlane entire class dead — never constructed | Control | Info |
| K18 | 12 of 20 TradeStage variants never matched in Rust FSM | Rust | Low |
| K19 | Unused imports in projection.py and hazelcast_projection.py | Projection | Info |
| K20 | sys.path mutation on import — global side effect | Config | Medium |
| K21 | load_dotenv() at module import time — mutates os.environ globally | Config | Medium |
| K22 | ControlPlane protocol not exported in all | Config | Info |
| K23 | KernelSlotView.getattr makes FFI call per attribute access | Bridge | Medium |
L-Series: Contracts, Exchange Events, Network, FFI, Backup Diffs (Pass 9)
Full detail in TRACE doc under "PASS 9 — CONTRACTS, EXCHANGE EVENTS, NETWORK, FFI, BACKUP DIFFS."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| L1 | KernelOutcome(accepted=True, diag=INVALID_INTENT) parseable — no invariant check |
Bridge | Medium |
| L2 | VenueEvent.filled_size > size possible via different source fields |
Venue | Medium |
| L3 | VenueEvent.price=0 reaches kernel — zero-price fill = 100% loss PnL |
Venue | High |
| L4 | available_margin set to cross-wallet balance, not available margin |
Stream | High |
| L5 | wallet_balance defaults to 0 when "wb" absent — E-side reconcile always ERROR |
Stream | High |
| L6 | _keepalive_loop no stop mechanism — runs on old key after rotation |
Stream | Medium |
| L7 | event_id integer 0 → str(0) falsy on or → random UUID generated |
Stream | Medium |
| L8 | Hardcoded VST URLs in test generators — wrong env if LIVE configured | Test | Medium |
| L9 | No proxy support — can't deploy behind corporate proxy | Network | Low |
| L10 | 5-minute DNS cache TTL — stale IPs on infrastructure change | Network | Low |
| L11 | limit_price getattr reads dataclass field, not metadata dict |
Venue | Low |
| L12 | Backup diff: 14+ critical bugs fixed, 428-line dual-ledger accounting added | Rust | Info |
| L13 | _build_full_runtime dead — real pipeline never tested |
Test | High |
| L14 | listenKeyExpired raises RuntimeError instead of clean yield |
Stream | Medium |
| L15 | _delete_listen_key suppresses all exceptions — leaked server keys |
Stream | Low |
| L16 | venue_order_id target selection ambiguous when entry order exists |
Rust | Medium |
M-Series: Runtime, Test Bugs, FSM Audit, Persistence, Measurement (Pass 10)
Full detail in TRACE doc under "PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| M1 | ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries | Rust | Critical |
| M2 | CANCEL creates no transition record — invisible in audit log | Rust | Critical |
| M3 | _mk_intent drops order_type/limit_price into metadata, not proper field |
Test | High |
| M4 | test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test | Test | High |
| M5 | Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested | Test | Medium |
| M6 | test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient | Test | Medium |
| M7 | test_outcome_state_matches_actual_slot is tautological | Test | Low |
| M8 | ORDER_ACK silent fallthrough when no active order — accepted with no effect | Rust | Medium |
| M9 | ORDER_REJECT on POSITION_OPEN with stale entry order destroys position | Rust | Critical |
| M10 | No aggregation of trade count, success/fail, latency — all zero | All | High |
| M11 | Flaw 6 tests pass via metadata passthrough, not field logic | Test | High |
| M12 | No retry/fallback for ClickHouse INSERT failures — crashes policy cycle | Persistence | High |
| M13 | AccountSnapshot.trade_seq never incremented — always 0 | Account | Medium |
| M14 | test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd | Test | Low |
| M15 | test_reconcile_rejects_position_open_with_zero_size passes for wrong reason | Test | Low |
| M16 | No built-in metric for active slots, event throughput, or memory | All | Medium |
| M17 | Flaw 9 tests named for cancel but never call cancel | Test | High |
| M18 | _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime | Runtime | High |
N-Series: Async/Sync Seams, Lock Analysis, Threading (Pass 11)
Full detail in TRACE doc under "PASS 11 — ASYNC/SYNC SEAMS, LOCK ANALYSIS, THREADING."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| N1 | Rust kernel with_handle_mut zero sync — &mut from raw ptr, UB on concurrent FFI |
Rust | Critical |
| N2 | _run() has two completely different code paths — runtime branch, not design |
Venue | Critical |
| N3 | _run() path B blocks event loop thread for every venue HTTP operation |
Venue | Critical |
| N4 | asyncio.run() called repeatedly — creates/destroys event loops per call |
Venue | Critical |
| N5 | _snapshot_ready cascading re-fetch — N callers produce N overlapping HTTP |
Venue | High |
| N6 | BingxUserStream.close() doesn't cancel pending tasks |
Stream | Medium |
| N7 | Live test architecture forces worst-case _run() path for every operation |
Test | Medium |
| N8 | subscribe() reconnect creates new tasks per iteration |
Stream | Medium |
| N9 | No asyncio.all_tasks() or task accounting — leaked tasks undetectable |
All | Low |
| N10 | _snap_lock no reader-side protection (informational) |
Venue | Info |
O-Series: Sync/Async Wider Scope (Launcher, Generators, Streams, FFI, Tests) (Pass 12)
Full detail in TRACE doc under "PASS 12 — SYNC/ASYNC WIDER SCOPE."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| O1 | _maybe_close() asyncio.run without loop guard — close skipped from async context |
Launcher | High |
| O2 | async def connect() shims call sync venue.connect() without await — blocking |
Test | Medium |
| O3 | _contract_rows(client) NOT awaited — _pick_live_symbol iterates coroutine = crash |
Test | High |
| O4 | Deprecated get_event_loop().run_until_complete() in test file |
Test | Medium |
| O5 | _run() thread pool .result() no timeout — backend hang freezes process |
Venue | High |
| O6 | MockVenueAdapter never exercises thread-pool bridge — untested in CI | Venue | Medium |
| O7 | _keepalive_loop/_rotation_sentinel fire-and-forget — exceptions silently lost |
Stream | Low |
| O8 | KernelSlotView.__getattr__ N FFI calls for N fields — no caching |
Bridge | Medium |
| O9 | DITAv2LauncherBundle no __del__ — GC'd bundle leaks resource tree |
Launcher | Medium |
| O10 | ExecutionKernel no close() — Rust handle only freed by unpredictable del |
Bridge | Medium |
| O11 | KernelSlotView.__setattr__ triggers 5 persistence side effects — no read-only view |
Bridge | Medium |
P-Series: FFI Safety, Dangling Pointers, Coverage Gaps (Pass 13)
Full detail in TRACE doc under "PASS 13 — FFI BOUNDARY SAFETY, DANGLING POINTERS, COVERAGE GAPS."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| P1 | dita_kernel_destroy double-free UB — Python doesn't null handle.value |
Bridge | Critical |
| P2 | CStr::from_ptr(payload) without null guard in 3 FFI exports |
Rust | High |
| P3 | _check_open_orders calls asyncio.run() from async _verify — RuntimeError |
Test | High |
| P4 | into_c_string NUL sanitizer produces invalid JSON — json.loads fails |
Rust | Medium |
| P5 | reconcile_slots_json/snapshot_json return null on failure — no diagnostic |
Rust | Medium |
| P6 | _get_rust() TOCTOU race — concurrent cargo build corruption |
Bridge | High |
| P7 | KernelHandle no Send/Sync — FFI bypasses Rust ownership rules |
Rust | Info |
| P8 | No explicit Rust handle destroy path from bundle.close() | Launcher | Medium |
| P9 | __del__ accesses module _RUST during shutdown — NameError leak |
Bridge | Low |
H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)
Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."
| # | Flaw | Layer | Severity |
|---|---|---|---|
| H1 | No Python dependency files (requirements.txt, pyproject.toml, etc.) | Build | Critical |
| H2 | Rust kernel compiled from source on every cold start — no prebuilt binary | Build | Critical |
| H3 | Zero logging — 16+ silent except:pass sites, no error observability | All | Critical |
| H4 | _row_float rejects zero as valid, except Exception: continue swallows all |
Venue | High |
| H5 | _backend_snapshot timeout returns stale data/None — callers crash |
Venue | High |
| H6 | All enum-from-raw-string sites crash on unknown variant (17 sites) | Bridge | High |
| H7 | _legacy_intent reads getattr(intent, "order_type") not metadata — always MARKET |
Venue | High |
| H8 | Unknown venue status silently mapped to ACKED | Venue | High |
| H9 | RealZincPlane.write_slot() slot_id >= slot_count silently lost |
Zinc | High |
| H10 | RealZincControlPlane.read() no atomicity with concurrent update() |
Control | High |
| H11 | _RustKernelLib lazy init with race condition — concurrent cargo build |
Bridge | High |
| H12 | ExecutionKernel.__del__ use-after-free on Rust handle |
Bridge | High |
| H13 | MirroredControlPlane missing protocol methods (wait/notify) |
Control | Medium |
| H14 | TradeSlot.remaining_size vs VenueOrder.remaining_size — different semantics |
Contracts | Medium |
| H15 | _maybe_close asyncio.run RuntimeError silently swallowed |
Launcher | Medium |
| H16 | Lazy import of bingx_direct masks config errors until first trade | Build | Info |
| H17 | load_dotenv() at module level — import-time I/O side effect |
Launcher | Medium |
| H18 | _run() blocks event loop on every HTTP call via thread pool |
Venue | Medium |
| H19 | HazelcastClientLike protocol has zero concrete implementations |
Projection | Low |
| H20 | _decode_packet uncaught UnicodeDecodeError/ValueError on corrupted SHM |
Zinc | Low |
| H21 | wasm-bindgen compiled into native library unnecessarily |
Build | Low |
| H22 | socket.getaddrinfo monkey-patch in test code |
Test | Low |
A-Series: Architectural Flaws (detailed writeups)
These are the original architectural flaws with full analysis.
Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size
Location: _rust_kernel/src/lib.rs lines ~770-780 (EXIT handler in process_intent)
Severity: High
Nature: Logic error — wrong base for exit-size computation.
Downstream effect
The EXIT handler computes the exit size as base_size * exit_ratio where:
let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
After partial fills (e.g., two separate MARKET exit legs), initial_size is still the
original entry size while slot.size has been reduced by previous legs. If the
cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the
computed exit size can exceed the remaining position.
The venue adapter clamps to actual position via reduceOnly, but the kernel's own
accounting reduces slot.size by the fill size, not by the intended exit size. The
slot can therefore go negative (slot.size < 0) if the fill is larger than remaining.
Exact trigger
- Enter SHORT, size=1.0,
initial_size=1.0, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0 - EXIT leg 0:
exit_size = 1.0 * 0.6 = 0.6. Fill consumes 0.6. Slot size goes to 0.4. - EXIT leg 1:
exit_size = 1.0 * 0.6 = 0.6. But remaining is 0.4. Requests 0.6. - BingX
reduceOnlyclamps fill to 0.4. Slot size goes to 0.0. - EXIT leg 2 (ratio 1.0):
exit_size = 1.0 * 1.0 = 1.0. Slot is already at 0.0. Kernel returnsNO_OPEN_POSITION— the final EXIT is rejected becauseslot.closedwas not set by the previous fill (it was a partial close, not terminal). - Slot is at size=0.0,
!slot.closed, no active orders, but!slot.is_free()becausesize <= 0.0is true butfsm_state != IDLE/CLOSED— slot is stuck inPOSITION_OPENwith zero size.
This is not purely a mis-sized ratio problem. With MARKET orders that fill fully, even correct ratios can leave the slot stuck if the fill price differs from the intended-size price and the venue adjusts fill quantity.
Fix strategy
Use slot.size directly as the base (not initial_size):
let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);
This guarantees the exit never requests more than the remaining position, regardless of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.
Flaw A2: Misleading CANCEL diagnostic code on entry-only slots
Location: _rust_kernel/src/lib.rs lines ~798-810 (CANCEL rejection path)
Severity: Low
Nature: Diagnostic pollution — wrong error code.
Downstream effect
When a CANCEL intent arrives and neither active_exit_order nor
active_entry_order is cancellable, the kernel returns:
diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER
But the reason may be that there's no active entry order either, or the FSM state doesn't permit cancellation. The diagnostic name suggests an exit-order-specific problem when the failure is generic "nothing to cancel."
Fix
Change to a generic NO_ACTIVE_ORDER diagnostic or SLOT_IDLE when the slot is
already in IDLE. NO_ACTIVE_EXIT_ORDER is misleading for a slot that has never had
any order.
Flaw A3: Float-accumulated slot.size after partial fills can go negative
Location: _rust_kernel/src/lib.rs lines ~1365-1370 (apply_fill exit path)
Severity: Low
Nature: Numerical precision edge case.
Code path
slot.size = (slot.size - fill_size).max(0.0);
This clamps to zero, which is correct. But if the venue fills more than requested
(on BingX, this can happen with market orders where the fill walks the book), the
slot sees fill_size > intended_size. The max(0.0) prevents negative, but the
slot then reports size=0.0 with !closed and an FSM state that's not IDLE.
The is_free() check requires size <= 0.0 AND fsm_state in {IDLE, CLOSED}. A
slot with size=0.0 and fsm_state=POSITION_OPEN is stuck — no EXIT will be
accepted and no ENTER can start.
Trigger
Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the
book, overshoots). fill_size=0.8, slot.size = (0.6 - 0.8).max(0.0) = 0.0.
Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), closed=false.
Fix
When slot.size <= 1e-12 after a fill and the slot is in an exit-related state,
force transition to CLOSED/IDLE regardless of leg index:
if slot.size <= 1e-12 {
slot.closed = true;
slot.fsm_state = TradeStage::CLOSED;
slot.active_exit_order = None;
slot.active_entry_order = None;
return;
}
Flaw A4: Entry price is clobbered by mark_price if called before fill arrives
Location: _rust_kernel/src/lib.rs lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)
Severity: Medium
Nature: Accounting accuracy — incorrect PnL base.
Code path
// In mark_price:
if self.entry_price <= 0.0 {
self.entry_price = price; // Seeds entry_price from mark before fill
}
// In apply_fill (entry):
if event.price > 0.0 {
slot.entry_price = event.price; // Overwrites with actual fill price
}
The mark_price path seeds entry_price from a market price when the slot has no
fill yet. The apply_fill entry path correctly overwrites with the actual fill price.
So in the normal flow this is harmless — the fill overwrites the mark.
However, consider this sequence:
- ENTER intent accepted → slot goes
ORDER_REQUESTED,entry_price = 0.0 runtime.step()callskernel.mark_price(snapshot.symbol, snapshot.price)→ setsentry_price = 100.0on_venue_event(ORDER_ACK)→ENTRY_WORKING,entry_pricestill100.0on_venue_event(PARTIAL_FILL)→apply_fillsetsentry_price = 99.5(fill price)- Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5
This is a transient mis-valuation window. It corrects itself on the next observe_slots
call, but intra-step readers see wrong unrealized PnL. Not critical because:
account.snapshot.unrealized_pnluses the slot'sunrealized_pnl, not the mark- Realized PnL is computed from actual fill prices
- The window lasts at most one scan cycle (~5s)
Fix
Don't set entry_price from mark_price when there's no fill:
fn mark_price(&mut self, price: f64) {
if !price.is_finite() || price <= 0.0 { return; }
// Don't seed entry_price — leave it at 0.0 until a fill arrives
if self.entry_price <= 0.0 || self.size <= 0.0 {
self.unrealized_pnl = 0.0;
return;
}
// ... normal PnL computation
}
Flaw A5: Capital-before computation is arithmetic not snapshot-based
Location: pink_clickhouse.py lines ~761-762 (_write_trade_exit_leg) and ~822-823 (_write_trade_event)
Severity: High
Nature: Accounting accuracy — wrong capital_before under multi-slot or intervening events.
Code pattern (appears in two places)
capital_after = self._capital()
capital_before = capital_after - pnl_leg # In _write_trade_exit_leg
capital_before = capital_after - pnl # In _write_trade_event
This reconstructs capital_before by subtracting the current leg's PnL from the
current capital. This is only correct if:
- No other slots settled PnL between this leg and the previous one
- No capital corrections (reconcile, manual override) happened between legs
- No fees were deducted between legs
With multi-slot (PINK configurable max_slots > 1), a concurrent trade on slot 1
that closes between slot 0's exit legs will have its PnL baked into capital_after,
making capital_before = capital_after - pnl_leg wrong.
Fix
Maintain a per-trade capital_before_leg snapshot taken at the moment of the
first fill event for each trade, advancing it by the realized PnL of each leg:
self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
self._leg_state[trade_id]["capital_after"] = capital_after
And use prev["capital_before"] for the row, not capital_after - pnl_leg.
Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange
Location: pink_direct.py lines ~597-630 (recover_account) and docstring of reconcile_account
Severity: Medium
Nature: Operational drift — capital is never verified against exchange truth in hot loop.
The gap
reconcile_account() (line 632) has this docstring:
Periodic exchange-led account sync.
Capital is re-seeded from the exchange balance as a guard against long-running drift
But the actual implementation:
async def reconcile_account(self, ...) -> dict[str, Any]:
return await self.recover_account(...)
async def recover_account(self, ...) -> dict[str, Any]:
capital = float(self.kernel.account.snapshot.capital or 25000.0)
_reconcile_position_slot(self.kernel, capital, slot_id=0)
It passes the kernel's own capital to _reconcile_position_slot, which then
overwrites kernel.account.snapshot.capital with... the same value. No exchange
balance poll ever overwrites capital.
connect() at line 224 does the same — it passes initial_capital (an env default),
not the exchange balance. The exchange balance is never read for capital seeding
in the current code path. _reconcile_position_slot does call
venue.open_positions(), but it only reads positions, not capital.
Effect
Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation, or any other systematic error) accumulates monotonically. There is no mechanism to detect or correct drift. Over weeks of live trading, the kernel's capital snapshot can diverge arbitrarily from the exchange's actual balance.
Fix
Either:
- Make
_reconcile_position_slotread the exchange balance and use it for capital reseeding (the docstring claims it does this already), or - Add a separate capital-verification path that surfaces the delta between kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.
Flaw A7: No fee tracking in kernel accounting
Location: rust_backend.py lines ~540-545 (on_venue_event settle), bingx_direct.py submit_intent return
Severity: Medium
Nature: Accounting accuracy — fees are invisible to capital tracking.
Downstream effect
When a trade closes, the kernel computes:
realized_pnl = delta * notional
This is gross PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%). These fees are never subtracted from the kernel's realized PnL. Over 100 trades with $100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000 trades at 10x leverage and $50k average notional, the error is $200k.
The BingxDirectExecutionAdapter does return ExecutionReceipt with fill data,
but bingx_venue._events_from_submit() only reads price and filled_size —
commission/fee fields are ignored.
Fix
- Read fee/commission from the BingX ack payload in
_events_from_submit() - Pass fees through
VenueEvent.metadata["fee"] - In the Rust kernel's
apply_fill, subtract the fee from realized PnL:let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0); slot.realized_pnl += realized - fee;
Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input
Location: _rust_kernel/src/lib.rs lines ~745-748
Severity: Low
Nature: Silent fallback — corrupt input produces a trade, not a rejection.
slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
intent.leverage
} else {
1.0
};
A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of
rejecting the intent. The Python bridge does validate _first_invalid_intent_field()
which catches NaN/inf, but it doesn't catch leverage <= 0.0 (it only checks
not math.isfinite(value)).
Fix
Add leverage <= 0.0 to the Python bridge's invalid-intent check. The Rust kernel
should still have the 1.0 fallback as a defensive measure, but the bridge should
prevent bad leverages from reaching Rust in the first place.
Flaw A9: Mock venue submit condition convoluted — dead code paths
Location: mock_venue.py lines ~60-90
Severity: Informational
Nature: Code clarity — confusing condition logic.
if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
events.append(ack_event)
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
# ... fill events
The condition logic is confusing:
- When
emit_ack_before_fill=Trueandemit_fill_on_submit=True: both branches run → ACK + fill - When
emit_ack_before_fill=Falseandemit_fill_on_submit=True: first branch runs becausenot True = False, soFalse or False = False→ no ACK. Second branch runs → fill only. This produces a fill without an ACK, which is not a realistic venue scenario. - When
partial_fill_ratio=1.0(default): second branch runs and emits aFULL_FILLevent even whenemit_fill_on_submit=False, because0.0 or 1.0 > 0 = True.
The partial fill ratio check should be gated on emit_fill_on_submit:
should_emit_fill = self.scenario.emit_fill_on_submit or (
is_entry and self.scenario.entry_partial_fill_ratio > 0
) or (
not is_entry and self.scenario.exit_partial_fill_ratio > 0
)
Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows
Location: pink_direct.py lines ~318-374 (pump_venue_events), called at line ~436
Severity: Medium
Nature: Operational overhead — unnecessary exchange HTTP calls.
The problem
step() calls pump_venue_events() every cycle, which calls venue.reconcile().
For BingxVenueAdapter, reconcile() calls _backend_snapshot() which does up to 5
HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow
where orders fill synchronously within process_intent(), there are no late fills
to drain.
On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each
pump_venue_events() call consumes 5+ of that budget. At a 5-second policy cycle,
this is 60 requests/minute — 60% of the rate budget — just to poll for fills that
don't exist.
Fix
Gate the pump on whether the previous cycle submitted a LIMIT order:
self._has_resting_order = any(
o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
for o in kernel.open_orders()
)
if self._has_resting_order:
await self.pump_venue_events(snapshot, market_state=market_state)
Or add a config flag async_fill_mode: bool = False.
Flaw A11: VenueAdapter.submit() blocks the event loop
Location: bingx_venue.py lines ~225-233 (_run)
Severity: Medium
Nature: Runtime safety — synchronous call in async context.
def _run(self, result: Any) -> Any:
if inspect.isawaitable(result):
try:
asyncio.get_running_loop()
except RuntimeError:
return asyncio.run(result)
pool = self._get_executor()
return pool.submit(asyncio.run, result).result()
When called from step() (which is an async function), _run submits the async
submit_intent() to a thread pool, runs it with asyncio.run(), then calls
.result() which blocks the current thread until complete. The BingX HTTP call
can take 1-5 seconds depending on network latency and exchange load.
During this block, the event loop cannot process other async tasks (data feed updates, health checks, signal processing). In a single-runtime deployment, this stalls the entire policy cycle.
Fix
Make process_intent in ExecutionKernel accept an async venue callback, or
make BingxVenueAdapter truly async (not sync-with-thread-bridge). For now,
at minimum the PINK runtime should run step() in an executor to avoid blocking
the main event loop.
Flaw A12: Stale KernelStateView slot references after reconcile
Location: rust_backend.py lines ~350-365 (KernelStateView.refresh)
Severity: Low
Nature: Stale data — view not rebuilt on reconcile.
class KernelStateView:
def __init__(self, kernel):
self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
# ...
def refresh(self) -> None:
snapshot = self._kernel._snapshot_backend()
self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
self.client_order_index = dict(snapshot.get("client_order_index", {}))
refresh() updates the index maps but does not recreate self.slots. The slot
views in self.slots are live proxies (they read through _get_slot each time),
so slot data is current. But if max_slots changes (it shouldn't, but it's mutable)
or if slots are re-indexed by a reconcile, the view list is wrong.
Not critical because max_slots is set at init and never changes, but worth
fixing for robustness.
Flaw A13: persist_fill_events uses current price as exit price
Location: pink_clickhouse.py lines ~408
Severity: Low
Nature: Historical accuracy — logged price may not match fill price.
price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)
This correctly reads from the event's price. But decision.reference_price at line
417 falls back to this price, which is the fill price. The trade_event row at line
835 uses exit_price = slot_dict.get("entry_price", ...) — which is the entry
price, not the exit price. The trade_event always shows exit_price == entry_price.
This means trade_events in ClickHouse will never show a realistic exit price
for the persisted trade, breaking any PnL reconstruction that relies on
(exit_price - entry_price) * size * leverage.
Flaw A14: _write_position_state maps active_leg_index to entry_bar
Location: pink_clickhouse.py line ~673
Severity: Low
Nature: Semantic mismatch — wrong field mapping.
"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
active_leg_index is the index into the exit-leg-ratios array (which leg is being
exited next). It has nothing to do with how many bars the position has been held.
When a position opens, active_leg_index is 0. After the first exit leg, it
advances to 1. Neither value is a bar count.
entry_bar should be bars_held from the intent/decision, or a computed value
from entry_time to now.
Flaw A15: persist_recovery_state passes account dict as slot dict
Location: pink_clickhouse.py lines ~447-460
Severity: Low
Nature: Wrong data — account snapshot used where slot data is expected.
def persist_recovery_state(self, *, snapshot, acc_dict, ...):
slot_dict = acc_dict or {} # ← acc_dict is an account snapshot, not a slot
self._write_position_state(..., slot_dict={}, ...) # ← correctly uses empty dict
self._write_trade_reconstruction(
snapshot,
trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
# acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
)
The trade_id in the trade_reconstruction row will always be "" because
acc_dict comes from kernel.snapshot()["account"] which has keys capital,
equity, realized_pnl, etc. — not trade_id. This means the recovery
trade_reconstruction row has no trade_id linkage.