Codex d1a6be0d27 PINK: E2E trace analysis — Pass 12 sync/async wider scope (O1-O11)

Twelfth pass: _maybe_close asyncio.run silently skips close from async
context (O1), _pick_live_symbol missing await crashes on coroutine iteration
(O3), _run() pool .result() no timeout — backend hang freezes process (O5),
KernelSlotView.__getattr__ N FFI calls for N fields no caching (O8),
DITAv2LauncherBundle no __del__ leaks resource tree (O9), ExecutionKernel
no close() — __del__ only cleanup (O10), __setattr__ triggers 5 persistence
side effects undocumented (O11). 254 total flaws.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

2026-06-02 09:27:25 +02:00

PINK DITAv2 — Structural Flaw Analysis (CENTRAL)

Analysis date: 2026-05-31
Scope: Full PINK pipeline, all flaws across all modules.
Sources:

This file (A-series): Detailed writeups for architectural flaws.
PINK_DITAv2_E2E_TRACE_ANALYSIS.md (E, F, G-series): Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans. Every E, F, G entry below is a summary only — full detail is in the TRACE doc.

Combined Catalog (All Flaws, All Passes)

Pass	Focus	Count	Critical	High	Medium	Low	Info
A	Architectural (detailed in this file)	15	0	2	0	2	11
T	Threading/Atomicity	9	1	3	3	2	0
E	E2E Trace (Pass 1)	26	0	4	10	11	1
F	Deep E2E (Pass 3)	30	0	1	8	17	4
G	Domain Scans (Pass 4)	36	4	11	11	8	2
H	Edge Domains (Pass 5)	22	3	9	5	4	1
I	Pass 6 (Math/Tests/Recovery/Security)	22	3	11	4	2	2
J	Pass 7 (Test Infra/Data/Rust/Env/Conn)	16	0	7	7	2	0
K	Pass 8 (Observability/Memory/Time/DeadCode)	23	2	7	7	1	6
L	Pass 9 (Contracts/Events/Network/FFI/Diffs)	16	0	4	8	4	0
M	Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics)	18	3	7	5	3	0
N	Pass 11 (Async/Sync Seams/Locks/Threading)	10	4	1	3	1	1
O	Pass 12 (Sync/Async Wider Scope)	11	0	3	7	1	0
Total		254	20	70	78	58	28

T-Series: Threading & Atomicity Flaws

Full detail in TRACE doc under "Threading & Atomicity" section.

#	Flaw	Layer	Severity
T1	`InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy	Zinc	Critical
T2	Thread-unsafe kernel snapshot capture for account	Bridge	High
T3	Re-entrant or incorrectly-scoped Rust-kernel handle usage	Bridge	High
T4	Consequence: `on_venue_event` PnL settle races	Bridge	High
T5	Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls	Zinc	Medium
T6	`_write_region` buffer zero + notify race with concurrent reader	Zinc	Medium
T7	Publication of events in `process_intent` loop not synchronized with persist	Bridge	Medium
T8	`asyncio.run` executor skip in `_run` leads to event-loop stall	Venue	Low
T9	No thread-safe Python↔Rust ownership / lifetime protocol	Bridge	Low

E-Series: E2E Data-Flow Flaws (Pass 1)

Full detail in TRACE doc under "Layer 1" through "Layer 9."

#	Flaw	Layer	Severity
E1	`step()` calls `pump_venue_events()` every cycle unconditionally	Runtime	High
E2	`kernel.snapshot()["account"]` returns a fresh dict, not a live view	Bridge	Low
E3	`_decision_to_kernel_intent` drops `order_type` and `limit_price`	Runtime	High
E4	`_exit_intent_from_slot` trusts slot.size but slot may be stale	Runtime	High
E5	JSON serialization round-trip loses numeric precision	Bridge	Low
E6	`_RustKernelLib` is a global singleton — shared across all kernels	Bridge	Low
E7	ENTER handler silently allows re-entry with same trade_id	Rust	High
E8	EXIT handler uses `initial_size` not current size	Rust	High
E9	CANCEL handler returns diagnostic even when nothing happened	Rust	Low
E10	`apply_fill` entry branch double-sets `active_entry_order`	Rust	Low
E11	`_legacy_intent()` is a lossy conversion	Venue	Low
E12	`_events_from_submit()` price fallback chain can lose venue price	Venue	Low
E13	`_backend_snapshot()` timeout returns stale data	Venue	Medium
E14	`_events_from_cancel` uses stale `slot_id` from order metadata	Venue	Low
E15	Submit sets leverage via separate HTTP call	Adapter	Medium
E16	`_format_quantity`/`_format_price` may use zero tick/step	Adapter	Medium
E17	Cancel uses truth-based confirmation — can mask real errors	Adapter	Medium
E18	`on_venue_event` settles PnL incrementally — fees never included	Bridge	Medium
E19	`observe_slots` called with ALL slots, not just changed ones	Bridge	Low
E20	`_capital()` reads live from `AccountProjection` — stale row risk	Persistence	Low
E21	`persist_fill_events()` synthesizes fake Decision/Intent	Persistence	Medium
E22	`_write_trade_exit_leg` capital_before uses arithmetic reconstruction	Persistence	Medium
E23	`_write_trade_event` uses entry_price as exit_price	Persistence	Medium
E24	Mock venue always emits fill on `partial_fill_ratio > 0`	Test	Low
E25	Test scenarios use MARKET-only `_si()` helper — no LIMIT tests	Test	Low
E26	Fresh-kernel reconcile tests create second kernel but share venue	Test	Low

F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)

Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."

#	Flaw	Layer	Severity
F1	CANCEL returns "accepted" before cancel happens — stale diagnostic_code	Bridge	Medium
F2	`_last_settled_pnl` reset before `venue.submit()` — transient window	Bridge	Medium
F3	`_first_invalid_intent_field` allows `leverage=0` and `target_size=0`	Bridge	Low
F4	`outcome.emitted_events` only from venue — Rust kernel events dropped	Bridge	Low
F5	`on_venue_event` redundant FFI read of slot already returned by Rust	Bridge	Low
F6	`process_intent` records pre-venue transitions with `event=None`	Bridge	Info
F7	`reconcile_from_slots` writes ALL slots to projection/zinc	Bridge	Low
F8	`HazelcastRowWriter.put()` synchronous, no error handling — crashes intent	Projection	Medium
F9	`RealZincPlane.write_slot()` serializes ALL slots, not just changed one	Zinc	Low
F10	`RealZincPlane` zeros buffer before write — concurrent read sees empty	Zinc	Low
F11	`RealZincPlane._write_region` no partial-write recovery	Zinc	Low
F12	`InMemoryZincPlane` intent_region grows without bound	Zinc	Low
F13	`InMemoryZincPlane` uses non-re-entrant `threading.Condition`	Zinc	Low
F14	`KernelSlotView.__setattr__` round-trips unknown fields — silently dropped	Bridge	Low
F15	`on_venue_event` loop stops on first exception — slot left in partial state	Bridge	High
F16	`venue.submit()` returning empty events leaves slot in ORDER_REQUESTED	Bridge	Medium
F17	Cancel truth-based confirmation returns REJECTED for already-cancelled orders	Adapter	Medium
F18	Leverage-set and order-submit failures share error handler	Adapter	Low
F19	`_events_from_submit` stale snapshot fallback → wrong fill detection	Venue	Medium
F20	`__del__` frees Rust handle at unpredictable GC time — no explicit close()	Bridge	Medium
F21	`DITAv2LauncherBundle.close()` closes venue before kernel is done	Launcher	Low
F22	Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware	Launcher	Medium
F23	`VenueEvent.size` = `intent.target_size` not actual fill	Venue	Info
F24	`asyncio.run()` inside async function in test generator	Test	Low
F25	`_build_fresh_kernel_from_slot` leaks old kernel objects per call	Test	Low
F26	`seen_event_ids` not cleared on re-entry — accumulates across trades	Rust	Low
F27	`RealZincControlPlane.read()` parses Zinc region every call — no caching	Control	Low
F28	`_legacy_intent` hardcodes confidence=1.0, bars_held=0	Venue	Info
F29	`_slot_to_payload` in real_zinc_plane.py is dead code	Zinc	Info
F30	Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py	Zinc	Low

G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)

Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."

#	Flaw	Layer	Severity
G1	EXIT_RESIDUAL action missing from Rust KernelCommandType enum	Rust	Critical
G2	`into_c_string` unwrap() panics on NUL byte in FFI string	Rust	Critical
G3	EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition	Rust	Critical
G4	`consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg	Rust	Critical
G5	`realized_pnl` unbounded f64 — overflows to inf at extreme values	Rust	High
G6	`mark_price` produces unbounded unrealized_pnl — no result guard	Rust	High
G7	ENTER no is_finite() guard on target_size	Rust	High
G8	`reconcile_slots_json` no dedup or bounds validation	Rust	High
G9	`exchange_order_id` update targets wrong order — exit cancel broken	Rust	High
G10	CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER	Rust	High
G11	`apply_fill` overwrites intended_size with slot.size	Rust	Medium
G12	No max leverage cap enforced by kernel	Rust	Medium
G13	`resolve_slot` fallback returns unwrap_or(0) — misroutes events	Rust	Medium
G14	`commit_slot` silently ignores out-of-bounds slot_id	Rust	Medium
G15	Zero `__post_init__` validators on all 16 config dataclasses (127 fields)	Config	High
G16	DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset	Config	Info
G17	String config fields — Zinc region injection risk	Config	Medium
G18	`exit_leg_ratios` no sum-to-1 validation	Config	Low
G19	RealZincControlPlane.read() no sequence check — torn-read risk	Config	Low
G20	ClickHouse journal strategy/db env vars — SQL injection risk	Config	Low
G21	entry_price used as exit_price in trade_events — data loss	Persistence	High
G22	active_leg_index → entry_bar semantic mis-mapping	Persistence	Medium
G23	capital_before arithmetic absorbs cross-slot PnL	Persistence	Medium
G24	Recovery trade_reconstruction always has trade_id=""	Persistence	Medium
G25	seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables	Persistence	Low
G26	_safe_float silently converts NaN/None/Inf to 0.0	Persistence	Low
G27	build_launcher_bundle no exception safety — prior resources leak	Lifecycle	High
G28	RealZincPlane/RealZincControlPlane have no `__del__`: SHM orphaned	Lifecycle	High
G29	Zero signal handlers — no cleanup on SIGTERM/SIGINT	Lifecycle	High
G30	ExecutionKernel has no close(), relies on `__del__` for Rust handle	Lifecycle	High
G31	Hazelcast projection never closed	Lifecycle	Medium
G32	_maybe_close() break skips second method	Lifecycle	Low
G33	close() not idempotent for RealZinc components	Lifecycle	Low
G34	No context manager on DITAv2LauncherBundle	Lifecycle	Low
G35	BingxVenueAdapter.connect() never called	Lifecycle	Info
G36	Only one try/finally in entire codebase	Lifecycle	High

I-Series: Math, Tests, Concurrency, Recovery, Security (Pass 6)

Full detail in TRACE doc under "PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY."

#	Flaw	Layer	Severity
I1	Entry `apply_fill` multiple partial fills overwrite size instead of accumulating	Rust	Critical
I2	Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED	Rust	Medium
I3	entry_price inconsistency — Python falsy vs Rust `<= 0.0` gate	Bridge	Info
I4	Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer	Rust	High
I5	MockVenueScenario rejection flags exist but zero tests use them	Test	High
I6	No LIMIT order test through full kernel path	Test	High
I7	Three weak/vacuous assertions in test_flaws.py	Test	Low
I8	Entry overfill no guard	Rust	Low
I9	No crash durability — slot state pure in-memory until step 7 of process_intent	Bridge	Critical
I10	seen_event_ids lost on restart — events double-processed	Rust	Critical
I11	No idempotency key sent to BingX — lost response creates duplicate orders	Venue	High
I12	No graceful degradation for ANY subsystem	All	High
I13	Stray venue event can reactivate CLOSED slot — no guard	Rust	High
I14	No reconcile_from_slots call on startup — Zinc state never loaded into kernel	Restart	High
I15	CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING	Rust	Medium
I16	Zinc shared memory world-readable/writable by same-machine processes	Zinc	High
I17	KernelSlotView unrestricted `__getattr__`/`__setattr__`: bypasses all FSM guards	Bridge	High
I18	sys.path.insert(0) at import time in 3 production files — malicious module loading	Build	High
I19	pump_venue_events stale snapshot diff produces phantom position events	Venue	High
I20	exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented)	Contracts	Info
I21	RATE_LIMITED code path in both Python and Rust is completely untested	All	Medium
I22	Thread pool max_workers=3 shared across all adapter instances — never shut down	Venue	Medium

J-Series: Test Infra, Data Feed, Rust Deeper, Env Parsing, Connections (Pass 7)

Full detail in TRACE doc under "PASS 7 — TEST INFRA, DATA FEED, RUST DEEPER, ENV PARSING, CONNECTIONS."

#	Flaw	Layer	Severity
J1	`_flatten` submits wrong direction for LONG positions	Test	Medium
J2	`_check_slot_accounting` double-counts unrealized PnL	Test	Medium
J3	`_build_live_snapshot` timestamp is float vs datetime — type crash risk	Data Feed	High
J4	`ExecutionKernel.mark_price()` never called — no mark-to-market	Bridge	High
J5	All VenueEvent timestamps use local clock, not exchange timestamp	Venue	Medium
J6	No monotonic timestamp verification anywhere	All	Low
J7	`rebuild_indexes()` overwrites duplicate trade_id — last wins	Rust	High
J8	`resolve_slot()` falls back to slot 0 — stray event corrupts slot 0	Rust	High
J9	`get_slot_json`/`snapshot_json` return null with no diagnostic	Rust	Medium
J10	Two processes with same DITA_V2_PREFIX corrupt shared Zinc memory	Zinc	High
J11	`load_dotenv()` only runs on launcher.py import — ordering dependency	Config	Medium
J12	BINGX_API_KEY passed None with no validation — fails at HTTP time	Config	Medium
J13	API credentials never masked in error messages or tracebacks	Config	High
J14	`_env_bool` inconsistent: empty string = False vs unset = default	Config	Low
J15	gen2.py and _gen_test.py write to same output — last writer wins	Test	Medium
J16	Shim test bridge lacks step(), decision_engine — zero runtime fidelity	Test	High

K-Series: Observability, Memory, Time, Dead Code, Module Init (Pass 8)

Full detail in TRACE doc under "PASS 8 — OBSERVABILITY, MEMORY, TIME, DEAD CODE, MODULE INIT."

#	Flaw	Layer	Severity
K1	Zero stdout/stderr — system completely silent	All	Critical
K2	No health check, metrics, or monitoring surface	All	Critical
K3	Failed trades produce no notification — error in return value only	Bridge	High
K4	Exception tracebacks not captured — all except:pass swallow silently	All	High
K5	~85+ Python objects per process_intent — 36 TradeSlot copies via FFI	Bridge	Medium
K6	Circular ref cycle Kernel→StateView→SlotView→Kernel delays `__del__`	Bridge	High
K7	MemoryKernelJournal silently drops transitions after 10K rows	Journal	High
K8	RealZincPlane._intent_cache unbounded Python list growth	Zinc	High
K9	_backend_snapshot timeout uses wall clock — NTP truncates/extends	Venue	High
K10	RealZincControlPlane.wait() uses wall clock — no monotonic	Control	Medium
K11	exchange_ts fallback to local time.time() when E missing	Stream	Medium
K12	No monotonic timestamp verification anywhere	All	Medium
K13	ControlPlane.wait()/notify() — zero callers across all impls	Control	Info
K14	AccountProjection.to_account_event() — zero callers	Account	Info
K15	HazelcastProjector entire class dead	Projection	Info
K16	_order_to_payload() dead code	Bridge	Info
K17	MirroredControlPlane entire class dead — never constructed	Control	Info
K18	12 of 20 TradeStage variants never matched in Rust FSM	Rust	Low
K19	Unused imports in projection.py and hazelcast_projection.py	Projection	Info
K20	sys.path mutation on import — global side effect	Config	Medium
K21	load_dotenv() at module import time — mutates os.environ globally	Config	Medium
K22	ControlPlane protocol not exported in `__all__`	Config	Info
K23	`KernelSlotView.__getattr__` makes FFI call per attribute access	Bridge	Medium

L-Series: Contracts, Exchange Events, Network, FFI, Backup Diffs (Pass 9)

Full detail in TRACE doc under "PASS 9 — CONTRACTS, EXCHANGE EVENTS, NETWORK, FFI, BACKUP DIFFS."

#	Flaw	Layer	Severity
L1	`KernelOutcome(accepted=True, diag=INVALID_INTENT)` parseable — no invariant check	Bridge	Medium
L2	`VenueEvent.filled_size > size` possible via different source fields	Venue	Medium
L3	`VenueEvent.price=0` reaches kernel — zero-price fill = 100% loss PnL	Venue	High
L4	`available_margin` set to cross-wallet balance, not available margin	Stream	High
L5	`wallet_balance` defaults to 0 when `"wb"` absent — E-side reconcile always ERROR	Stream	High
L6	`_keepalive_loop` no stop mechanism — runs on old key after rotation	Stream	Medium
L7	`event_id` integer 0 → `str(0)` falsy on `or` → random UUID generated	Stream	Medium
L8	Hardcoded VST URLs in test generators — wrong env if LIVE configured	Test	Medium
L9	No proxy support — can't deploy behind corporate proxy	Network	Low
L10	5-minute DNS cache TTL — stale IPs on infrastructure change	Network	Low
L11	`limit_price` getattr reads dataclass field, not metadata dict	Venue	Low
L12	Backup diff: 14+ critical bugs fixed, 428-line dual-ledger accounting added	Rust	Info
L13	`_build_full_runtime` dead — real pipeline never tested	Test	High
L14	`listenKeyExpired` raises RuntimeError instead of clean yield	Stream	Medium
L15	`_delete_listen_key` suppresses all exceptions — leaked server keys	Stream	Low
L16	`venue_order_id` target selection ambiguous when entry order exists	Rust	Medium

M-Series: Runtime, Test Bugs, FSM Audit, Persistence, Measurement (Pass 10)

Full detail in TRACE doc under "PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT."

#	Flaw	Layer	Severity
M1	ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries	Rust	Critical
M2	CANCEL creates no transition record — invisible in audit log	Rust	Critical
M3	`_mk_intent` drops order_type/limit_price into metadata, not proper field	Test	High
M4	test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test	Test	High
M5	Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested	Test	Medium
M6	test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient	Test	Medium
M7	test_outcome_state_matches_actual_slot is tautological	Test	Low
M8	ORDER_ACK silent fallthrough when no active order — accepted with no effect	Rust	Medium
M9	ORDER_REJECT on POSITION_OPEN with stale entry order destroys position	Rust	Critical
M10	No aggregation of trade count, success/fail, latency — all zero	All	High
M11	Flaw 6 tests pass via metadata passthrough, not field logic	Test	High
M12	No retry/fallback for ClickHouse INSERT failures — crashes policy cycle	Persistence	High
M13	AccountSnapshot.trade_seq never incremented — always 0	Account	Medium
M14	test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd	Test	Low
M15	test_reconcile_rejects_position_open_with_zero_size passes for wrong reason	Test	Low
M16	No built-in metric for active slots, event throughput, or memory	All	Medium
M17	Flaw 9 tests named for cancel but never call cancel	Test	High
M18	_decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime	Runtime	High

N-Series: Async/Sync Seams, Lock Analysis, Threading (Pass 11)

Full detail in TRACE doc under "PASS 11 — ASYNC/SYNC SEAMS, LOCK ANALYSIS, THREADING."

#	Flaw	Layer	Severity
N1	Rust kernel `with_handle_mut` zero sync — `&mut` from raw ptr, UB on concurrent FFI	Rust	Critical
N2	`_run()` has two completely different code paths — runtime branch, not design	Venue	Critical
N3	`_run()` path B blocks event loop thread for every venue HTTP operation	Venue	Critical
N4	`asyncio.run()` called repeatedly — creates/destroys event loops per call	Venue	Critical
N5	`_snapshot_ready` cascading re-fetch — N callers produce N overlapping HTTP	Venue	High
N6	`BingxUserStream.close()` doesn't cancel pending tasks	Stream	Medium
N7	Live test architecture forces worst-case `_run()` path for every operation	Test	Medium
N8	`subscribe()` reconnect creates new tasks per iteration	Stream	Medium
N9	No `asyncio.all_tasks()` or task accounting — leaked tasks undetectable	All	Low
N10	`_snap_lock` no reader-side protection (informational)	Venue	Info

O-Series: Sync/Async Wider Scope (Launcher, Generators, Streams, FFI, Tests) (Pass 12)

Full detail in TRACE doc under "PASS 12 — SYNC/ASYNC WIDER SCOPE."

#	Flaw	Layer	Severity
O1	`_maybe_close()` asyncio.run without loop guard — close skipped from async context	Launcher	High
O2	`async def connect()` shims call sync venue.connect() without await — blocking	Test	Medium
O3	`_contract_rows(client)` NOT awaited — `_pick_live_symbol` iterates coroutine = crash	Test	High
O4	Deprecated `get_event_loop().run_until_complete()` in test file	Test	Medium
O5	`_run()` thread pool .result() no timeout — backend hang freezes process	Venue	High
O6	MockVenueAdapter never exercises thread-pool bridge — untested in CI	Venue	Medium
O7	`_keepalive_loop`/`_rotation_sentinel` fire-and-forget — exceptions silently lost	Stream	Low
O8	`KernelSlotView.__getattr__` N FFI calls for N fields — no caching	Bridge	Medium
O9	`DITAv2LauncherBundle` no `__del__` — GC'd bundle leaks resource tree	Launcher	Medium
O10	`ExecutionKernel` no `close()`: Rust handle only freed by unpredictable `__del__`	Bridge	Medium
O11	`KernelSlotView.__setattr__` triggers 5 persistence side effects — no read-only view	Bridge	Medium
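
O1 describes `_maybe_close()` calling `asyncio.run` with no check for an already-running loop. The following is a minimal sketch of a loop-aware close helper, not the launcher's actual code: `close_resource` and `DemoResource` are hypothetical names, and the only assumption about the resource is that it may expose a sync or async `close()`.

```python
import asyncio
import inspect
from typing import Any, Optional


def close_resource(resource: Any) -> Optional[asyncio.Task]:
    """Call resource.close() whether or not an event loop is already running.

    Returns a Task when called from async code (the caller must await it),
    otherwise None once the close has completed.
    """
    close = getattr(resource, "close", None)
    if close is None:
        return None
    result = close()
    if not inspect.iscoroutine(result):
        return None  # plain synchronous close(): already done
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe here.
        asyncio.run(result)
        return None
    # A loop is running: asyncio.run() would raise, so schedule the close instead.
    return loop.create_task(result)


class DemoResource:
    """Hypothetical stand-in for a venue or stream with an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def close_from_async_context() -> bool:
    resource = DemoResource()
    task = close_resource(resource)
    await task  # the close is only guaranteed once the task is awaited
    return resource.closed


sync_resource = DemoResource()
sync_result = close_resource(sync_resource)
async_closed = asyncio.run(close_from_async_context())
```

Returning the task makes the async case explicit: the caller decides when to await it, instead of the close being skipped.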

H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)

Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."

#	Flaw	Layer	Severity
H1	No Python dependency files (requirements.txt, pyproject.toml, etc.)	Build	Critical
H2	Rust kernel compiled from source on every cold start — no prebuilt binary	Build	Critical
H3	Zero logging — 16+ silent except:pass sites, no error observability	All	Critical
H4	`_row_float` rejects zero as valid, `except Exception: continue` swallows all	Venue	High
H5	`_backend_snapshot` timeout returns stale data/None — callers crash	Venue	High
H6	All enum-from-raw-string sites crash on unknown variant (17 sites)	Bridge	High
H7	`_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET	Venue	High
H8	Unknown venue status silently mapped to ACKED	Venue	High
H9	`RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost	Zinc	High
H10	`RealZincControlPlane.read()` no atomicity with concurrent `update()`	Control	High
H11	`_RustKernelLib` lazy init with race condition — concurrent cargo build	Bridge	High
H12	`ExecutionKernel.__del__` use-after-free on Rust handle	Bridge	High
H13	`MirroredControlPlane` missing protocol methods (wait/notify)	Control	Medium
H14	`TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics	Contracts	Medium
H15	`_maybe_close` asyncio.run RuntimeError silently swallowed	Launcher	Medium
H16	Lazy import of bingx_direct masks config errors until first trade	Build	Info
H17	`load_dotenv()` at module level — import-time I/O side effect	Launcher	Medium
H18	`_run()` blocks event loop on every HTTP call via thread pool	Venue	Medium
H19	`HazelcastClientLike` protocol has zero concrete implementations	Projection	Low
H20	`_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM	Zinc	Low
H21	`wasm-bindgen` compiled into native library unnecessarily	Build	Low
H22	`socket.getaddrinfo` monkey-patch in test code	Test	Low

A-Series: Architectural Flaws (detailed writeups)

These are the original architectural flaws with full analysis.

Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size

Location: _rust_kernel/src/lib.rs lines ~770-780 (EXIT handler in process_intent)

Severity: High

Nature: Logic error — wrong base for exit-size computation.

Downstream effect

The EXIT handler computes the exit size as base_size * exit_ratio where:

let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };

After partial fills (e.g., two separate MARKET exit legs), initial_size is still the original entry size while slot.size has been reduced by previous legs. If the cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the computed exit size can exceed the remaining position.

The venue adapter clamps to the actual position via reduceOnly, but the kernel's own accounting reduces slot.size by the fill size, not by the intended exit size. The slot can therefore reach zero size (or go negative on any unclamped path) on a leg that was never meant to be terminal.

Exact trigger

1. Enter SHORT with size=1.0, initial_size=1.0, ratios=(0.6, 0.6, 1.0). Note that the ratios sum to more than 1.0.
2. EXIT leg 0: exit_size = 1.0 * 0.6 = 0.6. The fill consumes 0.6 and slot size goes to 0.4.
3. EXIT leg 1: exit_size = 1.0 * 0.6 = 0.6, but only 0.4 remains. The kernel still requests 0.6.
4. BingX reduceOnly clamps the fill to 0.4. Slot size goes to 0.0.
5. EXIT leg 2 (ratio 1.0): exit_size = 1.0 * 1.0 = 1.0, but the slot is already at 0.0. The kernel returns NO_OPEN_POSITION: the final EXIT is rejected because slot.closed was not set by the previous fill (it was treated as a partial close, not a terminal one).
6. The slot now has size=0.0, !slot.closed, and no active orders. slot.is_free() is still false: size <= 0.0 holds, but fsm_state is neither IDLE nor CLOSED, so the slot is stuck in POSITION_OPEN with zero size.

This is not purely a mis-sized ratio problem. Even with correct ratios and MARKET orders that fill fully, the slot can end up stuck if the fill price differs from the price used to size the order and the venue adjusts the fill quantity.

Fix strategy

Use slot.size directly as the base (not initial_size):

let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);

This guarantees the exit never requests more than the remaining position, regardless of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.
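
The trigger and the fix can be checked with a small Python model of the sizing arithmetic. This is only a sketch of the Rust logic described above: the function names are hypothetical, and the venue's reduceOnly clamp is modeled as `min(requested, remaining)`.

```python
def exit_size_initial_base(initial_size: float, size: float, ratio: float) -> float:
    """Behaviour described above: the base is initial_size when it is positive."""
    base = initial_size if initial_size > 0.0 else size
    return base * ratio


def exit_size_remaining_base(size: float, ratio: float) -> float:
    """Proposed behaviour: the base is the remaining size, clamped to [0, size]."""
    return min(max(size * ratio, 0.0), size)


def run_legs(ratios, sizer):
    """Replay exit legs; reduceOnly is modeled by clamping each fill to remaining."""
    initial = remaining = 1.0
    requests = []
    for ratio in ratios:
        requested = sizer(initial, remaining, ratio)
        requests.append(requested)
        remaining = max(remaining - min(requested, remaining), 0.0)
    return requests, remaining


ratios = (0.6, 0.6, 1.0)
old_requests, old_remaining = run_legs(ratios, exit_size_initial_base)
new_requests, new_remaining = run_legs(
    ratios, lambda _initial, size, ratio: exit_size_remaining_base(size, ratio)
)
# Under the old rule, leg 1 requests 0.6 while only about 0.4 remains.
overshoot = old_requests[1] - (1.0 - old_requests[0])
```

With the remaining-size base, each ratio is a fraction of what is left, so the request can never exceed the open position. This changes the meaning of configured ratios, which is a design decision outside this sketch.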

Flaw A2: Misleading CANCEL diagnostic code on entry-only slots

Location: _rust_kernel/src/lib.rs lines ~798-810 (CANCEL rejection path)

Severity: Low

Nature: Diagnostic pollution — wrong error code.

Downstream effect

When a CANCEL intent arrives and neither active_exit_order nor active_entry_order is cancellable, the kernel returns:

diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER

But the reason may be that there's no active entry order either, or the FSM state doesn't permit cancellation. The diagnostic name suggests an exit-order-specific problem when the failure is generic "nothing to cancel."

Fix

Change to a generic NO_ACTIVE_ORDER diagnostic or SLOT_IDLE when the slot is already in IDLE. NO_ACTIVE_EXIT_ORDER is misleading for a slot that has never had any order.

Flaw A3: Float-accumulated slot.size after partial fills can go negative

Location: _rust_kernel/src/lib.rs lines ~1365-1370 (apply_fill exit path)

Severity: Low

Nature: Numerical precision edge case.

Code path

slot.size = (slot.size - fill_size).max(0.0);

This clamps to zero, which is correct. But if the venue fills more than requested (on BingX, this can happen with market orders where the fill walks the book), the slot sees fill_size > intended_size. The max(0.0) prevents negative, but the slot then reports size=0.0 with !closed and an FSM state that's not IDLE.

The is_free() check requires size <= 0.0 AND fsm_state in {IDLE, CLOSED}. A slot with size=0.0 and fsm_state=POSITION_OPEN is stuck — no EXIT will be accepted and no ENTER can start.

Trigger

Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the book, overshoots). fill_size=0.8, slot.size = (0.6 - 0.8).max(0.0) = 0.0. Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), closed=false.

Fix

When slot.size <= 1e-12 after a fill and the slot is in an exit-related state, force transition to CLOSED/IDLE regardless of leg index:

if slot.size <= 1e-12 {
    slot.closed = true;
    slot.fsm_state = TradeStage::CLOSED;
    slot.active_exit_order = None;
    slot.active_entry_order = None;
    return;
}
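
The stuck-slot condition and the effect of the fix can be reproduced in a few lines of Python. This is a simplified model, not the kernel code: the `Slot` class, the string FSM states, and the `force_close` switch are assumptions made for illustration, and zeroing the size on close is an addition of this sketch so that `is_free()` holds for sub-epsilon residues.

```python
from dataclasses import dataclass

EPSILON = 1e-12  # same threshold as in the fix above


@dataclass
class Slot:
    size: float
    fsm_state: str = "EXIT_WORKING"
    closed: bool = False

    def is_free(self) -> bool:
        # Mirrors the rule quoted above: zero size AND an IDLE/CLOSED state.
        return self.size <= 0.0 and self.fsm_state in {"IDLE", "CLOSED"}


def apply_exit_fill(slot: Slot, fill_size: float, force_close: bool) -> None:
    """Reduce the slot by a fill; optionally apply the proposed close rule."""
    slot.size = max(slot.size - fill_size, 0.0)
    if force_close and slot.size <= EPSILON:
        slot.size = 0.0  # sketch-only: drop any sub-epsilon residue
        slot.closed = True
        slot.fsm_state = "CLOSED"


# Trigger from above: 0.6 remaining, venue fills 0.8.
stuck = Slot(size=0.6)
apply_exit_fill(stuck, fill_size=0.8, force_close=False)

fixed = Slot(size=0.6)
apply_exit_fill(fixed, fill_size=0.8, force_close=True)
```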

Flaw A4: Entry price is clobbered by mark_price if called before fill arrives

Location: _rust_kernel/src/lib.rs lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)

Severity: Medium

Nature: Accounting accuracy — incorrect PnL base.

Code path

// In mark_price:
if self.entry_price <= 0.0 {
    self.entry_price = price;  // Seeds entry_price from mark before fill
}

// In apply_fill (entry):
if event.price > 0.0 {
    slot.entry_price = event.price;  // Overwrites with actual fill price
}

The mark_price path seeds entry_price from a market price when the slot has no fill yet. The apply_fill entry path correctly overwrites with the actual fill price. So in the normal flow this is harmless — the fill overwrites the mark.

However, consider this sequence:

1. ENTER intent accepted → slot goes to ORDER_REQUESTED, entry_price = 0.0
2. runtime.step() calls kernel.mark_price(snapshot.symbol, snapshot.price) → sets entry_price = 100.0
3. on_venue_event(ORDER_ACK) → ENTRY_WORKING, entry_price still 100.0
4. on_venue_event(PARTIAL_FILL) → apply_fill sets entry_price = 99.5 (fill price)
5. Unrealized PnL computed during steps 2-3 was based on the mark-seeded entry price of 100.0, not the fill price of 99.5

This is a transient mis-valuation window. It corrects itself on the next observe_slots call, but intra-step readers see wrong unrealized PnL. Not critical because:

account.snapshot.unrealized_pnl uses the slot's unrealized_pnl, not the mark
Realized PnL is computed from actual fill prices
The window lasts at most one scan cycle (~5s)

Fix

Don't set entry_price from mark_price when there's no fill:

fn mark_price(&mut self, price: f64) {
    if !price.is_finite() || price <= 0.0 { return; }
    // Don't seed entry_price — leave it at 0.0 until a fill arrives
    if self.entry_price <= 0.0 || self.size <= 0.0 {
        self.unrealized_pnl = 0.0;
        return;
    }
    // ... normal PnL computation
}
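
The intended behaviour of the fixed `mark_price` can be illustrated with a Python model of the same sequence. This is a sketch under assumptions: the long-side PnL formula stands in for the elided "normal PnL computation", and `apply_entry_fill` is a simplified, hypothetical counterpart of the entry branch of `apply_fill`.

```python
import math
from dataclasses import dataclass


@dataclass
class Slot:
    entry_price: float = 0.0
    size: float = 0.0
    unrealized_pnl: float = 0.0

    def mark_price(self, price: float) -> None:
        if not math.isfinite(price) or price <= 0.0:
            return
        # Never seed entry_price from the mark; wait for a real fill.
        if self.entry_price <= 0.0 or self.size <= 0.0:
            self.unrealized_pnl = 0.0
            return
        # Assumed long-side formula, standing in for the elided computation.
        self.unrealized_pnl = (price - self.entry_price) * self.size

    def apply_entry_fill(self, price: float, filled: float) -> None:
        if price > 0.0:
            self.entry_price = price
        self.size += filled


slot = Slot()
slot.mark_price(100.0)                 # before any fill: nothing is seeded
entry_before_fill = slot.entry_price
slot.apply_entry_fill(price=99.5, filled=1.0)
slot.mark_price(100.0)                 # now valued against the fill price
```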

Flaw A5: Capital-before computation is arithmetic not snapshot-based

Location: pink_clickhouse.py lines ~761-762 (_write_trade_exit_leg) and ~822-823 (_write_trade_event)

Severity: High

Nature: Accounting accuracy — wrong capital_before under multi-slot or intervening events.

Code pattern (appears in two places)

capital_after = self._capital()
capital_before = capital_after - pnl_leg   # In _write_trade_exit_leg
capital_before = capital_after - pnl        # In _write_trade_event

This reconstructs capital_before by subtracting the current leg's PnL from the current capital. This is only correct if:

No other slots settled PnL between this leg and the previous one
No capital corrections (reconcile, manual override) happened between legs
No fees were deducted between legs

With multiple slots (PINK allows a configurable max_slots > 1), a concurrent trade on slot 1 that closes between slot 0's exit legs has its PnL baked into capital_after, so capital_before = capital_after - pnl_leg is wrong for slot 0.

Fix

Maintain a per-trade capital_before_leg snapshot taken at the moment of the first fill event for each trade, advancing it by the realized PnL of each leg:

self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
self._leg_state[trade_id]["capital_after"] = capital_after

Then use the stored per-trade capital_before for the row instead of recomputing capital_after - pnl_leg.
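
A minimal sketch of the per-trade snapshot idea, assuming a small helper class rather than the real `_leg_state` structure: `LegCapitalLedger`, `open_trade`, and `record_leg` are hypothetical names, and the numbers are made up to show the cross-slot effect.

```python
from typing import Dict, Tuple


class LegCapitalLedger:
    """Per-trade running capital, isolated from PnL settled by other slots."""

    def __init__(self) -> None:
        self._running: Dict[str, float] = {}

    def open_trade(self, trade_id: str, capital_at_first_fill: float) -> None:
        # Snapshot once, at the first fill of the trade; later calls are no-ops.
        self._running.setdefault(trade_id, capital_at_first_fill)

    def record_leg(self, trade_id: str, pnl_leg: float) -> Tuple[float, float]:
        # Advance the snapshot by this leg's realized PnL only.
        before = self._running[trade_id]
        after = before + pnl_leg
        self._running[trade_id] = after
        return before, after


ledger = LegCapitalLedger()
ledger.open_trade("A", 1000.0)

leg0 = ledger.record_leg("A", 10.0)   # trade A, first exit leg

# Meanwhile a trade on another slot settles +50, so account capital is 1060.
account_capital_after_leg1 = 1060.0 + 5.0
naive_before = account_capital_after_leg1 - 5.0  # arithmetic reconstruction

leg1 = ledger.record_leg("A", 5.0)    # trade A, second exit leg
```

Here the arithmetic reconstruction reports a capital_before that includes the other slot's +50, while the ledger's value advances only by trade A's own legs. Whether the row should carry the account-wide or the per-trade figure is a schema decision this sketch does not settle.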

Flaw A6: Reconcile account reseeds capital from kernel, not exchange

Location: pink_direct.py lines ~597-630 (recover_account) and docstring of reconcile_account

Severity: Medium

Nature: Operational drift — capital is never verified against exchange truth in hot loop.

The gap

reconcile_account() (line 632) has this docstring:

Periodic exchange-led account sync.
Capital is re-seeded from the exchange balance as a guard against long-running drift

But the actual implementation:

async def reconcile_account(self, ...) -> dict[str, Any]:
    return await self.recover_account(...)

async def recover_account(self, ...) -> dict[str, Any]:
    capital = float(self.kernel.account.snapshot.capital or 25000.0)
    _reconcile_position_slot(self.kernel, capital, slot_id=0)

It passes the kernel's own capital to _reconcile_position_slot, which then overwrites kernel.account.snapshot.capital with the same value. No exchange balance poll ever overwrites capital.

connect() at line 224 does the same — it passes initial_capital (an env default), not the exchange balance. The exchange balance is never read for capital seeding in the current code path. _reconcile_position_slot does call venue.open_positions(), but it only reads positions, not capital.

Effect

Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation, or any other systematic error) accumulates monotonically. There is no mechanism to detect or correct drift. Over weeks of live trading, the kernel's capital snapshot can diverge arbitrarily from the exchange's actual balance.

Fix

Either:

Make _reconcile_position_slot read the exchange balance and use it for capital reseeding (the docstring claims it does this already), or
Add a separate capital-verification path that surfaces the delta between kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.
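
The second option could look roughly like the sketch below. `CapitalDrift`, `check_capital_drift`, and the `tolerance` parameter are hypothetical names introduced for illustration; how the exchange balance is fetched is left out because the source does not show that call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapitalDrift:
    kernel_capital: float
    exchange_balance: float
    delta: float          # exchange minus kernel; negative means kernel overstates
    is_anomaly: bool


def check_capital_drift(kernel_capital: float, exchange_balance: float,
                        tolerance: float = 1.0) -> CapitalDrift:
    """Report drift without auto-correcting the kernel's capital."""
    delta = exchange_balance - kernel_capital
    return CapitalDrift(kernel_capital, exchange_balance, delta,
                        abs(delta) > tolerance)


drift = check_capital_drift(25000.0, 24950.0, tolerance=10.0)
```

Surfacing the delta as a value, instead of overwriting capital, keeps the decision to reseed explicit and leaves a record of how fast drift accumulates.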

Flaw A7: No fee tracking in kernel accounting

Location: rust_backend.py lines ~540-545 (on_venue_event settle), bingx_direct.py submit_intent return

Severity: Medium

Nature: Accounting accuracy — fees are invisible to capital tracking.

Downstream effect

When a trade closes, the kernel computes:

realized_pnl = delta * notional

This is gross PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%). These fees are never subtracted from the kernel's realized PnL. Over 100 trades with $100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000 trades at 10x leverage and $50k average notional, the error is $200k.

The BingxDirectExecutionAdapter does return ExecutionReceipt with fill data, but bingx_venue._events_from_submit() only reads price and filled_size — commission/fee fields are ignored.
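
The scale of the error can be checked with a short calculation. This sketch assumes one fee charge per trade at the taker rate; `cumulative_fee_error` is a hypothetical helper, and if fees are charged on both the entry and exit fill the figures double.

```python
def cumulative_fee_error(trades: int, avg_notional: float, fee_rate: float) -> float:
    """Fees the kernel never subtracts, assuming one fee charge per trade."""
    return trades * avg_notional * fee_rate


# 100 trades, $100 notional, 0.04% taker fee: about $4.
small = cumulative_fee_error(100, 100.0, 0.0004)
# 10,000 trades, $50k notional, 0.04% taker fee: about $200k.
large = cumulative_fee_error(10_000, 50_000.0, 0.0004)
```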

Fix

Read fee/commission from the BingX ack payload in _events_from_submit()
Pass fees through VenueEvent.metadata["fee"]

In the Rust kernel's apply_fill, subtract the fee from realized PnL:

let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
slot.realized_pnl += realized - fee;

Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input

Location: _rust_kernel/src/lib.rs lines ~745-748

Severity: Low

Nature: Silent fallback — corrupt input produces a trade, not a rejection.

slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
    intent.leverage
} else {
    1.0
};

A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of rejecting the intent. The Python bridge does validate intents through _first_invalid_intent_field(), which catches NaN/inf, but it doesn't catch leverage <= 0.0 (it only checks not math.isfinite(value)).

Fix

Add leverage <= 0.0 to the Python bridge's invalid-intent check. The Rust kernel should still have the 1.0 fallback as a defensive measure, but the bridge should prevent bad leverages from reaching Rust in the first place.
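
A minimal sketch of the extended check, assuming a standalone hypothetical function `first_invalid_leverage`; the real `_first_invalid_intent_field()` is not shown in this document, so its signature and return type are not reproduced here.

```python
import math
from typing import Optional


def first_invalid_leverage(value: float) -> Optional[str]:
    """Return a reason string if leverage is unusable, else None."""
    if not math.isfinite(value):
        # Covers NaN, +inf and -inf (the check the bridge already has).
        return "leverage is not finite"
    if value <= 0.0:
        # The missing check: zero and negative leverage.
        return "leverage must be > 0"
    return None
```

With this in place, zero and negative leverage are rejected at the bridge, and the Rust 1.0 fallback only fires if something bypasses the bridge.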

Flaw A9: Mock venue submit condition convoluted — dead code paths

Location: mock_venue.py lines ~60-90

Severity: Informational

Nature: Code clarity — confusing condition logic.

if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
    events.append(ack_event)
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
    # ... fill events

The condition logic is confusing:

When emit_ack_before_fill=True and emit_fill_on_submit=True: both branches run → ACK + fill
When emit_ack_before_fill=False and emit_fill_on_submit=True: the first branch is skipped because not True = False, so False or False = False → no ACK. The second branch runs → fill only. This produces a fill without an ACK, which is not a realistic venue scenario.
When partial_fill_ratio=1.0 (default): the second branch runs and emits a FULL_FILL event even when emit_fill_on_submit=False, because False or (1.0 > 0) = True.

The partial fill ratio check should be gated on emit_fill_on_submit, so that a non-zero ratio alone never produces a fill:

should_emit_fill = self.scenario.emit_fill_on_submit and (
    self.scenario.entry_partial_fill_ratio > 0
    if is_entry
    else self.scenario.exit_partial_fill_ratio > 0
)
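
The behaviour of the two original conditions can be enumerated directly. This sketch mirrors only the boolean expressions quoted from mock_venue.py; `original_branches` is a hypothetical standalone function, not the adapter itself.

```python
from itertools import product


def original_branches(emit_ack_before_fill: bool, emit_fill_on_submit: bool,
                      partial_fill_ratio: float) -> tuple:
    """Mirror the two quoted conditions; returns (ack_emitted, fill_emitted)."""
    ack = emit_ack_before_fill or not emit_fill_on_submit
    fill = emit_fill_on_submit or partial_fill_ratio > 0
    return ack, fill


# Print the full truth table for the default partial_fill_ratio of 1.0.
for ack_first, fill_on_submit in product((True, False), repeat=2):
    ack, fill = original_branches(ack_first, fill_on_submit, partial_fill_ratio=1.0)
    print(f"ack_before_fill={ack_first!s:5} fill_on_submit={fill_on_submit!s:5} "
          f"-> ack={ack} fill={fill}")
```

With the default ratio, every row of the table has a fill emitted, which is why emit_fill_on_submit=False has no effect in the original code.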

Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows

Location: pink_direct.py lines ~318-374 (pump_venue_events), called at line ~436

Severity: Medium

Nature: Operational overhead — unnecessary exchange HTTP calls.

The problem

step() calls pump_venue_events() every cycle, which calls venue.reconcile(). For BingxVenueAdapter, reconcile() calls _backend_snapshot() which does up to 5 HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow where orders fill synchronously within process_intent(), there are no late fills to drain.

On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each pump_venue_events() call consumes up to 5 requests from that budget. At a 5-second policy cycle, this is about 60 requests/minute, roughly 10% of the sustained rate budget (~600 requests/minute) and half of the per-second allowance in a single burst, just to poll for fills that don't exist.

Fix

Gate the pump on whether the previous cycle submitted a LIMIT order:

self._has_resting_order = any(
    o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
    for o in self.kernel.open_orders()
)
if self._has_resting_order:
    await self.pump_venue_events(snapshot, market_state=market_state)

Or add a config flag async_fill_mode: bool = False.

Flaw A11: VenueAdapter.submit() blocks the event loop

Location: bingx_venue.py lines ~225-233 (_run)

Severity: Medium

Nature: Runtime safety — synchronous call in async context.

def _run(self, result: Any) -> Any:
    if inspect.isawaitable(result):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(result)
        pool = self._get_executor()
        return pool.submit(asyncio.run, result).result()

When called from step() (which is an async function), _run submits the async submit_intent() to a thread pool, runs it with asyncio.run(), then calls .result() which blocks the current thread until complete. The BingX HTTP call can take 1-5 seconds depending on network latency and exchange load.

During this block, the event loop cannot process other async tasks (data feed updates, health checks, signal processing). In a single-runtime deployment, this stalls the entire policy cycle.

Fix

Make process_intent in ExecutionKernel accept an async venue callback, or make BingxVenueAdapter truly async (not sync-with-thread-bridge). For now, at minimum the PINK runtime should run step() in an executor to avoid blocking the main event loop.
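
The interim mitigation can be illustrated with `asyncio.to_thread` (Python 3.9+). This is a self-contained sketch, not PINK code: `blocking_submit` and `heartbeat` are hypothetical stand-ins for a synchronous venue call and for the other tasks that must keep running.

```python
import asyncio
import time


def blocking_submit(order_id: str) -> str:
    """Stand-in for a synchronous venue call (hypothetical)."""
    time.sleep(0.2)
    return f"ack:{order_id}"


async def heartbeat(ticks: list) -> None:
    """Stand-in for feed updates or health checks on the same loop."""
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main() -> tuple:
    ticks: list = []
    # asyncio.to_thread runs the blocking call in a worker thread and
    # awaits it, so the heartbeat task keeps running in the meantime.
    ack, _ = await asyncio.gather(
        asyncio.to_thread(blocking_submit, "o1"),
        heartbeat(ticks),
    )
    return ack, len(ticks)


result = asyncio.run(main())
```

The difference from the `_run` bridge quoted above is that the caller awaits the worker thread instead of calling `.result()` on it, so the loop is free while the HTTP call is in flight.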

Flaw A12: Stale KernelStateView slot references after reconcile

Location: rust_backend.py lines ~350-365 (KernelStateView.refresh)

Severity: Low

Nature: Stale data — view not rebuilt on reconcile.

class KernelStateView:
    def __init__(self, kernel):
        self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
        # ...

    def refresh(self) -> None:
        snapshot = self._kernel._snapshot_backend()
        self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
        self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
        self.client_order_index = dict(snapshot.get("client_order_index", {}))

refresh() updates the index maps but does not recreate self.slots. The slot views in self.slots are live proxies (they read through _get_slot each time), so slot data is current. But if max_slots changes (it shouldn't, but it's mutable) or if slots are re-indexed by a reconcile, the view list is wrong.

Not critical because max_slots is set at init and never changes, but worth fixing for robustness.

Flaw A13: `persist_fill_events` uses current price as exit price

Location: pink_clickhouse.py line ~408

Severity: Low

Nature: Historical accuracy — logged price may not match fill price.

price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
              if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)

This correctly reads from the event's price. But decision.reference_price at line 417 falls back to this price, which is the fill price. The trade_event row at line 835 uses exit_price = slot_dict.get("entry_price", ...) — which is the entry price, not the exit price. The trade_event always shows exit_price == entry_price.

This means trade_events in ClickHouse will never show a realistic exit price for the persisted trade, breaking any PnL reconstruction that relies on (exit_price - entry_price) * size * leverage.

Flaw A14: `_write_position_state` maps active_leg_index to entry_bar

Location: pink_clickhouse.py line ~673

Severity: Low

Nature: Semantic mismatch — wrong field mapping.

"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),

active_leg_index is the index into the exit-leg-ratios array (which leg is being exited next). It has nothing to do with how many bars the position has been held. When a position opens, active_leg_index is 0. After the first exit leg, it advances to 1. Neither value is a bar count.

entry_bar should be bars_held from the intent/decision, or a computed value from entry_time to now.
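
The computed variant might look like the sketch below. `bars_held` as a function and the `bar_seconds` parameter are assumptions for illustration; the document does not state the bar duration PINK uses.

```python
from datetime import datetime, timezone


def bars_held(entry_time: datetime, now: datetime, bar_seconds: int) -> int:
    """Whole bars elapsed since entry; never negative."""
    elapsed = (now - entry_time).total_seconds()
    return max(0, int(elapsed // bar_seconds))


entry = datetime(2026, 5, 31, 12, 0, tzinfo=timezone.utc)
now = datetime(2026, 5, 31, 12, 17, tzinfo=timezone.utc)
held = bars_held(entry, now, bar_seconds=300)  # 17 minutes of 5-minute bars
```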

Flaw A15: `persist_recovery_state` passes account dict as slot dict

Location: pink_clickhouse.py lines ~447-460

Severity: Low

Nature: Wrong data — account snapshot used where slot data is expected.

def persist_recovery_state(self, *, snapshot, acc_dict, ...):
    slot_dict = acc_dict or {}   # ← acc_dict is an account snapshot, not a slot
    self._write_position_state(..., slot_dict={}, ...)  # ← correctly uses empty dict
    self._write_trade_reconstruction(
        snapshot,
        trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
        # acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
    )

The trade_id in the trade_reconstruction row will always be "" because acc_dict comes from kernel.snapshot()["account"] which has keys capital, equity, realized_pnl, etc. — not trade_id. This means the recovery trade_reconstruction row has no trade_id linkage.
