Files
siloqy/PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md
Codex 94078ee8fe PINK: E2E trace analysis — Pass 18 rust test gaps/accounting/FFI types (U1-U14)
Eighteenth pass: R2 compares cumulative vs last-fill realized PnL broken after
2nd fill (U3 Critical), R4 compares open_notional vs used_margin fundamentally
different quantities (U4 Critical), on_venue_event/apply_fill no NaN guards
price/size propagates NaN (U6 Critical), order_type/limit_price sent to Rust
no fields silently dropped (U1 High), VenueEventStatus expects
"CANCEL_REJECTED" typo fails deserialization (U2 High), R3 skipped when
len(e.positions)==0 silent false negative (U5 High), zero Rust tests for
ORDER_REJECT/PARTIAL_FILL/TERMINAL_STATE guard (U7 High), safe_float returns
NaN/Inf contradicts _safe (U8 Medium), _scan_slots uses metadata leverage not
slot.leverage (U9 Medium). 333 total flaws across 18 passes.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-02 14:47:36 +02:00

55 KiB

PINK DITAv2 — Structural Flaw Analysis (CENTRAL)

Analysis date: 2026-05-31 Scope: Full PINK pipeline — all flaws across all modules. Sources:

  • This file (A-series): Detailed writeups for architectural flaws.
  • PINK_DITAv2_E2E_TRACE_ANALYSIS.md (E, F, G-series): Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans. Every E, F, G entry below is a summary only — full detail is in the TRACE doc.

Combined Catalog (All Flaws, All Passes)

Pass Focus Count Critical High Medium Low Info
A Architectural (detailed in this file) 15 0 2 0 2 11
T Threading/Atomicity 9 1 3 3 2 0
E E2E Trace (Pass 1) 26 0 4 10 11 1
F Deep E2E (Pass 3) 30 0 1 8 17 4
G Domain Scans (Pass 4) 36 4 11 11 8 2
H Edge Domains (Pass 5) 22 3 9 5 4 1
I Pass 6 (Math/Tests/Recovery/Security) 22 3 11 4 2 2
J Pass 7 (Test Infra/Data/Rust/Env/Conn) 16 0 7 7 2 0
K Pass 8 (Observability/Memory/Time/DeadCode) 23 2 7 7 1 6
L Pass 9 (Contracts/Events/Network/FFI/Diffs) 16 0 4 8 4 0
M Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) 18 3 7 5 3 0
N Pass 11 (Async/Sync Seams/Locks/Threading) 10 4 1 3 1 1
O Pass 12 (Sync/Async Wider Scope) 11 0 3 7 1 0
P Pass 13 (FFI Safety/Dangling Pointers/Coverage) 9 1 3 3 1 1
Q Pass 14 (Serde Edges/Backup Diffs/Market Data) 12 0 4 3 2 3
R Pass 15 (Resource Leaks/Trust Boundaries/Security) 14 2 6 3 2 1
S Pass 16 (Error Handling/Arithmetic/Test Infra) 16 4 7 5 0 0
T Pass 17 (Unsafe Review/Dead Code/Build/Protocols) 14 0 5 5 4 0
U Pass 18 (Rust Test Gaps/Accounting/FFI Types) 14 3 4 4 3 0
Total 333 30 99 96 64 37

T-Series: Threading & Atomicity Flaws

Full detail in TRACE doc under "Threading & Atomicity" section.

# Flaw Layer Severity
T1 InMemoryZincPlane thread-Condition deadlock from slot update re-entrancy Zinc Critical
T2 Thread-unsafe kernel snapshot capture for account Bridge High
T3 Re-entrant or incorrectly-scoped Rust-kernel handle usage Bridge High
T4 Consequence: on_venue_event PnL settle races Bridge High
T5 Access to shared _state_seq / _slot_cache in RealZincPlane from multiple kernel calls Zinc Medium
T6 _write_region buffer zero + notify race with concurrent reader Zinc Medium
T7 Publication of events in process_intent loop not synchronized with persist Bridge Medium
T8 asyncio.run executor skip in _run leads to event-loop stall Venue Low
T9 No thread-safe Python↔Rust ownership / lifetime protocol Bridge Low

E-Series: E2E Data-Flow Flaws (Pass 1)

Full detail in TRACE doc under "Layer 1" through "Layer 9."

# Flaw Layer Severity
E1 step() calls pump_venue_events() every cycle unconditionally Runtime High
E2 kernel.snapshot()["account"] returns a fresh dict, not a live view Bridge Low
E3 _decision_to_kernel_intent drops order_type and limit_price Runtime High
E4 _exit_intent_from_slot trusts slot.size but slot may be stale Runtime High
E5 JSON serialization round-trip loses numeric precision Bridge Low
E6 _RustKernelLib is a global singleton — shared across all kernels Bridge Low
E7 ENTER handler silently allows re-entry with same trade_id Rust High
E8 EXIT handler uses initial_size not current size Rust High
E9 CANCEL handler returns diagnostic even when nothing happened Rust Low
E10 apply_fill entry branch double-sets active_entry_order Rust Low
E11 _legacy_intent() is a lossy conversion Venue Low
E12 _events_from_submit() price fallback chain can lose venue price Venue Low
E13 _backend_snapshot() timeout returns stale data Venue Medium
E14 _events_from_cancel uses stale slot_id from order metadata Venue Low
E15 Submit sets leverage via separate HTTP call Adapter Medium
E16 _format_quantity/_format_price may use zero tick/step Adapter Medium
E17 Cancel uses truth-based confirmation — can mask real errors Adapter Medium
E18 on_venue_event settles PnL incrementally — fees never included Bridge Medium
E19 observe_slots called with ALL slots, not just changed ones Bridge Low
E20 _capital() reads live from AccountProjection — stale row risk Persistence Low
E21 persist_fill_events() synthesizes fake Decision/Intent Persistence Medium
E22 _write_trade_exit_leg capital_before uses arithmetic reconstruction Persistence Medium
E23 _write_trade_event uses entry_price as exit_price Persistence Medium
E24 Mock venue always emits fill on partial_fill_ratio > 0 Test Low
E25 Test scenarios use MARKET-only _si() helper — no LIMIT tests Test Low
E26 Fresh-kernel reconcile tests create second kernel but share venue Test Low

F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)

Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."

# Flaw Layer Severity
F1 CANCEL returns "accepted" before cancel happens — stale diagnostic_code Bridge Medium
F2 _last_settled_pnl reset before venue.submit() — transient window Bridge Medium
F3 _first_invalid_intent_field allows leverage=0 and target_size=0 Bridge Low
F4 outcome.emitted_events only from venue — Rust kernel events dropped Bridge Low
F5 on_venue_event redundant FFI read of slot already returned by Rust Bridge Low
F6 process_intent records pre-venue transitions with event=None Bridge Info
F7 reconcile_from_slots writes ALL slots to projection/zinc Bridge Low
F8 HazelcastRowWriter.put() synchronous, no error handling — crashes intent Projection Medium
F9 RealZincPlane.write_slot() serializes ALL slots, not just changed one Zinc Low
F10 RealZincPlane zeros buffer before write — concurrent read sees empty Zinc Low
F11 RealZincPlane._write_region no partial-write recovery Zinc Low
F12 InMemoryZincPlane intent_region grows without bound Zinc Low
F13 InMemoryZincPlane uses non-re-entrant threading.Condition Zinc Low
F14 KernelSlotView.__setattr__ round-trips unknown fields — silently dropped Bridge Low
F15 on_venue_event loop stops on first exception — slot left in partial state Bridge High
F16 venue.submit() returning empty events leaves slot in ORDER_REQUESTED Bridge Medium
F17 Cancel truth-based confirmation returns REJECTED for already-cancelled orders Adapter Medium
F18 Leverage-set and order-submit failures share error handler Adapter Low
F19 _events_from_submit stale snapshot fallback → wrong fill detection Venue Medium
F20 __del__ frees Rust handle at unpredictable GC time — no explicit close() Bridge Medium
F21 DITAv2LauncherBundle.close() closes venue before kernel is done Launcher Low
F22 Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware Launcher Medium
F23 VenueEvent.size = intent.target_size not actual fill Venue Info
F24 asyncio.run() inside async function in test generator Test Low
F25 _build_fresh_kernel_from_slot leaks old kernel objects per call Test Low
F26 seen_event_ids not cleared on re-entry — accumulates across trades Rust Low
F27 RealZincControlPlane.read() parses Zinc region every call — no caching Control Low
F28 _legacy_intent hardcodes confidence=1.0, bars_held=0 Venue Info
F29 _slot_to_payload in real_zinc_plane.py is dead code Zinc Info
F30 Duplicate _slot_from_payload in real_zinc_plane.py and rust_backend.py Zinc Low

G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)

Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."

# Flaw Layer Severity
G1 EXIT_RESIDUAL action missing from Rust KernelCommandType enum Rust Critical
G2 into_c_string unwrap() panics on NUL byte in FFI string Rust Critical
G3 EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition Rust Critical
G4 consume_exit_leg stale all_legs_done variable — wrong branch after last leg Rust Critical
G5 realized_pnl unbounded f64 — overflows to inf at extreme values Rust High
G6 mark_price produces unbounded unrealized_pnl — no result guard Rust High
G7 ENTER no is_finite() guard on target_size Rust High
G8 reconcile_slots_json no dedup or bounds validation Rust High
G9 exchange_order_id update targets wrong order — exit cancel broken Rust High
G10 CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER Rust High
G11 apply_fill overwrites intended_size with slot.size Rust Medium
G12 No max leverage cap enforced by kernel Rust Medium
G13 resolve_slot fallback returns unwrap_or(0) — misroutes events Rust Medium
G14 commit_slot silently ignores out-of-bounds slot_id Rust Medium
G15 Zero __post_init__ validators on all 16 config dataclasses (127 fields) Config High
G16 DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset Config Info
G17 String config fields — Zinc region injection risk Config Medium
G18 exit_leg_ratios no sum-to-1 validation Config Low
G19 RealZincControlPlane.read() no sequence check — torn-read risk Config Low
G20 ClickHouse journal strategy/db env vars — SQL injection risk Config Low
G21 entry_price used as exit_price in trade_events — data loss Persistence High
G22 active_leg_index → entry_bar semantic mis-mapping Persistence Medium
G23 capital_before arithmetic absorbs cross-slot PnL Persistence Medium
G24 Recovery trade_reconstruction always has trade_id="" Persistence Medium
G25 seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables Persistence Low
G26 _safe_float silently converts NaN/None/Inf to 0.0 Persistence Low
G27 build_launcher_bundle no exception safety — prior resources leak Lifecycle High
G28 RealZincPlane/RealZincControlPlane no del — SHM orphaned Lifecycle High
G29 Zero signal handlers — no cleanup on SIGTERM/SIGINT Lifecycle High
G30 ExecutionKernel has no close() — relies on del for Rust handle Lifecycle High
G31 Hazelcast projection never closed Lifecycle Medium
G32 _maybe_close() break skips second method Lifecycle Low
G33 close() not idempotent for RealZinc components Lifecycle Low
G34 No context manager on DITAv2LauncherBundle Lifecycle Low
G35 BingxVenueAdapter.connect() never called Lifecycle Info
G36 Only one try/finally in entire codebase Lifecycle High

I-Series: Math, Tests, Concurrency, Recovery, Security (Pass 6)

Full detail in TRACE doc under "PASS 6 — MATH, TESTS, CONCURRENCY, RECOVERY, SECURITY."

# Flaw Layer Severity
I1 Entry apply_fill multiple partial fills overwrite size instead of accumulating Rust Critical
I2 Zero exit_ratio creates zero-size exit order — slot stuck in EXIT_REQUESTED Rust Medium
I3 entry_price inconsistency — Python falsy vs Rust <= 0.0 gate Bridge Info
I4 Only 1 Rust unit test for 1765-line kernel — 99% untested at Rust layer Rust High
I5 MockVenueScenario rejection flags exist but zero tests use them Test High
I6 No LIMIT order test through full kernel path Test High
I7 Three weak/vacuous assertions in test_flaws.py Test Low
I8 Entry overfill no guard Rust Low
I9 No crash durability — slot state pure in-memory until step 7 of process_intent Bridge Critical
I10 seen_event_ids lost on restart — events double-processed Rust Critical
I11 No idempotency key sent to BingX — lost response creates duplicate orders Venue High
I12 No graceful degradation for ANY subsystem All High
I13 Stray venue event can reactivate CLOSED slot — no guard Rust High
I14 No reconcile_from_slots call on startup — Zinc state never loaded into kernel Restart High
I15 CANCEL_REJECT doesn't clear active_exit_order — slot stuck in EXIT_WORKING Rust Medium
I16 Zinc shared memory world-readable/writable by same-machine processes Zinc High
I17 KernelSlotView unrestricted getattr/setattr — bypasses all FSM guards Bridge High
I18 sys.path.insert(0) at import time in 3 production files — malicious module loading Build High
I19 pump_venue_events stale snapshot diff produces phantom position events Venue High
I20 exit_leg_ratios empty list — next_exit_ratio defaults to 1.0 (undocumented) Contracts Info
I21 RATE_LIMITED code path in both Python and Rust is completely untested All Medium
I22 Thread pool max_workers=3 shared across all adapter instances — never shut down Venue Medium

J-Series: Test Infra, Data Feed, Rust Deeper, Env Parsing, Connections (Pass 7)

Full detail in TRACE doc under "PASS 7 — TEST INFRA, DATA FEED, RUST DEEPER, ENV PARSING, CONNECTIONS."

# Flaw Layer Severity
J1 _flatten submits wrong direction for LONG positions Test Medium
J2 _check_slot_accounting double-counts unrealized PnL Test Medium
J3 _build_live_snapshot timestamp is float vs datetime — type crash risk Data Feed High
J4 ExecutionKernel.mark_price() never called — no mark-to-market Bridge High
J5 All VenueEvent timestamps use local clock, not exchange timestamp Venue Medium
J6 No monotonic timestamp verification anywhere All Low
J7 rebuild_indexes() overwrites duplicate trade_id — last wins Rust High
J8 resolve_slot() falls back to slot 0 — stray event corrupts slot 0 Rust High
J9 get_slot_json/snapshot_json return null with no diagnostic Rust Medium
J10 Two processes with same DITA_V2_PREFIX corrupt shared Zinc memory Zinc High
J11 load_dotenv() only runs on launcher.py import — ordering dependency Config Medium
J12 BINGX_API_KEY passed None with no validation — fails at HTTP time Config Medium
J13 API credentials never masked in error messages or tracebacks Config High
J14 _env_bool inconsistent: empty string = False vs unset = default Config Low
J15 gen2.py and _gen_test.py write to same output — last writer wins Test Medium
J16 Shim test bridge lacks step(), decision_engine — zero runtime fidelity Test High

K-Series: Observability, Memory, Time, Dead Code, Module Init (Pass 8)

Full detail in TRACE doc under "PASS 8 — OBSERVABILITY, MEMORY, TIME, DEAD CODE, MODULE INIT."

# Flaw Layer Severity
K1 Zero stdout/stderr — system completely silent All Critical
K2 No health check, metrics, or monitoring surface All Critical
K3 Failed trades produce no notification — error in return value only Bridge High
K4 Exception tracebacks not captured — all except:pass swallow silently All High
K5 ~85+ Python objects per process_intent — 36 TradeSlot copies via FFI Bridge Medium
K6 Circular ref cycle Kernel→StateView→SlotView→Kernel — delays del Bridge High
K7 MemoryKernelJournal silently drops transitions after 10K rows Journal High
K8 RealZincPlane._intent_cache unbounded Python list growth Zinc High
K9 _backend_snapshot timeout uses wall clock — NTP truncates/extends Venue High
K10 RealZincControlPlane.wait() uses wall clock — no monotonic Control Medium
K11 exchange_ts fallback to local time.time() when E missing Stream Medium
K12 No monotonic timestamp verification anywhere All Medium
K13 ControlPlane.wait()/notify() — zero callers across all impls Control Info
K14 AccountProjection.to_account_event() — zero callers Account Info
K15 HazelcastProjector entire class dead Projection Info
K16 _order_to_payload() dead code Bridge Info
K17 MirroredControlPlane entire class dead — never constructed Control Info
K18 12 of 20 TradeStage variants never matched in Rust FSM Rust Low
K19 Unused imports in projection.py and hazelcast_projection.py Projection Info
K20 sys.path mutation on import — global side effect Config Medium
K21 load_dotenv() at module import time — mutates os.environ globally Config Medium
K22 ControlPlane protocol not exported in all Config Info
K23 KernelSlotView.getattr makes FFI call per attribute access Bridge Medium

L-Series: Contracts, Exchange Events, Network, FFI, Backup Diffs (Pass 9)

Full detail in TRACE doc under "PASS 9 — CONTRACTS, EXCHANGE EVENTS, NETWORK, FFI, BACKUP DIFFS."

# Flaw Layer Severity
L1 KernelOutcome(accepted=True, diag=INVALID_INTENT) parseable — no invariant check Bridge Medium
L2 VenueEvent.filled_size > size possible via different source fields Venue Medium
L3 VenueEvent.price=0 reaches kernel — zero-price fill = 100% loss PnL Venue High
L4 available_margin set to cross-wallet balance, not available margin Stream High
L5 wallet_balance defaults to 0 when "wb" absent — E-side reconcile always ERROR Stream High
L6 _keepalive_loop no stop mechanism — runs on old key after rotation Stream Medium
L7 event_id integer 0 → str(0) falsy on or → random UUID generated Stream Medium
L8 Hardcoded VST URLs in test generators — wrong env if LIVE configured Test Medium
L9 No proxy support — can't deploy behind corporate proxy Network Low
L10 5-minute DNS cache TTL — stale IPs on infrastructure change Network Low
L11 limit_price getattr reads dataclass field, not metadata dict Venue Low
L12 Backup diff: 14+ critical bugs fixed, 428-line dual-ledger accounting added Rust Info
L13 _build_full_runtime dead — real pipeline never tested Test High
L14 listenKeyExpired raises RuntimeError instead of clean yield Stream Medium
L15 _delete_listen_key suppresses all exceptions — leaked server keys Stream Low
L16 venue_order_id target selection ambiguous when entry order exists Rust Medium

M-Series: Runtime, Test Bugs, FSM Audit, Persistence, Measurement (Pass 10)

Full detail in TRACE doc under "PASS 10 — RUNTIME, TEST BUGS, FSM AUDIT, PERSISTENCE, MEASUREMENT."

# Flaw Layer Severity
M1 ENTER transition hardcodes prev_state=IDLE — audit trail lies for re-entries Rust Critical
M2 CANCEL creates no transition record — invisible in audit log Rust Critical
M3 _mk_intent drops order_type/limit_price into metadata, not proper field Test High
M4 test_cancel_entry_with_partial_fill never sends CANCEL — misnamed vacuous test Test High
M5 Flaw 7 tests never send EXIT — exit_partial_fill_ratio untested Test Medium
M6 test_dedup tests use wrong constant (actual=256, claim 64) — 70 events insufficient Test Medium
M7 test_outcome_state_matches_actual_slot is tautological Test Low
M8 ORDER_ACK silent fallthrough when no active order — accepted with no effect Rust Medium
M9 ORDER_REJECT on POSITION_OPEN with stale entry order destroys position Rust Critical
M10 No aggregation of trade count, success/fail, latency — all zero All High
M11 Flaw 6 tests pass via metadata passthrough, not field logic Test High
M12 No retry/fallback for ClickHouse INSERT failures — crashes policy cycle Persistence High
M13 AccountSnapshot.trade_seq never incremented — always 0 Account Medium
M14 test_reentry_after_full_close_no_pnl_loss uses 50% bound — absurd Test Low
M15 test_reconcile_rejects_position_open_with_zero_size passes for wrong reason Test Low
M16 No built-in metric for active slots, event throughput, or memory All Medium
M17 Flaw 9 tests named for cancel but never call cancel Test High
M18 _decision_to_kernel_intent drops order_type and limit_price — LIMIT dead from runtime Runtime High

N-Series: Async/Sync Seams, Lock Analysis, Threading (Pass 11)

Full detail in TRACE doc under "PASS 11 — ASYNC/SYNC SEAMS, LOCK ANALYSIS, THREADING."

# Flaw Layer Severity
N1 Rust kernel with_handle_mut zero sync — &mut from raw ptr, UB on concurrent FFI Rust Critical
N2 _run() has two completely different code paths — runtime branch, not design Venue Critical
N3 _run() path B blocks event loop thread for every venue HTTP operation Venue Critical
N4 asyncio.run() called repeatedly — creates/destroys event loops per call Venue Critical
N5 _snapshot_ready cascading re-fetch — N callers produce N overlapping HTTP Venue High
N6 BingxUserStream.close() doesn't cancel pending tasks Stream Medium
N7 Live test architecture forces worst-case _run() path for every operation Test Medium
N8 subscribe() reconnect creates new tasks per iteration Stream Medium
N9 No asyncio.all_tasks() or task accounting — leaked tasks undetectable All Low
N10 _snap_lock no reader-side protection (informational) Venue Info

O-Series: Sync/Async Wider Scope (Launcher, Generators, Streams, FFI, Tests) (Pass 12)

Full detail in TRACE doc under "PASS 12 — SYNC/ASYNC WIDER SCOPE."

# Flaw Layer Severity
O1 _maybe_close() asyncio.run without loop guard — close skipped from async context Launcher High
O2 async def connect() shims call sync venue.connect() without await — blocking Test Medium
O3 _contract_rows(client) NOT awaited — _pick_live_symbol iterates coroutine = crash Test High
O4 Deprecated get_event_loop().run_until_complete() in test file Test Medium
O5 _run() thread pool .result() no timeout — backend hang freezes process Venue High
O6 MockVenueAdapter never exercises thread-pool bridge — untested in CI Venue Medium
O7 _keepalive_loop/_rotation_sentinel fire-and-forget — exceptions silently lost Stream Low
O8 KernelSlotView.__getattr__ N FFI calls for N fields — no caching Bridge Medium
O9 DITAv2LauncherBundle no __del__ — GC'd bundle leaks resource tree Launcher Medium
O10 ExecutionKernel no close() — Rust handle only freed by unpredictable del Bridge Medium
O11 KernelSlotView.__setattr__ triggers 5 persistence side effects — no read-only view Bridge Medium

P-Series: FFI Safety, Dangling Pointers, Coverage Gaps (Pass 13)

Full detail in TRACE doc under "PASS 13 — FFI BOUNDARY SAFETY, DANGLING POINTERS, COVERAGE GAPS."

# Flaw Layer Severity
P1 dita_kernel_destroy double-free UB — Python doesn't null handle.value Bridge Critical
P2 CStr::from_ptr(payload) without null guard in 3 FFI exports Rust High
P3 _check_open_orders calls asyncio.run() from async _verify — RuntimeError Test High
P4 into_c_string NUL sanitizer produces invalid JSON — json.loads fails Rust Medium
P5 reconcile_slots_json/snapshot_json return null on failure — no diagnostic Rust Medium
P6 _get_rust() TOCTOU race — concurrent cargo build corruption Bridge High
P7 KernelHandle no Send/Sync — FFI bypasses Rust ownership rules Rust Info
P8 No explicit Rust handle destroy path from bundle.close() Launcher Medium
P9 __del__ accesses module _RUST during shutdown — NameError leak Bridge Low

Q-Series: Serde Edges, Backup Diffs, Market Data/Timestamps (Pass 14)

Full detail in TRACE doc under "PASS 14 — SERDE EDGE CASES, BACKUP DIFFS, MARKET DATA/TIMESTAMPS."

# Flaw Layer Severity
Q1 fromisoformat() can't parse Rust Z suffix on Python < 3.11 Bridge High
Q2 No #[serde(deny_unknown_fields)] — misspelled fields silently default Rust Medium
Q3 indexmap new dependency in current code Rust Info
Q4 Backup diff: 6 critical bug fixes between backup and current Rust Info
Q5 MarketSnapshot.timestamp type inconsistent — float vs datetime in same file Data Feed High
Q6 fromisoformat() Z-suffix fail on all 5 timestamp deserialization sites Bridge High
Q7 No upper-bound price validation — 1e300 passes all guards Bridge Medium
Q8 _first_invalid_intent_field does not reject zero/negative price or zero size Bridge Low
Q9 Rust/Python clock sources diverge — transition timestamps mixed source Rust Low
Q10 threading.Event.wait() uses platform-dependent clock — NTP jump risk Venue Medium
Q11 Backup had no STALE_STATE_RECONCILING guard in on_venue_event (info) Rust Info
Q12 All 5 fromisoformat() sites fail on Python < 3.11 (duplicate of Q1) Bridge High

R-Series: Resource Leaks, Trust Boundaries, Security (Pass 15)

Full detail in TRACE doc under "PASS 15 — RESOURCE LEAKS, TRUST BOUNDARIES, SECURITY."

# Flaw Layer Severity
R1 ThreadPoolExecutor never shut down — 3 threads leak Venue High
R2 BingxVenueAdapter no close() — backend HTTP client unreleasable Venue High
R3 real_zinc_plane._intent_cache grows unboundedly Plane High
R4 RealZincPlane/ControlPlane partial-construction SharedRegion leak Plane Medium
R5 BingxUserStream.ClientSession no __del__ — connection pool leak Venue Medium
R6 test_alpha_blue_untouched_g7.py open() without context manager Test Low
R7 All exchange REST/WS data parsed without schema validation Venue Critical
R8 Shared memory JSON deserialization without integrity check Plane High
R9 restore_state() deserializes arbitrary JSON — full kernel takeover Bridge Critical
R10 DOLPHIN_BINGX_ENV + ALLOW_MAINNET mainnet switch via env Config High
R11 .env file loaded from project root — secrets exposure Config High
R12 Unvalidated int() on env vars — recv_window, leverage extremes Config Medium
R13 listenKey from exchange in WS URL f-string — MITM injection Venue High
R14 mock_venue._exchange_event_queue unbounded growth Test Low

S-Series: Error Handling, Arithmetic Stability, Test Infrastructure (Pass 16)

Full detail in TRACE doc under "PASS 16 — ERROR HANDLING, ARITHMETIC STABILITY, TEST INFRASTRUCTURE."

# Flaw Layer Severity
S1 realized_pnl()/mark_price() NaN bypasses <=0.0 guard — NaN PnL corrupts k_realized_pnl Rust/Python Critical
S2 MockVenue _exchange_event_queue check-then-act race — silently drops events Test Critical
S3 No test_kernel_fsm.py or test_kernel_fsm_recovery.py exists Test Critical
S4 Generated tests use await asyncio.sleep(0.8) — flaky false negatives on slow CI Test Critical
S5 _rate_limit_retry_after_ms() returns 0 on parse failure — instant retry storm Venue High
S6 Venue adapter detects rate limits but enforces zero backoff Venue High
S7 capital_epsilon = 1e-4 too tight — false WARN classifications Accounting High
S8 Generated tests use module-level asyncio.run() — leaks tasks on Python 3.12+ Test High
S9 str.replace() patching silently does nothing on format change Build High
S10 _consume() no per-message WS timeout — silent hang blocks forever Venue High
S11 _run() blocks pool thread with no timeout — 3 hung calls lock adapter Venue High
S12 Rate-limit regex depends on exchange message format — non-portable Venue Medium
S13 _row_float() silently skips malformed rows, filters zero values Venue Medium
S14 Reconnection backoff lacks jitter — thundering herd risk Venue Medium
S15 _venue_event_status_from_row() falls back to ACKED — masks new rejections Venue Medium
S16 except: pass in generated test code — swallows KeyboardInterrupt Test Medium

T-Series: Unsafe Review, Dead Code/Backup Debris, Build/Plane Protocols (Pass 17)

Full detail in TRACE doc under "PASS 17 — UNSAFE REVIEW, DEAD CODE/BACKUP DEBRIS, BUILD/PLANE PROTOCOLS."

# Flaw Layer Severity
T1 catch_unwind + AssertUnwindSafe — partially mutated state persists, no rollback Rust High
T2 Empty backup dir _backup_20260530_105512/ and stale tea_debug.log (0 bytes) Repo Low
T3 HazelcastRowWriter uses bare json.dumps(row, default=str) — Enums/datetimes wrong format Bridge High
T4 real_zinc_plane._slot_from_payload() direct key access payload["last_event_time"] — KeyError Plane High
T5 _build_pink_bodies.py str.index("]") finds first ] — corrupts SCENARIOS list Build High
T6 VenueAdapter protocol missing connect()/disconnect() — AttributeError at runtime Venue High
T7 Shared memory writes non-atomic — visible-zero window exposes partial state Plane High
T8 _slot_from_payload() reconstructs internal_trade_id from slot trade_id — order data loss Plane Medium
T9 _slot_from_payload() duplicated in two files — schema drift risk Plane Medium
T10 str.index("finally:") finds first match — nested try/finally mismatch Build Medium
T11 No workspace-root .gitignorepycache, backup dirs, debris untracked Repo Low
T12 projection.py lazy import failure silently swallowed — writer=None drops all writes Bridge Medium
T13 Codex_CONTEXT_RESTORE__*.txt AI context files in root — debris Repo Low
T14 _backup_20260530/ is a valid Python package — accidental old-code import risk Repo Medium

U-Series: Rust Test Gaps, Accounting Reconciliation Bugs, FFI Type Mismatches (Pass 18)

Full detail in TRACE doc under "PASS 18 — RUST TEST GAPS, ACCOUNTING RECONCILIATION BUGS, FFI TYPE MISMATCHES."

# Flaw Layer Severity
U1 order_type/limit_price sent to Rust, no serde fields — silently dropped FFI High
U2 Rust VenueEventStatus expects "CANCEL_REJECTED" (typo) — "CANCELED_REJECTED" fails deserialization Rust High
U3 R2 compares cumulative K realized vs single-last-fill E realized — broken after 2nd+ fill Accounting Critical
U4 R4 compares K open_notional vs E used_margin — fundamentally different quantities Accounting Critical
U5 R3 skipped when len(e.positions)==0 — K has positions but E reports none, silent Accounting High
U6 on_venue_event/apply_fill no NaN guards on venue event price/size — NaN propagates Rust Critical
U7 Zero Rust tests for ORDER_REJECT, PARTIAL_FILL, TERMINAL_STATE guard, etc. Test High
U8 safe_float() returns NaN/Inf instead of default — contradicts _safe() Bridge Medium
U9 _scan_slots uses metadata.get("leverage") not slot.leverage — wrong leverage Accounting Medium
U10 Rust injects "k_net_fees" key alongside serde's k_fees_paid — duplicate Bridge Low
U11 10+ AccountState fields transmitted across FFI but never read by Python FFI Low
U12 _order_from_payload() overwrites internal_trade_id with slot's trade_id Bridge Medium
U13 No independent third reference — symmetrical K=E errors invisible Accounting Medium
U14 lot_step declared in ReconcileConfig but never used — dead field Accounting Low

H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)

Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."

# Flaw Layer Severity
H1 No Python dependency files (requirements.txt, pyproject.toml, etc.) Build Critical
H2 Rust kernel compiled from source on every cold start — no prebuilt binary Build Critical
H3 Zero logging — 16+ silent except:pass sites, no error observability All Critical
H4 _row_float rejects zero as valid, except Exception: continue swallows all Venue High
H5 _backend_snapshot timeout returns stale data/None — callers crash Venue High
H6 All enum-from-raw-string sites crash on unknown variant (17 sites) Bridge High
H7 _legacy_intent reads getattr(intent, "order_type") not metadata — always MARKET Venue High
H8 Unknown venue status silently mapped to ACKED Venue High
H9 RealZincPlane.write_slot() slot_id >= slot_count silently lost Zinc High
H10 RealZincControlPlane.read() no atomicity with concurrent update() Control High
H11 _RustKernelLib lazy init with race condition — concurrent cargo build Bridge High
H12 ExecutionKernel.__del__ use-after-free on Rust handle Bridge High
H13 MirroredControlPlane missing protocol methods (wait/notify) Control Medium
H14 TradeSlot.remaining_size vs VenueOrder.remaining_size — different semantics Contracts Medium
H15 _maybe_close asyncio.run RuntimeError silently swallowed Launcher Medium
H16 Lazy import of bingx_direct masks config errors until first trade Build Info
H17 load_dotenv() at module level — import-time I/O side effect Launcher Medium
H18 _run() blocks event loop on every HTTP call via thread pool Venue Medium
H19 HazelcastClientLike protocol has zero concrete implementations Projection Low
H20 _decode_packet uncaught UnicodeDecodeError/ValueError on corrupted SHM Zinc Low
H21 wasm-bindgen compiled into native library unnecessarily Build Low
H22 socket.getaddrinfo monkey-patch in test code Test Low

A-Series: Architectural Flaws (detailed writeups)

These are the original architectural flaws with full analysis.


Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size

Location: _rust_kernel/src/lib.rs lines ~770-780 (EXIT handler in process_intent)

Severity: High

Nature: Logic error — wrong base for exit-size computation.

Downstream effect

The EXIT handler computes the exit size as base_size * exit_ratio where:

let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };

After partial fills (e.g., two separate MARKET exit legs), initial_size is still the original entry size while slot.size has been reduced by previous legs. If the cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the computed exit size can exceed the remaining position.

The venue adapter clamps to actual position via reduceOnly, but the kernel's own accounting reduces slot.size by the fill size, not by the intended exit size. The slot can therefore go negative (slot.size < 0) if the fill is larger than remaining.

Exact trigger

  1. Enter SHORT, size=1.0, initial_size=1.0, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0
  2. EXIT leg 0: exit_size = 1.0 * 0.6 = 0.6. Fill consumes 0.6. Slot size goes to 0.4.
  3. EXIT leg 1: exit_size = 1.0 * 0.6 = 0.6. But remaining is 0.4. Requests 0.6.
  4. BingX reduceOnly clamps fill to 0.4. Slot size goes to 0.0.
  5. EXIT leg 2 (ratio 1.0): exit_size = 1.0 * 1.0 = 1.0. Slot is already at 0.0. Kernel returns NO_OPEN_POSITION — the final EXIT is rejected because slot.closed was not set by the previous fill (it was a partial close, not terminal).
  6. Slot is at size=0.0, !slot.closed, no active orders, but !slot.is_free() because size <= 0.0 is true but fsm_state != IDLE/CLOSED — slot is stuck in POSITION_OPEN with zero size.

This is not purely a mis-sized ratio problem. With MARKET orders that fill fully, even correct ratios can leave the slot stuck if the fill price differs from the intended-size price and the venue adjusts fill quantity.

Fix strategy

Use slot.size directly as the base (not initial_size):

let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);

This guarantees the exit never requests more than the remaining position, regardless of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.


Flaw A2: Misleading CANCEL diagnostic code on entry-only slots

Location: _rust_kernel/src/lib.rs lines ~798-810 (CANCEL rejection path)

Severity: Low

Nature: Diagnostic pollution — wrong error code.

Downstream effect

When a CANCEL intent arrives and neither active_exit_order nor active_entry_order is cancellable, the kernel returns:

diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER

But the reason may be that there's no active entry order either, or the FSM state doesn't permit cancellation. The diagnostic name suggests an exit-order-specific problem when the failure is generic "nothing to cancel."

Fix

Change to a generic NO_ACTIVE_ORDER diagnostic or SLOT_IDLE when the slot is already in IDLE. NO_ACTIVE_EXIT_ORDER is misleading for a slot that has never had any order.


Flaw A3: Float-accumulated slot.size after partial fills can go negative

Location: _rust_kernel/src/lib.rs lines ~1365-1370 (apply_fill exit path)

Severity: Low

Nature: Numerical precision edge case.

Code path

slot.size = (slot.size - fill_size).max(0.0);

This clamps to zero, which is correct. But if the venue fills more than requested (on BingX, this can happen with market orders where the fill walks the book), the slot sees fill_size > intended_size. The max(0.0) prevents negative, but the slot then reports size=0.0 with !closed and an FSM state that's not IDLE.

The is_free() check requires size <= 0.0 AND fsm_state in {IDLE, CLOSED}. A slot with size=0.0 and fsm_state=POSITION_OPEN is stuck — no EXIT will be accepted and no ENTER can start.

Trigger

Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the book, overshoots). fill_size=0.8, slot.size = (0.6 - 0.8).max(0.0) = 0.0. Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), closed=false.

Fix

When slot.size <= 1e-12 after a fill and the slot is in an exit-related state, force transition to CLOSED/IDLE regardless of leg index:

if slot.size <= 1e-12 {
    slot.closed = true;
    slot.fsm_state = TradeStage::CLOSED;
    slot.active_exit_order = None;
    slot.active_entry_order = None;
    return;
}

Flaw A4: Entry price is clobbered by mark_price if called before fill arrives

Location: _rust_kernel/src/lib.rs lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)

Severity: Medium

Nature: Accounting accuracy — incorrect PnL base.

Code path

// In mark_price:
if self.entry_price <= 0.0 {
    self.entry_price = price;  // Seeds entry_price from mark before fill
}

// In apply_fill (entry):
if event.price > 0.0 {
    slot.entry_price = event.price;  // Overwrites with actual fill price
}

The mark_price path seeds entry_price from a market price when the slot has no fill yet. The apply_fill entry path correctly overwrites with the actual fill price. So in the normal flow this is harmless — the fill overwrites the mark.

However, consider this sequence:

  1. ENTER intent accepted → slot goes ORDER_REQUESTED, entry_price = 0.0
  2. runtime.step() calls kernel.mark_price(snapshot.symbol, snapshot.price) → sets entry_price = 100.0
  3. on_venue_event(ORDER_ACK)ENTRY_WORKING, entry_price still 100.0
  4. on_venue_event(PARTIAL_FILL)apply_fill sets entry_price = 99.5 (fill price)
  5. Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5

This is a transient mis-valuation window. It corrects itself on the next observe_slots call, but intra-step readers see wrong unrealized PnL. Not critical because:

  • account.snapshot.unrealized_pnl uses the slot's unrealized_pnl, not the mark
  • Realized PnL is computed from actual fill prices
  • The window lasts at most one scan cycle (~5s)

Fix

Don't set entry_price from mark_price when there's no fill:

fn mark_price(&mut self, price: f64) {
    if !price.is_finite() || price <= 0.0 { return; }
    // Don't seed entry_price — leave it at 0.0 until a fill arrives
    if self.entry_price <= 0.0 || self.size <= 0.0 {
        self.unrealized_pnl = 0.0;
        return;
    }
    // ... normal PnL computation
}

Flaw A5: Capital-before computation is arithmetic not snapshot-based

Location: pink_clickhouse.py lines ~761-762 (_write_trade_exit_leg) and ~822-823 (_write_trade_event)

Severity: High

Nature: Accounting accuracy — wrong capital_before under multi-slot or intervening events.

Code pattern (appears in two places)

capital_after = self._capital()
capital_before = capital_after - pnl_leg   # In _write_trade_exit_leg
capital_before = capital_after - pnl        # In _write_trade_event

This reconstructs capital_before by subtracting the current leg's PnL from the current capital. This is only correct if:

  1. No other slots settled PnL between this leg and the previous one
  2. No capital corrections (reconcile, manual override) happened between legs
  3. No fees were deducted between legs

With multi-slot (PINK configurable max_slots > 1), a concurrent trade on slot 1 that closes between slot 0's exit legs will have its PnL baked into capital_after, making capital_before = capital_after - pnl_leg wrong.

Fix

Maintain a per-trade capital_before_leg snapshot taken at the moment of the first fill event for each trade, advancing it by the realized PnL of each leg:

self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
self._leg_state[trade_id]["capital_after"] = capital_after

And use prev["capital_before"] for the row, not capital_after - pnl_leg.


Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange

Location: pink_direct.py lines ~597-630 (recover_account) and docstring of reconcile_account

Severity: Medium

Nature: Operational drift — capital is never verified against exchange truth in hot loop.

The gap

reconcile_account() (line 632) has this docstring:

Periodic exchange-led account sync.
Capital is re-seeded from the exchange balance as a guard against long-running drift

But the actual implementation:

async def reconcile_account(self, ...) -> dict[str, Any]:
    return await self.recover_account(...)

async def recover_account(self, ...) -> dict[str, Any]:
    capital = float(self.kernel.account.snapshot.capital or 25000.0)
    _reconcile_position_slot(self.kernel, capital, slot_id=0)

It passes the kernel's own capital to _reconcile_position_slot, which then overwrites kernel.account.snapshot.capital with... the same value. No exchange balance poll ever overwrites capital.

connect() at line 224 does the same — it passes initial_capital (an env default), not the exchange balance. The exchange balance is never read for capital seeding in the current code path. _reconcile_position_slot does call venue.open_positions(), but it only reads positions, not capital.

Effect

Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation, or any other systematic error) accumulates monotonically. There is no mechanism to detect or correct drift. Over weeks of live trading, the kernel's capital snapshot can diverge arbitrarily from the exchange's actual balance.

Fix

Either:

  1. Make _reconcile_position_slot read the exchange balance and use it for capital reseeding (the docstring claims it does this already), or
  2. Add a separate capital-verification path that surfaces the delta between kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.

Flaw A7: No fee tracking in kernel accounting

Location: rust_backend.py lines ~540-545 (on_venue_event settle), bingx_direct.py submit_intent return

Severity: Medium

Nature: Accounting accuracy — fees are invisible to capital tracking.

Downstream effect

When a trade closes, the kernel computes:

realized_pnl = delta * notional

This is gross PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%). These fees are never subtracted from the kernel's realized PnL. Over 100 trades with $100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000 trades at 10x leverage and $50k average notional, the error is $200k.

The BingxDirectExecutionAdapter does return ExecutionReceipt with fill data, but bingx_venue._events_from_submit() only reads price and filled_size — commission/fee fields are ignored.

Fix

  1. Read fee/commission from the BingX ack payload in _events_from_submit()
  2. Pass fees through VenueEvent.metadata["fee"]
  3. In the Rust kernel's apply_fill, subtract the fee from realized PnL:
    let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
    slot.realized_pnl += realized - fee;
    

Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input

Location: _rust_kernel/src/lib.rs lines ~745-748

Severity: Low

Nature: Silent fallback — corrupt input produces a trade, not a rejection.

slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
    intent.leverage
} else {
    1.0
};

A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of rejecting the intent. The Python bridge does validate _first_invalid_intent_field() which catches NaN/inf, but it doesn't catch leverage <= 0.0 (it only checks not math.isfinite(value)).

Fix

Add leverage <= 0.0 to the Python bridge's invalid-intent check. The Rust kernel should still have the 1.0 fallback as a defensive measure, but the bridge should prevent bad leverages from reaching Rust in the first place.


Flaw A9: Mock venue submit condition convoluted — dead code paths

Location: mock_venue.py lines ~60-90

Severity: Informational

Nature: Code clarity — confusing condition logic.

if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
    events.append(ack_event)
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
    # ... fill events

The condition logic is confusing:

  • When emit_ack_before_fill=True and emit_fill_on_submit=True: both branches run → ACK + fill
  • When emit_ack_before_fill=False and emit_fill_on_submit=True: first branch runs because not True = False, so False or False = False → no ACK. Second branch runs → fill only. This produces a fill without an ACK, which is not a realistic venue scenario.
  • When partial_fill_ratio=1.0 (default): second branch runs and emits a FULL_FILL event even when emit_fill_on_submit=False, because 0.0 or 1.0 > 0 = True.

The partial fill ratio check should be gated on emit_fill_on_submit:

should_emit_fill = self.scenario.emit_fill_on_submit or (
    is_entry and self.scenario.entry_partial_fill_ratio > 0
) or (
    not is_entry and self.scenario.exit_partial_fill_ratio > 0
)

Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows

Location: pink_direct.py lines ~318-374 (pump_venue_events), called at line ~436

Severity: Medium

Nature: Operational overhead — unnecessary exchange HTTP calls.

The problem

step() calls pump_venue_events() every cycle, which calls venue.reconcile(). For BingxVenueAdapter, reconcile() calls _backend_snapshot() which does up to 5 HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow where orders fill synchronously within process_intent(), there are no late fills to drain.

On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each pump_venue_events() call consumes 5+ of that budget. At a 5-second policy cycle, this is 60 requests/minute — 60% of the rate budget — just to poll for fills that don't exist.

Fix

Gate the pump on whether the previous cycle submitted a LIMIT order:

self._has_resting_order = any(
    o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
    for o in kernel.open_orders()
)
if self._has_resting_order:
    await self.pump_venue_events(snapshot, market_state=market_state)

Or add a config flag async_fill_mode: bool = False.


Flaw A11: VenueAdapter.submit() blocks the event loop

Location: bingx_venue.py lines ~225-233 (_run)

Severity: Medium

Nature: Runtime safety — synchronous call in async context.

def _run(self, result: Any) -> Any:
    if inspect.isawaitable(result):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(result)
        pool = self._get_executor()
        return pool.submit(asyncio.run, result).result()

When called from step() (which is an async function), _run submits the async submit_intent() to a thread pool, runs it with asyncio.run(), then calls .result() which blocks the current thread until complete. The BingX HTTP call can take 1-5 seconds depending on network latency and exchange load.

During this block, the event loop cannot process other async tasks (data feed updates, health checks, signal processing). In a single-runtime deployment, this stalls the entire policy cycle.

Fix

Make process_intent in ExecutionKernel accept an async venue callback, or make BingxVenueAdapter truly async (not sync-with-thread-bridge). For now, at minimum the PINK runtime should run step() in an executor to avoid blocking the main event loop.


Flaw A12: Stale KernelStateView slot references after reconcile

Location: rust_backend.py lines ~350-365 (KernelStateView.refresh)

Severity: Low

Nature: Stale data — view not rebuilt on reconcile.

class KernelStateView:
    def __init__(self, kernel):
        self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
        # ...

    def refresh(self) -> None:
        snapshot = self._kernel._snapshot_backend()
        self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
        self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
        self.client_order_index = dict(snapshot.get("client_order_index", {}))

refresh() updates the index maps but does not recreate self.slots. The slot views in self.slots are live proxies (they read through _get_slot each time), so slot data is current. But if max_slots changes (it shouldn't, but it's mutable) or if slots are re-indexed by a reconcile, the view list is wrong.

Not critical because max_slots is set at init and never changes, but worth fixing for robustness.


Flaw A13: persist_fill_events uses current price as exit price

Location: pink_clickhouse.py lines ~408

Severity: Low

Nature: Historical accuracy — logged price may not match fill price.

price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
              if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)

This correctly reads from the event's price. But decision.reference_price at line 417 falls back to this price, which is the fill price. The trade_event row at line 835 uses exit_price = slot_dict.get("entry_price", ...) — which is the entry price, not the exit price. The trade_event always shows exit_price == entry_price.

This means trade_events in ClickHouse will never show a realistic exit price for the persisted trade, breaking any PnL reconstruction that relies on (exit_price - entry_price) * size * leverage.


Flaw A14: _write_position_state maps active_leg_index to entry_bar

Location: pink_clickhouse.py line ~673

Severity: Low

Nature: Semantic mismatch — wrong field mapping.

"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),

active_leg_index is the index into the exit-leg-ratios array (which leg is being exited next). It has nothing to do with how many bars the position has been held. When a position opens, active_leg_index is 0. After the first exit leg, it advances to 1. Neither value is a bar count.

entry_bar should be bars_held from the intent/decision, or a computed value from entry_time to now.


Flaw A15: persist_recovery_state passes account dict as slot dict

Location: pink_clickhouse.py lines ~447-460

Severity: Low

Nature: Wrong data — account snapshot used where slot data is expected.

def persist_recovery_state(self, *, snapshot, acc_dict, ...):
    slot_dict = acc_dict or {}   # ← acc_dict is an account snapshot, not a slot
    self._write_position_state(..., slot_dict={}, ...)  # ← correctly uses empty dict
    self._write_trade_reconstruction(
        snapshot,
        trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
        # acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
    )

The trade_id in the trade_reconstruction row will always be "" because acc_dict comes from kernel.snapshot()["account"] which has keys capital, equity, realized_pnl, etc. — not trade_id. This means the recovery trade_reconstruction row has no trade_id linkage.