Public Access

Files

Codex 1f5a3266c4 PINK: E2E trace analysis — Pass 5 edge domains (H1-H22)

Fifth pass covering dependency management (no Python lockfile, Rust compiled
from source), error handling observability (zero logging, 16+ silent swallows),
type safety (17 enum-from-string crash sites, _legacy_intent always MARKET),
and protocol contracts (MirroredControlPlane missing methods, RealZinc read
atomicity, __del__ use-after-free). 22 new flaws. Combined catalog now 138.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>

2026-06-01 17:42:58 +02:00

32 KiB

Raw Blame History

PINK DITAv2 — Structural Flaw Analysis (CENTRAL)

Analysis date: 2026-05-31 Scope: Full PINK pipeline — all flaws across all modules. Sources:

This file (A-series): Detailed writeups for architectural flaws.
PINK_DITAv2_E2E_TRACE_ANALYSIS.md (E, F, G-series): Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans. Every E, F, G entry below is a summary only — full detail is in the TRACE doc.

Combined Catalog (All Flaws, All Passes)

Pass	Focus	Count	Critical	High	Medium	Low	Info
A	Architectural (detailed in this file)	15	0	2	0	2	11
T	Threading/Atomicity	9	1	3	3	2	0
E	E2E Trace (Pass 1)	26	0	4	10	11	1
F	Deep E2E (Pass 3)	30	0	1	8	17	4
G	Domain Scans (Pass 4)	36	4	11	11	8	2
H	Edge Domains (Pass 5)	22	3	9	5	4	1
Total		138	8	30	37	44	19

T-Series: Threading & Atomicity Flaws

Full detail in TRACE doc under "Threading & Atomicity" section.

#	Flaw	Layer	Severity
T1	`InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy	Zinc	Critical
T2	Thread-unsafe kernel snapshot capture for account	Bridge	High
T3	Re-entrant or incorrectly-scoped Rust-kernel handle usage	Bridge	High
T4	Consequence: `on_venue_event` PnL settle races	Bridge	High
T5	Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls	Zinc	Medium
T6	`_write_region` buffer zero + notify race with concurrent reader	Zinc	Medium
T7	Publication of events in `process_intent` loop not synchronized with persist	Bridge	Medium
T8	`asyncio.run` executor skip in `_run` leads to event-loop stall	Venue	Low
T9	No thread-safe Python↔Rust ownership / lifetime protocol	Bridge	Low

E-Series: E2E Data-Flow Flaws (Pass 1)

Full detail in TRACE doc under "Layer 1" through "Layer 9."

#	Flaw	Layer	Severity
E1	`step()` calls `pump_venue_events()` every cycle unconditionally	Runtime	High
E2	`kernel.snapshot()["account"]` returns a fresh dict, not a live view	Bridge	Low
E3	`_decision_to_kernel_intent` drops `order_type` and `limit_price`	Runtime	High
E4	`_exit_intent_from_slot` trusts slot.size but slot may be stale	Runtime	High
E5	JSON serialization round-trip loses numeric precision	Bridge	Low
E6	`_RustKernelLib` is a global singleton — shared across all kernels	Bridge	Low
E7	ENTER handler silently allows re-entry with same trade_id	Rust	High
E8	EXIT handler uses `initial_size` not current size	Rust	High
E9	CANCEL handler returns diagnostic even when nothing happened	Rust	Low
E10	`apply_fill` entry branch double-sets `active_entry_order`	Rust	Low
E11	`_legacy_intent()` is a lossy conversion	Venue	Low
E12	`_events_from_submit()` price fallback chain can lose venue price	Venue	Low
E13	`_backend_snapshot()` timeout returns stale data	Venue	Medium
E14	`_events_from_cancel` uses stale `slot_id` from order metadata	Venue	Low
E15	Submit sets leverage via separate HTTP call	Adapter	Medium
E16	`_format_quantity`/`_format_price` may use zero tick/step	Adapter	Medium
E17	Cancel uses truth-based confirmation — can mask real errors	Adapter	Medium
E18	`on_venue_event` settles PnL incrementally — fees never included	Bridge	Medium
E19	`observe_slots` called with ALL slots, not just changed ones	Bridge	Low
E20	`_capital()` reads live from `AccountProjection` — stale row risk	Persistence	Low
E21	`persist_fill_events()` synthesizes fake Decision/Intent	Persistence	Medium
E22	`_write_trade_exit_leg` capital_before uses arithmetic reconstruction	Persistence	Medium
E23	`_write_trade_event` uses entry_price as exit_price	Persistence	Medium
E24	Mock venue always emits fill on `partial_fill_ratio > 0`	Test	Low
E25	Test scenarios use MARKET-only `_si()` helper — no LIMIT tests	Test	Low
E26	Fresh-kernel reconcile tests create second kernel but share venue	Test	Low

F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)

Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."

#	Flaw	Layer	Severity
F1	CANCEL returns "accepted" before cancel happens — stale diagnostic_code	Bridge	Medium
F2	`_last_settled_pnl` reset before `venue.submit()` — transient window	Bridge	Medium
F3	`_first_invalid_intent_field` allows `leverage=0` and `target_size=0`	Bridge	Low
F4	`outcome.emitted_events` only from venue — Rust kernel events dropped	Bridge	Low
F5	`on_venue_event` redundant FFI read of slot already returned by Rust	Bridge	Low
F6	`process_intent` records pre-venue transitions with `event=None`	Bridge	Info
F7	`reconcile_from_slots` writes ALL slots to projection/zinc	Bridge	Low
F8	`HazelcastRowWriter.put()` synchronous, no error handling — crashes intent	Projection	Medium
F9	`RealZincPlane.write_slot()` serializes ALL slots, not just changed one	Zinc	Low
F10	`RealZincPlane` zeros buffer before write — concurrent read sees empty	Zinc	Low
F11	`RealZincPlane._write_region` no partial-write recovery	Zinc	Low
F12	`InMemoryZincPlane` intent_region grows without bound	Zinc	Low
F13	`InMemoryZincPlane` uses non-re-entrant `threading.Condition`	Zinc	Low
F14	`KernelSlotView.__setattr__` round-trips unknown fields — silently dropped	Bridge	Low
F15	`on_venue_event` loop stops on first exception — slot left in partial state	Bridge	High
F16	`venue.submit()` returning empty events leaves slot in ORDER_REQUESTED	Bridge	Medium
F17	Cancel truth-based confirmation returns REJECTED for already-cancelled orders	Adapter	Medium
F18	Leverage-set and order-submit failures share error handler	Adapter	Low
F19	`_events_from_submit` stale snapshot fallback → wrong fill detection	Venue	Medium
F20	`__del__` frees Rust handle at unpredictable GC time — no explicit close()	Bridge	Medium
F21	`DITAv2LauncherBundle.close()` closes venue before kernel is done	Launcher	Low
F22	Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware	Launcher	Medium
F23	`VenueEvent.size` = `intent.target_size` not actual fill	Venue	Info
F24	`asyncio.run()` inside async function in test generator	Test	Low
F25	`_build_fresh_kernel_from_slot` leaks old kernel objects per call	Test	Low
F26	`seen_event_ids` not cleared on re-entry — accumulates across trades	Rust	Low
F27	`RealZincControlPlane.read()` parses Zinc region every call — no caching	Control	Low
F28	`_legacy_intent` hardcodes confidence=1.0, bars_held=0	Venue	Info
F29	`_slot_to_payload` in real_zinc_plane.py is dead code	Zinc	Info
F30	Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py	Zinc	Low

G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)

Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."

#	Flaw	Layer	Severity
G1	EXIT_RESIDUAL action missing from Rust KernelCommandType enum	Rust	Critical
G2	`into_c_string` unwrap() panics on NUL byte in FFI string	Rust	Critical
G3	EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition	Rust	Critical
G4	`consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg	Rust	Critical
G5	`realized_pnl` unbounded f64 — overflows to inf at extreme values	Rust	High
G6	`mark_price` produces unbounded unrealized_pnl — no result guard	Rust	High
G7	ENTER no is_finite() guard on target_size	Rust	High
G8	`reconcile_slots_json` no dedup or bounds validation	Rust	High
G9	`exchange_order_id` update targets wrong order — exit cancel broken	Rust	High
G10	CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER	Rust	High
G11	`apply_fill` overwrites intended_size with slot.size	Rust	Medium
G12	No max leverage cap enforced by kernel	Rust	Medium
G13	`resolve_slot` fallback returns unwrap_or(0) — misroutes events	Rust	Medium
G14	`commit_slot` silently ignores out-of-bounds slot_id	Rust	Medium
G15	Zero `__post_init__` validators on all 16 config dataclasses (127 fields)	Config	High
G16	DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset	Config	Info
G17	String config fields — Zinc region injection risk	Config	Medium
G18	`exit_leg_ratios` no sum-to-1 validation	Config	Low
G19	RealZincControlPlane.read() no sequence check — torn-read risk	Config	Low
G20	ClickHouse journal strategy/db env vars — SQL injection risk	Config	Low
G21	entry_price used as exit_price in trade_events — data loss	Persistence	High
G22	active_leg_index → entry_bar semantic mis-mapping	Persistence	Medium
G23	capital_before arithmetic absorbs cross-slot PnL	Persistence	Medium
G24	Recovery trade_reconstruction always has trade_id=""	Persistence	Medium
G25	seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables	Persistence	Low
G26	_safe_float silently converts NaN/None/Inf to 0.0	Persistence	Low
G27	build_launcher_bundle no exception safety — prior resources leak	Lifecycle	High
G28	RealZincPlane/RealZincControlPlane no del — SHM orphaned	Lifecycle	High
G29	Zero signal handlers — no cleanup on SIGTERM/SIGINT	Lifecycle	High
G30	ExecutionKernel has no close() — relies on del for Rust handle	Lifecycle	High
G31	Hazelcast projection never closed	Lifecycle	Medium
G32	_maybe_close() break skips second method	Lifecycle	Low
G33	close() not idempotent for RealZinc components	Lifecycle	Low
G34	No context manager on DITAv2LauncherBundle	Lifecycle	Low
G35	BingxVenueAdapter.connect() never called	Lifecycle	Info
G36	Only one try/finally in entire codebase	Lifecycle	High

H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)

Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."

#	Flaw	Layer	Severity
H1	No Python dependency files (requirements.txt, pyproject.toml, etc.)	Build	Critical
H2	Rust kernel compiled from source on every cold start — no prebuilt binary	Build	Critical
H3	Zero logging — 16+ silent except:pass sites, no error observability	All	Critical
H4	`_row_float` rejects zero as valid, `except Exception: continue` swallows all	Venue	High
H5	`_backend_snapshot` timeout returns stale data/None — callers crash	Venue	High
H6	All enum-from-raw-string sites crash on unknown variant (17 sites)	Bridge	High
H7	`_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET	Venue	High
H8	Unknown venue status silently mapped to ACKED	Venue	High
H9	`RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost	Zinc	High
H10	`RealZincControlPlane.read()` no atomicity with concurrent `update()`	Control	High
H11	`_RustKernelLib` lazy init with race condition — concurrent cargo build	Bridge	High
H12	`ExecutionKernel.__del__` use-after-free on Rust handle	Bridge	High
H13	`MirroredControlPlane` missing protocol methods (wait/notify)	Control	Medium
H14	`TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics	Contracts	Medium
H15	`_maybe_close` asyncio.run RuntimeError silently swallowed	Launcher	Medium
H16	Lazy import of bingx_direct masks config errors until first trade	Build	Info
H17	`load_dotenv()` at module level — import-time I/O side effect	Launcher	Medium
H18	`_run()` blocks event loop on every HTTP call via thread pool	Venue	Medium
H19	`HazelcastClientLike` protocol has zero concrete implementations	Projection	Low
H20	`_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM	Zinc	Low
H21	`wasm-bindgen` compiled into native library unnecessarily	Build	Low
H22	`socket.getaddrinfo` monkey-patch in test code	Test	Low

A-Series: Architectural Flaws (detailed writeups)

These are the original architectural flaws with full analysis.

Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size

Location: _rust_kernel/src/lib.rs lines ~770-780 (EXIT handler in process_intent)

Severity: High

Nature: Logic error — wrong base for exit-size computation.

Downstream effect

The EXIT handler computes the exit size as base_size * exit_ratio where:

let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };

After partial fills (e.g., two separate MARKET exit legs), initial_size is still the original entry size while slot.size has been reduced by previous legs. If the cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the computed exit size can exceed the remaining position.

The venue adapter clamps to actual position via reduceOnly, but the kernel's own accounting reduces slot.size by the fill size, not by the intended exit size. The slot can therefore go negative (slot.size < 0) if the fill is larger than remaining.

Exact trigger

Enter SHORT, size=1.0, initial_size=1.0, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0
EXIT leg 0: exit_size = 1.0 * 0.6 = 0.6. Fill consumes 0.6. Slot size goes to 0.4.
EXIT leg 1: exit_size = 1.0 * 0.6 = 0.6. But remaining is 0.4. Requests 0.6.
BingX reduceOnly clamps fill to 0.4. Slot size goes to 0.0.
EXIT leg 2 (ratio 1.0): exit_size = 1.0 * 1.0 = 1.0. Slot is already at 0.0. Kernel returns NO_OPEN_POSITION — the final EXIT is rejected because slot.closed was not set by the previous fill (it was a partial close, not terminal).
Slot is at size=0.0, !slot.closed, no active orders, but !slot.is_free() because size <= 0.0 is true but fsm_state != IDLE/CLOSED — slot is stuck in POSITION_OPEN with zero size.

This is not purely a mis-sized ratio problem. With MARKET orders that fill fully, even correct ratios can leave the slot stuck if the fill price differs from the intended-size price and the venue adjusts fill quantity.

Fix strategy

Use slot.size directly as the base (not initial_size):

let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);

This guarantees the exit never requests more than the remaining position, regardless of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.

Flaw A2: Misleading CANCEL diagnostic code on entry-only slots

Location: _rust_kernel/src/lib.rs lines ~798-810 (CANCEL rejection path)

Severity: Low

Nature: Diagnostic pollution — wrong error code.

Downstream effect

When a CANCEL intent arrives and neither active_exit_order nor active_entry_order is cancellable, the kernel returns:

diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER

But the reason may be that there's no active entry order either, or the FSM state doesn't permit cancellation. The diagnostic name suggests an exit-order-specific problem when the failure is generic "nothing to cancel."

Fix

Change to a generic NO_ACTIVE_ORDER diagnostic or SLOT_IDLE when the slot is already in IDLE. NO_ACTIVE_EXIT_ORDER is misleading for a slot that has never had any order.

Flaw A3: Float-accumulated slot.size after partial fills can go negative

Location: _rust_kernel/src/lib.rs lines ~1365-1370 (apply_fill exit path)

Severity: Low

Nature: Numerical precision edge case.

Code path

slot.size = (slot.size - fill_size).max(0.0);

This clamps to zero, which is correct. But if the venue fills more than requested (on BingX, this can happen with market orders where the fill walks the book), the slot sees fill_size > intended_size. The max(0.0) prevents negative, but the slot then reports size=0.0 with !closed and an FSM state that's not IDLE.

The is_free() check requires size <= 0.0 AND fsm_state in {IDLE, CLOSED}. A slot with size=0.0 and fsm_state=POSITION_OPEN is stuck — no EXIT will be accepted and no ENTER can start.

Trigger

Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the book, overshoots). fill_size=0.8, slot.size = (0.6 - 0.8).max(0.0) = 0.0. Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), closed=false.

Fix

When slot.size <= 1e-12 after a fill and the slot is in an exit-related state, force transition to CLOSED/IDLE regardless of leg index:

if slot.size <= 1e-12 {
    slot.closed = true;
    slot.fsm_state = TradeStage::CLOSED;
    slot.active_exit_order = None;
    slot.active_entry_order = None;
    return;
}

Flaw A4: Entry price is clobbered by mark_price if called before fill arrives

Location: _rust_kernel/src/lib.rs lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)

Severity: Medium

Nature: Accounting accuracy — incorrect PnL base.

Code path

// In mark_price:
if self.entry_price <= 0.0 {
    self.entry_price = price;  // Seeds entry_price from mark before fill
}

// In apply_fill (entry):
if event.price > 0.0 {
    slot.entry_price = event.price;  // Overwrites with actual fill price
}

The mark_price path seeds entry_price from a market price when the slot has no fill yet. The apply_fill entry path correctly overwrites with the actual fill price. So in the normal flow this is harmless — the fill overwrites the mark.

However, consider this sequence:

ENTER intent accepted → slot goes ORDER_REQUESTED, entry_price = 0.0
runtime.step() calls kernel.mark_price(snapshot.symbol, snapshot.price) → sets entry_price = 100.0
on_venue_event(ORDER_ACK) → ENTRY_WORKING, entry_price still 100.0
on_venue_event(PARTIAL_FILL) → apply_fill sets entry_price = 99.5 (fill price)
Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5

This is a transient mis-valuation window. It corrects itself on the next observe_slots call, but intra-step readers see wrong unrealized PnL. Not critical because:

account.snapshot.unrealized_pnl uses the slot's unrealized_pnl, not the mark
Realized PnL is computed from actual fill prices
The window lasts at most one scan cycle (~5s)

Fix

Don't set entry_price from mark_price when there's no fill:

fn mark_price(&mut self, price: f64) {
    if !price.is_finite() || price <= 0.0 { return; }
    // Don't seed entry_price — leave it at 0.0 until a fill arrives
    if self.entry_price <= 0.0 || self.size <= 0.0 {
        self.unrealized_pnl = 0.0;
        return;
    }
    // ... normal PnL computation
}

Flaw A5: Capital-before computation is arithmetic not snapshot-based

Location: pink_clickhouse.py lines ~761-762 (_write_trade_exit_leg) and ~822-823 (_write_trade_event)

Severity: High

Nature: Accounting accuracy — wrong capital_before under multi-slot or intervening events.

Code pattern (appears in two places)

capital_after = self._capital()
capital_before = capital_after - pnl_leg   # In _write_trade_exit_leg
capital_before = capital_after - pnl        # In _write_trade_event

This reconstructs capital_before by subtracting the current leg's PnL from the current capital. This is only correct if:

No other slots settled PnL between this leg and the previous one
No capital corrections (reconcile, manual override) happened between legs
No fees were deducted between legs

With multi-slot (PINK configurable max_slots > 1), a concurrent trade on slot 1 that closes between slot 0's exit legs will have its PnL baked into capital_after, making capital_before = capital_after - pnl_leg wrong.

Fix

Maintain a per-trade capital_before_leg snapshot taken at the moment of the first fill event for each trade, advancing it by the realized PnL of each leg:

self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
self._leg_state[trade_id]["capital_after"] = capital_after

And use prev["capital_before"] for the row, not capital_after - pnl_leg.

Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange

Location: pink_direct.py lines ~597-630 (recover_account) and docstring of reconcile_account

Severity: Medium

Nature: Operational drift — capital is never verified against exchange truth in hot loop.

The gap

reconcile_account() (line 632) has this docstring:

Periodic exchange-led account sync.
Capital is re-seeded from the exchange balance as a guard against long-running drift

But the actual implementation:

async def reconcile_account(self, ...) -> dict[str, Any]:
    return await self.recover_account(...)

async def recover_account(self, ...) -> dict[str, Any]:
    capital = float(self.kernel.account.snapshot.capital or 25000.0)
    _reconcile_position_slot(self.kernel, capital, slot_id=0)

It passes the kernel's own capital to _reconcile_position_slot, which then overwrites kernel.account.snapshot.capital with... the same value. No exchange balance poll ever overwrites capital.

connect() at line 224 does the same — it passes initial_capital (an env default), not the exchange balance. The exchange balance is never read for capital seeding in the current code path. _reconcile_position_slot does call venue.open_positions(), but it only reads positions, not capital.

Effect

Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation, or any other systematic error) accumulates monotonically. There is no mechanism to detect or correct drift. Over weeks of live trading, the kernel's capital snapshot can diverge arbitrarily from the exchange's actual balance.

Fix

Either:

Make _reconcile_position_slot read the exchange balance and use it for capital reseeding (the docstring claims it does this already), or
Add a separate capital-verification path that surfaces the delta between kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.

Flaw A7: No fee tracking in kernel accounting

Location: rust_backend.py lines ~540-545 (on_venue_event settle), bingx_direct.py submit_intent return

Severity: Medium

Nature: Accounting accuracy — fees are invisible to capital tracking.

Downstream effect

When a trade closes, the kernel computes:

realized_pnl = delta * notional

This is gross PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%). These fees are never subtracted from the kernel's realized PnL. Over 100 trades with $100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000 trades at 10x leverage and $50k average notional, the error is $200k.

The BingxDirectExecutionAdapter does return ExecutionReceipt with fill data, but bingx_venue._events_from_submit() only reads price and filled_size — commission/fee fields are ignored.

Fix

Read fee/commission from the BingX ack payload in _events_from_submit()
Pass fees through VenueEvent.metadata["fee"]

In the Rust kernel's apply_fill, subtract the fee from realized PnL:

let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
slot.realized_pnl += realized - fee;

Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input

Location: _rust_kernel/src/lib.rs lines ~745-748

Severity: Low

Nature: Silent fallback — corrupt input produces a trade, not a rejection.

slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
    intent.leverage
} else {
    1.0
};

A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of rejecting the intent. The Python bridge does validate _first_invalid_intent_field() which catches NaN/inf, but it doesn't catch leverage <= 0.0 (it only checks not math.isfinite(value)).

Fix

Add leverage <= 0.0 to the Python bridge's invalid-intent check. The Rust kernel should still have the 1.0 fallback as a defensive measure, but the bridge should prevent bad leverages from reaching Rust in the first place.

Flaw A9: Mock venue submit condition convoluted — dead code paths

Location: mock_venue.py lines ~60-90

Severity: Informational

Nature: Code clarity — confusing condition logic.

if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
    events.append(ack_event)
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
    # ... fill events

The condition logic is confusing:

When emit_ack_before_fill=True and emit_fill_on_submit=True: both branches run → ACK + fill
When emit_ack_before_fill=False and emit_fill_on_submit=True: first branch runs because not True = False, so False or False = False → no ACK. Second branch runs → fill only. This produces a fill without an ACK, which is not a realistic venue scenario.
When partial_fill_ratio=1.0 (default): second branch runs and emits a FULL_FILL event even when emit_fill_on_submit=False, because 0.0 or 1.0 > 0 = True.

The partial fill ratio check should be gated on emit_fill_on_submit:

should_emit_fill = self.scenario.emit_fill_on_submit or (
    is_entry and self.scenario.entry_partial_fill_ratio > 0
) or (
    not is_entry and self.scenario.exit_partial_fill_ratio > 0
)

Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows

Location: pink_direct.py lines ~318-374 (pump_venue_events), called at line ~436

Severity: Medium

Nature: Operational overhead — unnecessary exchange HTTP calls.

The problem

step() calls pump_venue_events() every cycle, which calls venue.reconcile(). For BingxVenueAdapter, reconcile() calls _backend_snapshot() which does up to 5 HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow where orders fill synchronously within process_intent(), there are no late fills to drain.

On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each pump_venue_events() call consumes 5+ of that budget. At a 5-second policy cycle, this is 60 requests/minute — 60% of the rate budget — just to poll for fills that don't exist.

Fix

Gate the pump on whether the previous cycle submitted a LIMIT order:

self._has_resting_order = any(
    o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
    for o in kernel.open_orders()
)
if self._has_resting_order:
    await self.pump_venue_events(snapshot, market_state=market_state)

Or add a config flag async_fill_mode: bool = False.

Flaw A11: VenueAdapter.submit() blocks the event loop

Location: bingx_venue.py lines ~225-233 (_run)

Severity: Medium

Nature: Runtime safety — synchronous call in async context.

def _run(self, result: Any) -> Any:
    if inspect.isawaitable(result):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(result)
        pool = self._get_executor()
        return pool.submit(asyncio.run, result).result()

When called from step() (which is an async function), _run submits the async submit_intent() to a thread pool, runs it with asyncio.run(), then calls .result() which blocks the current thread until complete. The BingX HTTP call can take 1-5 seconds depending on network latency and exchange load.

During this block, the event loop cannot process other async tasks (data feed updates, health checks, signal processing). In a single-runtime deployment, this stalls the entire policy cycle.

Fix

Make process_intent in ExecutionKernel accept an async venue callback, or make BingxVenueAdapter truly async (not sync-with-thread-bridge). For now, at minimum the PINK runtime should run step() in an executor to avoid blocking the main event loop.

Flaw A12: Stale KernelStateView slot references after reconcile

Location: rust_backend.py lines ~350-365 (KernelStateView.refresh)

Severity: Low

Nature: Stale data — view not rebuilt on reconcile.

class KernelStateView:
    def __init__(self, kernel):
        self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
        # ...

    def refresh(self) -> None:
        snapshot = self._kernel._snapshot_backend()
        self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
        self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
        self.client_order_index = dict(snapshot.get("client_order_index", {}))

refresh() updates the index maps but does not recreate self.slots. The slot views in self.slots are live proxies (they read through _get_slot each time), so slot data is current. But if max_slots changes (it shouldn't, but it's mutable) or if slots are re-indexed by a reconcile, the view list is wrong.

Not critical because max_slots is set at init and never changes, but worth fixing for robustness.

Flaw A13: `persist_fill_events` uses current price as exit price

Location: pink_clickhouse.py lines ~408

Severity: Low

Nature: Historical accuracy — logged price may not match fill price.

price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
              if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)

This correctly reads from the event's price. But decision.reference_price at line 417 falls back to this price, which is the fill price. The trade_event row at line 835 uses exit_price = slot_dict.get("entry_price", ...) — which is the entry price, not the exit price. The trade_event always shows exit_price == entry_price.

This means trade_events in ClickHouse will never show a realistic exit price for the persisted trade, breaking any PnL reconstruction that relies on (exit_price - entry_price) * size * leverage.

Flaw A14: `_write_position_state` maps active_leg_index to entry_bar

Location: pink_clickhouse.py line ~673

Severity: Low

Nature: Semantic mismatch — wrong field mapping.

"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),

active_leg_index is the index into the exit-leg-ratios array (which leg is being exited next). It has nothing to do with how many bars the position has been held. When a position opens, active_leg_index is 0. After the first exit leg, it advances to 1. Neither value is a bar count.

entry_bar should be bars_held from the intent/decision, or a computed value from entry_time to now.

Flaw A15: `persist_recovery_state` passes account dict as slot dict

Location: pink_clickhouse.py lines ~447-460

Severity: Low

Nature: Wrong data — account snapshot used where slot data is expected.

def persist_recovery_state(self, *, snapshot, acc_dict, ...):
    slot_dict = acc_dict or {}   # ← acc_dict is an account snapshot, not a slot
    self._write_position_state(..., slot_dict={}, ...)  # ← correctly uses empty dict
    self._write_trade_reconstruction(
        snapshot,
        trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
        # acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
    )

The trade_id in the trade_reconstruction row will always be "" because acc_dict comes from kernel.snapshot()["account"] which has keys capital, equity, realized_pnl, etc. — not trade_id. This means the recovery trade_reconstruction row has no trade_id linkage.

32 KiB Raw Blame History

PINK DITAv2 — Structural Flaw Analysis (CENTRAL)

Combined Catalog (All Flaws, All Passes)

T-Series: Threading & Atomicity Flaws

E-Series: E2E Data-Flow Flaws (Pass 1)

F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)

G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)

H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)

A-Series: Architectural Flaws (detailed writeups)

Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size

Downstream effect

Exact trigger

Fix strategy

Flaw A2: Misleading CANCEL diagnostic code on entry-only slots

Downstream effect

Fix

Flaw A3: Float-accumulated slot.size after partial fills can go negative

Code path

Trigger

Fix

Flaw A4: Entry price is clobbered by mark_price if called before fill arrives

Code path

Fix

Flaw A5: Capital-before computation is arithmetic not snapshot-based

Code pattern (appears in two places)

Fix

Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange

The gap

Effect

Fix

Flaw A7: No fee tracking in kernel accounting

Downstream effect

Fix

Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input

Fix

Flaw A9: Mock venue submit condition convoluted — dead code paths

Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows

The problem

Fix

Flaw A11: VenueAdapter.submit() blocks the event loop

Fix

Flaw A12: Stale KernelStateView slot references after reconcile

Flaw A13: persist_fill_events uses current price as exit price

Flaw A14: _write_position_state maps active_leg_index to entry_bar

Flaw A15: persist_recovery_state passes account dict as slot dict

32 KiB

Raw Blame History

Flaw A13: `persist_fill_events` uses current price as exit price

Flaw A14: `_write_position_state` maps active_leg_index to entry_bar

Flaw A15: `persist_recovery_state` passes account dict as slot dict