First commit of the previously-untracked PINK-on-DITAv2 migration system (execution moves to the Rust kernel; policy stays on legacy DITA, so Alpha Engine algorithmic integrity is preserved). BLUE is untouched. Sprint 0 (safety snapshot + flaw-fix verification, MARKET single-leg scope): - Verified Rust FSM fixes (flaws 2,4,10,11,13) by source read of lib.rs. - Hardened 5 vacuous/guarded assertions in test_flaws.py so each flaw test genuinely exercises its fix. Most important: Flaw 5 now asserts capital moves by EXACTLY realized PnL (was entering/exiting at the same price). - Offline suites: 533 passed, 0 failed (35 flaws + 402 kernel/accounting/ bridge + 96 runtime/persistence/multi-exit/restart/seams). - GATE PASS: MARKET-path-critical flaws 1,2,5 confirmed fixed + green. - Added SPRINT0_FLAW_VERIFICATION.md report and _rust_kernel/.gitignore (excludes Rust target/ build artifacts). LIMIT/partial-fill remain explicitly out of scope (MARKET-only bring-up). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
32 KiB
CRITICAL: DITAv2 Execution Kernel — 13 Structural Flaws
Analysis date: 2026-05-30
Analyst: Systematic code review across Rust kernel, Python bridge, venue adapters, and test infrastructure
Scope: Full DITAv2 pipeline — kernel.py → rust_backend.py → _rust_kernel/src/lib.rs → bingx_venue.py → bingx_direct.py → BingX REST
How to read this document
Each flaw follows the same structure:
| Section | What you'll find |
|---|---|
| Location | File path(s) and approximate line numbers |
| Nature | What kind of defect — structural, logic, protocol, edge-case, missing-feature |
| Downstream effect | What breaks in practice, not just what the code does wrong |
| Exploit / trigger | The exact sequence of events that manifests the bug |
| Why it's not caught | Why existing tests (142/142 pass) don't detect it |
| Fix strategy | High-level approach; no patch code here |
Flaw 1: Entry-order cancellation is structurally broken
Location: rust_backend.py lines ~470–475 (Python bridge), _rust_kernel/src/lib.rs lines ~660–685 (Rust process_intent CANCEL branch), _rust_kernel/src/lib.rs lines ~740–748 (Rust on_venue_event CANCEL_ACK branch)
Nature: Missing feature / logic gap — two-layer hole
Downstream effect
A CANCEL intent submitted for an entry order (slot in ORDER_REQUESTED or ENTRY_WORKING) is silently ignored. The venue is never called, so the order remains live on the exchange. The caller receives an accepted=False, diagnostic_code=NO_ACTIVE_EXIT_ORDER outcome but no error is raised — normal execution continues.
With MARKET orders (the only type tested in the 142-scenario suite), this doesn't matter because the order fills in 1–3 seconds, arriving before the CANCEL even runs or making the CANCEL economically irrelevant. With LIMIT orders (per CRITICAL_NEEDED_PARTIAL_FILL_SUPPORT.md), resting orders on the book would be structurally impossible to cancel through the kernel.
Exact code path
Layer 1 — Python bridge (rust_backend.py):
elif intent.action == KernelCommandType.CANCEL:
emitted_events = self.venue.cancel(
self.slot(intent.slot_id).active_exit_order, # ← None for entry-only slots
...
) if self.slot(intent.slot_id).active_exit_order else [] # ← always []
The guard if self.slot(...).active_exit_order evaluates to False for any slot that only has an entry order. emitted_events stays []. The venue's cancel() is never called.
Layer 2 — Rust kernel process_intent (lib.rs):
KernelCommandType::CANCEL => {
if slot.active_exit_order.is_none() {
return KernelResult {
outcome: KernelOutcome {
accepted: false,
diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER,
...
},
...
};
}
// ... code only reachable if active_exit_order.is_some()
}
The Rust kernel also only looks for an exit order. It returns NO_ACTIVE_EXIT_ORDER for entry cancels.
Layer 3 — Rust kernel on_venue_event CANCEL_ACK (lib.rs):
KernelEventKind::CANCEL_ACK => {
if slot.active_exit_order.is_some() {
slot.active_exit_order = None;
slot.fsm_state = TradeStage::POSITION_OPEN;
}
}
Even if a CANCEL_ACK somehow arrived for an entry order, the Rust FSM has no branch to transition ENTRY_WORKING → IDLE on cancel. The slot would remain stuck.
Why it's not caught
The test suite has:
cancel_entry_order— ENTER → sleep 1s → CANCEL. By 1s the MARKET order has filled, so the slot is already POSITION_OPEN, making the CANCEL technically valid against active_exit_order? No — it's active_entry_order that's filled. But wait: when the entry fills, the Rust kernel transitions to POSITION_OPEN and keepsactive_entry_orderin place (filled state).active_exit_orderis still None. So the CANCEL still hits NO_ACTIVE_EXIT_ORDER. But the test only checks that capital is positive and exchange is flat — it never checksoutcome.acceptedoroutcome.diagnostic_codefor the CANCEL call.cancel_idempotent— Same pattern: ENTER → sleep 0.5s → CANCEL.double_cancel— Same.- All checks are pass/fail on capital + exchange flatness, not on whether the cancel actually did anything.
Fix strategy
- Add an
order_actionfield toKernelIntent(or use existingaction) to distinguish entry-cancel from exit-cancel - In the Python bridge, call
venue.cancel()onactive_entry_orderwhen the intent is CANCEL andactive_exit_orderis None - In the Rust kernel, add an
active_entry_orderbranch toprocess_intent(CANCEL)that transitionsENTRY_WORKING / ORDER_REQUESTED → IDLE - In the Rust kernel, add an
active_entry_orderbranch toon_venue_event(CANCEL_ACK)that transitions to IDLE
Flaw 2: Rust CANCEL FSM has no entry-order reset path
Location: _rust_kernel/src/lib.rs lines ~740–748
Nature: Missing FSM case — the on_venue_event handler for CANCEL_ACK only handles exit orders
Downstream effect
Even if the Python bridge were fixed to call venue.cancel() on the active entry order (fixing Flaw 1), and even if BingX returned a successful cancel-ack, the Rust kernel would not update the slot state. The slot would remain in ENTRY_WORKING with active_entry_order still attached. The kernel would believe the order is still live on the exchange.
No subsequent ENTER intent would be accepted (SLOT_BUSY). The slot would be permanently deadlocked until a manual reconcile_from_slots overwrites it.
Exact code path
KernelEventKind::CANCEL_ACK => {
if slot.active_exit_order.is_some() {
slot.active_exit_order = None;
slot.fsm_state = TradeStage::POSITION_OPEN;
}
// No else branch — silent no-op for entry cancels
}
The full FSM transition matrix for CANCEL_ACK should include:
ENTRY_WORKING, active_entry_order.is_some()→ clear entry order, set IDLEEXIT_WORKING, active_exit_order.is_some()→ clear exit order, set POSITION_OPEN (existing code)
Why it's not caught
Same reason as Flaw 1 — the cancel never fires, so CANCEL_ACK never arrives. The code path has never been exercised.
Fix strategy
Add an else if branch:
} else if slot.active_entry_order.is_some() {
slot.active_entry_order = None;
slot.trade_id.clear();
slot.asset.clear();
slot.side = TradeSide::FLAT;
slot.size = 0.0;
slot.initial_size = 0.0;
slot.fsm_state = TradeStage::IDLE;
}
Flaw 3: Python process_intent overwrites outcome with mixed-epoch state
Location: rust_backend.py lines ~490–505
Nature: Data consistency — returned KernelOutcome mixes pre-venue and post-venue state
Downstream effect
Any caller inspecting the returned KernelOutcome from process_intent() gets misleading information:
diagnostic_codeis from the Rust kernel's pre-venue opinionstateis from the slot after venue events were processedtransitionsonly contains pre-venue transitionsemitted_eventscorrectly contains post-venue events
A caller checking outcome.accepted == True and outcome.state == ORDER_REQUESTED (the Rust kernel's initial state) would be wrong — the slot is actually already in POSITION_OPEN because the fill arrived within the same function call.
Exact code path
result = _get_rust().process_intent(...) # Rust: IDLE → ORDER_REQUESTED
outcome = _outcome_from_payload(result["outcome"]) # state=ORDER_REQUESTED
# ... venue.submit() ... on_venue_event() ... transitions slot through ENTRY_WORKING → POSITION_OPEN
final_slot = self._get_slot(outcome.slot_id) # fsm_state=POSITION_OPEN now
final_outcome = KernelOutcome(
state=final_slot.fsm_state, # POSITION_OPEN ← post-venue
diagnostic_code=outcome.diagnostic_code, # OK ← pre-venue
transitions=outcome.transitions, # [IDLE→ORDER_REQUESTED] ← incomplete
emitted_events=tuple(emitted_events), # [ORDER_ACK, FULL_FILL] ← correct
)
Why it's not caught
No test inspects outcome.transitions or validates that outcome.state matches outcome.diagnostic_code. The outcome_inspect_entry test (_gen_test.py body) checks len(info["transitions"]) > 0 — which passes because there's at least one — and info["diagnostic"] == "OK". It doesn't check that the state in the outcome matches the diagnostic or that all transitions are present.
Fix strategy
Either:
- Re-read the Rust outcome after venue events complete (costly — additional FFI call), or
- Emit the venue-event transitions back from
on_venue_eventand append them to the returned outcome, or - Document that
outcome.transitionsis a partial snapshot and the caller should inspect the slot directly viak.slot(n)for current state
Flaw 4: Multi-leg exit final leg can double-close and double-settle
Location: _rust_kernel/src/lib.rs lines ~775–830, specifically the apply_fill exit path in on_venue_event
Nature: Logic error — redundant state mutation
Downstream effect
When a FULL_FILL closes the last leg of a multi-leg exit, the Rust kernel sets slot.fsm_state = CLOSED and slot.closed = true in two separate code blocks. Block A does it based on active_leg_index, block B does it independently based on slot.size <= 1e-12. Both blocks run on the same event.
In practice this doesn't double-settle because the Python side processes a single on_venue_event call. But the slot state after the event is unpredictable — block B clears active_entry_order and active_exit_order that block A left in place. If any code path depends on inspecting the orders after a close (e.g., for journaling), it sees inconsistent state.
Exact code path
// Block A (lines ~780-800):
if slot.active_leg_index >= slot.exit_leg_ratios.len() {
slot.closed = true;
slot.fsm_state = TradeStage::CLOSED;
slot.active_exit_order = None;
}
// Block B (lines ~810-830), runs unconditionally after block A:
if !partial {
slot.consume_exit_leg(); // advances leg index
if slot.size <= 1e-12 {
slot.closed = true; // redundant
slot.fsm_state = TradeStage::CLOSED; // redundant
slot.active_exit_order = None; // redundant
slot.active_entry_order = None; // extra — block A didn't do this
}
}
Why it's not caught
The multi-leg exit tests (multi_leg_exit, x4_partial_hold_exit, all leg ratio variants) check capital integrity and exchange flatness. They don't inspect the slot's active_entry_order or active_exit_order after exit. The final capital assertion passes because settle() is called once per on_venue_event call regardless of how many times the slot's internal flags toggle.
Fix strategy
Restructure apply_fill for exit fills so there's a single point where CLOSED is set:
- If
active_leg_index >= ratios.len()orsize <= 1e-12after the fill → set CLOSED - Not both independently
Flaw 5: Capital settlement only triggers on terminal states
Location: rust_backend.py lines ~520–525
Nature: Accounting accuracy — intra-trade realized PnL invisible to account projection
Downstream effect
When a LIMIT order partially fills (PARTIALLY_FILLED event), the Rust kernel correctly accumulates realized PnL on the slot:
slot.realized_pnl += realized;
But the Python bridge only pushes PnL to the account on terminal transitions:
if slot.fsm_state in {TradeStage.CLOSED, TradeStage.TRADE_TERMINAL_WRITTEN} and slot.realized_pnl != 0.0:
self.account.settle(slot.realized_pnl)
During a partial fill that leaves the slot in EXIT_WORKING, the accumulated PnL sits on the slot but never reaches account.snapshot.capital. For a LIMIT order that partially fills over several minutes, the system's view of available capital is stale during the entire fill window. This could cause the system to incorrectly calculate available margin for concurrent positions.
Exact trigger
- Slot is in POSITION_OPEN with size=1.0
- EXIT intent → slot moves to EXIT_WORKING
- Venue sends PARTIALLY_FILLED: filled_size=0.3, remaining_size=0.7
- Rust: slot.realized_pnl += +2.50 (3% gain on 30% of position)
- Python: slot.fsm_state == EXIT_WORKING (not CLOSED) → settle() is NOT called
account.snapshot.capitalstill shows pre-exit value- Venue sends FULL_FILL: filled_size=0.7, remaining_size=0.0
- Rust: slot.realized_pnl += +5.83 (remaining), total = 8.33
- Python: slot.fsm_state == CLOSED → settle(8.33) → capital jumps by full amount
For 3 minutes between step 4 and step 7, all downstream consumers see wrong capital.
Why it's not caught
All 142 tests use MARKET orders that fill instantly in one shot. There is never a multi-event fill sequence for a single order. The non-instant fills come from multi-leg exits (multiple MARKET orders), where each exit is a separate process_intent call with its own on_venue_event cycle, and each eventually reaches CLOSED independently.
Fix strategy
Change the settle trigger to fire on any realized PnL change, not just on terminal state transitions:
if slot.realized_pnl != self._last_settled_pnl.get(slot.slot_id, 0.0):
incremental = slot.realized_pnl - self._last_settled_pnl[slot.slot_id]
self.account.settle(incremental)
self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
Or simpler: settle the delta every time on_venue_event processes a fill event, regardless of slot state.
Flaw 6: _legacy_intent() silently drops order_type and limit_price
Location: bingx_venue.py lines ~280–295
Nature: Chain break — data loss at the Python level
Downstream effect
The CRITICAL_NEEDED_PARTIAL_FILL_SUPPORT.md spec adds order_type and limit_price to KernelIntent. But there are two venue adapters, and one of them strips the new fields:
BingxVenueAdapter receives KernelIntent and converts to LegacyIntent:
def submit(self, intent: KernelIntent) -> List[VenueEvent]:
receipt = self._call_backend("submit_intent", self._legacy_intent(intent))
_legacy_intent() builds a LegacyIntent — which has no order_type or limit_price fields:
return LegacyIntent(
timestamp=intent.timestamp,
trade_id=intent.trade_id,
decision_id=intent.intent_id,
asset=intent.asset,
action=action,
side=side,
reason=intent.reason,
target_size=float(intent.target_size),
leverage=float(intent.leverage),
reference_price=float(intent.reference_price),
confidence=1.0,
bars_held=0,
exit_leg_ratios=tuple(intent.exit_leg_ratios or (1.0,)),
metadata=dict(intent.metadata),
# order_type and limit_price are NOT HERE — silently dropped
)
The BingxDirectExecutionAdapter.submit_intent() receives LegacyIntent and uses intent.action, intent.side, intent.target_size, etc. — none of which carry the new fields.
MockVenueAdapter receives KernelIntent directly and would see the new fields — but it only uses intent.target_size, intent.reference_price, intent.side, and intent.action. order_type and limit_price are ignored there too.
So even after KernelIntent gains the new fields, no code path exists that reads them and passes them to the BingX REST payload.
Exact trigger
Someone constructs:
intent = KernelIntent(
action=ENTER, trade_id="t1",
order_type="LIMIT", limit_price=0.083456,
...
)
k.process_intent(intent)
The new fields survive through _intent_to_payload() to Rust (harmless — Rust ignores unknown fields), then back to Python. The Python bridge calls venue.submit(intent) with the intent that still has order_type="LIMIT". But bingx_venue.submit() converts to LegacyIntent — which drops them. bingx_direct.py sees a MARKET order.
Why it's not caught
The new fields don't exist yet. No test exercises LIMIT orders.
Fix strategy
The cleanest fix is to bypass _legacy_intent() for BingxVenueAdapter.submit() and pass KernelIntent directly to the adapter. The adapter's submit_intent() already has access to intent.asset, intent.side, etc. It just needs to receive the right type.
If BingxDirectExecutionAdapter must keep accepting LegacyIntent for backward compatibility, encode the new fields in LegacyIntent.metadata:
metadata = dict(intent.metadata)
metadata["_order_type"] = intent.order_type
metadata["_limit_price"] = intent.limit_price
Then on the adapter side, read intent.metadata.get("_order_type", "MARKET").
Flaw 7: Mock venue partial_fill_ratio applies to both entry and exit
Location: mock_venue.py lines ~60–90
Nature: Test infrastructure limitation — single ratio cannot distinguish entry vs exit
Downstream effect
The MockVenueScenario has one float: partial_fill_ratio: float = 1.0. When set to, say, 0.5, every submit() call produces a PARTIALLY_FILLED event with 50% fill — regardless of whether the intent is an ENTER or an EXIT.
This makes it impossible to write a mock-venue unit test that:
- Entry fills fully (ratio=1.0) but exit fills partially (ratio=0.5)
- Entry fills partially (ratio=0.3) and then fills fully on a second submit
- Different partial ratios per leg of a multi-leg exit
Exact code path
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
fill_ratio = max(0.0, min(1.0, float(self.scenario.partial_fill_ratio)))
fill_size = float(intent.target_size) * fill_ratio
# ... emits PARTIALLY_FILLED or FULL_FILL based on ratio
# No distinction between ENTER and EXIT
Why it's not caught
The mock venue is used in unit tests (test_rust_backend.py or similar), not in the live BingX e2e tests. The live tests use BingxVenueAdapter with real BingX VST, where MARKET orders always fill fully. The partial_fill_ratio path has never been used for a scenario that distinguishes entry from exit behavior.
Fix strategy
Add per-action-type ratios:
@dataclass(frozen=True)
class MockVenueScenario:
entry_partial_fill_ratio: float = 1.0
exit_partial_fill_ratio: float = 1.0
Or add a per-order override via intent.metadata.
Flaw 8: Per-asset price precision helper does not exist
Location: bingx_direct.py — _format_quantity() exists (line ~150) but _format_price() does not
Nature: Missing feature — LIMIT orders will be rejected by BingX
Downstream effect
BingX requires the price field of a LIMIT order to have the correct decimal precision for each symbol. The _format_quantity() method resolves size_increment from the instrument provider and quantizes the quantity. No equivalent exists for price.
Without it, submitting a LIMIT order with limit_price=0.08 for TRXUSDT sends "price": "0.08" to BingX. BingX expects 6 decimal places for TRXUSDT prices (e.g., 0.083456). The order is rejected with "code": 100001, "msg": "Invalid price precision".
| Symbol | Approx price | Required decimals | limit_price value |
What BingX expects |
|---|---|---|---|---|
| TRXUSDT | $0.08 | 6 | 0.083456 | "0.083456" |
| XRPUSDT | $0.52 | 4 | 0.5234 | "0.5234" |
| ADAUSDT | $0.45 | 4 | 0.4523 | "0.4523" |
| DOGEUSDT | $0.15 | 5 | 0.15234 | "0.15234" |
| BTCUSDT | $60,000 | 2 | 60000.50 | "60000.50" |
Why it's not caught
No LIMIT orders are submitted. All 142 tests use MARKET orders where type="MARKET" and no price field is sent.
Fix strategy
Add _format_price(self, asset: str, price: float) -> str mirroring _format_quantity:
def _format_price(self, asset: str, price: float) -> str:
instrument = self._resolve_instrument(asset)
if instrument is not None:
try:
price_step = Decimal(str(instrument.price_increment.as_decimal()))
value = Decimal(str(price))
quantized = (value / price_step).to_integral_value(rounding=ROUND_DOWN) * price_step
return _decimal_text(quantized)
except Exception:
pass
return f"{price:.8f}".rstrip("0").rstrip(".")
The instrument provider already exposes price_increment — it just needs to be accessed.
Flaw 9: Cancel path falls back to trade_id as symbol
Location: bingx_venue.py lines ~300–310 (within cancel())
Nature: Logic error — wrong variable in fallback chain
Downstream effect
When BingxVenueAdapter.cancel() is called and the order's metadata dict lacks an "asset" key, it falls back:
asset = str(order.metadata.get("asset") or order.internal_trade_id or order.venue_client_id or "")
order.internal_trade_id is the system's trade_id (e.g., "cancel-idle-1712345678"). This gets fed to self.backend._instrument_venue_symbol(asset) which does:
def _instrument_venue_symbol(self, asset: str) -> str:
text = _normalize_symbol(asset) # "CANCEL-IDLE-1712345678"
if text.endswith("USDT"):
return f"{text[:-4]}-USDT" # "CANCEL-IDLE-1712345678"-USDT — nonsense
return text # doesn't end with USDT → returns the garbage
The cancel HTTP call is sent to BingX with a symbol that doesn't exist. BingX returns an error or silently ignores the request. The cancel silently fails.
This can happen whenever a VenueOrder is constructed without metadata["asset"]. The mock venue's _event_from_order sets metadata={"intent_id": ..., "action": ...} but does not include "asset". So any cancel path triggered from a mock venue event will hit this bug.
Exact trigger sequence
MockVenueAdapter.submit()creates aVenueOrderwithmetadata={"intent_id": ..., "action": ...}— no"asset"- The kernel attaches this order to the slot
- A CANCEL intent arrives
- Python bridge calls
self.venue.cancel(self.slot(slot_id).active_entry_order) BingxVenueAdapter.cancel()doesorder.metadata.get("asset")→ None- Falls back to
order.internal_trade_id→ a trade_id string - Sends delete to BingX with a bogus symbol
Note: this only occurs when the mock venue is used in a test configuration. In live mode, BingxDirectExecutionAdapter stores richer metadata. But the fallback chain is still wrong and could bite in edge cases.
Why it's not caught
The live tests always have metadata["asset"] populated because the kernel attaches it before calling the venue. The mock venue's cancel path is only exercised in unit tests that don't check the BingX HTTP call content.
Fix strategy
Change the fallback to use the order's internal_trade_id to look up the slot's asset from the kernel, not try to interpret it as a symbol:
# In cancel(), before the fallback:
slot = self._kernel.slot(order.metadata.get("slot_id", 0))
asset = str(order.metadata.get("asset") or slot.asset or "")
Or at minimum, add the asset to the mock venue's event metadata.
Flaw 10: Event dedup window is bounded at 64
Location: _rust_kernel/src/lib.rs lines ~5 (constant), ~850–855 (eviction logic)
Nature: Resource management — fixed-size ring buffer with silent eviction
Downstream effect
Each TradeSlot tracks seen events in seen_event_ids: Vec<String>. When the vector exceeds 64 entries, the oldest entries are drained:
if slot.seen_event_ids.len() > MAX_SEEN_EVENT_IDS {
let overflow = slot.seen_event_ids.len() - MAX_SEEN_EVENT_IDS;
slot.seen_event_ids.drain(0..overflow);
}
This means:
- Events 1–64 are deduplicated correctly
- When event 65 arrives, event 1 is evicted. If event 1 arrives again, it's accepted as new
- When event 66 arrives, event 2 is evicted, etc.
- After 64 unique events, the dedup window is a rolling window of the last 64 events
With MARKET orders (1–3 events per trade), a slot would need ~20–60 trades before cycling through 64 events. With LIMIT orders that may receive many partial fills per order (e.g., a resting order that gets 5 fills/hour over 6 hours = 30 events), the limit could be hit in a single trade.
Why it's not caught
No test submits more than ~30 events to a single slot (rapid_ten_cycle does 10 entry→exit cycles = ~30 events). The 64 limit was never reached.
Fix strategy
Either:
- Increase
MAX_SEEN_EVENT_IDSto a larger value (256 or 1024), or - Use a proper LRU/size-bounded set (e.g.,
LruCachefrom thelrucrate), or - Change to a HashMap-based dedup keyed by
(event_id, action)so eviction is explicit
Flaw 11: Reconcile is a raw state override with no FSM validation
Location: _rust_kernel/src/lib.rs lines ~900–915 (dita_kernel_reconcile_slots_json)
Nature: Safety — no guards on incoming state
Downstream effect
The reconcile function blindly overwrites slot state:
for slot in slots {
if slot.slot_id < core.slots.len() {
core.slots[slot.slot_id] = slot.clone();
}
}
There is zero validation that the incoming slot state is a valid successor to the current state. A caller could:
- Set
fsm_state = POSITION_OPENwithsize = 0.0— the kernel thinks it has an open position with no size - Set
fsm_state = CLOSEDwithsize = 5.0— the kernel thinks a position is closed but still has size - Set
fsm_state = ENTRY_WORKINGwithtrade_id = ""— the kernel is in "entry working" state for no trade - Clear
seen_event_idsto reset dedup — silently accepting duplicates
The intended use is restoring kernel state from a snapshot after a crash, where the slot state was explicitly serialized by a previous kernel.snapshot(). In that case the state should be self-consistent. But there's no guard against malformed or corrupted snapshot data.
Why it's not caught
The reconcile tests (reconcile_empty, reconcile_after_entry, etc.) all reconcile with self-consistent slot data from k.slot(0). They never feed malformed state. The fresh_kernel_reconcile_* tests similarly use _slot_from_payload on data serialized from a real slot.
Fix strategy
Add validation in the Rust kernel (or Python bridge) that checks basic consistency:
fsm_state == POSITION_OPEN→size > 0andassetnon-emptyfsm_state == IDLE→size == 0andtrade_idemptyfsm_state == CLOSED→closed == truesize >= 0slot_idmatches array index
Flaw 12: outcome.transitions is incomplete — pre-venue only
Location: rust_backend.py lines ~490–505, _rust_kernel/src/lib.rs lines ~700–710
Nature: API contract — returned data is a partial snapshot
Downstream effect
process_intent() runs three phases in sequence:
- Rust kernel processes the intent (pure FSM:
IDLE → ORDER_REQUESTED) - Venue adapter submits to exchange (HTTP call, receives ack + fill)
- on_venue_event called per venue response (ORDER_ACK → ENTRY_WORKING, FULL_FILL → POSITION_OPEN)
Each phase produces KernelTransition records. But only phase 1 transitions appear in the returned KernelOutcome.transitions:
final_outcome = KernelOutcome(
...
transitions=outcome.transitions, # from Rust — phase 1 only
emitted_events=tuple(emitted_events), # from venue — phases 2-3
...
)
A caller inspecting transitions sees [IDLE → ORDER_REQUESTED] and has no way to discover that [ORDER_REQUESTED → ENTRY_WORKING] and [ENTRY_WORKING → POSITION_OPEN] also occurred. The journal (ClickHouseKernelJournal) records all transitions correctly — but the returned KernelOutcome is the API surface that callers interact with.
Why it's not caught
The outcome_inspect_entry test checks len(info["transitions"]) > 0 and info["diagnostic"] == "OK". It doesn't validate that all expected transitions are present. The transitions are journaled to the debug sink, but no test reads the journal.
Fix strategy
Collect transitions from phases 2-3 and append them to the outcome:
all_transitions = list(outcome.transitions)
for event in emitted_events:
event_outcome = self.on_venue_event(event)
all_transitions.extend(event_outcome.transitions)
final_outcome = KernelOutcome(..., transitions=tuple(all_transitions), ...)
Or document that transitions is an incomplete snapshot and the journal is the authoritative source.
Flaw 13: Slot realized PnL is not reset on re-entry after partial exit
Location: _rust_kernel/src/lib.rs lines ~575–600 (ENTER intent handler), specifically slot reset
Nature: State leakage — accumulated PnL from prior trade survives into next cycle
Downstream effect
When an ENTER intent arrives, the Rust kernel resets most slot fields:
slot.trade_id = intent.trade_id.clone();
slot.asset = intent.asset.clone();
slot.side = intent.side.clone();
slot.entry_time = Some(intent.timestamp);
slot.entry_price = 0.0;
slot.size = 0.0;
slot.initial_size = 0.0;
slot.unrealized_pnl = 0.0;
slot.realized_pnl = 0.0; // ← reset to zero
slot.exit_leg_ratios = ...;
slot.active_leg_index = 0;
slot.active_entry_order = None;
slot.active_exit_order = None;
slot.closed = false;
slot.last_event_time = None;
slot.fsm_state = TradeStage::ORDER_REQUESTED;
slot.realized_pnl = 0.0 is explicitly set — correct for a fresh trade. But recall from Flaw 5 that realized PnL from partial fills (before the terminal close) may not yet have been settled to the account. If the slot accumulates realized PnL during partial fills, then re-enters before the final settle happens, the in-flight PnL is zeroed without being settled.
This is actually the correct behavior because:
- All MARKET-order fills settle immediately (they arrive as FULL_FILL and transition to CLOSED in one shot)
- For LIMIT orders that partially fill, the re-entry scenario is impossible because the slot isn't IDLE — it can't accept a new ENTER until the position is fully closed
- The slot CAN re-enter after a full close, and by then all PnL has been settled
So this is a latent rather than active flaw. It would manifest if:
- A LIMIT order partially fills (PnL on slot, not settled)
- The remaining limit is cancelled
- The slot's
consume_exit_leg()leaves the slot in POSITION_OPEN withsize > 0and!closedbut no active orders - Another ENTER arrives — but the Rust kernel rejects it because
!slot.is_free()
So the slot design prevents this from happening accidentally. The flaw is that if a future code path bypasses the is_free() check (e.g., a force-enter feature), the unreleased PnL would be silently zeroed.
Why it's not caught
The scenario can't happen with the current FSM. All fills eventually reach CLOSED, which triggers settle. No test forces an entry on a non-free slot.
Fix strategy
Add an explicit assertion or sentinel in the ENTER handler:
if slot.realized_pnl.abs() > 1e-10 {
// Log warning: unsynchronized PnL being discarded
}
Or enforce that settle() is always called before realized_pnl is reset, by moving the settle trigger to the Rust side.
Summary table
| # | Flaw | Layer | Severity | Blocks partial-fill? |
|---|---|---|---|---|
| 1 | Entry-order cancellation broken | Python + Rust | Critical | Yes — can't cancel resting LIMIT entries |
| 2 | No CANCEL_ACK → IDLE for entry | Rust FSM | Critical | Yes — slot stuck after cancelled entry |
| 3 | Outcome mixes pre/post-venue state | Python bridge | Medium | No |
| 4 | Multi-leg exit double-close | Rust FSM | Low | No |
| 5 | Capital settle only on terminal state | Python bridge | High | Partial — stale capital during partial fills |
| 6 | order_type/limit_price dropped in legacy intent | Python venue | Critical | Yes — LIMIT orders never reach BingX |
| 7 | Mock venue single ratio for entry+exit | Mock venue | Low | No (mock tests only) |
| 8 | Missing price formatting | Adapter | High | Yes — BingX rejects bad price precision |
| 9 | Cancel falls back to trade_id as symbol | Python venue | Medium | No |
| 10 | Event dedup window at 64 | Rust FSM | Low | No |
| 11 | Reconcile has no FSM validation | Rust FSM | Low | No |
| 12 | Outcome transitions incomplete | Python bridge | Medium | No |
| 13 | Unsettled realized PnL on re-entry | Rust FSM | Low | No |
6 critical/high — must be fixed before safe LIMIT order / partial-fill deployment. 4 medium — should be fixed in the same pass to keep hygiene. 3 low — latent; fix opportunistically.