721 lines
32 KiB
Markdown
721 lines
32 KiB
Markdown
|
|
# CRITICAL: DITAv2 Execution Kernel — 13 Structural Flaws
|
|||
|
|
|
|||
|
|
**Analysis date:** 2026-05-30
|
|||
|
|
**Analyst:** Systematic code review across Rust kernel, Python bridge, venue adapters, and test infrastructure
|
|||
|
|
**Scope:** Full DITAv2 pipeline — `kernel.py` → `rust_backend.py` → `_rust_kernel/src/lib.rs` → `bingx_venue.py` → `bingx_direct.py` → BingX REST
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## How to read this document
|
|||
|
|
|
|||
|
|
Each flaw follows the same structure:
|
|||
|
|
|
|||
|
|
| Section | What you'll find |
|
|||
|
|
|---------|-----------------|
|
|||
|
|
| **Location** | File path(s) and approximate line numbers |
|
|||
|
|
| **Nature** | What kind of defect — structural, logic, protocol, edge-case, missing-feature |
|
|||
|
|
| **Downstream effect** | What breaks in practice, not just what the code does wrong |
|
|||
|
|
| **Exploit / trigger** | The exact sequence of events that manifests the bug |
|
|||
|
|
| **Why it's not caught** | Why existing tests (142/142 pass) don't detect it |
|
|||
|
|
| **Fix strategy** | High-level approach; no patch code here |
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 1: Entry-order cancellation is structurally broken
|
|||
|
|
|
|||
|
|
**Location:** `rust_backend.py` lines ~470–475 (Python bridge), `_rust_kernel/src/lib.rs` lines ~660–685 (Rust `process_intent` CANCEL branch), `_rust_kernel/src/lib.rs` lines ~740–748 (Rust `on_venue_event` CANCEL_ACK branch)
|
|||
|
|
|
|||
|
|
**Nature:** Missing feature / logic gap — two-layer hole
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
A CANCEL intent submitted for an entry order (slot in `ORDER_REQUESTED` or `ENTRY_WORKING`) is silently ignored. The venue is never called, so the order remains live on the exchange. The caller receives an `accepted=False, diagnostic_code=NO_ACTIVE_EXIT_ORDER` outcome but no error is raised — normal execution continues.
|
|||
|
|
|
|||
|
|
With MARKET orders (the only type tested in the 142-scenario suite), this doesn't matter because the order fills in 1–3 seconds, arriving before the CANCEL even runs or making the CANCEL economically irrelevant. With LIMIT orders (per `CRITICAL_NEEDED_PARTIAL_FILL_SUPPORT.md`), resting orders on the book would be **structurally impossible to cancel** through the kernel.
|
|||
|
|
|
|||
|
|
### Exact code path
|
|||
|
|
|
|||
|
|
**Layer 1 — Python bridge (rust_backend.py):**
|
|||
|
|
```python
|
|||
|
|
elif intent.action == KernelCommandType.CANCEL:
|
|||
|
|
emitted_events = self.venue.cancel(
|
|||
|
|
self.slot(intent.slot_id).active_exit_order, # ← None for entry-only slots
|
|||
|
|
...
|
|||
|
|
) if self.slot(intent.slot_id).active_exit_order else [] # ← always []
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The guard `if self.slot(...).active_exit_order` evaluates to `False` for any slot that only has an entry order. `emitted_events` stays `[]`. The venue's `cancel()` is never called.
|
|||
|
|
|
|||
|
|
**Layer 2 — Rust kernel process_intent (lib.rs):**
|
|||
|
|
```rust
|
|||
|
|
KernelCommandType::CANCEL => {
|
|||
|
|
if slot.active_exit_order.is_none() {
|
|||
|
|
return KernelResult {
|
|||
|
|
outcome: KernelOutcome {
|
|||
|
|
accepted: false,
|
|||
|
|
diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER,
|
|||
|
|
...
|
|||
|
|
},
|
|||
|
|
...
|
|||
|
|
};
|
|||
|
|
}
|
|||
|
|
// ... code only reachable if active_exit_order.is_some()
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The Rust kernel also only looks for an exit order. It returns `NO_ACTIVE_EXIT_ORDER` for entry cancels.
|
|||
|
|
|
|||
|
|
**Layer 3 — Rust kernel on_venue_event CANCEL_ACK (lib.rs):**
|
|||
|
|
```rust
|
|||
|
|
KernelEventKind::CANCEL_ACK => {
|
|||
|
|
if slot.active_exit_order.is_some() {
|
|||
|
|
slot.active_exit_order = None;
|
|||
|
|
slot.fsm_state = TradeStage::POSITION_OPEN;
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Even if a CANCEL_ACK somehow arrived for an entry order, the Rust FSM has no branch to transition `ENTRY_WORKING → IDLE` on cancel. The slot would remain stuck.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The test suite has:
|
|||
|
|
- `cancel_entry_order` — ENTER → sleep 1s → CANCEL. By 1s the MARKET order has filled, so the slot is already POSITION_OPEN, making the CANCEL technically valid against active_exit_order? No — it's active_entry_order that's filled. But wait: when the entry fills, the Rust kernel transitions to POSITION_OPEN and keeps `active_entry_order` in place (filled state). `active_exit_order` is still None. So the CANCEL still hits NO_ACTIVE_EXIT_ORDER. But the test only checks that capital is positive and exchange is flat — it never checks `outcome.accepted` or `outcome.diagnostic_code` for the CANCEL call.
|
|||
|
|
- `cancel_idempotent` — Same pattern: ENTER → sleep 0.5s → CANCEL.
|
|||
|
|
- `double_cancel` — Same.
|
|||
|
|
- All checks are pass/fail on capital + exchange flatness, not on whether the cancel actually did anything.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
1. Add an `order_action` field to `KernelIntent` (or use existing `action`) to distinguish entry-cancel from exit-cancel
|
|||
|
|
2. In the Python bridge, call `venue.cancel()` on `active_entry_order` when the intent is CANCEL and `active_exit_order` is None
|
|||
|
|
3. In the Rust kernel, add an `active_entry_order` branch to `process_intent(CANCEL)` that transitions `ENTRY_WORKING / ORDER_REQUESTED → IDLE`
|
|||
|
|
4. In the Rust kernel, add an `active_entry_order` branch to `on_venue_event(CANCEL_ACK)` that transitions to IDLE
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 2: Rust CANCEL FSM has no entry-order reset path
|
|||
|
|
|
|||
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~740–748
|
|||
|
|
|
|||
|
|
**Nature:** Missing FSM case — the `on_venue_event` handler for `CANCEL_ACK` only handles exit orders
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
Even if the Python bridge were fixed to call `venue.cancel()` on the active entry order (fixing Flaw 1), and even if BingX returned a successful cancel-ack, the Rust kernel **would not update the slot state**. The slot would remain in `ENTRY_WORKING` with `active_entry_order` still attached. The kernel would believe the order is still live on the exchange.
|
|||
|
|
|
|||
|
|
No subsequent `ENTER` intent would be accepted (SLOT_BUSY). The slot would be permanently deadlocked until a manual `reconcile_from_slots` overwrites it.
|
|||
|
|
|
|||
|
|
### Exact code path
|
|||
|
|
|
|||
|
|
```rust
|
|||
|
|
KernelEventKind::CANCEL_ACK => {
|
|||
|
|
if slot.active_exit_order.is_some() {
|
|||
|
|
slot.active_exit_order = None;
|
|||
|
|
slot.fsm_state = TradeStage::POSITION_OPEN;
|
|||
|
|
}
|
|||
|
|
// No else branch — silent no-op for entry cancels
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The full FSM transition matrix for CANCEL_ACK should include:
|
|||
|
|
- `ENTRY_WORKING, active_entry_order.is_some()` → clear entry order, set IDLE
|
|||
|
|
- `EXIT_WORKING, active_exit_order.is_some()` → clear exit order, set POSITION_OPEN (existing code)
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
Same reason as Flaw 1 — the cancel never fires, so CANCEL_ACK never arrives. The code path has never been exercised.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Add an `else if` branch:
|
|||
|
|
```rust
|
|||
|
|
} else if slot.active_entry_order.is_some() {
|
|||
|
|
slot.active_entry_order = None;
|
|||
|
|
slot.trade_id.clear();
|
|||
|
|
slot.asset.clear();
|
|||
|
|
slot.side = TradeSide::FLAT;
|
|||
|
|
slot.size = 0.0;
|
|||
|
|
slot.initial_size = 0.0;
|
|||
|
|
slot.fsm_state = TradeStage::IDLE;
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 3: Python `process_intent` overwrites outcome with mixed-epoch state
|
|||
|
|
|
|||
|
|
**Location:** `rust_backend.py` lines ~490–505
|
|||
|
|
|
|||
|
|
**Nature:** Data consistency — returned `KernelOutcome` mixes pre-venue and post-venue state
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
Any caller inspecting the returned `KernelOutcome` from `process_intent()` gets misleading information:
|
|||
|
|
- `diagnostic_code` is from the Rust kernel's pre-venue opinion
|
|||
|
|
- `state` is from the slot **after** venue events were processed
|
|||
|
|
- `transitions` only contains pre-venue transitions
|
|||
|
|
- `emitted_events` correctly contains post-venue events
|
|||
|
|
|
|||
|
|
A caller checking `outcome.accepted == True` and `outcome.state == ORDER_REQUESTED` (the Rust kernel's initial state) would be wrong — the slot is actually already in `POSITION_OPEN` because the fill arrived within the same function call.
|
|||
|
|
|
|||
|
|
### Exact code path
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
result = _get_rust().process_intent(...) # Rust: IDLE → ORDER_REQUESTED
|
|||
|
|
outcome = _outcome_from_payload(result["outcome"]) # state=ORDER_REQUESTED
|
|||
|
|
|
|||
|
|
# ... venue.submit() ... on_venue_event() ... transitions slot through ENTRY_WORKING → POSITION_OPEN
|
|||
|
|
|
|||
|
|
final_slot = self._get_slot(outcome.slot_id) # fsm_state=POSITION_OPEN now
|
|||
|
|
|
|||
|
|
final_outcome = KernelOutcome(
|
|||
|
|
state=final_slot.fsm_state, # POSITION_OPEN ← post-venue
|
|||
|
|
diagnostic_code=outcome.diagnostic_code, # OK ← pre-venue
|
|||
|
|
transitions=outcome.transitions, # [IDLE→ORDER_REQUESTED] ← incomplete
|
|||
|
|
emitted_events=tuple(emitted_events), # [ORDER_ACK, FULL_FILL] ← correct
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
No test inspects `outcome.transitions` or validates that `outcome.state` matches `outcome.diagnostic_code`. The `outcome_inspect_entry` test (`_gen_test.py` body) checks `len(info["transitions"]) > 0` — which passes because there's at least one — and `info["diagnostic"] == "OK"`. It doesn't check that the state in the outcome matches the diagnostic or that all transitions are present.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Either:
|
|||
|
|
1. Re-read the Rust outcome after venue events complete (costly — additional FFI call), or
|
|||
|
|
2. Emit the venue-event transitions back from `on_venue_event` and append them to the returned outcome, or
|
|||
|
|
3. Document that `outcome.transitions` is a partial snapshot and the caller should inspect the slot directly via `k.slot(n)` for current state
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 4: Multi-leg exit final leg can double-close and double-settle
|
|||
|
|
|
|||
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~775–830, specifically the `apply_fill` exit path in `on_venue_event`
|
|||
|
|
|
|||
|
|
**Nature:** Logic error — redundant state mutation
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
When a FULL_FILL closes the last leg of a multi-leg exit, the Rust kernel sets `slot.fsm_state = CLOSED` and `slot.closed = true` in two separate code blocks. Block A does it based on `active_leg_index`, block B does it independently based on `slot.size <= 1e-12`. Both blocks run on the same event.
|
|||
|
|
|
|||
|
|
In practice this doesn't double-settle because the Python side processes a single `on_venue_event` call. But the slot state after the event is unpredictable — block B clears `active_entry_order` and `active_exit_order` that block A left in place. If any code path depends on inspecting the orders after a close (e.g., for journaling), it sees inconsistent state.
|
|||
|
|
|
|||
|
|
### Exact code path
|
|||
|
|
|
|||
|
|
```rust
|
|||
|
|
// Block A (lines ~780-800):
|
|||
|
|
if slot.active_leg_index >= slot.exit_leg_ratios.len() {
|
|||
|
|
slot.closed = true;
|
|||
|
|
slot.fsm_state = TradeStage::CLOSED;
|
|||
|
|
slot.active_exit_order = None;
|
|||
|
|
}
|
|||
|
|
|
|||
|
|
// Block B (lines ~810-830), runs unconditionally after block A:
|
|||
|
|
if !partial {
|
|||
|
|
slot.consume_exit_leg(); // advances leg index
|
|||
|
|
if slot.size <= 1e-12 {
|
|||
|
|
slot.closed = true; // redundant
|
|||
|
|
slot.fsm_state = TradeStage::CLOSED; // redundant
|
|||
|
|
slot.active_exit_order = None; // redundant
|
|||
|
|
slot.active_entry_order = None; // extra — block A didn't do this
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The multi-leg exit tests (`multi_leg_exit`, `x4_partial_hold_exit`, all leg ratio variants) check capital integrity and exchange flatness. They don't inspect the slot's `active_entry_order` or `active_exit_order` after exit. The final capital assertion passes because `settle()` is called once per `on_venue_event` call regardless of how many times the slot's internal flags toggle.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Restructure `apply_fill` for exit fills so there's a single point where `CLOSED` is set:
|
|||
|
|
- If `active_leg_index >= ratios.len()` **or** `size <= 1e-12` after the fill → set CLOSED
|
|||
|
|
- Not both independently
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 5: Capital settlement only triggers on terminal states
|
|||
|
|
|
|||
|
|
**Location:** `rust_backend.py` lines ~520–525
|
|||
|
|
|
|||
|
|
**Nature:** Accounting accuracy — intra-trade realized PnL invisible to account projection
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
When a LIMIT order partially fills (PARTIALLY_FILLED event), the Rust kernel correctly accumulates realized PnL on the slot:
|
|||
|
|
```rust
|
|||
|
|
slot.realized_pnl += realized;
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
But the Python bridge only pushes PnL to the account on terminal transitions:
|
|||
|
|
```python
|
|||
|
|
if slot.fsm_state in {TradeStage.CLOSED, TradeStage.TRADE_TERMINAL_WRITTEN} and slot.realized_pnl != 0.0:
|
|||
|
|
self.account.settle(slot.realized_pnl)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
During a partial fill that leaves the slot in EXIT_WORKING, the accumulated PnL sits on the slot but never reaches `account.snapshot.capital`. For a LIMIT order that partially fills over several minutes, the system's view of available capital is **stale** during the entire fill window. This could cause the system to incorrectly calculate available margin for concurrent positions.
|
|||
|
|
|
|||
|
|
### Exact trigger
|
|||
|
|
|
|||
|
|
1. Slot is in POSITION_OPEN with size=1.0
|
|||
|
|
2. EXIT intent → slot moves to EXIT_WORKING
|
|||
|
|
3. Venue sends PARTIALLY_FILLED: filled_size=0.3, remaining_size=0.7
|
|||
|
|
4. Rust: slot.realized_pnl += +2.50 (3% gain on 30% of position)
|
|||
|
|
5. Python: slot.fsm_state == EXIT_WORKING (not CLOSED) → settle() is NOT called
|
|||
|
|
6. `account.snapshot.capital` still shows pre-exit value
|
|||
|
|
7. Venue sends FULL_FILL: filled_size=0.7, remaining_size=0.0
|
|||
|
|
8. Rust: slot.realized_pnl += +5.83 (remaining), total = 8.33
|
|||
|
|
9. Python: slot.fsm_state == CLOSED → settle(8.33) → capital jumps by full amount
|
|||
|
|
|
|||
|
|
For 3 minutes between step 4 and step 7, all downstream consumers see wrong capital.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
All 142 tests use MARKET orders that fill instantly in one shot. There is never a multi-event fill sequence for a single order. The non-instant fills come from multi-leg exits (multiple MARKET orders), where each exit is a separate `process_intent` call with its own `on_venue_event` cycle, and each eventually reaches CLOSED independently.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Change the settle trigger to fire on **any realized PnL change**, not just on terminal state transitions:
|
|||
|
|
```python
|
|||
|
|
if slot.realized_pnl != self._last_settled_pnl.get(slot.slot_id, 0.0):
|
|||
|
|
incremental = slot.realized_pnl - self._last_settled_pnl[slot.slot_id]
|
|||
|
|
self.account.settle(incremental)
|
|||
|
|
self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Or simpler: settle the delta every time `on_venue_event` processes a fill event, regardless of slot state.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 6: `_legacy_intent()` silently drops `order_type` and `limit_price`
|
|||
|
|
|
|||
|
|
**Location:** `bingx_venue.py` lines ~280–295
|
|||
|
|
|
|||
|
|
**Nature:** Chain break — data loss at the Python level
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
The `CRITICAL_NEEDED_PARTIAL_FILL_SUPPORT.md` spec adds `order_type` and `limit_price` to `KernelIntent`. But there are **two** venue adapters, and one of them strips the new fields:
|
|||
|
|
|
|||
|
|
**BingxVenueAdapter** receives `KernelIntent` and converts to `LegacyIntent`:
|
|||
|
|
```python
|
|||
|
|
def submit(self, intent: KernelIntent) -> List[VenueEvent]:
|
|||
|
|
receipt = self._call_backend("submit_intent", self._legacy_intent(intent))
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
`_legacy_intent()` builds a `LegacyIntent` — which has no `order_type` or `limit_price` fields:
|
|||
|
|
```python
|
|||
|
|
return LegacyIntent(
|
|||
|
|
timestamp=intent.timestamp,
|
|||
|
|
trade_id=intent.trade_id,
|
|||
|
|
decision_id=intent.intent_id,
|
|||
|
|
asset=intent.asset,
|
|||
|
|
action=action,
|
|||
|
|
side=side,
|
|||
|
|
reason=intent.reason,
|
|||
|
|
target_size=float(intent.target_size),
|
|||
|
|
leverage=float(intent.leverage),
|
|||
|
|
reference_price=float(intent.reference_price),
|
|||
|
|
confidence=1.0,
|
|||
|
|
bars_held=0,
|
|||
|
|
exit_leg_ratios=tuple(intent.exit_leg_ratios or (1.0,)),
|
|||
|
|
metadata=dict(intent.metadata),
|
|||
|
|
# order_type and limit_price are NOT HERE — silently dropped
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The `BingxDirectExecutionAdapter.submit_intent()` receives `LegacyIntent` and uses `intent.action`, `intent.side`, `intent.target_size`, etc. — none of which carry the new fields.
|
|||
|
|
|
|||
|
|
**MockVenueAdapter** receives `KernelIntent` directly and *would* see the new fields — but it only uses `intent.target_size`, `intent.reference_price`, `intent.side`, and `intent.action`. `order_type` and `limit_price` are ignored there too.
|
|||
|
|
|
|||
|
|
So even after `KernelIntent` gains the new fields, **no code path exists** that reads them and passes them to the BingX REST payload.
|
|||
|
|
|
|||
|
|
### Exact trigger
|
|||
|
|
|
|||
|
|
Someone constructs:
|
|||
|
|
```python
|
|||
|
|
intent = KernelIntent(
|
|||
|
|
action=ENTER, trade_id="t1",
|
|||
|
|
order_type="LIMIT", limit_price=0.083456,
|
|||
|
|
...
|
|||
|
|
)
|
|||
|
|
k.process_intent(intent)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The new fields survive through `_intent_to_payload()` to Rust (harmless — Rust ignores unknown fields), then back to Python. The Python bridge calls `venue.submit(intent)` with the `intent` that still has `order_type="LIMIT"`. But `bingx_venue.submit()` converts to `LegacyIntent` — which drops them. `bingx_direct.py` sees a MARKET order.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The new fields don't exist yet. No test exercises LIMIT orders.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
The cleanest fix is to **bypass `_legacy_intent()`** for `BingxVenueAdapter.submit()` and pass `KernelIntent` directly to the adapter. The adapter's `submit_intent()` already has access to `intent.asset`, `intent.side`, etc. It just needs to receive the right type.
|
|||
|
|
|
|||
|
|
If `BingxDirectExecutionAdapter` must keep accepting `LegacyIntent` for backward compatibility, encode the new fields in `LegacyIntent.metadata`:
|
|||
|
|
```python
|
|||
|
|
metadata = dict(intent.metadata)
|
|||
|
|
metadata["_order_type"] = intent.order_type
|
|||
|
|
metadata["_limit_price"] = intent.limit_price
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Then on the adapter side, read `intent.metadata.get("_order_type", "MARKET")`.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 7: Mock venue partial_fill_ratio applies to both entry and exit
|
|||
|
|
|
|||
|
|
**Location:** `mock_venue.py` lines ~60–90
|
|||
|
|
|
|||
|
|
**Nature:** Test infrastructure limitation — single ratio cannot distinguish entry vs exit
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
The `MockVenueScenario` has one float: `partial_fill_ratio: float = 1.0`. When set to, say, `0.5`, **every** `submit()` call produces a `PARTIALLY_FILLED` event with 50% fill — regardless of whether the intent is an ENTER or an EXIT.
|
|||
|
|
|
|||
|
|
This makes it impossible to write a mock-venue unit test that:
|
|||
|
|
- Entry fills fully (ratio=1.0) but exit fills partially (ratio=0.5)
|
|||
|
|
- Entry fills partially (ratio=0.3) and then fills fully on a second submit
|
|||
|
|
- Different partial ratios per leg of a multi-leg exit
|
|||
|
|
|
|||
|
|
### Exact code path
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
|
|||
|
|
fill_ratio = max(0.0, min(1.0, float(self.scenario.partial_fill_ratio)))
|
|||
|
|
fill_size = float(intent.target_size) * fill_ratio
|
|||
|
|
# ... emits PARTIALLY_FILLED or FULL_FILL based on ratio
|
|||
|
|
# No distinction between ENTER and EXIT
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The mock venue is used in unit tests (`test_rust_backend.py` or similar), not in the live BingX e2e tests. The live tests use `BingxVenueAdapter` with real BingX VST, where MARKET orders always fill fully. The partial_fill_ratio path has never been used for a scenario that distinguishes entry from exit behavior.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Add per-action-type ratios:
|
|||
|
|
```python
|
|||
|
|
@dataclass(frozen=True)
|
|||
|
|
class MockVenueScenario:
|
|||
|
|
entry_partial_fill_ratio: float = 1.0
|
|||
|
|
exit_partial_fill_ratio: float = 1.0
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Or add a per-order override via `intent.metadata`.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 8: Per-asset price precision helper does not exist
|
|||
|
|
|
|||
|
|
**Location:** `bingx_direct.py` — `_format_quantity()` exists (line ~150) but `_format_price()` does not
|
|||
|
|
|
|||
|
|
**Nature:** Missing feature — LIMIT orders will be rejected by BingX
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
BingX requires the `price` field of a LIMIT order to have the correct decimal precision for each symbol. The `_format_quantity()` method resolves `size_increment` from the instrument provider and quantizes the quantity. No equivalent exists for price.
|
|||
|
|
|
|||
|
|
Without it, submitting a LIMIT order with `limit_price=0.08` for TRXUSDT sends `"price": "0.08"` to BingX. BingX expects 6 decimal places for TRXUSDT prices (e.g., `0.083456`). The order is rejected with `"code": 100001, "msg": "Invalid price precision"`.
|
|||
|
|
|
|||
|
|
| Symbol | Approx price | Required decimals | `limit_price` value | What BingX expects |
|
|||
|
|
|--------|-------------|-------------------|-------------------|-------------------|
|
|||
|
|
| TRXUSDT | $0.08 | 6 | 0.083456 | `"0.083456"` |
|
|||
|
|
| XRPUSDT | $0.52 | 4 | 0.5234 | `"0.5234"` |
|
|||
|
|
| ADAUSDT | $0.45 | 4 | 0.4523 | `"0.4523"` |
|
|||
|
|
| DOGEUSDT | $0.15 | 5 | 0.15234 | `"0.15234"` |
|
|||
|
|
| BTCUSDT | $60,000 | 2 | 60000.50 | `"60000.50"` |
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
No LIMIT orders are submitted. All 142 tests use MARKET orders where `type="MARKET"` and no `price` field is sent.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Add `_format_price(self, asset: str, price: float) -> str` mirroring `_format_quantity`:
|
|||
|
|
```python
|
|||
|
|
def _format_price(self, asset: str, price: float) -> str:
|
|||
|
|
instrument = self._resolve_instrument(asset)
|
|||
|
|
if instrument is not None:
|
|||
|
|
try:
|
|||
|
|
price_step = Decimal(str(instrument.price_increment.as_decimal()))
|
|||
|
|
value = Decimal(str(price))
|
|||
|
|
quantized = (value / price_step).to_integral_value(rounding=ROUND_DOWN) * price_step
|
|||
|
|
return _decimal_text(quantized)
|
|||
|
|
except Exception:
|
|||
|
|
pass
|
|||
|
|
return f"{price:.8f}".rstrip("0").rstrip(".")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The instrument provider already exposes `price_increment` — it just needs to be accessed.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 9: Cancel path falls back to trade_id as symbol
|
|||
|
|
|
|||
|
|
**Location:** `bingx_venue.py` lines ~300–310 (within `cancel()`)
|
|||
|
|
|
|||
|
|
**Nature:** Logic error — wrong variable in fallback chain
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
When `BingxVenueAdapter.cancel()` is called and the order's `metadata` dict lacks an `"asset"` key, it falls back:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
asset = str(order.metadata.get("asset") or order.internal_trade_id or order.venue_client_id or "")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
`order.internal_trade_id` is the system's trade_id (e.g., `"cancel-idle-1712345678"`). This gets fed to `self.backend._instrument_venue_symbol(asset)` which does:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
def _instrument_venue_symbol(self, asset: str) -> str:
|
|||
|
|
text = _normalize_symbol(asset) # "CANCEL-IDLE-1712345678"
|
|||
|
|
if text.endswith("USDT"):
|
|||
|
|
return f"{text[:-4]}-USDT" # "CANCEL-IDLE-1712345678"-USDT — nonsense
|
|||
|
|
return text # doesn't end with USDT → returns the garbage
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The cancel HTTP call is sent to BingX with a symbol that doesn't exist. BingX returns an error or silently ignores the request. The cancel silently fails.
|
|||
|
|
|
|||
|
|
This can happen whenever a `VenueOrder` is constructed without `metadata["asset"]`. The mock venue's `_event_from_order` sets `metadata={"intent_id": ..., "action": ...}` but does **not** include `"asset"`. So any cancel path triggered from a mock venue event will hit this bug.
|
|||
|
|
|
|||
|
|
### Exact trigger sequence
|
|||
|
|
|
|||
|
|
1. `MockVenueAdapter.submit()` creates a `VenueOrder` with `metadata={"intent_id": ..., "action": ...}` — no `"asset"`
|
|||
|
|
2. The kernel attaches this order to the slot
|
|||
|
|
3. A CANCEL intent arrives
|
|||
|
|
4. Python bridge calls `self.venue.cancel(self.slot(slot_id).active_entry_order)`
|
|||
|
|
5. `BingxVenueAdapter.cancel()` does `order.metadata.get("asset")` → None
|
|||
|
|
6. Falls back to `order.internal_trade_id` → a trade_id string
|
|||
|
|
7. Sends delete to BingX with a bogus symbol
|
|||
|
|
|
|||
|
|
Note: this only occurs when the mock venue is used in a test configuration. In live mode, `BingxDirectExecutionAdapter` stores richer metadata. But the fallback chain is still wrong and could bite in edge cases.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The live tests always have `metadata["asset"]` populated because the kernel attaches it before calling the venue. The mock venue's cancel path is only exercised in unit tests that don't check the BingX HTTP call content.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Change the fallback to use the order's `internal_trade_id` to look up the slot's asset from the kernel, not try to interpret it as a symbol:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# In cancel(), before the fallback:
|
|||
|
|
slot = self._kernel.slot(order.metadata.get("slot_id", 0))
|
|||
|
|
asset = str(order.metadata.get("asset") or slot.asset or "")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Or at minimum, add the asset to the mock venue's event metadata.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 10: Event dedup window is bounded at 64
|
|||
|
|
|
|||
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~5 (constant), ~850–855 (eviction logic)
|
|||
|
|
|
|||
|
|
**Nature:** Resource management — fixed-size ring buffer with silent eviction
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
Each `TradeSlot` tracks seen events in `seen_event_ids: Vec<String>`. When the vector exceeds 64 entries, the oldest entries are drained:
|
|||
|
|
|
|||
|
|
```rust
|
|||
|
|
if slot.seen_event_ids.len() > MAX_SEEN_EVENT_IDS {
|
|||
|
|
let overflow = slot.seen_event_ids.len() - MAX_SEEN_EVENT_IDS;
|
|||
|
|
slot.seen_event_ids.drain(0..overflow);
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
This means:
|
|||
|
|
- Events 1–64 are deduplicated correctly
|
|||
|
|
- When event 65 arrives, event 1 is evicted. If event 1 arrives again, it's accepted as new
|
|||
|
|
- When event 66 arrives, event 2 is evicted, etc.
|
|||
|
|
- After 64 unique events, the dedup window is a rolling window of the last 64 events
|
|||
|
|
|
|||
|
|
With MARKET orders (1–3 events per trade), a slot would need ~20–60 trades before cycling through 64 events. With LIMIT orders that may receive many partial fills per order (e.g., a resting order that gets 5 fills/hour over 6 hours = 30 events), the limit could be hit in a single trade.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
No test submits more than ~30 events to a single slot (`rapid_ten_cycle` does 10 entry→exit cycles = ~30 events). The 64 limit was never reached.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Either:
|
|||
|
|
1. Increase `MAX_SEEN_EVENT_IDS` to a larger value (256 or 1024), or
|
|||
|
|
2. Use a proper LRU/size-bounded set (e.g., `LruCache` from the `lru` crate), or
|
|||
|
|
3. Change to a HashMap-based dedup keyed by `(event_id, action)` so eviction is explicit
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 11: Reconcile is a raw state override with no FSM validation
|
|||
|
|
|
|||
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~900–915 (`dita_kernel_reconcile_slots_json`)
|
|||
|
|
|
|||
|
|
**Nature:** Safety — no guards on incoming state
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
The reconcile function blindly overwrites slot state:
|
|||
|
|
|
|||
|
|
```rust
|
|||
|
|
for slot in slots {
|
|||
|
|
if slot.slot_id < core.slots.len() {
|
|||
|
|
core.slots[slot.slot_id] = slot.clone();
|
|||
|
|
}
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
There is **zero validation** that the incoming slot state is a valid successor to the current state. A caller could:
|
|||
|
|
|
|||
|
|
- Set `fsm_state = POSITION_OPEN` with `size = 0.0` — the kernel thinks it has an open position with no size
|
|||
|
|
- Set `fsm_state = CLOSED` with `size = 5.0` — the kernel thinks a position is closed but still has size
|
|||
|
|
- Set `fsm_state = ENTRY_WORKING` with `trade_id = ""` — the kernel is in "entry working" state for no trade
|
|||
|
|
- Clear `seen_event_ids` to reset dedup — silently accepting duplicates
|
|||
|
|
|
|||
|
|
The intended use is restoring kernel state from a snapshot after a crash, where the slot state was explicitly serialized by a previous `kernel.snapshot()`. In that case the state should be self-consistent. But there's no guard against malformed or corrupted snapshot data.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The reconcile tests (`reconcile_empty`, `reconcile_after_entry`, etc.) all reconcile with self-consistent slot data from `k.slot(0)`. They never feed malformed state. The `fresh_kernel_reconcile_*` tests similarly use `_slot_from_payload` on data serialized from a real slot.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Add validation in the Rust kernel (or Python bridge) that checks basic consistency:
|
|||
|
|
- `fsm_state == POSITION_OPEN` → `size > 0` and `asset` non-empty
|
|||
|
|
- `fsm_state == IDLE` → `size == 0` and `trade_id` empty
|
|||
|
|
- `fsm_state == CLOSED` → `closed == true`
|
|||
|
|
- `size >= 0`
|
|||
|
|
- `slot_id` matches array index
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 12: `outcome.transitions` is incomplete — pre-venue only
|
|||
|
|
|
|||
|
|
**Location:** `rust_backend.py` lines ~490–505, `_rust_kernel/src/lib.rs` lines ~700–710
|
|||
|
|
|
|||
|
|
**Nature:** API contract — returned data is a partial snapshot
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
`process_intent()` runs three phases in sequence:
|
|||
|
|
1. **Rust kernel** processes the intent (pure FSM: `IDLE → ORDER_REQUESTED`)
|
|||
|
|
2. **Venue adapter** submits to exchange (HTTP call, receives ack + fill)
|
|||
|
|
3. **on_venue_event** called per venue response (ORDER_ACK → ENTRY_WORKING, FULL_FILL → POSITION_OPEN)
|
|||
|
|
|
|||
|
|
Each phase produces `KernelTransition` records. But only **phase 1** transitions appear in the returned `KernelOutcome.transitions`:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
final_outcome = KernelOutcome(
|
|||
|
|
...
|
|||
|
|
transitions=outcome.transitions, # from Rust — phase 1 only
|
|||
|
|
emitted_events=tuple(emitted_events), # from venue — phases 2-3
|
|||
|
|
...
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
A caller inspecting transitions sees `[IDLE → ORDER_REQUESTED]` and has no way to discover that `[ORDER_REQUESTED → ENTRY_WORKING]` and `[ENTRY_WORKING → POSITION_OPEN]` also occurred. The journal (`ClickHouseKernelJournal`) records all transitions correctly — but the returned `KernelOutcome` is the API surface that callers interact with.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The `outcome_inspect_entry` test checks `len(info["transitions"]) > 0` and `info["diagnostic"] == "OK"`. It doesn't validate that all expected transitions are present. The transitions are journaled to the debug sink, but no test reads the journal.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Collect transitions from phases 2-3 and append them to the outcome:
|
|||
|
|
```python
|
|||
|
|
all_transitions = list(outcome.transitions)
|
|||
|
|
for event in emitted_events:
|
|||
|
|
event_outcome = self.on_venue_event(event)
|
|||
|
|
all_transitions.extend(event_outcome.transitions)
|
|||
|
|
final_outcome = KernelOutcome(..., transitions=tuple(all_transitions), ...)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Or document that `transitions` is an incomplete snapshot and the journal is the authoritative source.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Flaw 13: Slot realized PnL is not reset on re-entry after partial exit
|
|||
|
|
|
|||
|
|
**Location:** `_rust_kernel/src/lib.rs` lines ~575–600 (ENTER intent handler), specifically slot reset
|
|||
|
|
|
|||
|
|
**Nature:** State leakage — accumulated PnL from prior trade survives into next cycle
|
|||
|
|
|
|||
|
|
### Downstream effect
|
|||
|
|
|
|||
|
|
When an ENTER intent arrives, the Rust kernel resets most slot fields:
|
|||
|
|
|
|||
|
|
```rust
|
|||
|
|
slot.trade_id = intent.trade_id.clone();
|
|||
|
|
slot.asset = intent.asset.clone();
|
|||
|
|
slot.side = intent.side.clone();
|
|||
|
|
slot.entry_time = Some(intent.timestamp);
|
|||
|
|
slot.entry_price = 0.0;
|
|||
|
|
slot.size = 0.0;
|
|||
|
|
slot.initial_size = 0.0;
|
|||
|
|
slot.unrealized_pnl = 0.0;
|
|||
|
|
slot.realized_pnl = 0.0; // ← reset to zero
|
|||
|
|
slot.exit_leg_ratios = ...;
|
|||
|
|
slot.active_leg_index = 0;
|
|||
|
|
slot.active_entry_order = None;
|
|||
|
|
slot.active_exit_order = None;
|
|||
|
|
slot.closed = false;
|
|||
|
|
slot.last_event_time = None;
|
|||
|
|
slot.fsm_state = TradeStage::ORDER_REQUESTED;
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
`slot.realized_pnl = 0.0` is explicitly set — correct for a fresh trade. But recall from Flaw 5 that realized PnL from partial fills (before the terminal close) may **not yet have been settled** to the account. If the slot accumulates realized PnL during partial fills, then re-enters before the final settle happens, the in-flight PnL is **zeroed without being settled**.
|
|||
|
|
|
|||
|
|
**This is actually the correct behavior** because:
|
|||
|
|
1. All MARKET-order fills settle immediately (they arrive as FULL_FILL and transition to CLOSED in one shot)
|
|||
|
|
2. For LIMIT orders that partially fill, the re-entry scenario is impossible because the slot isn't IDLE — it can't accept a new ENTER until the position is fully closed
|
|||
|
|
3. The slot CAN re-enter after a full close, and by then all PnL has been settled
|
|||
|
|
|
|||
|
|
So this is a **latent** rather than active flaw. It would manifest if:
|
|||
|
|
1. A LIMIT order partially fills (PnL on slot, not settled)
|
|||
|
|
2. The remaining limit is cancelled
|
|||
|
|
3. The slot's `consume_exit_leg()` leaves the slot in POSITION_OPEN with `size > 0` and `!closed` but no active orders
|
|||
|
|
4. Another ENTER arrives — but the Rust kernel rejects it because `!slot.is_free()`
|
|||
|
|
|
|||
|
|
So the slot design prevents this from happening accidentally. The flaw is that if a future code path bypasses the `is_free()` check (e.g., a force-enter feature), the unreleased PnL would be silently zeroed.
|
|||
|
|
|
|||
|
|
### Why it's not caught
|
|||
|
|
|
|||
|
|
The scenario can't happen with the current FSM. All fills eventually reach CLOSED, which triggers settle. No test forces an entry on a non-free slot.
|
|||
|
|
|
|||
|
|
### Fix strategy
|
|||
|
|
|
|||
|
|
Add an explicit assertion or sentinel in the ENTER handler:
|
|||
|
|
```rust
|
|||
|
|
if slot.realized_pnl.abs() > 1e-10 {
|
|||
|
|
// Log warning: unsynchronized PnL being discarded
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Or enforce that `settle()` is always called before `realized_pnl` is reset, by moving the settle trigger to the Rust side.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Summary table
|
|||
|
|
|
|||
|
|
| # | Flaw | Layer | Severity | Blocks partial-fill? |
|
|||
|
|
|---|------|-------|----------|---------------------|
|
|||
|
|
| 1 | Entry-order cancellation broken | Python + Rust | **Critical** | **Yes** — can't cancel resting LIMIT entries |
|
|||
|
|
| 2 | No CANCEL_ACK → IDLE for entry | Rust FSM | **Critical** | **Yes** — slot stuck after cancelled entry |
|
|||
|
|
| 3 | Outcome mixes pre/post-venue state | Python bridge | Medium | No |
|
|||
|
|
| 4 | Multi-leg exit double-close | Rust FSM | Low | No |
|
|||
|
|
| 5 | Capital settle only on terminal state | Python bridge | **High** | **Partial** — stale capital during partial fills |
|
|||
|
|
| 6 | order_type/limit_price dropped in legacy intent | Python venue | **Critical** | **Yes** — LIMIT orders never reach BingX |
|
|||
|
|
| 7 | Mock venue single ratio for entry+exit | Mock venue | Low | No (mock tests only) |
|
|||
|
|
| 8 | Missing price formatting | Adapter | **High** | **Yes** — BingX rejects bad price precision |
|
|||
|
|
| 9 | Cancel falls back to trade_id as symbol | Python venue | Medium | No |
|
|||
|
|
| 10 | Event dedup window at 64 | Rust FSM | Low | No |
|
|||
|
|
| 11 | Reconcile has no FSM validation | Rust FSM | Low | No |
|
|||
|
|
| 12 | Outcome transitions incomplete | Python bridge | Medium | No |
|
|||
|
|
| 13 | Unsettled realized PnL on re-entry | Rust FSM | Low | No |
|
|||
|
|
|
|||
|
|
**6 critical/high** — must be fixed before safe LIMIT order / partial-fill deployment.
|
|||
|
|
**4 medium** — should be fixed in the same pass to keep hygiene.
|
|||
|
|
**3 low** — latent; fix opportunistically.
|