Third and deepest pass across all module boundaries, data transforms, and error paths. 30 new flaws found (F1-F30), including the highest-risk single flaw: an unprotected on_venue_event loop that leaves slots unrecoverable on any exception. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
1552 lines
62 KiB
Markdown
1552 lines
62 KiB
Markdown
# PINK DITAv2 — End-to-End Trace & Flaw Analysis
|
||
|
||
**Analysis date:** 2026-05-31
|
||
**Method:** Full-trace static analysis — every file, every data path, every
|
||
boundary crossing in the PINK execution pipeline. No test execution.
|
||
**System scope:** 34 active source files, ~12,000 lines across Rust kernel,
|
||
Python bridge, venue adapter, runtime, and persistence.
|
||
|
||
---
|
||
|
||
## E2E Data Flow (One Call)
|
||
|
||
Every E2E path in the PINK system traces through this sequence. Each numbered
|
||
step below is a site where data crosses a module boundary and can be lost,
|
||
mangled, or misinterpreted.
|
||
|
||
```
|
||
PinkDirectRuntime.step() # R1: policy cycle entry
|
||
├─ pump_venue_events() # R2: drain async fills
|
||
├─ kernel.snapshot()["account"] # R3: read capital
|
||
├─ kernel.slot(0) # R4: read slot state
|
||
├─ decision_engine.decide() # R5: policy-layer ENTER/EXIT
|
||
├─ intent_engine.plan() # R6: intent sizing
|
||
├─ _decision_to_kernel_intent() # R7: Decision → KernelIntent
|
||
├─ kernel.process_intent(kernel_intent) # R8: KERNEL BOUNDARY
|
||
│ ├─ rust_backend._intent_to_payload() # R8a: KernelIntent → JSON
|
||
│ ├─ _RustKernelLib.process_intent() # R8b: JSON → C FFI
|
||
│ │ └─ Rust process_intent() # R8c: FSM mutates TradeSlot
|
||
│ ├─ venue.submit(intent) # R9: VENUE BOUNDARY
|
||
│ │ ├─ bingx_venue._legacy_intent() # R9a: KernelIntent → LegacyIntent
|
||
│ │ ├─ BingxDirectExecutionAdapter # R9b: HTTP POST /trade/order
|
||
│ │ │ .submit_intent()
|
||
│ │ └─ bingx_venue._events_from_submit() # R9c: receipt → VenueEvent[]
|
||
│ └─ on_venue_event(event) # R10: FEEDBACK BOUNDARY
|
||
│ ├─ _RustKernelLib → Rust FSM # R10a: C FFI → FSM transition
|
||
│ ├─ account.settle(delta) # R10b: incremental PnL settlement
|
||
│ └─ persistence writes # R10c: ClickHouse / Zinc / HZ
|
||
├─ kernel.snapshot()["account"] # R11: read final capital
|
||
└─ persistence.persist_step() # R12: PERSISTENCE BOUNDARY
|
||
```
|
||
|
||
---
|
||
|
||
## Layer 1: Policy Cycle Entry (pink_direct.py:422)
|
||
|
||
### E1: `step()` calls `pump_venue_events()` every cycle unconditionally
|
||
|
||
**pink_direct.py:436**
|
||
```python
|
||
await self.pump_venue_events(snapshot, market_state=market_state)
|
||
```
|
||
|
||
This is called **before** reading slot/account state for the policy decision.
|
||
The pump calls `venue.reconcile()` which for `BingxVenueAdapter` does 5 HTTP
|
||
requests (balance, positions, open orders, plus history if `include_history`).
|
||
|
||
For MARKET-only workflows, no resting orders exist, so `reconcile()` returns
|
||
empty events every time. But the HTTP calls still happen. On BingX VST with
|
||
~10 req/s limit and a 5s policy cycle, this burns 1 req/s just to learn
|
||
"nothing changed." Add the actual trade HTTP calls, and the budget is tight.
|
||
|
||
**Flaw: E1 — unconditional exchange poll wastes rate limit.**
|
||
Already documented as A10, but worse when traced E2E: each `pump_venue_events`
|
||
calls `venue.reconcile()` → `_backend_snapshot()` → parallel `asyncio.gather`
|
||
of 3 HTTP GETs. The `_refresh_exchange_state` at bingx_direct.py:281-352
|
||
always fetches balance + positions + openOrders concurrently. Even when
|
||
`include_history=False` (which it is for the pump), that's 3 HTTP calls
|
||
every policy cycle regardless of whether any orders are resting.
|
||
|
||
**Severity: Medium.** Wasteful but not destructive on testnet.
|
||
|
||
### E2: `kernel.snapshot()["account"]` returns a fresh dict, not a live view
|
||
|
||
**pink_direct.py:437**
|
||
```python
|
||
acc = self.kernel.snapshot()["account"]
|
||
```
|
||
|
||
`ExecutionKernel.snapshot()` at rust_backend.py:740-752 builds a dict from
|
||
kernel state at call time. The decision/intent engines then consume this
|
||
snapshot. Between the snapshot and `process_intent()` (line 523), another
|
||
caller (or the same runtime in a concurrent cycle) could advance the kernel
|
||
state, making the decision based on stale capital.
|
||
|
||
**Flaw: E2 — TOCTOU between capital snapshot and intent execution.**
|
||
The `context.capital` read at line 437 is used at line 523 for the ENTER
|
||
safety guard (`_unsafe_entry_reason`) and possibly by the decision/intent
|
||
engines. If capital changes between these two points (e.g. an async fill
|
||
arrives via a concurrent test-HTTP path), the guard uses stale capital.
|
||
|
||
**Severity: Low** in single-threaded deployment. Critical under concurrency.
|
||
|
||
---
|
||
|
||
## Layer 2: Decision/Intent Bridging (pink_direct.py:79-115)
|
||
|
||
### E3: `_decision_to_kernel_intent` drops `order_type` and `limit_price`
|
||
|
||
**pink_direct.py:79-115**
|
||
```python
|
||
def _decision_to_kernel_intent(decision, intent, slot_id=0):
|
||
return KernelIntent(
|
||
...
|
||
# order_type and limit_price are NOT SET here
|
||
)
|
||
```
|
||
|
||
`KernelIntent` has `order_type="MARKET"` and `limit_price=0.0` as defaults,
|
||
so MARKET orders work correctly. But the runtime **never** sets these fields
|
||
from the policy layer. If `decision` or `intent` ever carries `order_type`
|
||
or `limit_price`, it's silently dropped because the bridge doesn't map them.
|
||
|
||
**Flaw: E3 — LIMIT support in runtime is dead code.**
|
||
The `order_type`/`limit_price` fields in `KernelIntent` and the LIMIT payload
|
||
building in `bingx_direct.py` lines 384-398 are unreachable from the runtime.
|
||
The only path that can set them is direct `KernelIntent(...)` construction
|
||
in tests (`_build_pink_bodies.py` style scenarios). The `_decision_to_kernel_intent`
|
||
bridge must be patched when a policy engine needs to emit LIMIT orders.
|
||
|
||
**Severity: Medium.** Blocks any production path to LIMIT orders.
|
||
|
||
### E4: `_exit_intent_from_slot` trusts slot.size but slot may be stale
|
||
|
||
**pink_direct.py:398-420**
|
||
```python
|
||
def _exit_intent_from_slot(self, kernel_intent):
|
||
try:
|
||
slot_size = float(self.kernel.slot(int(kernel_intent.slot_id)).size or 0.0)
|
||
except Exception:
|
||
slot_size = 0.0
|
||
...
|
||
exit_size = min(policy_size, slot_size) if policy_ok else slot_size
|
||
```
|
||
|
||
Reads `slot.size` fresh from the Rust kernel at call time, then uses it to
|
||
cap the exit size. Between this read and the `process_intent` call that
|
||
actually executes the EXIT (line 523), the slot can be modified by
|
||
`pump_venue_events` (line 436) or a concurrent cycle. If a partial fill
|
||
arrived between the slot read and the EXIT, the exit size could be wrong.
|
||
|
||
**Flaw: E4 — TOCTOU between exit sizing and exit execution.**
|
||
Same class as E2 but for exit size rather than capital. If the pump drained
|
||
a partial fill between R4 (slot read) and R8 (process_intent), the EXIT
|
||
requests a size based on pre-pump remaining size. The kernel caps it at
|
||
actual remaining, so this is self-correcting — but the intent payload has
|
||
wrong metadata.
|
||
|
||
**Severity: Low.** Self-correcting at kernel level.
|
||
|
||
---
|
||
|
||
## Layer 3: Kernel Bridge — Rust FSM Entry (rust_backend.py)
|
||
|
||
### E5: JSON serialization round-trip loses numeric precision
|
||
|
||
**rust_backend.py:460-485 (`_intent_to_payload`)**
|
||
|
||
`KernelIntent` fields like `reference_price`, `target_size`, `leverage` are
|
||
Python floats. They're serialized to JSON text, sent through C FFI, parsed
|
||
by serde_json into Rust `f64`, then serialized back to JSON, parsed by Python
|
||
`json.loads()`. Each serialization step can introduce precision loss:
|
||
|
||
```python
|
||
# Python float → JSON: 0.1 → "0.1" → Rust f64: 0.10000000000000000555
|
||
# Rust f64 → JSON: → serde_json may print "0.10000000000000001"
|
||
# Python json.loads → 0.10000000000000001
|
||
```
|
||
|
||
For prices (TRXUSDT at ~$0.08), a 1e-16 relative error is negligible. For
|
||
PnL accumulation over thousands of trades at 9x leverage, the error can grow
|
||
to cents or dollars. The `|Δcapital − realized| < 1e-9` assertion in tests
|
||
would catch gross errors but not sub-cent accumulation.
|
||
|
||
**Flaw: E5 — JSON serialization precision drift over long runs.**
|
||
**Severity: Low.** Not a practical concern for the current deployment scale.
|
||
|
||
### E6: `_RustKernelLib` is a global singleton — shared across all kernels
|
||
|
||
**rust_backend.py:40-45**
|
||
```python
|
||
_RUST: _RustKernelLib | None = None
|
||
|
||
def _get_rust() -> _RustKernelLib:
|
||
global _RUST
|
||
if _RUST is None:
|
||
_RUST = _RustKernelLib()
|
||
return _RUST
|
||
```
|
||
|
||
The `_RustKernelLib` singleton loads the `.so` shared library once and
|
||
provides FFI functions. Each `ExecutionKernel` instance gets its own
|
||
`KernelHandle` via `_get_rust().create(max_slots)`. The FFI functions take
|
||
the handle as the first argument, so multiple kernels are isolated at the
|
||
Rust level.
|
||
|
||
**However**, the singleton means ALL kernels share the same ctypes function
|
||
pointer table. If a second kernel is created and the first is destroyed,
|
||
`KernelHandle` of the first becomes a dangling pointer. Calling any FFI
|
||
function on the destroyed kernel's handle is use-after-free.
|
||
|
||
**Flaw: E6 — No protection against use-after-free on kernel destroy.**
|
||
Already documented as T7. Worth re-emphasizing in the E2E trace because the
|
||
test infrastructure creates and destroys kernels frequently (fresh-kernel
|
||
reconcile tests, each `_build_rb()` call in scenario wrappers).
|
||
|
||
**Severity: High.** Use-after-free in C FFI is memory corruption.
|
||
|
||
---
|
||
|
||
## Layer 4: Rust Kernel FSM (lib.rs:728)
|
||
|
||
### E7: ENTER handler silently allows re-entry with same trade_id
|
||
|
||
**lib.rs:740-745**
|
||
```rust
|
||
if !slot.is_free() && !slot.trade_id.is_empty() && slot.trade_id != intent.trade_id {
|
||
return SLOT_BUSY;
|
||
}
|
||
```
|
||
|
||
If `slot.trade_id == intent.trade_id`, the ENTER is accepted even if the
|
||
slot is not free (e.g., POSITION_OPEN with an active position). This is by
|
||
design — it lets the same trade_id re-enter after the slot was partially
|
||
reconciled or restored from a snapshot. But it also means:
|
||
|
||
1. EXIT sets `slot.closed=true` and transitions to `CLOSED`
|
||
2. A new ENTER with the **same** trade_id re-enters the CLOSED slot
|
||
3. The slot resets `slot.closed=false`, `slot.size=0.0`, `slot.initial_size=0.0`
|
||
4. Kernel now thinks the trade is new, but the Rust indexes still have the
|
||
old trade_id pointing to slot 0
|
||
|
||
**Downstream effect:** After a re-entry with the same trade_id, the
|
||
`active_trade_index[trade_id]` still correctly points to slot 0. But the
|
||
old `VenueOrder` in `client_order_index` and `venue_order_index` is still
|
||
present until the new entry fills and creates new orders. A reconcile event
|
||
addressed to the old `venue_client_id` could stomp on the new trade.
|
||
|
||
**Flaw: E7 — Re-entry with same trade_id leaves stale index entries.**
|
||
**Severity: Low.** The `rebuild_indexes()` call in `commit_slot()` rebuilds
|
||
from scratch, so stale entries are cleared on the first write.
|
||
|
||
### E8: EXIT handler uses `initial_size` not `current size`
|
||
|
||
**lib.rs:770-775**
|
||
```rust
|
||
let exit_ratio = slot.next_exit_ratio();
|
||
let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
|
||
let exit_size = (base_size * exit_ratio).max(0.0);
|
||
```
|
||
|
||
Already documented as A1. In the E2E trace, this is the single most impactful
|
||
execution flaw. A concrete scenario:
|
||
|
||
1. Enter `size=1.0`, `initial_size=1.0`, `exit_leg_ratios=(0.5, 0.5, 1.0)`
|
||
2. EXIT leg 0: requests `1.0 * 0.5 = 0.5`. Slot goes to 0.5.
|
||
3. EXIT leg 1: requests `1.0 * 0.5 = 0.5`. Slot goes to 0.0.
|
||
`active_leg_index` advances to 2. `all_legs_done = (2 >= 3) = false`.
|
||
But wait — `exit_leg_ratios.len()` is 3: [0.5, 0.5, 1.0]. So
|
||
`all_legs_done = (2 >= 3) = false`. The slot stays at `POSITION_OPEN`,
|
||
`size=0.0`, `!closed`.
|
||
4. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`. Slot is at 0.0.
|
||
`slot.is_free()`: `fsm_state=POSITION_OPEN`, not in `{IDLE, CLOSED}`.
|
||
`slot.size <= 0.0` is true. But `!slot.is_free()` returns true because
|
||
of the FSM state check, not the size check. The ENTER guard `!slot.is_free()`
|
||
blocks re-entry. The EXIT guard `slot.is_free() || slot.closed || size <= 0.0`
|
||
triggers — returns `NO_OPEN_POSITION`.
|
||
5. **Slot is stuck forever.** No operation can advance it.
|
||
|
||
**Severity: High.** Concrete, reproducible, and not caught by any test.
|
||
|
||
### E9: CANCEL handler returns diagnostic even when nothing happened
|
||
|
||
**lib.rs:795-810**
|
||
```rust
|
||
if matches!(intent.action, KernelCommandType::CANCEL) {
|
||
let has_cancellable_exit = slot.active_exit_order.is_some();
|
||
let has_cancellable_entry = slot.active_entry_order.is_some()
|
||
&& matches!(slot.fsm_state, ENTRY_WORKING | ORDER_REQUESTED | ORDER_SENT | IDLE);
|
||
if !has_cancellable_exit && !has_cancellable_entry {
|
||
return KernelResult {
|
||
outcome: KernelOutcome {
|
||
accepted: false,
|
||
diagnostic_code: NO_ACTIVE_EXIT_ORDER,
|
||
...
|
||
},
|
||
...
|
||
};
|
||
}
|
||
return KernelResult {
|
||
outcome: KernelOutcome {
|
||
accepted: true,
|
||
...
|
||
},
|
||
...
|
||
};
|
||
}
|
||
```
|
||
|
||
Two issues:
|
||
1. When **neither** is cancellable, the diagnostic is `NO_ACTIVE_EXIT_ORDER`
|
||
even if the actual reason is "no active entry order either" or "slot is
|
||
already IDLE". The diagnostic is misleading.
|
||
2. When at least one IS cancellable, the Rust kernel returns `accepted=true`
|
||
but does **not** mutate the slot at all — it returns immediately with the
|
||
slot as-is. The actual cancel (HTTP call + FSM transition) happens in the
|
||
Python bridge. The Rust kernel's "accept" just means "yes you may try to
|
||
cancel this" — not "the cancel is complete."
|
||
|
||
This disconnect means: if the Python bridge's `venue.cancel()` fails (HTTP
|
||
error), the Rust kernel has already returned `accepted=true` for a cancel
|
||
that never happened. The caller sees `accepted=true` but the slot state
|
||
hasn't changed.
|
||
|
||
**Flaw: E9 — Rust CANCEL "accepts" before Python actually cancels.**
|
||
**Severity: Medium.** The `outcome.accepted` boolean is misleading for CANCEL.
|
||
|
||
### E10: `apply_fill` entry branch double-sets `active_entry_order`
|
||
|
||
**lib.rs:1330-1390**
|
||
```rust
|
||
// First set — at the top of the entry branch:
|
||
slot.active_entry_order = Some(VenueOrder {
|
||
...
|
||
filled_size: fill_size,
|
||
status: if partial { PARTIALLY_FILLED } else { FILLED },
|
||
...
|
||
});
|
||
|
||
// ... then later for full fill:
|
||
if !partial {
|
||
slot.fsm_state = TradeStage::POSITION_OPEN;
|
||
slot.active_entry_order = Some(VenueOrder { // SECOND SET
|
||
...
|
||
filled_size: slot.size, // uses updated slot.size
|
||
...
|
||
});
|
||
}
|
||
```
|
||
|
||
The entry branch sets `active_entry_order` at the top with `filled_size` from
|
||
the event, then for a FULL_FILL, sets it again with `filled_size = slot.size`
|
||
(which may have been updated by `slot.initial_size = fill_size` above). The
|
||
first VenueOrder's `intended_size` is from the event, the second uses
|
||
`slot.size`. Both are correct in isolation, but the double-write is wasteful.
|
||
|
||
More importantly, for a PARTIAL_FILL entry, the first set is the ONLY set.
|
||
If a second PARTIAL_FILL arrives for the same order, the entry branch at
|
||
line 1334 checks `slot.active_entry_order.is_some()` which is true (set by
|
||
the first partial), but the FSM state is `ENTRY_WORKING` (also set by first
|
||
partial). The condition at line 1334-1338 matches `ENTRY_WORKING`, so the
|
||
second partial enters the entry branch again. But `fill_size` is the event's
|
||
`filled_size` — the **total** filled, not the incremental amount.
|
||
|
||
**Flaw: E10 — Second PARTIAL_FILL on entry overwrites, doesn't accumulate.**
|
||
```rust
|
||
let fill_size = if event.filled_size > 0.0 {
|
||
event.filled_size // ← TOTAL filled, not incremental
|
||
} else {
|
||
event.size
|
||
}.max(0.0);
|
||
|
||
slot.active_entry_order = Some(VenueOrder {
|
||
...
|
||
filled_size: fill_size, // ← overwrites previous filled_size
|
||
...
|
||
});
|
||
|
||
slot.initial_size = slot.initial_size.max(fill_size); // ← OK, uses max
|
||
slot.size = fill_size; // ← OVERWRITES previous size with total
|
||
```
|
||
|
||
On a RESTING LIMIT entry that partially fills in two events:
|
||
- Event 1: filled_size=0.3 → slot.size=0.3, entry_order.filled_size=0.3
|
||
- Event 2: filled_size=0.7 → slot.size=0.7, entry_order.filled_size=0.7
|
||
|
||
The `filled_size` on the VenueOrder correctly reflects cumulative fill
|
||
(0.7), but `slot.size` jumps from 0.3 to 0.7 — the increment is 0.4, which
|
||
is correct because `fill_size` IS the cumulative fill (0.7). Actually this
|
||
is correct — the venue sends cumulative filled_size, not incremental. Let
|
||
me re-verify: at `bingx_venue._events_from_submit()` line ~480:
|
||
```python
|
||
filled_size = _row_float(ack_row, "executedQty", ...)
|
||
```
|
||
This reads `executedQty` which on BingX IS cumulative. So the second event's
|
||
`filled_size=0.7` means "total filled across all fills = 0.7." The kernel
|
||
sets `slot.size = 0.7` which is the total position size. This is correct.
|
||
|
||
But the second fill event has `slot.entry_price` overwritten by the new
|
||
fill's price. If the first fill was at 0.0834 and the second at 0.0836, the
|
||
slot's `entry_price` becomes 0.0836 — losing the blended average. For a LIMIT
|
||
entry with two partial fills at different prices, the entry_price in the slot
|
||
is the price of the LAST fill, not the VWAP.
|
||
|
||
**Flaw: E10a — Entry price on multi-partial entry is last-fill, not VWAP.**
|
||
**Severity: Low.** Unrealized PnL computation uses this price. Error is small
|
||
for tight spreads.
|
||
|
||
---
|
||
|
||
## Layer 5: Venue Adapter Boundary (bingx_venue.py)
|
||
|
||
### E11: `_legacy_intent()` is a lossy conversion
|
||
|
||
**bingx_venue.py:270-285**
|
||
```python
|
||
@staticmethod
|
||
def _legacy_intent(intent: KernelIntent) -> LegacyIntent:
|
||
action = LegacyDecisionAction.ENTER if intent.action == E.ENTER else ...
|
||
side = LegacyTradeSide.SHORT if intent.side == TS.SHORT else ...
|
||
metadata = dict(intent.metadata)
|
||
metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
|
||
metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
|
||
return LegacyIntent(
|
||
timestamp=intent.timestamp,
|
||
trade_id=intent.trade_id,
|
||
decision_id=intent.intent_id,
|
||
asset=intent.asset,
|
||
action=action,
|
||
side=side,
|
||
reason=intent.reason,
|
||
target_size=float(intent.target_size),
|
||
leverage=float(intent.leverage),
|
||
reference_price=float(intent.reference_price),
|
||
confidence=1.0, # ← HARDCODED
|
||
bars_held=0, # ← HARDCODED
|
||
exit_leg_ratios=tuple(intent.exit_leg_ratios or (1.0,)),
|
||
metadata=metadata,
|
||
)
|
||
```
|
||
|
||
`confidence` is always 1.0 and `bars_held` is always 0. The `LegacyIntent`
|
||
carries these to `BingxDirectExecutionAdapter.submit_intent()` which ignores
|
||
them (it only reads `asset`, `side`, `action`, `target_size`, `leverage`,
|
||
and `metadata`). So the hardcoded values don't affect execution — but they
|
||
affect the `ExecutionReceipt` and any downstream consumers that might read
|
||
`receipt.confidence`.
|
||
|
||
**Flaw: E11 — Lossy conversion with hardcoded metadata.**
|
||
**Severity: Informational.** No downstream consumer reads these fields.
|
||
|
||
### E12: `_events_from_submit()` price fallback chain can lose venue price
|
||
|
||
**bingx_venue.py:375-400 (`_events_from_submit`)**
|
||
```python
|
||
base_event = VenueEvent(
|
||
...
|
||
price=safe_float(getattr(receipt, "price", 0.0), 0.0),
|
||
...
|
||
)
|
||
|
||
# ... later for fill event:
|
||
fill_price = safe_float(
|
||
_row_float(ack_row, "avgPrice", "ap", "price", "lastFillPrice",
|
||
default=getattr(receipt, "price", 0.0)),
|
||
0.0
|
||
)
|
||
```
|
||
|
||
The fill price is read from `ack_row` (the HTTP response dict) first, falling
|
||
back to `receipt.price` (the `ExecutionReceipt` field). The `executionReceipt`
|
||
price comes from `bingx_direct.py:434`:
|
||
```python
|
||
fill_price = 0.0
|
||
for key in ("avgPrice", "avgFilledPrice", "price", "lastFillPrice", "tradePrice"):
|
||
try: value = float(ack_row.get(key) or 0.0)
|
||
except: value = 0.0
|
||
if value > 0: fill_price = value; break
|
||
if fill_price <= 0 and self._state is not None:
|
||
fill_price = next((float(...)) for ... in self._state.open_positions.values() ...)
|
||
```
|
||
|
||
So the price flows: BingX HTTP ack → `ack_row[key]` → `receipt.price` →
|
||
`_events_from_submit()` → `fill_price` in VenueEvent.
|
||
|
||
If `ack_row` has no price field AND `self._state.open_positions` has no matching
|
||
position (e.g., first fill on a new entry), `fill_price` stays 0.0. The kernel's
|
||
`apply_fill` at lib.rs:1397 checks `if event.price > 0.0` before setting
|
||
`entry_price` — so a zero fill price leaves `entry_price` at 0.0. This means:
|
||
|
||
- The slot's `entry_price` stays 0.0
|
||
- `realized_pnl()` at lib.rs:662 checks `if slot.entry_price <= 0.0` → returns 0.0
|
||
- **PnL is never computed for this fill**
|
||
- Capital never settles
|
||
|
||
This is very unlikely on BingX VST, which always returns `avgPrice` in order
|
||
acknowledgements. But on any venue that doesn't, PnL is silently zeroed.
|
||
|
||
**Flaw: E12 — Zero fill price → zero entry_price → zero PnL.**
|
||
**Severity: Medium.** Silent PnL loss if venue returns no price.
|
||
|
||
### E13: `_backend_snapshot()` timeout returns stale data
|
||
|
||
**bingx_venue.py:290-320**
|
||
```python
|
||
def _backend_snapshot(self, *, include_history=False, timeout_ms=5000.0):
|
||
if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
|
||
with self._snap_lock:
|
||
return self._last_snapshot # ← STALE DATA
|
||
```
|
||
|
||
If the previous snapshot fetch is still in-flight when a new caller arrives,
|
||
the timeout returns `self._last_snapshot` — which could be seconds or minutes
|
||
old. The caller (e.g., `submit()`) then uses this stale snapshot to compute
|
||
`_filled_size_from_snapshots()` — potentially comparing stale "before" data
|
||
with fresh "after" data, producing a wrong delta.
|
||
|
||
**Flaw: E13 — Stale snapshot fallback causes wrong fill-size detection.**
|
||
**Severity: Medium.** The `_filled_size_from_snapshots` diff can be wrong.
|
||
|
||
### E14: `_events_from_cancel` uses stale `slot_id` from order metadata
|
||
|
||
**bingx_venue.py:485-510**
|
||
```python
|
||
VenueEvent(
|
||
...
|
||
slot_id=int(order.metadata.get("slot_id", 0) or 0),
|
||
...
|
||
)
|
||
```
|
||
|
||
The `slot_id` in the CANCEL event comes from the `VenueOrder.metadata` which
|
||
was set when the order was created (in Rust FSM's `process_intent` or
|
||
`on_venue_event`). If the slot was re-assigned or the kernel's slot count
|
||
changed since order creation, this slot_id is wrong. The Rust kernel's
|
||
`resolve_slot()` at lib.rs:610-624 would use the event's `slot_id` (the
|
||
stale one) and find the wrong slot.
|
||
|
||
**Flaw: E14 — Cancel event carries stale slot_id from order creation.**
|
||
**Severity: Low.** Slots are stable and never renumbered.
|
||
|
||
---
|
||
|
||
## Layer 6: BingX Direct Adapter (bingx_direct.py)
|
||
|
||
### E15: Submit sets leverage via separate HTTP call
|
||
|
||
**bingx_direct.py:376-379**
|
||
```python
|
||
await self._client.signed_post(
|
||
"/openApi/swap/v2/trade/leverage",
|
||
{"symbol": symbol, "side": "BOTH", "leverage": leverage},
|
||
)
|
||
```
|
||
|
||
This is a POST to set exchange leverage **before** each order. If this call
|
||
fails (rate limit, network error), the exception at line 417 sets
|
||
`status = "RATE_LIMITED"` and returns a rejection — the order is NOT
|
||
submitted. But the error handling at line 417 catches `BingxHttpError` for
|
||
the leverage call AND the order call with the same handler. If the leverage
|
||
call fails with a non-rate-limit error (e.g., `400 Bad Request` for invalid
|
||
symbol), the status is `"REJECTED"` and no order is placed. This is correct
|
||
behavior — but the error message doesn't distinguish "leverage set failed"
|
||
from "order submission failed."
|
||
|
||
**Flaw: E15 — Leverage-set failure and order failure share error handler.**
|
||
**Severity: Low.** Correct behavior, poor diagnostics.
|
||
|
||
### E16: `_format_quantity` and `_format_price` use `_instrument_step`/`_instrument_tick` — both may be zero
|
||
|
||
**bingx_direct.py:234-268**
|
||
```python
|
||
def _instrument_step(self, asset):
|
||
instrument = self._resolve_instrument(asset)
|
||
if instrument is not None:
|
||
try: return Decimal(str(instrument.size_increment.as_decimal()))
|
||
except: pass
|
||
return Decimal("0.001") # fallback
|
||
|
||
def _format_quantity(self, asset, quantity):
|
||
step = self._instrument_step(asset)
|
||
if step <= 0:
|
||
return str(max(0.0, quantity))
|
||
...
|
||
```
|
||
|
||
If `_resolve_instrument` returns None (asset not in provider), `step=0.001`
|
||
and `tick=0.01`. These defaults are correct for most USDT perpetuals on
|
||
BingX VST, but may be wrong for non-standard symbols. The format functions
|
||
still produce a valid string — just possibly with wrong precision.
|
||
|
||
More concerning: `_resolve_instrument` at line 211-226 tries three lookup
|
||
strategies and iterates all instruments on the third. This iteration is O(n)
|
||
in the number of instruments and happens on EVERY `submit_intent()` call.
|
||
With 540 instruments, this is ~0.5ms — acceptable. But `_instrument_step`
|
||
and `_instrument_tick` each call `_resolve_instrument` independently, so
|
||
`submit_intent()` calls it twice (once for quantity, once for price, plus
|
||
once for `_instrument_venue_symbol` at line 358). Three full-instrument-list
|
||
iterations per order.
|
||
|
||
**Flaw: E16 — Instrument resolution called 3x per order with O(n) scan.**
|
||
**Severity: Low.** Performance, not correctness.
|
||
|
||
### E17: Cancel uses truth-based confirmation — can mask real errors
|
||
|
||
**bingx_direct.py:474-498**
|
||
```python
|
||
still_open = True
|
||
try:
|
||
oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
|
||
...
|
||
still_open = (venue_order_id in ids) if venue_order_id else (venue_client_id in cids)
|
||
except Exception:
|
||
still_open = None
|
||
|
||
if still_open is False:
|
||
return {"status": "CANCELED", ...}
|
||
if str(delete_resp.get("status", "")).upper() in {"CANCELED", "CANCELLED", "SUCCESS", "OK"}:
|
||
return {"status": "CANCELED", ...}
|
||
return {"status": delete_resp.get("status", "REJECTED"), ...}
|
||
```
|
||
|
||
The cancel logic:
|
||
1. DELETE the order on BingX
|
||
2. GET open orders to verify
|
||
3. If the order is no longer open, return CANCELED
|
||
4. If the DELETE response says CANCELED, return CANCELED
|
||
5. Otherwise return REJECTED
|
||
|
||
If step 2's GET fails (network error, rate limit), `still_open=None`.
|
||
Then step 4 checks the DELETE response. If the DELETE also returned an error
|
||
(e.g., "order not found" because it was already cancelled by another caller),
|
||
`status` is `"ERROR"` or `"not found"` — neither matches `"CANCELED"`.
|
||
The cancel is reported as `REJECTED` even though the order IS cancelled.
|
||
|
||
The `bingx_venue._events_from_cancel()` then emits `CANCEL_REJECT` instead
|
||
of `CANCEL_ACK`. The Rust kernel handles `CANCEL_REJECT` at lib.rs:1218:
|
||
```rust
|
||
KernelEventKind::CANCEL_REJECT => {
|
||
if slot.fsm_state == TradeStage::EXIT_WORKING {
|
||
slot.fsm_state = TradeStage::EXIT_WORKING; // no-op
|
||
}
|
||
diagnostic_code = KernelDiagnosticCode::CANCEL_REJECTED;
|
||
}
|
||
```
|
||
|
||
The slot stays in its current state (e.g., `EXIT_WORKING`) with no active order
|
||
(the exchange has no record of it). The slot is stuck until a manual reconcile.
|
||
|
||
**Flaw: E17 — Cancel can return false REJECTED for already-cancelled orders.**
|
||
**Severity: Medium.** Leads to stuck slot requiring manual intervention.
|
||
|
||
---
|
||
|
||
## Layer 7: Fill Feedback Loop (rust_backend.py on_venue_event)
|
||
|
||
### E18: `on_venue_event` settles PnL incrementally — but fees are never included
|
||
|
||
**rust_backend.py:530-545**
|
||
```python
|
||
incremental_pnl = slot.realized_pnl - self._last_settled_pnl.get(slot.slot_id, 0.0)
|
||
if abs(incremental_pnl) > 1e-12:
|
||
self.account.settle(incremental_pnl)
|
||
self._last_settled_pnl[slot.slot_id] = slot.realized_pnl
|
||
```
|
||
|
||
The Rust kernel's `apply_fill` computes realized PnL as:
|
||
```rust
|
||
let realized = Self::realized_pnl(slot, event.price, fill_size);
|
||
slot.realized_pnl += realized;
|
||
```
|
||
|
||
No fee subtraction. No commission reading from the event. The `VenueEvent`
|
||
could carry fee data via `metadata["fee"]` or `raw_payload["commission"]`,
|
||
but the Rust kernel doesn't read it and the Python bridge doesn't extract it.
|
||
|
||
Over the 142 live test scenarios on VST (where fees are 0 or negligible),
|
||
this is invisible. On live mainnet with exchange fees of 0.02-0.04%, the
|
||
cumulative error is unbounded.
|
||
|
||
**Flaw: E18 — PnL settlement ignores fees.**
|
||
Already documented as A7. In the E2E trace, the gap is specifically here:
|
||
`VenueEvent.price` is used for `realized_pnl()` but `VenueEvent.metadata`
|
||
(which could carry `commission` from the venue) is never read.
|
||
|
||
**Severity: Medium** (grows with trade volume).
|
||
|
||
### E19: `observe_slots` called with ALL slots, not just changed ones
|
||
|
||
**rust_backend.py:538-545**
|
||
```python
|
||
slots = [self._get_slot(i) for i in range(self.max_slots)]
|
||
self.account.observe_slots(slots)
|
||
```
|
||
|
||
Every `on_venue_event` call re-reads ALL slots from the Rust kernel (N FFI
|
||
calls) and calls `observe_slots` with the full list. With `max_slots=10`,
|
||
this is 10 FFI round-trips per venue event. Each round-trip serializes a
|
||
TradeSlot to JSON, passes through C FFI, parses on the Rust side, serializes
|
||
the result, passes back, and parses on the Python side. For a multi-leg EXIT
|
||
with 3 fills (ACK + PARTIAL + FULL), that's 3 × 10 = 30 slot reads per
|
||
process_intent call.
|
||
|
||
**Flaw: E19 — Full-slot-list read on every event is N×FFI overhead.**
|
||
**Severity: Low** (performance). Not a correctness issue.
|
||
|
||
---
|
||
|
||
## Layer 8: Persistence Boundary (pink_clickhouse.py)
|
||
|
||
### E20: `_capital()` reads live from `AccountProjection` — stale row risk
|
||
|
||
**pink_clickhouse.py:199-200**
|
||
```python
|
||
def _capital(self) -> float:
|
||
return float(self.account.snapshot.capital or 0.0)
|
||
```
|
||
|
||
Every row writer calls `_capital()` at write time to get the current capital.
|
||
But `persist_result()` is called AFTER `kernel.process_intent()` returns —
|
||
at which point the account has already been settled. The `account_events`,
|
||
`position_state`, and `trade_events` rows all record the SAME capital value
|
||
(the post-settle value). `capital_before` is then reconstructed by
|
||
subtracting PnL (already documented as A5).
|
||
|
||
The effect: all ClickHouse rows for a single `process_intent()` call show
|
||
identical `capital` / `account_capital` / `portfolio_capital` values, because
|
||
they're all written within the same Python call stack with no intervening
|
||
events. This is correct for single-threaded operation — all rows reflect
|
||
POST-trade state. But it means ClickHouse querying for "capital before trade"
|
||
must use `capital_after - pnl`, which is the wrong formula under multi-slot.
|
||
|
||
**Flaw: E20 — All persistence rows write post-trade capital, not pre-trade.**
|
||
Already documented as A5 from the capital_before angle.
|
||
|
||
**Severity: High** for multi-slot accounting reconstruction.
|
||
|
||
### E21: `persist_fill_events()` synthesizes fake Decision/Intent
|
||
|
||
**pink_clickhouse.py:383-435**
|
||
```python
|
||
def persist_fill_events(self, *, snapshot, events, slot_dict, market_state):
|
||
...
|
||
decision = Decision(
|
||
timestamp=ts, decision_id=trade_id or "async", asset=asset,
|
||
action=action, side=side, reason="ASYNC_FILL",
|
||
confidence=0.0, velocity_divergence=0.0, irp_alignment=0.0,
|
||
reference_price=price, target_size=cur_size, leverage=leverage,
|
||
...
|
||
)
|
||
intent = Intent(
|
||
timestamp=ts, trade_id=trade_id, decision_id=trade_id or "async",
|
||
...
|
||
)
|
||
```
|
||
|
||
The async fill pump (called by `pump_venue_events`) constructs fake
|
||
Decision/Intent objects because there's no real policy decision backing an
|
||
async fill — it just arrived from the exchange. These synthetic objects have:
|
||
- `decision_id = trade_id` (or `"async"` if trade_id is empty)
|
||
- `decision_id` and `trade_id` are the same string
|
||
- `confidence=0.0`, `velocity_divergence=0.0`, `irp_alignment=0.0`
|
||
- `target_size = cur_size` (the remaining size after the fill, not the
|
||
size that was filled)
|
||
|
||
These are written to `policy_events`, `trade_reconstruction`, and
|
||
`trade_events` with the same row shapes as real policy-driven fills. Any
|
||
ClickHouse query that joins `policy_events` to `trade_events` on
|
||
`decision_id` will find matching rows (both set to `trade_id`), but the
|
||
policy_events row's `target_size` is the POST-fill size, not the pre-fill
|
||
size. A replay system that reconstructs position from `policy_events` →
|
||
`trade_reconstruction` would see incorrect sizing.
|
||
|
||
**Flaw: E21 — Async fill persistence uses synthetic decision with wrong data.**
|
||
**Severity: Medium.** Misleading historical records.
|
||
|
||
### E22: `_write_trade_exit_leg` capital_before uses arithmetic reconstruction
|
||
|
||
**pink_clickhouse.py:761-762**
|
||
```python
|
||
capital_after = self._capital()
|
||
capital_before = capital_after - pnl_leg
|
||
```
|
||
|
||
Already documented as A5. In the E2E trace, the specific path is:
|
||
1. Slot 0 exit leg fills → `_capital()` returns capital AFTER settlement
|
||
(because the kernel's `on_venue_event` already called `account.settle`)
|
||
2. `capital_before = capital_after - pnl_leg` reconstructs pre-leg capital
|
||
|
||
If slot 1 also settled between the leg fill and the persistence write
|
||
(possible in multi-threaded or concurrent scenario), `capital_after` includes
|
||
slot 1's PnL, and `capital_before` is wrong by exactly slot 1's contribution.
|
||
|
||
**Severity: High** for multi-slot.
|
||
|
||
### E23: `_write_trade_event` uses `slot_dict.get("entry_price")` as exit_price
|
||
|
||
**pink_clickhouse.py:813-815**
|
||
```python
|
||
entry_price = _safe_float(slot_dict.get("entry_price", 0.0), ...)
|
||
exit_price = _safe_float(slot_dict.get("entry_price", 0.0), ...) # ← SAME FIELD
|
||
```
|
||
|
||
Already documented as A13. The `exit_price` is set to `entry_price` from
|
||
the same slot dict field. The BingX ack payload does contain the fill price,
|
||
but it's not propagated to the slot dict's `entry_price` for exit fills —
|
||
the slot's `entry_price` is set during entry fill and remains unchanged
|
||
during exit. The exit fill price is only on the `VenueEvent`, which is not
|
||
passed through to `_write_trade_event`.
|
||
|
||
The `trade_events` row in ClickHouse always shows `exit_price == entry_price`,
|
||
making PnL reconstruction from `(exit_price - entry_price) × size × lev`
|
||
impossible. The `pnl` field IS correct (it's `slot.realized_pnl`), but only
|
||
the summary is accurate — the component prices are wrong.
|
||
|
||
**Severity: Low.** `pnl` is correct, only the decomposed price is wrong.
|
||
|
||
---
|
||
|
||
## Layer 9: Test Infrastructure
|
||
|
||
### E24: `MockVenueAdapter.submit()` always emits fill on `partial_fill_ratio > 0`
|
||
|
||
**mock_venue.py:60-90**
|
||
```python
|
||
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
|
||
fill_ratio = max(0.0, min(1.0, float(effective_ratio)))
|
||
...
|
||
if is_entry:
|
||
effective_ratio = self.scenario.entry_partial_fill_ratio if \
|
||
self.scenario.entry_partial_fill_ratio != 1.0 else \
|
||
self.scenario.partial_fill_ratio
|
||
else:
|
||
effective_ratio = self.scenario.exit_partial_fill_ratio ...
|
||
```
|
||
|
||
The default `MockVenueScenario()` has `partial_fill_ratio=1.0`. So every
|
||
`submit()` call on a default mock emits a FULL_FILL event immediately.
|
||
This means mock-venue tests always test the "order fills instantly" path —
|
||
they never test resting orders, partial fills, or async fills.
|
||
|
||
Any test that relies on the mock venue is testing a subset of real venue
|
||
behavior. The mock never produces:
|
||
- DELAYED fills (fill arrives on a later `reconcile()` call)
|
||
- PARTIAL fills with subsequent fills
|
||
- Partial fills during entry (entry fills partially, then more later)
|
||
- Mixed entry/exit partial behavior
|
||
|
||
**Flaw: E24 — Mock venue always fills synchronously — never tests async path.**
|
||
**Severity: Medium.** The `pump_venue_events()` path has never been exercised
|
||
with the mock venue.
|
||
|
||
### E25: Test scenarios use MARKET-only `_si()` helper — no LIMIT tests
|
||
|
||
**gen_live_tests.py and _gen_test.py**
|
||
|
||
The `_si()` helper constructs a `KernelIntent` with `order_type="MARKET"` and
|
||
`limit_price=0.0` (the defaults). All 157 live test scenarios use `_si()`.
|
||
The 3 "LIMIT" scenarios (`limit_does_not_fill`, `limit_immediate_fill`) use
|
||
`reference_price=0.0` and `target_size=-0.001` respectively — they test
|
||
**intent validation**, not actual LIMIT order submission.
|
||
|
||
There is **zero** live-test coverage of:
|
||
- Submitting a LIMIT order that rests on the book
|
||
- A resting LIMIT being cancelled
|
||
- A resting LIMIT receiving a partial fill then a subsequent fill
|
||
- An async fill arriving via `pump_venue_events()`
|
||
|
||
The Rust kernel's `PARTIAL_FILL` event handling and the Python bridge's
|
||
`on_venue_event` + incremental settle + async pump has never been exercised
|
||
on a live exchange.
|
||
|
||
**Flaw: E25 — Zero live tests for LIMIT/resting/async-fill paths.**
|
||
**Severity: High.** The partial-fill code path is untested in production.
|
||
|
||
### E26: Fresh-kernel reconcile tests create second kernel but share venue
|
||
|
||
**gen_live_tests.py** (fresh_kernel_reconcile_entry body)
|
||
```python
|
||
fresh = _build_fresh_kernel_from_slot(slot_data, ic=cb)
|
||
k2 = fresh.runtime.kernel
|
||
```
|
||
|
||
The `_build_fresh_kernel_from_slot` function creates a new `PinkDirectRuntime`
|
||
with a new `ExecutionKernel`. But the **venue adapter** is shared or
|
||
re-created with the same BingX backend. Two kernels making concurrent HTTP
|
||
calls to BingX through shared or separate venue adapters is exactly the
|
||
multi-threaded scenario that triggers T1 (Rust kernel UB) — except the tests
|
||
are sequential, not concurrent, so they don't trigger it.
|
||
|
||
The fresh kernel does NOT restore the venue state (open orders, positions).
|
||
The fresh kernel has a blank venue adapter state — it can't know about
|
||
previous LIMIT orders resting on the exchange. This is correct for MARKET-only
|
||
tests (no resting orders) but would fail for LIMIT tests.
|
||
|
||
**Flaw: E26 — Fresh-kernel reconcile doesn't restore venue state.**
|
||
**Severity: Medium** (would break LIMIT scenarios).
|
||
|
||
---
|
||
|
||
## Summary: Critical E2E Flaw Chain
|
||
|
||
The most dangerous E2E scenario is a **LIMIT order with partial fills** on
|
||
a live exchange:
|
||
|
||
```
|
||
1. Policy emits LIMIT ENTER [E3: can't happen — bridge drops order_type]
|
||
2. KernelIntent with order_type="LIMIT" [dead code path from step 1]
|
||
3. bingx_direct.submit_intent builds LIMIT payload [works if reached]
|
||
4. BingX accepts LIMIT, returns ACK with no fill [VenueEvent.price may be 0]
|
||
5. FSM transitions to ENTRY_WORKING [correct]
|
||
6. RESTING LIMIT sits on book [no further kernel events]
|
||
7. Next policy cycle: pump_venue_events() [E1: expensive HTTP calls]
|
||
8. Reconciled venue has no fill events [nothing to drain]
|
||
9. Repeated cycles with no progress [wasteful but safe]
|
||
10. Eventually BingX fills partially [VenueEvent arrives]
|
||
11. apply_fill PARTIAL_FILL entry branch runs [E10: entry_price = last fill, not VWAP]
|
||
12. on_venue_event settles incremental PnL [E18: fees not included]
|
||
13. persistence writes [E20/E21/E22/E23: wrong capital_before, exit_price]
|
||
14. Remaining LIMIT still rests on book [continues to step 7]
|
||
15. Eventually full fill or cancel [E17: cancel can return false REJECTED]
|
||
```
|
||
|
||
**None of steps 4-15 have live test coverage.**
|
||
|
||
---
|
||
|
||
## Complete Flaw Catalog (All Layers)
|
||
|
||
| # | Flaw | Layer | Step | Severity |
|
||
|---|------|-------|------|----------|
|
||
| E1 | Unconditional pump_venue_events wastes rate limit | Runtime | R2 | Medium |
|
||
| E2 | TOCTOU between capital snapshot and intent | Runtime | R3→R8 | Medium |
|
||
| E3 | Runtime bridge drops order_type/limit_price | Bridging | R7 | **Medium** |
|
||
| E4 | TOCTOU between exit sizing and execution | Runtime | R8 | Low |
|
||
| E5 | JSON precision drift over long runs | Bridge | R8a→R8c | Low |
|
||
| E6 | Global FFI singleton no guard vs use-after-free | Bridge | R8b | **High** |
|
||
| E7 | Same-trade-id re-entry leaves stale index entries | Rust | R8c | Low |
|
||
| E8 | EXIT uses initial_size not remaining size | Rust | R8c | **High** |
|
||
| E9 | CANCEL "accepted" before cancel actually happens | Rust | R8c | Medium |
|
||
| E10 | Entry price on multi-partial fill = last fill, not VWAP | Rust | R10a | Low |
|
||
| E11 | _legacy_intent hardcodes confidence/bars_held | Venue | R9a | Info |
|
||
| E12 | Zero fill price → zero PnL | Venue | R9c | Medium |
|
||
| E13 | Stale snapshot fallback causes wrong fill delta | Venue | R9c | Medium |
|
||
| E14 | Cancel event carries stale slot_id | Venue | R9c | Low |
|
||
| E15 | Leverage-set failure and order failure share handler | Adapter | R9b | Low |
|
||
| E16 | Instrument resolution 3x per order, O(n) scan | Adapter | R9b | Low |
|
||
| E17 | Cancel returns false REJECTED for already-cancelled | Adapter | R9b | Medium |
|
||
| E18 | PnL settlement ignores fees | Bridge | R10b | **Medium** |
|
||
| E19 | Full-slot-list read on every event = N×FFI overhead | Bridge | R10b | Low |
|
||
| E20 | All persistence rows write post-trade capital | Persistence | R12 | **High** |
|
||
| E21 | Async fill uses synthetic Decision with wrong size | Persistence | R12 | Medium |
|
||
| E22 | capital_before arithmetic reconstruction wrong | Persistence | R12 | **High** |
|
||
| E23 | trade_events exit_price = entry_price | Persistence | R12 | Low |
|
||
| E24 | Mock venue always fills synchronously | Test | — | Medium |
|
||
| E25 | Zero live tests for LIMIT/async-fill paths | Test | — | **High** |
|
||
| E26 | Fresh-kernel reconcile doesn't restore venue | Test | — | Medium |
|
||
|
||
**Total: 26 E2E flaws (4 High, 10 Medium, 11 Low, 1 Info)**
|
||
|
||
The four High-severity flaws in the E2E trace:
|
||
- **E6**: Global FFI singleton + `__del__` use-after-free — memory corruption risk
|
||
- **E8**: Exit-size overshoot — slot can get stuck (A1)
|
||
- **E20/E22**: Post-trade capital in all persistence rows + arithmetic
|
||
capital_before — ClickHouse records are misleading for accounting
|
||
- **E25**: No LIMIT/async-fill test coverage — partial-fill path is production
|
||
code with zero live validation
|
||
|
||
---
|
||
|
||
## PASS 3 — NEW FINDINGS (Deepest E2E Trace)
|
||
|
||
### F1: `process_intent` CANCEL returns "accepted" before the cancel happens — caller gets wrong `outcome.state`
|
||
|
||
**File:** `rust_backend.py:595-614`
|
||
|
||
The CANCEL path:
|
||
1. Calls `self.venue.cancel(order)` → HTTP DELETE → returns `VenueEvent[]`
|
||
2. For each event, calls `self.on_venue_event(event)` → Rust FSM transition
|
||
3. Assembles `final_outcome` from the Rust kernel's **pre-venue-event** slot state
|
||
|
||
```python
|
||
outcome = _outcome_from_payload(result["outcome"]) # Rust CANCEL accepts (slot NOT mutated yet)
|
||
# ... venue.cancel() ...
|
||
# ... on_venue_event() for each event (now slot IS mutated) ...
|
||
final_slot = self._get_slot(outcome.slot_id) # Re-reads post-mutation state
|
||
final_outcome = KernelOutcome(
|
||
accepted=outcome.accepted, # TRUE — from Rust's pre-event accept
|
||
state=final_slot.fsm_state, # IDLE — from post-event state
|
||
diagnostic_code=outcome.diagnostic_code, # "OK" — from Rust's pre-event accept
|
||
)
|
||
```
|
||
|
||
For ENTER/EXIT, the same pattern exists — the Rust kernel's `outcome` is
|
||
pre-venue. But for CANCEL the disconnect is worst: Rust returns `accepted=true`
|
||
with the slot still in `ENTRY_WORKING`, and only the subsequent
|
||
`on_venue_event(CANCEL_ACK)` transitions to `IDLE`.
|
||
|
||
**Fix:** The diagnostic code should be reconciled with the actual venue outcome,
|
||
not taken from the pre-venue Rust outcome.
|
||
|
||
**Severity: Medium**
|
||
|
||
### F2: `_last_settled_pnl` reset before `venue.submit()` — transient window
|
||
|
||
**File:** `rust_backend.py:597-604`
|
||
|
||
```python
|
||
if intent.action == KernelCommandType.ENTER and outcome.accepted:
|
||
self._last_settled_pnl[intent.slot_id] = 0.0 # reset HERE
|
||
# ... venue.submit() called below ...
|
||
```
|
||
|
||
If `venue.submit()` fails (HTTP error, rate limit), the ENTER was accepted by
|
||
the Rust FSM but no venue order was placed. The slot is stuck in
|
||
`ORDER_REQUESTED`. If the caller retries the same ENTER, `_last_settled_pnl`
|
||
is 0.0 from the first attempt — correct for a new trade.
|
||
|
||
**Real risk:** If the previous trade on this slot had realized PnL that was
|
||
never settled (impossible with incremental settle, but hypothetically), resetting
|
||
to 0.0 loses that PnL. In practice, incremental settle makes this safe.
|
||
|
||
**Severity: Medium** (retry-safe, but exposes slot-stall)
|
||
|
||
### F3: `_first_invalid_intent_field` allows `leverage=0` and `target_size=0`
|
||
|
||
**File:** `rust_backend.py:295-316`
|
||
|
||
The guard catches NaN/Inf and negative `target_size`. Does NOT catch:
|
||
- `leverage=0` or negative (Rust silently falls back to 1.0)
|
||
- `target_size=0` (submits zero-quantity order to BingX)
|
||
- `reference_price=0` (mark_price ignores non-positive)
|
||
- `limit_price=0` with `order_type="LIMIT"` (BingX rejects price=0)
|
||
|
||
The zero-target-size case: a direct `process_intent(EXIT, target_size=0.0)`
|
||
computes `exit_size = 0`, submits MARKET order with quantity=0 to BingX,
|
||
which may return an error or silent no-op.
|
||
|
||
**Severity: Low** (runtime's `_exit_intent_from_slot` prevents for EXIT; direct
|
||
kernel API users can trigger it)
|
||
|
||
### F4: `outcome.emitted_events` only contains venue events — Rust kernel's events silently dropped
|
||
|
||
**File:** `rust_backend.py:641-652`
|
||
|
||
```python
|
||
final_outcome = KernelOutcome(
|
||
emitted_events=tuple(emitted_events), # only from venue.submit()
|
||
)
|
||
```
|
||
|
||
The Rust kernel's `KernelOutcome` struct has `emitted_events` — currently always
|
||
empty because the Rust FSM never sets it. If a future change adds Rust-side
|
||
event emission, those events are silently dropped: `final_outcome` only uses
|
||
the Python-side list.
|
||
|
||
**Severity: Low** (no Rust-emitted events exist today)
|
||
|
||
### F5: `on_venue_event` does redundant FFI read of slot already returned by Rust
|
||
|
||
**File:** `rust_backend.py:698-706**
|
||
|
||
```python
|
||
def on_venue_event(self, event):
|
||
result = _get_rust().on_venue_event(...)
|
||
outcome = _outcome_from_payload(result["outcome"])
|
||
slot_payload = result.get("slot")
|
||
slot = _slot_from_payload(slot_payload) if slot_payload else self._get_slot(...)
|
||
# ...
|
||
current = self._get_slot(slot.slot_id) # REDUNDANT — slot already has this data!
|
||
self.projection.write_slot(current)
|
||
```
|
||
|
||
Line 706 re-reads `current` from the backend even though `slot` (from the
|
||
Rust result) already has the exact same data. Each redundant FFI read is
|
||
JSON serialize → C FFI → Rust serialize → C FFI → Python parse — ~100μs.
|
||
With 2-3 events per process_intent and 10 slots, ~3ms wasted per cycle.
|
||
|
||
**Severity: Low** (performance)
|
||
|
||
### F6: `_record_transitions` in `process_intent` records pre-venue transitions with `event=None`
|
||
|
||
**File:** `rust_backend.py:708, 650**
|
||
|
||
```python
|
||
# process_intent line 650:
|
||
self._record_transitions(outcome.transitions, final_slot, None) # event=None
|
||
|
||
# on_venue_event line 708:
|
||
self._record_transitions(outcome.transitions, slot, event) # event attached
|
||
```
|
||
|
||
Venue-event transitions ARE recorded individually inside each
|
||
`on_venue_event` call (line 708). The journal has all transitions. But the
|
||
pre-venue transitions (from Rust FSM before venue call) have `event=None`
|
||
attached — no event context for the journal reader.
|
||
|
||
**Severity: Informational** (diagnostic inconvenience only)
|
||
|
||
### F7: `reconcile_from_slots` writes ALL slots to projection/zinc, not just reconciled ones
|
||
|
||
**File:** `rust_backend.py:718-733**
|
||
|
||
```python
|
||
for current in slots: # iterates ALL max_slots
|
||
self.projection.write_slot(current) # writes unchanged slots too
|
||
self.zinc_plane.write_slot(current)
|
||
```
|
||
|
||
After reconcile, ALL slots are written to projection and Zinc, even if the
|
||
reconcile only modified one slot. Slots 1-9 are serialized and written with
|
||
their unchanged state. Wasteful but harmless.
|
||
|
||
Also: Rust kernel's `reconcile_slots_json` silently ignores `slot_id` out of
|
||
range — no error returned. Caller sees `accepted=true` even if no slots were
|
||
reconciled.
|
||
|
||
**Severity: Low**
|
||
|
||
### F8: `HazelcastRowWriter.put()` is synchronous with no error handling — Hazelcast failure crashes the intent
|
||
|
||
**File:** `hazelcast_projection.py:30-48**
|
||
|
||
```python
|
||
class HazelcastRowWriter:
|
||
def __call__(self, name, row):
|
||
if name.endswith("trade_events"):
|
||
self.client.get_topic(name).publish(json.dumps(row, ...))
|
||
return
|
||
self.client.get_map(name).put(key, json_safe(row)) # synchronous, no try/except
|
||
```
|
||
|
||
No try/except. Hazelcast `put()` is synchronous — blocks until the cluster
|
||
acknowledges. If Hazelcast is down, under load, or partitioned, this:
|
||
|
||
1. Blocks the calling thread (which holds the Rust kernel handle — no other
|
||
operation can proceed)
|
||
2. Raises an exception that propagates through `_set_slot()` → `process_intent()`
|
||
→ crashes the entire intent
|
||
|
||
**Severity: Medium** (Hazelcast failure in hot path stalls execution)
|
||
|
||
### F9: `RealZincPlane.write_slot()` serializes ALL slots, not just the changed one
|
||
|
||
**File:** `real_zinc_plane.py:205-212**
|
||
|
||
```python
|
||
def write_slot(self, slot):
|
||
with self._lock:
|
||
self._slot_cache[int(slot.slot_id)] = slot
|
||
payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
|
||
self._write_region(self.state_region, self._state_seq, payload)
|
||
```
|
||
|
||
Every single-slot write serializes ALL `slot_count` slots (default 10) to JSON.
|
||
With VenueOrder metadata, each slot payload can be ~1-5KB → 10-50KB per write.
|
||
This is written to Zinc shared memory on every `process_intent()` and
|
||
`on_venue_event()` call.
|
||
|
||
`InMemoryZincPlane` does NOT have this problem — it only stores the one slot.
|
||
|
||
**Severity: Low** (performance + Zinc shared-memory capacity waste)
|
||
|
||
### F10: `RealZincPlane.write_slot` zeros buffer before write — concurrent read sees empty data
|
||
|
||
**File:** `real_zinc_plane.py:255-263**
|
||
|
||
```python
|
||
def _write_region(self, region, seq, payload):
|
||
buf = region.as_buffer()
|
||
view = memoryview(buf)
|
||
view[:] = b"\x00" * len(view) # Zeros the buffer
|
||
view[: len(packet)] = packet # Writes packet
|
||
region.notify()
|
||
```
|
||
|
||
Between the zero and the write, any concurrent reader sees zeros or a truncated
|
||
packet. `_decode_packet` checks `size <= len(buf) - 16` — a partially-written
|
||
packet fails validation and returns `{}`. The reader (e.g., another thread
|
||
calling `read_slots()`) gets an empty result.
|
||
|
||
Window is microseconds but it exists. No version guard — reader always returns
|
||
whatever is in the region.
|
||
|
||
**Severity: Low** (brief window, no corruption — just empty results)
|
||
|
||
### F11: `RealZincPlane._write_region` has no partial-write recovery
|
||
|
||
**File:** `real_zinc_plane.py:255-263**
|
||
|
||
If `_encode_packet` raises (JSON serialization error), the method raises before
|
||
writing — region retains previous content. Safe.
|
||
|
||
If `view[:] = b"\x00"` fails (memory error), the region is partially zeroed.
|
||
Not recoverable. No fallback.
|
||
|
||
**Severity: Low** (memory errors are extremely rare)
|
||
|
||
### F12: `InMemoryZincPlane` intent_region grows without bound
|
||
|
||
**File:** `zinc_plane.py:83-85**
|
||
|
||
```python
|
||
def publish_intent(self, intent):
|
||
self.intent_region.append(intent) # unbounded growth
|
||
```
|
||
|
||
`self.intent_region` is `List[KernelIntent]` — grows on every `publish_intent`
|
||
call. Over thousands of policy cycles, this grows without bound.
|
||
|
||
`RealZincPlane.publish_intent()` limits to last 512 entries in shared memory,
|
||
but its `self._intent_cache` (in-memory) also grows without bound.
|
||
|
||
**Severity: Low** (memory leak — ~MB/day)
|
||
|
||
### F13: `InMemoryZincPlane` uses non-re-entrant `threading.Condition`
|
||
|
||
**File:** `zinc_plane.py:41-43**
|
||
|
||
```python
|
||
_signal: threading.Condition = field(default_factory=threading.Condition)
|
||
```
|
||
|
||
`threading.Condition` is NOT re-entrant. If any code path calls back into
|
||
`publish_intent` while holding the condition's lock — deadlock.
|
||
|
||
**Severity: Low** (no current code path triggers this, but it's a landmine)
|
||
|
||
### F14: `KernelSlotView.__setattr__` round-trips unknown fields through Rust — silently dropped
|
||
|
||
**File:** `rust_backend.py:370-395**
|
||
|
||
If a new field is added to Python's `TradeSlot` that Rust's `TradeSlot` doesn't
|
||
know about, `slot.to_dict()` includes it. `_set_slot` serializes to JSON, sends
|
||
to Rust, which deserializes with `#[serde(default)]` — unknown fields are
|
||
silently dropped. The round-trip loses data without warning.
|
||
|
||
The reverse: if Rust adds a field that Python doesn't know about,
|
||
`_slot_from_payload` ignores unknown keys. Also silently dropped.
|
||
|
||
**Severity: Low** (fields must be added to both sides atomically; no guard)
|
||
|
||
### F15: `on_venue_event` loop in `process_intent` stops on first exception — slot left in partial state
|
||
|
||
**File:** `rust_backend.py:599-610**
|
||
|
||
```python
|
||
for event in emitted_events:
|
||
evt_outcome = self.on_venue_event(event) # NO TRY/EXCEPT
|
||
```
|
||
|
||
If `self.on_venue_event(event)` raises (FFI error, null pointer, OOM), the loop
|
||
stops. Events after the failing event are never processed. The slot is in a
|
||
partial state — some events applied, some not.
|
||
|
||
**Concrete scenario:** ACK arrives first → applied. FULL_FILL arrives second
|
||
→ FFI error, exception raised. Slot is stuck in `ENTRY_WORKING` with `size=0`.
|
||
Next `process_intent(EXIT)` returns `NO_OPEN_POSITION`. **No recovery path exists.**
|
||
|
||
**Severity: High** — single exception during fill feedback leaves slot
|
||
unrecoverable. Zero defense in depth.
|
||
|
||
### F16: `venue.submit()` returning empty events leaves slot in `ORDER_REQUESTED`
|
||
|
||
**File:** `rust_backend.py:599-610**
|
||
|
||
If `venue.submit()` returns `[]` (venue rejected order with no response, or
|
||
internal error), the `for` loop doesn't run. No `on_venue_event` is called.
|
||
Slot stays in Rust's pre-venue state (`ORDER_REQUESTED`).
|
||
|
||
`final_outcome` has `accepted=true, state=ORDER_REQUESTED, emitted_events=[]`.
|
||
Caller sees "successful" but no exchange order exists. Slot stuck in
|
||
`ORDER_REQUESTED` until `pump_venue_events()` or manual reconcile.
|
||
|
||
**Severity: Medium** — silent slot stall with no error indication.
|
||
|
||
### F17: Cancel truth-based confirmation returns `REJECTED` for already-cancelled orders on GET failure
|
||
|
||
**File:** `bingx_direct.py:474-498**
|
||
|
||
```python
|
||
try:
|
||
oo = await self._client.signed_get("/openApi/swap/v2/trade/openOrders", ...)
|
||
still_open = (venue_order_id in ids)
|
||
except Exception:
|
||
still_open = None # GET failed
|
||
|
||
if still_open is False:
|
||
return {"status": "CANCELED", ...}
|
||
# still_open is None (GET failed) or True (order still on book)
|
||
# Falls through to DELETE response check
|
||
```
|
||
|
||
If the DELETE succeeded but the verification GET failed (network blip, rate limit
|
||
on the verification endpoint), `still_open=None`. The code then checks the DELETE
|
||
response. If the DELETE returned an ambiguous error (e.g., "order not found"
|
||
because it was already cancelled by another path), the status is "ERROR" —
|
||
reported as REJECTED even though the order IS cancelled.
|
||
|
||
The `bingx_venue._events_from_cancel()` emits `CANCEL_REJECT`. The Rust FSM
|
||
handles `CANCEL_REJECT` as a no-op — slot stays in `EXIT_WORKING` with no
|
||
active order. Stuck until `pump_venue_events()` or manual reconcile.
|
||
|
||
**Severity: Medium** — needs a third state: "definitely cancelled,"
|
||
"probably cancelled," "definitely not cancelled."
|
||
|
||
### F18: Leverage-set and order-submit failures share error handler — poor diagnostics
|
||
|
||
**File:** `bingx_direct.py:376-417**
|
||
|
||
```python
|
||
await self._client.signed_post("/openApi/swap/v2/trade/leverage", ...) # step A
|
||
# ...
|
||
ack_payload = await self._client.signed_post("/openApi/swap/v2/trade/order", payload) # step B
|
||
```
|
||
|
||
If step A fails (400 for invalid symbol), the exception handler at line 417
|
||
catches `BingxHttpError` and returns REJECTED. No way for the caller to know
|
||
whether the leverage set failed or the order submission failed — both go through
|
||
the same handler. The error message just says "REJECTED."
|
||
|
||
Also: if step A succeeds and step B fails, leverage was changed on the exchange
|
||
but no order was placed. System state unchanged (leverage changes don't affect
|
||
capital), but diagnostics are poor.
|
||
|
||
**Severity: Low** (correct behavior, poor diagnostics)
|
||
|
||
### F19: `_events_from_submit` stale snapshot fallback → wrong fill detection
|
||
|
||
**File:** `bingx_venue.py:375-400**
|
||
|
||
`_filled_size_from_snapshots()` diffs position quantity before and after
|
||
submit. The "before" snapshot comes from `_backend_snapshot()` which can
|
||
return stale data (E13). A stale "before" against a fresh "after" produces
|
||
a wrong diff — could be negative, zero, or larger than reality.
|
||
|
||
This wrong diff propagates to `emitted_events` — the `PARTIAL_FILL` or
|
||
`FULL_FILL` event has wrong `filled_size`. The Rust kernel's `apply_fill`
|
||
uses this wrong `filled_size` to set `slot.size`. Capital settles on the
|
||
wrong delta.
|
||
|
||
**Severity: Medium** — wrong fill size propagates to kernel state and PnL.
|
||
|
||
### F20: `__del__` frees Rust handle at unpredictable GC time — no explicit `close()`
|
||
|
||
**File:** `rust_backend.py:558-566**
|
||
|
||
```python
|
||
def __del__(self):
|
||
backend = getattr(self, "_backend", None)
|
||
if backend is not None:
|
||
try: _get_rust().destroy(backend)
|
||
except: pass
|
||
```
|
||
|
||
`ExecutionKernel` has no `close()` method. The Rust `KernelHandle` is only
|
||
freed by `__del__`, which runs on the GC thread at unpredictable time. If
|
||
any code holds a stale reference to `self._backend`, the pointer dangles
|
||
when the kernel is GC'd.
|
||
|
||
`DITAv2LauncherBundle.close()` calls `_maybe_close` on venue, zinc, and
|
||
control plane — but NOT on kernel (which has no `close()` or `disconnect()`).
|
||
The kernel is leaked until GC.
|
||
|
||
**Severity: Medium** — reliance on `__del__` for critical C resource cleanup.
|
||
|
||
### F21: `DITAv2LauncherBundle.close()` closes venue before kernel is done with it
|
||
|
||
**File:** `launcher.py:90-95**
|
||
|
||
```python
|
||
def close(self):
|
||
_maybe_close(self.venue) # Closes HTTP client
|
||
_maybe_close(self.zinc_plane) # Closes Zinc regions
|
||
```
|
||
|
||
If the kernel is mid-`process_intent` in another thread (hypothetical —
|
||
single-threaded in practice), `venue.submit()` would fail because the HTTP
|
||
client is already closed. No ordering enforcement.
|
||
|
||
**Severity: Low** (single-threaded deployment)
|
||
|
||
### F22: Silent fallback from real Zinc/Hazelcast to in-memory on error — operator unaware
|
||
|
||
**File:** `control.py:210-217`, `launcher.py:175-185`, `projection.py:30-40`
|
||
|
||
```python
|
||
def build_control_plane(...):
|
||
if real_requested:
|
||
try:
|
||
return RealZincControlPlane(...)
|
||
except Exception:
|
||
pass # SILENT — operator never knows
|
||
return ZincControlPlane(snapshot=snapshot)
|
||
```
|
||
|
||
Three places have this pattern. An operator who configures `DITA_V2_ZINC=REAL`
|
||
and Zinc isn't available gets in-memory storage without any warning, error, or
|
||
log. The `ZincPlane` protocol has no introspection method to check if it's
|
||
real or in-memory.
|
||
|
||
The same applies to Hazelcast projection and the venue adapter.
|
||
|
||
**Severity: Medium** — configuration errors are silently masked.
|
||
|
||
### F23: `VenueEvent.size` = `intent.target_size` not actual fill — wrong for multi-leg EXIT
|
||
|
||
**File:** `bingx_venue.py:410-420**
|
||
|
||
```python
|
||
base_event = VenueEvent(
|
||
size=float(intent.target_size or 0.0), # target, not fill
|
||
)
|
||
```
|
||
|
||
For an EXIT leg, `intent.target_size` is the intended exit size. The ACK
|
||
event's `size` reflects the target, not the actual fill. For fully-filled
|
||
MARKET orders, `target == fill` so it's invisible. For partially-filled
|
||
LIMIT orders, `size` on the ACK is wrong.
|
||
|
||
The fill event later has `filled_size` from the venue's `executedQty`, so
|
||
the downstream kernel uses the correct fill size. The ACK's `size` is
|
||
unused by the kernel (the kernel uses `filled_size` for PnL computation).
|
||
|
||
**Severity: Informational** (unused by kernel)
|
||
|
||
### F24: `asyncio.run()` inside async function in test generator — nested event loops
|
||
|
||
**File:** `_build_pink_extended.py:75-81`
|
||
|
||
```python
|
||
def _check_open_orders(c, vs):
|
||
r = __import__('asyncio').run(c._request_json("GET", ...))
|
||
```
|
||
|
||
`asyncio.run()` is called INSIDE an `async def` context (the test body is
|
||
async). This creates a new event loop on the current thread, suspending
|
||
pytest's asyncio loop. Nested event loops are "not recommended" per Python
|
||
docs.
|
||
|
||
**Severity: Low** (works in practice)
|
||
|
||
### F25: `_build_fresh_kernel_from_slot` leaks old kernel objects per call
|
||
|
||
**File:** `_build_pink_extended.py:95-108**
|
||
|
||
```python
|
||
def _build_fresh_kernel_from_slot(slot_data, ic=25000.0):
|
||
cfg = _build_config(ic)
|
||
b = build_launcher_bundle(venue_mode="BINGX", ...) # NEW bundle, OLD not closed
|
||
k = b.kernel
|
||
return RB(runtime=Shim(k), config=cfg)
|
||
```
|
||
|
||
Each call creates a new launcher bundle (new kernel, new Rust handle, new HTTP
|
||
client, new Zinc plane) without closing the old one. Called 4 times across the
|
||
fresh-kernel test bodies. Leaks ~50MB per call (Rust lib, HTTP connections).
|
||
|
||
**Severity: Low** (test infrastructure only)
|
||
|
||
### F26: `seen_event_ids` not cleared on re-entry — event IDs accumulate across trades
|
||
|
||
**File:** `lib.rs:672-683`
|
||
|
||
When a slot re-enters (new ENTER after previous EXIT), the Rust kernel resets
|
||
most fields (lib.rs:740-765) but does NOT clear `seen_event_ids`. The new
|
||
trade inherits the previous trade's event history up to `MAX_SEEN_EVENT_IDS`
|
||
(256). After 256 events across multiple trades, old IDs are drained.
|
||
|
||
For MARKET trading (2-4 events per trade), this takes ~60-80 trades before
|
||
draining. For LIMIT trading (many partial fills), could be 5-10 trades.
|
||
|
||
**Fix:** `slot.seen_event_ids.clear()` on ENTER.
|
||
|
||
**Severity: Low** (event ID collision across trades is astronomically unlikely)
|
||
|
||
### F27: `RealZincControlPlane.read()` parses Zinc region every call — no caching
|
||
|
||
**File:** `real_control_plane.py:88-94**
|
||
|
||
```python
|
||
def read(self):
|
||
payload = _decode_packet(self.region.as_buffer()) # JSON parse every call
|
||
control = payload.get("control")
|
||
self._snapshot = KernelControlSnapshot(**control) # reconstruct every call
|
||
return self._snapshot
|
||
```
|
||
|
||
Called by `ExecutionKernel.control` property on every `process_intent()`.
|
||
Each call re-constructs a `KernelControlSnapshot` from dict — allocating
|
||
new objects for every field. ~50μs per call. A simple cached-until-modified
|
||
pattern would eliminate all parses between writes.
|
||
|
||
**Severity: Low** (performance)
|
||
|
||
### F28: `_legacy_intent` hardcodes `confidence=1.0` and `bars_held=0`
|
||
|
||
**File:** `bingx_venue.py:270-285`
|
||
|
||
These fields are in `LegacyIntent` but unused by `submit_intent()` (which
|
||
only reads `asset`, `side`, `action`, `target_size`, `leverage`, `metadata`).
|
||
The downstream ClickHouse rows use the policy-layer `Intent`, not `LegacyIntent`,
|
||
so the hardcoded values don't reach persistence.
|
||
|
||
Only propagates through the venue adapter's internal chain. No consumer reads
|
||
them today.
|
||
|
||
**Severity: Informational**
|
||
|
||
### F29: `_slot_to_payload` in `real_zinc_plane.py` is dead code
|
||
|
||
**File:** `real_zinc_plane.py:57-59**
|
||
|
||
```python
|
||
def _slot_to_payload(slot):
|
||
data = slot.to_dict()
|
||
return data
|
||
```
|
||
|
||
Defined, never called anywhere in the file. All slot serialization calls
|
||
`slot.to_dict()` directly.
|
||
|
||
**Severity: Informational**
|
||
|
||
### F30: Duplicate `_slot_from_payload` in `real_zinc_plane.py` and `rust_backend.py`
|
||
|
||
**File:** `real_zinc_plane.py:62-112**, `rust_backend.py:270-310`
|
||
|
||
Two nearly identical implementations. The `real_zinc_plane` version manually
|
||
constructs `VenueOrder` objects (lines 63-88) with different defaults
|
||
(e.g., fallback to slot `size` if `intended_size` missing). The `rust_backend`
|
||
version delegates to `_order_from_payload` with all-default fallbacks.
|
||
|
||
If fields are added to `TradeSlot` or `VenueOrder`, both must be updated.
|
||
|
||
**Severity: Low** (code duplication risk)
|
||
|
||
---
|
||
|
||
## Complete Flaw Catalog
|
||
|
||
### All-Passes Combined
|
||
|
||
| Family | Focus | Count | Critical | High | Medium | Low | Info |
|
||
|--------|-------|-------|----------|------|--------|-----|------|
|
||
| A | Architectural (old 13, now superseded) | 15 | 0 | 2 | 0 | 2 | 11 |
|
||
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
|
||
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
|
||
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
|
||
| **Total** | | **80** | **1** | **10** | **21** | **32** | **16** |
|
||
|
||
### Most Dangerous Single Flaw: F15
|
||
|
||
An exception in `on_venue_event()` during the fill-feedback loop stops the
|
||
chain mid-apply. The ACK applied but the FILL didn't. Slot in `ENTRY_WORKING`
|
||
with no position. **No retry mechanism, no recovery path.** The slot is stuck
|
||
forever until manual intervention. Zero defense in depth — no try/except, no
|
||
undo, no validation that the slot reached a consistent state.
|
||
|
||
This is the single highest-impact E2E flaw because it requires no concurrency,
|
||
no race condition, no unusual market conditions — just a transient FFI error
|
||
during normal operation.
|