PINK: centralize all flaw findings in FLAW_ANALYSIS doc
Rewrite PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md as the central registry with combined catalog (A+T+E+F+G = 116 flaws), severity distribution, and cross-references to the TRACE doc for deep E, F, G detail. Add reciprocal cross-reference in TRACE doc header. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
This commit is contained in:
714
PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md
Normal file
714
PINK_DITAv2_FLAW_ANALYSIS_2026-05-31.md
Normal file
@@ -0,0 +1,714 @@
|
||||
# PINK DITAv2 — Structural Flaw Analysis (CENTRAL)
|
||||
|
||||
**Analysis date:** 2026-05-31
|
||||
**Scope:** Full PINK pipeline — all flaws across all modules.
|
||||
**Sources:**
|
||||
- This file (A-series): Detailed writeups for architectural flaws.
|
||||
- [PINK_DITAv2_E2E_TRACE_ANALYSIS.md](./PINK_DITAv2_E2E_TRACE_ANALYSIS.md) (E, F, G-series):
|
||||
Full E2E data-flow trace, deep bridge/Zinc/lifecycle scans.
|
||||
Every E, F, G entry below is a summary only — full detail is in the TRACE doc.
|
||||
|
||||
---
|
||||
|
||||
## Combined Catalog (All Flaws, All Passes)
|
||||
|
||||
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|
||||
|------|-------|-------|----------|------|--------|-----|------|
|
||||
| A | Architectural (detailed in this file) | 15 | 0 | 2 | 0 | 2 | 11 |
|
||||
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
|
||||
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
|
||||
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
|
||||
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
|
||||
| **Total** | | **116** | **5** | **21** | **32** | **40** | **18** |
|
||||
|
||||
---
|
||||
|
||||
## T-Series: Threading & Atomicity Flaws
|
||||
|
||||
*Full detail in TRACE doc under "Threading & Atomicity" section.*
|
||||
|
||||
| # | Flaw | Layer | Severity |
|
||||
|---|------|-------|----------|
|
||||
| T1 | `InMemoryZincPlane` thread-Condition deadlock from slot update re-entrancy | Zinc | **Critical** |
|
||||
| T2 | Thread-unsafe kernel snapshot capture for account | Bridge | **High** |
|
||||
| T3 | Re-entrant or incorrectly-scoped Rust-kernel handle usage | Bridge | **High** |
|
||||
| T4 | Consequence: `on_venue_event` PnL settle races | Bridge | **High** |
|
||||
| T5 | Access to shared `_state_seq` / `_slot_cache` in `RealZincPlane` from multiple kernel calls | Zinc | Medium |
|
||||
| T6 | `_write_region` buffer zero + notify race with concurrent reader | Zinc | Medium |
|
||||
| T7 | Publication of events in `process_intent` loop not synchronized with persist | Bridge | Medium |
|
||||
| T8 | `asyncio.run` executor skip in `_run` leads to event-loop stall | Venue | Low |
|
||||
| T9 | No thread-safe Python↔Rust ownership / lifetime protocol | Bridge | Low |
|
||||
|
||||
---
|
||||
|
||||
## E-Series: E2E Data-Flow Flaws (Pass 1)
|
||||
|
||||
*Full detail in TRACE doc under "Layer 1" through "Layer 9."*
|
||||
|
||||
| # | Flaw | Layer | Severity |
|
||||
|---|------|-------|----------|
|
||||
| E1 | `step()` calls `pump_venue_events()` every cycle unconditionally | Runtime | **High** |
|
||||
| E2 | `kernel.snapshot()["account"]` returns a fresh dict, not a live view | Bridge | Low |
|
||||
| E3 | `_decision_to_kernel_intent` drops `order_type` and `limit_price` | Runtime | **High** |
|
||||
| E4 | `_exit_intent_from_slot` trusts slot.size but slot may be stale | Runtime | **High** |
|
||||
| E5 | JSON serialization round-trip loses numeric precision | Bridge | Low |
|
||||
| E6 | `_RustKernelLib` is a global singleton — shared across all kernels | Bridge | Low |
|
||||
| E7 | ENTER handler silently allows re-entry with same trade_id | Rust | **High** |
|
||||
| E8 | EXIT handler uses `initial_size` not current size | Rust | **High** |
|
||||
| E9 | CANCEL handler returns diagnostic even when nothing happened | Rust | Low |
|
||||
| E10 | `apply_fill` entry branch double-sets `active_entry_order` | Rust | Low |
|
||||
| E11 | `_legacy_intent()` is a lossy conversion | Venue | Low |
|
||||
| E12 | `_events_from_submit()` price fallback chain can lose venue price | Venue | Low |
|
||||
| E13 | `_backend_snapshot()` timeout returns stale data | Venue | Medium |
|
||||
| E14 | `_events_from_cancel` uses stale `slot_id` from order metadata | Venue | Low |
|
||||
| E15 | Submit sets leverage via separate HTTP call | Adapter | Medium |
|
||||
| E16 | `_format_quantity`/`_format_price` may use zero tick/step | Adapter | Medium |
|
||||
| E17 | Cancel uses truth-based confirmation — can mask real errors | Adapter | Medium |
|
||||
| E18 | `on_venue_event` settles PnL incrementally — fees never included | Bridge | Medium |
|
||||
| E19 | `observe_slots` called with ALL slots, not just changed ones | Bridge | Low |
|
||||
| E20 | `_capital()` reads live from `AccountProjection` — stale row risk | Persistence | Low |
|
||||
| E21 | `persist_fill_events()` synthesizes fake Decision/Intent | Persistence | Medium |
|
||||
| E22 | `_write_trade_exit_leg` capital_before uses arithmetic reconstruction | Persistence | Medium |
|
||||
| E23 | `_write_trade_event` uses entry_price as exit_price | Persistence | Medium |
|
||||
| E24 | Mock venue always emits fill on `partial_fill_ratio > 0` | Test | Low |
|
||||
| E25 | Test scenarios use MARKET-only `_si()` helper — no LIMIT tests | Test | Low |
|
||||
| E26 | Fresh-kernel reconcile tests create second kernel but share venue | Test | Low |
|
||||
|
||||
---
|
||||
|
||||
## F-Series: Deep Bridge/Zinc/Lifecycle Flaws (Pass 3)
|
||||
|
||||
*Full detail in TRACE doc under "PASS 3 — NEW FINDINGS."*
|
||||
|
||||
| # | Flaw | Layer | Severity |
|
||||
|---|------|-------|----------|
|
||||
| F1 | CANCEL returns "accepted" before cancel happens — stale diagnostic_code | Bridge | Medium |
|
||||
| F2 | `_last_settled_pnl` reset before `venue.submit()` — transient window | Bridge | Medium |
|
||||
| F3 | `_first_invalid_intent_field` allows `leverage=0` and `target_size=0` | Bridge | Low |
|
||||
| F4 | `outcome.emitted_events` only from venue — Rust kernel events dropped | Bridge | Low |
|
||||
| F5 | `on_venue_event` redundant FFI read of slot already returned by Rust | Bridge | Low |
|
||||
| F6 | `process_intent` records pre-venue transitions with `event=None` | Bridge | Info |
|
||||
| F7 | `reconcile_from_slots` writes ALL slots to projection/zinc | Bridge | Low |
|
||||
| F8 | `HazelcastRowWriter.put()` synchronous, no error handling — crashes intent | Projection | **Medium** |
|
||||
| F9 | `RealZincPlane.write_slot()` serializes ALL slots, not just changed one | Zinc | Low |
|
||||
| F10 | `RealZincPlane` zeros buffer before write — concurrent read sees empty | Zinc | Low |
|
||||
| F11 | `RealZincPlane._write_region` no partial-write recovery | Zinc | Low |
|
||||
| F12 | `InMemoryZincPlane` intent_region grows without bound | Zinc | Low |
|
||||
| F13 | `InMemoryZincPlane` uses non-re-entrant `threading.Condition` | Zinc | Low |
|
||||
| F14 | `KernelSlotView.__setattr__` round-trips unknown fields — silently dropped | Bridge | Low |
|
||||
| F15 | `on_venue_event` loop stops on first exception — slot left in partial state | Bridge | **High** |
|
||||
| F16 | `venue.submit()` returning empty events leaves slot in ORDER_REQUESTED | Bridge | Medium |
|
||||
| F17 | Cancel truth-based confirmation returns REJECTED for already-cancelled orders | Adapter | Medium |
|
||||
| F18 | Leverage-set and order-submit failures share error handler | Adapter | Low |
|
||||
| F19 | `_events_from_submit` stale snapshot fallback → wrong fill detection | Venue | Medium |
|
||||
| F20 | `__del__` frees Rust handle at unpredictable GC time — no explicit close() | Bridge | **Medium** |
|
||||
| F21 | `DITAv2LauncherBundle.close()` closes venue before kernel is done | Launcher | Low |
|
||||
| F22 | Silent fallback from real Zinc/Hazelcast to in-memory — operator unaware | Launcher | **Medium** |
|
||||
| F23 | `VenueEvent.size` = `intent.target_size` not actual fill | Venue | Info |
|
||||
| F24 | `asyncio.run()` inside async function in test generator | Test | Low |
|
||||
| F25 | `_build_fresh_kernel_from_slot` leaks old kernel objects per call | Test | Low |
|
||||
| F26 | `seen_event_ids` not cleared on re-entry — accumulates across trades | Rust | Low |
|
||||
| F27 | `RealZincControlPlane.read()` parses Zinc region every call — no caching | Control | Low |
|
||||
| F28 | `_legacy_intent` hardcodes confidence=1.0, bars_held=0 | Venue | Info |
|
||||
| F29 | `_slot_to_payload` in real_zinc_plane.py is dead code | Zinc | Info |
|
||||
| F30 | Duplicate `_slot_from_payload` in real_zinc_plane.py and rust_backend.py | Zinc | Low |
|
||||
|
||||
---
|
||||
|
||||
## G-Series: Domain Scans — Rust Kernel, Config, Persistence, Lifecycle (Pass 4)
|
||||
|
||||
*Full detail in TRACE doc under "PASS 4 — SYSTEMATIC DOMAIN SCANS."*
|
||||
|
||||
| # | Flaw | Layer | Severity |
|
||||
|---|------|-------|----------|
|
||||
| G1 | EXIT_RESIDUAL action missing from Rust KernelCommandType enum | Rust | **Critical** |
|
||||
| G2 | `into_c_string` unwrap() panics on NUL byte in FFI string | Rust | **Critical** |
|
||||
| G3 | EXIT hardcodes prev_state=POSITION_OPEN — allows backward FSM transition | Rust | **Critical** |
|
||||
| G4 | `consume_exit_leg` stale `all_legs_done` variable — wrong branch after last leg | Rust | **Critical** |
|
||||
| G5 | `realized_pnl` unbounded f64 — overflows to inf at extreme values | Rust | **High** |
|
||||
| G6 | `mark_price` produces unbounded unrealized_pnl — no result guard | Rust | **High** |
|
||||
| G7 | ENTER no is_finite() guard on target_size | Rust | **High** |
|
||||
| G8 | `reconcile_slots_json` no dedup or bounds validation | Rust | **High** |
|
||||
| G9 | `exchange_order_id` update targets wrong order — exit cancel broken | Rust | **High** |
|
||||
| G10 | CANCEL diagnostic always says NO_ACTIVE_EXIT_ORDER | Rust | **High** |
|
||||
| G11 | `apply_fill` overwrites intended_size with slot.size | Rust | Medium |
|
||||
| G12 | No max leverage cap enforced by kernel | Rust | Medium |
|
||||
| G13 | `resolve_slot` fallback returns unwrap_or(0) — misroutes events | Rust | Medium |
|
||||
| G14 | `commit_slot` silently ignores out-of-bounds slot_id | Rust | Medium |
|
||||
| G15 | Zero `__post_init__` validators on all 16 config dataclasses (127 fields) | Config | **High** |
|
||||
| G16 | DITA_V2_DEBUG_CLICKHOUSE defaults to True when unset | Config | Info |
|
||||
| G17 | String config fields — Zinc region injection risk | Config | Medium |
|
||||
| G18 | `exit_leg_ratios` no sum-to-1 validation | Config | Low |
|
||||
| G19 | RealZincControlPlane.read() no sequence check — torn-read risk | Config | Low |
|
||||
| G20 | ClickHouse journal strategy/db env vars — SQL injection risk | Config | Low |
|
||||
| G21 | entry_price used as exit_price in trade_events — data loss | Persistence | **High** |
|
||||
| G22 | active_leg_index → entry_bar semantic mis-mapping | Persistence | Medium |
|
||||
| G23 | capital_before arithmetic absorbs cross-slot PnL | Persistence | Medium |
|
||||
| G24 | Recovery trade_reconstruction always has trade_id="" | Persistence | Medium |
|
||||
| G25 | seen_event_ids, exit_leg_ratios, VenueOrder, metadata not in flat CH tables | Persistence | Low |
|
||||
| G26 | _safe_float silently converts NaN/None/Inf to 0.0 | Persistence | Low |
|
||||
| G27 | build_launcher_bundle no exception safety — prior resources leak | Lifecycle | **High** |
|
||||
| G28 | RealZincPlane/RealZincControlPlane no __del__ — SHM orphaned | Lifecycle | **High** |
|
||||
| G29 | Zero signal handlers — no cleanup on SIGTERM/SIGINT | Lifecycle | **High** |
|
||||
| G30 | ExecutionKernel has no close() — relies on __del__ for Rust handle | Lifecycle | **High** |
|
||||
| G31 | Hazelcast projection never closed | Lifecycle | Medium |
|
||||
| G32 | _maybe_close() break skips second method | Lifecycle | Low |
|
||||
| G33 | close() not idempotent for RealZinc components | Lifecycle | Low |
|
||||
| G34 | No context manager on DITAv2LauncherBundle | Lifecycle | Low |
|
||||
| G35 | BingxVenueAdapter.connect() never called | Lifecycle | Info |
|
||||
| G36 | Only one try/finally in entire codebase | Lifecycle | **High** |
|
||||
|
||||
---
|
||||
|
||||
## A-Series: Architectural Flaws (detailed writeups)
|
||||
|
||||
*These are the original architectural flaws with full analysis.*
|
||||
|
||||
---
|
||||
|
||||
### Flaw A1: Exit-size overshoot on multi-leg with initial_size > remaining size
|
||||
|
||||
**Location:** `_rust_kernel/src/lib.rs` lines ~770-780 (EXIT handler in `process_intent`)
|
||||
|
||||
**Severity:** **High**
|
||||
|
||||
**Nature:** Logic error — wrong base for exit-size computation.
|
||||
|
||||
### Downstream effect
|
||||
|
||||
The EXIT handler computes the exit size as `base_size * exit_ratio` where:
|
||||
```rust
|
||||
let base_size = if slot.initial_size > 0.0 { slot.initial_size } else { slot.size };
|
||||
```
|
||||
|
||||
After partial fills (e.g., two separate MARKET exit legs), `initial_size` is still the
|
||||
**original** entry size while `slot.size` has been reduced by previous legs. If the
|
||||
cumulative leg ratios don't sum to exactly 1.0 (or the final ratio is not 1.0), the
|
||||
computed exit size can exceed the remaining position.
|
||||
|
||||
The venue adapter clamps to actual position via `reduceOnly`, but the kernel's _own_
|
||||
accounting reduces `slot.size` by the fill size, not by the intended exit size. The
|
||||
slot can therefore go negative (`slot.size < 0`) if the fill is larger than remaining.
|
||||
|
||||
### Exact trigger
|
||||
|
||||
1. Enter SHORT, size=1.0, `initial_size=1.0`, ratios=(0.6, 0.6, 1.0) — note ratios sum > 1.0
|
||||
2. EXIT leg 0: `exit_size = 1.0 * 0.6 = 0.6`. Fill consumes 0.6. Slot size goes to 0.4.
|
||||
3. EXIT leg 1: `exit_size = 1.0 * 0.6 = 0.6`. But remaining is 0.4. Requests 0.6.
|
||||
4. BingX `reduceOnly` clamps fill to 0.4. Slot size goes to 0.0.
|
||||
5. EXIT leg 2 (ratio 1.0): `exit_size = 1.0 * 1.0 = 1.0`. Slot is already at 0.0.
|
||||
Kernel returns `NO_OPEN_POSITION` — the final EXIT is rejected because `slot.closed`
|
||||
was not set by the previous fill (it was a partial close, not terminal).
|
||||
6. Slot is at size=0.0, `!slot.closed`, no active orders, but `!slot.is_free()` because
|
||||
`size <= 0.0` is true but `fsm_state != IDLE/CLOSED` — slot is **stuck** in
|
||||
`POSITION_OPEN` with zero size.
|
||||
|
||||
This is **not** purely a mis-sized ratio problem. With MARKET orders that fill fully,
|
||||
even correct ratios can leave the slot stuck if the fill price differs from the
|
||||
intended-size price and the venue adjusts fill quantity.
|
||||
|
||||
### Fix strategy
|
||||
|
||||
Use `slot.size` directly as the base (not `initial_size`):
|
||||
```rust
|
||||
let exit_size = (slot.size * exit_ratio).max(0.0).min(slot.size);
|
||||
```
|
||||
|
||||
This guarantees the exit never requests more than the remaining position, regardless
|
||||
of cumulative ratio math. The venue still clamps, but the kernel's intent is correct.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A2: Misleading CANCEL diagnostic code on entry-only slots
|
||||
|
||||
**Location:** `_rust_kernel/src/lib.rs` lines ~798-810 (CANCEL rejection path)
|
||||
|
||||
**Severity:** **Low**
|
||||
|
||||
**Nature:** Diagnostic pollution — wrong error code.
|
||||
|
||||
### Downstream effect
|
||||
|
||||
When a CANCEL intent arrives and **neither** `active_exit_order` nor
|
||||
`active_entry_order` is cancellable, the kernel returns:
|
||||
```rust
|
||||
diagnostic_code: KernelDiagnosticCode::NO_ACTIVE_EXIT_ORDER
|
||||
```
|
||||
|
||||
But the reason may be that there's no active entry order either, or the FSM state
|
||||
doesn't permit cancellation. The diagnostic name suggests an exit-order-specific
|
||||
problem when the failure is generic "nothing to cancel."
|
||||
|
||||
### Fix
|
||||
|
||||
Change to a generic `NO_ACTIVE_ORDER` diagnostic or `SLOT_IDLE` when the slot is
|
||||
already in IDLE. `NO_ACTIVE_EXIT_ORDER` is misleading for a slot that has never had
|
||||
any order.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A3: Float-accumulated slot.size after partial fills can go negative
|
||||
|
||||
**Location:** `_rust_kernel/src/lib.rs` lines ~1365-1370 (apply_fill exit path)
|
||||
|
||||
**Severity:** **Low**
|
||||
|
||||
**Nature:** Numerical precision edge case.
|
||||
|
||||
### Code path
|
||||
|
||||
```rust
|
||||
slot.size = (slot.size - fill_size).max(0.0);
|
||||
```
|
||||
|
||||
This clamps to zero, which is correct. But if the venue fills *more* than requested
|
||||
(on BingX, this can happen with market orders where the fill walks the book), the
|
||||
slot sees `fill_size > intended_size`. The `max(0.0)` prevents negative, but the
|
||||
slot then reports `size=0.0` with `!closed` and an FSM state that's not IDLE.
|
||||
|
||||
The `is_free()` check requires `size <= 0.0` AND `fsm_state in {IDLE, CLOSED}`. A
|
||||
slot with `size=0.0` and `fsm_state=POSITION_OPEN` is stuck — no EXIT will be
|
||||
accepted and no ENTER can start.
|
||||
|
||||
### Trigger
|
||||
|
||||
Submit an EXIT for 0.6 of remaining 0.6. BingX fills 0.8 (market order walks the
|
||||
book, overshoots). `fill_size=0.8`, `slot.size = (0.6 - 0.8).max(0.0) = 0.0`.
|
||||
Slot is now size=0, fsm_state=EXIT_WORKING (or POSITION_OPEN), `closed=false`.
|
||||
|
||||
### Fix
|
||||
|
||||
When `slot.size <= 1e-12` after a fill and the slot is in an exit-related state,
|
||||
force transition to CLOSED/IDLE regardless of leg index:
|
||||
```rust
|
||||
if slot.size <= 1e-12 {
|
||||
slot.closed = true;
|
||||
slot.fsm_state = TradeStage::CLOSED;
|
||||
slot.active_exit_order = None;
|
||||
slot.active_entry_order = None;
|
||||
return;
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Flaw A4: Entry price is clobbered by mark_price if called before fill arrives
|
||||
|
||||
**Location:** `_rust_kernel/src/lib.rs` lines ~432-436 (mark_price) and ~1390 (apply_fill entry branch)
|
||||
|
||||
**Severity:** **Medium**
|
||||
|
||||
**Nature:** Accounting accuracy — incorrect PnL base.
|
||||
|
||||
### Code path
|
||||
|
||||
```rust
|
||||
// In mark_price:
|
||||
if self.entry_price <= 0.0 {
|
||||
self.entry_price = price; // Seeds entry_price from mark before fill
|
||||
}
|
||||
|
||||
// In apply_fill (entry):
|
||||
if event.price > 0.0 {
|
||||
slot.entry_price = event.price; // Overwrites with actual fill price
|
||||
}
|
||||
```
|
||||
|
||||
The `mark_price` path seeds `entry_price` from a market price when the slot has no
|
||||
fill yet. The `apply_fill` entry path correctly overwrites with the actual fill price.
|
||||
So in the normal flow this is harmless — the fill overwrites the mark.
|
||||
|
||||
**However**, consider this sequence:
|
||||
1. ENTER intent accepted → slot goes `ORDER_REQUESTED`, `entry_price = 0.0`
|
||||
2. `runtime.step()` calls `kernel.mark_price(snapshot.symbol, snapshot.price)` → sets `entry_price = 100.0`
|
||||
3. `on_venue_event(ORDER_ACK)` → `ENTRY_WORKING`, `entry_price` still `100.0`
|
||||
4. `on_venue_event(PARTIAL_FILL)` → `apply_fill` sets `entry_price = 99.5` (fill price)
|
||||
5. Unrealized PnL from step 2-3 used a mark price of 100.0, not the fill price of 99.5
|
||||
|
||||
This is a transient mis-valuation window. It corrects itself on the next `observe_slots`
|
||||
call, but intra-step readers see wrong unrealized PnL. Not critical because:
|
||||
- `account.snapshot.unrealized_pnl` uses the slot's `unrealized_pnl`, not the mark
|
||||
- Realized PnL is computed from actual fill prices
|
||||
- The window lasts at most one scan cycle (~5s)
|
||||
|
||||
### Fix
|
||||
|
||||
Don't set `entry_price` from `mark_price` when there's no fill:
|
||||
```rust
|
||||
fn mark_price(&mut self, price: f64) {
|
||||
if !price.is_finite() || price <= 0.0 { return; }
|
||||
// Don't seed entry_price — leave it at 0.0 until a fill arrives
|
||||
if self.entry_price <= 0.0 || self.size <= 0.0 {
|
||||
self.unrealized_pnl = 0.0;
|
||||
return;
|
||||
}
|
||||
// ... normal PnL computation
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Flaw A5: Capital-before computation is arithmetic not snapshot-based
|
||||
|
||||
**Location:** `pink_clickhouse.py` lines ~761-762 (`_write_trade_exit_leg`) and ~822-823 (`_write_trade_event`)
|
||||
|
||||
**Severity:** **High**
|
||||
|
||||
**Nature:** Accounting accuracy — wrong capital_before under multi-slot or intervening events.
|
||||
|
||||
### Code pattern (appears in two places)
|
||||
|
||||
```python
|
||||
capital_after = self._capital()
|
||||
capital_before = capital_after - pnl_leg # In _write_trade_exit_leg
|
||||
capital_before = capital_after - pnl # In _write_trade_event
|
||||
```
|
||||
|
||||
This reconstructs `capital_before` by subtracting the current leg's PnL from the
|
||||
current capital. This is **only correct** if:
|
||||
1. No other slots settled PnL between this leg and the previous one
|
||||
2. No capital corrections (reconcile, manual override) happened between legs
|
||||
3. No fees were deducted between legs
|
||||
|
||||
With multi-slot (PINK configurable `max_slots > 1`), a concurrent trade on slot 1
|
||||
that closes between slot 0's exit legs will have its PnL baked into `capital_after`,
|
||||
making `capital_before = capital_after - pnl_leg` wrong.
|
||||
|
||||
### Fix
|
||||
|
||||
Maintain a per-trade `capital_before_leg` snapshot taken at the moment of the
|
||||
first fill event for each trade, advancing it by the realized PnL of each leg:
|
||||
```python
|
||||
self._leg_state[trade_id]["capital_before"] = prev.get("capital_after", capital_after - pnl_leg)
|
||||
self._leg_state[trade_id]["capital_after"] = capital_after
|
||||
```
|
||||
|
||||
And use `prev["capital_before"]` for the row, not `capital_after - pnl_leg`.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A6: Reconcile accoun(t) reseeds capital from kernel, not exchange
|
||||
|
||||
**Location:** `pink_direct.py` lines ~597-630 (`recover_account`) and docstring of `reconcile_account`
|
||||
|
||||
**Severity:** **Medium**
|
||||
|
||||
**Nature:** Operational drift — capital is never verified against exchange truth in hot loop.
|
||||
|
||||
### The gap
|
||||
|
||||
`reconcile_account()` (line 632) has this docstring:
|
||||
```
|
||||
Periodic exchange-led account sync.
|
||||
Capital is re-seeded from the exchange balance as a guard against long-running drift
|
||||
```
|
||||
|
||||
But the actual implementation:
|
||||
```python
|
||||
async def reconcile_account(self, ...) -> dict[str, Any]:
|
||||
return await self.recover_account(...)
|
||||
|
||||
async def recover_account(self, ...) -> dict[str, Any]:
|
||||
capital = float(self.kernel.account.snapshot.capital or 25000.0)
|
||||
_reconcile_position_slot(self.kernel, capital, slot_id=0)
|
||||
```
|
||||
|
||||
It passes the **kernel's own capital** to `_reconcile_position_slot`, which then
|
||||
overwrites `kernel.account.snapshot.capital` with... the same value. No exchange
|
||||
balance poll ever overwrites capital.
|
||||
|
||||
`connect()` at line 224 does the same — it passes `initial_capital` (an env default),
|
||||
not the exchange balance. The exchange balance is never read for capital seeding
|
||||
in the current code path. `_reconcile_position_slot` does call
|
||||
`venue.open_positions()`, but it only reads positions, not capital.
|
||||
|
||||
### Effect
|
||||
|
||||
Capital drift (caused by fees the kernel doesn't track, unrealized PnL mis-valuation,
|
||||
or any other systematic error) accumulates monotonically. There is no mechanism to
|
||||
detect or correct drift. Over weeks of live trading, the kernel's capital snapshot
|
||||
can diverge arbitrarily from the exchange's actual balance.
|
||||
|
||||
### Fix
|
||||
|
||||
Either:
|
||||
1. Make `_reconcile_position_slot` read the exchange balance and use it for
|
||||
capital reseeding (the docstring claims it does this already), or
|
||||
2. Add a separate capital-verification path that surfaces the delta between
|
||||
kernel capital and exchange balance as an anomaly, even if it doesn't auto-correct.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A7: No fee tracking in kernel accounting
|
||||
|
||||
**Location:** `rust_backend.py` lines ~540-545 (on_venue_event settle), `bingx_direct.py` submit_intent return
|
||||
|
||||
**Severity:** **Medium**
|
||||
|
||||
**Nature:** Accounting accuracy — fees are invisible to capital tracking.
|
||||
|
||||
### Downstream effect
|
||||
|
||||
When a trade closes, the kernel computes:
|
||||
```rust
|
||||
realized_pnl = delta * notional
|
||||
```
|
||||
|
||||
This is **gross** PnL. BingX charges fees on every fill (taker ~0.04%, maker ~0.02%).
|
||||
These fees are never subtracted from the kernel's realized PnL. Over 100 trades with
|
||||
$100 average notional at 0.04%, the cumulative error is $4 — negligible. Over 10,000
|
||||
trades at 10x leverage and $50k average notional, the error is $200k.
|
||||
|
||||
The `BingxDirectExecutionAdapter` does return `ExecutionReceipt` with fill data,
|
||||
but `bingx_venue._events_from_submit()` only reads `price` and `filled_size` —
|
||||
commission/fee fields are ignored.
|
||||
|
||||
### Fix
|
||||
|
||||
1. Read fee/commission from the BingX ack payload in `_events_from_submit()`
|
||||
2. Pass fees through `VenueEvent.metadata["fee"]`
|
||||
3. In the Rust kernel's `apply_fill`, subtract the fee from realized PnL:
|
||||
```rust
|
||||
let fee = event.metadata.get("fee").and_then(|v| v.as_f64()).unwrap_or(0.0);
|
||||
slot.realized_pnl += realized - fee;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Flaw A8: ENTER intent silently defaults leverage to 1.0 on bad input
|
||||
|
||||
**Location:** `_rust_kernel/src/lib.rs` lines ~745-748
|
||||
|
||||
**Severity:** **Low**
|
||||
|
||||
**Nature:** Silent fallback — corrupt input produces a trade, not a rejection.
|
||||
|
||||
```rust
|
||||
slot.leverage = if intent.leverage.is_finite() && intent.leverage > 0.0 {
|
||||
intent.leverage
|
||||
} else {
|
||||
1.0
|
||||
};
|
||||
```
|
||||
|
||||
A NaN, zero, negative, or infinite leverage value silently trades at 1x instead of
|
||||
rejecting the intent. The Python bridge does validate `_first_invalid_intent_field()`
|
||||
which catches NaN/inf, but it doesn't catch `leverage <= 0.0` (it only checks
|
||||
`not math.isfinite(value)`).
|
||||
|
||||
### Fix
|
||||
|
||||
Add `leverage <= 0.0` to the Python bridge's invalid-intent check. The Rust kernel
|
||||
should still have the `1.0` fallback as a defensive measure, but the bridge should
|
||||
prevent bad leverages from reaching Rust in the first place.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A9: Mock venue submit condition convoluted — dead code paths
|
||||
|
||||
**Location:** `mock_venue.py` lines ~60-90
|
||||
|
||||
**Severity:** **Informational**
|
||||
|
||||
**Nature:** Code clarity — confusing condition logic.
|
||||
|
||||
```python
|
||||
if self.scenario.emit_ack_before_fill or not self.scenario.emit_fill_on_submit:
|
||||
events.append(ack_event)
|
||||
if self.scenario.emit_fill_on_submit or self.scenario.partial_fill_ratio > 0:
|
||||
# ... fill events
|
||||
```
|
||||
|
||||
The condition logic is confusing:
|
||||
- When `emit_ack_before_fill=True` and `emit_fill_on_submit=True`: both branches run → ACK + fill
|
||||
- When `emit_ack_before_fill=False` and `emit_fill_on_submit=True`: first branch runs because
|
||||
`not True = False`, so `False or False = False` → no ACK. Second branch runs → fill only.
|
||||
This produces a fill without an ACK, which is **not** a realistic venue scenario.
|
||||
- When `partial_fill_ratio=1.0` (default): second branch runs and emits a `FULL_FILL` event
|
||||
even when `emit_fill_on_submit=False`, because `0.0 or 1.0 > 0 = True`.
|
||||
|
||||
The partial fill ratio check should be gated on `emit_fill_on_submit`:
|
||||
```python
|
||||
should_emit_fill = self.scenario.emit_fill_on_submit or (
|
||||
is_entry and self.scenario.entry_partial_fill_ratio > 0
|
||||
) or (
|
||||
not is_entry and self.scenario.exit_partial_fill_ratio > 0
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Flaw A10: Pump venue events on every step cycle — expensive for MARKET-only flows
|
||||
|
||||
**Location:** `pink_direct.py` lines ~318-374 (`pump_venue_events`), called at line ~436
|
||||
|
||||
**Severity:** **Medium**
|
||||
|
||||
**Nature:** Operational overhead — unnecessary exchange HTTP calls.
|
||||
|
||||
### The problem
|
||||
|
||||
`step()` calls `pump_venue_events()` **every cycle**, which calls `venue.reconcile()`.
|
||||
For `BingxVenueAdapter`, `reconcile()` calls `_backend_snapshot()` which does up to 5
|
||||
HTTP requests (balance, positions, open orders) in parallel. For a MARKET-only workflow
|
||||
where orders fill synchronously within `process_intent()`, there are **no** late fills
|
||||
to drain.
|
||||
|
||||
On BingX VST, the rate limit is ~10 requests/second across all endpoints. Each
|
||||
`pump_venue_events()` call consumes 5+ of that budget. At a 5-second policy cycle,
|
||||
this is 60 requests/minute — 60% of the rate budget — just to poll for fills that
|
||||
don't exist.
|
||||
|
||||
### Fix
|
||||
|
||||
Gate the pump on whether the previous cycle submitted a LIMIT order:
|
||||
```python
|
||||
self._has_resting_order = any(
|
||||
o.status not in (VenueOrderStatus.FILLED, VenueOrderStatus.CANCELED)
|
||||
for o in kernel.open_orders()
|
||||
)
|
||||
if self._has_resting_order:
|
||||
await self.pump_venue_events(snapshot, market_state=market_state)
|
||||
```
|
||||
|
||||
Or add a config flag `async_fill_mode: bool = False`.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A11: VenueAdapter.submit() blocks the event loop
|
||||
|
||||
**Location:** `bingx_venue.py` lines ~225-233 (`_run`)
|
||||
|
||||
**Severity:** **Medium**
|
||||
|
||||
**Nature:** Runtime safety — synchronous call in async context.
|
||||
|
||||
```python
|
||||
def _run(self, result: Any) -> Any:
|
||||
if inspect.isawaitable(result):
|
||||
try:
|
||||
asyncio.get_running_loop()
|
||||
except RuntimeError:
|
||||
return asyncio.run(result)
|
||||
pool = self._get_executor()
|
||||
return pool.submit(asyncio.run, result).result()
|
||||
```
|
||||
|
||||
When called from `step()` (which is an async function), `_run` submits the async
|
||||
`submit_intent()` to a thread pool, runs it with `asyncio.run()`, then calls
|
||||
`.result()` which blocks the current thread until complete. The BingX HTTP call
|
||||
can take 1-5 seconds depending on network latency and exchange load.
|
||||
|
||||
During this block, the event loop **cannot** process other async tasks (data feed
|
||||
updates, health checks, signal processing). In a single-runtime deployment, this
|
||||
stalls the entire policy cycle.
|
||||
|
||||
### Fix
|
||||
|
||||
Make `process_intent` in `ExecutionKernel` accept an async venue callback, or
|
||||
make `BingxVenueAdapter` truly async (not sync-with-thread-bridge). For now,
|
||||
at minimum the PINK runtime should run `step()` in an executor to avoid blocking
|
||||
the main event loop.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A12: Stale KernelStateView slot references after reconcile
|
||||
|
||||
**Location:** `rust_backend.py` lines ~350-365 (`KernelStateView.refresh`)
|
||||
|
||||
**Severity:** **Low**
|
||||
|
||||
**Nature:** Stale data — view not rebuilt on reconcile.
|
||||
|
||||
```python
|
||||
class KernelStateView:
|
||||
def __init__(self, kernel):
|
||||
self.slots = [KernelSlotView(kernel, slot_id) for slot_id in range(kernel.max_slots)]
|
||||
# ...
|
||||
|
||||
def refresh(self) -> None:
|
||||
snapshot = self._kernel._snapshot_backend()
|
||||
self.active_trade_index = dict(snapshot.get("active_trade_index", {}))
|
||||
self.venue_order_index = dict(snapshot.get("venue_order_index", {}))
|
||||
self.client_order_index = dict(snapshot.get("client_order_index", {}))
|
||||
```
|
||||
|
||||
`refresh()` updates the index maps but does **not** recreate `self.slots`. The slot
|
||||
views in `self.slots` are live proxies (they read through `_get_slot` each time),
|
||||
so slot data is current. But if `max_slots` changes (it shouldn't, but it's mutable)
|
||||
or if slots are re-indexed by a reconcile, the view list is wrong.
|
||||
|
||||
Not critical because `max_slots` is set at init and never changes, but worth
|
||||
fixing for robustness.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A13: `persist_fill_events` uses current price as exit price
|
||||
|
||||
**Location:** `pink_clickhouse.py` lines ~408
|
||||
|
||||
**Severity:** **Low**
|
||||
|
||||
**Nature:** Historical accuracy — logged price may not match fill price.
|
||||
|
||||
```python
|
||||
price = next((float(getattr(e, "price", 0.0) or 0.0) for e in event_list
|
||||
if getattr(e, "price", 0.0)), 0.0) or self._slot_entry_price(slot)
|
||||
```
|
||||
|
||||
This correctly reads from the event's price. But `decision.reference_price` at line
|
||||
417 falls back to this price, which is the fill price. The trade_event row at line
|
||||
835 uses `exit_price = slot_dict.get("entry_price", ...)` — which is the **entry**
|
||||
price, not the exit price. The trade_event always shows exit_price == entry_price.
|
||||
|
||||
This means `trade_events` in ClickHouse will never show a realistic exit price
|
||||
for the persisted trade, breaking any PnL reconstruction that relies on
|
||||
`(exit_price - entry_price) * size * leverage`.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A14: `_write_position_state` maps active_leg_index to entry_bar
|
||||
|
||||
**Location:** `pink_clickhouse.py` line ~673
|
||||
|
||||
**Severity:** **Low**
|
||||
|
||||
**Nature:** Semantic mismatch — wrong field mapping.
|
||||
|
||||
```python
|
||||
"entry_bar": int(slot_dict.get("active_leg_index", 0) or 0),
|
||||
```
|
||||
|
||||
`active_leg_index` is the index into the exit-leg-ratios array (which leg is being
|
||||
exited next). It has nothing to do with how many bars the position has been held.
|
||||
When a position opens, `active_leg_index` is 0. After the first exit leg, it
|
||||
advances to 1. Neither value is a bar count.
|
||||
|
||||
`entry_bar` should be `bars_held` from the intent/decision, or a computed value
|
||||
from `entry_time` to now.
|
||||
|
||||
---
|
||||
|
||||
### Flaw A15: `persist_recovery_state` passes account dict as slot dict
|
||||
|
||||
**Location:** `pink_clickhouse.py` lines ~447-460
|
||||
|
||||
**Severity:** **Low**
|
||||
|
||||
**Nature:** Wrong data — account snapshot used where slot data is expected.
|
||||
|
||||
```python
|
||||
def persist_recovery_state(self, *, snapshot, acc_dict, ...):
|
||||
slot_dict = acc_dict or {} # ← acc_dict is an account snapshot, not a slot
|
||||
self._write_position_state(..., slot_dict={}, ...) # ← correctly uses empty dict
|
||||
self._write_trade_reconstruction(
|
||||
snapshot,
|
||||
trade_id=acc_dict.get("trade_id", "") if acc_dict else "",
|
||||
# acc_dict is {"capital": ..., "equity": ...} — no "trade_id" key
|
||||
)
|
||||
```
|
||||
|
||||
The `trade_id` in the trade_reconstruction row will always be `""` because
|
||||
`acc_dict` comes from `kernel.snapshot()["account"]` which has keys `capital`,
|
||||
`equity`, `realized_pnl`, etc. — not `trade_id`. This means the recovery
|
||||
`trade_reconstruction` row has no trade_id linkage.
|
||||
Reference in New Issue
Block a user