PINK: E2E trace analysis — Pass 5 edge domains (H1-H22)
Fifth pass covering dependency management (no Python lockfile, Rust compiled from source), error handling observability (zero logging, 16+ silent swallows), type safety (17 enum-from-string crash sites, _legacy_intent always MARKET), and protocol contracts (MirroredControlPlane missing methods, RealZinc read atomicity, __del__ use-after-free). 22 new flaws. Combined catalog now 138. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
This commit is contained in:
@@ -2173,3 +2173,402 @@ No cleanup is guaranteed on exception:
|
||||
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
|
||||
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
|
||||
| **Total** | | **116** | **5** | **21** | **32** | **40** | **18** |
|
||||
|
||||
---
|
||||
|
||||
## PASS 5 — EDGE DOMAINS (Dependencies, Error Handling, Types, Contracts)
|
||||
|
||||
### H1: No Python dependency declaration files exist in workspace
|
||||
|
||||
**Files:** workspace root
|
||||
|
||||
Zero `requirements.txt`, `setup.py`, `setup.cfg`, `pyproject.toml`, `Pipfile`, or `poetry.lock` anywhere. All Python package dependencies are entirely implicit — determined by what's installed in the runtime environment. No reproducible installs, no version pinning, no audit trail.
|
||||
|
||||
The Rust side does have `Cargo.toml` + `Cargo.lock` — but all 4 direct Rust deps use open ranges (`"0.4"`, `"0.2"`, `"1"`, `"1"`).
|
||||
|
||||
**Severity: Critical**
|
||||
|
||||
### H2: Rust kernel compiled from source on every cold start via subprocess
|
||||
|
||||
**File:** `rust_backend.py:60-72`
|
||||
|
||||
```python
|
||||
def _ensure_library() -> Path:
|
||||
path = _library_path()
|
||||
if not path.exists():
|
||||
_build_library() # cargo build --release
|
||||
return path
|
||||
|
||||
def _build_library():
|
||||
subprocess.run(
|
||||
["cargo", "build", "--release", ...],
|
||||
check=True, # no timeout!
|
||||
)
|
||||
```
|
||||
|
||||
First load takes 3-10 minutes (Rust compilation). Requires Rust toolchain in production. `subprocess.run()` has no `timeout=` — if `cargo` hangs (network, disk, lock contention), the Python process hangs indefinitely. No prebuilt binary distribution.
|
||||
|
||||
**Severity: Critical**
|
||||
|
||||
### H3: Zero logging — every swallowed error is invisible
|
||||
|
||||
The entire codebase has zero use of Python's `logging` module, `print()`, or `warnings.warn()` for error reporting. Every `except: pass`, `except Exception: pass`, and `return default` silently discards the error. **There is no mechanism to detect, alert, or diagnose production failures.**
|
||||
|
||||
All `try/except: pass` sites found:
|
||||
|
||||
| # | File:Line | What's Hidden |
|
||||
|---|-----------|---------------|
|
||||
| 1 | `bingx_venue.py:51` | `float()` conversion failure on any API field value |
|
||||
| 2 | `bingx_venue.py:133` | regex match failure in rate-limit parsing |
|
||||
| 3 | `bingx_venue.py:136` | int/float conversion of retry_after |
|
||||
| 4 | `bingx_venue.py:325` | slot lookup failure during cancel asset resolution |
|
||||
| 5 | `bingx_venue.py:350` | BingXHttpError in cancel — network error looks like rejection |
|
||||
| 6 | `control.py:213` | RealZincControlPlane construction failure |
|
||||
| 7 | `launcher.py:187` | RealZincPlane construction failure |
|
||||
| 8 | `launcher.py:119` | malformed env var for active_slot_limit |
|
||||
| 9 | `launcher.py:243` | asyncio.run() RuntimeError in _maybe_close |
|
||||
| 10 | `launcher.py:277` | RealZincControlPlane fallback in build_control_plane |
|
||||
| 11 | `real_control_plane.py:97` | region.wait() exception — timeout and error both return False |
|
||||
| 12 | `real_control_plane.py:112` | region.notify() exception — writer thinks broadcast succeeded |
|
||||
| 13 | `real_zinc_plane.py:31` | Zinc SharedRegion import failure |
|
||||
| 14 | `projection.py:87` | HazelcastRowWriter import failure |
|
||||
| 15 | `rust_backend.py:102` | __del__ exception in Rust kernel destroy |
|
||||
| 16 | `bingx_venue.py:55` | `_row_float` tries 5+ key fallbacks, each failing silently |
|
||||
|
||||
**Severity: Critical**
|
||||
|
||||
### H4: `_row_float` rejects zero as a valid value — `or` pattern treats 0 as missing
|
||||
|
||||
**File:** `bingx_venue.py:47-55`
|
||||
|
||||
```python
|
||||
def _row_float(row, *keys, default=0.0):
|
||||
for key in keys:
|
||||
try:
|
||||
value = float(row.get(key) or 0.0) # `or 0.0` treats 0 as missing
|
||||
except Exception:
|
||||
continue
|
||||
if value == value and value not in (float("inf"), float("-inf")) and value != 0.0:
|
||||
return value # explicitly rejects 0.0
|
||||
return default
|
||||
```
|
||||
|
||||
Two bugs: (a) `except Exception: continue` swallows ALL conversion errors, and (b) `value != 0.0` explicitly rejects zero as a valid return value. A legitimate zero price, zero filled quantity, or zero position amount causes `_row_float` to skip that key and search further. If ALL keys return 0, the default `0.0` is returned — indistinguishable from "none of the keys existed."
|
||||
|
||||
Called by every single BingX API response parser: `_position_qty()`, `_position_price()`, `_venue_order_from_row()`, `_event_from_row()`, `_fill_event_from_row()`, `_events_from_submit()`, `_events_from_cancel()`, `_filled_size_from_snapshots()`. None verify the returned 0.0 is real vs. missing-vs-zero.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H5: `_backend_snapshot` timeout returns stale data with no signal to callers
|
||||
|
||||
**File:** `bingx_venue.py:242-251**
|
||||
|
||||
```python
|
||||
def _backend_snapshot(self, *, timeout_ms=5000.0):
|
||||
if not self._snapshot_ready.wait(timeout=timeout_ms / 1000.0):
|
||||
with self._snap_lock:
|
||||
return self._last_snapshot # STALE — could be hours old
|
||||
```
|
||||
|
||||
When the snapshot-fetch condition times out, returns `self._last_snapshot` — initialized to `None` and only updated on successful fetches. First timeout returns `None`. All callers (`cancel()`, `open_orders()`, `open_positions()`, `reconcile()`, `submit()`) access `.open_orders`, `.open_positions` immediately — crash with `AttributeError: 'NoneType' object has no attribute 'open_orders'`.
|
||||
|
||||
Even after the first fetch succeeds, subsequent timeouts return the last-good snapshot which could be arbitrarily stale. No caller timestamps, version-checks, or requests a refresh.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H6: All enum-from-raw-string sites crash on unknown value — zero fallback
|
||||
|
||||
**Files:** `rust_backend.py:250-386`, `real_zinc_plane.py:70-106`
|
||||
|
||||
Every site that reconstructs a Python enum from a string received from the Rust kernel:
|
||||
|
||||
```python
|
||||
side=TradeSide(str(payload.get("side", TradeSide.FLAT.value)))
|
||||
status=VenueOrderStatus(str(payload.get("status", VenueOrderStatus.NEW.value)))
|
||||
fsm_state=TradeStage(str(payload.get("fsm_state", TradeStage.IDLE.value)))
|
||||
kind=KernelEventKind(str(row.get("kind", KernelEventKind.ORDER_ACK.value)))
|
||||
```
|
||||
|
||||
If the Rust kernel introduces a new enum variant (e.g., `TradeStage::ENTRY_REJECTED`) not in the Python `TradeStage` enum, `TradeStage("ENTRY_REJECTED")` raises `ValueError` with zero fallback. Crashes `_outcome_from_payload()` and takes down the kernel's event processing loop.
|
||||
|
||||
17 sites total across `rust_backend.py` and `real_zinc_plane.py`. No try/except, no mapping, no fallback on any of them.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H7: `_legacy_intent` reads `getattr(intent, "order_type", "MARKET")` — always defaults to MARKET
|
||||
|
||||
**File:** `bingx_venue.py:282-285**
|
||||
|
||||
```python
|
||||
metadata["_order_type"] = getattr(intent, "order_type", "MARKET")
|
||||
metadata["_limit_price"] = float(getattr(intent, "limit_price", 0.0) or 0.0)
|
||||
```
|
||||
|
||||
`order_type` and `limit_price` are NOT fields on `KernelIntent` (contracts.py). They only exist in `intent.metadata` as `metadata["order_type"]` if set by the caller. `getattr(intent, "order_type", "MARKET")` checks the dataclass field — not the metadata dict — so it ALWAYS returns `"MARKET"`.
|
||||
|
||||
Even when the PINK runtime produces a LIMIT intent (LIMIT_DECISION → `metadata["order_type"] = "LIMIT"`), the legacy adapter converts is to MARKET because it reads the wrong source. Every LIMIT order is submitted as MARKET.
|
||||
|
||||
Similarly, `limit_price` is always `0.0` — any limit price from the metadata dict is lost.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H8: `_venue_event_status_from_row` silently maps unknown venue status to ACKED
|
||||
|
||||
**File:** `bingx_venue.py:83-96**
|
||||
|
||||
```python
|
||||
def _venue_event_status_from_row(status: str) -> VenueEventStatus:
|
||||
normalized = _normalize_status(status)
|
||||
# ... checks known statuses ...
|
||||
return VenueEventStatus.ACKED # fallthrough for anything unknown
|
||||
```
|
||||
|
||||
If BingX introduces a new status (`"SUSPENDED"`, `"PENDING_CANCEL"`, `"EXPIRED"`), it doesn't match any known mapping and silently returns `ACKED`. The kernel treats a suspended/cancelled/expired order as acknowledged — dangerous misclassification.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H9: `RealZincPlane.write_slot()` — slot written to `slot_id >= slot_count` is invisible
|
||||
|
||||
**File:** `real_zinc_plane.py:206-210**
|
||||
|
||||
```python
|
||||
def write_slot(self, slot):
|
||||
with self._lock:
|
||||
self._slot_cache[int(slot.slot_id)] = slot
|
||||
payload = {"slots": [self._slot_cache[key].to_dict() for key in range(self._slot_count)]}
|
||||
```
|
||||
|
||||
`_slot_cache` is a plain dict — accepts any key. But `read_slots()` only reads 0..slot_count-1. Writing to `slot_id >= slot_count` stores the slot in the cache but it's **never serialized or read back**. No error.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H10: `RealZincControlPlane.read()` has no atomicity with concurrent `update()`
|
||||
|
||||
**File:** `real_control_plane.py:70-77**
|
||||
|
||||
`_write_region()` zero-fills the buffer then writes the packet. If `read()` interleaves between zero-fill and write, it sees a partially-zeroed buffer → `_decode_packet` returns `{}` → returns stale `self._snapshot` with no observable error. No lock, no sequence check, no atomic read.
|
||||
|
||||
The same bug exists in `RealZincPlane.read_slots()` (real_zinc_plane.py:220-230) — reads shared memory while a concurrent `write_slot()` is in progress.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H11: `_RustKernelLib` lazily initialized with race condition
|
||||
|
||||
**File:** `rust_backend.py:187-190**
|
||||
|
||||
```python
|
||||
_RUST: _RustKernelLib | None = None
|
||||
|
||||
def _get_rust():
|
||||
global _RUST
|
||||
if _RUST is None:
|
||||
_RUST = _RustKernelLib() # no lock — two threads can both create
|
||||
return _RUST
|
||||
```
|
||||
|
||||
No threading lock. Two concurrent calls to `_get_rust()` (possible via `BingxVenueAdapter`'s thread pool) can create two `_RustKernelLib` objects. The `_RustKernelLib()` constructor runs `_ensure_library()` which runs `subprocess.run(["cargo", "build", ...], check=True)` — concurrent `cargo build` can corrupt the build directory.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H12: `ExecutionKernel.__del__` can deadlock or use-after-free
|
||||
|
||||
**File:** `rust_backend.py:527-531**
|
||||
|
||||
```python
|
||||
def __del__(self):
|
||||
backend = getattr(self, "_backend", None)
|
||||
if backend is not None:
|
||||
try:
|
||||
_get_rust().destroy(backend) # accesses module singleton
|
||||
except Exception:
|
||||
pass
|
||||
```
|
||||
|
||||
`_get_rust()` accesses the module-level `_RUST` singleton, which may already be destroyed if the module's garbage collection runs before the instance's. The destroy call happens outside any lock — one thread's destructor could destroy the Rust kernel while another thread is still using it. Use-after-free.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### H13: `MirroredControlPlane` missing protocol methods
|
||||
|
||||
**File:** `control.py:171-184**
|
||||
|
||||
`ControlPlane` protocol defines `wait()` and `notify()`. `MirroredControlPlane` inherits from nothing and only implements `read()`, `update()`, and `mirror()`. Calling `plane.wait()` on a `MirroredControlPlane` raises `AttributeError`.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### H14: `TradeSlot.remaining_size()` and `VenueOrder.remaining_size()` — same name, different semantics
|
||||
|
||||
**Files:** `contracts.py:207-208`, `contracts.py:143-145**
|
||||
|
||||
```python
|
||||
# TradeSlot:
|
||||
def remaining_size(self) -> float:
|
||||
return max(0.0, float(self.size)) # open position size
|
||||
|
||||
# VenueOrder:
|
||||
def remaining_size(self) -> float:
|
||||
return max(0.0, self.intended_size - self.filled_size) # unfilled order qty
|
||||
```
|
||||
|
||||
Same method name, completely different semantics. `TradeSlot.remaining_size()` returns the current open position size. `VenueOrder.remaining_size()` returns the untracked/unfilled order quantity. A caller using `slot.remaining_size()` to check if an order is fully filled gets position size, which doesn't change with fills — it changes with entry/exit.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### H15: `_maybe_close()` — `asyncio.run()` RuntimeError silently swallowed for coroutines
|
||||
|
||||
**File:** `launcher.py:233-243**
|
||||
|
||||
```python
|
||||
if inspect.isawaitable(result):
|
||||
try:
|
||||
asyncio.run(result)
|
||||
except RuntimeError:
|
||||
pass # SILENT — coroutine never executed
|
||||
```
|
||||
|
||||
When `maybe_close` is called from an async context (which it is — `DITAv2LauncherBundle.close()` is used in async test code), `asyncio.run()` raises `RuntimeError("Cannot run the event loop while another loop is running")`. The exception is swallowed, the coroutine is never awaited, and the close/disconnect never happens.
|
||||
|
||||
Also: `break` after calling the first found method means if an object has both `close()` and `disconnect()`, `disconnect()` is never called.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### H16: `_build_launcher_bundle` imports `BingxDirectExecutionAdapter` inside function — import-time side effect is safe but lazy loading masks errors
|
||||
|
||||
**File:** `launcher.py:254**
|
||||
|
||||
```python
|
||||
def _build_venue(...):
|
||||
from prod.clean_arch.adapters.bingx_direct import BingxDirectExecutionAdapter
|
||||
```
|
||||
|
||||
Import inside function — safe, lazy, no side effects. But if the `bingx_direct` module has an import error (missing dependency, version mismatch), it only surfaces at bundle construction time, not at process start. A misconfigured production deployment would fail on the first trade, not on boot.
|
||||
|
||||
**Severity: Informational**
|
||||
|
||||
### H17: `load_dotenv()` at module level — import-time filesystem I/O and env mutation
|
||||
|
||||
**File:** `launcher.py:49-51**
|
||||
|
||||
```python
|
||||
load_dotenv(PROJECT_ROOT / ".env") # executes on module import
|
||||
```
|
||||
|
||||
Runs on every import of `launcher.py` — reads filesystem, mutates process environment. Hard to mock in tests — setting env vars in test setup gets overwritten on module import. Also: if `.env` doesn't exist, `load_dotenv()` silently does nothing — missing config is invisible.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### H18: `_run()` in `BingxVenueAdapter` — `asyncio.run()` thread-pool bridge blocks on every call
|
||||
|
||||
**File:** `bingx_venue.py:225-233**
|
||||
|
||||
```python
|
||||
def _run(self, result):
|
||||
if inspect.isawaitable(result):
|
||||
try:
|
||||
asyncio.get_running_loop()
|
||||
except RuntimeError:
|
||||
return asyncio.run(result)
|
||||
pool = self._get_executor()
|
||||
return pool.submit(asyncio.run, result).result() # BLOCKS
|
||||
```
|
||||
|
||||
Every call to `_run()` that receives an awaitable blocks the calling thread via `.result()`. The BingX HTTP call inside `submit_intent()` can take 1-5 seconds. During this block, the event loop cannot process other tasks. In a single-runtime deployment, this stalls the entire policy cycle.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### H19: `HazelcastClientLike` protocol has zero concrete implementations in workspace
|
||||
|
||||
**File:** `hazelcast_projection.py:13-15**
|
||||
|
||||
```python
|
||||
class HazelcastClientLike(Protocol):
|
||||
def get_map(self, name: str): ...
|
||||
def get_topic(self, name: str): ...
|
||||
```
|
||||
|
||||
Used as a type hint. No code in the workspace creates an object that satisfies this protocol. The Hazelcast client comes from an external package. If the external API changes, the protocol silently drifts — no compilation check.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### H20: `_decode_packet` in RealZinc — no bound check on `size` beyond `> len(buf)-16`
|
||||
|
||||
**Files:** `real_control_plane.py:50-52`, `real_zinc_plane.py:70-81**
|
||||
|
||||
```python
|
||||
seq, size = struct.unpack_from("!QQ", buf, 0)
|
||||
if size <= 0 or size > len(buf) - 16:
|
||||
return {}
|
||||
payload = bytes(buf[16 : 16 + size]).decode("utf-8") # can raise UnicodeDecodeError
|
||||
out = json.loads(payload) # can raise ValueError
|
||||
```
|
||||
|
||||
If shared memory contains a corrupted `size` field within bounds, `.decode()` or `json.loads()` raises — uncaught by callers. A single corrupted byte in shared memory crashes the kernel.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### H21: All Rust crate features enabled by default — `wasm-bindgen` compiled into native shared library
|
||||
|
||||
**File:** `_rust_kernel/Cargo.toml`, transitive through `chrono` → `iana-time-zone` → `js-sys` → `wasm-bindgen`
|
||||
|
||||
The Rust kernel is a native `.so`/`.dylib` but chrono's `iana-time-zone` pulls in `js-sys` and `wasm-bindgen` (WebAssembly support) even on native Linux. Larger binary, longer compile times. `cc` crate pulled in for `iana-time-zone-haiku` which only compiles on Haiku OS.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### H22: `socket.getaddrinfo` monkey-patch in test generator code
|
||||
|
||||
**File:** `gen2.py:295-298**
|
||||
|
||||
Monkey-patches Python stdlib `socket.getaddrinfo` to force IPv4 as a workaround for IPv6 resolution failure in the deployment environment. If copied to production code, would break IPv6 connectivity.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
---
|
||||
|
||||
## Pass 5 Summary
|
||||
|
||||
| # | Flaw | Layer | Severity |
|
||||
|---|------|-------|----------|
|
||||
| H1 | No Python dependency files (requirements.txt, pyproject.toml, etc.) | Build | **Critical** |
|
||||
| H2 | Rust kernel compiled from source on every cold start — no prebuilt binary | Build | **Critical** |
|
||||
| H3 | Zero logging — 16+ silent except:pass sites, no error observability | All | **Critical** |
|
||||
| H4 | `_row_float` rejects zero as valid, `except Exception: continue` swallows all | Venue | **High** |
|
||||
| H5 | `_backend_snapshot` timeout returns stale data/None — callers crash | Venue | **High** |
|
||||
| H6 | All enum-from-raw-string sites crash on unknown variant (17 sites) | Bridge | **High** |
|
||||
| H7 | `_legacy_intent` reads `getattr(intent, "order_type")` not metadata — always MARKET | Venue | **High** |
|
||||
| H8 | Unknown venue status silently mapped to ACKED | Venue | **High** |
|
||||
| H9 | `RealZincPlane.write_slot()` `slot_id >= slot_count` silently lost | Zinc | **High** |
|
||||
| H10 | `RealZincControlPlane.read()` no atomicity with concurrent `update()` | Control | **High** |
|
||||
| H11 | `_RustKernelLib` lazy init with race condition — concurrent cargo build | Bridge | **High** |
|
||||
| H12 | `ExecutionKernel.__del__` use-after-free on Rust handle | Bridge | **High** |
|
||||
| H13 | `MirroredControlPlane` missing protocol methods (wait/notify) | Control | Medium |
|
||||
| H14 | `TradeSlot.remaining_size` vs `VenueOrder.remaining_size` — different semantics | Contracts | Medium |
|
||||
| H15 | `_maybe_close` asyncio.run RuntimeError silently swallowed | Launcher | Medium |
|
||||
| H16 | Lazy import of bingx_direct masks config errors until first trade | Build | Info |
|
||||
| H17 | `load_dotenv()` at module level — import-time I/O side effect | Launcher | Medium |
|
||||
| H18 | `_run()` blocks event loop on every HTTP call via thread pool | Venue | Medium |
|
||||
| H19 | `HazelcastClientLike` protocol has zero concrete implementations | Projection | Low |
|
||||
| H20 | `_decode_packet` uncaught UnicodeDecodeError/ValueError on corrupted SHM | Zinc | Low |
|
||||
| H21 | `wasm-bindgen` compiled into native library unnecessarily | Build | Low |
|
||||
| H22 | `socket.getaddrinfo` monkey-patch in test code | Test | Low |
|
||||
|
||||
### Pass 5 Severity Distribution
|
||||
|
||||
| Severity | Count |
|
||||
|----------|-------|
|
||||
| **Critical** | 3 (H1, H2, H3) |
|
||||
| **High** | 9 (H4-H12) |
|
||||
| Medium | 5 (H13, H14, H15, H17, H18) |
|
||||
| Low | 4 (H19, H20, H21, H22) |
|
||||
| Info | 1 (H16) |
|
||||
|
||||
### Combined Catalog (All 5 Passes)
|
||||
|
||||
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|
||||
|------|-------|-------|----------|------|--------|-----|------|
|
||||
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
|
||||
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
|
||||
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
|
||||
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
|
||||
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
|
||||
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
|
||||
| **Total** | | **138** | **8** | **30** | **37** | **44** | **19** |
|
||||
|
||||
Reference in New Issue
Block a user