PINK: E2E trace analysis — Pass 15 resource leaks/trust boundaries/security (R1-R14)

Fifteenth pass: exchange REST/WS data parsed without schema validation (R7
Critical), restore_state() deserializes arbitrary JSON full kernel takeover
(R9 Critical), ThreadPoolExecutor never shut down 3 threads leak (R1 High),
BingxVenueAdapter no close() HTTP client unreleasable (R2 High),
_intent_cache unbounded growth (R3 High), shared memory JSON no integrity
check (R8 High), env-based mainnet switch (R10 High), .env secrets exposure
(R11 High), listenKey in WS URL f-string MITM injection (R13 High).
289 total flaws across 15 passes.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
This commit is contained in:
Codex
2026-06-02 12:54:02 +02:00
parent 062b929caf
commit a4c1ec6139
2 changed files with 361 additions and 1 deletions

View File

@@ -5312,3 +5312,339 @@ Covered in Q1 and Q6. Listing all sites: `rust_backend.py:215,241,260` and `real
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 | | P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 | | Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| **Total** | | **275** | **21** | **77** | **79** | **64** | **34** | | **Total** | | **275** | **21** | **77** | **79** | **64** | **34** |
---
## PASS 15 — RESOURCE LEAKS, TRUST BOUNDARIES, SECURITY
### R1: `BingxVenueAdapter` `ThreadPoolExecutor` never shut down — 3 threads leak for process lifetime
**File:** `bingx_venue.py:194-208`
```python
class BingxVenueAdapter(VenueAdapter):
_EXECUTOR: concurrent.futures.ThreadPoolExecutor | None = None
_EXECUTOR_LOCK: threading.Lock = threading.Lock()
@classmethod
def _get_executor(cls) -> concurrent.futures.ThreadPoolExecutor:
if cls._EXECUTOR is None:
with cls._EXECUTOR_LOCK:
if cls._EXECUTOR is None:
cls._EXECUTOR = concurrent.futures.ThreadPoolExecutor(
max_workers=3, thread_name_prefix="bingx_adapter",
)
return cls._EXECUTOR
```
The `ThreadPoolExecutor` is a class-level singleton with **no shutdown path**. No `close()` method, no `atexit` handler, no classmethod for cleanup. The 3 worker threads persist for the entire process lifetime.
`ThreadPoolExecutor.__del__` in CPython calls `shutdown(wait=False)`, but this only runs when the executor object is GC'd. Since `_EXECUTOR` is a class variable, it's only GC'd when the class is (at interpreter shutdown). The CPython source for `ThreadPoolExecutor.__del__` calls `shutdown(wait=False)` which interrupts idle threads but doesn't wait for them. During shutdown, this races with module cleanup — threads accessing module globals see `None`.
**Trigger:** Every call to `BingxVenueAdapter.submit()` (line 371), `cancel()` (line 435), or `snapshot()` (line 509) submits to this executor. After 24+ hours of trading, 3 unnamed worker threads have consumed thread-local resources (stack ~8MB each = 24MB minimum) with no clean reclamation path.
**Severity: High**
### R2: `BingxVenueAdapter` has no `close()` method — backend `BingxDirectExecutionAdapter` HTTP client unreleasable
**File:** `bingx_venue.py` (no close method), `venue.py` (protocol), `launcher.py:262-264`
The `VenueAdapter` protocol (`venue.py`) does not define `close()` or `disconnect()`. `BingxVenueAdapter` holds a `_backend: BingxDirectExecutionAdapter` which in turn holds an HTTP client session. The launcher's `_maybe_close()` tries `.close()` then `.disconnect()` but gets `AttributeError` (caught by `except Exception: pass`).
This means:
- The `BingxDirectExecutionAdapter`'s `aiohttp.ClientSession` is never closed
- The underlying `TCPConnector` connection pool remains open
- Any HTTP keep-alive connections to BingX remain open until OS timeout
- No clean teardown path exists at any level of the venue stack
The `BingxUserStream` (WebSocket handler) has a `close()` method and proper task cancellation. The venue adapter (the synchronous REST path) has none. Asymmetric design.
**Severity: High**
### R3: `real_zinc_plane._intent_cache` grows unboundedly — memory proportional to total lifetime intents
**File:** `real_zinc_plane.py:157,202`
```python
# Line 157 (__init__):
self._intent_cache: List[Dict[str, Any]] = []
# Line 201-203 (publish_intent):
self._intent_cache.append(row)
self._write_region(self.intent_region, self._intent_seq, {"items": self._intent_cache[-512:]})
```
Every call to `publish_intent()` appends one dict to `_intent_cache`. Only the last 512 items are written to shared memory, but **the cache list itself is never trimmed**. Over a 24-hour session at 1 intent/second, this grows to 86,400 dicts — approximately 50-100 MB of memory for the cache alone (each dict contains timestamp, intent_id, asset, side, size, price, etc.).
The `-512` slice on write is a dead giveaway that the developer knew only the last 512 items were relevant — but forgot to trim the source list. The fix is `self._intent_cache = self._intent_cache[-512:]` after the append.
**Compare with:** `account.py`'s `seen_account_event_ids` (Rust side, capped at 1024 via `IndexSet` LRU), and `journal.py`'s `MemoryKernelJournal` (capped at 10,000). Every other cache in the system has a bound; this one doesn't.
**Severity: High**
### R4: `RealZincPlane`/`RealZincControlPlane` partial-construction leak — `SharedRegion` never cleaned up on init failure
**File:** `real_zinc_plane.py:161-176`, `real_control_plane.py:72-83`
```python
# real_zinc_plane.py __init__:
self.intent_region = SharedRegion.create(f"{prefix}_intent", 65536)
self.state_region = SharedRegion.create(f"{prefix}_state", 65536) # if this fails...
self.control_region = SharedRegion.create(f"{prefix}_control", 4096) # ...or this
```
If `SharedRegion.create()` succeeds for `intent_region` but fails for `state_region` (e.g., out of shared memory, permission denied, name collision), the constructor raises. The already-created `intent_region` has **no cleanup path** — `close()` is never called because the caller never gets a valid object reference. The shared memory segment leaks until the OS cleans it on process exit (or reboot on some systems).
Same pattern in `RealZincControlPlane.__init__` with 2 regions.
**Severity: Medium**
### R5: `BingxUserStream` `ClientSession` has no `__del__` fallback — connection pool leak if `close()` not called
**File:** `bingx_user_stream.py:229-230,433-436`
```python
async def close(self) -> None:
self._closed.set()
if self._session is not None and not self._session.closed:
await self._session.close()
```
The `aiohttp.ClientSession` (created in `_get_session()` with `TCPConnector(limit=4)`) is only closed when `close()` is explicitly called. There is no `__del__`, `__aenter__`, or `__aexit__`. If a caller abandons the `BingxUserStream` object without calling `close()` — or if `close()` is never called because an exception occurs before the call — the TCP connection pool (4 connections to BingX) leaks.
During the reconnect loop (`subscribe()`), if `self._session.closed` is detected between retries, a new session and connector are created — the old connector's connections are released by `ClientSession.close()` which is called in the retry path. So the reconnect path itself is clean. But the top-level cleanup depends entirely on external discipline.
**Severity: Medium**
### R6: `test_alpha_blue_untouched_g7.py` — two `open()` calls without context manager, file descriptors leak
**File:** `test_alpha_blue_untouched_g7.py:31,63`
```python
src = open("/mnt/dolphinng5_predict/prod/clean_arch/dita_v2/gen2.py").read() # line 31
src = open(full).read() # line 63
```
Both open a file, chain `.read()` to load contents, but **never close the file handle**. The file descriptor is leaked until garbage collection. In a test suite with thousands of tests, this can exhaust the ulimit (default 1024 on Linux, lower on macOS).
**Severity: Low** (test code, non-production)
### R7: All exchange REST/WS data parsed without schema validation — exchange controls all field values
**Files:** `bingx_venue.py:60-74,80-88,96-121,151-186`, `bingx_user_stream.py:267-379`
The system has a single trust boundary for exchange data: all BingX REST API responses and WebSocket frames are parsed without schema validation. Key entry points:
- **`bingx_user_stream.py:267`**: `json.loads(text)` on raw WebSocket frame — any valid JSON structure is accepted. No schema validation before field access with `.get()`.
- **`bingx_venue.py:60-74`**: `_row_text()` extracts string values from exchange response dicts with no sanitization beyond `.strip()`.
- **`bingx_venue.py:80-88`**: `_row_float()` — catches `ValueError` on float parse but does not filter NaN/Inf (these pass `float()` fine).
- **`bingx_venue.py:297-301`**: `_rate_limit_retry_after_ms()` parses exchange error message with `re.search(r"unblocked after (\d+)", msg)` — exchange controls the error message content.
- **`bingx_venue.py:338-340`**: `cancel()` exception handler includes `str(exc)` directly in the response event dict.
An exchange sending crafted responses could inject:
- Arbitrary strings into `reason`/`msg` fields (propagated to journal/ClickHouse)
- Non-numeric values that fail `float()` only on consumption
- Enormous lists in snapshot responses (OOM risk — no size limit on `snapshot.open_orders` iteration)
- NaN/Inf in price/size fields (pass through `float()` — Rust kernel `is_finite()` check on kernel side catches some but not all)
**Severity: Critical** — exchange controls all inbound data with no schema validation, and data flows to ClickHouse journaling and the Rust kernel memory.
### R8: Shared memory JSON deserialization without integrity check
**Files:** `real_zinc_plane.py:127-128`, `real_control_plane.py:60-61`
```python
# real_zinc_plane.py
def _decode_packet(self, payload: bytes) -> dict:
return json.loads(payload)
# real_control_plane.py
payload = region.read()
if payload:
data = json.loads(payload)
```
Both the Zinc plane and control plane deserialize JSON from shared memory **without any integrity check** (no HMAC, no checksum, no signature). Any process with access to the shared memory segment (`/dev/shm` on Linux, world-readable by default) can:
1. Inject arbitrary `KernelIntent` objects — the control plane reads intents from `intent_region` and dispatches them to `process_intent()`. An attacker could submit fake intents with malicious parameters.
2. Inject fake events into the event stream via the control plane region.
3. Corrupt slot state via the state region.
The shared memory segments are named by the `DITA_V2_PREFIX` env var (default `dita_v2`). On a shared system, any process running as the same user can read/write these segments.
**Severity: High**
### R9: `restore_state()` deserializes arbitrary JSON into full kernel state — no provenance tracking
**File:** `rust_backend.py:293-296` (Python), `_rust_kernel/src/lib.rs:934-968` (Rust)
```python
# Python:
def restore_state(self, json_str: str) -> bool:
result = self.lib.dita_kernel_restore_state_json(self._backend, json_str.encode("utf-8"))
return result == 0
# Rust:
pub extern "C" fn dita_kernel_restore_state_json(handle: *mut KernelHandle, payload: *const c_char) -> i32 {
let payload = ...; // parse to KernelFullSnapshot
let core = unsafe { &mut (*handle).core };
core.restore_full_snapshot(&payload) // overwrites ALL kernel state
}
```
`dita_kernel_restore_state_json` overwrites the entire kernel state — all slots, account balances, fee configuration, seen_event_ids, and capital_frozen flag — from a single JSON string. The method is public on `ExecutionKernel.restore_state()` with **no authentication, no authorization check, and no call stack validation**.
The JSON string can come from:
- The `DITAv2LauncherBundle` (restart path)
- A file read from disk
- An attacker who gains access to the Python runtime (e.g., via shared memory injection, R8)
Once restored, the kernel accepts the state as truth. There is no `restore_state` counter or version chain to prevent replay of old snapshots.
**Severity: Critical**
### R10: `DOLPHIN_BINGX_ENV` + `DOLPHIN_BINGX_ALLOW_MAINNET` — mainnet switch via env var
**File:** `launcher.py:189-190`
```python
DOLPHIN_BINGX_ENV = os.environ.get("DOLPHIN_BINGX_ENV", "VST") # VST = testnet
DOLPHIN_BINGX_ALLOW_MAINNET = os.environ.get("DOLPHIN_BINGX_ALLOW_MAINNET", "false").lower() in ("true", "1", "yes")
```
Setting `DOLPHIN_BINGX_ENV=LIVE` + `DOLPHIN_BINGX_ALLOW_MAINNET=true` switches from testnet to production mainnet BingX. The `DOLPHIN_BINGX_ALLOW_MAINNET` check exists specifically as a safety gate, but both are attacker-controlled env vars with the same provenance as all other env config.
An attacker with access to set env vars (container breakout, CI/CD injection, shared hosting) could:
1. Redirect all trades to mainnet
2. Use real capital instead of testnet funds
3. Cost real money on every trade
**Severity: High**
### R11: `.env` file loaded from project root — secrets exposure risk
**File:** `launcher.py:23,51`
```python
from dotenv import load_dotenv
...
load_dotenv(PROJECT_ROOT / ".env")
```
The `.env` file is loaded from `PROJECT_ROOT`, which is `Path(__file__).resolve().parents[3]` — three directories up from the launcher file. On a shared development machine or CI runner, this file is:
1. World-readable if not explicitly chmod'd (default umask creates files 644)
2. Accessible to any process running as the same user
3. Often committed to version control accidentally (no `.gitignore` guarantee)
4. Visible in Docker layer history if included in the build context
The `.env` file contains `BINGX_API_KEY` and `BINGX_SECRET_KEY` — the exchange credentials. On a shared system, every user with read access can extract these keys.
**Severity: High**
### R12: Unvalidated `int()` on env vars — `DOLPHIN_BINGX_RECV_WINDOW_MS` could accept extreme values
**File:** `launcher.py:191-193`
```python
recv_window_ms = int(os.environ.get("DOLPHIN_BINGX_RECV_WINDOW_MS", "5000"))
default_leverage = int(os.environ.get("DOLPHIN_BINGX_DEFAULT_LEVERAGE", "1"))
exchange_leverage_cap = int(os.environ.get("DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP", "3"))
```
Three env vars are directly passed to `int()` with only a string default — **no bounds checking**. An attacker setting `DOLPHIN_BINGX_RECV_WINDOW_MS=2147483647` could set the exchange recv window to ~24 days, allowing replay attacks on signed requests. An attacker setting `DOLPHIN_BINGX_EXCHANGE_LEVERAGE_CAP=1000` could allow 1000x leverage on the exchange.
**Severity: Medium**
### R13: `BingxUserStream` `listenKey` from exchange response used in WebSocket URL — MITM injection surface
**File:** `bingx_user_stream.py:230,398`
```python
# Line 230:
url = f"{self._ws_url}?listenKey={listen_key}"
# Line 398:
listen_key = resp.get("listenKey", "") # from exchange POST /openApi/user/auth/userDataStream
```
The `listenKey` comes from the BingX REST API response (`POST /openApi/user/auth/userDataStream`). It is used directly in the WebSocket connection URL **with no encoding or validation**. The listenKey is a short opaque string (looks like a UUID), but:
- If an attacker can MITM the REST response (DNS spoofing, proxy, etc.), they control the listenKey value
- A malicious listenKey with URL metacharacters (`&`, `=`, `#`) could inject query parameters into the WebSocket URL
- The `listenKey` is BingX's session authentication mechanism — once an attacker controls it, they can hijack the user data stream
The fix is `urllib.parse.urlencode({"listenKey": listen_key})` but the current code uses an f-string.
**Severity: High**
### R14: `mock_venue._exchange_event_queue` unbounded growth — event enqueue rate can exceed consumption rate
**File:** `mock_venue.py:220,230`
```python
# _queue_exchange_event:
self._exchange_event_queue.append(event)
# subscribe (generator):
if self._exchange_event_queue:
yield self._exchange_event_queue.pop(0)
```
The mock venue's event queue is consumed one event at a time via a generator in `subscribe()`. If `queue_exchange_event()` is called faster than the consumer calls `next()` on the generator (which happens on every `_on_exchange_event()` callback), the list grows unboundedly. In test scenarios with rapid fire events, this can exhaust memory.
Not a production risk (mock is test-only), but the unbounded growth pattern is worth noting.
**Severity: Low** (test code)
---
## Pass 15 Summary
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| R1 | `ThreadPoolExecutor` never shut down — 3 threads leak | Venue | **High** |
| R2 | `BingxVenueAdapter` no `close()` — backend HTTP client unreleasable | Venue | **High** |
| R3 | `real_zinc_plane._intent_cache` grows unboundedly | Plane | **High** |
| R4 | `RealZincPlane`/`ControlPlane` partial-construction SharedRegion leak | Plane | Medium |
| R5 | `BingxUserStream.ClientSession` no `__del__` — connection pool leak | Venue | Medium |
| R6 | `test_alpha_blue_untouched_g7.py` open() without context manager | Test | Low |
| R7 | All exchange REST/WS data parsed without schema validation | Venue | **Critical** |
| R8 | Shared memory JSON deserialization without integrity check | Plane | **High** |
| R9 | `restore_state()` deserializes arbitrary JSON — full kernel takeover | Bridge | **Critical** |
| R10 | `DOLPHIN_BINGX_ENV` + `ALLOW_MAINNET` mainnet switch via env | Config | **High** |
| R11 | `.env` file loaded from project root — secrets exposure | Config | **High** |
| R12 | Unvalidated `int()` on env vars — recv_window, leverage extremes | Config | Medium |
| R13 | `listenKey` from exchange in WS URL f-string — MITM injection | Venue | **High** |
| R14 | `mock_venue._exchange_event_queue` unbounded growth | Test | Low |
### Pass 15 Severity
| Severity | Count |
|----------|-------|
| **Critical** | 2 (R7, R9) |
| **High** | 6 (R1, R2, R3, R8, R10, R11, R13) |
| Medium | 3 (R4, R5, R12) |
| Low | 2 (R6, R14) |
### Combined Catalog (All 15 Passes)
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|------|-------|-------|----------|------|--------|-----|------|
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| **Total** | | **289** | **23** | **83** | **82** | **64** | **37** |

View File

@@ -29,7 +29,8 @@
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 | | O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 | | P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 | | Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
| **Total** | | **275** | **21** | **77** | **79** | **64** | **34** | | R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
| **Total** | | **289** | **23** | **83** | **82** | **64** | **37** |
--- ---
@@ -388,6 +389,29 @@
--- ---
## R-Series: Resource Leaks, Trust Boundaries, Security (Pass 15)
*Full detail in TRACE doc under "PASS 15 — RESOURCE LEAKS, TRUST BOUNDARIES, SECURITY."*
| # | Flaw | Layer | Severity |
|---|------|-------|----------|
| R1 | `ThreadPoolExecutor` never shut down 3 threads leak | Venue | **High** |
| R2 | `BingxVenueAdapter` no `close()` backend HTTP client unreleasable | Venue | **High** |
| R3 | `real_zinc_plane._intent_cache` grows unboundedly | Plane | **High** |
| R4 | `RealZincPlane`/`ControlPlane` partial-construction SharedRegion leak | Plane | Medium |
| R5 | `BingxUserStream.ClientSession` no `__del__` connection pool leak | Venue | Medium |
| R6 | `test_alpha_blue_untouched_g7.py` open() without context manager | Test | Low |
| R7 | All exchange REST/WS data parsed without schema validation | Venue | **Critical** |
| R8 | Shared memory JSON deserialization without integrity check | Plane | **High** |
| R9 | `restore_state()` deserializes arbitrary JSON full kernel takeover | Bridge | **Critical** |
| R10 | `DOLPHIN_BINGX_ENV` + `ALLOW_MAINNET` mainnet switch via env | Config | **High** |
| R11 | `.env` file loaded from project root secrets exposure | Config | **High** |
| R12 | Unvalidated `int()` on env vars recv_window, leverage extremes | Config | Medium |
| R13 | `listenKey` from exchange in WS URL f-string MITM injection | Venue | **High** |
| R14 | `mock_venue._exchange_event_queue` unbounded growth | Test | Low |
---
## H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5) ## H-Series: Edge Domains — Dependencies, Error Handling, Types, Contracts (Pass 5)
*Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."* *Full detail in TRACE doc under "PASS 5 — EDGE DOMAINS."*