PINK: E2E trace analysis — Pass 23 closure review/unfinished fixes/ops gaps (Z1-Z14)
Twenty-third (final) pass: _safe_enum fix applied to rust_backend.py but NOT real_zinc_plane.py other copy crashes (Z1 High), no health check endpoint silent failures invisible to orchestration (Z5 High), process_intent calls venue.submit without exception handler venue error bypasses Rust FSM (Z6 High), snapshot mixes Rust and Python accounting capital can diverge (Z7 Medium), BingxVenueAdapter.close executor null-to-shutdown TOCTOU race (Z8 Medium), generated test f-string chr(34) template SyntaxError risk on old Python (Z9 Medium), launcher uses Python 3.10+ | union syntax no min version documented (Z10 Medium), concurrent process_intent on same slot no lock no queue (Z12 Medium). 403 total flaws across 23 passes. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
This commit is contained in:
@@ -8106,3 +8106,211 @@ These differences mean the same `TradeSlot` JSON can produce different Python ob
|
||||
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
|
||||
| Y | Pass 22 (Serde Round-Trip/Mock Fidelity/Protocol) | 14 | 1 | 4 | 5 | 4 | 0 |
|
||||
| **Total** | | **389** | **43** | **117** | **114** | **64** | **37** |
|
||||
|
||||
---
|
||||
|
||||
## PASS 23 — CLOSURE/REVIEW: UNFINISHED FIXES, OPERATIONAL GAPS, FINAL EDGE CASES
|
||||
|
||||
### Z1: `_safe_enum` fix applied to `rust_backend.py` but NOT `real_zinc_plane.py` — the other `_slot_from_payload` still crashes on unknown enums
|
||||
|
||||
**Files:** `rust_backend.py` (has `_safe_enum`), `real_zinc_plane.py:106` (missing it)
|
||||
|
||||
A recent fix added `_safe_enum()` to `rust_backend.py` for graceful fallback on unknown FSM states. The duplicate `_slot_from_payload()` in `real_zinc_plane.py:106` still uses `TradeStage(str(payload.get(...)))` which raises `ValueError` on unknown values.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### Z2: `_backup_20260530/` has `__init__.py` — accidental import loads old code
|
||||
|
||||
Already covered by T14. Confirmed in this pass.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### Z3: `RealZincControlUnavailable` and `RealZincUnavailable` are separate classes with the same name — catching one does not catch the other
|
||||
|
||||
**File:** `__init__.py:44-45`
|
||||
|
||||
```python
|
||||
from .real_control_plane import RealZincUnavailable as RealZincControlUnavailable
|
||||
from .real_zinc_plane import RealZincUnavailable
|
||||
```
|
||||
|
||||
Two separate classes named `RealZincUnavailable` from different modules. `RealZincControlUnavailable` is an alias for the one from `real_control_plane.py`. If code catches `RealZincUnavailable`, it won't catch `RealZincControlUnavailable`, and vice versa. The launcher's try/except catches `Exception` broadly, so this is currently harmless — but any future specific exception handling would have a blind spot.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### Z4: `test_account_reconcile_faults.py` requires Rust shared library with no skip guard — crashes if Rust not built
|
||||
|
||||
**File:** `test_account_reconcile_faults.py` (multiple lines)
|
||||
|
||||
The test suite creates `ExecutionKernel(max_slots=4)`. `ExecutionKernel.__init__()` calls `_get_rust()` which calls `_RustKernelLib()` which runs `cargo build --release` and loads the `.so`. If Rust isn't installed or the build fails, the test raises `CalledProcessError` or `OSError` — no `pytest.skip()` guard.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### Z5: No health check endpoint, no liveness probe, no readiness indicator — silent failures invisible to orchestration
|
||||
|
||||
**File:** (entire codebase)
|
||||
|
||||
The system has no `/health` or `/ready` endpoint. When the Rust kernel panics (caught by `catch_unwind`, returns error), Zinc shared memory corrupts, or the BingX WebSocket silently disconnects, there is:
|
||||
- No alert mechanism
|
||||
- No heartbeat
|
||||
- No way for an orchestrator (Kubernetes, Nomad) to detect a dead process
|
||||
|
||||
The `ExchangeEvent.RECONNECTED` kind exists but nothing consumes it for health monitoring.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### Z6: `process_intent()` calls `venue.submit()` without exception handler — venue error bypasses Rust FSM entirely
|
||||
|
||||
**File:** `rust_backend.py:747-748`
|
||||
|
||||
```python
|
||||
emitted_events = self.venue.submit(intent) # no try/except
|
||||
```
|
||||
|
||||
If `venue.submit()` raises (network timeout, exchange error, connection refused), the exception propagates directly to the caller. The Rust kernel's FSM never sees it — no SLOT_BUSY, no diagnostic code, no slot state update. Compare with `on_venue_event()` which properly handles the Rust FFI result.
|
||||
|
||||
**Severity: High**
|
||||
|
||||
### Z7: `snapshot()` mixes Rust and Python accounting sources — `capital` and `k_capital` can diverge
|
||||
|
||||
**File:** `rust_backend.py:897-931`
|
||||
|
||||
The `snapshot()` method builds account data from both the Python `AccountProjection` (legacy) and the Rust `AccountState` (new). If they're out of sync (e.g., a venue event was processed in Rust but Python-side `_last_settled_pnl` not reconciled), `capital` and `k_capital` diverge. The fallback `rust_account.get("k_capital", self.account.snapshot.capital)` silently returns the Python value if the Rust key is missing, masking the issue.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### Z8: `BingxVenueAdapter.close()` sets `cls._EXECUTOR = None` before `executor.shutdown()` — concurrent `_get_executor()` can create new executor, orphan old one
|
||||
|
||||
**File:** `bingx_venue.py:264-276`
|
||||
|
||||
```python
|
||||
executor = cls._EXECUTOR
|
||||
cls._EXECUTOR = None # <-- concurrent call sees None, creates new executor
|
||||
executor.shutdown(wait=False) # <-- old executor orphaned if new one was created
|
||||
```
|
||||
|
||||
Between the null assignment and `shutdown()`, a concurrent `_get_executor()` call (on another thread) sees `cls._EXECUTOR is None`, acquires the lock, and creates a new executor. When control returns, `executor.shutdown()` shuts down the old executor — but the new one was already created and returned to the concurrent caller. A small window, but technically a TOCTOU race.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### Z9: Generated test f-string uses `chr(34)` for embedded quotes — produces `SyntaxError` on Python < 3.12
|
||||
|
||||
**File:** `_gen_test.py:86-90` (generated output)
|
||||
|
||||
The generated test assertion uses:
|
||||
```python
|
||||
info[chr(34)+chr(34)]diagnostic[chr(34)+chr(34)]
|
||||
```
|
||||
which emits: `info["diagnostic"]` — correct. But the template uses nested f-string quotes inside `f"{...}..."` which only Python 3.12+ fully supports. The generated file itself is a plain Python file with `f"{...}"`, so it's correct — but the **template injection** via `chr(34)` is a code smell that could produce a `SyntaxError` in the generator on older Python.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### Z10: `launcher.py` uses Python 3.10+ `|` union syntax — no documented minimum Python version
|
||||
|
||||
**File:** `launcher.py:298-299`
|
||||
|
||||
```python
|
||||
venue_mode: Optional[LauncherVenueMode | str] = None,
|
||||
zinc_mode: Optional[LauncherZincMode | str] = None,
|
||||
```
|
||||
|
||||
The `X | Y` union syntax requires Python 3.10+. Combined with similar usage in 3+ other files, the codebase requires 3.10+. But there's no `python_requires`, `pyproject.toml`, or `setup.py` that documents this. A developer on Python 3.9 gets a `SyntaxError` at import time.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### Z11: `real_control_plane.py` and `real_zinc_plane.py` duplicate `_encode_packet`/`_decode_packet` — format change must update both
|
||||
|
||||
**Files:** `real_control_plane.py:43-67`, `real_zinc_plane.py:34-59`
|
||||
|
||||
Both files contain identical copies of `_encode_packet()` and `_decode_packet()` (plus `_json_default()`). If the shared memory wire format changes (e.g., adding a version tag or CRC), both copies must be updated.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### Z12: Concurrent `process_intent()` calls on same slot — no mutex, no queue, no serialization
|
||||
|
||||
**File:** `rust_backend.py:710-795`
|
||||
|
||||
`process_intent()` has no slot-level lock. Two concurrent calls with the same `slot_id` both call `self.venue.submit(intent)` and both write conflicting state to Zinc/projection. The Rust kernel's slot-busy check is race-adjacent — if both calls arrive before either completes, both pass the check, both submit to the exchange, and both overwrite slot state.
|
||||
|
||||
In the async single-threaded event loop this doesn't trigger, but the `BingxVenueAdapter._run()` method runs on a thread pool, creating a window where a second coroutine can enter `process_intent()` while the first is awaiting the HTTP response.
|
||||
|
||||
**Severity: Medium**
|
||||
|
||||
### Z13: `test_alpha_blue_untouched_g7.py` uses hardcoded absolute path for source reading
|
||||
|
||||
**File:** `test_alpha_blue_untouched_g7.py:34`
|
||||
|
||||
```python
|
||||
src = open("/mnt/dolphinng5_predict/prod/clean_arch/dita_v2/gen2.py").read()
|
||||
```
|
||||
|
||||
Hardcoded absolute path. Breaks on any other machine.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
### Z14: No exchange timestamp monotonicity check in WebSocket stream — silent gap detection missing
|
||||
|
||||
**File:** `bingx_user_stream.py` (entire file)
|
||||
|
||||
The `_normalise()` function extracts `frame.get("E")` (exchange timestamp in ms) but it's only used as `exchange_ts` in `ExchangeEvent`. No code checks that timestamps are monotonically increasing or detects gaps. If the exchange restarts and resets its event counter, or if frames arrive out of order (network reordering), there's no detection.
|
||||
|
||||
**Severity: Low**
|
||||
|
||||
---
|
||||
|
||||
## Pass 23 Summary
|
||||
|
||||
| # | Flaw | Layer | Severity |
|
||||
|---|------|-------|----------|
|
||||
| Z1 | `_safe_enum` fix applied to `rust_backend.py` but NOT `real_zinc_plane.py` — other copy crashes | Bridge | **High** |
|
||||
| Z2 | `_backup_20260530/` has `__init__.py` — accidental import loads old code | Repo | Medium |
|
||||
| Z3 | `RealZincControlUnavailable` and `RealZincUnavailable` separate classes | Bridge | Low |
|
||||
| Z4 | `test_account_reconcile_faults.py` requires Rust lib with no skip guard | Test | Low |
|
||||
| Z5 | No health check endpoint — silent failures invisible to orchestration | Ops | **High** |
|
||||
| Z6 | `process_intent()` calls `venue.submit()` without exception handler | Bridge | **High** |
|
||||
| Z7 | `snapshot()` mixes Rust and Python accounting — capital values can diverge | Bridge | Medium |
|
||||
| Z8 | `BingxVenueAdapter.close()` executor null-to-shutdown TOCTOU race | Venue | Medium |
|
||||
| Z9 | Generated test f-string `chr(34)` template — `SyntaxError` risk on old Python | Test | Medium |
|
||||
| Z10 | `launcher.py` uses Python 3.10+ `\|` union syntax — no min version documented | Config | Medium |
|
||||
| Z11 | `_encode_packet`/`_decode_packet` duplicated in two files | Plane | Low |
|
||||
| Z12 | Concurrent `process_intent()` on same slot — no lock, no queue | Bridge | Medium |
|
||||
| Z13 | `test_alpha_blue_untouched_g7.py` hardcoded absolute path | Test | Low |
|
||||
| Z14 | No exchange timestamp monotonicity check in WS stream | Venue | Low |
|
||||
|
||||
### Pass 23 Severity
|
||||
|
||||
| Severity | Count |
|
||||
|----------|-------|
|
||||
| **High** | 3 (Z1, Z5, Z6) |
|
||||
| Medium | 6 (Z2, Z7, Z8, Z9, Z10, Z12) |
|
||||
| Low | 5 (Z3, Z4, Z11, Z13, Z14) |
|
||||
|
||||
### Combined Catalog (All 23 Passes)
|
||||
|
||||
| Pass | Focus | Count | Critical | High | Medium | Low | Info |
|
||||
|------|-------|-------|----------|------|--------|-----|------|
|
||||
| A | Architectural | 15 | 0 | 2 | 0 | 2 | 11 |
|
||||
| T | Threading/Atomicity | 9 | 1 | 3 | 3 | 2 | 0 |
|
||||
| E | E2E Trace (Pass 1) | 26 | 0 | 4 | 10 | 11 | 1 |
|
||||
| F | Deep E2E (Pass 3) | 30 | 0 | 1 | 8 | 17 | 4 |
|
||||
| G | Domain Scans (Pass 4) | 36 | 4 | 11 | 11 | 8 | 2 |
|
||||
| H | Edge Domains (Pass 5) | 22 | 3 | 9 | 5 | 4 | 1 |
|
||||
| I | Pass 6 (Math/Tests/Recovery/Security) | 22 | 3 | 11 | 4 | 2 | 2 |
|
||||
| J | Pass 7 (Test Infra/Data/Rust/Env/Conn) | 16 | 0 | 7 | 7 | 2 | 0 |
|
||||
| K | Pass 8 (Observability/Memory/Time/DeadCode) | 23 | 2 | 7 | 7 | 1 | 6 |
|
||||
| L | Pass 9 (Contracts/Events/Network/FFI/Diffs) | 16 | 0 | 4 | 8 | 4 | 0 |
|
||||
| M | Pass 10 (Runtime/TestBugs/FSM/Persistence/Metrics) | 18 | 3 | 7 | 5 | 3 | 0 |
|
||||
| N | Pass 11 (Async/Sync Seams/Locks/Threading) | 10 | 4 | 1 | 3 | 1 | 1 |
|
||||
| O | Pass 12 (Sync/Async Wider Scope) | 11 | 0 | 3 | 7 | 1 | 0 |
|
||||
| P | Pass 13 (FFI Safety/Dangling Pointers/Coverage) | 9 | 1 | 3 | 3 | 1 | 1 |
|
||||
| Q | Pass 14 (Serde Edges/Backup Diffs/Market Data) | 12 | 0 | 4 | 3 | 2 | 3 |
|
||||
| R | Pass 15 (Resource Leaks/Trust Boundaries/Security) | 14 | 2 | 6 | 3 | 2 | 1 |
|
||||
| S | Pass 16 (Error Handling/Arithmetic/Test Infra) | 16 | 4 | 7 | 5 | 0 | 0 |
|
||||
| T | Pass 17 (Unsafe Review/Dead Code/Build/Protocols) | 14 | 0 | 5 | 5 | 4 | 0 |
|
||||
| U | Pass 18 (Rust Test Gaps/Accounting/FFI Types) | 14 | 3 | 4 | 4 | 3 | 0 |
|
||||
| V | Pass 19 (Lifecycle/Rust Subtleties/Test Infra) | 14 | 5 | 2 | 4 | 3 | 0 |
|
||||
| W | Pass 20 (Config/Math Signs/BingX Protocol) | 14 | 4 | 7 | 3 | 0 | 0 |
|
||||
| X | Pass 21 (Rust Build/Deps/Python Packaging/Shared Mem) | 14 | 3 | 5 | 6 | 0 | 0 |
|
||||
| Y | Pass 22 (Serde Round-Trip/Mock Fidelity/Protocol) | 14 | 1 | 4 | 5 | 4 | 0 |
|
||||
| Z | Pass 23 (Closure Review/Unfinished Fixes/Ops Gaps) | 14 | 0 | 3 | 6 | 5 | 0 |
|
||||
| **Total** | | **403** | **43** | **120** | **120** | **64** | **37** |
|
||||
|
||||
Reference in New Issue
Block a user