Commit Graph

5 Commits

Author SHA1 Message Date
Codex
6d08e97e28 BLUE hardening: spool-poison guards, dead-session clock fix, HZ black-box, RETRACT race-safety
Seven uncommitted production fixes to BLUE's main runner that the LIVE
process has already been running since the 2026-06-15 17:23 restart (file
mtime 17:17, pid started 17:23). Each fix answers a documented incident;
committing now so they survive in history and a stray checkout can't
silently revert running-config code on the next restart.

1. bars_held = max(0, int(...)) at BOTH journal sites (terminal + sub-day).
   CH column is UInt16 — a negative value poisons the spool with a
   head-of-line jam (incident 2026-06-12: bars_held=-106).

2. entry_bar = int(restored_entry_bar) at BOTH reconstruction sites; NEVER
   from chain_meta. trade_reconstruction payloads carry the DEAD session's
   bar counter, so the old override reinstated the stale clock frame the
   re-anchor exists to fix → negative bars_held → same UInt16 spool poison
   (zombie-trade resurrections, incident 2026-06-12). restored_entry_bar
   already encodes hold continuity via stored_bars in THIS session's frame.

3. capital parse handles list/ledger-style payloads: when the restore blob
   is a list of update rows, take the latest dict row instead of falling
   through to {} and losing the capital anchor.

4. _connect_hz routes the `hazelcast` logger to stderr at INFO. The
   silent-HZ-death investigation found ZERO client log lines because
   nothing routed them; without this the reactor's health is invisible.

5. _dump_blackbox(reason): forensic thread dump taken before a watchdog
   restart. It records lifecycle.is_running, active_connections, every
   thread's stack, and a flag when any hazelcast/reactor-named thread is
   MISSING (= reactor died, the prime suspect for the silent 40 min to 8 h
   client deaths). print()-only, CIFS-safe. _watchdog_restart calls it
   first.

6. _drain_runtime_commands / _process_runtime_commands gain
   `*, allow_retract=True`; the heartbeat path drains with
   allow_retract=False and re-queues any RETRACT commands. A RETRACT can
   force a terminal close that must run through the scan-thread close
   finalizer, so the heartbeat must not race it.

7. Adds `import traceback` (for the black-box stack dumps).
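
Fixes 1, 3 and 6 above can be sketched in a few lines of Python. This is a
minimal illustration, not the runner's actual code: the function names, the
`current_bar - entry_bar` expression, the "latest row is the last dict in the
list" rule, and the `{"type": "RETRACT"}` command shape are all assumptions
made for the example.

```python
from collections import deque


def clamp_bars_held(current_bar, entry_bar):
    """Fix 1 (sketch): never journal a negative hold length.

    A UInt16 column cannot store a negative value, so clamp at zero.
    The subtraction is an assumed stand-in for the real expression.
    """
    return max(0, int(current_bar - entry_bar))


def latest_capital_row(blob):
    """Fix 3 (sketch): accept a dict or a list of update rows.

    Assumes the list is ordered oldest to newest, so the latest row
    is the last element that is a dict.
    """
    if isinstance(blob, dict):
        return blob
    if isinstance(blob, list):
        for row in reversed(blob):
            if isinstance(row, dict):
                return row
    return {}  # nothing usable: caller sees an empty payload


def drain_commands(queue, *, allow_retract=True):
    """Fix 6 (sketch): drain a command deque.

    With allow_retract=False, RETRACT commands are put back on the
    queue in their original order for another caller to handle.
    """
    drained, deferred = [], []
    while queue:
        cmd = queue.popleft()
        if not allow_retract and cmd.get("type") == "RETRACT":
            deferred.append(cmd)
        else:
            drained.append(cmd)
    queue.extend(deferred)
    return drained
```

With these definitions, `clamp_bars_held(10, 116)` turns the -106 from the
2026-06-12 incident into 0, and a heartbeat-style call
`drain_commands(q, allow_retract=False)` leaves every RETRACT on the queue.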

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 12:03:20 +02:00
Codex
520257de7a BLUE: TP_FLOOR profit-floor + malformed-OPEN Option A (first add of trader)
TP_FLOOR (LINK 5e05eeeb, -$1,248.71): once the BASE 0.20% TP is crossed, a
regression back to base exits the trade. This caps the left tail of the OB
cascade x1.40 TP-widening, which is now logged per decision
(dynamic_tp_pct, tp_mod_factor, cascade_count, ob_regime_signal,
tp_floor_armed on v7_decision_events). Class default is OFF (champion
parity); live is ON via DOLPHIN_TP_FLOOR.
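
A minimal sketch of the profit-floor rule described above, assuming PnL is
tracked in percent and checked once per decision. The `TpFloor` class, its
`update` method, the exit-reason strings, and the strict/non-strict
comparisons are hypothetical; only the 0.20% base, the default-OFF behaviour
and the arm-then-exit-on-regression idea come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class TpFloor:
    base_tp_pct: float = 0.20   # base take-profit, in percent
    enabled: bool = False       # class default OFF (champion parity)
    armed: bool = False         # becomes True once base TP is crossed

    def update(self, pnl_pct, dynamic_tp_pct):
        """Return an exit reason string, or None to keep holding."""
        # The (possibly widened) dynamic target is hit: ordinary TP exit.
        if pnl_pct >= dynamic_tp_pct:
            return "TP"
        if not self.enabled:
            return None
        # Armed and PnL has fallen back to the base level: floor exit.
        if self.armed and pnl_pct <= self.base_tp_pct:
            return "TP_FLOOR"
        # PnL is above the base TP: arm the floor.
        if pnl_pct > self.base_tp_pct:
            self.armed = True
        return None
```

For example, with the target widened to 0.28 and the floor enabled, a PnL
path of 0.10, 0.22, 0.25, 0.19 holds for three decisions and exits with
"TP_FLOOR" on the fourth; with the floor disabled the same path never exits.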

Malformed-OPEN Option A (causal fix): POSITION_DUST_NOTIONAL_USD shared by
the full-close decision and the single _ps_write_open lifecycle gate (OPEN
rows can never round to zero size on disk); retract terminal leg writes its
trade_exit_legs + trade_reconstruction rows; restore reject-exhaustion halts
for unknown-corruption classes and flat-continues only for the documented
zero-size tombstone class; chain-token mismatch emits a CHAIN_TOKEN_MISMATCH
journal event; restored entry_bar preserves bars_held continuity (negative
entry_bar allowed, Int32) in both CH and HZ restore paths.
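
Two of the ideas above, sketched under stated assumptions: a single dust
threshold consulted by both the full-close decision and the OPEN-row write
gate, and a restored entry_bar that may go negative while keeping bars_held
continuous. The threshold value of 1.0 USD, every function name, and the
`current_bar - stored_bars` formula are illustrative guesses, not the
trader's real code.

```python
POSITION_DUST_NOTIONAL_USD = 1.0  # hypothetical value; the real one is not given
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def is_dust(size, price):
    """True when the position's notional is below the shared threshold."""
    return abs(size) * price < POSITION_DUST_NOTIONAL_USD


def should_full_close(remaining_size, price):
    """Full-close decision: a dust remainder is closed outright."""
    return is_dust(remaining_size, price)


def can_write_open(size, price):
    """OPEN-row gate: refuse rows that the close path would treat as dust."""
    return not is_dust(size, price)


def reanchor_entry_bar(current_bar, stored_bars):
    """Express the entry bar in the new session's bar frame.

    The result can be negative when the trade has been held longer
    than the new session has been running; it must still fit Int32.
    """
    entry_bar = int(current_bar) - int(stored_bars)
    if not INT32_MIN <= entry_bar <= INT32_MAX:
        raise ValueError(f"entry_bar {entry_bar} does not fit Int32")
    return entry_bar
```

Because both decisions call the same `is_dust` helper, a size that the close
path considers fully closed can never pass the OPEN gate. For the re-anchor,
`reanchor_entry_bar(3, 40)` gives -37, so `3 - (-37)` still reports 40 bars
held.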

Tests: test_tp_floor.py 16/16 incl. LINK golden replay;
test_malformed_open_distal.py 11/11. Suites before/after identical except
one PRE-EXISTING failure fixed (full-close zero-size-row test).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 14:59:49 +02:00
Codex
e7eaa88ce1 PINK Phase 0 and 1: VST WS confirmed plus AccountSnapshotV2 account core 2026-06-01 20:11:03 +02:00
Codex
34d01fe6a4 Add BingX sandbox status sidecar 2026-05-13 19:56:58 +02:00
Codex
0d70c767e4 Wire long-capable prod alpha path 2026-05-08 21:16:53 +02:00