6d08e97e285773e349110d16d8bccee5b3655e8b
Seven uncommitted production fixes to BLUE's main runner that the LIVE
process has already been running since the 2026-06-15 17:23 restart (file
mtime 17:17, pid started 17:23). Each fix answers a documented incident;
committing now so they survive in history and a stray checkout can't
silently revert running-config code on the next restart.
1. bars_held = max(0, int(...)) at BOTH journal sites (terminal + sub-day).
CH column is UInt16 — a negative value poisons the spool with a
head-of-line jam (incident 2026-06-12: bars_held=-106).
2. entry_bar = int(restored_entry_bar) at BOTH reconstruction sites; NEVER
from chain_meta. trade_reconstruction payloads carry the DEAD session's
bar counter, so the old override reinstated the stale clock frame the
re-anchor exists to fix → negative bars_held → same UInt16 spool poison
(zombie-trade resurrections, incident 2026-06-12). restored_entry_bar
already encodes hold continuity via stored_bars in THIS session's frame.
3. capital parse handles list/ledger-style payloads: when the restore blob
is a list of update rows, take the latest dict row instead of falling
through to {} and losing the capital anchor.
4. _connect_hz routes the `hazelcast` logger to stderr at INFO. The
silent-HZ-death investigation found ZERO client log lines because
nothing routed them; without this the reactor's health is invisible.
5. _dump_blackbox(reason): forensic thread dump before a watchdog restart —
lifecycle.is_running, active_connections, every thread's stack, and a
flag when any hazelcast/reactor-named thread is MISSING (= reactor died,
the prime suspect for the silent 40min–8h client deaths). print()-only,
CIFS-safe. _watchdog_restart calls it first.
6. _drain_runtime_commands / _process_runtime_commands gain
`*, allow_retract=True`; the heartbeat path drains with
allow_retract=False and re-queues any RETRACT commands. A RETRACT can
force a terminal close that must run through the scan-thread close
finalizer, so the heartbeat must not race it.
7. +import traceback (for the black-box stack dumps).
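A minimal sketch of the logic behind fixes 1, 3 and 6, for readers without the diff at hand. This is not the runner's actual code. The helper names (clamp_bars_held, latest_capital_row, split_retracts), the oldest-first ordering of the update rows, and the dict-with-"type" shape of a runtime command are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple


def clamp_bars_held(current_bar: int, entry_bar: int) -> int:
    """Fix 1 (sketch): never journal a negative hold count.

    An unsigned column cannot store a negative value, so the count is
    floored at zero before it is written.
    """
    return max(0, int(current_bar - entry_bar))


def latest_capital_row(blob: Any) -> Dict[str, Any]:
    """Fix 3 (sketch): accept a dict or a list of update rows.

    ASSUMPTION: rows are ordered oldest-first, so the last dict in the
    list is the latest one.
    """
    if isinstance(blob, dict):
        return blob
    if isinstance(blob, list):
        # Walk backwards so the newest dict row wins; skip non-dict entries.
        for row in reversed(blob):
            if isinstance(row, dict):
                return row
    # Nothing usable was found; the caller sees an empty anchor.
    return {}


def split_retracts(
    commands: List[Dict[str, Any]], allow_retract: bool = True
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Fix 6 (sketch): return (commands to run now, commands to re-queue).

    ASSUMPTION: each command is a dict whose "type" key names the command.
    With allow_retract=False, RETRACT commands are held back so another
    thread can process them later.
    """
    if allow_retract:
        return list(commands), []
    runnable = [c for c in commands if c.get("type") != "RETRACT"]
    deferred = [c for c in commands if c.get("type") == "RETRACT"]
    return runnable, deferred
```

Under these assumptions, a hold count that would have come out as -106 is journaled as 0, and a heartbeat-style drain (allow_retract=False) hands every RETRACT back for re-queueing instead of executing it.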
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>