PINK DITAv2: kernel-level finiteness guard (no more null-string crash on inf/NaN)

The aborted hard cutover crash-looped with "Rust kernel returned null string" from
process_intent on the first live trading step. Root cause (reproduced): a non-finite
(inf/NaN) numeric field reached the kernel. Python's json.dumps emits a bare
Infinity/NaN token, serde_json rejects that token at parse, and the FFI returned null.
Large magnitudes are fine; only non-finite values trigger the failure.
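
The Python half of that failure can be reproduced with the standard library alone.
This is a minimal sketch, not the bridge code: the payload dict is a hypothetical
stand-in that only borrows two field names from this commit.

```python
import json
import math

# Hypothetical intent payload; the field names follow the commit message.
payload = {"target_size": float("inf"), "reference_price": float("nan")}

# Default json.dumps writes the non-standard tokens Infinity / NaN.
text = json.dumps(payload)

# allow_nan=False makes the same call raise instead of emitting them.
try:
    json.dumps(payload, allow_nan=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True

# A large but finite magnitude passes the finiteness check and strict mode.
big = 1e308
big_text = json.dumps({"target_size": big}, allow_nan=False)

print(text)
print(strict_rejected, math.isfinite(big), big_text)
```

The first print shows the Infinity and NaN tokens that a strict JSON parser refuses.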

Defense in depth, kernel catches it:
- Rust FFI (lib.rs): dita_kernel_process_intent_json / _on_venue_event_json now return
  a clean INVALID_INTENT KernelResult on parse failure (incl. Infinity/NaN tokens) AND
  on serialize failure (a non-finite produced internally) — never a null string.
- Python bridge (rust_backend.py): ExecutionKernel.process_intent validates intent
  finiteness/bounds (target_size, reference_price, limit_price, leverage, exit_leg_ratios;
  size>=0) BEFORE the FFI and rejects INVALID_INTENT, naming the offending field+value.
- contracts.py: add KernelDiagnosticCode.INVALID_INTENT.
- pink_direct.py: on INVALID_INTENT, log full upstream provenance (snapshot.price,
  capital, leverage, sizes) so the numerical SOURCE can be located on the next live run.
- on_venue_event bridge tolerates the fallback's null slot (uses the live slot).
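
The Python-side guard from the list above can be sketched in isolation as follows.
SketchIntent, first_invalid_field and guard are hypothetical names for illustration
only; the real bridge works on KernelIntent and returns a KernelOutcome.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class SketchIntent:
    """Hypothetical stand-in for KernelIntent with only the numeric fields."""
    target_size: float
    reference_price: float
    leverage: float = 1.0
    exit_leg_ratios: Sequence[float] = ()


def first_invalid_field(intent: SketchIntent) -> Optional[Tuple[str, float]]:
    """Return (field, value) for the first non-finite or negative-size field."""
    fields = [
        ("target_size", intent.target_size),
        ("reference_price", intent.reference_price),
        ("leverage", intent.leverage),
    ]
    fields += [(f"exit_leg_ratios[{i}]", r) for i, r in enumerate(intent.exit_leg_ratios)]
    for name, value in fields:
        if not math.isfinite(value):
            return name, value
    if intent.target_size < 0.0:
        return "target_size", intent.target_size
    return None


def guard(intent: SketchIntent) -> Optional[dict]:
    """Return a rejection dict naming the bad field, or None if the intent is sane."""
    bad = first_invalid_field(intent)
    if bad is None:
        return None
    name, value = bad
    return {"accepted": False, "reason": "INVALID_INTENT", "field": name, "value": str(value)}


rejected = guard(SketchIntent(target_size=float("inf"), reference_price=100.0))
accepted = guard(SketchIntent(target_size=1e12, reference_price=100.0))
print(rejected, accepted)
```

Reporting the value as a string keeps the rejection itself JSON-safe even when the
value is inf or NaN.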

Verified: kernel recompiled; offline 65 + 7 new guard tests green (no regression);
direct-FFI inf payload -> INVALID_INTENT (no null crash). NOTE: this turns the cutover
crash into a clean rejection — the upstream source of the non-finite (the live run's
inf) still needs locating, now aided by the provenance log.
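
A provenance log of the kind described could look roughly like this. It is a sketch
under assumptions: the logger name, both function names and the argument names are
hypothetical, and only the logged quantities (snapshot.price, capital, leverage,
sizes) come from this commit message.

```python
import logging
import math

logger = logging.getLogger("pink_direct.sketch")  # hypothetical logger name


def provenance_record(details, snapshot_price, capital, leverage, sizes):
    """Collect the upstream inputs and flag the ones that are non-finite."""
    inputs = {"snapshot.price": snapshot_price, "capital": capital, "leverage": leverage}
    inputs.update({f"sizes[{i}]": s for i, s in enumerate(sizes)})
    return {
        "field": details.get("field"),
        "value": details.get("value"),
        # repr() keeps inf/nan readable and keeps the record JSON-safe.
        "inputs": {k: repr(v) for k, v in inputs.items()},
        "non_finite_inputs": [k for k, v in inputs.items() if not math.isfinite(v)],
    }


def log_invalid_intent(details, **upstream):
    """Log one WARNING line carrying the full provenance record."""
    record = provenance_record(details, **upstream)
    logger.warning("INVALID_INTENT provenance: %s", record)
    return record


record = log_invalid_intent(
    {"field": "target_size", "value": "inf"},
    snapshot_price=0.0,
    capital=10_000.0,
    leverage=float("inf"),
    sizes=[float("inf"), 1.5],
)
```

Listing the non-finite inputs separately makes it quicker to see which upstream
quantity went bad first.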

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codex
2026-05-31 09:10:13 +02:00
parent 0c15a7698e
commit 9168cf0759
5 changed files with 193 additions and 7 deletions

@@ -14,6 +14,7 @@ from pathlib import Path
 from typing import Any, Dict, Iterable, List, Optional, Sequence
 import ctypes
 import json
+import math
 import os
 import subprocess
 import sys
@@ -281,6 +282,29 @@ def _slot_from_payload(payload: Dict[str, Any]) -> TradeSlot:
     )
 
 
+def _first_invalid_intent_field(intent: KernelIntent) -> Optional[tuple[str, float]]:
+    """Return (field, value) for the first non-finite or out-of-bounds numeric
+    field on an intent, or None if all are sane. Guards the kernel boundary
+    against inf/NaN, whose JSON tokens serde_json would reject at parse."""
+    scalar_checks = (
+        ("target_size", float(intent.target_size if intent.target_size is not None else 0.0)),
+        ("reference_price", float(intent.reference_price if intent.reference_price is not None else 0.0)),
+        ("leverage", float(intent.leverage if intent.leverage is not None else 0.0)),
+        ("limit_price", float(getattr(intent, "limit_price", 0.0) or 0.0)),
+    )
+    for name, value in scalar_checks:
+        if not math.isfinite(value):
+            return (name, value)
+    for idx, ratio in enumerate(intent.exit_leg_ratios or ()):  # type: ignore[union-attr]
+        rv = float(ratio if ratio is not None else 0.0)
+        if not math.isfinite(rv):
+            return (f"exit_leg_ratios[{idx}]", rv)
+    size = float(intent.target_size if intent.target_size is not None else 0.0)
+    if size < 0.0:
+        return ("target_size", size)
+    return None
+
+
 def _intent_to_payload(intent: KernelIntent) -> Dict[str, Any]:
     return {
         "timestamp": intent.timestamp.isoformat() if hasattr(intent.timestamp, "isoformat") else str(intent.timestamp),
@@ -561,6 +585,29 @@ class ExecutionKernel:
                 diagnostic_code=KernelDiagnosticCode.INVALID_SLOT_ID,
                 details={"reason": "INVALID_SLOT_ID", "slot_id": int(intent.slot_id), "intent_id": intent.intent_id},
             )
+        # Finiteness / sanity guard at the kernel boundary. A non-finite (inf/NaN)
+        # numeric field is emitted by json.dumps as an Infinity/NaN token, which
+        # serde_json rejects at parse. Reject cleanly with INVALID_INTENT here, naming
+        # the offending field + value so the upstream numerical source can be located.
+        bad_field = _first_invalid_intent_field(intent)
+        if bad_field is not None:
+            name, value = bad_field
+            return KernelOutcome(
+                accepted=False,
+                slot_id=int(intent.slot_id),
+                trade_id=intent.trade_id,
+                state=self._get_slot(int(intent.slot_id)).fsm_state,
+                diagnostic_code=KernelDiagnosticCode.INVALID_INTENT,
+                severity=KernelSeverity.WARNING,
+                details={
+                    "reason": "INVALID_INTENT",
+                    "field": name,
+                    "value": str(value),
+                    "intent_id": intent.intent_id,
+                    "action": intent.action.value,
+                    "asset": intent.asset,
+                },
+            )
         payload = _intent_to_payload(intent)
         result = _get_rust().process_intent(
             self._backend,
@@ -648,7 +695,10 @@ class ExecutionKernel:
             verbosity=_enum_text(self.control.verbosity),
         )
         outcome = _outcome_from_payload(result["outcome"])
-        slot = _slot_from_payload(result["slot"])
+        # An INVALID_* fallback result carries a null slot; fall back to the
+        # kernel's current slot so settlement/bookkeeping stays consistent.
+        slot_payload = result.get("slot")
+        slot = _slot_from_payload(slot_payload) if slot_payload else self._get_slot(int(outcome.slot_id))
         self.state.refresh()
         incremental_pnl = slot.realized_pnl - self._last_settled_pnl.get(slot.slot_id, 0.0)
         if abs(incremental_pnl) > 1e-12: