Surfaced via NVIDIA/NemoClaw#5662 (native-Linux GPU onboard). During NemoClaw's GPU-patch recreate (a docker stop followed by a docker run to add device passthrough), openshell sandbox list / get reports the sandbox in a terminal Error phase even though the underlying container is running and healthy, with exit_code=0.
NemoClaw only reads the reported phase, and its own code documents the ownership boundary: the preferred fix lives at the OpenShell gateway/supervisor, and a NemoClaw-side health-aware retry was explicitly rejected (src/lib/onboard/docker-gpu-supervisor-reconnect.ts). NemoClaw #4316 / #4407 already fixed the classification and timing on the NemoClaw side (the fast-fail message), so this report covers only the upstream condition.
Expected: the supervisor phase reflects the healthy / recreating container during a stop+run recreate rather than surfacing a terminal Error.
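To make the expected behavior concrete, here is a minimal sketch of a health-aware phase derivation. It is not OpenShell code: the type names, the function derivePhase, the phase names other than Error, and the grace-window approach are all assumptions made for illustration. The idea is that a container that was running very recently is treated as recreating rather than failed.

```typescript
// Hypothetical sketch only. None of these names come from OpenShell or NemoClaw;
// "Error" is the only phase name taken from the report.
type Phase = "Provisioning" | "Ready" | "Error";

interface ContainerObservation {
  // Assumed to be derived from the container runtime's state.
  status: "running" | "exited" | "missing";
  // Assumed to be derived from the container health check, if one is configured.
  health: "healthy" | "unhealthy" | "starting" | "none";
  // Milliseconds since the container was last observed running (0 while running).
  msSinceLastRunning: number;
}

// Assumed default grace window for a stop + run recreate; the right value is unknown.
const DEFAULT_GRACE_MS = 30_000;

function derivePhase(
  obs: ContainerObservation,
  graceMs: number = DEFAULT_GRACE_MS,
): Phase {
  if (obs.status === "running") {
    if (obs.health === "unhealthy") return "Error";
    if (obs.health === "starting") return "Provisioning";
    // Running and healthy (or no health check configured): never terminal Error.
    return "Ready";
  }
  // Stopped or removed: treat as a recreate in progress until the grace window expires.
  return obs.msSinceLastRunning <= graceMs ? "Provisioning" : "Error";
}

// The reported case: container running and healthy.
console.log(derivePhase({ status: "running", health: "healthy", msSinceLastRunning: 0 }));
// Mid-recreate: container stopped two seconds ago.
console.log(derivePhase({ status: "exited", health: "none", msSinceLastRunning: 2_000 }));
```

Under these assumptions, Error is reported only when the container is unhealthy or has stayed down past the grace window, so a transient stop+run gap would not surface as a terminal failure.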
Repro context: native Linux, GPU-enabled sandbox, during the GPU-patch recreate window; reporter diagnostics show phase Error while the container is running/healthy with exit_code=0.