Phase 2: disjoint queue.ts/compute-worker.ts split
Makes the web container's queue.ts handle ONLY cloud-lab jobs and the compute-worker container handle ONLY local jobs. Both read the same data/compute-jobs/jobs.json but filter on job.target, so neither picks up the other's work.
Why: the Apr 11 "factorial all zeros" failure, reproduced live today at 12:01:54 UTC when compute-worker polled jobs.json and started a cloud-lab-targeted solver-cache job locally in its own container (PID 390). The compute-worker has no cloud-lab dispatch logic at all, so any cloud-lab job it picks up runs in the wrong environment. For factorial that meant no input data, so the output was all zeros, and the job was still marked "completed" because the process exited with code 0.
Changes:
- compute-worker.ts:tick() now keeps only jobs where j.target !== "cloud-lab". Adds the target field to the local ComputeJob interface (it was silently missing, which was the root of the bug).
- queue.ts:processQueue() now keeps only jobs where j.target === "cloud-lab". The web container no longer spawns local jobs.
- Deleted dormant code: autopilot-runner.ts, mega-queue.ts, cloud-autopilot.sh. These were the original "Phase 2" target but they were already dead (nothing triggers them); the actual race was between queue.ts and compute-worker.ts.
- CLAUDE.md compute architecture section updated with the split and
the new docker run command (adds ssh-keys volume + env vars).
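
For reviewers, a minimal sketch of the two filters described in the list above. It is an illustration, not the code in this commit: the id field, the "local" target value, and the helper names selectForQueue/selectForWorker are assumptions; only the target field and the two comparisons against "cloud-lab" come from the changes themselves.

```typescript
// Hypothetical job shape. Only `target` is confirmed by this commit;
// `id` and the "local" value are assumed for illustration.
type JobTarget = "cloud-lab" | "local";

interface ComputeJob {
  id: string;
  target?: JobTarget; // may be absent on some entries
}

// Web container side (the filter applied in queue.ts:processQueue()):
// keep cloud-lab jobs only.
function selectForQueue(jobs: ComputeJob[]): ComputeJob[] {
  return jobs.filter((j) => j.target === "cloud-lab");
}

// compute-worker side (the filter applied in compute-worker.ts:tick()):
// keep everything that is not cloud-lab, including jobs with no target.
function selectForWorker(jobs: ComputeJob[]): ComputeJob[] {
  return jobs.filter((j) => j.target !== "cloud-lab");
}
```

Because one predicate is the exact negation of the other, every job in jobs.json is selected by exactly one side. A job with no target at all falls through to the compute-worker.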
Does NOT restart the running compute-worker container. Treat C (a benign race-victim currently generating real solver-cache output locally) keeps running until completion. The new filter takes effect on the next compute-worker restart.
Refs: docs/specs/cloud-lab-apr11-retro-and-rebuild.md RC6
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>