Cloud-lab v3 retro + Phase 0/1 fixes
Phase 0 (cloud-lab VM): fix LAST=0 idle-check bug that was killing the VM 5 min after every boot. /root/auto-shutdown.sh and /root/watchdog.sh now handle missing /tmp/last-job-activity explicitly (initialize + exit 0)
Phase 0 (cloud-lab VM): fix LAST=0 idle-check bug that was killing the VM 5 min after every boot. /root/auto-shutdown.sh and /root/watchdog.sh now handle missing /tmp/last-job-activity explicitly (initialize + exit 0) instead of computing IDLE = NOW - 0 = 55 years. Scripts + @reboot cron checked in under scripts/cloud-lab-bootstrap/ so VM rebuilds can re-install them.
Phase 1 (compute-worker): solver-cache/backtest/model-sweep-solve timeouts 12h → 24h with measured-workload comment. SSH key path now CLOUD_LAB_SSH_KEY_PATH env var (default unchanged). waitForSSH first-try window 15s → 60s to cover cloud-init startup. Dockerfile installs rsync alongside openssh-client — syncCloudLabResults was silently failing with "rsync: not found" on every cloud-lab job.
Smoke test (scripts/cloud-lab-smoke-test.ts): end-to-end validation of the three things that broke on Apr 11 — SSH reachability from the compute-worker container, rsync binary presence, and fire+sync round-trip. Run before letting production traffic onto any further compute-worker changes.
Full retro + rebuild plan at docs/specs/cloud-lab-apr11-retro-and-rebuild.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>