Inventory: usage rules, sofascore-shot-xg derived reference impl, cleanups
Four changes:
- CLAUDE.md gains a "How to use the inventory — the workflow" subsection with
the explicit 5-step lookup ladder (cat INVENTORY.md → npm run data:status → per-league filter → jq → propose only after quoting). It also codifies the rule against the common anti-pattern: never grep to answer "do we have X data?"; always check the inventory first.
- New file USE_DATA_INVENTORY.md: copy-pasteable instructions for any future
session about to investigate "do we have X data?". It explains the workflow, the 5 steps, what the inventory tracks, and the common anti-patterns to avoid.
- New scripts/compute-sofascore-shot-xg.ts: the first reference implementation of
the getOrCompute() derived-data pattern. It reads sofascore_shots from Postgres, converts each shot's (x, y) from Sofascore's coordinate system to StatsBomb's, runs the v1 pre-shot xG model via lib/xg-model/inference.ts, aggregates per-match home/away xG totals, and writes data/derived/sofascore-shot-xg/v1-{contentHash}.json plus a manifest. Its inputs are declared explicitly (the model file's content hash, plus a count + max-id stamp over sofascore_shots), so a rerun hits the cache when none of them has changed.
The script currently uses the v1 model because the v3 model file (python/xg-model/models/xg_universal_v3.json) does not exist anywhere in this repo or on disk; only v1 is committed. When v3 is exported, swap MODEL_PATH and bump the VERSION constant in the script. A roadmap entry in data/ROADMAP.md tracks that follow-up.
- The inventory generator registers sofascore-shot-xg as a derived source and adds
cloud-lab to IGNORED_DIRS (it holds runtime guardian state, not raw data). Regenerated INVENTORY.json and INVENTORY.md: 17 sources tracked (up from 16), 0 unregistered.
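The first rung of the lookup ladder (answer "do we have X data?" from the inventory, not from grep) could look roughly like the sketch below. It assumes, hypothetically, that INVENTORY.json holds a `sources` array whose entries carry `name`, `kind` and an optional `leagues` list; the real schema is whatever the inventory generator emits, and `loadInventory`/`findSource` are illustrative names, not existing helpers.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical shape of one INVENTORY.json entry; the real generator's
// schema may differ.
interface InventorySource {
  name: string;
  kind: "raw" | "derived";
  leagues?: string[];
}

interface Inventory {
  sources: InventorySource[];
}

// Load the committed inventory instead of searching the tree.
function loadInventory(path = "INVENTORY.json"): Inventory {
  return JSON.parse(readFileSync(path, "utf8")) as Inventory;
}

// Look a source up by name, optionally restricted to one league.
// Returns the entry itself so it can be quoted before proposing new work.
function findSource(
  inv: Inventory,
  name: string,
  league?: string,
): InventorySource | undefined {
  return inv.sources.find(
    (s) =>
      s.name === name &&
      (league === undefined || (s.leagues ?? []).includes(league)),
  );
}
```

An `undefined` result is the signal to propose new ingestion; any hit should be quoted verbatim rather than re-derived.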
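The caching behaviour described for compute-sofascore-shot-xg.ts (declared inputs hashed into the output filename, so an unchanged model file and shots stamp reuse the previous result) can be sketched as follows. This is an illustration under assumptions, not the script's code: the `DerivedInputs` fields, the SHA-256 digest truncated to 12 hex characters, and this `getOrCompute` signature are all hypothetical.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Hypothetical input stamp mirroring the declared inputs.
interface DerivedInputs {
  modelFileHash: string; // content hash of the xG model file
  shotCount: number; // row count of sofascore_shots
  maxShotId: number; // max id in sofascore_shots
}

// Hash the inputs in a fixed order; any change yields a different key.
function contentHash(inputs: DerivedInputs): string {
  const canonical = JSON.stringify([
    inputs.modelFileHash,
    inputs.shotCount,
    inputs.maxShotId,
  ]);
  return createHash("sha256").update(canonical).digest("hex").slice(0, 12);
}

// Return the cached result for {version}-{hash}.json if present,
// otherwise compute it, write it, and return it.
function getOrCompute<T>(
  dir: string,
  version: string,
  inputs: DerivedInputs,
  compute: () => T,
): T {
  const path = join(dir, `${version}-${contentHash(inputs)}.json`);
  if (existsSync(path)) {
    return JSON.parse(readFileSync(path, "utf8")) as T;
  }
  const result = compute();
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, JSON.stringify(result));
  return result;
}
```

Putting the version in the filename means bumping VERSION (for example when moving to v3) also forces a recompute. The manifest the commit mentions is left out of this sketch.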
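The per-match aggregation step (summing shot-level xG into home and away totals) is a simple group-by. The sketch below assumes a hypothetical scored-shot row with `matchId`, `isHome` and `xg` fields; the actual column names in sofascore_shots and the script's output shape may differ.

```typescript
// Hypothetical per-shot row after the xG model has scored it.
interface ScoredShot {
  matchId: number;
  isHome: boolean;
  xg: number;
}

interface MatchXg {
  matchId: number;
  homeXg: number;
  awayXg: number;
  shots: number;
}

// Sum shot xG into one home/away total per match, sorted by match id.
function aggregateByMatch(shots: ScoredShot[]): MatchXg[] {
  const byMatch = new Map<number, MatchXg>();
  for (const s of shots) {
    let row = byMatch.get(s.matchId);
    if (row === undefined) {
      row = { matchId: s.matchId, homeXg: 0, awayXg: 0, shots: 0 };
      byMatch.set(s.matchId, row);
    }
    if (s.isHome) {
      row.homeXg += s.xg;
    } else {
      row.awayXg += s.xg;
    }
    row.shots += 1;
  }
  return Array.from(byMatch.values()).sort((a, b) => a.matchId - b.matchId);
}
```

A match with no shots for one side simply keeps a 0 total for that side.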
All 9 inventory CI tests pass. Verified in a fresh worktree from origin/main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>