v4
Mirrored from benchmarks/graph/v4/README.md. Edit the source document in the repository, not this generated page.
Status: FROZEN snapshot of round 4, frozen 2026-04-25. Do not modify files in this directory except for explicit benchmark-result errata.
v4 is a diagnostic round, not a keep/cull round. v3 settled retention of the agent-facing orbit_graph_* MCP surface; v4 maps where the surface helps, where it hurts, and how it fails — so future tool-shaping work has measured targets.
- Method + pre-registered report shape: METHOD.md
- Report: RESULTS.md
- Fixtures: ./tasks/ (12 NEW fixtures; no v1/v2/v3 carries)
- Synthetic-fixture code island: ./_fixture_code/ (re-included in .orbitignore via narrow negation; see METHOD)
- Frozen run data: ./runs/
- Shared scripts: ../scripts/
- Shared harness overview: ../README.md
- Prior frozen rounds: ../v3/, ../v2/, ../v1/
- Follow-up validation round: ../v5/
- Closing entry on the v1–v3 evidence series: ../../../docs/design/knowledge-graph/5_null_result.md
Re-running a single cell against frozen v4:
    GRAPH_VERSION=v4 python3 benchmarks/graph/scripts/run.py \
      --provider codex --arm hybrid --task callers-2hop-graphbenchpolicy --seed 1

See ../../CONVENTIONS.md for version-freeze rules.
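When several cells need re-running, the single-cell invocation shown earlier can be assembled programmatically. The sketch below is illustrative only: `cell_command` and `render` are hypothetical helpers, not part of the shared scripts, and the seed list is an assumption (only seed 1 appears in this README). It prints the commands rather than executing them, so nothing in the frozen directory is touched.

```python
import shlex


def cell_command(provider, arm, task, seed, version="v4"):
    """Return (env_overrides, argv) for one benchmark cell.

    Hypothetical helper: it mirrors the flags of the README command
    and assumes run.py accepts no other required arguments.
    """
    env = {"GRAPH_VERSION": version}
    argv = [
        "python3", "benchmarks/graph/scripts/run.py",
        "--provider", provider,
        "--arm", arm,
        "--task", task,
        "--seed", str(seed),
    ]
    return env, argv


def render(env, argv):
    """Format the cell as a copy-pasteable shell command line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    # Assumed seed list, for illustration; check METHOD.md for the real one.
    for seed in (1, 2, 3):
        env, argv = cell_command(
            "codex", "hybrid", "callers-2hop-graphbenchpolicy", seed
        )
        print(render(env, argv))
```

To execute a cell instead of printing it, the same `argv` could be passed to `subprocess.run` with `env` merged into a copy of `os.environ`.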