v4

Mirrored from benchmarks/graph/v4/README.md. Edit the source document in the repository, not this generated page.

Status: FROZEN snapshot of round 4, frozen 2026-04-25. Do not modify files in this directory except for explicit benchmark-result errata.

v4 is a diagnostic round, not a keep/cull round. v3 settled the question of retaining the agent-facing orbit_graph_* MCP surface; v4 maps where that surface helps, where it hurts, and how it fails, so that future tool-shaping work has measured targets.

Re-running a single cell against frozen v4:

GRAPH_VERSION=v4 python3 benchmarks/graph/scripts/run.py \
    --provider codex --arm hybrid --task callers-2hop-graphbenchpolicy --seed 1
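
If the same cell needs to be launched from a script rather than typed by hand, the invocation above could be wrapped roughly as follows. This is only a sketch: the helper names (`cell_env`, `build_cell_command`, `run_cell`) are hypothetical and not part of the repository, it assumes the working directory is the repository root, and it uses only the environment variable and flags that appear in the command above.

```python
import os
import shlex
import subprocess

# Path taken from the command above; assumes the repository root is the cwd.
RUN_SCRIPT = "benchmarks/graph/scripts/run.py"


def cell_env(version="v4"):
    """Copy the current environment and pin GRAPH_VERSION (hypothetical helper)."""
    env = dict(os.environ)
    env["GRAPH_VERSION"] = version
    return env


def build_cell_command(provider, arm, task, seed):
    """Build the argv for one cell, using only the flags shown above."""
    return [
        "python3", RUN_SCRIPT,
        "--provider", provider,
        "--arm", arm,
        "--task", task,
        "--seed", str(seed),
    ]


def run_cell(provider, arm, task, seed, version="v4"):
    """Run one cell and raise CalledProcessError on a non-zero exit."""
    cmd = build_cell_command(provider, arm, task, seed)
    return subprocess.run(cmd, env=cell_env(version), check=True)


# Print the equivalent shell line without executing anything.
example = build_cell_command("codex", "hybrid", "callers-2hop-graphbenchpolicy", 1)
print("GRAPH_VERSION=v4 " + shlex.join(example))
```

Calling `run_cell("codex", "hybrid", "callers-2hop-graphbenchpolicy", 1)` would then execute the same cell as the shell command; nothing is run on import.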

See ../../CONVENTIONS.md for version-freeze rules.