Knowledge Graph Benchmarks
Mirrored from `benchmarks/graph/README.md`. Edit the source document in the repository, not this generated page.
Measures how much navigation budget an agent spends on the same task under three tool surfaces: no-graph (shell only), graph-only (Orbit graph MCP tools only), and hybrid (both).
See docs/design/knowledge-graph/ for the graph itself.
Series closed. Read RESULTS.md for the cross-round synthesis and findings.
Rounds
| Version | Scope | Report |
|---|---|---|
| v1 | Initial baseline | RESULTS.md |
| v2 | Extended fixtures | RESULTS.md |
| v3 | Calibrated cost; published null result | RESULTS.md |
| v4 | Diagnostic round, 192 planned cells plus Codex post-fix graph-only rerun | RESULTS.md |
| v5 | Feature validation (source_regex), 9 cells | RESULTS.md |
All rounds frozen.
Reproducing
```bash
# Single cell
GRAPH_VERSION=v4 benchmarks/graph/scripts/run.sh graph-only reverse-export-orbit-error 1 --provider codex

# Sweep
GRAPH_VERSION=v4 python3 benchmarks/graph/scripts/sweep.py --provider codex --arms graph-only --n 3

# Aggregate
GRAPH_VERSION=v4 python3 benchmarks/graph/scripts/aggregate.py \
  --runs benchmarks/graph/v4/runs --tasks benchmarks/graph/v4/tasks
```

Outputs
```
benchmarks/graph/<version>/runs/<provider>/<arm>/<task_id>/<seed>.json
benchmarks/graph/<version>/runs/_sweeps/<provider>/<sweep_id>/order.json
```

Records (`<seed>.json`) include the verdict, token counts, wall time, and a tool-call histogram. v1-v4 also retain full transcripts (`<seed>.transcript.json`); v5 is records-only.
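Each record's path encodes the cell it belongs to. The following is a minimal Python sketch, assuming only the two path patterns listed here, that recovers `(provider, arm, task_id, seed)` from a record path relative to a `runs/` directory and skips sweep bookkeeping and transcript files. `CellKey`, `parse_record_path`, and `iter_cells` are hypothetical helpers for illustration, not part of the harness; `aggregate.py` may walk the tree differently.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator, NamedTuple


class CellKey(NamedTuple):
    """Hypothetical: one benchmark cell, as encoded in a record path."""
    provider: str
    arm: str
    task_id: str
    seed: str  # kept as text, exactly as it appears in the file name


def parse_record_path(runs_root: Path, record: Path) -> CellKey | None:
    """Map runs/<provider>/<arm>/<task_id>/<seed>.json to a CellKey.

    Returns None for files that are not per-cell records:
    sweep bookkeeping under _sweeps/ and *.transcript.json files.
    """
    parts = record.relative_to(runs_root).parts
    # _sweeps/<provider>/<sweep_id>/order.json also has four parts,
    # so it must be excluded by its first component.
    if len(parts) != 4 or parts[0] == "_sweeps":
        return None
    provider, arm, task_id, filename = parts
    if not filename.endswith(".json") or filename.endswith(".transcript.json"):
        return None
    return CellKey(provider, arm, task_id, filename[: -len(".json")])


def iter_cells(runs_root: Path) -> Iterator[tuple[CellKey, Path]]:
    """Yield (key, path) for every per-cell record under runs_root."""
    for path in sorted(runs_root.rglob("*.json")):
        key = parse_record_path(runs_root, path)
        if key is not None:
            yield key, path
```

For example, `iter_cells(Path("benchmarks/graph/v4/runs"))` would yield one entry per record file, ignoring `order.json` and transcripts. Keying on the path rather than on record contents avoids assuming any particular JSON field names.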
Conventions
Version freezing rules and round structure: ../CONVENTIONS.md.
Directory Layout
```
benchmarks/graph/
├── README.md        # this file (shared across versions)
├── v1/              # FROZEN round 1
│   ├── README.md    # version-specific banner
│   ├── METHOD.md
│   ├── RESULTS.md
│   ├── mcp.json
│   ├── scripts/     # harness as it ran
│   ├── tasks/       # fixtures as they were graded
│   └── runs/        # per-cell records
├── v2/              # FROZEN round 2
│   └── ...
├── v3/              # FROZEN round 3
│   └── ...
├── v4/              # FROZEN round 4
│   └── ...
└── v5/              # FROZEN round 5
    └── ...
```
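Because rounds are frozen, a quick layout check can catch a round directory that is missing part of its record. This is a minimal Python sketch that assumes every round mirrors the v1 entries shown in the tree; the tree elides v2 to v5, so later rounds may legitimately differ. `EXPECTED_V1_LAYOUT`, `missing_entries`, and `list_entries` are hypothetical names, not part of the repository.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical expectation, copied from the v1 entries in the directory tree.
# A trailing "/" marks entries that must be directories.
EXPECTED_V1_LAYOUT: tuple[str, ...] = (
    "README.md",
    "METHOD.md",
    "RESULTS.md",
    "mcp.json",
    "scripts/",
    "tasks/",
    "runs/",
)


def missing_entries(
    present: set[str], expected: tuple[str, ...] = EXPECTED_V1_LAYOUT
) -> list[str]:
    """Return expected entries absent from `present`, in expected order."""
    return [name for name in expected if name not in present]


def list_entries(version_dir: Path) -> set[str]:
    """List a round directory's children, suffixing directories with '/'."""
    return {p.name + "/" if p.is_dir() else p.name for p in version_dir.iterdir()}
```

For example, `missing_entries(list_entries(Path("benchmarks/graph/v1")))` should return an empty list if v1 matches the tree. Separating the pure comparison from the filesystem listing keeps the check easy to reuse with a different expected layout per round.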