Skip to content

Knowledge Graph — Overview

Mirrored from docs/design/knowledge-graph/1_overview.md. Edit the source document in the repository, not this generated page.

The knowledge graph is Orbit’s durable, queryable codebase map: a content-addressed, branch-scoped tree of directories, files, and extracted symbols annotated with task history. It sits between raw files and the agent prompt so agents can ask “where is AgentRuntime defined?” or “which symbols did task [T20260421-0528] touch?” without re-reading the repo from scratch. This compaction pass [T20260430-22] keeps the overview focused on entry-point concepts and leaves mechanism detail to 2_design.md.

This document is the entry point. 2_design.md specifies the current implementation in detail; 3_vision.md captures open questions and direction.


Unassisted agents mostly navigate with grep, find, and full-file reads. Each has a sharp edge:

  1. Grep is string-level. It mixes call sites, comments, fields, and unrelated names.
  2. Full-file reads bloat context. A 30-line answer often arrives wrapped in hundreds of irrelevant lines.
  3. Cold sessions forget structure. Module layout, trait hierarchies, and import graphs get re-derived every turn.

An LSP solves #1 and parts of #2 but is hostile to agent loops: it is stateful, process-local, and returns results tuned for IDE UX (hover text, rename previews) rather than for token-efficient prompt assembly.

The graph precomputes that structure once per branch, persists it, and exposes prompt-shaped query tools.


The graph has three node kinds:

  • DirNode — a directory with child dirs/files; the root id derives from ".".
  • FileNode — a source file with extracted leaves plus a source_blob_hash.
  • LeafNode — an extracted symbol or doc/config/table leaf with source span, hash, signature fields, and history.

All three share BaseNodeFields: id, identity_key, location, language, description, parent_id, lock state, and a sorted, deduplicated task_ids list attributed from commit history ([T20260421-0528]).

The rename from leaf to symbol at the tool surface happened under [T20260411-0424]; the in-code type is still LeafNode because the internal vocabulary predates that decision and the rename has not yet reached the type name.

Every node has three independent keys with different stability properties:

KeyRoleStability
idPrimary reference within a graph snapshotStable across rebuilds of the same repo state
identity_keyCross-build lineage used by the working graph to track renames and re-identificationStable across rebuilds
object_hashContent hash of the serialized nodeChanges whenever any field changes

Ids are derived from kind + location + discriminator. Object hashes drive deduplication in the object store.

Storage follows a git-style split:

.orbit/knowledge/graph/
├── objects/<hh>/<hash>.json immutable, content-addressed node bodies
├── blobs/<hh>/<hash>.txt immutable, content-addressed file/symbol source
├── index/by-id/<root-graph-hash>.json immutable per-build index
└── refs/heads/<branch>.json mutable branch ref → active index

A rebuild writes new immutable objects/blobs and index, then atomically swings the branch ref. Refs are the only mutable surface, so concurrent builds on different worktrees cannot corrupt each other ([T20260421-0358]). See specs/refs.md.

The current public graph surface is read-only. Agents use it to inspect, search, and pack context, while write coordination happens before dispatch through task context_files and orbit.task.locks.reserve preflight guards. Those guards operate at the workspace/task plane rather than inside a branch-local graph ref, which keeps them meaningful when agents work in separate worktrees.

The in-memory working graph (crates/orbit-knowledge/src/working_graph) remains an internal/deferred mutation substrate. Public graph write tools are absent because branch-local graph locks do not coordinate independent worktrees.

Each node carries a task_ids list. These are populated by the history-walker stage (pipeline::history, introduced in [T20260421-0528]) which parses \[T\d{8}-\d+(?:-\d+)*\] tags out of commit messages, maps hunks to symbols at the commit’s tree, and accumulates the result onto the current graph. A merge commit where both sides touched the same symbol sets structural_conflict: true — informational only; git already resolved the textual conflict.


ConcernWhere it livesPrimary task ID
Crate boundarycrates/orbit-knowledge[T20260411-0008], [T20260411-0424]
Storage layoutsrc/graph/object_store.rs[T20260421-0358]
Build pipelinesrc/pipeline/[T20260411-0424], [T20260417-0639], [T20260426-0139]
History attributionsrc/pipeline/history.rs[T20260421-0528]
Query servicessrc/service/[T20260412-0645-2], [T20260412-0645-3]
Working graphsrc/working_graph/[T20260411-0424]
Lockingsrc/lock.rs[T20260411-0424], [T20260417-0301-2]
Refresh safetysrc/pipeline/mod.rs[T20260417-0307], [T20260416-0719]

  • [T20260411-0008] — Extract orbit-knowledge crate from orbit-tools.
  • [T20260411-0424] — Consolidate orbit-knowledge; prototype graph editing internals (add/delete/move); introduce shared file-based lock store; add show/search CLI; rename leaf to symbol at the tool surface.
  • [T20260412-0645-2] — Add compact orbit.graph.overview output for large repos.
  • [T20260412-0645-3] — Architectural graph navigation: deps, implementors, callers.
  • [T20260416-0719] — Recover from a corrupted default knowledge graph store.
  • [T20260417-0301-2] — Harden graph lock/write/read tooling.
  • [T20260417-0307] — Gate and guard graph refresh/search hot paths.
  • [T20260417-0639] — Speed up workspace-init graph persistence hot path.
  • [T20260421-0358] — Scope graph refs by branch.
  • [T20260421-0528]task_ids schema on every node + git history walker for attribution.
  • [T20260426-0139] — Parallelize per-file hashing and leaf extraction while preserving deterministic graph output.
  • [T20260426-0453] — Remove graph write operations from the public tool/MCP surface and use task lock reservations as preflight write guards.
  • [T20260428-1] — Align graph task-ID attribution/search with current unpadded task IDs.
  • [T20260430-22] — Compact the knowledge-graph design docs and remove duplicate top-level narrative.

Resolve any task above with orbit task show <ID> or git log --grep=<ID>.