Spec: Leaf ID Uniqueness
Mirrored from docs/design/knowledge-graph/specs/leaf-id-uniqueness.md. Edit the source document in the repository, not this generated page.
Every extractor that emits ExtractedLeaf records must finish with leaf IDs that are unique within the extracted file. The selector envelope remains symbol:{path}#{qualified_name}:{kind}; the contract is that no two leaves for the same path share the same (qualified_name, kind) after extractor finalization.
Why This Exists
The SQLite sidecar stores graph nodes keyed by node.id, while the JSON fallback keeps the full graph.leaves vector. When two leaves in one file derive the same ID, SQL collapses them and fallback preserves them, so read paths cannot be equivalent. This surfaced as duplicate-leaf carve-outs in search/show work and as graph.overview undercounts on real Python and Rust corpora. Leaf IDs therefore have to be unique before persistence, not repaired by SQL read paths.
1. Invariant
For every extracted file:
- LeafNode::node_id derived from symbol:{path}#{qualified_name}:{kind} is unique for every leaf in that file.
- parent_qualified_name and children_qualified_names point at finalized qualified names, not pre-finalization names.
- The finalizer runs after language extraction and before pipeline parent-child wiring.
- A duplicate after finalization is an extractor bug. SQL writers and readers must not rely on INSERT OR REPLACE, aggregation, or fallback-specific behavior to hide it.
The invariant is per file because path is already part of the leaf ID. Equivalently: within a file, (qualified_name, kind) is unique.
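The invariant can be checked mechanically. The following is a minimal sketch, assuming a trimmed-down `Leaf` struct as a stand-in for the real ExtractedLeaf record; `Leaf`, `leaf_id`, and `duplicate_leaf_ids` are hypothetical names used only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for the real `ExtractedLeaf` record.
#[derive(Debug, Clone)]
pub struct Leaf {
    pub qualified_name: String,
    pub kind: String,
}

/// Builds a leaf ID in the selector envelope
/// `symbol:{path}#{qualified_name}:{kind}`.
pub fn leaf_id(path: &str, leaf: &Leaf) -> String {
    format!("symbol:{}#{}:{}", path, leaf.qualified_name, leaf.kind)
}

/// Returns every leaf ID that occurs more than once among one file's leaves.
/// Each duplicated ID is reported once; an empty result means the
/// per-file invariant holds.
pub fn duplicate_leaf_ids(path: &str, leaves: &[Leaf]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut duplicates = Vec::new();
    for leaf in leaves {
        let id = leaf_id(path, leaf);
        // `insert` returns false when the ID was already present.
        if !seen.insert(id.clone()) && !duplicates.contains(&id) {
            duplicates.push(id);
        }
    }
    duplicates
}
```

Because path is fixed for the whole file, checking full leaf IDs here is the same as checking (qualified_name, kind) pairs.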
2. Scheme
The chosen scheme is language-natural qualifier + deterministic occurrence suffix.
Layer A: language-natural qualifiers
Each extractor first emits the most precise human-readable qualified name that the language syntax makes cheap and stable.
| Language | Required qualifier rule | Natural duplicate covered |
|---|---|---|
| Python | Classes and methods include the dotted enclosing class chain. A method save inside class User is User.save; a method inside nested class Outer.Inner is Outer.Inner.method. Top-level functions stay bare. | Same method name in different classes; nested class methods. |
| Rust | Inherent impl blocks use <Type>; trait impl blocks use <Type as Trait>. Methods under those blocks use <Type>::method or <Type as Trait>::method. | impl Foo and impl Trait for Foo in the same file; methods with the same name on those impls. |
| Java | Methods append parameter arity to the parent-qualified method name: Parent::method#arity. | Overloaded methods with different arity. |
| TypeScript / TSX | Class methods append parameter arity: Parent::method#arity. Top-level function overload declarations keep the function name and rely on Layer B occurrence suffixes. | Class method overloads with different arity; function overload declarations. |
Other extractors keep their existing language-natural qualified names, then still pass through Layer B. If a pre-flight uniqueness fixture finds a duplicate in Go, Kotlin, Ruby, JavaScript, C, or C#, that extractor should add a stable Layer A qualifier when the language syntax exposes one cheaply; otherwise the deterministic occurrence suffix is the required backstop.
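The Layer A rules in the table above amount to simple string construction. The helpers below are a sketch under that reading; the function names and signatures are hypothetical and do not mirror the real extractors, which work from parsed syntax rather than pre-split strings.

```rust
/// Python: dotted enclosing class chain, e.g. ["Outer", "Inner"] + "run"
/// becomes "Outer.Inner.run". An empty chain (top-level function) stays bare.
pub fn python_qualified_name(class_chain: &[&str], name: &str) -> String {
    if class_chain.is_empty() {
        name.to_string()
    } else {
        format!("{}.{}", class_chain.join("."), name)
    }
}

/// Rust: `<Type>` for inherent impl blocks, `<Type as Trait>` for trait impls.
pub fn rust_impl_qualifier(ty: &str, trait_name: Option<&str>) -> String {
    match trait_name {
        Some(t) => format!("<{} as {}>", ty, t),
        None => format!("<{}>", ty),
    }
}

/// Rust: methods inherit the qualifier of their impl block.
pub fn rust_method_name(ty: &str, trait_name: Option<&str>, method: &str) -> String {
    format!("{}::{}", rust_impl_qualifier(ty, trait_name), method)
}

/// Java methods and TypeScript class methods: `Parent::method#arity`.
pub fn arity_qualified_name(parent: &str, method: &str, arity: usize) -> String {
    format!("{}::{}#{}", parent, method, arity)
}
```

For example, `rust_method_name("Foo", Some("Runnable"), "run")` yields `<Foo as Runnable>::run`, and `arity_qualified_name("Client", "connect", 2)` yields `Client::connect#2`.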
Layer B: deterministic occurrence suffix
Every extractor calls finalize_unique_qualified_names(leaves: &mut [ExtractedLeaf]) as its last extraction step.
The finalizer:
- Groups leaves by (qualified_name, kind) within the file.
- Leaves singletons unchanged.
- Sorts each duplicate group deterministically by (start_line, end_line, source_hash, original_index).
- Keeps the first occurrence unchanged and rewrites the second and later occurrences to {original}#{ordinal}, with ordinals starting at 2.
- Rewrites matching parent_qualified_name and children_qualified_names references so parent-child wiring still points at the finalized names.
The suffix is an occurrence ordinal, not a source line number. It is intentionally the fallback of last resort: Layer A should handle the common human-readable cases, while Layer B makes the uniqueness invariant total for same-arity overloads, generated duplicate declarations, and future extractor regressions.
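The grouping, ordering, and suffixing steps of the finalizer can be sketched as follows. This is an illustration, not the repository's implementation: `Leaf` is a hypothetical stand-in for ExtractedLeaf with only the fields the sort key needs, and the sketch deliberately omits the rewrite of parent_qualified_name and children_qualified_names references.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for the real `ExtractedLeaf` record.
#[derive(Debug, Clone)]
pub struct Leaf {
    pub qualified_name: String,
    pub kind: String,
    pub start_line: u32,
    pub end_line: u32,
    pub source_hash: String,
}

/// Sketch of the rename steps only; reference rewriting is not covered.
pub fn finalize_unique_qualified_names(leaves: &mut [Leaf]) {
    // Group leaf indices by (qualified_name, kind).
    let mut groups: HashMap<(String, String), Vec<usize>> = HashMap::new();
    for (index, leaf) in leaves.iter().enumerate() {
        groups
            .entry((leaf.qualified_name.clone(), leaf.kind.clone()))
            .or_default()
            .push(index);
    }

    // HashMap iteration order is arbitrary, but each group is renamed
    // independently of the others, so the result is still deterministic.
    for indices in groups.values_mut() {
        if indices.len() < 2 {
            continue; // Singletons stay unchanged.
        }
        // Deterministic order within the group:
        // (start_line, end_line, source_hash, original_index).
        indices.sort_by(|&a, &b| {
            let key_a = (leaves[a].start_line, leaves[a].end_line, &leaves[a].source_hash, a);
            let key_b = (leaves[b].start_line, leaves[b].end_line, &leaves[b].source_hash, b);
            key_a.cmp(&key_b)
        });
        // The first occurrence keeps its name; later ones get #2, #3, ...
        for (position, &index) in indices.iter().enumerate().skip(1) {
            let ordinal = position + 1;
            leaves[index].qualified_name =
                format!("{}#{}", leaves[index].qualified_name, ordinal);
        }
    }
}
```

Three `load` function leaves starting at lines 9, 1, and 5 come out as `load#3`, `load`, and `load#2` respectively, regardless of their order in the input slice. The sketch does not guard against a suffixed name colliding with a name that an extractor already emitted.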
3. Language Coverage
| Duplicate shape | Example finalized names | Resolved by |
|---|---|---|
| Python class methods with the same name | User.save, Order.save | Python dotted enclosing class chain. |
| Python nested class method | Outer.Inner.run | Python dotted enclosing class chain. |
| Rust inherent impl plus trait impl for the same type | <Foo>, <Foo as Runnable> | Rust impl qualifier. |
| Rust methods with the same name across impl blocks | <Foo>::run, <Foo as Runnable>::run | Rust impl qualifier inherited by methods. |
| Java overloads with different arity | Client::connect#1, Client::connect#2 | Java arity suffix. |
| Java overloads with the same arity | Client::connect#1, Client::connect#1#2 | Java arity suffix, then Layer B. |
| TypeScript function overloads | load, load#2, load#3 | Layer B occurrence suffix. |
| TypeScript method overloads with different arity | Service::load#1, Service::load#2 | TypeScript method arity suffix. |
| TypeScript method overload signatures with the same arity | Service::load#1, Service::load#1#2 | TypeScript method arity suffix, then Layer B. |
4. Selector String Breakage
The selector envelope is unchanged, but the qualified_name portion is intentionally allowed to change. Callers must treat the text between the first selector # and the final :{kind} as opaque. In particular, the qualified portion may now contain ., ::, <...>, spaces from Rust trait-impl formatting, and additional # characters from arity or occurrence suffixes.
| Previous affected selector shape | New selector shape |
|---|---|
| symbol:models.py#save:method | symbol:models.py#User.save:method or symbol:models.py#Order.save:method |
| symbol:models.py#run:method for a nested class method | symbol:models.py#Outer.Inner.run:method |
| symbol:lib.rs#Foo:impl | symbol:lib.rs#<Foo>:impl |
| symbol:lib.rs#Foo:impl for a trait impl | symbol:lib.rs#<Foo as Runnable>:impl |
| symbol:lib.rs#Foo::run:method | symbol:lib.rs#<Foo>::run:method |
| symbol:lib.rs#Foo::run:method for a trait impl method | symbol:lib.rs#<Foo as Runnable>::run:method |
| symbol:Client.java#Client::connect:method | symbol:Client.java#Client::connect#1:method or symbol:Client.java#Client::connect#2:method |
| Repeated symbol:service.ts#load:function overload declarations | symbol:service.ts#load:function, symbol:service.ts#load#2:function, then higher occurrence suffixes |
Stored selectors for affected leaves are a breaking change after rebuild. Agents and scripts should rediscover affected selectors through orbit.graph.search, orbit.graph.show, or orbit.graph.pack instead of carrying old method/impl/overload selector strings forward.
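The envelope rule from this section (split at the first # and at the final :, and leave everything in between untouched) can be sketched as below. `SelectorParts` and `split_selector` are hypothetical names, not the API of the repository's selector utility, and the sketch assumes that paths contain no # and that kinds contain no colon.

```rust
/// The three parts of a `symbol:{path}#{qualified_name}:{kind}` selector.
#[derive(Debug, PartialEq, Eq)]
pub struct SelectorParts<'a> {
    pub path: &'a str,
    pub qualified_name: &'a str,
    pub kind: &'a str,
}

/// Splits a selector on its envelope only. The qualified name is returned
/// verbatim and is never parsed for language syntax.
pub fn split_selector(selector: &str) -> Option<SelectorParts<'_>> {
    let rest = selector.strip_prefix("symbol:")?;
    // The first '#' ends the path (assumption: paths contain no '#').
    let (path, tail) = rest.split_once('#')?;
    // The final ':' starts the kind (assumption: kinds contain no ':').
    let (qualified_name, kind) = tail.rsplit_once(':')?;
    if path.is_empty() || qualified_name.is_empty() || kind.is_empty() {
        return None;
    }
    Some(SelectorParts { path, qualified_name, kind })
}
```

With this split, `symbol:Client.java#Client::connect#1#2:method` yields the qualified name `Client::connect#1#2`, and `symbol:lib.rs#<Foo as Runnable>::run:method` yields `<Foo as Runnable>::run`, spaces and extra separators included.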
4.1 In-repo selector-string sweep targets
A 2026-05-10 pre-sweep using the task plan’s selector grep found hard-coded selector strings outside docs/, test-only paths, and fixture-only paths. The implementation pass must triage these categories:
| Area | Files | Required handling |
|---|---|---|
| Active selector examples | crates/orbit-cli/src/command/observe/graph.rs, crates/orbit-core/assets/skills/orbit-create-task/SKILL.md, crates/orbit-core/assets/skills/orbit-graph/SKILL.md, website/src/content/docs/concepts/knowledge-graph.md | Keep generic free-function examples if they remain valid; update any method/impl/overload examples that imply the old qualified-name grammar. |
| Selector parsing, lint, and batching code | crates/orbit-common/src/utility/selector.rs, crates/orbit-core/src/command/task/lint.rs, crates/orbit-engine/src/executor/automation/batch/dispatch.rs, crates/orbit-engine/src/executor/automation/batch/parallel.rs, crates/orbit-engine/src/executor/automation/duel/planning_duel/context_files.rs, crates/orbit-engine/src/executor/automation/vcs/commit/mod.rs | Preserve the envelope contract and add coverage for qualified names that contain extra # characters; do not parse language syntax out of the qualified portion. |
| Knowledge command/index expectations | crates/orbit-knowledge/src/commands/refs.rs, crates/orbit-knowledge/src/commands/search.rs, crates/orbit-knowledge/src/graph/object_store.rs, crates/orbit-knowledge/src/graph/sqlite_index.rs, crates/orbit-knowledge/src/service/implementors.rs | Update expected selectors, counts, and comments after the extractor change; simple-name fallback must strip language qualifiers and numeric selector suffixes; the service comment that names Rust impl selector shapes must mention <Type> / <Type as Trait>. |
| Benchmarks and historical transcripts | benchmarks/graph/**, benchmarks/identity-key/**, benchmarks/knowledge-command-equivalence/run.sh | Do not rewrite immutable historical transcripts just for selector churn. Rebaseline active benchmark scripts or expected outputs only when they assert current selector strings. |
Out-of-repo state such as per-user task context files, run traces, and copied selectors is not migratable. The compatibility path is rediscovery through current graph query responses.
5. Migration
The implementation that lands this spec bumps GRAPH_SQLITE_INDEX_SCHEMA_VERSION so readers that only understand the old SQLite sidecar return Ok(None) and use the JSON/object fallback instead. Existing graph directories keep working through fallback until the next full graph build. The rebuild emits new unique leaf IDs and a current SQLite sidecar.
There is no selector rewrite migration. Old selectors keep resolving only as long as the old graph snapshot is the one being read. After rebuild, affected selectors use the new qualified-name forms above.
This closes the duplicate-leaf carve-out documented in [T20260510-1], keeps the [T20260510-2] child-table shape valid because parent-child names are patched after finalization, and supersedes the [T20260510-6] overview undercount symptom with a structural fix.
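The version gate that drives this migration can be pictured with a small sketch. Everything here is assumed for illustration: the constant's value, the `SqliteSidecar` and `ReadPath` types, both function signatures, and the choice to treat a missing sidecar like an unsupported one. Only the Ok(None)-then-fallback behavior comes from this spec.

```rust
/// Assumed value for illustration; the real constant lives in the repository.
pub const GRAPH_SQLITE_INDEX_SCHEMA_VERSION: u32 = 2;

/// Hypothetical handle for a sidecar this reader can use.
#[derive(Debug, PartialEq, Eq)]
pub struct SqliteSidecar {
    pub schema_version: u32,
}

/// Which read path ends up serving a query.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadPath {
    Sqlite,
    Fallback,
}

/// Returns `Ok(None)` when the sidecar is missing (assumption) or was written
/// with a schema version other than the one this reader supports.
pub fn open_sidecar(stored_version: Option<u32>) -> Result<Option<SqliteSidecar>, String> {
    match stored_version {
        Some(v) if v == GRAPH_SQLITE_INDEX_SCHEMA_VERSION => {
            Ok(Some(SqliteSidecar { schema_version: v }))
        }
        _ => Ok(None),
    }
}

/// Uses the sidecar when it is readable and the JSON/object fallback otherwise.
pub fn choose_read_path(stored_version: Option<u32>) -> Result<ReadPath, String> {
    Ok(match open_sidecar(stored_version)? {
        Some(_) => ReadPath::Sqlite,
        None => ReadPath::Fallback,
    })
}
```

A version mismatch is not an error in this model: the caller gets Ok(None) and falls back, which is why existing graph directories keep working until the next full build writes a current sidecar.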
6. Validation
The implementation must ship all of these checks:
- Per-language extractor fixtures assert that leaf_ids.iter().collect::<HashSet<_>>().len() == leaf_ids.len() for Python, Rust, Java, and TypeScript natural duplicate shapes.
- Sister-language pre-flight fixtures assert the same invariant for Go, Kotlin, Ruby, JavaScript, C, and C#.
- SQL/fallback equivalence tests compare leaves as multisets across the duplicate fixtures, mirroring the [T20260510-2] equivalence-test shape.
- A schema-version test verifies that old SQLite sidecars fall back gracefully and new sidecars produce the same leaves as the navigator.
- Live corpora verification confirms orbit.graph.overview totals match build-log leaf counts in both SQL and forced-fallback modes for python-medium and rust-medium.
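Two of these checks reduce to small helpers: the HashSet uniqueness assertion and the multiset comparison between read paths. The sketch below shows one way to express them over plain leaf-ID strings; the function names are hypothetical and the real tests compare richer leaf records.

```rust
use std::collections::{HashMap, HashSet};

/// The uniqueness check from the fixture bullet: no ID appears twice.
pub fn all_unique(leaf_ids: &[String]) -> bool {
    leaf_ids.iter().collect::<HashSet<_>>().len() == leaf_ids.len()
}

/// Counts occurrences of each ID so that order is ignored but
/// multiplicity is preserved.
pub fn as_multiset(leaf_ids: &[String]) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for id in leaf_ids {
        *counts.entry(id.as_str()).or_insert(0) += 1;
    }
    counts
}

/// True when the SQL and fallback read paths return the same leaves
/// with the same multiplicities.
pub fn same_multiset(sql: &[String], fallback: &[String]) -> bool {
    as_multiset(sql) == as_multiset(fallback)
}
```

Comparing as multisets rather than sets matters here: a set comparison would treat a collapsed SQL result and a duplicate-preserving fallback result as equal, which is exactly the divergence this spec removes.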
Agent Signature
Drafted by Codex (gpt-5.5) on 2026-05-10 for [T20260510-7].
Task References
- [T20260509-71]: SQLite reader version contract; unsupported sidecar versions return Ok(None) and fall back.
- [T20260510-1]: Search equivalence work that documented the duplicate-leaf carve-out this spec removes.
- [T20260510-2]: Show/children equivalence work whose child-table shape remains valid after finalized-name patching.
- [T20260510-6]: Overview undercount symptom superseded by this structural leaf-ID fix.
- [T20260510-7]: Make leaf IDs unique across extractors and preserve every symbol across SQL and fallback paths.
Resolve any task above with orbit task show <ID> or git log --grep=<ID>.