Agent Families — Decisions

Mirrored from docs/design/agent-families/4_decisions.md. Edit the source document in the repository, not this generated page.

ADR entries are append-only and ordered ascending by global ID. New entries are allocated via orbit.adr.add before the local heading is written — see ../CONVENTIONS.md §4 and the orbit-adr skill. The local heading uses the allocated global ID verbatim.

Historical note: prior to 2026-05-17, ADR-0154 / ADR-0155 / ADR-0156 in this file were authored with locally-invented IDs (ADR-0152 / ADR-0153 / ADR-0154) that did not match the global store. They were re-allocated through orbit.adr.add per [ORB-00098]; the original local IDs survive as legacy_ids so prior citations still resolve.

ADR-0151 — Add Grok (xAI) as a fourth peer agent family

Status: Accepted · 2026-05-16 · [ORB-00042] · [ORB-00043] · [ORB-00044] · [ORB-00045] · [ORB-00046] · [ORB-00052] · legacy_id: agent-families/ADR-0151

Decision: Treat “grok” as a full peer alongside claude, codex, and gemini.

Key Changes:

Extended agent_from_model(), infer_agent_family_from_model(), all_agent_families(), resolve_agent_model_pair(), and provider_from_model()
Added grok.yaml executor skeleton and sandbox support (tasks ORB-00044, ORB-00045)
Added Grok provider to orbit mcp init (ORB-00046)
Created this design doc folder (ORB-00052)

See full ADR-0151 for context, alternatives considered, and cost analysis.

ADR-0154 — Replace `[agent.<role>]` tables with named `[crews.*]` registry

Status: Accepted · 2026-05 · [ORB-00058] · legacy_id: agent-families/ADR-0152

Context. Workspace config previously selected planner, implementer, and reviewer models with three top-level [agent.<role>] tables, while task execution had no durable way to request a different lineup. Layering a new registry beside the old role tables would have forced Orbit to validate and explain two schemas for the same decision.

Decision. Replace the role-keyed config shape wholesale with named [crews.<name>] entries and [workflow].default_crew. A task may store crew, and a run may override it with CLI/tool input; precedence is CLI override, then task field, then workspace default.

Consequences.

“Crew” was chosen over “profile” because profiles sound user-scoped, and over “pair” because the lineup contains planner, implementer, and reviewer.
Run records persist the resolved crew plus the three role model strings so audit trails survive later config edits.
The v2 agent_loop dispatch path reads role models from the crew registry (crates/orbit-core/src/runtime/engine/environment_host.rs). Scoreboard and friction projections use family identity after ADR-0156; exact model strings remain visible through resolved crew/run configuration.
Deferred: duel-plan participant configuration, per-role task overrides, and planner-vs-executor workflow split.
Cost: old workspaces with only [agent.planner], [agent.implementer], and [agent.reviewer] must migrate before config load succeeds.

ADR-0155 — Scope duel-plan candidate and model overrides to `[duel]`

Status: Accepted · 2026-05 · [ORB-00072] · legacy_id: agent-families/ADR-0153

Context. Duel-plan previously walked the full all_agent_families() registry and used the same model-pair resolution chain as non-duel callers. That made local CLI availability load-bearing for every supported family and made reproducible planning-duel scoreboards depend on executor YAML state.

Decision. Add a workspace [duel] section with candidates as a normalized subset of all_agent_families() and [duel.models] as flat orchestrator-only per-family overrides. Duel role selection reads those values through RuntimeHost; non-duel callers continue to use executor overrides and builtin model pairs.

Consequences.

Duel permutations remain dynamic but require at least three distinct configured families.
[duel.models] wins only for duel role-model lookup; helper models and non-duel model identity are unchanged.
The crew registry remains separate from duel participant selection. Reusing [crews.*] for duels was rejected because duels need a family pool, not a fixed planner/implementer/reviewer lineup.

ADR-0156 — Collapse agent identity to family and move model strings to configuration

Status: Accepted · 2026-05 · [ORB-00080] · legacy_id: agent-families/ADR-0154

Context. Planning-duel artifacts and scoreboards compared model strings even though model names drift across aliases, CLI shorthand, and self-reported tool payloads. A Gemini planner configured as pro could produce an artifact stamped gemini-3.1-pro; both values describe the same family but failed equality checks. Alias tables (resolve_agent_model_pair*, matches_model_alias, canonical_model_for_agent) treated the symptom and grew with every provider change.

Decision. Family is identity, model is configuration, and slot is role. Orbit identity surfaces use exactly codex, claude, gemini, or grok. Planning-duel assignments persist family; planner_a, planner_b, and arbiter are explicit slots used in artifact paths and signatures. Exact model strings stay in crew config, [duel.models], CLI invocation translation, and resolved-crew run records.

Consequences.

New planning-duel artifacts are written as planning-duel/{slot}.md and signed *authored by: {family} / {slot}*; historical model-path artifacts remain a legacy read concern.
Runtime tool boundaries treat envelope identity as authoritative. Agent-supplied model fields are overwritten with the canonical family before persistence/comparison so self-report drift cannot affect validation.
Scoreboard and friction projections are family-keyed (by_family) when they answer “who actually ran?”. Resolved-crew projections remain the source for “who was selected?” because they describe configured routing.
The legacy resolver and alias-canonicalization surfaces are deleted from production code. infer_agent_family_from_model remains for legacy artifact recovery and CLI invocation translation.
ORB-00079 and ORB-00071 are superseded by this structural identity change.

ADR-0167 — Favor claude (opus) for planner role on planning duels and design-shaped plans

Status: Proposed · 2026-05-18 · cites AO-002 · (acceptance pending a related task — see lifecycle note in adr-artifact/2_design.md §5)

Context. AO-002 (Instruction surface shapes plan output, not tool selection) closed on 2026-05-18 after four experiments spanning four Gemini-as-planner implementation/audit duels and one 4-model cross-read on an identical UX-design task. Three observations recurred across the thread:

Gemini ranked last on plan depth on every task shape tested (implementation, audit, refactor, UX-taste), losing every duel it played as planner. Arbiter rationales consistently cited graph-discovery gaps, hallucinated symbols, generic findings, and thin ADR content.
Claude won every duel it entered as planner in the window, including the UX-redesign duel (ORB-00154), where claude’s metric-major layout call was the differentiating taste signal. Codex placed in the middle — thorough plans without bold design calls, ranked above grok and gemini but below claude on plan depth in the 4-way UX comparison.
Instruction-surface levers (memory file or per-run prompt) shifted Gemini’s output structure (verification commands, severity-tagged findings, section format) but not its tool selection. Three rounds of prompt strengthening did not change the duel outcome.

AO-002 scope: planning-duel plan quality on the Orbit codebase, single window in May 2026, ad-hoc task selection, model versions not held constant (gemini-2.5-pro on two runs, gemini-3.1-pro-preview on three). The thread is explicitly framed as decision-grade-for-us, not an objective ranking.

Decision. Until new evidence warrants otherwise, default the planner role on planning duels and design-shaped plans to claude (currently claude-opus-4-x per workspace crew config). This applies to both implementation-shaped planning and UX / design-shaped planning. The choice is provisional and bound to AO-002’s evidence window; AO-002’s open questions (post-rubric run on gemini-3.1-pro-preview against an implementation task, plan-depth response to a volume-rubric clause, new model-family releases) are the natural triggers for revisiting.

Consequences.

Crew defaults that select a planner family should favor claude when no per-task override is set. [duel.candidates] continues to include all four families so duels remain genuine plan-vs-plan comparisons; this ADR governs single-planner selection and tie-breaking guidance, not duel participation.
Arbiter selection is unchanged. AO-002 finding D observed that Gemini-as-arbiter performed adequately even where Gemini-as-planner did not; reading completed work is a different cognitive task than producing it.
Implementer selection is unchanged. AO-002’s scope is planning-duel plan quality only; implementer rankings live in a separate observation thread when there’s data.
Re-evaluation triggers: (1) a new Gemini-family release with a stable model alias; (2) the missing AO-002 experimental cell (post-rubric Gemini run on gemini-3.1-pro-preview against an implementation-shaped task) producing a counter-finding; (3) a non-Orbit codebase producing a different ranking; (4) a same-task within-model-version repeat that flips the outcome. Any of these reopens AO-002 or spawns a follow-up observation that this ADR must be reconciled against.
Cost: surrendering planning diversity. Defaulting to one family forfeits the safety net of cross-family disagreement, concentrates dependency on a single provider, and risks anchoring on claude’s distinctive design patterns (e.g. the metric-major preference observed in ORB-00154) as if they were universally correct. The duel mechanism partially mitigates this when explicitly invoked: a duel still gathers multiple plans before selecting one.

Task References

ORB-00042: Onboard Grok (xAI) as a first-class supported agent family.
ORB-00058: Introduce per-task crew override for agent model selection.
ORB-00072: Make duel-plan agent pool and per-family model configurable via [duel].
ORB-00080: Collapse agent identity to family; isolate model strings to invocation surface.

Resolve any task above with orbit task show <ID> or git log --grep=<ID>.

Agent Families — Decisions

ADR-0151 — Add Grok (xAI) as a fourth peer agent family

ADR-0154 — Replace [agent.<role>] tables with named [crews.*] registry

ADR-0155 — Scope duel-plan candidate and model overrides to [duel]

ADR-0156 — Collapse agent identity to family and move model strings to configuration

ADR-0167 — Favor claude (opus) for planner role on planning duels and design-shaped plans

Task References

ADR-0154 — Replace `[agent.<role>]` tables with named `[crews.*]` registry

ADR-0155 — Scope duel-plan candidate and model overrides to `[duel]`