Groundhog — Vision
Mirrored from
docs/design/groundhog/3_vision.md. Edit the source document in the repository, not this generated page.
This document captures where Groundhog should go next. It starts from the implementation built in [T20260420-0509], [T20260420-0509-2], [T20260420-0509-3], [T20260420-0509-4], [T20260420-0510], and [T20260420-0510-2], and treats everything below as a hypothesis rather than a promise. 2_design.md is the current contract; this file is the place to pressure-test that contract before we harden more of it.
1. Open Questions
Section titled “1. Open Questions”The highest-pressure questions are still deviation-era cleanup, shared-verifier adoption, persistence separation, approval-safe checkpoint materialization, and only-when-useful observability.
1.1 Separate prompt memory from audit record
Section titled “1.1 Separate prompt memory from audit record”Should Groundhog move fully to the intended split between prompt-facing memory and audit-only run state? The current Chronicle plus runner-state pair works, but it makes it too easy for prompt and audit concerns to bleed into each other.
1.2 Unify the runner on the shared verifier
Section titled “1.2 Unify the runner on the shared verifier”The codebase already has a richer checkpoint verifier in crates/orbit-engine/src/checkpoint_verifier.rs. Should the Groundhog runner adopt it wholesale so the activity path and the shared verifier stop drifting?
1.3 Approval-safe checkpoint commits
Section titled “1.3 Approval-safe checkpoint commits”Today success commits land directly on the task branch. Is that the right default forever, or should Groundhog materialize successful checkpoints onto internal refs until a later lifecycle step publishes them?
1.4 Plan source and planner interface
Section titled “1.4 Plan source and planner interface”Should GroundhogSpec keep implicitly reading task.plan, or do we want an explicit plan_source contract and possibly a planner activity that produces checkpoints before execution begins?
1.5 Side-effect memory and irreversible actions
Section titled “1.5 Side-effect memory and irreversible actions”The design wants side-effect summaries to survive. How much of that should enter later prompts, and how strict should Groundhog be about disabling irreversible tools unless an activity explicitly opts in?
1.6 Observability and scoreboards
Section titled “1.6 Observability and scoreboards”Which metrics actually matter for deciding whether Groundhog is working? Candidates include attempts per checkpoint, verifier pass/fail rates, rewind counts, blocked outcomes, and checkpoint-commit materialization latency. We should only add scoreboard surfaces that answer real operational questions.
1.7 Cache contract
Section titled “1.7 Cache contract”The chronicle serializer preserves an append-only prefix property, but the current runtime does not yet operationalize cache breakpoints around Groundhog-specific prompt sections. How much cache behavior should be a real Groundhog contract versus a provider-specific optimization?
1.8 Controlled deviation as a follow-on
Section titled “1.8 Controlled deviation as a follow-on”Older Groundhog drafts leaned heavily on executor-authored deviation. Should that return in a later version as a tightly controlled extension, or is “fail fast, fix the plan outside the runner” the healthier long-term discipline?
1.9 Critic-on-retry as a follow-on
Section titled “1.9 Critic-on-retry as a follow-on”The prior design explored a retry critic to combat cognitive entrenchment. Does Groundhog need that, or should we refuse to add a second agent until the simpler checkpoint + rewind + verifier loop has real failure data behind it?
1.10 Dedicated debug surface
Section titled “1.10 Dedicated debug surface”Should Groundhog expose a read-only chronicle/debug view with checkpoint history, verifier summaries, and retained scratch branches, or is task-artifact inspection enough until the runner proves out?
2. Prior Work
Section titled “2. Prior Work”Groundhog is a synthesis of known ideas, not a novel primitive. The interesting question is whether Orbit’s combination is operationally useful, not whether each component is unprecedented.
2.1 Workflow checkpointing
Section titled “2.1 Workflow checkpointing”- LangGraph treats checkpointing and durable execution as first-class workflow concerns.
- Temporal established the mental model of replayable workflow history and explicit activity boundaries.
- Microsoft Agent Framework has made checkpoints part of mainstream agent workflow vocabulary.
Groundhog borrows the core idea that work should resume from a named boundary, not from freeform conversational state.
2.2 Git-backed rollback
Section titled “2.2 Git-backed rollback”- AgentGit frames multi-agent work in version-control terms.
- Aider keeps agent edits tightly coupled to git and makes undo a first-class experience.
- Plandex leans on git-style plan branches and reviewable diffs.
Groundhog narrows this pattern to one need: checkpoint attempts should be rewindable without asking the agent to clean up after itself.
2.3 Context versioning and branching
Section titled “2.3 Context versioning and branching”- Git Context Controller (GCC) treats agent context itself as something like a branchable, mergeable workspace.
Groundhog’s current v1 direction is intentionally simpler. It wants sequential checkpoints and explicit retry boundaries before it revisits richer context branching.
2.4 Reflection and retry
Section titled “2.4 Reflection and retry”- Reflexion showed that retries plus distilled failure summaries can outperform single-shot execution.
- MAR (Multi-Agent Reflexion) highlighted the risk of cognitive entrenchment when the same agent critiques its own failed plan.
These are directly relevant to Groundhog’s retry loop, even if Groundhog chooses not to ship a retry critic in v1.
2.5 Plan-driven coding agents
Section titled “2.5 Plan-driven coding agents”- Plandex is the closest practical analog in shape: plan-oriented coding work with git-native state and explicit subtask boundaries.
Groundhog differs mainly in scope. It is designed to be a runtime primitive inside Orbit’s task/activity system, not a standalone interactive coding workflow.
3. What May Be Distinctive
Section titled “3. What May Be Distinctive”Soft claims only:
- Checkpoint plans as Orbit task data. Groundhog treats checkpoint structure as part of the task artifact itself, not as ad hoc prompt scaffolding rebuilt every run.
- Git scratch branches plus explicit terminal verbs. The combination of scratch-branch rewind and builtin success/failure closure gives the runtime a crisp place to verify and persist outcomes.
- Success-only memory as the default ambition. The long-term target is not “remember the whole conversation better”; it is “remember only the successful checkpoint summaries and the one failure report that matters right now.”
None of these amount to a research contribution. If Groundhog earns its keep, it will be because the discipline improves Orbit task execution in practice, not because the ingredients are unfamiliar.
4. References
Section titled “4. References”Orbit-internal
Section titled “Orbit-internal”- 1_overview.md — feature purpose and core concepts
- 2_design.md — current implementation
- 4_decisions.md — ADR log
- specs/workspace-snapshot.md — git snapshot and rewind contract
- ../activity-job/2_design.md — activity/job model consumed by the Groundhog runner
External
Section titled “External”- LangGraph — https://langchain-ai.github.io/langgraph/
- Temporal — https://temporal.io/
- Microsoft Agent Framework — https://learn.microsoft.com/
- Reflexion — https://arxiv.org/abs/2303.11366
- MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs — https://arxiv.org/abs/2512.20845
- AgentGit — https://arxiv.org/abs/2511.00628
- Git Context Controller — https://arxiv.org/abs/2508.00031
- Plandex — https://github.com/plandex-ai/plandex
Task References
Section titled “Task References”- [T20260420-0509] — Add Groundhog chronicle serializer and shared Groundhog data types.
- [T20260420-0509-2] — Add structured task plan parsing with typed checkpoints and success criteria.
- [T20260420-0509-3] — Add Groundhog builtin verb tools.
- [T20260420-0509-4] — Add Groundhog workspace snapshots and scratch-branch rewind mechanics.
- [T20260420-0510] — Add the shared runtime checkpoint verifier.
- [T20260420-0510-2] — Add the Groundhog v1 activity runner.
- [T20260430-21] — Shorten Groundhog design docs and fold status-priority notes into numbered docs.
Resolve any task above with
orbit task show <ID>orgit log --grep=<ID>.