Description
please explain the motivation behind the feature request.
debugging complex workflows in goose can be challenging when issues occur mid-session. currently, if something goes wrong during a multi-step process, users must restart from scratch or manually reconstruct the context. this creates friction in troubleshooting and learning from mistakes.
use cases:
- session replay: review exactly what happened in a previous session to understand why a task failed
- rewind & retry: go back to a specific point in the conversation and try a different approach
- learning & training: study successful sessions to understand best practices
- debugging recipes: test and refine recipes by replaying them with different inputs
- error investigation: understand what led to an error by reviewing the full context
does this feature solve a particular problem you have been experiencing?
yes - when working on complex tasks (multi-file edits, api integrations, automation recipes), errors often occur several steps into the process. currently, the only option is to start over, which wastes time and makes it difficult to identify the exact point of failure.
what opportunities or use cases would be unlocked with this feature?
- faster debugging: quickly identify where things went wrong without manual reconstruction
- experimentation: try different approaches from the same starting point
- documentation: create tutorials by replaying successful sessions
- quality assurance: review sessions before sharing recipes or workflows
- learning: study how goose solved complex problems in past sessions
describe the solution you'd like
a "time travel debug" feature with three core capabilities:
1. session replay
- view complete history of any past session
- see all messages, tool calls, and outputs in sequence
- search/filter by keywords, tools used, or time range
- export session transcripts for documentation
2. rewind & fork
- select any point in current or past session
- "rewind" to that point and continue from there
- creates a new branch/fork of the session
- original session remains intact for reference
3. session comparison
- compare two sessions side-by-side
- highlight differences in approach or outcomes
- useful for a/b testing different prompts or recipes
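a minimal python sketch of the session replay capability (item 1 above), assuming a session is stored as a flat list of timestamped events. the `Event` fields and the `replay` function are hypothetical names for illustration, not goose's actual session schema or api.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; not goose's real session schema.
    t: float              # seconds since session start
    kind: str             # "message", "tool_call", or "output"
    tool: Optional[str]   # tool name for tool calls, otherwise None
    text: str


def replay(
    events: Iterable[Event],
    *,
    keyword: Optional[str] = None,
    tool: Optional[str] = None,
    start: float = 0.0,
    end: float = float("inf"),
) -> Iterator[Event]:
    """Yield events in time order, keeping those that match every given filter."""
    for ev in sorted(events, key=lambda e: e.t):
        if not (start <= ev.t <= end):
            continue
        if tool is not None and ev.tool != tool:
            continue
        if keyword is not None and keyword.lower() not in ev.text.lower():
            continue
        yield ev


session = [
    Event(0.0, "message", None, "started session"),
    Event(15.0, "tool_call", "file_read", "config.yaml"),
    Event(30.0, "output", None, "code generation"),
    Event(45.0, "output", None, "error: syntax error"),
]

# Keyword search: find the point where the session failed.
errors = list(replay(session, keyword="error"))
```

the same function covers the tool and time-range filters, for example `replay(session, tool="file_read")` or `replay(session, start=10, end=40)`.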
user interface:
┌─────────────────────────────────────────┐
│ session timeline                        │
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│ 0:00 started session                    │
│ 0:15 ✓ file read: config.yaml           │
│ 0:30 ✓ code generation                  │
│ 0:45 ✗ error: syntax error              │ ← rewind to here
│ 1:00 current position                   │
│                                         │
│ [◀ rewind] [▶ replay] [⎋ fork]          │
└─────────────────────────────────────────┘
technical implementation:
option 1: local session storage
- store complete session history in local database
- lightweight replay engine reads from storage
- no cloud dependency, privacy-first
- estimated: 4-6 weeks implementation
option 2: in-memory snapshots
- take snapshots at key points (tool calls, errors)
- faster but limited history depth
- lower storage overhead
- estimated: 2-3 weeks implementation
option 3: hybrid approach ⭐ recommended
- in-memory for current session (fast rewind)
- persistent storage for session history (full replay)
- best of both worlds
- estimated: 5-7 weeks implementation
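one way the hybrid option could look, sketched in python with the standard-library `sqlite3` module. the `HybridStore` class, its table layout, and its method names are assumptions made for this example; they do not describe existing goose code.

```python
import json
import sqlite3


class HybridStore:
    """Hypothetical store: current session kept in memory, every event also persisted."""

    def __init__(self, session_id: str, path: str = ":memory:") -> None:
        self.session_id = session_id
        self.current: list = []          # in-memory view used for fast rewind
        self.db = sqlite3.connect(path)  # persistent history used for full replay
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "session_id TEXT, seq INTEGER, payload TEXT, "
            "PRIMARY KEY (session_id, seq))"
        )

    def append(self, event: dict) -> None:
        seq = len(self.current)
        self.current.append(event)
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO events VALUES (?, ?, ?)",
                (self.session_id, seq, json.dumps(event)),
            )

    def rewind(self, seq: int) -> list:
        """Return a copy of events 0..seq from memory; stored history is untouched."""
        return list(self.current[: seq + 1])

    def load(self, session_id: str) -> list:
        """Read a full session back from persistent storage, in order."""
        rows = self.db.execute(
            "SELECT payload FROM events WHERE session_id = ? ORDER BY seq",
            (session_id,),
        )
        return [json.loads(payload) for (payload,) in rows]


store = HybridStore("s1")
for text in ["file read: config.yaml", "code generation", "error: syntax error"]:
    store.append({"text": text})

before_error = store.rewind(1)   # fast path, no database access
full_history = store.load("s1")  # full replay from sqlite
```

`rewind` returns a copy rather than truncating, so a fork can start from `before_error` while the original session stays intact.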
architecture:
              session manager
                     ↓
┌─────────────┬──────────────┬─────────────┐
│  snapshot   │    replay    │ comparison  │
│   engine    │    engine    │   engine    │
└─────────────┴──────────────┴─────────────┘
       ↓             ↓              ↓
┌──────────────────────────────────────────┐
│     session storage (sqlite/leveldb)     │
└──────────────────────────────────────────┘
key features:
- automatic snapshots before each tool call
- manual snapshot creation (bookmark important moments)
- session search by content, tools, or outcomes
- privacy controls (disable for sensitive sessions)
- export to markdown/json for sharing
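the "automatic snapshots before each tool call" item could be prototyped with a decorator, as in this python sketch. `snapshot_before`, `restore`, the module-level `state` dict, and the `write_file` tool are all hypothetical stand-ins; a real implementation would snapshot goose's own session state.

```python
import copy
import functools

state = {"files": {}}   # stand-in for whatever session state needs restoring
snapshots = []          # one entry per tool call, oldest first


def snapshot_before(fn):
    """Record a deep copy of the state before the wrapped tool runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        snapshots.append({"tool": fn.__name__, "state": copy.deepcopy(state)})
        return fn(*args, **kwargs)
    return wrapper


@snapshot_before
def write_file(path, content):
    # Hypothetical tool: mutates the session state.
    state["files"][path] = content


def restore(index):
    """Reset the state to how it looked just before tool call number `index`."""
    state.clear()
    state.update(copy.deepcopy(snapshots[index]["state"]))


write_file("a.txt", "v1")
write_file("a.txt", "v2")
restore(1)  # state as it was before the second call
```

deep copies keep each snapshot independent of later mutations, at the cost of memory that grows with the size of the state.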
describe alternatives you've considered
alternative 1: manual session logs
- users manually save session transcripts
- review logs in external editor
- why insufficient: no interactive replay, can't rewind and retry, time-consuming
alternative 2: copy/paste context
- copy previous messages into new session
- manually reconstruct context
- why insufficient: loses tool outputs, error-prone, doesn't scale for complex sessions
alternative 3: recipe versioning
- save different versions of recipes
- test each version separately
- why insufficient: doesn't help with ad-hoc sessions, no visual timeline, can't compare approaches
alternative 4: external screen recording
- record screen while using goose
- review video later
- why insufficient: can't interact with recording, large file sizes, no searchability
additional context
related features:
- session management: deep link support for starting new goose desktop sessions (#6509); deep links could enable sharing specific session points
- recipe testing (developer experience tools proposal)
- context persistence (helps with session continuity)
similar tools:
- git time travel (git reflog, git bisect)
- browser devtools timeline
- ide debugger step-back functionality
- jupyter notebook cell re-execution
implementation complexity:
- low: session replay (read-only view of past sessions)
- medium: rewind & fork (requires state management)
- high: session comparison (diff algorithms, ui complexity)
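for the comparison piece, a first text-only version might lean on python's standard `difflib` instead of a custom diff algorithm. this sketch assumes each session has already been flattened to one line per step; `compare_sessions` is a hypothetical helper name.

```python
import difflib


def compare_sessions(a, b, name_a="session a", name_b="session b"):
    """Return unified-diff lines between two sessions given as lists of step strings."""
    return list(
        difflib.unified_diff(a, b, fromfile=name_a, tofile=name_b, lineterm="")
    )


run_a = ["file read: config.yaml", "code generation", "error: syntax error"]
run_b = ["file read: config.yaml", "code generation", "tests passed"]

diff = compare_sessions(run_a, run_b)
print("\n".join(diff))
```

lines prefixed with `-` appear only in the first session and lines prefixed with `+` only in the second, which is enough for a basic side-by-side view before any ui work.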
privacy considerations:
- sessions may contain sensitive data (api keys, personal info)
- need opt-in/opt-out controls
- local-only storage by default
- clear data retention policies
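a rough python sketch of how the opt-out control and secret handling could work together before anything reaches storage. the single regex is illustrative only and would miss many real secrets; `redact`, `store_message`, and the `recording_enabled` flag are hypothetical names.

```python
import re

# Illustrative pattern only: matches "api_key=...", "token: ...", and similar.
# Real secret detection needs a broader, maintained rule set.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|secret|password)\b(\s*[:=]\s*)(\S+)"),
]


def redact(text):
    """Replace the value part of anything that looks like a credential."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", text)
    return text


def store_message(history, text, *, recording_enabled=True):
    """Append a redacted message, or store nothing when recording is disabled."""
    if not recording_enabled:
        return
    history.append(redact(text))


history = []
store_message(history, "export API_KEY=sk-12345 done")
store_message(history, "sensitive step", recording_enabled=False)
```

redacting at write time means replays and exports never see the raw value, though it also means the original text cannot be recovered later.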
testing checklist:
- replay sessions with 100+ messages
- rewind to different points and verify state
- fork sessions and ensure independence
- compare sessions with different outcomes
- test with various tool combinations
- verify performance with large session history
- test privacy controls (disable/enable)
- export sessions in multiple formats
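the "fork sessions and ensure independence" item above could be checked along these lines. the `Session` class and `fork` function are a hypothetical in-memory model written for this example, not goose's implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    # Hypothetical in-memory session; messages are plain dicts for brevity.
    id: str
    messages: List[dict] = field(default_factory=list)
    parent_id: Optional[str] = None


def fork(session: Session, index: int, new_id: str) -> Session:
    """Return a new session holding a deep copy of messages 0..index."""
    if not 0 <= index < len(session.messages):
        raise IndexError(f"no message at index {index}")
    return Session(
        id=new_id,
        messages=copy.deepcopy(session.messages[: index + 1]),
        parent_id=session.id,
    )


def test_fork_is_independent() -> None:
    original = Session("s1", [
        {"text": "read config.yaml"},
        {"text": "generate code"},
        {"text": "error: syntax error"},
    ])
    branch = fork(original, 1, "s1-fork")  # rewind to just before the error

    # Changes on the branch must not leak into the original session.
    branch.messages.append({"text": "generate code, second attempt"})
    branch.messages[0]["text"] = "edited"

    assert len(original.messages) == 3
    assert original.messages[0]["text"] == "read config.yaml"
    assert branch.parent_id == "s1"
    assert len(branch.messages) == 3


test_fork_is_independent()
```

the deep copy is what makes the second assertion pass; a shallow slice would share the message dicts between the two sessions.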
resources:
- browser session storage reference (web storage api): https://developer.mozilla.org/en-US/docs/Web/API/Window/sessionStorage
- state management patterns: https://redux.js.org/understanding/thinking-in-redux/three-principles
- time-travel debugging: https://elm-lang.org/news/time-travel-made-easy
priority: p1 (high user value, moderate complexity)
estimated impact:
- could reduce debugging time by 50-70% (rough projection, not yet measured)
- enables experimentation without fear of losing progress
- improves learning curve for new users
- unlocks advanced use cases (recipe testing, documentation)
success metrics:
- 40%+ of users replay at least one session per week
- average debugging time reduced by 10+ minutes per issue
- 20%+ increase in recipe experimentation
[x] i have verified this does not duplicate an existing feature request