time travel debug - session replay and rewind for goose #6829

@jeffa-block

Description

please explain the motivation behind the feature request.

debugging complex workflows in goose can be challenging when issues occur mid-session. currently, if something goes wrong during a multi-step process, users must restart from scratch or manually reconstruct the context. this creates friction both in troubleshooting and in learning from mistakes.

use cases:

  1. session replay: review exactly what happened in a previous session to understand why a task failed
  2. rewind & retry: go back to a specific point in the conversation and try a different approach
  3. learning & training: study successful sessions to understand best practices
  4. debugging recipes: test and refine recipes by replaying them with different inputs
  5. error investigation: understand what led to an error by reviewing the full context

does this feature solve a particular problem you have been experiencing?

yes - when working on complex tasks (multi-file edits, api integrations, automation recipes), errors often occur several steps into the process. currently, the only option is to start over, which wastes time and makes it difficult to identify the exact point of failure.

what opportunities or use cases would be unlocked with this feature?

  • faster debugging: quickly identify where things went wrong without manual reconstruction
  • experimentation: try different approaches from the same starting point
  • documentation: create tutorials by replaying successful sessions
  • quality assurance: review sessions before sharing recipes or workflows
  • learning: study how goose solved complex problems in past sessions

describe the solution you'd like

a "time travel debug" feature with three core capabilities:

1. session replay

  • view complete history of any past session
  • see all messages, tool calls, and outputs in sequence
  • search/filter by keywords, tools used, or time range
  • export session transcripts for documentation
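The replay capability above could be sketched as follows. This is a minimal, assumed design: it reads a hypothetical jsonl transcript (one event per line with `ts`, `kind`, `tool`, and `data` fields, which are illustrative and not goose's actual schema):

```python
import json
from pathlib import Path

def replay_session(path, tool_filter=None):
    """Yield events from a saved session transcript, in order.

    Assumes a hypothetical JSONL format: one event per line with
    'ts', 'kind' ('message' | 'tool_call' | 'tool_output'), and,
    for tool events, a 'tool' name. Passing tool_filter narrows
    the replay to calls of a single tool.
    """
    for line in Path(path).read_text().splitlines():
        event = json.loads(line)
        if tool_filter and event.get("tool") != tool_filter:
            continue
        yield event
```

Because it yields events lazily, the same function can back both a full read-only replay and a filtered search view.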

2. rewind & fork

  • select any point in current or past session
  • "rewind" to that point and continue from there
  • creates a new branch/fork of the session
  • original session remains intact for reference
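The rewind-and-fork semantics above could look like this in a sketch, assuming sessions are ordered event lists (the `Session` shape and field names are illustrative, not goose's actual types):

```python
from __future__ import annotations
import copy
import uuid
from dataclasses import dataclass, field

@dataclass
class Session:
    events: list = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent: str | None = None  # id of the session this was forked from

    def fork(self, rewind_to: int) -> "Session":
        """Create a new session containing history up to and
        including index `rewind_to`. Events are deep-copied so
        the fork can never mutate the original session."""
        if not 0 <= rewind_to < len(self.events):
            raise IndexError("rewind point outside session")
        return Session(
            events=copy.deepcopy(self.events[: rewind_to + 1]),
            parent=self.id,
        )
```

The `parent` link keeps the branch relationship explicit, so the original session remains intact and discoverable from any of its forks.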

3. session comparison

  • compare two sessions side-by-side
  • highlight differences in approach or outcomes
  • useful for a/b testing different prompts or recipes
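A first cut of session comparison could lean on a standard line diff, assuming each event is summarized to one line first (a real implementation would likely want a structural diff, but this shows the idea):

```python
import difflib

def compare_sessions(a, b):
    """Line-oriented diff of two session transcripts, where each
    transcript is a list of one-line event summaries. Returns
    unified-diff lines ('-' only in a, '+' only in b)."""
    return list(difflib.unified_diff(a, b, "session_a", "session_b", lineterm=""))
```
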

user interface:

┌─────────────────────────────────────────┐
│ session timeline                        │
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│ 0:00  started session                   │
│ 0:15  ✓ file read: config.yaml         │
│ 0:30  ✓ code generation                │
│ 0:45  ✗ error: syntax error            │ ← rewind to here
│ 1:00  current position                  │
│                                         │
│ [◀ rewind] [▶ replay] [⎋ fork]        │
└─────────────────────────────────────────┘

technical implementation:

option 1: local session storage

  • store complete session history in local database
  • lightweight replay engine reads from storage
  • no cloud dependency, privacy-first
  • estimated: 4-6 weeks implementation

option 2: in-memory snapshots

  • take snapshots at key points (tool calls, errors)
  • faster but limited history depth
  • lower storage overhead
  • estimated: 2-3 weeks implementation

option 3: hybrid approach ⭐ recommended

  • in-memory for current session (fast rewind)
  • persistent storage for session history (full replay)
  • best of both worlds
  • estimated: 5-7 weeks implementation
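The hybrid approach could be sketched as a small snapshot manager: a bounded in-memory buffer for fast rewind, with every snapshot also appended to a persistent store (here, anything with an `append` method; the class and its API are illustrative):

```python
from collections import deque

class HybridSnapshots:
    """Keep the last N snapshots in memory for fast rewind, and
    append every snapshot to a persistent log for full replay."""

    def __init__(self, store, max_in_memory=50):
        self.recent = deque(maxlen=max_in_memory)  # fast-rewind window
        self.store = store                          # durable history

    def take(self, snapshot):
        self.recent.append(snapshot)
        self.store.append(snapshot)

    def rewind(self, steps=1):
        # Fast path: serve from memory when the target is recent;
        # anything older falls back to the persistent replay engine.
        if steps <= len(self.recent):
            return self.recent[-steps]
        raise LookupError("older than in-memory window; use full replay")
```
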

architecture:

session manager
    ↓
┌─────────────┬──────────────┬─────────────┐
│ snapshot    │ replay       │ comparison  │
│ engine      │ engine       │ engine      │
└─────────────┴──────────────┴─────────────┘
    ↓               ↓               ↓
┌─────────────────────────────────────────┐
│ session storage (sqlite/leveldb)        │
└─────────────────────────────────────────┘
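Assuming the sqlite option from the diagram, the storage layer could start from a schema like this (table and column names are illustrative):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id         TEXT PRIMARY KEY,
    parent_id  TEXT REFERENCES sessions(id),  -- set when forked
    started_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS events (
    session_id TEXT NOT NULL REFERENCES sessions(id),
    seq        INTEGER NOT NULL,  -- position in the timeline
    kind       TEXT NOT NULL,     -- message | tool_call | tool_output | error
    payload    TEXT NOT NULL,     -- JSON blob
    PRIMARY KEY (session_id, seq)
);
"""

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The `(session_id, seq)` primary key makes replay a single ordered scan, and `parent_id` records fork lineage without duplicating shared history.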

key features:

  • automatic snapshots before each tool call
  • manual snapshot creation (bookmark important moments)
  • session search by content, tools, or outcomes
  • privacy controls (disable for sensitive sessions)
  • export to markdown/json for sharing
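The markdown export in the list above could be as simple as rendering each event to a bullet line (event fields match the hypothetical shape used in the replay sketch):

```python
def export_markdown(events):
    """Render a session's events as a markdown transcript.

    Events are dicts with 'ts', 'kind', and 'summary' keys
    (an assumed shape, not goose's actual event schema).
    """
    lines = ["# session transcript", ""]
    for e in events:
        lines.append(f"- `{e['ts']}` **{e['kind']}**: {e['summary']}")
    return "\n".join(lines)
```
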

describe alternatives you've considered

alternative 1: manual session logs

  • users manually save session transcripts
  • review logs in external editor
  • why insufficient: no interactive replay, can't rewind and retry, time-consuming

alternative 2: copy/paste context

  • copy previous messages into new session
  • manually reconstruct context
  • why insufficient: loses tool outputs, error-prone, doesn't scale for complex sessions

alternative 3: recipe versioning

  • save different versions of recipes
  • test each version separately
  • why insufficient: doesn't help with ad-hoc sessions, no visual timeline, can't compare approaches

alternative 4: external screen recording

  • record screen while using goose
  • review video later
  • why insufficient: can't interact with recording, large file sizes, no searchability

additional context

similar tools:

  • git time travel (git reflog, git bisect)
  • browser devtools timeline
  • ide debugger step-back functionality
  • jupyter notebook cell re-execution

implementation complexity:

  • low: session replay (read-only view of past sessions)
  • medium: rewind & fork (requires state management)
  • high: session comparison (diff algorithms, ui complexity)

privacy considerations:

  • sessions may contain sensitive data (api keys, personal info)
  • need opt-in/opt-out controls
  • local-only storage by default
  • clear data retention policies
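One way to back the privacy controls above is to scrub likely secrets before any snapshot is persisted. A minimal sketch, where the patterns are illustrative and deliberately not exhaustive:

```python
import re

# Illustrative patterns only; a real redactor would need a
# broader, configurable set (tokens, emails, private keys, ...).
REDACT_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
    re.compile(r"(?i)password\s*[:=]\s*\S+"),
]

def redact(text):
    """Replace likely secrets with a placeholder before storage."""
    for pat in REDACT_PATTERNS:
        text = pat.sub("[redacted]", text)
    return text
```
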

testing checklist:

  • replay sessions with 100+ messages
  • rewind to different points and verify state
  • fork sessions and ensure independence
  • compare sessions with different outcomes
  • test with various tool combinations
  • verify performance with large session history
  • test privacy controls (disable/enable)
  • export sessions in multiple formats

priority: p1 (high user value, moderate complexity)

estimated impact:

  • reduces debugging time by 50-70%
  • enables experimentation without fear of losing progress
  • improves learning curve for new users
  • unlocks advanced use cases (recipe testing, documentation)

success metrics:

  • 40%+ of users replay at least one session per week
  • average debugging time reduced by 10+ minutes per issue
  • 20%+ increase in recipe experimentation

[x] i have verified this does not duplicate an existing feature request

labels: enhancement (new feature or request)