time travel debug - session replay and rewind for goose #6829

@jeffa-block

Description

please explain the motivation behind the feature request.

debugging complex workflows in goose can be challenging when issues occur mid-session. currently, if something goes wrong during a multi-step process, users must restart from scratch or manually reconstruct the context. this creates friction both in troubleshooting and in learning from mistakes.

use cases:

  1. session replay: review exactly what happened in a previous session to understand why a task failed
  2. rewind & retry: go back to a specific point in the conversation and try a different approach
  3. learning & training: study successful sessions to understand best practices
  4. debugging recipes: test and refine recipes by replaying them with different inputs
  5. error investigation: understand what led to an error by reviewing the full context

does this feature solve a particular problem you have been experiencing?

yes - when working on complex tasks (multi-file edits, api integrations, automation recipes), errors often occur several steps into the process. currently, the only option is to start over, which wastes time and makes it difficult to identify the exact point of failure.

what opportunities or use cases would be unlocked with this feature?

  • faster debugging: quickly identify where things went wrong without manual reconstruction
  • experimentation: try different approaches from the same starting point
  • documentation: create tutorials by replaying successful sessions
  • quality assurance: review sessions before sharing recipes or workflows
  • learning: study how goose solved complex problems in past sessions

describe the solution you'd like

a "time travel debug" feature with three core capabilities:

1. session replay

  • view complete history of any past session
  • see all messages, tool calls, and outputs in sequence
  • search/filter by keywords, tools used, or time range
  • export session transcripts for documentation
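The replay capability above could be sketched as follows. This is a minimal, assumed design: it reads a hypothetical jsonl transcript (one event per line with `ts`, `kind`, `tool`, and `data` fields, which are illustrative and not goose's actual schema):

```python
import json
from pathlib import Path

def replay_session(path, tool_filter=None):
    """Yield events from a saved session transcript, in order.

    Assumes a hypothetical JSONL format: one event per line with
    'ts', 'kind' ('message' | 'tool_call' | 'tool_output'), and,
    for tool events, a 'tool' name. Passing tool_filter narrows
    the replay to calls of a single tool.
    """
    for line in Path(path).read_text().splitlines():
        event = json.loads(line)
        if tool_filter and event.get("tool") != tool_filter:
            continue
        yield event
```

Because it yields events lazily, the same function can back both a full read-only replay and a filtered search view.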

2. rewind & fork

  • select any point in current or past session
  • "rewind" to that point and continue from there
  • creates a new branch/fork of the session
  • original session remains intact for reference
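The rewind-and-fork semantics above could look like this in a sketch, assuming sessions are ordered event lists (the `Session` shape and field names are illustrative, not goose's actual types):

```python
from __future__ import annotations
import copy
import uuid
from dataclasses import dataclass, field

@dataclass
class Session:
    events: list = field(default_factory=list)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent: str | None = None  # id of the session this was forked from

    def fork(self, rewind_to: int) -> "Session":
        """Create a new session containing history up to and
        including index `rewind_to`. Events are deep-copied so
        the fork can never mutate the original session."""
        if not 0 <= rewind_to < len(self.events):
            raise IndexError("rewind point outside session")
        return Session(
            events=copy.deepcopy(self.events[: rewind_to + 1]),
            parent=self.id,
        )
```

The `parent` link keeps the branch relationship explicit, so the original session remains intact and discoverable from any of its forks.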

3. session comparison

  • compare two sessions side-by-side
  • highlight differences in approach or outcomes
  • useful for a/b testing different prompts or recipes
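A first cut of session comparison could lean on a standard line diff, assuming each event is summarized to one line first (a real implementation would likely want a structural diff, but this shows the idea):

```python
import difflib

def compare_sessions(a, b):
    """Line-oriented diff of two session transcripts, where each
    transcript is a list of one-line event summaries. Returns
    unified-diff lines ('-' only in a, '+' only in b)."""
    return list(difflib.unified_diff(a, b, "session_a", "session_b", lineterm=""))
```
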

user interface:

┌─────────────────────────────────────────┐
│ session timeline                        │
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│ 0:00  started session                   │
│ 0:15  ✓ file read: config.yaml         │
│ 0:30  ✓ code generation                │
│ 0:45  ✗ error: syntax error            │ ← rewind to here
│ 1:00  current position                  │
│                                         │
│ [◀ rewind] [▶ replay] [⎋ fork]        │
└─────────────────────────────────────────┘

technical implementation:

option 1: local session storage

  • store complete session history in local database
  • lightweight replay engine reads from storage
  • no cloud dependency, privacy-first
  • estimated: 4-6 weeks implementation

option 2: in-memory snapshots

  • take snapshots at key points (tool calls, errors)
  • faster but limited history depth
  • lower storage overhead
  • estimated: 2-3 weeks implementation

option 3: hybrid approach ⭐ recommended

  • in-memory for current session (fast rewind)
  • persistent storage for session history (full replay)
  • best of both worlds
  • estimated: 5-7 weeks implementation
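The hybrid approach could be sketched as a small snapshot manager: a bounded in-memory buffer for fast rewind, with every snapshot also appended to a persistent store (here, anything with an `append` method; the class and its API are illustrative):

```python
from collections import deque

class HybridSnapshots:
    """Keep the last N snapshots in memory for fast rewind, and
    append every snapshot to a persistent log for full replay."""

    def __init__(self, store, max_in_memory=50):
        self.recent = deque(maxlen=max_in_memory)  # fast-rewind window
        self.store = store                          # durable history

    def take(self, snapshot):
        self.recent.append(snapshot)
        self.store.append(snapshot)

    def rewind(self, steps=1):
        # Fast path: serve from memory when the target is recent;
        # anything older falls back to the persistent replay engine.
        if steps <= len(self.recent):
            return self.recent[-steps]
        raise LookupError("older than in-memory window; use full replay")
```
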

architecture:

session manager
    ↓
┌─────────────┬──────────────┬─────────────┐
│ snapshot    │ replay       │ comparison  │
│ engine      │ engine       │ engine      │
└─────────────┴──────────────┴─────────────┘
    ↓               ↓               ↓
┌─────────────────────────────────────────┐
│ session storage (sqlite/leveldb)        │
└─────────────────────────────────────────┘
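Assuming the sqlite option from the diagram, the storage layer could start from a schema like this (table and column names are illustrative):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id         TEXT PRIMARY KEY,
    parent_id  TEXT REFERENCES sessions(id),  -- set when forked
    started_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS events (
    session_id TEXT NOT NULL REFERENCES sessions(id),
    seq        INTEGER NOT NULL,  -- position in the timeline
    kind       TEXT NOT NULL,     -- message | tool_call | tool_output | error
    payload    TEXT NOT NULL,     -- JSON blob
    PRIMARY KEY (session_id, seq)
);
"""

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The `(session_id, seq)` primary key makes replay a single ordered scan, and `parent_id` records fork lineage without duplicating shared history.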

key features:

  • automatic snapshots before each tool call
  • manual snapshot creation (bookmark important moments)
  • session search by content, tools, or outcomes
  • privacy controls (disable for sensitive sessions)
  • export to markdown/json for sharing
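The markdown export in the list above could be as simple as rendering each event to a bullet line (event fields match the hypothetical shape used in the replay sketch):

```python
def export_markdown(events):
    """Render a session's events as a markdown transcript.

    Events are dicts with 'ts', 'kind', and 'summary' keys
    (an assumed shape, not goose's actual event schema).
    """
    lines = ["# session transcript", ""]
    for e in events:
        lines.append(f"- `{e['ts']}` **{e['kind']}**: {e['summary']}")
    return "\n".join(lines)
```
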

describe alternatives you've considered

alternative 1: manual session logs

  • users manually save session transcripts
  • review logs in external editor
  • why insufficient: no interactive replay, can't rewind and retry, time-consuming

alternative 2: copy/paste context

  • copy previous messages into new session
  • manually reconstruct context
  • why insufficient: loses tool outputs, error-prone, doesn't scale for complex sessions

alternative 3: recipe versioning

  • save different versions of recipes
  • test each version separately
  • why insufficient: doesn't help with ad-hoc sessions, no visual timeline, can't compare approaches

alternative 4: external screen recording

  • record screen while using goose
  • review video later
  • why insufficient: can't interact with recording, large file sizes, no searchability

additional context

similar tools:

  • git time travel (git reflog, git bisect)
  • browser devtools timeline
  • ide debugger step-back functionality
  • jupyter notebook cell re-execution

implementation complexity:

  • low: session replay (read-only view of past sessions)
  • medium: rewind & fork (requires state management)
  • high: session comparison (diff algorithms, ui complexity)

privacy considerations:

  • sessions may contain sensitive data (api keys, personal info)
  • need opt-in/opt-out controls
  • local-only storage by default
  • clear data retention policies
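One way to back the privacy controls above is to scrub likely secrets before any snapshot is persisted. A minimal sketch, where the patterns are illustrative and deliberately not exhaustive:

```python
import re

# Illustrative patterns only; a real redactor would need a
# broader, configurable set (tokens, emails, private keys, ...).
REDACT_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),
    re.compile(r"(?i)password\s*[:=]\s*\S+"),
]

def redact(text):
    """Replace likely secrets with a placeholder before storage."""
    for pat in REDACT_PATTERNS:
        text = pat.sub("[redacted]", text)
    return text
```
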

testing checklist:

  • replay sessions with 100+ messages
  • rewind to different points and verify state
  • fork sessions and ensure independence
  • compare sessions with different outcomes
  • test with various tool combinations
  • verify performance with large session history
  • test privacy controls (disable/enable)
  • export sessions in multiple formats

priority: p1 (high user value, moderate complexity)

estimated impact:

  • reduces debugging time by 50-70%
  • enables experimentation without fear of losing progress
  • improves learning curve for new users
  • unlocks advanced use cases (recipe testing, documentation)

success metrics:

  • 40%+ of users replay at least one session per week
  • average debugging time reduced by 10+ minutes per issue
  • 20%+ increase in recipe experimentation

[x] i have verified this does not duplicate an existing feature request

labels: enhancement (new feature or request)