docs: enterprise test plan with comprehensive gap analysis#63

Merged
kovtcharov-amd merged 11 commits into main from docs/enterprise-test-plan
May 13, 2026
Collaborator

kovtcharov-amd commented May 13, 2026

Summary

Deep code analysis of the Claudia codebase (~41,400 LOC) identifying testing gaps and defining a phased test plan to reach enterprise-grade reliability.

Key findings

  • 399 existing tests across 10 modules — all passing
  • ~25,000+ LOC across 16 backend modules and 35 frontend components have zero test coverage
  • No integration tests, E2E tests, or performance tests exist
  • 11 known production bugs need regression tests

Changes in this PR

  • docs/plans/enterprise-test-plan.md — comprehensive 6-phase test plan defining ~685 new tests
  • Accurate per-module LOC and test counts (verified against codebase)
  • Known Bug Regression Tests section with specific production failures

Revisions applied during review

  • Fixed per-module test counts (several were inaccurate)
  • Corrected total codebase size from ~31,600 to ~41,400 LOC
  • Added 5 missing backend modules (mobile-page, voice-supervisor, voice-agent-page, opencode-backend, usage-reporter)
  • Added electron package (523 LOC)
  • Corrected frontend component size from ~8,000 to 13,014 LOC (35 components)
  • Added Known Bug Regression Tests section with 11 production bugs
  • Removed non-doc code changes (those belong in PR #61, "fix: buffer PTY output during resize to prevent text corruption")

Test plan phases

| Phase | Tests | Priority |
|---|---|---|
| Phase 1: Backend Unit Tests | ~315 | P0 |
| Phase 2: Server Integration | ~160 | P0 |
| Phase 3: Frontend Tests | ~95 | P1 |
| Phase 4: Integration & E2E | ~35 | P1 |
| Phase 5: Robustness & Security | ~45 | P2 |
| Phase 6: Performance | ~10 | P2 |
| Known Bug Regressions | ~25 | P0 |
| Total | ~685 new | |

Ovtcharov and others added 9 commits May 11, 2026 15:40
showBrowseButton was hardcoded to false, preventing users from using
the native folder picker. Set to true so the Browse button appears
and opens the OS folder dialog via the existing WebSocket handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add drag-and-drop support on the workspace panel: drop a folder from
  the OS file explorer to add it as a workspace. In Electron, the full
  path is extracted directly. In the browser, opens the path input modal.
- Fix Browse button: use REST endpoint instead of blocking WebSocket
  execFileSync which froze the server. Only show Browse in Electron mode
  where the native dialog works reliably.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: syncWorkspaceMcpConfigs wrote .mcp.json to the claudia
project root on every startup, triggering tsx watch to restart the
server in an infinite loop. Now skips syncing to Claudia's own
workspace directory.

Also:
- Fix Browse button: add -STA flag for Windows PowerShell folder dialog,
  remember last browsed path across sessions, kdialog fallback on Linux
- Re-enable Browse button in Add Workspace dialog
- Fix drag-and-drop: only activate for external OS drops (Files type),
  internal workspace reordering drags pass through unaffected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
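
The restart-loop fix in the commit above amounts to excluding the server's own project root from the list of sync targets. A minimal sketch of that idea, assuming workspaces are plain path strings; `workspacesToSync` and its parameters are hypothetical names, not the actual `syncWorkspaceMcpConfigs` code:

```typescript
import * as path from "node:path";

/**
 * Hypothetical helper: return only the workspaces that are safe to write
 * .mcp.json into, i.e. every workspace except the server's own root.
 * Paths are resolved first so "/a/claudia" and "/a/claudia/" compare equal.
 */
function workspacesToSync(workspaces: string[], ownRoot: string): string[] {
  const self = path.resolve(ownRoot);
  return workspaces.filter((ws) => path.resolve(ws) !== self);
}
```

This compares resolved paths exactly; a case-insensitive filesystem would likely need extra normalization before the comparison.
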
The 512KB caps were too aggressive — users lost scrollback history
after rotation. Disk files now cap at 10MB (rotate keeping 5MB tail),
and clients receive up to 2MB of history for scrollback. Memory loading
on reconnect remains capped at 512KB to prevent OOM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
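
The size limits in the commit above can be sketched as follows. This is an in-memory illustration only, with hypothetical names (`tail`, `rotateIfNeeded`, and the constants); the real code works on disk files, which are not shown on this page:

```typescript
// Hypothetical constants mirroring the numbers in the commit message.
const MAX_DISK_BYTES = 10 * 1024 * 1024;  // rotate once the log exceeds 10MB
const KEEP_TAIL_BYTES = 5 * 1024 * 1024;  // keep the newest 5MB after rotation
const MAX_CLIENT_BYTES = 2 * 1024 * 1024; // history sent to a client for scrollback
const MAX_RECONNECT_BYTES = 512 * 1024;   // history loaded into memory on reconnect

/** Return the last `limit` bytes of `data`, or all of it if it is shorter. */
function tail(data: Uint8Array, limit: number): Uint8Array {
  return data.length <= limit ? data : data.subarray(data.length - limit);
}

/** If the log is over the disk cap, keep only its newest tail. */
function rotateIfNeeded(data: Uint8Array): Uint8Array {
  return data.length > MAX_DISK_BYTES ? tail(data, KEEP_TAIL_BYTES) : data;
}
```

Under these assumptions, a client would receive `tail(log, MAX_CLIENT_BYTES)` and a reconnect would load `tail(log, MAX_RECONNECT_BYTES)`.
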
When a scrollbar appears/disappears during active output, the container
width changes by ~15px, flipping cols by 1-2. This caused Claude Code's
TUI to re-render at alternating widths, producing overlapping garbled
text.

Fix:
- Skip resize events where cols changed by <= 2 (scrollbar noise)
- Track last sent cols/rows to deduplicate
- Increase ResizeObserver debounce from 50ms to 150ms
- Use fitTerminal() (fit + refresh) to clear artifacts after resize

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
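
The first two items of the fix above (ignoring scrollbar noise and deduplicating against the last size sent) can be sketched like this. `ResizeFilter` is a hypothetical name. The sketch also assumes a small column change counts as noise only when the row count is unchanged, which the commit message does not state:

```typescript
interface TermSize {
  cols: number;
  rows: number;
}

// From the commit message: column changes of 2 or fewer are scrollbar noise.
const COL_NOISE_THRESHOLD = 2;

class ResizeFilter {
  private lastSent: TermSize | null = null;

  /** Returns true if `next` should be forwarded to the backend. */
  shouldSend(next: TermSize): boolean {
    const prev = this.lastSent;
    if (prev !== null) {
      const colDelta = Math.abs(next.cols - prev.cols);
      // Assumption: only suppress when rows are unchanged. This covers exact
      // duplicates (delta 0) as well as scrollbar flips (delta 1 or 2).
      if (next.rows === prev.rows && colDelta <= COL_NOISE_THRESHOLD) {
        return false;
      }
    }
    this.lastSent = { cols: next.cols, rows: next.rows };
    return true;
  }
}
```

Because the comparison is against the last size actually sent, a slow drift of one column at a time still goes through once it adds up to more than the threshold.
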
References #59

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After sending a resize to the backend, buffer incoming PTY output for
250ms. This gives the PTY time to process SIGWINCH and start rendering
at the new width. Without buffering, output rendered at the OLD width
arrives at xterm (already at the NEW width), causing ANSI cursor
positioning commands to misalign and produce garbled overlapping text.

The buffer accumulates output chunks during the transition, then flushes
them all at once after the PTY has caught up. History tracking refs are
updated even during buffering so scroll-up loading stays consistent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
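
The buffering described in the commit above can be sketched as a small class that holds PTY output for a fixed window after each resize and then flushes it in one write. All names here (`ResizeOutputBuffer`, `onResizeSent`, `onOutput`) are hypothetical, and restarting the window when a second resize arrives is an assumption of this sketch rather than something the commit states:

```typescript
// From the commit message: hold output for 250ms after a resize is sent.
const RESIZE_BUFFER_MS = 250;

type Schedule = (fn: () => void, ms: number) => void;

class ResizeOutputBuffer {
  private pending: string[] = [];
  private buffering = false;
  private generation = 0;
  private readonly write: (data: string) => void;
  private readonly schedule: Schedule;

  // `write` delivers data to the terminal; `schedule` is an injected timer.
  constructor(write: (data: string) => void, schedule: Schedule) {
    this.write = write;
    this.schedule = schedule;
  }

  /** Call right after a resize has been sent to the backend. */
  onResizeSent(): void {
    this.buffering = true;
    const gen = ++this.generation;
    this.schedule(() => {
      // Assumption: a newer resize restarts the window, so only the
      // timer belonging to the latest resize is allowed to flush.
      if (gen === this.generation) this.flush();
    }, RESIZE_BUFFER_MS);
  }

  /** Call for every output chunk arriving from the PTY. */
  onOutput(chunk: string): void {
    if (this.buffering) this.pending.push(chunk);
    else this.write(chunk);
  }

  /** Stop buffering and emit everything queued so far as a single write. */
  private flush(): void {
    this.buffering = false;
    const data = this.pending.join("");
    this.pending = [];
    if (data.length > 0) this.write(data);
  }
}
```

In a browser the timer argument would typically be `(fn, ms) => setTimeout(fn, ms)`; injecting it keeps the class testable without real delays.
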
Deep analysis of the Claudia codebase identified ~15,000+ LOC across 12 backend
modules and all frontend components with zero test coverage. The plan defines
6 phases covering ~660 new tests to reach enterprise-grade reliability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three corruption fixes based on deep code review:

1. Buffer task:output during task:restore processing. Between term.reset()
   and term.write(history) completion, live output was being interleaved
   with the history replay, causing garbled overlapping text on every
   task click. Output is now queued and flushed after history write
   completes.

2. Same fix for loadEarlierChunkIfNeeded scroll-up rewrites — live output
   was interleaving with the reset+rewrite cycle.

3. Fix UTF-8 multi-byte character splitting in readTaskHistoryRange. When
   reading at arbitrary byte offsets, the read could start mid-character
   (e.g., byte 2 of a 3-byte '─' char), producing Unicode replacement
   characters. Now skips leading continuation bytes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
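
The third fix above relies on a property of UTF-8: continuation bytes always have the bit pattern `10xxxxxx`, so a read that starts mid-character can be realigned by dropping them. A minimal sketch with hypothetical helper names; this is not the actual `readTaskHistoryRange` code:

```typescript
/** True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx. */
function isContinuationByte(b: number): boolean {
  return (b & 0xc0) === 0x80;
}

/**
 * Drop leading continuation bytes so that decoding starts on a character
 * boundary. A UTF-8 character has at most 3 continuation bytes, so the
 * loop never skips more than that.
 */
function skipLeadingContinuationBytes(buf: Uint8Array): Uint8Array {
  let start = 0;
  while (start < buf.length && start < 3 && isContinuationByte(buf[start])) {
    start++;
  }
  return buf.subarray(start);
}

// '─' (U+2500) is encoded as E2 94 80. A read that begins at byte 2 of that
// character sees 94 80 first; both are continuation bytes and are skipped.
const midCharacterRead = new Uint8Array([0x94, 0x80, 0x78]); // tail of '─', then 'x'
const aligned = skipLeadingContinuationBytes(midCharacterRead); // only 0x78 remains
```

The partially read character is discarded rather than repaired, which matches the "skips leading continuation bytes" wording in the commit.
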
Ovtcharov added 2 commits May 13, 2026 10:41
…plan

# Conflicts:
#	frontend/src/components/TerminalView.tsx
…ssions

- Fix per-module test counts (several were fabricated, e.g., token-parser
  claimed 46 actual 30, validation claimed 57 actual 28)
- Correct codebase size: 41,400 LOC (was 31,600 — 31% undercount)
- Add 5 missing backend modules: mobile-page (2,088 LOC), voice-supervisor
  (424), voice-agent-page (642), opencode-backend corrected to 853 (was ~400),
  usage-reporter (68)
- Add electron package (523 LOC)
- Correct frontend component count: 35 components / 13,014 LOC (was ~8,000)
- Add Known Bug Regression Tests section with 11 production bugs
- Remove non-doc code changes (belong in separate PR #61)
kovtcharov-amd changed the title from "docs: enterprise test plan with gap analysis" to "docs: enterprise test plan with comprehensive gap analysis" May 13, 2026
kovtcharov-amd enabled auto-merge (squash) May 13, 2026 18:11
kovtcharov-amd merged commit d8a5602 into main May 13, 2026
3 checks passed
kovtcharov-amd deleted the docs/enterprise-test-plan branch May 13, 2026 18:11