Storage path uses directory basename, not absolute path (worktree collision risk) #52

@fazleelahhee

Description

Split out from #48. The git-hooks side of that issue is fixed in #53; this is the remaining latent collision issue.

Symptom

Index storage resolves to `Path(config.storage_path) / project_dir.name` in 30+ call sites (search the codebase for `Path(config.storage_path) / project_name`). Two checkouts with the same basename (for example `~/work/myrepo` and `~/scratch/myrepo`, or two git worktrees both ending in `myrepo`) silently share the same `~/.cce/projects/myrepo/` directory. Their LanceDB index, embedding cache, and `memory.db` get clobbered into one.
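
A minimal reproduction of the collision, assuming the storage root is `~/.cce/projects` as in the example above. The variable names are illustrative and not taken from the codebase; only the `Path(...) / project_dir.name` expression mirrors the current behavior.

```python
from pathlib import Path

# Hypothetical values for illustration only.
storage_path = Path.home() / ".cce" / "projects"
checkout_a = Path.home() / "work" / "myrepo"
checkout_b = Path.home() / "scratch" / "myrepo"

# Current scheme: storage is keyed on the directory basename alone.
dir_a = Path(storage_path) / checkout_a.name
dir_b = Path(storage_path) / checkout_b.name

print(dir_a == dir_b)  # True: both checkouts map to the same storage directory
```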

Doesn't bite typical worktree users because they tend to give worktrees suffixed names (myrepo-feat-x, myrepo-bugfix-y), but when it hits, the failure is silent and corrupts state.

Proposed fix

Adopt the existing `<basename>-<6hex>` slug pattern from `src/context_engine/editors.py:145-168` (`_project_slug`), where the 6-hex hash is derived from `Path.resolve()`. The same scheme already solves the same collision risk for editor configs; it just hasn't been applied to storage paths.
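
A rough sketch of what such a slug could look like. This is not a copy of `_project_slug`: the function name `project_slug`, the choice of SHA-256, and taking the basename from the resolved path are all assumptions made for illustration, and the real helper may differ on any of them.

```python
import hashlib
from pathlib import Path


def project_slug(project_dir: Path) -> str:
    """Return '<basename>-<6hex>' for project_dir (illustrative sketch)."""
    # Resolve first so relative paths and symlinks hash to the same value
    # as the canonical absolute path.
    resolved = project_dir.resolve()
    # Assumption: SHA-256 of the resolved path string, truncated to 6 hex chars.
    digest = hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()
    return f"{resolved.name}-{digest[:6]}"
```

With this shape, `project_slug(Path("~/work/myrepo").expanduser())` and `project_slug(Path("~/scratch/myrepo").expanduser())` both start with `myrepo-` but end in different hex suffixes, because the hashed absolute paths differ.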

Steps

  1. New helper in a shared spot (likely services.py or a new paths.py):
    def project_storage_dir(config, project_dir: Path) -> Path:
        """Return the per-project storage directory for `project_dir`. Stable
        across re-runs (uses Path.resolve before hashing) and unique across
        checkouts that share a basename."""
  2. Migrate call sites — search `Path(config.storage_path) / project_name` and `_safe_cwd().name` and route every reader through the helper. (cli.py: ~25 sites, plus pipeline.py, mcp_server.py, dashboard/server.py, serve_http.py.)
  3. One-time migration when the helper is called and the legacy `<basename>/` directory exists but the new `<basename>-<6hex>/` does not:
    • If a single legacy directory exists, rename it to the new slug.
    • If multiple checkouts share a basename and a legacy directory exists, only the first one to call the helper claims it (via a marker file written into it on first migration); the others get fresh storage. This is the only edge case where any existing user re-indexes from scratch.
    • Migration must NOT touch existing `<basename>-<6hex>/` directories.
  4. Tests:
    • Two `Path` inputs with different absolute paths but identical basename produce different slugs.
    • Two re-resolutions of the same `Path` produce the same slug.
    • Legacy directory rename happens on first call, no-op on subsequent calls.
    • Helper falls back gracefully if `storage_path` doesn't exist yet.
    • Existing single-checkout users keep their data (resolved path → unique slug → migration moves the legacy dir).
  5. Docs: brief note in `docs/wiki/Configuration.md` storage section explaining the layout change and the migration semantics.
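
Steps 1 and 3 could be sketched together roughly as follows. This assumes `config` exposes a `storage_path` attribute, hashes with SHA-256 (the actual `_project_slug` hash is not confirmed here), and uses a hypothetical marker file name `.migrated-from`. It does not create directories and makes no attempt to handle two processes migrating concurrently.

```python
import hashlib
from pathlib import Path

MARKER_NAME = ".migrated-from"  # hypothetical marker file name


def project_storage_dir(config, project_dir: Path) -> Path:
    """Return the slugged storage dir, migrating a legacy dir at most once."""
    resolved = project_dir.resolve()
    # Assumption: 6 hex chars of SHA-256 over the resolved path string.
    digest = hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()[:6]

    root = Path(config.storage_path)
    legacy = root / resolved.name               # old layout: <basename>/
    target = root / f"{resolved.name}-{digest}"  # new layout: <basename>-<6hex>/

    # Migrate only when the legacy form exists and the new form does not.
    # If storage_path itself is missing, is_dir() is False and nothing happens.
    if legacy.is_dir() and not target.exists():
        legacy.rename(target)
        # Record which checkout claimed the legacy data.
        (target / MARKER_NAME).write_text(str(resolved), encoding="utf-8")

    return target
```

Because the first caller renames the legacy directory away, a second checkout with the same basename finds nothing to migrate and simply receives its own, not yet created, slugged path. Repeat calls from the first checkout are no-ops since the target already exists.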

Out of scope

Risk

The only behavior change for existing users is one rename of `~/.cce/projects/<basename>/` → `~/.cce/projects/<basename>-<6hex>/` on first use after upgrade. As long as the migration is idempotent and conservative, touching only the legacy form, no data is lost.

Why now

Two real signals make this worth doing rather than leaving as latent:

  1. The `cce status` command lists project directories under `storage_path`, so a basename collision today shows one merged project instead of two — confusing, and easy to misread as missing data.
  2. As more users adopt git worktrees (the GA wave following #53, "fix(indexer): resolve git hooks dir via git, not hardcoded .git/hooks"), the rate at which two of them collide goes up. Better to fix this before the failure mode shows up in the wild.
