fix(core): recurse into submodule files when crawling git repos#4596
fix(core): recurse into submodule files when crawling git repos#4596he-yufeng wants to merge 2 commits into
Conversation
| // versions; newline-delimited output is fine here (index paths cannot contain newlines). | ||
| const trackedArgs = ['--literal-pathspecs', 'ls-files', '--cached']; | ||
| const trackedArgs = [ | ||
| '--literal-pathspecs', |
There was a problem hiding this comment.
[Critical] git ls-files --deleted --recurse-submodules is unsupported by git (fatal: "unsupported mode"). This creates an asymmetry: when a tracked file inside an initialized submodule is deleted from the working tree, --cached --recurse-submodules still lists it (from the submodule's index), but listDeletedTrackedFiles (--deleted) cannot detect the deletion. The deletedSet check in processTrackedFile therefore does not filter it, and the crawler returns a phantom path that doesn't exist on disk.
Impact: File search results include ghost entries for deleted submodule files. Downstream consumers (read file, @-mention completion) will get ENOENT errors.
| '--literal-pathspecs', | |
| const trackedArgs = [ | |
| '--literal-pathspecs', | |
| 'ls-files', | |
| '--cached', | |
| '--recurse-submodules', | |
| ]; | |
| // NOTE: git does not support `--deleted --recurse-submodules` (fatal error). | |
| // Deleted files inside submodules won't appear in deletedSet. | |
| // Consider adding an fs.access() guard for paths under submodule mount points, | |
| // or use `git submodule foreach --recursive 'git ls-files -z --deleted'` | |
| // to collect submodule-internal deleted files. |
Also: When a submodule is registered but not initialized (common after plain git clone), --cached --recurse-submodules -t emits the gitlink mount point (e.g., H vendor/lib) as a cached entry. processTrackedFile only skips status S, but gitlinks appear as H — so a directory path ends up in the file list. Consider filtering gitlink entries or adding a stat guard.
— qwen3.7-max via Qwen Code /review
|
Addressed the submodule edge cases from review in c806c2a. Changes:
Validation:
|
wenshao
left a comment
There was a problem hiding this comment.
No review findings at this commit. Downgraded from Approve to Comment: CI still running. The implementation correctly addresses the previously raised Critical (deleted-file filtering asymmetry handled via lstatSync guard) and Suggestion (gitlink directory entries filtered by isDirectory check). All 47 crawler tests pass. — qwen3.7-max via Qwen Code /review
Verification Report — PR #4596 (Recurse into Submodule Files)Tested by: wenshao 1. Build / Typecheck / Lint
2. Unit Tests —
|
| Run | Tests | Result | Notes |
|---|---|---|---|
| Default timeout (5s) | 47 | 46 pass, 1 timeout | should recurse into tracked submodules takes ~5.4s, exceeds default 5s |
| Extended timeout (30s) | 47 | 47 pass | All tests pass including the 3 new submodule tests |
New tests added by this PR:
| Test | Type | Result | Time |
|---|---|---|---|
should recurse into tracked submodules on the git path |
Real git fixture | PASS | ~5.3s |
should skip missing tracked paths from submodule indexes |
Mock | PASS | 4ms |
should skip cached gitlink directories from uninitialized submodules |
Mock | PASS | 2ms |
3. Live Submodule Integration (manual verification)
Created a real parent repo with a submodule at vendor/lib containing files at two depth levels.
Before fix (git ls-files --cached -t without --recurse-submodules):
H .gitmodules
H root.txt
H vendor/lib ← gitlink directory only, files inside invisible
After fix (git ls-files --cached -t --recurse-submodules):
H .gitmodules
H root.txt
H vendor/lib/deep/nested/deep-file.txt ← now visible
H vendor/lib/inner.txt ← now visible
Also verified:
- lstatSync filter:
vendor/libgitlink entry correctly identified as directory and filtered out - Stale index entries:
lstatSynctry/catch correctly filters paths that exist in the index but not on disk
4. Findings
⚠️ Test timeout (non-blocking): The real submodule fixture test (should recurse into tracked submodules) takes ~5.3s due togit init+git submodule add+git commitoverhead. It marginally exceeds the default 5s Vitest timeout. CI passes (macOS/Ubuntu/Windows all green) because CI machines are faster, but locally on macOS the test can flap. Consider adding{ timeout: 15_000 }to this specific test for margin.- The code change is minimal and surgical: 1 flag added (
--recurse-submodules) + 1 stat guard (8 lines) to filter stale/directory entries. - No regressions in the existing 44 crawler tests.
5. Verdict
PASS — ready to merge. The core fix is correct and well-tested. The timeout finding is cosmetic (CI is green) but worth a quick fix for local dev experience.
Tested on 2026-05-29 via tmux with real git submodule fixtures and full test suite.
Summary
Fixes #4568.
git ls-files --cached -treports a submodule as the gitlink path, so the crawler only sawvendor/liband missed files tracked inside that submodule. This adds--recurse-submodulesto the tracked-file listing path so submodule contents are crawled the same way regular tracked files are.I also added a real git fixture that creates a parent repo, adds a local submodule, and verifies the crawler returns a file from inside the submodule.
Verification
npx vitest run src/utils/filesearch/crawler.test.ts -t "recurse into tracked submodules"npx vitest run src/utils/filesearch/crawler.test.tsnpm run typecheck --workspace=@qwen-code/qwen-code-corenpm run lint --workspace=@qwen-code/qwen-code-core -- src/utils/filesearch/crawler.ts src/utils/filesearch/crawler.test.tsnpm run build --workspace=@qwen-code/qwen-code-coregit diff --check