Turn a task's declared cache.inputs into concrete data the cache key
can hash:
- a sorted list of absolute file paths whose contents will be hashed
- a sorted list of [envName, hostValue] pairs
Plus a small helper to resolve cache.outputs.files to actual produced
files for capture.
export interface ResolvedInputs {
  files: string[] // absolute paths, sorted
  envValues: Array<[name: string, value: string]> // sorted by name
}
export interface ResolveInputsArgs {
  projectDir: string
  workspaceRoot: string
  envSource: NodeJS.ProcessEnv
  inputs: CacheInputs | undefined
  ownOutputs: string[] // project-relative globs to exclude
  nestedProjectDirs: string[] // absolute dirs of nested projects
}
export async function resolveInputs(args: ResolveInputsArgs): Promise<ResolvedInputs>
export async function resolveOutputs(args: {
  projectDir: string
  outputs: string[]
  nestedProjectDirs: string[]
}): Promise<string[]>
/**
* Remove every file currently matching the declared output globs.
* Called before every cache-miss exec AND before every cache-hit
* restore so the project dir lands on a clean slate.
*/
export async function cleanOutputs(args: {
  projectDir: string
  outputs: string[]
  nestedProjectDirs: string[]
}): Promise<void>

The candidate file set comes from git ls-files --cached --others --exclude-standard
when the project is inside a git repo, falling back to a Bun.Glob walker when it
isn't. The user's cache.inputs.files globs are then applied as a filter on top.
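
As a rough illustration of the git path, the sketch below turns the raw output of git ls-files into a candidate list. The helper name parseLsFiles and the extra -z flag (NUL-terminated, unquoted records) are assumptions of this sketch, not something the module is known to use. The de-duplication is there because a path can be listed more than once, for example while it has unmerged index entries.

```typescript
// Hypothetical helper: parse the stdout of
//   git ls-files --cached --others --exclude-standard -z
// into a sorted, de-duplicated list of paths (relative to the
// directory git was run in). The -z flag is an assumption here.
function parseLsFiles(stdout: string): string[] {
  const seen = new Set<string>()
  for (const entry of stdout.split('\0')) {
    // The final record is followed by a NUL, so the last split item is empty.
    if (entry.length > 0) seen.add(entry)
  }
  return Array.from(seen).sort()
}
```

The result is still only a candidate set; the filters described next decide which of these paths become inputs.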
- Candidate enumeration:
  - Git path (default when a .git work-tree exists). git ls-files yields tracked files PLUS untracked-but-not-ignored files. .gitignore cascades (workspace + every nested), plus .git/info/exclude + global excludes, are honored; git applies them for us. This matches what Turbo and Nx do internally.
  - Fallback path (no git available). Bun.Glob.scan(projectDir) walks the FS. The ignore library applies workspace-root + project-root .gitignore patterns, with the caveat that project-level anchored patterns are evaluated against workspace-relative paths (so a src/skip.ts entry in pkg/.gitignore misbehaves; adopt a git workspace to get git semantics).
- Positive globs: cache.inputs.files strings without !. The default when cache.inputs.files is undefined is ['**/*']. Each is checked against the candidate set via Bun.Glob.match.
- Negative globs: entries starting with !. The ! is stripped; the rest becomes a Bun.Glob and any matched path is removed.
- Always-ignored: hard-coded (**/node_modules/**, **/.git/**, **/.vx/**, **/*.tsbuildinfo), applied as defense-in-depth even if git happens to track something there.
- Boundary ignores: every nested project's directory (relative to this project) → <rel>/**. Cross-project isolation contract.
- Own outputs: declared cache.outputs.files are excluded. Prevents self-invalidation.
- Existence check: git ls-files --cached can surface a deleted-but-tracked path; we drop entries that don't exist on disk so the hasher doesn't throw ENOENT.
The matched absolute paths are sorted alphabetically and returned.
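
The filter steps above can be sketched as a pure function over project-relative candidate paths. This is a simplified illustration under stated assumptions: globToRegExp is a hypothetical stand-in for Bun.Glob.match that only understands **, * and ?, the names FilterArgs and filterCandidates are invented for the example, and the existence check and the conversion to absolute paths are left out.

```typescript
// Hypothetical, simplified glob matcher standing in for Bun.Glob.match.
// Supports "**/", a trailing "**", "*" and "?" only.
function globToRegExp(glob: string): RegExp {
  let re = ''
  let i = 0
  while (i < glob.length) {
    const c = glob.charAt(i)
    if (c === '*' && glob.charAt(i + 1) === '*') {
      if (glob.charAt(i + 2) === '/') {
        re += '(?:.*/)?' // "**/" spans zero or more directories
        i += 3
      } else {
        re += '.*' // bare "**" matches anything, including "/"
        i += 2
      }
    } else if (c === '*') {
      re += '[^/]*' // "*" stays within one path segment
      i += 1
    } else if (c === '?') {
      re += '[^/]'
      i += 1
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
      i += 1
    }
  }
  return new RegExp(`^${re}$`)
}

// Hard-coded ignores as listed in the text.
const ALWAYS_IGNORED = ['**/node_modules/**', '**/.git/**', '**/.vx/**', '**/*.tsbuildinfo']

interface FilterArgs {
  candidates: string[] // project-relative paths from enumeration
  files: string[] | undefined // cache.inputs.files
  ownOutputs: string[] // project-relative output globs
  nestedRelDirs: string[] // nested project dirs, relative to this project
}

function filterCandidates(args: FilterArgs): string[] {
  const patterns = args.files ?? ['**/*']
  const positive = patterns.filter((p) => !p.startsWith('!')).map(globToRegExp)
  const negative = [
    ...patterns.filter((p) => p.startsWith('!')).map((p) => p.slice(1)),
    ...ALWAYS_IGNORED,
    ...args.nestedRelDirs.map((rel) => `${rel}/**`), // boundary ignores
    ...args.ownOutputs, // self-invalidation guard
  ].map(globToRegExp)

  return args.candidates
    .filter((p) => positive.some((re) => re.test(p)))
    .filter((p) => !negative.some((re) => re.test(p)))
    .sort()
}
```

In this sketch the order of the exclusion steps does not matter, because every exclusion only ever removes paths. An empty files: [] matches nothing and so yields an empty list.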
Listed cache.inputs.env names are looked up in envSource (the
host's process.env):
- Set names → [name, value] pair.
- Unset names → [name, ''] (distinguishable from "name was never listed").
- Sorted by name for deterministic key ordering.
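
A minimal sketch of this lookup, assuming a hypothetical helper named resolveEnvValues and a plain record in place of NodeJS.ProcessEnv so the example stays dependency-free:

```typescript
// Stand-in for NodeJS.ProcessEnv (assumption, keeps the sketch self-contained).
type EnvSource = Record<string, string | undefined>

// Hypothetical helper: map listed env names to [name, value] pairs.
function resolveEnvValues(names: string[], envSource: EnvSource): Array<[string, string]> {
  // Unset names fall back to '' so every listed name contributes a pair.
  const pairs = names.map((name): [string, string] => [name, envSource[name] ?? ''])
  // Plain code-unit comparison keeps the order independent of locale.
  return pairs.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
}
```

Because the output is sorted by name, the order in which names are listed in cache.inputs.env does not affect the result.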
resolveOutputs is a simpler glob pass:
- Globs run against the project dir.
- Always-ignored paths excluded (node_modules, etc.).
- Nested-project subtrees excluded (boundary isolation).
- No gitignore filter: outputs like dist/ are usually gitignored on purpose, and we still want to capture them.
Returns sorted absolute paths.
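
The boundary-isolation step can be illustrated on absolute paths. This sketch assumes POSIX-style separators and uses hypothetical helper names (isInside, excludeNested); the real module may do this through globs instead.

```typescript
// True when `file` lives under `dir`. The trailing-slash guard keeps
// "/ws/app/packages/lib-extra.js" from counting as inside "/ws/app/packages/lib".
function isInside(dir: string, file: string): boolean {
  const prefix = dir.endsWith('/') ? dir : `${dir}/`
  return file.startsWith(prefix)
}

// Drop files that belong to nested projects and return the rest sorted.
function excludeNested(files: string[], nestedProjectDirs: string[]): string[] {
  return files
    .filter((f) => !nestedProjectDirs.some((dir) => isInside(dir, f)))
    .sort()
}
```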
- Doesn't hash file content (that's cache.ts:hashFiles).
- Doesn't apply inputs.tasks filtering (that's orchestrator.filterUpstreamHashes).
- Doesn't support workspace-relative globs in inputs.files; this is intentionally scoped per-project. For workspace-shared files, see the deferred WorkspaceConfig.globalInputs in ../schema.md.
- Doesn't follow symlinks specially.
Two test files cover this module:
tests/inputs.test.ts — direct unit tests against resolveInputs,
resolveOutputs, cleanOutputs. Split into FS-walker tests (no git
init in fixture) and git-path tests (init a real git repo in the
fixture). The git-path block verifies:
- Nested .gitignore patterns are correctly anchored (the v13 bug).
- Untracked-but-not-ignored files participate immediately.
- Workspace-root .gitignore excludes via git.
- .git/info/exclude honored.
- Deleted-but-tracked files skipped (no ENOENT).
- Declared outputs still excluded under the git path.
- Nested-project boundary still excludes under the git path.
- Negation in inputs.files still strips under the git path.
- node_modules always-ignored even when force-added to git.
tests/orchestrator.test.ts — e2e behaviour:
- default = all files (gitignore-aware)
- narrow globs limit cache busting
- negation excludes
- self-invalidation guard (declared outputs excluded)
- boundary isolation (nested project files don't leak)
- gitignored files excluded; negated gitignore re-included
- empty files: [] produces stable hash
- env input value changes bust cache; unset vs empty differ
Possible directions:
- Auto-tracking inputs (vite-task style): instead of static globs, capture the files the command actually read via syscall spying. Replace resolveFiles with a strategy that runs the command in a tracing wrapper. Significant scope.
- Cross-project inputs: add a notion of "this file from that project" (e.g., { project: 'lib-a', files: '...' }). Today this is expressed only via dependsOn + upstream-hash propagation; direct file references across projects are forbidden.
- Faster hashing: the current implementation reads files sequentially. Parallelizing would help on very large input sets.