fix: include max_file_size in S3 cache key to prevent stale digests#570

Open
Louiszk wants to merge 1 commit into coderamp-labs:main from Louiszk:fix/s3-cache-size

Conversation


@Louiszk Louiszk commented Mar 12, 2026

This PR fixes a caching bug where S3 cache keys did not account for max_file_size (see #568).

Previously, if a repository was requested with the default 50KB limit and then requested again with a 100MB limit, the server would return the cached 50KB version, because the cache key relied only on the include/exclude patterns and the commit hash.

  • Updated generate_s3_file_path in src/server/s3_utils.py to accept max_file_size and append it to the hashing string.
  • Passed query.max_file_size into both calls to generate_s3_file_path inside src/server/query_processor.py.
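A minimal sketch of the change, assuming a signature and hashing scheme along these lines (the actual field ordering and delimiters in `generate_s3_file_path` may differ):

```python
import hashlib


def generate_s3_file_path(
    source: str,
    commit: str,
    include_patterns: str,
    ignore_patterns: str,
    max_file_size: int,
) -> str:
    """Build a deterministic S3 key for a repo digest.

    max_file_size is now part of the hashed material, so digests
    produced under different size limits can no longer collide.
    """
    # Hypothetical key material; the real project may combine
    # these fields differently.
    key_material = f"{include_patterns}:{ignore_patterns}:{max_file_size}"
    digest = hashlib.sha256(key_material.encode()).hexdigest()[:16]
    return f"{source}/{commit}/{digest}.txt"
```

With this, the same repo requested at 50KB and at 100MB resolves to two distinct S3 paths, while repeated requests with identical parameters still hit the same cached object.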

While looking into this, I realized this strict hashing approach might cause unnecessary cache misses. For example, if a user requests a 500KB limit, and then a 2MB limit on a repo where the largest file is only 100KB, the current fix will treat them as different keys and trigger a re-clone.

Ideally, the cache would stay the same in this instance. In the future, it might be worth adding largest_file_encountered to the S3Metadata JSON and updating the lookup logic to allow compatible cache hits.
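The compatible-hit check described above could look roughly like this. This is a sketch only: it assumes hypothetical `max_file_size` and `largest_file_encountered` fields in the S3Metadata JSON, neither of which exists yet.

```python
def is_cache_compatible(requested_max_file_size: int, metadata: dict) -> bool:
    """Decide whether a cached digest can serve a request made with a
    different max_file_size limit.

    Assumes metadata carries:
      - "max_file_size": the limit in effect when the digest was built
      - "largest_file_encountered": the biggest file seen in the repo
    """
    largest = metadata.get("largest_file_encountered")
    cached_limit = metadata.get("max_file_size")
    if largest is None or cached_limit is None:
        # Without size info, only an exact limit match is safe.
        return requested_max_file_size == cached_limit
    # If every file fit under the cached limit, any request whose limit
    # also covers the largest file would produce identical output.
    if largest <= cached_limit and largest <= requested_max_file_size:
        return True
    return requested_max_file_size == cached_limit
```

Under this logic, the 500KB-then-2MB example with a 100KB largest file is a cache hit, while a 50KB request against the same entry correctly misses (the 100KB file would have been excluded).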

For now, adding the size to the hash key is a fast and reliable way to prevent the UI from serving incorrectly limited files.

