fix(ai): cap generate prompt at 120k; tiered reference density (#59 phase 1)#88
Conversation
- Clamp buildGeneratePrompt budget with BUILD_GENERATE_PROMPT_MAX_TOKENS so large-context models cannot inflate code-analysis payload - Size each code file with maxCharsForCodeFileContent; skip overflow instead of always inlining the first file - feat(scoring): tiered architectural weights for reference density (caliber-ai-org#59 phase 1)
…ts (resolves PR 88 comments) - Replaces hardcoded tier lists with dynamic filesystem resolution - Weights resolved paths double to reward correct project grounding - Fixes case-sensitivity bugs by relying on raw path resolution instead of lowercased basenames - Fixes dead entries bug where ignored lockfiles wouldn't be weighted - Fixes misleading detail string to correctly report ratio components - Adds tests for dynamic reference resolution
alonp98
left a comment
There was a problem hiding this comment.
Thanks for the contribution @sachinsharma3191! This is a substantial PR — here's a detailed review.
General: Two unrelated changes in one PR
This bundles a generate prompt cap and a grounding density rewrite. These should be separate PRs — they touch different subsystems and have independent risk profiles.
Part 1: Generate prompt cap at 120k (src/ai/generate.ts)
The motivation is valid — large-context models can inflate the code section beyond what's useful. However:
-
Arbitrary magic number —
120_000is hardcoded with no rationale or configurability. Why 120k and not 100k or 150k? -
Changed skip-vs-stop semantics — The original loop does
breakwhen a file exceeds budget. The new code doescontinue(skips oversized files), which means smaller files later in the sorted list get included while larger, potentially more important files get silently dropped. This is a behavioral change that could degrade output quality — the sort order exists for a reason. -
maxCharsForCodeFileContentuses* 4approximation — This is already used elsewhere so it's consistent, but introducing another layer of char↔token conversion compounds the approximation error.
Part 2: Weighted reference density (src/scoring/)
What's good
- Respects the "no hardcoded mappings" principle — weights by resolution, not filename importance. The constants comment explicitly notes this.
pathReferenceResolvesInProjectis stack-agnostic.- Tests are well-structured and properly scoped (unlike some other PRs).
Issues
-
Code duplication with
validateFileReferences()(lines 306-336 inutils.ts) — Both functions do nearly identical work: filter out URLs, version numbers,#/@prefixes, globs,..paths, thenexistsSync. The new function should reuse or extend the existing one rather than duplicating the logic. -
O(refs × entries) double-loop —
pathReferenceResolvesInProjectiterates over allprojectFilesandprojectDirscheckingendsWith. On large projects this could be slow. A normalizedSetlookup would be more efficient. -
fix.datashape changed —currentDensity→densityPercent,currentRefsremoved, new fields added. This is a breaking change for anything consuming fix data downstream (e.g.score-refine.tsuses fix data to generate refinement prompts). -
PR title says
fixbut this isfeat— New scoring behavior is a feature, not a bug fix.
Recommendation
Split this into two PRs:
- Generate prompt cap — with rationale for the 120k number and preserving the original break-on-budget semantics (or explaining why skip-and-continue is better)
- Resolution-weighted density — refactored to reuse
validateFileReferences, with thefix.datashape change verified againstscore-refine.ts
|
Hey @sachinsharma3191, just checking in on this. Are you still planning to address the review feedback? Let me know if you need any help. |
|
@alonp98 The changes were implemented 2 week ago and please review the changes |
Implemented changes (for reviewers)Generate prompt (
Reference density / #59 (aligned with PR #79 direction)
Please re-review when you have time. |
alonp98
left a comment
There was a problem hiding this comment.
Hey @sachinsharma3191, thanks for the update! The scoring approach is solid — stack-agnostic resolution with no hardcoded tiers is exactly right.
A few things to address before this can land:
-
Rebase needed — The
generate.tschanges (120k cap) are now included in #80 which was just approved. Once it merges, this PR will conflict. Please rebase on master and drop thegenerate.tschanges from this PR so only the scoring/grounding work remains. -
Code duplication with
validateFileReferences()—pathReferenceResolvesInProjectdoes very similar filtering (URLs, version numbers,#/@prefixes, globs,..paths) to the existingvalidateFileReferences()in the same file. Could you refactor to share the common logic? -
O(n²) loop in
pathReferenceResolvesInProject— For each ref, you iterate allprojectFilesandprojectDirscheckingendsWith. On large projects this could get slow. A normalizedSetlookup (e.g. storing basename → full path mappings) would bring this to O(1) per ref. -
fix.datashape change — The data went from{ currentDensity, currentRefs, lines }to 6 new fields. Can you verify this doesn't breakscore-refine.tswhich consumes fix data to generate refinement prompts? -
PR title — This is a
feat, not afix— the weighted density scoring is new behavior.
Looking forward to the next iteration!
|
Thanks for the effort @sachinsharma3191! This was a good catch at the time. However, this has been superseded by #173 (merged Apr 17) which added a Closing as superseded — appreciate the contribution! |
Summary
buildGeneratePromptnow clamps the effective token budget to 120k (in addition togetMaxPromptTokens()), so large-context models (e.g. GPT-4.1) no longer inflate the code-analysis section past whatgenerate.test.tsand typical provider limits expect.maxCharsForCodeFileContent; we no longer always inline the first file when it would exceed the budget.src/scoring/reference-weight.ts.Testing
npm run test(full suite green).Closes / relates to: #59 (phase 1 heuristic weighting). Generate prompt behavior fixes the previously failing
buildGeneratePromptsize tests without changinggetMaxPromptTokens()semantics for other call sites.