fix(ai): cap generate prompt at 120k; tiered reference density (#59 phase 1) by sachinsharma3191 · Pull Request #88 · caliber-ai-org/ai-setup

sachinsharma3191 · 2026-03-25T06:03:51Z

Summary

Generate prompt budget: buildGeneratePrompt now clamps the effective token budget to 120k (in addition to getMaxPromptTokens()), so large-context models (e.g. GPT-4.1) no longer inflate the code-analysis section past what generate.test.ts and typical provider limits expect.
Code file loop: Each project file is sized with maxCharsForCodeFileContent; we no longer always inline the first file when it would exceed the budget.
Grounding (#59 phase 1, "Proposal: Smarter Grounding Checks (Heuristic Weighting & Critical Context Detection)"): path-like references are weighted by architectural importance (tiers 1–3) for reference-density scoring; see src/scoring/reference-weight.ts.
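A minimal sketch of the clamp and per-file sizing described above. It assumes the model budget (whatever `getMaxPromptTokens()` returns) is passed in as a plain number, and that `maxCharsForCodeFileContent` converts the remaining token budget to characters at roughly 4 characters per token, as the review below notes. `clampGenerateBudget` is a hypothetical helper, and neither body is the actual code in `src/ai/generate.ts`.

```typescript
// Cap named in the PR commit message; 120k per the PR title.
const BUILD_GENERATE_PROMPT_MAX_TOKENS = 120_000;

// Hypothetical helper: clamp the model-reported budget so a very large
// context window (e.g. 1M tokens) cannot inflate the code-analysis section.
function clampGenerateBudget(modelMaxTokens: number): number {
  return Math.min(modelMaxTokens, BUILD_GENERATE_PROMPT_MAX_TOKENS);
}

// Hypothetical version of maxCharsForCodeFileContent: converts the remaining
// token budget to a character budget using the ~4 chars/token heuristic.
function maxCharsForCodeFileContent(remainingTokens: number): number {
  return Math.max(0, Math.floor(remainingTokens * 4));
}

console.log(clampGenerateBudget(1_000_000)); // 120000
console.log(clampGenerateBudget(64_000)); // 64000
console.log(maxCharsForCodeFileContent(2_500)); // 10000
```

With this shape, smaller-context models keep their own limit and only larger ones are capped, which matches the PR's claim that getMaxPromptTokens() semantics are unchanged for other call sites.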

Testing

npm run test (full suite green).

Closes / relates to: #59 (phase 1 heuristic weighting). Generate prompt behavior fixes the previously failing buildGeneratePrompt size tests without changing getMaxPromptTokens() semantics for other call sites.

- Clamp buildGeneratePrompt budget with BUILD_GENERATE_PROMPT_MAX_TOKENS so large-context models cannot inflate code-analysis payload
- Size each code file with maxCharsForCodeFileContent; skip overflow instead of always inlining the first file
- feat(scoring): tiered architectural weights for reference density (caliber-ai-org#59 phase 1)

…ts (resolves PR 88 comments)
- Replaces hardcoded tier lists with dynamic filesystem resolution
- Weights resolved paths double to reward correct project grounding
- Fixes case-sensitivity bugs by relying on raw path resolution instead of lowercased basenames
- Fixes dead entries bug where ignored lockfiles wouldn't be weighted
- Fixes misleading detail string to correctly report ratio components
- Adds tests for dynamic reference resolution

alonp98

Thanks for the contribution @sachinsharma3191! This is a substantial PR — here's a detailed review.

General: Two unrelated changes in one PR

This bundles a generate prompt cap and a grounding density rewrite. These should be separate PRs — they touch different subsystems and have independent risk profiles.

Part 1: Generate prompt cap at 120k (`src/ai/generate.ts`)

The motivation is valid — large-context models can inflate the code section beyond what's useful. However:

Arbitrary magic number: 120_000 is hardcoded with no rationale or configurability. Why 120k and not 100k or 150k?
Changed skip-vs-stop semantics: the original loop uses `break` when a file exceeds the budget. The new code uses `continue` (skipping oversized files), so smaller files later in the sorted list get included while larger, potentially more important files are silently dropped. This is a behavioral change that could degrade output quality; the sort order exists for a reason.
maxCharsForCodeFileContent uses a `* 4` chars-per-token approximation: this is consistent with existing usage elsewhere, but introducing another layer of char↔token conversion compounds the approximation error.
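To make the skip-vs-stop difference concrete, here is a hypothetical sketch. The `CodeFile` shape, function names and file sizes are invented for illustration, and it assumes the files arrive already sorted by priority, as the review describes.

```typescript
interface CodeFile {
  path: string;
  chars: number; // length of the file content in characters
}

// Stop-on-overflow (the original semantics per the review): the first file
// that does not fit ends the loop, so later files never displace it.
function selectStopOnOverflow(sorted: CodeFile[], budgetChars: number): string[] {
  const picked: string[] = [];
  let used = 0;
  for (const f of sorted) {
    if (used + f.chars > budgetChars) break;
    picked.push(f.path);
    used += f.chars;
  }
  return picked;
}

// Skip-on-overflow (the PR's semantics): oversized files are skipped and the
// scan continues, so smaller, lower-priority files can fill the leftover space.
function selectSkipOnOverflow(sorted: CodeFile[], budgetChars: number): string[] {
  const picked: string[] = [];
  let used = 0;
  for (const f of sorted) {
    if (used + f.chars > budgetChars) continue;
    picked.push(f.path);
    used += f.chars;
  }
  return picked;
}

const files: CodeFile[] = [
  { path: "src/core.ts", chars: 6_000 },
  { path: "src/big-module.ts", chars: 9_000 },
  { path: "src/helpers.ts", chars: 2_000 },
];
console.log(selectStopOnOverflow(files, 10_000)); // ["src/core.ts"]
console.log(selectSkipOnOverflow(files, 10_000)); // ["src/core.ts", "src/helpers.ts"]
```

Both variants respect the budget; the difference is which files represent the project once the higher-priority `src/big-module.ts` no longer fits.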

Part 2: Weighted reference density (`src/scoring/`)

What's good

Respects the "no hardcoded mappings" principle — weights by resolution, not filename importance. The constants comment explicitly notes this.
pathReferenceResolvesInProject is stack-agnostic.
Tests are well-structured and properly scoped (unlike some other PRs).

Issues

Code duplication with validateFileReferences() (lines 306-336 in utils.ts) — Both functions do nearly identical work: filter out URLs, version numbers, #/@ prefixes, globs, .. paths, then existsSync. The new function should reuse or extend the existing one rather than duplicating the logic.
O(refs × entries) double-loop — pathReferenceResolvesInProject iterates over all projectFiles and projectDirs checking endsWith. On large projects this could be slow. A normalized Set lookup would be more efficient.
fix.data shape changed — currentDensity → densityPercent, currentRefs removed, new fields added. This is a breaking change for anything consuming fix data downstream (e.g. score-refine.ts uses fix data to generate refinement prompts).
PR title says fix but this is feat — New scoring behavior is a feature, not a bug fix.
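One way to remove the O(refs × entries) scan is to precompute a set once and then do a single lookup per reference. The sketch below is a variant of the reviewer's suggestion: it indexes every segment-aligned suffix of each project path, so `scoring/utils.ts` matches `src/scoring/utils.ts`. All names are hypothetical, the normalization rules are assumptions, and the on-disk existsSync fallback is left out. Unlike a raw string `endsWith`, it only matches whole path segments (`utils.ts` does not match `myutils.ts`), and it stays case-sensitive.

```typescript
// Normalize separators and strip leading "./" or "/" and trailing "/" (assumed conventions).
function normalizeRef(ref: string): string {
  return ref.replace(/\\/g, "/").replace(/^(\.\/|\/)+/, "").replace(/\/+$/, "");
}

// Build every segment-aligned suffix of each project path once, e.g.
// "src/scoring/utils.ts" -> "src/scoring/utils.ts", "scoring/utils.ts", "utils.ts".
function buildSuffixIndex(projectPaths: Iterable<string>): Set<string> {
  const index = new Set<string>();
  for (const raw of projectPaths) {
    const parts = normalizeRef(raw).split("/").filter((p) => p.length > 0);
    for (let i = 0; i < parts.length; i++) {
      index.add(parts.slice(i).join("/"));
    }
  }
  return index;
}

// Average O(1) lookup per reference after the one-time index build.
function resolvesInProject(ref: string, index: Set<string>): boolean {
  return index.has(normalizeRef(ref));
}

const index = buildSuffixIndex(["src/scoring/utils.ts", "src/ai/generate.ts", "src/scoring"]);
console.log(resolvesInProject("scoring/utils.ts", index)); // true
console.log(resolvesInProject("./src/ai/generate.ts", index)); // true
console.log(resolvesInProject("utils.js", index)); // false
```

The build cost grows with total path depth, which is paid once per scoring run instead of once per reference.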

Recommendation

Split this into two PRs:

Generate prompt cap — with rationale for the 120k number and preserving the original break-on-budget semantics (or explaining why skip-and-continue is better)
Resolution-weighted density — refactored to reuse validateFileReferences, with the fix.data shape change verified against score-refine.ts
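As an illustration of the refactor suggested above, both validateFileReferences() and pathReferenceResolvesInProject() could call one shared predicate and differ only in the final check (existsSync versus a project-index lookup). The helper name and the exact regexes below are assumptions that mirror the filter list in the review, not code from `utils.ts`.

```typescript
// Hypothetical shared predicate: decides whether a string is worth checking
// as a file path at all. Rules follow the review's list of filters.
function isCandidatePathReference(ref: string): boolean {
  const r = ref.trim();
  if (r.length === 0) return false;
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(r)) return false; // URLs (http://, https://, ...)
  if (/^v?\d+(\.\d+)+$/.test(r)) return false; // version numbers like 1.2.3 or v2.0
  if (r.startsWith("#") || r.startsWith("@")) return false; // #/@ prefixes
  if (/[*?]/.test(r)) return false; // glob patterns
  if (r.split(/[\\/]/).includes("..")) return false; // ".." parent-directory paths
  return true;
}

const refs = [
  "src/scoring/utils.ts",
  "https://example.com/a.ts",
  "1.2.3",
  "@types/node",
  "src/**/*.ts",
  "../outside.ts",
];
console.log(refs.filter(isCandidatePathReference)); // ["src/scoring/utils.ts"]
```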

alonp98 · 2026-04-13T06:47:19Z

Hey @sachinsharma3191, just checking in on this. Are you still planning to address the review feedback? Let me know if you need any help.

sachinsharma3191 · 2026-04-13T18:02:17Z

@alonp98 The changes were implemented two weeks ago; please review them.

sachinsharma3191 · 2026-04-13T18:22:50Z

Implemented changes (for reviewers)

Generate prompt (buildGeneratePrompt)

Effective budget is min(getMaxPromptTokens(), 120_000) so large-context models cannot inflate the code-analysis section past the cap expected by tests and typical Claude-class usage.
Code files are sized with maxCharsForCodeFileContent; the loop no longer always inlines the first file when it would exceed the budget.

Reference density / #59 (aligned with PR #79 direction)

No hardcoded filename tier lists. Path-like refs are weighted by whether they resolve to this repo (scanned project files/dirs + on-disk check): pathReferenceResolvesInProject / sumPathReferenceDensityWeights in src/scoring/utils.ts.
reference_density detail spells out reference units, path-like count, how many resolve, inline marks, and line-reference density so the percentage is not misleading.
Tests: path-reference-resolution.test.ts; grounding integration test for resolved vs unresolved paths; POINTS_REFERENCE_DENSITY comment in constants.ts.
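A hypothetical sketch of resolution-based weighting. The 2:1 ratio comes from the commit note "Weights resolved paths double"; the constant names, the function signature and the injected `resolves` check are assumptions, and how the weighted total feeds into the final density percentage is not shown.

```typescript
// Assumed weights: resolved path references count double (per the commit notes).
const RESOLVED_PATH_WEIGHT = 2;
const UNRESOLVED_PATH_WEIGHT = 1;

// Sum density weights for path-like references. `resolves` stands in for the
// project check (scanned files/dirs plus an on-disk fallback).
function sumPathReferenceWeights(
  pathRefs: string[],
  resolves: (ref: string) => boolean,
): { weight: number; resolved: number } {
  let weight = 0;
  let resolved = 0;
  for (const ref of pathRefs) {
    if (resolves(ref)) {
      resolved++;
      weight += RESOLVED_PATH_WEIGHT;
    } else {
      weight += UNRESOLVED_PATH_WEIGHT;
    }
  }
  return { weight, resolved };
}

const known = new Set(["src/scoring/utils.ts", "src/ai/generate.ts"]);
const result = sumPathReferenceWeights(
  ["src/scoring/utils.ts", "src/ai/generate.ts", "src/legacy/old.ts"],
  (r) => known.has(r),
);
console.log(result); // { weight: 5, resolved: 2 }
```

Returning the resolved count alongside the weight is what lets the detail string report how many path-like references actually resolve.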

Please re-review when you have time.

alonp98

Hey @sachinsharma3191, thanks for the update! The scoring approach is solid — stack-agnostic resolution with no hardcoded tiers is exactly right.

A few things to address before this can land:

Rebase needed — The generate.ts changes (120k cap) are now included in #80 which was just approved. Once it merges, this PR will conflict. Please rebase on master and drop the generate.ts changes from this PR so only the scoring/grounding work remains.
Code duplication with validateFileReferences() — pathReferenceResolvesInProject does very similar filtering (URLs, version numbers, #/@ prefixes, globs, .. paths) to the existing validateFileReferences() in the same file. Could you refactor to share the common logic?
O(n²) loop in pathReferenceResolvesInProject — For each ref, you iterate all projectFiles and projectDirs checking endsWith. On large projects this could get slow. A normalized Set lookup (e.g. storing basename → full path mappings) would bring this to O(1) per ref.
fix.data shape change — The data went from { currentDensity, currentRefs, lines } to 6 new fields. Can you verify this doesn't break score-refine.ts which consumes fix data to generate refinement prompts?
PR title — This is a feat, not a fix — the weighted density scoring is new behavior.
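For the score-refine.ts concern, one low-risk option is to keep the legacy keys alongside the new ones so existing consumers keep reading the same fields. In this hypothetical sketch only `currentDensity`, `currentRefs`, `lines` and `densityPercent` come from the thread; `pathLikeRefs`, `resolvedPathRefs`, `referenceUnits` and `buildDensityFixData` are illustrative placeholders, not the PR's actual six fields. It also assumes `currentDensity` and `densityPercent` carry the same value.

```typescript
// Legacy fix.data shape named in the review.
interface LegacyDensityFixData {
  currentDensity: number;
  currentRefs: number;
  lines: number;
}

// Hypothetical extended shape: legacy keys stay, new keys are added.
interface WeightedDensityFixData extends LegacyDensityFixData {
  densityPercent: number;
  pathLikeRefs: number; // placeholder name
  resolvedPathRefs: number; // placeholder name
}

function buildDensityFixData(args: {
  densityPercent: number;
  referenceUnits: number;
  lines: number;
  pathLikeRefs: number;
  resolvedPathRefs: number;
}): WeightedDensityFixData {
  return {
    // Legacy keys kept so existing refinement-prompt code can still read them.
    currentDensity: args.densityPercent,
    currentRefs: args.referenceUnits,
    lines: args.lines,
    // New keys for the weighted detail.
    densityPercent: args.densityPercent,
    pathLikeRefs: args.pathLikeRefs,
    resolvedPathRefs: args.resolvedPathRefs,
  };
}

console.log(
  buildDensityFixData({ densityPercent: 12.5, referenceUnits: 25, lines: 200, pathLikeRefs: 10, resolvedPathRefs: 7 }),
);
```

Because the extended interface is a superset of the legacy one, code typed against the old shape still compiles against the new object.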

Looking forward to the next iteration!

alonp98 · 2026-04-25T08:01:05Z

Thanks for the effort @sachinsharma3191! This was a good catch at the time.

However, this has been superseded by #173 (merged Apr 17) which added a MAX_EXISTING_DOCS_CHARS budget with proportional truncation in src/ai/refresh.ts. The prompt-too-long issue is now handled differently.
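For context, "proportional truncation" under a character budget can be sketched as below. This is not the code from #173: the function name, the docs map and the budget value are hypothetical, and the real MAX_EXISTING_DOCS_CHARS value is not stated in this thread.

```typescript
// Hypothetical sketch: when the combined docs exceed the budget, each doc keeps
// a slice of the budget proportional to its share of the total length.
function truncateProportionally(
  docs: Record<string, string>,
  maxTotalChars: number,
): Record<string, string> {
  const entries = Object.entries(docs);
  const total = entries.reduce((sum, [, text]) => sum + text.length, 0);
  if (total <= maxTotalChars) return { ...docs }; // under budget: nothing to cut
  const out: Record<string, string> = {};
  for (const [name, text] of entries) {
    // Integer math plus floor keeps the sum of all shares <= maxTotalChars.
    const share = Math.floor((text.length * maxTotalChars) / total);
    out[name] = text.slice(0, share);
  }
  return out;
}

const out = truncateProportionally({ "CLAUDE.md": "a".repeat(300), "README.md": "b".repeat(100) }, 200);
console.log(out["CLAUDE.md"].length, out["README.md"].length); // 150 50
```

Compared with dropping whole files, every existing doc keeps some representation, with larger docs absorbing more of the cut.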

Closing as superseded — appreciate the contribution!

sachinsharma3191 added 3 commits March 24, 2026 23:03

test(scoring): add explicit test cases for PR 59 bug fixes

52a4f3e

alonp98 requested changes Mar 28, 2026

Merge branch 'master' into fix/generate-prompt-cap-grounding-weights

103c61f

sachinsharma3191 requested a review from alonp98 April 13, 2026 18:02

alonp98 reviewed Apr 16, 2026

Merge branch 'master' into fix/generate-prompt-cap-grounding-weights

e31827a

sachinsharma3191 requested a review from alonp98 April 24, 2026 19:13

alonp98 closed this Apr 25, 2026

sachinsharma3191 wants to merge 5 commits into caliber-ai-org:master from sachinsharma3191:fix/generate-prompt-cap-grounding-weights
