
fix(ai): cap generate prompt at 120k; tiered reference density (#59 phase 1) #88

Closed
sachinsharma3191 wants to merge 5 commits into caliber-ai-org:master from sachinsharma3191:fix/generate-prompt-cap-grounding-weights

@sachinsharma3191
Contributor

Summary

  • Generate prompt budget: buildGeneratePrompt now clamps the effective token budget to 120k (in addition to getMaxPromptTokens()), so large-context models (e.g. GPT-4.1) no longer inflate the code-analysis section past what generate.test.ts and typical provider limits expect.
  • Code file loop: Each project file is sized with maxCharsForCodeFileContent; we no longer always inline the first file when it would exceed the budget.
  • Grounding (#59, "Proposal: Smarter Grounding Checks (Heuristic Weighting & Critical Context Detection)", phase 1): Path-like references are weighted by architectural importance (tier 1–3) for reference density scoring; see src/scoring/reference-weight.ts.

Testing

npm run test (full suite green).

Closes / relates to: #59 (phase 1 heuristic weighting). Generate prompt behavior fixes the previously failing buildGeneratePrompt size tests without changing getMaxPromptTokens() semantics for other call sites.

- Clamp buildGeneratePrompt budget with BUILD_GENERATE_PROMPT_MAX_TOKENS so large-context models cannot inflate code-analysis payload
- Size each code file with maxCharsForCodeFileContent; skip overflow instead of always inlining the first file
- feat(scoring): tiered architectural weights for reference density (caliber-ai-org#59 phase 1)
…ts (resolves PR 88 comments)

- Replaces hardcoded tier lists with dynamic filesystem resolution
- Weights resolved paths double to reward correct project grounding
- Fixes case-sensitivity bugs by relying on raw path resolution instead of lowercased basenames
- Fixes dead entries bug where ignored lockfiles wouldn't be weighted
- Fixes misleading detail string to correctly report ratio components
- Adds tests for dynamic reference resolution
Contributor

@alonp98 alonp98 left a comment


Thanks for the contribution @sachinsharma3191! This is a substantial PR — here's a detailed review.

General: Two unrelated changes in one PR

This bundles a generate prompt cap and a grounding density rewrite. These should be separate PRs — they touch different subsystems and have independent risk profiles.


Part 1: Generate prompt cap at 120k (src/ai/generate.ts)

The motivation is valid — large-context models can inflate the code section beyond what's useful. However:

  1. Arbitrary magic number: 120_000 is hardcoded with no rationale or configurability. Why 120k and not 100k or 150k?

  2. Changed skip-vs-stop semantics — The original loop does break when a file exceeds budget. The new code does continue (skips oversized files), which means smaller files later in the sorted list get included while larger, potentially more important files get silently dropped. This is a behavioral change that could degrade output quality — the sort order exists for a reason.

  3. maxCharsForCodeFileContent uses * 4 approximation — This is already used elsewhere so it's consistent, but introducing another layer of char↔token conversion compounds the approximation error.
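
To make the skip-vs-stop concern in item 2 concrete, here is a minimal sketch. It is not the code from src/ai/generate.ts; `packFiles`, `CodeFile`, and the file names are hypothetical, and the budget is counted in characters for simplicity. It shows how the same priority-sorted list yields different selections under the two policies.

```typescript
interface CodeFile {
  path: string;
  content: string;
}

// Hypothetical helper: walk files in priority order and include them
// until the character budget is exhausted.
function packFiles(
  files: CodeFile[],
  budgetChars: number,
  onOverflow: "break" | "continue",
): string[] {
  const included: string[] = [];
  let used = 0;
  for (const file of files) {
    if (used + file.content.length > budgetChars) {
      // "break": stop at the first file that does not fit (original behavior
      // as described in the review).
      if (onOverflow === "break") break;
      // "continue": skip this file and keep trying lower-priority files.
      continue;
    }
    included.push(file.path);
    used += file.content.length;
  }
  return included;
}

// Files are assumed to be sorted by importance already.
const sortedFiles: CodeFile[] = [
  { path: "src/core.ts", content: "x".repeat(60) },
  { path: "src/big.ts", content: "x".repeat(80) },
  { path: "src/small.ts", content: "x".repeat(30) },
];

const withBreak = packFiles(sortedFiles, 100, "break");
const withContinue = packFiles(sortedFiles, 100, "continue");
console.log(withBreak);    // only src/core.ts
console.log(withContinue); // src/core.ts and src/small.ts; src/big.ts is dropped
```

With `continue`, the lower-priority `src/small.ts` is included while the higher-priority `src/big.ts` is silently dropped, which is the trade-off the review asks the author to justify.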


Part 2: Weighted reference density (src/scoring/)

What's good

  • Respects the "no hardcoded mappings" principle — weights by resolution, not filename importance. The constants comment explicitly notes this.
  • pathReferenceResolvesInProject is stack-agnostic.
  • Tests are well-structured and properly scoped (unlike some other PRs).

Issues

  1. Code duplication with validateFileReferences() (lines 306-336 in utils.ts) — Both functions do nearly identical work: filter out URLs, version numbers, #/@ prefixes, globs, .. paths, then existsSync. The new function should reuse or extend the existing one rather than duplicating the logic.

  2. O(refs × entries) double loop: pathReferenceResolvesInProject iterates over all projectFiles and projectDirs checking endsWith. On large projects this could be slow. A normalized Set lookup would be more efficient.

  3. fix.data shape changed: currentDensity → densityPercent, currentRefs removed, new fields added. This is a breaking change for anything consuming fix data downstream (e.g. score-refine.ts uses fix data to generate refinement prompts).

  4. PR title says fix but this is feat — New scoring behavior is a feature, not a bug fix.
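
One way to address the duplication in item 1 is to pull the common filtering into a single predicate that both functions call before touching the filesystem. The sketch below is an assumption about how that could look: `isCheckablePathReference` is a hypothetical name, and the regular expressions are illustrative guesses, not the patterns actually used in utils.ts.

```typescript
// Hypothetical shared predicate: returns true only for references that are
// worth checking against the project tree. The exact patterns in the real
// code may differ; these are illustrative.
function isCheckablePathReference(ref: string): boolean {
  // URLs such as https://example.com/file.ts
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(ref)) return false;
  // Version numbers such as 1.2.3 or v18.0
  if (/^v?\d+(\.\d+)+$/.test(ref)) return false;
  // Anchors and scoped package names (#section, @scope/pkg)
  if (ref.startsWith("#") || ref.startsWith("@")) return false;
  // Glob patterns
  if (/[*?]/.test(ref)) return false;
  // Paths that climb out of the project
  if (ref.split("/").includes("..")) return false;
  return true;
}

const candidates = [
  "https://example.com/a.ts",
  "1.2.3",
  "@scope/pkg",
  "src/**/*.ts",
  "../secret.txt",
  "src/scoring/utils.ts",
];
const checkable = candidates.filter(isCheckablePathReference);
console.log(checkable);
```

Both validateFileReferences() and pathReferenceResolvesInProject could then apply this predicate first and differ only in what they do with the surviving references.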


Recommendation

Split this into two PRs:

  1. Generate prompt cap — with rationale for the 120k number and preserving the original break-on-budget semantics (or explaining why skip-and-continue is better)
  2. Resolution-weighted density — refactored to reuse validateFileReferences, with the fix.data shape change verified against score-refine.ts

@alonp98
Contributor

alonp98 commented Apr 13, 2026

Hey @sachinsharma3191, just checking in on this. Are you still planning to address the review feedback? Let me know if you need any help.

@sachinsharma3191
Contributor Author

@alonp98 The changes were implemented 2 weeks ago; please review them.

@sachinsharma3191
Contributor Author

Implemented changes (for reviewers)

Generate prompt (buildGeneratePrompt)

  • Effective budget is min(getMaxPromptTokens(), 120_000) so large-context models cannot inflate the code-analysis section past the cap expected by tests and typical Claude-class usage.
  • Code files are sized with maxCharsForCodeFileContent; the loop no longer always inlines the first file when it would exceed the budget.
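
As a rough illustration of the two bullets above: the constant name BUILD_GENERATE_PROMPT_MAX_TOKENS comes from the commit notes, but everything else in this sketch is assumed. In particular, the real signature of maxCharsForCodeFileContent is not shown in this thread, so the helper below is a differently named stand-in that uses the 4-characters-per-token approximation mentioned in the review.

```typescript
const BUILD_GENERATE_PROMPT_MAX_TOKENS = 120_000;
const CHARS_PER_TOKEN = 4; // coarse approximation, not a real tokenizer

// Clamp whatever the model advertises to the generate-prompt cap.
function effectiveTokenBudget(modelMaxPromptTokens: number): number {
  return Math.min(modelMaxPromptTokens, BUILD_GENERATE_PROMPT_MAX_TOKENS);
}

// Hypothetical stand-in for maxCharsForCodeFileContent: how many characters
// of file content fit once the per-file header has been accounted for.
function maxCharsForCodeFileContentSketch(
  remainingTokens: number,
  headerChars: number,
): number {
  return Math.max(0, remainingTokens * CHARS_PER_TOKEN - headerChars);
}

console.log(effectiveTokenBudget(1_000_000)); // clamped to the cap
console.log(effectiveTokenBudget(32_000));    // smaller models are unaffected
console.log(maxCharsForCodeFileContentSketch(100, 50));
```

Because the cap is applied only inside this calculation, other callers of getMaxPromptTokens() would keep seeing the unclamped value, which matches the stated intent of the change.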

Reference density / #59 (aligned with PR #79 direction)

  • No hardcoded filename tier lists. Path-like refs are weighted by whether they resolve to this repo (scanned project files/dirs + on-disk check): pathReferenceResolvesInProject / sumPathReferenceDensityWeights in src/scoring/utils.ts.
  • reference_density detail spells out reference units, path-like count, how many resolve, inline marks, and line-reference density so the percentage is not misleading.
  • Tests: path-reference-resolution.test.ts; grounding integration test for resolved vs unresolved paths; POINTS_REFERENCE_DENSITY comment in constants.ts.
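
The weighting idea can be sketched as follows. The commit notes say resolved paths are weighted double; beyond that, the function name, return shape, and the resolver callback here are assumptions for illustration and are not taken from src/scoring/utils.ts.

```typescript
const RESOLVED_WEIGHT = 2;   // per the commit note: resolved paths count double
const UNRESOLVED_WEIGHT = 1; // assumed baseline for path-like refs that do not resolve

interface PathWeightSummary {
  pathLikeCount: number;
  resolvedCount: number;
  weightedUnits: number;
}

// Hypothetical sketch of a weight sum: the resolver is injected so the
// function stays independent of how resolution is actually performed.
function sumPathWeightsSketch(
  refs: string[],
  resolves: (ref: string) => boolean,
): PathWeightSummary {
  let resolvedCount = 0;
  let weightedUnits = 0;
  for (const ref of refs) {
    if (resolves(ref)) {
      resolvedCount += 1;
      weightedUnits += RESOLVED_WEIGHT;
    } else {
      weightedUnits += UNRESOLVED_WEIGHT;
    }
  }
  return { pathLikeCount: refs.length, resolvedCount, weightedUnits };
}

const knownPaths = new Set(["src/a.ts", "src/b.ts"]);
const summary = sumPathWeightsSketch(
  ["src/a.ts", "docs/missing.md", "src/b.ts"],
  (ref) => knownPaths.has(ref),
);
console.log(summary);
```

Returning the counts alongside the weighted total is what lets a detail string report the ratio components separately instead of a single percentage.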

Please re-review when you have time.

Contributor

@alonp98 alonp98 left a comment


Hey @sachinsharma3191, thanks for the update! The scoring approach is solid — stack-agnostic resolution with no hardcoded tiers is exactly right.

A few things to address before this can land:

  1. Rebase needed — The generate.ts changes (120k cap) are now included in #80 which was just approved. Once it merges, this PR will conflict. Please rebase on master and drop the generate.ts changes from this PR so only the scoring/grounding work remains.

  2. Code duplication with validateFileReferences(): pathReferenceResolvesInProject does very similar filtering (URLs, version numbers, #/@ prefixes, globs, .. paths) to the existing validateFileReferences() in the same file. Could you refactor to share the common logic?

  3. O(n²) loop in pathReferenceResolvesInProject — For each ref, you iterate all projectFiles and projectDirs checking endsWith. On large projects this could get slow. A normalized Set lookup (e.g. storing basename → full path mappings) would bring this to O(1) per ref.

  4. fix.data shape change — The data went from { currentDensity, currentRefs, lines } to 6 new fields. Can you verify this doesn't break score-refine.ts which consumes fix data to generate refinement prompts?

  5. PR title — This is a feat, not a fix — the weighted density scoring is new behavior.
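
For item 3, one possible shape of the Set-based lookup is sketched below. It is an assumption, not the project's code: `buildSuffixIndex` and `resolvesInProject` are hypothetical names, and the index stores every suffix that starts at a path-segment boundary. That makes it slightly stricter than a raw endsWith check (for example, `ils.ts` no longer matches `utils.ts`). Matching stays case-sensitive.

```typescript
// Normalize separators, a leading "./", and trailing slashes.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "").replace(/\/+$/, "");
}

// Build the index once per scoring run: every segment-aligned suffix of
// every project file or directory path goes into a Set.
function buildSuffixIndex(projectPaths: string[]): Set<string> {
  const index = new Set<string>();
  for (const raw of projectPaths) {
    const segments = normalizePath(raw).split("/").filter(Boolean);
    for (let i = 0; i < segments.length; i++) {
      index.add(segments.slice(i).join("/"));
    }
  }
  return index;
}

// Each reference is then a single Set lookup instead of a scan.
function resolvesInProject(ref: string, index: Set<string>): boolean {
  return index.has(normalizePath(ref));
}

const suffixIndex = buildSuffixIndex(["src/scoring/utils.ts", "src/ai"]);
console.log(resolvesInProject("scoring/utils.ts", suffixIndex));
console.log(resolvesInProject("./src/ai/", suffixIndex));
console.log(resolvesInProject("Utils.ts", suffixIndex));
```

The index costs memory proportional to the total number of path segments, which is usually a reasonable trade for avoiding a scan of all files and directories per reference.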

Looking forward to the next iteration!

@alonp98
Contributor

alonp98 commented Apr 25, 2026

Thanks for the effort @sachinsharma3191! This was a good catch at the time.

However, this has been superseded by #173 (merged Apr 17) which added a MAX_EXISTING_DOCS_CHARS budget with proportional truncation in src/ai/refresh.ts. The prompt-too-long issue is now handled differently.
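
For readers unfamiliar with the idea, proportional truncation under a shared character budget can be sketched roughly like this. This is not the code from #173 or src/ai/refresh.ts; the function name, the `Doc` type, and the budget value used in the example are all hypothetical.

```typescript
interface Doc {
  name: string;
  text: string;
}

// Hypothetical sketch: if the combined length exceeds the budget, shrink
// every document by the same ratio so each keeps a proportional share.
function truncateProportionally(docs: Doc[], maxTotalChars: number): Doc[] {
  const total = docs.reduce((sum, d) => sum + d.text.length, 0);
  if (total <= maxTotalChars) return docs.map((d) => ({ ...d }));
  const ratio = maxTotalChars / total;
  return docs.map((d) => ({
    name: d.name,
    // Floor so the combined length never exceeds the budget.
    text: d.text.slice(0, Math.floor(d.text.length * ratio)),
  }));
}

const existingDocs: Doc[] = [
  { name: "README.md", text: "a".repeat(300) },
  { name: "CONTRIBUTING.md", text: "b".repeat(100) },
];
// 200 is an arbitrary example budget, not the real MAX_EXISTING_DOCS_CHARS.
const trimmed = truncateProportionally(existingDocs, 200);
console.log(trimmed.map((d) => `${d.name}: ${d.text.length}`));
```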

Closing as superseded — appreciate the contribution!

@alonp98 alonp98 closed this Apr 25, 2026

2 participants