feat(core): /llms.txt and /llms-full.txt PoC #3338
Draft
vmarcosp wants to merge 1 commit into
Adds two feature-flagged routes that expose curated, LLM-friendly content for the storefront:

- `/llms.txt`: brand header + top categories (via GraphQL `allCollections`) + CMS-discovered pages + optional custom sections + sitemap/full-content cross-link.
- `/llms-full.txt`: inlines SEO description and whitelisted CMS section text (`BannerText`, `Hero` by default; extensible per store) with per-page (8KB) and total-file (500KB) caps.

Both routes are disabled by default (`discovery.config.llms.enabled = false`) and ship with sensible defaults, so a store can opt in with minimal config.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
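The following TypeScript sketch shows what extracting "whitelisted CMS section text" could look like in practice. It is hypothetical. The `{ name, data }` section shape, `extractSectionText`, and the constant names are assumptions for illustration, not the code in `extract.ts`. Only the default section names (`BannerText`, `Hero`) and the field names (`title`, `subtitle`, `caption`, `text`, `body`, `description`) come from this PR.

```typescript
// Hypothetical sketch of a whitelist-based CMS section-text extractor.
// Assumption: each CMS section arrives as { name, data } with arbitrary fields.

interface CmsSection {
  name: string;
  data?: Record<string, unknown>;
}

// Text-bearing fields, in the order they are emitted.
const TEXT_FIELDS = ["title", "subtitle", "caption", "text", "body", "description"] as const;

// Default whitelist; a store could pass a longer list to extend it.
const DEFAULT_TEXT_SECTIONS = ["BannerText", "Hero"];

function extractSectionText(
  sections: CmsSection[],
  allowedSections: string[] = DEFAULT_TEXT_SECTIONS,
): string[] {
  const allowed = new Set(allowedSections);
  const lines: string[] = [];
  for (const section of sections) {
    // Sections outside the whitelist are skipped entirely.
    if (!allowed.has(section.name)) continue;
    const data = section.data ?? {};
    for (const field of TEXT_FIELDS) {
      const value = data[field];
      // Keep only non-empty strings; other fields and value types are ignored.
      if (typeof value === "string" && value.trim() !== "") {
        lines.push(value.trim());
      }
    }
  }
  return lines;
}
```

Because the extractor is a pure function of its input, it is easy to unit-test. In this sketch, per-store extensibility reduces to passing a longer allow list.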
What's the purpose of this pull request?
`llms.txt` is a 2024 proposal for a curated Markdown index at `/llms.txt`, scoped so that an LLM with a small context window can ingest a site's high-signal content. This PR explores what a FastStore-native implementation looks like: two feature-flagged Next.js routes that expose curated, LLM-friendly content for the storefront.

Both routes ship behind a single flag:
- `/llms.txt`: brand header + top categories (live from GraphQL `allCollections`) + CMS-discovered pages + optional custom sections + sitemap and full-content cross-links.
- `/llms-full.txt`: inlines SEO descriptions and whitelisted CMS section text (`BannerText`, `Hero` by default; extensible per store) so an LLM can answer questions, not just navigate. Per-page (8KB) and total-file (500KB) caps keep a single oversized page from blowing up the file.

Both routes are disabled by default (`discovery.config.llms.enabled = false`) and ship with sensible defaults, so a store can opt in with minimal config.

How it works?
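The per-page and total-file caps described above could be implemented roughly as follows. This sketch is not `full.ts`, and it rests on several assumptions. `composeWithCaps`, `capPage`, and `byteLength` are hypothetical names. Sizes are measured in UTF-8 bytes, with "8KB" read as 8 × 1024 bytes. Page blocks are assumed to be joined with a blank line.

```typescript
// Hypothetical sketch of size caps for /llms-full.txt.
// The 8KB / 500KB limits come from the PR; names and strategy are assumptions.

const PER_PAGE_CAP_BYTES = 8 * 1024;
const TOTAL_CAP_BYTES = 500 * 1024;
const SEPARATOR = "\n\n";

const encoder = new TextEncoder();

function byteLength(text: string): number {
  return encoder.encode(text).length;
}

// Trim one page block to at most maxBytes UTF-8 bytes.
// Iterating with for...of walks code points, so multi-byte characters are never split.
function capPage(block: string, maxBytes: number): string {
  if (byteLength(block) <= maxBytes) return block;
  let out = "";
  let used = 0;
  for (const ch of block) {
    const size = byteLength(ch);
    if (used + size > maxBytes) break;
    out += ch;
    used += size;
  }
  return out;
}

// Append page blocks after the header until the next one would exceed the total cap.
// Page-boundary truncation: each page is either included whole (after its per-page cap) or dropped.
function composeWithCaps(
  header: string,
  pages: string[],
  perPageCap: number = PER_PAGE_CAP_BYTES,
  totalCap: number = TOTAL_CAP_BYTES,
): { text: string; droppedPages: number } {
  const parts = [header];
  let total = byteLength(header);
  for (let i = 0; i < pages.length; i++) {
    const block = capPage(pages[i], perPageCap);
    const size = byteLength(SEPARATOR) + byteLength(block);
    if (total + size > totalCap) {
      return { text: parts.join(SEPARATOR), droppedPages: pages.length - i };
    }
    parts.push(block);
    total += size;
  }
  return { text: parts.join(SEPARATOR), droppedPages: 0 };
}
```

Once the total cap is reached, this sketch drops whole pages instead of cutting one mid-page, so every emitted `##` block stays intact. The per-page cap, by contrast, cuts inside a page but never splits a multi-byte character.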
- `packages/core/src/server/llms/`:
  - `build.ts`: `resolveSources()` orchestrates GraphQL + CMS fetches with config-driven gating.
  - `sources.ts`: `fetchTopCategories()` via `execute()` with an inline `allCollections` query (filtered to `Department`/`Category` types); `fetchPagesByContentTypes()` via `contentService.getMultipleContent` with slug allow/deny lists.
  - `extract.ts`: pure section-text extractor that pulls `title`/`subtitle`/`caption`/`text`/`body`/`description` out of whitelisted CMS sections. Unknown sections and unknown fields are ignored.
  - `sections.ts`: section builders (brand header, categories, pages, customer service, FAQ, custom sections, contact, optional).
  - `full.ts`: `buildLlmsFullTxt()` composes the brand header, the categories block, the per-page blocks, and the custom-pages block, with size caps and page-boundary truncation when the total cap is exceeded.
  - `index.ts`: `buildLlmsTxt()` and re-exports.
- `packages/core/src/pages/api/fs/llms.ts` and `llms-full.ts` respond with `text/markdown; charset=utf-8`, `Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400`, and `X-Robots-Tag: noindex`, return 405 on non-GET, and return 404 when the feature flag is off.
- `next.config.js` rewrites `/llms.txt` → `/api/fs/llms` and `/llms-full.txt` → `/api/fs/llms-full`, placed before `storeConfig.rewrites` so store overrides win.
- `discovery.config.default.js` ships the `llms` block with `enabled: false` and a `sources` sub-block (`categories.enabled`, `categories.limit`, `contentTypes`, `slugAllowList`, `slugDenyList`, `textSectionNames`).

Background docs
The `docs/llms-txt/` directory ships research notes, the v1 PoC plan, the v2 plan implemented here, and a rendered HTML handoff. They live in the PR for reviewer context; they're not part of the runtime surface and can be removed before any non-PoC merge.

How to read `docs/llms-txt/handoff.html`

`handoff.html` is the rendered HTML version of `handoff.md`. To read it:

- `open docs/llms-txt/handoff.html` (macOS) or `xdg-open docs/llms-txt/handoff.html` (Linux). The file is self-contained (no external assets), so it renders the same anywhere.
- `raw.githack.com` (paste the GitHub raw URL), or read `docs/llms-txt/handoff.md` instead; it has the same content with GitHub's native Markdown rendering.

If you only have time for one document, read `handoff.md` for the research or `plan-v2.md` for the implementation contract.

How to test it?
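Before scripting the checks in this section, it can help to write down the response contract they assert. The sketch below is hypothetical. `respondToLlmsRequest` and `LlmsResponse` are illustrative names, not the real handlers in `pages/api/fs/`. The header values and status codes are the ones listed under "How it works?". Two details are assumptions: that 405 takes precedence over 404, and that the 405 response carries an `Allow: GET` header, which HTTP requires for that status.

```typescript
// Hypothetical model of the /llms.txt and /llms-full.txt response contract.
// Headers and status codes come from the PR; the function itself is illustrative.

interface LlmsResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

const MARKDOWN_HEADERS: Record<string, string> = {
  "Content-Type": "text/markdown; charset=utf-8",
  "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=86400",
  "X-Robots-Tag": "noindex",
};

function respondToLlmsRequest(
  method: string,
  enabled: boolean,
  render: () => string,
): LlmsResponse {
  if (method !== "GET") {
    // Assumption: the method check runs before the feature-flag check.
    return { status: 405, headers: { Allow: "GET" }, body: "" };
  }
  if (!enabled) {
    // Feature flag off: behave as if the route does not exist.
    return { status: 404, headers: {}, body: "" };
  }
  // Only render the (potentially expensive) Markdown when the route is live.
  return { status: 200, headers: { ...MARKDOWN_HEADERS }, body: render() };
}
```

Passing `render` as a callback keeps GraphQL and CMS fetching out of the disabled and 405 paths, which also makes those branches trivial to test without a running store.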
In `packages/core/src/customizations/discovery.config.js`, set the `llms.enabled` flag to `true` (the file already ships with a smoke-test config in this PR) and start the dev server. Then:
- `GET http://localhost:3000/llms.txt` → 200, `Content-Type: text/markdown; charset=utf-8`, body starts with `#` and contains `## Shop by category`, `## Pages`, `## Contact`, and `## Optional` (the last linking to `/llms-full.txt`).
- `GET http://localhost:3000/llms-full.txt` → 200, parses as Markdown, has one `##` block per discovered page, and includes prose, not just titles.
- Set `llms.enabled` to `false` → both routes return 404.
- Set `llms.sources.categories.enabled` to `false` → `## Shop by category` disappears from both files.
- Add a page's slug to `llms.sources.slugDenyList` → that page disappears from `## Pages` and from `/llms-full.txt`.
- `POST` or `PUT` to either route → 405.

Starters Deploy Preview
References
- `docs/llms-txt/handoff.md`
- `docs/llms-txt/plan.md`
- `docs/llms-txt/plan-v2.md`

Notes on what this PR is and isn't
Is: a working spike that ships two real endpoints reading live storefront data. It is useful for validating the approach, the config contract, and the CMS / GraphQL wiring against a real store before committing to a feature.
Isn't:
- `sources` shape is forward-compatible but not frozen.
- `/{locale}/llms.txt` fan-out is deferred (see open questions in `handoff.md`).
- `robots.txt` integration. The VTEX edge proxy behavior for `/llms.txt` still needs platform-team confirmation.

Checklist
PR Title and Commit Messages
feat(core): ...)
PR Description
Dependencies
`pnpm-lock.yaml` updates
Documentation
`docs/llms-txt/`
Preview
llms.txt for FastStore — PR handoff.pdf