feat(core): /llms.txt and /llms-full.txt PoC #3338

Draft
vmarcosp wants to merge 1 commit into main from feat/llms-txt-poc

@vmarcosp vmarcosp commented May 20, 2026

Status: Proof of Concept. Not intended for merge as-is. Opened to gather feedback on the approach, config contract, and source wiring before promoting to a real feature.

What's the purpose of this pull request?

llms.txt is a 2024 proposal for a curated Markdown index at /llms.txt, scoped so an LLM with a small context window can ingest a site's high-signal content. This PR explores what a FastStore-native implementation looks like: two feature-flagged Next.js routes that expose curated, LLM-friendly content for the storefront.

Two routes ship behind a single flag:

  • /llms.txt — brand header + top categories (live from GraphQL allCollections) + CMS-discovered pages + optional custom sections + sitemap and full-content cross-links.
  • /llms-full.txt — inlines SEO descriptions and whitelisted CMS section text (BannerText, Hero by default; extensible per store) so an LLM can answer questions, not just navigate. Per-page (8KB) and total-file (500KB) caps prevent rogue pages from blowing up the file.

Both routes are disabled by default (discovery.config.llms.enabled = false) and ship with sensible defaults so a store opts in with minimal config.
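
As a rough illustration of the whitelisting idea behind /llms-full.txt, the sketch below pulls text out of allowed CMS sections only. It is not the PR's extract.ts: the `CmsSection` shape and the `extractSectionText` name are hypothetical, and the field list is taken from the extract.ts description in this PR.

```typescript
// Hypothetical shape of a CMS section; the real FastStore types may differ.
type CmsSection = { name: string; data: Record<string, unknown> };

// Default whitelist named in the PR description; stores can extend it.
const DEFAULT_TEXT_SECTIONS = ["BannerText", "Hero"];

// Text-bearing fields listed in the extract.ts description.
const TEXT_FIELDS = ["title", "subtitle", "caption", "text", "body", "description"];

function extractSectionText(
  sections: CmsSection[],
  allowed: string[] = DEFAULT_TEXT_SECTIONS
): string[] {
  const out: string[] = [];
  for (const section of sections) {
    // Sections outside the whitelist are skipped entirely.
    if (!allowed.includes(section.name)) continue;
    for (const field of TEXT_FIELDS) {
      const value = section.data[field];
      // Non-string and empty values are ignored, as are unknown fields.
      if (typeof value === "string" && value.trim() !== "") {
        out.push(value.trim());
      }
    }
  }
  return out;
}
```

Keeping the extractor pure (no fetches, no config reads) makes it easy to unit test once the spike is promoted to a feature.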

How does it work?

  • New server module at packages/core/src/server/llms/:
    • build.ts: resolveSources() orchestrates GraphQL + CMS fetches with config-driven gating.
    • sources.ts: fetchTopCategories() via execute() with an inline allCollections query (filtered to Department / Category types); fetchPagesByContentTypes() via contentService.getMultipleContent with slug allow/deny lists.
    • extract.ts: pure section-text extractor that pulls title / subtitle / caption / text / body / description out of whitelisted CMS sections. Unknown sections and unknown fields are ignored.
    • sections.ts: section builders (brand header, categories, pages, customer service, FAQ, custom sections, contact, optional).
    • full.ts: buildLlmsFullTxt() composes the brand header, the categories block, the per-page blocks, and the custom-pages block, with size caps and page-boundary truncation when the total cap is exceeded.
    • index.ts: buildLlmsTxt() and re-exports.
  • New API routes at packages/core/src/pages/api/fs/llms.ts and llms-full.ts. Both respond with Content-Type: text/markdown; charset=utf-8, Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400, and X-Robots-Tag: noindex. They return 405 on non-GET requests and 404 when the feature flag is off.
  • next.config.js rewrites /llms.txt → /api/fs/llms and /llms-full.txt → /api/fs/llms-full, placed before storeConfig.rewrites so store overrides win.
  • discovery.config.default.js ships the llms block with enabled: false and a sources sub-block (categories.enabled, categories.limit, contentTypes, slugAllowList, slugDenyList, textSectionNames).
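
The size caps described for full.ts can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the function names are hypothetical, KB is taken as 1024 bytes, sizes are measured as UTF-8 bytes, and pages that do not fit under the total cap are assumed to be dropped whole.

```typescript
const PER_PAGE_CAP_BYTES = 8 * 1024; // 8KB per page (assumes 1KB = 1024 bytes)
const TOTAL_CAP_BYTES = 500 * 1024; // 500KB for the whole file
const SEPARATOR = "\n\n"; // assumed separator between page blocks

const encoder = new TextEncoder();
const byteLength = (s: string): number => encoder.encode(s).length;

// Trim one page to the per-page cap without splitting a multi-byte character.
function capPage(text: string, capBytes: number = PER_PAGE_CAP_BYTES): string {
  if (byteLength(text) <= capBytes) return text;
  let out = "";
  let used = 0;
  for (const ch of text) {
    // Iterating by code point keeps every character intact.
    const size = byteLength(ch);
    if (used + size > capBytes) break;
    out += ch;
    used += size;
  }
  return out;
}

// Join page blocks, stopping at a page boundary once the total cap is reached.
function joinWithinTotalCap(
  pages: string[],
  perPageCap: number = PER_PAGE_CAP_BYTES,
  totalCap: number = TOTAL_CAP_BYTES
): string {
  const kept: string[] = [];
  let used = 0;
  for (const page of pages) {
    const block = capPage(page, perPageCap);
    const size = byteLength(block) + (kept.length > 0 ? byteLength(SEPARATOR) : 0);
    if (used + size > totalCap) break; // never emit a partial page
    kept.push(block);
    used += size;
  }
  return kept.join(SEPARATOR);
}
```

Cutting at page boundaries means the file always ends on a complete page block, at the cost of omitting later pages entirely when the cap is hit.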

Background docs

The docs/llms-txt/ directory ships research notes, the v1 PoC plan, the v2 plan implemented here, and a rendered HTML handoff. They live in the PR for reviewer context — they're not part of the runtime surface and can be removed before any non-PoC merge.

How to read docs/llms-txt/handoff.html

handoff.html is the rendered HTML version of handoff.md. To read it:

  • Locally: open the file in any browser — for example open docs/llms-txt/handoff.html (macOS) or xdg-open docs/llms-txt/handoff.html (Linux). The file is self-contained (no external assets), so it renders the same anywhere.
  • From the PR: GitHub does not render HTML files inline. Either preview the raw file via raw.githack.com (paste the GitHub raw URL), or read docs/llms-txt/handoff.md instead — it has the same content with GitHub's native Markdown rendering.

If you only have time for one document, read handoff.md for the research and plan-v2.md for the implementation contract.

How to test it?

In packages/core/src/customizations/discovery.config.js, set llms.enabled to true (the file already ships with a smoke-test config in this PR), then run:

pnpm --filter @faststore/core dev

Then:

  • GET http://localhost:3000/llms.txt → 200, Content-Type: text/markdown; charset=utf-8, body starts with # and contains ## Shop by category, ## Pages, ## Contact, ## Optional (the last linking to /llms-full.txt).
  • GET http://localhost:3000/llms-full.txt → 200, parses as Markdown, has one ## block per discovered page, includes prose not just titles.
  • Flip llms.enabled to false → both routes return 404.
  • Disable llms.sources.categories.enabled → ## Shop by category disappears from both files.
  • Add a slug to llms.sources.slugDenyList → that page disappears from ## Pages and from /llms-full.txt.
  • Send a POST or PUT to either route → 405.
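
The status and header expectations in the checks above can be summarized as a small decision function. This is a framework-free sketch, not the PR's route handler: `respond` and `LlmsResponse` are hypothetical names, checking the flag before the method is an assumption, and so is the Allow header on 405 responses, which the PR description does not mention.

```typescript
type LlmsResponse = {
  status: number;
  headers: Record<string, string>;
  body: string;
};

function respond(
  method: string | undefined,
  enabled: boolean,
  render: () => string
): LlmsResponse {
  // Feature flag off: the route behaves as if it does not exist.
  if (!enabled) return { status: 404, headers: {}, body: "" };

  // Only GET is served; everything else is rejected.
  if (method !== "GET") {
    return { status: 405, headers: { Allow: "GET" }, body: "" };
  }

  // Header values as listed in the PR description.
  return {
    status: 200,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, s-maxage=3600, stale-while-revalidate=86400",
      "X-Robots-Tag": "noindex",
    },
    body: render(),
  };
}
```

In a Next.js API route, the same decisions would map onto `req.method`, `res.status(...)`, and `res.setHeader(...)`. Passing `render` as a callback means the Markdown is only built when a 200 is actually returned.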

Notes on what this PR is and isn't

Is: a working spike that ships two real endpoints reading live storefront data. Useful to validate the approach, the config contract, and the CMS / GraphQL wiring against a real store before committing to a feature.

Isn't:

  • Production-ready. No tests in this PR (left out intentionally to keep the spike small).
  • A final config contract. The sources shape is forward-compatible but not frozen.
  • A multi-binding solution. The routes use the current request's binding context; a /{locale}/llms.txt fan-out is deferred (see open questions in handoff.md).
  • An edge-cache or robots.txt integration. The VTEX edge proxy behavior for /llms.txt still needs platform-team confirmation.

Checklist

PR Title and Commit Messages

  • PR title and commit messages follow Conventional Commits (feat(core): ...)

PR Description

  • Context, implementation summary, and verification steps included
  • Label (PoC / spike) to be added by the reviewer

Dependencies

  • No dependency changes; no pnpm-lock.yaml updates

Documentation

  • PR description and docs/llms-txt/

Preview

llms.txt for FastStore — PR handoff.pdf

Adds two feature-flagged routes that expose curated, LLM-friendly content
for the storefront:

- /llms.txt: brand header + top categories (via GraphQL allCollections) +
  CMS-discovered pages + optional custom sections + sitemap/full-content
  cross-link.
- /llms-full.txt: inlines SEO description and whitelisted CMS section text
  (BannerText, Hero by default; extensible per store) with per-page (8KB)
  and total-file (500KB) caps.

Both routes are disabled by default (discovery.config.llms.enabled = false)
and ship with sensible defaults so a store opts in with minimal config.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot commented May 20, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bf37c96a-94b0-452a-bdd8-d62394268d11

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@codesandbox-ci

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.
