@BboyAkers
Member

No description provided.

@BboyAkers BboyAkers changed the title from "dded a pluging to make the documenation llm friendly" to "Added a plugin to make the documentation LLM friendly" Jan 13, 2026
@github-actions

🚀 Preview Deployment

Your preview deployment is ready!

🔗 Preview URL: https://preview.harper-docs.stage.harperfabric.com/pr-413

This preview will update automatically when you push new commits.

@github-actions github-actions bot temporarily deployed to pr-413 January 13, 2026 06:49 Inactive
"@docusaurus/theme-search-algolia": "3.9.1",
"@easyops-cn/docusaurus-search-local": "0.52.1",
"@mdx-js/react": "3.1.1",
"@signalwire/docusaurus-plugin-llms-txt": "^1.2.2",
Member

Let's pin this version.

Member

Do we have a process for ensuring this stays up to date? Does that process provide some safety when combined with pinning? I'd think that we want to keep this pretty up to date, and, as always, defaulting to pinning seems like a risky choice.

Member

I think this is the same as our other repos that rely on Socket / Renovate to update dependencies automatically. I just noticed that many of the other dependencies here are pinned, so I wanted to stick to that pattern.

Member

I hadn't noticed any socket or renovate PRs and https://github.com/HarperFast/documentation/commits/main/package.json looks pretty quiet, which makes me question this.

Member

Good point - we don't have those tools set up here yet, but we have pinned dependency versions. Let's get the tools enabled asap
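
For reference, the distinction in this thread is between a range such as `^1.2.2`, which lets the package manager resolve newer compatible releases, and an exact version such as `3.9.1`. Below is a minimal sketch of a check that flags unpinned entries. The `isPinned` and `findUnpinned` helpers are hypothetical, and the check only recognizes plain `x.y.z` versions (with an optional prerelease or build suffix), not every form a package manager accepts.

```javascript
// Hypothetical helper: treats only an exact "major.minor.patch" string as pinned.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

function isPinned(spec) {
  return EXACT_VERSION.test(spec);
}

// Returns the names of dependencies whose version spec is a range or tag.
function findUnpinned(dependencies) {
  return Object.entries(dependencies)
    .filter(([, spec]) => !isPinned(spec))
    .map(([name]) => name);
}

// The four entries shown in the diff above.
const dependencies = {
  '@docusaurus/theme-search-algolia': '3.9.1',
  '@easyops-cn/docusaurus-search-local': '0.52.1',
  '@mdx-js/react': '3.1.1',
  '@signalwire/docusaurus-plugin-llms-txt': '^1.2.2',
};

console.log(findUnpinned(dependencies));
```

A check like this does not replace an update tool; it only makes the pinned-versus-range choice visible.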

Comment on lines 272 to 279
siteTitle: 'Harper Documentation',
siteDescription: 'Comprehensive guide to developing on and using the Harper platform',
depth: 2,
content: {
  includeBlog: true,
  includePages: true,
  enableLlmsFullTxt: true // Optional: generates llms-full.txt
}
Member

Can you include some more comments, or add a section to a markdown doc, explaining what this configuration means? For example, why `depth: 2` rather than 3? And what about `includeBlog`? We don't really have a blog here.
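
As a starting point for that request, here is a rough sketch of the same options written as a commented plugin entry. The `llmsTxtPlugin` variable name is hypothetical, and the comments reflect my reading of the option names and of this thread; they are assumptions and have not been checked against the plugin's documentation.

```javascript
// Sketch of an entry for the `plugins` array in docusaurus.config.js.
// Option names are copied from the diff; the explanations are assumptions.
const llmsTxtPlugin = [
  '@signalwire/docusaurus-plugin-llms-txt',
  {
    siteTitle: 'Harper Documentation',
    siteDescription: 'Comprehensive guide to developing on and using the Harper platform',
    // Assumed: how many leading URL path segments are used to group links
    // into sections of the generated llms.txt.
    depth: 2,
    content: {
      // Assumed: whether blog routes are included. This site has no blog,
      // so this may be a no-op here.
      includeBlog: true,
      // Assumed: whether standalone (non-docs) pages are included.
      includePages: true,
      // From the original comment: optional, generates llms-full.txt.
      enableLlmsFullTxt: true,
    },
  },
];

console.log(JSON.stringify(llmsTxtPlugin, null, 2));
```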

Member

Along those lines, how did the organization vary and improve at 2 vs 3? Claude is recommending 3 for docs (for better organization - in its 'opinion' 😄 )

Member

oh, nvm, I see the comment now

Member

Here's Claude's analysis after changing `depth` to 3 and setting `includeVersionedDocs: false`:

⏺ Perfect! The changes are working beautifully. Here's the comparison:

  📊 Before vs After Comparison

  File Size Reduction
  ┌────────────────────────────┬─────────┬────────┬──────────────────────────┐
  │           Metric           │ Before  │ After  │          Change          │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ llms.txt                   │ 220KB   │ 67KB   │ ✅ 69% smaller           │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ llms-full.txt              │ 5.4MB   │ 1.2MB  │ ✅ 78% smaller           │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ Line count (llms.txt)      │ 1,083   │ 1,528  │ More organized structure │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ Line count (llms-full.txt) │ 120,078 │ 27,732 │ ✅ 77% reduction         │
  └────────────────────────────┴─────────┴────────┴──────────────────────────┘
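
The percentages in the File Size Reduction table can be rechecked from its own before/after figures. A small sketch follows; `percentReduction` is a hypothetical helper, and the inputs are the rounded values from the table, so the results are only approximate.

```javascript
// Percentage by which `after` is smaller than `before`.
function percentReduction(before, after) {
  return ((before - after) / before) * 100;
}

// Before/after values copied from the table above.
const rows = [
  { name: 'llms.txt (KB)', before: 220, after: 67 },
  { name: 'llms-full.txt (MB)', before: 5.4, after: 1.2 },
  { name: 'llms-full.txt (lines)', before: 120078, after: 27732 },
];

for (const row of rows) {
  const pct = percentReduction(row.before, row.after).toFixed(1);
  console.log(`${row.name}: ${pct}% smaller`);
}
```

The results (about 69.5%, 77.8%, and 76.9%) agree with the table's 69%, 78%, and 77% to within rounding.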
  Content Filtering
  ┌─────────────────────────────┬────────┬───────┬──────────────────────────┐
  │           Metric            │ Before │ After │          Result          │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Documents processed         │ ~730+  │ 317   │ ✅ Only current version  │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Old version links (4.1-4.6) │ 610    │ 0     │ ✅ Completely removed    │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Current version links       │ 114    │ 114   │ ✅ Preserved             │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Routes excluded             │ 0      │ 616   │ ✅ Old versions filtered │
  └─────────────────────────────┴────────┴───────┴──────────────────────────┘
  Hierarchy Structure with depth: 3

  The depth: 3 setting creates a much better organized structure:

  ## docs                          ← Level 2: Main section
    ### administration             ← Level 3: Category
      #### harper-studio           ← Level 4: Subcategory
        - [Create Account](...)
        - [Login](...)
        - [Manage Databases](...)
      #### logging                 ← Level 4: Subcategory
        - [Standard Logging](...)
        - [Audit Logging](...)

  Hierarchy breakdown:
  - 5 Level 2 sections (## - top level: docs, fabric, learn, etc.)
  - 21 Level 3 sections (### - categories: administration, developers, deployments)
  - 219 Level 4 sections (#### - subcategories: harper-studio, logging, cloning)

  Why depth: 3 is Perfect

  With your URL structure like /docs/administration/harper-studio/create-account:
  - depth: 1 → Would group by /docs only (too broad)
  - depth: 2 → Would group by /docs/administration (okay, but loses subcategory structure)
  - depth: 3 ✅ → Groups by /docs/administration/harper-studio (ideal - preserves logical structure)
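
The grouping described in the list above can be illustrated with a short sketch. This is not the plugin's implementation; `groupKey` is a hypothetical helper that simply keeps the first `depth` segments of a URL path.

```javascript
// Illustration only: derive a grouping key from the first `depth` path segments.
function groupKey(urlPath, depth) {
  const segments = urlPath.split('/').filter(Boolean); // drop empty segments
  return '/' + segments.slice(0, depth).join('/');
}

const examplePath = '/docs/administration/harper-studio/create-account';

for (const depth of [1, 2, 3]) {
  console.log(`depth ${depth}: ${groupKey(examplePath, depth)}`);
}
// depth 1: /docs
// depth 2: /docs/administration
// depth 3: /docs/administration/harper-studio
```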

  Benefits for LLMs

  1. Cleaner context - Only current (4.7) documentation, no confusion from old versions
  2. Smaller files - 78% reduction means faster processing
  3. Better organization - Deep hierarchy helps LLMs understand document relationships
  4. Focused information - 317 current docs vs 730+ mixed versions

  Recommendation

  These changes look excellent! The output is:
  - ✅ Much smaller and faster to process
  - ✅ Only includes current documentation
  - ✅ Well-organized with proper hierarchy
  - ✅ No old version confusion

  This is exactly what you want for LLM-friendly documentation. The PR should include these configuration settings in the final version.

@kriszyp
Member

kriszyp commented Jan 13, 2026

At a high level, the whole concept of creating documentation in markdown, converting to HTML, and then converting back to markdown is... 🤔. Can we simply make the original markdown publicly available? How much longer do we expect AI to actually prefer markdown to HTML? Or is that advice over a month old and no longer relevant in modern AI?
Anyway, this does seem like a simple path, but I would love more understanding of how it is better.

@BboyAkers
Member Author

> At a high level, the whole concept of creating documentation in markdown, converting to HTML, and then converting back to markdown is... 🤔. Can we simply make the original markdown publicly available? How much longer do we expect AI to actually prefer markdown to HTML? Or is that advice over a month old and no longer relevant in modern AI? Anyway, this does seem like a simple path, but I would love more understanding of how it is better.

I'm assuming markdown is preferred due to the lower token consumption vs html. That'd be my only guess 🤷🏾
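
To make that guess concrete, here is a rough sketch comparing the same content written as HTML and as Markdown. The sample strings are made up for illustration, and character count is only a crude proxy for tokens; real tokenizers will give different numbers.

```javascript
// Made-up sample: one heading and one sentence with a link, in both formats.
const html =
  '<h2 id="logging">Logging</h2>\n' +
  '<p>See <a href="/docs/administration/logging">the logging guide</a>.</p>';

const markdown =
  '## Logging\n\n' +
  'See [the logging guide](/docs/administration/logging).';

// Character counts as a rough stand-in for token counts.
console.log({ htmlChars: html.length, markdownChars: markdown.length });
```

Rendered documentation pages also carry navigation, scripts, and styling markup around the content, so the gap on a real page is likely larger than in this snippet.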

@kriszyp
Member

kriszyp commented Jan 13, 2026

> lower token consumption

There is token consumption in GEO? Whose tokens are being consumed?

@Ethan-Arrowood
Member

I don't think it has much to do with token consumption but rather with the style of the input. These LLMs are only really good at words; they aren't servers. They can't necessarily parse HTML as easily as we assume. Since Markdown is much more readable, that is what the LLMs prefer.

@Ethan-Arrowood
Member

I also agree, though: why can't we just expose the markdown source for our pages rather than something else? But I don't really know what this is all about anyway, so maybe it's just the latest wave of how we can optimize our site for AI robots.

@kriszyp
Member

kriszyp commented Jan 13, 2026

5 participants