Added a plugin to make the documentation LLM friendly #413
🚀 Preview Deployment: Your preview deployment is ready! 🔗 Preview URL: https://preview.harper-docs.stage.harperfabric.com/pr-413 This preview will update automatically when you push new commits.
"@docusaurus/theme-search-algolia": "3.9.1",
"@easyops-cn/docusaurus-search-local": "0.52.1",
"@mdx-js/react": "3.1.1",
"@signalwire/docusaurus-plugin-llms-txt": "^1.2.2",
Let's pin this version.
Do we have a process of ensuring this stays up-to-date? Does that process provide some safety when combined with pinning? I'd think that we want to keep this pretty up-to-date, and like always, defaulting to pinning seems like a risky choice.
I think this is the same as our other repos that are relying on socket / renovate to update dependencies automatically. I just noticed many of the other dependencies here are pinned so wanted to stick to that pattern.
I hadn't noticed any socket or renovate PRs and https://github.com/HarperFast/documentation/commits/main/package.json looks pretty quiet, which makes me question this.
Good point - we don't have those tools set up here yet, but we have pinned dependency versions. Let's get the tools enabled asap
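As a stopgap until Socket or Renovate is enabled, a small script could flag entries that are not pinned to an exact version. This is a minimal sketch, not part of the PR: `findUnpinned` is a hypothetical helper, and the dependency map simply copies the four entries visible in the diff above.

```javascript
// Hypothetical helper: lists dependencies whose version is not an exact pin.
function findUnpinned(dependencies) {
  // Treat a plain "x.y.z" (optionally with a prerelease or build suffix) as pinned.
  const exact = /^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.+-]+)?$/;
  return Object.entries(dependencies)
    .filter(([, range]) => !exact.test(range))
    .map(([name, range]) => `${name}@${range}`);
}

// Entries copied from the package.json diff under review.
const dependencies = {
  '@docusaurus/theme-search-algolia': '3.9.1',
  '@easyops-cn/docusaurus-search-local': '0.52.1',
  '@mdx-js/react': '3.1.1',
  '@signalwire/docusaurus-plugin-llms-txt': '^1.2.2',
};

const unpinned = findUnpinned(dependencies);
console.log(unpinned);
```

With the entries above, only the caret range on the new plugin is reported.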
siteTitle: 'Harper Documentation',
siteDescription: 'Comprehensive guide to developing on and using the Harper platform',
depth: 2,
content: {
  includeBlog: true,
  includePages: true,
  enableLlmsFullTxt: true // Optional: generates llms-full.txt
}
Can you include some more comments, or add a section to a markdown doc, on what this configuration means? For example, what does depth: 2 do? Why not 3? And what about includeBlog? We don't really have a blog here.
Along those lines, how did the organization vary and improve at 2 vs 3? Claude is recommending 3 for docs (for better organization - in its 'opinion' 😄 )
oh, nvm, I see the comment now
here's claude's analysis after changing the depth to 3 and setting includeVersionedDocs: false:
⏺ Perfect! The changes are working beautifully. Here's the comparison:
📊 Before vs After Comparison
File Size Reduction
┌────────────────────────────┬─────────┬────────┬──────────────────────────┐
│ Metric │ Before │ After │ Change │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ llms.txt │ 220KB │ 67KB │ ✅ 69% smaller │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ llms-full.txt │ 5.4MB │ 1.2MB │ ✅ 78% smaller │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ Line count (llms.txt) │ 1,083 │ 1,528 │ More organized structure │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ Line count (llms-full.txt) │ 120,078 │ 27,732 │ ✅ 77% reduction │
└────────────────────────────┴─────────┴────────┴──────────────────────────┘
Content Filtering
┌─────────────────────────────┬────────┬───────┬──────────────────────────┐
│ Metric │ Before │ After │ Result │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Documents processed │ ~730+ │ 317 │ ✅ Only current version │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Old version links (4.1-4.6) │ 610 │ 0 │ ✅ Completely removed │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Current version links │ 114 │ 114 │ ✅ Preserved │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Routes excluded │ 0 │ 616 │ ✅ Old versions filtered │
└─────────────────────────────┴────────┴───────┴──────────────────────────┘
Hierarchy Structure with depth: 3
The depth: 3 setting creates a much better organized structure:
## docs ← Level 2: Main section
### administration ← Level 3: Category
#### harper-studio ← Level 4: Subcategory
- [Create Account](...)
- [Login](...)
- [Manage Databases](...)
#### logging ← Level 4: Subcategory
- [Standard Logging](...)
- [Audit Logging](...)
Hierarchy breakdown:
- 5 Level 2 sections (## - top level: docs, fabric, learn, etc.)
- 21 Level 3 sections (### - categories: administration, developers, deployments)
- 219 Level 4 sections (#### - subcategories: harper-studio, logging, cloning)
Why depth: 3 is Perfect
With your URL structure like /docs/administration/harper-studio/create-account:
- depth: 1 → Would group by /docs only (too broad)
- depth: 2 → Would group by /docs/administration (okay, but loses subcategory structure)
- depth: 3 ✅ → Groups by /docs/administration/harper-studio (ideal - preserves logical structure)
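The grouping described in the list above can be sketched as a prefix cut on the route path. This is an illustration of the idea only, not the plugin's actual implementation; `groupKey` is a hypothetical name.

```javascript
// Illustrative only: groups a route by its first `depth` path segments.
function groupKey(routePath, depth) {
  const segments = routePath.split('/').filter(Boolean);
  return '/' + segments.slice(0, depth).join('/');
}

const route = '/docs/administration/harper-studio/create-account';
for (const depth of [1, 2, 3]) {
  console.log(`depth ${depth}: ${groupKey(route, depth)}`);
}
```

For the example route this yields /docs, /docs/administration, and /docs/administration/harper-studio for depths 1, 2, and 3.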
Benefits for LLMs
1. Cleaner context - Only current (4.7) documentation, no confusion from old versions
2. Smaller files - 78% reduction means faster processing
3. Better organization - Deep hierarchy helps LLMs understand document relationships
4. Focused information - 317 current docs vs 730+ mixed versions
Recommendation
These changes look excellent! The output is:
- ✅ Much smaller and faster to process
- ✅ Only includes current documentation
- ✅ Well-organized with proper hierarchy
- ✅ No old version confusion
This is exactly what you want for LLM-friendly documentation. The PR should include these configuration settings in the final version.
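For reference, the configuration the analysis describes might look roughly like the sketch below. It reuses the options from the diff above with depth raised to 3. Placing `includeVersionedDocs` under `content` is an assumption and should be checked against the plugin's documentation, and `llmsTxtPlugin` is just an illustrative variable name for the entry that would go in the `plugins` array of the Docusaurus config.

```javascript
// Sketch of the plugin entry; option placement is partly assumed (see comments).
const llmsTxtPlugin = [
  '@signalwire/docusaurus-plugin-llms-txt',
  {
    siteTitle: 'Harper Documentation',
    siteDescription: 'Comprehensive guide to developing on and using the Harper platform',
    depth: 3, // group links by up to three path segments
    content: {
      includeBlog: true, // carried over from the diff; questioned in review
      includePages: true,
      includeVersionedDocs: false, // assumption: nested under `content`
      enableLlmsFullTxt: true, // Optional: generates llms-full.txt
    },
  },
];

console.log(JSON.stringify(llmsTxtPlugin, null, 2));
```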
At a high level, the whole concept of creating documentation in markdown, converting it to HTML, and then converting it back to markdown is... 🤔. Can we simply make the original markdown publicly available? How much longer do we expect AI to actually prefer markdown to HTML? Or is that advice over a month old and no longer relevant in modern AI?
I'm assuming markdown is preferred due to the lower token consumption vs HTML. That'd be my only guess 🤷🏾
There is token consumption in GEO? Whose tokens are being consumed?
I don't think it has much to do with token consumption but rather the style of the input. These LLMs are only really good at words; they aren't servers. They can't necessarily parse HTML as easily as we assume. Since Markdown is much more readable, that is what the LLMs prefer.
I also agree though; why can't we just expose the markdown source for our pages rather than something else? But I don't really know what this is all about anyway, so maybe it's just the latest wave of how we can optimize our site for AI robots.
No description provided.