Add MarkCrawl - web crawler for RAG pipelines #12

Open
AIMLPM wants to merge 1 commit into coree:main from AIMLPM:add-markcrawl

AIMLPM commented Apr 4, 2026

Summary

Adds MarkCrawl to the Tools section.

MarkCrawl is an MIT-licensed Python tool for the full RAG ingestion pipeline:

  • Crawl any website into clean Markdown plus a JSONL index
  • Chunk with configurable word count and overlap
  • Embed via OpenAI embeddings
  • Upload to Supabase/pgvector for semantic search
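
For readers unfamiliar with the chunking step, here is a minimal sketch of word-based chunking with overlap. It is a generic illustration, not MarkCrawl's actual code: the function name `chunk_words` and the parameters `chunk_size` and `overlap` are hypothetical and may not match the tool's own options.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in one of the two chunks.
    Hypothetical helper for illustration; not part of MarkCrawl's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()           # whitespace tokenization
    step = chunk_size - overlap    # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text, so the
        # final chunk is not followed by a redundant, fully overlapped one.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", chunk_size=4, overlap=2):
        print(chunk)
```

A larger overlap preserves more context across chunk boundaries at the cost of more chunks to embed and store.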

It also includes LLM-powered structured extraction, an MCP server, LangChain tools, and automatic citations on every output.

Install: pip install markcrawl
