CrawlAI RAG

CrawlAI RAG is an AI-powered website intelligence platform that allows users to crawl entire websites, index their content, and ask natural-language questions using Retrieval-Augmented Generation (RAG).

It transforms static websites into queryable knowledge bases.

Key Features

Website Crawling

Crawls all internal pages of a website
Extracts clean, readable text
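
As a rough illustration of what "internal pages" means for a crawler, the sketch below collects same-host links from one page's HTML. It uses only the Python standard library rather than the project's BeautifulSoup4/Playwright stack, and the names `LinkCollector` and `internal_links` are hypothetical, not taken from this repository.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return the sorted, de-duplicated same-host links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" so each page appears once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Keep only http(s) links that stay on the starting host.
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            found.add(absolute)
    return sorted(found)
```

A crawler would typically call a function like this on each fetched page and push unseen links onto a queue until none remain.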

RAG-Based Question Answering

Uses vector embeddings and an LLM
Answers are grounded in the indexed website content
Reduces hallucinations by limiting the LLM to retrieved context
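
One common way to keep answers grounded is to pass the retrieved text to the LLM together with an instruction to use nothing else. The sketch below shows that idea only; the prompt wording and the function name `build_grounded_prompt` are assumptions for illustration, not the prompt this project actually sends.

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that restricts the model to the supplied chunks."""
    # Number each chunk so the model (and a reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The resulting string would then be sent to the LLM as the user message.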

Multi-Website Indexing

Index multiple websites
All content stored in a shared vector database
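
When several sites share one store, each chunk usually carries its source so results can be traced or filtered by site. The in-memory sketch below stands in for the vector database to show that bookkeeping; `Record` and `SharedIndex` are hypothetical names, and whether this project tags chunks this way is an assumption. ChromaDB itself supports per-document metadata and `where` filters, which could serve the same purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    source: str  # the site the chunk came from


@dataclass
class SharedIndex:
    """In-memory stand-in for a single shared collection."""

    records: list = field(default_factory=list)

    def add_site(self, source, chunks):
        """Store every chunk of a site, tagged with that site's URL."""
        self.records.extend(Record(text=c, source=source) for c in chunks)

    def sources(self):
        """List the distinct sites that have been indexed."""
        return sorted({r.source for r in self.records})

    def chunks_for(self, source=None):
        """Return all chunk texts, or only those from one site."""
        return [r.text for r in self.records if source is None or r.source == source]
```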

Fast & Scalable Backend

Built with FastAPI
ChromaDB for vector storage

Simple Frontend

Built with Streamlit
Clean, single-query interface

Secure Configuration

Environment variables via .env
API keys are never committed to GitHub
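
A minimal sketch of reading configuration from the environment, failing early when a key is missing. With python-dotenv, `load_dotenv()` would normally be called first so that values from `.env` land in `os.environ`. The variable names `GROQ_API_KEY` and `CHROMA_DIR` and the function `load_settings` are assumptions; check the project's code for the names it actually expects.

```python
import os


def load_settings(env=None):
    """Read required settings from the environment and fail fast if missing."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY", "").strip()  # assumed variable name
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return {
        "groq_api_key": api_key,
        # Hypothetical optional setting with a default value.
        "chroma_dir": env.get("CHROMA_DIR", "./chroma_db"),
    }
```

Keeping `.env` listed in `.gitignore` is what prevents the key from being committed.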

Tech Stack

Layer	Technology
Backend	FastAPI
Frontend	Streamlit
AI / RAG	LangChain
Vector Database	ChromaDB
Embeddings	Sentence-Transformers
LLM	Groq (LLaMA 3.3 70B)
Web Scraping	BeautifulSoup4 & Playwright
Configuration	python-dotenv

Usage Guide

1. Index a Website

Enter a website URL
Click Index Website
Website content is crawled, chunked, and embedded
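
To make the "chunked" step concrete, here is a small sketch of fixed-size chunking with overlap, so that a sentence cut at one boundary still appears whole in a neighboring chunk. The function name `chunk_text` and the default sizes are illustrative assumptions; the project may use a LangChain text splitter with different settings.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```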

2. Ask Questions

Ask natural-language questions such as:

What is this website about?
List all services mentioned
Who is the author?

The system answers from the indexed website content rather than from the LLM's general knowledge.

How It Works

1. The website is crawled and its text is extracted
2. The text is split into manageable chunks
3. Embeddings are generated and stored in ChromaDB
4. The user's query retrieves the most relevant chunks
5. The LLM generates an answer using the retrieved context
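
The retrieval step above can be sketched with a toy embedding and cosine similarity. This is not the project's implementation: a real setup would use Sentence-Transformers vectors and a ChromaDB query, whereas `embed` here is a plain word-count stand-in chosen so the example runs with the standard library alone. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The chunks returned by `retrieve` are what would be placed into the LLM prompt as context.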

Because the answer is generated from retrieved content, this is Retrieval-Augmented Generation (RAG).

Use Cases

Website documentation Q&A
Internal knowledge bases
Research and analysis
Client website intelligence
Portfolio / demo RAG application

Author

CrawlAI RAG
Built by Ankit Kumar Nayak

Support

If you like this project:

Give it a star
Fork it
Contribute or suggest improvements

