RepoDoc

The AI infrastructure for understanding code.

Connect any GitHub repository. Ask questions in plain English. Get answers with exact file and line references.

Live Demo · Get Started · Pricing



The Problem

Developers spend the majority of their time (estimates commonly range from 60% to 80%) reading and understanding code rather than writing it.

Onboarding to new codebases takes weeks. Finding where specific logic lives means grep-ing through thousands of files. Documentation is always outdated.

The Solution

RepoDoc indexes your entire codebase into a vector database, then lets you query it conversationally with AI.

  • Ask "How does authentication work?" → Get the answer with links to src/lib/auth.ts:45-89
  • Ask "Where are API rate limits configured?" → Instantly see the relevant files
  • Generate production-ready READMEs and technical docs in one click

No more digging through files. No more outdated wikis. Just ask.


Design Philosophy

RepoDoc is engineering infrastructure for understanding code, not just an AI chatbot. A few principles shape how it works:

  • Retrieval before generation: Relevant code is retrieved first; the LLM answers from that context to reduce hallucination.
  • Structured semantic memory: Repo memory stores durable knowledge (concepts, decisions, relationships) with embeddings, instead of relying only on raw chat history.
  • Operational observability: Every AI request is recorded (route, model, tokens, retrieval/memory counts, latency, cost, success/failure) so the system is auditable and debuggable.
  • Explicit cost awareness: Token usage and estimated cost are tracked per request; optional per-project budget limits and threshold alerts keep cost predictable.
  • Deterministic model fallback: A clear strategy for which model is used (e.g. primary vs fallback) so behavior is predictable under rate limits or outages.
  • Layered separation: Indexing (ingestion, summarization, embedding), retrieval (vector search, memory search), and reasoning (LLM) are separate; each layer can be understood and evolved independently.

How It Works

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   1. CONNECT          2. INDEX              3. QUERY                    │
│   ───────────         ─────────             ─────────                   │
│   Paste your          Every file gets       Ask anything.               │
│   GitHub URL          summarized, embedded  RAG retrieves relevant      │
│                       and stored in         code + LLM generates        │
│                       PostgreSQL/pgvector   answers with citations      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Under the hood:

  1. Ingestion → LangChain's GithubRepoLoader pulls all files from your repo
  2. Summarization → Each file is summarized by Gemini to capture its purpose
  3. Embedding → Summaries are converted to 768-dim vectors using text-embedding-004
  4. Storage → Vectors stored in PostgreSQL with pgvector extension for similarity search
  5. Retrieval → When you ask a question, we embed your query and find the top 5 most similar code chunks
  6. Generation → Retrieved context + your question → Gemini 2.5 Flash generates a detailed answer
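Steps 2–4 above can be sketched as a small orchestration loop. The `indexRepo` function below is illustrative, not RepoDoc's actual internals; `summarize` and `embed` are injected stand-ins for the Gemini summarization and text-embedding-004 calls:

```typescript
// Illustrative sketch of the indexing loop (summarize -> embed -> collect rows).
// `summarize` and `embed` are injected stand-ins for the real AI calls.
type FileDoc = { fileName: string; sourceCode: string };
type IndexedRow = { fileName: string; summary: string; embedding: number[] };

async function indexRepo(
  files: FileDoc[],
  summarize: (f: FileDoc) => Promise<string>, // e.g. a Gemini call
  embed: (text: string) => Promise<number[]>, // e.g. text-embedding-004 (768-dim)
): Promise<IndexedRow[]> {
  const rows: IndexedRow[] = [];
  for (const file of files) {
    const summary = await summarize(file);  // capture the file's purpose
    const embedding = await embed(summary); // vector for similarity search
    rows.push({ fileName: file.fileName, summary, embedding });
  }
  return rows; // in production, these rows land in SourceCodeEmbeddings
}
```

In production the loop would batch files and write each row into the SourceCodeEmbeddings table via Prisma.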

Features

Intelligence Layer

💬 Conversational Code Search: Chat with your codebase like you'd chat with a senior engineer who knows every line. Ask follow-up questions. Get code snippets with syntax highlighting. See exactly which files informed each answer.

📄 One-Click Documentation: Generate comprehensive technical documentation from your codebase automatically. The AI analyzes your code structure, patterns, and architecture to produce docs that actually reflect your implementation.

📝 README Generation: Get professional README files generated from your code. Includes proper sections for installation, usage, API references, and more, all inferred from your actual implementation.

📊 Repository Analytics: Visualize your codebase at a glance:

  • Language distribution with percentages
  • File counts and project metrics
  • Stars, forks, and activity from GitHub
  • Dependency insights

🔗 Shareable Documentation: Generate public links to share your documentation with teammates, contributors, or the world. Each link is tokenized and can be revoked anytime.

🔄 Iterative Refinement: Don't like something in the generated docs? Ask the AI to modify it. "Add a troubleshooting section" or "Update the API examples": the docs evolve through conversation.

🏗️ Architecture View: Explore your codebase as a high-level architecture map. The AI analyzes your repo structure and surfaces modules, dependencies, and entry points so you can understand how the system is organized at a glance.

📋 Diff Analysis: Paste or upload a diff and get AI-powered analysis: what changed, impact, and suggestions. Supports query, diff, and architecture route types with token and cost tracking.

🧠 Repo Memory: RAG can use stored repo memories (semantic chunks with embeddings) for better context. Memory hit counts and retrieval counts are tracked for observability.

Operational Layer

📊 Query Metrics (Observability): AI query observability is stored per request: route type (query / diff / architecture), model used, token counts, retrieval and memory hits, latency, estimated cost, and success/error. Indexed by project and time for analytics.

Cost tracking: Token usage and estimated cost per request; 7-day cost breakdown by route type and 30-day rolling view.

Budget guardrails: Optional per-project budget limits and threshold alerts (warning / limit exceeded).

Health status: Per-project status (healthy / warning / critical) for monitoring.

Cold start detection: First query or long idle gap is flagged so latency spikes are explainable.

Cache metrics: Cache-hit detection when a cached answer is served; visibility into cache effectiveness.
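A minimal sketch of how such a per-request record might be assembled. The field names, idle-gap threshold, and cost rate below are assumptions for illustration, not the actual QueryMetrics schema:

```typescript
// Illustrative per-request metrics record; field names and thresholds
// are assumptions, not RepoDoc's actual schema.
interface QueryMetric {
  routeType: "query" | "diff" | "architecture";
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  estimatedCostUsd: number;
  coldStart: boolean;
  cacheHit: boolean;
}

const IDLE_GAP_MS = 10 * 60 * 1000; // assumed "long idle" threshold

function buildMetric(
  base: Omit<QueryMetric, "coldStart" | "estimatedCostUsd">,
  lastQueryAt: number | null, // epoch ms of the previous query, if any
  now: number,
  costPerMTokUsd = 0.3, // assumed blended $/1M tokens
): QueryMetric {
  // Cold start: first query ever, or a long idle gap since the last one.
  const coldStart = lastQueryAt === null || now - lastQueryAt > IDLE_GAP_MS;
  const tokens = base.promptTokens + base.completionTokens;
  return {
    ...base,
    coldStart,
    estimatedCostUsd: (tokens / 1_000_000) * costPerMTokUsd,
  };
}
```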

Infrastructure Layer

⚙️ Background Indexing: Indexing runs as a serverless job queue (Vercel cron + Postgres leasing). No blocking on project create or regenerate: jobs are queued, processed by a worker, and report progress. Retry and cancel from the UI.

Model fallback: Primary model (e.g. Gemini) with deterministic fallback (e.g. OpenRouter) under rate limits or outages so behavior is predictable.
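The fallback behavior can be expressed as a small wrapper; the function name and retry predicate below are illustrative, not the project's actual code:

```typescript
// Deterministic primary -> fallback model selection (illustrative sketch).
// e.g. primary = Gemini call, fallback = OpenRouter call.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  isRetryable: (err: unknown) => boolean = () => true, // e.g. only 429 / 5xx
): Promise<T> {
  try {
    return await primary();
  } catch (err) {
    if (!isRetryable(err)) throw err; // non-retryable errors surface directly
    return fallback(); // predictable: always the same second choice
  }
}
```

Keeping the choice deterministic (always the same fallback, never a random pick) is what makes behavior predictable under rate limits or outages.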


Tech Stack

Layer        Technology
Framework    Next.js 16 (App Router, React 18)
Language     TypeScript 5
Styling      Tailwind CSS 4.1, Radix UI
State        Redux Toolkit
Database     PostgreSQL + pgvector
ORM          Prisma 6
AI/LLM       Google Gemini 2.5 Flash, OpenRouter
Embeddings   text-embedding-004 (768 dimensions)
Auth         Clerk
Payments     Stripe
Forms        React Hook Form + Zod
Animation    Motion (Framer Motion)
Testing      Jest, React Testing Library
Deployment   Vercel

Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                              NEXT.JS APP ROUTER                              │
│                                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │    Chat     │  │  Dashboard  │  │    Docs     │  │      README         │  │
│  │   Page      │  │    Page     │  │    Page     │  │    Generation       │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────────┬──────────┘  │
│  ┌──────┴──────┐  ┌──────┴──────┐                                            │
│  │ Architecture│  │    Diff     │  ← Architecture view, diff analysis        │
│  │   Page      │  │   Page      │                                            │
│  └──────┬──────┘  └──────┬──────┘                                            │
│         └────────────────┴───────────────────────────────────────────────────┘
│                                    │                                         │
│                           ┌────────▼──────────────┐                          │
│                           │   API Routes          │                          │
│                           │                       │                          │
│                           │  /api/query           │ ← RAG Pipeline           │
│                           │  /api/search          │ ← Vector Search          │
│                           │  /api/analytics       │ ← Metrics Aggregation    │
│                           │  /api/architecture    │ ← Architecture extraction│
│                           │  /api/analyze-diff    │ ← Diff analysis          │
│                           │  /api/indexing-worker │ ← Cron job (indexing)    │
│                           └────────┬──────────────┘                          │
└────────────────────────────────────┼─────────────────────────────────────────┘
                                     │
          ┌──────────────────────────┼──────────────────────────┐
          │                          │                          │
          ▼                          ▼                          ▼
┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────────┐
│    PostgreSQL       │  │    GitHub API       │  │      AI Services        │
│    + pgvector       │  │    (Octokit)        │  │                         │
│                     │  │                     │  │  • Gemini 2.5 Flash     │
│  • Users            │  │  • Repo metadata    │  │  • text-embedding-004   │
│  • Projects         │  │  • File contents    │  │  • OpenRouter fallback  │
│  • Embeddings       │  │  • Languages        │  │                         │
│  • Docs/READMEs     │  │  • Stats            │  │                         │
│  • Share tokens     │  │                     │  │                         │
│  • RepoMemory       │  │                     │  │                         │
│  • IndexingJob      │  │                     │  │                         │
│  • QueryMetrics     │  │                     │  │                         │
└─────────────────────┘  └─────────────────────┘  └─────────────────────────┘

Operational Model

AI is treated as a production system. Each AI request is recorded with: route type (query / diff / architecture), model used, prompt and completion tokens, retrieval count and memory hit count, latency, estimated cost, success or failure, cold-start detection (first query or long idle gap), and cache-hit detection when a cached answer is served.

Per-project observability (via the observability API and UI) includes: 7-day cost breakdown by route type, 30-day rolling cost and budget tracking, budget threshold alerts (warning / limit exceeded), error-rate monitoring, health status (healthy / warning / critical), and memory quality metrics (e.g. hit rate, average similarity). This is how the system is run and debugged, not a sales pitch.


Database Schema

model User {
  id           String    @id @default(uuid())
  emailAddress String    @unique
  credits      Int       @default(150)
  plan         String    @default("starter") 
  projects     Project[]
}

model Project {
  id                   String                 @id @default(uuid())
  name                 String
  repoUrl              String
  userId               String
  sourceCodeEmbeddings SourceCodeEmbeddings[]
  docs                 Docs?
  readme               Readme?
}

model SourceCodeEmbeddings {
  id               String                  @id @default(uuid())
  fileName         String
  sourceCode       String
  Summary          String
  summaryEmbedding Unsupported("vector")? 
  projectId        String
}

model Docs {
  id          String      @id @default(uuid())
  content     String
  projectId   String      @unique
  qnaHistory  DocsQna[]
  publicShare DocsShare?
}

model Readme {
  id          String       @id @default(uuid())
  content     String
  projectId   String       @unique
  qnaHistory  ReadmeQna[]
  publicShare ReadmeShare?
}

// Additional models: RepoMemory (RAG memory + embeddings), IndexingJob (queue + status),
// QueryMetrics (per-request observability: routeType, tokens, latency, cost, success).

Tradeoffs & Constraints

  • Embedding similarity is imperfect: very similar phrasing can rank higher than conceptually relevant but differently worded code.
  • Context window and retrieval depth are bounded: we send a fixed number of chunks, so more chunks improve coverage but increase latency and cost.
  • Repo memory can drift after major refactors, leaving stale facts until re-indexing or new Q&A.
  • Cost vs model quality is a tradeoff (cheaper vs more capable models).
  • Architecture inference is best-effort (static import analysis; dynamic or runtime behavior may be missed).

Mitigations in place: top-k retrieval caps, similarity-based ranking, explicit labeling of memory vs code context in prompts, in-memory query cache for repeated questions, and budget guardrails.
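As an illustration, the in-memory query cache mentioned above could look roughly like this (a minimal sketch; the real implementation may differ):

```typescript
// Minimal TTL cache for repeated questions (illustrative sketch).
class QueryCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now > entry.expiresAt) {
      this.store.delete(key); // evict stale answers lazily
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

A short TTL also bounds the staleness problem: a cached answer cannot outlive re-indexing by more than the TTL window.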


Known Failure Modes

  • Stale or misleading memory after large refactors.
  • Architecture view missing dynamically loaded or generated imports.
  • Diff analysis is advisory: impact and risk are suggestions, not authoritative.
  • Cold starts after indexing or long idle periods cause higher latency.
  • Model rate limiting can lead to fallback or errors.
  • Heavy indexing of large repos can increase latency or load.

Observability (query metrics, error rate, health status, cold-start and cache metrics) is in place to surface these conditions.


Scaling Strategy

Evolution path at higher scale: async embedding pipelines so indexing does not block requests; batched embedding jobs for efficiency; sharded or dedicated vector storage if one database becomes a bottleneck; horizontal scaling of indexing workers; background memory compaction or pruning; more aggressive or distributed caching; model tiering (e.g. cheaper models for simple queries, premium for complex ones). This is an evolution path; not all of it is implemented today.


RAG Pipeline

The core intelligence lives in src/lib/rag.ts:

// 1. Embed the user's question
const queryEmbedding = await getGenerateEmbeddings(query);

// 2. Vector similarity search with pgvector
type RetrievedChunk = {
  fileName: string;
  sourceCode: string;
  Summary: string;
  similarity: number;
};
const results = await prisma.$queryRaw<RetrievedChunk[]>`
  SELECT
    "fileName",
    "sourceCode",
    "Summary",
    1 - ("summaryEmbedding" <=> ${queryEmbedding}::vector) as similarity
  FROM "SourceCodeEmbeddings"
  WHERE "projectId" = ${projectId}
  ORDER BY "summaryEmbedding" <=> ${queryEmbedding}::vector
  LIMIT 5
`;

// 3. Build context from retrieved chunks
const codeContext = results
  .map(
    (code, idx) => `
  [Source ${idx + 1}: ${code.fileName}] (Relevance: ${(code.similarity * 100).toFixed(1)}%)
  Summary: ${code.Summary}
  Code: ${code.sourceCode.slice(0, 1000)}
`,
  )
  .join("\n\n");

// 4. Generate answer with Gemini
const answer = await openrouterChatCompletion({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "system", content: systemPrompt + codeContext },
    ...conversationHistory,
    { role: "user", content: question },
  ],
  temperature: 0.3,
});

Getting Started

Prerequisites

  • Node.js 20+
  • PostgreSQL with pgvector extension
  • GitHub account
  • Clerk account
  • Google AI API key (Gemini)

Installation

# Clone the repository
git clone https://github.com/parbhatkapila4/repodoc.git
cd repodoc

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env

Environment Variables

# Database
DATABASE_URL="postgresql://user:password@localhost:5432/repodoc"

# Clerk Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_...
CLERK_SECRET_KEY=sk_...
CLERK_WEBHOOK_SECRET=whsec_...

# AI Services
GOOGLE_API_KEY=your_gemini_api_key
OPENROUTER_API_KEY=your_openrouter_key

# GitHub
GITHUB_TOKEN=ghp_...

# Stripe (Optional)
STRIPE_SECRET_KEY=sk_...
STRIPE_WEBHOOK_SECRET=whsec_...

# App
NEXT_PUBLIC_APP_URL=http://localhost:3000

Database Setup

# Generate Prisma client
npm run db:generate

# Run migrations
npm run db:migrate

# (Optional) Open Prisma Studio
npm run db:studio

Development

# Start development server
npm run dev

# Run tests
npm test

# Type check
npm run type-check

# Lint
npm run lint

Project Structure

repodoc/
├── src/
│   ├── app/
│   │   ├── (app)/              # Landing page
│   │   ├── (auth)/             # Sign in/up, user sync
│   │   ├── (protected)/        # Authenticated routes
│   │   │   ├── chat/           # AI chat interface
│   │   │   ├── dashboard/      # Project management
│   │   │   ├── docs/           # Documentation viewer
│   │   │   ├── readme/         # README editor
│   │   │   ├── analytics/      # Platform analytics
│   │   │   ├── search/         # Semantic search
│   │   │   ├── architecture/   # Architecture view
│   │   │   └── diff/           # Diff analysis
│   │   └── api/
│   │       ├── query/          # RAG endpoint
│   │       ├── search/         # Vector search
│   │       ├── analytics/      # Metrics API
│   │       ├── architecture/   # Architecture extraction
│   │       ├── analyze-diff/   # Diff analysis
│   │       ├── indexing-worker/# Cron-triggered indexing job
│   │       ├── create-checkout/# Stripe checkout
│   │       └── webhooks/       # Clerk & Stripe webhooks
│   ├── components/
│   │   ├── ui/                 # Radix-based primitives
│   │   └── landing/            # Marketing components
│   ├── lib/
│   │   ├── rag.ts              # RAG implementation
│   │   ├── github.ts           # GitHub integration
│   │   ├── gemini.ts           # AI embeddings & generation
│   │   ├── openrouter.ts       # LLM fallback
│   │   ├── prisma.ts           # Database client
│   │   ├── actions.ts          # Server actions
│   │   ├── actions-indexing.ts # Indexing job actions (status, retry, cancel)
│   │   ├── rate-limiter.ts     # API protection
│   │   ├── memory.ts           # Repo memory (RAG)
│   │   ├── architecture.ts    # Architecture extraction
│   │   ├── diff.ts             # Diff analysis
│   │   └── redis.ts            # Upstash Redis (locking utilities)
│   └── hooks/                  # Custom React hooks
├── prisma/
│   ├── schema.prisma           # Database schema
│   └── migrations/             # Migration history
└── __tests__/                  # Jest test suites

Pricing

Plan          Price    Projects    Features
Starter       $10/mo   3           AI chat, README generation, docs generation, basic analytics
Professional  $20/mo   10          Everything in Starter + public sharing, priority processing, email support
Enterprise    $49/mo   Unlimited   Everything in Professional + team features, SLA, custom integrations

API Reference

POST /api/query

Query your codebase with natural language.

// Request
{
  "projectId": "uuid",
  "question": "How does authentication work?",
  "conversationHistory": [
    { "role": "user", "content": "previous question" },
    { "role": "assistant", "content": "previous answer" }
  ]
}

// Response
{
  "answer": "Authentication in this codebase is handled by...",
  "sources": [
    {
      "fileName": "src/lib/auth.ts",
      "similarity": 0.89,
      "summary": "Handles user authentication..."
    }
  ]
}
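A hypothetical client-side helper for this endpoint; the `askRepo` name and error handling are illustrative, while the request/response shapes follow the examples above:

```typescript
// Hypothetical caller for POST /api/query.
interface QuerySource { fileName: string; similarity: number; summary: string }
interface QueryResponse { answer: string; sources: QuerySource[] }

async function askRepo(projectId: string, question: string): Promise<QueryResponse> {
  const res = await fetch("/api/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Pass prior turns in conversationHistory for follow-up questions.
    body: JSON.stringify({ projectId, question, conversationHistory: [] }),
  });
  if (!res.ok) throw new Error(`Query failed: ${res.status}`);
  return (await res.json()) as QueryResponse;
}
```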

POST /api/search

Semantic search across your codebase.

// Request
{
  "projectId": "uuid",
  "query": "rate limiting",
  "limit": 10
}

// Response
{
  "results": [
    {
      "fileName": "src/lib/rate-limiter.ts",
      "sourceCode": "...",
      "summary": "...",
      "similarity": 0.92
    }
  ]
}

Security

  • Authentication: All routes protected by Clerk middleware
  • Input validation: Zod schemas on all API inputs
  • SQL injection: Prevented by Prisma parameterized queries
  • Rate limiting: Token bucket algorithm on API endpoints
  • XSS protection: Next.js built-in escaping
  • CSRF protection: Same-origin verification
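The token bucket algorithm named above can be sketched as follows (capacity and refill rate are illustrative parameters, not the deployed limits):

```typescript
// Token-bucket limiter sketch: `capacity` is burst size,
// `refillPerSec` is the steady-state request rate.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSec: number,
    now = Date.now(),
  ) {
    this.tokens = capacity; // start full
    this.lastRefill = now;
  }

  tryConsume(now = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // caller should respond 429 Too Many Requests
  }
}
```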

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

# Fork the repo
git checkout -b feature/your-feature
git commit -m "Add your feature"
git push origin feature/your-feature
# Open a PR

Built by Parbhat Kapila

Website · Twitter · Email
