Merged
44 changes: 44 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,44 @@
name: CI

on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- run: pip install ruff
- run: ruff check packages/core/verbatim_core/ verbatim_rag/ api/ tests/
- run: ruff format --check packages/core/verbatim_core/ verbatim_rag/ api/ tests/

test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- run: pip install pytest pytest-asyncio
- run: pip install -e packages/core/
- run: pytest tests/ -v

pip-audit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- run: pip install pip-audit
- run: pip install packages/core/
- run: pip-audit
25 changes: 25 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,25 @@
name: Deploy Docs

on:
push:
branches: [main]
paths:
- "docs/**"
- "mkdocs.yml"
- "packages/core/**"
workflow_dispatch:

permissions:
contents: write

jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- run: pip install mkdocs-material "mkdocstrings[python]>=0.24"
- run: pip install -e packages/core/
- run: mkdocs gh-deploy --force
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,40 @@
# Changelog

All notable changes to this project will be documented in this file.

## [0.2.0] - 2026-03-10

### Added
- **verbatim-core** package: lean, separately installable extraction core (openai + pydantic only)
- Optional `[model]` extra for verbatim-core: `pip install verbatim-core[model]` for ModernBERT/Zilliz extractors
- CI pipeline: linting (ruff), tests (Python 3.10-3.12), pip-audit
- MkDocs Material documentation site with API reference
- 88 unit tests covering models, response builder, LLM client, extractors, templates, and transform
- CONTRIBUTING.md

### Changed
- Moved `verbatim_core/` into `packages/core/` (standard Python monorepo layout)
- `verbatim-rag` now depends on `verbatim-core` (transitive dependency)
- Lazy imports for numpy, torch, and transformers in verbatim_core to keep lean install clean
- Import sorting and linting fixes across verbatim_core, verbatim_rag, api, and tests
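The lazy-import approach mentioned above can be sketched as follows. This is a hypothetical helper, not code from the repo: heavy optional dependencies are imported only at the point of use, so a plain `pip install verbatim-core` stays lean and the `[model]` extra is required only when a model-backed extractor is actually constructed.

```python
import importlib


def require(module_name: str, extra: str):
    # Import a heavy optional dependency (e.g. torch, transformers)
    # only when it is actually needed, with a helpful install hint.
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is not installed; "
            f"install it with: pip install verbatim-core[{extra}]"
        ) from exc
```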

### Removed
- Old test files that required heavy dependencies (torch, milvus)

## [0.1.9] - 2026-02-28

### Added
- `max_tokens` parameter to `SemanticHighlightExtractor`
- Flag to return search results along with RAG response
- `verify_spans` parameter to `LLMSpanExtractor`

## [0.1.8] - 2026-01-15

### Added
- `SemanticHighlightExtractor` using Zilliz semantic-highlight model
- Reranker support (Cohere, Jina, SentenceTransformers)
- Intent detection for query routing
- `VerbatimDOC` for document-level processing
- `UniversalDocument` for unified document representation
- Hybrid search (dense + sparse) support
- SPLADE embedding provider for CPU-only operation
65 changes: 65 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,65 @@
# Contributing to Verbatim RAG

Thanks for your interest in contributing!

## Development Setup

```bash
git clone https://github.com/KRLabsOrg/verbatim-rag.git
cd verbatim-rag
pip install -e packages/core/
pip install -e ".[dev]"
```

## Project Structure

```
packages/core/verbatim_core/ # Lean extraction package (verbatim-core on PyPI)
verbatim_rag/ # Full RAG pipeline (verbatim-rag on PyPI)
api/ # FastAPI server
frontend/ # React UI
tests/ # Tests (run against verbatim-core only)
docs/ # MkDocs documentation
```

## Running Tests

```bash
pytest tests/ -v
```

Tests only depend on `verbatim-core` (openai + pydantic). All LLM calls are mocked.
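A minimal sketch of how an LLM call can be mocked in a test, using only the standard library. The `complete` method name is illustrative and may not match the actual verbatim-core client interface:

```python
from unittest.mock import MagicMock


def fake_llm_client():
    # Stand-in for the real LLM client: no network calls,
    # returns a canned answer and records how it was called.
    client = MagicMock()
    client.complete.return_value = "mocked answer"
    return client


def test_uses_mocked_client():
    client = fake_llm_client()
    assert client.complete("What is X?") == "mocked answer"
    client.complete.assert_called_once_with("What is X?")
```

This keeps the test suite fast and runnable in CI without API keys.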

## Linting

```bash
ruff check packages/core/verbatim_core/ verbatim_rag/ api/ tests/
ruff format packages/core/verbatim_core/ verbatim_rag/ api/ tests/
```

## Making Changes

1. Fork the repo and create a branch from `main`
2. Make your changes
3. Run tests and linting
4. Open a pull request against `main`

CI runs lint, tests (Python 3.10-3.12), and pip-audit on all PRs.

## Releasing

Both packages share the same version number. To release:

1. Bump version in `packages/core/pyproject.toml` and `pyproject.toml`
2. Update `CHANGELOG.md`
3. Commit, tag, and push: `git tag v0.x.y && git push --tags`
4. Create a GitHub release from the tag
5. Publish to PyPI (`verbatim-core` first, then `verbatim-rag`):
```bash
cd packages/core && python -m build && twine upload dist/*
cd ../.. && python -m build && twine upload dist/*
```
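Since both packages must share the same version, a quick sanity check before tagging can be sketched like this (a hypothetical helper, not part of the repo):

```python
import re


def read_version(pyproject_text: str) -> str:
    # Extract the version = "x.y.z" field from a pyproject.toml string.
    m = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, re.MULTILINE)
    if not m:
        raise ValueError("no version field found")
    return m.group(1)


def versions_match(core_pyproject: str, root_pyproject: str) -> bool:
    # True if packages/core/pyproject.toml and the root pyproject.toml
    # declare the same version.
    return read_version(core_pyproject) == read_version(root_pyproject)
```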

## License

By contributing, you agree that your contributions will be licensed under the MIT License.
33 changes: 32 additions & 1 deletion README.md
@@ -29,6 +29,37 @@ With this approach, **the whole RAG pipeline can be run without any usage of LLM
pip install verbatim-rag
```

For local development:

```bash
pip install -e packages/core/
pip install -e .
```

## Lightweight Core

If you only need the reusable verbatim core without the full RAG pipeline (no torch, transformers, or Milvus):

```bash
pip install verbatim-core
```

```python
from verbatim_core import VerbatimTransform

vt = VerbatimTransform()
response = vt.transform(
question="What is the main finding?",
context=[
{"content": "The study found that X leads to Y.", "title": "Paper A"},
{"content": "Results show Z is significant.", "title": "Paper B"},
],
)
print(response.answer)
```

Dependencies: only `openai` and `pydantic`.

## Quick Start

```python
@@ -200,4 +231,4 @@ If you use Verbatim RAG in your research, please cite our paper:
ISBN = "979-8-89176-276-3",
abstract = "We present a lightweight, domain{-}agnostic verbatim pipeline for evidence{-}grounded question answering. Our pipeline operates in two steps: first, a sentence-level extractor flags relevant note sentences using either zero-shot LLM prompts or supervised ModernBERT classifiers. Next, an LLM drafts a question-specific template, which is filled verbatim with sentences from the extraction step. This prevents hallucinations and ensures traceability. In the ArchEHR{-}QA 2025 shared task, our system scored 42.01{\%}, ranking top{-}10 in core metrics and outperforming the organiser{'}s 70B{-}parameter Llama{-}3.3 baseline. We publicly release our code and inference scripts under an MIT license."
}
```
```
25 changes: 9 additions & 16 deletions api/app.py
@@ -7,9 +7,9 @@
import os
import sys
from contextlib import asynccontextmanager
from typing import Annotated, Optional, Any
from typing import Annotated, Any, Optional

from fastapi import FastAPI, HTTPException, Depends
from fastapi import Depends, FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse as FastAPIStreamingResponse
from pydantic import BaseModel, Field
@@ -22,22 +22,21 @@

try:
from verbatim_rag import (
QueryRequest,
QueryResponse,
VerbatimRAG,
TemplateManager,
StreamingRAG,
TemplateManager,
VerbatimRAG,
)
except ImportError as e:
print(f"Error importing verbatim_rag: {e}")
sys.exit(1)

from api.config import APIConfig, get_config
from api.dependencies import (
check_system_ready,
get_api_service,
get_rag_instance,
get_template_manager,
check_system_ready,
)
from api.services.rag_service import APIService

@@ -162,9 +161,7 @@ async def get_documents(
)
else:
# Fallback: return a message indicating documents are indexed but not retrievable
logger.info(
"Documents are indexed but document metadata retrieval not implemented"
)
logger.info("Documents are indexed but document metadata retrieval not implemented")

return {"documents": documents}
else:
@@ -279,7 +276,7 @@ async def verbatim_transform_endpoint(request: VerbatimTransformRequest):


@app.post("/api/query/async", response_model=QueryResponse)
async def query_async_endpoint(
async def query_async_slash_endpoint(
request: QueryRequestModel,
api_service: Annotated[APIService, Depends(get_api_service)],
_: Annotated[bool, Depends(check_system_ready)],
@@ -369,9 +366,7 @@ async def generate_clean_response():
search_params=request.search_params,
):
stage_count += 1
logger.info(
f"Yielding stage {stage_count}: {stage.get('type', 'unknown')}"
)
logger.info(f"Yielding stage {stage_count}: {stage.get('type', 'unknown')}")
yield json.dumps(stage) + "\n"

if stage_count == 0:
@@ -392,9 +387,7 @@
import traceback

traceback.print_exc()
yield (
json.dumps({"type": "error", "error": str(e), "done": True}) + "\n"
)
yield (json.dumps({"type": "error", "error": str(e), "done": True}) + "\n")

# Return streaming response with proper headers
return FastAPIStreamingResponse(
7 changes: 3 additions & 4 deletions api/config.py
@@ -3,8 +3,9 @@
"""

from pathlib import Path
from pydantic_settings import BaseSettings

from pydantic import Field
from pydantic_settings import BaseSettings


class APIConfig(BaseSettings):
@@ -16,9 +17,7 @@ class APIConfig(BaseSettings):
debug: bool = Field(default=False, env="API_DEBUG")

# CORS configuration
cors_origins: list[str] = Field(
default=["http://localhost:3000"], env="CORS_ORIGINS"
)
cors_origins: list[str] = Field(default=["http://localhost:3000"], env="CORS_ORIGINS")
cors_allow_credentials: bool = Field(default=True, env="CORS_ALLOW_CREDENTIALS")

# RAG system paths
26 changes: 12 additions & 14 deletions api/dependencies.py
@@ -4,13 +4,13 @@

import logging
from typing import Annotated

from fastapi import Depends, HTTPException
from verbatim_rag.core import VerbatimRAG
from verbatim_core.templates import TemplateManager
from verbatim_rag.core import LLMClient

from api.config import APIConfig, get_config
from api.services.rag_service import APIService
from verbatim_rag.core import LLMClient, VerbatimRAG

logger = logging.getLogger(__name__)

@@ -27,33 +27,31 @@ def get_rag_instance(config: Annotated[APIConfig, Depends(get_config)]) -> Verba

if _rag_instance is None:
try:
from verbatim_rag.embedding_providers import SentenceTransformersProvider
from verbatim_rag.index import VerbatimIndex
from verbatim_rag.vector_stores import LocalMilvusStore
from verbatim_rag.embedding_providers import SpladeProvider

llm_client = LLMClient(
model="gpt-4o-mini",
temperature=1.0,
model="moonshotai/kimi-k2-instruct-0905",
api_base="https://api.groq.com/openai/v1/",
)

# Create providers
sparse_provider = SpladeProvider(
model_name="opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill",
dense_provider = SentenceTransformersProvider(
model_name="ibm-granite/granite-embedding-small-english-r2",
device="cpu",
)

# Create vector store
vector_store = LocalMilvusStore(
db_path=str(config.index_path),
collection_name="verbatim_rag",
enable_dense=False,
enable_sparse=True,
collection_name="acl",
enable_dense=True,
enable_sparse=False,
dense_dim=dense_provider.get_dimension(),
)

# Create index
index = VerbatimIndex(
vector_store=vector_store, sparse_provider=sparse_provider
)
index = VerbatimIndex(vector_store=vector_store, dense_provider=dense_provider)

# Create RAG instance with the index
_rag_instance = VerbatimRAG(