# AI‑Powered Administrative Document Intelligence (OCR · NER · Layout Understanding · Structured Extraction)
AdminDoc‑X is an end‑to‑end document intelligence platform that understands administrative documents: it reads scans, detects layout, extracts structured fields, and exposes them through a simple API and a modern landing page.
This repository contains both:
- Backend – Flask + LayoutLMv3 + OCR for document classification & field extraction
- Frontend – React + TypeScript + Vite landing page to showcase the platform
- **Real‑world AI project** combining:
  - OCR (Tesseract) + image preprocessing
  - Fine‑tuned LayoutLMv3 for NER on administrative documents
  - Document type classification + structured field extraction
- **Production‑style API:** Flask REST API with CORS, ready to be consumed by any client
- **Modern frontend stack:** React + TypeScript + Vite + Tailwind CSS + shadcn/ui
- **Clean architecture:**
  - Clear separation between frontend and backend
  - Trainable model pipeline with dataset preparation and evaluation
- **Recruiter‑friendly:**
  - Demonstrates ML, backend, and frontend skills in one cohesive project
  - Shows experience with MLOps‑style workflows (training, evaluation, inference)
Given a scanned administrative document (e.g., review form, report, official form):
1. **OCR & Layout Analysis**
   - Preprocesses the image (OpenCV)
   - Runs Tesseract OCR for text extraction and bounding boxes

2. **NER with LayoutLMv3**
   - Uses a fine‑tuned LayoutLMv3 model to detect entities such as:
     - Dates
     - Authors / people
     - Titles
     - Reference / registration numbers
     - Recommendations & comments

3. **Document Structuring**
   - Predicts the document type
   - Returns a structured JSON with:
     - Key fields (registration_number, date, authors, title, recommendation, …)
     - Raw OCR preview
     - Processing metadata (model used, timestamp, etc.)

4. **Frontend Experience**
   - Landing page explaining the pipeline & use cases
   - Interactive sections: hero animation, pipeline visualization, before/after, tech stack, demo section
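The OCR step above hands LayoutLMv3 word boxes in pixel coordinates, but the LayoutLM model family expects boxes normalized to a 0–1000 grid. A minimal sketch of that conversion (the helper name is hypothetical; the repo's preprocessing in `ocr_llm_extractor.py` may already handle this):

```python
def normalize_bbox(bbox, width, height):
    """Scale a pixel-space box (x1, y1, x2, y2) to LayoutLMv3's 0-1000 grid."""
    x1, y1, x2, y2 = bbox
    return [
        int(1000 * x1 / width),
        int(1000 * y1 / height),
        int(1000 * x2 / width),
        int(1000 * y2 / height),
    ]

# Example: a word box on a 1240x1754 pixel scan (A4 at 150 DPI)
print(normalize_bbox((124, 175, 620, 210), 1240, 1754))  # → [100, 99, 500, 119]
```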
```
AdminDoc-X/
├── frontend/                  # React + TS + Vite landing page
│   ├── public/
│   ├── src/
│   │   ├── components/
│   │   │   ├── ui/            # shadcn/ui primitives
│   │   │   ├── BeforeAfterSection.tsx
│   │   │   ├── DemoSection.tsx
│   │   │   ├── FeaturesSection.tsx
│   │   │   ├── FloatingDocuments.tsx
│   │   │   ├── Footer.tsx
│   │   │   ├── HeroSection.tsx
│   │   │   ├── Navbar.tsx
│   │   │   ├── NavLink.tsx
│   │   │   ├── PipelineSection.tsx
│   │   │   ├── TechStackSection.tsx
│   │   │   └── UseCasesSection.tsx
│   │   ├── hooks/
│   │   │   ├── use-mobile.tsx
│   │   │   └── use-toast.ts
│   │   ├── lib/
│   │   │   └── utils.ts
│   │   ├── pages/
│   │   │   ├── Index.tsx
│   │   │   └── NotFound.tsx
│   │   ├── App.tsx
│   │   ├── main.tsx
│   │   └── index.css
│   ├── components.json
│   ├── tailwind.config.ts
│   ├── tsconfig.json
│   └── vite.config.ts
│
└── backend/                   # Flask API + LayoutLMv3 + OCR
    ├── api.py                 # Flask REST API
    ├── model.py               # LayoutLMv3 NER inference pipeline
    ├── ocr_llm_extractor.py   # OCR + preprocessing
    ├── train.py               # Model training script
    ├── prepare_dataset.py     # Dataset preparation utilities
    ├── train_data.jsonl       # Training data (sample format)
    ├── dataset/
    │   ├── training_data/
    │   │   ├── images/
    │   │   └── annotations/
    │   └── testing_data/
    │       ├── images/
    │       └── annotations/
    ├── models/                # Trained weights (excluded from git)
    ├── uploads/               # Temporary file storage
    ├── results_simple.json    # Evaluation metrics
    ├── results_improved.json
    └── results_final.json
```

> **Note:** Large model weights are excluded via `.gitignore` (`models/`, `uploads/`, `*.pt`, `*.pth`, `*.safetensors`, etc.).
**Backend**

- Python, Flask, Flask‑CORS
- PyTorch, Transformers (LayoutLMv3), datasets
- Tesseract OCR (`pytesseract`)
- OpenCV, Pillow, NumPy
- Optional CUDA acceleration for faster inference

**Frontend**

- React (TypeScript)
- Vite (bundler / dev server)
- Tailwind CSS
- shadcn/ui + Radix UI (accessible UI primitives)
- Lucide React (icons)
- React Hook Form, Zustand, TanStack Query, React Router
- Utility libraries: `clsx`, `class-variance-authority`, `date-fns`
You can run backend and frontend separately.
- Python 3.8+
- Tesseract OCR
- (Optional) CUDA‑compatible GPU + CUDA drivers
**Windows**

```bash
# Download and install:
# https://github.com/UB-Mannheim/tesseract/wiki
# Then note the path, e.g.:
# C:\Program Files\Tesseract-OCR\tesseract.exe
```

**Ubuntu / Debian**

```bash
sudo apt-get update
sudo apt-get install -y tesseract-ocr
```

**macOS (Homebrew)**

```bash
brew install tesseract
```

Then install the Python dependencies:

```bash
cd backend

# Optionally create a virtual environment
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

# Core dependencies
pip install flask flask-cors pillow pytesseract opencv-python numpy

# PyTorch (adjust the CUDA version if needed)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Transformers, datasets, date parsing
pip install transformers datasets dateparser
```

If Tesseract is not on your `PATH`, set the path in `ocr_llm_extractor.py` and `model.py`:
```python
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"PATH_TO_TESSERACT_EXECUTABLE"
# Example (Windows):
# r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```

You can:

- Train your own model using the provided scripts, or
- Place pre‑trained weights in `backend/models/` and update the path in `model.py`.
Training from your dataset:

1. **Prepare the dataset:**

   ```bash
   cd backend
   python prepare_dataset.py
   ```

   Each line of your JSONL should look like:

   ```
   { "image": "path/to/image.png", "tokens": ["word1", "word2", "..."], "bboxes": [[x1, y1, x2, y2], "..."], "ner_tags": ["O", "B-DATE", "I-DATE", "..."] }
   ```
2. **Train the LayoutLMv3 model:**

   ```bash
   python train.py
   ```

   Default configuration (can be changed inside `train.py`):

   - Model: `microsoft/layoutlmv3-base`
   - Batch size: 2
   - Learning rate: `5e-5`
   - Epochs: 10
   - Output: `models/layoutlmv3_trained/`

3. **Evaluate results** – check `results_simple.json`, `results_improved.json`, and `results_final.json`.
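Before training, it is worth sanity-checking each JSONL record against the format shown in step 1. A quick validation sketch (the function name is hypothetical; `prepare_dataset.py` may already enforce this):

```python
import json

REQUIRED_KEYS = {"image", "tokens", "bboxes", "ner_tags"}

def check_record(line: str) -> bool:
    """Return True if a JSONL line has the expected keys and aligned lengths."""
    rec = json.loads(line)
    if not REQUIRED_KEYS <= rec.keys():
        return False
    # tokens, bboxes, and ner_tags must align one-to-one
    return len(rec["tokens"]) == len(rec["bboxes"]) == len(rec["ner_tags"])

sample = ('{"image": "a.png", "tokens": ["Date:", "15/03/2024"], '
          '"bboxes": [[10, 10, 50, 20], [55, 10, 120, 20]], '
          '"ner_tags": ["O", "B-DATE"]}')
print(check_record(sample))  # → True
```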
```bash
cd backend
python api.py
```

By default, the server runs at `http://localhost:5000`.
- Node.js v18+
- `npm` (comes with Node) or `bun`
```bash
cd frontend

# using npm
npm install

# or using bun
bun install
```

Create a `.env` inside `frontend/` if you want to call the backend API from the UI:

```
VITE_API_URL=http://localhost:5000
```

Use it in code:

```ts
const apiUrl = import.meta.env.VITE_API_URL;
```

Start the dev server:

```bash
cd frontend

# with npm
npm run dev

# or with bun
bun dev
```

The app will be available at `http://localhost:5173`.
Once the backend is running on http://localhost:5000:
**`POST /process`**
Uploads a document image and returns classification, fields, and OCR preview.
Example (cURL):
```bash
curl -X POST http://localhost:5000/process \
  -F "file=@document.png"
```

Response:

```json
{
  "document_type": "scientific_review_form",
  "confidence": 0.92,
  "fields": {
    "registration_number": "REF-2024-001",
    "date": "2024-03-15",
    "authors": ["Dr. Smith", "Prof. Johnson"],
    "title": "Research Paper Title",
    "recommendation": "Accept with minor revisions",
    "suggested_revision": "Improve methodology section"
  },
  "raw_ocr_preview": "Full OCR text...",
  "processing_info": {
    "ocr_processing": true,
    "model_used": "layoutlmv3",
    "timestamp": "2024-03-15T10:30:00"
  }
}
```

Supported entity types:

| Entity Type | Description | Example |
|---|---|---|
| `B-DATE` / `I-DATE` | Dates | 15/03/2024 |
| `B-PERSON` / `I-PERSON` | Names | Dr. John Smith |
| `B-TITLE` / `I-TITLE` | Document titles | Annual Report |
| `B-REF` / `I-REF` | Reference IDs | REF-2024-001 |
| `B-REC` / `I-REC` | Recommendations | Approved |
| `O` | Other tokens | – |
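These BIO tags are turned back into field values by grouping each `B-` token with the `I-` tokens that follow it. A minimal decoding sketch (the function name is hypothetical; `model.py` presumably does something equivalent):

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    entities, current_type, current_words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(token)
        else:  # "O", or an I- tag with no matching B-
            if current_type:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_type:
        entities.append((current_type, " ".join(current_words)))
    return entities

tokens = ["Reviewed", "by", "Dr.", "John", "Smith", "on", "15/03/2024"]
tags = ["O", "O", "B-PERSON", "I-PERSON", "I-PERSON", "O", "B-DATE"]
print(decode_bio(tokens, tags))
# → [('PERSON', 'Dr. John Smith'), ('DATE', '15/03/2024')]
```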
The frontend showcases AdminDoc‑X through:
- Hero Section with floating documents animation
- Pipeline Visualization explaining OCR → NER → structured output
- Feature Sections describing core capabilities
- Interactive Demo Section (optional wiring to the `/process` endpoint)
- Use Case Section for administrative & enterprise scenarios
- Tech Stack Overview
- Before / After comparison of raw scans vs. structured JSON
- Responsive Design with dark mode support
| Command | Description |
|---|---|
| `npm run dev` | Start dev server with hot reload |
| `npm run build` | Production build |
| `npm run build:dev` | Development‑mode build (if configured) |
| `npm run lint` | Run ESLint |
| `npm run preview` | Preview production build |
Typical workflows:

```bash
# Start API
python api.py

# Prepare dataset
python prepare_dataset.py

# Train model
python train.py
```

Contributions are welcome!
1. Fork the repository
2. Create your feature branch: `git checkout -b feature/amazing-feature`
3. Commit your changes: `git commit -m "Add amazing feature"`
4. Push to your branch: `git push origin feature/amazing-feature`
5. Open a Pull Request
- Keep components and modules small and focused
- Follow existing TypeScript, Python, and Tailwind patterns
- Update documentation if you change behavior
- Add tests where applicable
- Ensure all checks (lint / tests) pass before submitting
- Aya Mekni
- Tasnim Mtir
- Ikram Menyaoui
- Nour Saibi
- Microsoft LayoutLMv3
- Tesseract OCR
- Hugging Face Transformers
- Vite
- shadcn/ui
- Lucide Icons
- Tailwind CSS
Specify your license here, for example:
This project is licensed under the MIT License – see the LICENSE file for details.
⭐ If you find this project interesting or useful, please consider starring the repository on GitHub.