A modern, AI-powered platform for uploading, searching, classifying, and analyzing documents in the cloud.
Try it online now: [Cloud Document Analytics Platform](https://cloud-based-document-analytics-serv.vercel.app)
- Features Overview
- Feature Comparison Table
- Screenshots
- How It Works
- Installation & Setup
- Usage Guide
- Tech Stack
- Security
- Contributing
- License
- Credits
- Project Requirements & Approach
- Algorithms & Platform Choices
- Documentation & Reporting
- Drag-and-Drop Upload: Upload PDF, DOC, DOCX, TXT, and more with a modern interface
- Web Scraping: Extract and analyze content directly from web pages via URL
- AI-Powered Classification: Automatic, explainable document categorization with multiple algorithms (AI, rule-based, hybrid)
- Advanced Search: Full-text, fuzzy, and filtered search with instant results
- Analytics Dashboard: Visualize document types, upload trends, and category distributions
- Secure Cloud Storage: All files and metadata stored securely with Supabase
- User Authentication: Secure sign-up, login, and access control
- Responsive UI: Works beautifully on desktop and mobile
- Download & Delete: Manage your documents with ease
- Explainable AI: See why documents are classified a certain way
- Category Tree: Hierarchical classification for academic, technical, business, and legal documents
- Real-Time Feedback: Toasts and progress indicators for all actions
- Persistent Stats: Track your document stats over time
- Role-Based Access: Admin and user roles supported
- Live Demo: Always-available online version for instant access
| Feature | Local Version | Online Demo | Description |
|---|---|---|---|
| Drag-and-Drop Upload | ✅ | ✅ | Upload documents from your device |
| Web Scraping (URL Import) | ✅ | ✅ | Import and analyze web pages |
| AI Classification | ✅ | ✅ | Automatic, explainable document categorization |
| Advanced Search | ✅ | ✅ | Full-text, fuzzy, and filtered search |
| Analytics Dashboard | ✅ | ✅ | Visualize document types, trends, and categories |
| Secure Cloud Storage | ✅ | ✅ | Files and metadata stored in Supabase |
| User Authentication | ✅ | ✅ | Sign up, login, and access control |
| Download & Delete | ✅ | ✅ | Manage your documents |
| Explainable AI | ✅ | ✅ | See classification confidence and rationale |
| Category Tree | ✅ | ✅ | Hierarchical document classification |
| Real-Time Feedback | ✅ | ✅ | Toasts, progress bars, and instant updates |
| Persistent Stats | ✅ | ✅ | Track document stats over time |
| Role-Based Access | ✅ | ✅ | Admin and user roles |
| Mobile Responsive | ✅ | ✅ | Works on all devices |
| Live Demo | ✅ | ✅ | No setup required, use instantly online |
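The category tree from the table can be pictured as a small nested structure. The sketch below is illustrative only: the `CategoryNode` shape, the `findCategoryPath` helper, and every subcategory name are assumptions, and only the four top-level labels come from this README.

```typescript
// Hypothetical shape for one node of the category tree.
interface CategoryNode {
  label: string;
  children: CategoryNode[];
}

// Top-level labels come from this README; the subcategories are invented examples.
const categoryTree: CategoryNode[] = [
  { label: "Academic", children: [{ label: "Research Paper", children: [] }] },
  { label: "Technical", children: [{ label: "API Reference", children: [] }] },
  { label: "Business", children: [{ label: "Invoice", children: [] }] },
  { label: "Legal", children: [{ label: "Contract", children: [] }] },
];

// Depth-first search returning the path from the root to `target`,
// or null when the label is not in the tree.
function findCategoryPath(nodes: CategoryNode[], target: string): string[] | null {
  for (const node of nodes) {
    if (node.label === target) return [node.label];
    const below = findCategoryPath(node.children, target);
    if (below !== null) return [node.label, ...below];
  }
  return null;
}

console.log(findCategoryPath(categoryTree, "Contract")); // ["Legal", "Contract"]
```

Returning the full path rather than just the leaf makes it easy to show a breadcrumb such as "Legal / Contract" next to a classified document.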
- Sign Up & Login:
  - Secure authentication with Supabase Auth
  - Role-based access for users and admins
- Upload Documents:
  - Drag and drop files or select from your device
  - Supported formats: PDF, DOC, DOCX, TXT, XLSX, PPTX, images, and more
  - Optionally, enter a URL to scrape and analyze web content
- AI Classification:
  - Choose your preferred classification method (AI, rule-based, hybrid)
  - Documents are categorized into Academic, Technical, Business, Legal, and subcategories
  - Confidence scores and algorithm details are shown
- Search & Filter:
  - Use the search bar to find documents by content, title, or metadata
  - Apply filters by type, category, or upload date
  - Sort results and view document details
- Analytics Dashboard:
  - Visualize your document collection with bar, pie, and line charts
  - See trends, type distributions, and category breakdowns
- Manage Documents:
  - Download, delete, or view details for each document
  - Real-time feedback for all actions
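The filter and sort step described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the `DocRecord` and `DocFilter` shapes and their field names are assumptions made for the example.

```typescript
// Hypothetical metadata record; field names are assumptions, not the project's schema.
interface DocRecord {
  title: string;
  fileType: string;
  category: string;
  uploadedAt: string; // ISO 8601 date, e.g. "2025-06-01"
}

// Every field is optional; an unset field means "do not filter on this".
interface DocFilter {
  fileType?: string;
  category?: string;
  uploadedFrom?: string; // inclusive lower bound, ISO 8601 date
}

// Keep records matching every filter that is set, then sort by title.
// filter() returns a new array, so the input is not mutated by sort().
function filterAndSort(docs: DocRecord[], filter: DocFilter): DocRecord[] {
  return docs
    .filter((d) => filter.fileType === undefined || d.fileType === filter.fileType)
    .filter((d) => filter.category === undefined || d.category === filter.category)
    // ISO 8601 dates in the same format compare correctly as plain strings.
    .filter((d) => filter.uploadedFrom === undefined || d.uploadedAt >= filter.uploadedFrom)
    .sort((a, b) => a.title.localeCompare(b.title));
}

const sample: DocRecord[] = [
  { title: "Service Agreement", fileType: "pdf", category: "Legal", uploadedAt: "2025-05-20" },
  { title: "Annual Report", fileType: "pdf", category: "Business", uploadedAt: "2025-06-02" },
  { title: "API Notes", fileType: "txt", category: "Technical", uploadedAt: "2025-06-03" },
];

console.log(filterAndSort(sample, { fileType: "pdf" }).map((d) => d.title));
// ["Annual Report", "Service Agreement"]
```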
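- Clone the repository: `git clone git@github.com:Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service.git`
- Enter the project directory: `cd Cloud-Based-Document-Analytics-Service`
- Install dependencies: `npm install`
- Copy `.env.example` to `.env` and fill in your Supabase credentials
- Start the development server: `npm run dev`
- Open http://localhost:5173 in your browser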
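A missing Supabase credential is a common reason for a blank page after setup, so a small startup check can help. The sketch below assumes the variable names `VITE_SUPABASE_URL` and `VITE_SUPABASE_ANON_KEY`; these are a common convention for Vite projects, not names confirmed by this README, so check `.env.example` for the ones the project actually uses. `missingEnvKeys` is a hypothetical helper.

```typescript
// Assumed variable names; check .env.example for the names the project actually uses.
const REQUIRED_ENV_KEYS = ["VITE_SUPABASE_URL", "VITE_SUPABASE_ANON_KEY"];

// Return the required keys that are absent or blank in `env`.
function missingEnvKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// In a Vite app the argument would be import.meta.env; a plain object is used here
// so the example runs anywhere.
const missing = missingEnvKeys({ VITE_SUPABASE_URL: "https://example.supabase.co" });
console.log(missing); // ["VITE_SUPABASE_ANON_KEY"]
```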
- Create an account or log in securely
- Drag and drop files or select from your device
- Optionally, enter a URL to scrape and analyze web content
- Choose your preferred classification method (AI, rule-based, hybrid)
- Use the search bar to find documents by content, title, or metadata
- Apply filters by type, category, or upload date
- Browse your documents in a sortable, filterable list
- Download, delete, or view details for each document
- See automatic document categorization with confidence scores
- Explore analytics: document type distribution, upload trends, and more
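The type distribution and upload trend shown in the analytics view boil down to counting records by a key. The following is a rough sketch under assumed names (`UploadRecord`, `countBy`); how the dashboard actually computes its charts is not stated here.

```typescript
// Hypothetical upload record; the real schema may differ.
interface UploadRecord {
  fileType: string;
  uploadedAt: string; // ISO 8601 date, e.g. "2025-06-01"
}

// Count how many records share each key produced by `keyOf`.
function countBy(
  records: UploadRecord[],
  keyOf: (record: UploadRecord) => string,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    const key = keyOf(record);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const uploads: UploadRecord[] = [
  { fileType: "pdf", uploadedAt: "2025-05-28" },
  { fileType: "docx", uploadedAt: "2025-06-02" },
  { fileType: "pdf", uploadedAt: "2025-06-05" },
];

// Type distribution, e.g. for a pie chart.
const byType = countBy(uploads, (r) => r.fileType);
// Uploads per month ("YYYY-MM"), e.g. for a line chart.
const byMonth = countBy(uploads, (r) => r.uploadedAt.slice(0, 7));

console.log(byType.get("pdf"));      // 2
console.log(byMonth.get("2025-06")); // 2
```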
| Layer | Technology/Service |
|---|---|
| Frontend | React, TypeScript, Vite |
| UI | Shadcn/UI, Lucide Icons, Tailwind |
| Backend | Supabase (Postgres, Auth, Storage) |
| AI/ML | Custom & hybrid classification |
| Deployment | Vercel (CI/CD, CDN, Analytics) |
- All data is protected with Supabase Auth and Row Level Security
- Files are stored in user-specific buckets for privacy
- Environment variables are required for all sensitive credentials
- HTTPS enforced on the online demo
- User actions are logged for auditability
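One common way to keep files separated per user is to prefix each storage path with the owner's user ID, so that an access policy can compare the prefix against the authenticated user. The sketch below illustrates that idea only; `userObjectPath` and `ownsPath` are hypothetical helpers, and the project's actual bucket layout and policies are not shown in this README.

```typescript
// Build a storage path of the form "<userId>/<fileName>".
function userObjectPath(userId: string, fileName: string): string {
  // Drop directory components so a name like "../other/x.pdf" cannot escape the prefix.
  const baseName = fileName.split(/[\\/]/).pop() ?? "";
  if (userId === "" || baseName === "" || baseName === "." || baseName === "..") {
    throw new Error("invalid userId or fileName");
  }
  return `${userId}/${baseName}`;
}

// True when `path` sits under the given user's prefix.
// The trailing slash keeps "user-1" from matching "user-10/...".
function ownsPath(userId: string, path: string): boolean {
  return userId !== "" && path.indexOf(`${userId}/`) === 0;
}

console.log(userObjectPath("user-1", "../other/report.pdf")); // user-1/report.pdf
console.log(ownsPath("user-1", "user-10/report.pdf"));        // false
```

A client-side check like this is only a convenience; the enforcement itself belongs in the server-side policies.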
- Fork the repo and create your branch
- Make your changes and add tests if needed
- Open a pull request with a clear description
- Follow the code style and best practices
MIT License. See LICENSE for details.
- Developed by Yousef M. Y. Al Sabbah
- Islamic University of Gaza - Faculty of Information Technology
This project was developed as a cloud-based program for basic data analytics, document search, sorting, and classification. Below is a summary of the requirements and how they are addressed in this platform:
| Requirement | How It Is Addressed |
|---|---|
| Collect a large number of PDF/Word documents | Upload via drag-and-drop, file picker, or web scraping from URLs. |
| Store documents in the cloud | Uses Supabase for secure, scalable cloud storage and database. |
| Update collection anytime | Upload new documents or scrape new sources at any time via the interface. |
| Sort documents by title (extracted from document, not filename) | Title extraction from document content; sorting and filtering in the UI. |
| Search documents for text/keywords | Full-text and fuzzy search with instant results; highlights found keywords in document previews. |
| Highlight search text in output documents | Search results show highlighted keywords in context. |
| Classify documents by a predefined tree using any algorithm | Hierarchical classification tree (Academic, Technical, Business, Legal, etc.) with AI, rule-based, or hybrid methods. |
| Provide statistics (size, number, search/sort/classify time, etc.) | Analytics dashboard shows document count, size, upload trends, and operation timings. |
| Use any programming language and cloud platform | Built with React/TypeScript (frontend), Supabase (cloud backend), Vercel (deployment). |
| Well-documented, readable, and maintainable source code | Modular, commented codebase; clear folder structure; usage and contribution guides in README. |
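| GitHub repository and cloud program link | [GitHub Source Code](https://github.com/Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service) and [Live Demo](https://cloud-based-document-analytics-serv.vercel.app) |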
| Write a report describing algorithms, platform, and usage | See below for a summary of algorithms and platform choices. |
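The keyword highlighting requirement in the table can be illustrated with a short helper. This is a sketch under assumptions: the `<mark>` wrapper and the function names are chosen for the example, and the app may render highlights differently.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every case-insensitive occurrence of `keyword` in <mark> tags.
// The <mark> wrapper is an assumption; the app may render highlights differently.
function highlightKeyword(text: string, keyword: string): string {
  if (keyword.trim() === "") return text;
  const pattern = new RegExp(escapeRegExp(keyword), "gi");
  return text.replace(pattern, (match) => `<mark>${match}</mark>`);
}

console.log(highlightKeyword("Cloud storage keeps cloud costs low.", "cloud"));
// <mark>Cloud</mark> storage keeps <mark>cloud</mark> costs low.
```

The helper does not HTML-escape its input, so document text should be escaped before the result is rendered as HTML.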
- Title Extraction:
  - Extracts the actual document title from PDF/Word content using custom parsing utilities.
- Sorting:
  - Sorts documents by extracted title, not just filename, for more meaningful organization.
- Search:
  - Supports keyword, phrase, and fuzzy search. Highlights found terms in document previews.
- Classification:
  - Uses a hybrid approach: combines AI/ML (e.g., text embeddings, TF-IDF) with rule-based logic for robust, explainable classification.
  - Classification tree includes Academic, Technical, Business, Legal, and their subcategories.
- Analytics:
  - Tracks and displays statistics: document count, total size, upload/search/classification times, and trends over time.
- Cloud Platform:
  - Supabase for authentication, storage, and database; Vercel for deployment and CDN; React/TypeScript for frontend.
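To make the hybrid classification idea concrete, here is a minimal sketch that blends a rule-based keyword score with a term-frequency cosine similarity. The cosine part is a deliberately simple stand-in for the TF-IDF or embedding component, and the profiles, keywords, reference texts, and `ruleWeight` parameter are all assumptions made for the example rather than the project's actual model.

```typescript
// Hypothetical category profile: rule keywords plus a short reference text.
interface CategoryProfile {
  label: string;
  keywords: string[];
  reference: string;
}

const profiles: CategoryProfile[] = [
  { label: "Legal", keywords: ["contract", "liability"],
    reference: "agreement between parties terms clause contract liability law" },
  { label: "Academic", keywords: ["abstract", "thesis"],
    reference: "abstract thesis research study method results references" },
];

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Term-frequency vector: token -> number of occurrences.
function termFrequencies(tokens: string[]): Map<string, number> {
  const tf = new Map<string, number>();
  for (const token of tokens) tf.set(token, (tf.get(token) ?? 0) + 1);
  return tf;
}

// Cosine similarity between two term-frequency vectors (0 when either is empty).
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((count, term) => {
    dot += count * (b.get(term) ?? 0);
    normA += count * count;
  });
  b.forEach((count) => {
    normB += count * count;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

interface Classification {
  label: string;
  confidence: number;
  matchedKeywords: string[]; // kept so the result can be explained to the user
}

// Hybrid score = ruleWeight * (share of rule keywords found)
//              + (1 - ruleWeight) * cosine similarity to the reference text.
function classify(text: string, ruleWeight = 0.5): Classification {
  const tf = termFrequencies(tokenize(text));
  let best: Classification = { label: "Unclassified", confidence: 0, matchedKeywords: [] };
  for (const profile of profiles) {
    const matched = profile.keywords.filter((k) => tf.has(k));
    const ruleScore = profile.keywords.length === 0 ? 0 : matched.length / profile.keywords.length;
    const similarity = cosine(tf, termFrequencies(tokenize(profile.reference)));
    const confidence = ruleWeight * ruleScore + (1 - ruleWeight) * similarity;
    if (confidence > best.confidence) {
      best = { label: profile.label, confidence, matchedKeywords: matched };
    }
  }
  return best;
}

const result = classify("This contract limits the liability of both parties.");
console.log(result.label, result.confidence); // Legal 0.6875
```

Keeping `matchedKeywords` in the result is one simple way to support the explainability goal: the interface can show which rules fired alongside the confidence score.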
- The source code is fully documented and organized for easy understanding and extension.
- This README serves as both a user and developer guide.
- For a detailed report on algorithms, platform decisions, and usage, see the attached project report template (if provided by your instructor).
- GitHub Repository: https://github.com/Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service
- Live Cloud Program: https://cloud-based-document-analytics-serv.vercel.app
Last updated: June 8, 2025