Run large language models like Qwen and LLaMA locally on Android for offline, private, real-time question answering and chat - powered by ONNX Runtime.
Updated Jan 7, 2026 - Kotlin
Run a voice agent with sub-400 ms latency on just 4 GB of VRAM. Fully offline, no API keys required. Optimized for the GTX 1650 and edge robotics, with zero-copy inference. (Apache 2.0)
🚀 A powerful Flutter-based AI chat application that lets you run LLMs directly on your mobile device or connect to local model servers. Features offline model execution, Ollama/LM Studio integration, and a beautiful modern UI. Privacy-focused, cross-platform, and fully open source.
Local LLM proxy, DevOps friendly
🖼️ Python Image and 🎥 Video Generator using LLM providers and models — built with Claude Code 💻 CLI
A framework for using local LLMs (Qwen2.5-coder 7B) that are fine-tuned using RL to generate, debug, and optimize code solutions through iterative refinement.
An advanced, fully local, and GPU-accelerated RAG pipeline. Features a sophisticated LLM-based preprocessing engine, state-of-the-art Parent Document Retriever with RAG Fusion, and a modular, Hydra-configurable architecture. Built with LangChain, Ollama, and ChromaDB for 100% private, high-performance document Q&A.
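RAG Fusion merges the ranked result lists returned for several rewrites of the same query, and reciprocal rank fusion (RRF) is the usual merging step. A minimal Python sketch of RRF (an illustration of the general technique, not this project's actual LangChain-based code):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one, RRF-style.

    rankings: list of lists of document IDs, best-first.
    k: smoothing constant; 60 is the conventional choice.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Documents ranked highly in many lists accumulate the most score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Two retrievers (e.g. two query rewrites) disagree on ordering;
# fusion promotes "b", which ranks highly in both lists.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])  # → ["b", "a", "c"]
```

Because RRF only looks at ranks, not raw similarity scores, it can fuse results from retrievers whose scores are not directly comparable.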
A fully customizable, super light-weight, cross-platform GenAI based Personal Assistant that can be run locally on your private hardware!
🤖 An intelligent chatbot: powered by a locally hosted Llama 3.2 LLM 🧠 via Ollama and ChromaDB 🗂️, it offers semantic search 🔍, session-aware responses 🗨️, and an interactive Streamlit interface 🎨 for seamless user interaction. 🚀
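Under the hood, semantic search of this kind ranks stored embeddings by similarity to a query embedding, which a vector store like ChromaDB handles internally. A toy Python illustration with hand-rolled cosine similarity (the document IDs and vectors are made up for the example):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec, docs, top_k=2):
    """docs: list of (doc_id, embedding) pairs.
    Returns the top_k IDs ranked by cosine similarity to query_vec."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Tiny 2-D "embeddings": x and z point roughly the same way as the query.
hits = semantic_search([1, 0], [("x", [1, 0.1]), ("y", [0, 1]), ("z", [0.9, 0.2])])
```

In a real pipeline the vectors come from an embedding model and the store also handles persistence and metadata filtering; only the ranking idea is shown here.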
LLM Router is a service that can be deployed on-premises or in the cloud. It adds a layer between any application and the LLM provider. In real time it controls traffic, distributes load across providers of a specific LLM, and enables security analysis of outgoing requests (masking, anonymization, prohibited content).
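The routing-plus-masking layer described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the provider URLs are hypothetical, round robin stands in for real load balancing, and a single email regex stands in for a full masking pass:

```python
import itertools
import re

# Hypothetical provider endpoints for one logical LLM; a real deployment
# would hold API clients, health checks, and weighted balancing here.
PROVIDERS = ["http://llm-a.internal/v1", "http://llm-b.internal/v1"]
_round_robin = itertools.cycle(PROVIDERS)

# Tiny masking pass: strip email addresses before the prompt leaves the
# trust boundary. A production router would cover many more PII patterns.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def route(prompt: str) -> tuple[str, str]:
    """Pick the next provider (round robin) and return it with a masked prompt."""
    masked = EMAIL.sub("[EMAIL]", prompt)
    return next(_round_robin), masked

provider, masked = route("Contact alice@example.com about the invoice")
```

Keeping masking inside the router means no application has to remember to sanitize prompts before they reach an external provider.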
An autonomous AI agent for intelligently updating, maintaining, and curating a LightRAG knowledge base.
An AI-powered assistant to streamline knowledge management, member discovery, and content generation across Telegram and Twitter, while ensuring privacy with local LLM deployment.
**Ask CLI** is a command-line tool for interacting with a local LLM (Large Language Model) server. It allows you to send queries and receive concise command-line responses.
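A tool like this typically talks to a local OpenAI-compatible chat endpoint. A sketch of how such a query body might be assembled (the model name, system prompt, and endpoint convention are illustrative assumptions, not Ask CLI's actual defaults):

```python
import json

def build_query(question: str, model: str = "qwen2.5:7b") -> bytes:
    """Build a chat-completion request body for a local, OpenAI-compatible
    server (e.g. one listening at http://localhost:11434/v1). The system
    prompt asks for terse, terminal-friendly answers."""
    body = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer in one short line suitable for a terminal."},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }
    return json.dumps(body).encode()

payload = build_query("How do I list hidden files?")
```

The resulting bytes would then be POSTed (e.g. with `urllib.request`) to the server's `/chat/completions` route, and the answer read from the response's first choice.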
This repository contains code to securely run SLMs (small language models) locally, using Node.js (server side) or inside the browser.
Voice-powered productivity for Windows
Python CLI/TUI for intelligent media file organization. Features atomic operations, rollback safety, and integrity checks, with a local LLM workflow for context-aware renaming and categorization from API-sourced metadata.
A lightweight frontend for LM Studio local server APIs. Built using React, Vite, and Tailwind CSS with full support for streaming responses and GitHub Flavored Markdown.
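LM Studio's local server exposes an OpenAI-compatible API, and streamed completions arrive as server-sent events whose `data:` lines each carry a content delta. A minimal Python sketch of extracting the text from such a stream (the sample lines below are illustrative, not captured output):

```python
import json

def parse_sse_stream(lines):
    """Yield content deltas from an OpenAI-style streaming response.

    Each event line looks like:  data: {"choices":[{"delta":{"content":"Hi"}}]}
    and the stream ends with:    data: [DONE]
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

chunks = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
text = "".join(parse_sse_stream(chunks))  # → "Hello"
```

A frontend renders each delta as it arrives, which is what makes the response appear to "type itself" in the chat window.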
JV-Archon is my personal offline LLM ecosystem.
WoolyChat - open-source AI chat app for locally hosted Ollama models. Written in Flask/JavaScript.
An AI chat/agent C# library with a heavy focus on robust tooling and on swapping between LLM providers: OpenAI, Gemini, or a local Ollama instance.