LlamaLib is a high-level C++ and C# library for running Large Language Models (LLMs) anywhere - from PCs to mobile devices and VR headsets.
### ⚡ High-Level API
C++ and C# implementations with an intuitive object-oriented design.

### 📦 Self-Contained and Embedded
Runs embedded within your application.
No need for a separate server, open ports, or external processes.
Zero external dependencies.
### 🌍 Runs Anywhere
Cross-platform and cross-device.

Works on all major platforms:
- Desktop: Windows, macOS, Linux
- Mobile: Android, iOS
- VR/AR: Meta Quest, Apple Vision, Magic Leap

and hardware architectures:
- CPU: Intel, AMD, Apple Silicon
- GPU: NVIDIA, AMD, Metal
### 🔍 Architecture Detection at Runtime
Automatically selects the optimal backend at runtime, supporting all major GPU and CPU architectures.
### 💾 Small Footprint
Integration requires around 100 MB for CPU builds; GPU support weighs in at 70 MB (Vulkan), 370 MB (tinyBLAS), or 1.3 GB (cuBLAS).
### 🛠️ Production Ready
Designed for easy integration into C++ and C# applications.
Supports both local and client-server deployment.
- Direct implementation of LLM operations (completion, tokenization, embeddings)
- Clean architecture for services, clients, and agents
- Simple server-client setup with built-in SSL and authentication support
- The only library that lets you build for any hardware, thanks to runtime detection, unlike alternatives limited to specific GPU vendors or CPU-only execution
- GPU backend auto-selection: Automatically chooses NVIDIA, AMD, or Metal, or falls back to the CPU
- CPU optimization: Identifies and uses the optimal CPU instruction sets
- Embedded deployment: No need for open ports or external processes
- Small footprint: Compact builds ideal for PC or mobile deployment
- Battle-tested: Powers LLM for Unity, the most widely used LLM integration for games
- ⭐ Star the repo and spread the word!
- ❤️ Sponsor development or support with a donation
- 💬 Join our Discord community
- 🐛 Contribute with feature requests, bug reports, or pull requests
- LLM for Unity: The most widely used solution to integrate LLMs in games
Language Guides:
LlamaLib provides three main classes for different use cases:
| Class | Purpose | Best For |
|---|---|---|
| LLMService | LLM backend engine | Building standalone apps or servers |
| LLMClient | Local or remote LLM access | Connecting to existing LLM services |
| LLMAgent | Conversational AI with memory | Building chatbots or interactive AI |
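Only LLMService is exercised in the quick-start examples below, so here is a rough sketch of how the other two classes could fit together. This is illustrative only: the LLMClient constructor arguments (host, port) and the LLMAgent chat method are assumptions based on the table, not documented signatures; see the language guides for the real API.

```cpp
#include "LlamaLib.h"
#include <iostream>
#include <string>

int main() {
    // Hypothetical: connect to an LLMService running elsewhere.
    // The (host, port) constructor is an assumption, not the documented API.
    LLMClient client("localhost", 8080);
    std::cout << client.completion("Summarize LlamaLib in one sentence.") << std::endl;

    // Hypothetical: an agent keeps conversation memory across turns.
    // The chat() method name is an assumption.
    LLMAgent agent("path/to/model.gguf");
    std::cout << agent.chat("Hi, my name is Ada.") << std::endl;
    std::cout << agent.chat("What is my name?") << std::endl;  // should recall "Ada"
    return 0;
}
```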
```cpp
#include "LlamaLib.h"
#include <iostream>
#include <string>

int main() {
    // LlamaLib automatically detects your hardware and selects the optimal backend
    LLMService llm("path/to/model.gguf");
    /* Optional parameters:
       threads=-1,   // CPU threads (-1 = auto)
       gpu_layers=0, // GPU layers (0 = CPU only)
       num_slots=1   // parallel slots/clients
    */

    // Start the service
    llm.start();

    // Generate a completion
    std::string response = llm.completion("Hello, how are you?");
    std::cout << response << std::endl;

    // Streaming to your own function is also supported:
    // llm.completion(prompt, streaming_callback);
    return 0;
}
```

👉 See the C++ guide for installation, building, and complete API reference.
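The commented `llm.completion(prompt, streaming_callback)` call above hints at streaming, but the quick start does not show the callback's type. The sketch below assumes it receives each newly generated chunk of text as a `std::string`; treat that signature as an assumption and check the C++ guide for the real one.

```cpp
#include "LlamaLib.h"
#include <iostream>
#include <string>

int main() {
    LLMService llm("path/to/model.gguf");
    llm.start();

    // Assumed callback shape: called once per generated chunk of text.
    auto streaming_callback = [](const std::string& chunk) {
        std::cout << chunk << std::flush;  // print tokens as they arrive
    };
    llm.completion("Tell me a short story.", streaming_callback);
    std::cout << std::endl;
    return 0;
}
```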
```csharp
using System;
using LlamaLib;

class Program {
    static void Main() {
        // Same API, different language
        LLMService llm = new LLMService("path/to/model.gguf");
        /* Optional parameters:
           threads=-1,   // CPU threads (-1 = auto)
           gpu_layers=0, // GPU layers (0 = CPU only)
           num_slots=1   // parallel slots/clients
        */
        llm.Start();

        string response = llm.Completion("Hello, how are you?");
        Console.WriteLine(response);

        // Streaming to your own function is also supported:
        // llm.Completion(prompt, streamingCallback);
    }
}
```

👉 See the C# guide for installation, NuGet setup, and complete API reference.
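Both examples mention a num_slots option for serving parallel clients. The sketch below (in C++, matching the quick start above) illustrates the idea with two threads issuing completions concurrently; how the optional parameters are actually passed to the constructor is not shown above, so the commented-out argument is a placeholder assumption.

```cpp
#include "LlamaLib.h"
#include <iostream>
#include <string>
#include <thread>

int main() {
    // Assumption: the service can be configured with two parallel slots;
    // the exact parameter syntax is hypothetical.
    LLMService llm("path/to/model.gguf" /*, num_slots=2 */);
    llm.start();

    auto ask = [&llm](const std::string& prompt) {
        std::cout << llm.completion(prompt) << "\n";  // output may interleave
    };

    // Two clients served concurrently, one per slot.
    std::thread t1(ask, std::string("What is a GGUF file?"));
    std::thread t2(ask, std::string("Name one use of local LLMs."));
    t1.join();
    t2.join();
    return 0;
}
```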
LlamaLib is licensed under the Apache 2.0 license.
