Local LLM Benchmarking & Hardware Suitability Tool
RigRank is a CLI tool written in Go that benchmarks how well Large Language Models (LLMs) run on your specific hardware. It measures real-world performance metrics via Ollama and tells you whether your "rig" is ready for AI workloads.
RigRank answers: "How fast does this model run on MY hardware?"
| ✅ RigRank Measures | ❌ RigRank Does NOT Measure |
|---|---|
| Tokens per second (throughput) | Model accuracy or intelligence |
| Time to first token (latency) | Response quality or correctness |
| Prompt processing speed | Benchmark scores (MMLU, HumanEval, etc.) |
| Model load times (cold start) | Reasoning or factual correctness |
Why this distinction matters: Local LLMs aren't 'one size fits all.' Depending on your CPU, GPU, and RAM, you might be forced to trade model intelligence (quantization) for usable speed. RigRank provides the telemetry you need to navigate these constraints, helping you find the perfect balance between reasoning depth and desktop snappiness.
- Hardware Telemetry: Automatically detects CPU cores, RAM type/speed (macOS), and GPU model/VRAM.
- 5-Stage Benchmark Suite:
  - Atomic Check: TTFT (Time To First Token) latency test.
  - Code Generation: Evaluation of structured output performance.
  - Story Generation: Long-context generation throughput.
  - Summarization: Context ingestion speed testing.
  - Reasoning: Logical processing capabilities.
- Ollama Integration: Seamlessly connects to your local Ollama instance.
- JSON Reporting: Detailed, machine-readable output for analysis.
- Ollama: Must be installed and running (`ollama serve`).
- Models: You need at least one model pulled (e.g., `llama3`, `gemma:2b`).
- (Optional) Go 1.25+ (only if building from source).
The easiest way to install RigRank is to use the installation script.
On macOS/Linux, open your terminal and run:

```shell
curl -sL https://raw.githubusercontent.com/rohanelukurthy/rig-rank/main/install.sh | bash
```

On Windows, open PowerShell and run:

```shell
irm https://raw.githubusercontent.com/rohanelukurthy/rig-rank/main/install.ps1 | iex
```

If you prefer not to use the installation scripts, you can download the pre-built binaries directly from the Releases page.
If you prefer to compile RigRank yourself or are developing features:
```shell
git clone https://github.com/rohanelukurthy/rig-rank.git
cd rig-rank
make build-all   # or: go build -o rigrank ./cmd/rigrank
```

Ensure Ollama is running (`ollama serve`), then run the benchmark suite:
```shell
# Run with the default model (llama3)
./rigrank run

# Run with a specific model
./rigrank run --model gemma2:9b

# Run with debug logging enabled
./rigrank run --model mistral --debug

# Run and wait for the system to be idle first
./rigrank run --model phi3 --quiet-wait

# Run and save results to a JSON file
./rigrank run --model qwen2:7b --output results.json
```

| Flag | Shorthand | Description | Default |
|---|---|---|---|
| `--model` | `-m` | Ollama model name to benchmark | `llama3` |
| `--context-window` | `-c` | Context window size for the model | `4096` |
| `--output` | `-o` | Path to save JSON results | |
| `--debug` | `-d` | Enable verbose debug logging | `false` |
| `--quiet-wait` | | Wait for the system to become idle before benchmarking | `false` |
| `--quiet-cpu` | | Maximum CPU usage percentage allowed during quiet wait | `15` |
| `--quiet-ram-mb` | | Minimum free RAM (MB) required during quiet wait | `2048` |
| `--quiet-timeout` | | Timeout in seconds to wait for the quiet state | `60` |
| `--quiet-wait-secs` | | Duration in seconds of sustained quiet state required | `5` |
| `--help` | `-h` | Show help for the command | |
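The `--quiet-*` flags describe a polling loop: sample system load, and only start benchmarking once CPU usage and free RAM have stayed inside the thresholds for the required sustained window. A minimal, hypothetical sketch (the `Sample` type and the fake sampler are illustrative; RigRank's real CPU/RAM probes are platform-specific, and a real loop would sleep between samples):

```go
package main

import "fmt"

// Sample is one snapshot of system load.
type Sample struct {
	CPUPercent float64
	FreeRAMMB  int
}

// waitForQuiet polls sample() until the system stays at or below cpuMax and
// at or above ramMinMB for `sustain` consecutive ticks, or gives up after
// timeout ticks. One tick per call keeps the sketch deterministic.
func waitForQuiet(sample func() Sample, cpuMax float64, ramMinMB, sustain, timeout int) bool {
	quiet := 0
	for tick := 0; tick < timeout; tick++ {
		s := sample()
		if s.CPUPercent <= cpuMax && s.FreeRAMMB >= ramMinMB {
			quiet++
			if quiet >= sustain {
				return true
			}
		} else {
			quiet = 0 // any spike resets the sustained-quiet counter
		}
	}
	return false
}

func main() {
	// Fake sampler: busy for the first 3 ticks, idle afterwards.
	ticks := 0
	sample := func() Sample {
		ticks++
		if ticks <= 3 {
			return Sample{CPUPercent: 80, FreeRAMMB: 1024}
		}
		return Sample{CPUPercent: 5, FreeRAMMB: 4096}
	}
	fmt.Println(waitForQuiet(sample, 15, 2048, 5, 60)) // prints true
}
```

The reset-on-spike behaviour is why `--quiet-wait-secs` is a *sustained* window: a single burst of background activity restarts the countdown.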
RigRank displays a human-friendly Report Card followed by detailed JSON metrics:
```
📊 Model Report Card: gemma3:1b
┌──────────────────────────────────────────────────────────────────────┐
│ Benchmark        Startup        Writing Speed      Reading Speed     │
│                  (first word)   (output)           (input)           │
├──────────────────────────────────────────────────────────────────────┤
│ Atomic Check     343ms          ~70 words/sec      ~531 words/sec    │
│ Code Gen         579ms          ~18 words/sec      ~657 words/sec    │
│ Story Gen        624ms          ~19 words/sec      ~525 words/sec    │
│ Summarization    732ms          ~17 words/sec      ~9.7k words/sec   │
│ Reasoning        495ms          ~16 words/sec      ~1.1k words/sec   │
└──────────────────────────────────────────────────────────────────────┘

✅ Writing Speed: Excellent across all tasks.
⚠️ Startup: Noticeable pause before responses begin.

This model is suitable for most tasks, but may struggle with some heavy workloads.
```
For the full JSON output schema, see examples/sample_output.json.
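Note that the report card speaks in words/sec while the JSON speaks in tokens/sec. A common rule of thumb for English text is roughly 0.75 words per token, so a conversion sketch under that assumption might look like this (the 0.75 factor is a heuristic, not a RigRank constant, and varies by tokenizer):

```go
package main

import "fmt"

// wordsPerSec converts token throughput to an approximate words/sec figure
// using the rough heuristic of ~0.75 English words per token. The factor is
// an assumption for illustration; actual tokenizers vary by model.
func wordsPerSec(tokensPerSec float64) float64 {
	return tokensPerSec * 0.75
}

func main() {
	fmt.Printf("%.1f words/sec\n", wordsPerSec(24.0)) // prints 18.0 words/sec
}
```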
The JSON output contains three key performance stats for each benchmark:
| Metric | Full Name | Plain English Explanation |
|---|---|---|
| `ttft_ms` | Time To First Token | The "Snappiness" Metric. How long you wait (in milliseconds) for the model to generate the very first word. Lower numbers mean the model feels more responsive. |
| `gen_tps` | Generation Tokens/Sec | The "Writing Speed" Metric. How fast the model generates the text of its response. Higher numbers mean long stories or code blocks finish faster. |
| `prompt_tps` | Prompt Processing Tokens/Sec | The "Reading Speed" Metric. How fast the model processes your input before it starts thinking. Crucial for summarizing large documents or chatting with long context. |
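Both throughput stats reduce to the same arithmetic over Ollama's nanosecond counters. A sketch, assuming the documented `eval_count`/`eval_duration` and `prompt_eval_count`/`prompt_eval_duration` response fields (the sample values below are made up):

```go
package main

import "fmt"

// tokensPerSec converts a token count over a nanosecond duration into tok/s.
func tokensPerSec(count int, durationNs int64) float64 {
	if durationNs <= 0 {
		return 0 // guard against division by zero on degenerate responses
	}
	return float64(count) / (float64(durationNs) / 1e9)
}

func main() {
	// Sample counters, shaped like Ollama's /api/generate metadata.
	evalCount, evalDur := 120, int64(2_000_000_000)   // 120 tokens in 2s
	promptCount, promptDur := 500, int64(250_000_000) // 500 tokens in 0.25s

	fmt.Printf("gen_tps:    %.1f\n", tokensPerSec(evalCount, evalDur))     // 60.0
	fmt.Printf("prompt_tps: %.1f\n", tokensPerSec(promptCount, promptDur)) // 2000.0
	// ttft_ms is measured differently: wall-clock time from sending the
	// request to receiving the first streamed token.
}
```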
See Architecture.md for the high-level design and dependency graph.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License.