RigRank: Local LLM Benchmarking & Hardware Suitability CLI

RigRank is a CLI tool written in Go that benchmarks how well Large Language Models (LLMs) run on your specific hardware. It measures real-world performance metrics via Ollama and tells you whether your "rig" is ready for AI workloads.

🎯 What RigRank Measures (and What It Doesn't)

RigRank answers: "How fast does this model run on MY hardware?"

| ✅ RigRank Measures | ❌ RigRank Does NOT Measure |
|---|---|
| Tokens per second (throughput) | Model accuracy or intelligence |
| Time to first token (latency) | Response quality or correctness |
| Prompt processing speed | Benchmark scores (MMLU, HumanEval, etc.) |
| Model load times (cold start) | Reasoning or factual correctness |

Why this distinction matters: Local LLMs aren't one-size-fits-all. Depending on your CPU, GPU, and RAM, you may be forced to trade model intelligence (via heavier quantization) for usable speed. RigRank provides the telemetry you need to navigate that trade-off and find the right balance between reasoning depth and desktop snappiness.

🚀 Features

  • Hardware Telemetry: Automatically detects CPU cores, RAM type/speed (macOS), and GPU VRAM/Model.
  • 5-Stage Benchmark Suite:
    • Atomic Check: TTFT (Time To First Token) latency test.
    • Code Generation: Evaluation of structured output performance.
    • Story Generation: Long-context generation throughput.
    • Summarization: Context ingestion speed testing.
    • Reasoning: Logical processing capabilities.
  • Ollama Integration: Seamlessly connects to your local Ollama instance.
  • JSON Reporting: Detailed, machine-readable output for analysis.

πŸ› οΈ Prerequisites

  • Ollama: Must be installed and running (ollama serve).
  • Models: You need at least one model pulled (e.g., llama3, gemma:2b).
  • (Optional) Go 1.25+ (only if building from source).

📦 Installation

The easiest way to install RigRank is to use the installation script.

macOS / Linux

Open your terminal and run:

curl -sL https://raw.githubusercontent.com/rohanelukurthy/rig-rank/main/install.sh | bash

Windows (PowerShell)

Open PowerShell and run:

irm https://raw.githubusercontent.com/rohanelukurthy/rig-rank/main/install.ps1 | iex

Alternative Methods

If you prefer not to use the installation scripts, you can download the pre-built binaries directly from the Releases page.

Building from Source

If you prefer to compile RigRank yourself or are developing features:

git clone https://github.com/rohanelukurthy/rig-rank.git
cd rig-rank
make build-all # or go build -o rigrank ./cmd/rigrank

πŸƒ Usage

Ensure Ollama is running (ollama serve), then run the benchmark suite:

# Run with default model (llama3)
./rigrank run

# Run with a specific model
./rigrank run --model gemma2:9b

# Run with debug logging enabled
./rigrank run --model mistral --debug

# Run and wait for system to be idle first
./rigrank run --model phi3 --quiet-wait

# Run and save results to a JSON file
./rigrank run --model qwen2:7b --output results.json

Options

| Flag | Shorthand | Description | Default |
|---|---|---|---|
| `--model` | `-m` | Ollama model name to benchmark | `llama3` |
| `--context-window` | `-c` | Context window size for the model | `4096` |
| `--output` | `-o` | Path to save JSON results | |
| `--debug` | `-d` | Enable verbose debug logging | `false` |
| `--quiet-wait` | | Wait for system to become idle before benchmarking | `false` |
| `--quiet-cpu` | | Maximum CPU usage percentage allowed during quiet wait | `15` |
| `--quiet-ram-mb` | | Minimum free RAM (MB) required during quiet wait | `2048` |
| `--quiet-timeout` | | Timeout in seconds to wait for quiet state | `60` |
| `--quiet-wait-secs` | | Duration in seconds of sustained quiet state required | `5` |
| `--help` | `-h` | Show help for command | |
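The quiet-wait flags describe a polling loop: sample system load, require it to stay below the thresholds for a sustained window, and give up after a timeout. Below is a minimal sketch of that logic with a stubbed CPU sampler; `waitForQuiet` and its parameters are illustrative, not RigRank's actual API, and the real tool also checks free RAM and sleeps between samples.

```go
package main

import "fmt"

// waitForQuiet polls sample() once per tick until it reports CPU usage at or
// below maxCPU for sustainTicks consecutive ticks, or gives up after
// timeoutTicks. A spike resets the sustained-quiet counter. (The real tool
// samples once per second and also enforces a free-RAM floor.)
func waitForQuiet(sample func() float64, maxCPU float64, sustainTicks, timeoutTicks int) bool {
	quiet := 0
	for elapsed := 0; elapsed < timeoutTicks; elapsed++ {
		if sample() <= maxCPU {
			quiet++
			if quiet >= sustainTicks {
				return true // system stayed quiet long enough
			}
		} else {
			quiet = 0 // load spiked; restart the sustained-quiet window
		}
	}
	return false // never reached a sustained quiet state before timeout
}

func main() {
	// Fake CPU readings: busy at first, a spike mid-way, then sustained idle.
	readings := []float64{40, 22, 12, 11, 30, 10, 9, 8, 7, 6}
	i := 0
	sample := func() float64 { v := readings[i%len(readings)]; i++; return v }

	// Mirrors the defaults: --quiet-cpu 15, --quiet-wait-secs 5, --quiet-timeout 60.
	fmt.Println(waitForQuiet(sample, 15, 5, 60)) // prints: true
}
```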

📊 Output Example

RigRank displays a human-friendly Report Card followed by detailed JSON metrics:

  📊 Model Report Card: gemma3:1b

  ┌──────────────────────────────────────────────────────────────┐
  │ Benchmark     Startup      Writing Speed    Reading Speed    │
  │               (first word) (output)         (input)          │
  ├──────────────────────────────────────────────────────────────┤
  │ Atomic Check  343ms        ~70 words/sec    ~531 words/sec   │
  │ Code Gen      579ms        ~18 words/sec    ~657 words/sec   │
  │ Story Gen     624ms        ~19 words/sec    ~525 words/sec   │
  │ Summarization 732ms        ~17 words/sec    ~9.7k words/sec  │
  │ Reasoning     495ms        ~16 words/sec    ~1.1k words/sec  │
  └──────────────────────────────────────────────────────────────┘

  ✅ Writing Speed: Excellent across all tasks.
  ⚠️  Startup: Noticeable pause before responses begin.

  This model is suitable for most tasks, but may struggle with some heavy workloads.

For the full JSON output schema, see examples/sample_output.json.

📈 Understanding the Metrics

The JSON output contains three key performance stats for each benchmark:

| Metric | Full Name | Plain-English Explanation |
|---|---|---|
| `ttft_ms` | Time To First Token | The "snappiness" metric. How long you wait (in milliseconds) for the model to generate the very first word. Lower numbers mean the model feels more responsive. |
| `gen_tps` | Generation Tokens/Sec | The "writing speed" metric. How fast the model generates the text of its response. Higher numbers mean long stories or code blocks finish faster. |
| `prompt_tps` | Prompt Processing Tokens/Sec | The "reading speed" metric. How fast the model processes your input before it starts responding. Crucial for summarizing large documents or chatting with long context. |
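A downstream Go program might decode these per-benchmark stats from the JSON report roughly as follows. The `BenchmarkStats` struct and `parseStats` helper are illustrative assumptions built from the metric keys above, not RigRank's actual schema; consult examples/sample_output.json for the real structure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BenchmarkStats holds the three per-benchmark metrics described above.
// The json tags match the documented metric keys; the surrounding shape
// of the real report (per-benchmark nesting, extra fields) may differ.
type BenchmarkStats struct {
	TTFTMs    float64 `json:"ttft_ms"`    // time to first token, in ms
	GenTPS    float64 `json:"gen_tps"`    // generation tokens/sec
	PromptTPS float64 `json:"prompt_tps"` // prompt processing tokens/sec
}

// parseStats decodes one benchmark's stats from raw JSON.
func parseStats(raw []byte) (BenchmarkStats, error) {
	var s BenchmarkStats
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	// Illustrative values only, not real RigRank output.
	raw := []byte(`{"ttft_ms": 343, "gen_tps": 70.2, "prompt_tps": 531.4}`)
	s, err := parseStats(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("TTFT %.0fms | gen %.1f tok/s | prompt %.1f tok/s\n",
		s.TTFTMs, s.GenTPS, s.PromptTPS)
}
```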

πŸ—οΈ Architecture

See Architecture.md for the high-level design and dependency graph.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

Distributed under the MIT License.
