ZK Benchmarks for Cartesi Machine

Benchmarking framework for measuring ZK proving performance of Cartesi machine state transitions. Supports multiple ZK implementations (RISC-0, Zisk) to enable cross-prover comparison.

Overview

What We Measure

ZK provers like RISC-0 have two distinct phases:

  1. Execution - Running the program to collect metrics
  2. Proving - Generating cryptographic proofs (typically ~99% of total time)

Full proving is extremely slow (impractical on a Mac, and slow even on a GPU). However, provers offer a dev mode that executes without generating real proofs. This works because the key metrics we need are available from execution alone:

Metric        Description
Cycles        RISC-V cycles spent on execution
Segments      Proof chunks (RISC-0 splits execution into parallelizable segments)
Page Count    Memory pages touched (correlates strongly with cycle count)

Time Estimation

With cycles measured, we can estimate real-world proving time using hardware throughput data:

Proving Time = Total Cycles / Hardware Throughput (cycles/second)

For example, RISC-0 publishes throughput benchmarks:

  • NVIDIA RTX 4090: ~85 kHz (85,000 cycles/second)
  • NVIDIA RTX 3090 Ti: ~60 kHz
  • CPU (x86): ~10 kHz

This approach lets us benchmark quickly in dev mode while providing actionable time estimates for real hardware.
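As a concrete sketch, the estimate is a one-liner. The function below is illustrative (not part of this repo); the example numbers are the ~193M-cycle measurement from the sample results further down and RISC-0's published RTX 4090 throughput:

def estimate_proving_time_s(total_cycles: int, throughput_khz: float) -> float:
    """Estimated proving time in seconds; throughput is in kHz (1 kHz = 1,000 cycles/s)."""
    return total_cycles / (throughput_khz * 1_000)

print(estimate_proving_time_s(192_937_984, 85))  # ~2270 s, i.e. ~38 minutes on an RTX 4090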

Why Chunks Matter

Cartesi machine execution consists of:

  1. Boot phase - Linux kernel boot (always similar)
  2. Payload phase - The actual benchmark workload

We never prove the entire execution end-to-end. Instead, we prove chunks (windows) of execution:

  • Divergence happens at specific cycles - we only need to prove around that point
  • Different chunks have different characteristics (memory usage, page count)
  • We need to know the worst-case chunk to set reasonable limits

This is why benchmarks test various chunk sizes and use Monte Carlo sampling to understand the distribution.

Parallelization

RISC-0 segments are independently provable, so with N GPUs proving time drops by roughly a factor of N (given enough segments). Segment size is configurable: smaller segments allow more parallelism but add overhead.
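For example, ignoring scheduling overhead, the ~2,270-second single-GPU estimate above (193M cycles at 85 kHz) would drop to roughly 570 seconds when the segments are spread across four RTX 4090s.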

Benchmark Modes

Sweep Mode

Tests proving across a range of chunk sizes:

mode: sweep
step_sizes:
  min: 50000      # Minimum chunk size (cycles)
  max: 500000     # Maximum chunk size
  increment: 20000 # Step between measurements

Produces graphs showing how cycles, segments, and estimated time scale with chunk size.
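In effect, sweep mode walks a range of chunk sizes and measures each one. A simplified sketch in Python (illustrative only; the real loop lives in benchmark.py):

def sweep_steps(min_step: int, max_step: int, increment: int) -> list[int]:
    # Chunk sizes to measure, e.g. 50k, 70k, ... up to the configured maximum.
    return list(range(min_step, max_step + 1, increment))

for step in sweep_steps(50_000, 500_000, 20_000):
    # Execute one chunk of `step` cycles in dev mode and record
    # total cycles, segment count, and page count for plotting.
    pass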

Monte Carlo Mode

Samples random windows across the entire program execution:

mode: monte-carlo
monte_carlo:
  num_samples: 100    # Number of random samples
  window_size: 100000 # Fixed window size to sample

Produces histograms showing the distribution of metrics across different parts of execution. Useful for understanding variance and worst-case scenarios.
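Conceptually, the sampler draws random window offsets over the full execution. A minimal illustrative sketch (function and parameter names are hypothetical, not benchmark.py's API):

import random

def sample_window_starts(total_cycles: int, window_size: int, num_samples: int) -> list[int]:
    # Uniformly random start cycles, keeping each window inside the execution.
    return [random.randrange(total_cycles - window_size) for _ in range(num_samples)]

for start in sample_window_starts(total_cycles=200_000_000, window_size=100_000, num_samples=100):
    # Measure cycles, segments, and page count for [start, start + window_size).
    pass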

Installation

Prerequisites

  • Python 3.10+
  • Rust toolchain (for building provers)
  • Machine emulator dependencies (see deps/machine/README.md)

Setup

# Clone with submodules
git clone --recursive https://github.com/cartesi/zk-benchmarks.git
cd zk-benchmarks

# Install Python dependencies
pip install -r requirements.txt

# Initialize submodules
git submodule update --init --recursive

# Build the machine emulator and prover
python benchmark.py --build-only

Prover-Specific Setup

RISC-0 (built automatically):

# Dev mode is enabled by default (no GPU required)
# Set in config.yaml: RISC0_DEV_MODE: "1"

Zisk (requires pre-built ELF):

# Install Zisk toolchain
curl -L https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh | bash
~/.zisk/bin/ziskup

# Build Zisk ELF in machine-emulator (requires LLVM 20)
cd deps/machine/zisk
make LLVM20_DIR=/path/to/llvm@20 all

Usage

List Available Benchmarks

python benchmark.py --list

Run a Single Benchmark

# Run with default settings
python benchmark.py stress-int64

# Override step sizes
python benchmark.py stress-int64 --min-step 10000 --max-step 100000 --increment 5000

# Run Monte Carlo mode
python benchmark.py mc-heapsort --monte-carlo --num-samples 50 --window-size 50000

Run with Specific Prover

# Run only RISC-0
python benchmark.py stress-int64 --prover risc0

# Run only Zisk
python benchmark.py stress-int64 --prover zisk

Run All Benchmarks

python benchmark.py

Configuration

Global Config (config.yaml)

# Kernel and rootfs images
linux_image_url: https://github.com/cartesi/machine-linux-image/releases/...
rootfs_image_url: https://github.com/cartesi/machine-rootfs-image/releases/...

# Hardware profiles for time estimation
hardware_profiles:
  risc0_rtx_4090:
    name: "RISC0 NVIDIA RTX 4090"
    prover: risc0
    throughput_khz: 85
  risc0_cpu:
    name: "RISC0 CPU (x86)"
    prover: risc0
    throughput_khz: 10

# Prover-specific settings
provers:
  risc0:
    env:
      RISC0_DEV_MODE: "1"
  zisk:
    zisk_home: null  # Uses ~/.zisk by default

Benchmark Config (benchmarks/*.yaml)

name: stress-int64
description: "stress-ng int64 benchmark"
command: "stress-ng --cpu 1 --cpu-method int64 --cpu-ops 400 --metrics"
mode: sweep

# Which provers to run
provers:
  - risc0
  - zisk

# Hardware profiles for time estimation (per prover)
hardware_profiles:
  risc0:
    - risc0_rtx_4090
    - risc0_cpu

# Sweep mode settings
step_sizes:
  min: 50000
  max: 500000
  increment: 20000

Output

Results are saved to results/<benchmark>_<prover>_<timestamp>/:

Sweep Mode Output

results/stress-int64_risc0_20260112_174343/
├── log.bin          # Step log from Cartesi machine
├── receipt.bin      # Proof receipt (dev mode)
├── results.json     # Raw metrics for each step size
└── plots.png        # Visualization graphs

results.json contains per-step metrics:

[
  {
    "step": 50000,
    "total_cycles": 192937984,
    "number_of_segments": 184,
    "user_cycles": 174285398,
    "page_count": 122,
    "execution_time": "4.66s"
  }
]
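To turn those raw metrics into time estimates, results.json can be combined with a hardware profile. A hedged post-processing sketch (the path matches the example above; the 85 kHz figure is the risc0_rtx_4090 profile):

import json

with open("results/stress-int64_risc0_20260112_174343/results.json") as f:
    results = json.load(f)

throughput_hz = 85 * 1_000  # risc0_rtx_4090 profile: 85 kHz
for entry in results:
    est_s = entry["total_cycles"] / throughput_hz
    print(f"step={entry['step']}: ~{est_s:.0f} s estimated proving time")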

Monte Carlo Output

results/mc-heapsort_monte_carlo_20260112_162122/
├── log.bin
├── receipt.bin
├── mc-heapsort_100000.jsonl    # One JSON object per sample
└── mc-heapsort_100000_histograms.png

Understanding the Graphs

Sweep Mode Plots

Four graphs are generated:

  1. Execution Time - Time to run in dev mode (not meaningful for production estimates)
  2. Number of Segments - Segments increase with chunk size
  3. Total Cycles - RISC-V cycles required to prove
  4. Page Count - Memory pages touched (correlates with cycles)

Additionally, estimated proving time graphs show real-world projections for each hardware profile.

Monte Carlo Histograms

Histograms show the distribution of metrics across random samples:

  • Page Count Distribution - How memory usage varies across execution
  • Total Cycles Distribution - Cycle count variance
  • Estimated Time Distribution - Real-world time estimates

Statistics include: min, max, mean, median, P90, P95, P99.
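Those statistics can be recomputed from the JSONL samples with the standard library alone. A sketch assuming each sample line carries a total_cycles field (the file name comes from the Monte Carlo output above):

import json
import statistics

path = "results/mc-heapsort_monte_carlo_20260112_162122/mc-heapsort_100000.jsonl"
with open(path) as f:
    cycles = sorted(json.loads(line)["total_cycles"] for line in f)

print("min", cycles[0], "max", cycles[-1])
print("mean", statistics.mean(cycles), "median", statistics.median(cycles))
cuts = statistics.quantiles(cycles, n=100)  # 99 cut points between percentiles
print("P90", cuts[89], "P95", cuts[94], "P99", cuts[98])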

Workload Types

Benchmarks use stress-ng to simulate different workloads:

Benchmark      Workload Type   Description
stress-int64   CPU-bound       64-bit integer operations
stress-fp      CPU-bound       Floating-point operations
stress-loop    CPU-bound       Tight loop iterations
mc-heapsort    Memory-bound    Heap sort with allocations
mc-loop        CPU-bound       Loop with Monte Carlo sampling
mc-tree        Memory-bound    Tree operations

Project Structure

zk-benchmarks/
├── benchmark.py          # Main benchmark runner
├── config.yaml           # Global configuration
├── requirements.txt      # Python dependencies
├── provers/              # Prover adapter layer
│   ├── base.py           # Abstract ProverAdapter class
│   ├── risc0.py          # RISC-0 adapter
│   └── zisk.py           # Zisk adapter
├── benchmarks/           # Benchmark definitions
│   ├── stress-int64.yaml
│   ├── mc-heapsort.yaml
│   └── ...
├── deps/
│   └── machine/          # Machine emulator submodule
│       ├── risc0/        # RISC-0 prover integration
│       └── zisk/         # Zisk prover integration
├── images/               # Downloaded kernel/rootfs
└── results/              # Benchmark output

Adding New Benchmarks

  1. Create a YAML file in benchmarks/:
name: my-benchmark
description: "Description of what this tests"
command: "command-to-run-inside-cartesi-machine"
mode: sweep  # or monte-carlo

provers:
  - risc0
  - zisk

hardware_profiles:
  risc0:
    - risc0_rtx_4090
    - risc0_cpu

step_sizes:  # for sweep mode
  min: 50000
  max: 500000
  increment: 20000
  2. Run: python benchmark.py my-benchmark

Adding New Provers

  1. Create adapter in provers/:
from .base import ProverAdapter, ProverResult

class NewProverAdapter(ProverAdapter):
    name = "new-prover"

    def is_built(self) -> bool:
        # Check if prover binaries exist
        pass

    def build(self) -> bool:
        # Build the prover
        pass

    def prove(self, start_hash, end_hash, log_path, step_size, output_dir) -> ProverResult:
        # Run the prover and return metrics
        pass
  2. Register in provers/__init__.py
  3. Add configuration to config.yaml
  4. Add hardware profiles if available
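The registration mechanism isn't shown here; one plausible shape, assuming provers/__init__.py keeps a simple name-to-adapter mapping (all class names below are hypothetical):

# provers/__init__.py - hypothetical registry shape; adapt to the actual module
from .base import ProverAdapter
from .risc0 import Risc0Adapter          # assumed class name
from .zisk import ZiskAdapter            # assumed class name
from .new_prover import NewProverAdapter

ADAPTERS: dict[str, type[ProverAdapter]] = {
    cls.name: cls for cls in (Risc0Adapter, ZiskAdapter, NewProverAdapter)
}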
