This is the official repository for the paper "SurfaceBench: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces?" (TMLR 2026)
In this paper, we introduce SurfaceBench, a benchmark built to push equation discovery beyond simple scalar functions and into the richer setting of symbolic surface recovery. The benchmark contains 183 tasks covering 15 different categories, each capturing a distinct kind of symbolic or geometric complexity. Every task provides a ground-truth surface equation along with variable meanings and a 3D point cloud sampled from it, spanning scientific areas such as fluid dynamics, robotics, electromagnetics, and classical geometry. Our goal is to challenge LLM-based symbolic regression systems in a way that existing datasets do not: the surfaces in SurfaceBench are constructed to avoid easy memorization, include diverse representation forms (explicit, implicit, parametric), and reflect realistic scientific structure.
To run the code, create a conda environment and install the dependencies provided in the requirements.txt or environment.yml:
conda create -n SurfaceBench python=3.11.7
conda activate SurfaceBench
pip install -r requirements.txt
Note: Python ≥ 3.9 is required; our experiments used Python 3.11.7.
You also need to install the additional packages for each search method from their original GitHub repositories.
The benchmark data will be downloaded automatically from Hugging Face.
We provide an implementation of OpenEvolve in the OpenEvolve/ folder, and implementations of LLM-SR, LaSR, SGA, and PySR in the methods/ folder.
To add a new discovery method, refer to the implementation section for detailed instructions. This includes setting up the necessary configuration, implementing the searcher class, and ensuring compatibility with the existing framework.
- Activate the appropriate conda environment.
- Launch a local LLM server. While our implementation uses vllm, you can also opt for other libraries as long as you implement the necessary functionality in the searcher class. For example, to start the server with vllm, use the command:

  vllm serve meta-llama/Llama-3.1-8B-Instruct --dtype auto --api-key token-abc123 --port 10005
- Configure the environment variables in the .env file. Duplicate .env.example to .env and specify the following:
  - VLLM_API_KEY: Your API key for the local vLLM server (e.g., 'token-abc123').
  - OPENAI_API_KEY: Your OpenAI API key, if you are using OpenAI models.
  - SGA_PYTHON_PATH: The path to the Python executable in your SGA conda environment, if you are using the SGA searcher.
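After copying .env.example to .env, the file might look like the following (all values shown are placeholders; substitute your own key, and a path, only for the variables your setup actually needs):

```
VLLM_API_KEY=token-abc123
OPENAI_API_KEY=sk-your-key-here
SGA_PYTHON_PATH=/path/to/conda/envs/sga/bin/python
```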
- Execute the eval.py script with the required arguments:
  - --searcher_config: Path to the YAML configuration file for the searcher (required).
  - --dataset: The name of the dataset to evaluate (required).
  - --resume_from: The path to a previous run directory to continue from (optional).
  - --problem_name: A specific problem name to evaluate (optional).
  - --local_llm_port: The port number of the local LLM server (optional).
Available dataset options include:
- Nonlinear Analytic Composition Surfaces (Non-Nonlinear_Analytic_Composition_Surfaces)
- Piecewise-Defined Surfaces (Piecewise-Defined_Surfaces)
- Mixed Transcendental Analytic Surfaces (Mixed_Transcendental_Analytic_Surfaces)
- Conditional Multi-Regime Surfaces (Conditional_Multi-Regime_Surfaces)
- Oscillatory Composite Surfaces (Oscillatory_Composite_Surfaces)
- Trigonometric–Exponential Composition Surfaces (Trigonometric–Exponential_Composition_Surfaces)
- Multi-Operator Composite Surfaces (Multi-Operator_Composite_Surfaces)
- Elementary Bivariate Surfaces (Elementary_Bivariate_Surfaces)
- Discrete Integer-Grid Surfaces (Discrete_Integer-Grid_Surfaces)
- Nonlinear Coupled Surfaces (Nonlinear_Coupled_Surfaces)
- Exponentially-Modulated Trigonometric Surfaces (Exponentially-Modulated_Trigonometric_Surfaces)
- Localized and Radially-Decaying Surfaces (Localized_and_Radially-Decaying_Surfaces)
- Polynomial–Transcendental Mixtures (Polynomial–Transcendental_Mixtures)
- High-Degree Implicit Surfaces (High-Degree_Implicit_Surfaces)
- Parametric Multi-Output Surfaces (Parametric_Multi-Output_Surfaces)
For example, to run the discovery methods on all Elementary Bivariate Surfaces tasks with the open LLM backbone Qwen-3-8B on a local server, use the following commands:
LLM-SR:
python eval.py --dataset "Elementary_Bivariate_Surfaces" --searcher_config configs/llmsr_qwen3-8B.yaml --local_llm_port 10005
LaSR:
python eval.py --dataset "Elementary_Bivariate_Surfaces" --searcher_config configs/lasr_qwen3-8B.yaml --local_llm_port 10005
SGA:
python eval.py --dataset "Elementary_Bivariate_Surfaces" --searcher_config configs/sga_qwen3-8B.yaml --local_llm_port 10005
More evaluation scripts for running discovery methods with different LLM backbones on different datasets are provided in example_script.sh.
The execution of eval.py will generate log files in the logs/ folder. You can resume your run using the --resume_from <log_dir> option. For instance,
--resume_from logs/Elementary_Bivariate_Surfaces/Elementary_Bivariate_Surfaces/llmsr-Qwen3-8B/09-20-2025_18-09-19-121210 will bypass already completed problems.
The working directory structure will be as follows:
project
│   README.md
│   eval.py
│   .env
└───bench/
└───methods/
│   └───llmsr
│   └───lasr
│   └───PySR
│   └───sga_sr
└───datasets/
└───OpenEvolve/
└───logs/
    └───<dataset-name>
        └───<method-name>
            └───<date>
To implement a new searcher, you must create a class that inherits from the base class BaseSearcher. This base class provides the foundational structure for your LLM-based searcher, including essential methods that need to be overridden.
class BaseSearcher:
    def __init__(self, name) -> None:
        self._name = name

    def discover(self, task: SEDTask) -> List[SearchResult]:
        '''
        Return:
            List of SearchResult
        '''
        raise NotImplementedError

    def __str__(self):
        return self._name

The input task provides a description of the target equation, its input variables, and training data points.
An example of a searcher:

class NewSearcher(BaseSearcher):
    def __init__(self, name, num_samples, api_type, api_model, api_url):
        super().__init__(name)
        self.num_samples = num_samples
        self.llm = LLM(api_type, api_model, api_url)

    def discover(self, task: SEDTask):
        dataset = task.samples
        symbol_descs = task.symbol_descs
        prompt = (
            f"Find the mathematical function skeleton that represents {symbol_descs[0]}, "
            f"given data on {', '.join(symbol_descs[1:-1])}, and {symbol_descs[-1]}"
        )

        best_program, best_aux, best_score = None, None, -np.inf
        for _ in range(self.num_samples):
            program_str, aux = self.llm.sample_program(prompt)
            score = evaluate(program_str, dataset)
            if score > best_score:
                best_program, best_aux, best_score = program_str, aux, score

        best_equation = Equation(
            symbols=task.symbols,
            symbol_descs=task.symbol_descs,
            symbol_properties=task.symbol_properties,
            expression=None,
            program_format=best_program,
            lambda_format=programstr2lambda(best_program),
        )
        return [SearchResult(equation=best_equation, aux=best_aux)]

Once you’ve implemented your searcher, create a corresponding configuration file in the configs/ folder. For example:
name: NewSearcher-Llama31_8b
class_name: NewSearcher
api_type: "vllm"
api_model: "meta-llama/Llama-3.1-8B-Instruct"
api_url: "http://localhost:{}/v1/"
num_samples: 1000

To evaluate with this searcher, run eval.py and provide the path to its configuration file; this will load the settings and initiate the evaluation process on the specified dataset.
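The evaluate helper called inside discover() in the example searcher is not shown in this README. A minimal sketch of what such a scorer could look like, assuming an NMSE-style fit score where higher is better, and an illustrative (X, y) data layout in which the candidate program defines an equation(X) function (the function name, data layout, and use of exec are all assumptions for illustration, not the repository's actual implementation):

```python
import numpy as np

def evaluate(program_str, dataset):
    """Score a candidate program string against (X, y) data.

    Hypothetical sketch: executes the program, which is assumed to define
    `equation(X)`, and returns the negative normalized MSE so that higher
    is better. Programs that fail to run score -inf.
    """
    X, y = dataset  # assumed layout: input matrix and target values
    namespace = {"np": np}
    try:
        exec(program_str, namespace)
        y_pred = namespace["equation"](X)
        nmse = np.mean((y - y_pred) ** 2) / (np.var(y) + 1e-12)
        return -float(nmse)
    except Exception:
        return -np.inf
```

A perfect fit scores 0 (the maximum), and any broken or non-executing candidate is safely ranked last.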
By default, the API key is read from the OPENAI_API_KEY environment variable; see create_config() in data_api.py.
The data_api.py script is crucial for setting up the environment: it prepares tasks from the SurfaceBench dataset (defined by the classes in ./bench) and writes them to ./problems.
For each benchmark task, this script will automatically generate:
- initial_program.py: A starting Python program, typically a simple linear model.
- evaluator.py: A tailored evaluation script for the task.
- config.yaml: An OpenEvolve configuration file specific to the task.
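To make the starting point concrete, a generated initial_program.py is typically just a planar model for the evolutionary search to mutate. A hypothetical sketch (the function name and parameter layout are illustrative, not the exact generated code):

```python
import numpy as np

# Hypothetical sketch of a generated initial_program.py:
# the search starts from a simple linear (planar) surface model.
def surface(x, y, params=(1.0, 1.0, 0.0)):
    """Initial guess z = a*x + b*y + c; OpenEvolve evolves this program."""
    a, b, c = params
    return a * x + b * y + c
```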
Run the script from your terminal:
python data_api.py

This will create subdirectories for each benchmark task, populated with the necessary files.
Use the provided shell script scripts.sh to execute OpenEvolve across the generated benchmark tasks. This script iterates through the task-specific configurations and applies the evolutionary process.
bash scripts.sh

After OpenEvolve has completed its runs, you can evaluate the performance on different subsets of tasks. The eval.py script collates the results and provides a summary.
python eval.py <subset_path>

For example, to evaluate results for the Elementary_Bivariate_Surfaces subset located in ./problems/Elementary_Bivariate_Surfaces/, you would run:

python eval.py ./problems/Elementary_Bivariate_Surfaces

This script also saves a JSON file containing detailed results for your analysis.
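The exact schema of the saved JSON is defined by eval.py; assuming an illustrative per-task layout mapping each task name to a record with a solved flag (the field names here are assumptions, so check the file eval.py actually writes), aggregating a subset accuracy from it could look like:

```python
import json

def subset_accuracy(results_path):
    """Load a detailed-results JSON and return the fraction of solved tasks.

    Assumes an illustrative schema: {task_name: {"solved": bool, ...}, ...}.
    Adapt the field names to the JSON actually produced by eval.py.
    """
    with open(results_path) as f:
        results = json.load(f)
    solved = sum(1 for record in results.values() if record.get("solved"))
    return solved / max(len(results), 1)
```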
PySR:
python eval.py --dataset "Elementary_Bivariate_Surfaces" --searcher_config configs/pysr_explicit.yaml
If you find this work useful, please cite:
@article{
kabra2026surfacebench,
title={{SURFACEBENCH}: A Geometry-Aware Benchmark for Symbolic Surface Discovery},
author={Sanchit Kabra and Shobhnik Kriplani and Parshin Shojaee and Chandan K. Reddy},
journal={Transactions on Machine Learning Research},
year={2026},
url={https://openreview.net/forum?id=sHLTzkczSi}
}

This repository is licensed under the MIT license.
This work builds on other open-source projects, including OpenEvolve, LLM-SR, LaSR, SGA, and PySR, and is inspired by the effort behind LLM-SRBench. We sincerely thank the original contributors of these works for open-sourcing their code.
For any questions or issues, you are welcome to open an issue in this repo, or contact us at sanchit23@vt.edu and shobhnik1@gmail.com.
