chore: Consolidate TPC benchmark scripts#3538
Open
andygrove wants to merge 6 commits into apache:main
Conversation
Replace 9 per-engine shell scripts with a single `run.py` that loads per-engine TOML config files. This eliminates duplicated Spark conf boilerplate and makes it easier to add new engines or modify shared settings.

Usage: `python3 run.py --engine comet --benchmark tpch [--dry-run]`

Also moves benchmarks from `dev/benchmarks/` to `benchmarks/tpc/` and updates all documentation references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename create-iceberg-tpch.py to create-iceberg-tables.py with a --benchmark flag supporting both tpch and tpcds table sets
- Remove hardcoded TPCH_QUERIES from comet-iceberg.toml required env vars
- Remove hardcoded ICEBERG_DATABASE default of "tpch" from comet-iceberg.toml
- Add check_benchmark_env() in run.py to validate benchmark-specific env vars and default ICEBERG_DATABASE to the benchmark name
- Update README with TPC-DS Iceberg table creation examples

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
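A minimal sketch of what `check_benchmark_env()` might look like. Only the behavior comes from the commit message (validate benchmark-specific env vars, default `ICEBERG_DATABASE` to the benchmark name); the `REQUIRED` mapping and the error message are hypothetical.

```python
# Illustrative sketch, not the PR's exact code.
REQUIRED = {
    "tpch": ["TPCH_QUERIES"],
    "tpcds": ["TPCDS_QUERIES"],
}


def check_benchmark_env(benchmark: str, env: dict) -> dict:
    # Fail fast if a benchmark-specific variable is missing.
    missing = [v for v in REQUIRED[benchmark] if v not in env]
    if missing:
        raise SystemExit(f"missing required env vars: {', '.join(missing)}")
    # Default the Iceberg database name to the benchmark name
    # (tpch or tpcds) when the user has not set it explicitly.
    env.setdefault("ICEBERG_DATABASE", benchmark)
    return env
```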
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The script now configures the Iceberg catalog via SparkSession.builder instead of requiring --conf flags on the spark-submit command line. This adds --warehouse as a required CLI arg, makes --catalog optional (default: local), and validates paths with clear error messages before starting Spark. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
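The up-front validation described here could look roughly like this. The `--warehouse` and `--catalog` flags come from the commit message; everything else is an assumption, and the `SparkSession.builder` calls are shown only as comments since they require a Spark installation.

```python
# Hedged sketch of CLI parsing and path validation before Spark starts.
import argparse
import os
import sys


def parse_args(argv):
    p = argparse.ArgumentParser()
    p.add_argument("--warehouse", required=True,
                   help="Iceberg warehouse directory (required)")
    p.add_argument("--catalog", default="local",
                   help="catalog name (default: local)")
    args = p.parse_args(argv)
    # Validate with a clear error message instead of letting Spark
    # fail later with an opaque stack trace.
    if not os.path.isdir(args.warehouse):
        sys.exit(f"error: warehouse path does not exist: {args.warehouse}")
    return args


# The validated values would then feed SparkSession.builder, e.g.:
# spark = (SparkSession.builder
#     .config(f"spark.sql.catalog.{args.catalog}",
#             "org.apache.iceberg.spark.SparkCatalog")
#     .config(f"spark.sql.catalog.{args.catalog}.warehouse", args.warehouse)
#     .getOrCreate())
```

This keeps the spark-submit command line free of `--conf` flags, as the commit message describes.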
Member
Author
@mbutrovich @comphead I have finished testing this PR and it is now ready for review
Summary
- Consolidate 9 per-engine shell scripts (spark-tpch.sh, comet-tpcds.sh, etc.) into a single Python runner (benchmarks/tpc/run.py) driven by TOML engine configs in engines/
- Rename create-iceberg-tpch.py to create-iceberg-tables.py with a --benchmark {tpch,tpcds} flag to support converting both TPC-H and TPC-DS Parquet data to Iceberg tables
- Add check_benchmark_env() in the runner to validate benchmark-specific env vars (TPCH_QUERIES/TPCDS_QUERIES, etc.) and default ICEBERG_DATABASE to the benchmark name
- Generalize comet-iceberg.toml so it works for both benchmarks

Test plan
- python3 run.py --engine comet-iceberg --benchmark tpch --dry-run produces the correct command
- python3 run.py --engine comet-iceberg --benchmark tpcds --dry-run produces the correct command with --database tpcds and TPC-DS executor settings
- python3 create-iceberg-tables.py --help shows both tpch and tpcds choices
- Other engines (spark, comet, gluten, blaze) still work for both benchmarks

🤖 Generated with Claude Code