Conversation
Rewrites benchmarks.yml to the 52 active experiments from Table 4.2.1 (RQ1: 4 query types x 7 size variants = 28; RQ2: 6 single-node + 18 Sedona = 24). RQ1 pairs three engines at small and two at medium/large via `related_script_ids`; Sedona variants run unpaired. Adds `_strip_dataset_size_suffix` in benchmark_runner.py so 52 unique yml ids (each carrying a -small/-medium/-large tail) dispatch to the shared base entrypoint while dataset_size flows in via --dataset-size. Collapses docker-compose.yml to 20 active services (one per (query_type, engine) pair) and trims both CI build matrices to match. Out-of-scope benchmarks stay in tree but are commented out at the bottom of benchmarks.yml, docker-compose.yml, and the workflow matrices so re-enabling them is a single search-and-uncomment. Adds a Test matrix table to the README documenting which experiments launch as a parallel pair group and which run alone.
Closed
9 tasks
Contributor
There was a problem hiding this comment.
Pull request overview
Reorganizes the benchmark configuration to align with Table 4.2.1 (52 active experiments: 28 for RQ1, 24 for RQ2) by introducing per-size experiment ids while keeping a single entrypoint per (query_type, engine) pair. The orchestrator already forwards --dataset-size separately, so a small suffix-stripping helper in benchmark_runner.py keeps the dispatch table compact. Out-of-scope items are commented out (not deleted) in all four affected files so re-enabling is uniform.
Changes:
- Rewrites
benchmarks.ymlto 52 size-suffixed experiments with correctly scopedrelated_script_ids(3-way atsmall, 2-way atmedium/large, Sedona unpaired); preserves out-of-scope entries as comments. - Adds
_strip_dataset_size_suffixinbenchmark_runner.pyso size-suffixed ids dispatch to the existing base cases; the match table now correctly covers every active base id (verified all 12 RQ1 + 8 RQ2 base ids exist as cases). - Trims
docker-compose.ymland the two GitHub Actions matrices to the 20 active services, and extends the README with a Test matrix section documenting pair groups (12 RQ1 + 21 RQ2 = 33, consistent with 52 experiments).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
benchmarks.yml |
Replaces flat experiment list with 52 size-suffixed entries; mutual related_script_ids are symmetric at each size; commented out-of-scope block. |
benchmark_runner.py |
Adds _strip_dataset_size_suffix and wires it into the dispatch match; iterates DatasetSize enum so small/medium/large suffixes are stripped before lookup. |
docker-compose.yml |
Reduces to 20 active services (one per base id) plus commented out-of-scope mirror. |
.github/workflows/push-containers-to-acr.yml |
Trims build matrix to the 20 active images; commented mirror at the bottom. |
.github/workflows/pull-request-tests.yml |
Mirrors the trimmed service matrix for PR tests. |
README.md |
Adds TOC entry and "Test matrix" subsection documenting RQ1/RQ2 pair groupings and Shapefile-only-at-small policy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rewrites benchmarks.yml to the 52 active experiments from Table 4.2.1 (RQ1: 4 query types x 7 size variants = 28; RQ2: 6 single-node + 18 Sedona = 24). RQ1 pairs three engines at small and two at medium/large via
related_script_ids; Sedona variants run unpaired.Adds
_strip_dataset_size_suffixin benchmark_runner.py so 52 unique yml ids (each carrying a -small/-medium/-large tail) dispatch to the shared base entrypoint while dataset_size flows in via --dataset-size.Collapses docker-compose.yml to 20 active services (one per (query_type, engine) pair) and trims both CI build matrices to match. Out-of-scope benchmarks stay in tree but are commented out at the bottom of benchmarks.yml, docker-compose.yml, and the workflow matrices so re-enabling them is a single search-and-uncomment.
Adds a Test matrix table to the README documenting which experiments launch as a parallel pair group and which run alone.