Feature/281 refactor test matrix #282

Merged
jathavaan merged 4 commits into main from feature/281-refactor-test-matrix on May 19, 2026

@jathavaan
Collaborator

This pull request updates the README and CI/CD workflow files to reflect the current set of supported query patterns and Databricks cluster configurations, aligning documentation and automation with the active benchmark suite. The main changes involve removing references to deprecated or out-of-scope query types, updating the cluster sizes and join strategies for Apache Sedona on Databricks, and ensuring the workflows only include the relevant services.

Workflow and Service Updates:

  • Removed all attribute-spatial-compound-filter services (DuckDB, PostGIS, and Shapefile variants) from both the pull request test workflow (pull-request-tests.yml) and the container push workflow (push-containers-to-acr.yml), as these query patterns are no longer supported. [1] [2]
  • Cleaned up both workflows by removing or commenting out a large set of out-of-scope services (such as database scans, bounding box filtering, vector tile fetching, spatial aggregation, and ordered range queries) to focus only on the active query patterns. [1] [2]
  • Added new services for Apache Sedona national-scale spatial joins on Databricks, covering all three join strategies (default, broadcast, partitioned) at five cluster sizes (2, 4, 8, 12, and 16 nodes). These are now included in both workflows. [1] [2]
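
As a rough illustration of the Sedona matrix described above, the following Python sketch enumerates every strategy and cluster-size combination. It assumes the `national_scale_spatial_join_databricks_{strategy}_{n}_nodes` naming pattern used for the entrypoints in this PR's commit messages; the service names in the workflow files may be spelled differently, and `sedona_entrypoints` is a hypothetical helper, not a function from the repository.

```python
from itertools import product

# Join strategies and cluster sizes as listed in the PR description.
STRATEGIES = ("default", "broadcast", "partitioned")
CLUSTER_SIZES = (2, 4, 8, 12, 16)


def sedona_entrypoints() -> list[str]:
    """Return one name per (strategy, cluster size) combination.

    Hypothetical helper: the naming pattern is taken from the commit
    messages, not from the workflow files themselves.
    """
    return [
        f"national_scale_spatial_join_databricks_{strategy}_{nodes}_nodes"
        for strategy, nodes in product(STRATEGIES, CLUSTER_SIZES)
    ]


names = sedona_entrypoints()
print(len(names))  # 15 (3 strategies x 5 cluster sizes)
```

Of these 15 combinations, the commit messages list 9 as new in this PR (all five `default` sizes plus the 12- and 16-node `broadcast` and `partitioned` variants).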

Documentation Updates (README.md):

  • Updated the list of supported query patterns to only include point-in-polygon lookups, k-nearest-neighbour search, bounding-box filtering, and national-scale spatial joins. [1] [2]
  • Revised the description of cluster sizes for Databricks/Sedona benchmarks to cover 2, 4, 8, 12, and 16 nodes, and clarified the join strategy variants (default, broadcast, partitioned) and their roles in the benchmarks. [1] [2]
  • Updated the architecture table and methodology sections to reflect the new cluster sizes and join strategies, and described the three notebook variants corresponding to each join strategy. [1] [2]

jathavaan added 4 commits May 19, 2026 15:42
Add the apples-to-apples Sedona `default` join-strategy variant (no
broadcast hint, no Sedona partitioner config; Spark CBO picks the plan)
and extend the existing broadcast and partitioned variants to 12- and
16-node clusters.

- src/config.py: add DATABRICKS_{LOCAL_SCRIPT,WORKSPACE_NOTEBOOK}_PATH_DEFAULT
- IDatabricksService Literal["broadcast","partitioned"] -> include "default"
- DatabricksService NotebookVariant + dispatcher helpers extended
- _databricks_benchmark_runner.NotebookVariant extended
- new notebook src/presentation/databricks/national_scale_spatial_join_default.py
- new entrypoints:
  - national_scale_spatial_join_databricks_default_{2,4,8,12,16}_nodes
  - national_scale_spatial_join_databricks_broadcast_{12,16}_nodes
  - national_scale_spatial_join_databricks_partitioned_{12,16}_nodes
- wiring updates in entrypoints/__init__.py, app_config.py, benchmark_runner.py
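
The typing and dispatch change described in this commit could look roughly like the sketch below. It is not the repository's code: `LOCAL_SCRIPT_PATHS` and `resolve_local_script_path` are hypothetical names, and only the path of the `default` notebook is taken from the commit message; the other two paths are placeholders because the source does not state them.

```python
from typing import Literal, get_args

# The variant type after this change: "default" joins the two existing values.
NotebookVariant = Literal["default", "broadcast", "partitioned"]

# Hypothetical lookup table. Only the "default" path appears in the commit
# message; the other entries are placeholders, not real repository paths.
LOCAL_SCRIPT_PATHS: dict[str, str] = {
    "default": "src/presentation/databricks/national_scale_spatial_join_default.py",
    "broadcast": "<broadcast notebook path>",
    "partitioned": "<partitioned notebook path>",
}


def resolve_local_script_path(variant: NotebookVariant) -> str:
    """Map a notebook variant to its local script path, rejecting unknown values."""
    if variant not in get_args(NotebookVariant):
        raise ValueError(f"Unknown notebook variant: {variant!r}")
    return LOCAL_SCRIPT_PATHS[variant]


print(resolve_local_script_path("default"))
```

Validating against `get_args(NotebookVariant)` keeps the runtime check in step with the type alias, so adding a further variant later only needs a change to the `Literal` and the lookup table.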

Activate the consolidated experiment matrix: drop attribute-spatial
compound filter and the medium tier from RQ1, omit the 2-node row at
small for the broadcast and partitioned strategies in RQ2, and add 11
default-strategy rows plus 4 large-tier extension rows. Total stays at
52 experiments (15 RQ1 + 37 RQ2); 6 RQ1 + 34 RQ2 = 40 pair groups.

- delete 3 attribute_spatial_compound_filter_* entrypoints + wiring
- benchmarks.yml: -7 compound + -6 medium + -2 small-2-node + +11 default + +4 large
- docker-compose.yml: -3 compound services + +9 sedona services
- pull-request-tests.yml + push-containers-to-acr.yml matrices updated
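
The bookkeeping in this commit message can be checked with a few lines of Python. The numbers are copied from the message itself; the dictionary keys are shorthand labels chosen for this sketch, not identifiers from `benchmarks.yml`.

```python
# Row changes to benchmarks.yml as stated in the commit message.
deltas = {
    "compound": -7,
    "medium": -6,
    "small_2_node": -2,
    "default": +11,
    "large": +4,
}

# Experiment and pair-group counts per research question, from the same message.
experiments = {"RQ1": 15, "RQ2": 37}
pair_groups = {"RQ1": 6, "RQ2": 34}

net_change = sum(deltas.values())
total_experiments = sum(experiments.values())
total_pair_groups = sum(pair_groups.values())

print(net_change, total_experiments, total_pair_groups)  # 0 52 40
```

The removals (15 rows) and additions (15 rows) cancel out, which is why the total stays at 52 experiments.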

Drop 20 entrypoints that were carried only as commented-out stubs in
benchmarks.yml, docker-compose.yml and the CI workflow matrices. None
were active in the current matrix; keeping them was no longer a
load-bearing escape hatch since the test design has stabilised.

- db-scan-{blob-storage,postgis}
- bbox-filtering-{simple-local,simple-blob-storage,advanced-duckdb,advanced-postgis}
- bbox-filtering-result-set-sizes-{municipality,county}-{duckdb,local,postgis}
- vector-tiles-{single-tile,100k}-{pmtiles,vmt}
- spatial-aggregation-grid-{duckdb,postgis}
- ordered-range-query-{duckdb,postgis}

Touches: entrypoint files (deleted), entrypoints/__init__.py, app_config.py
wiring list, benchmark_runner.py imports + cases, benchmarks.yml +
docker-compose.yml + both CI workflow files (commented-out blocks removed).
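
To confirm that the brace patterns listed in this commit expand to exactly 20 entrypoints, they can be expanded in Python. The `expand` helper is a small illustrative function written for this sketch, handling only flat, non-nested `{a,b}` groups; it is not part of the repository.

```python
import re
from itertools import product

# Patterns copied from the commit message.
PATTERNS = [
    "db-scan-{blob-storage,postgis}",
    "bbox-filtering-{simple-local,simple-blob-storage,advanced-duckdb,advanced-postgis}",
    "bbox-filtering-result-set-sizes-{municipality,county}-{duckdb,local,postgis}",
    "vector-tiles-{single-tile,100k}-{pmtiles,vmt}",
    "spatial-aggregation-grid-{duckdb,postgis}",
    "ordered-range-query-{duckdb,postgis}",
]


def expand(pattern: str) -> list[str]:
    """Expand flat shell-style {a,b} groups into every combination."""
    # Splitting on a capturing group keeps the group contents at odd indices.
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [
        part.split(",") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in product(*choices)]


removed = [name for pattern in PATTERNS for name in expand(pattern)]
print(len(removed))  # 20
```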

- RQ1 matrix: drop compound-filter row and medium-tier columns
- RQ2 matrix: add Sedona default-strategy row, extend broadcast/partitioned
  to 12/16 nodes, note the small/2-node omission
- Pair-group count: 33 -> 40 (Sedona singletons inclusive)
- Engine table: list 5 Databricks cluster sizes
- Quota note: target >=72 vCPU in Sweden Central; mention 12-node hedge
- Databricks lifecycle section: list all three notebook variants
- Research-gaps text: drop removed query patterns from the catalog
- script-id flag example: refresh removed db-scan reference
@jathavaan jathavaan self-assigned this May 19, 2026
Copilot AI review requested due to automatic review settings May 19, 2026 14:01
@jathavaan jathavaan linked an issue May 19, 2026 that may be closed by this pull request
39 tasks
@jathavaan jathavaan enabled auto-merge May 19, 2026 14:01
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the benchmark/test matrix to match the currently active experiment suite by removing deprecated/out-of-scope query entrypoints and CI matrix entries, and expanding the Databricks/Sedona national-scale spatial-join benchmarks to include a new default strategy plus additional cluster sizes.

Changes:

  • Removed multiple deprecated benchmark entrypoints (db scans, vector tiles, spatial aggregation, ordered range query, attribute+spatial compound filter, and various bbox variants) and their dispatch/import wiring.
  • Added Databricks/Sedona national-scale spatial-join entrypoints for default strategy and extended broadcast/partitioned to 12/16-node clusters, plus a new Databricks notebook variant for default.
  • Updated benchmarks.yml, docker-compose.yml, GitHub Actions workflows, and README documentation to reflect the new/trimmed matrix.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/presentation/entrypoints/vector_tiles_single_tile_vmt.py` | Removed deprecated vector-tile benchmark entrypoint |
| `src/presentation/entrypoints/vector_tiles_single_tile_pmtiles.py` | Removed deprecated vector-tile benchmark entrypoint |
| `src/presentation/entrypoints/vector_tiles_100k_vmt.py` | Removed deprecated vector-tile benchmark entrypoint |
| `src/presentation/entrypoints/vector_tiles_100k_pmtiles.py` | Removed deprecated vector-tile benchmark entrypoint |
| `src/presentation/entrypoints/spatial_aggregation_grid_postgis.py` | Removed deprecated spatial-aggregation entrypoint |
| `src/presentation/entrypoints/spatial_aggregation_grid_duckdb.py` | Removed deprecated spatial-aggregation entrypoint |
| `src/presentation/entrypoints/ordered_range_query_postgis.py` | Removed deprecated ordered-range-query entrypoint |
| `src/presentation/entrypoints/ordered_range_query_duckdb.py` | Removed deprecated ordered-range-query entrypoint |
| `src/presentation/entrypoints/db_scan_postgis.py` | Removed deprecated db-scan entrypoint |
| `src/presentation/entrypoints/db_scan_blob_storage.py` | Removed deprecated db-scan entrypoint |
| `src/presentation/entrypoints/bbox_filtering_simple_local.py` | Removed deprecated bbox variant entrypoint |
| `src/presentation/entrypoints/bbox_filtering_simple_blob_storage.py` | Removed deprecated bbox variant entrypoint |
| `src/presentation/entrypoints/bbox_filtering_result_set_sizes_municipality_postgis.py` | Removed deprecated bbox result-set-size entrypoint |
| `src/presentation/entrypoints/bbox_filtering_result_set_sizes_municipality_local.py` | Removed deprecated bbox result-set-size entrypoint |
| `src/presentation/entrypoints/bbox_filtering_result_set_sizes_municipality_duckdb.py` | Removed deprecated bbox result-set-size entrypoint |
| `src/presentation/entrypoints/bbox_filtering_result_set_sizes_county_postgis.py` | Removed deprecated bbox result-set-size entrypoint |
| `src/presentation/entrypoints/bbox_filtering_result_set_sizes_county_local.py` | Removed deprecated bbox result-set-size entrypoint |
| `src/presentation/entrypoints/bbox_filtering_result_set_sizes_county_duckdb.py` | Removed deprecated bbox result-set-size entrypoint |
| `src/presentation/entrypoints/bbox_filtering_advanced_postgis.py` | Removed deprecated bbox advanced entrypoint |
| `src/presentation/entrypoints/bbox_filtering_advanced_duckdb.py` | Removed deprecated bbox advanced entrypoint |
| `src/presentation/entrypoints/attribute_spatial_compound_filter_postgis.py` | Removed deprecated attribute+spatial compound filter entrypoint |
| `src/presentation/entrypoints/attribute_spatial_compound_filter_local.py` | Removed deprecated attribute+spatial compound filter entrypoint |
| `src/presentation/entrypoints/attribute_spatial_compound_filter_duckdb.py` | Removed deprecated attribute+spatial compound filter entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_partitioned_16_nodes.py` | Added Databricks partitioned 16-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_partitioned_12_nodes.py` | Added Databricks partitioned 12-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_default_8_nodes.py` | Added Databricks default 8-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_default_4_nodes.py` | Added Databricks default 4-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_default_2_nodes.py` | Added Databricks default 2-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_default_16_nodes.py` | Added Databricks default 16-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_default_12_nodes.py` | Added Databricks default 12-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_broadcast_16_nodes.py` | Added Databricks broadcast 16-worker entrypoint |
| `src/presentation/entrypoints/national_scale_spatial_join_databricks_broadcast_12_nodes.py` | Added Databricks broadcast 12-worker entrypoint |
| `src/presentation/entrypoints/_databricks_benchmark_runner.py` | Extended notebook-variant typing to include default |
| `src/presentation/entrypoints/__init__.py` | Updated exported entrypoints (remove deprecated, add new Databricks variants) |
| `src/presentation/databricks/national_scale_spatial_join_default.py` | Added new Databricks notebook variant implementing “default” strategy |
| `src/presentation/configuration/app_config.py` | Updated DI wiring module list to match active entrypoints |
| `src/infra/infrastructure/services/databricks_service.py` | Added default notebook-variant routing to script/workspace paths |
| `src/config.py` | Added Config paths for the default Databricks notebook |
| `src/application/contracts/databricks_service_interface.py` | Updated interface typing/docs to include default notebook variant |
| `README.md` | Updated supported patterns and matrix/Databricks strategy documentation |
| `docker-compose.yml` | Removed deprecated benchmark services; added new Databricks variants |
| `benchmarks.yml` | Updated experiment definitions to match refactored matrix |
| `benchmark_runner.py` | Removed dispatch cases/imports for deprecated benchmarks; added new Databricks variants |
| `.github/workflows/push-containers-to-acr.yml` | Updated build/push matrix to match active services |
| `.github/workflows/pull-request-tests.yml` | Updated PR build matrix to match active services |

Comment thread README.md
@jathavaan jathavaan disabled auto-merge May 19, 2026 14:15
@jathavaan jathavaan merged commit f88b23e into main May 19, 2026
34 checks passed
@jathavaan jathavaan deleted the feature/281-refactor-test-matrix branch May 19, 2026 14:15
Development

Successfully merging this pull request may close these issues.

Refactor test matrix

2 participants