Feature/281 refactor test matrix by jathavaan · Pull Request #282 · kartAI/doppa

jathavaan · 2026-05-19T14:01:02Z

This pull request updates the README and CI/CD workflow files to reflect the current set of supported query patterns and Databricks cluster configurations, aligning documentation and automation with the active benchmark suite. The main changes involve removing references to deprecated or out-of-scope query types, updating the cluster sizes and join strategies for Apache Sedona on Databricks, and ensuring the workflows only include the relevant services.

Workflow and Service Updates:

Removed all attribute-spatial-compound-filter services (DuckDB, PostGIS, and Shapefile variants) from both the pull request test workflow (pull-request-tests.yml) and the container push workflow (push-containers-to-acr.yml), as these query patterns are no longer supported. [1] [2]
Cleaned up both workflows by removing a large set of out-of-scope services that had been carried only as commented-out stubs (database scans, the simple, advanced, and result-set-size bounding-box filtering variants, vector tile fetching, spatial aggregation, and ordered range queries), so that only the active query patterns remain. [1] [2]
Added new services for Apache Sedona national-scale spatial joins on Databricks, covering all three join strategies (default, broadcast, partitioned) at five cluster sizes (2, 4, 8, 12, and 16 nodes). These are now included in both workflows. [1] [2]
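The Sedona service matrix described above is the cross product of three join strategies and five cluster sizes. The following is a minimal sketch of that enumeration, assuming the naming pattern `national_scale_spatial_join_databricks_{strategy}_{n}_nodes` used by the new entrypoints holds for every combination. The helper `sedona_entrypoint_names` is hypothetical and not part of the repository.

```python
from itertools import product

# Join strategies and cluster sizes as listed in the PR description.
JOIN_STRATEGIES = ("default", "broadcast", "partitioned")
CLUSTER_SIZES = (2, 4, 8, 12, 16)


def sedona_entrypoint_names() -> list[str]:
    """Hypothetical helper: one entrypoint name per (strategy, size) pair."""
    return [
        f"national_scale_spatial_join_databricks_{strategy}_{nodes}_nodes"
        for strategy, nodes in product(JOIN_STRATEGIES, CLUSTER_SIZES)
    ]


names = sedona_entrypoint_names()
print(len(names))  # 15
```

Generating the names from two short tuples is one way to keep the workflow matrices, `docker-compose.yml`, and the entrypoint modules in step; whether the repository does this or lists them by hand is not stated in the PR.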

Documentation Updates (README.md):

Updated the list of supported query patterns to only include point-in-polygon lookups, k-nearest-neighbour search, bounding-box filtering, and national-scale spatial joins. [1] [2]
Revised the description of cluster sizes for Databricks/Sedona benchmarks to cover 2, 4, 8, 12, and 16 nodes, and clarified the join strategy variants (default, broadcast, partitioned) and their roles in the benchmarks. [1] [2]
Updated the architecture table and methodology sections to reflect the new cluster sizes and join strategies, and described the three notebook variants corresponding to each join strategy. [1] [2]

Add the apples-to-apples Sedona `default` join-strategy variant (no broadcast hint, no Sedona partitioner config; Spark CBO picks the plan) and extend the existing broadcast and partitioned variants to 12- and 16-node clusters.

- src/config.py: add DATABRICKS_{LOCAL_SCRIPT,WORKSPACE_NOTEBOOK}_PATH_DEFAULT
- IDatabricksService Literal["broadcast","partitioned"] -> include "default"
- DatabricksService NotebookVariant + dispatcher helpers extended
- _databricks_benchmark_runner.NotebookVariant extended
- new notebook src/presentation/databricks/national_scale_spatial_join_default.py
- new entrypoints:
  - national_scale_spatial_join_databricks_default_{2,4,8,12,16}_nodes
  - national_scale_spatial_join_databricks_broadcast_{12,16}_nodes
  - national_scale_spatial_join_databricks_partitioned_{12,16}_nodes
- wiring updates in entrypoints/__init__.py, app_config.py, benchmark_runner.py
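To illustrate the typing change in this commit, here is a minimal sketch of a `NotebookVariant` literal extended with `"default"` and a dispatcher that maps a variant to its script and workspace paths. Only the two `*_PATH_DEFAULT` attribute names and the default notebook's file path come from the commit message. The `Config` stand-in, the `*_BROADCAST` and `*_PARTITIONED` names, all placeholder values, and the function `resolve_notebook_paths` are assumptions made for illustration, not the repository's actual code.

```python
from typing import Literal, get_args

# The three variants named in the commit message.
NotebookVariant = Literal["default", "broadcast", "partitioned"]


class Config:
    """Stand-in for src/config.py; values marked <...> are placeholders."""

    # Attribute names taken from the commit message.
    DATABRICKS_LOCAL_SCRIPT_PATH_DEFAULT = (
        "src/presentation/databricks/national_scale_spatial_join_default.py"
    )
    DATABRICKS_WORKSPACE_NOTEBOOK_PATH_DEFAULT = "<workspace path, default>"
    # Assumed names, following the same pattern as the two above.
    DATABRICKS_LOCAL_SCRIPT_PATH_BROADCAST = "<local path, broadcast>"
    DATABRICKS_WORKSPACE_NOTEBOOK_PATH_BROADCAST = "<workspace path, broadcast>"
    DATABRICKS_LOCAL_SCRIPT_PATH_PARTITIONED = "<local path, partitioned>"
    DATABRICKS_WORKSPACE_NOTEBOOK_PATH_PARTITIONED = "<workspace path, partitioned>"


def resolve_notebook_paths(variant: NotebookVariant) -> tuple[str, str]:
    """Hypothetical dispatcher: return (local script path, workspace path)."""
    # Literal types are not enforced at runtime, so validate explicitly.
    if variant not in get_args(NotebookVariant):
        raise ValueError(f"Unknown notebook variant: {variant!r}")
    suffix = variant.upper()
    return (
        getattr(Config, f"DATABRICKS_LOCAL_SCRIPT_PATH_{suffix}"),
        getattr(Config, f"DATABRICKS_WORKSPACE_NOTEBOOK_PATH_{suffix}"),
    )


local_path, workspace_path = resolve_notebook_paths("default")
print(local_path)
```

The runtime check matters because a `Literal` annotation only constrains static type checkers; a misspelled variant passed from a YAML file or CLI flag would otherwise surface as an `AttributeError` deep inside the dispatcher.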

Activate the consolidated experiment matrix: drop attribute-spatial compound filter and the medium tier from RQ1, omit the 2-node row at small for the broadcast and partitioned strategies in RQ2, and add 11 default-strategy rows plus 4 large-tier extension rows. Total stays at 52 experiments (15 RQ1 + 37 RQ2); 6 RQ1 + 34 RQ2 = 40 pair groups.

- delete 3 attribute_spatial_compound_filter_* entrypoints + wiring
- benchmarks.yml: -7 compound, -6 medium, -2 small-2-node, +11 default, +4 large
- docker-compose.yml: -3 compound services, +9 sedona services
- pull-request-tests.yml + push-containers-to-acr.yml matrices updated
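The bookkeeping in this commit message can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the message; the dictionary keys are descriptive labels chosen for illustration.

```python
# Row changes to benchmarks.yml, as listed in the commit message.
row_changes = {
    "compound filter rows removed": -7,
    "medium tier rows removed": -6,
    "small-tier 2-node rows removed": -2,
    "default-strategy rows added": +11,
    "large-tier extension rows added": +4,
}
net_change = sum(row_changes.values())

# Experiment and pair-group counts per research question.
experiments = {"RQ1": 15, "RQ2": 37}
pair_groups = {"RQ1": 6, "RQ2": 34}

print(net_change)                 # 0, so the total is unchanged
print(sum(experiments.values()))  # 52
print(sum(pair_groups.values()))  # 40
```

The 15 removed rows are exactly offset by the 15 added rows, which is why the total stays at 52 experiments even though the composition of the matrix changes.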

Drop 20 entrypoints that were carried only as commented-out stubs in benchmarks.yml, docker-compose.yml and the CI workflow matrices. None were active in the current matrix, and they no longer serve as an escape hatch now that the test design has stabilised.

- db-scan-{blob-storage,postgis}
- bbox-filtering-{simple-local,simple-blob-storage,advanced-duckdb,advanced-postgis}
- bbox-filtering-result-set-sizes-{municipality,county}-{duckdb,local,postgis}
- vector-tiles-{single-tile,100k}-{pmtiles,vmt}
- spatial-aggregation-grid-{duckdb,postgis}
- ordered-range-query-{duckdb,postgis}

Touches: entrypoint files (deleted), entrypoints/__init__.py, app_config.py wiring list, benchmark_runner.py imports + cases, benchmarks.yml + docker-compose.yml + both CI workflow files (commented-out blocks removed).
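The brace patterns in this commit message expand to the 20 removed entrypoints. The following sketch performs that expansion in Python to confirm the count; the `PATTERNS` structure and the `expand` helper are illustrative and not taken from the repository.

```python
from itertools import product

# Each entry: (prefix, list of alternative groups), mirroring the brace patterns.
PATTERNS = [
    ("db-scan", [["blob-storage", "postgis"]]),
    ("bbox-filtering",
     [["simple-local", "simple-blob-storage", "advanced-duckdb", "advanced-postgis"]]),
    ("bbox-filtering-result-set-sizes",
     [["municipality", "county"], ["duckdb", "local", "postgis"]]),
    ("vector-tiles", [["single-tile", "100k"], ["pmtiles", "vmt"]]),
    ("spatial-aggregation-grid", [["duckdb", "postgis"]]),
    ("ordered-range-query", [["duckdb", "postgis"]]),
]


def expand(prefix: str, groups: list[list[str]]) -> list[str]:
    """Expand one brace pattern into its concrete service names."""
    return ["-".join((prefix, *combo)) for combo in product(*groups)]


removed = [name for prefix, groups in PATTERNS for name in expand(prefix, groups)]
print(len(removed))  # 20
```

The per-pattern counts are 2 + 4 + 6 + 4 + 2 + 2, which matches the 20 entrypoints stated in the message.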

- RQ1 matrix: drop compound-filter row and medium-tier columns
- RQ2 matrix: add Sedona default-strategy row, extend broadcast/partitioned to 12/16 nodes, note the small/2-node omission
- Pair-group count: 33 -> 40 (Sedona singletons inclusive)
- Engine table: list 5 Databricks cluster sizes
- Quota note: target >=72 vCPU in Sweden Central; mention 12-node hedge
- Databricks lifecycle section: list all three notebook variants
- Research-gaps text: drop removed query patterns from the catalog
- script-id flag example: refresh removed db-scan reference

Copilot

Pull request overview

This PR refactors the benchmark/test matrix to match the currently active experiment suite by removing deprecated/out-of-scope query entrypoints and CI matrix entries, and expanding the Databricks/Sedona national-scale spatial-join benchmarks to include a new default strategy plus additional cluster sizes.

Changes:

Removed multiple deprecated benchmark entrypoints (db scans, vector tiles, spatial aggregation, ordered range query, attribute+spatial compound filter, and various bbox variants) and their dispatch/import wiring.
Added Databricks/Sedona national-scale spatial-join entrypoints for default strategy and extended broadcast/partitioned to 12/16-node clusters, plus a new Databricks notebook variant for default.
Updated benchmarks.yml, docker-compose.yml, GitHub Actions workflows, and README documentation to reflect the new/trimmed matrix.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 1 comment.

File	Description
src/presentation/entrypoints/vector_tiles_single_tile_vmt.py	Removed deprecated vector-tile benchmark entrypoint
src/presentation/entrypoints/vector_tiles_single_tile_pmtiles.py	Removed deprecated vector-tile benchmark entrypoint
src/presentation/entrypoints/vector_tiles_100k_vmt.py	Removed deprecated vector-tile benchmark entrypoint
src/presentation/entrypoints/vector_tiles_100k_pmtiles.py	Removed deprecated vector-tile benchmark entrypoint
src/presentation/entrypoints/spatial_aggregation_grid_postgis.py	Removed deprecated spatial-aggregation entrypoint
src/presentation/entrypoints/spatial_aggregation_grid_duckdb.py	Removed deprecated spatial-aggregation entrypoint
src/presentation/entrypoints/ordered_range_query_postgis.py	Removed deprecated ordered-range-query entrypoint
src/presentation/entrypoints/ordered_range_query_duckdb.py	Removed deprecated ordered-range-query entrypoint
src/presentation/entrypoints/db_scan_postgis.py	Removed deprecated db-scan entrypoint
src/presentation/entrypoints/db_scan_blob_storage.py	Removed deprecated db-scan entrypoint
src/presentation/entrypoints/bbox_filtering_simple_local.py	Removed deprecated bbox variant entrypoint
src/presentation/entrypoints/bbox_filtering_simple_blob_storage.py	Removed deprecated bbox variant entrypoint
src/presentation/entrypoints/bbox_filtering_result_set_sizes_municipality_postgis.py	Removed deprecated bbox result-set-size entrypoint
src/presentation/entrypoints/bbox_filtering_result_set_sizes_municipality_local.py	Removed deprecated bbox result-set-size entrypoint
src/presentation/entrypoints/bbox_filtering_result_set_sizes_municipality_duckdb.py	Removed deprecated bbox result-set-size entrypoint
src/presentation/entrypoints/bbox_filtering_result_set_sizes_county_postgis.py	Removed deprecated bbox result-set-size entrypoint
src/presentation/entrypoints/bbox_filtering_result_set_sizes_county_local.py	Removed deprecated bbox result-set-size entrypoint
src/presentation/entrypoints/bbox_filtering_result_set_sizes_county_duckdb.py	Removed deprecated bbox result-set-size entrypoint
src/presentation/entrypoints/bbox_filtering_advanced_postgis.py	Removed deprecated bbox advanced entrypoint
src/presentation/entrypoints/bbox_filtering_advanced_duckdb.py	Removed deprecated bbox advanced entrypoint
src/presentation/entrypoints/attribute_spatial_compound_filter_postgis.py	Removed deprecated attribute+spatial compound filter entrypoint
src/presentation/entrypoints/attribute_spatial_compound_filter_local.py	Removed deprecated attribute+spatial compound filter entrypoint
src/presentation/entrypoints/attribute_spatial_compound_filter_duckdb.py	Removed deprecated attribute+spatial compound filter entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_partitioned_16_nodes.py	Added Databricks partitioned 16-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_partitioned_12_nodes.py	Added Databricks partitioned 12-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_default_8_nodes.py	Added Databricks default 8-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_default_4_nodes.py	Added Databricks default 4-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_default_2_nodes.py	Added Databricks default 2-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_default_16_nodes.py	Added Databricks default 16-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_default_12_nodes.py	Added Databricks default 12-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_broadcast_16_nodes.py	Added Databricks broadcast 16-worker entrypoint
src/presentation/entrypoints/national_scale_spatial_join_databricks_broadcast_12_nodes.py	Added Databricks broadcast 12-worker entrypoint
src/presentation/entrypoints/_databricks_benchmark_runner.py	Extended notebook-variant typing to include `default`
src/presentation/entrypoints/__init__.py	Updated exported entrypoints (remove deprecated, add new Databricks variants)
src/presentation/databricks/national_scale_spatial_join_default.py	Added new Databricks notebook variant implementing “default” strategy
src/presentation/configuration/app_config.py	Updated DI wiring module list to match active entrypoints
src/infra/infrastructure/services/databricks_service.py	Added `default` notebook-variant routing to script/workspace paths
src/config.py	Added Config paths for the `default` Databricks notebook
src/application/contracts/databricks_service_interface.py	Updated interface typing/docs to include `default` notebook variant
README.md	Updated supported patterns and matrix/Databricks strategy documentation
docker-compose.yml	Removed deprecated benchmark services; added new Databricks variants
benchmarks.yml	Updated experiment definitions to match refactored matrix
benchmark_runner.py	Removed dispatch cases/imports for deprecated benchmarks; added new Databricks variants
.github/workflows/push-containers-to-acr.yml	Updated build/push matrix to match active services
.github/workflows/pull-request-tests.yml	Updated PR build matrix to match active services

jathavaan added 4 commits May 19, 2026 15:42

jathavaan self-assigned this May 19, 2026

Copilot AI review requested due to automatic review settings May 19, 2026 14:01

jathavaan linked an issue May 19, 2026 that may be closed by this pull request

Refactor test matrix #281

Closed

39 tasks

jathavaan enabled auto-merge May 19, 2026 14:01

Copilot started reviewing on behalf of jathavaan May 19, 2026 14:01

Copilot AI reviewed May 19, 2026

Comment thread README.md

jathavaan disabled auto-merge May 19, 2026 14:15

jathavaan merged commit f88b23e into main May 19, 2026
34 checks passed

jathavaan deleted the feature/281-refactor-test-matrix branch May 19, 2026 14:15
