Feature/273 add knn search benchmarks and the missing shapefile variants for point in polygon and attribute spatial compound filter #278

Merged
jathavaan merged 8 commits into main from feature/273-add-knn-search-benchmarks-and-the-missing-shapefile-variants-for-point-in-polygon-and-attribute-spatial-compound-filter on May 19, 2026

@jathavaan
Collaborator

This pull request adds new local and KNN search benchmark entrypoints, expands the benchmarks configuration, and updates dependency injection and imports to support these additions. The main focus is on enabling local (GeoPandas-based) and KNN search benchmarks for the buildings dataset, both in code and in the orchestration configuration.

New benchmark entrypoints and orchestration:

  • Added new local (GeoPandas-based) benchmark entrypoints for attribute-spatial-compound-filter and point-in-polygon-lookup, as well as three KNN search entrypoints (DuckDB, local, and PostGIS). These are implemented in new files: attribute_spatial_compound_filter_local.py, point_in_polygon_lookup_local.py, knn_search_duckdb.py, knn_search_local.py, and knn_search_postgis.py. [1] [2] [3] [4] [5]
  • Updated the experiment definitions in benchmarks.yml to include the new local and KNN search benchmarks, with appropriate images, resource requirements, and related script IDs. [1] [2]

Integration and dependency injection:

  • Registered the new entrypoints in the DI configuration (app_config.py) and the entrypoint module imports (__init__.py), ensuring they are available for orchestration and CLI invocation. [1] [2] [3]

Benchmark runner updates:

  • Updated benchmark_runner.py to dispatch to the new local and KNN search entrypoints based on the script ID. [1] [2] [3]
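
A minimal sketch of script-ID dispatch, assuming a plain dictionary lookup. The actual structure of benchmark_runner.py is not shown in this description; the script IDs (hyphenated here as a guess), the ENTRYPOINTS table, and the stand-in functions below are all hypothetical.

```python
from typing import Callable

# Hypothetical stand-ins for the real entrypoints. Each returns its own name
# so the result of a dispatch is easy to inspect.
def knn_search_duckdb() -> str:
    return "knn_search_duckdb"


def knn_search_local() -> str:
    return "knn_search_local"


def knn_search_postgis() -> str:
    return "knn_search_postgis"


# Assumed mapping from script ID to entrypoint; the real IDs may differ.
ENTRYPOINTS: dict[str, Callable[[], str]] = {
    "knn-search-duckdb": knn_search_duckdb,
    "knn-search-local": knn_search_local,
    "knn-search-postgis": knn_search_postgis,
}


def dispatch(script_id: str) -> str:
    """Look up the entrypoint registered for script_id and run it."""
    try:
        entrypoint = ENTRYPOINTS[script_id]
    except KeyError:
        raise ValueError(f"Unknown script ID: {script_id!r}") from None
    return entrypoint()
```

With a table like this, adding a benchmark means adding one dictionary entry, and an unknown ID fails loudly instead of silently doing nothing.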

Configuration:

  • Added a new constant TRONDHEIM_CENTER_WGS84 to Config for use as the reference point in KNN search benchmarks.
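
To illustrate how a fixed reference point drives a KNN search, here is a brute-force sketch in plain Python. It rests on assumptions: the constant is modelled as a (longitude, latitude) tuple with approximate coordinates for central Trondheim (the value and shape used in Config are not shown in this PR), and haversine_m and knn are hypothetical helpers, not functions from the repository.

```python
import math
from collections.abc import Iterable

Point = tuple[float, float]  # assumed layout: (longitude, latitude) in degrees

# Approximate centre of Trondheim; the actual value in Config may differ.
TRONDHEIM_CENTER_WGS84: Point = (10.3951, 63.4305)

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def knn(
    points: Iterable[Point],
    k: int,
    reference: Point = TRONDHEIM_CENTER_WGS84,
) -> list[Point]:
    """Return the k points closest to the reference, nearest first."""
    return sorted(points, key=lambda p: haversine_m(p, reference))[:k]
```

Sorting every candidate by distance is the unindexed baseline; an indexed engine can avoid scanning all rows, which is part of what benchmarks like these are meant to compare.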

These changes collectively enable a broader set of spatial benchmarks, including local (GeoPandas) and KNN search scenarios, and ensure they are integrated into the benchmarking framework and orchestration.


New benchmark entrypoints:

  • Implemented attribute_spatial_compound_filter_local, point_in_polygon_lookup_local, knn_search_duckdb, knn_search_local, and knn_search_postgis entrypoints for local and KNN search benchmarking. [1] [2] [3] [4] [5]

Benchmark orchestration and configuration:

  • Added new experiment definitions for local and KNN benchmarks in benchmarks.yml, including Docker images, resources, and related scripts. [1] [2]
  • Added TRONDHEIM_CENTER_WGS84 constant to Config for use in KNN search.

Dependency injection and imports:

  • Registered new entrypoints in DI (app_config.py) and imported them in src/presentation/entrypoints/__init__.py. [1] [2] [3]

Benchmark runner updates:

  • Updated benchmark_runner.py to dispatch to the new entrypoints. [1] [2] [3]

@jathavaan jathavaan self-assigned this May 19, 2026
Copilot AI review requested due to automatic review settings May 19, 2026 07:45
@jathavaan jathavaan enabled auto-merge May 19, 2026 07:46
Contributor

Copilot AI left a comment

Pull request overview

This PR expands the benchmarking framework by adding new GeoPandas-based “local” entrypoints and KNN search entrypoints (DuckDB/local/PostGIS), wiring them into dependency injection, orchestration (benchmarks.yml), and the CLI benchmark dispatcher.

Changes:

  • Added new benchmark entrypoints for local shapefile workflows (attribute+spatial filter, point-in-polygon) and KNN search (DuckDB/local/PostGIS).
  • Updated orchestration config (benchmarks.yml) to include the new experiments and related script IDs.
  • Registered and dispatched the new entrypoints via DI (app_config.py), entrypoint exports (__init__.py), and benchmark_runner.py, plus added Config.TRONDHEIM_CENTER_WGS84.
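
For context on the point-in-polygon workload, the following is a minimal ray-casting sketch in plain Python. It is illustrative only: the entrypoints in this PR are GeoPandas-based, and the function name point_in_polygon and the list-of-tuples polygon representation are assumptions, not code from the repository.

```python
Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test for a simple polygon given as a list of vertices.

    Casts a horizontal ray from the point towards +x and counts edge
    crossings; an odd count means the point is inside. Points lying exactly
    on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

A benchmark of this kind typically runs the test for many sample points against many polygons, so a spatial index that prunes candidate polygons matters far more than the per-pair test itself.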

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/presentation/entrypoints/attribute_spatial_compound_filter_local.py | New GeoPandas-based local benchmark for attribute + spatial filtering on the buildings shapefile. |
| src/presentation/entrypoints/point_in_polygon_lookup_local.py | New GeoPandas-based local point-in-polygon lookup benchmark with generated sample points. |
| src/presentation/entrypoints/knn_search_duckdb.py | New DuckDB KNN benchmark scanning parquet and sorting by distance to a fixed reference point. |
| src/presentation/entrypoints/knn_search_local.py | New GeoPandas-based local KNN benchmark over the shapefile. |
| src/presentation/entrypoints/knn_search_postgis.py | New PostGIS KNN benchmark using GiST KNN ordering (<->). |
| src/presentation/entrypoints/__init__.py | Exports newly added entrypoints for import/dispatch. |
| src/presentation/configuration/app_config.py | Wires new entrypoint modules into Dependency Injector initialization. |
| src/config.py | Adds TRONDHEIM_CENTER_WGS84 constant used by KNN benchmarks. |
| benchmarks.yml | Adds new experiments for local and KNN benchmarks and updates related-script links. |
| benchmark_runner.py | Adds dispatch cases for the new script IDs and imports the new entrypoints. |
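
The GiST KNN ordering mentioned for the PostGIS entrypoint can be sketched as a parameterized query. This is an assumption-laden illustration: the table name buildings, the columns id and geom, the SRID 4326, and the helper build_knn_query are hypothetical, and the query in knn_search_postgis.py may look different.

```python
# Hypothetical table and column names; the geometry column is assumed to be
# stored in SRID 4326 so it matches the reference point.
KNN_SQL = """
SELECT id
FROM buildings
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326)
LIMIT %s
"""


def build_knn_query(lon: float, lat: float, k: int) -> tuple[str, tuple[float, float, int]]:
    """Return the SQL text and its parameters for a k-nearest-neighbour query.

    The <-> operator yields the 2D distance between geometries. When it
    appears in ORDER BY and the column has a GiST index, PostGIS can use the
    index to return rows in distance order instead of sorting every row.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return KNN_SQL, (lon, lat, k)
```

The returned pair can be passed to a driver that uses %s placeholders, for example cursor.execute(sql, params) with psycopg. Note that for geometries in SRID 4326 the distance is computed in degrees, not metres.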
Comments suppressed due to low confidence (2)

src/presentation/entrypoints/knn_search_local.py:73

  • The _download_data implementation is duplicated across multiple local shapefile-based entrypoints (and is also very similar to bbox_filtering_local). This increases maintenance cost (e.g., updating the blob prefix/ext list). Consider extracting a shared helper and reusing it across the local entrypoints.
    blob_storage_service: IBlobStorageService = Provide[Containers.blob_storage_service],
) -> None:
    Config.BUILDINGS_SHAPEFILE.parent.mkdir(parents=True, exist_ok=True)

    blob_prefix = "copies/shapefile"
    base = Config.BUILDINGS_SHAPEFILE.with_suffix("")
    for ext in (".shp", ".shx", ".dbf", ".prj", ".cpg", ".qix"):
        blob_name = f"{blob_prefix}/{base.name}{ext}"
        data = blob_storage_service.download_file(
            container_name=StorageContainer.DATA,
            blob_name=blob_name,
        )
        if data is not None:
            base.with_suffix(ext).write_bytes(data)

src/presentation/entrypoints/point_in_polygon_lookup_local.py:98

  • The _download_data implementation is duplicated across multiple local shapefile-based entrypoints (and is also very similar to bbox_filtering_local). This increases maintenance cost (e.g., updating the blob prefix/ext list). Consider extracting a shared helper and reusing it across the local entrypoints.
    Config.BUILDINGS_SHAPEFILE.parent.mkdir(parents=True, exist_ok=True)

    blob_prefix = "copies/shapefile"
    base = Config.BUILDINGS_SHAPEFILE.with_suffix("")
    for ext in (".shp", ".shx", ".dbf", ".prj", ".cpg", ".qix"):
        blob_name = f"{blob_prefix}/{base.name}{ext}"
        data = blob_storage_service.download_file(
            container_name=StorageContainer.DATA,
            blob_name=blob_name,
        )
        if data is not None:
            base.with_suffix(ext).write_bytes(data)
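
One possible shape for the shared helper suggested above, sketched under assumptions: the function name download_shapefile, the BlobDownloader protocol, and the string-typed container_name are hypothetical (the real code uses IBlobStorageService and StorageContainer.DATA). The loop body mirrors the duplicated snippet.

```python
from pathlib import Path
from typing import Optional, Protocol

# Sidecar files that may accompany a shapefile, as listed in the snippet above.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".cpg", ".qix")


class BlobDownloader(Protocol):
    """Hypothetical minimal interface: returns None when the blob is missing."""

    def download_file(self, container_name: str, blob_name: str) -> Optional[bytes]: ...


def download_shapefile(
    service: BlobDownloader,
    shapefile_path: Path,
    container_name: str,
    blob_prefix: str = "copies/shapefile",
) -> list[Path]:
    """Download every available shapefile component next to shapefile_path.

    Returns the paths that were actually written; missing components are
    skipped, matching the behaviour of the duplicated implementations.
    """
    shapefile_path.parent.mkdir(parents=True, exist_ok=True)
    base = shapefile_path.with_suffix("")
    written: list[Path] = []
    for ext in SHAPEFILE_EXTENSIONS:
        data = service.download_file(
            container_name=container_name,
            blob_name=f"{blob_prefix}/{base.name}{ext}",
        )
        if data is not None:
            target = base.with_suffix(ext)
            target.write_bytes(data)
            written.append(target)
    return written
```

Each local entrypoint could then call this helper with Config.BUILDINGS_SHAPEFILE, so the blob prefix and extension list live in one place.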


Comment thread src/presentation/entrypoints/knn_search_local.py
Comment thread src/presentation/entrypoints/point_in_polygon_lookup_local.py Outdated
Comment thread src/presentation/entrypoints/attribute_spatial_compound_filter_local.py Outdated
@jathavaan jathavaan disabled auto-merge May 19, 2026 07:59
@jathavaan jathavaan merged commit 480c8c5 into main May 19, 2026
2 checks passed
@jathavaan jathavaan deleted the feature/273-add-knn-search-benchmarks-and-the-missing-shapefile-variants-for-point-in-polygon-and-attribute-spatial-compound-filter branch May 19, 2026 08:02

Development

Successfully merging this pull request may close these issues.

Add KNN search benchmarks and the missing Shapefile variants for point-in-polygon and attribute-spatial compound filter

2 participants