Feature/273 add knn search benchmarks and the missing shapefile variants for point in polygon and attribute spatial compound filter#278
Conversation
9 tasks
Contributor
There was a problem hiding this comment.
Pull request overview
This PR expands the benchmarking framework by adding new GeoPandas-based “local” entrypoints and KNN search entrypoints (DuckDB/local/PostGIS), wiring them into dependency injection, orchestration (benchmarks.yml), and the CLI benchmark dispatcher.
Changes:
- Added new benchmark entrypoints for local shapefile workflows (attribute+spatial filter, point-in-polygon) and KNN search (DuckDB/local/PostGIS).
- Updated orchestration config (
benchmarks.yml) to include the new experiments and related script IDs. - Registered and dispatched the new entrypoints via DI (
app_config.py), entrypoint exports (__init__.py), andbenchmark_runner.py, plus addedConfig.TRONDHEIM_CENTER_WGS84.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/presentation/entrypoints/attribute_spatial_compound_filter_local.py | New GeoPandas-based local benchmark for attribute + spatial filtering on the buildings shapefile. |
| src/presentation/entrypoints/point_in_polygon_lookup_local.py | New GeoPandas-based local point-in-polygon lookup benchmark with generated sample points. |
| src/presentation/entrypoints/knn_search_duckdb.py | New DuckDB KNN benchmark scanning parquet and sorting by distance to a fixed reference point. |
| src/presentation/entrypoints/knn_search_local.py | New GeoPandas-based local KNN benchmark over the shapefile. |
| src/presentation/entrypoints/knn_search_postgis.py | New PostGIS KNN benchmark using GiST KNN ordering (<->). |
| src/presentation/entrypoints/init.py | Exports newly added entrypoints for import/dispatch. |
| src/presentation/configuration/app_config.py | Wires new entrypoint modules into Dependency Injector initialization. |
| src/config.py | Adds TRONDHEIM_CENTER_WGS84 constant used by KNN benchmarks. |
| benchmarks.yml | Adds new experiments for local and KNN benchmarks and updates related-script links. |
| benchmark_runner.py | Adds dispatch cases for the new script IDs and imports the new entrypoints. |
Comments suppressed due to low confidence (2)
src/presentation/entrypoints/knn_search_local.py:73
- The
_download_dataimplementation is duplicated across multiple local shapefile-based entrypoints (and is also very similar tobbox_filtering_local). This increases maintenance cost (e.g., updating the blob prefix/ext list). Consider extracting a shared helper and reusing it across the local entrypoints.
blob_storage_service: IBlobStorageService = Provide[Containers.blob_storage_service],
) -> None:
Config.BUILDINGS_SHAPEFILE.parent.mkdir(parents=True, exist_ok=True)
blob_prefix = "copies/shapefile"
base = Config.BUILDINGS_SHAPEFILE.with_suffix("")
for ext in (".shp", ".shx", ".dbf", ".prj", ".cpg", ".qix"):
blob_name = f"{blob_prefix}/{base.name}{ext}"
data = blob_storage_service.download_file(
container_name=StorageContainer.DATA,
blob_name=blob_name,
)
if data is not None:
base.with_suffix(ext).write_bytes(data)
src/presentation/entrypoints/point_in_polygon_lookup_local.py:98
- The
_download_dataimplementation is duplicated across multiple local shapefile-based entrypoints (and is also very similar tobbox_filtering_local). This increases maintenance cost (e.g., updating the blob prefix/ext list). Consider extracting a shared helper and reusing it across the local entrypoints.
Config.BUILDINGS_SHAPEFILE.parent.mkdir(parents=True, exist_ok=True)
blob_prefix = "copies/shapefile"
base = Config.BUILDINGS_SHAPEFILE.with_suffix("")
for ext in (".shp", ".shx", ".dbf", ".prj", ".cpg", ".qix"):
blob_name = f"{blob_prefix}/{base.name}{ext}"
data = blob_storage_service.download_file(
container_name=StorageContainer.DATA,
blob_name=blob_name,
)
if data is not None:
base.with_suffix(ext).write_bytes(data)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request adds new local and KNN search benchmark entrypoints, expands the benchmarks configuration, and updates dependency injection and imports to support these additions. The main focus is on enabling local (GeoPandas-based) and KNN search benchmarks for the buildings dataset, both in code and in the orchestration configuration.
New benchmark entrypoints and orchestration:
attribute_spatial_compound_filter_local.py,point_in_polygon_lookup_local.py,knn_search_duckdb.py,knn_search_local.py, andknn_search_postgis.py. [1] [2] [3] [4] [5]benchmarks.ymlto include the new local and KNN search benchmarks, with appropriate images, resource requirements, and related script IDs. [1] [2]Integration and dependency injection:
app_config.py) and the entrypoint module imports (__init__.py), ensuring they are available for orchestration and CLI invocation. [1] [2] [3]Benchmark runner updates:
benchmark_runner.pyto dispatch to the new local and KNN search entrypoints based on the script ID. [1] [2] [3]Configuration:
TRONDHEIM_CENTER_WGS84toConfigfor use as the reference point in KNN search benchmarks.These changes collectively enable a broader set of spatial benchmarks, including local (GeoPandas) and KNN search scenarios, and ensure they are integrated into the benchmarking framework and orchestration.
New benchmark entrypoints:
attribute_spatial_compound_filter_local,point_in_polygon_lookup_local,knn_search_duckdb,knn_search_local, andknn_search_postgisentrypoints for local and KNN search benchmarking. [1] [2] [3] [4] [5]Benchmark orchestration and configuration:
benchmarks.yml, including Docker images, resources, and related scripts. [1] [2]TRONDHEIM_CENTER_WGS84constant toConfigfor use in KNN search.Dependency injection and imports:
app_config.py) and imported them insrc/presentation/entrypoints/__init__.py. [1] [2] [3]Benchmark runner updates:
benchmark_runner.pyto dispatch to the new entrypoints. [1] [2] [3]