fix: Fix incorrect nearest neighbor search results when the query window is not a point #151

Merged
paleolimbot merged 4 commits into georust:main from Kontinuation:pr-fix-nn-search on Feb 6, 2026

Contributor

@Kontinuation Kontinuation commented Jan 13, 2026

  • I agree to follow the project's code of conduct.
  • I added an entry to the project's change log file if knowledge of this change could be valuable to users.
    • Usually called CHANGES.md or CHANGELOG.md
    • Prefix changelog entries for breaking changes with "BREAKING: "

This patch fixes nearest neighbor search when the query window is more complex than a point. The index traversal algorithm incorrectly uses the centroid of the query window to compute the distance from the query window to the nodes, which leads to incorrect or out-of-order results. The fix is to compute geometry-to-box distances correctly. Tests were added to verify the fix.
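
To illustrate the failure mode described above, here is a minimal sketch, not the crate's actual code. It assumes a rectangular query window (the real fix handles general geometries) and uses a hypothetical `Rect` type and hypothetical function names. It contrasts the centroid shortcut with a true box-to-box minimum distance.

```rust
/// Axis-aligned rectangle; a hypothetical stand-in for the index's node boxes.
#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

impl Rect {
    /// Center point of the rectangle.
    pub fn centroid(&self) -> (f64, f64) {
        ((self.min_x + self.max_x) / 2.0, (self.min_y + self.max_y) / 2.0)
    }
}

/// Gap between two 1-D intervals; 0.0 when they overlap.
fn axis_gap(a_min: f64, a_max: f64, b_min: f64, b_max: f64) -> f64 {
    (b_min - a_max).max(a_min - b_max).max(0.0)
}

/// Minimum Euclidean distance between two rectangles (0.0 if they intersect).
pub fn rect_to_rect_distance(a: &Rect, b: &Rect) -> f64 {
    let dx = axis_gap(a.min_x, a.max_x, b.min_x, b.max_x);
    let dy = axis_gap(a.min_y, a.max_y, b.min_y, b.max_y);
    dx.hypot(dy)
}

/// The shortcut described as buggy: collapse the query to its centroid,
/// then measure the distance from that single point to the box.
pub fn centroid_to_rect_distance(query: &Rect, b: &Rect) -> f64 {
    let (cx, cy) = query.centroid();
    let point = Rect { min_x: cx, min_y: cy, max_x: cx, max_y: cy };
    rect_to_rect_distance(&point, b)
}
```

For a wide query such as `[0, 10] x [0, 1]`, a box at `[11, 12] x [0, 1]` is truly 1.0 away and a box at `[4, 6] x [4, 5]` is 3.0 away, but the centroid shortcut reports 6.0 and 3.5 respectively, so the two candidates come back in the wrong order.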

The nearest neighbor search methods also do not work correctly on an empty R-tree index; this patch includes a fix and test cases for that as well.

@Kontinuation Kontinuation marked this pull request as ready for review January 13, 2026 10:07
Contributor

@paleolimbot paleolimbot left a comment

This also needs a +1 from @kylebarron or somebody who has been here for longer, but this PR looks good to me and really does fix a critical bug (silently incorrect nearest neighbour results).

Also just a note that most of this diff is the fuzz/regression test verifying the fix.

Comment on lines +169 to +184
// Check that results are in non-decreasing distance order (the critical bug!)
for i in 1..rtree_with_distances.len() {
    let prev_dist = rtree_with_distances[i - 1].1;
    let curr_dist = rtree_with_distances[i].1;
    assert!(
        prev_dist <= curr_dist + 1e-10, // Small epsilon for floating point
        "neighbors_geometry returned results out of order at position {} in {}: \
         idx {} has dist {}, but previous idx {} has dist {}",
        i,
        test_description,
        rtree_with_distances[i].0,
        curr_dist,
        rtree_with_distances[i - 1].0,
        prev_dist
    );
}
Contributor

This seems like the main test code that should be failing with the current implementation.

Just confirming that this is the case (i.e., are the geometries generated in this test case sufficient to trigger the incorrect results?).

Contributor Author

Yes. In fact, the original implementation does not need much hammering to exhibit incorrect behavior:

failures:
    test::neighbors_geometry::test_neighbors_empty_tree_returns_empty
    test::neighbors_geometry::test_neighbors_geometry_empty_tree_returns_empty
    test::neighbors_geometry::test_neighbors_geometry_mixed_sizes
    test::neighbors_geometry::test_neighbors_geometry_point_index_polygon_query
    test::neighbors_geometry::test_neighbors_geometry_polygon_index_polygon_query
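
The ordering property checked in the test excerpt above can be exercised in isolation. The following is a minimal sketch under stated assumptions: `brute_force_neighbors` and `first_out_of_order` are hypothetical helper names, not functions from the PR, and the query-to-item distances are assumed to be precomputed. A brute-force ranking like this is one common way to build a reference result for a fuzz test, and it returns an empty result for an empty input by construction.

```rust
/// Brute-force reference for a k-nearest-neighbor query (hypothetical helper).
/// `distances[i]` is the precomputed query-to-item distance for item `i`.
/// Returns up to `k` pairs of (item index, distance), sorted by ascending distance.
pub fn brute_force_neighbors(distances: &[f64], k: usize) -> Vec<(usize, f64)> {
    let mut ranked: Vec<(usize, f64)> = distances.iter().copied().enumerate().collect();
    // total_cmp defines a total order on f64, so NaN cannot break the sort.
    ranked.sort_by(|a, b| a.1.total_cmp(&b.1));
    ranked.truncate(k);
    ranked
}

/// Returns the first position whose distance is smaller than its predecessor's
/// by more than `eps`, or `None` if the results are in non-decreasing order.
pub fn first_out_of_order(results: &[(usize, f64)], eps: f64) -> Option<usize> {
    results
        .windows(2)
        .position(|w| w[0].1 > w[1].1 + eps)
        .map(|i| i + 1)
}
```

An index result can then be compared against `brute_force_neighbors` for the same query, and `first_out_of_order` pinpoints the offending position when the ordering invariant is violated.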

Comment on lines +16 to +25
/// Options for generating random geometries
#[derive(Debug, Clone)]
struct RandomGeometryOptions {
    /// Bounding box for geometry generation
    bounds: Rect,
    /// Size range for generated geometries (min, max)
    size_range: (f64, f64),
    /// Number of vertices for polygons (min, max)
    vertices_per_polygon_range: (usize, usize),
}
Contributor

For context, this is inlined from SedonaDB's random geometry generator:

https://github.com/apache/sedona-db/blob/5540845095c46a4e35d651dee4a5e96a65973bc2/rust/sedona-testing/src/datagen.rs#L469-L479

(If there's ever interest we can separate that out into a standalone crate or upstream it to an appropriate crate in georust if there is one!)
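
As a rough illustration of how options like `bounds` and `size_range` might drive generation, here is a minimal sketch. It is not the SedonaDB generator or the code in this PR: `Bounds`, `XorShift64`, and `random_box` are hypothetical names, the PRNG is a small stand-in so the example needs no external crate, and only boxes are generated (the polygon vertex range is ignored). It assumes the bounds are at least as large as the maximum size.

```rust
/// Hypothetical stand-in for the `Rect` used by the options struct.
#[derive(Debug, Clone, Copy)]
pub struct Bounds {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crate.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Uniform f64 in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Keep the top 53 bits so the value is exactly representable as f64.
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Uniform f64 in [lo, hi).
    pub fn range(&mut self, lo: f64, hi: f64) -> f64 {
        lo + (hi - lo) * self.next_f64()
    }
}

/// Random axis-aligned box inside `bounds` whose width and height
/// each fall within `size_range` (min, max).
pub fn random_box(rng: &mut XorShift64, bounds: &Bounds, size_range: (f64, f64)) -> Bounds {
    let w = rng.range(size_range.0, size_range.1);
    let h = rng.range(size_range.0, size_range.1);
    // Pick the lower-left corner so the whole box stays inside the bounds.
    let min_x = rng.range(bounds.min_x, bounds.max_x - w);
    let min_y = rng.range(bounds.min_y, bounds.max_y - h);
    Bounds { min_x, min_y, max_x: min_x + w, max_y: min_y + h }
}
```

A fixed seed keeps failures reproducible, which matters when a fuzz run uncovers an ordering bug that needs to be replayed.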

Contributor

@zhangfengcdt zhangfengcdt left a comment

The fix looks correct and well-tested. Thanks!

Member

@kylebarron kylebarron left a comment

This seems fine. I'm curious whether the instability derives from code ported from https://github.com/mourner/flatbush or from APIs we added on top for geometry comparison.

@Kontinuation
Contributor Author

> This seems fine. I'm curious whether the instability derives from code ported from https://github.com/mourner/flatbush or from APIs we added on top for geometry comparison

It comes from the APIs we added ourselves to support non-point geometry types. Relevant PR: #141

Contributor

@paleolimbot paleolimbot left a comment

Thank you all for the fix + the reviews!

@paleolimbot paleolimbot merged commit d70f049 into georust:main Feb 6, 2026
5 checks passed
@michaelkirk michaelkirk mentioned this pull request Feb 10, 2026
2 tasks