fix: Fix incorrect nearest neighbor search results when the query window is not a point#151
Conversation
paleolimbot
left a comment
There was a problem hiding this comment.
This also needs a +1 from @kylebarron or somebody who has been here for longer, but this PR looks good to me and really does fix a critical bug (silently incorrect nearest neighbour results).
Also just a note that most of this diff is the fuzz/regression test verifying the fix.
src/test/neighbors_geometry.rs
Outdated
| // Check that results are in non-decreasing distance order (the critical bug!) | ||
| for i in 1..rtree_with_distances.len() { | ||
| let prev_dist = rtree_with_distances[i - 1].1; | ||
| let curr_dist = rtree_with_distances[i].1; | ||
| assert!( | ||
| prev_dist <= curr_dist + 1e-10, // Small epsilon for floating point | ||
| "neighbors_geometry returned results out of order at position {} in {}: \ | ||
| idx {} has dist {}, but previous idx {} has dist {}", | ||
| i, | ||
| test_description, | ||
| rtree_with_distances[i].0, | ||
| curr_dist, | ||
| rtree_with_distances[i - 1].0, | ||
| prev_dist | ||
| ); | ||
| } |
There was a problem hiding this comment.
This seems like the main test code that should be failing with the current implementation.
Just confirming that this is the case (i.e., are the geometries generated in this test case sufficient to trigger the incorrect results?).
There was a problem hiding this comment.
Yes. Actually the original implementation does not need hammering too hard to exhibit incorrect behavior:
failures:
test::neighbors_geometry::test_neighbors_empty_tree_returns_empty
test::neighbors_geometry::test_neighbors_geometry_empty_tree_returns_empty
test::neighbors_geometry::test_neighbors_geometry_mixed_sizes
test::neighbors_geometry::test_neighbors_geometry_point_index_polygon_query
test::neighbors_geometry::test_neighbors_geometry_polygon_index_polygon_query
| /// Options for generating random geometries | ||
| #[derive(Debug, Clone)] | ||
| struct RandomGeometryOptions { | ||
| /// Bounding box for geometry generation | ||
| bounds: Rect, | ||
| /// Size range for generated geometries (min, max) | ||
| size_range: (f64, f64), | ||
| /// Number of vertices for polygons (min, max) | ||
| vertices_per_polygon_range: (usize, usize), | ||
| } |
There was a problem hiding this comment.
For context, this is inlined from SedonaDB's random geometry generator:
(If there's ever interest we can separate that out into a standalone crate or upstream it to an appropriate crate in georust if there is one!)
zhangfengcdt
left a comment
There was a problem hiding this comment.
The fix looks correct and well-tested. Thanks!
kylebarron
left a comment
There was a problem hiding this comment.
This seems fine. I'm curious whether the instability derives from code ported from https://github.com/mourner/flatbush or from APIs we added on top for geometry comparison
It is from the APIs added by ourselves to support non-point geometry types. Relevant PR: #141 |
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you all for the fix + the reviews!
CHANGES.mdorCHANGELOG.mdThis patch fixes nearest neighbor search when the query window is more complex than a point. The index traversal algorithm incorrectly uses the centroid of the query window to compute the distance from the query window to the nodes, which leads to incorrect or out of order results. The fix is to correctly compute geometry to box distances. Tests were added to verify that the fix is working.
The nearest neighbor search methods also do not work well with empty rtree index, this patch also includes a fix and test cases for that.