Add buffer3 class and accelerate Demons registration via O(1) access and multithreading#84
Add buffer3 class and accelerate Demons registration via O(1) access and multithreading#84
Conversation
…t access - Create src/Alignment_Buffer3.h: templated 3D array class with O(1) random access, Gaussian smoothing, separable convolution, parallel operations via work_queue, and marshalling to/from planar_image_collection - Refactor smooth_vector_field to use buffer3 (was largest bottleneck) - Refactor compute_gradient to use buffer3 for O(1) z-neighbour access - Replace get_image() calls in main AlignViaDemons loop with iterators - Add 17 comprehensive buffer3 unit tests - All 58 test cases pass (976 assertions) Co-authored-by: hdclark <934858+hdclark@users.noreply.github.com>
Co-authored-by: hdclark <934858+hdclark@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a new contiguous 3D buffer abstraction (buffer3<T>) to speed up Demons deformable registration by avoiding repeated std::list indexing in planar_image_collection and enabling slice-parallel smoothing/convolution patterns.
Changes:
- Added
src/Alignment_Buffer3.himplementingbuffer3<T>with marshalling to/fromplanar_image_collection, separable Gaussian smoothing, and slice-parallel visitor helpers. - Refactored
src/Alignment_Demons.ccto usebuffer3for vector-field smoothing and gradient computation, and replacedget_image()indexing with lockstep iterator traversal. - Expanded
src/Alignment_Demons_Tests.ccwith unit tests coveringbuffer3construction, access, marshalling, and smoothing behaviors.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/Alignment_Buffer3.h | Adds buffer3<T> (contiguous 3D storage) + marshalling + smoothing/convolution + parallel slice traversal helpers. |
| src/Alignment_Demons.cc | Switches smoothing/gradient paths to buffer3 and removes get_image()-based list traversal hot spots. |
| src/Alignment_Demons_Tests.cc | Adds buffer3 unit tests and round-trip/multithreaded smoothing checks. |
| #include <cmath> | ||
| #include <limits> | ||
| #include <list> | ||
| #include <mutex> |
There was a problem hiding this comment.
This header uses std::atomic and std::condition_variable but does not include <atomic> or <condition_variable>, relying on transitive includes from Thread_Pool.h. To keep the header self-contained and avoid fragile build breaks if includes change, add the missing standard headers explicitly.
| #include <mutex> | |
| #include <mutex> | |
| #include <atomic> | |
| #include <condition_variable> |
There was a problem hiding this comment.
Added <atomic>, <condition_variable>, and <thread> includes in 9f5001b.
src/Alignment_Demons.cc
Outdated
| } | ||
|
|
||
| return gradient; | ||
| return grad.to_planar_image_collection(); |
There was a problem hiding this comment.
compute_gradient() now marshals through buffer3::from_planar_image_collection(), which spatially sorts slices, and then returns grad.to_planar_image_collection(), which emits images in sorted order. If stationary.images is not already spatially ordered, gradient.images will no longer be aligned with stationary/warped_moving during the lockstep iterator traversal below, producing incorrect updates. Consider writing gradients back into a collection that matches the original input ordering/metadata (e.g., allocate an output collection mirroring img_coll geometry, then use buffer3<double>::write_to_planar_image_collection() to populate it), or avoid sorting for this code path.
| return grad.to_planar_image_collection(); | |
| // Write gradients back into a planar_image_collection that mirrors the | |
| // original input ordering/metadata instead of returning a spatially | |
| // re-sorted collection. | |
| planar_image_collection<double, double> gradient_images; | |
| grad.write_to_planar_image_collection(img_coll, gradient_images); | |
| return gradient_images; |
There was a problem hiding this comment.
Fixed. The compute_gradient wrapper now allocates an output collection mirroring img_coll geometry/ordering and uses write_to_planar_image_collection() to populate it, avoiding the re-sorting issue. Additionally, AlignViaDemons now operates entirely on buffer3 internally so the ordering concern no longer applies in the main loop. See 9f5001b.
src/Alignment_Demons.cc
Outdated
| // Use buffer3 for O(1) random access and multithreaded smoothing. | ||
| auto buf = buffer3<double>::from_planar_image_collection(field); | ||
| buf.gaussian_smooth(sigma_mm); | ||
| buf.write_to_planar_image_collection(field); |
There was a problem hiding this comment.
buffer3::gaussian_smooth(sigma_mm) uses an internal work_queue constructed with 1 worker, so this call path is single-threaded. If the intent is to actually leverage the new multithreaded implementation for Demons performance, pass a work_queue with >1 workers (or reuse an existing queue) into gaussian_smooth(sigma_mm, wq).
There was a problem hiding this comment.
Fixed. The no-argument gaussian_smooth overload now uses std::thread::hardware_concurrency() threads instead of 1. Additionally, AlignViaDemons creates a shared work_queue with hardware_concurrency() threads and passes it to all smoothing calls. See 9f5001b.
| void visit_slice_xy(int64_t slice, const std::function<void(int64_t row, int64_t col)> &f) { | ||
| for(int64_t r = 0; r < N_rows; ++r){ | ||
| for(int64_t c = 0; c < N_cols; ++c){ | ||
| f(r, c); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
visit_slice_xy(int64_t slice, ...) ignores its slice parameter entirely. This is easy to misuse (callers may expect bounds checking or slice-specific behavior) and can trigger unused-parameter warnings depending on build flags. Either remove the parameter, use it for validation (e.g., throw if out of range), or incorporate it into the visitor interface.
There was a problem hiding this comment.
Added bounds checking — visit_slice_xy now throws std::out_of_range if the slice index is invalid, in 9f5001b.
|
@copilot apply changes based on the comments in this thread Additionally, the helper function interfaces in src/Alignment_Demons.cc still use planar_image_collection and marshall back-and-forth with the buffer3 class, which is needlessly slow. Keep the public interface for AlignViaDemons(...) using planar_image_collection, but convert all other function interfaces in src/Alignment_Demons.cc to use buffer3 to speed up calculations. There are a few uses of some of the helper functions throughout the rest of the code base (e.g., src/Operations/RegisterImagesDemons.cc and src/Operations/WarpImages.cc), but it is OK to add conversion code planar_image_collection <-> buffer3 as needed to access these helper functions. Prioritize optimizing the code in the following files, replacing planar_image_collection with buffer3 wherever possible, which are currently the largest bottleneck for performing deformable image registration:
|
…iew comments - Add missing <atomic>, <condition_variable>, <thread> includes - Fix visit_slice_xy to validate slice parameter with bounds check - Make gaussian_smooth use hardware_concurrency() threads by default - Add buffer3-native overloads: smooth_vector_field(buffer3&), compute_gradient(buffer3&), warp_image_with_field(buffer3&, buffer3&) - Keep planar_image_collection wrappers for external callers - Fix compute_gradient wrapper to preserve original collection ordering - Rewrite AlignViaDemons to use buffer3 throughout the iterative loop, only converting at boundary (input: from_planar_image_collection, output: write_to_planar_image_collection) - Eliminates all per-iteration marshalling overhead - All 58 tests pass (976 assertions) Co-authored-by: hdclark <934858+hdclark@users.noreply.github.com>
All review comments addressed and the major refactoring is done in 9f5001b:
Note: I haven't split |
|
There appears to be a race condition introduced in the previous commit. Attempting to run the code causes execution to halt after a non-deterministic number of iterations on a medium-sized set of images (50 rows x 50 columns x 50 images with Perlin noise). |
<atomic>,<condition_variable>,<thread>includes (comment 2792230850)💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.