Parallelize sampling external sources and threadsafe rejection counters #3830

Open

eepeterson wants to merge 11 commits into openmc-dev:develop from eepeterson:threadsafe_sampling

Conversation

@eepeterson
Contributor

Description

This PR parallelizes sampling external sources from Python through openmc.lib and the C API by adding an optional threads argument to the openmc_sample_external_source function, which batches the work across multiple OpenMP threads if desired. It also gives the sample_external_source method the option to return results as a numpy structured array, with a new dtype that mimics the _SourceSite struct, avoiding the expensive conversion (in both time and memory) to a ParticleList of SourceParticle objects. Along the way I noticed that the source site rejection and acceptance counters, which we use to detect when source sampling is too inefficient and error out, were static ints. I made those global-scope atomics and added a max_source_rejections_per_sample setting, so we can keep a local rejection counter to error out on rather than relying solely on the global check.
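As a rough illustration, here is a minimal usage sketch against openmc.lib. The threads keyword shown below is an assumption drawn from this description, not necessarily the final argument name, and the structured-array field names are illustrative only:

```python
import openmc.lib

openmc.lib.init()  # needs a valid model (XML files) in the working directory

# Hypothetical call based on the PR text: `threads` batches the work
# across OpenMP threads; the structured-array output option (however it
# is ultimately spelled) skips the ParticleList conversion entirely.
sites = openmc.lib.sample_external_source(
    n_samples=1_000_000,
    prn_seed=1,
    threads=8,  # assumption: new optional argument from this PR
)

# With the structured-array output, fields mirror the C++ SourceSite
# struct, so columns slice cheaply (field names illustrative):
#     sites['E'], sites['wgt'], sites['r'], sites['u'], ...

openmc.lib.finalize()
```

The structured array avoids creating one Python object per site, which is exactly the Python-object overhead noted in the memory results below.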

Performance results for two source examples are shown below: a simple isotropic point source with a Watt spectrum (an IndependentSource) and a custom compiled source that simulates the Frascati Neutron Generator (FNG). I sampled both sources for 100k, 1M, 10M, and 50M particles, varying the number of threads and using the new numpy structured array output option. On my laptop, neither the existing implementation on the develop branch nor my implementation returning a ParticleList can sample 50M particles, because the memory requirements eat up all 32 GB of RAM (largely due to Python object overhead). Returning the numpy array instead buys about a factor of 3 in memory savings and let me sample over 100M particles, though I didn't bother pushing it further.

The plan is then to implement batching source sites to file for very large sample sizes that wouldn't fit in memory, and to implement the ability to generate histograms or source distributions from very large numbers of discrete samples, but that will be done in a separate PR. The parallelization implemented here is in support of those use cases.

[Figure: sampling_mode_comparison]

Fixes # (issue)
N/A

Checklist

  • I have performed a self-review of my own code
  • I have run clang-format (version 15) on any C++ source files (if applicable)
  • I have followed the style guidelines for Python source files (if applicable)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

- Add SOURCE_SAMPLE_BATCH_SIZE (default 1M sites = ~104 MB) to bound memory
  usage when sampling more sites than the batch size. The function
  internally loops over batches, reusing the same cached ctypes buffer.
  Seed offsets ensure bitwise-identical results regardless of batch size
  (see the sketch after this list).
- Value-initialize SourceSite (SourceSite site {}) to zero the progeny_id,
  parent_id, and parent_nuclide fields that are not set during sampling.
  This fixes non-deterministic garbage across threads from uninitialized
  stack memory.
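
To make the seed-offset point concrete, here is a pure-Python sketch (not the actual C++ loop) of the idea: keying each sample's RNG stream to its global index makes the concatenated output bitwise-identical for any batch size.

```python
import numpy as np

def sample_batched(total, batch_size, base_seed, sample_one):
    """Draw `total` samples in fixed-size batches.

    `sample_one(seed)` stands in for sampling one source site from its
    own RNG stream; seeding by the global sample index makes the
    result independent of `batch_size`.
    """
    out = []
    for start in range(0, total, batch_size):
        n = min(batch_size, total - start)
        # Seed offset: sample i always uses stream base_seed + start + i,
        # no matter which batch it lands in.
        out.append(np.array([sample_one(base_seed + start + i)
                             for i in range(n)]))
    return np.concatenate(out)

sample_one = lambda seed: np.random.default_rng(seed).random()
a = sample_batched(10, 3, 42, sample_one)
b = sample_batched(10, 10, 42, sample_one)
assert np.array_equal(a, b)  # identical regardless of batch size
```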
