Hello all,

First, I would like to say that I love using ImpactX for space-charge effects and how nice and compact it is. However, examples of how to actually run ImpactX seem to be lacking; at least, I could not find good resources. The Python interface is amazing and very well rounded. My question is how to make sure the physics is accurate, and how to speed up the simulations while maintaining accuracy.

For grids smaller than 64x64x64, I find that a single MPI rank with a good number of OMP threads is fastest for the multigrid solver. However, when I use the FFT solver, the specified OMP threads are not used and the run becomes mostly a single-core job. With 48x48x48, both the multigrid and FFT solvers seem fast enough.

One thing that is not mentioned is what to do with padding. For the multigrid solver, the boundary condition is Dirichlet, so the padding should be large enough that the fields are not distorted; for the FFT solver, the boundary conditions are open, so can the padding be smaller? By padding, I am referring to prob_relative:

from impactx import Config, ImpactX, elements
sim = ImpactX()
sim.prob_relative = [2.2]

But if the padding is too large, then number of cells per

For a single MPI rank case, I am using the following OMP environment settings:

import os
os.environ["OMP_NUM_THREADS"] = "48"
os.environ["OMP_PLACES"] = "cores"
os.environ["OMP_PROC_BIND"] = "spread"
os.environ["OMP_DYNAMIC"] = "FALSE"
# Add these for stack robustness
os.environ["OMP_STACKSIZE"] = "256M" # important for multi-thread stability
os.environ["KMP_STACKSIZE"] = "256M" # helps if Intel OpenMP runtime is in play
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
# Optional but useful for “silent” native crashes:
os.environ["PYTHONFAULTHANDLER"] = "1"

I know people say to do convergence tests, but it is almost impossible for me to run a convergence test for the FFT solver on a 64x64x64 grid, where the bottleneck is the number of FFT calls. I thought using MPI would speed things up, but it does not, at least in my case. Also, I am reading my distribution from a file, and with MPI the particles are duplicated on each rank instead of being distributed. To distribute the particles correctly, I need to read the distribution on rank 0, resize the mesh, and then call the redistribute function, which I am not sure is mentioned anywhere in the docs.

I guess my questions are:
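On the particle-duplication point: the duplication arises when every rank loads the full file. One way to avoid it, sketched here under assumptions, is for each rank to keep only a contiguous slice of the particle arrays before adding them to the simulation. `rank_slice` is a hypothetical helper, not part of ImpactX, and the step that hands the local slice to ImpactX is left out because it depends on the ImpactX version in use.

```python
def rank_slice(n_particles: int, rank: int, n_ranks: int) -> tuple[int, int]:
    """Return the [start, stop) index range of particles owned by `rank`.

    Hypothetical helper (not an ImpactX function): splits n_particles as
    evenly as possible, giving the first `n_particles % n_ranks` ranks one
    extra particle each.
    """
    if n_ranks < 1 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_particles, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Example: 10 particles split over 4 ranks
for r in range(4):
    print(r, rank_slice(10, r, 4))
```

With mpi4py, `rank` and `n_ranks` would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`; each rank then keeps `x[start:stop]`, and likewise for every other coordinate array.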
TinyProfiler total time across processes [min...avg...max]: 35.76 ... 35.76 ... 35.76
--------------------------------------------------------------------------------------------------------------
Name NCalls Excl. Min Excl. Avg Excl. Max Max %
--------------------------------------------------------------------------------------------------------------
ParticleContainer::RedistributeCPU() 400 10.41 10.41 10.41 29.13%
impactx::transformation::CoordinateTransformation 800 5.154 5.154 5.154 14.41%
FFT::R2C::forward(in) 800 4.336 4.336 4.336 12.13%
impactx::spacecharge::GatherAndPush 400 3.313 3.313 3.313 9.26%
ImpactXParticleContainer::DepositCharge 400 3.205 3.205 3.205 8.96%
FFT::R2C::backward(out) 400 3.071 3.071 3.071 8.59%
OpenBCSolver::setGreensFunction 400 1.995 1.995 1.995 5.58%
impactx::push::CFbend 180 0.4391 0.4391 0.4391 1.23%
OpenBCSolver::solve 400 0.3178 0.3178 0.3178 0.89%
impactx::push::Drift 96 0.3102 0.3102 0.3102 0.87%
impactx::transformation::CoordinateTransformation::to_fixed_s 53 0.2912 0.2912 0.2912 0.81%
impactx::push::Sol 117 0.237 0.237 0.237 0.66%
ImpactXParticleContainer::MinAndMaxPositions 402 0.2112 0.2112 0.2112 0.59%
FabArray::setVal() 2400 0.181 0.181 0.181 0.51%
impactX::collect_lost_particles 400 0.1793 0.1793 0.1793 0.50%
impactx::push::BeamMonitor 2 0.1783 0.1783 0.1783 0.50%
ablastr::particles::deposit_charge::ChargeDeposition 54 0.1737 0.1737 0.1737 0.49%
impactx::transformation::CoordinateTransformation::to_fixed_t 55 0.1538 0.1538 0.1538 0.43%
impactx::particles::wakefields::HandleSpacecharge 400 0.1139 0.1139 0.1139 0.32%
FabArray::ParallelCopy_nowait() 800 0.1118 0.1118 0.1118 0.31%
impactx::spacecharge::ForceFromSelfFields 400 0.07499 0.07499 0.07499 0.21%
ImpactX::evolve::slice_step 400 0.01385 0.01385 0.01385 0.04%
impactx::spacecharge::PoissonSolve 400 0.01365 0.01365 0.01365 0.04%
ImpactX::ResizeMesh 401 0.01025 0.01025 0.01025 0.03%
ablastr::fields::computePhiIGF 400 0.009439 0.009439 0.009439 0.03%
impactx::push 400 0.001023 0.001023 0.001023 0.00%
computePhi 400 0.0008178 0.0008178 0.0008178 0.00%
FabArray::ParallelCopy() 800 0.0006794 0.0006794 0.0006794 0.00%
ImpactX::track_particles 1 0.0002856 0.0002856 0.0002856 0.00%
Other 5683 0.1904 0.1904 0.1904 0.53%
--------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------
Name NCalls Incl. Min Incl. Avg Incl. Max Max %
--------------------------------------------------------------------------------------------------------------
ImpactX::track_particles 1 34.7 34.7 34.7 97.03%
ImpactX::evolve::slice_step 400 34.69 34.69 34.69 97.02%
impactx::particles::wakefields::HandleSpacecharge 400 33.27 33.27 33.27 93.04%
ParticleContainer::RedistributeCPU() 400 10.41 10.41 10.41 29.13%
impactx::spacecharge::PoissonSolve 400 10.05 10.05 10.05 28.11%
computePhi 400 9.978 9.978 9.978 27.91%
ablastr::fields::computePhiIGF 400 9.968 9.968 9.968 27.88%
OpenBCSolver::solve 400 5.766 5.766 5.766 16.13%
impactx::transformation::CoordinateTransformation 800 5.599 5.599 5.599 15.66%
FFT::R2C::forward(in) 800 4.336 4.336 4.336 12.13%
OpenBCSolver::setGreensFunction 400 4.191 4.191 4.191 11.72%
ImpactXParticleContainer::DepositCharge 400 3.452 3.452 3.452 9.65%
impactx::spacecharge::GatherAndPush 400 3.313 3.313 3.313 9.26%
FFT::R2C::backward(out) 400 3.115 3.115 3.115 8.71%
impactx::push 400 1.229 1.229 1.229 3.44%
impactx::push::CFbend 180 0.4392 0.4392 0.4392 1.23%
impactx::push::Drift 96 0.3103 0.3103 0.3103 0.87%
impactx::transformation::CoordinateTransformation::to_fixed_s 53 0.2912 0.2912 0.2912 0.81%
impactx::push::Sol 117 0.237 0.237 0.237 0.66%
ImpactX::ResizeMesh 401 0.2214 0.2214 0.2214 0.62%
impactx::push::BeamMonitor 2 0.2137 0.2137 0.2137 0.60%
ImpactXParticleContainer::MinAndMaxPositions 402 0.2112 0.2112 0.2112 0.59%
FabArray::setVal() 2400 0.181 0.181 0.181 0.51%
impactX::collect_lost_particles 400 0.1793 0.1793 0.1793 0.50%
ablastr::particles::deposit_charge::ChargeDeposition 54 0.1737 0.1737 0.1737 0.49%
impactx::transformation::CoordinateTransformation::to_fixed_t 55 0.1538 0.1538 0.1538 0.43%
FabArray::ParallelCopy() 800 0.1131 0.1131 0.1131 0.32%
FabArray::ParallelCopy_nowait() 800 0.1124 0.1124 0.1124 0.31%
impactx::spacecharge::ForceFromSelfFields 400 0.1038 0.1038 0.1038 0.29%
Other 5683 0.3083 0.3083 0.3083 0.86%
--------------------------------------------------------------------------------------------------------------
Pinned Memory Usage:
--------------------------------------------------------------------
Name Nalloc Nfree AvgMem MaxMem
--------------------------------------------------------------------
ParticleContainer::addParticles 72 72 309 KiB 55 MiB
The_Pinned_Arena::Initialize() 1 1 611 B 8192 KiB
ImpactX::early_param_check 1 1 0 B 32 B
ImpactX::init_grids 2 2 0 B 32 B
ImpactX::track_particles 2 2 0 B 32 B
impactx::push 3 3 0 B 32 B
impactx::spacecharge::PoissonSolve 1596 1596 0 B 32 B
--------------------------------------------------------------------
Cpu Memory Usage:
--------------------------------------------------------------------------
Name Nalloc Nfree AvgMem MaxMem
--------------------------------------------------------------------------
FFT::R2C 2 2 19 MiB 20 MiB
ablastr::fields::computePhiIGF 1 1 10 MiB 10 MiB
ImpactX::init_grids 40 40 7955 KiB 7959 KiB
FillBoundary_nowait() 22400 22400 510 B 395 KiB
ImpactXParticleContainer::DepositCharge 1600 1600 4728 B 110 KiB
--------------------------------------------------------------------------
AMReX (25.12) finalized
Total execution time: 0.60 mins
When I use 64x64x64 with the same setup:
TinyProfiler total time across processes [min...avg...max]: 124.2 ... 124.2 ... 124.2
--------------------------------------------------------------------------------------------------------------
Name NCalls Excl. Min Excl. Avg Excl. Max Max %
--------------------------------------------------------------------------------------------------------------
FFT::R2C::forward(in) 800 69.16 69.16 69.16 55.69%
FFT::R2C::backward(out) 400 35.41 35.41 35.41 28.52%
ParticleContainer::RedistributeCPU() 400 5.075 5.075 5.075 4.09%
OpenBCSolver::setGreensFunction 400 4.019 4.019 4.019 3.24%
impactx::transformation::CoordinateTransformation 800 2.712 2.712 2.712 2.18%
impactx::spacecharge::GatherAndPush 400 1.725 1.725 1.725 1.39%
ImpactXParticleContainer::DepositCharge 400 1.424 1.424 1.424 1.15%
impactx::transformation::CoordinateTransformation::to_fixed_s 227 0.584 0.584 0.584 0.47%
OpenBCSolver::solve 400 0.401 0.401 0.401 0.32%
ablastr::particles::deposit_charge::ChargeDeposition 222 0.3487 0.3487 0.3487 0.28%
impactx::push::CFbend 180 0.2964 0.2964 0.2964 0.24%
impactx::transformation::CoordinateTransformation::to_fixed_t 245 0.2789 0.2789 0.2789 0.22%
FabArray::ParallelCopy_nowait() 800 0.2375 0.2375 0.2375 0.19%
impactx::push::BeamMonitor 2 0.237 0.237 0.237 0.19%
impactx::push::Sol 117 0.1899 0.1899 0.1899 0.15%
impactx::particles::wakefields::HandleSpacecharge 400 0.1241 0.1241 0.1241 0.10%
ImpactX::evolve::slice_step 400 0.01626 0.01626 0.01626 0.01%
impactx::spacecharge::PoissonSolve 400 0.01459 0.01459 0.01459 0.01%
ablastr::fields::computePhiIGF 400 0.01448 0.01448 0.01448 0.01%
FabArray::ParallelCopy() 800 0.001644 0.001644 0.001644 0.00%
impactx::push 400 0.0009322 0.0009322 0.0009322 0.00%
computePhi 400 0.0008181 0.0008181 0.0008181 0.00%
ImpactX::track_particles 1 0.0003462 0.0003462 0.0003462 0.00%
Other 9950 0.9104 0.9104 0.9104 0.73%
--------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------
Name NCalls Incl. Min Incl. Avg Incl. Max Max %
--------------------------------------------------------------------------------------------------------------
ImpactX::track_particles 1 123.2 123.2 123.2 99.20%
ImpactX::evolve::slice_step 400 123.2 123.2 123.2 99.19%
impactx::particles::wakefields::HandleSpacecharge 400 122 122 122 98.28%
impactx::spacecharge::PoissonSolve 400 109.4 109.4 109.4 88.14%
computePhi 400 109.4 109.4 109.4 88.07%
ablastr::fields::computePhiIGF 400 109.3 109.3 109.3 88.06%
OpenBCSolver::solve 400 70.66 70.66 70.66 56.90%
FFT::R2C::forward(in) 800 69.16 69.16 69.16 55.69%
OpenBCSolver::setGreensFunction 400 38.67 38.67 38.67 31.14%
FFT::R2C::backward(out) 400 35.48 35.48 35.48 28.57%
ParticleContainer::RedistributeCPU() 400 5.075 5.075 5.075 4.09%
impactx::transformation::CoordinateTransformation 800 3.575 3.575 3.575 2.88%
ImpactXParticleContainer::DepositCharge 400 1.859 1.859 1.859 1.50%
impactx::spacecharge::GatherAndPush 400 1.725 1.725 1.725 1.39%
impactx::push 400 0.9263 0.9263 0.9263 0.75%
impactx::transformation::CoordinateTransformation::to_fixed_s 227 0.584 0.584 0.584 0.47%
ablastr::particles::deposit_charge::ChargeDeposition 222 0.3487 0.3487 0.3487 0.28%
impactx::push::CFbend 180 0.2965 0.2965 0.2965 0.24%
impactx::transformation::CoordinateTransformation::to_fixed_t 245 0.2789 0.2789 0.2789 0.22%
impactx::push::BeamMonitor 2 0.2619 0.2619 0.2619 0.21%
FabArray::ParallelCopy() 800 0.2402 0.2402 0.2402 0.19%
FabArray::ParallelCopy_nowait() 800 0.2386 0.2386 0.2386 0.19%
impactx::push::Sol 117 0.19 0.19 0.19 0.15%
Other 9950 1.163 1.163 1.163 0.94%
--------------------------------------------------------------------------------------------------------------
Pinned Memory Usage:
--------------------------------------------------------------------
Name Nalloc Nfree AvgMem MaxMem
--------------------------------------------------------------------
ParticleContainer::addParticles 171 171 114 KiB 55 MiB
The_Pinned_Arena::Initialize() 1 1 174 B 8192 KiB
ImpactX::early_param_check 1 1 0 B 32 B
ImpactX::init_grids 2 2 0 B 32 B
ImpactX::track_particles 2 2 0 B 32 B
impactx::push 3 3 0 B 32 B
impactx::spacecharge::PoissonSolve 1596 1596 0 B 32 B
--------------------------------------------------------------------
Cpu Memory Usage:
-------------------------------------------------------------------------
Name Nalloc Nfree AvgMem MaxMem
-------------------------------------------------------------------------
FFT::R2C 2 2 43 MiB 43 MiB
ablastr::fields::computePhiIGF 1 1 21 MiB 22 MiB
ImpactX::init_grids 40 40 15 MiB 15 MiB
FillBoundary_nowait() 22400 22400 307 B 669 KiB
ImpactXParticleContainer::DepositCharge 4056 4056 1354 B 221 KiB
-------------------------------------------------------------------------
AMReX (25.12) finalized
Total execution time: 2.08 mins
For reference, simulation properties:
sim.n_cell = [64,64,64]
sim.poisson_solver = "fft"
sim.prob_relative = [1.5]
sim.particle_shape = 3
sim.space_charge = "3D"
sim.slice_step_diagnostics = False

Therefore, I do not understand why the FFT solver slows down so drastically on 64x64x64. I have 48 physical cores with SMT (96 hardware threads), but SMT does not seem to add any speed, so I am using only the physical cores. Any suggestions or help will be appreciated!
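As a rough sanity check on the jump in FFT time, the sketch below compares the measured ratios from the two profiles with a naive N log N estimate. It assumes (this is not confirmed in the thread) that the first profile is the 48x48x48 run and that the open-boundary solver transforms a grid with twice as many points per dimension, as in Hockney's method. `fft_cost` is a hypothetical cost model, not a measurement.

```python
import math


def fft_cost(n_per_dim: int, doubling: int = 2) -> float:
    """Naive N*log2(N) cost model for a 3D FFT on a padded cubic grid.

    Assumption: the open-boundary solve pads each dimension by `doubling`.
    """
    n_total = (doubling * n_per_dim) ** 3
    return n_total * math.log2(n_total)


# Expected slowdown from 48^3 to 64^3 under the model
model_ratio = fft_cost(64) / fft_cost(48)

# Measured exclusive times (seconds) copied from the two profiles
forward_ratio = 69.16 / 4.336   # FFT::R2C::forward(in)
backward_ratio = 35.41 / 3.071  # FFT::R2C::backward(out)

print(f"model:    {model_ratio:.1f}x")
print(f"forward:  {forward_ratio:.1f}x")
print(f"backward: {backward_ratio:.1f}x")
```

Under these assumptions the model predicts roughly a 2.5x increase, whereas the measured FFT times grow by more than 10x, so the slowdown appears larger than grid growth alone would explain.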
Hi @OLuckyG,

Thank you for the kind words, detailed report and tests! Before we dig into the performance details, can you share how you installed ImpactX and its dependencies?

With regard to padding: for MLMG, your padding is part of the convergence test. For the FFT solver, it is not. Examples:
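For the MLMG case, treating padding as part of the convergence test could look like the sketch below. `run_simulation` is a hypothetical user-supplied callable that sets `sim.prob_relative = [padding]`, runs the lattice, and returns one scalar observable (for example a final emittance); the padding values and tolerance are illustrative only.

```python
from typing import Callable, Optional, Sequence


def scan_padding(
    run_simulation: Callable[[float], float],
    paddings: Sequence[float],
    rtol: float = 1e-2,
) -> Optional[float]:
    """Return the smallest padding whose result agrees with the next
    larger padding within `rtol`, or None if no pair agrees.

    `paddings` must be sorted in increasing order.
    """
    results = [run_simulation(p) for p in paddings]
    for i in range(len(paddings) - 1):
        ref = results[i + 1]
        if abs(results[i] - ref) <= rtol * abs(ref):
            return paddings[i]
    return None


# Stand-in for a real run: an observable that settles as padding grows
def fake_run(padding: float) -> float:
    return 1.0 + 0.5 / padding**4


print(scan_padding(fake_run, [1.5, 2.0, 3.0, 4.0, 6.0]))
```

Comparing each padding against the next larger one avoids needing a known reference value; a stricter check would require agreement across several consecutive paddings.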
That is a bit weird -- and without What machine is this on? Sounds to me like a system watchdog process terminates the costly process you started and detached from your session (via
Thank you for the details!
We think we found the problem: our FFTW usage in AMReX does not yet use threading/OpenMP for the FFTs, even when available.
This is an oversight: we had an earlier implementation of the IGF in ABLASTR, where the FFT wrappers already use threading when available.
We opened issues in ImpactX and AMReX for tracking: