Description
Point of Contact
Robert Caddy
Contact Details
Vendor/ALCF/other tickets/IDs
VASP-33391
Reproducer Path
/lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue
/lus/flare/projects/Tools/jkwack-tools-reproducer/Robert_Caddy/vtune_issue
Status
Open
Details
Summary
Running VTune on my code (Fornax) takes an extremely long time, 6+ hours, before timing out. At the suggestion of Patrick Steinbrecher at Intel, I set VTune to start with collection paused and to resume it only during the sections of code that I actually wanted to profile. That didn't help, so I tried leaving collection paused for the entire run, and it still timed out.
In all cases VTune itself finishes eventually, though that takes 20+ minutes even with collection paused, on a single-rank, single-GPU-tile run that takes about 15 s on its own. What times out is copying the results back from node-local storage to Flare; if I instead run directly out of Flare, VTune itself times out. Even deleting what does get copied to Flare takes 30-60 minutes of running rm non-stop.
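To help narrow down why the copy and the rm are so slow, it may be useful to measure how many files and how much data a result directory holds. The following is a minimal sketch and not part of the reproducer: `summarize_results` is a hypothetical helper name, and the result directory is whatever path was given to the profiler. It assumes, without confirmation, that the slowness is related to the number or size of the files written.

```shell
# Hypothetical helper: print the file count and total size of a directory.
# $1: the profiler result directory (its path is not specified in this issue).
summarize_results() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # Count regular files; tr strips the padding some wc implementations add.
  nfiles=$(find "$dir" -type f | wc -l | tr -d ' ')
  # Total size in kilobytes (first tab-separated field of du's summary line).
  size_kb=$(du -sk "$dir" | cut -f1)
  echo "files=$nfiles size_kb=$size_kb"
}
```

Running this on the node-local result directory before the copy starts would show whether the result is dominated by a very large number of files or by raw size.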
How to Build
cd into the Fornax directory within the vtune_issue directory, source /lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue/Fornax/machines/alcf/aurora.env, and run make.
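The build steps above can be sketched as a small shell function. `build_fornax` is a hypothetical name used only for illustration; the paths come from this issue, and the function runs in a subshell so the sourced environment does not leak into the calling shell.

```shell
# Hypothetical helper wrapping the build steps described in this issue.
# $1: path to the vtune_issue directory.
build_fornax() {
  issue_dir="$1"
  env_file="$issue_dir/Fornax/machines/alcf/aurora.env"
  if [ ! -f "$env_file" ]; then
    echo "missing environment file: $env_file" >&2
    return 1
  fi
  # Subshell: cd, load the Aurora environment, then build.
  ( cd "$issue_dir/Fornax" && . "$env_file" && make )
}

# Example call (commented out so the block has no side effects):
# build_fornax /lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue
```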
How to Run
Copy the fornax executable into run_directory, which is next to the Fornax directory. The submission script is at /lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue/run_directory/submission_script.sh and it should have everything set up to run Fornax.
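Staging the executable could look roughly like the sketch below. `stage_fornax` is a hypothetical helper name, and where make leaves the fornax executable is an assumption, so its path is passed in as an argument rather than hard-coded.

```shell
# Hypothetical helper: copy a built executable into the run directory.
# $1: path to the built fornax executable (location after make is an assumption).
# $2: path to run_directory.
stage_fornax() {
  exe="$1"
  run_dir="$2"
  [ -f "$exe" ] || { echo "executable not found: $exe" >&2; return 1; }
  [ -d "$run_dir" ] || { echo "run directory not found: $run_dir" >&2; return 1; }
  cp "$exe" "$run_dir/"
}

# Example call (commented out; assumes the executable is built into Fornax/):
# ISSUE_DIR=/lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue
# stage_fornax "$ISSUE_DIR/Fornax/fornax" "$ISSUE_DIR/run_directory"
# The job would then be submitted from run_directory with the site scheduler
# (for example qsub submission_script.sh on a PBS system; this is an assumption).
```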
Is this a priority/blocking bug?
- Priority
ETA
2026.0