Vtune times out even when run with collection paused #107

@bcaddy

Description

Point of Contact

Robert Caddy

Contact Details

rcaddy@princeton.edu

Vendor/ALCF/other tickets/IDs

VASP-33391

Reproducer Path

/lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue
/lus/flare/projects/Tools/jkwack-tools-reproducer/Robert_Caddy/vtune_issue

Status

Open

Details

Summary

Running VTune on my code (Fornax) takes an extremely long time, 6+ hours, and then times out. At the suggestion of Patrick Steinbrecher at Intel, I set VTune to start with collection paused and to unpause only during the sections of code that I actually wanted to profile. That didn't help, so I tried leaving collection paused for the entire run, and it still timed out.
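For anyone reproducing this, a paused-start launch might look like the sketch below. The issue does not say which analysis type or result directory was used, so `gpu-hotspots` and `/tmp/vtune_result` are assumptions, as is the helper name `vtune_paused_cmd`. Only `-collect`, `-start-paused`, and `-result-dir` are taken from the VTune command line; the real invocation is in the submission script.

```shell
# Hypothetical defaults: the issue does not state the analysis type or
# where the result directory lives on node-local storage.
ANALYSIS="${ANALYSIS:-gpu-hotspots}"
RESULT_DIR="${RESULT_DIR:-/tmp/vtune_result}"

vtune_paused_cmd() {
    # Print the command instead of running it, so it can be inspected
    # or pasted into a job script.
    echo vtune -collect "$ANALYSIS" -start-paused -result-dir "$RESULT_DIR" -- ./fornax
}

vtune_paused_cmd
```

With `-start-paused` and no resume request from the application, collection stays paused for the whole run, which matches the second experiment described above.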

In all cases VTune itself finishes eventually, though that takes 20+ minutes even with collection paused on a single-rank, single-GPU-tile run that takes about 15 s on its own. What times out is copying the results back from node-local storage to Flare; if I run directly out of Flare, then VTune times out instead. Even deleting what does get copied to Flare takes 30-60 minutes of running rm non-stop.
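Since both the copy and the rm are slow, it may help triage to record how many files the result directory holds and how large it is. A minimal sketch follows; the function name `summarize_result_dir` and its argument are hypothetical, and whether file count is actually the cause is not established in this issue.

```shell
summarize_result_dir() {
    # $1: path to a VTune result directory (hypothetical argument).
    dir="$1"
    # Count regular files; the arithmetic expansion strips any padding
    # that wc adds on some platforms.
    files=$(find "$dir" -type f | wc -l)
    echo "files: $((files))"
    # Total size of the directory tree in human-readable units.
    du -sh "$dir"
}
```

Running it against the node-local result directory before the copy starts would give numbers to attach to the vendor ticket.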

How to Build

cd into the Fornax directory within vtune_issue, source /lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue/Fornax/machines/alcf/aurora.env, and run make.
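The build steps above can be sketched as a small shell function. `ISSUE_ROOT` and `build_fornax` are names introduced here for illustration; the default path is the first reproducer path listed in this issue.

```shell
# Defaults to the reproducer path from this issue; override to build a copy.
ISSUE_ROOT="${ISSUE_ROOT:-/lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue}"

build_fornax() {
    # Run in a subshell so the cd and the sourced environment do not
    # leak into the calling shell.
    (
        cd "$ISSUE_ROOT/Fornax" || exit 1
        . "$ISSUE_ROOT/Fornax/machines/alcf/aurora.env" || exit 1
        make
    )
}
```

Calling `build_fornax` on an Aurora login node should then run the build with the Aurora environment loaded.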

How to Run

Copy the fornax executable into run_directory, which sits next to the Fornax directory. The submission script is at /lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue/run_directory/submission_script.sh and should have everything set up to run Fornax.
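A rough sketch of the run steps, under two assumptions not stated in the issue: that make leaves the executable at `Fornax/fornax`, and that the script is submitted with PBS `qsub`. `ISSUE_ROOT`, `stage_fornax`, and `submit_fornax` are illustrative names.

```shell
ISSUE_ROOT="${ISSUE_ROOT:-/lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue}"

stage_fornax() {
    # Assumption: the build leaves the executable at Fornax/fornax.
    cp "$ISSUE_ROOT/Fornax/fornax" "$ISSUE_ROOT/run_directory/" || return 1
}

submit_fornax() {
    # Assumption: jobs are submitted with PBS qsub; run from run_directory
    # so relative paths in the script resolve next to the executable.
    ( cd "$ISSUE_ROOT/run_directory" && qsub submission_script.sh )
}
```

Running `stage_fornax` and then `submit_fornax` would reproduce the run, assuming the submission script needs no further arguments.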

Is this a priority/blocking bug?

  • Priority

ETA

2026.0

Metadata

Labels

Tools (profilers and debuggers): VTune, Advisor, APS, GDB, Sanitizer, DDT, HPCToolkit, TAU, and so on
