HDF5 sanity check failure: Sibling GID mirror inconsistency with AddressSanitizer enabled #495

@hsinhaoHHuang

🔎 What happened?

For any patch, the mirror sibling of its sibling should be the patch itself: with SibGID = Sibling[GID][s], we expect Sibling[SibGID][ MirrorSib[s] ] == GID.
For example, a correct HDF5 snapshot contains:

['Tree']['Sibling'][4478][17] = 9840
['Tree']['Sibling'][9840][14] = 4478

where GID 4478 and GID 9840 are sibling patches, and sibling directions 14 and 17 mirror each other (MirrorSib[17] = 14 and MirrorSib[14] = 17).

However, in some runs with AddressSanitizer enabled, the values are corrupted for unknown reasons, e.g.,

['Tree']['Sibling'][4478][17] = 9840
['Tree']['Sibling'][9840][14] = 0

If GAMER_DEBUG is enabled, the sanity check in Output/Output_DumpData_Total_HDF5.cpp reports an error:

Output_DumpData_Total_HDF5 (DumpID = 0)     ...
********************************************************************************
ERROR : Lv 1, GID 4478, sib 17, SibGID 9840 != SibGID->sibling 0 !!
        Rank <0>, file <Output/Output_DumpData_Total_HDF5.cpp>, line <1623>, function <Output_DumpData_Total_HDF5>
********************************************************************************

With AddressSanitizer disabled and no other changes, the values are correct.

📃 Steps to reproduce

  1. Copy the ClusterMerger test problem from example/Hydro/ClusterMerger/*.
  2. Build GAMER with the machine configuration file eureka_gnu.config without GPU and with the sanitizer flags -fsanitize=undefined -fsanitize=address enabled.
  3. Run gamer until the first snapshot Data_000000 is output.
  4. Check the sibling GIDs in the HDF5 snapshot:
    python -c "import h5py; print( h5py.File('Data_000000', 'r')['Tree']['Sibling'][4478][17] )"
    python -c "import h5py; print( h5py.File('Data_000000', 'r')['Tree']['Sibling'][9840][14] )"
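
The two one-liners in step 4 inspect a single pair of patches. To look for other inconsistent entries, the same mirror check could be applied to the whole table. The following is a minimal sketch, not GAMER's own sanity check: the function name `find_mirror_violations` is hypothetical, the mirror table must be supplied by the caller (it is not reproduced here), and negative entries are assumed to mean "no sibling patch". The toy data uses only two sibling directions so that it runs without a snapshot.

```python
def find_mirror_violations(sibling, mirror_sib):
    """Return (gid, s, sib_gid, back_gid) for every entry that breaks
    sibling[ sibling[gid][s] ][ mirror_sib[s] ] == gid.

    sibling    : 2D table indexed as sibling[gid][s]
    mirror_sib : mirror_sib[s] is the direction opposite to s (caller-supplied)
    """
    n_patch = len(sibling)
    violations = []
    for gid, row in enumerate(sibling):
        for s, sib_gid in enumerate(row):
            sib_gid = int(sib_gid)
            if sib_gid < 0:
                # Assumption: negative values mark a missing sibling or a boundary.
                continue
            if sib_gid >= n_patch:
                # An out-of-range GID cannot be followed; report it as is.
                violations.append((gid, s, sib_gid, None))
                continue
            back_gid = int(sibling[sib_gid][mirror_sib[s]])
            if back_gid != gid:
                violations.append((gid, s, sib_gid, back_gid))
    return violations


# Toy example with two directions that mirror each other (0 <-> 1).
mirror = [1, 0]
good = [[-1, 1], [0, -1], [-1, -1]]   # patch 0 and patch 1 point at each other
bad = [[-1, 1], [2, -1], [-1, -1]]    # patch 1 points at patch 2 instead of patch 0

print(find_mirror_violations(good, mirror))  # []
print(find_mirror_violations(bad, mirror))   # [(0, 1, 1, 2), (1, 0, 2, -1)]
```

For a real snapshot, `sibling` would presumably be read in full with `h5py.File('Data_000000', 'r')['Tree']['Sibling'][()]`, and `mirror_sib` would have to match the MirrorSib table used by GAMER.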
    

⌚ Commit hash

ce3bb85

🔧 Configuration command

python configure.py --machine=eureka_gnu --mpi=true --hdf5=true --fftw=FFTW3 --gpu=false --model=HYDRO --particle=true --gravity=true --passive=2 --par_attribute_int=1

🔨 Source files modified

configs/eureka_gnu.config

@@ -42,8 +42,8 @@ NVCCFLAG_POT -Xptxas -dlcm=ca
 # for debugging
 #CXXFLAG -fstack-protector-all
 #CXXFLAG -fstack-protector-strong
-#CXXFLAG -fsanitize=undefined -fsanitize=address
-#LIBFLAG -fsanitize=undefined -fsanitize=address
+CXXFLAG -fsanitize=undefined -fsanitize=address
+LIBFLAG -fsanitize=undefined -fsanitize=address
 
 # gpu
 GPU_COMPUTE_CAPABILITY 750    # 2080 Ti

💻 Operating system

linux (x86)

💾 Machine configuration file

eureka_gnu.config with sanitizer flags turned on:

CXXFLAG -fsanitize=undefined -fsanitize=address
LIBFLAG -fsanitize=undefined -fsanitize=address

🔖 Related topics

  • Hydro
  • MHD
  • FDM
  • AMR
  • Gravity
  • Particle
  • Parallelization
  • GPU
  • Memory
  • YT
  • Tool
  • Docs
  • Other

💬 Additional information

  1. Possibly unrelated: when compiling with ASan, GCC printed the following note:

    Compiling LoadBalance/LB_RecordExchangeDataPatchID.cpp
    LoadBalance/LB_RecordExchangeDataPatchID.cpp: In function ‘void LB_RecordExchangeDataPatchID(int, bool)’:
    LoadBalance/LB_RecordExchangeDataPatchID.cpp:28:6: note: variable tracking size limit exceeded with ‘-fvar-tracking-assignments’, retrying without
       28 | void LB_RecordExchangeDataPatchID( const int Lv, const bool AfterRefine )
          |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Compiling LoadBalance/LB_GetBufferData.cpp

  2. The issue cannot be reproduced with the BlastWave or Plummer test problems.

  3. So far, we are inclined to believe this is a latent bug in GAMER rather than a problem in ASan.
