Bug Description
I'm seeing a `CUDA_ERROR_ILLEGAL_ADDRESS` in `gpu4pyscf.dft.numint.eval_ao` when evaluating a subset of shells via `shls_slice`.
The crash happens when I pass the global `ao_loc` array from `SortedMole` as `ao_loc_slice`. It looks like the kernel (or the Python calling sequence) expects `ao_loc_slice` to contain offsets relative to the start of the slice rather than global offsets. Even after manually re-indexing `ao_loc` so that the slice starts at 0, the resulting AO values are often scrambled, because `SortedMole` reorders atoms/shells in a way that makes mapping a specific AO subset back to the original molecule's basis very error-prone.
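The re-indexing described above can be sketched in NumPy as follows. It assumes (not confirmed against the gpu4pyscf source) that `ao_loc_slice` should have length `len(shls_slice) + 1` and start at 0. The `sorted_to_orig` array in `to_original_ao` is hypothetical: it stands for a permutation mapping each sorted AO back to its position in the original `mol`, which is exactly the mapping that is hard to obtain reliably today.

```python
import numpy as np


def slice_ao_loc(ao_loc, active_shls):
    """0-based AO offsets for the active shells (length len(active_shls) + 1).

    ao_loc: global shell -> AO offset table (length nbas + 1), e.g. ao_loc_nr().
    active_shls: shell indices to evaluate, in the order they will be stored.
    """
    ao_loc = np.asarray(ao_loc)
    active_shls = np.asarray(active_shls, dtype=np.intp)
    # Number of AOs carried by each active shell.
    sizes = ao_loc[active_shls + 1] - ao_loc[active_shls]
    # A cumulative sum with a leading 0 gives offsets relative to the slice start.
    return np.concatenate(([0], np.cumsum(sizes))).astype(np.int32)


def active_ao_index(ao_loc, active_shls):
    """Global (sorted-basis) AO indices covered by the active shells, in slice order."""
    ao_loc = np.asarray(ao_loc)
    if len(active_shls) == 0:
        return np.empty(0, dtype=np.intp)
    return np.concatenate([np.arange(ao_loc[i], ao_loc[i + 1]) for i in active_shls])


def to_original_ao(idx_sorted, sorted_to_orig):
    """Map sorted-basis AO indices to original-basis AO indices.

    sorted_to_orig is HYPOTHETICAL: sorted AO j is assumed to be original AO
    sorted_to_orig[j].
    """
    return np.asarray(sorted_to_orig)[np.asarray(idx_sorted, dtype=np.intp)]
```

For a prefix of shells such as `np.arange(0, 80)`, as in the reproduction below, `slice_ao_loc` simply returns `ao_loc[:81]`: the global offsets of a prefix are already 0-based, so under this assumption the length of the array passed may matter as much as its origin.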
I've had to resort to rebuilding sub-molecules from the original atom indices to get a correct O(N) evaluation, but it would be much better if `eval_ao` supported shell subsets natively without crashing.
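As a cross-check for the sub-molecule workaround, the shells and AO columns owned by a set of original atoms can be read from pyscf's `Mole.aoslice_by_atom()`, whose rows are `(shl0, shl1, ao0, ao1)` in the original (unsorted) basis order. The sketch below takes that table as a plain array; `atom_subset` is a hypothetical helper, not part of pyscf or gpu4pyscf. The returned `cols` can index the columns of a CPU reference such as `mol.eval_gto('GTOval_sph', coords)`, which has shape `(ngrids, nao)`.

```python
import numpy as np


def atom_subset(aoslice, atom_ids):
    """Original-basis shell and AO indices belonging to the given atoms.

    aoslice: (natm, 4) array with rows (shl0, shl1, ao0, ao1), the layout of
    pyscf's Mole.aoslice_by_atom().
    atom_ids: original atom indices to keep (assumed non-empty).
    """
    aoslice = np.asarray(aoslice)
    # Shells [shl0, shl1) and AOs [ao0, ao1) of each selected atom, concatenated.
    shls = np.concatenate([np.arange(aoslice[a, 0], aoslice[a, 1]) for a in atom_ids])
    cols = np.concatenate([np.arange(aoslice[a, 2], aoslice[a, 3]) for a in atom_ids])
    return shls, cols
```

Usage, with a toy table for three atoms: `atom_subset([[0, 2, 0, 4], [2, 3, 4, 5], [3, 5, 5, 13]], [0, 2])` selects shells 0, 1, 3, 4 and AO columns 0 to 3 and 5 to 12.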
Reproduction
```python
import numpy as np
import cupy as cp
from pyscf import gto
from gpu4pyscf.dft import numint as gni


def reproduce():
    # Set up the molecule (CHEMBL100179_00)
    mol = gto.Mole()
    mol.atom = """
        C -4.08900000 0.24860000 0.26420000
        N -2.64900000 0.35520000 0.26430000
        C -2.12700000 0.52000000 -1.07690000
        C -0.63450000 0.46570000 -1.11830000
        C 0.11700000 0.03750000 -0.10670000
        C 1.58620000 0.02360000 -0.19000000
        C 2.26090000 1.11590000 -0.73550000
        C 3.63920000 1.13130000 -0.82990000
        C 4.34860000 0.03570000 -0.37350000
        F 5.69730000 0.04180000 -0.45920000
        C 3.71350000 -1.06580000 0.16870000
        C 2.33440000 -1.06260000 0.25960000
        C -0.53640000 -0.41080000 1.17080000
        C -2.00220000 -0.75840000 0.92830000
    """
    mol.basis = 'gth-tzv2p'
    mol.pseudo = 'gth-pbe'
    mol.unit = 'Angstrom'
    mol.build()

    grid_coords = np.zeros((100, 3))
    grid_coords[:, 0] = np.linspace(-5, 5, 100)

    ni_gpu = gni.NumInt()
    # Build once so that gdftopt and its sorted molecule are available
    ni_gpu.build(mol, grid_coords[:1])
    opt = ni_gpu.gdftopt
    sorted_mol = opt._sorted_mol

    # Target a subset of shells (simulating screening)
    active_shls = np.arange(0, min(80, sorted_mol.nbas), dtype=np.int32)
    ao_loc_sorted = sorted_mol.ao_loc_nr()
    active_ao_count = sum(ao_loc_sorted[ish + 1] - ao_loc_sorted[ish] for ish in active_shls)

    print(f"Triggering gni.eval_ao with {len(active_shls)} shells...")
    chunk_gpu = cp.asarray(grid_coords)

    # This call causes CUDA_ERROR_ILLEGAL_ADDRESS
    ao_chunk_gpu = gni.eval_ao(
        sorted_mol,
        chunk_gpu,
        shls_slice=cp.asarray(active_shls),
        ao_loc_slice=cp.asarray(ao_loc_sorted),
        nao_slice=active_ao_count,
        ctr_offsets_slice=opt.l_ctr_offsets,
        gdftopt=opt,
        transpose=True,
    )
    cp.cuda.Device().synchronize()


if __name__ == "__main__":
    reproduce()
```
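The call above also passes the full-molecule `opt.l_ctr_offsets` as `ctr_offsets_slice` while `shls_slice` holds only the active shells. If (an assumption not checked against the gpu4pyscf source) `l_ctr_offsets` stores cumulative shell boundaries of the (angular momentum, contraction) groups of `sorted_mol`, and the kernel expects the same boundaries counted over the active shells only, then the unchanged array would point past the end of an 80-element `shls_slice`. A slice version could be derived like this; `slice_ctr_offsets` is a hypothetical helper:

```python
import numpy as np


def slice_ctr_offsets(l_ctr_offsets, active_shls):
    """Group boundaries re-counted over the active shells only.

    ASSUMPTION: l_ctr_offsets holds cumulative shell boundaries of the shell
    groups, and its last entry equals the total number of shells.
    """
    offsets = np.asarray(l_ctr_offsets, dtype=np.intp)
    nbas = int(offsets[-1])
    mask = np.zeros(nbas, dtype=bool)
    mask[np.asarray(active_shls, dtype=np.intp)] = True
    # active_before[k] = number of active shells with index < k
    active_before = np.concatenate(([0], np.cumsum(mask)))
    return (active_before[offsets] - active_before[offsets[0]]).astype(np.int32)
```

This only yields contiguous groups if `active_shls` is sorted in ascending order, which the prefix used in the reproduction already is.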
Environment
- GPU: NVIDIA GeForce RTX 4090 (Driver 550.120, Compute 8.9)
- CUDA: 12.2
- pyscf: 2.12.1
- gpu4pyscf: 1.6.1
- cupy: 14.0.1
- torch: 2.10.0+cu128