CUDA_ERROR_ILLEGAL_ADDRESS in dft.numint.eval_ao with shls_slice #723

@MauriceDHanisch

Bug Description

I'm seeing a CUDA_ERROR_ILLEGAL_ADDRESS in gpu4pyscf.dft.numint.eval_ao when evaluating a subset of shells via shls_slice.

The crash happens when the global ao_loc array from SortedMole is passed as ao_loc_slice. It looks like the kernel (or the Python calling sequence) expects ao_loc_slice to contain offsets relative to the start of the slice, not global offsets. Even if you manually re-index ao_loc to start at 0 for the slice, the resulting AO values are often scrambled, because SortedMole reorders atoms/shells in a way that makes mapping a specific AO subset back to the original molecule basis very error-prone.
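For reference, the manual re-indexing mentioned above can be sketched as follows. This is a NumPy-only illustration on toy offsets: `relative_ao_loc` is a hypothetical helper, not part of gpu4pyscf, and whether `eval_ao` actually expects offsets in this form is only the hypothesis stated in this report.

```python
import numpy as np

def relative_ao_loc(ao_loc, shell_ids):
    """Hypothetical helper: slice-relative AO offsets for selected shells.

    ao_loc    : global offsets, length nbas + 1 (e.g. from ao_loc_nr())
    shell_ids : indices of the shells to keep, in evaluation order
    """
    ao_loc = np.asarray(ao_loc)
    shell_ids = np.asarray(shell_ids)
    # Number of AOs in each selected shell, read from the global offsets.
    dims = ao_loc[shell_ids + 1] - ao_loc[shell_ids]
    # Offsets restart at 0; the last entry is the total AO count of the slice.
    rel = np.concatenate(([0], np.cumsum(dims)))
    return rel.astype(np.int32)

# Toy example: 5 shells holding 1, 3, 3, 5 and 1 AOs.
ao_loc_global = np.array([0, 1, 4, 7, 12, 13])
active = np.array([1, 2, 3])

rel = relative_ao_loc(ao_loc_global, active)
nao_slice = int(rel[-1])
print(rel.tolist(), nao_slice)
```

The last entry of the relative array doubles as the AO count for the slice, so it can replace the explicit sum over shells. As noted above, fixing the offsets alone does not address the shell reordering done by SortedMole.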

I've had to resort to rebuilding sub-molecules from original atom indices to get correct O(N) evaluation, but it would be much better if eval_ao supported this natively without crashing.
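To illustrate the bookkeeping this workaround involves, here is a minimal sketch of selecting shells and AO indices by original atom index. It is not the code I actually use: `ao_indices_for_atoms` is a hypothetical helper, and the toy arrays stand in for what would come from the unsorted molecule (in PySCF, the atom index of each shell is `mol._bas[:, 0]` and the offsets are `mol.ao_loc_nr()`).

```python
import numpy as np

def ao_indices_for_atoms(atom_of_shell, ao_loc, atom_ids):
    """Hypothetical helper: shells and AO indices belonging to chosen atoms.

    atom_of_shell : atom index of every shell, in original (unsorted) order
    ao_loc        : global AO offsets, length nbas + 1
    atom_ids      : original atom indices to keep
    """
    atom_of_shell = np.asarray(atom_of_shell)
    ao_loc = np.asarray(ao_loc)
    # Shells whose parent atom is in the requested set, original order kept.
    shell_ids = np.flatnonzero(np.isin(atom_of_shell, atom_ids))
    # Expand each shell into its contiguous range of AO indices.
    ranges = [np.arange(ao_loc[s], ao_loc[s + 1]) for s in shell_ids]
    ao_idx = np.concatenate(ranges) if ranges else np.empty(0, dtype=int)
    return shell_ids, ao_idx

# Toy basis: atom 0 has shells 0-1, atom 1 has shells 2-3, atom 2 has shell 4.
atom_of_shell = [0, 0, 1, 1, 2]
ao_loc = [0, 1, 4, 5, 8, 9]

shells, aos = ao_indices_for_atoms(atom_of_shell, ao_loc, [0, 2])
print(shells.tolist(), aos.tolist())
```

Working in the original atom order avoids having to invert the SortedMole permutation, which is the reason for rebuilding sub-molecules instead of slicing the sorted one.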

Reproduction

import numpy as np
import cupy as cp
from pyscf import gto
from gpu4pyscf.dft import numint as gni

def reproduce():
    # Setup Molecule (CHEMBL100179_00)
    mol = gto.Mole()
    mol.atom = """
    C      -4.08900000      0.24860000      0.26420000
    N      -2.64900000      0.35520000      0.26430000
    C      -2.12700000      0.52000000     -1.07690000
    C      -0.63450000      0.46570000     -1.11830000
    C       0.11700000      0.03750000     -0.10670000
    C       1.58620000      0.02360000     -0.19000000
    C       2.26090000      1.11590000     -0.73550000
    C       3.63920000      1.13130000     -0.82990000
    C       4.34860000      0.03570000     -0.37350000
    F       5.69730000      0.04180000     -0.45920000
    C       3.71350000     -1.06580000      0.16870000
    C       2.33440000     -1.06260000      0.25960000
    C      -0.53640000     -0.41080000      1.17080000
    C      -2.00220000     -0.75840000      0.92830000
    """
    mol.basis = 'gth-tzv2p'
    mol.pseudo = 'gth-pbe'
    mol.unit = 'Angstrom'
    mol.build()
    
    grid_coords = np.zeros((100, 3))
    grid_coords[:, 0] = np.linspace(-5, 5, 100)
    
    ni_gpu = gni.NumInt()
    # build to get gdftopt/sorted_mol
    ni_gpu.build(mol, grid_coords[:1])
    opt = ni_gpu.gdftopt
    sorted_mol = opt._sorted_mol
    
    # Target a subset of shells (simulating screening)
    active_shls = np.arange(0, min(80, sorted_mol.nbas), dtype=np.int32)
    ao_loc_sorted = sorted_mol.ao_loc_nr()
    active_ao_count = sum(ao_loc_sorted[ish+1] - ao_loc_sorted[ish] for ish in active_shls)

    print(f"Triggering gni.eval_ao with {len(active_shls)} shells...")
    chunk_gpu = cp.asarray(grid_coords)
    
    # This call causes CUDA_ERROR_ILLEGAL_ADDRESS
    ao_chunk_gpu = gni.eval_ao(
        sorted_mol, 
        chunk_gpu, 
        shls_slice=cp.asarray(active_shls),
        ao_loc_slice=cp.asarray(ao_loc_sorted), 
        nao_slice=active_ao_count,
        ctr_offsets_slice=opt.l_ctr_offsets, 
        gdftopt=opt, 
        transpose=True
    )
    
    cp.cuda.Device().synchronize()

if __name__ == "__main__":
    reproduce()

Environment

  • GPU: NVIDIA GeForce RTX 4090 (Driver 550.120, Compute 8.9)
  • CUDA: 12.2
  • pyscf: 2.12.1
  • gpu4pyscf: 1.6.1
  • cupy: 14.0.1
  • torch: 2.10.0+cu128
