Switch over to hip-tests by default + HIP7 Fixes by pvelesko · Pull Request #1161 · CHIP-SPV/chipStar

pvelesko · 2026-02-27T10:09:47Z

No description provided.

Update known_failures.yaml to achieve 100% pass rate across all platforms after switching to hip-tests by default. The new hip-tests submodule introduces ~480 additional tests compared to the old HIP/tests suite. Key changes: - Fix syncthreads regex: Unit_syncthreads_* → .*syncthreads_* to match the triple-underscore test names (Unit___syncthreads_*) - Fix Unit_hipMultiThreadStreams2 regex to use .* suffix for new naming - Add cupcake-specific exclusions for Intel GPU failures: - .*_MemoryTest1/2 wildcards (hipArray/hipMalloc3D not supported) - DeviceTest, StreamTest, EventTest, etc. timeouts/failures - Add salami-specific timeout exclusions (ARM Mali iGPU) - Add pastrami-specific PoCL exclusions for macOS: - .*_MemoryTest1/2, .*_UnitDeviceTests, .*_EventTest wildcards - Replace 100+ individual entries with category-level wildcards - Update hip-tests submodule with macOS compatibility fixes ALL.txt exclusion regex size: 18154 bytes (well under 57000 byte limit) Test results (100% pass rate on all platforms): - cupcake dgpu level0: 639 tests passed - cupcake dgpu opencl: 618 tests passed - cupcake igpu opencl: 618 tests passed - cupcake cpu opencl: 663 tests passed - salami igpu opencl: 1517 tests passed - pastrami pocl opencl: 347 tests passed

pvelesko · 2026-06-07T17:27:24Z

/run-aurora-ci

- Switch default test suite from hip-testsuite to hip-tests - Fix device-side __assert_rtn for macOS kernel assert support - Fix cooperative_groups assert with __builtin_trap on macOS - Return correct error codes for invalid kernel block dimensions - Implement missing hipDeviceGetAttribute cases (40+ attributes) - Update known_failures.yaml for all platforms (cupcake, meatloaf, pastrami, salami, pocl-cpu CI) - Move PoCL category wildcards to ANY.OPENCL_POCL for CI coverage

The build-and-test-libraries job sets IGC_EnableDPEmulation/ OverrideDefaultFP64Settings only in the 'Run library tests' step, but rocRAND's double-precision kernels (poisson/xorwow generators) are compiled during the 'Build chipStar + all libraries' step. On the Intel Arc runner (no native fp64) those compiles fail with 'Double type is not supported on this platform'. Set the emulation flags for the build too.

…eonsi The new hip-tests suite introduces test names not covered by the chipstar-rusticl known-failures section, surfacing genuine rusticl/radeonsi (Mesa/ACO) driver gaps: - Unit_kernel_chkConstantViaKernel_KernelTest and Unit_kernel_chkGlobalArrAndGlobalVaribleViaKernelFn_KernelTest abort because Mesa cannot consume program-scope CrossWorkgroup globals (__constant__/global vars) -- the same gap already documented for this runner. - Unit___threadfence_block_Positive_Basic_Shared_ThreadfenceTest hangs the GPU (amdgpu CS cancelled / hard recovery) on radeonsi. All three pass on Level Zero and PoCL; only the experimental rusticl runner is affected.

sync_and_util.hh declares device-side memset()/memcpy() overloads at global scope but only included <cstddef>/<cstdint>. Host code that reaches these declarations without <cstring> in scope (e.g. rocPRIM/hipCUB device headers compiled for the host via the cuspv/CUDA-compat include path) then sees only the __device__ memset overload -- and no host strcmp at all -- producing: error: use of undeclared identifier 'strcmp' error: no matching function for call to 'memset' (candidate: __device__ void *memset(...) -- call from __host__) This was a latent fragility: it only compiled when some other header pulled in <cstring> transitively first. Make the dependency explicit so the host string functions are always visible alongside the device overloads; HIP host/device overloading keeps the two distinct (verified: device kernels using memset and host code using memset/strcmp both compile with both declarations in scope). Fixes the build-and-test-libraries (rocPRIM/hipCUB/rocThrust/...) failures.

<cstring> adds std::memset/std::memcpy as __host__ functions. Device kernels that call std::memset (e.g. rocPRIM's test_block_radix_sort) then get "reference to __host__ function in __global__ function". Add __device__ overloads of both in namespace std so device code resolves to the device version.

…n rusticl Both Unit___threadfence_Positive_Basic_Shared and Unit___threadfence_block_Positive_Basic_Shared trigger amdgpu CS cancellation / context loss (hard recovery) on the W6400/rusticl runner on the shared-memory path. Extend the existing regex key to cover both variants: Unit___threadfence(_block)?_Positive_Basic_Shared_ThreadfenceTest Root cause is a Mesa/radeonsi driver limitation; not a chipStar regression.

…ames The bumped hip-tests submodule dropped the _ThreadfenceTest suffix from CTest names. Make the suffix optional so the pattern matches both Unit___threadfence_Positive_Basic_Shared (new) and Unit___threadfence_Positive_Basic_Shared_ThreadfenceTest (old).

The bumped hip-tests submodule no longer appends the test executable name as a suffix to CTest test names. Update 238 existing entries to drop suffixes like _DeviceTest, _ErrorHandlingTest, _StreamTest, etc. so they match the new (suffix-free) test names.

Commit 1181a15 stripped suffixes like _PrintfTest, _ComplexTest, _LaunchBoundsTest, _AssertionTest, _synchronizationTests from known_failures entries under the assumption that the hip-tests submodule dropped them from CTest names. That assumption is wrong: the tests still register with the suffix in CTest, so the exclusion patterns no longer matched and 14 tests ran and failed on macOS/pastrami. Restore the suffixed names for the 14 affected entries across ALL sections (ANY/ALL, ANY/OPENCL_POCL, etc.) so the -E exclusion patterns correctly match the actual CTest test names again.

All hip-tests executables still append their name as a CTest suffix via the CATCH2_DISCOVER_TESTS_COMPILE_TIME mechanism. The bulk strip in 1181a15 incorrectly removed 238 suffix patterns, causing masked tests to run and fail. Restore known_failures.yaml to the state at cffa961 (only the threadfence block-vs-device regex broadening).

…-decls The new-hip-tests / HIP7 work needs device-side memset/memcpy plus host string functions to coexist with libstdc++ <cstring> in single-source HIP. Two problems had to be solved together: 1. Host build break. <cstring> does 'using ::memset/::memcpy', so *defining* std::memset/std::memcpy (as the earlier approach did) collides with it: 'target of using declaration conflicts with declaration already in scope'. That broke host compilation on every lane (pocl, macOS, salami, rusticl), and on macOS surfaced as 'call to memcpy is ambiguous' in hip_fp16_gcc.h. Fix: instead of redefining them, declare the __device__ memset/memcpy overloads at global scope and surface them into std:: with using-decls. A duplicate 'using ::memset/::memcpy' is harmless and, placed after the __device__ overloads, makes std::memcpy/std::memset resolve to the device (__chip_*) versions in device code. 2. Library device pass. Library headers (e.g. rocPRIM) reference strcmp/ memset/memcpy in __host__ functions (device-property setup), whose bodies are still parsed during the device pass. Guarding <cstring> out of device compilation (a previous attempt to keep the SPIR-V module small) left those identifiers undeclared and broke the device pass -- hipCUB/rocPRIM/rocRAND failed to build on the x86 Intel GPU libraries lane. Fix: include <cstring> unconditionally. It contributes only declarations, not definitions, so it does not enlarge the module; the using-decls above keep device mem* calls pointing at __chip_*. The DG2 IGC kernel-drop that motivated the device-pass <cstring> guard is tracked/handled separately (PR #1297) and is not reintroduced by this change.

Enabling IGC fp64 emulation in the library build step (so rocRAND's double kernels compile) has a side effect: chip-kernel-verify's ocloc compile now succeeds where on other branches it errors with 'Double type is not supported' and verification is silently skipped. With it running for real, it surfaces IGC's #403 RetryManager kernel drops in rocPRIM/rocRAND -- a known upstream IGC bug, not something this PR introduces or fixes. Set HIPCC_VERIFY=warn so the verifier reports the dropped kernels as warnings (keeping visibility) without failing the library build.

…s suite Validated the new hip-tests suite on Aurora (Intel PVC, llvm-22-native) on both Level Zero and OpenCL. After excluding these, both backends pass 100% (level0 582/582, opencl 530/530): - Stress_* (multi-device memcpy, MallocManaged/HMM oversubscription, prefetch, max-allocation, stream-create, module load/unload): resource/timing-sensitive on a single-tile PVC run; time out, OOM, or exercise unsupported capabilities. - Perf_* (memory benchmarks, dispatch speed, HostNumaAlloc): timing benchmarks not gated in CI; HostNumaAlloc segfaults (NUMA host alloc unsupported). - TestPositiveHasNoIGBAs: IGBA detection is platform-sensitive on Aurora (both backends); already a known failure on macOS PoCL. All under the existing 'x4\\d\\d\\dc\\ds\\db0n0' (aurora) host key.

pvelesko · 2026-06-12T16:02:10Z

/run-aurora-ci

colleeneb · 2026-06-12T16:58:57Z

Thanks! It looks like far less tests than before, it was something over 1000 previously. Were a lot of tests changed or dropped?

Catch2 test discovery (catch_discover_tests) runs each test binary with --list-tests, which triggers the compiler-inserted __hipRegisterFatBinary static initializer -> CHIPInitialize(). On a machine without a GPU this failed, killing discovery before any test could be listed -- so building the hip-tests suite on a GPU-less node registered zero Catch2 tests. __hipRegisterFatBinary already wraps CHIPInitialize() in try/catch to tolerate this (see 'Allow test discovery without devices'), but two things defeated it: - CHIPInitializeCallOnce() reported the explicit-backend 'no device' failure via CHIPERR_LOG_AND_ABORT (std::abort), which escapes the catch. Switch it to CHIPERR_LOG_AND_THROW, matching the default-backend path; the exception is now caught and real HIP API calls still surface hipErrorInitializationError. - std::call_once does not latch when its callable throws, so init was retried for every registered fat binary (one per kernel TU), hanging startup. Guard the call so it is attempted at most once. With both, 'UnitDeviceTests --list-tests' etc. enumerate their cases on a node with no GPU, and the full ~1900-test suite registers as expected.

The '.*_MemoryTest1' / '.*_MemoryTest2' wildcards were hiding ~410 memory tests that actually pass on chipStar now that Device::getAttr handles the new HIP7 attributes, and the make_vector wildcard hid all 48 make_vector tests that now compile and pass. Replace the two memory wildcards with the 79 specific cases that still fail (verified on Aurora Level Zero + OpenCL) and drop the make_vector wildcard entirely. Verified on Aurora: Level Zero 100% (1001 tests), OpenCL 100% (962 tests), with the ~458 recovered tests now actually executing.

Pulls in hip-tests b86c0e6, which clamps the helper-kernel launch block size in memcpy1d/memcpy2d test commons to the device's maxThreadsPerBlock. Fixes Unit_hipMemcpy{,2D,DtoH,DtoHAsync}_Positive_Basic failing with hipErrorInvalidValue / CL_INVALID_WORK_GROUP_SIZE on devices that report a per-block limit below 1024 (Intel iGPUs, some rusticl/radeonsi configs). These were previously hidden by the over-broad _MemoryTest exclusion.

This negative test aborts on rusticl/radeonsi with "Destination is nullptr" because rusticl cannot consume program-scope CrossWorkgroup globals (the symbol address resolves to null) — the same root cause already recorded for Unit_hipMemcpyFromToSymbol_Negative. Surfaced by narrowing the over-broad _MemoryTest exclusion. rusticl-specific (passes elsewhere).

rusticl/radeonsi advertises maxThreadsPerBlock=1024 for the W6400, but the buffer-fill helper kernels (VectorSet/Iota) have a lower achievable work-group size, so the fill launch is rejected with CL_INVALID_WORK_GROUP_SIZE even after the test clamps to the reported device max (hip-tests b86c0e6). This is a rusticl/radeonsi advertise-vs-actual mismatch; the same tests pass on devices that report their true per-block limit (Intel iGPU 512, Intel PVC 1024). These were previously hidden by the over-broad _MemoryTest exclusion.

pvelesko · 2026-06-13T11:58:59Z

/run-aurora-ci

pvelesko · 2026-06-13T15:29:48Z

Thanks! It looks like far less tests than before, it was something over 1000 previously. Were a lot of tests changed or dropped?

1.8k tests total. I ran on Aurora and hip-tests had the old way of detecting tests which required compilation on a GPU node.

pvelesko force-pushed the new-hip-tests branch 3 times, most recently from c3ff91c to c903bc2 Compare March 1, 2026 07:13

pvelesko force-pushed the new-hip-tests branch from a11370d to 1d0ed16 Compare March 16, 2026 15:34

pvelesko force-pushed the new-hip-tests branch from abaf1ba to 549a2ac Compare March 19, 2026 07:56

pvelesko force-pushed the main branch from f23f3f3 to b508f59 Compare March 19, 2026 12:00

pvelesko force-pushed the new-hip-tests branch 4 times, most recently from ecaa5dd to 0ab4702 Compare March 20, 2026 16:58

pvelesko force-pushed the new-hip-tests branch 5 times, most recently from 74d561b to b7391a9 Compare June 7, 2026 15:45

pvelesko force-pushed the new-hip-tests branch 2 times, most recently from 88afa25 to 4074cd5 Compare June 8, 2026 06:59

pvelesko added 10 commits June 12, 2026 09:27

pvelesko force-pushed the new-hip-tests branch from c87bb46 to 613651a Compare June 12, 2026 06:29

pvelesko force-pushed the new-hip-tests branch from 613651a to cc4b470 Compare June 12, 2026 06:38

pvelesko force-pushed the new-hip-tests branch from cc4b470 to 55f8d22 Compare June 12, 2026 07:25

pvelesko added 2 commits June 12, 2026 13:30

pvelesko added 5 commits June 13, 2026 10:05

pvelesko merged commit 82fe3dd into main Jun 13, 2026
29 checks passed

pvelesko deleted the new-hip-tests branch June 13, 2026 15:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch over to hip-tests by default + HIP7 Fixes#1161

Switch over to hip-tests by default + HIP7 Fixes#1161
pvelesko merged 18 commits into
mainfrom
new-hip-tests

pvelesko commented Feb 27, 2026

Uh oh!

pvelesko commented Jun 7, 2026

Uh oh!

pvelesko commented Jun 12, 2026 •

edited

Loading

Uh oh!

colleeneb commented Jun 12, 2026

Uh oh!

pvelesko commented Jun 13, 2026

Uh oh!

pvelesko commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pvelesko commented Feb 27, 2026

Uh oh!

pvelesko commented Jun 7, 2026

Uh oh!

pvelesko commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

colleeneb commented Jun 12, 2026

Uh oh!

pvelesko commented Jun 13, 2026

Uh oh!

pvelesko commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pvelesko commented Jun 12, 2026 •

edited

Loading