This is an unofficial interface to the AMD ROCm SMI library for Golang applications. It is heavily
inspired by go-nvml and, like it, uses cgo, c-for-go and go-nvml's dlopen wrapper.
This Golang interface is planned to be used in cc-metric-collector.
Disclaimer: These bindings are created without any collaboration with AMD. Use them as you like, but we, the developers of these bindings, are not responsible for any damage caused by them. If you want official Golang bindings for the ROCm SMI library, use this package.
package main

import (
	"fmt"
	"log"

	"github.com/ClusterCockpit/go-rocm-smi/pkg/rocm_smi"
)

func main() {
	ret := rocm_smi.Init()
	if ret != rocm_smi.STATUS_SUCCESS {
		log.Fatalf("Unable to initialize ROCM SMI: %v", rocm_smi.StatusStringNoError(ret))
	}
	defer func() {
		ret := rocm_smi.Shutdown()
		if ret != rocm_smi.STATUS_SUCCESS {
			log.Fatalf("Unable to shutdown ROCM SMI: %v", rocm_smi.StatusStringNoError(ret))
		}
	}()

	count, ret := rocm_smi.NumMonitorDevices()
	if ret != rocm_smi.STATUS_SUCCESS {
		log.Fatalf("Unable to get device count: %v", rocm_smi.StatusStringNoError(ret))
	}

	for i := 0; i < count; i++ {
		device, ret := rocm_smi.DeviceGetHandleByIndex(i)
		if ret != rocm_smi.STATUS_SUCCESS {
			log.Fatalf("Unable to get device at index %d: %v", i, rocm_smi.StatusStringNoError(ret))
		}

		uuid, ret := device.GetUniqueId()
		if ret != rocm_smi.STATUS_SUCCESS {
			log.Fatalf("Unable to get uuid of device at index %d: %v", i, rocm_smi.StatusStringNoError(ret))
		}

		fmt.Printf("%v\n", uuid)
	}
}

The librocm_smi64.so is dynamically loaded by the rocm_smi package. Make sure that the directory containing this library is in your LD_LIBRARY_PATH.
See pkg.go.dev.
There are three ROCm SMI headers, all located at rocm_smi/rocm_smi:
- rocm_smi.h
- rocm_smi64Config.h
- kfd_ioctl.h
The files are copied from ROCm 6.4.1. For the generation, the rocm_smi.h header is changed to support c-for-go's parser.
- bool is renamed to _Bool, since c-for-go appears to choke on the former.
- The union id is renamed to union id_rename to avoid problems with clang. The type is never addressed with the name id but with a typedef name.
Calling c-for-go with the rocm_smi.yml as input
After the generation, the types.go file still contains the C types, but it is more suitable to have
Golang types for them. Luckily, cgo has a bootstrapping option -godefs to
generate the Go types.
Before:
type RSMI_pcie_bandwidth C.rsmi_pcie_bandwidth_t

After:
type RSMI_pcie_bandwidth struct {
	Rate  RSMI_frequencies
	Lanes [32]uint32
}

In the end, the generated functions are wrapped to give them a more idiomatic Golang style. This is similar to the
wrappers created in go-nvml. Most of them are straightforward
with a little bit of casting.
// rocm_smi.DeviceGetSerial()
func DeviceGetSerial(Device DeviceHandle) (string, RSMI_status) {
	var Serial []byte = make([]byte, 100)
	sptr := &Serial[0]
	ret := rsmi_dev_serial_number_get(Device.index, sptr, 100)
	return bytes2String(Serial), ret
}

func (Device DeviceHandle) DeviceGetSerial() (string, RSMI_status) {
	return DeviceGetSerial(Device)
}

In most libraries that handle multiple devices (go-nvml is an example), the user first requests a handle for each device, usually through the logical index in the list of available devices. The official rocm_smi library uses the logical index directly instead of a handle, but to get everything right you have to do quite some work to find out what is supported. The rocm_smi library provides a feature (APISupport in rocm_smi.h) to determine which functions are supported for a device and, if a function accepts arguments, which ones are valid for this device. An example would be the function to get the firmware version and the list of GPU parts that provide such a version.

The go-rocm-smi bindings introduce a virtual type DeviceHandle, retrievable through the logical index with DeviceGetHandleByIndex() (so similar to go-nvml), which encapsulates the APISupport lookup. The DeviceHandle is used for all device-related calls in go-rocm-smi. You can get the logical index with deviceHandle.Index(), the non-unique ID of a GPU with deviceHandle.ID() and the list of supported functions with deviceHandle.Supported().
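
The idea of a handle that performs the support lookup once and then answers queries from its cache can be sketched as follows. This is a simplified, self-contained illustration and not the actual go-rocm-smi implementation: the type deviceHandle, the table fakeSupport and the method Supports() are hypothetical stand-ins, and the real bindings fill the list from the APISupport feature of the library instead of a static map.

```go
package main

import (
	"fmt"
	"sort"
)

// deviceHandle is a hypothetical stand-in for DeviceHandle: it keeps the
// logical index and the set of function names supported by that device.
type deviceHandle struct {
	index     int
	supported map[string]bool
}

// fakeSupport mimics the result of a per-device support query.
// The entries are placeholders chosen for this sketch.
var fakeSupport = map[int][]string{
	0: {"rsmi_dev_id_get", "rsmi_dev_serial_number_get"},
	1: {"rsmi_dev_id_get"},
}

// deviceGetHandleByIndex builds the handle and caches the support list once.
func deviceGetHandleByIndex(index int) (deviceHandle, error) {
	funcs, ok := fakeSupport[index]
	if !ok {
		return deviceHandle{}, fmt.Errorf("no device at index %d", index)
	}
	h := deviceHandle{index: index, supported: make(map[string]bool, len(funcs))}
	for _, f := range funcs {
		h.supported[f] = true
	}
	return h, nil
}

// Index returns the logical index the handle was created from.
func (h deviceHandle) Index() int { return h.index }

// Supported returns the cached function names in sorted order.
func (h deviceHandle) Supported() []string {
	names := make([]string, 0, len(h.supported))
	for name := range h.supported {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Supports reports whether a single function is available for the device.
func (h deviceHandle) Supports(name string) bool { return h.supported[name] }

func main() {
	for i := 0; i < 2; i++ {
		h, err := deviceGetHandleByIndex(i)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(h.Index(), h.Supported(), h.Supports("rsmi_dev_serial_number_get"))
	}
}
```

With the lookup done at handle creation, each wrapper can check support with a cheap map access before calling into the library.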
- The symbol rsmi_dev_sku_get is defined by the rocm_smi.h header, but on the test system with ROCm 6.4.1 the symbol lookup fails. There is now an updateFunctionPointers() function that is called at Init(). This is quite similar to the function updateVersionedSymbols() in go-nvml. The APISupport feature of the rocm_smi library reports that rsmi_dev_sku_get is supported by the device.
- The function rsmi_status_string cannot use the wrapper generated by c-for-go because it requires a pointer to a char array while c-for-go wants to use the char array directly. There is a manually created version to get the status string: StatusString(). One issue arises when using it in prints (see example), because rsmi_status_string accepts a status and returns a new status and the string. To drop the new status, use StatusStringNoError().
- I haven't found a way to access the Build field in RSMI_version. It is a char* in rocm_smi but c-for-go generates an *int8 entry for it.
- c-for-go doesn't handle enums correctly. See xlab/c-for-go#133. In C the type of an enum is implementation-defined, and in c-for-go they are always defined as int32. Because rocm_smi.h uses enums of the full uint32 range (and sometimes even uint64 range), we have to manually fix up the types in the Makefile.
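
The enum item in the list above can be illustrated with a small, self-contained sketch. The type names enumAsInt32 and enumAsUint32 and the constant highBit are hypothetical and only show why a value that uses the top bit of a 32-bit word does not survive a mapping to int32; the actual fix in the Makefile may look different.

```go
package main

import "fmt"

// highBit is a hypothetical enum value that needs the full uint32 range.
const highBit = 0x80000000

// enumAsInt32 is how a generator that maps every enum to int32 would declare
// the type; enumAsUint32 is the manually corrected variant.
type enumAsInt32 int32
type enumAsUint32 uint32

// asInt32 converts a raw value the way the unpatched type would store it.
// A constant declaration such as
//   const bad enumAsInt32 = highBit
// would not even compile, because the constant overflows int32.
func asInt32(raw uint32) enumAsInt32 { return enumAsInt32(raw) }

// asUint32 keeps the value intact.
func asUint32(raw uint32) enumAsUint32 { return enumAsUint32(raw) }

func main() {
	raw := uint32(highBit)
	fmt.Println(asInt32(raw))  // the value wraps around to a negative number
	fmt.Println(asUint32(raw)) // the value is preserved
}
```

The same reasoning applies to enums that need the uint64 range, where an int32 declaration would additionally lose the upper 32 bits.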