1111| DimmPlugin | sh -c 'dmidecode -t 17 \| tr -s " " \| grep -v "Volatile\\|None\\|Module" \| grep Size' 2>/dev/null<br>dmidecode<br>wmic memorychip get Capacity | - | [DimmDataModel](#DimmDataModel-Model) | [DimmCollector](#Collector-Class-DimmCollector) | - |
1212| DkmsPlugin | dkms status<br>dkms --version | **Analyzer Args:**<br>- `dkms_status`: Union[str, list]<br>- `dkms_version`: Union[str, list]<br>- `regex_match`: bool | [DkmsDataModel](#DkmsDataModel-Model) | [DkmsCollector](#Collector-Class-DkmsCollector) | [DkmsAnalyzer](#Data-Analyzer-Class-DkmsAnalyzer) |
1313| DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)\|(aer_status.*)`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? (\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
14+ | FabricsPlugin | ibstat<br>ibv_devinfo<br>ls -l /sys/class/infiniband/*/device/net<br>mst start<br>mst status -v<br>ofed_info -s<br>rdma dev<br>rdma link | - | [FabricsDataModel](#FabricsDataModel-Model) | [FabricsCollector](#Collector-Class-FabricsCollector) | - |
1415| JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
1516| KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
1617| KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
1718| MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
18- | NetworkPlugin | ip addr show<br>sudo ethtool {interface}<br>ip neighbor show<br>ip route show<br>ip rule show | - | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
19+ | NetworkPlugin | ip addr show<br>ethtool {interface}<br>lldpcli show neighbor<br>lldpctl<br>ip neighbor show<br>niccli --dev {device_num} qos --ets --show<br>niccli --list_devices<br>nicctl show card<br>nicctl show dcqcn<br>nicctl show environment<br>nicctl show pcie ats<br>nicctl show port<br>nicctl show qos<br>nicctl show rdma statistics<br>nicctl show version firmware<br>nicctl show version host-software<br>ip route show<br>ip rule show | - | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
1920| NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
2021| OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
2122| PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool<br>- `rocm_regex`: Optional[str]<br>- `enable_rocm_regex`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
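
The built-in regexes listed in the DmesgPlugin row can be tried against log text directly. The following is a minimal sketch, not the analyzer's actual implementation: it copies four of the untruncated patterns from the table (with the Markdown-escaped `\|` written as `|`), and the `scan_dmesg` helper and the sample log lines are made up for illustration. Which regex flags the real DmesgAnalyzer uses is not shown in this document, so default flags are assumed.

```python
import re
from typing import Dict, List

# Patterns copied from the DmesgPlugin row; only untruncated ones are used.
BUILTIN_REGEXES: Dict[str, str] = {
    "Out of memory error": r"(?:oom_kill_process.*)|(?:Out of memory.*)",
    "I/O Page Fault": r"IO_PAGE_FAULT",
    "SQ Interrupt": r"sq_intr",
    "ACPI BIOS Error": r"ACPI BIOS Error",
}


def scan_dmesg(text: str) -> Dict[str, List[str]]:
    """Hypothetical helper: map each description to the text its pattern matched."""
    hits: Dict[str, List[str]] = {}
    for description, pattern in BUILTIN_REGEXES.items():
        matches = [m.group(0) for m in re.finditer(pattern, text)]
        if matches:
            hits[description] = matches
    return hits


# Invented sample lines, loosely shaped like `dmesg --time-format iso -x` output.
sample = "\n".join([
    "kern  :err   : 2024-01-01T00:00:00,000000+00:00 Out of memory: Killed process 1234 (python)",
    "kern  :warn  : 2024-01-01T00:00:01,000000+00:00 ACPI BIOS Error (bug): could not resolve symbol",
    "kern  :info  : 2024-01-01T00:00:02,000000+00:00 usb 1-1: new high-speed USB device",
])
hits = scan_dmesg(sample)
print(hits)
```

Because `.` does not match a newline by default, each `.*` pattern stops at the end of the line it matched on.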
@@ -224,6 +225,42 @@ DmesgData
224225- dmesg --time-format iso -x
225226- ls -1 /var/log/dmesg* 2>/dev/null | grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' || true
226227
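
The `grep -E` filter in the second DmesgCollector command keeps only the live log and its numbered rotations. As a rough illustration, the same pattern can be checked in Python; the candidate paths below are invented examples, and `ROTATED_DMESG` is a name chosen for this sketch.

```python
import re

# Same expression as the grep -E filter in the command above.
ROTATED_DMESG = re.compile(r"^/var/log/dmesg(\.[0-9]+(\.gz)?)?$")

# Invented candidate paths, as `ls -1 /var/log/dmesg*` might list them.
candidates = [
    "/var/log/dmesg",
    "/var/log/dmesg.0",
    "/var/log/dmesg.1.gz",
    "/var/log/dmesg.old",
    "/var/log/dmesg.gz",
    "/var/log/dmesg-backup",
]

kept = [path for path in candidates if ROTATED_DMESG.search(path)]
print(kept)
```

Only the first three paths survive: a `.gz` suffix is accepted only after a numeric rotation index, and names such as `dmesg.old` are dropped.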
228+ ## Collector Class FabricsCollector
229+
230+ ### Description
231+
232+ Collect InfiniBand/RDMA fabrics configuration details
233+
234+ **Bases**: ['InBandDataCollector']
235+
236+ **Link to code**: [fabrics_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/fabrics/fabrics_collector.py)
237+
238+ ### Class Variables
239+
240+ - **CMD_IBSTAT**: `ibstat`
241+ - **CMD_IBV_DEVINFO**: `ibv_devinfo`
242+ - **CMD_IB_DEV_NETDEVS**: `ls -l /sys/class/infiniband/*/device/net`
243+ - **CMD_OFED_INFO**: `ofed_info -s`
244+ - **CMD_MST_START**: `mst start`
245+ - **CMD_MST_STATUS**: `mst status -v`
246+ - **CMD_RDMA_DEV**: `rdma dev`
247+ - **CMD_RDMA_LINK**: `rdma link`
248+
249+ ### Provides Data
250+
251+ FabricsDataModel
252+
253+ ### Commands
254+
255+ - ibstat
256+ - ibv_devinfo
257+ - ls -l /sys/class/infiniband/*/device/net
258+ - mst start
259+ - mst status -v
260+ - ofed_info -s
261+ - rdma dev
262+ - rdma link
263+
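
To make the command list above concrete, here is a minimal sketch of running those commands in the documented order and tolerating tools that are missing or fail. It is not the FabricsCollector implementation (which builds on `InBandDataCollector`); `run_shell`, `collect`, and the 30-second timeout are assumptions made for this example.

```python
import subprocess
from typing import Callable, Dict, List, Optional

# Commands in the order the Commands list gives them.
FABRICS_COMMANDS: List[str] = [
    "ibstat",
    "ibv_devinfo",
    "ls -l /sys/class/infiniband/*/device/net",
    "mst start",
    "mst status -v",
    "ofed_info -s",
    "rdma dev",
    "rdma link",
]


def run_shell(command: str, timeout: float = 30.0) -> Optional[str]:
    """Hypothetical runner: stdout on success, None on failure or timeout."""
    try:
        # shell=True so that the glob in the `ls` command is expanded.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None


def collect(
    runner: Callable[[str], Optional[str]] = run_shell,
) -> Dict[str, Optional[str]]:
    """Run every command once and keep its output keyed by the command string."""
    return {command: runner(command) for command in FABRICS_COMMANDS}
```

Passing a fake `runner` makes the sketch testable on machines without InfiniBand tooling, for example `collect(lambda cmd: None)`.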
227264## Collector Class JournalCollector
228265
229266### Description
@@ -341,7 +378,20 @@ Collect network configuration details using ip command
341378- **CMD_ROUTE**: `ip route show`
342379- **CMD_RULE**: `ip rule show`
343380- **CMD_NEIGHBOR**: `ip neighbor show`
344- - **CMD_ETHTOOL_TEMPLATE**: `sudo ethtool {interface}`
381+ - **CMD_ETHTOOL_TEMPLATE**: `ethtool {interface}`
382+ - **CMD_LLDPCLI_NEIGHBOR**: `lldpcli show neighbor`
383+ - **CMD_LLDPCTL**: `lldpctl`
384+ - **CMD_NICCLI_LISTDEV**: `niccli --list_devices`
385+ - **CMD_NICCLI_GETQOS_TEMPLATE**: `niccli --dev {device_num} qos --ets --show`
386+ - **CMD_NICCTL_CARD**: `nicctl show card`
387+ - **CMD_NICCTL_DCQCN**: `nicctl show dcqcn`
388+ - **CMD_NICCTL_ENVIRONMENT**: `nicctl show environment`
389+ - **CMD_NICCTL_PCIE_ATS**: `nicctl show pcie ats`
390+ - **CMD_NICCTL_PORT**: `nicctl show port`
391+ - **CMD_NICCTL_QOS**: `nicctl show qos`
392+ - **CMD_NICCTL_RDMA_STATISTICS**: `nicctl show rdma statistics`
393+ - **CMD_NICCTL_VERSION_HOST_SOFTWARE**: `nicctl show version host-software`
394+ - **CMD_NICCTL_VERSION_FIRMWARE**: `nicctl show version firmware`
345395
346396### Provides Data
347397
@@ -350,8 +400,21 @@ NetworkDataModel
350400### Commands
351401
352402- ip addr show
353- - sudo ethtool {interface}
403+ - ethtool {interface}
404+ - lldpcli show neighbor
405+ - lldpctl
354406- ip neighbor show
407+ - niccli --dev {device_num} qos --ets --show
408+ - niccli --list_devices
409+ - nicctl show card
410+ - nicctl show dcqcn
411+ - nicctl show environment
412+ - nicctl show pcie ats
413+ - nicctl show port
414+ - nicctl show qos
415+ - nicctl show rdma statistics
416+ - nicctl show version firmware
417+ - nicctl show version host-software
355418- ip route show
356419- ip rule show
357420
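
Two of the NetworkCollector commands are templates with `{interface}` and `{device_num}` placeholders. The sketch below shows one plausible way to expand them with `str.format`. The helper names are hypothetical, and where the collector gets its interface names and device numbers is an assumption (presumably from `ip addr show` and `niccli --list_devices`); this document does not say.

```python
from typing import List

# Template strings as listed under Class Variables.
CMD_ETHTOOL_TEMPLATE = "ethtool {interface}"
CMD_NICCLI_GETQOS_TEMPLATE = "niccli --dev {device_num} qos --ets --show"


def expand_ethtool(interfaces: List[str]) -> List[str]:
    """Hypothetical helper: one ethtool command per interface name."""
    return [CMD_ETHTOOL_TEMPLATE.format(interface=name) for name in interfaces]


def expand_niccli_qos(device_nums: List[int]) -> List[str]:
    """Hypothetical helper: one niccli QoS query per device number."""
    return [CMD_NICCLI_GETQOS_TEMPLATE.format(device_num=n) for n in device_nums]


# Example inputs are invented for illustration.
ethtool_cmds = expand_ethtool(["eth0", "ib0"])
qos_cmds = expand_niccli_qos([1, 2])
print(ethtool_cmds)
print(qos_cmds)
```

If interface names could come from untrusted input, quoting each value with `shlex.quote` before formatting would be a sensible extra step.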
@@ -769,6 +832,26 @@ Data model for in band dmesg log
769832### Model annotations and fields
770833
771834- **dmesg_content**: `str`
835+ - **skip_log_file**: `bool`
836+
837+ ## FabricsDataModel Model
838+
839+ ### Description
840+
841+ Complete InfiniBand/RDMA fabrics configuration data
842+
843+ **Link to code**: [fabricsdata.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/fabrics/fabricsdata.py)
844+
845+ **Bases**: ['DataModel']
846+
847+ ### Model annotations and fields
848+
849+ - **ibstat_devices**: `List[nodescraper.plugins.inband.fabrics.fabricsdata.IbstatDevice]`
850+ - **ibv_devices**: `List[nodescraper.plugins.inband.fabrics.fabricsdata.IbvDeviceInfo]`
851+ - **ibdev_netdev_mappings**: `List[nodescraper.plugins.inband.fabrics.fabricsdata.IbdevNetdevMapping]`
852+ - **ofed_info**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.OfedInfo]`
853+ - **mst_status**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.MstStatus]`
854+ - **rdma_info**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.RdmaInfo]`
772855
773856## JournalData Model
774857
@@ -840,6 +923,17 @@ Complete network configuration data
840923- **rules**: `List[nodescraper.plugins.inband.network.networkdata.RoutingRule]`
841924- **neighbors**: `List[nodescraper.plugins.inband.network.networkdata.Neighbor]`
842925- **ethtool_info**: `Dict[str, nodescraper.plugins.inband.network.networkdata.EthtoolInfo]`
926+ - **broadcom_nic_devices**: `List[nodescraper.plugins.inband.network.networkdata.BroadcomNicDevice]`
927+ - **broadcom_nic_qos**: `Dict[int, nodescraper.plugins.inband.network.networkdata.BroadcomNicQos]`
928+ - **pensando_nic_cards**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicCard]`
929+ - **pensando_nic_dcqcn**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicDcqcn]`
930+ - **pensando_nic_environment**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicEnvironment]`
931+ - **pensando_nic_pcie_ats**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicPcieAts]`
932+ - **pensando_nic_ports**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicPort]`
933+ - **pensando_nic_qos**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicQos]`
934+ - **pensando_nic_rdma_statistics**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicRdmaStatistics]`
935+ - **pensando_nic_version_host_software**: `Optional[nodescraper.plugins.inband.network.networkdata.PensandoNicVersionHostSoftware]`
936+ - **pensando_nic_version_firmware**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicVersionFirmware]`
843937
844938## NvmeDataModel Model
845939