Commit 1406a40

Merge pull request #104 from amd/automated-plugin-docs-update
docs: Update plugin documentation [automated]
2 parents 7fef169 + 3d24fa0 commit 1406a40

1 file changed: +97 -3 lines changed
docs/PLUGIN_DOC.md

Lines changed: 97 additions & 3 deletions
@@ -11,11 +11,12 @@
 | DimmPlugin | sh -c 'dmidecode -t 17 \| tr -s " " \| grep -v "Volatile\\|None\\|Module" \| grep Size' 2>/dev/null<br>dmidecode<br>wmic memorychip get Capacity | - | [DimmDataModel](#DimmDataModel-Model) | [DimmCollector](#Collector-Class-DimmCollector) | - |
 | DkmsPlugin | dkms status<br>dkms --version | **Analyzer Args:**<br>- `dkms_status`: Union[str, list]<br>- `dkms_version`: Union[str, list]<br>- `regex_match`: bool | [DkmsDataModel](#DkmsDataModel-Model) | [DkmsCollector](#Collector-Class-DkmsCollector) | [DkmsAnalyzer](#Data-Analyzer-Class-DkmsAnalyzer) |
 | DmesgPlugin | dmesg --time-format iso -x<br>ls -1 /var/log/dmesg* 2>/dev/null \| grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' \|\| true | **Built-in Regexes:**<br>- Out of memory error: `(?:oom_kill_process.*)\|(?:Out of memory.*)`<br>- I/O Page Fault: `IO_PAGE_FAULT`<br>- Kernel Panic: `\bkernel panic\b.*`<br>- SQ Interrupt: `sq_intr`<br>- SRAM ECC: `sram_ecc.*`<br>- Failed to load driver. IP hardware init error.: `\[amdgpu\]\] \*ERROR\* hw_init of IP block.*`<br>- Failed to load driver. IP software init error.: `\[amdgpu\]\] \*ERROR\* sw_init of IP block.*`<br>- Real Time throttling activated: `sched: RT throttling activated.*`<br>- RCU preempt detected stalls: `rcu_preempt detected stalls.*`<br>- RCU preempt self-detected stall: `rcu_preempt self-detected stall.*`<br>- QCM fence timeout: `qcm fence wait loop timeout.*`<br>- General protection fault: `(?:[\w-]+(?:\[[0-9.]+\])?\s+)?general protectio...`<br>- Segmentation fault: `(?:segfault.*in .*\[)\|(?:[Ss]egmentation [Ff]au...`<br>- Failed to disallow cf state: `amdgpu: Failed to disallow cf state.*`<br>- Failed to terminate tmr: `\*ERROR\* Failed to terminate tmr.*`<br>- Suspend of IP block failed: `\*ERROR\* suspend of IP block <\w+> failed.*`<br>- amdgpu Page Fault: `(amdgpu \w{4}:\w{2}:\w{2}\.\w:\s+amdgpu:\s+\[\S...`<br>- Page Fault: `page fault for address.*`<br>- Fatal error during GPU init: `(?:amdgpu)(.*Fatal error during GPU init)\|(Fata...`<br>- PCIe AER Error: `(?:pcieport )(.*AER: aer_status.*)\|(aer_status.*)`<br>- Failed to read journal file: `Failed to read journal file.*`<br>- Journal file corrupted or uncleanly shut down: `journal corrupted or uncleanly shut down.*`<br>- ACPI BIOS Error: `ACPI BIOS Error`<br>- ACPI Error: `ACPI Error`<br>- Filesystem corrupted!: `EXT4-fs error \(device .*\):`<br>- Error in buffered IO, check filesystem integrity: `(Buffer I\/O error on dev)(?:ice)? (\w+)`<br>- PCIe card no longer present: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- PCIe Link Down: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(Slot\(...`<br>- Mismatched clock configuration between PCIe device and host: `pcieport (\w+:\w+:\w+\.\w+):\s+(\w+):\s+(curren...`<br>- RAS Correctable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Uncorrectable Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Deferred Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- RAS Corrected PCIe Error: `((?:\[Hardware Error\]:\s+)?event severity: cor...`<br>- GPU Reset: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- GPU reset failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- ACA Error: `(Accelerator Check Architecture[^\n]*)(?:\n[^\n...`<br>- MCE Error: `\[Hardware Error\]:.+MC\d+_STATUS.*(?:\n.*){0,5}`<br>- Mode 2 Reset Failed: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)? (...`<br>- RAS Corrected Error: `(?:\d{4}-\d+-\d+T\d+:\d+:\d+,\d+[+-]\d+:\d+)?(....`<br>- SGX Error: `x86/cpu: SGX disabled by BIOS`<br>- GPU Throttled: `amdgpu \w{4}:\w{2}:\w{2}.\w: amdgpu: WARN: GPU ...`<br>- LNet: ko2iblnd has no matching interfaces: `(?:\[[^\]]+\]\s*)?LNetError:.*ko2iblnd:\s*No ma...`<br>- LNet: Error starting up LNI: `(?:\[[^\]]+\]\s*)?LNetError:\s*.*Error\s*-?\d+\...`<br>- Lustre: network initialisation failed: `LustreError:.*ptlrpc_init_portals\(\).*network ...` | [DmesgData](#DmesgData-Model) | [DmesgCollector](#Collector-Class-DmesgCollector) | [DmesgAnalyzer](#Data-Analyzer-Class-DmesgAnalyzer) |
+| FabricsPlugin | ibstat<br>ibv_devinfo<br>ls -l /sys/class/infiniband/*/device/net<br>mst start<br>mst status -v<br>ofed_info -s<br>rdma dev<br>rdma link | - | [FabricsDataModel](#FabricsDataModel-Model) | [FabricsCollector](#Collector-Class-FabricsCollector) | - |
 | JournalPlugin | journalctl --no-pager --system --output=short-iso | - | [JournalData](#JournalData-Model) | [JournalCollector](#Collector-Class-JournalCollector) | - |
 | KernelPlugin | sh -c 'uname -a'<br>wmic os get Version /Value | **Analyzer Args:**<br>- `exp_kernel`: Union[str, list]<br>- `regex_match`: bool | [KernelDataModel](#KernelDataModel-Model) | [KernelCollector](#Collector-Class-KernelCollector) | [KernelAnalyzer](#Data-Analyzer-Class-KernelAnalyzer) |
 | KernelModulePlugin | cat /proc/modules<br>modinfo amdgpu<br>wmic os get Version /Value | **Analyzer Args:**<br>- `kernel_modules`: dict[str, dict]<br>- `regex_filter`: list[str] | [KernelModuleDataModel](#KernelModuleDataModel-Model) | [KernelModuleCollector](#Collector-Class-KernelModuleCollector) | [KernelModuleAnalyzer](#Data-Analyzer-Class-KernelModuleAnalyzer) |
 | MemoryPlugin | free -b<br>lsmem<br>numactl -H<br>wmic OS get FreePhysicalMemory /Value; wmic ComputerSystem get TotalPhysicalMemory /Value | **Analyzer Args:**<br>- `ratio`: float<br>- `memory_threshold`: str | [MemoryDataModel](#MemoryDataModel-Model) | [MemoryCollector](#Collector-Class-MemoryCollector) | [MemoryAnalyzer](#Data-Analyzer-Class-MemoryAnalyzer) |
-| NetworkPlugin | ip addr show<br>sudo ethtool {interface}<br>ip neighbor show<br>ip route show<br>ip rule show | - | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
+| NetworkPlugin | ip addr show<br>ethtool {interface}<br>lldpcli show neighbor<br>lldpctl<br>ip neighbor show<br>niccli --dev {device_num} qos --ets --show<br>niccli --list_devices<br>nicctl show card<br>nicctl show dcqcn<br>nicctl show environment<br>nicctl show pcie ats<br>nicctl show port<br>nicctl show qos<br>nicctl show rdma statistics<br>nicctl show version firmware<br>nicctl show version host-software<br>ip route show<br>ip rule show | - | [NetworkDataModel](#NetworkDataModel-Model) | [NetworkCollector](#Collector-Class-NetworkCollector) | - |
 | NvmePlugin | nvme smart-log {dev}<br>nvme error-log {dev} --log-entries=256<br>nvme id-ctrl {dev}<br>nvme id-ns {dev}{ns}<br>nvme fw-log {dev}<br>nvme self-test-log {dev}<br>nvme get-log {dev} --log-id=6 --log-len=512<br>nvme telemetry-log {dev} --output-file={dev}_{f_name} | - | [NvmeDataModel](#NvmeDataModel-Model) | [NvmeCollector](#Collector-Class-NvmeCollector) | - |
 | OsPlugin | sh -c '( lsb_release -ds \|\| (cat /etc/*release \| grep PRETTY_NAME) \|\| uname -om ) 2>/dev/null \| head -n1'<br>cat /etc/*release \| grep VERSION_ID<br>wmic os get Version /value<br>wmic os get Caption /Value | **Analyzer Args:**<br>- `exp_os`: Union[str, list]<br>- `exact_match`: bool | [OsDataModel](#OsDataModel-Model) | [OsCollector](#Collector-Class-OsCollector) | [OsAnalyzer](#Data-Analyzer-Class-OsAnalyzer) |
 | PackagePlugin | dnf list --installed<br>dpkg-query -W<br>pacman -Q<br>cat /etc/*release<br>wmic product get name,version | **Analyzer Args:**<br>- `exp_package_ver`: Dict[str, Optional[str]]<br>- `regex_match`: bool<br>- `rocm_regex`: Optional[str]<br>- `enable_rocm_regex`: bool | [PackageDataModel](#PackageDataModel-Model) | [PackageCollector](#Collector-Class-PackageCollector) | [PackageAnalyzer](#Data-Analyzer-Class-PackageAnalyzer) |
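
Note on the DmesgPlugin row: the built-in regexes are ordinary Python-style patterns, and the `\|` sequences in the table only escape the Markdown cell separator. The sketch below shows how three of the untruncated patterns could be applied line by line. The `scan_dmesg` helper, the absence of regex flags, and the sample log lines are assumptions made for illustration; the actual DmesgAnalyzer may match differently.

```python
import re

# Patterns copied from the DmesgPlugin row. The table writes "|" as "\|"
# only to escape the Markdown cell separator; the regex uses a bare "|".
BUILT_IN_REGEXES = {
    "Out of memory error": r"(?:oom_kill_process.*)|(?:Out of memory.*)",
    "I/O Page Fault": r"IO_PAGE_FAULT",
    "SQ Interrupt": r"sq_intr",
}


def scan_dmesg(text):
    """Return (label, line) pairs for every line that matches a pattern.

    Hypothetical helper: no regex flags are applied, which is an assumption.
    """
    hits = []
    for line in text.splitlines():
        for label, pattern in BUILT_IN_REGEXES.items():
            if re.search(pattern, line):
                hits.append((label, line))
    return hits


# Hypothetical log lines, invented for illustration only.
SAMPLE = "\n".join([
    "kern  :err   : Out of memory: Killed process 1234 (python)",
    "kern  :info  : usb 1-1: new high-speed USB device",
    "kern  :warn  : AMD-Vi: Event logged [IO_PAGE_FAULT]",
])

for label, line in scan_dmesg(SAMPLE):
    print(f"{label}: {line}")
```

Entries whose patterns are truncated with `...` in the table cannot be reconstructed from this page and are left out of the sketch.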
@@ -224,6 +225,42 @@ DmesgData
 - dmesg --time-format iso -x
 - ls -1 /var/log/dmesg* 2>/dev/null | grep -E '^/var/log/dmesg(\.[0-9]+(\.gz)?)?$' || true
 
+## Collector Class FabricsCollector
+
+### Description
+
+Collect InfiniBand/RDMA fabrics configuration details
+
+**Bases**: ['InBandDataCollector']
+
+**Link to code**: [fabrics_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/fabrics/fabrics_collector.py)
+
+### Class Variables
+
+- **CMD_IBSTAT**: `ibstat`
+- **CMD_IBV_DEVINFO**: `ibv_devinfo`
+- **CMD_IB_DEV_NETDEVS**: `ls -l /sys/class/infiniband/*/device/net`
+- **CMD_OFED_INFO**: `ofed_info -s`
+- **CMD_MST_START**: `mst start`
+- **CMD_MST_STATUS**: `mst status -v`
+- **CMD_RDMA_DEV**: `rdma dev`
+- **CMD_RDMA_LINK**: `rdma link`
+
+### Provides Data
+
+FabricsDataModel
+
+### Commands
+
+- ibstat
+- ibv_devinfo
+- ls -l /sys/class/infiniband/*/device/net
+- mst start
+- mst status -v
+- ofed_info -s
+- rdma dev
+- rdma link
+
 ## Collector Class JournalCollector
 
 ### Description
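
Note on the FabricsCollector hunk: the eight class variables are plain shell command strings. A minimal sketch of running such a command table is shown below, assuming a POSIX `sh` is available (the glob in `CMD_IB_DEV_NETDEVS` needs a shell to expand). The `run_command` and `collect` helpers are hypothetical and are not the collector's real interface; the actual class derives from `InBandDataCollector`, whose execution API is not shown on this page.

```python
import subprocess

# Command strings copied from the FabricsCollector class variables.
FABRICS_COMMANDS = {
    "CMD_IBSTAT": "ibstat",
    "CMD_IBV_DEVINFO": "ibv_devinfo",
    "CMD_IB_DEV_NETDEVS": "ls -l /sys/class/infiniband/*/device/net",
    "CMD_OFED_INFO": "ofed_info -s",
    "CMD_MST_START": "mst start",
    "CMD_MST_STATUS": "mst status -v",
    "CMD_RDMA_DEV": "rdma dev",
    "CMD_RDMA_LINK": "rdma link",
}


def run_command(cmd, timeout=30):
    """Run one command through sh and return (exit_code, stdout).

    Hypothetical helper: a missing shell or a timeout is reported as -1.
    """
    try:
        proc = subprocess.run(
            ["sh", "-c", cmd], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return -1, str(exc)
    return proc.returncode, proc.stdout


def collect(runner=run_command):
    """Run every command and key the results by class-variable name."""
    return {name: runner(cmd) for name, cmd in FABRICS_COMMANDS.items()}
```

Passing a stub as `runner` (for example `collect(lambda cmd: (0, cmd))`) exercises the table without touching the host. That matters here because `mst start` is, as far as I know, not a read-only command.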
@@ -341,7 +378,20 @@ Collect network configuration details using ip command
 - **CMD_ROUTE**: `ip route show`
 - **CMD_RULE**: `ip rule show`
 - **CMD_NEIGHBOR**: `ip neighbor show`
-- **CMD_ETHTOOL_TEMPLATE**: `sudo ethtool {interface}`
+- **CMD_ETHTOOL_TEMPLATE**: `ethtool {interface}`
+- **CMD_LLDPCLI_NEIGHBOR**: `lldpcli show neighbor`
+- **CMD_LLDPCTL**: `lldpctl`
+- **CMD_NICCLI_LISTDEV**: `niccli --list_devices`
+- **CMD_NICCLI_GETQOS_TEMPLATE**: `niccli --dev {device_num} qos --ets --show`
+- **CMD_NICCTL_CARD**: `nicctl show card`
+- **CMD_NICCTL_DCQCN**: `nicctl show dcqcn`
+- **CMD_NICCTL_ENVIRONMENT**: `nicctl show environment`
+- **CMD_NICCTL_PCIE_ATS**: `nicctl show pcie ats`
+- **CMD_NICCTL_PORT**: `nicctl show port`
+- **CMD_NICCTL_QOS**: `nicctl show qos`
+- **CMD_NICCTL_RDMA_STATISTICS**: `nicctl show rdma statistics`
+- **CMD_NICCTL_VERSION_HOST_SOFTWARE**: `nicctl show version host-software`
+- **CMD_NICCTL_VERSION_FIRMWARE**: `nicctl show version firmware`
 
 ### Provides Data
 
@@ -350,8 +400,21 @@ NetworkDataModel
 ### Commands
 
 - ip addr show
-- sudo ethtool {interface}
+- ethtool {interface}
+- lldpcli show neighbor
+- lldpctl
 - ip neighbor show
+- niccli --dev {device_num} qos --ets --show
+- niccli --list_devices
+- nicctl show card
+- nicctl show dcqcn
+- nicctl show environment
+- nicctl show pcie ats
+- nicctl show port
+- nicctl show qos
+- nicctl show rdma statistics
+- nicctl show version firmware
+- nicctl show version host-software
 - ip route show
 - ip rule show
 
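
Note on the NetworkCollector hunks: two of the commands are templates with `{interface}` and `{device_num}` placeholders, and the ethtool template no longer carries a `sudo` prefix. The sketch below expands the templates with `str.format`. The `expand_commands` helper and the example interface and device values are hypothetical; how the real collector discovers interfaces and device numbers, or handles privileges after `sudo` was dropped, is not shown on this page.

```python
# Template strings copied from the NetworkCollector class variables.
CMD_ETHTOOL_TEMPLATE = "ethtool {interface}"
CMD_NICCLI_GETQOS_TEMPLATE = "niccli --dev {device_num} qos --ets --show"


def expand_commands(interfaces, device_nums):
    """Fill the per-interface and per-device templates (hypothetical helper)."""
    commands = [CMD_ETHTOOL_TEMPLATE.format(interface=name) for name in interfaces]
    commands += [
        CMD_NICCLI_GETQOS_TEMPLATE.format(device_num=num) for num in device_nums
    ]
    return commands


# Example values are invented for illustration.
for command in expand_commands(["eth0", "eth1"], [1]):
    print(command)
```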
@@ -769,6 +832,26 @@ Data model for in band dmesg log
 ### Model annotations and fields
 
 - **dmesg_content**: `str`
+- **skip_log_file**: `bool`
+
+## FabricsDataModel Model
+
+### Description
+
+Complete InfiniBand/RDMA fabrics configuration data
+
+**Link to code**: [fabricsdata.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/fabrics/fabricsdata.py)
+
+**Bases**: ['DataModel']
+
+### Model annotations and fields
+
+- **ibstat_devices**: `List[nodescraper.plugins.inband.fabrics.fabricsdata.IbstatDevice]`
+- **ibv_devices**: `List[nodescraper.plugins.inband.fabrics.fabricsdata.IbvDeviceInfo]`
+- **ibdev_netdev_mappings**: `List[nodescraper.plugins.inband.fabrics.fabricsdata.IbdevNetdevMapping]`
+- **ofed_info**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.OfedInfo]`
+- **mst_status**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.MstStatus]`
+- **rdma_info**: `Optional[nodescraper.plugins.inband.fabrics.fabricsdata.RdmaInfo]`
 
 ## JournalData Model
 
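
Note on the FabricsDataModel hunk: the field list mixes three `List[...]` fields with three `Optional[...]` fields. The sketch below mirrors that shape with a standard-library dataclass so the split is easy to see. It is a stand-in only: the real model derives from `DataModel`, the nested types (`IbstatDevice`, `OfedInfo`, and so on) are replaced by `Any` because their fields are not shown here, and the default values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class FabricsDataSketch:
    """Illustrative stand-in for FabricsDataModel; not the real class."""

    # List-typed fields: one entry per discovered device or mapping.
    ibstat_devices: List[Any] = field(default_factory=list)
    ibv_devices: List[Any] = field(default_factory=list)
    ibdev_netdev_mappings: List[Any] = field(default_factory=list)
    # Optional fields: None when the matching tool produced no data
    # (assumed default; the real model may differ).
    ofed_info: Optional[Any] = None
    mst_status: Optional[Any] = None
    rdma_info: Optional[Any] = None


empty = FabricsDataSketch()
print(empty.ibstat_devices, empty.ofed_info)
```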
@@ -840,6 +923,17 @@ Complete network configuration data
 - **rules**: `List[nodescraper.plugins.inband.network.networkdata.RoutingRule]`
 - **neighbors**: `List[nodescraper.plugins.inband.network.networkdata.Neighbor]`
 - **ethtool_info**: `Dict[str, nodescraper.plugins.inband.network.networkdata.EthtoolInfo]`
+- **broadcom_nic_devices**: `List[nodescraper.plugins.inband.network.networkdata.BroadcomNicDevice]`
+- **broadcom_nic_qos**: `Dict[int, nodescraper.plugins.inband.network.networkdata.BroadcomNicQos]`
+- **pensando_nic_cards**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicCard]`
+- **pensando_nic_dcqcn**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicDcqcn]`
+- **pensando_nic_environment**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicEnvironment]`
+- **pensando_nic_pcie_ats**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicPcieAts]`
+- **pensando_nic_ports**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicPort]`
+- **pensando_nic_qos**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicQos]`
+- **pensando_nic_rdma_statistics**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicRdmaStatistics]`
+- **pensando_nic_version_host_software**: `Optional[nodescraper.plugins.inband.network.networkdata.PensandoNicVersionHostSoftware]`
+- **pensando_nic_version_firmware**: `List[nodescraper.plugins.inband.network.networkdata.PensandoNicVersionFirmware]`
 
 ## NvmeDataModel Model
 