
Prometheus is fetching more and more IPMI sensor data from the same ipmi-exporter. #296

@noonme


I'm running the ipmi-exporter image v1.10.0 and keep running into the following problem. After ipmi-exporter starts, the host running its container often ends up with thousands of ipmi-sensors processes, sometimes even more. These processes are never cleaned up automatically, so the host's CPU stays under sustained high load, which is catastrophic for the other containers on the same server.


The screenshot is Prometheus's record of the server load, using the PromQL expression `node:load5:ratio * 100 > 300`.
I haven't found a fix yet; my only workaround is to shut down ipmi-exporter temporarily. Note that the problem occurs sporadically, with no discernible pattern, and we have no established way of handling it.
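To capture what the process pile-up looks like the next time it happens, a minimal diagnostic sketch like the one below could be attached to a follow-up. It is not part of ipmi_exporter and assumes a Linux host with procfs: it reads `/proc/<pid>/status`, keeps processes whose name is `ipmi-sensors`, and groups them by process state and parent PID. The function name `summarize` and its `proc_root` parameter are hypothetical, introduced only for this example.

```python
import os
from collections import Counter


def summarize(name, proc_root="/proc"):
    """Count processes called `name`, grouped by (state, parent PID).

    Hypothetical helper: reads /proc/<pid>/status on Linux. `proc_root` is a
    parameter so the function can be pointed at a copy of /proc for testing.
    """
    summary = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                # Each line looks like "Key:\tvalue"; keep key -> raw value.
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # process exited between listdir() and open()
        if fields.get("Name", "").strip() != name:
            continue
        state = fields.get("State", "?").split()[0]  # "S", "R", "D", "Z", ...
        ppid = int(fields.get("PPid", "0"))  # int() tolerates surrounding whitespace
        summary[(state, ppid)] += 1
    return summary


if __name__ == "__main__" and os.path.isdir("/proc"):
    for (state, ppid), n in summarize("ipmi-sensors").most_common():
        print(f"{n:6d} ipmi-sensors processes in state {state}, parent PID {ppid}")
```

Run on the affected host while the load is high: a large count in state `Z` would indicate processes that have exited but were never reaped by their parent, while a large count in `S` or `R` would indicate commands that are still running.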

Here is my ipmi-exporter configuration file:
modules:
  advanced:
    collector_cmd:
      ipmi: sudo
      sel: sudo
    collectors:
    - ipmi
    - sel
    custom_args:
      ipmi:
      - ipmimonitoring
      sel:
      - ipmi-sel
    driver: LAN
    pass: secret_pw
    privilege: admin
    user: some_user
  dcmi:
    collectors:
    - dcmi
    driver: LAN_2_0
    pass: another_pw
    privilege: admin
    user: admin_user
  default:
    collectors:
    - bmc
    - ipmi
    - chassis
    driver: LAN_2_0
    exclude_sensor_ids:
    - 2
    - 29
    - 32
    - 50
    - 52
    - 55
    pass: xxxx
    privilege: user
    timeout: 10000
    user: xxx
  thatspecialhost:
    collectors:
    - ipmi
    - sel
    custom_args:
      ipmi:
      - --bridge-sensors
    driver: LAN
    pass: secret_pw
    privilege: admin
    user: some_user
    workaround_flags:
    - discretereading
