
Ignore invalid output from docker stats command #334

Merged
qiluo-msft merged 2 commits into sonic-net:master from Junchao-Mellanox:master-fix-docker-stats
Dec 23, 2025

@Junchao-Mellanox
Contributor

Why I did this

We found the following errors in syslog:

2025 Dec 10 22:42:06.221428 sonic ERR memory_threshold_check: Failed to parse memory usage for "{'CPU%': '--', 'MEM%': '--', 'MEM_BYTES': '0', 'MEM_LIMIT_BYTES': '0', 'NAME': '--', 'PIDS': '--'}": could not convert string to float: '--'
2025 Dec 10 22:42:06.221527 sonic ERR memory_threshold_check: Failure occurred could not convert string to float: '--'

The error occurs intermittently. The flow is as follows:

  1. procdockerstatsd periodically runs the "docker stats" command, parses the output, and stores it in STATE DB
  2. memory_threshold_check, which is called by the monit service, reads the data from STATE DB and checks the memory usage
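
The failure in the second step can be reproduced in isolation. The following is a minimal sketch, not the actual memory_threshold_check code: it takes the entry from the syslog message above and attempts a float conversion of the MEM% field. The helper name `parse_mem_percent` is hypothetical.

```python
def parse_mem_percent(entry):
    """Convert the MEM% field of a stats entry to a float (hypothetical helper)."""
    return float(entry["MEM%"])


# Entry copied from the syslog message above.
placeholder_entry = {
    "CPU%": "--",
    "MEM%": "--",
    "MEM_BYTES": "0",
    "MEM_LIMIT_BYTES": "0",
    "NAME": "--",
    "PIDS": "--",
}

try:
    parse_mem_percent(placeholder_entry)
    error_message = None
except ValueError as err:
    # float("--") raises ValueError, which matches the text seen in syslog.
    error_message = str(err)

print(error_message)  # could not convert string to float: '--'
```

Any consumer that converts these fields without validation would hit the same exception, so filtering the entry before it reaches STATE DB avoids the problem at its source.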

After reviewing the SONiC code and the docker code, I found that SONiC never generates an entry with NAME=--, but docker can. Docker may produce such invalid output in a flow like this:

  1. A user issues the "docker stats -a --no-stream --format json" command
  2. The docker CLI sends a request to the docker engine for all existing containers: container a, b, c and so on
  3. Another user removes container a
  4. The docker engine starts to handle the request, finds that container a is no longer there, and returns empty stats data
  5. The docker CLI finds that the stats data for container a is empty and fills the fields with "--"
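
The race described above can be illustrated with a toy model. This is only a sketch in Python, not docker code: the `engine` dictionary, `list_containers`, and `format_stats` are hypothetical stand-ins for the docker engine and the CLI formatter, and the field values are made up for illustration.

```python
PLACEHOLDER = "--"


def list_containers(engine):
    """Snapshot the names of the containers that exist right now (toy CLI step)."""
    return list(engine)


def format_stats(name, stats):
    """Fill every field with a placeholder when the engine returned no stats."""
    if stats is None:
        return {"NAME": PLACEHOLDER, "CPU%": PLACEHOLDER, "MEM%": PLACEHOLDER}
    return {"NAME": name, "CPU%": stats["cpu"], "MEM%": stats["mem"]}


# Toy engine state: two running containers with made-up stats.
engine = {
    "a": {"cpu": "0.10%", "mem": "1.20%"},
    "b": {"cpu": "0.30%", "mem": "2.50%"},
}

snapshot = list_containers(engine)  # the CLI decides to query a and b
del engine["a"]                     # another user removes container a
# The engine no longer knows container a, so its stats come back empty.
rows = [format_stats(name, engine.get(name)) for name in snapshot]

for row in rows:
    print(row)
```

The row for the removed container carries "--" in every field, including NAME, which is the shape of the entry seen in the syslog message.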

For the relevant docker code, see:
https://github.com/docker/cli/blob/93fa57bbcd08f2f5be7f6cf22f4273a2b5a49e71/cli/command/container/formatter_stats.go#L25
https://github.com/docker/cli/blob/93fa57bbcd08f2f5be7f6cf22f4273a2b5a49e71/cli/command/container/formatter_stats.go#L169

This PR fixes the issue.

How I did this

procdockerstatsd should check the command output and ignore invalid entries:

  1. If the name is empty or "--", ignore the entry
  2. Log a warning message to syslog
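
The check described above could look roughly like the sketch below. It is not the code merged in this PR: the function name `filter_valid_stats` and the sample entries are hypothetical, and Python's standard `logging` module stands in for the syslog logger that procdockerstatsd actually uses.

```python
import logging

logger = logging.getLogger("procdockerstatsd_sketch")

# Names that indicate a placeholder row rather than a real container.
INVALID_NAMES = ("", "--")


def filter_valid_stats(entries):
    """Return only entries with a usable NAME, warning about the rest."""
    valid = []
    for entry in entries:
        name = (entry.get("NAME") or "").strip()
        if name in INVALID_NAMES:
            # Skip the entry so it never reaches STATE DB, but leave a trace.
            logger.warning("Ignoring invalid docker stats entry: %s", entry)
            continue
        valid.append(entry)
    return valid


# Made-up sample: one real container and one placeholder row.
sample = [
    {"NAME": "swss", "CPU%": "1.50%", "MEM%": "2.10%"},
    {"NAME": "--", "CPU%": "--", "MEM%": "--"},
]

kept = filter_valid_stats(sample)
print([entry["NAME"] for entry in kept])  # ['swss']
```

Filtering on NAME alone keeps the check cheap and matches the observation that SONiC itself never produces an entry named "--".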

How I test this

Manual test

@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Junchao-Mellanox <junchao@nvidia.com>
@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Junchao-Mellanox <junchao@nvidia.com>
@mssonicbld

/azp run

Contributor

@qiluo-msft qiluo-msft left a comment

thanks for this bugfix!

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft merged commit abb627d into sonic-net:master Dec 23, 2025
6 checks passed
@Junchao-Mellanox Junchao-Mellanox deleted the master-fix-docker-stats branch December 23, 2025 06:07
@mssonicbld

Cherry-pick PR to 202511: #341
