[action] [PR:334] Ignore invalid output from docker stats command #341
Open
mssonicbld wants to merge 1 commit into sonic-net:202511 from
**Why I did this**
We found the following errors in syslog:
```
2025 Dec 10 22:42:06.221428 sonic ERR memory_threshold_check: Failed to parse memory usage for "{'CPU%': '--', 'MEM%': '--', 'MEM_BYTES': '0', 'MEM_LIMIT_BYTES': '0', 'NAME': '--', 'PIDS': '--'}": could not convert string to float: '--'
2025 Dec 10 22:42:06.221527 sonic ERR memory_threshold_check: Failure occurred could not convert string to float: '--'
```
The error is statistical. The flow is as follows:
1. procdockerstatsd periodically calls the "docker stats" command, parses the output, and stores it in STATE DB
2. memory_threshold_check, which is called by the monit service, reads the data in STATE DB and checks the memory usage
After reviewing the sonic code and the docker code, I found that sonic would never generate data with `NAME`=`--`, while docker might. Docker might generate such invalid output in a flow like this:
1. a user issues the "docker stats -a --no-stream --format json" command
2. the docker CLI sends a request to the docker engine for all existing containers: container a, b, c and so on
**3. another user removes container a**
4. the docker engine starts to handle the request and finds that container a is no longer there, so it returns empty stats data
5. the docker CLI finds that the stats data for container a is empty and fills the data with "--"
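To illustrate why such a placeholder record breaks the downstream check, here is a minimal sketch that feeds the record from the syslog message to a float conversion. `parse_mem_percent` is a hypothetical helper written for this example; it is not the actual memory_threshold_check code, which may parse the field differently.

```python
# Record copied from the syslog message in this PR: every field that the
# docker CLI could not fill carries the "--" placeholder.
placeholder = {'CPU%': '--', 'MEM%': '--', 'MEM_BYTES': '0',
               'MEM_LIMIT_BYTES': '0', 'NAME': '--', 'PIDS': '--'}


def parse_mem_percent(record):
    """Hypothetical helper: convert the MEM% field to a float."""
    return float(record['MEM%'].rstrip('%'))


try:
    parse_mem_percent(placeholder)
except ValueError as err:
    # float('--') raises ValueError, matching the message seen in syslog.
    error_message = str(err)
    print(error_message)
```

The printed message is the same "could not convert string to float" text that appears in the log lines above.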
For the relevant docker code, see:
https://github.com/docker/cli/blob/93fa57bbcd08f2f5be7f6cf22f4273a2b5a49e71/cli/command/container/formatter_stats.go#L25
https://github.com/docker/cli/blob/93fa57bbcd08f2f5be7f6cf22f4273a2b5a49e71/cli/command/container/formatter_stats.go#L169
This PR fixes the issue.
**How I did this**
procdockerstatsd should check the command output and ignore invalid values:
1. if the name is empty or "--", ignore the output
2. log a warning message to syslog
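The two steps above could be sketched roughly as follows. This is not the code from the PR: `filter_valid_stats`, `INVALID_NAMES`, and the logger name are hypothetical, the JSON field names in the sample are assumed from typical `docker stats --format json` output, and the standard `logging` module stands in for whatever syslog helper procdockerstatsd actually uses.

```python
import json
import logging

# Assumption: a plain Python logger stands in for the daemon's syslog logger.
logger = logging.getLogger("procdockerstatsd_sketch")

# Names that mark a record as invalid: empty, or the docker CLI placeholder.
INVALID_NAMES = ("", "--")


def filter_valid_stats(lines):
    """Parse JSON lines and skip entries whose Name is empty or "--"."""
    valid = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        name = entry.get("Name", "")
        if name in INVALID_NAMES:
            # Step 2: leave a trace in the log instead of failing later.
            logger.warning("Ignoring invalid docker stats entry: %s", line)
            continue
        valid.append(entry)
    return valid


# Sample input: one normal container and one placeholder record.
sample = [
    '{"Name": "swss", "CPUPerc": "1.20%", "MemPerc": "0.85%"}',
    '{"Name": "--", "CPUPerc": "--", "MemPerc": "--"}',
]
kept = filter_valid_stats(sample)
print([entry["Name"] for entry in kept])
```

Dropping the record at the producer side keeps the placeholder out of STATE DB, so consumers such as memory_threshold_check never see it.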
**How I test this**
Manual test
Author
Original PR: #334
Author
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Commenter does not have sufficient privileges for PR 341 in repo sonic-net/sonic-host-services
Contributor
/azp run
Azure Pipelines successfully started running 1 pipeline(s).