Commit abb627d

Ignore invalid output from docker stats command (#334)
Why I did this

We found the following errors in syslog:

    2025 Dec 10 22:42:06.221428 sonic ERR memory_threshold_check: Failed to parse memory usage for "{'CPU%': '--', 'MEM%': '--', 'MEM_BYTES': '0', 'MEM_LIMIT_BYTES': '0', 'NAME': '--', 'PIDS': '--'}": could not convert string to float: '--'
    2025 Dec 10 22:42:06.221527 sonic ERR memory_threshold_check: Failure occurred could not convert string to float: '--'

The error occurs intermittently. The flow is as follows:

1. procdockerstatsd periodically calls the "docker stats" command, parses the output, and stores it in STATE DB.
2. memory_threshold_check, which is called by the monit service, reads the data in STATE DB and checks the memory usage.

After reviewing the sonic code and the docker code, I found that sonic never generates data with NAME=--, while docker might. Docker can generate such invalid output in a flow like this:

1. A user issues the "docker stats -a --no-stream --format json" command.
2. The docker CLI sends a command to the docker engine for all existing containers: container a, b, c, and so on.
3. Another user removes container a.
4. The docker engine starts to handle the command and finds that container a is no longer there, so it returns empty stats data.
5. The docker CLI finds that the stats data for container a is empty and fills the data with "--".

For the detailed docker code, see:

https://github.com/docker/cli/blob/93fa57bbcd08f2f5be7f6cf22f4273a2b5a49e71/cli/command/container/formatter_stats.go#L25
https://github.com/docker/cli/blob/93fa57bbcd08f2f5be7f6cf22f4273a2b5a49e71/cli/command/container/formatter_stats.go#L169

This PR fixes the issue.

How I did this

procdockerstatsd should check the command output and ignore invalid values:

- if the name is empty or "--", ignore the output
- log a warning message to syslog

How I test this

Manual test
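
The skip rule described above can be sketched in isolation, using only the standard library. This is a minimal illustration, not the code from this commit: `StatsRecord`, `is_valid_record`, `filter_records`, and `parse_percent` are hypothetical names, and the records are assumed to be already parsed rather than deserialized from the `docker stats` JSON output.

```rust
/// Placeholder the docker CLI prints when a container has no stats data.
const INVALID_CONTAINER_NAME: &str = "--";

/// Hypothetical, simplified stand-in for one parsed `docker stats` record.
#[derive(Debug, Clone, PartialEq)]
struct StatsRecord {
    id: String,
    name: String,
    mem_percent: String,
}

/// Returns true when the record describes a real container,
/// i.e. its name is neither empty nor the "--" placeholder.
fn is_valid_record(record: &StatsRecord) -> bool {
    !record.name.is_empty() && record.name != INVALID_CONTAINER_NAME
}

/// Keeps the valid records and reports how many were skipped.
fn filter_records(records: Vec<StatsRecord>) -> (Vec<StatsRecord>, usize) {
    let total = records.len();
    let kept: Vec<StatsRecord> = records.into_iter().filter(is_valid_record).collect();
    let skipped = total - kept.len();
    (kept, skipped)
}

/// Parses a percentage such as "1.25%". Returns None for the "--"
/// placeholder, which is the value a downstream float conversion
/// would otherwise fail on.
fn parse_percent(value: &str) -> Option<f64> {
    value.trim_end_matches('%').parse::<f64>().ok()
}
```

Calling `filter_records` on a batch that contains one real container and one placeholder record keeps only the real one, so a consumer such as memory_threshold_check never sees a `--` value. Filtering at the producer, rather than tolerating `--` in every consumer, keeps the invalid data out of STATE DB altogether.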
1 parent fda4e65 commit abb627d

2 files changed: +11 -3 lines changed

azure-pipelines.yml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ stages:
 vmImage: ubuntu-24.04

 container:
-image: sonicdev-microsoft.azurecr.io:443/sonic-slave-bookworm:$(BUILD_BRANCH)
+image: sonicdev-microsoft.azurecr.io:443/sonic-slave-bookworm-amd64:$(BUILD_BRANCH)

 steps:
 - checkout: self

crates/procdockerstatsd-rs/src/main.rs

Lines changed: 10 additions & 2 deletions
@@ -9,12 +9,13 @@ use std::fs;
 use std::collections::HashMap;
 use std::sync::LazyLock;
 use procfs;
-use tracing::{error, info};
+use tracing::{error, info, warn};
 use syslog_tracing;
 use std::ffi::CString;
 use serde::Deserialize;

 const UPDATE_INTERVAL: u64 = 120; // 2 minutes
+const INVALID_CONTAINER_NAME: &str = "--"; // invalid container name returned by docker stats command

 #[derive(Debug, Deserialize)]
 #[serde(rename_all = "PascalCase")]
@@ -98,11 +99,18 @@ fn parse_docker_json_output(json_output: &str) -> HashMap<String, HashMap<String
         let stats: DockerStats = match serde_json::from_str(line) {
             Ok(s) => s,
             Err(e) => {
-                error!("Failed to parse docker stats JSON: {}", e);
+                error!("Failed to parse docker stats JSON for output {} with error {}", line, e);
                 continue;
             }
         };

+        if stats.name.is_empty() || stats.name == INVALID_CONTAINER_NAME {
+            // If a container stops suddenly after we send the docker stats command,
+            // it might return with a container name "--". We should ignore such output.
+            warn!("Skipping docker stats JSON for container {} with output: {}", stats.id, line);
+            continue;
+        }
+
         let key = format!("DOCKER_STATS|{}", stats.id);
         let mut container_data = HashMap::new();
