Energy calculation uses stale resource utilization due to refresh ordering #2446

@HananAwwad

Kepler Version

0.10.0 or later (Current/Supported)

Bug Description

Hi, while working on energy measurement and validating results, I noticed an inconsistency in how resource utilization is accounted for during power calculation.

From my measurements and testing, the current implementation computes energy based on stale resource utilization data.

In firstReading, resources are refreshed after the initial node read:

    func (pm *PowerMonitor) firstReading(newSnapshot *Snapshot) error {
        // First read for node
        if err := pm.firstNodeRead(newSnapshot.Node); err != nil {
            return fmt.Errorf(nodePowerError, err)
        }

        // Resources are refreshed only after the node read above
        if err := pm.resources.Refresh(); err != nil {
            pm.logger.Error("snapshot rebuild failed to refresh resources", "error", err)
            return err
        }
        // ... (rest of firstReading omitted)
    }

In calculatePower, node power is likewise computed before resources are refreshed:

    func (pm *PowerMonitor) calculatePower(prev, newSnapshot *Snapshot) error {
        // Calculate node power
        if err := pm.calculateNodePower(prev.Node, newSnapshot.Node); err != nil {
            return fmt.Errorf(nodePowerError, err)
        }

        // Resources are refreshed only after node power has been calculated
        if err := pm.resources.Refresh(); err != nil {
            pm.logger.Error("snapshot rebuild failed to refresh resources", "error", err)
            return err
        }
        // ... (rest of calculatePower omitted)
    }

Problem

Because pm.resources.Refresh() happens after calculateNodePower, the energy calculation is effectively based on the previous snapshot’s resource utilization, not the current one.

In practice, this leads to a measurable lag/inaccuracy in energy accounting, especially when resource usage changes between snapshots.
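The effect of the ordering can be sketched in isolation. The snippet below is not Kepler code: `fakeResources`, `activeEnergy`, `computeThenRefresh`, and `refreshThenCompute` are hypothetical names, and it assumes that active energy is the RAPL delta multiplied by the cached CPU usage ratio, which is what the measurements in this report suggest. The numbers are the ones from the logs section of this issue.

```go
package main

import "fmt"

// fakeResources is a hypothetical stand-in for pm.resources. Refresh copies
// the live CPU usage ratio into the cached value used for attribution.
type fakeResources struct {
	live   float64 // what the node is doing right now
	cached float64 // what the last Refresh observed
}

func (r *fakeResources) Refresh() { r.cached = r.live }

// activeEnergy attributes part of an energy delta using the cached ratio.
// Assumption: active energy = delta * usage ratio.
func activeEnergy(delta float64, r *fakeResources) float64 {
	return delta * r.cached
}

// computeThenRefresh mirrors the order described in this issue.
func computeThenRefresh(delta float64, r *fakeResources) float64 {
	e := activeEnergy(delta, r)
	r.Refresh()
	return e
}

// refreshThenCompute is one possible reordering.
func refreshThenCompute(delta float64, r *fakeResources) float64 {
	r.Refresh()
	return activeEnergy(delta, r)
}

func main() {
	const delta = 607.13
	stale := &fakeResources{live: 0.020210, cached: 0.007805}
	fresh := &fakeResources{live: 0.020210, cached: 0.007805}
	fmt.Printf("compute then refresh: %.2f\n", computeThenRefresh(delta, stale))
	fmt.Printf("refresh then compute: %.2f\n", refreshThenCompute(delta, fresh))
}
```

With these inputs the first order yields 4.74 and the second 12.27. Whether moving Refresh ahead of calculateNodePower is safe for the rest of the snapshot logic would still need to be checked by the maintainers.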

Steps to Reproduce

  1. Start the system with power monitoring enabled

  2. Establish a baseline

    • Let the system run under low or idle resource utilization and record a few consecutive energy/power measurements.

  3. Introduce a sudden change in resource utilization

    • For example, start a CPU-intensive workload (e.g., a stress test)

  4. Observe power/energy measurements across snapshots

    • Compare the timestamp at which resource utilization increases with the timestamp at which the energy measurement reflects the increase

  5. Identify the lag

    • You will notice that:

      • Resource utilization increases at snapshot N
      • Energy measurement reflects this increase only at snapshot N+1

Energy measurements lag behind actual resource utilization by one snapshot cycle.

Expected Behavior

Energy measurements should reflect resource utilization changes within the same snapshot cycle.

kepler_logs.log

Environment

  • OS: Ubuntu 22.04.5 LTS (Jammy Jellyfish)
  • Kubernetes Version: v1.33.5+k3s1 (k3s single-node setup)
  • Container Runtime: containerd 2.1.4 (k3s)
  • Hardware: Intel Xeon Silver 4309Y CPU (RAPL supported), 32 CPUs
  • Deployment Method: Kubernetes DaemonSet (Kepler, 1 node)

Logs and Error Messages

From the collected data, node-active-energy appears to be computed using the previous interval's node-CPUUsageRatio rather than the current one.

For example:

  • At 2026-03-23T18:33:30.960Z: node-CPUUsageRatio ≈ 0.007805

  • At 2026-03-23T18:33:35.978Z: node-CPUUsageRatio ≈ 0.020210 and node-rapl-delta-energy ≈ 607.13

If energy were computed using the current ratio, the expected value would be 607.13 × 0.020210 ≈ 12.27.

However, the observed node-active-energy is ≈ 4.74, which matches the previous ratio: 607.13 × 0.007805 ≈ 4.74.

This indicates that node-active-energy at 2026-03-23T18:33:35.978Z is derived using node-CPUUsageRatio from 2026-03-23T18:33:30.960Z, i.e., from the previous interval.

Additionally, a request was issued at 2026-03-23 18:33:32.408387 UTC, which falls between these two samples. The resulting CPU usage increase therefore first appears in the later snapshot, but that snapshot's energy attribution still uses the previous (lower) CPU usage ratio, leading to stale accounting.

Labels

kind/bugreport bug issue
