Add monitoring for config digest mismatch #2068

Merged: emate merged 7 commits into main from emate/ccip-9196-monitor-config-digest-missmatch on May 18, 2026

@emate (Contributor) commented May 15, 2026

Add per-round config digest mismatch beholder metric for exec and commit plugins

Start emitting the ccip_{exec|commit}_config_digest_mismatch gauge (1 = mismatch, 0 = match) every round by comparing the home chain config digest against the offramp's on-chain config digest. Both reads are cached, so this adds no RPC load.

Changes:

  • Extract shared ConfigDigestsMatch helper
  • Add TrackConfigDigestMismatch to exec and commit metrics reporters
  • Refactor existing checkConfigDigest (exec) and validateReport (commit) to use the shared helper
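
For readers without the diff open, the comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `Digest` type, the function signatures, and `MismatchGaugeValue` are assumptions made for the example; only the names ConfigDigestsMatch and FormatConfigDigest and the 1 = mismatch, 0 = match convention come from the PR text.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Digest stands in for a 32-byte config digest (assumed type for this sketch).
type Digest [32]byte

// ConfigDigestsMatch reports whether the home chain digest equals the digest
// read from the offramp. The signature is hypothetical.
func ConfigDigestsMatch(homeChain, offRamp Digest) bool {
	// Go arrays are comparable, so == compares all 32 bytes.
	return homeChain == offRamp
}

// FormatConfigDigest renders a digest as 0x-prefixed hex, e.g. for log fields.
func FormatConfigDigest(d Digest) string {
	return "0x" + hex.EncodeToString(d[:])
}

// MismatchGaugeValue maps the comparison onto the gauge convention from the
// PR description: 1 = mismatch, 0 = match. Hypothetical helper.
func MismatchGaugeValue(homeChain, offRamp Digest) int64 {
	if ConfigDigestsMatch(homeChain, offRamp) {
		return 0
	}
	return 1
}

func main() {
	home := Digest{0xaa}
	stale := Digest{0xbb}

	fmt.Println(MismatchGaugeValue(home, home))  // prints 0
	fmt.Println(MismatchGaugeValue(home, stale)) // prints 1
	fmt.Println(FormatConfigDigest(stale))
}
```

Keeping the comparison in one helper means the per-round metric and the existing validation paths cannot drift apart in how they define a mismatch.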

Copilot AI review requested due to automatic review settings May 15, 2026 09:27
@emate emate requested review from a team as code owners May 15, 2026 09:27
@github-actions

👋 emate, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

Copilot AI (Contributor) left a comment

Pull request overview

Adds per-round monitoring of config-digest mismatches between the home chain and the offramp for both the exec and commit plugins. A shared helper is introduced and used to refactor existing digest checks, and a new TrackConfigDigestMismatch metric (Prometheus + Beholder gauge) is emitted from each plugin's Observation path.

Changes:

  • New plugincommon.ConfigDigestsMatch / FormatConfigDigest helpers, reused by execute/observation.go (checkConfigDigest) and commit/report.go (validateReport).
  • New TrackConfigDigestMismatch(bool) reporter method with Prom + Beholder gauges (ccip_exec_config_digest_mismatch, ccip_commit_config_digest_mismatch); called every round in Plugin.Observation.
  • Tests updated to mock GetOffRampConfigDigest and to wire a Noop observer/reporter.
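
The reporter shape summarised above might look something like the sketch below. It is illustrative only: the reduced `Reporter` interface, `GaugeReporter`, and `observeRound` are hypothetical stand-ins, and an in-memory atomic value replaces the real Prometheus and Beholder gauges so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Reporter is a reduced, hypothetical version of a plugin metrics interface.
type Reporter interface {
	TrackConfigDigestMismatch(mismatch bool)
}

// Noop satisfies Reporter without recording anything, which is how tests can
// wire a reporter without a metrics backend.
type Noop struct{}

func (Noop) TrackConfigDigestMismatch(bool) {}

// GaugeReporter keeps the latest gauge value in memory. It stands in for the
// real gauges, which are not reproduced here.
type GaugeReporter struct {
	value atomic.Int64
}

// TrackConfigDigestMismatch sets the gauge to 1 on mismatch and 0 on match.
func (g *GaugeReporter) TrackConfigDigestMismatch(mismatch bool) {
	if mismatch {
		g.value.Store(1)
		return
	}
	g.value.Store(0)
}

// Value returns the last recorded gauge value.
func (g *GaugeReporter) Value() int64 { return g.value.Load() }

// observeRound mimics one round: compare the two digests and report the result.
func observeRound(r Reporter, homeChain, offRamp [32]byte) {
	r.TrackConfigDigestMismatch(homeChain != offRamp)
}

func main() {
	g := &GaugeReporter{}

	observeRound(g, [32]byte{1}, [32]byte{2})
	fmt.Println(g.Value()) // prints 1

	observeRound(g, [32]byte{1}, [32]byte{1})
	fmt.Println(g.Value()) // prints 0

	// The Noop reporter accepts the same call and discards it.
	observeRound(Noop{}, [32]byte{1}, [32]byte{2})
}
```

Because the gauge is written every round, a mismatch that later resolves resets the value to 0 without any separate cleanup.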

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| internal/plugincommon/config_digest.go | New shared digest comparison and hex-format helpers. |
| execute/observation.go | Emits mismatch metric per round; refactors checkConfigDigest to use the helper. |
| execute/metrics/reporter.go | Adds TrackConfigDigestMismatch to the Reporter interface and Noop. |
| execute/metrics/prom.go | Registers Prom + Beholder gauges and implements the tracker. |
| execute/observation_test.go, execute/plugin_test.go | Wire observer and mock GetOffRampConfigDigest. |
| commit/plugin.go | Emits mismatch metric per round in Observation. |
| commit/report.go | Refactors digest check in validateReport to use the helper. |
| commit/metrics/reporter.go | Adds TrackConfigDigestMismatch to interfaces and Noop. |
| commit/metrics/prom.go | Registers Prom + Beholder gauges and implements the tracker. |
| commit/plugin_test.go, commit/plugin_roledon_e2e_test.go | Mock GetOffRampConfigDigest for new path. |

Comments suppressed due to low confidence (1)

commit/metrics/prom.go:175

  • Two struct fields were collapsed onto a single line, breaking the formatting and grouping with the unrelated processorLatencyHistogram block. Place configDigestMismatch: promCommitConfigDigestMismatch, on its own line (preferably alongside the other Prom gauge fields, e.g., right after looppProviderSupported) and restore the blank line separator between the Prom and beholder field groups.
		looppProviderSupported: promLooppCCIPProviderSupported, configDigestMismatch: promCommitConfigDigestMismatch,

Comment thread commit/report.go Outdated
Comment thread execute/metrics/prom.go Outdated
Comment thread commit/metrics/prom.go Outdated
emate and others added 2 commits May 15, 2026 11:31
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread commit/metrics/prom.go Outdated
commitLatestRound: promCommitLatestRoundID,
looppProviderSupported: promLooppCCIPProviderSupported,

looppProviderSupported: promLooppCCIPProviderSupported, configDigestMismatch: promCommitConfigDigestMismatch,
Collaborator

nit: I think adding configDigestMismatch on its own line (nearer to the bottom) would be more in line with expected formatting.

Contributor Author

fixed

Comment thread commit/metrics/prom.go
Comment on lines +361 to +364
    p.bhConfigDigestMismatch.Record(context.Background(), int64(value), metric.WithAttributes(
        attribute.String("chainFamily", p.chainFamily),
        attribute.String("chainID", p.chainID),
    ))
Collaborator

question: I see the use of context.Background() a lot in these Track* methods. I'm wondering what these methods actually do: is it blocking network I/O or is it some kind of IPC (to a statsd daemon or similar)?

Contributor Author

The beholder Record/Add calls are non-blocking ops - they buffer metrics for async export
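
To make the "cheap call, export happens later" idea concrete, here is a generic sketch of the pattern. It is not Beholder's or OpenTelemetry's actual implementation: `GaugeAggregator`, `Record`, and `Collect` are hypothetical names, and the assumption is simply that recording updates an in-memory value while a separate exporter reads snapshots on its own schedule.

```go
package main

import (
	"fmt"
	"sync"
)

// GaugeAggregator keeps only the latest value per series in memory.
// Hypothetical type used to illustrate the pattern.
type GaugeAggregator struct {
	mu     sync.Mutex
	latest map[string]int64
}

// NewGaugeAggregator returns an empty aggregator.
func NewGaugeAggregator() *GaugeAggregator {
	return &GaugeAggregator{latest: make(map[string]int64)}
}

// Record overwrites the latest value for a series. It performs no I/O, so the
// caller only pays for a short mutex-protected map write.
func (g *GaugeAggregator) Record(series string, value int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.latest[series] = value
}

// Collect returns a copy of the current values. An exporter would call this
// periodically, off the plugin's hot path, and ship the snapshot.
func (g *GaugeAggregator) Collect() map[string]int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	snapshot := make(map[string]int64, len(g.latest))
	for k, v := range g.latest {
		snapshot[k] = v
	}
	return snapshot
}

func main() {
	agg := NewGaugeAggregator()

	// Two rounds record into the same series; only the latest value survives.
	agg.Record("ccip_commit_config_digest_mismatch", 1)
	agg.Record("ccip_commit_config_digest_mismatch", 0)

	fmt.Println(agg.Collect()["ccip_commit_config_digest_mismatch"]) // prints 0
}
```

Under this pattern the context passed to a record call does not bound any network operation, which is consistent with the answer above about context.Background().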

Comment thread commit/plugin.go Outdated
        ctx, p.ccipReader, consts.PluginTypeCommit, p.reportingCfg.ConfigDigest,
    )
    if err != nil {
        lggr.Errorw("failed to check config digest", "err", err)
Collaborator

nit: more descriptive error message.

Suggested change
- lggr.Errorw("failed to check config digest", "err", err)
+ lggr.Errorw("failed to check for config digest mismatch", "err", err)

Contributor Author

fixed, added additional labels

			"homeChainConfigDigest", p.reportingCfg.ConfigDigest,
			"pluginType", consts.PluginTypeExecute,
		)

Comment thread commit/metrics/prom.go Outdated
    promCommitConfigDigestMismatch = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Name: "ccip_commit_config_digest_mismatch",
        Help: "Reports whether the home chain config digest differs from the offramp config digest (1 = mismatch, 0 = match)",
    }, []string{"chainFamily", "chainID"})
Collaborator

question: this metric has "chainFamily" in the string slice, whereas the previous metric has "chain_family" (promLooppCCIPProviderSupported) and one pretty far up this list also has "chainFamily". Should we settle on one instead and preferably use a const to refer to this label?

Contributor Author

fair, adjusted the label names to what we currently have

Comment thread commit/metrics/prom.go
Help: "Tracks whether LOOPP CCIP provider is supported for each chain family (1 = supported, 0 = not supported)",
}, []string{"chain_family"})
promCommitConfigDigestMismatch = promauto.NewGaugeVec(prometheus.GaugeOpts{
Name: "ccip_commit_config_digest_mismatch",
Collaborator

question: should these metric names be const? They are referred to in multiple places by the string literal, which as we see in this file can quickly go out of sync.

Contributor Author

yeah, that could be the way to go. I'm just wondering whether we should start with one metric as a const here and leave the rest for another PR. WDYT @makramkd?

Collaborator

I think it's fine if we do the cleanup in a follow-up.
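
One possible shape for that follow-up is sketched below. It is a suggestion only: the constant names, `ConfigDigestMismatchLabels`, and `MetricName` are hypothetical, and the label values are copied from the diff context quoted in this thread, which may differ from the spelling that was finally merged.

```go
package main

import (
	"fmt"
)

// Label and metric names defined once so that every registration and lookup
// refers to the same identifier. Label values are an assumption taken from the
// snippet quoted in the review, not necessarily the merged spelling.
const (
	labelChainFamily = "chainFamily"
	labelChainID     = "chainID"

	metricCommitConfigDigestMismatch = "ccip_commit_config_digest_mismatch"
	metricExecConfigDigestMismatch   = "ccip_exec_config_digest_mismatch"
)

// ConfigDigestMismatchLabels returns the label set shared by both gauges.
func ConfigDigestMismatchLabels() []string {
	return []string{labelChainFamily, labelChainID}
}

// MetricName maps a plugin type to its gauge name. The "commit" and "exec"
// keys are hypothetical identifiers chosen for this sketch.
func MetricName(pluginType string) (string, error) {
	switch pluginType {
	case "commit":
		return metricCommitConfigDigestMismatch, nil
	case "exec":
		return metricExecConfigDigestMismatch, nil
	default:
		return "", fmt.Errorf("unknown plugin type %q", pluginType)
	}
}

func main() {
	name, err := MetricName("commit")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, ConfigDigestMismatchLabels())
}
```

With the names centralised, a mismatch such as "chainFamily" versus "chain_family" would have to be introduced in one place rather than slipping in at a single call site.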

Comment thread execute/metrics/prom.go Outdated
    promExecConfigDigestMismatch = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Name: "ccip_exec_config_digest_mismatch",
        Help: "Reports whether the home chain config digest differs from the offramp config digest (1 = mismatch, 0 = match)",
    }, []string{"chainFamily", "chainID"})
Collaborator

question: same question here with "chainFamily" vs. "chain_family".

Contributor Author

fixed

@emate emate enabled auto-merge May 15, 2026 15:56
@github-actions

| Metric | emate/ccip-9196-monitor-config-digest-missmatch | main |
| --- | --- | --- |
| Coverage | 70.0% | 69.9% |

@emate emate added this pull request to the merge queue May 18, 2026
Merged via the queue into main with commit cae62cf May 18, 2026
58 checks passed
@emate emate deleted the emate/ccip-9196-monitor-config-digest-missmatch branch May 18, 2026 09:28