Skip to content

[exporter/kafka] Add health reporting to kafka exporter#47293

Open
khushijain21 wants to merge 4 commits intoopen-telemetry:mainfrom
khushijain21:kafka-component-status
Open

[exporter/kafka] Add health reporting to kafka exporter#47293
khushijain21 wants to merge 4 commits intoopen-telemetry:mainfrom
khushijain21:kafka-component-status

Conversation

@khushijain21
Copy link
Copy Markdown
Contributor

@khushijain21 khushijain21 commented Apr 1, 2026

Description

This PR adds health status reporting to kafka exporter. It reports statusOK if connection is successfull and events are published.

It reports degraded status in case topic authorization fails or unuspported version etc. This list can be extended. The idea is that errors that cannot be recovered unless config changes are made marks the exporter unhealthy.

Testing

Documentation

@khushijain21 khushijain21 changed the title kafka component status [exporter/kafka] Add health reporting to kafka exporter Apr 1, 2026
@khushijain21 khushijain21 marked this pull request as ready for review April 1, 2026 10:17
@khushijain21 khushijain21 requested review from a team, MovieStoreGuy and axw as code owners April 1, 2026 10:17
Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>
// check if its defined as a non-retriable error by franzgo
kgoErr := &kerr.Error{}
if errors.As(r.Err, &kgoErr) && !kgoErr.Retriable {
if isNonRecoverableKafkaError(r.Err) || errors.Is(r.Err, kerr.MessageTooLarge) {
Copy link
Copy Markdown
Contributor

@VihasMakwana VihasMakwana Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can move errors.Is(r.Err, kerr.MessageTooLarge) into isNonRecoverableKafkaError?

Suggested change
if isNonRecoverableKafkaError(r.Err) || errors.Is(r.Err, kerr.MessageTooLarge) {
if isNonRecoverableKafkaError(r.Err) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about this, but not every batch size will be > max_message_bytes. Hence reporting a degraded status does not seem right

return errors.Is(err, kerr.SaslAuthenticationFailed) ||
errors.Is(err, kerr.ClusterAuthorizationFailed) ||
errors.Is(err, kerr.UnsupportedVersion) ||
errors.Is(err, kerr.TopicAuthorizationFailed)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
errors.Is(err, kerr.TopicAuthorizationFailed)
errors.Is(err, kerr.TopicAuthorizationFailed) ||
errors.Is(r.Err, kerr.MessageTooLarge)

Copy link
Copy Markdown
Contributor

@VihasMakwana VihasMakwana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits

@paulojmdias
Copy link
Copy Markdown
Member

paulojmdias commented Apr 1, 2026

@khushijain21 This can lead to data loss, right? Even some of those errors can be identified as "non-retriable", they can be fixed on the Kafka side (like ACLs, for example), and the collector on the next retry will successfully push the data without any restart needed.

At the same time, if we move forward with this, I'm wondering if we shouldn't make this option behind a config option to let the users choose the approach they want.

I would like to have other code owners opinions here

@khushijain21
Copy link
Copy Markdown
Contributor Author

khushijain21 commented Apr 1, 2026

This can lead to data loss, right? Even some of those errors can be identified as "non-retriable", they can be fixed on the Kafka side (like ACLs, for example), and the collector on the next retry will successfully push the data without any restart needed.

reporting a recoverable error status does not, by itself, turn off the exporter or stop it from seeing new telemetry. If collector is able to push data on next retry - the status will switch to OK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants