Harden pipeline interpreter state reload against transient MongoDB errors by patrickmann · Pull Request #25894 · Graylog2/graylog2-server

patrickmann · 2026-05-05T14:27:13Z

Description

PipelineInterpreterStateUpdater.reloadAndSave() could silently replace a valid pipeline state with an empty one when MongoDB hit a transient error during an event-triggered reload. Messages processed in that window bypassed all pipeline rules and landed in the default stream.

This PR implements three of the four fixes from #25750:

Let MongoException propagate from MongoDbRuleService and MongoDbPipelineService loadAll() and friends, instead of swallowing it and returning an empty set. Transient MongoDB failures now fail loudly, so callers can react. Other callers (REST resources, content packs, migrations) get a 500 on transient failure, which is correct.
Migrate state reload to a new PipelineInterpreterStateReloadJob (a SystemJob) submitted via SystemJobManager. On failure, the job retries after a 1-second delay. The PipelineInterpreterStateUpdater constructor now performs the initial state load synchronously before registering on the event bus, closing the startup race window described in #25745 ("Pipeline rules not applied during multi-node restart due to async state reload race"). The pattern follows the existing PipelineMetadataUpdateJob.
PipelineInterpreterStateUpdater.updateState() refuses to replace a non-empty state with an empty one and logs at WARN. Defense in depth.
Null safety in PipelineInterpreter.process(): if getLatestState() returns null, messages pass through unchanged with a warning log instead of throwing a NullPointerException. The companion change for IlluminateMessageProcessor.process() is in Graylog2/graylog-plugin-enterprise#14157.
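
To illustrate the last two items, here is a minimal sketch of an empty-state guard combined with a null-safe process step. It is not the code from this PR: StateGuardSketch, the Map-based state, and the message formatting are hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the interpreter state holder; not Graylog's actual class. */
final class StateGuardSketch {
    // Hypothetical state shape: pipeline id -> rule source. Null until the first load.
    private final AtomicReference<Map<String, String>> latest = new AtomicReference<>();

    /** Installs the candidate state; returns false if the candidate was rejected. */
    public boolean updateState(Map<String, String> candidate) {
        Map<String, String> current = latest.get();
        // Defense in depth: never swap a populated state for an empty one.
        if (current != null && !current.isEmpty() && candidate.isEmpty()) {
            System.err.println("WARN refusing to replace non-empty pipeline state with an empty one");
            return false;
        }
        latest.set(candidate);
        return true;
    }

    /** Null-safe processing: with no state loaded yet, the message passes through unchanged. */
    public String process(String message) {
        Map<String, String> state = latest.get();
        if (state == null) {
            System.err.println("WARN no pipeline state available, passing message through");
            return message;
        }
        return state.isEmpty() ? message : message + " [" + state.size() + " pipeline(s) applied]";
    }

    public static void main(String[] args) {
        StateGuardSketch sketch = new StateGuardSketch();
        System.out.println(sketch.process("msg"));                    // no state yet: unchanged
        sketch.updateState(Map.of("pipeline-1", "rule \"example\"")); // initial load
        System.out.println(sketch.updateState(Map.of()));             // empty reload is rejected
        System.out.println(sketch.process("msg"));                    // still uses the old state
    }
}
```

In this sketch an empty state is accepted only while nothing non-empty has been loaded, so an initial load of an empty configuration still works, but a later reload that comes back empty is rejected.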

Note on retry policy: SystemJobResult.withRetry requires maxRetries == Integer.MAX_VALUE until per-trigger retry tracking lands in the system scheduler.
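
The retry behaviour can be sketched with a plain loop. This is an illustration only and does not use Graylog's SystemJob or SystemJobResult API: ReloadRetrySketch, TransientStoreException (standing in for MongoException), and the Supplier-based loader are all hypothetical. The loader lets the failure propagate instead of returning an empty set, and the caller retries without an upper bound after a fixed delay, which is what maxRetries == Integer.MAX_VALUE amounts to in practice.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical retry loop; a stand-in for the scheduler-driven retries, not Graylog code. */
final class ReloadRetrySketch {
    /** Hypothetical transient failure, standing in for a MongoException. */
    static final class TransientStoreException extends RuntimeException {
        TransientStoreException(String message) {
            super(message);
        }
    }

    /** Calls the loader until it succeeds, sleeping for the given delay after each failure. */
    public static Set<String> reloadWithRetry(Supplier<Set<String>> loader, Duration delay) {
        while (true) {
            try {
                return loader.get();
            } catch (TransientStoreException e) {
                System.err.println("Failed to reload pipeline interpreter state, retrying: " + e.getMessage());
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException interrupted) {
                    // Restore the interrupt flag and give up instead of retrying forever.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", interrupted);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Set<String> rules = reloadWithRetry(() -> {
            // Fail twice to simulate a brief outage, then succeed.
            if (++calls[0] < 3) {
                throw new TransientStoreException("connection reset");
            }
            return Set.of("rule-a", "rule-b");
        }, Duration.ofMillis(10));
        System.out.println(rules.size() + " rules loaded after " + calls[0] + " attempts");
    }
}
```

The previous state is never touched while the loop is failing, so readers keep the last good state until a reload succeeds.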

How Tested

Manual: start a single-node Graylog with one pipeline attached to a stream and verify message processing. Edit the pipeline rule via the UI and confirm the new rule takes effect within a few seconds. Stop MongoDB briefly while editing another rule, then restart it. Verify that the system job retries (the server log shows "Failed to reload pipeline interpreter state, retrying") and that pipeline state is eventually rebuilt, with no empty-state interval observed in message processing.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Refactoring (non-breaking change)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have requested a documentation update.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.

Harden pipeline interpreter state reload against transient MongoDB er…

97361d2

…rors Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

patrickmann wants to merge 1 commit into master from fix/harden-pipeline-state-reload
