Fix flaky tests for issue 19844 by liuguoqingfz · Pull Request #20257 · opensearch-project/OpenSearch

liuguoqingfz · 2025-12-16T14:14:45Z

Description

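Fixes three flaky tests. Each failure came from comparing an approximate MAD aggregation result with an exact precomputed value (singleValueExactMAD) under a fixed 25% tolerance. Depending on the random dataset and on how the approximation merges, the approximate result can legitimately drift more than 25% from the exact value, which made the tests flaky. The tests now compare the script-based MAD with the field-based MAD on the same data instead.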
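To make the old assertion concrete, here is a minimal sketch of an exact MAD computation and a relative-tolerance check. The class and method names (`ExactMadSketch`, `withinRelative`) are hypothetical and are not taken from the test file; the sketch only assumes that the original check accepted any result within a fixed fraction of the exact value.

```java
import java.util.Arrays;

/** Illustrative only: exact MAD plus a relative-tolerance check (all names are hypothetical). */
final class ExactMadSketch {

    /** Median of a non-empty array; averages the two middle values when the length is even. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Exact median absolute deviation: the median of |x_i - median(x)|. */
    static double mad(double[] values) {
        double center = median(values);
        double[] deviations = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = Math.abs(values[i] - center);
        }
        return median(deviations);
    }

    /** True when |actual - expected| is at most fraction * |expected|. */
    static boolean withinRelative(double actual, double expected, double fraction) {
        return Math.abs(actual - expected) <= fraction * Math.abs(expected);
    }

    public static void main(String[] args) {
        double exact = mad(new double[] { 1, 2, 3, 4, 100 });
        System.out.println("exact MAD = " + exact);
        // A result 20% away from the exact value passes a 25% check; one 30% away fails it.
        System.out.println(withinRelative(exact * 1.2, exact, 0.25));
        System.out.println(withinRelative(exact * 1.3, exact, 0.25));
    }
}
```

A check of this shape fails whenever the approximation error on a particular random dataset exceeds the fixed fraction, which is the failure mode described above.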

Related Issues

Resolves #19844

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Tests
- Enhanced validation test coverage for median absolute deviation aggregations, ensuring consistency and reliability across different calculation approaches through comprehensive scenario-based testing.



coderabbitai · 2025-12-16T14:15:10Z

Walkthrough

This change modifies the MedianAbsoluteDeviationIT test file to address flaky test failures. The updates introduce field-based MAD aggregation testing alongside existing script-based MAD tests, fetch and validate both aggregation results, and assert that script-based and field-based MAD computations yield matching values.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Enhancement: Flaky Test Stabilization `server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java` | Added field-based MedianAbsoluteDeviation aggregation (mad_field) alongside script-based MAD (mad) in multiple tests. Updated test logic to fetch both aggregations, validate non-null results, and assert script-based MAD matches field-based MAD values. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Single test file with straightforward assertion logic modifications
Changes involve adding a parallel aggregation and comparison, no complex logic density
Flaky test fix primarily adds validation rather than introducing new functionality

Suggested labels

flaky-test

Suggested reviewers

sachinpkale
cwperks
msfroh
andrross
gbbafna
dbwiddis

Poem

🐰 A flaky test once danced in the night,
But now we compare MAD side-by-side,
Field and script together take flight,
Both aggregations aligned with pride—
No more false failures to fear the tide! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: fixing flaky tests related to issue 19844, which is the primary objective of the PR. |
| Description check | ✅ Passed | The PR description provides a clear explanation of the fix, identifies the root cause (25% tolerance being exceeded), specifies the number of affected tests (3), and references the related issue (#19844). |
| Linked Issues check | ✅ Passed | The PR directly addresses the flaky tests reported in issue #19844 by refactoring test logic to compare field-based and script-based MAD aggregations instead of comparing approximations against fixed precomputed values. |
| Out of Scope Changes check | ✅ Passed | All changes are confined to the MedianAbsoluteDeviationIT test file and directly address the flaky test failures described in issue #19844; no out-of-scope modifications detected. |


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (2)
369-389: Consider using consistent aggregation settings for both MAD computations.

The field-based aggregation (line 375) uses an explicit builder with default settings, while the script-based aggregation (line 377) uses randomBuilder() which may apply random compression settings. Although the closeToRelative matcher likely has sufficient tolerance, using consistent settings for both aggregations would make the comparison more robust and reduce potential variability.

Apply this diff to use consistent settings:
```diff
         final SearchResponse response = client().prepareSearch("idx")
             .setQuery(matchAllQuery())
             .addAggregation(new MedianAbsoluteDeviationAggregationBuilder("mad_field").field("value"))
             .addAggregation(
-                randomBuilder().script(new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['value'].value + inc", params))
+                new MedianAbsoluteDeviationAggregationBuilder("mad").script(new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['value'].value + inc", params))
             )
             .get();
```

Note: The test correctly relies on MAD being translation-invariant (adding a constant to all values doesn't change MAD), so comparing the incremented script result to the non-incremented field result is mathematically sound.
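The translation-invariance claim in the note follows directly from the definition. A short derivation, assuming the exact statistic (the approximate aggregation is only expected to follow it up to its approximation error), with $c$ standing for the script's `inc` parameter:

```latex
\begin{aligned}
\operatorname{MAD}(x) &= \operatorname{median}_i \bigl| x_i - \operatorname{median}_j(x_j) \bigr| \\
\operatorname{median}_j(x_j + c) &= \operatorname{median}_j(x_j) + c \\
\operatorname{MAD}(x + c) &= \operatorname{median}_i \bigl| (x_i + c) - (\operatorname{median}_j(x_j) + c) \bigr| = \operatorname{MAD}(x)
\end{aligned}
```

The constant cancels inside the absolute value, so every deviation, and therefore the median of the deviations, is unchanged.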

392-411: Consider using consistent aggregation settings for both MAD computations.

Similar to testScriptSingleValuedWithParams, the field-based aggregation (line 395) uses explicit default settings while the script-based aggregation (line 397) uses randomBuilder() with potentially different compression settings. For a more robust and consistent comparison, both aggregations should use the same settings.

Apply this diff to use consistent settings:
```diff
         final SearchResponse response = client().prepareSearch("idx")
             .setQuery(matchAllQuery())
             .addAggregation(new MedianAbsoluteDeviationAggregationBuilder("mad_field").field("values"))
             .addAggregation(
-                randomBuilder().script(
+                new MedianAbsoluteDeviationAggregationBuilder("mad").script(
                     new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['values']", Collections.emptyMap())
                 )
             )
             .get();
```

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between e798353 and 14e55ab.

📒 Files selected for processing (1)

server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (3 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)

GitHub Check: gradle-check
GitHub Check: precommit (21, macos-15)
GitHub Check: assemble (21, windows-latest)
GitHub Check: detect-breaking-change
GitHub Check: assemble (21, ubuntu-24.04-arm)
GitHub Check: assemble (25, ubuntu-latest)
GitHub Check: assemble (25, ubuntu-24.04-arm)
GitHub Check: assemble (25, windows-latest)
GitHub Check: assemble (21, ubuntu-latest)
GitHub Check: precommit (21, ubuntu-24.04-arm)
GitHub Check: precommit (21, windows-2025, true)
GitHub Check: precommit (25, macos-15)
GitHub Check: precommit (25, ubuntu-latest)
GitHub Check: precommit (21, windows-latest)
GitHub Check: precommit (25, windows-latest)
GitHub Check: precommit (25, macos-15-intel)
GitHub Check: precommit (25, ubuntu-24.04-arm)
GitHub Check: precommit (21, macos-15-intel)
GitHub Check: precommit (21, ubuntu-latest)
GitHub Check: Analyze (java)

🔇 Additional comments (1)

server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (1)

347-366: LGTM! Solid fix for the flaky test.

The refactored test now compares the script-based MAD to a field-based MAD instead of comparing against the exact precomputed value. Since both aggregations use the same default settings and the same underlying approximate algorithm, they should produce identical results, eliminating the flakiness from approximation drift beyond the 25% tolerance.
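The reasoning here can be illustrated with a toy model. The sketch below is not the algorithm the aggregation uses; `approxMad` is a hypothetical, deliberately crude approximation that snaps values to a coarse grid before computing the MAD. It only shows that two runs of the same deterministic approximation over the same data agree with each other even when both are far from the exact value.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Toy model only: a deterministic "approximate" MAD that snaps values to a coarse grid. */
final class ToyApproxMad {

    /** Median of a non-empty array; averages the two middle values when the length is even. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Exact median absolute deviation. */
    static double exactMad(double[] values) {
        double center = median(values);
        double[] deviations = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = Math.abs(values[i] - center);
        }
        return median(deviations);
    }

    /**
     * Hypothetical approximation: read each value through valueSource, round it to the
     * nearest multiple of gridWidth, then take the exact MAD of the rounded values.
     */
    static double approxMad(double[] values, double gridWidth, DoubleUnaryOperator valueSource) {
        double[] snapped = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            snapped[i] = Math.round(valueSource.applyAsDouble(values[i]) / gridWidth) * gridWidth;
        }
        return exactMad(snapped);
    }

    public static void main(String[] args) {
        double[] data = { 1, 2, 3, 4, 100 };
        double viaField = approxMad(data, 4.0, v -> v);  // stands in for the field-based aggregation
        double viaScript = approxMad(data, 4.0, v -> v); // stands in for a script reading the same field
        System.out.println("exact = " + exactMad(data) + ", approximate = " + viaField);
        System.out.println("paths agree: " + (viaField == viaScript));
    }
}
```

In this toy case the approximation is far outside a 25% band around the exact value, yet the two paths match exactly. Whether the real aggregation gives bit-identical results for both paths depends on both builders using the same settings, which is the point raised in the nitpick comments.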

github-actions · 2025-12-16T14:58:30Z

❌ Gradle check result for 14e55ab: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

opensearch-trigger-bot · 2026-01-15T15:23:36Z

This PR is stalled because it has been open for 30 days with no activity.

Commit 14e55ab: compare the script-based MAD to the field-based MAD on the same data, instead of comparing to the exact value

Signed-off-by: Joe Liu <guoqing4@illinois.edu>

liuguoqingfz requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners December 16, 2025 14:14

github-actions bot added the `>test-failure`, `autocut`, and `flaky-test` labels on Dec 16, 2025

coderabbitai bot reviewed Dec 16, 2025


opensearch-trigger-bot bot added the `stalled` label on Jan 15, 2026

liuguoqingfz closed this Feb 13, 2026

liuguoqingfz wants to merge 1 commit into opensearch-project:main from liuguoqingfz:flakytest-19844
