Fix flaky tests for issue 19844 #20257

Closed
liuguoqingfz wants to merge 1 commit into opensearch-project:main from liuguoqingfz:flakytest-19844

liuguoqingfz commented Dec 16, 2025

Description

Fixes three flaky tests whose failures come from comparing an approximate MAD aggregation result against an exact precomputed value (singleValueExactMAD) with a fixed 25% tolerance. Depending on the random dataset and how the approximation merges, the approximate result can legitimately drift more than 25% from the exact value, which makes the tests flaky.
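
As a rough illustration of the comparison described above, the sketch below computes an exact MAD for a small array and checks a hypothetical approximate result against it with a 25% relative tolerance. The class and method names (ExactMadSketch, exactMad, withinRelativeTolerance) are made up for this example and are not taken from MedianAbsoluteDeviationIT; the helper and matcher used by the real test may differ.

```java
import java.util.Arrays;

// Illustrative only: exact MAD plus a fixed relative-tolerance check.
class ExactMadSketch {

    // Median of a copy of the input, so the caller's array is not reordered.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Exact MAD: the median of absolute deviations from the median.
    static double exactMad(double[] values) {
        double med = median(values);
        double[] deviations = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviations[i] = Math.abs(values[i] - med);
        }
        return median(deviations);
    }

    // True when |actual - expected| is at most tolerance * |expected|.
    static boolean withinRelativeTolerance(double actual, double expected, double tolerance) {
        return Math.abs(actual - expected) <= tolerance * Math.abs(expected);
    }

    public static void main(String[] args) {
        double[] data = { 1, 2, 3, 4, 100 };
        double exact = exactMad(data); // median is 3, deviations are {2, 1, 0, 1, 97}
        System.out.println(exact); // prints 1.0

        // Hypothetical approximate results: one inside, one outside the 25% band.
        System.out.println(withinRelativeTolerance(1.2, exact, 0.25)); // prints true
        System.out.println(withinRelativeTolerance(1.3, exact, 0.25)); // prints false
    }
}
```

An approximate result that lands just outside the band fails the check even though nothing is broken, which is the failure mode the description attributes to the fixed tolerance.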

Related Issues

Resolves #19844

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Tests
    • Enhanced validation test coverage for median absolute deviation aggregations, ensuring consistency and reliability across different calculation approaches through comprehensive scenario-based testing.

… instead of comparing to the exact value

Signed-off-by: Joe Liu <guoqing4@illinois.edu>

coderabbitai bot commented Dec 16, 2025

Walkthrough

This change modifies the MedianAbsoluteDeviationIT test file to address flaky test failures. The updates introduce field-based MAD aggregation testing alongside existing script-based MAD tests, fetch and validate both aggregation results, and assert that script-based and field-based MAD computations yield matching values.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Enhancement: Flaky Test Stabilization (server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java) | Added field-based MedianAbsoluteDeviation aggregation (mad_field) alongside script-based MAD (mad) in multiple tests. Updated test logic to fetch both aggregations, validate non-null results, and assert script-based MAD matches field-based MAD values. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Single test file with straightforward assertion logic modifications
  • Changes involve adding a parallel aggregation and comparison, no complex logic density
  • Flaky test fix primarily adds validation rather than introducing new functionality

Suggested labels

flaky-test

Suggested reviewers

  • sachinpkale
  • cwperks
  • msfroh
  • andrross
  • gbbafna
  • dbwiddis

Poem

🐰 A flaky test once danced in the night,
But now we compare MAD side-by-side,
Field and script together take flight,
Both aggregations aligned with pride—
No more false failures to fear the tide! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: fixing flaky tests related to issue 19844, which is the primary objective of the PR. |
| Description check | ✅ Passed | The PR description provides a clear explanation of the fix, identifies the root cause (25% tolerance being exceeded), specifies the number of affected tests (3), and references the related issue (#19844). |
| Linked Issues check | ✅ Passed | The PR directly addresses the flaky tests reported in issue #19844 by refactoring test logic to compare field-based and script-based MAD aggregations instead of comparing approximations against fixed precomputed values. |
| Out of Scope Changes check | ✅ Passed | All changes are confined to the MedianAbsoluteDeviationIT test file and directly address the flaky test failures described in issue #19844; no out-of-scope modifications detected. |

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (2)

369-389: Consider using consistent aggregation settings for both MAD computations.

The field-based aggregation (line 375) uses an explicit builder with default settings, while the script-based aggregation (line 377) uses randomBuilder() which may apply random compression settings. Although the closeToRelative matcher likely has sufficient tolerance, using consistent settings for both aggregations would make the comparison more robust and reduce potential variability.

Apply this diff to use consistent settings:

         final SearchResponse response = client().prepareSearch("idx")
             .setQuery(matchAllQuery())
             .addAggregation(new MedianAbsoluteDeviationAggregationBuilder("mad_field").field("value"))
             .addAggregation(
-                randomBuilder().script(new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['value'].value + inc", params))
+                new MedianAbsoluteDeviationAggregationBuilder("mad").script(new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['value'].value + inc", params))
             )
             .get();

Note: The test correctly relies on MAD being translation-invariant (adding a constant to all values doesn't change MAD), so comparing the incremented script result to the non-incremented field result is mathematically sound.
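
The translation-invariance property in the note above can be checked on a small array with the sketch below. It uses an exact, plain-Java MAD rather than the approximate aggregation, and the names (MadShiftSketch, mad, shift) are illustrative assumptions, not code from the PR.

```java
import java.util.Arrays;

// Illustrative only: adding a constant to every value leaves the exact MAD unchanged.
class MadShiftSketch {

    // Median of a copy of the input, so the caller's array is not reordered.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Exact MAD: the median of absolute deviations from the median.
    static double mad(double[] values) {
        double med = median(values);
        return median(Arrays.stream(values).map(v -> Math.abs(v - med)).toArray());
    }

    // Returns a new array with inc added to every element,
    // mirroring what a "value + inc" script does per document.
    static double[] shift(double[] values, double inc) {
        return Arrays.stream(values).map(v -> v + inc).toArray();
    }

    public static void main(String[] args) {
        double[] data = { 1, 2, 3, 4, 100 };
        double[] shifted = shift(data, 1.0); // {2, 3, 4, 5, 101}

        // The median moves from 3 to 4, but the deviations {2, 1, 0, 1, 97} are the same.
        System.out.println(mad(data));    // prints 1.0
        System.out.println(mad(shifted)); // prints 1.0
    }
}
```

With non-integer data the two results can differ by floating-point rounding, and an approximate algorithm may add its own variation, so a tolerance-based comparison remains a reasonable choice in the actual test.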


392-411: Consider using consistent aggregation settings for both MAD computations.

Similar to testScriptSingleValuedWithParams, the field-based aggregation (line 395) uses explicit default settings while the script-based aggregation (line 397) uses randomBuilder() with potentially different compression settings. For a more robust and consistent comparison, both aggregations should use the same settings.

Apply this diff to use consistent settings:

         final SearchResponse response = client().prepareSearch("idx")
             .setQuery(matchAllQuery())
             .addAggregation(new MedianAbsoluteDeviationAggregationBuilder("mad_field").field("values"))
             .addAggregation(
-                randomBuilder().script(
+                new MedianAbsoluteDeviationAggregationBuilder("mad").script(
                     new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['values']", Collections.emptyMap())
                 )
             )
             .get();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e798353 and 14e55ab.

📒 Files selected for processing (1)
  • server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: Analyze (java)
🔇 Additional comments (1)
server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (1)

347-366: LGTM! Solid fix for the flaky test.

The refactored test now compares the script-based MAD to a field-based MAD instead of comparing against the exact precomputed value. Since both aggregations use the same default settings and the same underlying approximate algorithm, they should produce identical results, eliminating the flakiness from approximation drift beyond the 25% tolerance.

@github-actions

❌ Gradle check result for 14e55ab: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot bot added the stalled label (Issues that have stalled) on Jan 15, 2026
Labels

  • autocut
  • flaky-test: Random test failure that succeeds on second run
  • stalled: Issues that have stalled
  • >test-failure: Test failure from CI, local build, etc.

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for MedianAbsoluteDeviationIT