Fix flaky tests for issue 19844#20257
Fix flaky tests for issue 19844#20257liuguoqingfz wants to merge 1 commit intoopensearch-project:mainfrom
Conversation
… instead of comparing to the exact value Signed-off-by: Joe Liu <guoqing4@illinois.edu>
WalkthroughThis change modifies the MedianAbsoluteDeviationIT test file to address flaky test failures. The updates introduce field-based MAD aggregation testing alongside existing script-based MAD tests, fetch and validate both aggregation results, and assert that script-based and field-based MAD computations yield matching values. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (2)
369-389: Consider using consistent aggregation settings for both MAD computations.The field-based aggregation (line 375) uses an explicit builder with default settings, while the script-based aggregation (line 377) uses
randomBuilder()which may apply random compression settings. Although thecloseToRelativematcher likely has sufficient tolerance, using consistent settings for both aggregations would make the comparison more robust and reduce potential variability.Apply this diff to use consistent settings:
final SearchResponse response = client().prepareSearch("idx") .setQuery(matchAllQuery()) .addAggregation(new MedianAbsoluteDeviationAggregationBuilder("mad_field").field("value")) .addAggregation( - randomBuilder().script(new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['value'].value + inc", params)) + new MedianAbsoluteDeviationAggregationBuilder("mad").script(new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['value'].value + inc", params)) ) .get();Note: The test correctly relies on MAD being translation-invariant (adding a constant to all values doesn't change MAD), so comparing the incremented script result to the non-incremented field result is mathematically sound.
392-411: Consider using consistent aggregation settings for both MAD computations.Similar to
testScriptSingleValuedWithParams, the field-based aggregation (line 395) uses explicit default settings while the script-based aggregation (line 397) usesrandomBuilder()with potentially different compression settings. For a more robust and consistent comparison, both aggregations should use the same settings.Apply this diff to use consistent settings:
final SearchResponse response = client().prepareSearch("idx") .setQuery(matchAllQuery()) .addAggregation(new MedianAbsoluteDeviationAggregationBuilder("mad_field").field("values")) .addAggregation( - randomBuilder().script( + new MedianAbsoluteDeviationAggregationBuilder("mad").script( new Script(ScriptType.INLINE, AggregationTestScriptsPlugin.NAME, "doc['values']", Collections.emptyMap()) ) ) .get();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (21, macos-15)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: Analyze (java)
🔇 Additional comments (1)
server/src/internalClusterTest/java/org/opensearch/search/aggregations/metrics/MedianAbsoluteDeviationIT.java (1)
347-366: LGTM! Solid fix for the flaky test.The refactored test now compares the script-based MAD to a field-based MAD instead of comparing against the exact precomputed value. Since both aggregations use the same default settings and the same underlying approximate algorithm, they should produce identical results, eliminating the flakiness from approximation drift beyond the 25% tolerance.
|
❌ Gradle check result for 14e55ab: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
This PR is stalled because it has been open for 30 days with no activity. |
Description
Fixed 3 flaky tests where the failure is coming from comparing an approximate MAD aggregation result to an exact precomputed value (singleValueExactMAD) with a fixed 25% tolerance. Depending on the random dataset and how the approximation merges, it can legitimately drift past 25% and become flaky.
Related Issues
Resolves #19844
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.