feat: Support general spill mechanism for distinct aggregation #15876

Closed
clay4megtr wants to merge 5 commits into facebookincubator:main from clay4megtr:agg_distinct_support_spill_part_one

Conversation

@clay4megtr

This PR implements a general spill mechanism for the agg(distinct) scenario. For more details, please refer to #15859.

Acknowledgment: Parts of this code are derived from @aditi-pandit's PR (#7791). Much appreciated!

@netlify

netlify bot commented Dec 31, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit a354182
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/696f178227ee1f000852184e

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 31, 2025
@clay4megtr clay4megtr force-pushed the agg_distinct_support_spill_part_one branch from e2be9a1 to 5c8b587 Compare December 31, 2025 12:44
@clay4megtr
Author

Hi @xiaoxmeng @mbasmanova @tanjialiang @duxiao1212, this PR is ready for review. PTAL, thanks!

Contributor

@xiaoxmeng xiaoxmeng left a comment

@clay4megtr LGTM % minors. A question on null value handling. Thanks!

@clay4megtr clay4megtr requested a review from xiaoxmeng January 19, 2026 07:42
Contributor

@xiaoxmeng xiaoxmeng left a comment

@clay4megtr LGTM % one nit. Thanks!

@meta-codesync

meta-codesync bot commented Jan 19, 2026

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this in D90958171.

@xiaoxmeng xiaoxmeng requested a review from tanjialiang January 19, 2026 21:39
Contributor

@tanjialiang tanjialiang left a comment

Thanks, LGTM % some nits


std::vector<VectorPtr> makeInputForAggregation(const VectorPtr& input) const {
if (isSingleInputAggregate()) {
std::vector<VectorPtr> makeInputForAggregation(VectorPtr& input) const {
Contributor

why do we drop const here?

Author

> why do we drop const here?

Because we use std::move at L214.

Contributor

@tanjialiang tanjialiang Jan 22, 2026

I think the original code was misleading. Because the parameter was const-qualified, the std::move() actually performed a copy. Your change breaks the contract of this method by making input non-const, meaning the caller's referenced input is going to be moved from, which is normally dangerous. We should either keep the const signature and return a copy (without std::move), or change the parameter to an rvalue reference, VectorPtr&&.
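
The three signatures discussed above can be compared in a minimal standalone sketch. It assumes only that VectorPtr behaves like a std::shared_ptr; the alias `Ptr` and the functions `wrapConstRef`, `wrapMutableRef`, and `wrapRvalue` are hypothetical names for illustration and are not taken from the Velox code.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shared_ptr-based VectorPtr.
using Ptr = std::shared_ptr<int>;

// const lvalue reference: std::move(input) yields const Ptr&&, which can
// only bind to push_back(const Ptr&), so this copies and the caller's
// pointer stays valid.
std::vector<Ptr> wrapConstRef(const Ptr& input) {
  std::vector<Ptr> out;
  out.push_back(std::move(input)); // copy, despite the std::move
  return out;
}

// Non-const lvalue reference: the move really happens, so the caller's
// pointer is emptied without any marker at the call site.
std::vector<Ptr> wrapMutableRef(Ptr& input) {
  std::vector<Ptr> out;
  out.push_back(std::move(input));
  return out;
}

// Rvalue reference: the move also happens, but the caller has to write
// std::move(...) explicitly, making the ownership transfer visible.
std::vector<Ptr> wrapRvalue(Ptr&& input) {
  std::vector<Ptr> out;
  out.push_back(std::move(input));
  return out;
}
```

With these definitions, a pointer passed to `wrapConstRef` remains usable afterwards, while one passed to `wrapMutableRef` or `wrapRvalue` is left null. Only the rvalue-reference version forces that hand-off to show up at the call site.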

Contributor

Actually, since the code has already landed, I'll help make the change.

if constexpr (std::is_same_v<T, ComplexType>) {
offset += accumulator->extractValues(*elementsVector, offset);
} else {
offset += accumulator->extractValues(
Contributor

@clay4megtr, if the total size of the distinct elements in a group exceeds 2GB, the spill fails in validateSpillBytesSize. How do you plan to handle groups whose distinct values exceed 2GB?

Author

> @clay4megtr, if the total size of the distinct elements in a group exceeds 2GB, the spill fails in validateSpillBytesSize. How do you plan to handle groups whose distinct values exceed 2GB?

This is a typical hot-key scenario. The issue occurs not only during the spill write stage but also during the spill merge stage, which may run into several memory bloat issues. This is exactly what the Dynamic Aggregation Spill Mechanism is designed to address. You can find more details here: https://docs.google.com/document/d/1jhWUcvun3fHe_v4-Fi1vjBFzIbpth6dqPvwbhShwdqg/edit?tab=t.0#heading=h.4dbf3b8kvqhb

Contributor

Thanks for the doc, @clay4megtr. The initial PR looks good to me; my only concern is that an ArrayVector whose actual size exceeds 2GB is not an uncommon case.
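
To make the 2GB concern concrete, here is a rough sketch of the kind of size check a hot key would trip. It assumes the limit corresponds to the largest value of a signed 32-bit byte count; that assumption, and the names `kMaxBytes` and `groupFitsLimit`, are hypothetical and do not describe how validateSpillBytesSize is actually implemented.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Assumed limit for this sketch only: the largest byte count a signed
// 32-bit size can hold (just under 2GB).
constexpr uint64_t kMaxBytes = std::numeric_limits<int32_t>::max();

// Sums the per-element byte sizes of one group's distinct values in 64
// bits and reports whether the group stays within the assumed limit.
bool groupFitsLimit(const std::vector<uint64_t>& elementBytes) {
  uint64_t total = 0;
  for (uint64_t bytes : elementBytes) {
    total += bytes;
    if (total > kMaxBytes) {
      return false; // a single hot key already exceeds the limit
    }
  }
  return true;
}
```

Under this assumption, a group fails the check as soon as its accumulated distinct values pass the limit, no matter how small each individual element is.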

@meta-codesync

meta-codesync bot commented Jan 22, 2026

@xiaoxmeng merged this pull request in ac20fbc.
