Skip to content

fix(rust): Flatten late merge_sorted unions#26962

Open
pablogsal wants to merge 1 commit intopola-rs:mainfrom
pablogsal:fix-merge-sorted-union-stack
Open

fix(rust): Flatten late merge_sorted unions#26962
pablogsal wants to merge 1 commit intopola-rs:mainfrom
pablogsal:fix-merge-sorted-union-stack

Conversation

@pablogsal
Copy link

@pablogsal pablogsal commented Mar 18, 2026

Fixes: #26960

The merge_sorted crash in 1.37 comes from a bad plan shape produced by the optimizer, not from merge_sorted itself returning the wrong data.

A repeated reduction like:

functools.reduce(lambda a, b: a.merge_sorted(b, key="ts"), inputs)

builds a left-deep MergeSorted tree. When the query later ends in an order-destroying or order-reestablishing operator such as a final sort, the order-observability pass is allowed to rewrite each MergeSorted node into an unordered Union because the exact intermediate merge order is no longer semantically observable.

That rewrite is fine in isolation, but in 1.37 the order pass runs after the main FlattenUnionRule. The result is that the optimizer can manufacture a left-deep Union(Union(Union(...))) tree after the only flattening pass has already finished. The memory engine then executes that nested union tree recursively. With enough inputs, the recursive execution depth is large enough to overflow a rayon worker stack and segfault.

This patch keeps the existing MergeSorted -> Union optimization but immediately runs FlattenUnionRule again afterwards. That is enough to collapse the late-created nested unions back into one flat Union node before execution planning continues.

In other words, this does not disable the optimization. It preserves the cheaper unordered execution strategy while preventing the pathological recursive executor shape that was causing the crash.

The regression test added here intentionally uses a much smaller merge_sorted chain than the real crash repro. The production repro uses thousands of inputs; the test only needs enough inputs to prove the optimizer emits a single flat UNION block instead of a chain of nested UNION nodes.


As requested by the first contribution guidelines:

Screenshot 2026-03-18 at 12 55 52

The merge_sorted crash in 1.37 comes from a bad plan shape produced by
the optimizer, not from merge_sorted itself returning the wrong data.

A repeated reduction like:

    functools.reduce(lambda a, b: a.merge_sorted(b, key="ts"), inputs)

builds a left-deep MergeSorted tree. When the query later ends in an
order-destroying or order-reestablishing operator such as a final sort,
the order-observability pass is allowed to rewrite each MergeSorted node
into an unordered Union because the exact intermediate merge order is no
longer semantically observable.

That rewrite is fine in isolation, but in 1.37 the order pass runs after
the main FlattenUnionRule. The result is that the optimizer can
manufacture a left-deep Union(Union(Union(...))) tree after the only
flattening pass has already finished. The memory engine then executes
that nested union tree recursively. With enough inputs, the recursive
execution depth is large enough to overflow a rayon worker stack and
segfault.

This patch keeps the existing MergeSorted -> Union optimization but
immediately runs FlattenUnionRule again afterwards. That is enough to
collapse the late-created nested unions back into one flat Union node
before execution planning continues.

In other words, this does not disable the optimization. It preserves the
cheaper unordered execution strategy while preventing the pathological
recursive executor shape that was causing the crash.
@pablogsal pablogsal changed the title flatten late merge_sorted unions fix(rust): flatten late merge_sorted unions Mar 18, 2026
@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Mar 18, 2026
@codecov
Copy link

codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.74%. Comparing base (2f06807) to head (10ed20c).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main   #26962    +/-   ##
========================================
  Coverage   81.73%   81.74%            
========================================
  Files        1808     1809     +1     
  Lines      249000   249151   +151     
  Branches     3139     3139            
========================================
+ Hits       203531   203674   +143     
- Misses      44664    44672     +8     
  Partials      805      805            

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@pablogsal pablogsal changed the title fix(rust): flatten late merge_sorted unions fix(rust): Flatten late merge_sorted unions Mar 18, 2026
@pablogsal
Copy link
Author

pablogsal commented Mar 18, 2026

See #26960 (comment) for context.

A more "correct" fix that doesn't involve a second pass is to fix it in the IR perhaps as right now it does:

  *ir = IR::Union {
      inputs: vec![*input_left, *input_right],
      ...
  };

That creates a binary Union, and because this pass visits parents before children, a left-deep MergeSorted chain turns into a left-deep Union chain. Then you need late flattening.

A better version can be:

  • When all_outputs_unordered, recursively collect the current MergeSorted subtree’s non-MergeSorted inputs.
  • Emit one flat IR::Union { inputs, ... } directly.

Conceptually:

  fn collect_merge_sorted_inputs(node: Node, ir_arena: &Arena<IR>, out: &mut Vec<Node>) {
      match ir_arena.get(node) {
          IR::MergeSorted { input_left, input_right, .. } => {
              collect_merge_sorted_inputs(*input_left, ir_arena, out);
              collect_merge_sorted_inputs(*input_right, ir_arena, out);
          },
          _ => out.push(node),
      }
  }

Then rewrite the current node to a flat union from those collected inputs as hat avoids the extra optimize_loop entirely.

I can go this route if people agree.


Edit: I dug a bit deeper into this.

The more direct place to address the issue is indeed the MergeSorted -> Union rewrite in the order pass. Today it does:

*ir = IR::Union {
    inputs: vec![*input_left, *input_right],
    ...
};

That preserves the original binary shape, so a left-deep MergeSorted chain becomes a left-deep Union chain, which is why we need a late flatten afterwards.

However, after tracing set_order, I don't think the “just emit one flat union directly” version is a trivial drop-in change. simplify_and_fetch_orderings builds the outputs/leaves bookkeeping for the original graph before ir_pushdown runs, and both pushdown_orders and pullup_orders rely on that bookkeeping and on the node input arities staying in sync. So if we replace a binary MergeSorted(left, right) in-place with an N-ary Union([...]), we also need to update that bookkeeping inside the pass; otherwise the existing outputs / hit-count logic becomes stale.

So I think there are two options:

  • Keep the current surgical fix
  • Do the more direct fix inside set_order, but then it should update the pass bookkeeping as well, rather than only changing the IR node in isolation.

Given that, I no longer think “flatten directly in ir_pushdown” is automatically the more correct/smaller fix. It is more direct conceptually, but it is also more invasive in this pass than it first appears.

@nameexhaustion nameexhaustion requested a review from kdn36 March 18, 2026 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first-contribution First contribution by user fix Bug fix rust Related to Rust Polars

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segfault when deeply chaining merge_sorted and then sorting before collect (StackOverflow)

1 participant