fix(rust): Flatten late `merge_sorted` unions by pablogsal · Pull Request #26962 · pola-rs/polars

pablogsal · 2026-03-18T12:56:06Z

The merge_sorted crash in 1.37 comes from a bad plan shape produced by the optimizer, not from merge_sorted itself returning the wrong data.

A repeated reduction like:

functools.reduce(lambda a, b: a.merge_sorted(b, key="ts"), inputs)

builds a left-deep MergeSorted tree. When the query later ends in an order-destroying or order-reestablishing operator such as a final sort, the order-observability pass is allowed to rewrite each MergeSorted node into an unordered Union because the exact intermediate merge order is no longer semantically observable.

That rewrite is fine in isolation, but in 1.37 the order pass runs after the main FlattenUnionRule. The result is that the optimizer can manufacture a left-deep Union(Union(Union(...))) tree after the only flattening pass has already finished. The memory engine then executes that nested union tree recursively. With enough inputs, the recursive execution depth is large enough to overflow a rayon worker stack and segfault.

This patch keeps the existing MergeSorted -> Union optimization but immediately runs FlattenUnionRule again afterwards. That is enough to collapse the late-created nested unions back into one flat Union node before execution planning continues.

In other words, this does not disable the optimization. It preserves the cheaper unordered execution strategy while preventing the pathological recursive executor shape that was causing the crash.

The regression test added here intentionally uses a much smaller merge_sorted chain than the real crash repro. The production repro uses thousands of inputs; the test only needs enough inputs to prove the optimizer emits a single flat UNION block instead of a chain of nested UNION nodes.

As requested by the first contribution guidelines:

The merge_sorted crash in 1.37 comes from a bad plan shape produced by the optimizer, not from merge_sorted itself returning the wrong data. A repeated reduction like: functools.reduce(lambda a, b: a.merge_sorted(b, key="ts"), inputs) builds a left-deep MergeSorted tree. When the query later ends in an order-destroying or order-reestablishing operator such as a final sort, the order-observability pass is allowed to rewrite each MergeSorted node into an unordered Union because the exact intermediate merge order is no longer semantically observable. That rewrite is fine in isolation, but in 1.37 the order pass runs after the main FlattenUnionRule. The result is that the optimizer can manufacture a left-deep Union(Union(Union(...))) tree after the only flattening pass has already finished. The memory engine then executes that nested union tree recursively. With enough inputs, the recursive execution depth is large enough to overflow a rayon worker stack and segfault. This patch keeps the existing MergeSorted -> Union optimization but immediately runs FlattenUnionRule again afterwards. That is enough to collapse the late-created nested unions back into one flat Union node before execution planning continues. In other words, this does not disable the optimization. It preserves the cheaper unordered execution strategy while preventing the pathological recursive executor shape that was causing the crash.

codecov · 2026-03-18T13:40:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.74%. Comparing base (2f06807) to head (10ed20c).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff            @@
##             main   #26962    +/-   ##
========================================
  Coverage   81.73%   81.74%            
========================================
  Files        1808     1809     +1     
  Lines      249000   249151   +151     
  Branches     3139     3139            
========================================
+ Hits       203531   203674   +143     
- Misses      44664    44672     +8     
  Partials      805      805

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pablogsal · 2026-03-18T16:23:46Z

See #26960 (comment) for context.

A more "correct" fix that doesn't involve a second pass is to fix it in the IR perhaps as right now it does:

  *ir = IR::Union {
      inputs: vec![*input_left, *input_right],
      ...
  };

That creates a binary Union, and because this pass visits parents before children, a left-deep MergeSorted chain turns into a left-deep Union chain. Then you need late flattening.

A better version can be:

When all_outputs_unordered, recursively collect the current MergeSorted subtree’s non-MergeSorted inputs.
Emit one flat IR::Union { inputs, ... } directly.

Conceptually:

  fn collect_merge_sorted_inputs(node: Node, ir_arena: &Arena<IR>, out: &mut Vec<Node>) {
      match ir_arena.get(node) {
          IR::MergeSorted { input_left, input_right, .. } => {
              collect_merge_sorted_inputs(*input_left, ir_arena, out);
              collect_merge_sorted_inputs(*input_right, ir_arena, out);
          },
          _ => out.push(node),
      }
  }

Then rewrite the current node to a flat union from those collected inputs as hat avoids the extra optimize_loop entirely.

I can go this route if people agree.

Edit: I dug a bit deeper into this.

The more direct place to address the issue is indeed the MergeSorted -> Union rewrite in the order pass. Today it does:

*ir = IR::Union {
    inputs: vec![*input_left, *input_right],
    ...
};

That preserves the original binary shape, so a left-deep MergeSorted chain becomes a left-deep Union chain, which is why we need a late flatten afterwards.

However, after tracing set_order, I don't think the “just emit one flat union directly” version is a trivial drop-in change. simplify_and_fetch_orderings builds the outputs/leaves bookkeeping for the original graph before ir_pushdown runs, and both pushdown_orders and pullup_orders rely on that bookkeeping and on the node input arities staying in sync. So if we replace a binary MergeSorted(left, right) in-place with an N-ary Union([...]), we also need to update that bookkeeping inside the pass; otherwise the existing outputs / hit-count logic becomes stale.

So I think there are two options:

Keep the current surgical fix
Do the more direct fix inside set_order, but then it should update the pass bookkeeping as well, rather than only changing the IR node in isolation.

Given that, I no longer think “flatten directly in ir_pushdown” is automatically the more correct/smaller fix. It is more direct conceptually, but it is also more invasive in this pass than it first appears.

pablogsal requested review from MarcoGorelli, alexander-beedie, c-peters, orlp and ritchie46 as code owners March 18, 2026 12:56

github-actions bot added first-contribution First contribution by user title needs formatting labels Mar 18, 2026

pablogsal mentioned this pull request Mar 18, 2026

Segfault when deeply chaining merge_sorted and then sorting before collect (StackOverflow) #26960

Closed

2 tasks

pablogsal changed the title ~~flatten late merge_sorted unions~~ fix(rust): flatten late merge_sorted unions Mar 18, 2026

github-actions bot added fix Bug fix rust Related to Rust Polars labels Mar 18, 2026

pablogsal changed the title ~~fix(rust): flatten late merge_sorted unions~~ fix(rust): Flatten late merge_sorted unions Mar 18, 2026

github-actions bot removed the title needs formatting label Mar 18, 2026

nameexhaustion requested a review from kdn36 March 18, 2026 16:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rust): Flatten late `merge_sorted` unions#26962

fix(rust): Flatten late `merge_sorted` unions#26962
pablogsal wants to merge 1 commit intopola-rs:mainfrom
pablogsal:fix-merge-sorted-union-stack

pablogsal commented Mar 18, 2026 •

edited

Loading

Uh oh!

codecov bot commented Mar 18, 2026

Uh oh!

pablogsal commented Mar 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pablogsal commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 18, 2026

Codecov Report

Uh oh!

pablogsal commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pablogsal commented Mar 18, 2026 •

edited

Loading

pablogsal commented Mar 18, 2026 •

edited

Loading