feat(tesseract): Support separate pre-aggregations for different multi-fact subqueries by waralexrom · Pull Request #10718 · cube-js/cube

waralexrom · 2026-04-20T12:38:21Z

Check List

Tests have been run in packages where changes have been made if available
Linter has been run for changed code
Tests for the changes have been added if not covered yet
Docs have been added / updated if required


claude · 2026-04-20T12:38:41Z

Claude finished @waralexrom's task in 2m 37s

Code Review: Separate Pre-aggregations for Multi-fact Subqueries

Gather context and understand PR changes
Review code quality and architecture
Review security considerations
Review performance implications
Review test coverage
Post comprehensive review

Summary

This PR refactors how multiplied (multi-fact) measures are planned and optimized in Tesseract. The key architectural change is unifying the CTE management for both multi-stage members and multiplied measures into a single CteState, eliminating the separate ResolveMultipliedMeasures logical node. This enables pre-aggregation matching to work independently on each multi-fact subquery (regular or multiplied) rather than requiring the entire multiplied-measures subtree to match or fail as a unit.

What changed:

Deleted ResolveMultipliedMeasures (logical + physical) — multiplied measures are now planned as individual LogicalMultiStageMember CTEs
New CteState — shared counter/accumulator for both MultiStageQueryPlanner and MultipliedMeasuresQueryPlanner
MultiStageLeafMeasure.measure → .measures (Vec) — supports multi-measure CTEs from regular measure subqueries
pre_aggregation_override on AggregateMultipliedSubquery — allows the pre-agg optimizer to short-circuit individual multiplied subqueries
Pre-agg optimizer refactored — try_rewrite_query now handles simple rewrite + multi-stage fallback in one place; new try_rewrite_schema_and_filter for AggregateMultipliedSubquery matching
New bucketing integration tests — 8 tests covering multi-stage dimensions with bucketing logic

The approach is sound and the refactoring materially simplifies the codebase (~200 net lines removed despite adding tests and new functionality).
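To make the shared-counter idea concrete, here is a minimal sketch of what a state object like CteState might look like. It is not taken from the PR: PlannedCte, the add and names methods, and the two toy planner functions are hypothetical stand-ins, and only the cte_0, cte_1, ... naming scheme comes from the review.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a planned CTE (the review names the real
/// node type LogicalMultiStageMember; this struct is an assumption).
#[derive(Debug)]
pub struct PlannedCte {
    pub name: String,
    pub kind: &'static str,
}

/// Shared counter plus accumulator handed to both planners.
#[derive(Default)]
pub struct CteState {
    next_id: usize,
    ctes: Vec<Rc<PlannedCte>>,
}

impl CteState {
    /// Allocates the next sequential name and records the CTE.
    pub fn add(&mut self, kind: &'static str) -> Rc<PlannedCte> {
        let cte = Rc::new(PlannedCte {
            name: format!("cte_{}", self.next_id),
            kind,
        });
        self.next_id += 1;
        self.ctes.push(Rc::clone(&cte));
        cte
    }

    /// Names of all CTEs in the order they were planned.
    pub fn names(&self) -> Vec<String> {
        self.ctes.iter().map(|c| c.name.clone()).collect()
    }
}

/// Toy planners: each one registers a single CTE in the shared state.
pub fn plan_multi_stage(state: &mut CteState) -> Rc<PlannedCte> {
    state.add("multi_stage")
}

pub fn plan_multiplied(state: &mut CteState) -> Rc<PlannedCte> {
    state.add("multiplied")
}
```

Because both planners draw from one counter, whichever runs first receives cte_0, so names depend on planning order rather than on CTE type.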

Code Quality

Positive:

Good separation of concerns: CteState is a clean, focused abstraction
The ResolveMultipliedMeasures + ResolvedMultipliedMeasures enum removal significantly simplifies FullKeyAggregate and all three aggregate strategy processors
The pre-aggregation optimizer flows more linearly now

Issues:

try_rewrite_schema_and_filter always uses TimeShiftState::default() — rust/cubesqlplanner/cubesqlplanner/src/logical_plan/optimizers/pre_aggregation/optimizer.rs:139-143

When matching an AggregateMultipliedSubquery against pre-aggregations, try_rewrite_schema_and_filter hardcodes TimeShiftState::default() for date range extraction. This is likely correct, since multiplied measure subqueries don't have time shifts, but the same hardcoding also appears in the PlanNode::AggregateMultipliedSubquery arm at line 218. Consider whether the time shifts from the enclosing context should be threaded through here.

root_filter used for all multiplied subquery matching — optimizer.rs:189-191

The comment says "multiplied-measure CTEs don't carry their own filter — logically they apply the same filter as the root query." This is a key semantic assumption. If a future change adds filter pushdown into multiplied measure subqueries, this would silently become wrong. The comment is helpful, but consider adding a debug assertion or, at a minimum, documenting this invariant at the AggregateMultipliedSubquery struct level.
Duplicate code in try_rewrite_simple_query vs try_rewrite_schema_and_filter — optimizer.rs:117-131 vs optimizer.rs:139-170

These two methods share nearly identical logic (schema/filter matching → make source → build Query). The only differences are: (a) try_rewrite_simple_query copies the existing query's modifiers while try_rewrite_schema_and_filter creates empty modifiers, and (b) the date range extraction. Consider extracting a shared helper.

Empty data_queries left behind in strategy files — The full_join_aggregate_strategy.rs:103, inner_join_aggregate_strategy.rs:29, and keys_aggregate_strategy.rs:33 files still declare let mut data_queries = vec![]; or let mut keys_queries = vec![]; immediately before the remaining loop. Not a bug, but now the code looks like it was hastily trimmed rather than cleanly restructured. No need to fix now, but worth noting.
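One way the duplication between the two rewrite methods could be factored out is sketched below. The types are toy stand-ins, not the optimizer's real ones: Modifiers, Query, PreAggregation, and the rewrite_with helper are assumptions, schema and filter matching is reduced to a measure-name subset check, and date range extraction is omitted entirely.

```rust
/// Toy query modifiers (hypothetical fields).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Modifiers {
    pub limit: Option<usize>,
    pub ungrouped: bool,
}

/// Toy query: a source name, the requested measures, and modifiers.
#[derive(Clone, Debug, PartialEq)]
pub struct Query {
    pub source: String,
    pub measures: Vec<String>,
    pub modifiers: Modifiers,
}

/// Toy pre-aggregation: a name and the measures it can serve.
pub struct PreAggregation {
    pub name: String,
    pub measures: Vec<String>,
}

/// Shared core: match the request, make the source, build the Query.
/// The caller decides which modifiers the rewritten query carries.
fn rewrite_with(
    pre_agg: &PreAggregation,
    measures: &[String],
    modifiers: Modifiers,
) -> Option<Query> {
    if !measures.iter().all(|m| pre_agg.measures.contains(m)) {
        return None;
    }
    Some(Query {
        source: pre_agg.name.clone(),
        measures: measures.to_vec(),
        modifiers,
    })
}

/// Whole-query rewrite: copies the existing query's modifiers.
pub fn try_rewrite_simple_query(pre_agg: &PreAggregation, query: &Query) -> Option<Query> {
    rewrite_with(pre_agg, &query.measures, query.modifiers.clone())
}

/// Subquery rewrite: starts from empty modifiers.
pub fn try_rewrite_schema_and_filter(
    pre_agg: &PreAggregation,
    measures: &[String],
) -> Option<Query> {
    rewrite_with(pre_agg, measures, Modifiers::default())
}
```

With this shape, the two public methods differ only in the arguments they pass, which matches difference (a) described in the review. Difference (b), the date range extraction, would need an additional parameter that this sketch does not model.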

Security

No security concerns. This is internal query planning logic with no user-facing input parsing, no SQL injection vectors (queries are built via typed builders), and no auth/authz changes.

Performance

CTE naming is globally sequential now — With CteState shared across multi-stage and multiplied planners, CTE names are cte_0, cte_1, ... in order of planning, not grouped by type. This is fine functionally but means CTE numbering is now sensitive to planning order. If the order of multi_stage_query_planner.plan_queries() vs multiplied_measures_query_planner.plan_queries() ever changes, CTE names will shift, potentially breaking snapshot tests. This is acceptable but worth being aware of.
No performance regression expected — The refactoring doesn't add extra passes or copies. The pre_aggregation_override field adds one Option<Rc<Query>> per AggregateMultipliedSubquery which is negligible.
Pre-aggregation matching improvement — The new approach tries a simple rewrite first, then falls back to multi-stage rewriting (lines 95-100). This is more efficient than the old code, which checked !plan.multistage_members().is_empty() upfront and went straight to the multi-stage path, potentially missing a simpler match.
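The "simple rewrite first, multi-stage fallback second" control flow can be illustrated with a small sketch. Everything here is a toy model: Rewrite, Plan, PreAgg, and find_match are hypothetical, matching is a measure-name subset check, and the rule that every multi-stage member must match in the fallback is an assumption of the sketch, not something the review states.

```rust
/// Outcome of a rewrite attempt in this toy model.
#[derive(Debug, PartialEq)]
pub enum Rewrite {
    /// The whole query is served by one pre-aggregation.
    Simple(String),
    /// Each multi-stage member is served by its own pre-aggregation.
    MultiStage(Vec<String>),
}

pub struct Plan {
    pub measures: Vec<String>,
    /// Measures needed by each multi-stage member (empty if there are none).
    pub multistage_members: Vec<Vec<String>>,
}

pub struct PreAgg {
    pub name: String,
    pub measures: Vec<String>,
}

/// Returns the first pre-aggregation that covers all needed measures.
fn find_match<'a>(pre_aggs: &'a [PreAgg], needed: &[String]) -> Option<&'a PreAgg> {
    pre_aggs
        .iter()
        .find(|p| needed.iter().all(|m| p.measures.contains(m)))
}

pub fn try_rewrite_query(plan: &Plan, pre_aggs: &[PreAgg]) -> Option<Rewrite> {
    // Step 1: try the simple rewrite, even when multi-stage members exist.
    if let Some(p) = find_match(pre_aggs, &plan.measures) {
        return Some(Rewrite::Simple(p.name.clone()));
    }
    // Step 2: fall back to matching each multi-stage member separately.
    if plan.multistage_members.is_empty() {
        return None;
    }
    let mut sources = Vec::new();
    for member in &plan.multistage_members {
        // Assumption: one unmatched member aborts the fallback.
        sources.push(find_match(pre_aggs, member)?.name.clone());
    }
    Some(Rewrite::MultiStage(sources))
}
```

A plan that has multi-stage members but is fully covered by one pre-aggregation takes the Simple branch here. Under an upfront multi-stage check like the one the review attributes to the old code, that plan would have skipped the simple attempt.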

Testing

Strengths:

8 new integration tests covering bucketing scenarios: simple, multistage measures, complex dimensions, cross-cube references, joins, multi-dimension, and concatenated dimensions
Tests exercise both SQL generation (build_sql) and actual execution against Postgres (try_execute_pg)
Good seed data with diverse customer revenue patterns enabling meaningful assertions

Concerns:

No test for the pre-aggregation optimization path — The new PlanNode::AggregateMultipliedSubquery arm in the optimizer (lines 216-233) and try_rewrite_schema_and_filter are not directly tested. The bucketing tests validate the planning pipeline but don't exercise the pre-aggregation rewrite of multiplied subqueries. Consider adding a test that provides a pre-aggregation definition matching a multiplied measure subquery.
No negative/error tests — All tests are happy-path. Consider adding a test where pre-aggregation matching partially fails (some subqueries match, some don't) to verify that the rollback behavior (lines 243-250) works correctly.
Cumulative measure rejection — The new guard in measure_matcher.rs:34-38 that rejects cumulative measures from pre-aggregation matching is important but not directly tested.
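As a rough illustration of the kind of test these concerns ask for, the sketch below checks a partial match and a cumulative-measure rejection on toy types. Only the field name pre_aggregation_override comes from the review; RollupQuery, Rollup, MultipliedSubquery as written here, the subquery constructor, and apply_overrides are hypothetical. It does not model the rollback at lines 243-250, and a real test would go through the repository's own harness instead.

```rust
use std::rc::Rc;

/// Toy replacement query that reads from a rollup.
#[derive(Debug, PartialEq)]
pub struct RollupQuery {
    pub rollup: String,
}

/// Toy multiplied subquery with the override slot named in the review.
pub struct MultipliedSubquery {
    pub measure: String,
    pub is_cumulative: bool,
    pub pre_aggregation_override: Option<Rc<RollupQuery>>,
}

pub struct Rollup {
    pub name: String,
    pub measures: Vec<String>,
}

/// Convenience constructor for a subquery with no override yet.
pub fn subquery(measure: &str, is_cumulative: bool) -> MultipliedSubquery {
    MultipliedSubquery {
        measure: measure.to_string(),
        is_cumulative,
        pre_aggregation_override: None,
    }
}

/// Sets the override on each subquery a rollup covers; returns the match count.
pub fn apply_overrides(subqueries: &mut [MultipliedSubquery], rollups: &[Rollup]) -> usize {
    let mut matched = 0;
    for sq in subqueries.iter_mut() {
        // Guard: cumulative measures are never matched in this sketch.
        if sq.is_cumulative {
            continue;
        }
        if let Some(rollup) = rollups.iter().find(|r| r.measures.contains(&sq.measure)) {
            sq.pre_aggregation_override = Some(Rc::new(RollupQuery {
                rollup: rollup.name.clone(),
            }));
            matched += 1;
        }
    }
    matched
}

#[test]
fn partial_match_sets_only_matching_overrides() {
    let rollups = vec![Rollup {
        name: "orders_rollup".to_string(),
        measures: vec!["orders.count".to_string(), "orders.running_total".to_string()],
    }];
    let mut subqueries = vec![
        subquery("orders.count", false),        // covered by the rollup
        subquery("customers.revenue", false),   // not covered
        subquery("orders.running_total", true), // covered, but cumulative
    ];
    assert_eq!(apply_overrides(&mut subqueries, &rollups), 1);
    assert!(subqueries[0].pre_aggregation_override.is_some());
    assert!(subqueries[1].pre_aggregation_override.is_none());
    assert!(subqueries[2].pre_aggregation_override.is_none());
}
```

The third subquery is the interesting case: the rollup lists its measure, yet the cumulative guard keeps the override unset.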

Minor Notes

Pre-existing typo: only_addictive should be only_additive in measure_matcher.rs — not introduced by this PR.
modifers typo in Query builder (line 125, 161) — also pre-existing, not from this PR.
The regex update in pre-aggregation-utils.ts to handle WITH in addition to SELECT is correct and necessary since multiplied measure CTEs may now produce WITH ... SELECT patterns.

Verdict

This is a well-designed architectural refactoring that simplifies the codebase while enabling more granular pre-aggregation matching. The main actionable feedback is around code duplication between the two rewrite methods and the missing test coverage for the pre-aggregation optimization of multiplied subqueries. The rest is solid.

codecov · 2026-04-20T12:49:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.76%. Comparing base (a2e8f6a) to head (8a7bcd9).

Additional details and impacted files

@@                                   Coverage Diff                                   @@
##           tesseract-multi-stage-pre-aggregations-full-support   #10718      +/-   ##
=======================================================================================
- Coverage                                                78.79%   78.76%   -0.03%     
=======================================================================================
  Files                                                      465      465              
  Lines                                                    91999    91925      -74     
  Branches                                                  3374     3353      -21     
=======================================================================================
- Hits                                                     72488    72408      -80     
- Misses                                                   19020    19028       +8     
+ Partials                                                   491      489       -2

Flag            Coverage Δ
cube-backend    57.80% <ø> (-0.20%) ⬇️
cubesql         83.40% <ø> (-0.01%) ⬇️


waralexrom added 30 commits April 9, 2026 16:47

feat(tesseract): Support separate pre-aggregations for different mult…

4f80d07

…i-stage subqueries

in work

6b69875

in work

e7db7e3

in work

01ef908

in work

42cf36f

in work

f81988e

in work

2c07309

in work

2bf4b01

in work

93becb3

fmt

14ce271

fix

0f72f20

fix

f838708

fix

1f4cfe8

fmt

ef0e00f

lint

dfb32cc

lint

554c62d

fix

a759d69

fix

2b4fad1

fix

5b1e2f1

fix

9617e56

fix

e6f7e9a

fmt

8c6b01e

fix

a2e8f6a

feat(tesseract): Support separate pre-aggregations for different mult…

63aeb89

…i-fact subqueries

in work

f573aae

bucketing tests

3b5299e

in work

d8b0b20

in work

5471928

multiplied measures as subqueries

5582430

in work

b61c274

waralexrom added 3 commits April 20, 2026 10:20

in work

f30383b

in work

5cbf9af

in work

df64c07

waralexrom requested a review from a team as a code owner April 20, 2026 12:38

github-actions Bot added the rust and javascript labels Apr 20, 2026

vercel Bot deployed to Preview April 20, 2026 12:39 View deployment

waralexrom added 8 commits April 20, 2026 15:09

in work

2fb0e1c

in work

9ced439

in work

ddc5d3f

fmt

c010d63

in work

5e498c8

in work

047ec09

in work

cf5a0fb

in work

8a7bcd9

waralexrom force-pushed the tesseract-multi-stage-pre-aggregations-full-support branch from a884e18 to 64242fe Compare April 24, 2026 10:53

waralexrom requested review from a team as code owners April 24, 2026 10:53

waralexrom wants to merge 41 commits into tesseract-multi-stage-pre-aggregations-full-support from tesseract-multi-fact-separate-pre-aggregations
