Skip to content

Conversation

@godnight10061
Copy link
Contributor

Summary

  • Run ScanBuilder::prepare() for ScanBuilder::into_stream() on spawn_blocking instead of inside Stream::poll_next.
  • Add a regression test to ensure the first poll returns quickly.

Motivation

This addresses the ClickBench regression reported in #5899. prepare() can be expensive, and running it synchronously inside poll_next can cause head-of-line blocking when many scan streams are polled concurrently.

Testing

  • cargo +nightly fmt --all --check
  • cargo test -p vortex-scan --lib
  • cargo clippy --locked --all-features --all-targets -p vortex-scan -- -D warnings
  • cargo hack clippy -p vortex-scan --no-default-features -- -D warnings
  • cargo semver-checks --workspace (ran in a temporary worktree after setting workspace version to 0.58.0)
  • PYO3_PYTHON=C:\Python312\python.exe cargo test --locked --workspace --all-features --exclude vortex-duckdb --exclude duckdb-bench

Bench

Local DataFusion ClickBench (partitioned, q23, 5 iters):

  • datafusion:vortex-file-compressed: 39.405s → 27.661s
  • datafusion:vortex-compact: 57.573s → 35.945s

@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 70.05076% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.35%. Comparing base (d757be1) to head (5d5b799).
⚠️ Report is 11 commits behind head on develop.

Files with missing lines Patch % Lines
vortex-scan/src/scan_builder.rs 67.58% 59 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@connortsui20 connortsui20 added the changelog/performance A performance improvement label Jan 9, 2026
@connortsui20
Copy link
Contributor

connortsui20 commented Jan 9, 2026

Thanks for the PR!

I'm going to run our benchmarks on this branch so there will be a bunch of large comments made here by a bot that gives a rough idea where there are improvements/regressions, this is just a heads up!

Edit: huh something went wrong with that but at least they ran...

Edit again: There seems to be some sort of race condition in our CI. Because we have to approve the workflows on outside contributors' PRs, the actions bot doesn't take the benchmark-sql tag off and we don't get any summaries from the bot. The benchmarks do run though. Maybe @AdamGS understands this better?

@connortsui20 connortsui20 added action/benchmark-sql Trigger SQL benchmarks to run on this PR and removed action/benchmark-sql Trigger SQL benchmarks to run on this PR labels Jan 9, 2026
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
@godnight10061 godnight10061 force-pushed the fix/5899-clickbench-regression branch from 2f445e6 to 9bf6ce2 Compare January 10, 2026 01:10
@connortsui20 connortsui20 added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Jan 10, 2026
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
@connortsui20 connortsui20 removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Jan 10, 2026
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
@joseph-isaacs
Copy link
Contributor

Re the benchmark that is the expected behaviour. We cannot give a fork action write permission. We use the run summary to print out the result

@joseph-isaacs joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label Jan 12, 2026
@godnight10061
Copy link
Contributor Author

godnight10061 commented Jan 12, 2026

@connortsui20 @joseph-isaacs The two failing checks don't appear to be caused by this PR. Happy to provide additional context if needed.

@joseph-isaacs
Copy link
Contributor

@godnight10061 look there is not a merge conflict

I also see some regressions in clickbench: https://github.com/vortex-data/vortex/actions/runs/20914237836?pr=5906 and fineweb any ideas??

@godnight10061
Copy link
Contributor Author

I also see some regressions in clickbench: https://github.com/vortex-data/vortex/actions/runs/20914237836?pr=5906 and fineweb any ideas??

May you tell me which exact target line you're referring to?

@joseph-isaacs joseph-isaacs added action/benchmark Trigger full benchmarks to run on this PR and removed action/benchmark Trigger full benchmarks to run on this PR labels Jan 14, 2026
@joseph-isaacs
Copy link
Contributor

joseph-isaacs commented Jan 14, 2026

clickbench: clickbench_{q24,q26}/duckdb:vortex-file-compressed, I have just kicked off another run so we can see if these repeat (new: https://github.com/vortex-data/vortex/actions/runs/20988924098 previous: https://github.com/vortex-data/vortex/actions/runs/20914237836)

@joseph-isaacs joseph-isaacs added action/benchmark Trigger full benchmarks to run on this PR and removed action/benchmark Trigger full benchmarks to run on this PR labels Jan 14, 2026
@robert3005
Copy link
Contributor

clickbench_q24/duckdb:vortex-file-compressed 58105110 4.38643e+07 1.32465 ns 🚨
clickbench_q25/duckdb:vortex-file-compressed 72616639 6.3129e+07 1.15029 ns 🚨
clickbench_q26/duckdb:vortex-file-compressed 53189696 4.08331e+07 1.30261 ns 🚨

Interestingly doesn't affect datafusion

Signed-off-by: godnight10061 <godnight10061@users.noreply.github.com>
@godnight10061
Copy link
Contributor Author

Thanks for the report. I pushed a follow-up (5d5b799) that reduces blocking work in LazyScanStream, which I suspect is behind the DuckDB q24–q26 slowdown. I can’t run DuckDB ClickBench/FineWeb locally on Windows—could you rerun the benchmark workflows on CI? If it still regresses, I’ll investigate further.

@joseph-isaacs joseph-isaacs added action/benchmark Trigger full benchmarks to run on this PR and removed action/benchmark Trigger full benchmarks to run on this PR labels Jan 15, 2026
@AdamGS
Copy link
Contributor

AdamGS commented Jan 15, 2026

Results look pretty good to me, @joseph-isaacs WDYT? The regressions look within general noise, and there are a few improvements that look significant.

Copy link
Contributor

@AdamGS AdamGS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked offline, code looks great and so do the results. I'll merge this later today.

BTW - if you want to discuss vortex, we have a shared slack channel.

@AdamGS AdamGS merged commit 11c6bea into vortex-data:develop Jan 15, 2026
63 of 64 checks passed
@godnight10061 godnight10061 deleted the fix/5899-clickbench-regression branch January 15, 2026 14:13
@connortsui20
Copy link
Contributor

@godnight10061 Thanks again for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action/benchmark Trigger full benchmarks to run on this PR changelog/performance A performance improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants