perf(sort): use lookup table to speed up sequence conversion to bitpacked representation #1649
corneliusroemer wants to merge 20 commits into master
Conversation
Looking into sort performance (see #1647), I noticed that the various string containment operations were performance limiting. Instead of doing multiple string containment calls, this PR uses a compile-time lookup table to turn a sequence into a bitpacked representation. A quick benchmark on 240MB of SC2 sequences shows this reduces core sort algorithm runtime (excluding minimizer download) by 42%.
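The lookup-table idea described above can be sketched roughly as follows. This is a minimal illustration only: the 2-bit codes, the names (`LUT`, `pack`), and the invalid-character handling are assumptions for this sketch, not the PR's actual implementation.

```rust
// Minimal sketch of the lookup-table approach. The exact 2-bit codes, names,
// and invalid-character handling are illustrative assumptions, not the PR's
// actual code.

const INVALID: u8 = u8::MAX;

// 256-entry table built at compile time: ASCII byte -> 2-bit nucleotide code.
const LUT: [u8; 256] = {
    let mut t = [INVALID; 256];
    t[b'A' as usize] = 0b00;
    t[b'C' as usize] = 0b01;
    t[b'G' as usize] = 0b10;
    t[b'T' as usize] = 0b11;
    t
};

/// Pack up to 32 nucleotides into a u64, 2 bits per base.
/// A single table lookup per byte replaces repeated string-containment checks.
fn pack(seq: &[u8]) -> Option<u64> {
    let mut packed: u64 = 0;
    for &b in seq {
        let code = LUT[b as usize];
        if code == INVALID {
            return None; // ambiguous base, gap, etc.
        }
        packed = (packed << 2) | code as u64;
    }
    Some(packed)
}

fn main() {
    assert_eq!(pack(b"ACGT"), Some(0b00_01_10_11));
    assert_eq!(pack(b"AN"), None); // 'N' is not in the table
    println!("ok");
}
```

The point of the table is that classifying each byte becomes one array index instead of several substring searches, which is where the claimed speedup would come from.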
```rust
return cutoff + 1; // break out of loop, return hash above cutoff
}

// A=11=3, C=10=2, G=00=0, T=01=1
```
This comment was actually wrong - T and C should have been interchanged.
That's why the initial preview finds no matches (I used the comment rather than the code to implement the lookup table)
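To see why a table derived from the wrong comment yields zero matches, consider this toy comparison. The codes follow the comment above and its T/C-interchanged correction; everything else (names, structure) is illustrative, not the PR's code.

```rust
// Toy comparison, not the PR's code: the same sequence packed with the table
// implied by the (incorrect) comment vs. the T/C-interchanged values the
// review says the code actually uses.
fn pack_with(table: impl Fn(u8) -> u64, seq: &[u8]) -> u64 {
    seq.iter().fold(0u64, |acc, &b| (acc << 2) | table(b))
}

fn main() {
    // As written in the comment: A=3, C=2, G=0, T=1
    let per_comment = |b: u8| match b {
        b'A' => 3,
        b'C' => 2,
        b'G' => 0,
        _ => 1, // T
    };
    // With T and C interchanged: A=3, C=1, G=0, T=2
    let per_code = |b: u8| match b {
        b'A' => 3,
        b'C' => 1,
        b'G' => 0,
        _ => 2, // T
    };

    let seq = b"ACGT";
    // Any sequence containing C or T encodes differently under the two
    // tables, so hashes computed from one never match those from the other.
    assert_ne!(pack_with(per_comment, seq), pack_with(per_code, seq));
    println!("ok");
}
```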
I was requested to review this. Perf improvement sounds nice! (haven't checked numbers though)

I am having difficulties understanding the diff and commit messages. Looks like the work is ongoing? Agent went sentient? :) The PR goes well beyond the "bitpacked representation". Some replaced pieces seem unrelated and don't look equivalent. Will need some more explanation to understand.

It's unclear how to verify correctness. We don't have tests here, so we will have to "big brain" it, and I will need all the help available to do that. Ideally, I'd like bit-twiddling tricks to be isolated rather than spread all over the place, and if it's a well-known algo (most of them are), to be named explicitly.

If some kind of bit container is used, perhaps it could be made into a reusable class? Or perhaps an existing library can be used? (sparing us from edge cases, like all kinds of over-/under-flows, and from maintenance)

Oftentimes bits don't need to be moved directly. The compiler can usually transform well-written, human-readable code into bit hackery itself, usually even better than humans can. This looks like over-optimization and "not invented here" code (typical for LLM code). Though if performance gains are there and correctness is preserved, I would not mind. But I might be a bit grumpy :)

Will wait for some complete work and some comments before re-reviewing. Feel free to ping me or re-request a review when ready.
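The reviewer's suggestion of letting the compiler do the bit work could look something like the sketch below: a plain, readable mapping that modern compilers typically lower to a jump table or branchless arithmetic on their own. The codes and names here are illustrative, not a concrete proposal for this PR.

```rust
// Sketch of the reviewer's alternative: write the mapping as readable Rust
// and let the compiler choose the lowering (often a table or branchless
// arithmetic). Codes are illustrative, not the PR's encoding.
fn nuc_code(b: u8) -> Option<u8> {
    match b {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None, // ambiguous bases, gaps, etc.
    }
}

fn main() {
    // try_fold stops early at the first non-ACGT byte.
    let packed: Option<u64> = b"GATT"
        .iter()
        .try_fold(0u64, |acc, &b| nuc_code(b).map(|c| (acc << 2) | c as u64));
    assert_eq!(packed, Some(0b10_00_11_11));
    println!("{packed:?}");
}
```

Whether this optimizes as well as a hand-built table would need to be checked with a benchmark, which is presumably the reviewer's point about verifying the gains.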