Performance improvements by refactoring penalty and interleaveBytes #32

Merged
paulmillr merged 3 commits into paulmillr:main from Pjb518:main
Nov 16, 2025

@Pjb518
Contributor

Pjb518 commented Nov 11, 2025

Includes a significant refactor of penalty that is much more efficient.

It also makes interleaveBytes marginally more memory- and CPU-efficient by calculating the size of the resulting array and allocating it up front, which avoids dynamic resizing. The new implementation also avoids unnecessary bounds checks.
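
The preallocation idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the name `interleaveSketch` and the `Uint8Array[]` input shape are assumptions made for the example, and the sketch still keeps one length comparison per byte for blocks of unequal length.

```typescript
// Hypothetical sketch: interleave blocks byte by byte into a preallocated buffer.
// `interleaveSketch` is an illustrative name, not the library's function.
function interleaveSketch(blocks: Uint8Array[]): Uint8Array {
  // First pass: compute the exact output size and the longest block length.
  let total = 0;
  let maxLen = 0;
  for (const b of blocks) {
    total += b.length;
    if (b.length > maxLen) maxLen = b.length;
  }
  // Allocate once, so the output never has to grow while it is filled.
  const out = new Uint8Array(total);
  let pos = 0;
  // Second pass: take byte i from every block that still has one.
  for (let i = 0; i < maxLen; i++) {
    for (const b of blocks) {
      if (i < b.length) out[pos++] = b[i];
    }
  }
  return out;
}

console.log(Array.from(interleaveSketch([Uint8Array.of(1, 2), Uint8Array.of(3, 4, 5)])));
```

For the two sample blocks, the bytes come out in the order 1, 3, 2, 4, 5. Compared with pushing onto a growable array, the single allocation trades one extra pass over the block lengths for no resizing.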

Here are the benchmarks before the change:

======== encode/ascii ========
encode/paulmillr x 1,998 ops/sec @ 500μs/op
encode/qrcode-generator x 2,928 ops/sec @ 341μs/op ± 1.13% (313μs..3ms)
encode/nuintun x 1,923 ops/sec @ 519μs/op ± 4.14% (418μs..9ms)
======== encode/gif ========
encode/paulmillr x 2,006 ops/sec @ 498μs/op ± 4.44% (440μs..15ms)
encode/qrcode-generator x 1,927 ops/sec @ 518μs/op
encode/nuintun x 2,324 ops/sec @ 430μs/op
======== encode: big ========
encode/paulmillr x 124 ops/sec @ 8ms/op
encode/qrcode-generator x 131 ops/sec @ 7ms/op
encode/nuintun x 166 ops/sec @ 6ms/op
======== decode ========
decode/paulmillr x 106 ops/sec @ 9ms/op ± 3.26% (8ms..24ms)
decode/jsqr x 35 ops/sec @ 28ms/op ± 3.29% (26ms..43ms)
decode/nuintun x 34 ops/sec @ 28ms/op ± 6.43% (26ms..48ms)
decode/instascan x 80 ops/sec @ 12ms/op ± 35.89% (7ms..193ms)

And here are the benchmarks with the new version:

======== encode/ascii ========
encode/paulmillr x 3,430 ops/sec @ 291μs/op
encode/qrcode-generator x 2,995 ops/sec @ 333μs/op
encode/nuintun x 2,279 ops/sec @ 438μs/op
======== encode/gif ========
encode/paulmillr x 3,331 ops/sec @ 300μs/op
encode/qrcode-generator x 1,949 ops/sec @ 513μs/op
encode/nuintun x 2,295 ops/sec @ 435μs/op
======== encode: big ========
encode/paulmillr x 179 ops/sec @ 5ms/op
encode/qrcode-generator x 127 ops/sec @ 7ms/op ± 2.12% (7ms..17ms)
encode/nuintun x 158 ops/sec @ 6ms/op ± 1.67% (5ms..11ms)
======== decode ========
decode/paulmillr x 99 ops/sec @ 10ms/op ± 5.79% (8ms..29ms)
decode/jsqr x 34 ops/sec @ 28ms/op ± 4.48% (27ms..45ms)
decode/nuintun x 33 ops/sec @ 29ms/op ± 6.60% (26ms..49ms)
decode/instascan x 78 ops/sec @ 12ms/op ± 39.12% (7ms..211ms)
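
To put the two runs side by side, the relative change for the paulmillr rows can be computed from the ops/sec figures quoted above. This is a small arithmetic sketch. The helper name `speedup` is made up for the example, and it ignores the reported error margins.

```typescript
// ops/sec for the paulmillr rows, copied from the two benchmark runs above:
// [benchmark name, before, after]
const runs: Array<[string, number, number]> = [
  ["encode/ascii", 1998, 3430],
  ["encode/gif", 2006, 3331],
  ["encode: big", 124, 179],
  ["decode", 106, 99],
];

// Ratio of new throughput to old; values above 1 mean the new run was faster.
function speedup(before: number, after: number): number {
  return after / before;
}

for (const [name, before, after] of runs) {
  console.log(`${name}: ${speedup(before, after).toFixed(2)}x`);
}
```

With these inputs the ratios come out to about 1.72x, 1.66x, and 1.44x for the three encode benchmarks, and about 0.93x for decode.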

…lating the size of and creating the resulting array up front to avoid dynamic resizing.
…calculating the penalties for the various rules.
Pjb518 changed the title from "Minor performance tweaks for interleaveBytes" to "Performance improvements by refactoring penalty and interleaveBytes" on Nov 12, 2025
@paulmillr
Owner

interleaveBytes change is great.

Not sure about the loop unroll.

Why did decode get slower? It's much more important than encode. Is there any way to make it as fast as before?

@Pjb518
Contributor Author

Pjb518 commented Nov 16, 2025

> Why did decode get slower? It's much more important than encode. Is there any way to make it as fast as before?

I imagine it's just variance in the benchmark. None of these changes touch decoding.
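
One rough way to sanity-check the variance explanation is to compare the error bands of the two decode/paulmillr results. The sketch below assumes the ± percentage can be read as a symmetric relative error band around the mean time per operation. That interpretation, and the names `Sample`, `band`, and `bandsOverlap`, are assumptions for the example and not something the benchmark output states.

```typescript
// Hypothetical helper types and names, used only for this illustration.
interface Sample {
  opsPerSec: number; // throughput as printed by the benchmark
  relErrPct: number; // the "±" percentage, assumed to be a symmetric relative error
}

// Convert a sample to a [low, high] band in milliseconds per operation.
function band(s: Sample): [number, number] {
  const meanMs = 1000 / s.opsPerSec;
  const half = meanMs * (s.relErrPct / 100);
  return [meanMs - half, meanMs + half];
}

// Two bands overlap when each one starts before the other ends.
function bandsOverlap(a: Sample, b: Sample): boolean {
  const [aLo, aHi] = band(a);
  const [bLo, bHi] = band(b);
  return aLo <= bHi && bLo <= aHi;
}

// decode/paulmillr before and after, taken from the benchmark output above.
const decodeBefore: Sample = { opsPerSec: 106, relErrPct: 3.26 };
const decodeAfter: Sample = { opsPerSec: 99, relErrPct: 5.79 };
console.log(bandsOverlap(decodeBefore, decodeAfter)); // true
```

Under that assumption the bands are roughly 9.13..9.74 ms and 9.52..10.69 ms, which overlap. That is consistent with run-to-run noise, although it does not prove it.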

There are definitely ways to make it faster, I think significantly, but I'd have to actually try some options first.

> interleaveBytes change is great.
>
> Not sure about the loop unroll.

Almost all of the performance gains are from the changes to penalty, just FYI. The impact of the interleaveBytes changes is minimal.

paulmillr merged commit 80751d7 into paulmillr:main on Nov 16, 2025
6 checks passed
@paulmillr
Owner

Thank you.

@paulmillr
Owner

The PR is live in 0.5.3.
