Fix: CRC32 mismatch when decoding backtick dynEncode strings in Bun #16

Open
lopadz wants to merge 2 commits into eshaz:main from lopadz:fix/bun-null-byte-crc-mismatch

lopadz commented Feb 27, 2026

What's happening

wasm-audio-decoders embeds WASM binaries as dynamicEncode backtick strings and reads them with String.raw. In Bun, String.raw does two things that differ from Node.js:

  1. Every non-ASCII byte (128–255) gets expanded to \uXXXX form — already handled by the existing \u escape code in decode.
  2. Embedded null bytes (U+0000) become U+FFFD (replacement character) instead of \u0000.

The backtick encoder didn't escape null bytes (unlike " and ' modes, which both list 0 in escapeBytes). When the chosen offset maps a source byte to 0, the null ends up raw in the string. Bun substitutes U+FFFD, decode parses it as 65533, which truncates to the wrong byte, and CRC32 validation throws.
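
The truncation step described above can be checked in isolation. This is a minimal sketch, not code from the library: it only assumes that the parsed value ends up in a `Uint8Array`, which keeps the low 8 bits of whatever number is stored.

```javascript
// Illustration only; the variable names here are hypothetical.
const parsed = parseInt("fffd", 16); // code point of U+FFFD

// Typed arrays store values modulo 256, so only the low byte survives.
const out = new Uint8Array(1);
out[0] = parsed;

console.log(parsed); // 65533
console.log(out[0]); // 253 (0xfd), not the 0 that was originally encoded
```

A single wrong byte like this is enough to change the CRC32 of the decoded payload, which is consistent with the validation error reported in this PR.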

Changes

decode: map 0xFFFD back to 0 after parseInt in the \u handler. This is backward-compatible: it also fixes strings that were encoded by any older version of this library and contain raw nulls.

byte = parseInt(string.substring(i + 2, i + 6), 16);
if (byte === 0xfffd) byte = 0;
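
To show where those two lines sit, here is a simplified sketch of a `\u` handler. It is an assumption-laden stand-in rather than the library's actual `decode`: the function name `decodeUnicodeEscapes` is hypothetical, and it handles only literal `\uXXXX` sequences and plain single-byte characters.

```javascript
// Hypothetical, simplified decoder. Not the library's real decode().
function decodeUnicodeEscapes(string) {
  const bytes = [];
  for (let i = 0; i < string.length; i++) {
    let byte;
    if (string[i] === "\\" && string[i + 1] === "u") {
      // Parse the four hex digits that follow the literal "\u".
      byte = parseInt(string.substring(i + 2, i + 6), 16);
      // U+FFFD is what Bun reportedly substitutes for a raw null.
      if (byte === 0xfffd) byte = 0;
      i += 5; // skip the remainder of "\uXXXX"
    } else {
      byte = string.charCodeAt(i);
    }
    bytes.push(byte & 0xff);
  }
  return Uint8Array.from(bytes);
}

// The input contains literal backslashes, as String.raw would produce.
const decoded = decodeUnicodeEscapes("A\\ufffdB\\u00e9");
console.log(Array.from(decoded)); // [65, 0, 66, 233]
```

Because no byte value can legitimately parse to 0xFFFD (valid escapes are in the range 0 to 255), treating it as 0 should not collide with real data in this sketch.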

In Bun, String.raw replaces embedded null bytes with U+FFFD. Backtick
mode did not escape null, so bytes that mapped to 0 at the chosen offset
were embedded raw. On decode the \uFFFD sequence parsed as 65533, which
truncated to the wrong uint8 value and failed CRC32 validation.

Two changes:
- decode: after parseInt in the \u handler, map 0xFFFD back to 0
- dynamicEncode (backtick): add 0 to escapeBytes and shouldEscape so
  null is never embedded raw in newly-encoded strings

Three tests covering the Bun null-byte fix:
- decode maps \uFFFD to 0 in a version-0 dynEncode string
- dynamicEncode (backtick, offset 0) produces no raw null in payload
- all 256 byte values round-trip through backtick encode/decode

Also removes the hardcoded encoded-length assertions from the four
backtick file tests (image, opus, mpeg, vorbis) since escaping null
bytes changes the optimal offset and the resulting encoded size.
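
The encoder-side change and the 256-value round-trip test can be sketched with a toy codec. Everything below is hypothetical: `toyEncode`, `toyDecode`, and the contents of `ESCAPE_BYTES` are illustrative, the names `escapeBytes` and `shouldEscape` are borrowed from the description above only as labels, and the real dynEncode format (offsets, escape character, header) is not reproduced.

```javascript
// Toy codec for illustration only; the library's real format differs.
// Bytes that are unsafe inside a backtick template literal are escaped.
const ESCAPE_BYTES = new Set([
  0,  // null: the byte this PR adds to the backtick escape list
  36, // "$"
  92, // "\"
  96, // "`"
]);

const shouldEscape = (byte) => ESCAPE_BYTES.has(byte) || byte > 127;

function toyEncode(bytes) {
  let out = "";
  for (const byte of bytes) {
    out += shouldEscape(byte)
      ? "\\u" + byte.toString(16).padStart(4, "0")
      : String.fromCharCode(byte);
  }
  return out;
}

function toyDecode(string) {
  const bytes = [];
  for (let i = 0; i < string.length; i++) {
    let byte;
    if (string[i] === "\\") {
      // Every backslash in the toy format starts a "\uXXXX" escape.
      byte = parseInt(string.substring(i + 2, i + 6), 16);
      if (byte === 0xfffd) byte = 0; // decode-side fix from this PR
      i += 5;
    } else {
      byte = string.charCodeAt(i);
    }
    bytes.push(byte);
  }
  return Uint8Array.from(bytes);
}

// Round-trip every possible byte value.
const allBytes = Uint8Array.from({ length: 256 }, (_, i) => i);
const encoded = toyEncode(allBytes);
const roundTripped = toyDecode(encoded);

console.log(encoded.includes("\0")); // false
console.log(roundTripped.every((b, i) => b === allBytes[i])); // true
```

Escaping null on the encode side means newly encoded strings never depend on how a runtime treats a raw U+0000, while the decode-side mapping covers strings that were produced before the change.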
Owner

eshaz commented Feb 28, 2026

I would put in an issue to the developers of Bun. Their website states they aim for 100% NodeJS compatibility, so if this issue is real, then they would probably want to fix it. Have you tested this change to verify it solves your issue?

What was the actual issue you saw with wasm-audio-decoders (which is another project I maintain) that led to this LLM response? Also, it's more helpful to ask the question directly rather than posting an LLM response.

Author

lopadz commented Feb 28, 2026

@eshaz you're right, man. I should've opened an issue first--my b! I figured updating the fix here would be faster than reporting to Bun (reported here).

For context, I'm building a sort of media library app using Electrobun. In this app, I wanted to decode audio files to raw PCM samples for some analysis features I'm trying to implement.

The problem I have is that, when using mpg123-decoder, @wasm-audio-decoders/flac, and @wasm-audio-decoders/ogg-vorbis, all 3 throw "Decode failed crc32 validation" on startup. The decoders never load.

After some digging (with AI), I found that Bun's String.raw does 2 things differently from Node.js when reading template literals. And since the wasm decoders use decode() from simple-yenc at runtime, I thought a quick fix would be helpful.

The last thing I want to do is upset you. So, however you would like to move forward, I'm game.

Thanks for your hard work!

Owner

eshaz commented Feb 28, 2026

Definitely not upset and happy that issues get reported. Did this change fix your issue though? I'm curious whether you confirmed that it fixed the problem for all three decoders.

Author

lopadz commented Feb 28, 2026

Yep! I verified with real audio files. OGG Vorbis, FLAC, and MP3 all decode and analyze correctly.
