Fix: CRC32 mismatch when decoding backtick dynEncode strings in Bun #16
lopadz wants to merge 2 commits into eshaz:main
In Bun, String.raw replaces embedded null bytes with U+FFFD. Backtick mode did not escape null, so bytes that mapped to 0 at the chosen offset were embedded raw. On decode, the \uFFFD sequence parsed as 65533, which truncated to the wrong uint8 value and failed CRC32 validation.

Two changes:
- decode: after parseInt in the \u handler, map 0xFFFD back to 0
- dynamicEncode (backtick): add 0 to escapeBytes and shouldEscape so null is never embedded raw in newly-encoded strings
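The truncation described above can be reproduced in isolation. The following is a minimal sketch, not the library's actual decode code: `byteFromUnicodeEscape` is a hypothetical helper name, and the only behavior assumed is that the `\u` handler parses four hex digits with `parseInt` and stores the result in a `Uint8Array`.

```javascript
// Hypothetical helper for illustration; not the library's actual decode code.
function byteFromUnicodeEscape(hexDigits) {
  const value = parseInt(hexDigits, 16);
  // U+FFFD is what Bun leaves in place of a raw null, so treat it as byte 0.
  return value === 0xfffd ? 0 : value;
}

// Without the mapping: a Uint8Array keeps only the low 8 bits of 65533.
const truncated = new Uint8Array([parseInt("FFFD", 16)]);
console.log(truncated[0]); // 253 (0xFD), not the 0 that was encoded

// With the mapping: the original null byte is recovered.
const fixed = new Uint8Array([byteFromUnicodeEscape("FFFD")]);
console.log(fixed[0]); // 0
```

A single wrong byte like this is enough to change the checksum of the decoded payload, which is consistent with the CRC32 failure reported here.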
Three tests covering the Bun null-byte fix:
- decode maps \uFFFD to 0 in a version-0 dynEncode string
- dynamicEncode (backtick, offset 0) produces no raw null in payload
- all 256 byte values round-trip through backtick encode/decode

Also removes the hardcoded encoded-length assertions from the four backtick file tests (image, opus, mpeg, vorbis), since escaping null bytes changes the optimal offset and the resulting encoded size.
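The shape of the second and third checks can be sketched with a toy codec. Everything below is an assumption made for illustration: `toyEncode`, `toyDecode`, the `=` escape marker, and the `+64` escape shift are invented here and are not the library's real encoding or API. The sketch only shows why listing `0` among the escaped bytes keeps a raw null out of the payload while all 256 byte values still round-trip.

```javascript
// Toy offset codec for illustration only; NOT the library's real format.
const ESCAPE = 0x3d; // "=" introduces an escaped byte (assumed marker)
const escapeBytes = new Set([0, ESCAPE]); // 0 is listed so null is never raw

function toyEncode(bytes, offset) {
  let out = "";
  for (const b of bytes) {
    const shifted = (b + offset) % 256;
    out += escapeBytes.has(shifted)
      ? String.fromCharCode(ESCAPE, (shifted + 64) % 256) // escaped form
      : String.fromCharCode(shifted); // embedded as-is
  }
  return out;
}

function toyDecode(str, offset) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    let code = str.charCodeAt(i);
    // An escape marker means the next character carries the shifted byte.
    if (code === ESCAPE) code = (str.charCodeAt(++i) - 64 + 256) % 256;
    out.push((code - offset + 256) % 256);
  }
  return Uint8Array.from(out);
}

const allBytes = Uint8Array.from({ length: 256 }, (_, i) => i);
const encoded = toyEncode(allBytes, 0);
const decoded = toyDecode(encoded, 0);

console.log(encoded.includes("\0")); // false
console.log(decoded.every((b, i) => b === allBytes[i])); // true
```

Escaping an extra byte value lengthens the output for some offsets, which is the same reason given above for dropping the hardcoded encoded-length assertions.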
I would put in an issue to the developers of Bun. Their website states they aim for 100% Node.js compatibility, so if this issue is real, then they would probably want to fix it. Have you tested this change to verify it solves your issue? What was the actual issue you saw with wasm-audio-decoders (which is another project I maintain) that led to this LLM response? Also, it's more helpful to ask the question directly rather than posting an LLM response.
@eshaz you're right, man. I should've opened an issue first--my b! I figured updating the fix here would be faster than reporting to Bun (reported here). For context, I'm building a sort of media library app using Electrobun. In this app, I wanted to decode audio files to raw PCM samples for some analysis features I'm trying to implement. The problem I have is that, when using … After some digging (with AI), I found that Bun's … The last thing I want to do is upset you. So, however you would like to move forward, I'm game. Thanks for your hard work!
Definitely not upset and happy that issues get reported. Did this change fix your issue though? I'm curious if you confirmed that fixed the problem for all three decoders.
Yep! I verified with real audio files. OGG Vorbis, FLAC, and MP3 all decode and analyze correctly |

What's happening
`wasm-audio-decoders` embeds WASM binaries as `dynamicEncode` backtick strings and reads them with `String.raw`. In Bun, `String.raw` does two things that differ from Node.js:
- … `\uXXXX` form (already handled by the existing `\u` escape code in `decode`).
- … `\u0000`.

The backtick encoder didn't escape null bytes (unlike `"` and `'` modes, which both list `0` in `escapeBytes`). When the chosen offset maps a source byte to `0`, the null ends up raw in the string. Bun substitutes U+FFFD, `decode` parses it as `65533`, which truncates to the wrong byte, and CRC32 validation throws.

Changes
- `decode`: map `0xFFFD` back to `0` after `parseInt` in the `\u` handler. Backward-compatible: fixes strings encoded by any older version of this library that contain raw nulls.