The msgpack length field decoding at src/ucl_msgpack.c:964-970 reads uint16_t/uint32_t/uint64_t through a cast on a potentially unaligned pointer. This is UB per the C standard and causes SIGBUS on strict-alignment platforms (ARM, SPARC). The int/float parsers later in the same file already do this correctly via memcpy; the length-parsing path just wasn't updated to match. Affects 0.9.4 and earlier.
Description
The length fields are read with direct casts:
// ucl_msgpack.c:964-970
case 2:
len = FROM_BE16(*(uint16_t *) p); // unaligned
break;
case 4:
len = FROM_BE32(*(uint32_t *) p); // unaligned
break;
case 8:
len = FROM_BE64(*(uint64_t *) p); // unaligned
break;
The pointer p points to the byte immediately after the type byte in the msgpack stream. Since msgpack is a byte-packed format, p has no alignment guarantee.
Notably, the integer/float value parsers in the same file already use the correct pattern:
// ucl_msgpack.c:1402 (correct)
memcpy(&iv16, pos, sizeof(iv16));
iv16 = FROM_BE16(iv16);
Reproduction
UBSAN detects this on msgpack input where a str16, str32, bin16, bin32, array16, array32, map16, or map32 length field falls at an unaligned offset. This requires a preceding entry to push the length field to an odd byte boundary.
Minimal reproducer (str16 key at unaligned offset):
#include <ucl.h>
int main(void) {
/* msgpack: fixmap(2){
* fixstr(1)"a" : fixint(1), -- 3 bytes, pushes offset to odd
* str16(3)"key" : fixint(2) -- str16 length at byte offset 5 (odd!)
* }
* Layout: 82 a1 61 01 da 00 03 6b 65 79 02
* ^^ ^^ ^^-- length field at offset 5 (unaligned) */
unsigned char data[] = {
0x82, /* fixmap, 2 entries */
0xa1, 0x61, /* fixstr(1) "a" */
0x01, /* fixint 1 */
0xda, 0x00, 0x03, /* str16, length 3 (length bytes at offset 5-6) */
0x6b, 0x65, 0x79, /* "key" */
0x02 /* fixint 2 */
};
struct ucl_parser *p = ucl_parser_new(0);
ucl_parser_add_chunk_full(p, data, sizeof(data), 0,
UCL_DUPLICATE_APPEND, UCL_PARSE_MSGPACK);
ucl_object_t *obj = ucl_parser_get_object(p);
if (obj) ucl_object_unref(obj);
ucl_parser_free(p);
return 0;
}
Build with UBSAN (requires both library and test compiled with -fsanitize=alignment):
cmake -B build -DCMAKE_C_COMPILER=clang \
-DCMAKE_C_FLAGS="-fsanitize=alignment -fno-sanitize-recover=all -g"
cmake --build build
clang -fsanitize=alignment -fno-sanitize-recover=all -g -I include \
-o repro repro.c build/libucl.a
./repro
# runtime error: load of misaligned address 0x... for type 'uint16_t',
# which requires 2 byte alignment
# ucl_msgpack.c:964
Suggested Fix
Replace the pointer casts with memcpy, matching the pattern already used in the int/float parsers:
case 2: {
uint16_t tmp;
memcpy(&tmp, p, sizeof(tmp));
len = FROM_BE16(tmp);
break;
}
// Similarly for case 4 and case 8
Happy to send a PR if that helps.
The msgpack length field decoding at
src/ucl_msgpack.c:964-970readsuint16_t/uint32_t/uint64_tthrough a cast on a potentially unaligned pointer. This is UB per the C standard and causes SIGBUS on strict-alignment platforms (ARM, SPARC). The int/float parsers later in the same file already do this correctly viamemcpy; the length-parsing path just wasn't updated to match. Affects 0.9.4 and earlier.Description
The length fields are read with direct casts:
The pointer
ppoints to the byte immediately after the type byte in the msgpack stream. Since msgpack is a byte-packed format,phas no alignment guarantee.Notably, the integer/float value parsers in the same file already use the correct pattern:
Reproduction
UBSAN detects this on msgpack input where a
str16,str32,bin16,bin32,array16,array32,map16, ormap32length field falls at an unaligned offset. This requires a preceding entry to push the length field to an odd byte boundary.Minimal reproducer (str16 key at unaligned offset):
Build with UBSAN (requires both library and test compiled with
-fsanitize=alignment):Suggested Fix
Replace the pointer casts with
memcpy, matching the pattern already used in the int/float parsers:Happy to send a PR if that helps.