zfs send: PANIC - VERIFY3U(first_change, >, ranges[i]->start_blkid) failed (1 > 4) #18365
Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Fedora |
| Distribution Version | 43 |
| Kernel Version | 6.19.8-200.fc43.x86_64 |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.4.99-470_g3f3cadc52 |
Describe the problem you're observing
While experimenting with the zfs redact feature on my production machine (zfs 2.4.1), I ran into a kernel NULL pointer dereference. The zfs module then hung, and the server needed a power reset.
I retried locally with the latest commit and --enable-debug and came up with this reproducer. I fear it is not minimal, but I hope it is good enough as is. (I am not actually convinced it has anything to do with the zfs redact feature at all, but these are essentially the commands that I slow-typed when I originally encountered the panic.)
Describe how to reproduce the problem
repro () {
for i in $(seq 100); do
echo "turn $i"
fallocate -l 100M /tmp/test
zpool create -O mountpoint=none test /tmp/test
zfs create -o mountpoint=/mnt/test test/ROOT
zfs create test/ROOT/data
echo a > /mnt/test/data/a
zfs snapshot test/ROOT/data@a
echo b > /mnt/test/data/b
zfs snapshot test/ROOT/data@ab
echo c > /mnt/test/data/c
zfs snapshot test/ROOT/data@abc
rm -f /mnt/test/data/b
zfs snapshot test/ROOT/data@ac
zfs clone test/ROOT/data@ab test/ROOT/data-ab
rm -f /mnt/test/data-ab/b
zfs snapshot test/ROOT/data-ab@ab-rm-b
zfs redact test/ROOT/data@ab ab-redact-b test/ROOT/data-ab@ab-rm-b
(
set -x
zfs send test/ROOT/data@ac --redact test/ROOT/data#ab-redact-b -i test/ROOT/data@abc > /dev/null
)
zpool destroy test
rm -f /tmp/test
done
}
repro
I honest to god swear I did not add the loop for my amusement: on my machine it usually triggers on turn 2 or 3-ish. I once saw it on the very first turn, and never later than turn 9.
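Because the panic does not hit on every turn, it can help to check the kernel log mechanically rather than by eye. The following is a minimal sketch, not part of the original reproducer: `has_zfs_panic` is a hypothetical helper name, and its patterns are derived only from the two panic lines in the dmesg output of this report, so other assertion formats may not match.

```shell
# Hypothetical helper (not part of ZFS): reads log text on stdin and
# succeeds if it contains an SPL assertion failure or a "PANIC at" line.
# The patterns are assumptions based on the dmesg output in this report.
has_zfs_panic() {
    grep -Eq 'VERIFY[0-9A-Z]*\(.*\) failed|PANIC at [a-z_]+\.c:[0-9]+'
}

# Usage sketch (assumes the caller is allowed to read the kernel log):
#   dmesg | has_zfs_panic && echo "ZFS panic found in kernel log"
```

Since the panicking thread may leave `zfs send` blocked, this check is probably most useful when run from a second shell while the loop is going.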
Include any warning/errors/backtraces from the system logs
dmesg from local test (g3f3cadc52 with --enable-debug)
ZFS: Loaded module v2.4.99-470_g3f3cadc52 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
VERIFY3U(first_change, >, ranges[i]->start_blkid) failed (1 > 4)
PANIC at dmu_send.c:1446:find_next_range()
Showing stack for process 45908
CPU: 0 UID: 0 PID: 45908 Comm: send_merge Tainted: P OE 6.19.8-200.fc43.x86_64 #1 PREEMPT(lazy)
Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
spl_panic+0xf5/0x11a [spl]
? update_load_avg+0x84/0x400
? bqueue_dequeue+0x27/0x290 [zfs]
find_next_range+0x302/0x3a0 [zfs]
? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
send_merge_thread+0x2b8/0x440 [zfs]
? __pfx_send_merge_thread+0x10/0x10 [zfs]
? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
thread_generic_wrapper+0x67/0xb0 [spl]
kthread+0xfc/0x240
? __pfx_kthread+0x10/0x10
ret_from_fork+0x130/0x1a0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
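To look up the failing assertion in the source tree, the file, line, and function can be pulled out of the `PANIC at` line. This is only a sketch: `panic_location` is a hypothetical helper name, and it assumes the `PANIC at file.c:line:function()` layout seen in the trace above.

```shell
# Hypothetical helper: reads log text on stdin and prints
# "file line function" for each "PANIC at file.c:line:function()" line.
# Lines that do not match this layout are silently skipped.
panic_location() {
    sed -nE 's/.*PANIC at ([^:]+):([0-9]+):([A-Za-z0-9_]+)\(\).*/\1 \2 \3/p'
}

# Usage sketch (assumes the caller is allowed to read the kernel log):
#   dmesg | panic_location
```

For the trace above this yields `dmu_send.c`, `1446`, and `find_next_range`. The line number refers to the source at commit g3f3cadc52 and will likely differ in other builds.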
dmesg from the server event (2.4.1 production build)
BUG: kernel NULL pointer dereference, address: 0000000000000068
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 6 UID: 0 PID: 638 Comm: dp_sync_taskq Tainted: P OE 6.18.16-1-lts #1 PREEMPT(voluntary) 674a1aec569c3e6c68f0032d072385577b88e9e1
Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: MSI MS-7823/B85M-G43 (MS-7823), BIOS V3.14B7 07/16/2018
RIP: 0010:dnode_undirty_dbufs+0x15c/0x1c0 [zfs]
Code: 51 ff 31 d2 4c 89 fe 4c 89 ef e8 9f 39 fc ff 49 8b 46 08 48 3b 04 24 74 37 49 8b 5e 08 49 2b 1e 74 2e 4c 8b 6b 20 4c 8b 7b 10 <41> 80 7d 68 00 0f 84 dd fe ff ff 48 8d bb 00 01 00 00 e8 8d fe ff
RSP: 0018:ffffcc7e42aa3c38 EFLAGS: 00010286
RAX: ffff8965c752e800 RBX: ffff8965c752e800 RCX: 000000008020001b
RDX: 0000000000000000 RSI: ffff896ab1c9ca80 RDI: ffff896ab1c9ca78
RBP: dead000000000122 R08: ffff8965dba8e800 R09: 0000000000000000
R10: ffff8965dba8e800 R11: ffff896400042b00 R12: dead000000000100
R13: 0000000000000000 R14: ffff896ab1c9ca78 R15: 00000000081619a8
FS: 0000000000000000(0000) GS:ffff896b70c6e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000068 CR3: 0000000418c24005 CR4: 00000000001726f0
Call Trace:
<TASK>
dnode_sync+0x98e/0xb00 [zfs 82a72a3d44564e4aac8b4f85a0626683232f3f9c]
? __schedule+0x420/0x1320
? dnode_multilist_index_func+0x29/0x40 [zfs 82a72a3d44564e4aac8b4f85a0626683232f3f9c]
sync_dnodes_task+0x96/0x190 [zfs 82a72a3d44564e4aac8b4f85a0626683232f3f9c]
taskq_thread+0x390/0x770 [spl d7da2364c458657400ba69334a999c60595153fd]
? __pfx_taskq_thread+0x10/0x10 [spl d7da2364c458657400ba69334a999c60595153fd]
? __pfx_default_wake_function+0x10/0x10
? __pfx_sync_meta_dnode_task+0x10/0x10 [zfs 82a72a3d44564e4aac8b4f85a0626683232f3f9c]
? __pfx_taskq_thread+0x10/0x10 [spl d7da2364c458657400ba69334a999c60595153fd]
kthread+0xfc/0x240
? __pfx_kthread+0x10/0x10
ret_from_fork+0x1c2/0x1f0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: tun overlay wireguard libcurve25519 ip6_udp_tunnel udp_tunnel intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel raid1 kvm irqbypass r8169 polyval_clmulni mei_wdt ghash_clmulni_intel realtek mei_pxp mei_hdcp ppdev intel_oc_wdt spi_nor aesni_intel at24 md_mod iTCO_wdt mdio_devres mtd rapl intel_pmc_bxt libphy mei_me iTCO_vendor_support intel_cstate mdio_bus i2c_i801 mei i2c_smbus i2c_mux parport_pc intel_uncore mxm_wmi pcspkr parport mac_hid cfg80211 rfkill nfnetlink bpf_preload zfs(POE) i915 spi_intel_platform spi_intel spl(OE) i2c_algo_bit drm_buddy ttm intel_gtt drm_display_helper video wmi cec lpc_ich
CR2: 0000000000000068
---[ end trace 0000000000000000 ]---