fix segfault when exitting cage with a child still present by sdumetz · Pull Request #488 · cage-kiosk/cage

sdumetz · 2026-03-18T12:33:41Z

I encountered a segfault when running a program with bad cleanup handling, i.e. it spawned child processes that owned the (Xwayland-based) window and, on SIGINT, exited before the child had time to exit properly. I hacked together a bash script that reproduces the behaviour:

trigger SIGSEGV with xclock - bash script

#!/usr/bin/env bash
# Minimal reproducer for cage XWayland surface destroy crash.
#
# This script sends SIGTERM only to cage, leaving xclock alive so the
# surface is still present when wlr_xwayland_destroy() is called.
#
# Usage:
#   ./reproduce_crash.sh [path-to-cage]
#
# Expected result with unfixed cage:  exit code 139 (SIGSEGV)
# Expected result with fixed cage:    exit code 0

set -euo pipefail

CAGE=${1:-./build/cage}

if ! command -v xclock &>/dev/null; then
    echo "ERROR: xclock not found. Install with: sudo apt install x11-apps" >&2
    exit 1
fi

if [[ ! -x "$CAGE" ]]; then
    echo "ERROR: cage binary not found at $CAGE" >&2
    exit 1
fi

echo "Starting cage with xclock..."
"$CAGE" -- xclock &
CAGE_PID=$!

echo "Waiting for xclock to map its window..."
sleep 3

echo "Sending SIGTERM to cage (pid $CAGE_PID) only, xclock stays alive..."
kill -TERM "$CAGE_PID" 2>/dev/null || true

# Capture the status without tripping `set -e` when cage exits non-zero.
EXIT=0
wait "$CAGE_PID" || EXIT=$?

if [[ $EXIT -eq 139 ]]; then
    echo "CRASHED (exit 139, SIGSEGV) — bug reproduced"
elif [[ $EXIT -eq 0 ]]; then
    echo "Clean exit (exit 0) — bug not present or already fixed"
else
    echo "Exited with code $EXIT"
fi
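The value 139 tested above comes from the shell convention that `wait` reports 128 plus the signal number when a process is killed by a signal, and SIGSEGV is signal 11 on Linux. A minimal sketch of decoding such a status follows; the helper name `signal_from_status` is hypothetical and not part of the PR, and a program can also return a value above 128 on its own, so the result is only a hint.

```shell
# Hypothetical helper (not part of the PR): decode a status returned by `wait`.
# A status above 128 usually means the process died from signal (status - 128).
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # `kill -l N` prints the name of signal number N (11 -> SEGV).
        kill -l "$((status - 128))" 2>/dev/null || echo "unknown"
    else
        echo "none"
    fi
}
```

Calling `signal_from_status "$EXIT"` after the `wait` would print `SEGV` for the crash case and `none` for a clean exit.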

This is not a high-profile bug: a program that handles signals properly wouldn't trigger it, and since it only happens on exit it is a minor inconvenience.

But I suppose it can't hurt to make destroy handlers a bit more robust? Anyway, I hope this helps.

The proposed fix registers a handler that sets scene_tree to NULL once wlroots destroys the surface. cage's destroy handler is then made conditional so that it never dereferences the destroyed tree node.

Please let me know if I misunderstood something or if the fix is otherwise problematic: I'm still new to wlroots.

sdumetz · 2026-03-18T12:45:36Z

Note that #484 looks like it's addressing the same thing, with part of the same fix, but it's not sufficient to fix my exact issue.

emersion · 2026-03-19T19:31:21Z

+	view->surface_destroy.notify = handle_surface_destroy;
+	wl_signal_add(&surface->events.destroy, &view->surface_destroy);


I'm not sure I understand. wlr_surface.events.unmap is always guaranteed to be called before destroy, so I don't know how view->scene_tree can free itself before we call wlr_scene_node_destroy().

sdumetz · 2026-03-20

It's entirely possible that I got things wrong. I'm not at all confident in my knowledge of the wlroots or cage codebases.

What I know is that in the described test, view_unmap does raise a SIGSEGV (at least, _on my machine_™).

Adding log lines to this patch in view_unmap and handle_surface_destroy, I see (when running with the detached child process):

00:00:02.989 [../view.c:118] surface destroy
00:00:02.989 [../view.c:137] view_unmap: scene_tree is NULL

If I understand correctly, this means that either there is a bug elsewhere (in wlroots?) or unmap is not guaranteed to run before destroy.

emersion · 2026-03-29

When the surface object is destroyed, this function gets called in wlroots:

https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/334019f839bf0728d958c179aceed67e0e8db66a/types/wlr_compositor.c#L725

The first thing it does is ensure the surface is unmapped:

https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/334019f839bf0728d958c179aceed67e0e8db66a/types/wlr_compositor.c#L934

Are we sure a single surface is involved?

The script linked above doesn't reproduce the bug on my setup. cage doesn't crash, it exits cleanly.

sdumetz · 2026-04-03

> The script linked above doesn't reproduce the bug on my setup. cage doesn't crash, it exits cleanly.

Oh, that's strange. I didn't think it would be setup-dependent: I saw the issue in the wild on a brand new amd64 mini-PC and on a Raspberry Pi running cage over a DRM backend (admittedly a modified version with a few patches). I was also able to reproduce it with this script on my machine (Debian/KDE, Wayland backend) using the master branch and default options.

The attached script seems to segfault 100% of the time, too.
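Since the crash appears on one setup and not on another, it may help to measure how often it happens rather than rely on a single run. The sketch below is a hypothetical helper (`count_crashes` is not part of the PR) that runs a command several times and counts exits with status 139. It assumes the wrapped command exits with cage's own status.

```shell
# Hypothetical helper: run a command N times, print how many runs ended
# with status 139 (128 + SIGSEGV). The body runs in a subshell so the
# loop variables do not leak into the caller.
count_crashes() (
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
)
```

For example, `count_crashes 10 some-wrapper` would print the number of crashing runs out of ten, where `some-wrapper` stands for any command that starts cage, signals it, and propagates its exit status.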

When compiled in debug mode, the logs' tail looks like this:

00:00:00.202 [types/wlr_compositor.c:786] New wlr_surface 0x556959ee6d10 (res 0x556959d49bb0)
00:00:00.202 [xwayland/xwm.c:2055] New xwayland surface: 0x556959ee6d10
00:00:00.202 [xwayland/xwm.c:695] XCB_ATOM_NET_STARTUP_ID: (null)
00:00:00.203 [xwayland/xwm.c:1105] unhandled X11 property 283 (WM_STATE) for window 4194314
00:00:00.203 [backend/wayland/output.c:433] Primary buffer size mismatch
00:00:00.215 [types/scene/wlr_scene.c:2189] Direct scan-out enabled
Sending SIGTERM to cage (pid 65809) only, xclock stays alive...
(EE) failed to read Wayland events: Connection reset by peer
./reproduce_crash.sh: line 46: 65809 Segmentation fault (core dumped) "$CAGE" -- xclock
X connection to :1 broken (explicit kill or server shutdown).

Pretty much everything here looks OK to me, except for the core dump. Could it be an Xwayland issue rather than a cage / wlroots issue?

fix segfault when exitting cage with a child still present

e32200c

emersion reviewed Mar 19, 2026

sdumetz wants to merge 1 commit into cage-kiosk:master from Holusion:child_segfault


Review thread: emersion (Mar 19, 2026), sdumetz (Mar 20, 2026), emersion (Mar 29, 2026), sdumetz (Apr 3, 2026, edited)
