
core: monitor replug workaround for nvidia #845

Merged
PointerDilemma merged 1 commit into hyprwm:main from PointerDilemma:main
Sep 4, 2025

@PointerDilemma
Collaborator

PointerDilemma commented Aug 5, 2025

So I have been investigating #793 (which has a confusing title, but I think all reports are caused by monitor re-plugging). Edit: wrong issue. I wanted to link this one: #695. They might be the same, though. I think both probably have nothing to do with suspend per se.

I was able to reproduce the issue on this system:

Hyprland 0.50.0 built from branch main at commit 2859f1b795e1e772e9fc2132708ae03cd23ca39b  (keybinds: use the triggering keyboard for repeat timings (11309)).
Date: Tue Aug 5 15:54:55 2025
Tag: v0.50.0-68-g2859f1b7, commits: 6347
built against:
 aquamarine 0.9.2
 hyprlang 0.6.3
 hyprutils 0.8.2
 hyprcursor 0.1.13
 hyprgraphics 0.1.5


no flags were set


System Information:
System name: Linux
Node name: desktop
Release: 6.15.8-200.fc42.x86_64
Version: #1 SMP PREEMPT_DYNAMIC Thu Jul 24 13:26:52 UTC 2025


GPU information: 
0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1) (prog-if 00 [VGA controller])
NVRM version: NVIDIA UNIX x86_64 Kernel Module  575.64.05  Fri Jul 18 16:01:21 UTC 2025


os-release: NAME="Fedora Linux"
VERSION="42 (Workstation Edition)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=42
VERSION_CODENAME=""
PLATFORM_ID="platform:f42"
PRETTY_NAME="Fedora Linux 42 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:42"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f42/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=42
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=42
SUPPORT_END=2026-05-13
VARIANT="Workstation Edition"
VARIANT_ID=workstation


plugins:

Explicit sync: supported
GL ver: 3.2
Backend: drm

Monitor info:
	Panel DP-2: 3840x2160, DP-2 Samsung Electric Company U28E850 HTPK600065 -> backend drm
		explicit ✔️
		edid:
			hdr ❌
			chroma ✔️
			bt2020 ❌
		vrr capable ❌
		non-desktop ❌
		

I can get back to it tomorrow and hopefully I will find a fix for the issue.

Somehow we are failing to create and render to a new session lock surface after the previous one has been destroyed.

The issue does not occur with swaylock, and it is reproducible under both Hyprland and Sway.

@PointerDilemma
Collaborator Author

The session lock surface never gets mapped.

We do successfully create a new EGLSurface and then call eglSwapBuffers after the reconnect, but the Wayland debug log from hyprlock shows no wl_surface.attach or wl_surface.commit.

So it is some EGL-related issue. I tried to recreate the entire EGL context on monitor disconnect, but that also didn't solve it.

@PointerDilemma
Collaborator Author

eglSwapBuffers returns EGL_FALSE

@PointerDilemma
Collaborator Author

Error is EGL_BAD_SURFACE
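
For readers following the debugging steps above, here is a minimal sketch of how a failing swap could be turned into a readable log line. It is not hyprlock code: `eglErrorName`, `describeSwapFailure` and `exampleSwapFailure` are hypothetical helpers, and the numeric values are the error codes from the EGL headers, repeated here so the snippet compiles without EGL installed. In a real renderer, `swapOk` would be the result of `eglSwapBuffers` and `error` the value returned by `eglGetError()` immediately afterwards.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from hyprlock): maps an EGL error code to its name.
// The values mirror the error constants defined in EGL/egl.h.
inline const char* eglErrorName(int code) {
    switch (code) {
        case 0x3000: return "EGL_SUCCESS";
        case 0x3001: return "EGL_NOT_INITIALIZED";
        case 0x3002: return "EGL_BAD_ACCESS";
        case 0x3003: return "EGL_BAD_ALLOC";
        case 0x3004: return "EGL_BAD_ATTRIBUTE";
        case 0x3005: return "EGL_BAD_CONFIG";
        case 0x3006: return "EGL_BAD_CONTEXT";
        case 0x3007: return "EGL_BAD_CURRENT_SURFACE";
        case 0x3008: return "EGL_BAD_DISPLAY";
        case 0x3009: return "EGL_BAD_MATCH";
        case 0x300A: return "EGL_BAD_NATIVE_PIXMAP";
        case 0x300B: return "EGL_BAD_NATIVE_WINDOW";
        case 0x300C: return "EGL_BAD_PARAMETER";
        case 0x300D: return "EGL_BAD_SURFACE";
        case 0x300E: return "EGL_CONTEXT_LOST";
        default: return "unknown EGL error";
    }
}

// Builds a log line for a swap attempt.
// swapOk: whether eglSwapBuffers returned something other than EGL_FALSE.
// error:  the value eglGetError() reported right after the call.
inline std::string describeSwapFailure(bool swapOk, int error) {
    if (swapOk)
        return "swap ok";
    return std::string("eglSwapBuffers failed: ") + eglErrorName(error);
}

// Usage sketch: the failure reported in this thread.
inline void exampleSwapFailure() {
    assert(describeSwapFailure(false, 0x300D) == "eglSwapBuffers failed: EGL_BAD_SURFACE");
}
```

Logging the error name rather than only the boolean result is what distinguishes a surface problem (`EGL_BAD_SURFACE`) from, for example, a lost context (`EGL_CONTEXT_LOST`).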

@PointerDilemma
Collaborator Author

Hmm, didn't manage to solve it. I think it's an NVIDIA/egl-wayland issue, but I am not sure. (I found a refcount issue in the eglSwapBuffers implementation there, but it might be unrelated.)

I can't get back to the machine I can reproduce it on for two weeks, so sadly this is all I've got for now.

PointerDilemma changed the title from "[WIP Tracking] lockSurface: fix monitor replugging issues on nvidia" to "lockSurface: fix monitor replugging issues on nvidia" Aug 26, 2025
@PointerDilemma
Collaborator Author

Ok so this PR fixes the issue now.

Not sure if I mentioned it: it only happens when the last monitor is disconnected and reconnected.
Having an additional monitor that is not turned off magically makes it a non-issue. I think that hints towards it actually being an issue in the NVIDIA egl-wayland stack.
Tested with nvidia 580.76.05 (latest at the time of writing).

The fix recreates the rendering context when the last monitor gets disconnected. This way we don't get EGL_BAD_SURFACE from eglSwapBuffers after creating a clean new surface + eglWindow + eglSurface, and finally the lockSurface, for the output that gets re-added.
I think we clean up everything correctly on the disconnect (verified via printf debugging that it's not an issue with some SP still holding a reference, and verified that the lock surface gets deleted on the compositor side).
I also spent quite a bit of time trying to find another way to fix it and verifying that we are using the APIs correctly, but I didn't find anything.
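
The shape of the workaround described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `OutputTracker`, its methods and `exampleReplug` are hypothetical names, the callback stands in for whatever tears down and rebuilds the EGL display and context, and `HDMI-A-1` is an invented second output name (`DP-2` is the panel from the system info).

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch: tracks connected outputs and triggers a rendering
// context recreation only when the last one disappears.
class OutputTracker {
  public:
    // recreateContext stands in for destroying and re-creating the
    // EGL display/context; the real teardown is not shown here.
    explicit OutputTracker(std::function<void()> recreateContext)
        : m_recreateContext(std::move(recreateContext)) {}

    void onOutputAdded(const std::string& name) {
        m_outputs.insert(name);
    }

    // Returns true if this removal triggered a context recreation.
    bool onOutputRemoved(const std::string& name) {
        if (m_outputs.erase(name) == 0)
            return false; // unknown output, nothing to do
        if (!m_outputs.empty())
            return false; // another monitor is still connected
        m_recreateContext(); // last monitor gone: start from a clean context
        return true;
    }

  private:
    std::set<std::string>  m_outputs;
    std::function<void()> m_recreateContext;
};

// Usage sketch: only the removal of the last output recreates the context.
inline void exampleReplug() {
    int           recreated = 0;
    OutputTracker tracker([&recreated] { ++recreated; });
    tracker.onOutputAdded("DP-2");
    tracker.onOutputAdded("HDMI-A-1");
    tracker.onOutputRemoved("HDMI-A-1"); // DP-2 is still connected
    assert(recreated == 0);
    tracker.onOutputRemoved("DP-2"); // last monitor disconnected
    assert(recreated == 1);
}
```

Doing the recreation at disconnect time, rather than at reconnect, means the surfaces created for the re-added output are built on top of the fresh context from the start.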

@PointerDilemma
Collaborator Author

@vaxerski the meat and potatoes of this is b20aacd

What do you think? Is this an OK workaround for now? Should we make it NVIDIA-exclusive?
Any other ideas? I think it is tied to some refcount in egl-wayland, and fixing it requires us to nuke the eglDisplay and create a fresh one.

@vaxerski
Member

vaxerski commented Aug 28, 2025

I mean if it works it works - but the context should absolutely not be lost when all monitors die. If it is, that's a driver bug.

You ideally would just reuse your current context to take a screenshot once a new monitor shows up

I would make it nvidia exclusive though yes

@xenia-foxtrot

I've continued to have this issue, just haven't had the time to investigate. I'd be happy to test out this patch.

@eichelberger-c

Previously I experienced the issue whenever my monitor turned off: hyprlock would show the crash screen every time. I have been using the fix in this PR for about a week and have had zero issues since then.

@w1zpony

w1zpony commented Sep 4, 2025

Does this PR only apply to NVIDIA? I’m using an Intel iGPU laptop, and I also run into this issue when plugging and unplugging the monitor.

PointerDilemma force-pushed the main branch 4 times, most recently from 075615d to dfcfe42 September 4, 2025 06:46
This is a workaround for nvidia that can hopefully be removed at some point.
PointerDilemma marked this pull request as ready for review September 4, 2025 06:58
PointerDilemma changed the title from "lockSurface: fix monitor replugging issues on nvidia" to "core: monitor replug workaround for nvidia" Sep 4, 2025
@PointerDilemma
Collaborator Author

I made this specifically for nvidia. We will find out if it also fixes @w1zpony's issue in #779 hopefully.
If so, this is not nvidia exclusive and we are doing something wrong.

I will still merge this as is for nvidia only because it is definitely an improvement. But I will try to stay on this issue as I want to get rid of this workaround at some point. Cheers!
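
One way such a vendor-specific gate could look is sketched below. The thread does not say how the merged change detects NVIDIA, so everything here is an assumption: `isNvidiaVendor`, `shouldRecreateContext` and `exampleVendorCheck` are hypothetical helpers, and the vendor string is assumed to come from something like `eglQueryString(display, EGL_VENDOR)` and to contain "NVIDIA" on the proprietary driver.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive check for "nvidia" in a vendor string.
// The string is assumed to come from eglQueryString(display, EGL_VENDOR).
inline bool isNvidiaVendor(std::string vendor) {
    std::transform(vendor.begin(), vendor.end(), vendor.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return vendor.find("nvidia") != std::string::npos;
}

// Applies the workaround only when no outputs remain AND the vendor is NVIDIA.
inline bool shouldRecreateContext(const std::string& vendor, std::size_t remainingOutputs) {
    return remainingOutputs == 0 && isNvidiaVendor(vendor);
}

// Usage sketch with assumed vendor strings.
inline void exampleVendorCheck() {
    assert(shouldRecreateContext("NVIDIA", 0));        // last monitor gone on NVIDIA
    assert(!shouldRecreateContext("NVIDIA", 1));       // a monitor is still connected
    assert(!shouldRecreateContext("Mesa Project", 0)); // other vendors keep their context
}
```

Keeping the vendor check in one predicate makes the workaround easy to widen or remove later, which matches the stated goal of getting rid of it at some point.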

PointerDilemma merged commit 04cfdc4 into hyprwm:main Sep 4, 2025
1 check passed
@PointerDilemma
Collaborator Author

I messed something up.
I am back on the nvidia machine, but the workaround doesn't work anymore.

@MalteKW

MalteKW commented Oct 3, 2025

Fixed it for me, thank you so much.
