gthread worker deadlocks during startup in 25.1.0 (regression from 25.0.3) #3509
Replies: 5 comments 1 reply
-
Root Cause Analysis (from Claude Code)

After diffing the package contents of 25.0.3 and 25.1.0, the issue appears to be a fork-after-thread deadlock introduced by the new Control Socket feature.

What changed in 25.1.0

```python
# arbiter.setup() — runs BEFORE workers are spawned:
self._start_control_server()  # NEW in 25.1.0
self.cfg.when_ready(self)
```
_start_control_server() spins up a background thread running an asyncio event loop in the master process:
```python
# ctl/server.py
def start(self):
    self._running = True
    self._thread = threading.Thread(target=self._run_loop, daemon=True)
    self._thread.start()

def _run_loop(self):
    asyncio.run(self._serve())  # asyncio event loop in a thread
```
Why this causes the deadlock
This is the classic problem described in Python's os.fork() docs: "safely forking a multithreaded process is problematic."
The sequence:
1. Master calls _start_control_server() → starts a background thread with an asyncio event loop (which acquires internal mutexes/locks)
2. Master forks the worker process
3. The child inherits all mutex/lock state, but not the threads that hold them
4. When the gthread worker tries to initialize its thread pool, it attempts to acquire a lock that was held by the now-nonexistent control socket thread → futex deadlock
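The mechanism in step 3–4 can be demonstrated in isolation. The following sketch (not gunicorn code; POSIX only, since it uses `os.fork()`) shows that a lock held by a thread at fork time is copied into the child in the locked state, with no thread left alive to release it:

```python
import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    with lock:
        time.sleep(5)   # keep the lock held across the fork below

t = threading.Thread(target=hold_lock, daemon=True)
t.start()
while not lock.locked():
    time.sleep(0.01)    # wait until the thread really owns the lock

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: the lock's state was copied by fork, but the thread holding it
    # was not, so a blocking acquire() here would hang forever — exactly the
    # futex wait the report describes. A non-blocking attempt shows this.
    os.write(w, b"1" if lock.acquire(blocking=False) else b"0")
    os._exit(0)

os.waitpid(pid, 0)
child_got_lock = os.read(r, 1) == b"1"
print("child could acquire inherited lock:", child_got_lock)
```

The non-blocking `acquire()` fails in the child even though no thread in the child holds the lock; a blocking acquire at the same point is the reported deadlock.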
This explains all observed behavior
| Observation | Explanation |
| --- | --- |
| Worker stuck on futex_do_wait with 1 thread | Deadlocked before creating the gthread thread pool |
| No "Booting worker with pid" log | Worker never finished initialization |
| 25.0.3 worked fine | No pre-fork threads existed |
| Fresh Gunicorn inside stuck container works | New process, no inherited lock state |
| Restart sometimes recovers | Timing-dependent: if the fork happens before asyncio acquires certain locks, the child survives |
Suggested fix
The control socket server should be started after workers are forked, not during setup().
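A minimal sketch of that ordering (assumed structure, not gunicorn's actual arbiter code): fork all workers first, and only then start the master-only background thread, so no child ever inherits its lock state.

```python
import os
import threading
import time

def control_loop():
    # Stand-in for the control-socket event loop (hypothetical).
    time.sleep(0.1)

# Fork every worker BEFORE any master-only thread exists; the children
# therefore inherit no thread-held locks.
pids = []
for _ in range(2):
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # real worker initialization would run here
    pids.append(pid)

# Only now start the master-only background thread.
control_thread = threading.Thread(target=control_loop, daemon=True)
control_thread.start()

for pid in pids:
    _, status = os.waitpid(pid, 0)
    assert os.waitstatus_to_exitcode(status) == 0

print("forked", len(pids), "workers before starting the control thread")
```

Workers spawned later (e.g. after a worker dies) would still need care, since by then the control thread exists; that is presumably why pull request #3520 is the place this gets resolved properly.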
Workaround
Either pin to gunicorn==25.0.3 or use --no-control-socket to disable the feature.
-
I think we started experiencing something like this using Gevent workers and a single process. Sometimes Gunicorn fails to start when booting a cloud-init instance/server.
-
Hi, we're having a similar issue using Gevent workers and a single process/worker also. Using
-
Did you test #3520?
-
The good news is I have an explanation of why I've aged 2 years in the last two weeks, staring at logs and reconfiguring everything!
-
Type
Bug Report
Description
The gthread worker deadlocks during initialization in Gunicorn 25.1.0. The master process starts normally, but the worker process never finishes booting: the expected "Booting worker with pid" message never appears. The worker has only 1 thread (instead of the expected 5 for gthread with --threads 4), is blocked on futex_do_wait, and never calls accept(). Because the master has bound the socket, incoming TCP connections time out rather than being refused, making the server appear running but completely unresponsive.

This is a regression. The same application on the same Python version runs without issues on 25.0.3.
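The timeout-rather-than-refused symptom can be reproduced against any socket that is bound and listening but never accepted from, which is how the stuck master behaves. A small probe helper (hypothetical, not part of the report) distinguishes the three cases:

```python
import socket

def probe(port, timeout=1.0):
    """Classify how a local TCP port responds: 'response', 'timeout', or 'refused'."""
    s = socket.socket()
    s.settimeout(timeout)
    try:
        s.connect(("127.0.0.1", port))
        s.sendall(b"GET / HTTP/1.0\r\n\r\n")
        s.recv(1)                      # blocks if no worker ever accepts/replies
        return "response"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        s.close()

# A listener that is bound but never accept()ed from mimics the stuck master:
# the kernel queues the connection in the backlog, so connect() succeeds,
# but no response ever comes.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
result = probe(srv.getsockname()[1])
print(result)
srv.close()
```

A dead (unbound) port would return "refused" instantly instead, which is why this failure mode is easy to mistake for a network problem.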
Steps to Reproduce (for bugs)
1. The app creates two background threads during module import:
   - `threading.Thread` (worker loop)
   - `threading.Timer` (scheduled cleanup task)
2. Start Gunicorn with:
   `gunicorn --workers 1 --threads 4 --bind 0.0.0.0:8080 --log-level info --timeout 0 server:app`

Reproduced at a 100% rate across 2 consecutive container starts (automated via Watchtower). A manual `docker compose restart` resolves the deadlock each time; the new worker boots successfully.

Notably, within the same stuck container, `python3.14 -c "import server"` completes fine (module loads, threads start normally).

Configuration
Logs / Error Output
Gunicorn Version
25.1.0
Python Version
3.14.3
Worker Class
gthread
Operating System
Ubuntu Server 24.04 in Docker
Additional Context
The WSGI app (Flask) creates two background threads during module import: a threading.Thread (worker loop) and a threading.Timer (scheduled cleanup task).
This pattern has worked on all prior Gunicorn versions. The thread creation during module import may interact poorly with changes in 25.1.0's gthread worker initialization sequence.
Workaround: pin to gunicorn==25.0.3.
Checklist