
rt: shard the multi-thread inject queue to reduce remote spawn contention #7973

Open
alex wants to merge 1 commit into tokio-rs:master from alex:shard-remote-lock

Conversation

alex (Contributor) commented Mar 13, 2026

The multi-threaded scheduler's inject queue was protected by a single global mutex (shared with idle coordination state). Every remote task spawn — any spawn from outside a worker thread — acquired this lock, serializing concurrent spawners and limiting throughput.

This change introduces inject::Sharded, which splits the inject queue into up to 8 independent shards, each an existing Shared/Synced pair with its own mutex and cache-line padding.

Design:

  • Push: each thread is assigned a home shard on first push (via a global counter) and sticks with it. This keeps consecutive pushes from one thread cache-local while spreading distinct threads across distinct locks.
  • Pop: workers rotate through shards starting at their own index, skipping empty shards via per-shard atomic length. pop_n drains from one shard at a time to keep critical sections bounded.
  • Shard count: capped at 8 (and 1 under loom). Contention falls off steeply past a handful of shards, and is_empty()/len() run in the worker hot loop and must scan every shard, so more shards make those checks more expensive.
  • is_closed: a single Release atomic set after all shards are closed, so the shutdown check stays lock-free.
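
The mechanics above can be sketched in simplified form with std primitives. This is illustrative only, not the PR's actual code: Tokio's real shards wrap its internal Shared/Synced pair, and all names here (Sharded, Shard, HOME_SHARD, etc.) are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;

const SHARD_CNT: usize = 8;

// One shard: its own mutex-protected queue, plus an atomic length
// so poppers can skip empty shards without touching the lock.
struct Shard<T> {
    queue: Mutex<VecDeque<T>>,
    len: AtomicUsize,
}

pub struct Sharded<T> {
    shards: Vec<Shard<T>>,
    closed: AtomicBool,
}

// Global counter handing out home shards round-robin; each thread
// caches its assignment in a thread-local on first use.
static NEXT_SHARD: AtomicUsize = AtomicUsize::new(0);
thread_local! {
    static HOME_SHARD: usize =
        NEXT_SHARD.fetch_add(1, Ordering::Relaxed) % SHARD_CNT;
}

impl<T> Sharded<T> {
    pub fn new() -> Self {
        Sharded {
            shards: (0..SHARD_CNT)
                .map(|_| Shard {
                    queue: Mutex::new(VecDeque::new()),
                    len: AtomicUsize::new(0),
                })
                .collect(),
            closed: AtomicBool::new(false),
        }
    }

    // Sticky push: always use the calling thread's home shard, so
    // consecutive pushes from one thread stay on hot cache lines.
    pub fn push(&self, task: T) {
        let idx = HOME_SHARD.with(|s| *s);
        let shard = &self.shards[idx];
        let mut q = shard.queue.lock().unwrap();
        q.push_back(task);
        shard.len.store(q.len(), Ordering::Release);
    }

    // Pop: rotate through shards starting at the caller's index
    // (a worker would pass its own), skipping empty shards via the
    // atomic length instead of taking every lock.
    pub fn pop(&self, start: usize) -> Option<T> {
        for i in 0..SHARD_CNT {
            let shard = &self.shards[(start + i) % SHARD_CNT];
            if shard.len.load(Ordering::Acquire) == 0 {
                continue;
            }
            let mut q = shard.queue.lock().unwrap();
            if let Some(task) = q.pop_front() {
                shard.len.store(q.len(), Ordering::Release);
                return Some(task);
            }
        }
        None
    }

    // The shutdown check stays lock-free: one flag, set with
    // Release ordering after all shards have been closed.
    pub fn close(&self) {
        self.closed.store(true, Ordering::Release);
    }

    pub fn is_closed(&self) -> bool {
        self.closed.load(Ordering::Acquire)
    }
}
```

Because a single thread always pushes to one shard, per-shard FIFO order is preserved for that producer; only cross-shard ordering is relaxed.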

Random shard selection via context::thread_rng_n (as used in #7757 for the blocking pool) was measured and found to be 20-33% slower on remote_spawn at 8+ threads. The inject workload is a tight loop of trivial pushes where producer-side cache locality dominates: with RNG, a hot thread bounces between shard cache lines on every push; with sticky assignment it stays hot on one mutex and list tail. RNG did win slightly (5-9%) on single-producer benchmarks where spreading tasks lets workers pop in parallel, but not enough to offset the regression at scale.
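
The two selection policies being compared can be sketched side by side. The xorshift generator below is only a stand-in for Tokio's context::thread_rng_n, and the function names are illustrative:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

const SHARD_CNT: usize = 8;

// RNG policy: a fresh shard index on every push. A hot producer
// then touches a different mutex/list cache line each time; per
// the benchmarks this cost 20-33% at 8+ spawner threads.
thread_local! {
    static RNG_STATE: Cell<u64> = Cell::new(0x9E3779B97F4A7C15);
}

fn rng_shard() -> usize {
    RNG_STATE.with(|s| {
        // xorshift64: stand-in for context::thread_rng_n.
        let mut x = s.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        s.set(x);
        (x % SHARD_CNT as u64) as usize
    })
}

// Sticky policy (what the PR adopts): a one-time round-robin
// assignment cached per thread, so every later push from that
// thread stays on the same shard's mutex and list tail.
static NEXT_SHARD: AtomicUsize = AtomicUsize::new(0);
thread_local! {
    static HOME_SHARD: usize =
        NEXT_SHARD.fetch_add(1, Ordering::Relaxed) % SHARD_CNT;
}

fn sticky_shard() -> usize {
    HOME_SHARD.with(|h| *h)
}
```

The trade-off follows directly: sticky_shard returns the same index for a given thread forever (cache-local, but one producer cannot spread work), while rng_shard scatters a single producer's pushes across shards (parallel pops, at the cost of producer-side cache misses).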

The inject state is removed from the global Synced mutex, which now only guards idle coordination. This also helps the single-threaded path since remote pushes no longer contend with worker park/unpark.

Results on remote_spawn benchmark (12,800 no-op tasks, N spawner threads, 64-core box):

| threads | before | after | improvement |
|--------:|-------:|------:|------------:|
| 1 | 9.38 ms | 7.33 ms | -22% |
| 2 | 14.94 ms | 6.64 ms | -56% |
| 4 | 23.69 ms | 5.34 ms | -77% |
| 8 | 34.81 ms | 4.69 ms | -87% |
| 16 | 32.33 ms | 4.54 ms | -86% |
| 32 | 30.37 ms | 4.73 ms | -84% |
| 64 | 26.59 ms | 5.34 ms | -80% |

rt_multi_threaded benchmarks: spawn_many_local -8%, spawn_many_remote_idle -7%, yield_many -1%, rest neutral.

Developed in conjunction with Claude.

@github-actions github-actions bot added R-loom-current-thread Run loom current-thread tests on this PR R-loom-multi-thread Run loom multi-thread tests on this PR labels Mar 13, 2026
@alex alex force-pushed the shard-remote-lock branch 2 times, most recently from e4fa3a2 to adfa19f Compare March 13, 2026 23:54
@alex alex force-pushed the shard-remote-lock branch from adfa19f to de52c66 Compare March 13, 2026 23:59
ADD-SP (Member) commented Mar 14, 2026

Is the inject queue still a FIFO queue?

@ADD-SP ADD-SP added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime T-performance Topic: performance and benchmarks labels Mar 14, 2026
alex (Contributor, Author) commented Mar 14, 2026

Approximately -- each queue shard is FIFO on its own, but nothing attempts to enforce ordering across shards.
