util: add AtomicBool for lock-free is_cancelled check#7874
util: add AtomicBool for lock-free is_cancelled check#7874HueCodes wants to merge 4 commits intotokio-rs:masterfrom
Conversation
Add an AtomicBool field to TreeNode that mirrors the is_cancelled state. The atomic is updated with Release ordering when cancellation occurs, and read with Acquire ordering in is_cancelled(), providing lock-free reads without mutex contention. Fixes: tokio-rs#7775
There was a problem hiding this comment.
Please delete this field to avoid duplication.
Remove the `is_cancelled: bool` field from `Inner` to avoid duplication with the `AtomicBool` on `TreeNode`. All reads and writes now go through the atomic. The atomic stores are moved before the lock drop so that locked readers see a consistent value.
| /// Returns whether or not the node is cancelled | ||
| pub(crate) fn is_cancelled(node: &Arc<TreeNode>) -> bool { | ||
| node.inner.lock().unwrap().is_cancelled | ||
| node.is_cancelled.load(Ordering::Acquire) |
There was a problem hiding this comment.
Hmm. Acquire is not actually strong enough here. When WaitForCancellationFuture::poll() returns pending in parallel with being woken up, we currently rely on the following logic:
- Inside
WaitForCancellationFuture, we create aNotifiedfuture. - Inside
WaitForCancellationFuture, we readfalsefromis_cancelled. - Inside
CancellationToken::cancel(), we writetruetois_cancelled. - Inside
CancellationToken::cancel(), we callnotify_waiters().
And here it is really important that notify_waiters() will wake up the Notified future created in step 1. Otherwise the WaitForCancellationFuture might sleep forever. With a mutex, that's fine, but with the current atomics, it's not actually satisfied!
That's because what we need is that step 2 happens-before step 3. However, for that to be true, operation 2 must be Release, and operation 3 must be Acquire. Unfortunately, it's reversed! Operation 2 is Acquire, and operation 3 is Release.
Unfortunately, a load() can't be Release and a store() can't be Acquire, so we have to upgrade them to read-modify-write operations for this to work:
- Use
AtomicBool::swap(true, AcqRel)to writetrue. - Use
AtomicBool::compare_exchange(false, false, AcqRel, Acquire)to read. (fetch_and(true, AcqRel)would also work)
There was a problem hiding this comment.
Or we can just take the mutex in WaitForCancellationFuture::poll(), and leave the AtomicBool optimization for users of is_cancelled() only.
There was a problem hiding this comment.
Thanks again for the review and explaining the memory order issue. I am learning a ton.
I've gone with your simpler suggestion and added a separate
is_cancelled_with_lock() function that takes the mutex for proper synchronization in
WaitForCancellationFuture::poll(), while keeping the lock-free AtomicBool path for direct
is_cancelled() calls.
I've also just pushed a fix for the cargo-spellcheck CI failures
Let me know if you need anything else.
There was a problem hiding this comment.
IMO using both AtomicBool and Mutex for the same purpose but for different use cases is not a good idea. It makes the reasoning harder - should I use the atomics based solution or the mutex one for use case 3 (a use case from the future) ?!
If it could be implemented fully with atomics (swap + compare_exchange) it would be better!
A good start would be to add an integration (or bench) test that reproduces the problem with Acquire+Release and then fixing it by using swap+compare_exchange.
There was a problem hiding this comment.
What do you mean by a use-case from the future? As #7775 indicates, there are users today that call is_cancelled() instead of .awaiting a future, and such users would get the benefit today.
There was a problem hiding this comment.
and such users would get the benefit today
This is clear!
What bothers me is that some use cases could use is_cancelled() (lock-free, for both external and internal users) and other use cases (internal user) like WaitForCancellationFuture::poll() can't and they need to depend on the Mutex.
With a "use case from the future" I mean the second - a new Tokio internal functionality that needs to know whether a token is cancelled or not. What should the developer use - is_cancelled() or is_cancelled_with_lock() ?! I guess the answer is it depends.
But if by using swap+compare_exchange there won't be a need of is_cancelled_with_lock() then the decision will be simple.
There was a problem hiding this comment.
if there are any changes I should make let me know!
Add is_cancelled_with_lock() method that acquires the mutex before checking the atomic flag. Use this in WaitForCancellationFuture::poll() and WaitForCancellationFutureOwned::poll() to ensure proper memory ordering guarantees. The original Acquire/Release atomic ordering was insufficient to guarantee that notify_waiters() would wake the Notified future created in poll(). The mutex lock ensures the required happens-before relationship between the is_cancelled check and the cancellation flag write. The lock-free atomic optimization remains in place for general is_cancelled() calls, preserving the fix for issue tokio-rs#7775.
Fix cargo-spellcheck CI failures by wrapping type and method names (WaitForCancellationFuture::poll(), Mutex) in backticks so they are ignored by the spellchecker.
Motivation
CancellationToken::is_cancelled()acquires a mutex on every call, causing >2ms delays on real-time systems.Fixes: #7775
Solution
Add an
AtomicBoolfield toTreeNodethat mirrors theis_cancelledstate. The atomic is updated withReleaseordering when cancellation occurs, and read withAcquireordering inis_cancelled(), providing lock-free reads without mutex contention.