
Add an atomically shared allocation limit #2708

Open
197g wants to merge 3 commits into main from shared-limits

Conversation

@197g (Member) commented Dec 26, 2025

With our Limits attributes we encountered issues with using them in a more diverse encoder setting. We often want to share the limits between different components, not all of which are immediately under our control. With the current interface that means either predicting, in advance, exactly how many bytes each split will need, or copying the allocation limits. We can avoid this with a shared atomic variable.

Obviously Limits can then no longer be PartialEq, Eq, or Hash, but it seems this was only motivated by its use in an error type. That part of the interface should be fixed regardless of the changes to Limits.

@197g 197g changed the title Add an atomically shared allocation amount Add an atomically shared allocation limit Dec 26, 2025
@fintelia (Contributor) commented Dec 26, 2025

I really like the idea of using an Arc<AtomicU64> to share the allocation limit. That also elegantly handles the case of the decoder allocating space for an EXIF / ICC chunk and the caller freeing it.

I'm always hesitant to add new types to our API since every additional type clutters up our docs page regardless of how niche its use-case is (making it harder for new users to find what they need). But here I think it is possible to actually make the API simpler instead...

How would you feel about replacing ImageDecoder::set_limits with an ImageDecoder::set_allocation_limit method that took only an Arc<AtomicU64>? That's the only one of our limits that decoders themselves are expected to implement. Then the higher-level ImageReader could expose a public API that took the current Limits struct while translating that into checks against the image metadata and holding a copy of the Arc.
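A minimal sketch of the proposed shape, for illustration only: the trait, the `PngDecoder` struct, and the field name here are assumptions, not the crate's actual API. The point is just that the decoder stores a clone of the caller's `Arc`, so both sides observe one budget.

```rust
use std::sync::atomic::AtomicU64;
use std::sync::Arc;

// Hypothetical trait method replacing set_limits, as proposed above.
trait ImageDecoder {
    fn set_allocation_limit(&mut self, limit: Arc<AtomicU64>);
}

// Stand-in decoder type; a real backend would consult the counter
// before every large allocation.
struct PngDecoder {
    allocation_limit: Option<Arc<AtomicU64>>,
}

impl ImageDecoder for PngDecoder {
    fn set_allocation_limit(&mut self, limit: Arc<AtomicU64>) {
        self.allocation_limit = Some(limit);
    }
}

fn main() {
    let shared = Arc::new(AtomicU64::new(64 * 1024 * 1024));
    let mut dec = PngDecoder { allocation_limit: None };
    dec.set_allocation_limit(Arc::clone(&shared));
    // The reader and the decoder now hold the same counter, not copies.
    assert!(Arc::ptr_eq(dec.allocation_limit.as_ref().unwrap(), &shared));
}
```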

@197g (Member, Author) commented Dec 27, 2025

I do like the struct on which we can add utility methods, yet agreed, we should encourage copying that Arc into different places anyway. There are also at least some reasons to prefer an owned capacity value over an Arc (as seen in the rather odd update loop). An alternate design would be wrapping it in an RwLock like so:

    pub shared_alloc: Option<Arc<RwLock<AtomicU64>>>,

That would allow falling back to write() after a cycle of too many failed fetch_update calls, and then this could do a non-atomic update to resolve the situation. Still somewhat fast on the common path. There is no guarantee on the fairness policy of RwLock though, so it's not optimal.
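A sketch of that fallback scheme, under assumptions: the function name, error type, and retry count are all illustrative. The fast path runs compare-exchange attempts under the read lock; once they keep failing, the write lock excludes every other accessor, so a plain load/store pair is safe.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

// Try to reserve `n` bytes from the shared budget; Err(()) means the
// remaining budget is too small.
fn reserve(shared: &RwLock<AtomicU64>, n: u64) -> Result<(), ()> {
    const MAX_ATTEMPTS: usize = 16; // arbitrary cutoff for the fast path
    {
        let guard = shared.read().unwrap();
        for _ in 0..MAX_ATTEMPTS {
            let cur = guard.load(Ordering::SeqCst);
            let next = cur.checked_sub(n).ok_or(())?;
            if guard
                .compare_exchange(cur, next, Ordering::SeqCst, Ordering::SeqCst)
                .is_ok()
            {
                return Ok(());
            }
        }
    }
    // Contended: the write lock serializes us against all other holders,
    // so a non-atomic read-modify-write cannot race with them.
    let guard = shared.write().unwrap();
    let cur = guard.load(Ordering::SeqCst);
    let next = cur.checked_sub(n).ok_or(())?;
    guard.store(next, Ordering::SeqCst);
    Ok(())
}

fn main() {
    let shared = RwLock::new(AtomicU64::new(100));
    assert!(reserve(&shared, 60).is_ok());
    assert!(reserve(&shared, 60).is_err());
    assert_eq!(shared.read().unwrap().load(Ordering::SeqCst), 40);
}
```

As noted, this only helps if RwLock writers are not starved by readers, which std leaves unspecified.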

@fintelia (Contributor) commented

One annoyance is that any utility methods we add won't be available to the decoder backends (since the png, tiff, etc. crates cannot depend on image).

How much of a concern is spinning within fetch_update? My understanding of the docs is that it can only block while the value is being continuously updated by another thread. But poor behavior by other threads would be equally problematic with a Mutex / RwLock, so that's not something to worry about. If we did have sufficient guarantees of forward progress, then the entire update logic could just be fetch_update(..., u64::checked_sub) and fetch_update(..., u64::saturating_add), and we could avoid needing helper methods for it.
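Spelled out, that two-call formulation might look like the following sketch (closures rather than bare method paths, since `fetch_update` takes an `FnMut(u64) -> Option<u64>`; the function names are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Reserve `n` bytes: fetch_update retries the compare-exchange until it
// succeeds, or fails once the closure returns None (budget exhausted).
// On success it returns the previous value.
fn reserve(limit: &AtomicU64, n: u64) -> Result<u64, u64> {
    limit.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |v| v.checked_sub(n))
}

// Return `n` bytes: saturating_add never yields None, so this cannot fail.
fn free(limit: &AtomicU64, n: u64) -> u64 {
    limit
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |v| {
            Some(v.saturating_add(n))
        })
        .unwrap()
}

fn main() {
    let limit = AtomicU64::new(10);
    assert_eq!(reserve(&limit, 4), Ok(10));
    assert_eq!(free(&limit, 4), 6);
    assert_eq!(limit.load(Ordering::SeqCst), 10);
    assert!(reserve(&limit, 11).is_err());
}
```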

@197g (Member, Author) commented Dec 27, 2025

It's probably not a super realistic issue, but if one thread repeatedly tries to allocate, realizes it won't be enough or hits some other hazard, and thus returns the memory to try again, then those constant atomic accesses might starve other threads from ever succeeding in their compare-exchange. I could see someone writing code like that. This would be okay if the consumption were monotonic, but free is odd in this regard and means we don't necessarily ever terminate such a live-locked loop with either success or an error.

Mutex/RwLock, on the other hand, at least involve the operating system, which likely provides a form of fairness here.

Maybe we provide strict guidance to implement any free only as returning memory to the local pool (as we do for now). That would make this much less of an issue, keep alloc monotonic, and I don't see any bad failure modes. But in the end, we are writing an allocator. I'd be okay shipping this as a v1 of shared allocation for a good while, I think.

So: let me adjust it for more simplicity and documentation.

@197g (Member, Author) commented Dec 27, 2025

Also: we should drop Clone in favor of a split(n: usize) method that takes a part of the local allocation with it.
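A sketch of what such a split might look like, assuming a hypothetical `Limits` holding an owned byte budget (field name and `u64` width are assumptions; the PR text says `n: usize`):

```rust
// Hypothetical limit handle with an owned local budget.
struct Limits {
    max_alloc: u64,
}

impl Limits {
    // Move `n` bytes of this budget into a new Limits, instead of
    // duplicating the whole budget the way Clone would. Returns None
    // if the local budget is too small to give up `n` bytes.
    fn split(&mut self, n: u64) -> Option<Limits> {
        self.max_alloc = self.max_alloc.checked_sub(n)?;
        Some(Limits { max_alloc: n })
    }
}

fn main() {
    let mut parent = Limits { max_alloc: 100 };
    let child = parent.split(30).unwrap();
    // The combined budget is unchanged; nothing was duplicated.
    assert_eq!(parent.max_alloc, 70);
    assert_eq!(child.max_alloc, 30);
    assert!(parent.split(1000).is_none());
}
```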

@mstoeckl (Contributor) commented

With our Limits attributes we encountered issues with using them in a more diverse encoder setting. We often want to share the limits between different components, not all of which are immediately under our control.

Could you give an example of a program that would use the new API?

In general I expect memory limits to be useful for programs that could operate on adversarial or very unusual images, like image viewers or servers for image processing tasks. (Batch processing and one-shot conversions might use process-level memory limits; video games loading known images wouldn't need limits.) Both image viewers and servers may try to load many images using multiple threads, for which a shared AtomicU64 memory usage estimate could indeed be convenient; but they are also long running, so that after displaying or processing an image the memory used for it would need to be reused.

With the current way this PR privatizes the shared limit and then never returns memory back to the shared limit, I don't see how a user could use a single global shared memory limit and load many images without the shared limit running out.

But in the end, we are writing an allocator

The most general allocation accounting API would probably be a pair of callback functions to request and release memory....
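That callback-pair idea could be sketched roughly as below; everything here (struct name, closure signatures, the budget bookkeeping) is an assumption for illustration, not anything the PR proposes concretely:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical accounting hooks: the caller supplies one callback to
// request bytes (returning whether the request is granted) and one to
// release them again.
struct Accounting {
    request: Box<dyn FnMut(u64) -> bool>,
    release: Box<dyn FnMut(u64)>,
}

fn main() {
    let used = Rc::new(Cell::new(0u64));
    let budget = 100u64;
    let (u1, u2) = (Rc::clone(&used), Rc::clone(&used));
    let mut acc = Accounting {
        request: Box::new(move |n| {
            // Grant the request only while total usage stays in budget.
            if u1.get() + n <= budget {
                u1.set(u1.get() + n);
                true
            } else {
                false
            }
        }),
        release: Box::new(move |n| u2.set(u2.get().saturating_sub(n))),
    };
    assert!((acc.request)(60));
    assert!(!(acc.request)(60)); // would exceed the budget
    (acc.release)(60);
    assert!((acc.request)(60)); // freed memory is usable again
}
```

Unlike the privatized limit, the release callback lets a long-running caller return memory to the shared budget between images.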

@197g (Member, Author) commented Dec 28, 2025

Could you give an example of a program that would use the new API?

ImageReader. It's allocating the DynamicImage it returns, the ImageDecoder layer may allocate various things (e.g. gif, png) and the underlying library that we call into may as well. Plus if we want to get serious about animations we should give ImageReader a pool to fetch allocated frames from to avoid pointless re-allocation and initialization costs. Then the pool owns the memory associated with its pooled frames which is essentially private to the rest of the system. How it allocates things internally is an orthogonal concern, I think.

Pretty much every program mutating images faces adversarial input, unless you suggest the images somehow get magically validated beforehand.

@197g (Member, Author) commented Dec 28, 2025

The most general allocation accounting API would probably be a pair of callback functions to request and release memory....

Yes, but pragmatically we're dealing with std::vec::Vec and friends most of the time; and we don't have a stabilized allocator API, so dealing with raw pointers to actual extents of memory is not going to happen.

@197g 197g mentioned this pull request Dec 31, 2025