Auto-Scaling Memory #849
Replies: 5 comments 5 replies
-
IIUC you're trying to find the minimum allowed memory for a given computation. One could imagine more advanced optimization strategies which could tune this parameter automatically, but such a strategy would surely have to know a lot more about your hardware options.
-
The reason I want to do this is cost management of Lambda instances. You're billed by the memory allocated to a Lambda instance, so if I provision 2 GB for the instances but only needed 650 MB, I'm still billed for the full 2 GB. This script adjusts the allocated memory dynamically based on the requirements of the graph, leaving a little headroom just in case, so as to be a little more efficient w.r.t. costs.
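A minimal sketch of that sizing step, assuming AWS Lambda's configurable range of 128 MB to 10240 MB; the function name and headroom factor are illustrative, not taken from any existing script:

```python
import math

# AWS Lambda memory is configurable from 128 MB to 10240 MB, in 1 MB steps.
LAMBDA_MIN_MB, LAMBDA_MAX_MB = 128, 10240

def lambda_memory_mb(projected_mem_mb, headroom=1.2):
    """Smallest billable Lambda memory setting that covers the projected
    requirement plus a safety margin, so we stop paying for unused memory."""
    needed = math.ceil(projected_mem_mb * headroom)
    if needed > LAMBDA_MAX_MB:
        raise ValueError(f"{needed} MB exceeds the Lambda limit of {LAMBDA_MAX_MB} MB")
    return max(LAMBDA_MIN_MB, needed)
```

With the 650 MB example above, this would allocate 780 MB rather than a flat 2 GB.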
-
Another way to find the maximum allowed memory would be to set a very high allowed memory, then find what the projected memory is (by checking the projected memory reported by the plan) and use that as the basis for the real setting.
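As a sketch of that idea, `plan_projected_mem` below is a hypothetical callback standing in for however the plan reports its projected memory under a given limit; it is not the library's real API:

```python
VERY_HIGH_MEM_MB = 1 << 20  # effectively unlimited: 1 TiB expressed in MB

def required_allowed_mem(plan_projected_mem, headroom_pct=20):
    """Plan once under a huge allowed memory, then derive a realistic limit.

    plan_projected_mem(allowed_mem_mb) -> projected memory in MB for a plan
    built under that limit (hypothetical accessor).
    """
    projected = plan_projected_mem(VERY_HIGH_MEM_MB)
    return projected + projected * headroom_pct // 100  # add a safety margin
```

This is a single planning pass, so it avoids any iterate-and-re-plan loop.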
-
I think there is scope for making (more of) the spec needed only at planning time, not at graph construction time. Some things, like the choice of executor, already are, and it seems that allowed memory could be too, except for tricky cases like qr. To do that we might store a logical (memory-independent) representation of the computation (at graph construction time), then convert it to a physical representation (at planning time) when the amount of allowed memory is known. (Having a logical representation would open up #333 too.) This would be quite a lot of work.

Regarding cost management of Lambda instances (which is the more immediate concern), some workloads have very uneven memory requirements. For example, one operation might only need 500MB, whereas a later one in the same computation might need 2GB. It would be useful to think about how to improve efficiency in these cases.

In the local machine case, we could do this by dynamically changing the number of workers in the thread or process pool for each operation. So if the machine has 16GB of memory then there would be 32 workers for the first operation, and 8 for the second. We could implement this simply by creating a new pool for each operation, or by writing a specialised pool (or using a library like more-executors).

For distributed executors like Lambda, we could pack multiple tasks into a single container, or we could have variable-sized containers.
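The per-operation pool idea could be sketched like this, using the 16GB machine from the example; `workers_for_op`, `run_op`, and the MB units are illustrative, and a fresh pool is created for each operation as described:

```python
from concurrent.futures import ThreadPoolExecutor

MACHINE_MEM_MB = 16 * 1024  # total memory budget: 16 GB expressed in MB

def workers_for_op(op_mem_mb, machine_mem_mb=MACHINE_MEM_MB):
    """Number of concurrent workers whose combined memory fits the machine."""
    return max(1, machine_mem_mb // op_mem_mb)

def run_op(tasks, op_mem_mb, fn):
    """Run one operation's tasks in a pool sized for its memory requirement."""
    with ThreadPoolExecutor(max_workers=workers_for_op(op_mem_mb)) as pool:
        return list(pool.map(fn, tasks))
```

A 500MB operation gets 32 workers and a 2GB one gets 8, matching the numbers above; a specialised resizable pool would avoid the cost of tearing pools down between operations.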
-
Thinking a bit more about variable-sized containers: for Lithops it would be possible to deploy multiple images with different memory limits (by running this line with, say, a few different memory sizes). Other distributed executors have different considerations, so this might be worth exploring on Lithops and Ray at least. Other executors probably need a way to pack multiple tasks into a single container.
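Choosing among several pre-deployed runtimes could look like the sketch below; the tier sizes and the `pick_runtime` helper are hypothetical, not part of Lithops or Ray:

```python
# Hypothetical set of pre-deployed runtime memory sizes, in MB.
RUNTIME_TIERS_MB = [512, 1024, 2048, 4096, 8192]

def pick_runtime(projected_mem_mb, tiers=RUNTIME_TIERS_MB):
    """Choose the smallest deployed runtime whose memory covers the task."""
    for tier in sorted(tiers):
        if tier >= projected_mem_mb:
            return tier
    raise ValueError(f"no runtime large enough for {projected_mem_mb} MB")
```

Each operation in the plan would then dispatch to the cheapest runtime that fits it, rather than every task paying for the worst-case container.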
-
So there's a use case I have for processing a whole bunch of different sizes of chunked data using the same formula. Right now the best option I seem to have for updating my `Spec` programmatically with the correct memory constraints for the job is to deliberately set it low, check the projected memory, update the spec, check the projected memory is still valid, then compute. A loop that looks a bit like this:

Is there a better way to do this? Can the `spec` be dynamically updated before I call compute again?
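The loop described above might be sketched as follows; `measure_projected_mem` is a hypothetical callback standing in for "rebuild the `Spec` with this limit and read the plan's projected memory", not the library's real API:

```python
def find_allowed_mem(measure_projected_mem, start_mb=256,
                     headroom_pct=20, max_mb=10240):
    """Grow the allowed memory until the plan's projected memory fits.

    measure_projected_mem(allowed_mem_mb) -> projected memory in MB after
    re-planning the graph under that limit (hypothetical callback).
    """
    allowed = start_mb
    while True:
        projected = measure_projected_mem(allowed)
        needed = projected + projected * headroom_pct // 100  # with headroom
        if needed <= allowed:
            return allowed  # projected memory is valid: safe to compute
        if needed > max_mb:
            raise MemoryError(f"computation needs {needed} MB, cap is {max_mb} MB")
        allowed = needed  # update the spec and re-plan
```

When the projected memory doesn't depend on the allowed memory, this settles after one extra planning pass; planning once with a very high limit (as suggested in another comment above) avoids the loop entirely.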