For MVS Systems, move to malloc instead of MMAP for Constant Files #3400

Draft
JacobEngelbrecht wants to merge 8 commits into onnx:main from JacobEngelbrecht:mvs_mmap_large_file_support

@JacobEngelbrecht JacobEngelbrecht commented Feb 23, 2026

Support large constant files (>1GB) on z/OS via malloc fallback

Problem

z/OS imposes a nominal 1GB limit (around 1.2GB in practice) on mmap() operations, which causes failures when loading large constant weight files for neural network models. Raising this limit requires system-level configuration changes, which are not always feasible in production environments.

Solution

Implemented a system for z/OS that uses malloc() + chunked read() instead of mmap() for constant files larger than 1GB:

  • Chunked reading: Reads files in 1GB chunks to avoid system limitations
  • Short-read handling: Properly handles POSIX read() behavior where large reads may return fewer bytes than requested
  • Thread-safe loading: Uses sentinel-based compare-and-swap (CAS) to ensure only one thread loads large files while others wait, preventing redundant memory allocation
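The chunked-read and short-read handling described above could look roughly like the sketch below. It is not the PR's code: `readFully`, `mallocAndReadFile`, and `READ_CHUNK_BYTES` are hypothetical names chosen for illustration, and the real `omMallocAndReadFile()` signature is not shown in this conversation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size; the description mentions 1GB chunks. */
#define READ_CHUNK_BYTES ((size_t)1 << 30)

/* Read exactly `size` bytes from `fd` into `buf`, requesting at most
 * READ_CHUNK_BYTES per read() call. Returns 0 on success, -1 on an I/O
 * error or if end of file is reached before `size` bytes arrive. */
static int readFully(int fd, char *buf, size_t size) {
  size_t done = 0;
  while (done < size) {
    size_t want = size - done;
    if (want > READ_CHUNK_BYTES)
      want = READ_CHUNK_BYTES;
    ssize_t got = read(fd, buf + done, want);
    if (got < 0) {
      if (errno == EINTR)
        continue; /* interrupted before any data was read: retry */
      return -1;
    }
    if (got == 0)
      return -1; /* premature end of file */
    done += (size_t)got; /* short read: loop again for the remainder */
  }
  return 0;
}

/* Illustrative stand-in for the PR's helper: allocate a buffer the size of
 * the file and fill it. Returns NULL on any failure (empty files are also
 * rejected) and releases everything it acquired. */
static void *mallocAndReadFile(const char *path, size_t *sizeOut) {
  assert(path != NULL && sizeOut != NULL);
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return NULL;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return NULL;
  }
  size_t size = (size_t)st.st_size;
  char *buf = malloc(size);
  if (buf == NULL) {
    close(fd);
    return NULL;
  }
  if (readFully(fd, buf, size) != 0) {
    free(buf);
    close(fd);
    return NULL;
  }
  close(fd);
  *sizeOut = size;
  return buf;
}
```

The loop treats a short read as normal and only fails on an error or an unexpected end of file, which is the POSIX behaviour the second bullet refers to.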

Technical Details

  • Files ≤1GB continue to use mmap() with __MAP_MEGA flag on z/OS
  • Files >1GB use malloc() + read() with proper cleanup on errors on z/OS
  • Other platforms continue to use standard mmap() for all file sizes
  • Added helper function omMallocAndReadFile() to encapsulate the malloc+read logic
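As a rough illustration of the size-based split listed above, the decision could be isolated in a small function. This is a sketch under assumptions: `chooseLoadStrategy`, `LoadStrategy`, `runningOnZOS`, and `LARGE_FILE_THRESHOLD` are hypothetical names, and the platform check relies on the `__MVS__` macro that z/OS compilers predefine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold matching the 1GB boundary in the description. */
#define LARGE_FILE_THRESHOLD ((uint64_t)1 << 30)
static_assert(LARGE_FILE_THRESHOLD == UINT64_C(1073741824),
              "threshold must be exactly 1GB");

typedef enum { LOAD_WITH_MMAP, LOAD_WITH_MALLOC } LoadStrategy;

/* True when compiled for z/OS (assumes the compiler defines __MVS__). */
static bool runningOnZOS(void) {
#ifdef __MVS__
  return true;
#else
  return false;
#endif
}

/* Files above the threshold use malloc() + read() on z/OS only; every
 * other combination keeps using mmap(). */
static LoadStrategy chooseLoadStrategy(uint64_t fileSize, bool isZOS) {
  if (isZOS && fileSize > LARGE_FILE_THRESHOLD)
    return LOAD_WITH_MALLOC;
  return LOAD_WITH_MMAP;
}
```

A caller would pass `runningOnZOS()` as the second argument; taking it as a parameter keeps the decision testable on machines that are not z/OS.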

Testing

Tested with the GPT-2XL model (6.5GB constant file) on z/OS: it loads and runs inference successfully without system configuration changes. Also tested with BERT uncased to cover the <1GB constant file path.


Notes

  • Kept everything contained in omMMapBinaryFile to avoid additional file changes and testing changes.
  • Loading the constants file takes roughly 10 seconds per GB. There is a 5-minute timeout in case of an unexpected error, which puts an upper limit of about 30GB on constant files.

@jenkins-droid
Collaborator

Can one of the admins verify this patch?

@gongsu832
Collaborator

The code logic of mmap/munmap is such that when multiple threads call omMMapBinaryFile simultaneously, each of them will get a valid tempAddr, but only one thread will succeed in the compare-and-swap and update the shared constAddr. All other threads will just munmap their tempAddr and use the shared constAddr instead. For these other threads, the mmap/munmap is basically a small "waste", but that's OK since mmap/munmap without actually reading the file is fairly cheap.
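For readers unfamiliar with that pattern, here is a minimal sketch of the map-then-compare-and-swap logic described in this paragraph. It is not the actual omMMapBinaryFile code: `mapConstantsShared` is a hypothetical name, the mapping flags are an assumption, and C11 atomics stand in for whatever primitive the runtime really uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Every caller maps the file, but only the CAS winner publishes its mapping
 * in *constAddr. Losers unmap their own mapping and use the winner's.
 * Returns the shared address, or NULL if the file cannot be mapped. */
static void *mapConstantsShared(_Atomic(void *) *constAddr, const char *path,
                                size_t size) {
  assert(constAddr != NULL && path != NULL && size > 0);
  void *cur = atomic_load(constAddr);
  if (cur != NULL)
    return cur; /* already published by an earlier caller */

  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return NULL;
  void *tempAddr = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (tempAddr == MAP_FAILED)
    return NULL;

  void *expected = NULL;
  if (atomic_compare_exchange_strong(constAddr, &expected, tempAddr))
    return tempAddr; /* this thread won the race */

  /* Lost the race: discard the cheap, untouched mapping. On failure the
   * CAS stored the winner's address in `expected`. */
  munmap(tempAddr, size);
  return expected;
}
```

The loser's cost is one mmap/munmap pair with no pages touched, which is why the waste is small in the mmap case.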

In the proposed malloc code logic, however, all threads will malloc and then read in the entire file. They then compete to update the shared constAddr, and all but one thread will throw away their tempAddr. But unlike mmap/munmap, this malloc/read/free is a pretty large waste, since reading in the entire file (and causing all pages to be backed, therefore generating many page faults) is very expensive. The malloc code logic should be that only one thread will actually read in the file.
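One way to get the "only one thread reads" behaviour is to claim the slot with a sentinel before loading, so that other threads wait instead of duplicating the work. The sketch below assumes C11 atomics and POSIX `sched_yield()`; `loadOnce`, `LoaderFn`, `LOADING_SENTINEL`, and `countingLoader` are hypothetical names, and the 5-minute timeout mentioned in the description is left out for brevity.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Marker meaning "some thread is loading"; never dereferenced. */
#define LOADING_SENTINEL ((void *)(uintptr_t)1)

typedef void *(*LoaderFn)(void *ctx);

/* Return the shared buffer in *slot, running `loader` at most once at a
 * time. Threads that find the sentinel spin politely until the loader
 * publishes a result. If the loader fails, the slot is reset to NULL so a
 * later caller may retry, and NULL is returned to this caller. */
static void *loadOnce(_Atomic(void *) *slot, LoaderFn loader, void *ctx) {
  assert(slot != NULL && loader != NULL);
  for (;;) {
    void *cur = atomic_load(slot);
    if (cur != NULL && cur != LOADING_SENTINEL)
      return cur; /* already loaded */
    if (cur == NULL) {
      void *expected = NULL;
      if (atomic_compare_exchange_strong(slot, &expected, LOADING_SENTINEL)) {
        void *buf = loader(ctx); /* only the CAS winner pays for the read */
        atomic_store(slot, buf); /* publish the buffer, or NULL on failure */
        return buf;
      }
      continue; /* lost the race: re-read the slot */
    }
    sched_yield(); /* another thread is loading: wait */
  }
}

/* Example loader for illustration: counts how often it is invoked. */
static void *countingLoader(void *ctx) {
  int *calls = ctx;
  (*calls)++;
  return malloc(16);
}
```

With this shape, `loadOnce(&slot, countingLoader, &calls)` called twice returns the same pointer and leaves `calls` at 1. A production version would bound the wait, as the description's timeout does.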

Also, sticking some malloc/read/free code in a function called omMMapBinaryFile is not very pretty. You likely want to put the malloc/read/free code in its own function and have another appropriately named higher level function to branch to either mmap or malloc accordingly.

Lastly, our Jenkins CI doesn't have a z/OS test machine, so it cannot check this code. The code will have to be tested somewhere else before it can be merged.

Large file MMAP support on MVS systems can require system level alterations for memory allocations, sometimes making it easier to instead allocate memory in your address space and simply page in information at load time.

Signed-off-by: Jacob Engelbrecht <jacob@engelbrecht.works>
@JacobEngelbrecht JacobEngelbrecht force-pushed the mvs_mmap_large_file_support branch from 48d843b to d70f9bb Compare February 25, 2026 18:37

@JacobEngelbrecht
Author

Thank you for the feedback. We are still verifying the approach; the code will be cleaned up and the requested changes implemented before the PR is officially opened.

@JacobEngelbrecht
Author

Still working on this, but the approach has been verified.

@tungld
Member

tungld commented Mar 6, 2026

@JacobEngelbrecht another thing to consider: who will free the allocated buffer? Since there might be multiple threads calling the same .so file, we don't know when all the threads finish. So we should provide a function like omFreeConstantBuffer, which is called by the user's program. @gongsu832 any thought on this?

@gongsu832
Collaborator

> @JacobEngelbrecht another thing to consider: who will free the allocated buffer? Since there might be multiple threads calling the same .so file, we don't know when all the threads finish. So we should provide a function like omFreeConstantBuffer, which is called by the user's program. @gongsu832 any thought on this?

@tungld I haven't looked at the compiler code for a while so my understanding might be wrong. My understanding is that omMMapBinaryFile is part of the emitted IR function omLoadConstantsFromFile, which ultimately goes into the compiled model.so? Conceptually, constant handling is part of the model computation so it's an internal matter to the model that shouldn't concern the user. So ideally, if it's possible to derive from the model computation graph when the constants are no longer needed, the compiler should similarly emit an IR function omFreeConstantBuffer at the proper point to handle munmap/free the constant buffer without user involvement.

@tungld
Member

tungld commented Mar 9, 2026

> My understanding is that omMMapBinaryFile is part of the emitted IR function omLoadConstantsFromFile, which ultimately goes into the compiled model.so?

Yes, it is called from the entry point function inside the .so, so we can free the buffer at the end of the entry point function.
But I am considering the following two cases:

  • Decoding models: the entry point function inside a .so file is called repeatedly with different inputs, each call generating one token. It's inefficient to alloc/free each time the entry point function is called.
  • Multi-threading: multiple threads can call the same entry point function with different inputs. For this case, a simple solution is to provide a reference counter to know how many threads are using the buffer.
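The reference-counter idea for the multi-threading case might be sketched as follows. Everything here is hypothetical (`ConstBuffer`, `constBufferAcquire`, `constBufferRelease`); it only shows the counting itself, using C11 atomics, and does not address how a freed buffer would be reloaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared constant buffer with a count of active users. */
typedef struct {
  void *data;
  atomic_int users;
} ConstBuffer;

/* Called on entry to the entry point function: register one more user. */
static void constBufferAcquire(ConstBuffer *b) {
  atomic_fetch_add(&b->users, 1);
}

/* Called on exit from the entry point function. Returns true if this call
 * dropped the last reference and freed the data. */
static bool constBufferRelease(ConstBuffer *b) {
  int prev = atomic_fetch_sub(&b->users, 1);
  assert(prev > 0); /* a release must pair with an earlier acquire */
  if (prev == 1) {
    free(b->data);
    b->data = NULL;
    return true;
  }
  return false;
}
```

Freeing as soon as the count reaches zero would reintroduce the alloc/free cost of the decoding case above whenever calls do not overlap, so a real design would probably need an additional policy for when the last release actually frees the buffer.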


4 participants