Authors: Saurav Jha*, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev*
*equal contribution
REAM-ed Qwen3 and GLM-4.5-Air models 🤗:
See requirements.txt for the necessary packages and their recommended versions. These are the versions we used for our experiments, but you may be able to use other versions as well.
pip install -r requirements.txt
Obtaining a merged model from the original one requires running the merge.py script with appropriate arguments, e.g.:
python merge.py --model <> --merge_size <> --save_path <> ...
See merge.py and config.py for the hyperparameters and options.
Default arguments correspond to our full REAM model.
--merge_size should be chosen appropriately, e.g. 25% or 50% of the original number of experts.
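To illustrate at the weight level what reducing the expert count means, here is a toy sketch. It is not REAM's actual merging criterion (that lives in merge.py); it only shows experts being uniformly grouped and averaged down to a smaller count, with all names and shapes illustrative.

```python
import numpy as np

def merge_experts(expert_weights, merge_size):
    """Toy sketch: reduce len(expert_weights) experts to merge_size experts
    by splitting them into contiguous groups and averaging each group.
    REAM's actual grouping and weighting are implemented in merge.py."""
    groups = np.array_split(np.arange(len(expert_weights)), merge_size)
    return [np.mean([expert_weights[i] for i in g], axis=0) for g in groups]

# 8 toy "experts", each a 4x4 weight matrix.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((4, 4)) for _ in range(8)]

# Keep 50% of the original number of experts.
merged = merge_experts(experts, merge_size=4)
print(len(merged), merged[0].shape)  # -> 4 (4, 4)
```

With --merge_size set to 4 on a layer of 8 experts, the merged layer would analogously hold half as many expert weight matrices.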
MTP (Multi-Token Prediction) layers, present in some LLMs for more efficient decoding, are supported in our code (treated as an additional MoE layer). However, this was tested only with Qwen3 and requires additional steps:
- setting the --mtp_safe_tensors file path
- renaming the merged model's safe tensors of the MTP and other layers, along with the corresponding model.safetensors.index.json file
See Qwen3-Next-80B-A3B-Instruct-REAM for our example of the merged model with MTP layers.
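The renaming step above boils down to keeping the shard filenames and the index file consistent. Below is a minimal sketch of renaming one shard and updating model.safetensors.index.json accordingly; the filenames and tensor key are hypothetical, and the exact renames needed for MTP layers are those shown in Qwen3-Next-80B-A3B-Instruct-REAM.

```python
import json
import tempfile
from pathlib import Path

def rename_shard(model_dir, old_name, new_name):
    """Rename one safetensors shard and repoint every weight_map entry
    that referenced the old file, so the index stays consistent with
    the files on disk. (Filenames here are hypothetical.)"""
    model_dir = Path(model_dir)
    (model_dir / old_name).rename(model_dir / new_name)

    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text())
    index["weight_map"] = {
        tensor: (new_name if shard == old_name else shard)
        for tensor, shard in index["weight_map"].items()
    }
    index_path.write_text(json.dumps(index, indent=2))

# Example with a dummy model directory and a made-up MTP tensor key:
d = Path(tempfile.mkdtemp())
(d / "model-00001-of-00002.safetensors").write_bytes(b"")
(d / "model.safetensors.index.json").write_text(json.dumps(
    {"weight_map": {"mtp.w": "model-00001-of-00002.safetensors"}}))
rename_shard(d, "model-00001-of-00002.safetensors", "mtp.safetensors")
print(json.loads((d / "model.safetensors.index.json").read_text())["weight_map"])
# -> {'mtp.w': 'mtp.safetensors'}
```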
We evaluate the original and compressed (merged/pruned) models on multiple-choice (MC) and generative (GEN) tasks.
Run python eval_mc.py to evaluate the merged model on 8 multiple-choice tasks.
The main options are --model and --batch_size.
See eval_mc.py and config.py for more options.
We evaluate on 6 generative tasks: IFEval, AIME25, GSM8K, GPQA-Diamond, HumanEval, and LiveCodeBench. We use the following tools, as described on our main Hugging Face page and in the paper:
- lm-eval https://github.com/EleutherAI/lm-evaluation-harness
- LiveCodeBench https://github.com/LiveCodeBench/LiveCodeBench
- For GLM-4.5-Air, HumanEval and LiveCodeBench tasks: https://github.com/zai-org/glm-simple-evals (see GLM-4.5-Air-REAM for details)
MIT license; see the LICENSE file.
If you find this work useful, please consider citing:
@article{jha2026ream,
title={REAM: Merging Improves Pruning of Experts in LLMs},
author={Jha, Saurav and Hashemzadeh, Maryam and Pasand, Ali Saheb and Parviz, Ali and Lee, Min-Joong and Knyazev, Boris},
journal={arXiv preprint arXiv:2604.04356},
year={2026}
}