Authors: Saurav Jha*, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev*
*equal contribution
REAM-ed Qwen3 and GLM-4.5-Air models 🤗:
See requirements.txt for the necessary packages and their recommended versions. These are the versions we used for our experiments, but you may be able to use other versions as well.
pip install -r requirements.txt
Obtaining a merged model from the original one requires running the merge.py script with appropriate arguments, e.g.:
python merge.py --model <> --merge_size <> --save_path <> ...
See merge.py and config.py for the hyperparameters and options.
Default arguments correspond to our full REAM model.
--merge_size should be chosen appropriately, e.g. 25% or 50% of the original number of experts.
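To illustrate at the weight level what reducing the expert count means, here is a toy sketch. It is not REAM's actual merging criterion (that lives in merge.py); it only shows experts being uniformly grouped and averaged down to a smaller count, with all names and shapes illustrative.

```python
import numpy as np

def merge_experts(expert_weights, merge_size):
    """Toy sketch: reduce len(expert_weights) experts to merge_size experts
    by splitting them into contiguous groups and averaging each group.
    REAM's actual grouping and weighting are implemented in merge.py."""
    groups = np.array_split(np.arange(len(expert_weights)), merge_size)
    return [np.mean([expert_weights[i] for i in g], axis=0) for g in groups]

# 8 toy "experts", each a 4x4 weight matrix.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((4, 4)) for _ in range(8)]

# Keep 50% of the original number of experts.
merged = merge_experts(experts, merge_size=4)
print(len(merged), merged[0].shape)  # -> 4 (4, 4)
```

With --merge_size set to 4 on a layer of 8 experts, the merged layer would analogously hold half as many expert weight matrices.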
MTP (Multi-Token Prediction) layers, present in some LLMs for more efficient decoding, are supported in our code (treated as an additional MoE layer). However, this was tested only with Qwen3 and requires additional steps:
- setting the --mtp_safe_tensors file path
- renaming the merged model's safe tensors of the MTP and other layers, along with the corresponding model.safetensors.index.json file
See Qwen3-Next-80B-A3B-Instruct-REAM for our example of the merged model with MTP layers.
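The renaming step above boils down to keeping the shard filenames and the index file consistent. Below is a minimal sketch of renaming one shard and updating model.safetensors.index.json accordingly; the filenames and tensor key are hypothetical, and the exact renames needed for MTP layers are those shown in Qwen3-Next-80B-A3B-Instruct-REAM.

```python
import json
import tempfile
from pathlib import Path

def rename_shard(model_dir, old_name, new_name):
    """Rename one safetensors shard and repoint every weight_map entry
    that referenced the old file, so the index stays consistent with
    the files on disk. (Filenames here are hypothetical.)"""
    model_dir = Path(model_dir)
    (model_dir / old_name).rename(model_dir / new_name)

    index_path = model_dir / "model.safetensors.index.json"
    index = json.loads(index_path.read_text())
    index["weight_map"] = {
        tensor: (new_name if shard == old_name else shard)
        for tensor, shard in index["weight_map"].items()
    }
    index_path.write_text(json.dumps(index, indent=2))

# Example with a dummy model directory and a made-up MTP tensor key:
d = Path(tempfile.mkdtemp())
(d / "model-00001-of-00002.safetensors").write_bytes(b"")
(d / "model.safetensors.index.json").write_text(json.dumps(
    {"weight_map": {"mtp.w": "model-00001-of-00002.safetensors"}}))
rename_shard(d, "model-00001-of-00002.safetensors", "mtp.safetensors")
print(json.loads((d / "model.safetensors.index.json").read_text())["weight_map"])
# -> {'mtp.w': 'mtp.safetensors'}
```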
We evaluate the original and compressed (merged/pruned) models on multiple-choice (MC) and generative (GEN) tasks.
Run python eval_mc.py to evaluate the merged model on 8 multiple-choice tasks.
The main options are --model and --batch_size.
See eval_mc.py and config.py for more options.
We evaluate on 6 generative tasks: IFEval, AIME25, GSM8K, GPQA-Diamond, HumanEval, and LiveCodeBench. We use the following tools, as described on our main Hugging Face page and in the paper:
- lm-eval https://github.com/EleutherAI/lm-evaluation-harness
- LiveCodeBench https://github.com/LiveCodeBench/LiveCodeBench
- For GLM-4.5-Air, HumanEval and LiveCodeBench tasks: https://github.com/zai-org/glm-simple-evals (see GLM-4.5-Air-REAM for details)
MIT license; see the LICENSE file.
If you find this work useful, please consider citing:
@article{jha2026ream,
title={REAM: Merging Improves Pruning of Experts in LLMs},
author={Jha, Saurav and Hashemzadeh, Maryam and Pasand, Ali Saheb and Parviz, Ali and Lee, Min-Joong and Knyazev, Boris},
journal={arXiv preprint arXiv:2604.04356},
year={2026}
}