Compressing What Matters: Neuron Importance Meets Low Rank Approximation for Language Model Compression
Reproducible Codebase
This repository provides the complete, modular codebase necessary to replicate our study on efficient language model compression. Our work addresses the critical need to reduce the computational footprint of large transformer models while preserving their core performance capabilities.
We introduce a novel framework that integrates Neuron Importance Analysis with established Low-Rank Approximation techniques. This allows for data-aware pruning and compression, ensuring that we compress only the most critical components of the model, leading to state-of-the-art performance retention at significantly lower bitrates.
This codebase is built around a structured, modular action system (based on Hydra) to manage the complexity of multiple experimental runs and varying compression strategies.
The results presented in this repository are detailed in the following publication:
- Title: Compressing What Matters: Neuron Importance Meets Low Rank Approximation for Language Model Compression
- Journal: IEEE Access
- Date: January 2026
- Authors: A. Dovas, A. Doumanoglou, P. Drakoulis, D. Zarpalas
- Link: https://ieeexplore.ieee.org/document/11346468
The experimental pipeline is built upon the Hydra framework, which provides robust configuration management and multi-run orchestration across experiments. The core experiment structure relies on three primary components:
- Actions (`/actions`): Free functions that define specific steps in the overall research workflow (e.g., downloading data, initial evaluation). An action takes inputs and produces defined file artifacts.
- Data API (`actions/dataapi.py`): Centralizes all file path and directory management. It acts as a getter utility, ensuring consistent artifact handling across actions by building the entire required file structure on top of a specified `data_root_dir`.
- Components (`/comp`): The algorithmic implementations (the core logic). They operate strictly on the input and output file paths provided by the actions.
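The action / data-API / component split described above can be sketched roughly as follows. This is an illustrative sketch only: the function names (`get_model_dir`, `compress_model`, `action_compress`) are assumptions, not the repository's actual API.

```python
from pathlib import Path

# Hypothetical sketch of the action / data-API / component separation.
# All names below are illustrative, not the repository's actual API.

def get_model_dir(data_root_dir: Path, model_name: str) -> Path:
    """Data-API getter: every artifact path derives from a single root."""
    return data_root_dir / "models" / model_name

def compress_model(in_path: Path, out_path: Path) -> None:
    """Component: pure algorithm that touches only the paths it is given."""
    out_path.write_bytes(in_path.read_bytes())  # placeholder for real logic

def action_compress(data_root_dir: Path, model_name: str) -> Path:
    """Action: wires file artifacts to the component via the data API."""
    model_dir = get_model_dir(data_root_dir, model_name)
    src = model_dir / "model.bin"
    dst = model_dir / "model_compressed.bin"
    compress_model(src, dst)
    return dst
```

The point of the pattern is that components never construct paths themselves, so changing `data_root_dir` in the configuration relocates every artifact consistently.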
Note on Configuration: The data directory root must be explicitly defined in the configuration file (e.g., `./config/paths/workstation-1`). The `dataapi` will construct the complete artifact hierarchy beneath this base directory.
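A paths configuration of the kind referenced above might look like the following. The key name and value are illustrative assumptions, not the repository's actual schema:

```yaml
# config/paths/workstation-1.yaml (illustrative; actual keys may differ)
data_root_dir: /mnt/storage/compression-experiments
```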
The experiment workflow is highly sequential. We have grouped the necessary actions into a logical flow to manage dependencies: some actions must run only once, while others are designed for multi-run capability (hyperparameter tuning).
Our framework supports several specialized compression types, each leveraging different mathematical approaches:
| Abbreviation | Full Name | Description |
|---|---|---|
| SVD | Singular Value Decomposition | Pure SVD approximation. |
| FWSVD | Fisher Weighted-SVD | Incorporates weights derived from the Fisher Information Matrix. |
| Whiten-SVD | SVD LLMv2 (Whitening) | Uses whitening for data-aware approximation. |
| Whiten-SVD-FW | NIDA-SVD with Parameter Importance | Combines whitening and parameter importance weighting. |
| Whiten-SVD-FD | NIDA-SVD with Neuron Importance | The full methodology: combines whitening, low-rank approximation, and neuron importance analysis (FD). |
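To make the Fisher-weighted variant concrete, here is a minimal NumPy sketch of the general FWSVD idea: scale each row of a weight matrix by the square root of its (here synthetic) Fisher importance, take a truncated SVD of the scaled matrix, then undo the scaling. The matrix sizes, rank, and Fisher values are placeholders, and this is a sketch of the technique, not the repository's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))   # weight matrix to compress (placeholder)
fisher = rng.random(64) + 1e-3      # per-row Fisher importance (synthetic)

# Row-scale by sqrt(importance) so the SVD minimizes a Fisher-weighted error.
s = np.sqrt(fisher)
U, S, Vt = np.linalg.svd(s[:, None] * W, full_matrices=False)

r = 8                               # target rank (placeholder)
A = (U[:, :r] * S[:r]) / s[:, None] # undo the row scaling on the left factor
B = Vt[:r]
W_approx = A @ B                    # rank-r, importance-aware approximation
```

Rows deemed important by the Fisher weights are reconstructed more faithfully than unimportant ones, which is the core intuition behind all the importance-weighted variants in the table.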
Technical Note on Allocation: Our scripts (`/scripts`) manage two key allocation strategies:
- Index-Based Allocation (NIDA-SVD): In code terms, this is termed `layer-based` allocation.
- Role-Based Allocation (SVD-LLMv2): This is implemented as `type-based` allocation.
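The difference between the two strategies can be illustrated with a small sketch. The layer indices, role names, and rank values below are made up for the example and do not reflect the repository's actual configuration:

```python
# Hypothetical illustration of the two rank-allocation strategies.

# Index-based ("layer-based", NIDA-SVD): one target rank per layer index,
# so more important layers keep more rank than redundant ones.
layer_based = {0: 96, 1: 80, 2: 64, 3: 64, 4: 48, 5: 32}

# Role-based ("type-based", SVD-LLMv2): one target rank per weight role,
# shared by every layer that contains that role.
type_based = {"attention.q": 64, "attention.k": 64,
              "attention.v": 96, "ffn.up": 128, "ffn.down": 128}

def target_rank(layer_idx: int, role: str, mode: str = "layer") -> int:
    """Look up the target rank under either allocation strategy."""
    return layer_based[layer_idx] if mode == "layer" else type_based[role]
```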
The workflow must be followed sequentially by executing the action groups in order:
- `0-download`: Downloads the specified language model from HuggingFace and stores it in the designated `data_root_dir`. (Execute Once)
- `1-evaluate`: Performs baseline evaluation on the initially downloaded, uncompressed model. (Execute Once)
- `2-init` (Initialization): Inspects all model layers, computes Fisher Information, and derives the Neuron Importance metrics ($\mathbf{FD}$). (Execute Once)
- `3-params` (Parameter Tuning): Designed for a multi-run setting. Generates the optimal rank configurations needed for uniform compression at various target compression levels.
- `4-prepare`: Records feature activations and computes the whitening matrix, a critical prerequisite for the data-aware compression algorithms.
- `5-compress`: Executes the actual model compression and initial evaluation using the rank configurations derived in `3-params`.
- `mgen` (Helper): A utility script that generates the specialized shell scripts required for executing subsequent action sets in a multi-run environment. (Execute after step 4, before finetuning.)
- `cmprs-finetune`: Fine-tunes the resulting compressed model and evaluates its final performance.
- Prerequisites: Python 3.11.
- Installation: Install all necessary dependencies using `pip install -r requirements.txt`.
The entry point for executing the entire pipeline is `hyrun.py`. This script reads a configuration file and executes actions sequentially.
- Example Single Action Execution: To run only the download step: `python hyrun.py --config-name 0-download`
- Batch Scripts: We provide example batch scripts (`run_distilbert.bat` and `run_distilbert_finetune.bat`) for executing the full workflow pipeline for DistilBert, including finetuning steps.
⚠️ Note on Automation: The `compression_runid` is randomly generated during the compression process. Therefore, fully automating the final finetuning step would require directory-parsing functionality that is not provided in this repository. Users must manually track the output IDs in this case.
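If manual tracking of the random run IDs becomes tedious, a small helper can locate the most recently written run directory. This is a hypothetical convenience sketch, not part of the repository: it assumes each compression run writes its artifacts into a subdirectory named after its run ID under a common base directory.

```python
from pathlib import Path

def latest_run_dir(base: Path) -> Path:
    """Return the most recently modified run directory under `base`.

    Hypothetical helper: assumes each compression run creates one
    subdirectory of `base` named after its randomly generated run ID.
    """
    runs = [p for p in base.iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run directories under {base}")
    return max(runs, key=lambda p: p.stat().st_mtime)
```

Picking the latest directory by modification time is fragile if runs execute concurrently, which is presumably why the authors recommend tracking the IDs manually.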