Compressing What Matters: Neuron Importance Meets Low Rank Approximation for Language Model Compression
Reproducible Codebase
This repository provides the complete, modular codebase necessary to replicate our study on efficient language model compression. Our work addresses the critical need to reduce the computational footprint of large transformer models while preserving their core performance capabilities.
We introduce a novel framework that integrates Neuron Importance Analysis with established Low-Rank Approximation techniques. This allows for data-aware pruning and compression, ensuring that we compress only the most critical components of the model, leading to state-of-the-art performance retention at significantly lower bitrates.
This codebase is built around a structured, modular action system (based on Hydra) to manage the complexity of multiple experimental runs and varying compression strategies.
The results presented in this repository are detailed in the following publication:
- Title: Compressing What Matters: Neuron Importance Meets Low Rank Approximation for Language Model Compression
- Journal: IEEE Access
- Date: January 2026
- Authors: A. Dovas, A. Doumanoglou, P. Drakoulis, D. Zarpalas
- Link: https://ieeexplore.ieee.org/document/11346468
The experimental pipeline is built upon the Hydra framework, which provides robust configuration management and multi-run orchestration across experiments. The core experiment structure relies on three primary components:
- Actions (`/actions`): Free functions that define specific steps in the overall research workflow (e.g., downloading data, initial evaluation). An action takes inputs and produces defined file artifacts.
- Data API (`actions/dataapi.py`): Centralizes all file path and directory management. It acts as a getter utility, ensuring consistent artifact handling across actions by building the entire required file structure on top of a specified `data_root_dir`.
- Components (`/comp`): The algorithmic implementations (the core logic). They operate strictly on the input and output file paths provided by the actions.
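The action / data-API / component split described above can be sketched roughly as follows. This is an illustrative sketch only: the function names (`get_model_dir`, `compress_model`, `action_compress`) are assumptions, not the repository's actual API.

```python
from pathlib import Path

# Hypothetical sketch of the action / data-API / component separation.
# All names below are illustrative, not the repository's actual API.

def get_model_dir(data_root_dir: Path, model_name: str) -> Path:
    """Data-API getter: every artifact path derives from a single root."""
    return data_root_dir / "models" / model_name

def compress_model(in_path: Path, out_path: Path) -> None:
    """Component: pure algorithm that touches only the paths it is given."""
    out_path.write_bytes(in_path.read_bytes())  # placeholder for real logic

def action_compress(data_root_dir: Path, model_name: str) -> Path:
    """Action: wires file artifacts to the component via the data API."""
    model_dir = get_model_dir(data_root_dir, model_name)
    src = model_dir / "model.bin"
    dst = model_dir / "model_compressed.bin"
    compress_model(src, dst)
    return dst
```

The point of the pattern is that components never construct paths themselves, so changing `data_root_dir` in the configuration relocates every artifact consistently.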
Note on Configuration: The data directory root must be explicitly defined in the configuration file (e.g., `./config/paths/workstation-1`). The `dataapi` will construct the complete artifact hierarchy beneath this base directory.
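A paths configuration of the kind referenced above might look like the following. The key name and value are illustrative assumptions, not the repository's actual schema:

```yaml
# config/paths/workstation-1.yaml (illustrative; actual keys may differ)
data_root_dir: /mnt/storage/compression-experiments
```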
The experiment workflow is highly sequential. We have grouped the necessary actions into a logical flow to manage dependencies: some actions must run only once, while others are designed for multi-run capability (hyperparameter tuning).
Our framework supports several specialized compression types, each leveraging different mathematical approaches:
| Abbreviation | Full Name | Description |
|---|---|---|
| SVD | Singular Value Decomposition | Pure SVD approximation. |
| FWSVD | Fisher Weighted-SVD | Incorporates weights derived from the Fisher Information Matrix. |
| Whiten-SVD | SVD LLMv2 (Whitening) | Uses whitening for data-aware approximation. |
| Whiten-SVD-FW | NIDA-SVD with Parameter Importance | Combines whitening and parameter importance weighting. |
| Whiten-SVD-FD | NIDA-SVD with Neuron Importance | The full methodology: combines whitening, low-rank approximation, and neuron importance analysis (FD). |
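To make the Fisher-weighted variant concrete, here is a minimal NumPy sketch of the general FWSVD idea: scale each row of a weight matrix by the square root of its (here synthetic) Fisher importance, take a truncated SVD of the scaled matrix, then undo the scaling. The matrix sizes, rank, and Fisher values are placeholders, and this is a sketch of the technique, not the repository's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))   # weight matrix to compress (placeholder)
fisher = rng.random(64) + 1e-3      # per-row Fisher importance (synthetic)

# Row-scale by sqrt(importance) so the SVD minimizes a Fisher-weighted error.
s = np.sqrt(fisher)
U, S, Vt = np.linalg.svd(s[:, None] * W, full_matrices=False)

r = 8                               # target rank (placeholder)
A = (U[:, :r] * S[:r]) / s[:, None] # undo the row scaling on the left factor
B = Vt[:r]
W_approx = A @ B                    # rank-r, importance-aware approximation
```

Rows deemed important by the Fisher weights are reconstructed more faithfully than unimportant ones, which is the core intuition behind all the importance-weighted variants in the table.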
Technical Note on Allocation: Our scripts (`/scripts`) manage two key allocation strategies:
- Index-Based Allocation (NIDA-SVD): In code terms, this is termed `layer-based` allocation.
- Role-Based Allocation (SVD-LLMv2): This is implemented as `type-based` allocation.
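The difference between the two strategies can be illustrated with a small sketch. The layer indices, role names, and rank values below are made up for the example and do not reflect the repository's actual configuration:

```python
# Hypothetical illustration of the two rank-allocation strategies.

# Index-based ("layer-based", NIDA-SVD): one target rank per layer index,
# so more important layers keep more rank than redundant ones.
layer_based = {0: 96, 1: 80, 2: 64, 3: 64, 4: 48, 5: 32}

# Role-based ("type-based", SVD-LLMv2): one target rank per weight role,
# shared by every layer that contains that role.
type_based = {"attention.q": 64, "attention.k": 64,
              "attention.v": 96, "ffn.up": 128, "ffn.down": 128}

def target_rank(layer_idx: int, role: str, mode: str = "layer") -> int:
    """Look up the target rank under either allocation strategy."""
    return layer_based[layer_idx] if mode == "layer" else type_based[role]
```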
The workflow must be followed sequentially by executing the action groups in order:
- `0-download`: Downloads the specified language model from HuggingFace and stores it in the designated `data_root_dir`. (Execute Once)
- `1-evaluate`: Performs baseline evaluation on the initially downloaded, uncompressed model. (Execute Once)
- `2-init` (Initialization): Inspects all model layers, computes Fisher Information, and derives the Neuron Importance metrics ($\mathbf{FD}$). (Execute Once)
- `3-params` (Parameter Tuning): Designed for a multi-run setting. Generates the optimal rank configurations needed for uniform compression at various target compression levels.
- `4-prepare`: Records feature activations and computes the whitening matrix, a critical prerequisite for the data-aware compression algorithms.
- `5-compress`: Executes the actual model compression and initial evaluation using the rank configurations derived in `3-params`.
- `mgen` (Helper): A utility script that generates the specialized shell scripts required for executing subsequent action sets in a multi-run environment. (Execute after step 4, before finetuning.)
- `cmprs-finetune`: Fine-tunes the resulting compressed model and evaluates its final performance.
- Prerequisites: Python 3.11.
- Installation: Install all necessary dependencies using `pip install -r requirements.txt`.
The entry point for executing the entire pipeline is `hyrun.py`. This script reads a configuration file and executes actions sequentially.
- Example Single Action Execution: To run only the download step: `python hyrun.py --config-name 0-download`
- Batch Scripts: We provide example batch scripts (`run_distilbert.bat` and `run_distilbert_finetune.bat`) for executing the full workflow pipeline for DistilBert, including finetuning steps.
⚠️ Note on Automation: The `compression_runid` is randomly generated during the compression process. Therefore, fully automating the final finetuning step would require directory-parsing functionality that is not provided in this repository. Users must manually track the output IDs in this case.
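If manual tracking of the random run IDs becomes tedious, a small helper can locate the most recently written run directory. This is a hypothetical convenience sketch, not part of the repository: it assumes each compression run writes its artifacts into a subdirectory named after its run ID under a common base directory.

```python
from pathlib import Path

def latest_run_dir(base: Path) -> Path:
    """Return the most recently modified run directory under `base`.

    Hypothetical helper: assumes each compression run creates one
    subdirectory of `base` named after its randomly generated run ID.
    """
    runs = [p for p in base.iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run directories under {base}")
    return max(runs, key=lambda p: p.stat().st_mtime)
```

Picking the latest directory by modification time is fragile if runs execute concurrently, which is presumably why the authors recommend tracking the IDs manually.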