Compressing What Matters: Neuron Importance Meets Low Rank Approximation for Language Model Compression

Reproducible Codebase


📚 Overview and Contribution

This repository provides the complete, modular codebase necessary to replicate our study on efficient language model compression. Our work addresses the critical need to reduce the computational footprint of large transformer models while preserving their core performance capabilities.

We introduce a novel framework that integrates Neuron Importance Analysis with established Low-Rank Approximation techniques. This allows for data-aware pruning and compression, ensuring that we compress only the most critical components of the model, leading to state-of-the-art performance retention at significantly lower bitrates.

This codebase is organized around a structured, modular action system built on Hydra, which tames the complexity of running many experiments with varying compression strategies.


📄 Paper Details

The results presented in this repository are detailed in the following publication:

  • Title: Compressing What Matters: Neuron Importance Meets Low Rank Approximation for Language Model Compression
  • Journal: IEEE Access
  • Date: January 2026
  • Authors: A. Dovas, A. Doumanoglou, P. Drakoulis, D. Zarpalas
  • Link: https://ieeexplore.ieee.org/document/11346468

⚙️ Code Architecture and Components

The experimental pipeline is built on the Hydra framework, which provides robust configuration management and multi-run orchestration across experiments. The core experiment structure relies on three primary components:

  1. Actions (/actions): These are free functions that define specific steps in the overall research workflow (e.g., downloading data, initial evaluation). An action takes inputs and produces defined file artifacts.
  2. Data API (actions/dataapi.py): This component centralizes all file path and directory management. It acts as a getter utility, ensuring consistent artifact handling across different actions by building the entire required file structure on top of a specified data_root_dir.
  3. Components (/comp): These are the algorithmic implementations (the core logic). They operate strictly on input and output file paths provided by the actions.

Note on Configuration: The data directory root must be explicitly defined in the configuration file (e.g., ./config/paths/workstation-1). The dataapi will construct the complete artifact hierarchy beneath this base directory.
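To make the Data API pattern concrete, here is a minimal sketch of a path-getter utility rooted at `data_root_dir`. All names below (`DataApi`, `model_dir`, `whitening_path`, the subdirectory layout) are illustrative assumptions, not the repository's actual API in `actions/dataapi.py`.

```python
from pathlib import Path

class DataApi:
    """Hypothetical sketch: every action asks this helper for its artifact
    paths instead of hard-coding them, so the whole artifact hierarchy
    hangs off a single configurable data_root_dir."""

    def __init__(self, data_root_dir: str):
        self.root = Path(data_root_dir)

    def model_dir(self, model_name: str) -> Path:
        # where the downloaded HuggingFace model would be stored
        return self.root / "models" / model_name

    def whitening_path(self, model_name: str, layer: str) -> Path:
        # per-layer whitening artifact produced by the prepare step
        return self.root / "whitening" / model_name / f"{layer}.pt"

api = DataApi("/data/llm-comp")
print(api.whitening_path("distilbert", "layer_0"))
```

Because every action goes through the same getter, changing the configured root (e.g. switching between workstations) relocates the entire artifact tree at once.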


🚀 Usage Guide and Execution Workflow

The experiment workflow is strictly sequential. The necessary actions are grouped into a logical flow to manage dependencies: some actions must run only once, while others support multi-run execution (hyperparameter tuning).

🔬 Compression Methodologies

Our framework supports several specialized compression types, each leveraging different mathematical approaches:

| Abbreviation | Full Name | Description |
| --- | --- | --- |
| SVD | Singular Value Decomposition | Pure SVD approximation. |
| FWSVD | Fisher-Weighted SVD | Incorporates weights derived from the Fisher Information Matrix. |
| Whiten-SVD | SVD-LLMv2 (Whitening) | Uses whitening for data-aware approximation. |
| Whiten-SVD-FW | NIDA-SVD with Parameter Importance | Combines whitening and parameter-importance weighting. |
| Whiten-SVD-FD | NIDA-SVD with Neuron Importance | The full methodology: combines whitening, low-rank approximation, and neuron importance analysis (FD). |
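All of these variants rest on the same rank-$k$ truncation. The sketch below (our notation, not necessarily the paper's) shows the basic factorization and, for the Fisher-weighted variants, the weighted objective they approximate:

```latex
% Rank-k truncated SVD of a weight matrix W \in \mathbb{R}^{m \times n}:
W \;\approx\; U_k \Sigma_k V_k^{\top},
\qquad mn \ \text{parameters} \;\longrightarrow\; k(m+n).
% Fisher-weighted variants bias reconstruction toward important entries,
% in the spirit of a weighted low-rank objective:
\min_{\operatorname{rank}(\widehat{W}) \,\le\, k}
\bigl\| \sqrt{F} \odot (W - \widehat{W}) \bigr\|_F^2 .
```

The elementwise-weighted problem has no exact closed form; Fisher-weighted methods in the literature typically approximate it, e.g. with row-wise importance weights.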

Technical Note on Allocation: Our scripts (/scripts) manage two key allocation strategies:

  • Index-Based Allocation (NIDA-SVD): In code terms, this is termed layer-based allocation.
  • Role-Based Allocation (SVD-LLMv2): This is implemented as type-based allocation.
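The two allocation modes can be sketched as follows. Everything here, including the per-role budget heuristic, is an illustrative assumption rather than the repository's actual scripts.

```python
def allocate_ranks(layers, target_ratio, mode="layer"):
    """layers: dicts with 'index', 'role' ('attn'/'ffn'), 'shape' (m, n).
    Returns {layer index: rank} so each factorization keeps roughly
    target_ratio of the layer's original m*n parameters."""
    ranks = {}
    for layer in layers:
        m, n = layer["shape"]
        if mode == "layer":
            # index-based ("layer-based"): one budget per layer index,
            # uniform here for simplicity
            keep = target_ratio
        else:
            # role-based ("type-based"): one budget per layer role
            # (the 1.2 / 0.8 split is a made-up heuristic)
            keep = {"attn": 1.2, "ffn": 0.8}[layer["role"]] * target_ratio
        # a rank-k factorization stores k*(m+n) parameters in place of m*n
        ranks[layer["index"]] = max(1, int(keep * m * n / (m + n)))
    return ranks

layers = [{"index": 0, "role": "attn", "shape": (768, 768)},
          {"index": 1, "role": "ffn", "shape": (768, 3072)}]
print(allocate_ranks(layers, 0.5, mode="type"))
```

The key design point is that both modes answer the same question, "which rank does each factorized layer get for a given global budget?", and differ only in how layers are bucketed.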

📋 Step-by-Step Execution Sequence

The workflow must be followed sequentially, executing the groups of actions in order:

  1. 0-download: Downloads the specified language model from HuggingFace and stores it in the designated data_root_dir. (Execute Once)
  2. 1-evaluate: Performs baseline evaluation on the initially downloaded, uncompressed model. (Execute Once)
  3. 2-init (Initialization): Inspects all model layers, computes Fisher Information, and derives the Neuron Importance metrics ($\mathbf{FD}$). (Execute Once)
  4. 3-params (Parameter Tuning): Designed for a multi-run setting. This action generates optimal rank configurations necessary for uniform compression at various target compression levels.
  5. 4-prepare: Records feature activations and computes the necessary whitening matrix, which serves as a critical prerequisite for data-aware compression algorithms.
  6. 5-compress: Executes the actual model compression and initial evaluation using the rank configurations derived in Step 4 (3-params).
  7. mgen (Helper): A utility script that generates the specialized shell scripts required for executing subsequent action sets in a multi-run environment. (Execute after step 4, before finetuning.)
  8. cmprs-finetune: Fine-tunes the resulting compressed model and evaluates its final performance.
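As a rough sketch of what the 4-prepare step computes, the data-aware variants need a whitening matrix derived from recorded feature activations. The code below follows the general SVD-LLM-style recipe (Cholesky factor of the activation Gram matrix, after which the SVD is applied to W S instead of W); the repository's exact implementation may differ.

```python
import numpy as np

def whitening_matrix(activations: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """activations: (num_samples, num_features) matrix X.
    Returns a lower-triangular S with S @ S.T == X.T @ X + eps*I."""
    gram = activations.T @ activations
    gram += eps * np.eye(gram.shape[0])  # keep the factorization well-posed
    return np.linalg.cholesky(gram)

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 16))   # stand-in for recorded activations
S = whitening_matrix(X)
```

The small `eps` ridge is a common numerical safeguard: it keeps the Gram matrix positive definite even when some feature directions are never activated in the calibration data.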

💻 Environment Setup

  1. Prerequisites: Python 3.11.
  2. Installation: Install all necessary dependencies using:
    pip install -r requirements.txt

▶️ Running the Pipeline (Entry Point)

The entry point for executing the entire pipeline is hyrun.py. This script reads a configuration file to execute actions sequentially.

  • Example Single Action Execution: To run only the download step:

    python hyrun.py --config-name 0-download
  • Batch Scripts: We provide example batch scripts (run_distilbert.bat and run_distilbert_finetune.bat) that execute the full pipeline for DistilBERT, including the finetuning steps.

⚠️ Note on Automation: The compression_runid is randomly generated during the compression process, so fully automating the final finetuning step would require directory-parsing logic that this repository does not provide. Users must track the output run IDs manually.

