bcNMF: Background Contrastive Nonnegative Matrix Factorization

bcNMF extracts target-enriched latent components from high-dimensional data by jointly factorizing a target dataset and a matched background using shared nonnegative bases under a contrastive objective that suppresses shared variation.

Yixuan Li, Archer Y. Yang, Yue Li. bcNMF: Background Contrastive Nonnegative Matrix Factorization Identifies Target-Specific Features in High-Dimensional Data. arXiv:2602.22387.

System Requirements

Software dependencies

Python >= 3.9
PyTorch >= 2.0
numpy >= 1.22
scipy
scikit-learn
tqdm
umap-learn
matplotlib
scanpy (required for scRNA-seq experiments)

All dependencies are listed in requirements.txt.
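
A quick way to confirm that an environment meets these requirements is to query the installed versions. The sketch below is illustrative and not part of the repository: the helper name `report_versions` is hypothetical, and the list uses the usual PyPI distribution names for the packages above (PyTorch is published as `torch`).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names for the dependencies listed above.
PACKAGES = ["torch", "numpy", "scipy", "scikit-learn", "tqdm",
            "umap-learn", "matplotlib", "scanpy"]


def report_versions(packages=PACKAGES):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    py_ok = sys.version_info >= (3, 9)
    print("Python", sys.version.split()[0], "(OK)" if py_ok else "(needs >= 3.9)")
    for name, ver in report_versions().items():
        print(f"{name:14s} {ver or 'NOT INSTALLED'}")
```

A missing `scanpy` only matters for the scRNA-seq experiments.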

Operating systems tested

macOS 13+ (Ventura / Sonoma)
Linux (Ubuntu 20.04+)
Windows 10/11

Hardware

No non-standard hardware is required. A CUDA-capable GPU is optional and will be used automatically if available; all functions fall back to CPU otherwise.
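
To see which device PyTorch would pick on a given machine, the common idiom is `torch.cuda.is_available()`. This is a minimal sketch of that idiom, not the package's own device logic (which this README does not show); `pick_device` is a hypothetical helper name.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch sees a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to probe
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Device PyTorch would select:", pick_device())
```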

Installation Guide

Instructions

git clone https://github.com/li-lab-mcgill/bcnmf.git
cd bcnmf
pip install -e .

Or install dependencies only:

pip install -r requirements.txt

Typical install time

On a standard desktop computer with a normal internet connection, installation takes less than 5 minutes.

Demo

Dataset

A self-contained demo is provided in the demo/ directory. It reproduces the MNIST + natural image background experiment (Section 2.2 of the paper) using pre-generated data included in the repository:

demo/
├── simulation.ipynb              # Demo notebook
├── demo_data/
│   ├── target_images.csv         # 784 × 200  (MNIST digits 0/1 on flower backgrounds)
│   ├── background_images.csv     # 784 × 150  (flower patches only)
│   └── target_labels.csv         # Ground-truth digit labels (0 or 1)
└── results/                      # Output figures and matrices written here
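
The demo matrices can also be inspected outside the notebook. The sketch below assumes the CSV files are plain comma-separated numeric matrices with no header row or index column, stored as features × samples as annotated above; neither assumption is confirmed by this README. If the files do carry headers, `pandas.read_csv` would be the better tool. `load_matrix` is a hypothetical helper, and the paths are relative to the repository root.

```python
from pathlib import Path

import numpy as np


def load_matrix(path, expected_rows=784):
    """Load a headerless numeric CSV and check the feature dimension."""
    M = np.loadtxt(path, delimiter=",", ndmin=2)
    if M.shape[0] != expected_rows:
        raise ValueError(f"{path}: expected {expected_rows} rows, got {M.shape[0]}")
    if (M < 0).any():
        raise ValueError(f"{path}: NMF needs nonnegative entries")
    return M


data_dir = Path("demo/demo_data")
if data_dir.is_dir():  # only runs inside a checkout of the repository
    X = load_matrix(data_dir / "target_images.csv")
    Y = load_matrix(data_dir / "background_images.csv")
    print("target:", X.shape, "background:", Y.shape)
```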

Instructions to run

cd demo
jupyter notebook simulation.ipynb

Run all cells in order. The notebook will:

Display example target images (digits on backgrounds) and background-only images
Fit standard NMF with K=2
Fit bcNMF with K=2 and α=1
Produce scatter plots of NMF vs. bcNMF factor scores coloured by digit label
Report ARI (Adjusted Rand Index) for each method

Expected output

Two scatter plots saved to demo/results/nmf_vs_bcnmf.png
Factor matrices saved to demo/results/H_nmf.csv, demo/results/H_bcnmf.csv
ARI summary saved to demo/results/ari_summary.csv
bcNMF achieves substantially higher ARI than standard NMF, showing that the contrastive objective suppresses the background texture signal and recovers the digit structure.
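
ARI compares a clustering with ground-truth labels and corrects for chance: 1 means perfect agreement and values near 0 mean chance-level agreement. The sketch below computes it with NumPy only so the arithmetic is visible; in practice `sklearn.metrics.adjusted_rand_score` (scikit-learn is already a dependency) computes the same quantity. How the notebook turns factor scores into cluster labels is not stated here, so the argmax over the two factors in the usage lines is an assumption, and the toy `H` matrix is made up for illustration.

```python
import numpy as np


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    labels_true = np.asarray(labels_true).ravel()
    labels_pred = np.asarray(labels_pred).ravel()
    if labels_true.size < 2:
        return 1.0
    _, ti = np.unique(labels_true, return_inverse=True)
    _, pi = np.unique(labels_pred, return_inverse=True)
    # Contingency table: rows are true classes, columns are predicted clusters.
    cont = np.zeros((ti.max() + 1, pi.max() + 1), dtype=np.int64)
    np.add.at(cont, (ti, pi), 1)

    def pairs(x):
        return x * (x - 1) / 2.0  # number of unordered pairs

    sum_cells = pairs(cont).sum()
    sum_rows = pairs(cont.sum(axis=1)).sum()
    sum_cols = pairs(cont.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(labels_true.size)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0
    return float((sum_cells - expected) / (max_index - expected))


# Toy factor scores (K=2 factors x 4 samples); cluster = dominant factor.
H = np.array([[0.9, 0.8, 0.1, 0.2],
              [0.1, 0.3, 0.7, 0.9]])
print(adjusted_rand_index([0, 0, 1, 1], H.argmax(axis=0)))  # 1.0
```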

Expected run time

Less than 2 minutes on a standard desktop CPU (no GPU required).

Instructions for Use

Running bcNMF on your own data

import numpy as np
from bcNMF import contrastive_nmf_poisson

# X: (n_features, n_target_samples)     — target data (nonnegative counts)
# Y: (n_features, n_background_samples) — background data (nonnegative counts)
X = np.random.poisson(5, size=(500, 200)).astype(float)
Y = np.random.poisson(5, size=(500, 100)).astype(float)

W, H_X, H_Y, perf = contrastive_nmf_poisson(X, Y, K=10, alpha=1.0, niter=200)
# W   : (n_features, K)  shared nonnegative basis
# H_X : (K, n_target_samples)  target coefficients  ← use for downstream analysis
# H_Y : (K, n_background_samples)  background coefficients

For continuous/image data use contrastive_nmf_sse (squared-error loss). Mini-batch training is available via contrastive_nmf_poisson_minibatch for large datasets.
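
The two loss families named above differ in how they score a reconstruction `W @ H` against the data. The sketch below gives the textbook definitions only; the package's exact objective (normalization, constants, and how the background term enters) is not spelled out in this README, and the names `sse_loss` and `poisson_nll` are hypothetical.

```python
import numpy as np


def sse_loss(X, W, H):
    """Squared-error reconstruction loss: sum of (X - WH)^2."""
    R = W @ H
    return float(np.sum((X - R) ** 2))


def poisson_nll(X, W, H, eps=1e-10):
    """Poisson negative log-likelihood of X with rate WH, without the log(X!) constant."""
    R = W @ H + eps  # eps keeps the logarithm finite when a rate is zero
    return float(np.sum(R - X * np.log(R)))


rng = np.random.default_rng(0)
W = rng.random((20, 3))
H = rng.random((3, 15))
X = rng.poisson(W @ H * 10).astype(float)  # synthetic counts
print("SSE:", sse_loss(X, W * 10, H))
print("Poisson NLL:", poisson_nll(X, W * 10, H))
```

Squared error weights all deviations equally, whereas the Poisson loss tolerates larger deviations at higher rates, which is why it is the usual choice for count data.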

Available functions

Function	Loss	Use case
`nmf_sse`	Squared error	Standard NMF baseline
`nmf_poisson`	Poisson	Standard NMF, count data
`nmf_poisson_minibatch`	Poisson	Large-scale standard NMF
`contrastive_nmf_sse`	Squared error	bcNMF for continuous / image data
`contrastive_nmf_poisson`	Poisson	bcNMF for scRNA-seq / count data
`contrastive_nmf_poisson_minibatch`	Poisson	bcNMF, large-scale
`contrastive_nmf_sse_multi`	Squared error	bcNMF for two-modality data

Key parameters

Parameter	Description	Default
`K`	Number of factors	required
`alpha`	Background suppression strength (higher = more contrastive)	`1.0`
`niter`	Number of multiplicative-update iterations	`200`
`tol`	Convergence tolerance	`1e-4`
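
To make `niter` and `tol` concrete, here is a sketch of standard squared-error NMF using the classic multiplicative updates of Lee and Seung. It is not the bcNMF update rule and not the code in bcnmf.py: there is no background term, the stopping test (relative change in loss at most `tol`) is an assumed but common choice, and `nmf_sse_sketch` is a hypothetical name.

```python
import numpy as np


def nmf_sse_sketch(X, K, niter=200, tol=1e-4, eps=1e-10, seed=0):
    """Plain NMF, X ~ W @ H, fitted by multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    W = rng.random((n_features, K)) + eps
    H = rng.random((K, n_samples)) + eps
    losses = [float(np.sum((X - W @ H) ** 2))]
    for _ in range(niter):
        # Multiplying by nonnegative ratios keeps W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(float(np.sum((X - W @ H) ** 2)))
        # Stop early once the loss has stopped moving in relative terms.
        if abs(losses[-2] - losses[-1]) <= tol * max(losses[-2], eps):
            break
    return W, H, losses


X = np.random.default_rng(1).poisson(5, size=(50, 40)).astype(float)
W, H, losses = nmf_sse_sketch(X, K=3)
print(f"stopped after {len(losses) - 1} iterations, loss {losses[0]:.1f} -> {losses[-1]:.1f}")
```

A smaller `tol` or a larger `niter` lets the loop run longer; whichever limit is hit first ends the fit.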

Reproduction Instructions

Each experiment in the paper has a corresponding Jupyter notebook under experiments/. Pre-computed result matrices are provided in result/ so figures can be reproduced without re-running the full fitting.

Section	Notebook	Data
Sec 2.2 — Simulation (MNIST + ImageNet)	`experiments/simulation/mnist_imagenet.ipynb`	`dataset/simulation/n11939491/` (ImageNet grass images)
Sec 2.3 — Mice protein	`experiments/mice_protein/mice_protein.ipynb`	`dataset/mice_protein/Data_Cortex_Nuclear.csv`
Sec 2.4 — Leukemia scRNA-seq	`experiments/leukemia/leukemia.ipynb`	`dataset/leukemia/` (see note below)
Sec 2.5 — Cancer cell lines	`experiments/cancer_cell_lines/mcfarland.ipynb`	`dataset/cancer_cell_lines/` (see note below)
Sec 2.6 — MDD snRNA-seq	`experiments/mdd/mdd_final.ipynb`	`dataset/mdd/` (see note below)

Note on large datasets: The following raw data files exceed GitHub's file size limit and are not included in this repository. Each notebook contains instructions at the top for downloading the data.

dataset/leukemia/adata_X_hvg_3000.h5ad (183 MB) — generated by running the preprocessing cells in leukemia.ipynb on the raw 10x Genomics BMMC data
dataset/cancer_cell_lines/**/matrix.mtx (~105 MB each) — downloaded automatically by running Cell 1 in mcfarland.ipynb
MDD full dataset (mdd_full.h5ad, ~9 GB) — available from the original data source; see mdd_final.ipynb for details
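
Before opening a notebook, it can help to check which of these files are already in place. In the sketch below the first two patterns come from the list above; the location of `mdd_full.h5ad` inside `dataset/mdd/` is an assumption, since only the file name is given. `missing_data` is a hypothetical helper and expects to be run from the repository root.

```python
from pathlib import Path

# Glob patterns relative to the repository root.
EXPECTED = {
    "leukemia": "dataset/leukemia/adata_X_hvg_3000.h5ad",
    "cancer_cell_lines": "dataset/cancer_cell_lines/**/matrix.mtx",
    "mdd": "dataset/mdd/mdd_full.h5ad",  # assumed location
}


def missing_data(root=".", expected=EXPECTED):
    """Return the experiment names whose data files are not found under root."""
    root = Path(root)
    return [name for name, pattern in expected.items()
            if not any(root.glob(pattern))]


for name in missing_data():
    print(f"{name}: data not found, see the download notes at the top of its notebook")
```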

Repository Structure

bcNMF/
├── bcNMF/                  # Python package
│   ├── __init__.py
│   └── bcnmf.py            # Core multiplicative-update algorithms
├── demo/                   # Self-contained demo (runs without downloading data)
│   ├── simulation.ipynb
│   ├── demo_data/
│   └── results/
├── experiments/
│   ├── simulation/         # Sec 2.2 — MNIST + ImageNet
│   ├── mice_protein/       # Sec 2.3 — Down syndrome protein expression
│   ├── leukemia/           # Sec 2.4 — Leukemia scRNA-seq (pre/post transplant)
│   ├── cancer_cell_lines/  # Sec 2.5 — MIX-seq idasanutlin / TP53
│   └── mdd/                # Sec 2.6 — MDD snRNA-seq (postmortem brain)
├── dataset/                # Data files (large files excluded, see .gitignore)
├── result/                 # Pre-computed result matrices (.npy)
├── LICENSE
├── setup.py
├── requirements.txt
└── README.md

Citation

@article{li2025bcnmf,
  title  = {bcNMF: Background Contrastive Nonnegative Matrix Factorization
            Identifies Target-Specific Features in High-Dimensional Data},
  author = {Li, Yixuan and Yang, Archer Y. and Li, Yue},
  year   = {2026},
  eprint = {2602.22387},
  archivePrefix = {arXiv},
  url    = {https://arxiv.org/abs/2602.22387}
}

License

This project is licensed under the MIT License — see LICENSE for details.
