corrselect

every uncorrelated subset of predictors

Exact enumeration of maximal predictor sets under a correlation threshold.

Feed it your predictors. corrselect returns the maximal sets whose pairwise correlations all stay under your threshold, found exactly by graph enumeration (Bron–Kerbosch / Eppstein–Löffler–Strash) in C++. A greedy filter hands you one set and hides the rest; corrselect keeps every variable it can and shows you all the valid choices.

```r
library(corrselect)

# prune to one low-correlation set, in a single call
corrPrune(mtcars, threshold = 0.7)

# or enumerate *every* maximal low-correlation set
corrSelect(mtcars, threshold = 0.7)
```
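
For intuition about what "maximal sets" means, here is a minimal base-R sketch of the underlying graph problem. It is an illustration only, not corrselect's C++ implementation: `maximal_sets()` and the toy matrix are hypothetical, the recursion is the basic Bron–Kerbosch scheme without pivoting, and the sketch assumes the constraint applies to absolute correlation.

```r
# Base-R sketch: enumerate maximal sets of variables whose pairwise
# absolute correlations are all below `threshold`.
# `maximal_sets` is a hypothetical helper, not part of corrselect.
maximal_sets <- function(cor_mat, threshold) {
  p <- ncol(cor_mat)
  # compatible[i, j] is TRUE when variables i and j may share a set
  compatible <- abs(cor_mat) < threshold
  diag(compatible) <- FALSE
  found <- list()

  # Basic Bron-Kerbosch recursion (no pivoting):
  #   r    = current set
  #   cand = variables that can still extend r
  #   excl = variables already explored, used to reject non-maximal sets
  expand <- function(r, cand, excl) {
    if (length(cand) == 0L && length(excl) == 0L) {
      found[[length(found) + 1L]] <<- sort(r)
      return(invisible(NULL))
    }
    for (v in cand) {
      nb <- unname(which(compatible[v, ]))
      expand(c(r, v), intersect(cand, nb), intersect(excl, nb))
      cand <- setdiff(cand, v)
      excl <- c(excl, v)
    }
    invisible(NULL)
  }

  expand(integer(0), seq_len(p), integer(0))
  lapply(found, function(idx) colnames(cor_mat)[idx])
}

# Hand-written toy matrix (assumed values, for illustration only)
vars <- c("a", "b", "c", "d")
toy <- matrix(c(
  1.0, 0.9, 0.1, 0.6,
  0.9, 1.0, 0.3, 0.8,
  0.1, 0.3, 1.0, 0.1,
  0.6, 0.8, 0.1, 1.0
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

sets <- maximal_sets(toy, threshold = 0.7)
# sets holds two maximal sets: {a, c, d} and {b, c}
```

Neither set can be extended without breaking the threshold, and neither contains the other, which is why a single-answer filter necessarily hides one of them.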

Exact, not greedy

The usual tool, caret::findCorrelation(), removes variables greedily: its result can depend on how the variables are ordered, and it typically drops more than it needs to. corrselect solves the same threshold constraint by maximal-clique enumeration, so it retains at least as many variables and returns the same answer every run.

```r
m <- cor(mtcars)

# greedy, ordering-dependent; returns the indices of the columns to remove
caret::findCorrelation(m, cutoff = 0.7)

# exact, deterministic
corrPrune(mtcars, threshold = 0.7, mode = "exact")
```
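
To see why greedy filtering can be order-dependent, consider a deliberately naive first-come filter in base R. This is a sketch under stated assumptions: `greedy_keep()` and the toy matrix are hypothetical, and the filter is not the algorithm used by caret::findCorrelation(); it only illustrates the general failure mode of making one irreversible choice at a time.

```r
# Naive greedy filter: walk the columns in order and keep a variable only if
# it is below the threshold against everything kept so far.
# `greedy_keep` is a hypothetical helper, not caret's or corrselect's code.
greedy_keep <- function(cor_mat, threshold) {
  kept <- character(0)
  for (v in colnames(cor_mat)) {
    if (all(abs(cor_mat[v, kept]) < threshold)) {
      kept <- c(kept, v)
    }
  }
  kept
}

# Hand-written toy matrix (assumed values, for illustration only)
vars <- c("a", "b", "c", "d")
toy <- matrix(c(
  1.0, 0.9, 0.1, 0.6,
  0.9, 1.0, 0.3, 0.8,
  0.1, 0.3, 1.0, 0.1,
  0.6, 0.8, 0.1, 1.0
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

first_order  <- greedy_keep(toy, threshold = 0.7)            # keeps a, c, d
reordered    <- c("b", "a", "c", "d")
second_order <- greedy_keep(toy[reordered, reordered], 0.7)  # keeps b, c
```

The same data yields three retained variables in one column order and two in the other. Enumerating all maximal sets and picking among them removes that dependence on ordering.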

What's in the box

corrPrune(): association-based pruning, model-free. Exact mode for p <= 100, greedy mode for larger p, protect variables with force_in.
modelPrune(): VIF-based pruning for lm, glm, lme4, glmmTMB, or any custom engine (INLA, mgcv, brms, ...).
corrSelect() / MatSelect(): exhaustive enumeration of all maximal sets, on a data frame or directly on a correlation matrix.
assocSelect(): mixed-type data (numeric, factor, ordered), with the right association metric chosen per pair.

Multiple association metrics are supported: "pearson", "spearman", "kendall", "bicor" (WGCNA), "distance" (energy), "maximal" (minerva), and "eta" / "cramersv" for mixed-type data.

`corrPrune` or `modelPrune`?

| | `corrPrune()` | `modelPrune()` |
| --- | --- | --- |
| Needs a model? | No | Yes |
| Based on | Pairwise correlation / association | Model diagnostics (VIF) |
| Works without a response? | Yes | No |
| Mixed models? | No | Yes (`lme4`, `glmmTMB`) |
| Best for | Exploratory analysis, large `p` | Regression workflows, VIF reduction |

Use corrPrune() first to cut dimensionality, then modelPrune() for final cleanup inside a modeling framework.

Model-based pruning with any engine

modelPrune() works with lm, glm, lme4, glmmTMB, or any package you wire in:

```r
# linear model, VIF threshold
modelPrune(mpg ~ cyl + disp + hp + wt, data = mtcars, limit = 5)

# any modeling package, via a custom engine (here: INLA)
inla_engine <- list(
  name = "inla",
  fit  = function(formula, data, ...) {
    INLA::inla(formula, data = data, family = "gaussian", ...)
  },
  diagnostics = function(model, fixed_effects) {
    scores <- model$summary.fixed[, "sd"]          # posterior SD as the badness score
    setNames(scores, rownames(model$summary.fixed))[fixed_effects]
  }
)

# df is a placeholder for your own data frame with columns y, x1, x2
modelPrune(y ~ x1 + x2, data = df, engine = inla_engine, limit = 0.5)
```
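
For readers unfamiliar with the diagnostic, the variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The base-R sketch below computes it by hand and then drops the worst predictor until every VIF is at or below a limit. Both `vif_by_hand()` and `prune_by_vif()` are hypothetical helpers, and drop-the-largest is only one common strategy; it is not claimed to be what modelPrune() does internally.

```r
# VIF for each predictor: regress it on the remaining predictors.
# `vif_by_hand` is a hypothetical helper, not part of corrselect.
vif_by_hand <- function(data, predictors) {
  vapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Assumed strategy: repeatedly drop the predictor with the largest VIF
# until all remaining VIFs are <= limit (or one predictor is left).
prune_by_vif <- function(data, predictors, limit) {
  while (length(predictors) > 1L) {
    vif <- vif_by_hand(data, predictors)
    if (max(vif) <= limit) break
    predictors <- setdiff(predictors, names(which.max(vif)))
  }
  predictors
}

preds <- c("cyl", "disp", "hp", "wt")
vif   <- vif_by_hand(mtcars, preds)

# Cross-check: VIFs equal the diagonal of the inverse correlation matrix
vif_check <- diag(solve(cor(mtcars[, preds])))

kept <- prune_by_vif(mtcars, preds, limit = 5)
```

Because VIF depends on the whole predictor set rather than on pairs, it needs a model formula, which is the practical difference from correlation-based pruning.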

Mixed-type data

assocSelect() chooses the right association metric per pair (Pearson, Spearman, eta-squared, Cramér's V):

```r
df <- data.frame(
  height = rnorm(30, 170, 10),
  weight = rnorm(30, 70, 12),
  group  = factor(sample(c("A", "B"), 30, TRUE)),
  rating = ordered(sample(c("low", "med", "high"), 30, TRUE))
)

assocSelect(df, threshold = 0.6)
```
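
As background on the two mixed-type metrics named above, here is a base-R sketch using their textbook definitions: eta-squared for a numeric variable against a factor, and Cramér's V for two factors. `eta_squared()` and `cramers_v()` are hypothetical helpers written for this illustration; the package's own computation may differ in details such as bias correction.

```r
# Eta-squared: share of the numeric variable's variance explained by the groups
# (between-group sum of squares divided by total sum of squares).
eta_squared <- function(x, g) {
  grand       <- mean(x)
  group_means <- ave(x, g)   # each observation's own group mean
  sum((group_means - grand)^2) / sum((x - grand)^2)
}

# Cramer's V: chi-squared statistic rescaled to the range [0, 1].
cramers_v <- function(a, b) {
  tab  <- table(a, b)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Small deterministic example (assumed values, for illustration only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
g <- factor(rep(c("A", "B"), each = 4))
h <- factor(rep(c("u", "v"), times = 4))

eta2      <- eta_squared(x, g)   # 32 / 42
v_same    <- cramers_v(g, g)     # identical factors give 1
v_crossed <- cramers_v(g, h)     # balanced, unrelated factors give 0
```

Both metrics are bounded between 0 and 1, which is what allows a single threshold to be applied across pairs of different types.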

Installation

```r
install.packages("corrselect")            # CRAN

install.packages("pak")                   # development version
pak::pak("gcol33/corrselect")
```

Documentation

Support

"Software is like sex: it's better when it's free." — Linus Torvalds

I'm a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.

If this package saved you some time, buying me a coffee is a nice way to say thanks. It helps with my coffee addiction.

License

MIT (see the LICENSE.md file)

Citation

```bibtex
@article{corrselect,
  author  = {Colling, Gilles},
  title   = {corrselect: Fast and flexible predictor pruning for data analysis and modeling},
  journal = {Journal of Open Source Software},
  year    = {2025},
  doi     = {10.21105/joss.09539},
  url     = {https://joss.theoj.org/papers/10.21105/joss.09539}
}
```
