msaviz is a small R package for visualizing multiple sequence
alignments (MSAs). It reshapes an alignment into a tidy tibble and
then renders it as either a ggplot2 heatmap or a per-sequence lollipop
chart. A rasterGeom() wrapper keeps file sizes and render times in
check, even on alignments with millions of cells (which is usually where
plain vector output starts to struggle).
The code here started life producing some of the figures in Hodgins et al. (2023), Nature Communications 14:5475, and was later tidied up into this package.
# install.packages("remotes")
remotes::install_github("bjmt/msaviz")msaviz doesn’t depend on any particular alignment reader, so you can
use either seqinr or
Biostrings to load your
sequences (whichever you already happen to have on hand).
library(msaviz)
aln_path <- system.file(
"extdata", "hug_ribosomal.fa.gz", package = "msaviz"
)
aln <- seqinr::read.alignment(aln_path, format = "fasta")
alnDF <- msa2DF(aln)
msaHeatmap(alnDF)By default msaHeatmap() colours every cell by whether it matches the
reference sequence (a computed consensus, if you don’t pass
reference =). For per-letter colouring instead, switch to
column = "Letter".
msaLollipop() is an alternative view of the same alnDF. Each
sequence becomes one row, every non-reference cell is drawn as a
vertical stem with a circular head, and a per-position baseline shows
gaps as breaks. This is the form used for Supplementary Figure 20 of
Hodgins et al. (2023).
alnDF_ll <- msa2DF(aln, drop.gaps = FALSE, keep.consensus = TRUE)
msaLollipop(alnDF_ll, labels = FALSE)On smaller alignments (or short windows of larger ones) the labels are readable, and the heads can be colour-mapped to any column:
toy <- c(
seq1 = "ACGTACGTACGTACGTACGT",
seq2 = "ACGTACGAACGTACGTACGT",
seq3 = "AAGTACGTACGTGCGTACGT",
seq4 = "ACGTACGTACGTACGTCCGA"
)
toyDF <- msa2DF(toy, reference = "seq1", drop.gaps = FALSE,
keep.consensus = TRUE)
msaLollipop(toyDF,
head.fill.by = "Letter",
head.fill.colours = msa_palette_DNA)saveHeatmap() is a ggsave() wrapper that picks a reasonable size
from the number of sequences and the axis text size (you can always pass
height = / width = to override it):
saveHeatmap(msaHeatmap(alnDF), "aln.pdf")Two standalone command-line scripts ship with the package.
inst/scripts/msaHeatmap.R renders an alignment heatmap, and
inst/scripts/msaLollipop.R renders the per-sequence lollipop chart
(again, the form used for Supplementary Figure 20 of Hodgins et
al. 2023).
system.file("scripts", "msaHeatmap.R", package = "msaviz")
system.file("scripts", "msaLollipop.R", package = "msaviz")Use them as:
Rscript msaHeatmap.R --input aln.fa --output aln.pdf --column Aln
Rscript msaLollipop.R --input aln.fa --output lollipop.pdf --reference E88See --help on each for the full set of options.
composeMSA() lines the heatmap up (via patchwork) with optional
companion plots: a percent-identity track, a hierarchical-clustering
dendrogram, and a left-side group annotation strip.
d <- msaDendro(aln)
groups <- setNames(rep(c("Bacteria", "Archaea", "Eukarya"),
length.out = length(aln$nam)),
aln$nam)
group_cols <- c(Bacteria = "#1f77b4", Archaea = "#ff7f0e",
Eukarya = "#2ca02c")
composeMSA(
heatmap = msaHeatmap(alnDF, row.order = attr(d, "order"),
legend.pos = "none"),
top = msaPID(alnDF),
left = d,
groups = groups[attr(d, "order")],
group.colours = group_cols,
heights = c(1, 6), # PID short, heatmap tall
widths = c(0.5, 0.04, 4) # narrow dendrogram, thin group strip, wide heatmap
)The package vignette walks through Aln vs Letter colouring, custom
palettes, SNP emphasis, the raster = TRUE/FALSE trade-off,
posStats(), the lollipop view, and the rest of the companion plotters:
vignette("msaviz")


