Skip to content

bjmt/msaviz

Repository files navigation

msaviz

R-CMD-check

msaviz is a small R package for visualizing multiple sequence alignments (MSAs). It reshapes an alignment into a tidy tibble and then renders it as either a ggplot2 heatmap or a per-sequence lollipop chart. A rasterGeom() wrapper keeps file sizes and render times in check, even on alignments with millions of cells (which is usually where plain vector output starts to struggle).

The code here started life producing some of the figures in Hodgins et al. (2023), Nature Communications 14:5475, and was later tidied up into this package.

Installation

# install.packages("remotes")
remotes::install_github("bjmt/msaviz")

msaviz doesn’t depend on any particular alignment reader, so you can use either seqinr or Biostrings to load your sequences (whichever you already happen to have on hand).

Quick example

library(msaviz)

aln_path <- system.file(
  "extdata", "hug_ribosomal.fa.gz", package = "msaviz"
)
aln <- seqinr::read.alignment(aln_path, format = "fasta")

alnDF <- msa2DF(aln)
msaHeatmap(alnDF)

By default msaHeatmap() colours every cell by whether it matches the reference sequence (a computed consensus, if you don’t pass reference =). For per-letter colouring instead, switch to column = "Letter".

Per-sequence lollipops

msaLollipop() is an alternative view of the same alnDF. Each sequence becomes one row, every non-reference cell is drawn as a vertical stem with a circular head, and a per-position baseline shows gaps as breaks. This is the form used for Supplementary Figure 20 of Hodgins et al. (2023).

alnDF_ll <- msa2DF(aln, drop.gaps = FALSE, keep.consensus = TRUE)
msaLollipop(alnDF_ll, labels = FALSE)

On smaller alignments (or short windows of larger ones) the labels are readable, and the heads can be colour-mapped to any column:

toy <- c(
  seq1 = "ACGTACGTACGTACGTACGT",
  seq2 = "ACGTACGAACGTACGTACGT",
  seq3 = "AAGTACGTACGTGCGTACGT",
  seq4 = "ACGTACGTACGTACGTCCGA"
)
toyDF <- msa2DF(toy, reference = "seq1", drop.gaps = FALSE,
  keep.consensus = TRUE)
msaLollipop(toyDF,
  head.fill.by = "Letter",
  head.fill.colours = msa_palette_DNA)

Saving plots

saveHeatmap() is a ggsave() wrapper that picks a reasonable size from the number of sequences and the axis text size (you can always pass height = / width = to override it):

saveHeatmap(msaHeatmap(alnDF), "aln.pdf")

Shell scripts

Two standalone command-line scripts ship with the package. inst/scripts/msaHeatmap.R renders an alignment heatmap, and inst/scripts/msaLollipop.R renders the per-sequence lollipop chart (again, the form used for Supplementary Figure 20 of Hodgins et al. 2023).

system.file("scripts", "msaHeatmap.R",  package = "msaviz")
system.file("scripts", "msaLollipop.R", package = "msaviz")

Use them as:

Rscript msaHeatmap.R  --input aln.fa --output aln.pdf      --column Aln
Rscript msaLollipop.R --input aln.fa --output lollipop.pdf --reference E88

See --help on each for the full set of options.

Composing with PID, groups, and a dendrogram

composeMSA() lines the heatmap up (via patchwork) with optional companion plots: a percent-identity track, a hierarchical-clustering dendrogram, and a left-side group annotation strip.

d <- msaDendro(aln)
groups <- setNames(rep(c("Bacteria", "Archaea", "Eukarya"),
                       length.out = length(aln$nam)),
                   aln$nam)
group_cols <- c(Bacteria = "#1f77b4", Archaea = "#ff7f0e",
                Eukarya  = "#2ca02c")

composeMSA(
  heatmap = msaHeatmap(alnDF, row.order = attr(d, "order"),
                       legend.pos = "none"),
  top  = msaPID(alnDF),
  left = d,
  groups = groups[attr(d, "order")],
  group.colours = group_cols,
  heights = c(1, 6),       # PID short, heatmap tall
  widths  = c(0.5, 0.04, 4) # narrow dendrogram, thin group strip, wide heatmap
)

More

The package vignette walks through Aln vs Letter colouring, custom palettes, SNP emphasis, the raster = TRUE/FALSE trade-off, posStats(), the lollipop view, and the rest of the companion plotters:

vignette("msaviz")

About

MSA Visualizer in R

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages