IMP: add alignment visualizer #90

Open
fethalen wants to merge 2 commits into qiime2:dev from fethalen:alignment-visualizer

@fethalen
Contributor

This PR addresses issue #15 by introducing a visualizer for multiple sequence alignments (MSAs).

Features

  • One table describing basic statistics of the MSA
    • Number of sequences
    • Total alignment length
    • Overall gaps
    • Mean GC-content
  • One table containing per-sequence statistics
    • Sequence ID
    • Ungapped sequence length
    • Number of gaps
    • GC-content
  • Additionally, BLAST links and FASTA downloads are generated for each sequence
  • MSA figure is downloadable in 4 formats
    • PNG
    • JPEG
    • SVG
    • PDF
  • By default, the resolution is set to 300 DPI and is adjustable by the user
  • Sequences are wrapped at position 80 by default, but this setting can also be changed
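
To make the two tables concrete, here is a minimal sketch of how the listed statistics could be computed. It is not the code from this PR: the function names, the dictionary input format, and the choice to treat `-` and `.` as gap characters and to compute GC-content over ungapped residues are all assumptions made for illustration.

```python
from statistics import mean

# Assumption: '-' and '.' both denote gaps in the alignment.
GAP_CHARS = frozenset("-.")


def sequence_stats(seq_id, aligned_seq):
    """Per-sequence statistics for one aligned (gapped) sequence."""
    residues = [c for c in aligned_seq.upper() if c not in GAP_CHARS]
    gc_count = sum(c in "GC" for c in residues)
    return {
        "id": seq_id,
        "ungapped_length": len(residues),
        "gaps": len(aligned_seq) - len(residues),
        # GC-content is taken over ungapped residues (an assumption).
        "gc_content": gc_count / len(residues) if residues else 0.0,
    }


def alignment_stats(alignment):
    """Overall statistics for a dict mapping sequence ID -> aligned sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    per_seq = [sequence_stats(i, s) for i, s in alignment.items()]
    overall = {
        "n_sequences": len(per_seq),
        "alignment_length": lengths.pop(),
        "total_gaps": sum(row["gaps"] for row in per_seq),
        "mean_gc_content": mean(row["gc_content"] for row in per_seq),
    }
    return overall, per_seq
```

Calling `alignment_stats({"s1": "ACG-T", "s2": "GG--C"})` returns the overall summary together with one row per sequence, which maps directly onto the two tables described above.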

Dependencies

In addition to the dependencies already found in q2-alignment, the MSA visualizer requires pyMSAviz for plotting the alignments. I tested the MSA visualizer using pyMSAviz v0.5.0.

Technical Details

The tables are displayed using DataTables. I used a local installation and did not include jQuery as a dependency.

Screenshots

[Three screenshots of the MSA visualizer, dated 2025-08-26]

@gregcaporaso
Member

gregcaporaso commented Sep 4, 2025

@fethalen - very cool!

Would you mind running this on some real data so we can see how this scales to the size of data we're typically working with? The gut-to-soil tutorial data is probably a good start. You'd need to align the FeatureData[Sequence] artifact (called asv-seqs.qza in the tutorial), as we don't currently align in that tutorial, and then run that through the new visualizer. That data is subsampled so it won't be massive, and is therefore probably a good intermediate test. If that works well, it might then be a good idea to try it on a much larger dataset, just so we get a feel for how this is going to work for users in practice.

gregcaporaso moved this from Needs Triage to Awaiting Info in QIIME 2 - Triage 🚑 on Sep 4, 2025
@gregcaporaso
Member

@fethalen, just wanted to ping you on this. We'd love to get this in to the 2025.10 release but will need it merged by end of your day on 17 October to make that work. We just want to make sure it works well on the types of alignments that our users will be working with in practice before then.

cc @misialq

@fethalen
Contributor Author

fethalen commented Oct 2, 2025

> @fethalen, just wanted to ping you on this. We'd love to get this in to the 2025.10 release but will need it merged by end of your day on 17 October to make that work. We just want to make sure it works well on the types of alignments that our users will be working with in practice before then.
>
> cc @misialq

Thanks for the reminder, @gregcaporaso. I did test the visualizer on the data that you suggested but, as one could have suspected, pyMSAviz won't render alignments that large (~16,000 sequences). I therefore need to replace this package with something that can show a scrollable view of alignments and that scales well with large datasets. Depending on how easy it will be to integrate this new package, I might be able to get it done before the release.

@gregcaporaso
Copy link
Member

gregcaporaso commented Oct 9, 2025

Thanks @fethalen!

If it looks like it won't be possible in time for the release, another option would be to do some assessment based on that data of how large of an alignment could be used. You could integrate that information into the documentation for the command, and when the user provides an alignment it could be validated to confirm that it has at most the maximum number of sequences, and raise an error if there are too many.
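
A minimal sketch of the size check suggested here, in plain Python. The limit of 1000 and the function name are placeholders, not values or names from this PR; the real limit would come from the assessment described above.

```python
# Hypothetical placeholder: the real limit would be determined empirically.
MAX_SEQUENCES = 1000


def validate_alignment_size(sequence_ids, max_sequences=MAX_SEQUENCES):
    """Raise a ValueError if the alignment has more sequences than allowed.

    Returns the number of sequences when the alignment is small enough.
    """
    n_sequences = len(sequence_ids)
    if n_sequences > max_sequences:
        raise ValueError(
            f"The alignment contains {n_sequences} sequences, but at most "
            f"{max_sequences} can be visualized. Filter or subsample the "
            "alignment and try again."
        )
    return n_sequences
```

Running the check before any rendering work means an oversized alignment fails quickly with an actionable message rather than partway through plotting.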

Update: You should be able to use qiime feature-table filter-seqs for this, though it probably makes the most sense to do this using the Python API (e.g., get the list of all IDs, and then select a random sample of those to provide using the metadata parameter).
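
The ID-sampling step could look roughly like the sketch below. It uses only the standard library and stops short of calling QIIME 2: the helper names are hypothetical, and the assumption that a single `id` column is an acceptable metadata layout should be checked against the QIIME 2 metadata documentation before relying on it.

```python
import random


def sample_ids(all_ids, n, seed=None):
    """Return a reproducible random sample of at most n sequence IDs."""
    ids = sorted(all_ids)  # sort first so a given seed always gives the same sample
    rng = random.Random(seed)
    return rng.sample(ids, min(n, len(ids)))


def ids_to_metadata_tsv(ids):
    """Render IDs as TSV text with a single 'id' header column (assumed layout)."""
    return "\n".join(["id", *ids]) + "\n"
```

The resulting text could be written to a file and supplied as the metadata for filtering; repeating this for several sample sizes would show where rendering stops being practical.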

@lizgehret
Member

Hey @fethalen! Just checking in on this one.

@fethalen
Contributor Author

Hi @lizgehret, thanks for checking in. I was working on another task which took priority but I will get back to this one soon. I will try to see if MSAViewer would be better suited for this task, as it doesn't create a single figure but is an actual alignment viewer in your browser.

@lizgehret
Member

> Hi @lizgehret, thanks for checking in. I was working on another task which took priority but I will get back to this one soon. I will try to see if MSAViewer would be better suited for this task, as it doesn't create a single figure but is an actual alignment viewer in your browser.

No worries, thanks for the update! Just wanted to make sure you weren't waiting on a review from anyone on our team. Just let us know when you want some eyes on this!

cherman2 moved this to Needs Review in 2026.1 ❄️ on Jan 6, 2026
cherman2 moved this from Needs Review to In Development in 2026.1 ❄️ on Jan 6, 2026
@cherman2
Contributor

cherman2 commented Jan 6, 2026

I misunderstood what alignment PR we were talking about in our meeting. I put this back in "in development". Sorry for the confusion.

github-project-automation bot moved this to Backlog in 2026.4 🌱 on Jan 30, 2026
lizgehret moved this from Backlog to In Development in 2026.4 🌱 on Jan 30, 2026
@lizgehret
Member

Hey @fethalen, sorry for spamming you with GitHub notifications. I'm just moving things around project boards with the new dev cycle. Just let us know when you are ready for a review on this!

Projects

Status: In Development

5 participants