3 changes: 2 additions & 1 deletion .gitignore
@@ -4,7 +4,8 @@
/build/
/data_dev*/
/data_local*
-/data/
+/data/*
+!data/recomb/
/docs/build/
/e2e/cli/snapshots/
/e2e/cli/tmp/
18 changes: 18 additions & 0 deletions data/recomb/enpen/enterovirus/ev-d68/CHANGELOG.md
@@ -0,0 +1,18 @@
## 2025-12-10T13:21:04Z

- Update alignment parameters in pathogen.json:
  - Fix gap extension penalty
  - Enable reverse-complement handling
- Recompute tree topology (ML tree rerun)
- Regenerate mutation labels for all clades
- Update reference example sequences

## 2025-11-20T19:02:04Z

Add citation information to README.md

## 2025-11-19T20:40:14Z

Initial release of an Enterovirus D68 dataset for lineage classification!

Read more about Nextclade datasets in the documentation: https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html
62 changes: 62 additions & 0 deletions data/recomb/enpen/enterovirus/ev-d68/README.md
@@ -0,0 +1,62 @@
# Enterovirus D68 dataset with reference Fermon

| Key | Value |
|----------------------|-----------------------------------------------------------------------|
| authors | [Nadia Neuner-Jehle](https://eve-lab.org/people/nadia-neuner-jehle/), [Alejandra González-Sánchez](https://www.vallhebron.com/en/professionals/alejandra-gonzalez-sanchez), [Emma B. Hodcroft](https://eve-lab.org/people/emma-hodcroft/), [ENPEN](https://escv.eu/european-non-polio-enterovirus-network-enpen/) |
| name | Enterovirus D68 |
| reference | [AY426531.1](https://www.ncbi.nlm.nih.gov/nuccore/AY426531.1) |
| workflow | https://github.com/enterovirus-phylo/nextclade_d68 |
| path | `enpen/enterovirus/ev-d68` |
| clade definitions | A–C (D) |

## Citation

If you use this dataset in your research, please cite:

> Neuner-Jehle, N., González Sánchez, A., Hodcroft, E. B., & European Non-Polio Enterovirus Network (ENPEN). (2025). *enterovirus-phylo/nextclade_d68: Enterovirus D68 Nextclade Dataset v1.0.0* (v1.0.0--2025-11-18). Zenodo. https://doi.org/10.5281/zenodo.17642338

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17642338.svg)](https://doi.org/10.5281/zenodo.17642338)

## Scope of this dataset

Based on full-genome sequences, this dataset uses the **Fermon reference sequence** ([AY426531.1](https://www.ncbi.nlm.nih.gov/nuccore/AY426531.1)), originally isolated in 1962. It serves as the basis for quality control, clade assignment, and mutation calling across global EV-D68 diversity.

*Note: The Fermon reference differs substantially from currently circulating strains.* This is common for enterovirus datasets, in contrast to some other virus datasets (e.g., seasonal influenza), where the reference is updated more frequently to reflect recent lineages.

To address this, the dataset is *rooted* on a Static Inferred Ancestor — a phylogenetically reconstructed ancestral sequence near the tree root. This provides a stable reference point that can be used, optionally, as an alternative for mutation calling.

## Features

This dataset supports:

- Assignment of subgenotypes
- Phylogenetic placement
- Sequence quality control (QC)

## Subgenogroups of Enterovirus D68

Clade designations follow the global diversity of EV-D68: A (A1–A2/D), B (B1–B3), and C. The label "pre-ABC" indicates old, basal lineages that are no longer circulating. Sequences labeled "pre-ABC" or "unassigned" may indicate sequencing or assembly issues and should be assessed carefully.

These designations are based on the phylogenetic structure and mutations, and are widely used in molecular epidemiology, similar to subgenotype systems for other enteroviruses. Unlike influenza (H1N1, H3N2) or SARS-CoV-2, there is no universal, standardized global lineage nomenclature for enteroviruses. Naming follows conventions from published studies and surveillance practices.

## Reference types

This dataset includes several reference points used in analyses:
- *Reference:* RefSeq or a similarly established reference sequence. Here, this is Fermon.

- *Parent:* The nearest ancestral node of a sample in the tree, used to infer branch-specific mutations.

- *Clade founder:* The inferred ancestral node defining a clade (e.g., A2, B3). Mutations "since clade founder" describe changes that define that clade.

- *Static Inferred Ancestor:* Reconstructed ancestral sequence inferred with an outgroup, representing the likely founder of EV-D68. Serves as a stable reference.

- *Tree root:* Corresponds to the root of the tree. It may change in future updates as more data become available.

All references use the coordinate system of the Fermon sequence.
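
As a rough illustration of why the choice of reference point matters, the sketch below calls substitutions for one aligned sample against two different comparison sequences that share a coordinate system. The toy sequences and the `call_substitutions` helper are hypothetical and are not part of Nextclade or of this dataset; real mutation calling also handles insertions, deletions, and ambiguous bases.

```python
def call_substitutions(base_seq: str, sample_seq: str) -> list[str]:
    """Return substitutions such as 'A5T' (1-based) between two aligned sequences."""
    if len(base_seq) != len(sample_seq):
        raise ValueError("sequences must share the same coordinate system")
    subs = []
    for pos, (base_nt, sample_nt) in enumerate(zip(base_seq, sample_seq), start=1):
        # Report only unambiguous nucleotide changes; skip gaps and N.
        if base_nt != sample_nt and base_nt in "ACGT" and sample_nt in "ACGT":
            subs.append(f"{base_nt}{pos}{sample_nt}")
    return subs


# Hypothetical toy sequences, all aligned to the same coordinates.
reference = "ACGTACGTAC"  # stands in for the Fermon reference
ancestor = "ACGTTCGTAA"   # stands in for the Static Inferred Ancestor
sample = "ACGTTCGGAA"     # stands in for a recent query sequence

relative_to_reference = call_substitutions(reference, sample)
relative_to_ancestor = call_substitutions(ancestor, sample)

print(relative_to_reference)
print(relative_to_ancestor)
```

In this toy case, the sample carries more differences relative to the old reference than relative to the inferred ancestor, while the position numbering stays the same in both comparisons.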

## Issues & Contact
- For questions or suggestions, please [open an issue](https://github.com/enterovirus-phylo/nextclade_d68/issues) or email: eve-group[at]swisstph.ch

## What is a Nextclade dataset?

A Nextclade dataset includes the reference sequence, genome annotations, tree, clade definitions, and QC rules. Learn more in the [Nextclade documentation](https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html).
18 changes: 18 additions & 0 deletions data/recomb/enpen/enterovirus/ev-d68/genome_annotation.gff3
@@ -0,0 +1,18 @@
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
##sequence-region AY426531.1 1 7367
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=42789
# seqname source feature start end score strand frame attribute
AY426531.1 Genbank region 1 7367 . + . ID=AY426531.1:1..7367;Dbxref=taxon:42789;country=USA;gb-acronym=EV-D68;gbkey=Src;mol_type=genomic RNA;note=prototype strain of Enterovirus 68;old-name=Enterovirus 68;strain=Fermon
AY426531.1 Genbank CDS 733 939 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAR98503.1:1..69
AY426531.1 Genbank CDS 940 1683 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAR98503.1:70..317
AY426531.1 Genbank CDS 1684 2388 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAR98503.1:318..552
AY426531.1 Genbank CDS 2389 3315 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAR98503.1:553..861
AY426531.1 Genbank CDS 3316 3756 . + . Name=2A;gbkey=Prot;product=2A;ID=id-AAR98503.1:862..1008
AY426531.1 Genbank CDS 3757 4053 . + . Name=2B;gbkey=Prot;product=2B;ID=id-AAR98503.1:1009..1107
AY426531.1 Genbank CDS 4054 5043 . + . Name=2C;gbkey=Prot;product=2C;ID=id-AAR98503.1:1108..1437
AY426531.1 Genbank CDS 5044 5310 . + . Name=3A;gbkey=Prot;product=3A;ID=id-AAR98503.1:1438..1526
AY426531.1 Genbank CDS 5311 5376 . + . Name=3B;gbkey=Prot;product=3B;ID=id-AAR98503.1:1527..1548
AY426531.1 Genbank CDS 5377 5925 . + . Name=3C;gbkey=Prot;product=3C;ID=id-AAR98503.1:1549..1731
AY426531.1 Genbank CDS 5926 7296 . + . Name=3D;gbkey=Prot;product=3D;ID=id-AAR98503.1:1732..2188
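
A quick sanity check for an annotation like this one is to confirm that every CDS length is a multiple of three and that neighbouring mature peptides abut without gaps or overlaps. The following is a minimal sketch, assuming tab-separated GFF3 columns; the `parse_cds` helper is hypothetical and only the first three CDS rows are reproduced here for brevity.

```python
def parse_cds(gff3_text: str) -> dict[str, tuple[int, int]]:
    """Map each CDS name to its 1-based inclusive (start, end) coordinates."""
    cds = {}
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        cds[attrs["Name"]] = (int(cols[3]), int(cols[4]))
    return cds


# First three CDS rows of the annotation, rebuilt with tab separators.
rows = [("VP4", 733, 939), ("VP2", 940, 1683), ("VP3", 1684, 2388)]
gff3_text = "##gff-version 3\n" + "\n".join(
    f"AY426531.1\tGenbank\tCDS\t{start}\t{end}\t.\t+\t.\tName={name};product={name}"
    for name, start, end in rows
)

cds = parse_cds(gff3_text)
lengths = {name: end - start + 1 for name, (start, end) in cds.items()}

# Every CDS should translate cleanly (length divisible by three).
in_frame = all(length % 3 == 0 for length in lengths.values())

# Adjacent peptides of the polyprotein should abut exactly.
ordered = sorted(cds.values())
contiguous = all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))

print(lengths, in_frame, contiguous)
```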