Initial upload of CVA16 dataset#412

Open
nneune wants to merge 24 commits into master from enpen-cva16

Conversation

@nneune
Collaborator

@nneune nneune commented Mar 3, 2026

This pull request introduces a new Nextclade dataset for Coxsackievirus A16 (CVA16), based on the reference strain "G-10". It provides all the essential files and documentation required for lineage classification, phylogenetic analysis, and quality control of CVA16 sequences. The dataset is tailored for broad subgenogroup assignment and includes detailed metadata, genome annotation, and configuration for Nextclade compatibility.

It needs to be tested further by ENPEN and others.

@nneune nneune had a problem deploying to refs/pull/412/merge March 3, 2026 09:49 — with GitHub Actions Failure
@nneune nneune temporarily deployed to refs/pull/412/merge March 3, 2026 09:58 — with GitHub Actions Inactive
@nneune nneune temporarily deployed to refs/pull/412/merge March 3, 2026 11:25 — with GitHub Actions Inactive
@ivan-aksamentov
Member

@nneune Great job as always! Technically looks good. I assume Emma or Richard will review the science.

My only complaint is a few typos (see below), and that the dataset path might not accommodate future needs: there will permanently be only one dataset under enpen/enterovirus/cva16 path, only for this particular strain and nothing else. But it's up to you guys to decide how to structure.

I've been trying these new AI-assisted dataset reviews. Posting in case it's useful:


⚠️ AI-generated content below. Verify all claims.

Testing

Try in Nextclade Web:

Science

Background on the pathogen, its classification, epidemiology, and the reference strains used in this dataset. Provides context for evaluating dataset design decisions.

Coxsackievirus A16 biology and classification

Coxsackievirus A16 (CVA16) is a member of species Enterovirus A in the family Picornaviridae. It is one of the major causative agents of hand, foot, and mouth disease (HFMD), primarily affecting children under 5 (Sun et al., J Clin Microbiol 2014). The positive-sense single-stranded RNA genome is approximately 7,410 nt, encoding a single polyprotein cleaved into structural proteins (VP4, VP2, VP3, VP1) and non-structural proteins (2A, 2B, 2C, 3A, 3B, 3C, 3D), flanked by 5' and 3' UTRs (Xu et al., Front Microbiol 2025).

The reference genome in this dataset, G-10 (GenBank U05876.1, 7413 nt), is the prototype CVA16 strain isolated in South Africa in 1951. It represents genotype A, the sole member of this clade (Sun et al., J Clin Microbiol 2014). As the README correctly notes, G-10 differs substantially from currently circulating strains, which belong to genotype B sublineages.

VP1 is the standard molecular target for enterovirus typing and subgenogroup classification (Sun et al., J Clin Microbiol 2014), consistent with the dataset's defaultCds: "VP1" setting.

Subgenogroup classification

CVA16 phylogeny based on VP1 defines genotypes A, B, and D (Sun et al., J Clin Microbiol 2014). Genotype B is divided into B1 and B2, with B1 further split into B1a, B1b, and B1c clusters at 6.6-8.0% genetic distance (Zeng et al., Viruses 2025). Recombinant forms (sometimes labeled C-F) are described in Han et al., Virus Evol 2024.

The dataset tree includes clades: A, B1, B1a, B1b, B1c, C, D, E, F, RFs, and unassigned. The README describes clades C-F as "recombinant forms" that cluster with the prototype strain (clade A), also known as B2, B3, and D in alternative nomenclatures. This is consistent with the recombination-driven genotype evolution described in Han et al., Virus Evol 2024.

B1a and B1b co-circulated globally for decades; B1b became dominant in some regions after 2020 (Xu et al., Front Microbiol 2025). B1c, first reported in Southeast Asia and Europe after 2000, has surged in China since 2023-2024 (Zeng et al., Viruses 2025) and was first detected in Thailand in 2023 (Taoma et al., Microbiol Resour Announc 2025). The tree reflects this diversity with B1a (733 nodes), B1b (483), B1c (195), B1 (80), and smaller representation of D (43), C (5), RFs (4), F (3), E (3), and A (1).

ENPEN and enterovirus Nextclade datasets

The European Non-Polio Enterovirus Network (ENPEN), under the European Society for Clinical Virology, coordinates enterovirus surveillance across 20+ European countries (Harvala et al., Microorganisms 2021). A 2025 study in The Lancet Regional Health - Europe analyzed 63,659 samples from 48 countries (2015-2022), with ENPEN contributing 85% of typed non-polio enterovirus data (Harvala et al., Lancet Reg Health Eur 2025).

This CVA16 dataset is the second ENPEN enterovirus dataset for Nextclade, following EV-D68. The build pipeline is available at enterovirus-phylo/nextclade_a16, adapted from the EV-D68 pipeline template. The same team (Neuner-Jehle, Gonzalez-Sanchez, Hodcroft) maintains both datasets.

Blocking issues

Issues affecting scientific correctness, data integrity, or user-facing accuracy. These block adoption of the dataset until addressed.

🔴 H1. Dataset path naming inconsistent with sibling dataset

The dataset path enpen/enterovirus/cva16 uses an undashed name, while the sibling EV-D68 dataset uses enpen/enterovirus/ev-d68 (dashed). Dataset paths are immutable after release and cannot be changed. The curation guide recommends consistency: "choose between 'flu' and 'influenza', stick to it."

The EV-D68 dataset is already released (3 versions) and its flat convention (no reference accession suffix) is locked. The reference suffix question is moot for CVA16 if ENPEN follows the same pattern.

Effect: Once released, the path cannot be renamed. cva16 vs ev-d68 sets an inconsistent naming pattern in the ENPEN collection.

Fix: Consider enpen/enterovirus/cva-16 (with dash, matching ev-d68 convention).

Non-blocking issues

Cosmetic issues, minor inconsistencies, and documentation improvements. Fix if time allows.

🟡 M1. Typo "Cocksackievirus" in README

data/enpen/enterovirus/cva16/README.md:28:1: the subheading reads "Subgenogroups of Cocksackievirus A16" instead of "Coxsackievirus".

Fix: Change "Cocksackievirus" to "Coxsackievirus" on line 28.

🟡 M2. `reference name` attribute duplicates pathogen name

In data/enpen/enterovirus/cva16/pathogen.json:15:1, attributes["reference name"] is set to "Coxsackievirus A16", identical to attributes.name. This does not help users distinguish the reference strain from the pathogen name.

The EV-D68 dataset uses the FASTA header description: "Human enterovirus 68 strain Fermon, complete genome.". The CVA16 reference FASTA header is >U05876.1 coxsackievirus A16 G-10, complete genome. A more informative value would be "G-10" or "Coxsackievirus A16 strain G-10".

The data_output/index.json also inherits this duplicated name.

Fix: Change "reference name" to "G-10" or "Coxsackievirus A16 strain G-10" in pathogen.json.
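A minimal sketch of how the duplication described in M2 could be detected automatically. It assumes only what the finding states, namely that pathogen.json carries an `attributes` object with `name` and `reference name` keys; the helper name and the two JSON fragments are hypothetical and heavily abbreviated.

```python
import json


def duplicated_reference_name(pathogen_json_text):
    """Return True if attributes["reference name"] equals attributes["name"]."""
    attributes = json.loads(pathogen_json_text).get("attributes", {})
    name = attributes.get("name")
    reference_name = attributes.get("reference name")
    return name is not None and name == reference_name


# Hypothetical minimal fragments; the real pathogen.json has many more fields.
before = '{"attributes": {"name": "Coxsackievirus A16", "reference name": "Coxsackievirus A16"}}'
after = '{"attributes": {"name": "Coxsackievirus A16", "reference name": "G-10"}}'

if __name__ == "__main__":
    print(duplicated_reference_name(before))
    print(duplicated_reference_name(after))
```

A check like this would only flag exact duplicates; whether "G-10" or "Coxsackievirus A16 strain G-10" is the better value remains a curation decision.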

🟡 M3. Consider `experimental` flag given testing status

The PR description states: "It needs to be tested further by ENPEN and others." The dataset has no experimental flag set, and the generated index shows "enabled": true.

If the dataset is intended for broader testing before full release, setting "experimental": true in pathogen.json would signal this to users. This is consistent with the "unreleased" version tag already in use.

Fix: Consider adding "experimental": true to pathogen.json if the dataset is not yet ready for production use.

🟡 M4. Three files missing trailing newlines

The following files lack a trailing newline:

  • data/enpen/enterovirus/cva16/CHANGELOG.md
  • data/enpen/enterovirus/cva16/README.md
  • data/enpen/enterovirus/cva16/tree.json

Fix: Add a trailing newline to each file.
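One possible way to find and fix such files, sketched in Python. The helper names are hypothetical, and the script is not part of the repository tooling; it simply appends a newline byte to any non-empty file that does not already end with one.

```python
from pathlib import Path


def missing_trailing_newline(path):
    """Return True if the file is non-empty and does not end with a newline."""
    data = Path(path).read_bytes()
    return len(data) > 0 and not data.endswith(b"\n")


def fix_trailing_newline(path):
    """Append a newline if one is missing; return True if the file was changed."""
    if missing_trailing_newline(path):
        with open(path, "ab") as handle:
            handle.write(b"\n")
        return True
    return False


if __name__ == "__main__":
    # Paths taken from the list above, relative to the repository root.
    for name in ("CHANGELOG.md", "README.md", "tree.json"):
        target = Path("data/enpen/enterovirus/cva16") / name
        if target.exists():
            print(target, "fixed" if fix_trailing_newline(target) else "ok")
```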

🔵 L1. No citation section in README

The EV-D68 README includes a "Citation" section with a recommended citation. The CVA16 README omits this.

Fix: Add a citation section referencing the dataset authors and the workflow repository, consistent with the EV-D68 README.

Notes

Observations that require no action: correct design decisions, positive patterns, comparisons with related work, and future improvement ideas.

  • All 11 CDS regions in genome_annotation.gff3 have lengths divisible by 3 and zero ambiguous bases. Coordinates match between GFF3 and tree.json meta.genome_annotations exactly. The CDS regions represent enterovirus polyprotein cleavage products: only VP4 has an ATG start codon (polyprotein start) and only 3D is followed by a TAG stop codon. This is correct for a picornavirus single-ORF genome.
  • Reference U05876.1 is 7413 nt with a 750 nt 5' UTR, 6579 nt coding region (VP4 through 3D), and 84 nt 3' UTR, matching expected CVA16 genome organization.
  • The reference sequence U05876 is present as a leaf node in the tree. This means Nextclade will not accumulate spurious private mutations when analyzing the reference itself.
  • The tree contains 777 leaf nodes across 11 clades, spanning dates 1997 to 2024. The clade distribution by node count (B1a: 733, B1b: 483, B1c: 195) reflects current global CVA16 epidemiology with B1a/B1b dominance and emerging B1c (Zeng et al., Viruses 2025).
  • The Static Inferred Ancestor approach (via outgroup rooting) is well-documented in meta.extensions.nextclade and the README, matching the EV-D68 dataset pattern.
  • defaultCds: "VP1" is appropriate - VP1 is the standard target for enterovirus molecular typing (Sun et al., J Clin Microbiol 2014).
  • QC thresholds are reasonable for a ~7.4 kb genome: missingDataThreshold: 1000 (~13.5%), privateMutations cutoff 120 (~1.6%), maxDivergence: 0.15.
  • 21/35 example sequences are tree leaves, 14 placed at runtime. This is expected behavior for Nextclade.
  • No alignmentPreset or ignoredFrameShifts are set (see dynamic validation below for frameshift observations).
  • The snpClusters QC rule (absent in EV-D68) is enabled with windowSize: 100, clusterCutOff: 4.
  • The mutLabels section contains extensive nucleotide mutation labels mapping mutations to subgenogroups (B1a, B1b, B1c, B1, C, D, E, F), enabling lineage-defining mutation annotation.
  • The compatibility field uses "3.0.0" (matching EV-D68), not "3.0.0-alpha.0".
  • Workflow source repository enterovirus-phylo/nextclade_a16 exists and is publicly accessible.
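The frame and coordinate observations in the first two notes can be reproduced with a short check. This is a sketch, not the validation that produced the review: the CDS coordinates are copied from the genome_annotation.gff3 lines quoted in this thread, and the function name is hypothetical.

```python
# CDS coordinates (1-based, inclusive) as quoted from genome_annotation.gff3.
CDS = [
    ("VP4", 751, 957),
    ("VP2", 958, 1719),
    ("VP3", 1720, 2445),
    ("VP1", 2446, 3336),
    ("2A", 3337, 3786),
    ("2B", 3787, 4083),
    ("2C", 4084, 5070),
    ("3A", 5071, 5328),
    ("3B", 5329, 5394),
    ("3C", 5395, 5943),
    ("3D", 5944, 7329),
]
GENOME_LENGTH = 7413


def check_cds_layout(cds, genome_length):
    """Verify frame and contiguity, then return (5' UTR, coding, 3' UTR) lengths."""
    for name, start, end in cds:
        length = end - start + 1
        if length % 3 != 0:
            raise ValueError(f"{name}: length {length} is not divisible by 3")
    # Cleavage products of one polyprotein should abut with no gap or overlap.
    for (prev_name, _, prev_end), (name, start, _) in zip(cds, cds[1:]):
        if start != prev_end + 1:
            raise ValueError(f"gap or overlap between {prev_name} and {name}")
    utr5 = cds[0][1] - 1
    coding = cds[-1][2] - cds[0][1] + 1
    utr3 = genome_length - cds[-1][2]
    return utr5, coding, utr3


if __name__ == "__main__":
    print(check_cds_layout(CDS, GENOME_LENGTH))  # (750, 6579, 84)
```

The sketch checks lengths and contiguity only; it does not inspect start or stop codons, which would require the reference sequence itself.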

Nextclade CLI run

Nextclade CLI run via Docker (nextstrain/nextclade) against the dataset.

Reference sequence

The reference (U05876.1) passes all QC checks with overall status good (score 0). Zero private mutations, zero frameshifts, zero stop codons, zero missing data. Clade assigned: A. This confirms the reference is correctly represented in the tree as a leaf node.

Example sequences (35 total)
| QC status | Count | Fraction |
|---|---|---|
| good | 23 | 65.7% |
| mediocre | 2 | 5.7% |
| bad | 10 | 28.6% |

Clade assignments: B1b (12), B1a (9), B1c (7), B1 (5), C (1), D (1). All 35 received a clade assignment.

Private mutations: min=0, max=377, mean=51.5. 5/35 exceed the threshold of 120. Three sequences (PX448982, PX448985, PX448978) have >200 private mutations, suggesting they are divergent from the nearest tree node.

Frameshifts: 4/35 have frameshifts. No ignoredFrameShifts are configured, so all trigger QC warnings:

| Sequence | CDS | Codon range |
|---|---|---|
| PX448822 | VP1 | 165-297 |
| PX448982 | VP2 | 160-254 |
| PX449037 | 2A | 135-150 |
| PX448850 | VP3 | 34-242 |

These are large frameshifts: each one runs through the last codon of its CDS (VP1 has 297 codons, VP2 254, 2A 150, and VP3 242 in the GFF3 annotation). They are more likely sequencing artifacts or incomplete sequences than biological frameshifts. If they are expected in the example set, adding ignoredFrameShifts entries would suppress the QC warnings.

SNP clusters: 5/35 flagged as bad, 3/35 as mediocre. The snpClusters rule (windowSize: 100, clusterCutOff: 4) flags sequences with concentrated mutations. This is expected for divergent examples and does not indicate misconfiguration.
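To illustrate what a windowed cluster rule of this kind measures, here is a simplified sliding-window sketch. It is not Nextclade's implementation: the function name is hypothetical, and treating a window as flagged when it holds strictly more than clusterCutOff mutations is an assumption made for illustration.

```python
def find_snp_clusters(positions, window_size=100, cluster_cutoff=4):
    """Return (first_pos, last_pos, count) for windows exceeding cluster_cutoff.

    positions: nucleotide positions of private mutations in one sequence.
    A window covers positions whose distance from its first member is
    smaller than window_size.
    """
    positions = sorted(positions)
    clusters = []
    left = 0
    for right, pos in enumerate(positions):
        # Shrink the window from the left until it spans fewer than window_size nt.
        while pos - positions[left] >= window_size:
            left += 1
        count = right - left + 1
        if count > cluster_cutoff:  # assumption: strictly greater than the cutoff
            clusters.append((positions[left], pos, count))
    return clusters


if __name__ == "__main__":
    print(find_snp_clusters([10, 20, 30, 40, 50, 500]))  # [(10, 50, 5)]
    print(find_snp_clusters([0, 200, 400, 600, 800]))    # []
```

Under this reading, a divergent sequence with many closely spaced private mutations is flagged even when the dataset is configured correctly, which matches the interpretation given above.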

Stop codons: 1/35 has a premature stop codon.

Missing data: 0 across all 35 sequences.
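A tally like the QC table above could be derived from Nextclade's TSV output with a few lines of Python. This is a sketch under assumptions: the column name `qc.overallStatus` is taken to be the overall QC status column of the TSV, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_qc(tsv_text, status_column="qc.overallStatus"):
    """Return {status: (count, percent)} for a tab-separated Nextclade result table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(row[status_column] for row in reader)
    total = sum(counts.values())
    return {status: (n, round(100 * n / total, 1)) for status, n in counts.items()}


if __name__ == "__main__":
    # Synthetic rows that reproduce the counts reported above (23 / 2 / 10).
    rows = ["seqName\tqc.overallStatus"]
    rows += [f"good{i}\tgood" for i in range(23)]
    rows += [f"mediocre{i}\tmediocre" for i in range(2)]
    rows += [f"bad{i}\tbad" for i in range(10)]
    print(summarize_qc("\n".join(rows)))
```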

@ivan-aksamentov
Member

🔴 H1. Dataset path naming inconsistent with sibling dataset

Claude is being a bit too dramatic here 😆

@nneune
Collaborator Author

nneune commented Mar 4, 2026

Oh wow, the AI did a great job at summarizing the science behind CVA16. For the "dataset path naming" inconsistency, CVA16 is actually the official name and not CV-A16 (see Simmonds et al., 2020). I'll correct the typos!

@ivan-aksamentov ivan-aksamentov deployed to refs/heads/enpen-cva16 March 4, 2026 17:27 — with GitHub Actions Active
Collaborator Author

@nneune nneune left a comment

The same issues exist as with the EV-D68 dataset. Divergence is not a valid QC label, and the nucMutLabelMapReverse is deprecated.

Copilot AI review requested due to automatic review settings April 19, 2026 09:49
@nneune nneune had a problem deploying to refs/heads/enpen-cva16 April 19, 2026 09:49 — with GitHub Actions Error
@nneune nneune temporarily deployed to refs/pull/412/merge April 19, 2026 09:49 — with GitHub Actions Inactive

Copilot AI left a comment

Pull request overview

This PR adds an ENPEN Nextclade dataset for Coxsackievirus A16 (CVA16) and wires it into the ENPEN collection so it appears in the dataset server index.

Changes:

  • Adds a new dataset under data/enpen/enterovirus/cva16/ (reference, annotation, config, tree, examples, README/CHANGELOG).
  • Updates the ENPEN collection ordering to include enpen/enterovirus/cva16.
  • Adds corresponding generated artifacts under data_output/ including the dataset bundle and an entry in data_output/index.json.

Reviewed changes

Copilot reviewed 10 out of 18 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| data/enpen/collection.json | Adds CVA16 to ENPEN dataset ordering. |
| data/enpen/enterovirus/cva16/pathogen.json | Introduces CVA16 dataset configuration and QC/clade metadata. |
| data/enpen/enterovirus/cva16/reference.fasta | Adds the CVA16 reference sequence used for alignment/mutation calling. |
| data/enpen/enterovirus/cva16/genome_annotation.gff3 | Adds genome annotation for coding regions. |
| data/enpen/enterovirus/cva16/README.md | Documents dataset scope/reference choice and usage notes. |
| data/enpen/enterovirus/cva16/CHANGELOG.md | Adds initial release notes under ## Unreleased. |
| data/enpen/enterovirus/cva16/sequences.fasta | Adds example sequences for the dataset. |
| data/enpen/enterovirus/cva16/tree.json | Adds the reference phylogeny used for placement/clade assignment. |
| data_output/index.json | Registers the CVA16 dataset in the generated dataset index. |
| data_output/enpen/enterovirus/cva16/unreleased/reference.fasta | Generated reference FASTA for the unreleased dataset version. |
| data_output/enpen/enterovirus/cva16/unreleased/genome_annotation.gff3 | Generated genome annotation for the unreleased dataset version. |
| data_output/enpen/enterovirus/cva16/unreleased/pathogen.json | Generated pathogen config for the unreleased dataset version. |
| data_output/enpen/enterovirus/cva16/unreleased/tree.json | Generated tree for the unreleased dataset version. |
| data_output/enpen/enterovirus/cva16/unreleased/sequences.fasta | Generated example sequences for the unreleased dataset version. |
| data_output/enpen/enterovirus/cva16/unreleased/README.md | Generated README displayed to dataset users. |
| data_output/enpen/enterovirus/cva16/unreleased/CHANGELOG.md | Generated changelog displayed to dataset users. |

Comment thread data_output/index.json
Comment on lines +1 to +2
>ancestral_sequence
TTAAAACAGCCTGTGGGTTGTTCCCACCCACAGGGCCCACTGGGCGCTAGCACACTGATT
Copilot AI Apr 19, 2026

The generated data_output dataset uses U05876.1 as the reference (see FASTA header), which conflicts with the data/ dataset’s reference ID (ancestral_sequence). Since data_output/ should be produced from data/, please resolve the reference choice in data/ and rerun ./scripts/rebuild to regenerate data_output/enpen/enterovirus/cva16/unreleased/* consistently.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +17
##sequence-region U05876.1 1 7413
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=31704
U05876.1 Genbank region 1 7413 . + . ID=U05876.1:1..7413;Dbxref=taxon:31704;gb-acronym=CV-A16;gbkey=Src;mol_type=genomic RNA;nat-host=Homo sapiens;strain=G-10
U05876.1 Genbank CDS 751 957 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAA50478.1:1..69
U05876.1 Genbank CDS 958 1719 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAA50478.1:70..323
U05876.1 Genbank CDS 1720 2445 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAA50478.1:324..565
U05876.1 Genbank CDS 2446 3336 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAA50478.1:566..862
U05876.1 Genbank CDS 3337 3786 . + . Name=2A;product=2A;gbkey=Prot;ID=id-AAA50478.1:863..1012
U05876.1 Genbank CDS 3787 4083 . + . Name=2B;product=2B;gbkey=Prot;ID=id-AAA50478.1:1013..1111
U05876.1 Genbank CDS 4084 5070 . + . Name=2C;product=2C;gbkey=Prot;ID=id-AAA50478.1:1112..1440
U05876.1 Genbank CDS 5071 5328 . + . Name=3A;product=3A;gbkey=Prot;ID=id-AAA50478.1:1441..1526
U05876.1 Genbank CDS 5329 5394 . + . Name=3B;product=3B;gbkey=Prot;ID=id-AAA50478.1:1527..1548
U05876.1 Genbank CDS 5395 5943 . + . Name=3C;product=3C;gbkey=Prot;ID=id-AAA50478.1:1549..1731
U05876.1 Genbank CDS 5944 7329 . + . Name=3D;product=3D;gbkey=Prot;ID=id-AAA50478.1:1732..2193
Copilot AI Apr 19, 2026

In this dataset, the GFF3 seqid is U05876.1, but the reference FASTA header in reference.fasta is ancestral_sequence. Nextclade expects the genome annotation seqid values to match the reference sequence ID; otherwise CDS translation/annotation lookup can fail. Please make the FASTA ID and all first-column GFF3 IDs consistent (either rename the FASTA header to U05876.1 or change the GFF3 seqid/##sequence-region to ancestral_sequence).

Suggested change
- ##sequence-region U05876.1 1 7413
- ##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=31704
- U05876.1 Genbank region 1 7413 . + . ID=U05876.1:1..7413;Dbxref=taxon:31704;gb-acronym=CV-A16;gbkey=Src;mol_type=genomic RNA;nat-host=Homo sapiens;strain=G-10
- U05876.1 Genbank CDS 751 957 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAA50478.1:1..69
- U05876.1 Genbank CDS 958 1719 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAA50478.1:70..323
- U05876.1 Genbank CDS 1720 2445 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAA50478.1:324..565
- U05876.1 Genbank CDS 2446 3336 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAA50478.1:566..862
- U05876.1 Genbank CDS 3337 3786 . + . Name=2A;product=2A;gbkey=Prot;ID=id-AAA50478.1:863..1012
- U05876.1 Genbank CDS 3787 4083 . + . Name=2B;product=2B;gbkey=Prot;ID=id-AAA50478.1:1013..1111
- U05876.1 Genbank CDS 4084 5070 . + . Name=2C;product=2C;gbkey=Prot;ID=id-AAA50478.1:1112..1440
- U05876.1 Genbank CDS 5071 5328 . + . Name=3A;product=3A;gbkey=Prot;ID=id-AAA50478.1:1441..1526
- U05876.1 Genbank CDS 5329 5394 . + . Name=3B;product=3B;gbkey=Prot;ID=id-AAA50478.1:1527..1548
- U05876.1 Genbank CDS 5395 5943 . + . Name=3C;product=3C;gbkey=Prot;ID=id-AAA50478.1:1549..1731
- U05876.1 Genbank CDS 5944 7329 . + . Name=3D;product=3D;gbkey=Prot;ID=id-AAA50478.1:1732..2193
+ ##sequence-region ancestral_sequence 1 7413
+ ##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=31704
+ ancestral_sequence Genbank region 1 7413 . + . ID=U05876.1:1..7413;Dbxref=taxon:31704;gb-acronym=CV-A16;gbkey=Src;mol_type=genomic RNA;nat-host=Homo sapiens;strain=G-10
+ ancestral_sequence Genbank CDS 751 957 . + . Name=VP4;gbkey=Prot;product=VP4;ID=id-AAA50478.1:1..69
+ ancestral_sequence Genbank CDS 958 1719 . + . Name=VP2;gbkey=Prot;product=VP2;ID=id-AAA50478.1:70..323
+ ancestral_sequence Genbank CDS 1720 2445 . + . Name=VP3;gbkey=Prot;product=VP3;ID=id-AAA50478.1:324..565
+ ancestral_sequence Genbank CDS 2446 3336 . + . Name=VP1;gbkey=Prot;product=VP1;ID=id-AAA50478.1:566..862
+ ancestral_sequence Genbank CDS 3337 3786 . + . Name=2A;product=2A;gbkey=Prot;ID=id-AAA50478.1:863..1012
+ ancestral_sequence Genbank CDS 3787 4083 . + . Name=2B;product=2B;gbkey=Prot;ID=id-AAA50478.1:1013..1111
+ ancestral_sequence Genbank CDS 4084 5070 . + . Name=2C;product=2C;gbkey=Prot;ID=id-AAA50478.1:1112..1440
+ ancestral_sequence Genbank CDS 5071 5328 . + . Name=3A;product=3A;gbkey=Prot;ID=id-AAA50478.1:1441..1526
+ ancestral_sequence Genbank CDS 5329 5394 . + . Name=3B;product=3B;gbkey=Prot;ID=id-AAA50478.1:1527..1548
+ ancestral_sequence Genbank CDS 5395 5943 . + . Name=3C;product=3C;gbkey=Prot;ID=id-AAA50478.1:1549..1731
+ ancestral_sequence Genbank CDS 5944 7329 . + . Name=3D;product=3D;gbkey=Prot;ID=id-AAA50478.1:1732..2193
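
The mismatch described in this comment could be caught by an automated check along the following lines. This is a hedged sketch: the function names are hypothetical, the parsing is deliberately minimal (first whitespace-delimited token of each FASTA header and of each non-comment GFF3 line), and it says nothing about how Nextclade itself resolves sequence IDs.

```python
def fasta_ids(fasta_text):
    """Return the ID (first token after '>') of every FASTA record."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def gff3_seqids(gff3_text):
    """Return the set of first-column seqid values of all GFF3 feature lines."""
    seqids = set()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        seqids.add(line.split(None, 1)[0])
    return seqids


def mismatched_seqids(fasta_text, gff3_text):
    """Return GFF3 seqids that do not correspond to any FASTA record ID."""
    return gff3_seqids(gff3_text) - set(fasta_ids(fasta_text))


if __name__ == "__main__":
    fasta = ">ancestral_sequence\nTTAAAACAGCCTGTGGGTTG\n"
    gff3 = "##gff-version 3\nU05876.1\tGenbank\tCDS\t751\t957\t.\t+\t.\tName=VP4\n"
    print(mismatched_seqids(fasta, gff3))  # {'U05876.1'}
```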

Collaborator Author

Is this step really necessary?

Member

No, not necessary. I mean it would be nice to have for consistency (and also in pathogen.json), but there are dozens of datasets which have these values all over the place. Don't bother. Hope users will understand. Might add an automated check later.

Comment on lines +14 to +18
This dataset uses the [Static Inferred Ancestor](https://github.com/enterovirus-phylo/nextclade_a16/blob/master/resources/inferred-root.fasta) instead of the historical G-10 prototype sequence ([U05876.1](https://www.ncbi.nlm.nih.gov/nuccore/U05876)). It is intended for broad subgenogroup classification, mutation quality control, and phylogenetic analysis of CVA16 diversity.

*Note: The G-10 reference differs substantially from currently circulating strains.* This is common for enterovirus datasets, in contrast to some other virus datasets (e.g., seasonal influenza), where the reference is updated more frequently to reflect recent lineages.

To address this, the dataset is *rooted* on a Static Inferred Ancestor, a phylogenetically reconstructed ancestral sequence near the tree root. This provides a stable reference point that can be used as an alternative for mutation calling.
Copilot AI Apr 19, 2026

The dataset README states the dataset uses a “Static Inferred Ancestor” instead of the G-10 prototype, but the accompanying genome annotation (U05876.1) and the generated data_output/ dataset currently indicate G-10/U05876.1 as the reference. Please clarify which reference sequence is intended and update README and dataset metadata consistently (README, pathogen.json attributes, reference FASTA header/accession).

Comment thread data/enpen/enterovirus/cva16/README.md Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@nneune nneune had a problem deploying to refs/heads/enpen-cva16 April 19, 2026 10:30 — with GitHub Actions Error
@nneune nneune temporarily deployed to refs/pull/412/merge April 19, 2026 10:30 — with GitHub Actions Inactive
@nneune nneune had a problem deploying to refs/heads/enpen-cva16 April 19, 2026 11:15 — with GitHub Actions Error
@nneune nneune temporarily deployed to refs/pull/412/merge April 19, 2026 11:15 — with GitHub Actions Inactive
@nneune nneune had a problem deploying to refs/heads/enpen-cva16 April 20, 2026 08:42 — with GitHub Actions Error
@nneune nneune temporarily deployed to refs/pull/412/merge April 20, 2026 08:42 — with GitHub Actions Inactive
@nneune nneune had a problem deploying to refs/heads/enpen-cva16 April 20, 2026 08:51 — with GitHub Actions Error
@nneune nneune deployed to refs/pull/412/merge April 20, 2026 08:51 — with GitHub Actions Active