Expand cattle-outbreak beyond B3.13 #139

@jameshadfield

Now that the cattle-outbreak genome build is defined by B3.13 filtering, we can no longer include strains with fewer than 8 sequenced segments (so the implementation in #111 is outdated). We will also drop some strains because their GenoFLU calls aren't B3.13. Compared with the last successful cattle-flu dataset, the following strains will be dropped for not being B3.13:

$ cat data/ncbi/metadata.tsv | csvtk grep -t -f strain -P auspice.strains.tsv | csvtk cut -t -f strain,genoflu | grep -v B3.13 | csvtk pretty -t
strain                                  genoflu                                                                           
-------------------------------------   ----------------------------------------------------------------------------------
A/cattle/Texas/24-009499-002/2024       Not assigned: No Matching Genotypes                                               
A/cattle/Texas/24-009308-003/2024       Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/NewMexico/24-010195-004/2024   Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_MD_041/2024     Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_ME_003/2024     Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Idaho/Broad_ME_018/2024        Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Idaho/Broad_ME_020/2024        Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_MF_011/2024     Not assigned: Only 5 segments >98.0% match found of total 8 segments in input file
A/cattle/Missouri/Broad_MD_031/2024     Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/cattle/Texas/Broad_MD_027/2024        Not assigned: Only 5 segments >98.0% match found of total 8 segments in input file
A/cattle/Colorado/Broad_MF_016/2024     Not assigned: Only 6 segments >98.0% match found of total 8 segments in input file
A/cattle/Michigan/Broad_ME_010/2024     Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-031346-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-032636-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-034010-002/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-034010-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/Cattle/USA/24-033997-001/2024         Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/PETFOOD/USA/24-037325-013/2024        Not assigned: Only 7 segments >98.0% match found of total 8 segments in input file
A/PETFOOD/USA/24-037325-012/2024        Not assigned: Only 5 segments >98.0% match found of total 8 segments in input file
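
To see at a glance how the dropped strains break down by reason, the `genoflu` column can be bucketed by its message text. The following is a minimal sketch: the regular expression is written against the message wording shown in the table above, and the function names `summarise_genoflu_call` and `tally` are hypothetical helpers, not part of GenoFLU or this workflow.

```python
import re
from collections import Counter

# Matches the "Only N segments >X% match found of total M segments" message
# as it appears in the table above (wording assumed stable).
PARTIAL_MATCH = re.compile(
    r"Only (\d+) segments >([\d.]+)% match found of total (\d+) segments"
)


def summarise_genoflu_call(call: str) -> str:
    """Bucket a single GenoFLU call string into a short category label."""
    call = call.strip()
    if call == "B3.13":
        return "B3.13"
    match = PARTIAL_MATCH.search(call)
    if match:
        matched, _cutoff, total = match.groups()
        return f"{matched}/{total} segments matched"
    if "No Matching Genotypes" in call:
        return "no matching genotype"
    return "other"


def tally(calls):
    """Count how many calls fall into each category."""
    return Counter(summarise_genoflu_call(c) for c in calls)
```

Passing the `genoflu` values from the table to `tally` would give one count per category, which makes it easier to judge how many strains a relaxed rule would recover.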

Including full genome strains which don't match B3.13

We may wish to relax the 98% cutoff. For some of the examples above, the number of Ns is perhaps what is behind their exclusion:

  • A/cattle/Texas/24-009499-002/2024 has 4.5kb of Ns on the branch leading to it but few mutations, which suggests it is likely to be part of the outbreak
  • A/cattle/Texas/24-009308-003/2024 similarly has 4.5kb of Ns
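
One way to check whether Ns explain a failed match is to measure the N content of each sequence directly. The sketch below assumes the sequence is available as a plain string; `n_content` is a hypothetical helper, and counting Ns in the raw sequence is only a rough proxy for the per-branch figures quoted above.

```python
def n_content(sequence: str) -> tuple[int, float]:
    """Return (count, fraction) of ambiguous 'N' bases in a nucleotide sequence."""
    seq = sequence.upper()
    n_count = seq.count("N")
    # Guard against empty input so the fraction is always defined.
    fraction = n_count / len(seq) if seq else 0.0
    return n_count, fraction
```

Strains with a high N fraction and few mutations would be the natural candidates for a relaxed cutoff; any threshold on that fraction would need to be chosen separately.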

Including strains with fewer than 8 segments sequenced

If we modify GenoFLU to report segment-level calls for strains with <8 segments, then we could match on (e.g.) "7 segments sequenced and all agree with the B3.13 constellation". This improvement to GenoFLU was also mentioned here as being desirable more generally.
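
The matching rule described above could look roughly like the sketch below. It assumes GenoFLU were extended to emit one lineage call per sequenced segment, which it does not currently do for strains with <8 segments. The function name, the `min_segments` parameter, and the lineage labels are all hypothetical; in particular, `example_constellation` uses placeholder labels, not the real B3.13 constellation.

```python
from typing import Mapping, Optional

SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]

# Placeholder labels for illustration only, NOT the real B3.13 constellation.
example_constellation = {seg: f"lineage_{seg}" for seg in SEGMENTS}


def agrees_with_constellation(
    segment_calls: Mapping[str, Optional[str]],
    constellation: Mapping[str, str],
    min_segments: int = 7,
) -> bool:
    """True if enough segments were called and every call matches the constellation.

    A value of None means the segment was not sequenced (or got no call).
    """
    called = {seg: call for seg, call in segment_calls.items() if call is not None}
    if len(called) < min_segments:
        return False
    return all(constellation.get(seg) == call for seg, call in called.items())
```

With this rule, a strain missing one segment but agreeing on the other seven would be kept, while a strain with any disagreeing segment, or with fewer than `min_segments` calls, would still be excluded.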
