
Handling adapter sequences #101

@martinghunt


We need to handle adapter sequences.

Observed behaviour:

  1. Amplicon N is OK; amplicon N+1 is dropped.
  2. Reads at the end of amplicon N contain adapter sequence.
  3. A few bp of adapter are included in the consensus output by cylon, so the consensus is the amplicon plus a little adapter sequence.
  4. The reads all map fine to the consensus, including their adapters, because the consensus now contains some adapter sequence.
  5. Self-QC does not mask the adapters.

Desired behaviour: the adapter sequence is masked. I think self-QC could do this. An alternative would be to remove adapters from the reads earlier in the pipeline so that they are never seen again.
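A minimal sketch of the second option (removing adapters from the reads before the consensus is built), in Python. It assumes exact matching only and uses `AGATCGGAAGAGC`, the common start of Illumina TruSeq adapters, purely as a placeholder; the adapter actually present in this data is not stated here. The function `trim_3prime_adapter` and its `min_overlap` parameter are hypothetical, not part of the pipeline. It first looks for a full adapter anywhere in the read, then for a partial adapter at the 3' end, which is what reads ending in "primer plus a few bp of adapter" would contain.

```python
ADAPTER = "AGATCGGAAGAGC"  # placeholder only: common Illumina TruSeq adapter start


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a full or partial adapter from the 3' end of a read (exact match only)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Full adapter anywhere in the read: drop it and everything after it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Partial adapter: find the longest read suffix that equals an adapter prefix,
    # ignoring overlaps shorter than min_overlap.
    for k in range(min(len(read), len(adapter) - 1), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


# A read that ends with the primer followed by 4 bp of adapter.
print(trim_3prime_adapter("ACGTACGTAGAT", ADAPTER))  # ACGTACGT
```

Exact matching keeps the sketch short; a dedicated trimmer such as cutadapt also tolerates sequencing errors in the adapter. A small `min_overlap` removes more genuine adapter but also clips reads whose last few bases match the start of the adapter by chance.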

The example is the same as in #99. See also #100 for more detail on the amplicon in question.

The adapter sequence is included in the consensus at the end of amplicon SARS-CoV-2_76. All the reads there end either with the primer or with the primer plus a few bp of adapter. None of them should contribute to Clean.Tot.cons at the primer positions or beyond the end of the primer.
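One way self-QC could enforce this rule, sketched in Python under stated assumptions: reads have already been assigned to amplicons, each read is a half-open, 0-based interval `(start, end)` on the consensus, and each amplicon is described by the region between the end of its left primer and the start of its right primer. None of these structures, nor the names `clipped_depth` and `mask_unsupported`, come from the pipeline; they are hypothetical. Each read only counts inside its own amplicon's inter-primer region, so consensus positions supported only by primer or adapter bases end up with zero depth and are replaced by N.

```python
from typing import List, Sequence, Tuple

Read = Tuple[int, int]  # hypothetical: half-open, 0-based (start, end) on the consensus
Amplicon = Tuple[int, int, Sequence[Read]]  # hypothetical: (insert_start, insert_end, reads)


def clipped_depth(amplicons: Sequence[Amplicon], length: int) -> List[int]:
    """Per-position depth, counting each read only inside its amplicon's
    inter-primer region [insert_start, insert_end)."""
    depth = [0] * length
    for insert_start, insert_end, reads in amplicons:
        for start, end in reads:
            for pos in range(max(start, insert_start), min(end, insert_end)):
                depth[pos] += 1
    return depth


def mask_unsupported(consensus: str, depth: Sequence[int], min_depth: int = 1) -> str:
    """Replace consensus bases with N where the clipped depth is below min_depth.
    Assumes len(depth) == len(consensus)."""
    return "".join(b if d >= min_depth else "N" for b, d in zip(consensus, depth))


# One amplicon whose right primer starts at position 7; both reads run past it
# (primer plus adapter), but only positions 0-6 are counted.
depth = clipped_depth([(0, 7, [(0, 10), (2, 10)])], 10)
print(depth)                                  # [1, 1, 2, 2, 2, 2, 2, 0, 0, 0]
print(mask_unsupported("ACGTACGTAC", depth))  # ACGTACGNNN
```

Depth is summed over amplicons so that, in an overlapping tiling, a position excluded for one amplicon can still be supported by another. When the next amplicon is dropped, as here, nothing supports those positions and they are masked.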

First columns of all_stats.tsv spanning the end of the primer, the start of the adapter, and the start of the dropped amplicon:

```
23055    A         23030     A          335  0    2    3    2    342       335             7
23056    A         23031     A          322  0    2    0    0    324       322             2
23057    C         23032     C          1    309  0    4    2    316       309             7
23058    C         23033     A          170  1    0    0    21   192       170             22
23058    -         23034     G          2    2    181  0    1    186       181             5
23059    C         23035     C          1    171  0    4    1    177       171             6
23060    A         23036     A          169  2    0    0    0    171       169             2
23061    C         23037     A          164  0    0    0    0    164       164             0
23062    T         23038     T          0    0    2    155  0    157       155             2
23063    A         23039     A          141  2    0    0    0    143       141             2
23064    A         23040     N          0    0    0    0    0    0         0               0
23065    T         23041     N          0    0    0    0    0    0         0               0
23066    G         23042     N          0    0    0    0    0    0         0               0
23067    G         23043     N          0    0    0    0    0    0         0               0
23068    T         23044     N          0    0    0    0    0    0         0               0
```
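The arithmetic in these rows suggests a column layout, inferred here rather than documented: reference position and base, consensus position and base, five per-base counts that look like A, C, G, T and one further category, the total depth, the count supporting the consensus base, and the remainder. The Python sketch below parses rows under that assumption, checks that the total equals the sum of the five counts and that the last column equals the total minus the supporting count, and lists consensus positions whose base differs from the reference. `StatsRow`, `parse_row`, `check_row` and `mismatches` are hypothetical helpers, not part of the pipeline.

```python
from typing import List, NamedTuple, Tuple


class StatsRow(NamedTuple):
    # Column meanings are inferred from the arithmetic, not documented.
    ref_pos: int
    ref_base: str            # "-" marks a consensus base with no reference base
    cons_pos: int
    cons_base: str           # "N" where the consensus has no called base
    counts: Tuple[int, ...]  # five per-base counts, apparently A, C, G, T, other
    total: int
    cons_support: int
    remainder: int


def parse_row(line: str) -> StatsRow:
    """Parse the first 12 whitespace-separated columns of one row."""
    f = line.split()
    return StatsRow(
        ref_pos=int(f[0]),
        ref_base=f[1],
        cons_pos=int(f[2]),
        cons_base=f[3],
        counts=tuple(int(x) for x in f[4:9]),
        total=int(f[9]),
        cons_support=int(f[10]),
        remainder=int(f[11]),
    )


def check_row(row: StatsRow) -> bool:
    """True if total == sum(counts) and remainder == total - cons_support."""
    return sum(row.counts) == row.total and row.total - row.cons_support == row.remainder


def mismatches(rows: List[StatsRow]) -> List[int]:
    """Consensus positions whose (non-N) base differs from the reference base."""
    return [r.cons_pos for r in rows if r.cons_base != "N" and r.cons_base != r.ref_base]


# A few rows copied from the table above.
EXAMPLE = [
    "23056 A 23031 A 322 0 2 0 0 324 322 2",
    "23058 C 23033 A 170 1 0 0 21 192 170 22",
    "23058 - 23034 G 2 2 181 0 1 186 181 5",
    "23061 C 23037 A 164 0 0 0 0 164 164 0",
    "23064 A 23040 N 0 0 0 0 0 0 0 0",
]

rows = [parse_row(line) for line in EXAMPLE]
print(all(check_row(r) for r in rows))  # True
print(mismatches(rows))                 # [23033, 23034, 23037]
```

On the full table, the mismatching consensus positions are the same three: 23033, 23034 (a G with no reference base) and 23037, all just before the depth drops to zero at 23040.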
