Conversation
I've hit a 20-character limit on auto-generated contig IDs in the released prokka 1.11 archive, and started looking for the source of this limit. I have only managed to find two SeqID length limits: 25 in the Sequin documentation, and 41 in the "annotation pipeline readme"; NCBI's sample GenBank record does not (no longer?) mention any specific limit on LOCUS name length. It might be useful to have some limit-related URLs next to the limit definition, so that it is easier to set a new reasonable limit when the default is not suitable for some reason.
|
Hi @spock, this might be useful for others: there's an issue (#76) discussing Prokka's MAXCONTIGIDLEN. I would actually prefer a more informative error message from Prokka. I don't think many users will look in the source code, but it's still nice to have those URLs. The problem is that GBK, GFF and SQN files all have different restrictions on the SeqID length ... |
|
Hi @aleimba, thank you for the issue link - I forgot to search the issues before filing this request :( The suggested URLs do cover the GBK and SQN cases, and according to @tseemann GFF has no limit on SeqID. Not sure how to improve the error message, though. |
|
"Previously", I also added the max. characters for the locus_tag: err("Genbank contig IDs are $contig_name_len chars, must be <= $MAXCONTIGIDLEN. Prefix is '$contigprefix', locus_tag has to be <= 6."); But that was before @tseemann mentioned the different GBK, GFF, SQN specs. So it probably doesn't make sense, and I'm guessing @tseemann will replace it. Another option flag might be an idea (although Prokka already has quite a few), along with mentioning it in the error message. And then a mention of potentially broken GBKs ... Definitely another commit ;-). Nevertheless, I'm guessing at least all the SPAdes users are running into the MAXCONTIGIDLEN problem. |
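To make the "different limits per format" point concrete, here is a minimal sketch of a check that reports which limits a contig ID exceeds. It is written in Python for illustration, not in Prokka's Perl, and it is not Prokka's code: the names `SEQID_LIMITS` and `check_contig_id` are hypothetical, and the numbers are only the ones quoted in this thread (20 for the prokka 1.11 default, 25 from the Sequin documentation, 41 from the "annotation pipeline readme", no limit for GFF).

```python
# Hypothetical table of SeqID length limits, using only the figures
# mentioned in this thread. None means "no known limit".
SEQID_LIMITS = {
    "prokka 1.11 default (MAXCONTIGIDLEN)": 20,
    "Sequin documentation": 25,
    "annotation pipeline readme": 41,
    "GFF": None,  # no limit on SeqID, according to the thread
}


def check_contig_id(contig_id, limits=SEQID_LIMITS):
    """Return one message per limit that contig_id exceeds."""
    problems = []
    for source, limit in limits.items():
        if limit is not None and len(contig_id) > limit:
            problems.append(
                f"{source}: contig ID is {len(contig_id)} chars, "
                f"limit is {limit}"
            )
    return problems


if __name__ == "__main__":
    # A long auto-generated ID of the kind assemblers tend to produce.
    for message in check_contig_id("NODE_1_length_1000_cov_12.5_ID_1"):
        print(message)
```

Keeping the source of each limit next to its value means an error message can say which specification was violated, rather than reporting a single bare number.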