Enhancement: clarify the tighter element format specification, NEW in qiime2_core__tools__import/2023.5.0+dist.h193f7cc9.3

https://github.com/qiime2/galaxy-tools/tree/main/tools/suite_qiime2_core__tools
toolshed.g2.bx.psu.edu/repos/q2d2/qiime2_core__tools__import/qiime2_core__tools__import/2023.5.0+dist.h193f7cc9.3

Hello! The latest wrapper update is great! It might need a small documentation update in Galaxy to reflect the newer usage. A few users have noticed that the format for element identifiers has become slightly more strict. In practical use, this is the naming format of the Illumina fastq sequence files that are parsed into a list collection, then later consumed by qiime2 import.

For an example, please see the discussion yesterday here: https://help.galaxyproject.org/t/qiime2-2025-10-errors/16577

Now, "real" data would already have this stricter formatting, but we have many people working in Galaxy for exploratory reasons, and importantly, _many instructors with downsampled data used for teaching purposes_. They are newly having problems with confusing errors. If we could clarify this update to the format a bit more, it would be very helpful! We want them to pass through this first step as easily as possible and use the package.

I don't think the change is a bug! Stricter format is fine. My hope is that we could make the requirements clearer.

**How the tool can error** 

This is from an example I was reviewing from an instructor. 

> A pair of paired-end files were found not to have the same number of records. /corral4/main/jobs/XXX/XXX/XXXXXX/working/q2galaxy-importx7a1ak5m/**4_S4_L001_R2_001.fastq.gz** has 131348 records. /corral4/main/jobs/XXX/XXX/XXXXXX/working/q2galaxy-importx7a1ak5m/**24_S24_L001_R2_001.fastq.gz** has 372128 records.

Notice how the tool is attempting to "match up" sample `4_S4` and `24_S24`. The sample ID seems to be truncated when creating the qiime2 index based on the collection's element identifiers. 

**Detail**

This is the recommendation for the original sample formatting:

`.+_.+_L[0-9][0-9][0-9]_R[12]_001.fastq.gz`

The update requires the values matched by the `.+` expressions to have a consistent character length across all elements in the same Qiime2 Import batch.

Meaning, a group like this will result in an error.

```
1_s1_L001_R1_001.fastq.gz
2_s2_L001_R1_001.fastq.gz
11_s11_L001_R1_001.fastq.gz
```
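For anyone who wants to check a batch before importing, here is a minimal sketch of the rule as I understand it. It is not the code the tool actually runs. `check_identifiers` is a hypothetical helper, and adding capture groups around the two `.+` parts of the documented regex is my own assumption so that their lengths can be compared.

```python
import re

# Documented pattern, with the two ".+" parts captured (assumption: this
# grouping is only for illustration, not how the wrapper parses names).
PATTERN = re.compile(r"^(.+)_(.+)_L[0-9]{3}_R[12]_001\.fastq\.gz$")


def check_identifiers(names):
    """Return (non_matching, lengths).

    non_matching: identifiers that do not match the pattern at all.
    lengths: the set of (len(first .+), len(second .+)) pairs seen.
    A batch is "consistent" when at most one pair is present.
    """
    non_matching = []
    lengths = set()
    for name in names:
        match = PATTERN.match(name)
        if match is None:
            non_matching.append(name)
        else:
            lengths.add((len(match.group(1)), len(match.group(2))))
    return non_matching, lengths


batch = [
    "1_s1_L001_R1_001.fastq.gz",
    "2_s2_L001_R1_001.fastq.gz",
    "11_s11_L001_R1_001.fastq.gz",
]
non_matching, lengths = check_identifiers(batch)
print(non_matching)      # []
print(sorted(lengths))   # [(1, 2), (2, 3)]
print(len(lengths) <= 1) # False: the lengths are mixed
```

Every name matches the regex, so the regex alone does not reveal the problem; only the length comparison does.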

But padding out the values to all be the same character length like this works.

```
01_s01_L001_R1_001.fastq.gz
02_s02_L001_R1_001.fastq.gz
11_s11_L001_R1_001.fastq.gz
```
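Renaming by hand gets tedious for a larger teaching dataset, so the padding could be scripted. This is only a sketch under some assumptions: `pad_identifiers` is a hypothetical helper, and it assumes names shaped like the examples above (a numeric first part, then an optional non-digit prefix followed by digits). Other naming schemes would need a different pattern.

```python
import re

# Assumed shape: <digits>_<optional letters><digits>_L###_R[12]_001.fastq.gz
PATTERN = re.compile(r"^(\d+)_(\D*)(\d+)(_L[0-9]{3}_R[12]_001\.fastq\.gz)$")


def pad_identifiers(names):
    """Zero-pad both numeric parts to the widest value found in the batch."""
    parsed = []
    for name in names:
        match = PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected identifier shape: {name}")
        parsed.append(match.groups())

    # Widest numeric field in each position decides the padding width.
    width_first = max(len(first) for first, _, _, _ in parsed)
    width_second = max(len(second) for _, _, second, _ in parsed)

    return [
        f"{first.zfill(width_first)}_{prefix}{second.zfill(width_second)}{rest}"
        for first, prefix, second, rest in parsed
    ]


padded = pad_identifiers([
    "1_s1_L001_R1_001.fastq.gz",
    "2_s2_L001_R1_001.fastq.gz",
    "11_s11_L001_R1_001.fastq.gz",
])
for name in padded:
    print(name)
# 01_s01_L001_R1_001.fastq.gz
# 02_s02_L001_R1_001.fastq.gz
# 11_s11_L001_R1_001.fastq.gz
```

The forward and reverse files of a sample would need to be padded in the same call (or with the same widths) so that the R1 and R2 names still pair up.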

But -- I've also noticed that if the first item in the list has the _longest padding length_, that will also work. Or, at least as far as the import step! Meaning, this will also work.

```
11_s11_L001_R1_001.fastq.gz
1_s1_L001_R1_001.fastq.gz
2_s2_L001_R1_001.fastq.gz
```

I'm not sure if those samples could be mixed up with later tools? I saw this topic about something similar that may be related. https://github.com/qiime2/galaxy-tools/issues/50

**Enhancement request**

If this change was intentional (setting the variable character lengths based on the _first_ element in the list?), I think we should update the tool form help. Note: I didn't test whether the first .+ match is the root change, or if it is the second .+, or if it is both! 

This

> This data should be formatted as a FastqGzFormat. See the documentation below for more information. Elements must match regex: .+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz

To something like this

> This data should be formatted as a FastqGzFormat. Elements must match regex: .+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz. **All .+ matches must be padded to a consistent character length.**  See the documentation below for more information. 

Then, add more details to the Help section, maybe with an example. 

I can make a suggestion in a PR -- but first I wanted to confirm that this change was intentional. Thanks! 




