Merged
for west-nile in values.yaml
anna-parker
reviewed
Apr 7, 2026
legacy direct submissions without hostIdentifier)
anna-parker
reviewed
Apr 8, 2026
anna-parker
approved these changes
Apr 9, 2026
We tested on staging and this is working well. These functions should only be used in PPX after the originally submitted metadata has been migrated to handle cases where the scientific name is invalid. We should wait until #6254 is merged to do this.
Adding host validation to preprocessing
Implementation
Host validation is implemented across three new processing functions:
validate_host takes unvalidated input, calls out to the taxonomy service to validate it, and returns a taxon ID when validation succeeds. If the input casts to an int, it is assumed to be a taxon ID. Otherwise, it is assumed to be a scientific name. Responses to successful validations are cached in taxon_cache (see below). It is possible that multiple taxa have the same scientific name. In these cases, we return the tax_id of the most generic taxon (i.e., the one closest to the root of the taxonomy). If the taxon ID or host name does not exist, we return None and add a warning (if is_insdc_ingest_group) or an error (for everyone else).
scientific_name_from_id takes a validated taxon ID and maps it to a scientific name. Responses to successful validations are cached in taxon_cache (see below).
common_name_from_id takes a validated taxon ID and maps it to a common name. If the input taxon has a common name itself, that name is returned. If it does not, the common name of the nearest ancestor that has one is returned instead. Responses to successful validations are cached in common_name_cache (see below).
Caching
Successful requests to the taxonomy service are cached. Since the number of distinct host organisms in a dataset is expected to be relatively small, this reduces the number of network requests needed at the cost of keeping a small cache in memory.
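To illustrate how the validation rules and the cache fit together, here is a minimal sketch. It is not the code from this PR: the `Taxon` record, its `depth` field, the injected `lookup_by_id` / `lookup_by_name` callables, and the plain dict used as `taxon_cache` are all hypothetical stand-ins for the real taxonomy service client.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Taxon:
    tax_id: int
    scientific_name: str
    depth: int  # hypothetical field: distance from the taxonomy root


def validate_host(
    raw: str,
    lookup_by_id: Callable[[int], Optional[Taxon]],   # assumed service call
    lookup_by_name: Callable[[str], list[Taxon]],     # assumed service call
    taxon_cache: dict[str, int],
) -> Optional[int]:
    """Return a taxon ID for `raw`, or None if it cannot be validated."""
    key = raw.strip()
    if key in taxon_cache:
        return taxon_cache[key]

    try:
        candidate_id = int(key)
    except ValueError:
        # Not an integer: treat the input as a scientific name.
        matches = lookup_by_name(key)
    else:
        # Casts to int: treat the input as a taxon ID.
        taxon = lookup_by_id(candidate_id)
        matches = [taxon] if taxon is not None else []

    if not matches:
        # The caller would attach a warning or an error here.
        return None

    # Several taxa may share a name: pick the one closest to the root.
    chosen = min(matches, key=lambda t: t.depth)
    taxon_cache[key] = chosen.tax_id  # only successful lookups are cached
    return chosen.tax_id
```

Failed lookups are deliberately not stored, so a name that becomes valid later is looked up again rather than being served from a stale negative entry.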
The caching approach is implemented in the new class RequestCache, which is now used to cache requests to the taxonomy service.
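The class itself is not reproduced in this description. As a rough illustration of the idea only, a cache that stores successful responses and lets failures fall through might look like the sketch below. The name `RequestCacheSketch`, the constructor signature, the `get` method, and the `fetch` callable are assumptions made for this example, not the actual interface of RequestCache.

```python
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class RequestCacheSketch(Generic[K, V]):
    """Hypothetical in-memory cache wrapped around a request function."""

    def __init__(self, fetch: Callable[[K], Optional[V]]) -> None:
        self._fetch = fetch          # performs the real (network) request
        self._store: dict[K, V] = {}

    def get(self, key: K) -> Optional[V]:
        if key in self._store:
            return self._store[key]  # cache hit: no request is made
        value = self._fetch(key)
        if value is not None:
            # Only successful responses are kept in memory.
            self._store[key] = value
        return value

    def __len__(self) -> int:
        return len(self._store)


def fake_taxonomy_service(tax_id: int) -> Optional[str]:
    # Stand-in for the taxonomy service; a real client would make a request.
    known = {9606: "Homo sapiens"}
    return known.get(tax_id)


taxon_cache = RequestCacheSketch(fake_taxonomy_service)
name = taxon_cache.get(9606)     # first call goes to the "service"
again = taxon_cache.get(9606)    # second call is answered from memory
```

Keeping the fetch function injectable makes it easy to swap the real service for a stub in tests.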
PR Checklist
🚀 Preview: https://prepro-hostname-validatio.loculus.org