Context
The updated MaveDB data model requires the mapper to output a MappingRecord plus Allele rows at all applicable levels (genomic, coding, protein) for every variant, not just protein-level reverse translation. This is a broader interface change than the original scope of this issue.
Goal
Redesign the mapper output interface so that for each input variant, the mapper returns:
- A MappingRecord draft carrying provenance:
  - vrs_digest: VRS digest of the pre-mapped (assayed-level) variant, extracted as a top-level field for indexing
  - pre_mapped JSONB (raw input blob)
  - assay_level, mapping_api_version, QC fields (at_mismatched_locus, near_gap)
- A list of Allele draft objects, one per level, each containing:
  - level (enum: 'genomic' | 'coding' | 'protein')
  - transcript (NOT NULL; present for all levels)
  - The appropriate HGVS field(s): hgvs_g / hgvs_c / hgvs_p
  - vrs_digest (computed)
  - post_mapped JSONB (the raw mapper output blob for this allele at this level)
  - clingen_allele_id where already known (optional)
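One way to picture the output shape described above is as a pair of draft types. This is a minimal sketch, not the mapper's actual schema: the class names (Level, AlleleDraft, MappingRecordDraft, MapperOutput), the field types, the QC defaults, and the rule that the HGVS field matching the allele's level must be populated are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class Level(str, Enum):
    GENOMIC = "genomic"
    CODING = "coding"
    PROTEIN = "protein"


@dataclass
class AlleleDraft:
    level: Level
    transcript: str                 # NOT NULL at every level
    vrs_digest: str                 # computed for this allele
    post_mapped: dict[str, Any]     # raw mapper output blob for this allele
    hgvs_g: Optional[str] = None
    hgvs_c: Optional[str] = None
    hgvs_p: Optional[str] = None
    clingen_allele_id: Optional[str] = None  # only where already known

    def __post_init__(self) -> None:
        self.level = Level(self.level)  # accept plain strings such as "coding"
        if not self.transcript:
            raise ValueError("transcript is required at every level")
        # Assumption: the HGVS field matching the level must be set.
        by_level = {
            Level.GENOMIC: self.hgvs_g,
            Level.CODING: self.hgvs_c,
            Level.PROTEIN: self.hgvs_p,
        }
        if by_level[self.level] is None:
            raise ValueError(f"missing HGVS field for level {self.level.value!r}")


@dataclass
class MappingRecordDraft:
    vrs_digest: str                 # top-level copy, for indexing
    pre_mapped: dict[str, Any]      # raw input blob
    assay_level: Level
    mapping_api_version: str
    at_mismatched_locus: bool = False  # QC field; bool type is an assumption
    near_gap: bool = False             # QC field; bool type is an assumption


@dataclass
class MapperOutput:
    record: MappingRecordDraft
    alleles: list[AlleleDraft] = field(default_factory=list)


# Placeholder values only; none of these are real identifiers.
output = MapperOutput(
    record=MappingRecordDraft(
        vrs_digest="PLACEHOLDER_DIGEST",
        pre_mapped={},
        assay_level=Level.CODING,
        mapping_api_version="0.0.0",
    ),
    alleles=[
        AlleleDraft(
            level="coding",
            transcript="TX_PLACEHOLDER",
            vrs_digest="PLACEHOLDER_DIGEST",
            post_mapped={},
            hgvs_c="c.1A>T",
        )
    ],
)
```

Keeping vrs_digest as its own attribute on both types, rather than only inside the JSONB blobs, mirrors the indexing requirement above.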
For protein-level score set targets, the existing reverse translation package is used to enumerate all coding variants encoding each protein change. Each produces a coding-level Allele draft. Non-protein-level targets (genomic, coding) produce no reverse-translated alleles but do produce alleles at all natively derivable levels.
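The enumeration step can be illustrated with a small standalone sketch. It does not use or reproduce the existing reverse translation package's API; the function names and the returned tuple shape are hypothetical, and it assumes the standard genetic code and a known reference codon. It also stops short of building HGVS strings, which would need proper normalization.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_encoding(amino_acid: str) -> list[str]:
    """Return every codon that encodes the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def candidate_codon_changes(
    position: int, ref_codon: str, alt_amino_acid: str
) -> list[tuple[int, str, str]]:
    """Enumerate codon replacements that yield alt_amino_acid.

    position is the 1-based residue number. Each result is
    (coding start of the codon, reference codon, alternate codon).
    """
    if position < 1:
        raise ValueError("position must be 1-based")
    if ref_codon not in CODON_TABLE:
        raise ValueError(f"unknown reference codon: {ref_codon!r}")
    start = 3 * (position - 1) + 1
    return [
        (start, ref_codon, alt)
        for alt in codons_encoding(alt_amino_acid)
        if alt != ref_codon  # an identical codon is not a change
    ]


# Residue 2 (reference codon ATG) changed to leucine.
changes = candidate_codon_changes(2, "ATG", "L")
```

Under this sketch, each tuple in changes would correspond to one coding-level Allele draft for a given transcript.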
Acceptance Criteria
- Mapper output schema includes a MappingRecord draft and a list of Allele drafts per input variant
- MappingRecord draft includes vrs_digest as a top-level field (not only buried in pre_mapped JSONB)
- All Allele drafts carry level, transcript, the appropriate HGVS field, vrs_digest, and post_mapped JSONB
- Protein-level mapping runs produce coding and genomic Allele drafts via reverse translation, in addition to the protein Allele draft
- Non-protein-level mappings produce alleles at all natively derivable levels
- Integration tests verify that mapping a protein variant returns the expected protein + coding allele drafts (one per codon variant per transcript)
- Integration tests verify non-protein-level mapping returns the expected allele set
- Downstream consumer (mavedb-api worker job, mavedb-api#740) updated to consume the new output schema