feat: Redesign mapper output to produce MappingRecord + Allele rows at all levels #100

@bencap

Context

The updated MaveDB data model requires the mapper to output a MappingRecord plus Allele rows at all applicable levels (genomic, coding, protein) for every variant, not just protein-level reverse translation. This is a broader interface change than the original scope of this issue.

Goal

Redesign the mapper output interface so that for each input variant, the mapper returns:

  • A MappingRecord draft carrying provenance:
    • vrs_digest — VRS digest of the pre-mapped (assayed-level) variant, extracted as a top-level field for indexing
    • pre_mapped JSONB (raw input blob)
    • assay_level, mapping_api_version, QC fields (at_mismatched_locus, near_gap)
  • A list of Allele draft objects — one per level — each containing:
    • level (enum: 'genomic' | 'coding' | 'protein')
    • transcript (NOT NULL — present for all levels)
    • The appropriate HGVS field(s): hgvs_g / hgvs_c / hgvs_p
    • vrs_digest (computed)
    • post_mapped JSONB (the raw mapper output blob for this allele at this level)
    • clingen_allele_id where already known (optional)
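The shape described above could be sketched as plain Python dataclasses. This is only an illustration, not the mapper's actual schema: the class names (`AlleleLevel`, `MappingRecordDraft`, `AlleleDraft`, `VariantMappingResult`), the boolean type of the QC fields, and the string type of `assay_level` and `transcript` are assumptions; only the field names come from this issue.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class AlleleLevel(str, Enum):
    GENOMIC = "genomic"
    CODING = "coding"
    PROTEIN = "protein"


# Which HGVS field must be populated for each level (assumed rule).
_HGVS_FIELD = {
    AlleleLevel.GENOMIC: "hgvs_g",
    AlleleLevel.CODING: "hgvs_c",
    AlleleLevel.PROTEIN: "hgvs_p",
}


@dataclass
class MappingRecordDraft:
    vrs_digest: str                  # top-level so it can be indexed
    pre_mapped: dict[str, Any]       # raw input blob (JSONB in the database)
    assay_level: str                 # assumed to be a plain string here
    mapping_api_version: str
    at_mismatched_locus: bool = False  # QC flag, assumed boolean
    near_gap: bool = False             # QC flag, assumed boolean


@dataclass
class AlleleDraft:
    level: AlleleLevel
    transcript: str                  # NOT NULL at every level
    vrs_digest: str
    post_mapped: dict[str, Any]      # raw mapper output for this allele
    hgvs_g: Optional[str] = None
    hgvs_c: Optional[str] = None
    hgvs_p: Optional[str] = None
    clingen_allele_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Accept either the enum or its string value.
        self.level = AlleleLevel(self.level)
        if not self.transcript:
            raise ValueError("transcript is required at every level")
        required = _HGVS_FIELD[self.level]
        if getattr(self, required) is None:
            raise ValueError(f"{required} is required for level '{self.level.value}'")


@dataclass
class VariantMappingResult:
    # One result per input variant: one record plus one allele per level.
    mapping_record: MappingRecordDraft
    alleles: list[AlleleDraft] = field(default_factory=list)
```

Validating the level-specific HGVS field at construction time is one possible design choice; the same rule could instead live in the database layer or in the consumer.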

For protein-level score set targets, the existing reverse translation package is used to enumerate all coding variants encoding each protein change. Each produces a coding-level Allele draft. Non-protein-level targets (genomic, coding) produce no reverse-translated alleles but do produce alleles at all natively derivable levels.
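To make the enumeration step concrete, here is a minimal, self-contained sketch of listing the codons that encode a given amino acid under the standard genetic code. It does not use or reproduce the existing reverse translation package; the function names `codons_encoding` and `alternate_codons` are hypothetical, and a real implementation would also need transcript coordinates and HGVS formatting.

```python
from itertools import product

# Standard genetic code, laid out in TCAG order (third base varies fastest).
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def codons_encoding(amino_acid: str) -> list[str]:
    """Return all codons that translate to the one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def alternate_codons(ref_codon: str, alt_amino_acid: str) -> list[str]:
    """Return codons encoding alt_amino_acid, excluding the reference codon."""
    ref_codon = ref_codon.upper()
    if ref_codon not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {ref_codon!r}")
    return [c for c in codons_encoding(alt_amino_acid) if c != ref_codon]
```

For example, `alternate_codons("GCT", "V")` yields the four valine codons (`GTA`, `GTC`, `GTG`, `GTT`); under the design in this issue, each would correspond to one coding-level Allele draft per transcript.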

Acceptance Criteria

  • Mapper output schema includes a MappingRecord draft and a list of Allele drafts per input variant
  • MappingRecord draft includes vrs_digest as a top-level field (not only buried in pre_mapped JSONB)
  • All Allele drafts carry level, transcript, the appropriate HGVS field, vrs_digest, and post_mapped JSONB
  • Protein-level mapping runs produce coding and genomic Allele drafts via reverse translation, in addition to the protein Allele draft
  • Non-protein-level mappings produce alleles at all natively derivable levels
  • Integration tests verify that mapping a protein variant returns the expected protein + coding allele drafts (one per codon variant per transcript)
  • Integration tests verify non-protein-level mapping returns the expected allele set
  • Downstream consumer (mavedb-api worker job, mavedb-api#740) updated to consume the new output schema
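The structural criteria above could be checked mechanically in a test helper. The sketch below assumes the mapper output is serialized as a dict with `mapping_record` and `alleles` keys; those two key names and the function `check_mapper_output` are hypothetical, while the per-allele field names follow this issue.

```python
# Which HGVS field each level is expected to carry.
REQUIRED_HGVS = {"genomic": "hgvs_g", "coding": "hgvs_c", "protein": "hgvs_p"}


def check_mapper_output(result: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the shape is OK."""
    problems = []

    record = result.get("mapping_record") or {}
    if not record.get("vrs_digest"):
        problems.append("mapping_record.vrs_digest missing at top level")

    alleles = result.get("alleles")
    if not isinstance(alleles, list):
        problems.append("alleles must be a list")
        alleles = []

    for i, allele in enumerate(alleles):
        level = allele.get("level")
        if level not in REQUIRED_HGVS:
            problems.append(f"alleles[{i}].level invalid: {level!r}")
            continue
        # transcript is NOT NULL at every level; HGVS field depends on level.
        for key in ("transcript", "vrs_digest", REQUIRED_HGVS[level]):
            if not allele.get(key):
                problems.append(f"alleles[{i}].{key} missing")
        if not isinstance(allele.get("post_mapped"), dict):
            problems.append(f"alleles[{i}].post_mapped must be an object")

    return problems
```

An integration test could then assert `check_mapper_output(output) == []` before comparing the returned alleles against the expected set.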
