Add direct RGP to gene families mapping output#360

Merged
JeanMainguy merged 1 commit into dev from output_rgp_to_fams
Nov 19, 2025

Conversation

@JeanMainguy
Member

Summary

This PR adds a new output option --regions_families to the write_pangenome command that generates a simple TSV file mapping Regions of Genomic Plasticity (RGPs) directly to their gene family content.

Motivation

Previously, analyzing the gene family composition of RGPs required a multi-step workflow:

  1. Export pangenome gene annotations with RGP info using ppanggolin write_genomes --table
  2. Map genes within an RGP to their families using the gene_families.tsv file
  3. Join these files to obtain RGP-to-family mappings

This indirect approach was cumbersome and required processing the many files produced by the write_genomes command.
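
For illustration only, the join described in the steps above can be sketched in Python. The two dictionaries below are hypothetical, simplified stand-ins for the per-genome tables and for gene_families.tsv; the real files have more columns and their exact layout is not shown in this PR.

```python
# Hypothetical, simplified inputs (not the real file layouts):
# gene -> RGP it belongs to (None when the gene is outside any RGP)
gene_to_rgp = {"gene_1": "rgp_A", "gene_2": "rgp_A", "gene_3": None}
# gene -> gene family, as would be derived from gene_families.tsv
gene_to_family = {"gene_1": "fam_1", "gene_2": "fam_2", "gene_3": "fam_3"}


def join_rgp_families(gene_to_rgp, gene_to_family):
    """Return the sorted, de-duplicated (rgp, family) pairs."""
    pairs = set()
    for gene, rgp in gene_to_rgp.items():
        if rgp is None:
            continue  # gene is not part of any RGP
        family = gene_to_family.get(gene)
        if family is not None:
            pairs.add((rgp, family))
    return sorted(pairs)


rgp_family_pairs = join_rgp_families(gene_to_rgp, gene_to_family)
for rgp, family in rgp_family_pairs:
    print(f"{rgp}\t{family}")
```

In practice this join would have to be repeated over one table per genome, which is the overhead the new output avoids.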

Changes

  • Added --regions_families flag to ppanggolin write_pangenome command
  • Generates rgp_families.tsv with two columns:
    • rgp_id: RGP identifier (matching the 'region' column in regions_of_genomic_plasticity.tsv)
    • family_id: Gene family identifier present in the RGP
  • Updated documentation in docs/user/RGP/rgpOutputs.md with usage examples and column descriptions

Usage

ppanggolin write_pangenome -p pangenome.h5 --regions_families -o rgp_outputs
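
A minimal sketch of how the resulting file might be consumed downstream, assuming rgp_families.tsv starts with a header row naming the rgp_id and family_id columns described above. The helper name load_rgp_families and the example rows are made up for illustration; with real data, the handle would come from opening the file written under the output directory.

```python
import csv
from collections import defaultdict
from io import StringIO


def load_rgp_families(handle):
    """Group family_id values by rgp_id from a tab-separated handle.

    Assumes a header row with 'rgp_id' and 'family_id' columns.
    """
    families_by_rgp = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        families_by_rgp[row["rgp_id"]].add(row["family_id"])
    return dict(families_by_rgp)


# Made-up rows standing in for a real rgp_families.tsv.
example = "rgp_id\tfamily_id\nrgp_A\tfam_1\nrgp_A\tfam_2\nrgp_B\tfam_2\n"
families = load_rgp_families(StringIO(example))

# Print each RGP with its family count and sorted family list.
for rgp, fams in sorted(families.items()):
    print(rgp, len(fams), ",".join(sorted(fams)))
```

Storing the families of each RGP as a set makes it straightforward to compare RGPs, for example by intersecting the sets of two regions.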

@JeanMainguy JeanMainguy marked this pull request as ready for review November 19, 2025 12:49
@JeanMainguy JeanMainguy requested a review from tlemane November 19, 2025 14:47
@JeanMainguy JeanMainguy merged commit 4b9c4da into dev Nov 19, 2025
6 checks passed
@axbazin axbazin deleted the output_rgp_to_fams branch December 17, 2025 09:17

2 participants