dr-yoon/TAILVAR

TAILVAR (Terminal extension Analysis for Improved prediction of Lengthened VARiants)

Welcome to the TAILVAR repository! This repository contains the code used to develop the TAILVAR score, which assesses the functional impact of stop-loss variants: variants at a stop codon (TAA, TGA, TAG) that result in a C-terminal extension of the protein.

[Figure: TAILVAR overview]

Overview

TAILVAR is a Random Forest model that predicts the pathogenicity of stop-loss variants. It integrates in silico prediction scores with transcript and protein features of the C-terminal extension, and outputs a score from 0 to 1 indicating the predicted probability that a variant is pathogenic. Scores ≥ 0.70 can be used to flag potential pathogenic/likely pathogenic variants, and scores ≤ 0.30 to flag potential benign/likely benign variants; scores between the two cutoffs fall into neither category.
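The cutoffs above can be applied with a few lines of Python. This is a minimal sketch, not code from the repository: the function name `classify_tailvar` and the labels `"P/LP"`, `"B/LB"`, and `"uncertain"` are hypothetical names chosen for illustration; only the 0.70 and 0.30 thresholds come from the text.

```python
def classify_tailvar(score, pathogenic_cutoff=0.70, benign_cutoff=0.30):
    """Map a TAILVAR score (0 to 1) to a coarse label.

    The label strings are illustrative assumptions; only the two
    cutoffs are taken from the README.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"TAILVAR score must be in [0, 1], got {score}")
    if score >= pathogenic_cutoff:
        return "P/LP"       # potential pathogenic / likely pathogenic
    if score <= benign_cutoff:
        return "B/LB"       # potential benign / likely benign
    return "uncertain"      # between the cutoffs: neither category


if __name__ == "__main__":
    for s in (0.05, 0.30, 0.50, 0.70, 0.95):
        print(s, classify_tailvar(s))
```

Both cutoffs are inclusive, matching the ≥ and ≤ signs in the text.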

Key components (integration of 37 features)

  • Variant effect prediction tool:

    • CADD: Combined Annotation Dependent Depletion
  • Conservation scores:

    • GERP: Genomic Evolutionary Rate Profiling
    • phyloP100way: Phylogenetic P-value across 100 vertebrates
    • phastCons100way: Phylogenetic Conserved Elements across 100 vertebrates
  • Transcript features:

    • 3'UTR_GC: GC content of the 3' UTR
    • 3'UTR_length: Length of the 3' UTR
    • mRNA stability: Z-scores from the Saluki dataset
    • pLI: probability of loss-of-function (LOF) intolerance from gnomAD
    • LOEUF: LOF observed/expected upper bound fraction from gnomAD
    • s_het: Bayesian estimate of gene constraint
  • Protein features:

    • Protein_lengths: Number of amino acids in the original protein
    • C-terminal_lengths: Number of amino acids in the C-terminal extension
    • Amino acid counts: Count of each of the 20 amino acids in the C-terminal extension
    • Hydrophobicity_KD: Mean hydrophobicity of the C-terminal extension on the Kyte–Doolittle (KD) scale
    • Hydrophobicity_MJ: Mean hydrophobicity of the C-terminal extension on the Miyazawa–Jernigan (MJ) scale
    • TANGO: aggregation properties of the C-terminal peptide predicted by TANGO
    • CANYA: aggregation properties of the C-terminal peptide predicted by CANYA
    • Intrinsically Disordered Proteins (IDPs): annotation from DisProt, a manually curated database of IDPs
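To make a few of the sequence-derived features concrete, the sketch below translates a C-terminal extension and computes its length, amino acid counts, mean Kyte–Doolittle hydrophobicity, and the GC content of a 3' UTR. It is an illustration under stated assumptions, not the repository's feature pipeline: the function names and dictionary keys are hypothetical, the input is assumed to start at the mutated stop codon and continue into the 3' UTR on the coding strand, and the residue encoded by the mutated stop codon is assumed to count toward the extension. The codon table is the standard genetic code and the hydrophobicity values are the published Kyte–Doolittle scale.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Kyte-Doolittle hydropathy index per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate_extension(seq):
    """Translate in frame from the mutated stop codon to the next stop.

    `seq` is assumed to begin at the (now sense) mutated stop codon.
    Raises KeyError on codons with ambiguous bases such as N.
    """
    seq = seq.upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":          # next in-frame stop ends the extension
            break
        peptide.append(aa)
    return "".join(peptide)


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else float("nan")


def extension_features(peptide):
    """Length, per-residue counts, and mean KD hydrophobicity of an extension."""
    counts = Counter(peptide)
    mean_kd = sum(KD[aa] for aa in peptide) / len(peptide) if peptide else float("nan")
    return {
        "c_terminal_length": len(peptide),
        "aa_counts": {aa: counts.get(aa, 0) for aa in sorted(KD)},
        "hydrophobicity_kd": mean_kd,
    }


if __name__ == "__main__":
    # TAA -> CAA stop-loss followed by a short hypothetical 3' UTR.
    pep = translate_extension("CAAGCTTTGTAAGG")
    print(pep, extension_features(pep)["hydrophobicity_kd"])
```

If no in-frame stop codon is found, the sketch simply returns everything it could translate; how such cases are handled in TAILVAR itself is not stated here.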

Annotation

Pre-computed TAILVAR scores are provided as VCF files (see Download below). To annotate variants with TAILVAR scores using VEP in custom annotation mode, use the following commands:

./vep [...] --custom file=TAILVAR_score_SNV.hg38.vcf.gz,short_name=TAILVAR,format=vcf,type=exact,fields=score
./vep [...] --custom file=TAILVAR_score_INDEL.hg38.vcf.gz,short_name=TAILVAR,format=vcf,type=exact,fields=score

For more details, visit VEP
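For quick checks outside VEP, the pre-computed VCF can also be read directly. The sketch below is an assumption-laden illustration: it presumes the score is stored in the INFO column under the key `score` (as implied by `fields=score` in the commands above) and that each record has a single ALT allele. The function names `open_vcf` and `read_tailvar_scores` are hypothetical.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_tailvar_scores(lines):
    """Yield (chrom, pos, ref, alt, score) from VCF text lines.

    Assumes an INFO entry of the form `score=<float>`; records
    without that key are skipped.
    """
    for line in lines:
        if line.startswith("#"):            # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:                 # not a complete VCF record
            continue
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if "score" in info_map:
            yield chrom, int(pos), ref, alt, float(info_map["score"])


# Example usage (file name taken from the VEP command above):
# with open_vcf("TAILVAR_score_SNV.hg38.vcf.gz") as fh:
#     for record in read_tailvar_scores(fh):
#         print(record)
```

Scanning the file line by line is only practical for small lookups; for genome-scale annotation the VEP custom annotation route above is the intended path.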

Download

The datasets, including training and test sets of TAILVAR with annotations and pre-computed scores for all possible single nucleotide substitutions and indels, are available for download here.

Citation

If you use TAILVAR in your research, please cite:

Yoon et al. "A predictive framework for stop-loss variants with C-terminal extensions." Nucleic Acids Research (2026) https://doi.org/10.1093/nar/gkag031
