Welcome to the TAILVAR repository! This repository contains the code for developing the TAILVAR score, which is designed to assess the functional impact of stop-loss variants: variants that occur at stop codons (TAA, TGA, TAG) and produce C-terminal extensions.
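To make the idea of a C-terminal extension concrete, the sketch below translates from a mutated stop codon into the downstream 3' UTR, in frame, until the next stop codon is reached. It is a minimal illustration, not TAILVAR's own pipeline. The helper `c_terminal_extension` is hypothetical. It assumes a single-nucleotide substitution in the stop codon, so indels, which shift the reading frame, are not handled. It also counts the amino acid encoded by the mutated stop codon as the first residue of the extension, which is an assumption.

```python
# Standard genetic code, codons enumerated in TCAG order (first base outermost).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_terminal_extension(mutant_codon: str, downstream: str) -> tuple:
    """Hypothetical helper: translate from the mutated stop codon through the
    downstream 3' UTR, in frame, until the next stop codon.

    Assumes an SNV in the stop codon (no frameshift). Returns
    (extension_peptide, reached_stop); reached_stop is False when the sequence
    ends before any in-frame stop codon is found.
    """
    mutant_codon = mutant_codon.upper()
    if len(mutant_codon) != 3 or mutant_codon in STOP_CODONS:
        raise ValueError("expected a 3-nt codon that is no longer a stop codon")
    seq = (mutant_codon + downstream).upper()
    peptide = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        if codon in STOP_CODONS:
            return "".join(peptide), True
        peptide.append(CODON_TABLE.get(codon, "X"))  # "X" for codons with N etc.
    return "".join(peptide), False
```

For example, a TAA>CAA change followed by the 3' UTR sequence `GGCTTTTAGAAA` gives the extension `QGF`, because `TAG` is the next in-frame stop codon.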
TAILVAR is built on a Random Forest model that predicts the pathogenicity of stop-loss variants. By integrating in-silico prediction scores with transcript and protein features of the C-terminal extension, TAILVAR outputs a score from 0 to 1 that reflects the probability that a variant is pathogenic. Variants with scores ≥ 0.70 can be considered potentially pathogenic/likely pathogenic, and variants with scores ≤ 0.30 potentially benign/likely benign.
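The two cutoffs can be applied as a simple lookup. In this sketch, `classify_tailvar` is a hypothetical helper, and labelling scores strictly between 0.30 and 0.70 as "uncertain" is an assumption, since only the two outer ranges are defined here.

```python
def classify_tailvar(score: float,
                     pathogenic_cutoff: float = 0.70,
                     benign_cutoff: float = 0.30) -> str:
    """Hypothetical helper mapping a TAILVAR score (0 to 1) to a label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("TAILVAR scores range from 0 to 1")
    if score >= pathogenic_cutoff:
        return "pathogenic/likely pathogenic"
    if score <= benign_cutoff:
        return "benign/likely benign"
    # Assumed label for the intermediate range between the two cutoffs.
    return "uncertain"
```

Keeping the cutoffs as parameters makes it easy to test stricter or looser thresholds without changing the function.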
-
Variant effect prediction tool:
- CADD: Combined Annotation Dependent Depletion
-
Conservation scores:
- GERP: Genomic Evolutionary Rate Profiling
- phyloP100way: Phylogenetic P-value across 100 vertebrates
- phastCons100way: Phylogenetic Conserved Elements across 100 vertebrates
-
Transcript features:
- 3'UTR_GC: GC content of the 3' UTR
- 3'UTR_length: Length of the 3' UTR
- mRNA stability: Z-scores from the Saluki dataset
- pLI: probability of loss-of-function (LOF) intolerance from gnomAD
- LOEUF: upper bound of the confidence interval for the observed/expected ratio of LOF variants from gnomAD
- s_het: Bayesian estimate of selection against heterozygous LOF variants (a gene constraint metric)
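The two sequence-derived 3' UTR features can be sketched as follows. This assumes GC content is expressed as a fraction between 0 and 1 (the repository may use a percentage instead) and that length is counted in nucleotides; `utr3_features` is a hypothetical name.

```python
def utr3_features(utr_seq: str) -> dict:
    """Hypothetical helper: 3' UTR length (nt) and GC content (fraction)."""
    seq = utr_seq.upper()
    length = len(seq)
    gc_count = seq.count("G") + seq.count("C")
    return {
        "3'UTR_length": length,
        # Ambiguous bases (e.g. N) stay in the denominator in this sketch.
        "3'UTR_GC": gc_count / length if length else 0.0,
    }
```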
-
Protein features:
- Protein_lengths: Number of amino acids in the original protein
- C-terminal_lengths: Number of amino acids in the C-terminal extension
- Amino acid counts: Counts of each of the 20 amino acids in the C-terminal extension
- Hydrophobicity_KD: Mean hydrophobicity of the C-terminal extension on the Kyte–Doolittle (KD) scale
- Hydrophobicity_MJ: Mean hydrophobicity of the C-terminal extension on the Miyazawa–Jernigan (MJ) scale
- TANGO: Aggregation properties of the C-terminal peptide predicted by TANGO
- CANYA: Aggregation properties of the C-terminal peptide predicted by CANYA
- Intrinsically Disordered Proteins (IDPs): Manually curated IDPs from the DisProt database
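Several of the protein features can be computed directly from the peptide sequences. The sketch below assumes one-letter amino acid codes and a `protein` string without a trailing stop symbol; it derives the two lengths, per-residue counts, and mean hydrophobicity using the standard Kyte–Doolittle values. The function name and the `count_*` keys are hypothetical. Miyazawa–Jernigan hydrophobicity, TANGO, CANYA, and DisProt annotations depend on external scales or tools and are not reproduced here.

```python
from collections import Counter

# Standard Kyte-Doolittle hydropathy values for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def extension_protein_features(protein: str, extension: str) -> dict:
    """Hypothetical helper computing simple features of a C-terminal extension."""
    ext = extension.upper()
    # Non-standard symbols (e.g. X) are counted in the length but skipped
    # for the per-residue counts and the hydrophobicity mean.
    counts = Counter(aa for aa in ext if aa in KYTE_DOOLITTLE)
    hydropathy = [KYTE_DOOLITTLE[aa] for aa in ext if aa in KYTE_DOOLITTLE]
    features = {
        "Protein_lengths": len(protein),
        "C-terminal_lengths": len(ext),
        "Hydrophobicity_KD": sum(hydropathy) / len(hydropathy) if hydropathy else 0.0,
    }
    # One count per standard amino acid; Counter returns 0 for absent residues.
    for aa in sorted(KYTE_DOOLITTLE):
        features[f"count_{aa}"] = counts[aa]
    return features
```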
Pre-computed TAILVAR scores are provided in the VCF files below. To annotate variants with TAILVAR scores using VEP in Custom Annotation mode, use the following commands (for SNVs and indels, respectively):
./vep [...] --custom file=TAILVAR_score_SNV.hg38.vcf.gz,short_name=TAILVAR,format=vcf,type=exact,fields=score
./vep [...] --custom file=TAILVAR_score_INDEL.hg38.vcf.gz,short_name=TAILVAR,format=vcf,type=exact,fields=score
For more details, visit VEP.
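If you want to look up scores without running VEP, the VCF can also be parsed directly. This sketch assumes the score is stored in an INFO field named `score` (as the `fields=score` option in the VEP commands suggests) and that records follow the standard tab-separated VCF layout. `load_tailvar_scores` is hypothetical, and chromosome naming (`chr1` versus `1`) must match your own variants.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def load_tailvar_scores(lines: Iterable[str]) -> Dict[Key, float]:
    """Hypothetical helper: map (CHROM, POS, REF, ALT) to a TAILVAR score.

    Assumes the score lives in an INFO field named 'score'. Comma-separated
    values are paired with ALT alleles; a single value applies to every ALT.
    """
    scores: Dict[Key, float] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header, and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, info = cols[0], int(cols[1]), cols[3], cols[7]
        alts = cols[4].split(",")
        fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        if "score" not in fields:
            continue
        values = fields["score"].split(",")
        if len(values) == 1:
            values = values * len(alts)
        if len(values) != len(alts):
            raise ValueError(f"score/ALT count mismatch at {chrom}:{pos}")
        for alt, value in zip(alts, values):
            if value != ".":  # "." marks a missing value in VCF
                scores[(chrom, pos, ref, alt)] = float(value)
    return scores
```

For example, open the file with `gzip.open("TAILVAR_score_SNV.hg38.vcf.gz", "rt")` and pass the file handle to `load_tailvar_scores`. Holding every record in a dictionary is simple but memory-hungry for a genome-wide file; for large queries, a region lookup on the tabix-indexed file is likely the better design.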
The datasets, including the TAILVAR training and test sets with annotations and pre-computed scores for all possible single-nucleotide substitutions and indels, are available for download here.
If you use TAILVAR in your research, please cite:
Yoon et al. "A predictive framework for stop-loss variants with C-terminal extensions." Nucleic Acids Research (2026) https://doi.org/10.1093/nar/gkag031

