| Sebastian Ojeda, Rafael Velasquez, Nicolás Aparicio, Juanita Puentes, Paula Cárdenas, Nicolás Andrade, Gabriel González, Sergio Rincón, Carolina Muñoz-Camargo, and Pablo Arbeláez |
Expanded Standardized Collection for Antimicrobial Peptide Evaluation (ESCAPE) is an experimental framework for multilabel antimicrobial peptide classification. It combines a large-scale curated dataset, a benchmark for evaluating models, and a transformer-based baseline that integrates both sequence and structural information.
The ESCAPE Dataset integrates over 80,000 peptide sequences from 27 validated public repositories to address critical limitations in existing AMP resources, including data fragmentation, inconsistent annotations, and limited functional coverage. It distinguishes antimicrobial peptides from negative sequences and organizes their functional annotations into a biologically meaningful multilabel hierarchy covering antibacterial, antifungal, antiviral, and antiparasitic activities. The dataset comprises 21,409 experimentally validated AMPs and 60,950 non-AMPs filtered from unrelated sources.
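As an illustration of what a multilabel annotation over these activities can look like, the sketch below encodes a peptide as a multi-hot vector. The label order, the leading AMP flag, and the function name `encode_labels` are assumptions made for this example; they are not taken from the ESCAPE files.

```python
# Activity names come from the dataset description; their order here is an assumption.
ACTIVITIES = ["antibacterial", "antifungal", "antiviral", "antiparasitic"]


def encode_labels(is_amp, activities):
    """Return [amp_flag, *activity_flags] as a multi-hot list of 0/1 values."""
    if not is_amp:
        # Assumption: non-AMPs carry no functional annotation, so every flag is 0.
        return [0] * (1 + len(ACTIVITIES))
    unknown = set(activities) - set(ACTIVITIES)
    if unknown:
        raise ValueError(f"unknown activity labels: {sorted(unknown)}")
    return [1] + [int(name in activities) for name in ACTIVITIES]


print(encode_labels(True, {"antibacterial", "antiviral"}))  # [1, 1, 0, 1, 0]
print(encode_labels(False, set()))                          # [0, 0, 0, 0, 0]
```

A peptide can activate several flags at once, which is what distinguishes this multilabel setting from a single AMP versus non-AMP decision.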
The ESCAPE Dataset is available for download. You can access the complete ESCAPE Database on Harvard Dataverse.
We evaluate eight representative models for antimicrobial peptide classification: AMPlify, AMP-BERT, TransImbAMP, amPEPpy, AMPs-Net, AVP-IFT, PEP-Net, and the ESCAPE Baseline, using the multilabel framework defined by the ESCAPE Benchmark. Each model was modified to support multilabel classification and trained with two-fold cross-validation. We report final performance by averaging the predictions of both folds through an ensemble strategy. Evaluation uses two standard metrics for multilabel tasks, F1-score and mean Average Precision (mAP), which are suitable for datasets with class imbalance.
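To make the two metrics concrete, here is a minimal pure-Python sketch of per-class F1 and Average Precision, averaged over classes. Macro averaging and the 0.5 decision threshold are assumptions for illustration; the repository's evaluation code may compute the metrics differently.

```python
def f1_binary(y_true, y_pred):
    """F1 for one class from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def macro_metrics(Y_true, Y_score, threshold=0.5):
    """Return (macro F1, mAP) for row-per-sample, column-per-class inputs."""
    n_classes = len(Y_true[0])
    f1s, aps = [], []
    for c in range(n_classes):
        col_true = [row[c] for row in Y_true]
        col_score = [row[c] for row in Y_score]
        col_pred = [int(s >= threshold) for s in col_score]  # assumed threshold
        f1s.append(f1_binary(col_true, col_pred))
        aps.append(average_precision(col_true, col_score))
    return sum(f1s) / n_classes, sum(aps) / n_classes


Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.2], [0.4, 0.8], [0.6, 0.3], [0.7, 0.1]]
f1, m_ap = macro_metrics(Y_true, Y_score)
print(f"macro F1 = {f1:.3f}, mAP = {m_ap:.3f}")
```

F1 depends on the chosen threshold, while Average Precision only depends on how the scores rank the samples, which is one reason the two are reported together.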
The table below summarizes the methods evaluated in the ESCAPE Benchmark: their primary architectures, their GitHub repositories, and the F1-score and mean Average Precision (mAP) each method achieves on the ESCAPE Dataset.
| Method | Primary Architecture | GitHub Repository | F1-score (%) | mAP (%) |
|---|---|---|---|---|
| AMPs-Net | GCN | GitHub | 57.7 ± 0.70 | 54.6 ± 0.86 |
| TransImbAMP | Transformer-Based | GitHub | 62.0 ± 0.70 | 64.9 ± 1.11 |
| AMP-BERT | BERT | GitHub | 64.7 ± 0.64 | 66.9 ± 1.17 |
| amPEPpy | Random Forest (RF) | GitHub | 66.5 ± 0.37 | 68.5 ± 0.48 |
| PEP-Net | Transformer-Based | GitHub | 65.5 ± 0.61 | 68.4 ± 0.53 |
| AVP-IFT | Contrastive-Learning + Transformer | GitHub | 66.5 ± 0.59 | 68.8 ± 0.50 |
| AMPlify | Bi-LSTM with attention layers | GitHub | 68.5 ± 0.77 | 70.3 ± 0.87 |
| ESCAPE Baseline (ours) | Dual-branch transformer | GitHub | 69.8 ± 0.43 | 72.1 ± 0.60 |
1. Clone the ESCAPE repository.
git clone https://github.com/BCV-Uniandes/ESCAPE.git
2. Install general dependencies. To set up the environment and install the necessary dependencies, run the following commands:
conda env create -f ESCAPE.yml
conda activate ESCAPE_env
To reproduce the ESCAPE Benchmark results on the ESCAPE Dataset:
1. Update the paths to both model checkpoints in the src/ensemble.sh executable script.
2. Set the model architecture in the src/test_ESCAPE.py file.
3. Run the following command:
bash src/ensemble.sh

This script loads both trained models, averages their outputs, and computes the final metrics over the test set.
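The averaging step of the ensemble can be sketched as follows. This is an illustration only: it assumes each fold produces one probability per class per peptide and that the averaged probabilities are thresholded at 0.5; the function names are hypothetical and do not mirror `src/ensemble.sh`.

```python
def ensemble_average(probs_fold1, probs_fold2):
    """Element-wise mean of two prediction tables (rows = peptides, columns = classes)."""
    if len(probs_fold1) != len(probs_fold2):
        raise ValueError("both folds must score the same number of peptides")
    return [
        [(a + b) / 2 for a, b in zip(row1, row2)]
        for row1, row2 in zip(probs_fold1, probs_fold2)
    ]


def binarize(probs, threshold=0.5):
    """Turn averaged probabilities into 0/1 multilabel predictions (assumed threshold)."""
    return [[int(p >= threshold) for p in row] for row in probs]


fold1 = [[0.75, 0.25], [0.5, 0.75]]
fold2 = [[1.0, 0.25], [0.0, 0.25]]
averaged = ensemble_average(fold1, fold2)
print(averaged)            # [[0.875, 0.25], [0.25, 0.5]]
print(binarize(averaged))  # [[1, 0], [0, 1]]
```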
The ESCAPE Baseline is a dual-branch transformer architecture designed to classify antimicrobial peptides (AMPs) using both sequence and structural information. It processes amino acid sequences through a transformer encoder and structural representations through a second branch that encodes peptide distance matrices. These two modalities are fused using a bidirectional cross-attention mechanism, enabling the model to capture both biological context and spatial structure. This approach achieves state-of-the-art overall performance on the ESCAPE Benchmark, outperforming existing methods in both F1-score and mean Average Precision.
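The idea of bidirectional cross-attention can be illustrated with a small pure-Python sketch in which sequence tokens attend over structure tokens and vice versa. It is deliberately simplified: a single head, no learned projections, and mean pooling followed by concatenation as the fusion step are all assumptions for this example, not a description of `MultiModalClassifier`.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query token becomes a weighted average of the context tokens."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * tok[j] for w, tok in zip(weights, context)) for j in range(dim)]
        )
    return attended


def mean_pool(tokens):
    """Average token vectors into one vector."""
    return [sum(tok[j] for tok in tokens) / len(tokens) for j in range(len(tokens[0]))]


def bidirectional_fusion(seq_tokens, struct_tokens):
    """Fuse both modalities: sequence->structure and structure->sequence attention."""
    seq_view = cross_attention(seq_tokens, struct_tokens)
    struct_view = cross_attention(struct_tokens, seq_tokens)
    # Assumed fusion: pool each direction and concatenate the two vectors.
    return mean_pool(seq_view) + mean_pool(struct_view)


seq_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence embeddings
struct_tokens = [[0.5, 0.5], [2.0, 0.0]]           # toy structure embeddings
print(bidirectional_fusion(seq_tokens, struct_tokens))
```

The fused vector has twice the embedding dimension, with one half summarizing what the sequence found in the structure and the other half summarizing the reverse direction.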
For the structural branch, each peptide is represented as a 224×224 distance matrix, where each element is the Euclidean distance between a pair of Cα atoms in the 3D conformation. We extract these structures from UniProt when available, or predict them using RoseTTAFold or AlphaFold3. The resulting distance matrices are precomputed for all peptides and stored as .npy files.
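A distance matrix of this kind can be built from Cα coordinates as in the sketch below. Zero-padding shorter peptides up to 224 residues and rejecting longer ones are assumptions made for this example; the repository may resize or crop the matrices differently.

```python
import math


def ca_distance_matrix(coords, size=224):
    """Pairwise Euclidean distances between C-alpha atoms, zero-padded to size x size."""
    n = len(coords)
    if n > size:
        # Assumption: longer peptides are not handled in this sketch.
        raise ValueError(f"peptide has {n} residues, more than the {size} supported")
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(coords[i], coords[j])
            matrix[i][j] = dist
            matrix[j][i] = dist  # the matrix is symmetric
    return matrix


# Toy coordinates in angstroms, one (x, y, z) tuple per residue.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
matrix = ca_distance_matrix(toy_coords)
print(matrix[0][1], matrix[1][2], matrix[0][2])  # 5.0 12.0 13.0
print(len(matrix), len(matrix[0]))               # 224 224
```

A matrix like this could then be converted to an array and written to disk with `numpy.save` to obtain a .npy file.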
1. Download distance matrices. You can download the distance matrices for the test set from this link.
2. Set the distance matrix path. Modify the path to the folder containing the distance matrices in the test.py file to ensure the model can load the correct structural inputs during evaluation.
We evaluate the ESCAPE Baseline on the ESCAPE Benchmark using two standard metrics for multilabel classification: F1-score and mean Average Precision (mAP). This model achieves state-of-the-art overall performance, outperforming the seven existing AMP classifiers in the benchmark table on both metrics. To reproduce the evaluation of the ESCAPE Baseline:
1. Download trained model checkpoints. You can download the .pth files for both folds from this link.
2. Update the script configuration. Set the correct paths to both checkpoints in the src/ensemble.sh script, and ensure that the MultiModalClassifier architecture from src/models.py is properly initialized in src/test_ESCAPE.py.
3. Run ensemble evaluation. Use the following command:
bash src/ensemble.sh

To reproduce the training procedure for the ESCAPE Baseline, this repository provides the complete training pipeline, including argument handling, model initialization, and data loading. All input paths and training parameters are defined in src/args.py and are passed through the executable script src/train.sh. These arguments include the locations of the ESCAPE CSV files, the directory containing the structural distance matrices, optimization settings, and the model configuration.
1. Set the training arguments in src/args.py. This file defines all required parameters, including learning rate, batch size, number of epochs, and output directories. The model is selected through the --mode argument, which supports three options: sequence (sequence-only transformer), distance (distance-matrix transformer), and MultiModal (dual-branch architecture used as the baseline).
2. Modify the src/train.sh script to provide the correct paths to the ESCAPE training, validation, and test partitions, as well as the folder containing the 224×224 structural matrices. Any additional argument defined in args.py may also be adjusted directly from this script.
3. Run the following command to execute training:
bash src/train.sh
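For orientation, the sketch below shows how a `--mode` switch with the three documented options can be declared with `argparse`. Only `--mode` and its three values come from the text above; every other flag name and default (`--lr`, `--batch_size`, `--epochs`, `--distance_dir`) is hypothetical, so check src/args.py for the real names.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the argument definitions in src/args.py."""
    parser = argparse.ArgumentParser(description="Illustrative training arguments")
    parser.add_argument(
        "--mode",
        choices=["sequence", "distance", "MultiModal"],  # options listed in the README
        default="MultiModal",
        help="sequence-only, distance-matrix, or dual-branch model",
    )
    # The flags below are assumed names and defaults, used only for illustration.
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--distance_dir", type=str, default="path/to/distance_matrices")
    return parser


# Parse an explicit argument list, as a shell script would pass it on the command line.
args = build_parser().parse_args(["--mode", "distance", "--batch_size", "32"])
print(args.mode, args.batch_size, args.lr)
```

Restricting `--mode` with `choices` makes a misspelled mode fail at startup instead of partway through training.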
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


