iTCR - TCR Analysis Tools

A toolkit for T-Cell Receptor (TCR) sequence analysis based on information theory principles.

Introduction

Information theory provides a direct way to quantify how much knowledge of one event improves understanding of another. In this study, we developed iTCR, a tool grounded in information theory that systematically assesses and interprets the complexity and informativeness of TCR αβ-chain pairing patterns.

We formalized how paired $\alpha$ and $\beta$ chains constrain the accessible repertoire at the level of coarse-grained TCR features. iTCR provides two core analytical approaches:

MCR (Manifold Coverage Ratio): Quantifies the fraction of the theoretical diversity space that is biologically accessible for a pair of features $X$ and $Y$. A value of $MCR \approx 1$ implies perfect independence, where the features pair randomly. Conversely, values approaching $0$ reveal strong pairing constraints between $X$ and $Y$, indicating that the accessible repertoire manifold is significantly compressed relative to the theoretical potential of combinatorial pairing.
PLS (Pairing Landscape Shift): Serves as a global metric of combinatorial plasticity within the fixed germline space. A higher PLS indicates that a larger fraction of the V(J) pairing architecture has been reconfigured in the repertoire.

Installation

From PyPI (Recommended)

pip install iTCR

Requirements

Python >= 3.7
numpy >= 1.22.4
pandas >= 1.5.0
matplotlib >= 3.6.3
seaborn >= 0.11.2
scipy >= 1.10.1

Usage

Input data

Format

The input data should be a dictionary saved in a pickle file with the following structure:

Data Structure

{
    "sample_name_1": pandas.DataFrame,
    "sample_name_2": pandas.DataFrame,
    # ... more samples
}

Required DataFrame Columns

Each DataFrame must contain the following columns:

Column	Description	Example
`TRAV`	T-cell receptor alpha variable gene	TRAV1-2
`TRBV`	T-cell receptor beta variable gene	TRBV19
`TRAJ`	T-cell receptor alpha joining gene	TRAJ33
`TRBJ`	T-cell receptor beta joining gene	TRBJ2-1
`cdr3A`	CDR3 alpha amino acid sequence	CAVRDSSYKLIF
`cdr3B`	CDR3 beta amino acid sequence	CASSLAPGATNEKLFF
`(customized name)`	Frequency/probability of the TCR for down-sampling	clonotype.freq
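
As a minimal sketch of how such an input file could be assembled, the snippet below builds two toy samples with the required columns and pickles them as a dictionary of DataFrames. The helper `build_input`, the second clonotype's values, and the output path are illustrative assumptions and are not part of iTCR; the weight column is named `clonotype.freq` to match the table's example.

```python
import os
import pickle
import tempfile

import pandas as pd

REQUIRED = ["TRAV", "TRBV", "TRAJ", "TRBJ", "cdr3A", "cdr3B"]

# Toy clonotypes for illustration only; real inputs have many more rows.
rows = [
    {"TRAV": "TRAV1-2", "TRBV": "TRBV19", "TRAJ": "TRAJ33", "TRBJ": "TRBJ2-1",
     "cdr3A": "CAVRDSSYKLIF", "cdr3B": "CASSLAPGATNEKLFF", "clonotype.freq": 0.6},
    {"TRAV": "TRAV12-1", "TRBV": "TRBV7-9", "TRAJ": "TRAJ20", "TRBJ": "TRBJ1-1",
     "cdr3A": "CAVNDYKLSF", "cdr3B": "CASSLGQAYEQYF", "clonotype.freq": 0.4},
]


def build_input(samples, weight_col="clonotype.freq"):
    """Hypothetical helper: check required columns, return {sample: DataFrame}."""
    data = {}
    for name, records in samples.items():
        df = pd.DataFrame(records)
        missing = [c for c in REQUIRED + [weight_col] if c not in df.columns]
        if missing:
            raise ValueError(f"{name}: missing columns {missing}")
        data[name] = df
    return data


data = build_input({"sample_name_1": rows, "sample_name_2": rows})

# Any writable path works; a temporary directory keeps the sketch side-effect free.
out_path = os.path.join(tempfile.gettempdir(), "tcr_data.pickle")
with open(out_path, "wb") as fh:
    pickle.dump(data, fh)
```

The resulting file can then be passed to `--inputfile`.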

Configuration File (config.json)

Users can customize which features to analyze by providing a configuration file (see iTCR/config.py). This allows flexible control over the entropy and mutual information calculations performed by iTCR.

Configuration File (config.py)

{
    "SINGLE_FEATURES": ["feature1", "feature2", ...],
    "CONDITIONAL_FEATURES": [["feature1", "feature2"], ...],
    "CROSS_FEATURES": [["feature1", "feature2"], ...]
}

Default Configuration

If no configuration file is provided, iTCR uses the following default settings:

{
    "SINGLE_FEATURES": [
        "cdr3A", "cdr3B", "TRAV", "TRBV", "TRAJ", "TRBJ"
    ],
    "CONDITIONAL_FEATURES": [
        ["cdr3A", "cdr3B"], ["cdr3B", "cdr3A"],
        ["TRAV", "TRBV"], ["TRBV", "TRAV"],
        ["TRAJ", "TRBJ"], ["TRBJ", "TRAJ"]
    ],
    "CROSS_FEATURES": [
        ["TRAV", "TRBV"], ["TRAV", "cdr3B"],
        ["TRAJ", "TRBJ"], ["TRAJ", "cdr3B"],
        ["cdr3A", "TRBV"], ["cdr3A", "cdr3B"],
        ["cdr3A", "TRBJ"]
    ]
}
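
Before running an analysis it can help to sanity-check a custom configuration. The following is a rough sketch, not iTCR's own validation: `check_config` is a hypothetical helper, and it assumes that only the six required TCR columns are valid feature names, which may be stricter than what iTCR accepts.

```python
import json

ALLOWED_KEYS = {"SINGLE_FEATURES", "CONDITIONAL_FEATURES", "CROSS_FEATURES"}
# Assumption: features must be among the required DataFrame columns.
KNOWN_COLUMNS = {"TRAV", "TRBV", "TRAJ", "TRBJ", "cdr3A", "cdr3B"}


def check_config(cfg):
    """Return a list of problems; an empty list means the config looks consistent."""
    problems = []
    for key in cfg:
        if key not in ALLOWED_KEYS:
            problems.append(f"unknown key: {key}")
    for feat in cfg.get("SINGLE_FEATURES", []):
        if feat not in KNOWN_COLUMNS:
            problems.append(f"SINGLE_FEATURES: unknown column {feat}")
    for key in ("CONDITIONAL_FEATURES", "CROSS_FEATURES"):
        for pair in cfg.get(key, []):
            if len(pair) != 2:
                problems.append(f"{key}: expected a pair, got {pair}")
                continue
            for feat in pair:
                if feat not in KNOWN_COLUMNS:
                    problems.append(f"{key}: unknown column {feat}")
    return problems


config_text = """
{
    "SINGLE_FEATURES": ["TRAV", "TRBV"],
    "CONDITIONAL_FEATURES": [["TRAV", "TRBV"]],
    "CROSS_FEATURES": [["TRAV", "TRBV"], ["TRAJ", "cdr3B"]]
}
"""
config = json.loads(config_text)
problems = check_config(config)
print(problems or "config looks consistent")
```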

Feature Types Explained

SINGLE_FEATURES: Individual features for entropy calculation
- Calculates H(X) for each feature X
- Used when --analysis_type includes entropy
CONDITIONAL_FEATURES: Feature pairs for conditional entropy calculation
- Calculates H(Y|X) for each pair [X, Y], where X is the condition and Y is the target
- Format: ["condition_feature", "target_feature"] means H(target|condition)
- Used when --analysis_type includes entropy
CROSS_FEATURES: Feature pairs for MCR calculation
- Calculates MCR(X,Y) for each pair [X, Y]
- Order doesn't matter as MCR(X,Y) = MCR(Y,X)
- Used when --analysis_type includes mcr
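
To make the quantities above concrete, here is a small sketch of plug-in estimates for H(X) and H(target|condition) using only the standard library. It illustrates the definitions and is not iTCR's implementation, which also involves weighted down-sampling; the function names and toy gene lists are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy H(X) in bits for a sequence of labels."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(condition, target):
    """H(target | condition) = H(condition, target) - H(condition)."""
    joint = list(zip(condition, target))
    return entropy(joint) - entropy(condition)


# Toy paired V-gene calls, one entry per clonotype.
trav = ["TRAV1-2", "TRAV1-2", "TRAV12-1", "TRAV12-1"]
trbv = ["TRBV19", "TRBV19", "TRBV7-9", "TRBV19"]

h_trav = entropy(trav)                               # two equally frequent genes
h_trbv_given_trav = conditional_entropy(trav, trbv)  # uncertainty left in TRBV
print(h_trav, h_trbv_given_trav)
```

Here the argument order follows the `["condition_feature", "target_feature"]` convention described above.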

Command Line Interface Overview

# General usage
python3 -m iTCR [command] [options]
# Or using the installed command
itcr [command] [options]

Available Commands

mcr                   - Entropy and MCR analysis
PLS                   - V(J)-gene Pairing Landscape Shift analysis
mcr-display           - Display MCR results
entropy-display       - Display entropy results

Analysis Modules

1. Manifold Coverage Ratio (MCR) Analysis

Analysis usage

Basic command

This module calculates entropy and MCR between different TCR features (V genes, J genes, CDR3 sequences).

python3 -m iTCR mcr --inputfile data.pickle --outputdir results/ [options]

Parameters

Parameter	Type	Default	Description
`--inputfile`	str	Required	Path to input pickle file containing TCR data
`--outputdir`	str	Required	Output directory for results
`--analysis_type`	str	both	Type of analysis: entropy, mcr, or both
`--sample_times`	int	300	Number of down-sampling iterations
`--sample_weights`	str	clonotype.freq	Column name for sampling weights
`--outer_jobs`	int	8	Number of parallel outer permutation tasks; use a smaller value on machines with fewer than 64 cores
`--inner_jobs`	int	None	Number of cores per permutation task

Examples

# Calculate both entropy and MCR with 300 down-sampling iterations
python3 -m iTCR mcr \
    --inputfile tcr_data.pickle \
    --outputdir example_outputs/ \
    --analysis_type both \
    --sample_times 300 \
    --sample_weights clonotype.freq

Output files

entropy.pickle: Entropy values
mcr.pickle: MCR values

2. V(J)-gene Pairing Landscape Shift (PLS) Analysis

PLS analysis usage

The PLS module is a two-step pipeline that quantifies repertoire remodeling between biological conditions (e.g., pre- vs. post-treatment, different timepoints) by analyzing V(J)-gene pairing patterns.

Pipeline Overview

Step 1: Calculate Normalized Pointwise Mutual Information (NPMI)

Computes NPMI matrices for V-gene and J-gene pairs
Uses bootstrap sampling to generate robust estimates
Quantifies local coupling strength for each gene pair
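
For reference, NPMI for a gene pair is commonly defined as the pointwise mutual information divided by $-\log p(x, y)$, which bounds it to $[-1, 1]$. The sketch below applies that standard definition to plug-in frequencies; it omits the bootstrap sampling and is not taken from iTCR's code. The function name and toy pairs are assumptions.

```python
import math
from collections import Counter


def npmi_table(pairs):
    """Map each observed (x, y) pair to its NPMI from plug-in frequencies."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    table = {}
    for (x, y), c in joint.items():
        p_xy = c / n
        if p_xy == 1.0:
            # A single observed pair: -log p(x, y) is 0, so NPMI is undefined.
            continue
        pmi = math.log(p_xy / ((left[x] / n) * (right[y] / n)))
        table[(x, y)] = pmi / -math.log(p_xy)
    return table


# Perfect coupling: each TRAV always pairs with the same TRBV.
coupled = [("TRAV1-2", "TRBV19")] * 2 + [("TRAV12-1", "TRBV7-9")] * 2
scores = npmi_table(coupled)

# Independence: every combination is equally frequent.
independent = [(a, b) for a in ("TRAV1-2", "TRAV12-1") for b in ("TRBV19", "TRBV7-9")]
null_scores = npmi_table(independent)
print(scores, null_scores)
```

Values near 1 indicate pairs that co-occur far more often than expected by chance, and values near 0 indicate pairing consistent with independence.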

Step 2: Analyze Timepoint Changes

Performs statistical testing between conditions
Applies dual-criterion filtering (FDR and effect size)
Calculates PLS as the proportion of significantly shifted gene pairs
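
The dual-criterion idea in Step 2 can be sketched as follows: adjust per-pair p-values with Benjamini-Hochberg, keep pairs that pass both the FDR cutoff and a minimum effect size, and report the retained fraction. The thresholds `alpha=0.05` and `min_effect=0.1`, the function names, and the toy numbers are all assumptions for illustration; iTCR's actual cutoffs and effect-size measure may differ.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pairing_landscape_shift(pvalues, effects, alpha=0.05, min_effect=0.1):
    """Fraction of gene pairs passing both the FDR and effect-size filters."""
    if not pvalues:
        return 0.0
    qvalues = benjamini_hochberg(pvalues)
    shifted = sum(
        1 for q, d in zip(qvalues, effects) if q < alpha and abs(d) >= min_effect
    )
    return shifted / len(pvalues)


# Placeholder values for four gene pairs (not real results).
pvals = [0.001, 0.01, 0.04, 0.5]
deltas = [0.30, 0.05, -0.20, 0.40]  # e.g. change in NPMI between conditions
pls = pairing_landscape_shift(pvals, deltas)
print(pls)  # 0.25: only the first pair passes both filters
```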

Sample Naming Convention (IMPORTANT)

⚠️ Before running PLS analysis, you MUST configure your sample naming convention in your input data.
PLS analysis requires specific sample ID formats to identify paired samples (e.g., pre- vs. post-treatment):
Required Sample ID Format:
patient_id pretreatment # Pre-treatment sample
patient_id posttreatment # Post-treatment sample
Examples: UPN1 pretreatment, UPN1 posttreatment, UPN4 pretreatment, UPN4 posttreatment
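
A quick way to check sample IDs against this convention before running the pipeline is sketched below. It assumes the patient ID and the timepoint label are separated by a single space, as in the examples above; `parse_sample_id` and `paired_patients` are hypothetical helpers, not functions exported by iTCR.

```python
from collections import defaultdict

TIMEPOINTS = ("pretreatment", "posttreatment")


def parse_sample_id(sample_id):
    """Split 'UPN1 pretreatment' into ('UPN1', 'pretreatment')."""
    patient_id, _, timepoint = sample_id.rpartition(" ")
    if not patient_id or timepoint not in TIMEPOINTS:
        raise ValueError(f"unexpected sample ID: {sample_id!r}")
    return patient_id, timepoint


def paired_patients(sample_ids):
    """Return patients that have both a pre- and a post-treatment sample."""
    seen = defaultdict(set)
    for sample_id in sample_ids:
        patient_id, timepoint = parse_sample_id(sample_id)
        seen[patient_id].add(timepoint)
    return sorted(p for p, tps in seen.items() if tps == set(TIMEPOINTS))


ids = [
    "UPN1 pretreatment", "UPN1 posttreatment",
    "UPN4 pretreatment", "UPN4 posttreatment",
    "UPN6 pretreatment",
]
print(paired_patients(ids))  # ['UPN1', 'UPN4']
```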

Customizing Sample Metadata

Step 1: Locate the configuration file
The sample parser configuration is located at: iTCR/analysis/sample_parser.py
Step 2: Modify the create_sample_mapping() function

Edit this function to match your patient metadata:

def create_sample_mapping():
    """
    Create sample mapping dictionary
    MODIFY THIS FUNCTION according to your sample naming convention
    
    Returns:
    --------
    dict: Mapping of patient IDs to their metadata
    """
    return {
        "patient_id_1": {
            "pre": "Pre",
            "posttreatment": "timepoint_info",
            "metadata_field_1": "value1",
            "metadata_field_2": "value2",
            # Add more metadata fields as needed
        },
        "patient_id_2": {
            "pre": "Pre",
            "posttreatment": "timepoint_info",
            "metadata_field_1": "value1",
            "metadata_field_2": "value2",
        },
        # Add more patients...
    }

Example configuration

def create_sample_mapping():
    return {
        "UPN1": {
            "pre": "Pre",
            "posttreatment": "3M_CR",
            "cmv_status": "Positive",
            "3M_response": "CR",
            "6M_response": "CR"
        },
        "UPN4": {
            "pre": "Pre",
            "posttreatment": "3M_PR",
            "cmv_status": "Positive",
            "3M_response": "PR",
            "6M_response": "Relapsed"
        },
        "UPN6": {
            "pre": "Pre",
            "posttreatment": None,  # No post-treatment sample
            "cmv_status": "Negative",
            "3M_response": "NR",
            "6M_response": "NE, off"
        },
        # Add more patients...
    }
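
One way to cross-check such a mapping against the input pickle is to derive the sample IDs it implies. The sketch below assumes that a `None` post-treatment entry means no post-treatment sample exists and that keys follow the `patient_id pretreatment` / `patient_id posttreatment` pattern; `expected_sample_ids` is a hypothetical helper and not part of `sample_parser.py`.

```python
def expected_sample_ids(mapping):
    """List the sample IDs implied by a mapping, skipping missing post samples."""
    ids = []
    for patient_id, meta in mapping.items():
        ids.append(f"{patient_id} pretreatment")
        if meta.get("posttreatment") is not None:
            ids.append(f"{patient_id} posttreatment")
    return ids


mapping = {
    "UPN1": {"pre": "Pre", "posttreatment": "3M_CR"},
    "UPN6": {"pre": "Pre", "posttreatment": None},  # no post-treatment sample
}
print(expected_sample_ids(mapping))
# ['UPN1 pretreatment', 'UPN1 posttreatment', 'UPN6 pretreatment']
```

Comparing this list with the keys of the input dictionary can reveal missing or misnamed samples early.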

Data Structure Requirements
Your input pickle file should contain a dictionary where:

Keys: Sample IDs following the naming convention (e.g., "UPN1 pretreatment")
Values: DataFrames with required TCR columns (TRAV, TRBV, TRAJ, TRBJ, cdr3A, cdr3B, frequency column)
Example:

{
    "UPN1 pretreatment": DataFrame(...),
    "UPN1 posttreatment": DataFrame(...),
    "UPN4 pretreatment": DataFrame(...),
    "UPN4 posttreatment": DataFrame(...),
    # ...
}

Basic Command

python3 -m iTCR PLS --inputfile data.pickle --outputdir results/ [options]

Parameters

Parameter	Type	Default	Description
Input/Output
`--inputfile`	str	Required	Path to input pickle file
`--outputdir`	str	Required	Output directory for results
Step 1: NPMI Calculation
`--sample_times`	int	300	Number of bootstrap samples
`--sample_weights`	str	clonotype.freq	Column name for sampling weights
`--outer_jobs`	int	4	Number of parallel outer tasks
`--inner_jobs`	int	None	Number of cores per task (auto)
`--base`	float	e	Logarithm base for NPMI calculation
Step 2: Statistical Analysis
`--n_permutations`	int	10000	Number of permutations for testing
`--n_jobs`	int	-1	Number of parallel jobs (-1 = all cores)
Pipeline Control
`--skip_step1`	flag	False	Skip Step 1 and use existing NPMI results
`--only_step1`	flag	False	Only run Step 1 (NPMI calculation)

Examples

Full Pipeline

# Run complete PLS analysis
python3 -m iTCR PLS \
    --inputfile tcr_data.pickle \
    --outputdir pls_results/ \
    --sample_times 300 \
    --n_permutations 10000

Step-by-Step Execution

# Step 1 only: Calculate NPMI
python3 -m iTCR PLS \
    --inputfile tcr_data.pickle \
    --outputdir pls_results/ \
    --only_step1 \
    --sample_times 300

# Step 2 only: Analyze changes (requires existing NPMI results)
python3 -m iTCR PLS \
    --inputfile tcr_data.pickle \
    --outputdir pls_results/ \
    --skip_step1 \
    --n_permutations 10000

Output files

Step 1 Output

npmi.pickle: NPMI matrices for all V(J)-gene pairs across bootstrap iterations

Step 2 Output

patient_PLS_detailed.pickle
patient_PLS_summary.csv

3. Results Visualization

We provide visualization for the MCR and entropy results generated by the `mcr` command.

Display Commands for MCR results

Features

Statistical Testing: Performs pairwise Mann-Whitney U tests between samples
Multiple Testing Correction: Supports FDR and Bonferroni correction methods
Combined Visualizations: Creates multi-panel boxplots and heatmaps
Flexible Analysis: Customizable feature pairs and test parameters
Batch Processing: Support for automated analysis without display

Usage

Basic Usage

# Analyze with default settings
python3 -m iTCR mcr-display --mcr_path results.pickle

Advanced Options

# Use FDR correction instead of the default Bonferroni
python3 -m iTCR mcr-display --mcr_path results.pickle --adjust_method FDR

# Custom feature pairs
python3 -m iTCR mcr-display --mcr_path results.pickle --features "TRAV,TRBV;cdr3A,cdr3B"
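
The `--features` string uses `;` between pairs and `,` within a pair. The sketch below shows how such a string could be split into feature pairs; it illustrates the format only and is not iTCR's own parser, and `parse_feature_pairs` is a hypothetical name.

```python
def parse_feature_pairs(spec):
    """Turn 'TRAV,TRBV;cdr3A,cdr3B' into [('TRAV', 'TRBV'), ('cdr3A', 'cdr3B')]."""
    pairs = []
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';'
        parts = [p.strip() for p in chunk.split(",")]
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected 'feat1,feat2', got {chunk!r}")
        pairs.append((parts[0], parts[1]))
    return pairs


print(parse_feature_pairs("TRAV,TRBV;cdr3A,cdr3B"))
# [('TRAV', 'TRBV'), ('cdr3A', 'cdr3B')]
```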

Parameters

Parameter	Type	Default	Description
`--mcr_path`	str	Required	Path to pickle file containing MCR data
`--save_dir`	str	figures/MCR_analysis	Directory to save output figures
`--features`	str	None	Custom feature pairs ("feat1,feat2;feat3,feat4") to display
`--adjust_method`	str	Bonferroni	Multiple testing correction (FDR/Bonferroni)
`--no_adjust`	flag	False	Skip multiple testing correction
`--significance_threshold`	float	0.05	P-value threshold for significance
`--no_display`	flag	False	Batch mode without plot display
`--output_results`	str	None	Save statistical results to CSV file
`--verbose`	flag	False	Enable detailed output

Default Feature Pairs

The analysis includes these TCR feature combinations by default:

TRAV, TRBV - Alpha and beta V genes
cdr3A, cdr3B - Alpha and beta CDR3 sequences
TRAV, cdr3B - Alpha V gene with beta CDR3
cdr3A, TRBV - Alpha CDR3 with beta V gene
TRAJ, TRBJ - Alpha and beta J genes
cdr3A, TRBJ - Alpha CDR3 with beta J gene
TRAJ, cdr3B - Alpha J gene with beta CDR3

Statistical Analysis

Multiple Testing Correction

Bonferroni: Conservative correction for multiple comparisons
FDR: False Discovery Rate (Benjamini-Hochberg) correction
None: Raw p-values without correction
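
As a small illustration of why correction matters for pairwise comparisons, the sketch below enumerates all sample pairs and applies a Bonferroni adjustment. The raw p-values are placeholders, and whether iTCR corrects within each feature pair or across all tests is not specified here, so treat the grouping as an assumption.

```python
from itertools import combinations


def bonferroni(pvalues):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


samples = ["UPN1 pretreatment", "UPN1 posttreatment", "UPN4 pretreatment"]
comparisons = list(combinations(samples, 2))  # 3 samples give 3 pairwise tests

# Placeholder raw p-values, one per comparison (not real results).
raw = [0.004, 0.03, 0.6]
adjusted = bonferroni(raw)

for (a, b), p_raw, p_adj in zip(comparisons, raw, adjusted):
    flag = "significant" if p_adj < 0.05 else "n.s."
    print(f"{a} vs {b}: raw={p_raw:.3f} adjusted={p_adj:.3f} {flag}")
```

With three tests, a raw p-value of 0.03 no longer passes the 0.05 threshold after adjustment, while 0.004 still does.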

Output Files

Visualizations

combined_boxplots.pdf - Multi-panel boxplots showing MCR value distributions
combined_heatmaps.png - P-value heatmaps with significance annotations

Statistical Results (Optional)

CSV file with columns: Feature1, Feature2, Sample1, Sample2, P_Value_Raw, P_Value_Adjusted, Test_Direction_Used, N_Sample1, N_Sample2

Interpretation

Boxplots

Show MCR value distributions across samples for each feature pair
Colored boxes represent different samples
Means are indicated by markers
Lower MCR values suggest stronger feature associations

Heatmaps

Gray cells indicate no significant difference ($p \ge 0.05$).
Colored cells indicate significant differences ($p < 0.05$). Red: the row sample (left axis) has a higher value than the column sample (bottom axis). Blue: the row sample has a lower value than the column sample.

Display Commands for entropy results

The `entropy_display.py` module provides visualization and statistical analysis tools for the entropy results generated by iTCR.

Features

Statistical Testing: Performs pairwise Mann-Whitney U tests between samples
Multiple Testing Correction: Supports FDR and Bonferroni correction methods
Combined Visualizations: Creates multi-panel boxplots and heatmaps
Flexible Analysis: Customizable entropy features and test parameters
Batch Processing: Support for automated analysis without display

Usage

Basic Usage

# Analyze with default settings
python3 -m iTCR entropy-display --entropy_path entropy.pickle

Advanced Options

# Use FDR correction instead of the default Bonferroni
python3 -m iTCR entropy-display --entropy_path entropy.pickle --adjust_method FDR

# Custom entropy features
python3 -m iTCR entropy-display --entropy_path entropy.pickle --features "cdr3A;cdr3B;TRAV|TRBV"

Parameters

Parameter	Type	Default	Description
`--entropy_path`	str	Required	Path to pickle file containing Entropy data
`--save_dir`	str	figures/Entropy_analysis	Directory to save output figures
`--features`	str	None	Custom entropy features ("feat1;feat2;feat3|feat4") to display
`--adjust_method`	str	Bonferroni	Multiple testing correction (FDR/Bonferroni)
`--no_adjust`	flag	False	Skip multiple testing correction
`--significance_threshold`	float	0.05	P-value threshold for significance
`--no_display`	flag	False	Batch mode without plot display
`--output_results`	str	None	Save statistical results to CSV file
`--verbose`	flag	False	Enable detailed output

Default Entropy Features

The analysis includes these TCR entropy features by default:

cdr3A - Alpha CDR3 entropy
cdr3B - Beta CDR3 entropy
TRAV - Alpha V gene entropy
TRBV - Beta V gene entropy
cdr3A|cdr3B - Conditional entropy of alpha CDR3 given beta CDR3
cdr3B|cdr3A - Conditional entropy of beta CDR3 given alpha CDR3
TRAV|TRBV - Conditional entropy of alpha V gene given beta V gene
TRBV|TRAV - Conditional entropy of beta V gene given alpha V gene

Statistical Analysis

Multiple Testing Correction

Bonferroni: Conservative correction for multiple comparisons
FDR: False Discovery Rate (Benjamini-Hochberg) correction
None: Raw p-values without correction

Output Files

Visualizations

combined_entropy_boxplots.pdf - Multi-panel boxplots showing entropy value distributions
combined_entropy_heatmaps.png - P-value heatmaps with significance annotations

Statistical Results (Optional)

CSV file with columns: Feature, Sample1, Sample2, P_Value_Raw, P_Value_Adjusted, Test_Direction_Used, N_Sample1, N_Sample2, Mean_Sample1, Mean_Sample2, Std_Sample1, Std_Sample2

Interpretation

Boxplots

Show entropy value distributions across samples for each feature
Colored boxes represent different samples
Means are indicated by markers
Higher entropy values suggest greater diversity/uncertainty

Heatmaps

Gray cells indicate no significant difference ($p \ge 0.05$).
Colored cells indicate significant differences ($p < 0.05$). Red: the row sample (left axis) has a higher value than the column sample (bottom axis). Blue: the row sample has a lower value than the column sample.
