A deep learning project for colorectal cancer tissue classification using H&E-stained histopathology image patches. This repository fine-tunes an ImageNet-pretrained EfficientNet-B0 model on the NCT-CRC-HE-100K dataset and evaluates it on the independent CRC-VAL-HE-7K test set.
This project performs patch-level classification of colorectal histology images into 9 tissue categories:
- ADI: adipose tissue
- BACK: background
- DEB: debris
- LYM: lymphocytes
- MUC: mucus
- MUS: smooth muscle
- NORM: normal colon mucosa
- STR: cancer-associated stroma
- TUM: colorectal adenocarcinoma epithelium
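The nine labels map naturally to integer class indices. The sketch below assumes the data is loaded with something like torchvision's `ImageFolder`, which sorts class folder names alphabetically. That is an assumption, not something this README confirms. Under that assumption, the order listed above is also the index order. `CLASS_DESCRIPTIONS`, `CLASS_TO_IDX`, and `IDX_TO_CLASS` are hypothetical names used only for illustration.

```python
# Hypothetical label mapping; assumes alphabetical folder ordering (as in ImageFolder).
CLASS_DESCRIPTIONS = {
    "ADI": "adipose tissue",
    "BACK": "background",
    "DEB": "debris",
    "LYM": "lymphocytes",
    "MUC": "mucus",
    "MUS": "smooth muscle",
    "NORM": "normal colon mucosa",
    "STR": "cancer-associated stroma",
    "TUM": "colorectal adenocarcinoma epithelium",
}

# Sorting the folder names reproduces the index order ImageFolder would assign.
CLASS_NAMES = sorted(CLASS_DESCRIPTIONS)
CLASS_TO_IDX = {name: idx for idx, name in enumerate(CLASS_NAMES)}
IDX_TO_CLASS = dict(enumerate(CLASS_NAMES))
```

With this mapping, a predicted index such as `8` decodes to `"TUM"`, and `CLASS_DESCRIPTIONS["TUM"]` gives the readable tissue name.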
The goal is to build a compact and reproducible baseline for colorectal tissue recognition using transfer learning.
The project uses the public colorectal cancer histology datasets by Kather et al.
| Dataset | Usage | Size | Description |
|---|---|---|---|
| NCT-CRC-HE-100K | Training | 100,000 patches | H&E-stained colorectal tissue image patches |
| CRC-VAL-HE-7K | Testing | 7,180 patches | Independent external validation set |
All images are 224 x 224-pixel H&E-stained tissue patches. The external validation set comes from patients who are not in the training set, which makes it useful for evaluating how well the model generalizes.
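After downloading, it helps to confirm that the patch counts match the table above. This is a minimal sketch, assuming each dataset is extracted with one sub-folder per tissue class, for example `NCT-CRC-HE-100K/ADI/`. The folder layout, the accepted file suffixes, and the `count_patches` helper are all assumptions for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; the Kather datasets are distributed as TIFF patches.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def count_patches(root):
    """Count image files per class sub-folder under `root` (hypothetical helper)."""
    counts = Counter()
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

For example, `sum(count_patches("NCT-CRC-HE-100K").values())` would be expected to return 100,000, and the same call on the validation folder would be expected to return 7,180. The paths are placeholders for wherever the archives were extracted.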
The pipeline uses transfer learning with EfficientNet-B0:
- Load EfficientNet-B0 with ImageNet-pretrained weights.
- Replace the classifier head with a 9-class output layer.
- Resize and normalize images using ImageNet statistics.
- Train with cross-entropy loss.
- Optimize using Adam.
- Save the best model according to test accuracy.
| Component | Setting |
|---|---|
| Architecture | EfficientNet-B0 |
| Pretraining | ImageNet |
| Input size | 224 x 224 |
| Number of classes | 9 |
| Optimizer | Adam |
| Learning rate | 0.001 |
| Loss function | Cross-Entropy Loss |
| Batch size | 64 |
| Epochs | 10 |
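The settings in this table can be kept in one immutable config object, which also makes the training schedule easy to work out. This is a minimal sketch, and the field names are hypothetical. The step count assumes the last partial batch is kept, which is the PyTorch `DataLoader` default `drop_last=False`. On the 100,000-patch training set that gives ceil(100,000 / 64) = 1,563 steps per epoch, or 15,630 steps over 10 epochs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters mirroring the settings table (hypothetical field names)."""
    architecture: str = "efficientnet_b0"
    pretrained: str = "imagenet"
    input_size: int = 224
    num_classes: int = 9
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    loss: str = "cross_entropy"
    batch_size: int = 64
    epochs: int = 10

    def steps_per_epoch(self, num_samples: int) -> int:
        # The last, partial batch still counts as a step (drop_last=False).
        return math.ceil(num_samples / self.batch_size)

    def total_steps(self, num_samples: int) -> int:
        return self.epochs * self.steps_per_epoch(num_samples)
```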
| Metric | Value |
|---|---|
| Test Accuracy | 96.76% |
| Epochs | 10 |
| Batch Size | 64 |
The model reaches 96.76% accuracy on the independent CRC-VAL-HE-7K test set, suggesting that EfficientNet-B0 transfer learning is an effective baseline for colorectal histopathology patch classification.
histopathology-classification/
├── README.md
├── models.py
├── train.py
├── predict.py
└── requirements.txt
| File | Description |
|---|---|
| models.py | Defines the EfficientNet-B0 classification model |
| train.py | Training and evaluation pipeline |
| predict.py | Inference script for single-image prediction |
| requirements.txt | Python dependencies |
Clone the repository:
git clone https://github.com/Qiaoli-Li-Res/histopathology-classification.git
cd histopathology-classification
Create a Python environment:
conda create -n histopathology python=3.10 -y
conda activate histopathology
Install dependencies:
pip install -r requirements.txt
Train the model:
python train.py
This trains EfficientNet-B0 on the training dataset and evaluates the model on the test set.
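The training procedure described earlier uses cross-entropy loss, Adam at a learning rate of 0.001, and keeps the checkpoint with the best accuracy. It can be sketched as a plain PyTorch loop. This is not the contents of `train.py`. `train_one_epoch`, `evaluate`, `fit`, `eval_loader`, and the `best_model.pth` path are hypothetical names. `eval_loader` stands for whichever split is used for checkpoint selection.

```python
import torch
from torch import nn


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over `loader` and return the mean training loss."""
    model.train()
    total_loss, total = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        total += labels.size(0)
    return total_loss / total


@torch.no_grad()
def evaluate(model, loader, device):
    """Return the top-1 accuracy of `model` on `loader`."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total


def fit(model, train_loader, eval_loader, epochs=10, lr=1e-3,
        device="cpu", ckpt_path="best_model.pth"):
    """Train with cross-entropy + Adam; save the weights whenever eval accuracy improves."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_acc, history = -1.0, []
    for epoch in range(1, epochs + 1):
        train_loss = train_one_epoch(model, train_loader, criterion, optimizer, device)
        acc = evaluate(model, eval_loader, device)
        history.append({"epoch": epoch, "train_loss": train_loss, "eval_acc": acc})
        if acc > best_acc:  # keep only the best-scoring checkpoint
            best_acc = acc
            torch.save(model.state_dict(), ckpt_path)
    return best_acc, history
```

Saving `state_dict()` rather than the whole module keeps the checkpoint independent of the Python class definition. To restore it, rebuild the same architecture and call `load_state_dict`.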
Run inference:
python predict.py
This script runs inference on a single histopathology image patch using the trained model.
# 1. Install dependencies
pip install -r requirements.txt
# 2. Train the classifier
python train.py
# 3. Run prediction
python predict.py
- This project performs patch-level tissue classification, not whole-slide image diagnosis.
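- The reported result is based on the CRC-VAL-HE-7K external test set. Because the best checkpoint is also selected by test accuracy, this figure is likely somewhat optimistic. Holding out a validation split from NCT-CRC-HE-100K for model selection would give an unbiased estimate.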
- For clinical usage, additional validation, calibration, interpretability analysis, and expert pathology review are required.
- Accuracy alone may not fully reflect model reliability, especially when class imbalance or staining variation exists.
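To look beyond overall accuracy, a confusion matrix and per-class recall show which tissue types are being missed. Balanced accuracy, the mean of per-class recalls, weights every class equally regardless of how many patches it has. This is a minimal dependency-free sketch. The function names are hypothetical, and `y_true` and `y_pred` are assumed to be lists of integer class indices. It addresses class imbalance only, not staining variation.

```python
import math


def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i][j] counts patches whose true class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Fraction of patches on the diagonal (correctly classified)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def per_class_recall(cm):
    """Recall per true class; NaN for classes with no samples."""
    recalls = []
    for i, row in enumerate(cm):
        support = sum(row)
        recalls.append(row[i] / support if support else math.nan)
    return recalls


def balanced_accuracy(cm):
    """Mean recall over classes that actually occur in y_true."""
    valid = [r for r in per_class_recall(cm) if not math.isnan(r)]
    return sum(valid) / len(valid)
```

For example, with `y_true = [0, 0, 0, 0, 1, 1]` and `y_pred = [0, 0, 0, 0, 0, 1]`, accuracy is 5/6 (about 0.83). Balanced accuracy is only 0.75, because half of the minority class is misclassified.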
- Kather, J. N., Halama, N., & Marx, A. 100,000 histological images of human colorectal cancer and healthy tissue. Zenodo, 2018. https://doi.org/10.5281/zenodo.1214456
- Tan, M., & Le, Q. V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML, 2019.