
This project uses deep learning and computer vision to predict cancerous CT scan images among the normal ones. Using the datasets provided by Kaggle we have developed different architectures. This current project uses the TinyVGG model architecture to come up with an accurate model.


FreshPrince99/Cancer-detection-through-CT-scan-imaging


Cancer Detection from CT Scan Imaging

End-to-end experiments for classifying lung CT scans into cancer classes vs normal. This repo includes traditional ML baselines (Decision Tree / Logistic Regression) and a compact TinyVGG-style CNN implemented in notebooks, plus a lightweight app script for local inference.

Dataset: Chest CT-Scan images (Kaggle) — 4 classes (adenocarcinoma, large cell carcinoma, squamous cell carcinoma, normal). (Kaggle, arXiv, Nature)




Project overview

This project explores multiple modeling approaches for CT image classification:

  • Classical ML: Decision Tree & Logistic Regression baselines (useful sanity checks, quick to train).
  • Deep Learning: TinyVGG-style CNN trained from scratch / fine-tuned.
  • Deployment sketch: a small local app script to try single-image predictions.

These approaches are reflected by the files and notebooks in this repo (see below). The original description and current README mention both TinyVGG and Decision Trees, which is why both paths are kept here. (GitHub)


Repo structure

.
├── TinyVGGModel.ipynb               # TinyVGG training (baseline)
├── TinyVGGModelModified.ipynb       # Variants / tweaks
├── DecisionTreeNoteBook.ipynb       # Classical ML baseline
├── model_deployment_app.py          # Simple local inference app (see below)
├── app.py                           # (alt) demo script
├── model_LogR.sav                   # Saved Logistic Regression model
├── requirement.txt                  # Python dependencies
├── README.md
└── docs/ (dataset dictionaries)
    ├── cxr_abnormalities.dictionary.d040722.pdf
    ├── participant.dictionary.d040722.pdf
    └── sct_image_series.dictionary.d040722.pdf

GitHub reports the repo is mostly Jupyter notebooks with a bit of Python glue. (GitHub)


Data

Expected local layout

Place (or symlink) the dataset like this (adjust paths in notebooks as needed):

data/
  train/
    adenocarcinoma/
    large cell carcinoma/
    squamous cell carcinoma/
    normal/
  valid/
    adenocarcinoma/
    large cell carcinoma/
    squamous cell carcinoma/
    normal/
  test/
    adenocarcinoma/
    large cell carcinoma/
    squamous cell carcinoma/
    normal/

If your download only has a single folder of class subdirs, you can create train/valid/test splits with a small script or via torchvision.datasets.ImageFolder + random_split.
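
A minimal sketch of the "small script" option, using only the standard library. The function name `split_dataset`, the split fractions, and the seed are illustrative assumptions, not something taken from the notebooks. It copies files rather than moving them, so the original download stays untouched.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src, dst, fractions=(0.7, 0.15, 0.15), seed=42):
    """Copy src/<class>/* into dst/{train,valid,test}/<class>/ and return counts."""
    src, dst = Path(src), Path(dst)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * fractions[0])
        n_valid = int(len(files) * fractions[1])
        parts = {
            "train": files[:n_train],
            "valid": files[n_train:n_train + n_valid],
            "test": files[n_train + n_valid:],  # remainder goes to test
        }
        for split, members in parts.items():
            out_dir = dst / split / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, out_dir / f.name)
            counts[(split, class_dir.name)] = len(members)
    return counts
```

Calling `split_dataset("raw", "data")` (both paths hypothetical) would produce the `data/train`, `data/valid`, `data/test` layout shown above. This splits per image; if several images come from the same patient, a patient-level split is the safer choice.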


Quickstart

  1. Environment
# clone
git clone https://github.com/FreshPrince99/Cancer-detection-through-CT-scan-imaging
cd Cancer-detection-through-CT-scan-imaging

# create venv (recommended)
python -m venv .venv
# mac/linux
source .venv/bin/activate
# windows (powershell)
# .\.venv\Scripts\Activate.ps1

# install deps
pip install --upgrade pip
pip install -r requirement.txt

The file is named requirement.txt (singular) in this repo.

  2. Open the notebooks

Use Jupyter or VS Code to run:

  • TinyVGGModel.ipynb or TinyVGGModelModified.ipynb
  • DecisionTreeNoteBook.ipynb

Update dataset paths at the top of each notebook.


Training

TinyVGG (PyTorch)

The TinyVGG setup follows the well-known VGG pattern of stacked 3×3 conv + ReLU + max-pool blocks with a small classifier head. It’s intentionally compact for quick iteration on limited data. (Background on VGG/TinyVGG: see references.) (PyImageSearch, viso.ai, FreeCodeCamp)
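
For orientation, here is a rough PyTorch sketch of that pattern. It is not copied from the notebooks: the two-block layout, `hidden_units=10`, the 64×64 input size, and the class name `TinyVGG` are assumptions chosen to keep the example small. Check `TinyVGGModel.ipynb` for the actual layer sizes.

```python
import torch
from torch import nn


class TinyVGG(nn.Module):
    """Two conv blocks (conv-ReLU-conv-ReLU-maxpool) followed by a linear head."""

    def __init__(self, in_channels=3, hidden_units=10, num_classes=4, image_size=64):
        super().__init__()

        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),  # halves height and width
            )

        self.features = nn.Sequential(
            block(in_channels, hidden_units),
            block(hidden_units, hidden_units),
        )
        # Two max-pools shrink each side by a factor of 4.
        side = image_size // 4
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(hidden_units * side * side, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


model = TinyVGG()
logits = model(torch.randn(2, 3, 64, 64))  # batch of 2 dummy RGB images
print(logits.shape)  # torch.Size([2, 4])
```

The four output logits correspond to the four dataset classes. Changing the input resolution means passing a matching `image_size`, since the linear head depends on the flattened feature-map size.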

Common knobs you can tweak in the notebook:

  • Input size (224×224 is conventional for VGG-style models)
  • Data augmentation (flip, rotate, slight zoom, CLAHE if desired)
  • Learning rate / scheduler, weight decay, early stopping
  • Class weights if classes are imbalanced
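
As one example of the last knob, class weights are often set inversely proportional to class frequency. The sketch below uses the common `n_samples / (n_classes * class_count)` heuristic; the function name and the label counts are made up for illustration and do not reflect the real dataset.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


labels = ["normal"] * 2 + ["adenocarcinoma"] * 6  # made-up counts
weights = inverse_frequency_weights(labels)
print(weights)  # {'normal': 2.0, 'adenocarcinoma': 0.6666666666666666}
```

If the notebook trains with `torch.nn.CrossEntropyLoss`, these values could be passed through its `weight` argument as a tensor ordered by class index.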

Classical ML baselines

DecisionTreeNoteBook.ipynb loads features (either simple pixel/intensity stats or embeddings) and trains a Decision Tree; model_LogR.sav holds a trained Logistic Regression model you can reuse for quick comparisons.


Evaluation

Typical metrics to track:

  • Accuracy, Precision, Recall, F1, ROC-AUC, plus per-class support
  • Confusion matrix (helps reveal class confusion, e.g. adenocarcinoma vs squamous)
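
To make the per-class metrics concrete, here is a dependency-free sketch that derives precision, recall, F1, and support from confusion counts. The helper names and the toy labels are hypothetical; in practice a library such as scikit-learn would likely be used instead.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_report(y_true, y_pred):
    """Return {class: (precision, recall, f1, support)} computed one-vs-rest."""
    cm = confusion_counts(y_true, y_pred)
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = cm[(cls, cls)]
        fp = sum(n for (t, p), n in cm.items() if p == cls and t != cls)
        fn = sum(n for (t, p), n in cm.items() if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[cls] = (precision, recall, f1, tp + fn)
    return report


# Toy labels: one adenocarcinoma scan is mistaken for squamous.
y_true = ["adeno", "adeno", "squamous", "normal"]
y_pred = ["adeno", "squamous", "squamous", "normal"]
for cls, row in per_class_report(y_true, y_pred).items():
    print(cls, row)
```

In this toy case the single adenocarcinoma-to-squamous error lowers recall for adenocarcinoma and precision for squamous, which is exactly the kind of confusion the matrix is meant to expose.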

For medical imaging work, consider patient-level splits (avoid leakage across splits), calibration curves, and threshold selection tuned to your use case.
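
A patient-level split can be sketched as follows, assuming you have some way to map each image to a patient ID. That mapping is an assumption here: the class-folder layout above does not provide patient IDs, so the `(patient_id, path)` pairs and the function name `group_split` are hypothetical.

```python
import random
from collections import defaultdict


def group_split(samples, test_fraction=0.2, seed=0):
    """Split (patient_id, path) pairs so that no patient appears in both parts."""
    by_patient = defaultdict(list)
    for patient_id, path in samples:
        by_patient[patient_id].append(path)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)  # shuffle patients, not images
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [p for pid in patients[n_test:] for p in by_patient[pid]]
    test = [p for pid in patients[:n_test] for p in by_patient[pid]]
    return train, test, test_patients


# Hypothetical data: 5 patients with 3 images each.
samples = [(f"patient_{i}", f"patient_{i}_slice_{j}.png")
           for i in range(5) for j in range(3)]
train, test, test_patients = group_split(samples)
print(len(train), len(test), sorted(test_patients))
```

Because whole patients are assigned to one side, near-identical slices from the same scan cannot end up in both train and test.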


Inference / Demo app

Two small scripts are included; open each one to confirm which framework it uses in your environment:

  • model_deployment_app.py – a simple local app for single-image prediction (check imports to see if it uses Streamlit or Flask, then run accordingly).

    • Streamlit style: streamlit run model_deployment_app.py
    • Flask style: python model_deployment_app.py and visit the shown URL.
  • app.py – alternate demo script with similar intent.
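
If you would rather not read the imports by hand, the check can be automated with the standard-library `ast` module. This is only a convenience sketch; `imported_modules` and `launch_hint` are hypothetical helpers, and the launch commands are the same two suggested above.

```python
import ast
from pathlib import Path


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names


def launch_hint(script_path):
    """Suggest a launch command based on which web framework the script imports."""
    modules = imported_modules(Path(script_path).read_text(encoding="utf-8"))
    if "streamlit" in modules:
        return f"streamlit run {script_path}"
    if "flask" in modules:
        return f"python {script_path}"
    return "no Streamlit or Flask import found; read the script to see how it starts"


print(sorted(imported_modules("import streamlit as st\nfrom flask import Flask")))
# ['flask', 'streamlit']
```

From the repo root, `launch_hint("model_deployment_app.py")` would return the matching command. The script is parsed, not executed, so this is safe to run before installing its dependencies.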

Make sure the model weights you want to use (TinyVGG .pth or classical .sav) are loaded in the script, and the preprocessing matches your training pipeline.


Notes & tips

  • Reproducibility: fix random seeds and note your exact train/val/test split.
  • Class imbalance: try class-weighted loss, balanced sampling, or modest augmentation.
  • Generalization: keep a held-out test set from the start; avoid peeking via repeated tuning.
  • Ethics & scope: this is a research/learning project; do not use it for clinical decisions.
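
For the reproducibility tip, a small seeding helper is usually enough to start with. The sketch below seeds Python's `random` module and, only if they are installed, NumPy and PyTorch; the name `set_seed` and the default seed are arbitrary choices. Seeding alone does not guarantee bit-identical GPU results.

```python
import random


def set_seed(seed=42):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy RNG
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds PyTorch's random number generators
    except ImportError:
        pass


set_seed(123)
first = random.random()
set_seed(123)
second = random.random()
print(first == second)  # True
```

Call it once at the top of each notebook, before building datasets or models, and record the seed alongside your split.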

Roadmap

  • Lift notebooks into Python modules and a CLI (train.py, infer.py)
  • Add Grad-CAM/saliency maps for explainability
  • Training logs + TensorBoard, early stopping & checkpointing
  • Clear export path: TinyVGG → .pth and matching loader in the app
  • Unit tests for transforms and inference preprocessing

Citations

  • Dataset: Kaggle Chest CT-Scan images (4 classes; common train/val/test layout). (Kaggle, Nature)
  • Repo context: file list and description indicating TinyVGG and Decision Tree variants. (GitHub)
  • Background on VGG/TinyVGG: compact VGG-style CNNs with stacked 3×3 conv blocks. (PyImageSearch, viso.ai, FreeCodeCamp)
  • Independent references using the same Kaggle dataset and 4-class setup. (arXiv)

License

No license file is present in this repo. If you plan to reuse parts of this code, please open an issue to discuss terms.


Maintainer

FreshPrince99 — PRs and suggestions are welcome.
