
Real-Time Autonomous Driving Perception System

A complete perception pipeline — detection, tracking, segmentation, and deployment — trained on KITTI and BDD10K, optimized with TensorRT, and deployed in C++.


Project Overview

This project implements a production-grade autonomous driving perception system covering four core tasks:

  • Object Detection — YOLOv8-l/m on KITTI (4 classes), scratch vs fine-tuning comparison
  • Multi-Object Tracking — ByteTrack with Kalman filter on KITTI tracking sequences
  • Semantic Segmentation — UNet, DeepLabV3+, SegFormer on BDD10K (19 classes)
  • Deployment & Optimization — PyTorch → ONNX → TensorRT FP16, standalone C++ inference app

Highlights

Phase | Task | Best Result | Model
1 | Object Detection | 95.70% mAP@50 | YOLOv8-l (KITTI, 4 classes)
2 | Multi-Object Tracking | 84.0% MOTA, 84.9% IDF1 | ByteTrack @ 60.2 FPS
3 | Semantic Segmentation | 64.49% mIoU | SegFormer MiT-B3 (BDD10K, 19 classes)
4 | TensorRT Deployment | 3.2× speedup, C++ @ 91.3 FPS | YOLOv8-l FP16, zero accuracy loss

System Architecture

flowchart LR
    subgraph datasets [Datasets]
        KITTI["KITTI\n7.5K images\n21 sequences"]
        BDD10K["BDD10K\n7K train / 1K val\n19 classes"]
    end

    subgraph training [Training — Python / PyTorch]
        Det["Phase 1\nDetection\nYOLOv8-l/m"]
        Track["Phase 2\nTracking\nByteTrack"]
        Seg["Phase 3\nSegmentation\nUNet / DLV3+ / SegFormer"]
    end

    subgraph deploy [Deployment — Phase 4]
        ONNX["ONNX Export"]
        TRT["TensorRT FP16\n3.2x speedup"]
        CPP["C++ Application\n91.3 FPS E2E"]
    end

    KITTI --> Det
    KITTI --> Track
    BDD10K --> Seg
    Det --> Track
    Det --> ONNX
    Seg --> ONNX
    ONNX --> TRT
    TRT --> CPP

Project Status

Phase | Status | Documentation
Phase 1: Detection | Complete | KITTI Report, BDD100K Report
Phase 2: Tracking | Complete | Tracking Report
Phase 3: Segmentation | Complete | Segmentation Report
Phase 4: Deployment | Complete | Deployment Report

Results

Phase 1: Object Detection (KITTI)

Five models were trained to compare training from scratch against fine-tuning from BDD100K weights.

Model | Training | mAP@50 | mAP@50-95 | Epochs | Time
YOLOv8-l | From scratch | 95.70% | 80.60% | 100 | 4.2 hrs
YOLOv8-l | Fine-tuned (BDD→KITTI) | 95.28% | 78.91% | 100 | 2.6 hrs
YOLOv8-m | From scratch | 95.57% | 79.76% | 100 | 3.0 hrs
YOLOv8-l | Fine-tuned (BDD→KITTI) | 94.85% | 76.64% | 30 | 48 min
YOLOv8-m | Fine-tuned (BDD→KITTI) | 94.38% | 75.30% | 30 | 35 min

Key finding: Training from scratch outperformed fine-tuning by +1.7 pp mAP@50-95, even though fine-tuning was 38% faster (2.6 vs 4.2 hrs) — revealing the BDD100K→KITTI domain gap.

Classes: Car, Truck, Pedestrian, Cyclist (4 classes) — see KITTI report for per-class breakdown and convergence analysis.

Phase 2: Multi-Object Tracking (KITTI)

ByteTrack tracker with Kalman filter, evaluated on 21 KITTI tracking sequences (853 GT tracks).

Metric | Target | Result
MOTA | > 50% | 84.0%
IDF1 | > 60% | 84.9%
FPS | > 30 | 60.2
Mostly Tracked (MT) | > 50% | 65.4%
Recall | — | 87.8%
Precision | — | 96.9%

Key finding: Tuning the IoU matching threshold from 0.8 to 0.5 yielded a +37.7 pp MOTA improvement — the default was far too strict for Kalman-filter predictions, causing ~17,000 unnecessary missed detections.
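
For concreteness, here is a minimal Python sketch of the threshold-gated IoU association (greedy matching for brevity, with illustrative names; the real tracker in scripts/trackers/ is more involved and uses Hungarian assignment):

def box_iou(a, b):
    # IoU of two boxes in (x1, y1, x2, y2) format.
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def greedy_match(predicted, detections, match_thresh=0.5):
    # Pair Kalman-predicted track boxes with detections; any pair whose
    # IoU falls below match_thresh counts as a miss. At 0.8, a slightly
    # drifted prediction fails the gate even when it clearly overlaps
    # its detection, which is how the missed detections piled up.
    pairs, used = [], set()
    for ti, tb in enumerate(predicted):
        best_j, best_iou = -1, match_thresh
        for dj, db in enumerate(detections):
            if dj not in used and box_iou(tb, db) >= best_iou:
                best_j, best_iou = dj, box_iou(tb, db)
        if best_j >= 0:
            pairs.append((ti, best_j))
            used.add(best_j)
    return pairs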

See tracking report for parameter sweep analysis and demo videos.

Phase 3: Semantic Segmentation (BDD10K)

Three architectural families compared on BDD10K (7,000 train / 1,000 val, 19 Cityscapes-compatible classes).

Model | Encoder | mIoU | Pixel Acc | FPS | Params
UNet | ResNet-50 | 60.49% | 93.04% | 116.5 | 32.5M
DeepLabV3+ | ResNet-101 | 63.14% | 93.65% | 105.9 | 45.7M
SegFormer | MiT-B2 | 64.26% | 93.61% | 46.0 | 27.4M
SegFormer | MiT-B3 | 64.49% | 93.81% | 34.5 | 47.2M

Architecture progression: UNet → DeepLabV3+ (+2.65 pp mIoU via ASPP multi-scale reasoning) → SegFormer (+1.12 pp via transformer global context). Road IoU exceeds 94% across all models (robust ground-plane estimation). SegFormer MiT-B2 is the most parameter-efficient: 99.6% of B3's mIoU with 42% fewer parameters.
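
For reference, mIoU here is the standard confusion-matrix mean IoU over the 19 classes; a minimal NumPy sketch (a hypothetical helper, not the repo's evaluation script):

import numpy as np

def mean_iou(conf):
    # conf is a (C, C) confusion matrix: rows = ground truth, cols = prediction.
    # Per-class IoU = TP / (TP + FP + FN), averaged over classes.
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1.0)
    return iou.mean()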

See segmentation report for per-class IoU, category analysis, and architectural insights.

Phase 4: Deployment & Optimization

Full PyTorch → ONNX → TensorRT FP16 pipeline with a standalone C++ inference application.

Detection — YOLOv8-l (960x960):

Backend | Precision | Infer FPS | E2E FPS | mAP@50
PyTorch | FP32 | 97.1 | 54.1 | 95.68%
TensorRT (Python) | FP16 | 313.0 | 85.9 | 95.62%
TensorRT (C++) | FP16 | 309.0 | 91.3 | 95.62%

Detection — YOLOv8-m (960x960):

Backend | Precision | Infer FPS | E2E FPS | mAP@50
PyTorch | FP32 | 152.5 | 66.8 | 95.55%
TensorRT (Python) | FP16 | 429.1 | 93.3 | 95.53%
TensorRT (C++) | FP16 | 429.4 | 102.9 | 95.53%

Segmentation — DeepLabV3+ (1280x720):

Backend | Precision | Infer FPS | E2E FPS | mIoU
PyTorch | FP32 | 75.0 | 30.2 | 63.03%
TensorRT (Python) | FP16 | 342.6 | 43.4 | 63.02%

C++ cross-validation: 1,122/1,122 KITTI images produce bit-identical detections vs Python TRT reference (0.0 px max coordinate diff). The hardest bug was a sort-stability issue caused by FP16 confidence ties in NMS — documented in debug report.
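
To illustrate that failure mode in Python (a hedged sketch, not the C++ source): NMS ranks boxes by confidence before suppression, and FP16 rounding produces exact score ties, so an unstable sort can order tied boxes differently across runs or languages and change which boxes survive. Python's sorted() is stable, mirroring the std::stable_sort fix.

def box_iou(a, b):
    # IoU of two (x1, y1, x2, y2) boxes, as in the tracking sketch above.
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def nms(boxes, scores, iou_thresh=0.45):  # threshold value is illustrative
    # sorted() is stable: boxes with identical FP16 scores keep their input
    # order, so Python and C++ (std::stable_sort) suppress the same boxes.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if box_iou(boxes[i], boxes[j]) < iou_thresh]
    return keep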

C++ timing breakdown (YOLOv8-l): preprocessing 4.0 ms, H2D copy 3.4 ms, inference 3.2 ms, D2H copy 0.3 ms, postprocessing 0.1 ms — 11.0 ms total, ≈ 91 FPS end-to-end. The bottleneck is CPU preprocessing and memory transfers; a CUDA preprocessing kernel could cut latency by roughly 35%.

See deployment report for full benchmark tables and implementation details.


Installation

Prerequisites

  • Python 3.12+
  • CUDA 12.x (for GPU acceleration)
  • TensorRT 10.x (for deployment)
  • CMake 3.18+ and MSVC (for C++ build)

Setup Environment

git clone https://github.com/Hongye-Chen/ad-perception-system.git
cd ad-perception-system

conda create -n ad_perception python=3.12
conda activate ad_perception

pip install -r requirements.txt

Download Datasets

# KITTI (detection + tracking)
# Register at https://www.cvlibs.uni-tuebingen.de/datasets/kitti/
python scripts/kitti/prepare_kitti.py

# BDD10K segmentation (for Phase 3)
# Download from https://www.bdd100k.com/

Quick Start

Detection — Training

# YOLOv8-l from scratch on KITTI
python scripts/train_yolov8.py --data data/kitti/processed/dataset.yaml --model yolov8l --epochs 100 --batch 16 --imgsz 960 --device 1 --name yolov8l_kitti_scratch --patience 30

# YOLOv8-l fine-tuned from BDD100K
python scripts/train_yolov8.py --data data/kitti/processed/dataset.yaml --model "models/detection/yolov8l_bdd100k_1280/weights/best.pt" --epochs 100 --batch 16 --imgsz 960 --device 1 --name yolov8l_kitti_finetune --optimizer AdamW --lr0 0.001 --freeze 10

Tracking — Run and Evaluate

# Run ByteTrack on KITTI sequences
python scripts/track_kitti.py

# Evaluate MOT metrics
python scripts/evaluate_tracking.py

Segmentation — Training and Evaluation

# Train SegFormer MiT-B2 on BDD10K
python scripts/segmentation/train_segmentation.py --model segformer --encoder mit_b2 --crop_size 768 --batch_size 4 --lr 0.00006 --encoder_lr 0.000006 --epochs 100 --patience 20 --output_dir models/segmentation/segformer_mit_b2_bdd10k

# Evaluate
python scripts/segmentation/evaluate_segmentation.py --model segformer --encoder mit_b2 --checkpoint models/segmentation/segformer_mit_b2_bdd10k/best.pt

# Compare all models
python scripts/segmentation/compare_models.py --metrics_files models/segmentation/*/metrics.json --output_dir results/segmentation

Deployment — Export and Benchmark

# Export YOLOv8-l to ONNX
python scripts/deployment/export_detection.py

# Build TensorRT engines
python scripts/deployment/build_trt_engines.py

# Validate accuracy (ONNX and TRT vs PyTorch)
python scripts/deployment/validate_onnx.py
python scripts/deployment/validate_trt.py

# Benchmark all backends
python scripts/deployment/benchmark_detection.py

C++ Inference

cd cpp/build
cmake ..
cmake --build . --config Release

# Run inference
.\Release\ad_perception_infer.exe --engine ..\..\models\deployment\yolov8l_kitti_fp16.engine --image ..\..\data\kitti\processed\images\val\000000.png

# Benchmark
.\Release\ad_perception_infer.exe --engine ..\..\models\deployment\yolov8l_kitti_fp16.engine --benchmark --iterations 200

Project Structure

ad-perception-system/
├── scripts/
│   ├── train_yolov8.py           # YOLOv8 detection training
│   ├── track_kitti.py            # ByteTrack on KITTI sequences
│   ├── evaluate_tracking.py      # MOT metrics evaluation
│   ├── segmentation/             # Segmentation training, evaluation, comparison, visualization
│   ├── deployment/               # ONNX export, TRT build, validation, benchmarking
│   ├── trackers/                 # ByteTrack, Kalman filter implementations
│   ├── kitti/                    # KITTI dataset preparation
│   └── bdd100k/                  # BDD100K dataset preparation
├── cpp/                          # C++ TensorRT inference application
│   ├── CMakeLists.txt
│   ├── include/                  # trt_engine.h, preprocess.h, postprocess.h, cuda_utils.h
│   └── src/                      # main.cpp, trt_engine.cpp, preprocess.cpp, postprocess.cpp
├── configs/                      # YAML configs (detection, tracking, export)
├── notebooks/                    # Jupyter notebooks for data exploration
├── docs/                         # Phase reports, plans, guides
├── results/                      # Benchmark results, metrics JSONs
├── models/                       # Trained checkpoints, ONNX, TensorRT engines (git-ignored)
├── data/                         # KITTI, BDD100K, BDD10K datasets (git-ignored)
├── requirements.txt
├── environment.yml
└── LICENSE

See docs/project_structure.md for a detailed breakdown of every file.


Documentation

Phase reports and guides live in docs/; the per-phase reports are referenced throughout the Results sections above.


Technical Details

Detection

  • YOLOv8-l/m (Ultralytics): Single-stage anchor-free detector, trained at 960x960 on KITTI
  • Scratch training vs BDD100K fine-tuning comparison across 5 configurations
  • SGD (scratch) vs AdamW (fine-tuning) with layer freezing; the equivalent Python API call is sketched below
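
The fine-tuning recipe expressed through the Ultralytics Python API, mirroring the CLI call in Quick Start (paths and device index are copied from that example; treat this as a sketch rather than a repo script):

from ultralytics import YOLO

# Start from the BDD100K checkpoint instead of generic COCO weights.
model = YOLO("models/detection/yolov8l_bdd100k_1280/weights/best.pt")

# Freeze the first 10 layers and use AdamW with a low initial LR so the
# pretrained backbone features are adapted gently rather than overwritten.
model.train(
    data="data/kitti/processed/dataset.yaml",
    epochs=100,
    batch=16,
    imgsz=960,
    device=1,
    optimizer="AdamW",
    lr0=0.001,
    freeze=10,
)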

Tracking

  • ByteTrack: Two-stage association (high-confidence + low-confidence matches) with Kalman filter motion prediction (sketched below)
  • Systematic parameter sweep on IoU matching threshold, revealing +37.7pp MOTA improvement
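
ByteTrack's core idea in a compressed Python sketch (Track, associate(), and the threshold values are hypothetical placeholders; the real implementation lives in scripts/trackers/):

def bytetrack_step(tracks, dets, high=0.6, low=0.1, match_thresh=0.5):
    # Stage 0: advance every track with its Kalman motion model.
    for t in tracks:
        t.predict()

    strong = [d for d in dets if d.score >= high]
    weak = [d for d in dets if low <= d.score < high]

    # Stage 1: match tracks against high-confidence detections.
    matches, left_tracks, left_dets = associate(tracks, strong, match_thresh)
    # Stage 2: unmatched tracks get a second pass against low-confidence
    # detections, recovering occluded or blurred objects that a single
    # confidence cutoff would simply drop.
    matches2, lost_tracks, _ = associate(left_tracks, weak, match_thresh)

    for track, det in matches + matches2:
        track.update(det)  # Kalman measurement update
    births = [Track(d) for d in left_dets]  # new tracks from strong dets only
    return [t for t, _ in matches + matches2] + births, lost_tracks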

Segmentation

  • UNet (ResNet-50): Encoder-decoder baseline with skip connections
  • DeepLabV3+ (ResNet-101): Multi-scale context via Atrous Spatial Pyramid Pooling
  • SegFormer (MiT-B2/B3): Hierarchical vision transformer with MLP decoder
  • Combined CrossEntropy + Dice loss, differential learning rates, 768x768 training crop (loss sketched below)
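
A minimal sketch of that loss combination (the equal 1:1 weighting and eps smoothing are illustrative assumptions, not the repo's exact settings):

import torch.nn.functional as F

def ce_dice_loss(logits, target, ignore_index=255, eps=1.0):
    # logits: (N, C, H, W) float; target: (N, H, W) integer labels.
    # CE drives per-pixel accuracy; Dice directly optimizes region
    # overlap, which helps small or rare classes.
    ce = F.cross_entropy(logits, target, ignore_index=ignore_index)

    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    valid = (target != ignore_index).unsqueeze(1)
    onehot = F.one_hot(target.clamp(0, num_classes - 1), num_classes)
    onehot = onehot.permute(0, 3, 1, 2).float() * valid
    probs = probs * valid

    inter = (probs * onehot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1 - ((2 * inter + eps) / (denom + eps)).mean()
    return ce + dice  # equal weighting as an illustrative choice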

Deployment

  • ONNX export with no embedded NMS (platform-agnostic postprocessing)
  • TensorRT FP16 engines via trtexec — 3.2x inference speedup, zero mAP@50 loss (an equivalent Python-API build is sketched below)
  • C++ application: TRT engine loading, letterbox preprocessing, per-class NMS with std::stable_sort, JSON benchmark output
  • Cross-validation: Bit-identical results between C++ and Python on 1,122 images
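
The repo drives engine building through trtexec; purely for illustration, an equivalent FP16 build via the TensorRT Python API would look roughly like this (file paths are placeholders):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the TRT 10 default
parser = trt.OnnxParser(network, logger)

with open("models/deployment/yolov8l_kitti.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # the FP16 speedup comes from here

engine = builder.build_serialized_network(network, config)
with open("models/deployment/yolov8l_kitti_fp16.engine", "wb") as f:
    f.write(engine)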

License

This project is licensed under the MIT License — see the LICENSE file for details.
