End-to-end MLOps project for predicting NYC taxi fares using the NYC TLC Trip Record data.

Live Demo: https://mlops-nyc-taxi-7eaf5edf4a58.herokuapp.com

| Component | Description |
|---|---|
| ML Model | Random Forest & Gradient Boosting with Optuna tuning |
| API | FastAPI with auto-generated documentation (Swagger) |
| Dashboard | HTML/CSS/JS with real-time predictions |
| Monitoring | Drift detection for Distance & Target (Fare) |
| Registry | Blue/Green deployment with MLflow |
| Deployment | Docker + Heroku with auto-deploy from GitHub |

┌─────────────────────────────────────────────────────────┐
│                         BROWSER                         │
│        HTML Dashboard (Prediction + Monitoring)         │
└─────────────────────┬───────────────────────────────────┘
                      │ HTTP REST API
┌─────────────────────▼───────────────────────────────────┐
│                     FastAPI Server                      │
│  ├── POST /predict          → Fare prediction           │
│  ├── GET  /health           → Server status             │
│  ├── GET  /model/info       → Active model info         │
│  ├── GET  /monitoring/drift → Drift metrics             │
│  └── GET  /docs             → Swagger UI                │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│                     ML Model Layer                      │
│  ├── production_model.joblib  (Active model)            │
│  ├── MLflow Registry          (Version control)         │
│  └── Reference Stats          (Baseline for drift)      │
└─────────────────────────────────────────────────────────┘

mlops/
├── src/
│   ├── serving/
│   │   ├── api.py                 # FastAPI server
│   │   ├── static/index.html      # Dashboard UI
│   │   └── reference_stats.json
│   ├── models/
│   │   ├── train.py               # Training pipeline
│   │   └── registry.py            # MLflow registry
│   └── features/
│       └── engineering.py         # Feature engineering
├── cli/
│   └── main.py                    # CLI commands
├── models/
│   └── production_model.joblib    # Production model
├── Dockerfile                     # Docker config
├── heroku.yml                     # Heroku config
└── requirements.txt

# Clone repo
git clone https://github.com/kemasverii/mlops-nyc-taxi.git
cd mlops-nyc-taxi
# Install dependencies
pip install -r requirements.txt
# Start the server
python cli/main.py serve start
# Open in browser: http://localhost:8000

# Build image
docker build -t mlops-api .
# Run container
docker run -p 8000:8000 mlops-api

# Server
python cli/main.py serve start # Start API server
python cli/main.py serve start -p 8080 # Custom port
python cli/main.py serve mlflow # Start MLflow UI
# Model Registry
python cli/main.py registry status # Show Blue/Green status
python cli/main.py registry list # List all versions
python cli/main.py registry promote 2 # Promote a version to production
python cli/main.py registry rollback 1 # Roll back to a previous version
python cli/main.py registry runs # List MLflow runs
python cli/main.py registry register <run_id> # Register a model from a run
# Training
python cli/main.py train quick # Quick training (no registry)
python cli/main.py train pipeline # Production training + registry
python cli/main.py train compare-algos # Compare all algorithms
python cli/main.py train tune # Hyperparameter tuning (Optuna)
# Data
python cli/main.py data --help # Data operations
# Monitoring
python cli/main.py monitor --help # Monitoring operations
# Model Testing
python cli/main.py model test <version> # Test a specific model version
python cli/main.py model compare 1 2 # Compare two versions

| Stage | Description |
|---|---|
| 🔵 BLUE | Active production model |
| 🟢 GREEN | Staging model under test |
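
The two stages can be pictured with a toy in-memory registry. This is only a sketch of the promote and rollback semantics; the class `BlueGreenRegistry` and its methods are hypothetical and are not taken from `registry.py` or from the MLflow API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BlueGreenRegistry:
    """Toy in-memory registry; names and behaviour are illustrative only."""

    blue: Optional[int] = None    # version currently serving production
    green: Optional[int] = None   # version staged for testing
    history: List[int] = field(default_factory=list)  # versions that left BLUE

    def stage(self, version: int) -> None:
        """Put a newly registered version into the GREEN slot."""
        self.green = version

    def promote(self, version: int) -> None:
        """Make `version` the BLUE (production) model."""
        if self.blue is not None:
            self.history.append(self.blue)
        self.blue = version
        if self.green == version:
            self.green = None

    def rollback(self, version: int) -> None:
        """Return to a version that has been BLUE before."""
        if version not in self.history:
            raise ValueError(f"version {version} was never in production")
        self.history.append(self.blue)
        self.blue = version


registry = BlueGreenRegistry()
registry.promote(1)    # version 1 becomes BLUE
registry.stage(2)      # version 2 is staged as GREEN
registry.promote(2)    # GREEN -> BLUE, version 1 moves to history
registry.rollback(1)   # version 1 serves production again
```

Keeping the previous BLUE versions in a history list is what makes a rollback a cheap pointer switch rather than a retraining step.
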
Promote a model from GREEN to BLUE:
python cli/main.py registry promote <version>

The dashboard displays:
- Distance Drift - shift in the trip-distance distribution
- Target Drift - shift in the predicted fares
- Model Info - name and version of the active model
- Charts - visualizations of the data distributions

The system uses Evidently AI for drift detection:

| Component | Description |
|---|---|
| Reference Data | 10,000 samples from the training data |
| Current Data | The 100 most recent predictions |
| Algorithm | Wasserstein distance (numerical features) |
| Threshold | 0.3 (per feature), 50% (dataset) |
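
To make the thresholds concrete, here is a minimal standard-library sketch of a Wasserstein-based drift check. It does not call Evidently; the function names are hypothetical, and two details are assumptions: the distance is scaled by the reference standard deviation, and a score or share exactly at the threshold counts as drift.

```python
from bisect import bisect_right
from statistics import pstdev
from typing import Dict, Sequence

FEATURE_THRESHOLD = 0.3   # per-feature threshold from the table
DATASET_THRESHOLD = 0.5   # share of drifted features that flags the dataset


def wasserstein_1d(reference: Sequence[float], current: Sequence[float]) -> float:
    """Wasserstein-1 distance between two empirical 1-D samples.

    Integrates the absolute difference of the two empirical CDFs.
    """
    ref, cur = sorted(reference), sorted(current)
    points = sorted(ref + cur)
    distance = 0.0
    for left, right in zip(points, points[1:]):
        ref_cdf = bisect_right(ref, left) / len(ref)
        cur_cdf = bisect_right(cur, left) / len(cur)
        distance += abs(ref_cdf - cur_cdf) * (right - left)
    return distance


def drift_score(reference: Sequence[float], current: Sequence[float]) -> float:
    """Distance scaled by the reference std (assumed normalization)."""
    scale = pstdev(reference)
    raw = wasserstein_1d(reference, current)
    return raw / scale if scale > 0 else raw


def detect_drift(reference: Dict[str, Sequence[float]],
                 current: Dict[str, Sequence[float]]) -> dict:
    """Score every feature, then decide dataset drift from the drifted share."""
    scores = {name: drift_score(reference[name], current[name]) for name in reference}
    drifted = {name: score >= FEATURE_THRESHOLD for name, score in scores.items()}
    share = sum(drifted.values()) / len(drifted)
    return {"scores": scores, "drifted": drifted,
            "dataset_drift": share >= DATASET_THRESHOLD}
```

In this project's setup, `reference` would hold the 10,000 training samples and `current` the last 100 predictions, with one entry each for distance and fare.
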

| Feature | Range | Description |
|---|---|---|
| `PULocationID` | Dropdown (263) | NYC pickup zone |
| `DOLocationID` | Dropdown (263) | NYC dropoff zone |
| `passenger_count` | 1 - 6 | Number of passengers |
| `pickup_hour` | 0 - 23 | Pickup hour |
| `pickup_dayofweek` | 0 - 6 | Day of week (0 = Monday) |
| `VendorID` | Dropdown | Taxi vendor |
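
The numeric ranges above translate directly into a validation step. The sketch below uses a plain dataclass; `TripInput` and `validate` are hypothetical names, and the actual API may rely on FastAPI's request models instead. Location and vendor IDs are not checked because the set of valid dropdown values is not listed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TripInput:
    """User-supplied fields, mirroring the input table."""

    PULocationID: int
    DOLocationID: int
    passenger_count: int
    pickup_hour: int
    pickup_dayofweek: int   # 0 = Monday
    VendorID: int

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the input is valid."""
        problems = []
        if not 1 <= self.passenger_count <= 6:
            problems.append("passenger_count must be between 1 and 6")
        if not 0 <= self.pickup_hour <= 23:
            problems.append("pickup_hour must be between 0 and 23")
        if not 0 <= self.pickup_dayofweek <= 6:
            problems.append("pickup_dayofweek must be between 0 and 6")
        return problems
```

Returning all problems at once, rather than raising on the first, lets a dashboard highlight every invalid field in a single round trip.
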

| Feature | Formula | Data source |
|---|---|---|
| `trip_distance` | Lookup table | Mean per route (39,307 routes) |
| `is_weekend` | `1 if dayofweek >= 5 else 0` | From user input |
| `trip_duration_minutes` | `(distance / 11) * 60` | 11 mph = average NYC speed |
| `pickup_month` | `random(1-5)` | Training data covers January-May only |
| `hour_sin`, `hour_cos` | Cyclical encoding | Circular time-of-day pattern |
| `dow_sin`, `dow_cos` | Cyclical encoding | Circular day-of-week pattern |
| `avg_speed_mph` | `distance / (duration/60)` | = 11 mph |
| `is_rush_hour` | `1 if 16 <= hour <= 19` | Evening rush hour |
| `same_location` | `1 if PU == DO` | From user input |

| Feature | Value | Reason |
|---|---|---|
| `has_tolls` | 0 | Simplification for the demo |
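
The formulas in these tables can be collected into one function. This is a sketch rather than the code in `engineering.py`: the name `derive_features` is hypothetical, and the cyclical encoding assumes the usual periods of 24 hours and 7 days, which the tables do not spell out.

```python
import math
import random
from typing import Optional

AVG_SPEED_MPH = 11.0   # average NYC speed used by the formulas


def derive_features(pu: int, do: int, pickup_hour: int, pickup_dayofweek: int,
                    trip_distance: float,
                    rng: Optional[random.Random] = None) -> dict:
    """Compute the derived and fixed features from the user inputs."""
    rng = rng or random.Random()
    duration_minutes = (trip_distance / AVG_SPEED_MPH) * 60
    hour_angle = 2 * math.pi * pickup_hour / 24      # assumed period: 24 hours
    dow_angle = 2 * math.pi * pickup_dayofweek / 7   # assumed period: 7 days
    # distance / (duration / 60) reduces to 11 mph; guard the zero-distance case.
    avg_speed = (trip_distance / (duration_minutes / 60)
                 if duration_minutes > 0 else AVG_SPEED_MPH)
    return {
        "trip_distance": trip_distance,
        "is_weekend": 1 if pickup_dayofweek >= 5 else 0,
        "trip_duration_minutes": duration_minutes,
        "pickup_month": rng.randint(1, 5),   # training data covers Jan-May only
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),
        "dow_cos": math.cos(dow_angle),
        "avg_speed_mph": avg_speed,
        "is_rush_hour": 1 if 16 <= pickup_hour <= 19 else 0,
        "same_location": 1 if pu == do else 0,
        "has_tolls": 0,   # fixed value, simplification for the demo
    }
```

The sine and cosine pair places hour 23 next to hour 0 on a circle, which a raw integer hour would not do. Passing a seeded `random.Random` makes `pickup_month` reproducible in tests.
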

The system uses a pre-computed lookup table to estimate the trip distance:
- The user picks a pickup location (dropdown of 263 zones)
- The user picks a dropoff location (dropdown of 263 zones)
- The system queries the lookup table → auto-fills `trip_distance`

| Statistic | Value |
|---|---|
| Total routes | 39,307 |
| Source | 11M trips |
| Algorithm | Mean per route |
| Default (if N/A) | 3.0 miles |

Pickup: Midtown Center (161)
Dropoff: Upper East Side South (237)
→ Lookup: "161_237" = 1.07 miles
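
A lookup of this kind can be sketched in a few lines. The `"PU_DO"` key format and the 3.0-mile default come from the text above; the function names and the rounding to two decimals are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

DEFAULT_DISTANCE_MILES = 3.0   # fallback when a route is not in the table


def build_route_table(trips: Iterable[Tuple[int, int, float]]) -> Dict[str, float]:
    """Average trip_distance per (pickup, dropoff) pair, keyed as 'PU_DO'."""
    by_route = defaultdict(list)
    for pu, do, distance in trips:
        by_route[f"{pu}_{do}"].append(distance)
    # Rounding to two decimals is an assumption, matching the 1.07 example.
    return {route: round(mean(values), 2) for route, values in by_route.items()}


def route_distance(table: Dict[str, float], pu: int, do: int) -> float:
    """Return the mean distance for a route, or the default if unseen."""
    return table.get(f"{pu}_{do}", DEFAULT_DISTANCE_MILES)


# Tiny made-up sample; the real table is built from about 11M trips.
sample_trips = [(161, 237, 1.00), (161, 237, 1.14), (48, 68, 0.90)]
table = build_route_table(sample_trips)
```

Because the table is computed once offline, each prediction only pays for a dictionary lookup.
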

The 11 mph average speed is computed from 11 million trips across NYC:
- Actual mean: 11.11 mph
- Rounded: 11 mph

`pickup_month` is drawn from 1-5 because the training data only contains January through May:
- Jan: 1.9 million trips
- Feb: 2.0 million trips
- Mar: 2.4 million trips
- Apr: 2.3 million trips
- May: 2.5 million trips

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | HTML dashboard |
| GET | `/health` | Health check |
| POST | `/predict` | Fare prediction |
| GET | `/model/info` | Model info |
| GET | `/zones` | List of the 263 NYC zones |
| GET | `/route-distance` | Route distance estimate |
| GET | `/monitoring/drift` | Drift metrics |
| GET | `/docs` | Swagger UI |
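
For callers who prefer Python over the command line, the `/predict` endpoint can be reached with the standard library alone. This is a sketch: the helper names are hypothetical, the payload uses the request fields documented for `/predict` in this README, and the response schema is not assumed, so the decoded JSON is returned as-is.

```python
import json
import urllib.request

BASE_URL = "https://mlops-nyc-taxi-7eaf5edf4a58.herokuapp.com"


def build_predict_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request and return the decoded JSON body."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "trip_distance": 5.0,
    "passenger_count": 2,
    "pickup_hour": 14,
    "pickup_dayofweek": 2,
    "PULocationID": 161,
    "DOLocationID": 237,
    "pickup_month": 1,
}
request = build_predict_request(payload)   # call predict(payload) to actually send it
```

Pointing `base_url` at `http://localhost:8000` targets a locally running server instead of the Heroku deployment.
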
curl -X POST https://mlops-nyc-taxi-7eaf5edf4a58.herokuapp.com/predict \
-H "Content-Type: application/json" \
-d '{
"trip_distance": 5.0,
"passenger_count": 2,
"pickup_hour": 14,
"pickup_dayofweek": 2,
"PULocationID": 161,
"DOLocationID": 237,
"pickup_month": 1
}'

This project is deployed to Heroku with auto-deploy from GitHub:
- Push to GitHub → Heroku rebuilds automatically
- Zero-downtime deployment
- Docker-based containerization

Tech stack:
- Backend: FastAPI, Uvicorn
- ML: Scikit-learn, Pandas, NumPy
- Tracking: MLflow
- Frontend: HTML, CSS, JavaScript, Chart.js
- Deployment: Docker, Heroku
- CI/CD: GitHub (auto-deploy)

Kemas Veriandra Ramadhan
Ahmad Sahidin Akbar
Eli Dwi Putra Berema
Nisrina Nur Afifah
Khaalishah Zuhrah Alyaa V.
MIT License