Video-level classification: the input is 16 consecutive frames from one clip, and the model emits one prediction per clip.

| Folder | Backbone | Batch × accum | Effective batch | Epochs | Default loss | Notes |
|---|---|---|---|---|---|---|
| swin3d | torchvision.models.video.swin3d_t (K400) | 2 × 8 | 16 | 40 | asl | SWA epoch 30+, EigenCAM |
| videomae | OpenGVLab/VideoMAEv2-Base (HF) | 8 × 16 | 128 | 20 | focal | trust_remote_code=True |

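
The "Effective batch" column is the per-step batch size multiplied by the number of gradient-accumulation steps. The following is a minimal, framework-free sketch of that arithmetic. The function names are hypothetical, and flushing a trailing partial group at the end of an epoch is an assumption rather than something the table confirms.

```python
def effective_batch(batch_size: int, accum_steps: int) -> int:
    """Number of samples that contribute to one optimizer update."""
    return batch_size * accum_steps


def optimizer_step_indices(num_micro_batches: int, accum_steps: int) -> list[int]:
    """Micro-batch indices after which the optimizer would step.

    Assumption: a trailing partial group is flushed at the end of the epoch.
    """
    last = num_micro_batches - 1
    return [
        i for i in range(num_micro_batches)
        if (i + 1) % accum_steps == 0 or i == last
    ]


print(effective_batch(2, 8))          # swin3d row: 16
print(effective_batch(8, 16))         # videomae row: 128
print(optimizer_step_indices(20, 8))  # [7, 15, 19]
```
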
| Parameter | swin3d | videomae |
|---|---|---|
| Input shape | (B, 3, 16, 224, 224) | (B, 16, 3, 224, 224) |
| Normalization | ImageNet mean/std | ImageNet mean/std |
| LR backbone | 5e-5 | 1e-4 |
| LR head | 5e-4 (backbone × 10) | 1e-3 (backbone × 10) |
| Weight decay | 3e-4 | 1e-4 |
| Freeze epochs | 3 | 5 |
| Warmup epochs | n/a (OneCycleLR, pct_start=0.1) | 2 (linear, after unfreeze) |
| Grad clip | 1.0 | 1.0 |
| Dropout | 0.5 | 0.5 |
| Early stopping | patience=8 | — |
| Seed | 42 | 42 |
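
One way to keep these values in a single place is a small frozen dataclass. This is only a sketch: `TrainConfig` and its field names are hypothetical and may not match what each pipeline's `config.py` actually defines. It also makes the "head LR = backbone LR × 10" relationship from the table explicit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters in the table above."""
    input_layout: str             # axis order of the input tensor
    lr_backbone: float
    weight_decay: float
    freeze_epochs: int
    head_lr_multiplier: float = 10.0
    grad_clip: float = 1.0
    dropout: float = 0.5
    seed: int = 42

    @property
    def lr_head(self) -> float:
        # The table lists the head LR as backbone LR x 10.
        return self.lr_backbone * self.head_lr_multiplier


# (B, 3, 16, 224, 224) is channels-first; (B, 16, 3, 224, 224) is time-first.
SWIN3D = TrainConfig("B,C,T,H,W", lr_backbone=5e-5, weight_decay=3e-4, freeze_epochs=3)
VIDEOMAE = TrainConfig("B,T,C,H,W", lr_backbone=1e-4, weight_decay=1e-4, freeze_epochs=5)

assert math.isclose(SWIN3D.lr_head, 5e-4)
assert math.isclose(VIDEOMAE.lr_head, 1e-3)
```
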

Swin3D uses OneCycleLR (with 10% warmup) during the normal phase, then switches to SWALR (swa_lr=1e-5) for the SWA phase from epoch 30 onward.
VideoMAE uses ConstantLR while the backbone is frozen, then a LinearLR warmup, then CosineAnnealingLR.
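
To make the three VideoMAE phases concrete, here is a plain-Python approximation of the learning-rate multiplier per epoch. It does not call the PyTorch schedulers; it only mirrors their general shape. The function name, the freeze-phase factor of 1.0, the warmup start factor of 0.1, and the decay to zero at the final epoch are all assumptions, since the text above does not state them.

```python
import math


def videomae_lr_factor(
    epoch: int,
    total_epochs: int = 20,
    freeze_epochs: int = 5,
    warmup_epochs: int = 2,
    freeze_factor: float = 1.0,  # assumption: LR multiplier while frozen
    warmup_start: float = 0.1,   # assumption: multiplier at the first warmup epoch
) -> float:
    """Multiplier applied to the base LR at the start of `epoch` (0-based)."""
    if epoch < freeze_epochs:
        return freeze_factor  # constant phase
    warmup_end = freeze_epochs + warmup_epochs
    if epoch < warmup_end:
        progress = (epoch - freeze_epochs) / warmup_epochs
        return warmup_start + (1.0 - warmup_start) * progress  # linear warmup
    cosine_epochs = max(1, total_epochs - warmup_end)
    t = min(epoch - warmup_end, cosine_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * t / cosine_epochs))  # cosine decay


# Backbone LR (1e-4 in the table) at a few representative epochs.
for epoch in (0, 5, 6, 7, 13, 20):
    print(epoch, 1e-4 * videomae_lr_factor(epoch))
```
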
<model>/
├── README.md
├── config.py
├── dataset.py
├── model.py
├── loss.py
├── train.py
├── sweep.py
├── _sweep_worker.py
├── gradcam.py
└── crop_faces_video.py # videomae only
- Temporal subsampling: `NUM_FRAMES = 16` frames per clip
- Per-frame resolution: `IMG_SIZE = 224`, i.e. 224 × 224
- Swin3D: the dataset is a folder of frames per clip (`frame_00.jpg` through `frame_15.jpg`)
- VideoMAE: the dataset consists of MP4 video files (read via OpenCV)
- Label CSVs: `Label3d/train.csv`, `Label3d/val.csv`, `Label3d/test.csv`
- Columns: `video_path`, `Boredom`, `Engagement`, `Confusion`, `Frustration`
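
The data layout described above can be sketched with the standard library alone. The helper names are hypothetical, the sample CSV row is made up for illustration, and two points are assumptions: that `video_path` points at the frame folder in the Swin3D case, and that the label columns hold numeric values.

```python
import csv
import io
from pathlib import Path

NUM_FRAMES = 16
LABELS = ("Boredom", "Engagement", "Confusion", "Frustration")


def frame_paths(clip_dir: str) -> list[Path]:
    """Expected frame files for one clip: frame_00.jpg through frame_15.jpg."""
    return [Path(clip_dir) / f"frame_{i:02d}.jpg" for i in range(NUM_FRAMES)]


def read_labels(csv_file) -> list[tuple[str, list[float]]]:
    """Parse a label CSV into (video_path, [one value per label]) pairs."""
    reader = csv.DictReader(csv_file)
    return [
        (row["video_path"], [float(row[name]) for name in LABELS])
        for row in reader
    ]


# Made-up row, for illustration only; real files live under Label3d/.
sample = io.StringIO(
    "video_path,Boredom,Engagement,Confusion,Frustration\n"
    "clips/example_clip,0,1,0,0\n"
)
rows = read_labels(sample)
paths = frame_paths(rows[0][0])
print(paths[0].name, paths[-1].name)  # frame_00.jpg frame_15.jpg
```
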
- Multi-label sigmoid over 4 emotions: Boredom, Engagement, Confusion, Frustration
- Per-label thresholds are searched on one half of the validation set and evaluated on the other half (to avoid leakage)
- `eval_criterion` is built without `pos_weight` for val/test so that the loss is comparable across runs
- Each sweep run is executed in a separate subprocess via `_sweep_worker.py`
- VideoMAE requires `crop_faces_video.py` to be run first for the `supercrop`/`faceonly` modes
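
The split-half threshold search mentioned in the notes above could look roughly like the sketch below. It is not the repository's implementation: the function names are hypothetical, and F1 as the selection metric, the 0.05-step threshold grid, and the seeded random split are all assumptions. The toy probabilities and targets are made up.

```python
import random


def f1(targets: list[int], preds: list[int]) -> float:
    """Binary F1 score; returns 0.0 when there are no positives at all."""
    tp = sum(t == 1 and p == 1 for t, p in zip(targets, preds))
    fp = sum(t == 0 and p == 1 for t, p in zip(targets, preds))
    fn = sum(t == 1 and p == 0 for t, p in zip(targets, preds))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def split_validation(n: int, seed: int = 42) -> tuple[list[int], list[int]]:
    """Shuffle sample indices and cut them into two disjoint halves."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[: n // 2], idx[n // 2:]


def search_thresholds(probs, targets, idx, grid=None) -> list[float]:
    """Pick, per label, the grid threshold with the best F1 on samples `idx`."""
    grid = grid or [i / 20 for i in range(1, 20)]
    best = []
    for k in range(len(probs[0])):
        p = [probs[i][k] for i in idx]
        t = [targets[i][k] for i in idx]
        best.append(max(grid, key=lambda th: f1(t, [int(v >= th) for v in p])))
    return best


def evaluate(probs, targets, idx, thresholds) -> list[float]:
    """Per-label F1 on samples `idx` using already-chosen thresholds."""
    return [
        f1([targets[i][k] for i in idx], [int(probs[i][k] >= th) for i in idx])
        for k, th in enumerate(thresholds)
    ]


# Made-up sigmoid outputs and targets for two labels.
probs = [[0.2, 0.9], [0.4, 0.7], [0.6, 0.3], [0.8, 0.1]] * 2
targets = [[0, 1], [0, 1], [1, 0], [1, 0]] * 2
search_idx, eval_idx = split_validation(len(probs))
thresholds = search_thresholds(probs, targets, search_idx)
print(thresholds, evaluate(probs, targets, eval_idx, thresholds))
```

Because the thresholds never see the evaluation half, the reported score is not inflated by tuning on the same samples.
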

For full details on parameters, visualization, and usage, see the README of each pipeline.