aitf-sr1/Temporal-Model

3D Models

Video-level classification: the input is 16 consecutive frames from a clip, and the model outputs one prediction per clip.

Pipeline

| Folder | Backbone | Batch × accum | Effective batch | Epochs | Default loss | Notes |
|---|---|---|---|---|---|---|
| swin3d | torchvision.models.video.swin3d_t (K400) | 2 × 8 | 16 | 40 | asl | SWA from epoch 30, EigenCAM |
| videomae | OpenGVLab/VideoMAEv2-Base (HF) | 8 × 16 | 128 | 20 | focal | trust_remote_code=True |
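
The effective batch is the per-step batch size multiplied by the number of gradient-accumulation steps. The following is a minimal sketch of that arithmetic only. The dictionary layout and the name `effective_batch` are illustrative assumptions and are not taken from the repository's config.py.

```python
# Illustrative only: key names are hypothetical, values are copied from the table.
PIPELINES = {
    "swin3d": {"batch_size": 2, "accum_steps": 8},
    "videomae": {"batch_size": 8, "accum_steps": 16},
}


def effective_batch(batch_size, accum_steps):
    """Number of samples contributing to one optimizer step under gradient accumulation."""
    return batch_size * accum_steps


for name, cfg in PIPELINES.items():
    print(name, effective_batch(**cfg))
```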

General Configuration

| Parameter | swin3d | videomae |
|---|---|---|
| Input shape | (B, 3, 16, 224, 224) | (B, 16, 3, 224, 224) |
| Normalization | ImageNet mean/std | ImageNet mean/std |
| LR backbone | 5e-5 | 1e-4 |
| LR head | 5e-4 (backbone × 10) | 1e-3 (backbone × 10) |
| Weight decay | 3e-4 | 1e-4 |
| Freeze epochs | 3 | 5 |
| Warmup epochs | none (handled by OneCycleLR, pct_start=0.1) | 2 (linear, after unfreeze) |
| Grad clip | 1.0 | 1.0 |
| Dropout | 0.5 | 0.5 |
| Early stopping | patience=8 | |
| Seed | 42 | 42 |
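
The two pipelines differ mainly in axis order (channels-first-then-time for swin3d, time-first-then-channels for videomae) and in a handful of optimizer values. A hedged sketch of how these settings could be grouped is shown here. The class `TrainConfig`, its field names, and the helper `to_videomae_shape` are hypothetical and do not reflect the actual contents of config.py; only the numeric values come from the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical container: field names are assumptions, values come from the table.
    input_layout: str
    lr_backbone: float
    weight_decay: float
    freeze_epochs: int
    head_lr_multiplier: float = 10.0
    grad_clip: float = 1.0
    dropout: float = 0.5
    seed: int = 42

    @property
    def lr_head(self):
        # In both pipelines the head LR is the backbone LR times 10.
        return self.lr_backbone * self.head_lr_multiplier


SWIN3D = TrainConfig("B,C,T,H,W", lr_backbone=5e-5, weight_decay=3e-4, freeze_epochs=3)
VIDEOMAE = TrainConfig("B,T,C,H,W", lr_backbone=1e-4, weight_decay=1e-4, freeze_epochs=5)


def to_videomae_shape(shape):
    """Map a swin3d-style shape (B, C, T, H, W) to the videomae order (B, T, C, H, W)."""
    b, c, t, h, w = shape
    return (b, t, c, h, w)


print(SWIN3D.lr_head, VIDEOMAE.lr_head)
print(to_videomae_shape((4, 3, 16, 224, 224)))
```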

Swin3D uses OneCycleLR (with 10% warmup) during the normal phase, then switches to SWALR (swa_lr=1e-5) for the SWA phase from epoch 30 onward. VideoMAE uses ConstantLR during the freeze phase, followed by a LinearLR warmup and then CosineAnnealingLR.
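
To make the three-phase VideoMAE schedule concrete, the sketch below computes a per-epoch multiplier on the base learning rate in plain Python, without PyTorch. The phase lengths (5 freeze epochs, 2 warmup epochs, 20 epochs in total) come from the tables. The constant factor during the freeze, the warmup start factor, the decay to zero, and per-epoch stepping are all assumptions, since the source does not state them.

```python
import math


def lr_multiplier(epoch, total_epochs=20, freeze_epochs=5, warmup_epochs=2,
                  freeze_factor=1.0, warmup_start=0.1):
    """Multiplier applied to the base LR at a 0-indexed epoch.

    freeze_factor, warmup_start and the cosine decay to zero are assumed values.
    """
    if epoch < freeze_epochs:
        # Phase 1: constant LR while the backbone is frozen.
        return freeze_factor
    step = epoch - freeze_epochs
    if step < warmup_epochs:
        # Phase 2: linear ramp from warmup_start toward 1.0 after unfreezing.
        return warmup_start + (1.0 - warmup_start) * step / warmup_epochs
    # Phase 3: cosine annealing over the remaining epochs.
    cosine_epochs = total_epochs - freeze_epochs - warmup_epochs
    progress = (step - warmup_epochs) / cosine_epochs
    return 0.5 * (1.0 + math.cos(math.pi * progress))


for epoch in range(20):
    print(epoch, round(lr_multiplier(epoch), 4))
```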

Per-pipeline structure

<model>/
├── README.md
├── config.py
├── dataset.py
├── model.py
├── loss.py
├── train.py
├── sweep.py
├── _sweep_worker.py
├── gradcam.py
└── crop_faces_video.py   # videomae only

Input & Dataset

  • Temporal subsampling: NUM_FRAMES = 16 frames per clip
  • Per-frame resolution: IMG_SIZE = 224 × 224
  • Swin3D: the dataset is a folder of frames (frame_00.jpg through frame_15.jpg)
  • VideoMAE: the dataset consists of MP4 video files (read via OpenCV)
  • Label CSVs: Label3d/train.csv, Label3d/val.csv, Label3d/test.csv
  • Columns: video_path, Boredom, Engagement, Confusion, Frustration
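
As an illustration of the temporal subsampling step, the sketch below picks 16 frame indices from a clip of arbitrary length and lists the frame file names used by the Swin3D folder layout. Uniform (evenly spaced) sampling is an assumption, since the source only gives the frame count, and both function names are hypothetical.

```python
NUM_FRAMES = 16


def sample_frame_indices(total_frames, num_frames=NUM_FRAMES):
    """Pick num_frames indices spread evenly over a clip (uniform sampling is assumed)."""
    if total_frames <= 0:
        raise ValueError("clip has no frames")
    # Take the centre of each of num_frames equal segments; short clips repeat frames.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]


def swin3d_frame_names(num_frames=NUM_FRAMES):
    """File names frame_00.jpg through frame_15.jpg, as in the Swin3D frame folders."""
    return [f"frame_{i:02d}.jpg" for i in range(num_frames)]


print(sample_frame_indices(300))
print(swin3d_frame_names()[:3])
```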

General Conventions

  • Multi-label sigmoid over 4 emotions: Boredom, Engagement, Confusion, Frustration
  • Per-label threshold search on one half of the validation set; evaluation on the other half (to avoid leakage)
  • eval_criterion omits pos_weight for val/test so that the loss is comparable across runs
  • Each sweep run is executed in a separate subprocess via _sweep_worker.py
  • VideoMAE requires crop_faces_video.py to be run first for the supercrop/faceonly modes
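
The split-validation threshold search can be sketched as follows: thresholds are tuned per label on one half of the validation set and scored on the other half, so the reported metric never sees the samples used for tuning. This is a plain-Python illustration, not the repository's code. The F1 criterion, the threshold grid, the seeded random split, and all function names are assumptions.

```python
import random

LABELS = ["Boredom", "Engagement", "Confusion", "Frustration"]


def f1(y_true, y_pred):
    """Binary F1 score; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, targets, grid):
    """Threshold with the highest F1; ties go to the lowest threshold in the grid."""
    return max(grid, key=lambda th: f1(targets, [p >= th for p in probs]))


def split_search_eval(probs, targets, seed=42):
    """probs and targets are lists of per-sample lists, one value per label."""
    idx = list(range(len(probs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    search, held_out = idx[:half], idx[half:]  # disjoint halves avoid leakage
    grid = [i / 100 for i in range(5, 100, 5)]  # assumed grid: 0.05 to 0.95
    thresholds, scores = {}, {}
    for j, name in enumerate(LABELS):
        th = best_threshold([probs[i][j] for i in search],
                            [targets[i][j] for i in search], grid)
        thresholds[name] = th
        scores[name] = f1([targets[i][j] for i in held_out],
                          [probs[i][j] >= th for i in held_out])
    return thresholds, scores


# Synthetic, perfectly separable data purely for demonstration.
demo_targets = [[(i >> j) & 1 for j in range(4)] for i in range(64)]
demo_probs = [[0.9 if t else 0.1 for t in row] for row in demo_targets]
demo_thresholds, demo_scores = split_search_eval(demo_probs, demo_targets)
print(demo_thresholds, demo_scores)
```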

For full details on parameters, visualization, and usage, see each pipeline's README.
