Paper: A Multimodal Transformer for UAV Detection (Radar+RGB+IR+Audio)
ArXiv: https://arxiv.org/abs/2511.15312
Status: Initial implementation baseline (paper verified with reproducibility risks)
Focus: UAV/Drone defense for Shenzhen Robot Fair
- Paper-audit workflow with explicit red-flag reporting
- Dual-backend runtime resolution (`mlx|cuda|cpu`) through `device.py`
- Paper-aligned multimodal Transformer baseline:
  - Input projection (128 -> 256)
  - Positional encoding
  - 2-layer Transformer encoder, 4 heads
  - Mean pooling + MLP classifier
- Synthetic multimodal data pipeline (audio/rgb/ir/radar) with early-fusion tensor
- Training/evaluation loop for initial sanity checks
- YOLO26 adapter scaffold for integration with latest YOLO26 weights
- Numerical parity kernel check between MLX and Torch arithmetic paths
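
The baseline listed above (early-fusion input, 128 -> 256 projection, positional encoding, 2-layer/4-head encoder, mean pooling, MLP classifier) can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the code in `modeling.py`: the class and function names, the sinusoidal form of the positional encoding, the 32-dimensional per-modality features, the feed-forward width, the MLP hidden size, and the two-class output are all assumptions made for the example.

```python
import math

import torch
from torch import nn


def fuse_early(audio, rgb, ir, radar):
    """Early fusion: concatenate per-step modality features on the last axis.

    Assumption: each modality contributes 32 features per step (4 x 32 = 128).
    """
    return torch.cat([audio, rgb, ir, radar], dim=-1)


class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sinusoidal encoding (assumed; the encoding type is not specified)."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        return x + self.pe[:, : x.size(1)]


class FusionTransformerSketch(nn.Module):
    """Hypothetical stand-in for the paper-aligned baseline."""

    def __init__(self, in_dim=128, d_model=256, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)  # input projection 128 -> 256
        self.pos = SinusoidalPositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=512,  # assumed width
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(  # MLP classifier, hidden size assumed
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        # x: (batch, seq_len, in_dim)
        h = self.encoder(self.pos(self.proj(x)))
        return self.head(h.mean(dim=1))  # mean pooling over the sequence axis
```

With a batch of 2 sequences of length 64, `fuse_early` on four `(2, 64, 32)` tensors gives a `(2, 64, 128)` input, and the model returns logits of shape `(2, 2)`.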
# install editable package
uv pip install -e ".[dev]"
# verify paper audit snapshot
python3 -m anima_shepherd --config configs/default.toml paper-audit
# run initial synthetic training (fast sanity run)
python3 -m anima_shepherd --config configs/default.toml train-synthetic --backend auto --epochs 1 --seq-len 64
# run single forward pass on selected backend
python3 -m anima_shepherd --config configs/default.toml infer-synthetic --backend mlx --batch-size 2
python3 -m anima_shepherd --config configs/default.toml infer-synthetic --backend cpu --batch-size 2
# MLX parity kernel check
python3 -m anima_shepherd --config configs/default.toml check-parity --backend mlx --atol 1e-2
# inspect YOLO26 integration readiness
python3 -m anima_shepherd --config configs/default.toml yolo26-status

bash scripts/download_data.sh --check
bash scripts/download_data.sh --download

The script is intentionally safe in this phase: it checks volume state and prepares acquisition notes for missing datasets without forcing downloads.
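
The `check-parity` command above takes an absolute tolerance (`--atol 1e-2`). The idea of such a check can be illustrated without MLX or Torch: run the same kernel along two arithmetic paths and compare the largest element-wise difference against the tolerance. In this sketch the two paths are float64 and simulated float32 softmax, used purely as stand-ins for the MLX and Torch paths; every function name here is hypothetical and not taken from the package.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def softmax64(xs):
    """Reference path: softmax in native float64."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax32(xs):
    """Second path: same kernel, rounding intermediates to float32."""
    xs = [to_float32(x) for x in xs]
    m = max(xs)
    exps = [to_float32(math.exp(to_float32(x - m))) for x in xs]
    total = 0.0
    for e in exps:
        total = to_float32(total + e)
    return [to_float32(e / total) for e in exps]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def check_parity(a, b, atol=1e-2):
    """Return True when the two outputs agree within the absolute tolerance."""
    return max_abs_diff(a, b) <= atol


logits = [0.5, -1.25, 3.0, 2.0]
parity_ok = check_parity(softmax64(logits), softmax32(logits), atol=1e-2)
```

An absolute tolerance of `1e-2` is loose for float32 arithmetic, so a check like this mainly catches gross divergence (wrong reduction axis, missing normalization) rather than small rounding drift.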
project_shepherd/
├── src/anima_shepherd/
│   ├── cli.py
│   ├── config.py
│   ├── data.py
│   ├── device.py
│   ├── modeling.py
│   ├── paper_audit.py
│   ├── training.py
│   └── yolo26_adapter.py
├── configs/default.toml
├── scripts/download_data.sh
├── PRD.md
├── VERIFICATION_REPORT.md
├── ARCHITECTURE.md
├── MODULE_TODO.md
└── NEXT_STEPS.md
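
`device.py` in the layout above is described as resolving the runtime backend (`mlx|cuda|cpu`), and the CLI accepts `--backend auto`. A minimal sketch of how such resolution could work is shown below. The function names, the preference order (MLX, then CUDA, then CPU), and the error behavior are assumptions for illustration and may differ from the actual module.

```python
import importlib.util

VALID_BACKENDS = ("mlx", "cuda", "cpu")


def _mlx_available() -> bool:
    # Only checks that the `mlx` package is importable.
    return importlib.util.find_spec("mlx") is not None


def _cuda_available() -> bool:
    # Torch is imported lazily so the resolver also works without it installed.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def resolve_backend(requested: str = "auto") -> str:
    """Map a requested backend name to one of VALID_BACKENDS.

    Assumed policy: "auto" prefers MLX, then CUDA, then CPU.
    """
    requested = requested.lower()
    if requested == "auto":
        if _mlx_available():
            return "mlx"
        if _cuda_available():
            return "cuda"
        return "cpu"
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown backend {requested!r}; expected one of {VALID_BACKENDS}")
    if requested == "mlx" and not _mlx_available():
        raise RuntimeError("backend 'mlx' requested but the mlx package is not installed")
    if requested == "cuda" and not _cuda_available():
        raise RuntimeError("backend 'cuda' requested but no CUDA device is available")
    return requested
```

Failing loudly on an explicitly requested but unavailable backend, instead of silently falling back to CPU, keeps benchmark and parity results attributable to the backend that was actually asked for.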