Paper: Adaptive Image Zoom-in with Bounding Box Transformation for UAV Object Detection
ArXiv: https://arxiv.org/abs/2602.07512
GitHub: https://github.com/twangnh/zoomdet_code
Defense Score: 41/50 | Tier: T2
Wave: 10 — WARDOG (War Dog Breeds)
Focus: UAV/Drone Defense for Shenzhen Robot Fair
Backbone Contract: YOLO26 only
GREYHOUND brings the paper's adaptive zoom front-end into the ANIMA stack and rebases the downstream detector onto the Wave-10 YOLO26 contract. The module is built end-to-end — model, training driver, evaluator, FastAPI service, Docker, and a ROS2 node — and is ready to train the moment the NIGHTHAWK UAV mega dataset finishes rendering on the shared GPU server.
- Input image → OffsetNet (truncated ResNet18, cut after block 2) predicts low-resolution 2D offsets.
- Offsets are upsampled to the detector input size and converted into a normalized sampling grid.
- A bilinear warp produces the zoomed image (see the sketch after this list); ground-truth boxes are corner-aligned into the zoomed frame during training.
- YOLO26 runs inference on the zoomed image.
- Predictions are mapped back to the original frame for downstream consumers (ROS2 detections, FastAPI /predict).
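A minimal PyTorch sketch of the grid construction and bilinear warp, assuming the offsets are predicted as normalized displacements of an identity grid; the function and argument names are illustrative, not the repo's API:

```python
# Illustrative sketch only; names and the offset convention are assumptions.
import torch
import torch.nn.functional as F


def build_sampling_grid(offsets: torch.Tensor, out_hw: tuple[int, int]) -> torch.Tensor:
    """Upsample a (B, 2, h, w) offset field and add it to an identity grid in [-1, 1]."""
    B = offsets.shape[0]
    H, W = out_hw
    # Upsample the low-resolution offsets to the detector input size.
    up = F.interpolate(offsets, size=(H, W), mode="bilinear", align_corners=True)
    # Identity sampling grid in normalized (x, y) coordinates.
    ys = torch.linspace(-1.0, 1.0, H, device=offsets.device)
    xs = torch.linspace(-1.0, 1.0, W, device=offsets.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    identity = torch.stack((gx, gy), dim=-1).expand(B, H, W, 2)
    # Treat the offsets as normalized displacements of the sampling locations.
    return identity + up.permute(0, 2, 3, 1)


def zoom_warp(image: torch.Tensor, offsets: torch.Tensor, out_hw: tuple[int, int]) -> torch.Tensor:
    """Bilinearly resample the input image onto the zoomed frame defined by the grid."""
    grid = build_sampling_grid(offsets, out_hw)
    return F.grid_sample(image, grid, mode="bilinear", align_corners=True)
```

Ground-truth box alignment and the inverse mapping of predictions back to the original frame follow the same grid and are omitted here.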
# Install dependencies (CUDA 12.8 wheels via project pyproject.toml)
uv pip install -e ".[dev]"
# Synthetic dry run (no weights required)
python -m anima_greyhound --dry-run
# Serve the FastAPI app
python -m anima_greyhound.serve --host 0.0.0.0 --port 8000
curl http://localhost:8000/health
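For a quick smoke test against a running service, a client along these lines should work; the multipart field name and the response schema are assumptions, not taken from the repo:

```python
# Hypothetical /predict client; payload and response fields are assumptions.
import requests

with open("example_frame.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/predict",
        files={"image": ("example_frame.jpg", f, "image/jpeg")},
        timeout=30,
    )
resp.raise_for_status()
# Detections are expected in original-frame coordinates (see the pipeline above).
for det in resp.json().get("detections", []):
    print(det)
```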
# Container
docker compose -f docker/docker-compose.yml up --build

Training is gated by --confirm-gpu-free while NIGHTHAWK is generating the UAV mega dataset. Until the DONE.flag appears, the training script only runs its preflight checks (a sketch of the gate follows the commands below):
python scripts/train.py --config configs/paper.toml
# Once NIGHTHAWK is done and /gpu-batch-finder confirms memory:
python scripts/train.py --config configs/paper.toml --confirm-gpu-free \
    --data /mnt/train-data/datasets/nighthawk_mega_highres/data.yaml

All training artifacts land under /mnt/artifacts-datai/{checkpoints,logs,exports}/project_greyhound/.
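The gate amounts to something like the sketch below; the DONE.flag location and the free-memory threshold are assumptions, not the repo's scripts/train.py:

```python
# Illustrative preflight gate; paths and thresholds are assumptions.
from pathlib import Path

import torch


def preflight_ok(dataset_root: Path, min_free_gib: float = 20.0) -> bool:
    """Allow training only once the dataset render is finished and the GPU has headroom."""
    if not (dataset_root / "DONE.flag").exists():
        print("NIGHTHAWK still rendering; running preflight only.")
        return False
    free_bytes, _total = torch.cuda.mem_get_info()
    if free_bytes / 1024**3 < min_free_gib:
        print("Not enough free GPU memory; running preflight only.")
        return False
    return True
```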
project_greyhound/
├── src/anima_greyhound/ # Zoom transform, datasets, eval, API, ROS2, monitoring
├── scripts/ # train.py, export.py, download_data.sh
├── launch/ # ROS2 launch descriptions
├── prds/ # 7-PRD ANIMA build suite
├── tasks/ # Granular implementation breakdown
├── tests/ # Unit tests (geometry, zoom, eval, api, monitoring)
├── configs/ # default.toml, paper.toml, debug.toml
├── docker/ # Dockerfile + docker-compose.yml
├── papers/ # Paper PDF
├── anima_module.yaml # ANIMA registry manifest
├── ASSETS.md # Data + weights manifest
├── PRD.md # Module-level contract
├── TRAINING_REPORT.md # Pre-training snapshot
└── NEXT_STEPS.md # Execution ledger
All first-party code accepts mlx, cuda, or cpu via device.py. CUDA is the default training target on the shared L4 GPU server.
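A plausible sketch of what the device.py selection reduces to (illustrative only; the actual helper may differ):

```python
# Illustrative device resolver; not necessarily the exact logic in device.py.
def resolve_device(name: str = "cuda") -> str:
    """Return a usable backend from {"mlx", "cuda", "cpu"}, falling back to CPU."""
    if name == "mlx":
        try:
            import mlx.core  # noqa: F401  # Apple-silicon backend
            return "mlx"
        except ImportError:
            return "cpu"
    if name == "cuda":
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    return "cpu"
```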
- Paper PDF read from papers/2602.07512.pdf
- Reference repo cloned into references/zoomdet_code
- Public repo verified as the Faster R-CNN/MMDetection branch
- YOLO branch referenced but not locally verified (ANIMA owns the YOLO26 rebase)
- Shared dataset volume exists; VisDrone / UAVDT / SeaDroneSee / DroneVehicle provisioning pending
Research use only. See paper for original license terms.
