An AI agent for analyzing soccer videos: player detection, team classification, and pitch keypoint detection. Built with YOLO for object detection, OSNet for team re-identification, and HRNet for field keypoint detection.
Demo clip: `SNGS-117_tv_tv_teams.mp4`
- Player detection — YOLO-based detection of players on the pitch
- Team classification — OSNet embeddings + K-means clustering to assign players to Team 1 or Team 2
- Pitch keypoint detection — HRNet detects field markers for homography and field normalization
- Kit color analysis — Grass-aware color extraction as fallback for team differentiation
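The grass-aware fallback in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the agent's actual implementation: the function name `dominant_kit_hue`, the grass hue band, and the thresholds are all assumptions and would need tuning per stadium and lighting.

```python
import numpy as np

def dominant_kit_hue(hsv_crop: np.ndarray) -> float:
    """Estimate a player's kit hue from an HSV crop, ignoring grass pixels.

    hsv_crop: array of shape (H, W, 3) with OpenCV-style ranges
    (hue in [0, 180), saturation and value in [0, 255]).
    The grass hue band (~35-85 in OpenCV units) is an assumption.
    """
    h, s, v = hsv_crop[..., 0], hsv_crop[..., 1], hsv_crop[..., 2]
    grass = (h >= 35) & (h <= 85) & (s > 40)  # green, reasonably saturated
    kit = ~grass & (v > 40)                   # drop grass and near-black shadow pixels
    if not kit.any():                         # crop is all grass/shadow: degrade gracefully
        return float(np.median(h))
    return float(np.median(h[kit]))
```

Comparing the resulting hue per player against the two teams' reference hues gives a coarse team label when embedding-based clustering is unavailable.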
soccer-video-detection-ai-agent/
├── src/
│ └── soccer_agent/
│ ├── __init__.py
│ ├── agent.py # Main AI agent (YOLO + OSNet + HRNet orchestration)
│ └── types.py # BoundingBox, TVFrameResult models
├── weights/ # Model weights (git-lfs or download separately)
│ ├── player_detect.pt # YOLO
│ ├── keypoint_detect.pt # HRNet
│ ├── osnet_model.pth.tar-100
│ └── hrnetv2_w48.yaml
├── scripts/
│ └── run_video.py # CLI to process videos
├── run.py # Entry point
├── requirements.txt
├── pyproject.toml
└── README.md
**YOLO (player detection)**

Purpose: Object detection — locates players (and optionally the ball and referees) in each video frame.
Role in the agent: Runs first on every frame. Outputs bounding boxes with class IDs and confidence scores. Player boxes (class ID 2) are passed to OSNet for team assignment. Also assigns track IDs for temporal consistency across frames.
**OSNet (team re-identification)**

Purpose: Person re-identification — produces embedding vectors from cropped upper-body images.
Role in the agent: Takes player crops from YOLO boxes and extracts 512‑dim embeddings. Embeddings are aggregated per track, then clustered with K-means (2 clusters) to assign each player to Team 1 or Team 2. Uses kit/jersey appearance to distinguish teams. If OSNet weights are missing, the agent falls back to HSV-based kit color analysis.
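The aggregate-then-cluster step can be sketched with a tiny NumPy K-means (k=2). This is a simplified stand-in, not the agent's code: the farthest-point initialization, normalization, and iteration count are assumptions, and a production system would more likely use scikit-learn's `KMeans`.

```python
import numpy as np

def assign_teams(track_embeddings: dict, iters: int = 20) -> dict:
    """Cluster per-track mean embeddings into two teams.

    track_embeddings: track_id -> (N_i, D) array of OSNet embeddings.
    Returns track_id -> 0 or 1; which label is "Team 1" is arbitrary.
    """
    ids = sorted(track_embeddings)
    # Aggregate each track to a single mean embedding, then L2-normalize.
    X = np.stack([np.asarray(track_embeddings[t]).mean(axis=0) for t in ids])
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-9)
    # Initialize centers with the first track and the track farthest from it.
    far = int(np.argmax(((X - X[0]) ** 2).sum(axis=1)))
    centers = X[[0, far]].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return {t: int(l) for t, l in zip(ids, labels)}
```

Aggregating per track before clustering makes the assignment robust to single bad frames (occlusion, motion blur) on any one crop.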
**HRNet (pitch keypoint detection)**

Purpose: Pitch keypoint detection — detects 32 field markers (lines, corners, and other pitch markings) in each frame.
Role in the agent: Outputs heatmaps for field keypoints. These are mapped to a standard pitch template and refined via homography. Used for field normalization and warping (e.g., bird’s-eye view), not for player body pose.
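Decoding the heatmaps into pixel coordinates can be sketched as below. The function signature and the 0.3 confidence threshold are assumptions; the point is that each of the K channels yields one candidate keypoint (or `None` when the marker is occluded or off-screen), and the surviving points are matched to the pitch template for homography fitting.

```python
import numpy as np

def decode_keypoints(heatmaps: np.ndarray, threshold: float = 0.3):
    """Turn HRNet-style heatmaps into per-keypoint (x, y, score) tuples.

    heatmaps: (K, H, W) array, one channel per field marker (K = 32 here).
    Low-confidence channels become None so the homography step can skip
    markers that are occluded or outside the camera view.
    """
    points = []
    for hm in heatmaps:
        idx = np.unravel_index(np.argmax(hm), hm.shape)  # (row, col) of the peak
        score = float(hm[idx])
        points.append((int(idx[1]), int(idx[0]), score) if score >= threshold else None)
    return points
```

With four or more valid correspondences to the template, a homography can then be fitted (e.g. with OpenCV's `cv2.findHomography`) to warp the frame into a bird's-eye view.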
Model inference requires a GPU for acceptable performance. The agent uses CUDA when available for all three models (YOLO, OSNet, HRNet).
| Requirement | Details |
|---|---|
| GPU | NVIDIA GPU with CUDA support |
| VRAM | 24 GB+ recommended |
| CUDA | 11.8+ (matches PyTorch / Ultralytics) |
| cuDNN | Compatible with your CUDA version |
Without a GPU, inference falls back to CPU and will be significantly slower. For production or batch processing, a GPU machine is strongly recommended.
1. Create a virtual environment and install dependencies:

   ```bash
   python -m venv .venv && source .venv/bin/activate  # or .venv\Scripts\activate on Windows
   pip install -r requirements.txt
   ```

2. Place model weights in `weights/` (see structure above).

3. Run on a video:

   ```bash
   python run.py --video path/to/soccer_video.mp4 --output-dir ./output --save-video
   ```

   Or use the script directly:

   ```bash
   python scripts/run_video.py --video path/to/video.mp4
   ```

Reach out via Telegram: t.me/whisdev