BirdBox is a comprehensive system for detecting and evaluating bird calls in audio recordings using deep learning. It leverages YOLO (You Only Look Once) object detection on spectrogram images to identify and localize bird vocalizations in time and frequency.
- Multiple Audio Formats - Supports WAV, FLAC, OGG, MP3 (WAV/FLAC recommended for best results)
- Arbitrary-Length Audio Processing - Handles audio from seconds to hours
- Song Reconstruction - Automatically merges temporally adjacent detections into continuous bird songs (see the sketch after this list)
- Batch Processing - Processes entire directories of audio files
- PCEN Normalization - Per-Channel Energy Normalization for robust spectral features
- Comprehensive Evaluation - F-beta analysis, confusion matrices, optimal threshold finding
- Multiple Output Formats - JSON, CSV (compatible with annotation formats)
- Model Agnostic - Works with .pt, .onnx, .engine model formats
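To illustrate what song reconstruction does, here is a minimal sketch (not the actual BirdBox implementation): detections of the same species are merged whenever the silence between them is shorter than a configurable gap threshold. The detection keys match the output format shown further below; the sample data is made up.

```python
def merge_into_songs(detections, song_gap_threshold=0.1):
    """Sketch: merge same-species detections whose temporal gap is below
    song_gap_threshold (seconds) into a single continuous song."""
    merged = []
    for det in sorted(detections, key=lambda d: (d["species"], d["time_start"])):
        last = merged[-1] if merged else None
        if (last is not None
                and last["species"] == det["species"]
                and det["time_start"] - last["time_end"] <= song_gap_threshold):
            # Extend the previous song and keep the higher confidence
            last["time_end"] = max(last["time_end"], det["time_end"])
            last["confidence"] = max(last["confidence"], det["confidence"])
        else:
            merged.append(dict(det))
    return merged

songs = merge_into_songs([
    {"species": "apapane", "time_start": 1.00, "time_end": 1.50, "confidence": 0.8},
    {"species": "apapane", "time_start": 1.55, "time_end": 2.00, "confidence": 0.9},
])
# -> one merged song from 1.0 s to 2.0 s (gap of 0.05 s is below the threshold)
```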
Trained YOLO models for this task can be found on the TUC-Cloud. Alternatively, you can train your own model on a custom dataset using the code in the BirdBox-Train repository (not yet publicly available).
To specify the model via the CLI, pass the relative path of the model file as the `--model` command-line argument.
If you use the code as a package, set the `model` parameter of the corresponding function to the relative path of the model file.
Important: The species mapping in the `conf.yaml` file the model was trained with and the `DATASETS[model_name]` dictionary in `src/config.py` have to match.
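As a purely hypothetical illustration (the actual class indices, species codes, and dictionary layout depend on your model and dataset), the entry in `src/config.py` has to mirror the class names from the model's `conf.yaml`:

```python
# src/config.py (hypothetical excerpt)
# The indices and species codes must mirror the `names` section of the
# conf.yaml the model was trained with, e.g. names: {0: iiwi, 1: apapane}.
DATASETS = {
    "Hawaii": {
        0: "iiwi",
        1: "apapane",
    },
}
```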
```bash
# Clone the repository
git clone https://github.com/birdnet-team/BirdBox.git
cd BirdBox

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

The easiest way to use BirdBox is through the interactive web interface:

```bash
streamlit run src/streamlit/app.py
```

Then open your browser to http://localhost:8501 and:
- Upload audio files (WAV, FLAC, OGG, MP3)
- Select a model from the dropdown
- Adjust detection parameters with sliders
- View PCEN spectrograms with bounding boxes
- Download results as JSON or CSV
If everything is set up correctly, the Streamlit web interface will look like this:
```bash
# Detect birds in a single audio file (supports WAV, FLAC, OGG, MP3)
python src/inference/detect_birds.py \
    --audio path/to/recording.wav \
    --model models/best.pt \
    --species-mapping Hawaii

# Or process an entire directory (batch processing)
python src/inference/detect_birds.py \
    --audio path/to/audio/folder \
    --model models/best.pt \
    --species-mapping Hawaii
```

A typical evaluation workflow looks like this:

```bash
# Step 1: Run comprehensive detection with a low confidence threshold
python src/inference/detect_birds.py \
    --audio data/test_audio/ \
    --model models/best.pt \
    --species-mapping Hawaii \
    --conf 0.001 \
    --output-path results/all_detections \
    --output-format json

# Step 2: Analyze F-beta scores to find the optimal threshold
python src/evaluation/f_beta_score_analysis.py \
    --detections results/all_detections.json \
    --labels data/test_labels.csv \
    --beta 2.0 \
    --output-path results/f_beta_analysis

# Step 3: Filter detections to the optimal threshold (e.g., 0.35)
python src/evaluation/filter_detections.py \
    --input results/all_detections.json \
    --conf 0.35 \
    --output-path results/filtered_detections \
    --format all

# Step 4: Generate confusion matrix
python src/evaluation/confusion_matrix_analysis.py \
    --detections results/filtered_detections.csv \
    --labels data/test_labels.csv \
    --output-path results/confusion_matrix

# Step 5: Examine the results in the results/ directory
```

BirdBox can also be used directly as a Python package:

```python
from inference.detect_birds import BirdCallDetector

# Initialize detector
detector = BirdCallDetector(
    model_path="models/best.pt",
    species_mapping="Hawaii",
    conf_threshold=0.001,
    song_gap_threshold=0.1
)

# Detect birds (supports WAV, FLAC, OGG, MP3)
detections = detector.detect(
    "path/to/audio.wav",  # or .flac, .ogg, .mp3
    output_path="results/detections"
)

# Print summary
detector.print_summary(detections)

# Access detection data
for det in detections:
    print(f"{det['species']}: {det['time_start']:.1f}s - {det['time_end']:.1f}s "
          f"(confidence: {det['confidence']:.3f})")
```

The F-beta analysis is available from Python as well:

```python
from evaluation.f_beta_score_analysis import FBetaScoreAnalyzer

# Create analyzer
analyzer = FBetaScoreAnalyzer(
    iou_threshold=0.5,
    beta=2.0,
    use_optimal_matching=True
)

# Analyze performance across confidence thresholds
results_df = analyzer.analyze_confidence_thresholds(
    detections_path="results/all_detections.json",
    labels_path="data/ground_truth.csv",
    confidence_thresholds=[0.1, 0.2, 0.3, 0.4, 0.5]
)

# Find optimal thresholds
optimal_df = analyzer.find_optimal_thresholds(results_df)
print(optimal_df)
```

Tips for better performance:

- Use GPU acceleration (automatically detected)
- Use TensorRT models (`.engine`) for NVIDIA GPUs
- Use a low confidence threshold for comprehensive detection and filter later
- Adjust `song_gap_threshold` based on species vocalization patterns
- Adjust `iou_threshold` to fit the specific use case
- Tune the β parameter of the Fβ analysis to fit the specific use case (see the sketch after this list)
  - β < 1 puts more weight on precision
  - β > 1 puts more weight on recall
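For intuition, here is the generic Fβ formula as a small standalone sketch (not a BirdBox function): with precision P and recall R, Fβ = (1 + β²) · P · R / (β² · P + R).

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Generic F-beta score: beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Example: a detector with high precision (0.9) but low recall (0.5)
p, r = 0.9, 0.5
print(f"F0.5 = {f_beta(p, r, 0.5):.2f}")  # ~0.78, rewards the high precision
print(f"F1   = {f_beta(p, r, 1.0):.2f}")  # ~0.64, balanced
print(f"F2   = {f_beta(p, r, 2.0):.2f}")  # ~0.55, penalizes the low recall
```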
"No detections found"
- Lower the confidence threshold (`--conf 0.001`)
- Check if the audio file is in a supported format (WAV, FLAC, OGG, MP3)
- Verify the model is trained on similar species
- If using MP3/OGG, try a WAV/FLAC version of the same recording
"Poor detection performance"
- Use lossless formats (WAV/FLAC) instead of lossy (MP3/OGG)
- The model was trained on WAV files, so lossy compression can affect accuracy
- Ensure MP3/OGG files use high bitrate (≥256 kbps) if you must use them
"Out of memory errors"
- Process shorter audio files
- Reduce PCEN segment length in config
- Use smaller YOLO model (e.g., yolo11n instead of yolo11l)
"No matching files in evaluation"
- Check filename formats (tools auto-normalize extensions)
- Verify ground truth CSV has correct column names
- Ensure audio filenames match between detections and labels (see the sketch after this list)
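If you want to check the matching yourself, the normalization is roughly of this kind (an illustrative sketch, not the exact logic of the evaluation tools):

```python
from pathlib import Path

def normalize_name(filename: str) -> str:
    # Drop the directory and the audio extension so that, e.g.,
    # "data/test_audio/XC1234.WAV" and "XC1234.mp3" refer to the same recording
    return Path(filename).stem.lower()

print(normalize_name("data/test_audio/XC1234.WAV") == normalize_name("XC1234.mp3"))  # True
```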
Feel free to use BirdBox for your acoustic analyses and research. If you do, please cite as:
```bibtex
@article{kahl2021birdnet,
  title={BirdNET: A deep learning solution for avian diversity monitoring},
  author={Kahl, Stefan and Wood, Connor M and Eibl, Maximilian and Klinck, Holger},
  journal={Ecological Informatics},
  volume={61},
  pages={101236},
  year={2021},
  publisher={Elsevier}
}
```

Our work in the K. Lisa Yang Center for Conservation Bioacoustics is made possible by the generosity of K. Lisa Yang to advance innovative conservation technologies to inspire and inform the conservation of wildlife and habitats.
The development of BirdNET is supported by the German Federal Ministry of Research, Technology and Space (FKZ 01|S22072), the German Federal Ministry for the Environment, Climate Action, Nature Conservation and Nuclear Safety (FKZ 67KI31040E), the German Federal Ministry of Economic Affairs and Energy (FKZ 16KN095550), the Deutsche Bundesstiftung Umwelt (project 39263/01) and the European Social Fund.
BirdNET is a joint effort of partners from academia and industry. Without these partnerships, this project would not have been possible. Thank you!


