Add grounding flops tool#12430

Open
0xyangl wants to merge 6 commits into open-mmlab:main from 0xyangl:add-grounding-flops-tool

@0xyangl 0xyangl commented Feb 10, 2026

Motivation

The existing tools/analysis_tools/get_flops.py does not support grounding / vision-language detection models (e.g., GroundingDINO) because these models require text inputs and have multi-modal architectures that cannot be traced end-to-end with mmengine.analysis.get_model_complexity_info.

This PR adds a dedicated FLOPs analysis tool that handles the unique architecture of grounding detection models, providing per-component FLOPs and parameter breakdowns.

Modification

New file: tools/analysis_tools/get_flops_grounding.py

A script that computes per-component FLOPs and parameter counts for grounding detection models:

  • Vision Backbone: Accurate FLOPs via fvcore.nn.FlopCountAnalysis
  • Text Encoder: Estimated FLOPs based on model type (CLIP, BERT, etc.)
  • Neck (ChannelMapper): Estimated from config-driven channel/stride info
  • Transformer Encoder/Decoder: Estimated from config-driven architecture params
  • Detection Head: Parameter count
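The "estimated" entries above can be pictured with a small analytic sketch. It is not the PR's code: the function names are hypothetical, it counts one multiply-accumulate as one FLOP, and it assumes a dense self-attention encoder layer, which is a simplification for models that use deformable attention.

```python
import math


def conv_macs(height, width, c_in, c_out, kernel_size=1):
    """Multiply-accumulates of one conv layer on a height x width output map."""
    return height * width * c_in * c_out * kernel_size * kernel_size


def neck_macs(input_hw, in_channels, strides, out_channels, kernel_size=1):
    """Sum conv cost over feature levels, one conv per (channels, stride) pair.

    Hypothetical helper: the channel and stride lists stand in for values
    that would be read from the model config.
    """
    height, width = input_hw
    total = 0
    for c_in, stride in zip(in_channels, strides):
        level_h = math.ceil(height / stride)
        level_w = math.ceil(width / stride)
        total += conv_macs(level_h, level_w, c_in, out_channels, kernel_size)
    return total


def encoder_layer_macs(num_tokens, embed_dim, ffn_dim):
    """Dense-attention estimate for one transformer encoder layer."""
    projections = 4 * num_tokens * embed_dim ** 2   # Q, K, V and output proj
    attention = 2 * num_tokens ** 2 * embed_dim     # QK^T and weights @ V
    feed_forward = 2 * num_tokens * embed_dim * ffn_dim  # two linear layers
    return projections + attention + feed_forward
```

Multiplying `encoder_layer_macs` by the configured number of layers gives a rough per-component total; biases, normalization, and activations are ignored here.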

Key design choices:

  • Automatically disables with_cp (gradient checkpointing), which is incompatible with JIT tracing, without modifying the original config
  • Reads architecture parameters (channels, layers, embed_dim, etc.) dynamically from the model config instead of hardcoding
  • Uses MMLogger consistent with existing mmdet tools
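One way to turn off with_cp without touching the original config is to work on a deep copy. The sketch below is only an illustration under assumptions: a plain nested dict stands in for the loaded config, and `disable_with_cp` is a hypothetical name, not a function taken from the PR.

```python
import copy


def disable_with_cp(cfg):
    """Return a deep copy of cfg with every with_cp flag set to False.

    The input is left unchanged, so the original config can still be
    used for training with gradient checkpointing enabled.
    """
    patched = copy.deepcopy(cfg)

    def _walk(node):
        if isinstance(node, dict):
            if "with_cp" in node:
                node["with_cp"] = False
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for value in node:
                _walk(value)

    _walk(patched)
    return patched


original = {
    "backbone": {"type": "SwinTransformer", "with_cp": True},
    "neck": {"layers": [{"with_cp": True}]},
}
patched = disable_with_cp(original)
```

After the call, `original` still has `with_cp` enabled while `patched` does not, so tracing can run on a model built from `patched`.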

New file: tests/test_tools/test_get_flops_grounding.py

41 unit tests covering all helper functions and config readers.

BC-breaking

No. This PR only adds new files and does not modify any existing code.

Use cases

# Basic usage
python tools/analysis_tools/get_flops_grounding.py \
    configs/mm_grounding_dino/grounding_dino_swin-t_finetune_8xb4_20e_cat.py

# Custom input shape
python tools/analysis_tools/get_flops_grounding.py <config> --shape 640 640

# Specify number of classes for text encoder FLOPs estimation
python tools/analysis_tools/get_flops_grounding.py <config> --num-classes 80
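
For reference, a command line like the ones above could be parsed as follows. This is a hedged sketch, not the script's actual parser: only the positional config and the `--shape` and `--num-classes` flags come from the examples, while the defaults of `None` and the help strings are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the use cases."""
    parser = argparse.ArgumentParser(
        description="Per-component FLOPs for grounding detection models")
    parser.add_argument("config", help="path to the model config file")
    # Two integers, as in `--shape 640 640`; the default is a placeholder.
    parser.add_argument("--shape", type=int, nargs=2, default=None,
                        help="input image shape as two integers")
    # Used for the text encoder estimate; the default is a placeholder.
    parser.add_argument("--num-classes", type=int, default=None,
                        help="number of class names fed to the text encoder")
    return parser


args = build_parser().parse_args(
    ["some_config.py", "--shape", "640", "640", "--num-classes", "80"])
```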

CLAassistant commented Feb 10, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ 0xyangl
❌ lauriebax

@0xyangl 0xyangl force-pushed the add-grounding-flops-tool branch from 96b9b2a to 3f1c603 Compare February 10, 2026 02:54
@0xyangl 0xyangl marked this pull request as draft February 10, 2026 03:38
@0xyangl 0xyangl marked this pull request as ready for review February 10, 2026 06:05
lauriebax and others added 2 commits February 10, 2026 09:13
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>