feat: add Qwen3-VL support with image and video token calculation #16

Merged
thisisiron merged 2 commits into main from feat/add-qwen3-vl-support
Jan 8, 2026

@thisisiron
Owner

Summary

  • Implement Qwen3VLAnalyst with proper video token calculation logic
  • Add smart_resize_video() function for Qwen3-VL's video preprocessing
  • Add qwen3-vl tests for both image and video token counting
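
For readers new to this kind of calculator, the arithmetic behind a vision token count can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names and the default values for `patch_size`, `temporal_patch_size`, and `merge_size` are assumptions, and any special or timestamp tokens a processor may add around the visual tokens are ignored.

```python
def count_image_tokens_sketch(height, width, patch_size=16, merge_size=2):
    """Estimate visual tokens for one already-resized image.

    Hypothetical helper: patch_size and merge_size are assumed defaults.
    """
    grid_h = height // patch_size  # patches along the height
    grid_w = width // patch_size   # patches along the width
    # merge_size x merge_size neighbouring patches collapse into one token
    return (grid_h * grid_w) // (merge_size ** 2)


def count_video_tokens_sketch(num_frames, height, width,
                              patch_size=16, temporal_patch_size=2,
                              merge_size=2):
    """Estimate visual tokens for an already-sampled, already-resized video.

    Hypothetical helper: all three size parameters are assumed defaults.
    """
    grid_t = num_frames // temporal_patch_size  # frames grouped in time
    grid_h = height // patch_size
    grid_w = width // patch_size
    return grid_t * (grid_h * grid_w) // (merge_size ** 2)


if __name__ == "__main__":
    print(count_image_tokens_sketch(512, 512))
    print(count_video_tokens_sketch(8, 288, 384))
```

Under these assumptions the video count scales with the number of temporal groups, which is why frame sampling and resizing both have to be reproduced before the count can match the processor.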

Changes

src/vt_calculator/analysts/analyst.py

  • Added a Qwen3VLAnalyst class that overrides the calculate_video() method
  • Qwen3-VL uses different video processing parameters:
    • min_frames=4, max_frames=768
    • Reads pixel limits from video_processor.size instead of the image_processor settings
    • ASSUMED_SOURCE_FPS=24.0 is used as a fallback when the video's fps is unknown
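
How these parameters might interact when deciding the number of sampled frames can be sketched as below. This is an illustration under stated assumptions, not the body of calculate_video(): the function name, the `target_fps=2.0` default, and the exact clamping order are hypothetical. Only min_frames, max_frames, and ASSUMED_SOURCE_FPS come from this PR.

```python
MIN_FRAMES = 4             # from the PR description
MAX_FRAMES = 768           # from the PR description
ASSUMED_SOURCE_FPS = 24.0  # from the PR description


def estimate_sampled_frames(total_frames, source_fps=None, target_fps=2.0):
    """Estimate how many frames are kept after sampling.

    Hypothetical helper: target_fps and the clamping order are assumptions.
    """
    # Fall back to the assumed fps when the container does not report one.
    fps = source_fps if source_fps and source_fps > 0 else ASSUMED_SOURCE_FPS
    # Duration in seconds times the sampling rate gives the wanted frames.
    wanted = int(total_frames / fps * target_fps)
    # Clamp to [MIN_FRAMES, MAX_FRAMES], never exceeding what the video has.
    return min(max(wanted, MIN_FRAMES), MAX_FRAMES, total_frames)


if __name__ == "__main__":
    print(estimate_sampled_frames(240, source_fps=24.0))
    print(estimate_sampled_frames(240))  # fps unknown, fallback applies
```

The final `min(..., total_frames)` keeps very short clips from being asked for more frames than they contain.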

src/vt_calculator/analysts/tools.py

  • Added a smart_resize_video() function that accounts for the temporal dimension (frame count) when resizing video
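
One way a temporally aware resize could work is sketched below. It is not the smart_resize_video() added in this PR: the name `smart_resize_video_sketch`, the signature, the default values, and the rounding rules are all assumptions chosen for illustration. The idea is that the pixel budget applies to frames × height × width rather than to a single frame.

```python
import math


def smart_resize_video_sketch(num_frames, height, width, factor=32,
                              temporal_factor=2, min_pixels=128 * 128,
                              max_pixels=768 * 768 * 8):
    """Return (height, width) snapped to `factor` within a pixel budget.

    Hypothetical sketch: every default here is an assumed value.
    """
    # Snap each spatial side to the nearest multiple of `factor`.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    # Round the frame count up to a multiple of `temporal_factor`.
    t_bar = math.ceil(num_frames / temporal_factor) * temporal_factor

    volume = t_bar * h_bar * w_bar
    if volume > max_pixels:
        # Shrink both sides by the same ratio, rounding down to stay under.
        beta = math.sqrt((num_frames * height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif volume < min_pixels:
        # Grow both sides by the same ratio, rounding up to reach the floor.
        beta = math.sqrt(min_pixels / (num_frames * height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


if __name__ == "__main__":
    print(smart_resize_video_sketch(8, 480, 640, max_pixels=1_000_000))
```

Because the budget is shared across frames, a longer clip ends up with smaller per-frame dimensions than a shorter clip of the same source resolution.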

tests/test_analysts.py

  • Added Qwen3VLAnalyst import
  • Added qwen3-vl test case for image token counting
  • Added qwen3-vl-video test case for video token counting

Test Results

All 8 tests pass:

tests/test_analysts.py::test_analyst_token_count_matches_transformers[qwen2.5-vl] PASSED
tests/test_analysts.py::test_analyst_token_count_matches_transformers[qwen3-vl] PASSED
tests/test_analysts.py::test_analyst_token_count_matches_transformers[internvl3] PASSED
tests/test_analysts.py::test_analyst_token_count_matches_transformers[llava] PASSED
tests/test_analysts.py::test_analyst_token_count_matches_transformers[llava-next] PASSED
tests/test_analysts.py::test_analyst_token_count_matches_transformers[llava-onevision] PASSED
tests/test_analysts.py::test_analyst_video_token_count_matches_transformers[qwen2.5-vl-video] PASSED
tests/test_analysts.py::test_analyst_video_token_count_matches_transformers[qwen3-vl-video] PASSED

- Implement Qwen3VLAnalyst with proper video token calculation logic
- Add smart_resize_video() for Qwen3-VL's video preprocessing
- Add qwen3-vl tests for both image and video token counting
- Qwen3-VL uses different video processing: min_frames=4, max_frames=768
  and video_processor settings for pixel limits
@thisisiron
Owner Author

/style

@github-actions
Contributor

github-actions bot commented Jan 8, 2026

Style fixes have been applied and pushed!

@thisisiron thisisiron merged commit 4f3d9d8 into main Jan 8, 2026
@thisisiron thisisiron deleted the feat/add-qwen3-vl-support branch January 8, 2026 12:07