Hi, thanks for your great work on “R1-VL: Learning to Reason with Multimodal LMs via Step-wise GRPO”.
In the paper/supplementary material, you mention the Qwen2-VL-7B-GRPO model.
Could you please release this checkpoint, or share whether there are plans to release it?
Thanks!