Thanks for your great work!
I fine-tuned Qwen-2.5-VL-7B-Instruct on the open-sourced 47k cold-start dataset for 5 epochs, using an initial learning rate of 2e-5 with cosine learning-rate decay. However, I cannot reproduce the results of the official ReVisual-R1-Coldstart checkpoint. With max_new_tokens set to 20k, the official vs. reproduced scores are 55.1 vs. 46 on MMMU and 48.9 vs. 35.2 on MathVision.
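For reference, here is a minimal sketch of the learning-rate schedule described above: cosine decay from a peak of 2e-5. The function name `cosine_lr`, the zero-warmup default, the zero minimum learning rate, and the step count in the usage lines are assumptions for illustration only. They are not confirmed settings of the reproduction run or of the official recipe.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-5,
              warmup_steps: int = 0, min_lr: float = 0.0) -> float:
    """Learning rate at `step`: optional linear warmup, then cosine decay.

    Assumed defaults (not taken from the official recipe): no warmup,
    decay all the way down to min_lr = 0.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count, chosen only to show the shape of the curve.
total = 1000
for s in (0, total // 2, total):
    print(s, cosine_lr(s, total))
```

Warmup and the minimum learning rate are exposed as parameters because the description above does not pin those details down.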
Best,
Jay