Reproducing the reported results with the open-sourced 47k cold-start dataset #16

Thanks for your great work!

I fine-tuned Qwen-2.5-VL-7B-Instruct on the open-sourced 47k cold-start dataset for 5 epochs, with an initial learning rate of 2e-5 and cosine learning-rate decay. However, I cannot reproduce the results of the official ReVisual-R1-Coldstart checkpoint. For example, with max_new_tokens set to 20k, the official vs. reproduced scores are 55.1 vs. 46 on MMMU and 48.9 vs. 35.2 on MathVision.
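For clarity, by "cosine decay" I mean the standard schedule below. This is a minimal sketch, assuming no warmup and decay all the way to zero. `cosine_lr` is a hypothetical helper, and the exact sample count (47,000) and the global batch size of 128 used to derive the step count are illustrative assumptions only:

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-5, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr over total_steps (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps past the end of training stay at min_lr.
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Illustrative step count: 5 epochs over an assumed 47,000 samples
# at an assumed global batch size of 128.
steps_per_epoch = math.ceil(47_000 / 128)
total_steps = 5 * steps_per_epoch

for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

Because the per-step learning rate depends on the total number of steps, differences in global batch size, warmup, or the minimum learning rate would change the effective schedule, which may be one source of the gap.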

Best,
  Jay
