LR / BSZ grid for MoE reference model #162

@aaronkl

Description

Find a good combination of learning rate (LR) and global batch size (BSZ) for the Qwen3 0.6BA100M MoE reference model.
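As a starting point, the search could be laid out as a plain Cartesian grid over the two hyperparameters. The sketch below is only an illustration: the candidate values, the batch-size unit, and the helper names `build_grid` and `best_config` are all hypothetical, and the training harness that would produce the losses is not shown.

```python
from itertools import product

# Hypothetical candidate values; the issue does not specify the actual grid.
LEARNING_RATES = [1e-4, 3e-4, 1e-3, 3e-3]
GLOBAL_BATCH_SIZES = [64, 128, 256, 512]  # unit (sequences) is an assumption


def build_grid(learning_rates, batch_sizes):
    """Return one config dict per (learning rate, global batch size) pair."""
    return [
        {"learning_rate": lr, "global_batch_size": bsz}
        for lr, bsz in product(learning_rates, batch_sizes)
    ]


def best_config(results):
    """Return the config with the lowest loss.

    `results` is a list of (config, loss) pairs collected from training
    runs; how those runs are launched is outside this sketch.
    """
    return min(results, key=lambda pair: pair[1])[0]


grid = build_grid(LEARNING_RATES, GLOBAL_BATCH_SIZES)
```

Each entry in `grid` would correspond to one training run. Once every run reports a final validation loss, passing the `(config, loss)` pairs to `best_config` selects the winner. A log-spaced learning rate axis is a common choice, but the right range for this model is something the grid itself has to establish.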

Metadata

Projects

Status

In Progress
