@awni awni commented Dec 22, 2025

Tests are passing but I haven't done any performance tuning yet.

Benchmarking Qwen3 4B on M4 Max:

| Quantization | Prefill @ 2048 | TG @ 2048 / 128 |
| --- | --- | --- |
| mxfp8 | 1574.142 | 88.057 |
| q8 | 1606.946 | 87.512 |
| mxfp4 | 1612.251 | 141.271 |
| nvfp4 | 1608.091 | 136.408 |
| q4 | 1620.635 | 138.180 |
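
For a quick read of the numbers above, each row can be compared against the q8 row. This is a minimal sketch in plain Python, assuming both columns are throughput figures where higher is better (the comment does not state units). The `results` dictionary and the `relative_to` helper are illustrative names, not part of the PR.

```python
# Figures copied from the benchmark above: (prefill @ 2048, TG @ 2048 / 128).
# Assumption: both columns are throughputs, so a ratio above 1.0 means faster.
results = {
    "mxfp8": (1574.142, 88.057),
    "q8": (1606.946, 87.512),
    "mxfp4": (1612.251, 141.271),
    "nvfp4": (1608.091, 136.408),
    "q4": (1620.635, 138.180),
}


def relative_to(baseline, results):
    """Return each mode's (prefill, TG) ratio against the baseline row."""
    base_prefill, base_tg = results[baseline]
    return {
        name: (prefill / base_prefill, tg / base_tg)
        for name, (prefill, tg) in results.items()
    }


ratios = relative_to("q8", results)
for name, (prefill_ratio, tg_ratio) in ratios.items():
    print(f"{name:6s} prefill x{prefill_ratio:.3f}  TG x{tg_ratio:.3f}")
```

If the columns are indeed throughputs, the three 4-bit modes come out at roughly 1.56x to 1.61x the q8 TG figure, while prefill stays within about 2% of q8 for all five modes.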

@awni awni requested a review from angeloskath December 23, 2025 00:24
@angeloskath angeloskath left a comment

Awesome!

@awni awni merged commit 1eef1d1 into main Dec 23, 2025
27 of 30 checks passed
@awni awni deleted the metal_nvfp4 branch December 23, 2025 04:45
@awni awni mentioned this pull request Dec 24, 2025