[Type] Mat: better cache locality for operator*(Mat) #5921
Change the memory access pattern in operator*(Mat) for better cache locality (suggested by AI).
TL;DR:
The Mat<3,3> version does not change because it has its own optimized specialization.
The bigger the matrices, the bigger the gain (for Mat24x24, a 400% speedup with floats!).
macOS has a weird quirk for Mat6x6 with doubles, which is 50% slower 🤔, maybe due to a failed vectorization or something.
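For readers unfamiliar with this kind of change, here is a minimal sketch of one common way to improve cache locality in a matrix product: reordering the loops so that the innermost loop walks contiguous memory. It is not the actual diff of this PR. `SimpleMat`, `multiplyNaive` and `multiplyRowWise` are hypothetical names, and the sketch assumes row-major storage.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a fixed-size, row-major matrix (L rows, C columns).
// This is NOT SOFA's Mat class, only an illustration.
template <std::size_t L, std::size_t C, typename Real>
using SimpleMat = std::array<std::array<Real, C>, L>;

// Textbook i-j-k order: the innermost loop reads b[k][j] for increasing k,
// i.e. it walks down a column of b and jumps one full row between reads.
template <std::size_t L, std::size_t C, std::size_t P, typename Real>
SimpleMat<L, P, Real> multiplyNaive(const SimpleMat<L, C, Real>& a,
                                    const SimpleMat<C, P, Real>& b)
{
    SimpleMat<L, P, Real> r{}; // value-initialized to zero
    for (std::size_t i = 0; i < L; ++i)
        for (std::size_t j = 0; j < P; ++j)
            for (std::size_t k = 0; k < C; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// i-k-j order: the innermost loop reads b[k][j] and writes r[i][j] for
// increasing j, so both accesses are contiguous in row-major storage.
template <std::size_t L, std::size_t C, std::size_t P, typename Real>
SimpleMat<L, P, Real> multiplyRowWise(const SimpleMat<L, C, Real>& a,
                                      const SimpleMat<C, P, Real>& b)
{
    SimpleMat<L, P, Real> r{}; // value-initialized to zero
    for (std::size_t i = 0; i < L; ++i)
        for (std::size_t k = 0; k < C; ++k)
        {
            const Real aik = a[i][k]; // reused across the whole row of b
            for (std::size_t j = 0; j < P; ++j)
                r[i][j] += aik * b[k][j];
        }
    return r;
}
```

Both functions compute the same product; only the traversal order differs. Contiguous inner loops are usually easier for compilers to vectorize, but whether that happens depends on the compiler, the scalar type and the matrix size, so it should be measured rather than assumed.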
Timings:
Ubuntu 22.04, gcc12, lto, O3
Windows VS2026, release, lto
macOS, xcode 26, lto
By submitting this pull request, I acknowledge that
I have read, understand, and agree to the SOFA Developer Certificate of Origin (DCO).
Reviewers will merge this pull-request only if