Hi,
First of all, thanks for creating (and open-sourcing) this swift code! It looks great!
I was looking through the AVX512F SIMD wrappers in vector.h and noticed a few that refer to intrinsics that are not available in AVX512F, or that could be implemented better. In particular, vec_and maps to _mm512_and_ps, which is not part of AVX512F: the Intel Intrinsics Guide lists it under AVX512DQ. From the looks of it, AVX512F only provides bitwise and/or for masks and for integer vectors (e.g. _mm512_and_epi32), not for the floating-point types.
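For what it's worth, here is a minimal sketch of how a float AND could be done with AVX512F alone: reinterpret the vectors as 32-bit integers, use _mm512_and_epi32, and reinterpret back (the casts themselves generate no instructions). The function names, the scalar fallback and the runtime dispatch are my own hypothetical additions, not anything from vector.h; the sketch assumes GCC or Clang, whose target attribute and __builtin_cpu_supports let it build without -mavx512f.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: bitwise AND of 16 floats using only AVX512F.
 * _mm512_and_ps would need AVX512DQ, so go through the integer domain. */
__attribute__((target("avx512f")))
static void and16_avx512f(const float *a, const float *b, float *out) {
  __m512i va = _mm512_castps_si512(_mm512_loadu_ps(a)); /* free reinterpret */
  __m512i vb = _mm512_castps_si512(_mm512_loadu_ps(b));
  _mm512_storeu_ps(out, _mm512_castsi512_ps(_mm512_and_epi32(va, vb)));
}
#endif

/* Portable fallback: same bit-level AND, one element at a time. */
static void and16_scalar(const float *a, const float *b, float *out) {
  for (int i = 0; i < 16; ++i) {
    uint32_t x, y;
    memcpy(&x, &a[i], sizeof x);
    memcpy(&y, &b[i], sizeof y);
    x &= y;
    memcpy(&out[i], &x, sizeof x);
  }
}

/* Hypothetical public entry point: uses AVX512F only if the CPU has it. */
void and16_floats(const float *a, const float *b, float *out) {
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("avx512f")) {
    and16_avx512f(a, b, out);
    return;
  }
#endif
  and16_scalar(a, b, out);
}
```

For example, ANDing with a vector whose lanes all hold the bit pattern 0x7fffffff clears the sign bits, turning -1.0f, -2.0f, ... into 1.0f, 2.0f, ...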
I also saw that vec_fabs is implemented with two intrinsics. Is the new _mm512_abs_ps intrinsic too slow?
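To make the comparison concrete, here is a small sketch that computes |x| both ways: a two-step form (an integer AND with 0x7fffffff, which is only my guess at the kind of two-intrinsic sequence meant, not necessarily what vector.h does) and the single AVX512F intrinsic _mm512_abs_ps. As far as I can tell, recent GCC headers define _mm512_abs_ps as exactly that integer AND, so I would not expect a speed difference. The function names and the runtime dispatch are hypothetical, and the sketch assumes GCC or Clang.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: |x| for 16 floats, computed two ways. */
__attribute__((target("avx512f")))
static void fabs16_avx512f(const float *in, float *out_and, float *out_abs) {
  __m512 v = _mm512_loadu_ps(in);
  /* Two-step form: clear the sign bit with an AVX512F integer AND. */
  __m512i sign_off = _mm512_set1_epi32(0x7fffffff);
  __m512 r_and = _mm512_castsi512_ps(
      _mm512_and_epi32(_mm512_castps_si512(v), sign_off));
  /* Single AVX512F intrinsic. */
  __m512 r_abs = _mm512_abs_ps(v);
  _mm512_storeu_ps(out_and, r_and);
  _mm512_storeu_ps(out_abs, r_abs);
}
#endif

/* Portable fallback: clear the sign bit of each element (no libm needed). */
static void fabs16_scalar(const float *in, float *out_and, float *out_abs) {
  for (int i = 0; i < 16; ++i) {
    uint32_t u;
    memcpy(&u, &in[i], sizeof u);
    u &= 0x7fffffffu;
    memcpy(&out_and[i], &u, sizeof u);
    memcpy(&out_abs[i], &u, sizeof u);
  }
}

/* Hypothetical entry point: both outputs should always be identical. */
void fabs16_both(const float *in, float *out_and, float *out_abs) {
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("avx512f")) {
    fabs16_avx512f(in, out_and, out_abs);
    return;
  }
#endif
  fabs16_scalar(in, out_and, out_abs);
}
```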
I am also curious: I do not see any references to the mask(z)_load intrinsics (e.g. _mm512_maskz_loadu_ps). I found these masked loads quite useful for staying in SIMD mode and eliminating the serial part of the code, i.e. the remainder loop for array lengths that are not divisible by the SIMD width.
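As an illustration of what I mean, here is a minimal sketch that sums an array with AVX512F and handles the leftover 0 to 15 elements with a single _mm512_maskz_loadu_ps instead of a scalar remainder loop. The masked-out lanes are zeroed and their memory is not accessed, so reading past the end of the array does not fault. The function names, the scalar fallback and the runtime dispatch are hypothetical additions for illustration, assuming GCC or Clang.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: sum of n floats, no scalar remainder loop. */
__attribute__((target("avx512f")))
static float sum_avx512f(const float *x, size_t n) {
  __m512 acc = _mm512_setzero_ps();
  size_t i = 0;
  for (; i + 16 <= n; i += 16)          /* full 16-lane chunks */
    acc = _mm512_add_ps(acc, _mm512_loadu_ps(x + i));
  size_t rem = n - i;                   /* 0..15 elements left over */
  if (rem) {
    /* Low `rem` bits set: load only those lanes, zero the rest. */
    __mmask16 m = (__mmask16)((1u << rem) - 1u);
    acc = _mm512_add_ps(acc, _mm512_maskz_loadu_ps(m, x + i));
  }
  return _mm512_reduce_add_ps(acc);     /* horizontal sum of 16 lanes */
}
#endif

/* Portable fallback. */
static float sum_scalar(const float *x, size_t n) {
  float s = 0.0f;
  for (size_t i = 0; i < n; ++i) s += x[i];
  return s;
}

/* Hypothetical entry point: uses AVX512F only if the CPU has it. */
float sum_floats(const float *x, size_t n) {
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("avx512f")) return sum_avx512f(x, n);
#endif
  return sum_scalar(x, n);
}
```

Note that the vector version adds the elements in a different order than the scalar loop, so for general data the two results can differ in the last bits.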
Once again, the performance gains look awesome!