-
Notifications
You must be signed in to change notification settings - Fork 15.5k
Description
In this snippet, popcount_naive is lowered as an unrolled loop, while popcount_bit_tricks is optimized to a popcnt instruction. This seems kind of backwards -- I would've expected a simpler loop to get recognized as an idiom than the more complex one.
int popcount_naive(unsigned long x) {
int result = 0;
for (int i = 0; i < 64; i++) {
result += (x >> i) & 1;
}
return result;
}
int popcount_bit_tricks(unsigned long x) {
int result = 0;
while (x) {
result++;
x &= x - 1;
}
return result;
}Though I imagine improving this would be non-trivial, since you'd have to recognize this either before the loop is unrolled or somehow find all the accesses later...
Bonus:
int popcount_divide_and_conquer(unsigned long x) {
x = (x & 0x5555555555555555) + ((x >> 1 ) & 0x5555555555555555);
x = (x & 0x3333333333333333) + ((x >> 2 ) & 0x3333333333333333);
x = (x & 0x0f0f0f0f0f0f0f0f) + ((x >> 4 ) & 0x0f0f0f0f0f0f0f0f);
x = (x & 0x00ff00ff00ff00ff) + ((x >> 8 ) & 0x00ff00ff00ff00ff);
x = (x & 0x0000ffff0000ffff) + ((x >> 16) & 0x0000ffff0000ffff);
x = (x & 0x00000000ffffffff) + ((x >> 32) & 0x00000000ffffffff);
return x;
}This is another popcount idiom, and it's always more performant to optimize it to popcnt if it's available.
It also seems reasonable to optimize popcount_naive to popcount_divide_and_conquer on targets without popcnt. (Optimizing popcount_bit_tricks to popcount_divide_and_conquer is more iffy, since the latter has worse performance on numbers with low popcount.)