Skip to content

[missed-opt] Naive per-bit loop not optimized to popcnt #172193

@purplesyringa

Description

@purplesyringa

Godbolt

In this snippet, popcount_naive is lowered as an unrolled loop, while popcount_bit_tricks is optimized to a popcnt instruction. This seems kind of backwards -- I would've expected a simpler loop to get recognized as an idiom than the more complex one.

int popcount_naive(unsigned long x) {
    int result = 0;
    for (int i = 0; i < 64; i++) {
        result += (x >> i) & 1;
    }
    return result;
}

int popcount_bit_tricks(unsigned long x) {
    int result = 0;
    while (x) {
        result++;
        x &= x - 1;
    }
    return result;
}

Though I imagine improving this would be non-trivial, since you'd have to recognize this either before the loop is unrolled or somehow find all the accesses later...

Bonus:

int popcount_divide_and_conquer(unsigned long x) {
    x = (x & 0x5555555555555555) + ((x >> 1 ) & 0x5555555555555555);
    x = (x & 0x3333333333333333) + ((x >> 2 ) & 0x3333333333333333);
    x = (x & 0x0f0f0f0f0f0f0f0f) + ((x >> 4 ) & 0x0f0f0f0f0f0f0f0f);
    x = (x & 0x00ff00ff00ff00ff) + ((x >> 8 ) & 0x00ff00ff00ff00ff);
    x = (x & 0x0000ffff0000ffff) + ((x >> 16) & 0x0000ffff0000ffff);
    x = (x & 0x00000000ffffffff) + ((x >> 32) & 0x00000000ffffffff);
    return x;
}

This is another popcount idiom, and it's always more performant to optimize it to popcnt if it's available.

It also seems reasonable to optimize popcount_naive to popcount_divide_and_conquer on targets without popcnt. (Optimizing popcount_bit_tricks to popcount_divide_and_conquer is more iffy, since the latter has worse performance on numbers with low popcount.)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions