Loop vectorizer generates inefficient code #172217

@zijinshanren

https://godbolt.org/z/MPPnvT5h8

Consider this simple code, which byte-swaps an array of int64_t in four ways:

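#include <bit>      // std::byteswap (C++23)
#include <cstddef>  // size_t
#include <cstdint>  // int64_t
#include <span>     // std::span

void swap_ptr_impl(int64_t* ptr, size_t len) {
    for (size_t i = 0; i < len; i++) {
        ptr[i] = std::byteswap(ptr[i]);
    }
}
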
void swap_ptr2_impl(int64_t* ptr, size_t len) {
    auto end = ptr + len;
    for (; ptr < end; ptr++) {
        *ptr = std::byteswap(*ptr);
    }
}

void swap_span_impl(std::span<int64_t> sp) {
    for (auto& x : sp) {
        x = std::byteswap(x);
    }
}

void swap_span_2(std::span<int64_t, 1024> sp) {
    for (auto& x : sp) {
        x = std::byteswap(x);
    }
}

swap_ptr_impl is about 2x slower than swap_ptr2_impl and swap_span_impl on an i9-14900KF; on quickbench it is 2.8x slower.
swap_span_2 (span length known at compile time) is also about 2x slower.

Run on (32 X 3187 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 2048 KiB (x16)
  L3 Unified 36864 KiB (x1)
------------------------------------------------------
Benchmark            Time             CPU   Iterations
------------------------------------------------------
swap_ptr           400 ns          390 ns      1723077
swap_ptr2          184 ns          180 ns      4072727
swap_span          176 ns          165 ns      4072727
swap_span_2        403 ns          399 ns      1723077

With -fno-vectorize, the results are reasonable: all four functions run at about the same speed.

Run on (32 X 3187 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 2048 KiB (x16)
  L3 Unified 36864 KiB (x1)
------------------------------------------------------
Benchmark            Time             CPU   Iterations
------------------------------------------------------
swap_ptr           181 ns          184 ns      4072727
swap_ptr2          181 ns          180 ns      3733333
swap_span          173 ns          172 ns      3733333
swap_span_2        175 ns          173 ns      4072727

So I assume something is wrong in the loop vectorizer. The problem reproduces with clang 17 and later.
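
For anyone who wants to reproduce the comparison without the Godbolt link, here is a minimal timing sketch. It is not the harness that produced the numbers above (that output looks like Google Benchmark). The names `bswap64`, `swap_index`, `swap_pointer`, `ns_per_call`, and `variants_agree` are hypothetical, the 1024-element buffer is an assumption taken from the `std::span<int64_t, 1024>` signature, and the `__builtin_bswap64` fallback and the inline-asm barrier assume GCC or Clang.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>
#if __has_include(<version>)
#  include <version>  // feature-test macros such as __cpp_lib_byteswap
#endif

#if defined(__cpp_lib_byteswap)
#  include <bit>
static inline std::int64_t bswap64(std::int64_t v) { return std::byteswap(v); }
#else
// Assumption: GCC/Clang builtin used when std::byteswap (C++23) is unavailable.
static inline std::int64_t bswap64(std::int64_t v) {
    return static_cast<std::int64_t>(
        __builtin_bswap64(static_cast<std::uint64_t>(v)));
}
#endif

// Index-based loop, same shape as swap_ptr_impl in the report.
void swap_index(std::int64_t* ptr, std::size_t len) {
    for (std::size_t i = 0; i < len; i++) {
        ptr[i] = bswap64(ptr[i]);
    }
}

// Pointer-based loop, same shape as swap_ptr2_impl in the report.
void swap_pointer(std::int64_t* ptr, std::size_t len) {
    auto end = ptr + len;
    for (; ptr < end; ptr++) {
        *ptr = bswap64(*ptr);
    }
}

// Fill a buffer with arbitrary but deterministic values.
static std::vector<std::int64_t> make_input(std::size_t len) {
    std::vector<std::int64_t> buf(len);
    for (std::size_t i = 0; i < len; i++) {
        buf[i] = static_cast<std::int64_t>(
            static_cast<std::uint64_t>(i) * 0x9E3779B97F4A7C15ULL);
    }
    return buf;
}

// Average wall-clock nanoseconds per call of fn over `reps` repetitions.
template <class Fn>
double ns_per_call(Fn fn, std::size_t len, int reps) {
    assert(reps > 0);
    std::vector<std::int64_t> buf = make_input(len);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++) {
        fn(buf.data(), buf.size());
        // Compiler barrier (GCC/Clang extension) so the stores are kept.
        asm volatile("" : : "r"(buf.data()) : "memory");
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / reps;
}

// Sanity check: both loop shapes must produce identical buffers.
bool variants_agree(std::size_t len) {
    std::vector<std::int64_t> a = make_input(len);
    std::vector<std::int64_t> b = a;
    swap_index(a.data(), len);
    swap_pointer(b.data(), len);
    return a == b;
}
```

To compare, call something like `ns_per_call(swap_index, 1024, 1000000)` and `ns_per_call(swap_pointer, 1024, 1000000)` from `main`, build once with `-O3` and once with `-O3 -fno-vectorize`, and check whether the gap between the two loop shapes disappears in the second build. Because the functions are passed by pointer here, inlining may differ from the original benchmark, so treat the absolute numbers as rough.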
