Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The SIMD comparison kernels use the following pattern to handle the remainder when the array length is not a multiple of the number of vector lanes:
    if rem > 0 {
        let simd_left = T::load(left.value_slice(len - rem, lanes));
        let simd_right = T::load(right.value_slice(len - rem, lanes));
        let simd_result = op(simd_left, simd_right);
        let rem_buffer_size = (rem as f32 / 8f32).ceil() as usize;
        T::bitmask(&simd_result, |b| {
            result.extend_from_slice(&b[0..rem_buffer_size]);
        });
    }
While this avoids writing out of bounds into result, it still reads from the left and right arrays at out-of-bounds indices: each load covers a full vector of lanes elements starting at len - rem, so it extends lanes - rem elements past the end of the data. Valgrind complains about these reads. I propose to rewrite the logic to use chunked iteration, with a scalar loop for the remainder, similar to the change for the arithmetic kernels in ARROW-10914.
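As a rough illustration of the proposed shape (not the actual patch), the sketch below uses plain slices and a scalar comparison in place of the SIMD types. The function name compare_to_bitmap, the fixed LANES value of 8, and the least-significant-bit-first packing are assumptions made for this example only. The point is that chunks_exact never yields a partial chunk, so the leftover elements are only ever touched by the bounded scalar loop.

```rust
/// Hypothetical helper, not the Arrow API: compares two equal-length slices
/// element-wise and packs the boolean results into a bitmap, one bit per
/// element, least significant bit first.
fn compare_to_bitmap<T, F>(left: &[T], right: &[T], op: F) -> Vec<u8>
where
    T: Copy,
    F: Fn(T, T) -> bool,
{
    assert_eq!(left.len(), right.len());

    // Assumed stand-in for the vector width; 8 gives one output byte per chunk.
    const LANES: usize = 8;

    let mut result = Vec::with_capacity((left.len() + 7) / 8);

    let left_chunks = left.chunks_exact(LANES);
    let right_chunks = right.chunks_exact(LANES);
    // The remainders are sub-slices of the inputs, so they cannot go out of bounds.
    let left_rem = left_chunks.remainder();
    let right_rem = right_chunks.remainder();

    // Main loop: only full chunks. A SIMD kernel would load and compare here.
    for (l, r) in left_chunks.zip(right_chunks) {
        let mut byte = 0u8;
        for i in 0..LANES {
            byte |= (op(l[i], r[i]) as u8) << i;
        }
        result.push(byte);
    }

    // Scalar loop for the remainder: reads exactly `rem` elements, no more.
    if !left_rem.is_empty() {
        let mut byte = 0u8;
        for (i, (&a, &b)) in left_rem.iter().zip(right_rem).enumerate() {
            byte |= (op(a, b) as u8) << i;
        }
        result.push(byte);
    }

    result
}
```

For two inputs of length 10, this produces two bytes: one from the single full chunk and one from the two-element remainder, with the unused high bits of the last byte left as zero.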