Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
When concatenating dictionary arrays that contain nulls and whose index type is wider than 8 bits, the wrong bytes of the index buffer get zeroed out.
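Why only index types wider than 8 bits are affected: in an index buffer of w-byte integers, the element at position i occupies bytes i*w through i*w + w - 1, so the position coincides with the byte offset only when w = 1. The Python sketch below models that layout; the helper `byte_range` is hypothetical and is not Arrow's C++ code. For each integer width usable as a dictionary index (1, 2, 4 and 8 bytes), it compares the correct byte range of element 2 with the range obtained by using the position directly as a byte offset.

```python
def byte_range(position: int, width: int) -> range:
    """Byte offsets occupied by element `position` in a buffer of `width`-byte integers."""
    start = position * width
    return range(start, start + width)


for width in (1, 2, 4, 8):
    correct = byte_range(2, width)
    # Hypothetical model of the reported mistake: the position is used as a byte offset.
    naive = range(2, 2 + width)
    print(f"width={width}: correct={list(correct)} naive={list(naive)} match={correct == naive}")
```

Only the `width=1` row reports a match, which is why int8 indices do not trigger the bug.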
Example using pyarrow:
import pyarrow as pa
dictionary_type = pa.dictionary(pa.int16(), pa.string())
empty_array = pa.array([], dictionary_type)
array1 = pa.array(["a", "b", None], dictionary_type)
array2 = pa.concat_arrays([empty_array, array1])
print(array1.to_pylist())
print(array2.to_pylist())
We would expect array1 and array2 to be the same, but this prints:
['a', 'b', None]
['a', 'a', None]
This bug happens because the index type is 2 bytes wide, so the null at position 2 should result in zeroing out bytes 4-5 (0-indexed) of the index buffer. Instead, the code zeroes out bytes 2-3, because it does not take the width of the index type into account when adding the position here:
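The effect on the data can be reproduced without Arrow. The sketch below is a Python model under stated assumptions: the standard-library `array` module (typecode `"h"`, a 2-byte signed integer on common platforms) stands in for the int16 index buffer, `zero_null_slots` and `decode` are hypothetical helpers, and the `fixed=False` branch only mimics the behaviour described above (position used as a byte offset), not the actual code in concatenate.cc. The value stored under the null slot (1) is arbitrary.

```python
from array import array


def zero_null_slots(indices, validity, fixed=True):
    """Zero the index entries at null positions by writing to the raw bytes.

    fixed=False models the reported bug: the position is used as a byte offset
    instead of being scaled by the index width.
    """
    width = indices.itemsize
    raw = bytearray(indices.tobytes())  # copy; `indices` itself is not modified
    for position, valid in enumerate(validity):
        if not valid:
            offset = position * width if fixed else position
            raw[offset:offset + width] = bytes(width)
    out = array(indices.typecode)
    out.frombytes(bytes(raw))
    return out.tolist()


def decode(indices, validity, dictionary):
    """Look up each valid index in the dictionary; nulls become None."""
    return [dictionary[i] if valid else None for i, valid in zip(indices, validity)]


dictionary = ["a", "b"]
validity = [True, True, False]
indices = array("h", [0, 1, 1])  # int16 indices; the entry under the null is arbitrary

print(decode(zero_null_slots(indices, validity, fixed=False), validity, dictionary))  # ['a', 'a', None]
print(decode(zero_null_slots(indices, validity, fixed=True), validity, dictionary))   # ['a', 'b', None]
```

With `fixed=False` the model zeroes element 1 instead of element 2 and yields the same wrong result as reported above; scaling the position by the index width restores the expected values.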
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/concatenate.cc#L314-L315