Details
Bug
Status: Resolved
Minor
Resolution: Fixed
7.0.0
Description
Hi! I noticed this bug when running the following code:
import pyarrow as pa
arr = pa.array([None, [0]])
reconstructed_arr = pa.ListArray.from_arrays(arr.offsets, arr.values)
print(reconstructed_arr.to_pylist())
# [[], [0]]
The resulting array, reconstructed from the offsets and values of the original array, is not the same as the original array.
This seems to happen because `arr.offsets` is wrong: it returns `[0, 0, 1]` instead of `[None, 0, 1]`:
print(arr.offsets.to_pylist())
# [0, 0, 1]
fixed_reconstructed_arr = pa.ListArray.from_arrays(pa.array([None, 0, 1]), arr.values)
print(fixed_reconstructed_arr.to_pylist())
# [None, [0]]
In case it helps, here is my investigation:
The offsets seem to be wrong because they don't include the validity bitmap from `arr.buffers()[0]`, which is used to say which values are null and which values are non-null. Therefore the `None` is replaced by `0`.
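To illustrate the behaviour I would expect, here is a minimal pure-Python sketch that applies per-list validity flags to the raw offsets. The helper name `mask_offsets` is hypothetical and is not part of pyarrow; it only models the idea, assuming one validity flag per list and one extra trailing offset.

```python
from typing import List, Optional, Sequence


def mask_offsets(offsets: Sequence[int], validity: Sequence[bool]) -> List[Optional[int]]:
    """Replace the start offset of every null list with None.

    Hypothetical helper, not a pyarrow API. `offsets` must have
    len(validity) + 1 entries; the final offset has no validity flag
    of its own and is always kept.
    """
    if len(offsets) != len(validity) + 1:
        raise ValueError("expected exactly one more offset than validity flags")
    # zip stops after len(validity) items, so the last offset is handled separately.
    masked: List[Optional[int]] = [
        off if valid else None for off, valid in zip(offsets, validity)
    ]
    masked.append(offsets[-1])
    return masked


# Raw offsets and validity flags for [None, [0]]
print(mask_offsets([0, 0, 1], [False, True]))  # [None, 0, 1]
```

The result matches the `[None, 0, 1]` offsets that rebuild the original array with `pa.ListArray.from_arrays` in the snippet above.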
Although the validity bitmap is not taken into account at all, I checked its value anyway, and it was not what I expected: the validity bitmap at `arr.buffers()[0]` is supposed to be `110` (in order to mask the None in `[None, 0, 1]`) but it is `10` for some reason:
bin(int(arr.buffers()[0].hex(), 16))
# '0b10'
# I think it should be 0b110 - 1 corresponds to non-null and 0 corresponds to null, if you take the bits in reverse order
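For reference, here is a small pure-Python sketch of how such a bitmap can be decoded. It assumes Arrow's least-significant-bit-first ordering and one validity bit per array element; `decode_validity` is a hypothetical helper, not a pyarrow function. Under those assumptions a two-element list array has three offsets but only two validity bits, so `0b10` may in fact be consistent with `[None, [0]]`.

```python
from typing import List


def decode_validity(bitmap: bytes, length: int) -> List[bool]:
    """Decode `length` validity flags from a bitmap, least-significant bit first.

    Hypothetical helper for illustration. True means non-null, False means null.
    """
    return [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(length)]


# The single byte 0b10 read as two validity bits (one per list element)
print(decode_validity(bytes([0b10]), 2))  # [False, True]
```

Read this way, element 0 is null and element 1 is valid, which is the null pattern of the original array.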