Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 0.12.0
Description
In the example below, `pa.array` returns wrong results for a strided NumPy array when the requested type differs from the array's original NumPy dtype:
>>> import pyarrow as pa
>>> import numpy as np
>>> np_array = np.arange(0, 10, dtype=np.float32)[1:-1:2]
>>> pa.array(np_array, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f8453de8138>
[
  1,
  2,
  3,
  4
]
Copying the NumPy array to a new location first gives the expected output:
>>> import pyarrow as pa
>>> import numpy as np
>>> np_array = np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2])
>>> pa.array(np_array, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f5a0af0a4a8>
[
  1,
  3,
  5,
  7
]
Looking at the code, it seems that the target type, rather than the original NumPy dtype, is used to convert the byte stride into a number of elements.
In this case the stride is 8 bytes, which corresponds to 2 float32 elements. The code instead divides the stride by the size of the target type, float64, which gives 1 element. It therefore reads consecutive elements one by one, instead of every second element, until it reaches the total number of elements.
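The following pure-Python sketch simulates the stride arithmetic described above, without PyArrow or NumPy. It lays out `np.arange(0, 10, dtype=np.float32)` as raw little-endian float32 bytes and reads the `[1:-1:2]` view (start at element 1, 4 elements, 8-byte stride). `read_strided` is a hypothetical helper written for illustration only; it is not PyArrow's actual conversion code. It simply shows what happens when the byte stride is divided by the target itemsize versus the source itemsize.

```python
import struct

SRC_ITEMSIZE = struct.calcsize("<f")  # float32: 4 bytes
DST_ITEMSIZE = struct.calcsize("<d")  # float64: 8 bytes

# Raw bytes equivalent to np.arange(0, 10, dtype=np.float32).
base = struct.pack("<10f", *(float(i) for i in range(10)))

# The view [1:-1:2]: starts at element 1, has 4 elements, stride of 8 bytes.
start_byte = 1 * SRC_ITEMSIZE
stride_bytes = 2 * SRC_ITEMSIZE
length = 4


def read_strided(buf, start_byte, stride_bytes, length, itemsize_for_stride):
    """Hypothetical helper: convert the byte stride to an element stride
    using itemsize_for_stride, then read float32 source elements and
    widen each one to a Python float (a C double, i.e. float64)."""
    step = stride_bytes // itemsize_for_stride
    start_elem = start_byte // SRC_ITEMSIZE
    out = []
    for i in range(length):
        offset = (start_elem + i * step) * SRC_ITEMSIZE
        (value,) = struct.unpack_from("<f", buf, offset)
        out.append(float(value))
    return out


# Dividing by the target itemsize (8) gives a step of 1 element: wrong.
buggy = read_strided(base, start_byte, stride_bytes, length, DST_ITEMSIZE)
# Dividing by the source itemsize (4) gives a step of 2 elements: correct.
fixed = read_strided(base, start_byte, stride_bytes, length, SRC_ITEMSIZE)

print(buggy)  # [1.0, 2.0, 3.0, 4.0]
print(fixed)  # [1.0, 3.0, 5.0, 7.0]
```

The two results reproduce the outputs reported above: the strided case yields `[1, 2, 3, 4]` and the copied case yields `[1, 3, 5, 7]`.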