[ARROW-5651] [Python] Incorrect conversion from strided Numpy array when other type is specified - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.12.0
Fix Version/s: 0.15.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/22086

Description

In the example below, pa.array returns wrong values for a strided NumPy array when the requested type differs from the array's original dtype:

>>> import pyarrow as pa
>>> import numpy as np
>>> np_array = np.arange(0, 10, dtype=np.float32)[1:-1:2]
>>> pa.array(np_array, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f8453de8138>
[
  1,
  2,
  3,
  4
]

When the NumPy array is first copied to a new, contiguous location, it gives the expected output:

>>> import pyarrow as pa
>>> import numpy as np
>>> np_array = np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2])
>>> pa.array(np_array, type=pa.float64())
<pyarrow.lib.DoubleArray object at 0x7f5a0af0a4a8>
[
  1,
  3,
  5,
  7
]

Looking at the code, it seems that the element size of the target type is used, instead of that of the original NumPy dtype, to work out how many elements to step over between reads.

In this case the stride is 8 bytes, which corresponds to 2 float32 elements. The code instead tries to determine the step with the target type, which gives 1 float64 element, so it reads consecutive elements rather than every second one until it reaches the total number of elements.
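The arithmetic described above can be reproduced without PyArrow. The following is a minimal sketch using only the standard library, not the actual Arrow conversion code: the helper read_strided and the constants are hypothetical names introduced for illustration, and the offset, length and stride are written out by hand to match what the view np.arange(0, 10, dtype=np.float32)[1:-1:2] would have.

```python
import struct

FLOAT32_SIZE = 4  # bytes per element of the source dtype
FLOAT64_SIZE = 8  # bytes per element of the requested target type

# Underlying buffer: the float32 values 0.0 .. 9.0.
buffer = struct.pack("<10f", *(float(i) for i in range(10)))
values = struct.unpack("<10f", buffer)

# The strided view starts at element 1, holds 4 elements,
# and advances 8 bytes between consecutive elements.
offset_elems = 1
length = 4
stride_bytes = 8


def read_strided(item_size):
    """Read `length` elements, converting the byte stride to an
    element step by dividing by `item_size` (hypothetical helper)."""
    step = stride_bytes // item_size
    return [values[offset_elems + i * step] for i in range(length)]


correct = read_strided(FLOAT32_SIZE)  # step of 2 source elements
buggy = read_strided(FLOAT64_SIZE)    # step of 1, as in the report

print(correct)  # [1.0, 3.0, 5.0, 7.0]
print(buggy)    # [1.0, 2.0, 3.0, 4.0]
```

Dividing the stride by the source element size gives the expected values, while dividing by the target element size reproduces the wrong output shown in the first example.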

Issue Links

links to

GitHub Pull Request #4958

GitHub Pull Request #5005

People

Assignee: Takuya Kato

Reporter: Fabian Höring

Votes: 0

Watchers: 3

Dates

Created: 19/Jun/19 15:40

Updated: 11/Jan/23 07:41

Resolved: 05/Aug/19 09:04

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2h 20m