Apache Arrow / ARROW-5651

[Python] Incorrect conversion from strided Numpy array when other type is specified


Details

    Description

      In the example below, pa.array returns wrong values for a strided NumPy array when the requested type differs from the array's NumPy dtype:

      >>> import pyarrow as pa
      >>> import numpy as np
      >>> # strided float32 view holding [1, 3, 5, 7]
      >>> np_array = np.arange(0, 10, dtype=np.float32)[1:-1:2]
      >>> pa.array(np_array, type=pa.float64())
      <pyarrow.lib.DoubleArray object at 0x7f8453de8138>
      [
        1,
        2,
        3,
        4
      ]
      

      Copying the NumPy array into a new, contiguous buffer first gives the expected output:

      >>> import pyarrow as pa
      >>> import numpy as np
      >>> # np.array(...) copies the strided view into contiguous memory
      >>> np_array = np.array(np.arange(0, 10, dtype=np.float32)[1:-1:2])
      >>> pa.array(np_array, type=pa.float64())
      <pyarrow.lib.DoubleArray object at 0x7f5a0af0a4a8>
      [
        1,
        3,
        5,
        7
      ]
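      On affected versions, one workaround in the spirit of the second example is to hand pa.array an array that is already contiguous and already in the target dtype, so no strided cross-type conversion is needed. The sketch below uses only NumPy; the helper name prepare_for_arrow is hypothetical and is not part of PyArrow.

```python
import numpy as np


def prepare_for_arrow(arr, target_dtype):
    """Hypothetical helper: return a C-contiguous copy of `arr` already cast to
    `target_dtype`, so the Arrow conversion no longer has to combine a stride
    with a dtype change."""
    return np.ascontiguousarray(arr, dtype=target_dtype)


# The strided float32 view from the report: 8-byte stride, 4-byte items
np_array = np.arange(0, 10, dtype=np.float32)[1:-1:2]
prepared = prepare_for_arrow(np_array, np.float64)

print(np_array.strides, np_array.itemsize)  # (8,) 4
print(prepared.strides, prepared.tolist())  # (8,) [1.0, 3.0, 5.0, 7.0]
```

      The prepared array can then be passed as pa.array(prepared, type=pa.float64()); because its dtype already matches the requested type and its memory is contiguous, it avoids the code path described in this report.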
      

      Judging from the conversion code, the byte stride is converted into an element step using the size of the target type instead of the size of the source NumPy dtype.

      Here the stride is 8 bytes, which corresponds to 2 float32 elements. The code instead divides the stride by the size of the target type, float64, which gives 1 element. As a result it reads consecutive source elements one by one, instead of every second element, until it has read the expected total number of elements.
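      The stride arithmetic described above can be modelled in plain Python. This is a hypothetical sketch, not Arrow's actual C++ implementation: a list stands in for the contiguous float32 memory, and the only difference between the buggy and correct cases is which item size is used to turn the 8-byte stride into an element step.

```python
def convert_strided(buffer, offset, length, byte_stride, itemsize_for_stride):
    """Hypothetical model of a strided conversion loop (not Arrow's code).

    buffer: list standing in for contiguous memory of the *source* dtype
    offset: index of the view's first element in that memory
    length: number of elements in the strided view
    byte_stride: the view's stride in bytes (NumPy's arr.strides[0])
    itemsize_for_stride: byte size used to turn the stride into an element step
    """
    step = byte_stride // itemsize_for_stride  # stride expressed in elements
    # Walk the source memory and widen each element (float32 -> float64)
    return [float(buffer[offset + i * step]) for i in range(length)]


# Source memory of np.arange(0, 10, dtype=np.float32): 10 items of 4 bytes
source = [float(v) for v in range(10)]

# The view [1:-1:2] starts at element 1, has 4 elements and an 8-byte stride
FLOAT32_SIZE, FLOAT64_SIZE = 4, 8
buggy = convert_strided(source, 1, 4, 8, FLOAT64_SIZE)  # step 1
fixed = convert_strided(source, 1, 4, 8, FLOAT32_SIZE)  # step 2

print(buggy)  # [1.0, 2.0, 3.0, 4.0]
print(fixed)  # [1.0, 3.0, 5.0, 7.0]
```

      Dividing by the target item size reproduces the wrong [1, 2, 3, 4] from the report, while dividing by the source item size yields the expected [1, 3, 5, 7].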


          People

            Takuya Kato (Ktakuya)
            Fabian Höring (fhoering)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h 20m