Apache Arrow / ARROW-13632

[Python] Filter mask is always applied to elements at the start of FixedSizeListArray when filtering a slice

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.0
    • Fix Version/s: 6.0.0, 5.0.1
    • Component/s: Python
    • Environment: Windows 10, Python 3.9

    Description

      When FixedSizeListArray.filter is called on a slice, the mask is always applied to the first len(slice) elements at the beginning of the array the slice was created from, instead of to the elements of the slice itself.

      • The issue does not reproduce with ListArray.
      • The particular mask does not matter.
      • The slice length and position do not matter.
      • The number of elements filtered at the wrong position is always equal to the length of the slice.
      • The type of the data (int32, float, ...) does not matter.
      Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)] on win32
      >>> import numpy as np
      >>> import pyarrow as pa
      >>> np.__version__
      '1.21.1'
      >>> pa.__version__
      '5.0.0'
      >>> data = [
          np.zeros(3, dtype='int32'),
          np.ones(3, dtype='int32'),
          np.ones(3, dtype='int32') + 1,
          np.ones(3, dtype='int32') + 2,
          np.ones(3, dtype='int32') + 3,
          np.ones(3, dtype='int32') + 4,
          np.ones(3, dtype='int32') + 5,
          np.ones(3, dtype='int32') + 6
      ]
      >>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3)) # FixedSizeListArray
      >>> a.filter(pa.array(len(a) * [True]))  # everything is ok 
      <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
      [
        [0, 0, 0],
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, 3],
        [4, 4, 4],
        [5, 5, 5],
        [6, 6, 6],
        [7, 7, 7]
      ]
      >>> a[3:7].filter(pa.array(4 * [True]))  # output is the filtered elements of a[0:4] instead of a[3:7]
      <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
      [
        [0, 0, 0],
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]
      ]
      >>> a[3:7].filter(pa.array([True, False, True, False]))  # output is the filtered elements of a[0:4] instead of a[3:7]
      <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
      [
        [0, 0, 0],
        [2, 2, 2]
      ]
      >>> a[4:].filter(pa.array([True, True, True, True]))  # output is the filtered elements of a[0:4] instead of a[4:]
      <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
      [
        [0, 0, 0],
        [1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]
      ]
      >>> a[4:6].filter(pa.array([True, True]))  # output is the filtered elements of a[0:2] instead of a[4:6]
      <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
      [
        [0, 0, 0],
        [1, 1, 1]
      ]
      >>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True]))  # filtering a ListArray slice works correctly
      <pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
      [
        [3, 3, 3],
        [4, 4, 4],
        [5, 5, 5],
        [6, 6, 6]
      ]

            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: Vadym Zhernovyi (vzhernovyi)
              Votes: 1
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m