Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 5.0.0
Environment: Windows 10, Python 3.9
Description
When FixedSizeListArray.filter is called on a slice, the mask is always applied to the first len(slice) elements of the array the slice was created from, rather than to the elements of the slice itself.
- The issue does not reproduce for ListArray.
- The particular mask doesn't matter.
- The slice length and position don't matter.
- The number of elements filtered at the wrong position is always equal to the length of the slice.
- The data type (int32, float, ...) doesn't matter.
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)] on win32
>>> import numpy as np
>>> import pyarrow as pa
>>> np.__version__
'1.21.1'
>>> pa.__version__
'5.0.0'
>>> data = [
...     np.zeros(3, dtype='int32'),
...     np.ones(3, dtype='int32'),
...     np.ones(3, dtype='int32') + 1,
...     np.ones(3, dtype='int32') + 2,
...     np.ones(3, dtype='int32') + 3,
...     np.ones(3, dtype='int32') + 4,
...     np.ones(3, dtype='int32') + 5,
...     np.ones(3, dtype='int32') + 6
... ]
>>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3))  # FixedSizeListArray
>>> a.filter(pa.array(len(a) * [True]))  # everything is ok
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6],
  [7, 7, 7]
]
>>> a[3:7].filter(pa.array(4 * [True]))  # output is the filtered elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[3:7].filter(pa.array([True, False, True, False]))  # output is the filtered elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
[
  [0, 0, 0],
  [2, 2, 2]
]
>>> a[4:].filter(pa.array([True, True, True, True]))  # output is the filtered elements of a[0:4] instead of a[4:]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[4:6].filter(pa.array([True, True]))  # output is the filtered elements of a[0:2] instead of a[4:6]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
[
  [0, 0, 0],
  [1, 1, 1]
]
>>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True]))  # ListArray slice filtering works ok
<pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
[
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6]
]
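For readers who want to picture the symptom without installing pyarrow, here is a minimal pure-Python sketch. It assumes (this ticket does not state the root cause) that the wrong results are what you would get if the filter ignored the slice's offset into the shared child values. The class `FixedSizeListView` and both method names are hypothetical and exist only for illustration; this is not Arrow's implementation.

```python
from typing import List, Optional, Sequence


class FixedSizeListView:
    """Toy model of a fixed-size list array: flat child values plus (offset, length).

    Hypothetical illustration only; not how pyarrow is implemented.
    """

    def __init__(self, values: Sequence[int], list_size: int,
                 offset: int = 0, length: Optional[int] = None) -> None:
        self.values = values          # flat child values, shared between slices
        self.list_size = list_size
        self.offset = offset          # index of the first list visible in this view
        if length is None:
            length = len(values) // list_size - offset
        self.length = length

    def slice(self, start: int, stop: int) -> "FixedSizeListView":
        # A slice keeps the same child values and only shifts offset/length.
        return FixedSizeListView(self.values, self.list_size,
                                 self.offset + start, stop - start)

    def _item(self, i: int) -> List[int]:
        begin = i * self.list_size
        return list(self.values[begin:begin + self.list_size])

    def filter_correct(self, mask: Sequence[bool]) -> List[List[int]]:
        # Mask position i refers to list number (offset + i) in the child values.
        return [self._item(self.offset + i) for i, keep in enumerate(mask) if keep]

    def filter_ignoring_offset(self, mask: Sequence[bool]) -> List[List[int]]:
        # ASSUMED faulty variant: the offset is dropped, so the mask is applied
        # to the first len(mask) lists of the parent array.
        return [self._item(i) for i, keep in enumerate(mask) if keep]


# Same data as the session above: [0, 0, 0], [1, 1, 1], ..., [7, 7, 7]
flat = [v for v in range(8) for _ in range(3)]
a = FixedSizeListView(flat, list_size=3)
s = a.slice(3, 7)

buggy = s.filter_ignoring_offset([True] * 4)
correct = s.filter_correct([True] * 4)
print("ignoring offset:", buggy)
print("honouring offset:", correct)
```

Under that assumption, `filter_ignoring_offset` returns the same lists as the faulty FixedSizeListArray outputs in the session above, while `filter_correct` returns the lists that the ListArray call returns for the same slice.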