Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.0.1, 2.0.0
Fix Version/s: None
Component/s: None
Description
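Reproducer below. Calling take() on a list<int8> array so that the result needs more than 2**31 - 1 child values silently produces overflowed (negative) int32 offsets. validate() does catch it: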
import numpy as np
import pyarrow as pa
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)])
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)
print(big_arr.offsets[-5:])
big_arr.validate()  # hopefully this can catch it

[
  -21,
  -16,
  -11,
  -6,
  -1
]
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-1-09503f9cbb04> in <module>
      6 big_arr = arr.take(indices)
      7 print(big_arr.offsets[-5:])
----> 8 big_arr.validate()

/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.validate()

/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Negative offsets in list array
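The numbers above can be checked by hand. The offsets of arr end at 0 + 1 + 2 + 3 + 4 + 5 = 15, so nb_repeat = 2**32 // 15 = 286331153 and big_arr needs 15 * 286331153 = 4294967295 child values, well above the int32 maximum of 2**31 - 1. The sketch below recomputes the last five offsets in plain Python and reinterprets them as int32, assuming two's-complement wrap-around (which is what the printed offsets suggest). wrap_int32 is a hypothetical helper written for this illustration, not a pyarrow function.

```python
# Recompute the reproducer's offsets in plain Python (pyarrow not needed).
INT32_MAX = 2**31 - 1  # largest offset a list<...> array can represent


def wrap_int32(value):
    """Reinterpret an integer as a two's-complement int32 (hypothetical helper)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


lengths = list(range(6))       # list lengths in `arr`: 0, 1, 2, 3, 4, 5
total = sum(lengths)           # 15 == arr.offsets[-1]
nb_repeat = 2**32 // total     # 286331153, as in the reproducer
required = total * nb_repeat   # child values needed by big_arr

# The last output lists are all copies of the final input list, so the
# last five offsets step down from `required` by that list's length.
last_len = lengths[-1]
true_tail = [required - last_len * k for k in range(4, -1, -1)]
wrapped_tail = [wrap_int32(v) for v in true_tail]

print(required > INT32_MAX)  # True: the result cannot fit int32 offsets
print(true_tail)     # [4294967275, 4294967280, 4294967285, 4294967290, 4294967295]
print(wrapped_tail)  # [-21, -16, -11, -6, -1]
```

The wrapped values match the negative offsets printed by the reproducer, which points at unchecked int32 offset arithmetic rather than corrupted data.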
It works fine with large_list (as expected), since its offsets are 64-bit:
import numpy as np
import pyarrow as pa
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)], type=pa.large_list(pa.int8()))
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)
print(big_arr.offsets[-5:])
big_arr.validate()

[
  4294967275,
  4294967280,
  4294967285,
  4294967290,
  4294967295
]
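For comparison, here is a minimal sketch of the kind of bounds check that would let take() fail with an error instead of emitting wrapped offsets. take_offsets is a hypothetical helper written for this illustration; it is not pyarrow's API and not how the C++ take kernel is actually structured. It is exercised with tiny inputs and a lowered max_offset so the overflow path can be hit without allocating 4 GiB.

```python
INT32_MAX = 2**31 - 1  # offset limit for list<...>
INT64_MAX = 2**63 - 1  # offset limit for large_list<...>


def take_offsets(offsets, indices, max_offset=INT32_MAX):
    """Offsets of take(indices) on a list array with the given offsets.

    Hypothetical helper: raises OverflowError instead of wrapping around.
    """
    out = [0]
    for i in indices:
        end = out[-1] + (offsets[i + 1] - offsets[i])
        if end > max_offset:
            raise OverflowError(
                f"offset {end} exceeds {max_offset}; "
                "a large_list type would be needed"
            )
        out.append(end)
    return out


offsets = [0, 0, 1, 3, 6, 10, 15]  # offsets of `arr` in the reproducer
small = take_offsets(offsets, [5, 5, 3])  # [0, 5, 10, 13]

# Simulate an overflow with a deliberately small limit.
try:
    take_offsets(offsets, [5, 5, 3], max_offset=12)
    overflowed = False
except OverflowError:
    overflowed = True

# The reproducer's total (4294967295) fits int64 but not int32 offsets.
needs_large = 15 * (2**32 // 15) > INT32_MAX
fits_large = 15 * (2**32 // 15) <= INT64_MAX
```

Checking each appended list keeps the error precise; for heavily repeated indices like the reproducer's, a single up-front check on the summed lengths would be cheaper.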
Issue Links
is related to: ARROW-10172 [Python] pyarrow.concat_arrays segfaults if a resulting StringArray's capacity overflows (Closed)