Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version: 1.0.0
Environment: Conda, but pyarrow was installed using pip (in the conda environment)
Description
Extracting null values from a UnionArray containing nulls, and constructing a UnionArray with a bitmask in pyarrow.Array.from_buffers, both cause segfaults in pyarrow 1.0.0. I have an environment with pyarrow 0.17.0, and all of the following run correctly without segfaults in that older version.
Here's a UnionArray that works (because there are no nulls):
    # GOOD
    import pyarrow

    a = pyarrow.UnionArray.from_sparse(
        pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
        [
            pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4]),
            pyarrow.array([True, True, False, True, False]),
        ],
    )
    a.to_pylist()

Here's one that fails when you try a.to_pylist() or even just a[2], because one of the children has a null at index 2:

    # SEGFAULT
    a = pyarrow.UnionArray.from_sparse(
        pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
        [
            pyarrow.array([0.0, 1.1, None, 3.3, 4.4]),
            pyarrow.array([True, True, False, True, False]),
        ],
    )
    a.to_pylist()  # also just a[2] causes a segfault
Here's another that fails because both children have nulls; the segfault occurs at both positions with nulls:
    # SEGFAULT
    a = pyarrow.UnionArray.from_sparse(
        pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
        [
            pyarrow.array([0.0, 1.1, None, 3.3, 4.4]),
            pyarrow.array([True, None, False, True, False]),
        ],
    )
    a.to_pylist()  # also a[1] and a[2] cause segfaults
Here's one that succeeds, but it's dense, rather than sparse:
    # GOOD
    a = pyarrow.UnionArray.from_dense(
        pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
        pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
        [pyarrow.array([0.0, 1.1, 2.2, 3.3]), pyarrow.array([True, True, False])],
    )
    a.to_pylist()
Here's a dense that fails because one child has a null:
    # SEGFAULT
    a = pyarrow.UnionArray.from_dense(
        pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
        pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
        [pyarrow.array([0.0, 1.1, None, 3.3]), pyarrow.array([True, True, False])],
    )
    a.to_pylist()  # also just a[3] causes a segfault
Here's a dense that fails in two positions because both children have a null:
    # SEGFAULT
    a = pyarrow.UnionArray.from_dense(
        pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
        pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
        [pyarrow.array([0.0, 1.1, None, 3.3]), pyarrow.array([True, None, False])],
    )
    a.to_pylist()  # also a[3] and a[5] cause segfaults
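For reference, the values these unions are expected to yield can be worked out without pyarrow. The sketch below is a plain-Python model of the two layouts, assuming the usual Arrow semantics: a sparse union reads child types[i] at position i, and a dense union reads it at position offsets[i]. The helper names sparse_to_pylist and dense_to_pylist are hypothetical and not part of pyarrow.

```python
def sparse_to_pylist(type_ids, children):
    """Model of a sparse union: element i comes from children[type_ids[i]][i]."""
    return [children[t][i] for i, t in enumerate(type_ids)]


def dense_to_pylist(type_ids, offsets, children):
    """Model of a dense union: element i comes from children[type_ids[i]][offsets[i]]."""
    return [children[t][o] for t, o in zip(type_ids, offsets)]


# Same inputs as the sparse and dense cases where both children contain a null.
sparse_expected = sparse_to_pylist(
    [0, 1, 0, 0, 1],
    [[0.0, 1.1, None, 3.3, 4.4], [True, None, False, True, False]],
)
dense_expected = dense_to_pylist(
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 2, 3, 1, 2],
    [[0.0, 1.1, None, 3.3], [True, None, False]],
)
print(sparse_expected)  # None at positions 1 and 2
print(dense_expected)   # None at positions 3 and 5
```

The None positions in this model line up with the indices reported as crashing, which suggests the failure is triggered by reading a null child value rather than by the position itself.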
In all of the above, we created the UnionArray using its from_sparse or from_dense method. We could instead create it with pyarrow.Array.from_buffers. If created with content0 and content1 that have no nulls, it's fine, but if created with nulls in the content, it segfaults as soon as you view the null value.

    import numpy

    # GOOD
    content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4])
    content1 = pyarrow.array([True, True, False, True, False])

    # SEGFAULT
    content0 = pyarrow.array([0.0, 1.1, 2.2, None, 4.4])
    content1 = pyarrow.array([True, True, False, True, False])

    types = pyarrow.union(
        [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
        "sparse",
        [0, 1],
    )

    a = pyarrow.Array.from_buffers(
        types,
        5,
        [
            None,
            pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 1], numpy.int8)),
        ],
        children=[content0, content1],
    )
    a.to_pylist()  # also just a[3] causes a segfault
Similarly for a dense union.
    # GOOD
    content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3])
    content1 = pyarrow.array([True, True, False])

    # SEGFAULT
    content0 = pyarrow.array([0.0, 1.1, None, 3.3])
    content1 = pyarrow.array([True, True, False])

    types = pyarrow.union(
        [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
        "dense",
        [0, 1],
    )

    a = pyarrow.Array.from_buffers(
        types,
        7,
        [
            None,
            pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 0, 1, 1], numpy.int8)),
            pyarrow.py_buffer(numpy.array([0, 0, 1, 2, 3, 1, 2], numpy.int32)),
        ],
        children=[content0, content1],
    )
    a.to_pylist()  # also just a[3] causes a segfault
The next segfaults are different: instead of putting the null values in the content, we put the null value in the UnionArray itself. This time, it segfaults when it is being created. It also prints some output (all of the above were silent segfaults).
    # SEGFAULT (even to create)
    content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4])
    content1 = pyarrow.array([True, True, False, True, False])

    types = pyarrow.union(
        [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
        "sparse",
        [0, 1],
    )

    a = pyarrow.Array.from_buffers(
        types,
        5,
        [
            pyarrow.py_buffer(numpy.array([251], numpy.uint8)),  # (11111011)
            pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 1], numpy.int8)),
            # expect null at index 2 (bit 2 of the mask is 0)
            # None  <--- placeholder required in pyarrow 0.17.0, not 1.0.0
        ],
        children=[content0, content1],
    )

    # /arrow/cpp/src/arrow/array/array_nested.cc:617: Check failed: (data_->buffers[0]) == (nullptr)
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(+0x4e9938)[0x7feea9937938]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow4util8ArrowLogD1Ev+0xdd)[0x7feea993814d]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow16SparseUnionArray7SetDataESt10shared_ptrINS_9ArrayDataEE+0x144)[0x7feea9a869a4]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow16SparseUnionArrayC1ESt10shared_ptrINS_9ArrayDataEE+0x5a)[0x7feea9a86a2a]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15VisitTypeInlineINS_8internal16ArrayDataWrapperEEENS_6StatusERKNS_8DataTypeEPT_+0x9fc)[0x7feea9a5145c]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x3f)[0x7feea9a2698f]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c7853)[0x7feeaa998853]
    # python(+0x13af9e)[0x56146ee77f9e]
    # python(_PyObject_MakeTpCall+0x3bf)[0x56146ee6d30f]
    # python(_PyEval_EvalFrameDefault+0x5452)[0x56146ef20602]
    # python(_PyEval_EvalCodeWithName+0x260)[0x56146ef06190]
    # python(PyEval_EvalCode+0x23)[0x56146ef07a03]
    # python(+0x23e2f2)[0x56146ef7b2f2]
    # python(+0x251082)[0x56146ef8e082]
    # python(+0x1063b9)[0x56146ee433b9]
    # python(PyRun_InteractiveLoopFlags+0xea)[0x56146ee43559]
    # python(+0x1065f3)[0x56146ee435f3]
    # python(+0x106817)[0x56146ee43817]
    # python(Py_BytesMain+0x39)[0x56146ef91a19]
    # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7feeac198b97]
    # python(+0x1f8807)[0x56146ef35807]
    # Aborted (core dumped)
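To make the (11111011) comment concrete: Arrow validity bitmaps use least-significant-bit numbering, so bit i of the mask describes element i. The following sketch decodes the mask byte in plain Python; decode_validity is a hypothetical helper, not a pyarrow function.

```python
def decode_validity(mask_bytes, length):
    """Return one bool per element: True means valid, False means null.

    Assumes Arrow's least-significant-bit numbering, where element i is
    described by bit (i % 8) of byte (i // 8).
    """
    return [bool((mask_bytes[i // 8] >> (i % 8)) & 1) for i in range(length)]


valid = decode_validity(bytes([251]), 5)  # 251 == 0b11111011
print(valid)  # [True, True, False, True, True]
```

Under that assumption the mask marks only index 2 as null, which is the element the reproducer intends to be missing.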
And similarly for dense.
    # SEGFAULT (even to create)
    content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3])
    content1 = pyarrow.array([True, True, False])

    types = pyarrow.union(
        [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
        "dense",
        [0, 1],
    )

    a = pyarrow.Array.from_buffers(
        types,
        7,
        [
            pyarrow.py_buffer(numpy.array([251], numpy.uint8)),  # (11111011)
            pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 0, 1, 1], numpy.int8)),
            pyarrow.py_buffer(numpy.array([0, 0, 1, 2, 3, 1, 2], numpy.int32)),
            # expect null at index 2 (bit 2 of the mask is 0)
        ],
        children=[content0, content1],
    )

    # /arrow/cpp/src/arrow/array/array_nested.cc:627: Check failed: (data_->buffers[0]) == (nullptr)
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(+0x4e9938)[0x7f2fb6ad7938]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow4util8ArrowLogD1Ev+0xdd)[0x7f2fb6ad814d]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15DenseUnionArray7SetDataERKSt10shared_ptrINS_9ArrayDataEE+0x174)[0x7f2fb6c274a4]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15DenseUnionArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x44)[0x7f2fb6c27524]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15VisitTypeInlineINS_8internal16ArrayDataWrapperEEENS_6StatusERKNS_8DataTypeEPT_+0xb14)[0x7f2fb6bf1574]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x3f)[0x7f2fb6bc698f]
    # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c7853)[0x7f2fb7b38853]
    # python(+0x13af9e)[0x558cf09edf9e]
    # python(_PyObject_MakeTpCall+0x3bf)[0x558cf09e330f]
    # python(_PyEval_EvalFrameDefault+0x5452)[0x558cf0a96602]
    # python(_PyEval_EvalCodeWithName+0x260)[0x558cf0a7c190]
    # python(PyEval_EvalCode+0x23)[0x558cf0a7da03]
    # python(+0x23e2f2)[0x558cf0af12f2]
    # python(+0x251082)[0x558cf0b04082]
    # python(+0x1063b9)[0x558cf09b93b9]
    # python(PyRun_InteractiveLoopFlags+0xea)[0x558cf09b9559]
    # python(+0x1065f3)[0x558cf09b95f3]
    # python(+0x106817)[0x558cf09b9817]
    # python(Py_BytesMain+0x39)[0x558cf0b07a19]
    # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f2fb9338b97]
    # python(+0x1f8807)[0x558cf0aab807]
    # Aborted (core dumped)
It might be two distinct bugs, but they're both related to UnionArrays and nulls, and they're both newer than 0.17.0.
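Because every failing case kills the interpreter, telling the two behaviours apart is easier when each reproducer runs in its own process. A minimal sketch, assuming only the standard library: run_isolated is a hypothetical helper name, and the two snippets passed to it are stand-ins rather than the reproducers themselves.

```python
import subprocess
import sys


def run_isolated(snippet):
    """Run a code snippet in a fresh interpreter and return its exit code.

    On POSIX, a negative code -N means the child was killed by signal N,
    so a segfault and an abort can be told apart by the value returned.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# Stand-ins for the real reproducers; substitute the snippets from this report.
print(run_isolated("print('fine')"))
print(run_isolated("import os; os.abort()"))
```

With the real snippets substituted in, differing exit codes between the null-in-child cases and the bitmask cases would support the reading that these are two separate bugs.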