Apache Arrow / ARROW-9556

[Python][C++] Segfaults in UnionArray with null values

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 2.0.0
    • Component/s: Python
    • Environment: Conda, but pyarrow was installed using pip (in the conda environment)

    Description

      Extracting null values from a UnionArray containing nulls, and constructing a UnionArray with a bitmask in pyarrow.Array.from_buffers, both cause segfaults in pyarrow 1.0.0. I have an environment with pyarrow 0.17.0, and all of the following run correctly without segfaults in that older version.

      Here's a UnionArray that works (because there are no nulls):

       

      import pyarrow

      # GOOD
      a = pyarrow.UnionArray.from_sparse(
       pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
       [
       pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4]),
       pyarrow.array([True, True, False, True, False]),
       ],
      )
      a.to_pylist()

       

      Here's one that fails when you try a.to_pylist() or even just a[2], because one of the children has a null at index 2:

       

      # SEGFAULT
      a = pyarrow.UnionArray.from_sparse(
       pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
       [
       pyarrow.array([0.0, 1.1, None, 3.3, 4.4]),
       pyarrow.array([True, True, False, True, False]),
       ],
      )
      a.to_pylist() # also just a[2] causes a segfault

       

      Here's another that fails because both children have nulls; the segfault occurs at both positions with nulls:

       

      # SEGFAULT
      a = pyarrow.UnionArray.from_sparse(
       pyarrow.array([0, 1, 0, 0, 1], type=pyarrow.int8()),
       [
       pyarrow.array([0.0, 1.1, None, 3.3, 4.4]),
       pyarrow.array([True, None, False, True, False]),
       ],
      )
      a.to_pylist() # also a[1] and a[2] cause segfaults
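
      To make it easier to see why positions 1 and 2 are the ones that crash, here is a minimal pure-Python model of sparse-union indexing. It does not use pyarrow at all; the helper name `sparse_union_get` is hypothetical and only mirrors the layout used in the example above (one type id per slot, every child as long as the union).

```python
# Pure-Python model of sparse-union indexing; no pyarrow involved.
# `sparse_union_get` is a hypothetical helper, not a pyarrow function.
def sparse_union_get(type_ids, children, i):
    """Return the logical value at position i of a sparse union."""
    # In a sparse union every child has the same length as the union,
    # so the type id selects the child and i indexes into it directly.
    return children[type_ids[i]][i]


type_ids = [0, 1, 0, 0, 1]
children = [
    [0.0, 1.1, None, 3.3, 4.4],        # child 0 (float64), null at index 2
    [True, None, False, True, False],  # child 1 (bool), null at index 1
]

values = [sparse_union_get(type_ids, children, i) for i in range(len(type_ids))]
null_positions = [i for i, v in enumerate(values) if v is None]

print(values)          # [0.0, None, None, 3.3, False]
print(null_positions)  # [1, 2]
```

      The null positions in this model line up with the indices reported as crashing (a[1] and a[2]), which suggests the crash is tied to reading a null slot of the selected child.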

       

      Here's one that succeeds, but it's dense, rather than sparse:

       

      # GOOD
      a = pyarrow.UnionArray.from_dense(
       pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
       pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
       [pyarrow.array([0.0, 1.1, 2.2, 3.3]), pyarrow.array([True, True, False])],
      )
      a.to_pylist()

       

      Here's a dense that fails because one child has a null:

       

      # SEGFAULT
      a = pyarrow.UnionArray.from_dense(
       pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
       pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
       [pyarrow.array([0.0, 1.1, None, 3.3]), pyarrow.array([True, True, False])],
      )
      a.to_pylist() # also just a[3] causes a segfault

       

      Here's a dense that fails in two positions because both children have a null:

       

      # SEGFAULT
      a = pyarrow.UnionArray.from_dense(
       pyarrow.array([0, 1, 0, 0, 0, 1, 1], type=pyarrow.int8()),
       pyarrow.array([0, 0, 1, 2, 3, 1, 2], type=pyarrow.int32()),
       [pyarrow.array([0.0, 1.1, None, 3.3]), pyarrow.array([True, None, False])],
      )
      a.to_pylist() # also a[3] and a[5] cause segfaults
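
      The dense case differs only in that each slot carries an offset into its child. The sketch below models that lookup in plain Python, without pyarrow; `dense_union_get` is a hypothetical helper name, and the inputs are copied from the example above.

```python
# Pure-Python model of dense-union indexing; no pyarrow involved.
# `dense_union_get` is a hypothetical helper, not a pyarrow function.
def dense_union_get(type_ids, offsets, children, i):
    """Return the logical value at position i of a dense union."""
    # The type id selects the child; the offset says where in that child
    # the value for slot i is stored.
    return children[type_ids[i]][offsets[i]]


type_ids = [0, 1, 0, 0, 0, 1, 1]
offsets = [0, 0, 1, 2, 3, 1, 2]
children = [
    [0.0, 1.1, None, 3.3],  # child 0 (float64), null at offset 2
    [True, None, False],    # child 1 (bool), null at offset 1
]

values = [
    dense_union_get(type_ids, offsets, children, i) for i in range(len(type_ids))
]
null_positions = [i for i, v in enumerate(values) if v is None]

print(values)          # [0.0, True, 1.1, None, 3.3, None, False]
print(null_positions)  # [3, 5]
```

      Again the modelled null positions (3 and 5) are exactly the indices that crash in pyarrow 1.0.0.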

       

      In all of the above, we created the UnionArray using its from_sparse or from_dense method. We could instead create it with pyarrow.Array.from_buffers. If created with content0 and content1 that have no nulls, it's fine, but if created with nulls in the content, it segfaults as soon as you view the null value.

       

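      import numpy
      import pyarrow

      # GOOD: with these children (no nulls), the code below works
      content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4])
      content1 = pyarrow.array([True, True, False, True, False])
      # SEGFAULT: with these children instead (content0 has a null at index 3)
      content0 = pyarrow.array([0.0, 1.1, 2.2, None, 4.4])
      content1 = pyarrow.array([True, True, False, True, False])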
      types = pyarrow.union(
       [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
       "sparse",
       [0, 1],
      )
      a = pyarrow.Array.from_buffers(
       types,
       5,
       [
       None,
       pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 1], numpy.int8)),
       ],
       children=[content0, content1],
      )
      a.to_pylist() # also just a[3] causes a segfault

       

      Similarly for a dense union.

       

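      # GOOD: with these children (no nulls), the code below works
      content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3])
      content1 = pyarrow.array([True, True, False])
      # SEGFAULT: with these children instead (content0 has a null at index 2)
      content0 = pyarrow.array([0.0, 1.1, None, 3.3])
      content1 = pyarrow.array([True, True, False])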
      types = pyarrow.union(
       [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
       "dense",
       [0, 1],
      )
      a = pyarrow.Array.from_buffers(
       types,
       7,
       [
       None,
       pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 0, 1, 1], numpy.int8)),
       pyarrow.py_buffer(numpy.array([0, 0, 1, 2, 3, 1, 2], numpy.int32)),
       ],
       children=[content0, content1],
      )
      a.to_pylist() # also just a[3] causes a segfault

       

      The next crashes are different: instead of putting the null values in the content, we put the null value in the UnionArray itself, through a validity bitmask. This time, the crash happens as soon as the array is created. It also prints some output, a failed check followed by a stack trace and "Aborted (core dumped)" (all of the above were silent segfaults).

       

      # SEGFAULT (even to create)
      content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3, 4.4])
      content1 = pyarrow.array([True, True, False, True, False])
      types = pyarrow.union(
       [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
       "sparse",
       [0, 1],
      )
      a = pyarrow.Array.from_buffers(
       types,
       5,
       [
       pyarrow.py_buffer(numpy.array([251], numpy.uint8)), # (11111011)
       pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 1], numpy.int8)),
       # expect null at index 2 (bit 2 of the bitmask is 0)
       # None <--- placeholder required in pyarrow 0.17.0, not 1.0.0
       ],
       children=[content0, content1],
      )
      # /arrow/cpp/src/arrow/array/array_nested.cc:617: Check failed: (data_->buffers[0]) == (nullptr) 
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(+0x4e9938)[0x7feea9937938]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow4util8ArrowLogD1Ev+0xdd)[0x7feea993814d]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow16SparseUnionArray7SetDataESt10shared_ptrINS_9ArrayDataEE+0x144)[0x7feea9a869a4]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow16SparseUnionArrayC1ESt10shared_ptrINS_9ArrayDataEE+0x5a)[0x7feea9a86a2a]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15VisitTypeInlineINS_8internal16ArrayDataWrapperEEENS_6StatusERKNS_8DataTypeEPT_+0x9fc)[0x7feea9a5145c]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x3f)[0x7feea9a2698f]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c7853)[0x7feeaa998853]
      # python(+0x13af9e)[0x56146ee77f9e]
      # python(_PyObject_MakeTpCall+0x3bf)[0x56146ee6d30f]
      # python(_PyEval_EvalFrameDefault+0x5452)[0x56146ef20602]
      # python(_PyEval_EvalCodeWithName+0x260)[0x56146ef06190]
      # python(PyEval_EvalCode+0x23)[0x56146ef07a03]
      # python(+0x23e2f2)[0x56146ef7b2f2]
      # python(+0x251082)[0x56146ef8e082]
      # python(+0x1063b9)[0x56146ee433b9]
      # python(PyRun_InteractiveLoopFlags+0xea)[0x56146ee43559]
      # python(+0x1065f3)[0x56146ee435f3]
      # python(+0x106817)[0x56146ee43817]
      # python(Py_BytesMain+0x39)[0x56146ef91a19]
      # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7feeac198b97]
      # python(+0x1f8807)[0x56146ef35807]
      # Aborted (core dumped)
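
      For reference, the bitmask byte 251 used above can be decoded by hand. Arrow validity bitmaps are read least-significant bit first, so the sketch below (plain Python, no pyarrow; `validity_bits` is a hypothetical helper name) shows which slot the bitmask is meant to mark as null.

```python
# Decode an Arrow-style validity bitmap (least-significant bit first).
# `validity_bits` is a hypothetical helper, not a pyarrow function.
def validity_bits(bitmap_bytes, length):
    """Return one bool per slot: True means valid, False means null."""
    return [
        bool((bitmap_bytes[i // 8] >> (i % 8)) & 1)
        for i in range(length)
    ]


valid = validity_bits(bytes([251]), 5)  # 251 == 0b11111011
null_positions = [i for i, ok in enumerate(valid) if not ok]

print(valid)           # [True, True, False, True, True]
print(null_positions)  # [2]
```

      So the intent of the bitmask is a single null at index 2 of the union, independent of what the children contain.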
      

       

      And similarly for dense.

       

      # SEGFAULT (even to create)
      content0 = pyarrow.array([0.0, 1.1, 2.2, 3.3])
      content1 = pyarrow.array([True, True, False])
      types = pyarrow.union(
       [pyarrow.field("0", content0.type), pyarrow.field("1", content1.type)],
       "dense",
       [0, 1],
      )
      a = pyarrow.Array.from_buffers(
       types,
       7,
       [
       pyarrow.py_buffer(numpy.array([251], numpy.uint8)), # (11111011)
       pyarrow.py_buffer(numpy.array([0, 1, 0, 0, 0, 1, 1], numpy.int8)),
       pyarrow.py_buffer(numpy.array([0, 0, 1, 2, 3, 1, 2], numpy.int32)),
       # expect null at index 2 (bit 2 of the bitmask is 0)
       ],
       children=[content0, content1],
      )
      # /arrow/cpp/src/arrow/array/array_nested.cc:627: Check failed: (data_->buffers[0]) == (nullptr) 
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(+0x4e9938)[0x7f2fb6ad7938]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow4util8ArrowLogD1Ev+0xdd)[0x7f2fb6ad814d]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15DenseUnionArray7SetDataERKSt10shared_ptrINS_9ArrayDataEE+0x174)[0x7f2fb6c274a4]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15DenseUnionArrayC2ERKSt10shared_ptrINS_9ArrayDataEE+0x44)[0x7f2fb6c27524]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow15VisitTypeInlineINS_8internal16ArrayDataWrapperEEENS_6StatusERKNS_8DataTypeEPT_+0xb14)[0x7f2fb6bf1574]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/libarrow.so.100(_ZN5arrow9MakeArrayERKSt10shared_ptrINS_9ArrayDataEE+0x3f)[0x7f2fb6bc698f]
      # /home/pivarski/miniconda3/envs/test-arrow/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so(+0x1c7853)[0x7f2fb7b38853]
      # python(+0x13af9e)[0x558cf09edf9e]
      # python(_PyObject_MakeTpCall+0x3bf)[0x558cf09e330f]
      # python(_PyEval_EvalFrameDefault+0x5452)[0x558cf0a96602]
      # python(_PyEval_EvalCodeWithName+0x260)[0x558cf0a7c190]
      # python(PyEval_EvalCode+0x23)[0x558cf0a7da03]
      # python(+0x23e2f2)[0x558cf0af12f2]
      # python(+0x251082)[0x558cf0b04082]
      # python(+0x1063b9)[0x558cf09b93b9]
      # python(PyRun_InteractiveLoopFlags+0xea)[0x558cf09b9559]
      # python(+0x1065f3)[0x558cf09b95f3]
      # python(+0x106817)[0x558cf09b9817]
      # python(Py_BytesMain+0x39)[0x558cf0b07a19]
      # /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f2fb9338b97]
      # python(+0x1f8807)[0x558cf0aab807]
      # Aborted (core dumped)

       

      It might be two distinct bugs, but they're both related to UnionArrays and nulls, and they're both newer than 0.17.0.
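
      Since every failing case above takes down the interpreter, one way to run several of them in a single session is to execute each in a child process and look at the exit code. This is only a sketch using the standard library; `run_isolated` is a hypothetical helper, and the two snippets passed to it are stand-ins rather than the pyarrow reproducers themselves.

```python
import subprocess
import sys


# `run_isolated` is a hypothetical helper for running a crashing reproducer
# without killing the parent interpreter.
def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    # On POSIX, a negative return code -N means the child was killed by
    # signal N (for example SIGSEGV or SIGABRT).
    return completed.returncode


# Stand-in snippets: one that exits cleanly and one that aborts.
ok_code = run_isolated("print('ok')")
abort_code = run_isolated("import os; os.abort()")

print(ok_code)     # 0
print(abort_code)  # nonzero; the exact value is platform dependent
```

      Pasting one of the reproducers above into the snippet string would then report a nonzero exit code on an affected pyarrow version instead of crashing the calling session.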

            People

              Assignee: Krisztian Szucs (kszucs)
              Reporter: Jim Pivarski (jpivarski)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h