  1. Apache Arrow
  2. ARROW-1972

Deserialization of buffer objects (and pandas DataFrames) segfaults in a different process.

Details

    Description

      To see the issue, first serialize a pyarrow buffer.

      import pyarrow as pa
      
      serialized = pa.serialize(pa.frombuffer(b'hello')).to_buffer().to_pybytes()
      
      print(serialized)  # b'\x00\x00\x00\x00\x01...'
      

      Deserializing it within the same process succeeds; however, deserializing it in a *separate process* causes a segfault. For example:

      import pyarrow as pa
      
      pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This segfaults
      

      The backtrace is:

      (lldb) bt
      * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
        * frame #0: 0x0000000000000000
          frame #1: 0x0000000105605534 libarrow_python.0.dylib`arrow::py::wrap_buffer(buffer=std::__1::shared_ptr<arrow::Buffer>::element_type @ 0x000000010060c348 strong=1 weak=1) at pyarrow.cc:48
          frame #2: 0x000000010554fdee libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, parent=0x0000000100645438, arr=0x0000000100622938, index=0, type=0, base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfd218) at arrow_to_python.cc:173
          frame #3: 0x000000010554d93a libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818, array=0x0000000100645438, start_idx=0, stop_idx=2, base=0x0000000108f0e528, blobs=0x0000000108f09588, out=0x00007fff5fbfd470) at arrow_to_python.cc:208
          frame #4: 0x000000010554d302 libarrow_python.0.dylib`arrow::py::DeserializeDict(context=0x0000000108f17818, array=0x0000000100645338, start_idx=0, stop_idx=2, base=0x0000000108f0e528, blobs=0x0000000108f09588, out=0x00007fff5fbfddd8) at arrow_to_python.cc:74
          frame #5: 0x000000010554f249 libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, parent=0x00000001006377a8, arr=0x0000000100645298, index=0, type=0, base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfddd8) at arrow_to_python.cc:158
          frame #6: 0x000000010554d93a libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818, array=0x00000001006377a8, start_idx=0, stop_idx=1, base=0x0000000108f0e528, blobs=0x0000000108f09588, out=0x00007fff5fbfdfe8) at arrow_to_python.cc:208
          frame #7: 0x0000000105551fbf libarrow_python.0.dylib`arrow::py::DeserializeObject(context=0x0000000108f17818, obj=0x0000000108f09588, base=0x0000000108f0e528, out=0x00007fff5fbfdfe8) at arrow_to_python.cc:287
          frame #8: 0x0000000104abecae lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_18SerializedPyObject_2deserialize(__pyx_v_self=0x0000000108f09570, __pyx_v_context=0x0000000108f17818) at lib.cxx:88592
          frame #9: 0x0000000104abdec4 lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_18SerializedPyObject_3deserialize(__pyx_v_self=0x0000000108f09570, __pyx_args=0x000000010231f358, __pyx_kwds=0x0000000000000000) at lib.cxx:88514
          frame #10: 0x000000010008b5f1 python`PyCFunction_Call + 145
          frame #11: 0x0000000104941208 lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108f302d0, arg=0x000000010231f358, kw=0x0000000000000000) at lib.cxx:116108
          frame #12: 0x0000000104b0e3fa lib.cpython-36m-darwin.so`__Pyx__PyObject_CallOneArg(func=0x0000000108f302d0, arg=0x0000000108f17818) at lib.cxx:116147
          frame #13: 0x0000000104944bc6 lib.cpython-36m-darwin.so`__Pyx_PyObject_CallOneArg(func=0x0000000108f302d0, arg=0x0000000108f17818) at lib.cxx:116166
          frame #14: 0x0000000104b09873 lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_124deserialize_from(__pyx_self=0x0000000000000000, __pyx_v_source=0x0000000108ddeee8, __pyx_v_base=0x0000000108f0e528, __pyx_v_context=0x0000000108f17818) at lib.cxx:90327
          frame #15: 0x0000000104b09310 lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_125deserialize_from(__pyx_self=0x0000000000000000, __pyx_args=0x0000000108f10d38, __pyx_kwds=0x0000000000000000) at lib.cxx:90260
          frame #16: 0x000000010008b5f1 python`PyCFunction_Call + 145
          frame #17: 0x0000000104941208 lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf1b0, arg=0x0000000108f10d38, kw=0x0000000000000000) at lib.cxx:116108
          frame #18: 0x0000000104b0bf9d lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_128deserialize(__pyx_self=0x0000000000000000, __pyx_v_obj=0x0000000108f0e528, __pyx_v_context=0x0000000108f17818) at lib.cxx:90770
          frame #19: 0x0000000104b0b7ec lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_129deserialize(__pyx_self=0x0000000000000000, __pyx_args=0x0000000108def1c8, __pyx_kwds=0x0000000000000000) at lib.cxx:90680
          frame #20: 0x000000010008b5f1 python`PyCFunction_Call + 145
          frame #21: 0x0000000108d5c468 plasma.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf240, arg=0x0000000108def1c8, kw=0x0000000000000000) at plasma.cxx:11200
          frame #22: 0x0000000108d744a7 plasma.cpython-36m-darwin.so`__pyx_pf_7pyarrow_6plasma_12PlasmaClient_10get(__pyx_v_self=0x0000000108f0e210, __pyx_v_object_ids=0x0000000108deb248, __pyx_v_timeout_ms=0, __pyx_v_serialization_context=0x0000000108f17818) at plasma.cxx:6480
          frame #23: 0x0000000108d6c250 plasma.cpython-36m-darwin.so`__pyx_pw_7pyarrow_6plasma_12PlasmaClient_11get(__pyx_v_self=0x0000000108f0e210, __pyx_args=0x0000000102363630, __pyx_kwds=0x0000000000000000) at plasma.cxx:6274
          frame #24: 0x000000010008bc5b python`_PyCFunction_FastCallDict + 363
          frame #25: 0x00000001001637f2 python`call_function + 146
          frame #26: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
          frame #27: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
          frame #28: 0x0000000100163c4c python`fast_function + 348
          frame #29: 0x000000010016383e python`call_function + 222
          frame #30: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
          frame #31: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
          frame #32: 0x0000000100163c4c python`fast_function + 348
          frame #33: 0x000000010016383e python`call_function + 222
          frame #34: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
          frame #35: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
          frame #36: 0x0000000100163c4c python`fast_function + 348
          frame #37: 0x000000010016383e python`call_function + 222
          frame #38: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
          frame #39: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
          frame #40: 0x00000001001b01dc python`PyRun_InteractiveOneObject + 1132
          frame #41: 0x00000001001ad15e python`PyRun_InteractiveLoopFlags + 334
          frame #42: 0x00000001001acfeb python`PyRun_AnyFileExFlags + 139
          frame #43: 0x00000001001d3378 python`Py_Main + 4632
          frame #44: 0x00000001000016bd python`main + 509
          frame #45: 0x00007fffb6073235 libdyld.dylib`start + 1
      

      Note, however, that deserialization works if the process first serializes something. For example, the following succeeds:

      import pyarrow as pa
      
      pa.serialize(1)
      pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This succeeds!
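      The two variants above (with and without the initial `pa.serialize(1)` call) could be compared in fresh interpreters with a small harness along these lines. This is only a sketch: the helper names `build_child_source`, `run_in_fresh_process`, and `describe` are hypothetical, the `pa.serialize` / `pa.deserialize` calls are the ones used in this report (their availability in other pyarrow versions is not assumed), and the interpretation of a negative exit status as a signal number applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def build_child_source(payload: bytes, warm_up: bool) -> str:
    """Return Python source that deserializes `payload` in a new interpreter.

    With warm_up=True the child first calls pa.serialize(1), mirroring the
    variant that is reported above as succeeding.
    """
    lines = ["import pyarrow as pa"]
    if warm_up:
        lines.append("pa.serialize(1)")
    lines.append(f"pa.deserialize({payload!r})")
    return "\n".join(lines) + "\n"


def run_in_fresh_process(source: str) -> int:
    """Run `source` in a separate interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", source]).returncode


def describe(returncode: int) -> str:
    """Summarize an exit status; on POSIX a negative value means -signal."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

      To use it, pass the complete bytes printed by the serialization step as `payload` and compare `describe(run_in_fresh_process(build_child_source(payload, warm_up=False)))` with the `warm_up=True` result. If the behavior described in this report is reproduced, the first would be expected to report a signal and the second `ok`.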
      

      I have a potential fix/workaround, which I will post momentarily.

            People

              robertnishihara Robert Nishihara