Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- None
Description
To see the issue, first serialize a pyarrow buffer.
import pyarrow as pa
serialized = pa.serialize(pa.frombuffer(b'hello')).to_buffer().to_pybytes()
print(serialized)  # b'\x00\x00\x00\x00\x01...'
Deserializing it within the same process succeeds; however, deserializing it in a *separate process* causes a segfault. For example:
import pyarrow as pa
pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This segfaults
The backtrace is
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x0000000105605534 libarrow_python.0.dylib`arrow::py::wrap_buffer(buffer=std::__1::shared_ptr<arrow::Buffer>::element_type @ 0x000000010060c348 strong=1 weak=1) at pyarrow.cc:48
    frame #2: 0x000000010554fdee libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, parent=0x0000000100645438, arr=0x0000000100622938, index=0, type=0, base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfd218) at arrow_to_python.cc:173
    frame #3: 0x000000010554d93a libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818, array=0x0000000100645438, start_idx=0, stop_idx=2, base=0x0000000108f0e528, blobs=0x0000000108f09588, out=0x00007fff5fbfd470) at arrow_to_python.cc:208
    frame #4: 0x000000010554d302 libarrow_python.0.dylib`arrow::py::DeserializeDict(context=0x0000000108f17818, array=0x0000000100645338, start_idx=0, stop_idx=2, base=0x0000000108f0e528, blobs=0x0000000108f09588, out=0x00007fff5fbfddd8) at arrow_to_python.cc:74
    frame #5: 0x000000010554f249 libarrow_python.0.dylib`arrow::py::GetValue(context=0x0000000108f17818, parent=0x00000001006377a8, arr=0x0000000100645298, index=0, type=0, base=0x0000000108f0e528, blobs=0x0000000108f09588, result=0x00007fff5fbfddd8) at arrow_to_python.cc:158
    frame #6: 0x000000010554d93a libarrow_python.0.dylib`arrow::py::DeserializeList(context=0x0000000108f17818, array=0x00000001006377a8, start_idx=0, stop_idx=1, base=0x0000000108f0e528, blobs=0x0000000108f09588, out=0x00007fff5fbfdfe8) at arrow_to_python.cc:208
    frame #7: 0x0000000105551fbf libarrow_python.0.dylib`arrow::py::DeserializeObject(context=0x0000000108f17818, obj=0x0000000108f09588, base=0x0000000108f0e528, out=0x00007fff5fbfdfe8) at arrow_to_python.cc:287
    frame #8: 0x0000000104abecae lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_18SerializedPyObject_2deserialize(__pyx_v_self=0x0000000108f09570, __pyx_v_context=0x0000000108f17818) at lib.cxx:88592
    frame #9: 0x0000000104abdec4 lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_18SerializedPyObject_3deserialize(__pyx_v_self=0x0000000108f09570, __pyx_args=0x000000010231f358, __pyx_kwds=0x0000000000000000) at lib.cxx:88514
    frame #10: 0x000000010008b5f1 python`PyCFunction_Call + 145
    frame #11: 0x0000000104941208 lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108f302d0, arg=0x000000010231f358, kw=0x0000000000000000) at lib.cxx:116108
    frame #12: 0x0000000104b0e3fa lib.cpython-36m-darwin.so`__Pyx__PyObject_CallOneArg(func=0x0000000108f302d0, arg=0x0000000108f17818) at lib.cxx:116147
    frame #13: 0x0000000104944bc6 lib.cpython-36m-darwin.so`__Pyx_PyObject_CallOneArg(func=0x0000000108f302d0, arg=0x0000000108f17818) at lib.cxx:116166
    frame #14: 0x0000000104b09873 lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_124deserialize_from(__pyx_self=0x0000000000000000, __pyx_v_source=0x0000000108ddeee8, __pyx_v_base=0x0000000108f0e528, __pyx_v_context=0x0000000108f17818) at lib.cxx:90327
    frame #15: 0x0000000104b09310 lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_125deserialize_from(__pyx_self=0x0000000000000000, __pyx_args=0x0000000108f10d38, __pyx_kwds=0x0000000000000000) at lib.cxx:90260
    frame #16: 0x000000010008b5f1 python`PyCFunction_Call + 145
    frame #17: 0x0000000104941208 lib.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf1b0, arg=0x0000000108f10d38, kw=0x0000000000000000) at lib.cxx:116108
    frame #18: 0x0000000104b0bf9d lib.cpython-36m-darwin.so`__pyx_pf_7pyarrow_3lib_128deserialize(__pyx_self=0x0000000000000000, __pyx_v_obj=0x0000000108f0e528, __pyx_v_context=0x0000000108f17818) at lib.cxx:90770
    frame #19: 0x0000000104b0b7ec lib.cpython-36m-darwin.so`__pyx_pw_7pyarrow_3lib_129deserialize(__pyx_self=0x0000000000000000, __pyx_args=0x0000000108def1c8, __pyx_kwds=0x0000000000000000) at lib.cxx:90680
    frame #20: 0x000000010008b5f1 python`PyCFunction_Call + 145
    frame #21: 0x0000000108d5c468 plasma.cpython-36m-darwin.so`__Pyx_PyObject_Call(func=0x0000000108baf240, arg=0x0000000108def1c8, kw=0x0000000000000000) at plasma.cxx:11200
    frame #22: 0x0000000108d744a7 plasma.cpython-36m-darwin.so`__pyx_pf_7pyarrow_6plasma_12PlasmaClient_10get(__pyx_v_self=0x0000000108f0e210, __pyx_v_object_ids=0x0000000108deb248, __pyx_v_timeout_ms=0, __pyx_v_serialization_context=0x0000000108f17818) at plasma.cxx:6480
    frame #23: 0x0000000108d6c250 plasma.cpython-36m-darwin.so`__pyx_pw_7pyarrow_6plasma_12PlasmaClient_11get(__pyx_v_self=0x0000000108f0e210, __pyx_args=0x0000000102363630, __pyx_kwds=0x0000000000000000) at plasma.cxx:6274
    frame #24: 0x000000010008bc5b python`_PyCFunction_FastCallDict + 363
    frame #25: 0x00000001001637f2 python`call_function + 146
    frame #26: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
    frame #27: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
    frame #28: 0x0000000100163c4c python`fast_function + 348
    frame #29: 0x000000010016383e python`call_function + 222
    frame #30: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
    frame #31: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
    frame #32: 0x0000000100163c4c python`fast_function + 348
    frame #33: 0x000000010016383e python`call_function + 222
    frame #34: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
    frame #35: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
    frame #36: 0x0000000100163c4c python`fast_function + 348
    frame #37: 0x000000010016383e python`call_function + 222
    frame #38: 0x00000001001614d5 python`_PyEval_EvalFrameDefault + 47093
    frame #39: 0x0000000100154aab python`_PyEval_EvalCodeWithName + 427
    frame #40: 0x00000001001b01dc python`PyRun_InteractiveOneObject + 1132
    frame #41: 0x00000001001ad15e python`PyRun_InteractiveLoopFlags + 334
    frame #42: 0x00000001001acfeb python`PyRun_AnyFileExFlags + 139
    frame #43: 0x00000001001d3378 python`Py_Main + 4632
    frame #44: 0x00000001000016bd python`main + 509
    frame #45: 0x00007fffb6073235 libdyld.dylib`start + 1
Note, however, that if the new process first serializes something, deserialization works. For example, the following succeeds:
import pyarrow as pa
pa.serialize(1)
pa.deserialize(b'\x00\x00\x00\x00\x01...')  # This succeeds!
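To compare the two variants without crashing an interactive session, each one can be run in a fresh interpreter and its exit status inspected. The following is only a sketch: the helper names (`run_in_fresh_process`, `died_from_segfault`, `build_snippets`) are hypothetical, it assumes a POSIX system, and `payload` must be the full bytes printed by the serializing process (the value is elided above).

```python
import signal
import subprocess
import sys


def run_in_fresh_process(code):
    """Run `code` in a new Python interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def died_from_segfault(returncode):
    # On POSIX, subprocess reports death by signal as the negated signal number.
    return returncode == -signal.SIGSEGV


def build_snippets(payload):
    """Return (plain, primed) source strings for the two cases in this report."""
    # Case 1: deserialize straight away in the new process.
    plain = "import pyarrow as pa\npa.deserialize({!r})".format(payload)
    # Case 2: call pa.serialize(1) first, then deserialize.
    primed = (
        "import pyarrow as pa\npa.serialize(1)\npa.deserialize({!r})".format(payload)
    )
    return plain, primed
```

With the real payload, `died_from_segfault(run_in_fresh_process(plain))` would be expected to be true and the same check on `primed` false, if the behavior matches what is described here.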
I have a potential fix/workaround, which I will post momentarily.