Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.12.0, 0.12.1, 0.13.0
Description
Deserializing a table from its components in a fresh process crashes with SIGSEGV:
#1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
#2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
#3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#4 0x00000000004ad919 in PyCFunction_Call ()
#5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186] () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*, _object*) () from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
#9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
#10 0x0000000000545e34 in ?? ()
#11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
#12 0x0000000000545a51 in ?? ()
#13 0x0000000000546890 in PyEval_EvalCode ()
#14 0x000000000042a9a8 in PyRun_FileExFlags ()
#15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
#16 0x000000000043e0ba in Py_Main ()
#17 0x0000000000421b04 in main ()
The following snippet can be used to reproduce the issue:
import pickle
import sys

import pandas as pd
import pyarrow as pa

if __name__ == '__main__':
    if sys.argv[1] == 'w':
        df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
        table = pa.Table.from_pandas(df)
        table_serialized = pa.serialize(table)
        table_serialized_components = table_serialized.to_components()
        with open('/tmp/p.pickle', 'wb') as f:
            pickle.dump(table_serialized_components, f)
        print('/tmp/p.pickle written ok')

    if sys.argv[1] == 'r':
        # UNCOMMENT THE FOLLOWING LINE TO AVOID THE CRASH
        #pa.serialize(0)
        with open('/tmp/p.pickle', 'rb') as f:
            table_serialized_components = pickle.load(f)
        table = pa.deserialize_components(table_serialized_components)
        print(table)
Then run:
$ python pa_serialization_crashes.py w
/tmp/p.pickle written ok
$ python pa_serialization_crashes.py r
Segmentation fault (core dumped)
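To check for the crash from a script rather than by reading the shell output, the read step can be launched in a child interpreter and its exit status inspected. The following is only a sketch: the helper names `describe_exit` and `run_in_fresh_process` are hypothetical and not part of pyarrow, and it relies on the POSIX behavior of `subprocess`, which reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a short description of how a child process ended."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as -<signal number>.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


def run_in_fresh_process(args):
    """Run `python <args>` in a new interpreter and describe its exit."""
    result = subprocess.run([sys.executable] + list(args))
    return describe_exit(result.returncode)


# Hypothetical usage against the reproduction script above:
#   print(run_in_fresh_process(["pa_serialization_crashes.py", "r"]))
```

Running each attempt in its own interpreter matters here because the report describes the crash as occurring in a fresh process.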
The crash does not occur if unrelated data is serialized before the deserialization (see the commented-out line in the reproduction snippet).