Apache Arrow / ARROW-5260

[Python][C++] Crash when deserializing from components in a fresh new process


Details

    Description

      Deserializing a table from its components in a fresh process crashes with SIGSEGV:

      #1 0x00007fffd5eb93f0 in arrow::py::unwrap_buffer(_object*, std::shared_ptr<arrow::Buffer>*) ()
      from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
      #2 0x00007fffd5e69260 in arrow::py::GetSerializedFromComponents(int, int, int, _object*, arrow::py::SerializedPyObject*) ()
      from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/./libarrow_python.so.13
      #3 0x00007fffd6b1cafe in __pyx_pw_7pyarrow_3lib_18SerializedPyObject_7from_components(_object*, _object*, _object*) ()
      from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
      #4 0x00000000004ad919 in PyCFunction_Call ()
      #5 0x00007fffd6a88d10 in __Pyx_PyObject_Call(_object*, _object*, _object*) [clone .constprop.1186] ()
      from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
      #6 0x00007fffd6a41872 in __Pyx__PyObject_CallOneArg(_object*, _object*) ()
      from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
      #7 0x00007fffd6a89e59 in __Pyx_PyObject_CallOneArg(_object*, _object*) ()
      from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
      #8 0x00007fffd6ab087f in __pyx_pw_7pyarrow_3lib_165deserialize_components(_object*, _object*, _object*) ()
      from /home/yevgeni/uatc/.petastorm3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so
      #9 0x00000000004adca7 in _PyCFunction_FastCallKeywords ()
      #10 0x0000000000545e34 in ?? ()
      #11 0x000000000054ac8c in _PyEval_EvalFrameDefault ()
      #12 0x0000000000545a51 in ?? ()
      #13 0x0000000000546890 in PyEval_EvalCode ()
      #14 0x000000000042a9a8 in PyRun_FileExFlags ()
      #15 0x000000000042ab8d in PyRun_SimpleFileExFlags ()
      #16 0x000000000043e0ba in Py_Main ()
      #17 0x0000000000421b04 in main ()
      

       The following snippet can be used to reproduce the issue:

      import pickle
      import sys
      
      import pandas as pd
      import pyarrow as pa
      
      if __name__ == '__main__':
          if sys.argv[1] == 'w':
              df = pd.DataFrame({'int': [1, 2], 'str': ['a', 'b']})
              table = pa.Table.from_pandas(df)
              table_serialized = pa.serialize(table)
              table_serialized_components = table_serialized.to_components()
              with open('/tmp/p.pickle', 'wb') as f:
                  pickle.dump(table_serialized_components, f)
              print('/tmp/p.pickle written ok')
      
          if sys.argv[1] == 'r':
              # UNCOMMENT THE FOLLOWING LINE TO AVOID THE CRASH
              #pa.serialize(0)
              with open('/tmp/p.pickle', 'rb') as f:
                  table_serialized_components = pickle.load(f)
              table = pa.deserialize_components(table_serialized_components)
              print(table)
      
      

      Then run:

      $ python pa_serialization_crashes.py w
      /tmp/p.pickle written ok
      
      $ python pa_serialization_crashes.py r
      Segmentation fault (core dumped)

      The crash does not occur if any unrelated data is serialized before the deserialization (see the commented-out pa.serialize(0) line in the reproduction snippet).
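
      To check the behaviour from a script rather than by hand, the reader can be launched in a fresh child process and its exit status inspected. The following is a minimal sketch, assuming a POSIX system where a child killed by signal N reports a return code of -N. The helper name run_in_fresh_process is hypothetical and not part of pyarrow.

```python
import signal
import subprocess
import sys


def run_in_fresh_process(args):
    """Run `python <args...>` in a new interpreter and classify the result.

    Hypothetical helper, not part of pyarrow. Returns 'ok', 'segfault',
    or 'failed (<returncode>)'.
    """
    proc = subprocess.run([sys.executable, *args],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode == 0:
        return 'ok'
    # On POSIX, a child terminated by signal N has returncode == -N.
    if proc.returncode == -signal.SIGSEGV:
        return 'segfault'
    return 'failed ({})'.format(proc.returncode)
```

      With the reproduction script in the current directory, run_in_fresh_process(['pa_serialization_crashes.py', 'r']) would be expected to report a segfault on an affected build, and 'ok' once the pa.serialize(0) line is uncommented.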


            People

              apitrou Antoine Pitrou
              selitvin Yevgeni Litvin

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 10m