Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 0.7.1
Description
The problem happens with the following code:
    import numpy as np
    import pyarrow
    import sys


    class Bar(object):
        pass


    def bar_custom_serializer(obj):
        # Create a new array that nothing else references.
        x = np.zeros(4)
        return x


    def bar_custom_deserializer(serialized_obj):
        return serialized_obj


    pyarrow._default_serialization_context.register_type(
        Bar, "Bar", pickle=False,
        custom_serializer=bar_custom_serializer,
        custom_deserializer=bar_custom_deserializer)

    # The interpreter crashes during garbage collection after this call.
    pyarrow.serialize(Bar())
After execution of pyarrow.serialize, the interpreter crashes in the garbage collection routine.
This happens when the custom serializer returns a numpy array that nothing else references. The existing code does not hit this because, so far, none of our custom serializers create new numpy arrays.
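A pure-Python illustration of the condition described above, assuming CPython's immediate reference-counting semantics: an array created inside the serializer is freed as soon as the caller drops its only reference, while an array that the object itself still holds survives. `BarWithData`, `serializer_fresh`, `serializer_existing` and `survives_temporary` are hypothetical names used only for this sketch; the weak reference stands in for a non-owning pointer that does not keep the array alive.

```python
import weakref

import numpy as np


class BarWithData:
    """Hypothetical object that already owns an array."""

    def __init__(self):
        self.data = np.zeros(4)


def serializer_fresh(obj):
    # New array: the caller receives the only reference to it.
    return np.zeros(4)


def serializer_existing(obj):
    # Existing array: obj.data keeps a second reference alive.
    return obj.data


def survives_temporary(serializer, obj):
    # Keep only a weak (non-owning) reference to the serializer's result;
    # the temporary strong reference is dropped right after this line.
    ref = weakref.ref(serializer(obj))
    return ref() is not None


bar = BarWithData()
print(survives_temporary(serializer_fresh, bar))     # False on CPython
print(survives_temporary(serializer_existing, bar))  # True
```

This matches the observation in the report: serializers that only return arrays the object already holds never expose the problem.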
I think the problem is that the numpy array's reference count drops to zero between the end of SerializeSequences in python_to_arrow.cc and the call to NdarrayToTensor, so the array is freed while it is still in use. I'll push a fix later today that simply increments and decrements the reference counts at the appropriate places.
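The shape of the proposed fix (take an extra reference before the gap, release it afterwards) can be sketched from Python through `ctypes.pythonapi`, which exposes the running CPython interpreter's `Py_IncRef` and `Py_DecRef`. This is CPython-only and purely illustrative; `custom_serializer` and `hold_across_gap` are hypothetical names, and the real fix lives in the C++ code in python_to_arrow.cc, not in Python.

```python
import ctypes
import weakref

import numpy as np

# CPython-only: bind the interpreter's own reference-counting functions.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


def custom_serializer(obj):
    # Fresh array with no other owner, as in the bug report.
    return np.zeros(4)


def hold_across_gap(obj):
    """Return (alive_during_gap, alive_after_release) for the serialized array."""
    arr = custom_serializer(obj)
    _incref(arr)                  # extra reference taken before the gap
    ref = weakref.ref(arr)
    del arr                       # the ordinary reference goes away here
    alive_during_gap = ref() is not None
    held = ref()
    _decref(held)                 # release the extra reference when done
    del held                      # count reaches zero; the array is freed
    alive_after_release = ref() is not None
    return alive_during_gap, alive_after_release


print(hold_across_gap(None))  # (True, False) on CPython
```

Without the `_incref` call the array would already be gone during the gap, which is the window the report identifies between SerializeSequences and NdarrayToTensor.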