Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Version: 0.12.0
* pyarrow 0.12.0
* numpy 1.16.1
* Python 3.7.0, 2.7.15
* (macOS 10.13.6)
Description
np.bool is the only dtype I've found that causes this issue. Both empty and non-empty arrays cause it.
The issue only manifests when serializing in py2 and deserializing in py3; staying within the same version succeeds, as does serializing in py3 and deserializing in py2.
This appears to just be due to Python 2 str being deserialized in Python 3 as bytes; it should be unicode on the py2 end to come back as str in py3. I suppose something in the serialization implementation is writing the dtype (just for bool arrays?) using a str, but haven't dug into it yet.
(two)bash-3.2$ python cereal.py
(two)bash-3.2$ cat cereal.py
# Python 2
import numpy as np
import pyarrow as pa

data = np.array([], dtype=np.dtype('bool'))
buf = pa.serialize(data).to_buffer()
outstream = pa.output_stream("buffer")
outstream.write(buf)
outstream.close()

# ...switch to python 3 venv...

(three)bash-3.2$ cat decereal.py
# Python 3
import numpy as np
import pyarrow as pa

instream = pa.input_stream("buffer")
buf = instream.read()
data = pa.deserialize(buf)
print(data)
(three)bash-3.2$ python3 decereal.py
Traceback (most recent call last):
  File "decereal.py", line 10, in <module>
    data = pa.deserialize(buf)
  File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
  File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
  File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
  File "pyarrow/serialization.pxi", line 175, in pyarrow.lib.SerializationContext._deserialize_callback
TypeError: can only concatenate str (not "bytes") to str
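For anyone wanting to see the failure mode in isolation, here is a minimal Python 3 sketch that does not touch pyarrow at all. It only illustrates the suspected mechanism (a value written as a Python 2 str arriving as bytes and then being concatenated with a str). The helpers `describe_dtype` and `as_text` are hypothetical names made up for this illustration; they are not part of pyarrow, and this is not necessarily how the actual fix was done.

```python
# Python 3 only. Illustration of the suspected mechanism, not pyarrow code.

def describe_dtype(name):
    # Hypothetical stand-in for code that builds a string from a dtype name.
    return "dtype:" + name


def as_text(value, encoding="ascii"):
    # Hypothetical helper: decode bytes to str, pass str through unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# A dtype name written as a Python 2 str would show up here as bytes.
name_from_py2 = b"bool"

try:
    describe_dtype(name_from_py2)
    error_message = None
except TypeError as exc:
    # Same kind of error as in the traceback above.
    error_message = str(exc)

# Decoding first avoids the str/bytes mix.
fixed = describe_dtype(as_text(name_from_py2))

print(error_message)
print(fixed)
```

If this is indeed the cause, the cleaner fix is probably on the writing side (emit unicode in py2), with decoding on the reading side as a fallback for buffers that already exist.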