Apache Arrow / ARROW-4675

[Python] Error serializing bool ndarray in py2 and deserializing in py3


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.14.0
    • Component/s: Python
    • Environment:
      * pyarrow 0.12.0
      * numpy 1.16.1
      * Python 3.7.0, 2.7.15
      * macOS 10.13.6

    Description

      np.bool is the only dtype I've found that causes this issue. Both empty and non-empty arrays cause it.

      The issue only manifests from py2 to py3; staying within the same version succeeds, as does serializing from py3 and deserializing in py2.

      This appears to be because a Python 2 str is deserialized in Python 3 as bytes; the value would need to be unicode on the py2 side to come back as str in py3. I suspect something in the serialization implementation writes the dtype (only for bool arrays?) as a str, but I haven't dug into it yet.

      (two)bash-3.2$ python cereal.py
      (two)bash-3.2$ cat cereal.py 
      # Python 2
      import numpy as np
      import pyarrow as pa
      
      data = np.array([], dtype=np.dtype('bool'))
      buf = pa.serialize(data).to_buffer()
      
      outstream = pa.output_stream("buffer")
      outstream.write(buf)
      outstream.close()
      
      # ...switch to python 3 venv...
      (three)bash-3.2$ cat decereal.py 
      # Python 3
      import numpy as np
      import pyarrow as pa
      
      instream = pa.input_stream("buffer")
      buf = instream.read()
      
      data = pa.deserialize(buf)
      print(data)
      (three)bash-3.2$ python3 decereal.py 
      Traceback (most recent call last):
        File "decereal.py", line 10, in <module>
          data = pa.deserialize(buf)
        File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
        File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
        File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
        File "pyarrow/serialization.pxi", line 175, in pyarrow.lib.SerializationContext._deserialize_callback
      TypeError: can only concatenate str (not "bytes") to str
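
The TypeError at the end of the traceback is what Python 3 raises whenever a str is concatenated with bytes. Below is a minimal sketch of that failure mode, and of normalizing the value to str before using it. The helper `to_text` and the identifier value are hypothetical illustrations; this is not pyarrow's code, and it is not necessarily how the issue was fixed.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical identifier: a Python 2 str written on one side can show up
# as bytes when it is read back under Python 3.
type_id = b"np.array"

error_text = None
try:
    # str + bytes is not allowed in Python 3.
    message = "unknown type: " + type_id
except TypeError as exc:
    error_text = str(exc)

# Normalizing to str first makes the concatenation valid.
message = "unknown type: " + to_text(type_id)
print(error_text)
print(message)
```

Values that are already str pass through `to_text` unchanged, so the same code path would work for data written from either Python version.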
      

People

    • Assignee: Wes McKinney (wesm)
    • Reporter: Gabe Joseph (gabejoseph)
    • Votes: 1
    • Watchers: 3

Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 0.5h