Spark / SPARK-2954

PySpark MLlib serialization tests fail on Python 2.6

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 0.9.3, 1.0.3, 1.1.0
    • Component/s: PySpark
    • Labels: None

    Description

      The PySpark MLlib tests currently fail on Python 2.6 because struct.unpack raises an error when asked to unpack data from a bytearray:

      **********************************************************************
      File "pyspark/mllib/_common.py", line 181, in __main__._deserialize_double
      Failed example:
          _deserialize_double(_serialize_double(1L)) == 1.0
      Exception raised:
          Traceback (most recent call last):
            File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
              compileflags, 1) in test.globs
            File "<doctest __main__._deserialize_double[4]>", line 1, in <module>
              _deserialize_double(_serialize_double(1L)) == 1.0
            File "pyspark/mllib/_common.py", line 194, in _deserialize_double
              return struct.unpack("d", ba[offset:])[0]
          error: unpack requires a string argument of length 8
      **********************************************************************
      File "pyspark/mllib/_common.py", line 184, in __main__._deserialize_double
      Failed example:
          _deserialize_double(_serialize_double(sys.float_info.max)) == x
      Exception raised:
          Traceback (most recent call last):
            File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
              compileflags, 1) in test.globs
            File "<doctest __main__._deserialize_double[6]>", line 1, in <module>
              _deserialize_double(_serialize_double(sys.float_info.max)) == x
            File "pyspark/mllib/_common.py", line 194, in _deserialize_double
              return struct.unpack("d", ba[offset:])[0]
          error: unpack requires a string argument of length 8
      **********************************************************************
      File "pyspark/mllib/_common.py", line 187, in __main__._deserialize_double
      Failed example:
          _deserialize_double(_serialize_double(sys.float_info.max)) == y
      Exception raised:
          Traceback (most recent call last):
            File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
              compileflags, 1) in test.globs
            File "<doctest __main__._deserialize_double[8]>", line 1, in <module>
              _deserialize_double(_serialize_double(sys.float_info.max)) == y
            File "pyspark/mllib/_common.py", line 194, in _deserialize_double
              return struct.unpack("d", ba[offset:])[0]
          error: unpack requires a string argument of length 8
      **********************************************************************
      

      It looks like one solution is to wrap the bytearray with buffer(): http://stackoverflow.com/a/15467046/590203
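
A minimal sketch of that workaround, assuming a pair of helpers shaped like the ones in the traceback. The names serialize_double, deserialize_double, and _wrap are hypothetical and are not taken from pyspark/mllib/_common.py. On Python 2 the slice is wrapped in buffer() before it reaches struct.unpack; on Python 3, where buffer() no longer exists, the sketch falls back to bytes() so the same code still runs.

```python
import struct
import sys

# Hypothetical compatibility shim (not from the Spark source):
# Python 2 provides buffer(), which struct.unpack accepts on 2.6.
# Python 3 removed buffer(), so fall back to bytes() there.
try:
    _wrap = buffer  # noqa: F821 (only defined on Python 2)
except NameError:
    _wrap = bytes


def serialize_double(value):
    """Pack a number into an 8-byte bytearray in native double format."""
    return bytearray(struct.pack("d", float(value)))


def deserialize_double(ba, offset=0):
    """Unpack a double from the bytes of ba starting at offset.

    As in the traceback, the slice runs to the end of ba, so exactly
    8 bytes must remain after offset.
    """
    return struct.unpack("d", _wrap(ba[offset:]))[0]


if __name__ == "__main__":
    assert deserialize_double(serialize_double(1)) == 1.0
    big = sys.float_info.max
    assert deserialize_double(serialize_double(big)) == big
```

Whether buffer() is the best fit for the real helpers depends on how they are called elsewhere; this only illustrates the wrapping step suggested in the linked answer.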

          People

            Assignee: Josh Rosen (joshrosen)
            Reporter: Josh Rosen (joshrosen)