  1. Spark
  2. SPARK-2954

PySpark MLlib serialization tests fail on Python 2.6

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 0.9.3, 1.0.3, 1.1.0
    • Component/s: PySpark
    • Labels:
      None

      Description

      The PySpark MLlib tests currently fail on Python 2.6 due to problems unpacking data from bytearray using struct.unpack:

      **********************************************************************
      File "pyspark/mllib/_common.py", line 181, in __main__._deserialize_double
      Failed example:
          _deserialize_double(_serialize_double(1L)) == 1.0
      Exception raised:
          Traceback (most recent call last):
            File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
              compileflags, 1) in test.globs
            File "<doctest __main__._deserialize_double[4]>", line 1, in <module>
              _deserialize_double(_serialize_double(1L)) == 1.0
            File "pyspark/mllib/_common.py", line 194, in _deserialize_double
              return struct.unpack("d", ba[offset:])[0]
          error: unpack requires a string argument of length 8
      **********************************************************************
      File "pyspark/mllib/_common.py", line 184, in __main__._deserialize_double
      Failed example:
          _deserialize_double(_serialize_double(sys.float_info.max)) == x
      Exception raised:
          Traceback (most recent call last):
            File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
              compileflags, 1) in test.globs
            File "<doctest __main__._deserialize_double[6]>", line 1, in <module>
              _deserialize_double(_serialize_double(sys.float_info.max)) == x
            File "pyspark/mllib/_common.py", line 194, in _deserialize_double
              return struct.unpack("d", ba[offset:])[0]
          error: unpack requires a string argument of length 8
      **********************************************************************
      File "pyspark/mllib/_common.py", line 187, in __main__._deserialize_double
      Failed example:
          _deserialize_double(_serialize_double(sys.float_info.max)) == y
      Exception raised:
          Traceback (most recent call last):
            File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
              compileflags, 1) in test.globs
            File "<doctest __main__._deserialize_double[8]>", line 1, in <module>
              _deserialize_double(_serialize_double(sys.float_info.max)) == y
            File "pyspark/mllib/_common.py", line 194, in _deserialize_double
              return struct.unpack("d", ba[offset:])[0]
          error: unpack requires a string argument of length 8
      **********************************************************************
      

      It looks like one solution is to wrap the bytearray with buffer(): http://stackoverflow.com/a/15467046/590203
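The suggested workaround might look roughly like the sketch below. It is not the actual patch: `deserialize_double` is a hypothetical stand-in for `_deserialize_double` in `pyspark/mllib/_common.py`, and the version check is an assumption added so that the same snippet also runs on interpreters where `buffer()` no longer exists.

```python
import struct
import sys


def deserialize_double(ba, offset=0):
    """Hypothetical stand-in for _deserialize_double; not the real PySpark code."""
    chunk = ba[offset:]
    if sys.version_info[0] == 2:
        # Python 2.6's struct.unpack rejects a bytearray argument.
        # buffer() exposes the same bytes through an object it accepts.
        chunk = buffer(chunk)  # Python 2 builtin, only reached on Python 2
    return struct.unpack("d", chunk)[0]


# Round-trip a double through a bytearray, as the failing doctests do.
packed = bytearray(struct.pack("d", 1.0))
value = deserialize_double(packed)
```

As in the traceback, the slice `ba[offset:]` has to be exactly 8 bytes long, so the sketch assumes the double is the last item in the bytearray.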


            People

            • Assignee: Josh Rosen (joshrosen)
            • Reporter: Josh Rosen (joshrosen)
            • Votes: 0
            • Watchers: 2
