Description
The PySpark MLlib tests currently fail on Python 2.6 because struct.unpack cannot unpack data directly from a bytearray:
**********************************************************************
File "pyspark/mllib/_common.py", line 181, in __main__._deserialize_double
Failed example:
    _deserialize_double(_serialize_double(1L)) == 1.0
Exception raised:
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__._deserialize_double[4]>", line 1, in <module>
        _deserialize_double(_serialize_double(1L)) == 1.0
      File "pyspark/mllib/_common.py", line 194, in _deserialize_double
        return struct.unpack("d", ba[offset:])[0]
    error: unpack requires a string argument of length 8
**********************************************************************
File "pyspark/mllib/_common.py", line 184, in __main__._deserialize_double
Failed example:
    _deserialize_double(_serialize_double(sys.float_info.max)) == x
Exception raised:
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__._deserialize_double[6]>", line 1, in <module>
        _deserialize_double(_serialize_double(sys.float_info.max)) == x
      File "pyspark/mllib/_common.py", line 194, in _deserialize_double
        return struct.unpack("d", ba[offset:])[0]
    error: unpack requires a string argument of length 8
**********************************************************************
File "pyspark/mllib/_common.py", line 187, in __main__._deserialize_double
Failed example:
    _deserialize_double(_serialize_double(sys.float_info.max)) == y
Exception raised:
    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/doctest.py", line 1253, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__._deserialize_double[8]>", line 1, in <module>
        _deserialize_double(_serialize_double(sys.float_info.max)) == y
      File "pyspark/mllib/_common.py", line 194, in _deserialize_double
        return struct.unpack("d", ba[offset:])[0]
    error: unpack requires a string argument of length 8
**********************************************************************
It looks like one solution is to wrap the bytearray with buffer(): http://stackoverflow.com/a/15467046/590203
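
A rough sketch of what the buffer() approach might look like, not the actual patch. The function names here (serialize_double, deserialize_double, _as_unpackable) are hypothetical stand-ins for the helpers in _common.py, and the Python 3 branch is only there so the snippet runs on a modern interpreter. As in the original code, it assumes the double occupies the last 8 bytes of the bytearray.

```python
import struct
import sys


def _as_unpackable(ba, offset=0):
    """Return a zero-copy view of ba[offset:] that struct.unpack accepts.

    Hypothetical helper, not part of pyspark.
    """
    if sys.version_info[0] == 2:
        # On Python 2.6, struct.unpack raises "unpack requires a string
        # argument" for a bytearray; wrapping it in buffer() is the
        # workaround suggested in the linked answer.
        return buffer(ba, offset)  # noqa: F821 (Python 2 builtin)
    # Python 3 has no buffer(); memoryview plays the same role there.
    return memoryview(ba)[offset:]


def serialize_double(d):
    # Pack a number as a native 8-byte double, stored in a bytearray.
    return bytearray(struct.pack("d", d))


def deserialize_double(ba, offset=0):
    # Unpack the double that starts at `offset` and runs to the end of `ba`.
    return struct.unpack("d", _as_unpackable(ba, offset))[0]
```

With this change, the failing doctest cases such as deserialize_double(serialize_double(sys.float_info.max)) should round-trip, though this has not been verified against an actual Python 2.6 interpreter here.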