[SPARK-4897] Python 3 support - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: PySpark
Labels: None
Target Version/s: 1.4.0

Description

It would be nice to have Python 3 support in PySpark, provided that we can do it in a way that maintains backward compatibility with Python 2.6.

I started looking into porting this; my work in progress can be found at https://github.com/JoshRosen/spark/compare/python3

I was able to use the futurize tool to handle the basic conversion of things like print statements, and had to manually fix a few imports for packages that were moved or renamed. The major blocker that I hit, however, was cloudpickle:
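For illustration, here is the kind of manual import fix that tools like futurize leave to the porter, written so that the same module runs unchanged on Python 2.6+ and Python 3. This is a minimal sketch, assuming the renamed modules shown (cPickle, Queue, itertools.izip) as typical examples; they are not taken from the actual branch, and the names PY3 and string_types are hypothetical.

```python
import sys

# Hypothetical flag for version-specific branches elsewhere in the module.
PY3 = sys.version_info[0] >= 3

# Python 2 ships a separate C-accelerated cPickle; in Python 3 the plain
# pickle module uses its C accelerator automatically.
try:
    import cPickle as pickle
except ImportError:
    import pickle

# The Queue module was renamed to queue in Python 3.
try:
    import Queue as queue
except ImportError:
    import queue

# itertools.izip was removed in Python 3, where the builtin zip is already lazy.
try:
    from itertools import izip as zip
except ImportError:
    pass

# basestring does not exist in Python 3; only the taken branch is evaluated.
string_types = (str,) if PY3 else (basestring,)  # noqa: F821
```

Because every shim falls back on ImportError, the rest of the module can use pickle, queue, zip and string_types without any further version checks.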

[joshrosen python (python3)]$ PYSPARK_PYTHON=python3 ../bin/pyspark
Python 3.4.2 (default, Oct 19 2014, 17:52:17)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/Users/joshrosen/Documents/Spark/python/pyspark/shell.py", line 28, in <module>
    import pyspark
  File "/Users/joshrosen/Documents/spark/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/Users/joshrosen/Documents/spark/python/pyspark/context.py", line 26, in <module>
    from pyspark import accumulators
  File "/Users/joshrosen/Documents/spark/python/pyspark/accumulators.py", line 97, in <module>
    from pyspark.cloudpickle import CloudPickler
  File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 120, in <module>
    class CloudPickler(pickle.Pickler):
  File "/Users/joshrosen/Documents/spark/python/pyspark/cloudpickle.py", line 122, in CloudPickler
    dispatch = pickle.Pickler.dispatch.copy()
AttributeError: type object '_pickle.Pickler' has no attribute 'dispatch'
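The AttributeError comes from a difference between pickle implementations. In CPython 3, pickle.Pickler is normally the C-accelerated _pickle.Pickler, which has no class-level dispatch dictionary for cloudpickle to copy. The pure-Python implementation still has one, but only under the private, undocumented name pickle._Pickler. The sketch below assumes CPython 3 and shows what subclassing that private class looks like. DispatchPickler, its range handler, and dumps are hypothetical illustrations, not code from cloudpickle or from this port. Depending on a private name like this is part of what makes the port fragile.

```python
import io
import pickle

# The pure-Python pickler (private name) still exposes a class-level dispatch
# dict mapping types to save functions; the C pickle.Pickler does not.
BasePickler = pickle._Pickler


class DispatchPickler(BasePickler):
    """Hypothetical cloudpickle-style pickler with an extended dispatch table."""

    # Copy the parent's table so new handlers don't leak into the stdlib class.
    dispatch = BasePickler.dispatch.copy()

    def __init__(self, file, protocol=None):
        super().__init__(file, protocol)
        self.ranges_seen = 0  # how many times the custom handler ran

    def save_range(self, obj):
        # Illustrative handler: emit range(start, stop, step) as a REDUCE call.
        self.ranges_seen += 1
        self.save_reduce(range, (obj.start, obj.stop, obj.step), obj=obj)

    # The pure-Python pickler calls dispatch entries as f(self, obj).
    dispatch[range] = save_range


def dumps(obj, protocol=pickle.DEFAULT_PROTOCOL):
    """Serialize obj with DispatchPickler and return the pickle bytes."""
    buf = io.BytesIO()
    DispatchPickler(buf, protocol).dump(obj)
    return buf.getvalue()
```

The output is ordinary pickle data, so the standard pickle.loads can read it back; only the writing side needs the customized dispatch table.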

This code looks like it will be difficult to port to Python 3, so this might be a good reason to switch to Dill for Python serialization.
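Both cloudpickle and Dill exist because the standard pickle module serializes functions by reference (module plus qualified name). As a result, lambdas and interactively defined functions, which PySpark has to ship to worker processes, cannot be pickled at all. The sketch below shows that limitation and a drop-in use of Dill. It assumes Python 3, and the third-party dill package may or may not be installed. The helper names are hypothetical. dill.dumps and dill.loads are Dill's pickle-compatible entry points.

```python
import pickle

try:
    import dill  # third-party package; it may not be installed
except ImportError:
    dill = None


def stdlib_roundtrips(obj):
    """Return True if the standard pickle module can serialize and restore obj."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def dumps_function(func):
    """Hypothetical helper: use dill for functions when available, else pickle."""
    if dill is not None:
        return dill.dumps(func)
    return pickle.dumps(func)


def loads_function(data):
    """Hypothetical counterpart to dumps_function."""
    if dill is not None:
        return dill.loads(data)
    return pickle.loads(data)
```

A builtin such as len round-trips through the standard pickle module because it can be found again by name. A lambda cannot, because its name is `<lambda>`; with dill installed, dumps_function serializes it by value instead.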

Issue Links

is blocked by

SPARK-4898 Replace cloudpickle with Dill (Resolved)

links to

[Github] Pull Request #5173 (davies)

People

Assignee: Davies Liu

Reporter: Josh Rosen

Votes: 18

Watchers: 22

Dates

Created: 19/Dec/14 07:57

Updated: 16/Apr/15 23:21

Resolved: 16/Apr/15 23:21