It would be nice to have Python 3 support in PySpark, provided that we can do it in a way that maintains backwards-compatibility with Python 2.6.
I started looking into porting this; my work in progress can be found at https://github.com/JoshRosen/spark/compare/python3
I was able to use the futurize tool to handle the basic conversions, such as print statements, and I had to manually fix a few imports for packages that were moved or renamed. The major blocker that I hit was cloudpickle.
This code looks like it will be difficult to port to Python 3, so this might be a good reason to switch to Dill for Python serialization.
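For reference, the manual import fixes mentioned above usually take the form of a guarded import that works on both Python 2.6 and Python 3. The sketch below is only an illustration and is not taken from the branch: `cPickle` and `itertools.imap` are assumed examples of renamed or removed names, and `squares` is a hypothetical helper.

```python
# Guarded imports: try the Python 2 name first, fall back to the Python 3 one.
# The modules chosen here are illustrative assumptions, not a list of what
# PySpark actually needed to change.
try:
    import cPickle as pickle  # Python 2: C-accelerated pickle
except ImportError:
    import pickle  # Python 3: the accelerated version is used automatically

try:
    from itertools import imap  # Python 2: lazy map
except ImportError:
    imap = map  # Python 3: the built-in map is already lazy


def squares(values):
    """Hypothetical helper: square each value, on either Python version."""
    return list(imap(lambda x: x * x, values))


# Round-trip plain data through whichever pickle module was imported.
restored = pickle.loads(pickle.dumps(squares([1, 2, 3])))
```

Plain data like this pickles fine with the standard library on both versions. The difficulty described above concerns serializing functions and closures, which is the part cloudpickle handles.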