Currently, PySpark does not work with Python 3.6.0.
Running ./bin/pyspark simply fails with the error below:
The problem is in https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 as the error says, and the cause seems to be that namedtuple's optional arguments became keyword-only arguments in Python 3.6.0 (see https://bugs.python.org/issue25628).
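For reference, a quick standalone check (not part of Spark's code) shows the changed signature on Python 3.6.0:

import collections
import inspect

# On Python 3.6.0 the optional parameters sit after '*', i.e. they are
# keyword-only; roughly:
#   (typename, field_names, *, verbose=False, rename=False, module=None)
print(inspect.signature(collections.namedtuple))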
We currently copy this function via types.FunctionType, which does not carry over the default values of keyword-only arguments (namedtuple.__kwdefaults__), and this seems to leave those arguments unbound (missing values) inside the copied function.
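As a rough illustration of that copying step (a simplified sketch, not the exact code in serializers.py), copying namedtuple this way drops __kwdefaults__:

import collections
import types

def _copy_func(f):
    # Simplified version of the copy done in pyspark/serializers.py:
    # types.FunctionType only takes code, globals, name, defaults and
    # closure, so keyword-only defaults are not carried over.
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, f.__closure__)

copied = _copy_func(collections.namedtuple)

print(collections.namedtuple.__kwdefaults__)  # e.g. {'verbose': False, 'rename': False, 'module': None}
print(copied.__kwdefaults__)                  # None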
This ends up as below:
If we call as below:
It throws an exception as above because __kwdefaults__ for the required keyword-only arguments seems to be unset in the copied function. So, if we give explicit values for these,
It works fine.
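Continuing the standalone sketch above (again, only an illustration, not PySpark's actual call site), the behaviour can be reproduced like this:

# Without the keyword-only defaults, Python raises a TypeError about
# missing required keyword-only arguments.
try:
    copied("Point", ["x", "y"])
except TypeError as e:
    print(e)

# Passing the keyword-only arguments explicitly works fine.
Point = copied("Point", ["x", "y"], verbose=False, rename=False, module=None)
print(Point(1, 2))  # Point(x=1, y=2)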
It seems we should now properly set these on the hijacked one.
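One way to do that (a minimal sketch of the idea; the actual patch may differ) is to carry __kwdefaults__ over along with the rest of the function:

import types

def _copy_func(f):
    fn = types.FunctionType(f.__code__, f.__globals__, f.__name__,
                            f.__defaults__, f.__closure__)
    # Also copy the keyword-only defaults so the copy keeps working on
    # Python 3.6+, where namedtuple's optional arguments are keyword-only.
    fn.__kwdefaults__ = f.__kwdefaults__
    return fn

Alternatively, the hijacked wrapper could read namedtuple's __kwdefaults__ up front and pass those values explicitly when calling the copied function, which is the same idea as giving explicit values above.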