Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
We're using virtualenvs when running PySpark, and it would be great to be able to use a virtualenv's Python as the executable used by the Spark driver (e.g. when running with --master yarn-client). This value is currently hard-coded to python in org/apache/toree/kernel/interpreter/pyspark/PySparkProcess.scala.
I have a branch on my fork that adds an optional kernel parameter, PYTHON_EXEC:
... "SPARK_HOME": "/usr/lib/spark", "PYTHON_EXEC" : "/usr/local/python/virtualenvs/myvenv/bin/python", ...
If PYTHON_EXEC is unspecified, the default of python is used.
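
A minimal sketch of that fallback behavior, assuming the kernel's environment variables are visible to the JVM that spawns the Python process (the object and launch call here are illustrative, not the actual patch):

import scala.sys.process._

object PythonExecSketch {
  // Illustrative only: resolve the interpreter from the PYTHON_EXEC
  // environment variable, falling back to the hard-coded default "python".
  val pythonExec: String = sys.env.getOrElse("PYTHON_EXEC", "python")

  def main(args: Array[String]): Unit = {
    // Launch the resolved interpreter the same way the hard-coded value
    // would have been used (here it just prints the interpreter version).
    val exitCode = Seq(pythonExec, "--version").!
    println(s"$pythonExec exited with code $exitCode")
  }
}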
Here's the diff of the branch; please let me know if it's OK for me to open a PR against the main repo: https://github.com/ericchang/incubator-toree/compare/ericchang:master...custom-python-exec