When a cluster has the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables set (needed when using a non-system Python, e.g. /usr/bin/anaconda/bin/python), those values cannot be overridden per submission in YARN cluster mode.
When using spark-submit (in this case via Livy) to submit with an override:
spark-submit --master yarn --deploy-mode cluster --conf 'spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=python3' --conf 'spark.yarn.appMasterEnv.PYSPARK_PYTHON=python3' probe.py
the environment variable values override the conf settings. A workaround for some users is to unset the env vars before submitting, but that is not always possible (e.g. when submitting a batch via Livy, where you can only pass parameters through to spark-submit).
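For completeness, a sketch of the unset workaround for the cases where you do control the submitting shell (not possible when Livy builds the spark-submit call for you). The `sh -c 'echo …'` command stands in for the real spark-submit invocation:

```shell
# Cluster-level setting that would normally win over the conf values.
export PYSPARK_PYTHON=/usr/bin/anaconda/bin/python

# env -u removes the variable from the child process's environment only;
# in real use the command here would be spark-submit, not sh -c.
env -u PYSPARK_PYTHON sh -c 'echo "PYSPARK_PYTHON=${PYSPARK_PYTHON:-<unset>}"'
```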
The expectation is that the conf values above override the environment variables.
The fix is to change the order in which conf values and environment variables are applied in the YARN client, so that per-submission conf values take precedence.
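The desired precedence can be illustrated with a minimal sketch (Python here for brevity; Spark's YARN client is Scala, and `resolve_app_master_env`, `cluster_env`, and `submit_conf` are hypothetical names, not Spark internals). Cluster-level environment variables act as defaults, and explicit spark.yarn.appMasterEnv.* conf values applied afterwards win:

```python
def resolve_app_master_env(cluster_env, submit_conf):
    """Merge AM environment so that explicit conf settings override env vars."""
    prefix = "spark.yarn.appMasterEnv."
    env = dict(cluster_env)  # start from cluster-level defaults
    for key, value in submit_conf.items():
        if key.startswith(prefix):
            env[key[len(prefix):]] = value  # conf applied last, so it wins
    return env

cluster_env = {"PYSPARK_PYTHON": "/usr/bin/anaconda/bin/python"}
submit_conf = {"spark.yarn.appMasterEnv.PYSPARK_PYTHON": "python3"}
print(resolve_app_master_env(cluster_env, submit_conf))
# {'PYSPARK_PYTHON': 'python3'}
```

The current behavior is the reverse (env vars applied after conf, clobbering it); swapping the application order as above gives the expected result.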
Backporting this to 1.6 would be great and unblocking for me.