Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 1.3.1
Fix Version/s: None
Description
Currently, in SparkSubmitArguments.scala, when master is set to "yarn" (yarn-cluster mode),
https://github.com/apache/spark/blob/b1f4ca82d170935d15f1fe6beb9af0743b4d81cd/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L236
Spark checks whether YARN_CONF_DIR or HADOOP_CONF_DIR is set in the environment.
However, we should additionally allow YARN_CONF_DIR to be passed as a command-line argument. This is particularly handy when Spark is launched from schedulers like Oozie or Falcon.
The reason is that the Oozie launcher app starts in one of the containers assigned by the YARN ResourceManager, and we do not want to set YARN_CONF_DIR in the environment on every node in the cluster. Passing an argument such as -yarnconfdir with the conf dir (e.g. /etc/hadoop/conf) would avoid having to set the environment variable.
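To make the request concrete, here is a rough sketch of how the check could fall back from a command-line value to the environment. It is not the actual Spark source: the object name `YarnConfCheck`, the method `resolveConfDir`, and the `yarnConfDirArg` parameter (standing in for the value of the proposed -yarnconfdir option) are all hypothetical, and the lookup order among the three sources is an assumption.

```scala
// Hypothetical sketch of the proposed validation; not the actual Spark code.
object YarnConfCheck {

  /**
   * Resolves the Hadoop/YARN configuration directory for a given master.
   *
   * @param master         the master URL, e.g. "yarn-cluster"
   * @param yarnConfDirArg value of the proposed command-line option (hypothetical)
   * @param env            environment variables, passed in so the logic is testable
   * @return None for non-YARN masters, otherwise the resolved directory
   */
  def resolveConfDir(
      master: String,
      yarnConfDirArg: Option[String],
      env: Map[String, String]): Option[String] = {
    if (!master.startsWith("yarn")) {
      // The conf dir check only matters when submitting to YARN.
      None
    } else {
      // Assumed precedence: command-line value, then YARN_CONF_DIR, then HADOOP_CONF_DIR.
      val resolved = yarnConfDirArg
        .orElse(env.get("YARN_CONF_DIR"))
        .orElse(env.get("HADOOP_CONF_DIR"))
      if (resolved.isEmpty) {
        throw new IllegalArgumentException(
          s"When running with master '$master', a conf dir must be passed on the " +
            "command line or via HADOOP_CONF_DIR / YARN_CONF_DIR in the environment.")
      }
      resolved
    }
  }
}
```

With something like this, a launcher that has no YARN_CONF_DIR in its environment could still submit by calling, for example, `YarnConfCheck.resolveConfDir("yarn-cluster", Some("/etc/hadoop/conf"), sys.env)`.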
This is blocking us from onboarding Spark via Oozie or Falcon. Thanks.