Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
It is not clear to most users that, when running Spark on YARN, a SparkContext with a given execution plan can run locally in yarn-client mode but cannot deploy itself to the cluster. Cluster deployment is currently performed through org.apache.spark.deploy.yarn.Client. I think we should support deployment through SparkContext, but that is not the point I wish to make here.
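For illustration, a minimal sketch of the unsupported pattern (the application name and the RDD contents are placeholders, not from this report):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object YarnClusterFromDriver {
  def main(args: Array[String]): Unit = {
    // With master "yarn-client" this works: the driver runs locally and
    // requests executors from YARN. Setting "yarn-cluster" here does NOT
    // deploy the driver to the cluster; that requires submission through
    // org.apache.spark.deploy.yarn.Client.
    val conf = new SparkConf()
      .setMaster("yarn-cluster")          // unsupported when set directly
      .setAppName("direct-yarn-cluster")  // placeholder name
    val sc = new SparkContext(conf)       // fails during startup, see below
    sc.parallelize(1 to 10).count()
    sc.stop()
  }
}
{code}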
Configuring a SparkContext to deploy itself currently yields an ERROR when spark.yarn.app.id is accessed in YarnClusterSchedulerBackend, followed by a NullPointerException when the ApplicationMaster instance is referenced.
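In rough terms (a simplified sketch, not the actual backend source), the failure arises because that property is only set by Client during submission:

{code:scala}
// Simplified sketch of the failing lookup in YarnClusterSchedulerBackend.
// spark.yarn.app.id is set by Client on submission, so it is absent when
// the SparkContext was constructed directly; the lookup then throws
// (logged as an ERROR), and the ApplicationMaster reference is null.
val appId = sc.getConf.get("spark.yarn.app.id")
{code}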
Spark should clearly inform the user that it may be running in yarn-cluster mode without a proper submission through Client, and that deploying directly from a SparkContext is not supported.
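One possible shape for such a check, as a hedged sketch (the condition, message, and placement are assumptions for illustration, not a committed fix): fail fast during SparkContext initialization when the master is yarn-cluster but the properties Client would have set are absent.

{code:scala}
// Hypothetical guard during SparkContext initialization; the exact
// condition, wording, and location are assumptions.
if (master == "yarn-cluster" && !conf.contains("spark.yarn.app.id")) {
  throw new SparkException(
    "Detected yarn-cluster mode, but the application was not submitted " +
    "through org.apache.spark.deploy.yarn.Client. Deploying to YARN " +
    "directly from a SparkContext is not supported; please submit the " +
    "application with Client instead.")
}
{code}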