Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Versions: 1.4.1, 1.5.1, 1.6.0
- Labels: None
Description
An NPE is thrown from the YARN Client.scala because File.listFiles() can return null for a directory the process does not have permission to list. This is the code fragment in question:
// In org/apache/spark/deploy/yarn/Client.scala
Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { envKey =>
  sys.env.get(envKey).foreach { path =>
    val dir = new File(path)
    if (dir.isDirectory()) {
      // dir.listFiles() can return null
      dir.listFiles().foreach { file =>
        if (file.isFile && !hadoopConfFiles.contains(file.getName())) {
          hadoopConfFiles(file.getName()) = file
        }
      }
    }
  }
}
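The underlying JDK behavior can be demonstrated outside Spark. Below is a minimal Java sketch (the class and helper names are hypothetical, not part of Spark) showing that File.listFiles() returns null for an unreadable or nonexistent directory, and the null check that avoids the NPE:

```java
import java.io.File;

public class ListFilesGuard {
    // Null-safe variant of the pattern in Client.scala:
    // File.listFiles() returns null when the path is not a readable
    // directory, so guard before iterating instead of risking an NPE.
    static int countFiles(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // Unreadable, nonexistent, or not a directory: skip it.
            return 0;
        }
        int count = 0;
        for (File f : entries) {
            if (f.isFile()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A nonexistent path makes listFiles() return null rather than throw.
        File notADir = new File("/no/such/path");
        System.out.println(notADir.listFiles() == null); // prints "true"
        System.out.println(countFiles(notADir));         // prints "0"
    }
}
```

In Scala the same guard is usually written by wrapping the call in Option, e.g. Option(dir.listFiles()), so that a null result is treated as an empty collection.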
To reproduce, simply do:
sudo mkdir /tmp/conf
sudo chmod 700 /tmp/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/tmp/conf
spark-submit --master yarn-client SimpleApp.py
It fails with any Spark app. Though the contents are not important, the SimpleApp.py I used looks like this:
from pyspark import SparkContext

sc = SparkContext(None, "Simple App")
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)
total = distData.reduce(lambda a, b: a + b)
print("Total: %i" % total)