SPARK-13441: NullPointerException when either HADOOP_CONF_DIR or YARN_CONF_DIR is not readable


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 1.5.1, 1.6.0
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

      An NPE is thrown from the YARN Client.scala because File.listFiles() can return null for a directory that the process does not have permission to list. This is the code fragment in question:

      // In org/apache/spark/deploy/yarn/Client.scala
          Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { envKey =>
            sys.env.get(envKey).foreach { path =>
              val dir = new File(path)
              if (dir.isDirectory()) {
                // dir.listFiles() can return null
                dir.listFiles().foreach { file =>
                  if (file.isFile && !hadoopConfFiles.contains(file.getName())) {
                    hadoopConfFiles(file.getName()) = file
                  }
                }
              }
            }
          }
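
      One way to make this fragment null-safe (a sketch only, not necessarily the committed fix) is to wrap the listFiles() result in Option, so an unlistable directory is treated the same as an empty one:

      // Sketch of a null-safe variant (assumes the surrounding
      // Client.scala context, e.g. the hadoopConfFiles map).
          Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { envKey =>
            sys.env.get(envKey).foreach { path =>
              val dir = new File(path)
              if (dir.isDirectory()) {
                // Option(null) is None, so an unlistable directory is
                // skipped instead of triggering the NPE.
                Option(dir.listFiles()).getOrElse(Array.empty[File]).foreach { file =>
                  if (file.isFile && !hadoopConfFiles.contains(file.getName())) {
                    hadoopConfFiles(file.getName()) = file
                  }
                }
              }
            }
          }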
      

      To reproduce, simply do:

      sudo mkdir /tmp/conf
      sudo chmod 700 /tmp/conf
      export HADOOP_CONF_DIR=/etc/hadoop/conf
      export YARN_CONF_DIR=/tmp/conf
      spark-submit --master yarn-client SimpleApp.py
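
      The underlying JDK behavior can also be demonstrated in isolation: File.listFiles() returns null rather than throwing, both when the directory cannot be read and when the path is not a directory at all:

      import java.io.File

      object ListFilesDemo extends App {
        // listFiles() on a regular file (not a directory) returns null,
        // just as it does for a directory the process cannot read.
        val f = File.createTempFile("spark13441", ".txt")
        f.deleteOnExit()
        println(f.listFiles() == null)  // prints "true"
      }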
      

      It fails for any Spark app. Though its contents are not important, the SimpleApp.py I used looks like this:

      from pyspark import SparkContext
      
      sc = SparkContext(None, "Simple App")
      
      data = [1, 2, 3, 4, 5]
      distData = sc.parallelize(data)
      
      total = distData.reduce(lambda a, b: a + b)
      
      print("Total: %i" % total)
      


          People

            Assignee: chtyim (Terence Yim)
            Reporter: chtyim (Terence Yim)
