Spark / SPARK-21859

SparkFiles.get failed on driver in yarn-cluster and yarn-client mode

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.2
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

      Description

      When SparkFiles.get is used to locate a file on the driver in yarn-client or yarn-cluster mode, it reports a file-not-found exception.
      The exception occurs only on the driver; SparkFiles.get works fine on executors.

      The bug can be reproduced as follows:
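      ```scala
      import java.io.File
      import scala.io.Source
      import org.apache.spark.SparkFiles
      import org.slf4j.LoggerFactory

      // Assumption: `logging` in the original snippet is an slf4j-style logger.
      val logging = LoggerFactory.getLogger("SparkFilesTest")

      def testOnDriver(fileName: String): Unit = {
        val file = new File(SparkFiles.get(fileName))
        if (!file.exists()) {
          logging.info(s"$file does not exist")
        } else {
          // Print the file content on the driver, closing the source afterwards.
          val source = Source.fromFile(file)
          try logging.info(s"File content: ${source.getLines().mkString("\n")}")
          finally source.close()
        }
      }
      // On the driver, the output is "<path> does not exist".
      ```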

      ```python
      import os
      import subprocess

      from pyspark import SparkConf, SparkContext, SparkFiles

      conf = SparkConf().setAppName("spark files test")
      sc = SparkContext(conf=conf)

      def test_on_driver(filename):
          file = SparkFiles.get(filename)
          print("file path: {}".format(file))
          if os.path.exists(file):
              with open(file) as f:
                  lines = f.readlines()
              print(lines)
          else:
              print("file doesn't exist")
              # The report's run_command helper is not shown; subprocess.call stands in for it.
              subprocess.call(["ls", "."])
      ```
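
To compare what the driver and an executor actually see, a small diagnostic along these lines might help. It is only a sketch: `describe_path` and `compare_driver_and_executor` are hypothetical helper names, not part of Spark, and the code assumes an existing SparkContext `sc` and a file already distributed to the job (for example with `sc.addFile` or `--files`).

```python
import os


def describe_path(path):
    """Report whether `path` exists and what its parent directory contains."""
    parent = os.path.dirname(path) or "."
    parent_exists = os.path.isdir(parent)
    return {
        "path": path,
        "exists": os.path.exists(path),
        "parent_exists": parent_exists,
        # Listing the parent shows which files were really localized there.
        "siblings": sorted(os.listdir(parent)) if parent_exists else [],
    }


def compare_driver_and_executor(sc, filename):
    """Resolve `filename` with SparkFiles.get on the driver and inside one task.

    Hypothetical helper: `sc` is assumed to be a live SparkContext.
    """
    # Imported lazily so describe_path can be used without Spark installed.
    from pyspark import SparkFiles

    driver_view = describe_path(SparkFiles.get(filename))
    executor_view = (
        sc.parallelize([0], 1)
        .map(lambda _: describe_path(SparkFiles.get(filename)))
        .collect()[0]
    )
    return driver_view, executor_view
```

Calling `compare_driver_and_executor(sc, "some_file.txt")` (the file name is a placeholder) returns one dictionary per side. If the behavior described above holds, the driver entry would show `exists` as false while the executor entry shows it as true, and the `siblings` lists indicate where the file was placed.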

    People

    • Assignee: Unassigned
    • Reporter: lgrcyanny Cyanny