SPARK-15371: YARNShuffleService doesn't get current local-dirs from NodeManager

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.6.2, 2.0.0
    • Fix Version/s: None
    • Component/s: Shuffle, Spark Core, YARN
    • Labels: None

    Description

      In YarnShuffleService.java, YarnShuffleService reads the list of local directories from the YARN configuration. If it doesn't find an existing LevelDB file (used for recovery) in any of them, it creates one in the first directory in the list. Because it uses the configured list rather than asking YARN for the current list of healthy local-dirs, it will keep trying to use the first directory even when the NodeManager already knows that directory is bad.
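
To make the behavior described above concrete, here is a minimal sketch of that selection logic. The class name, method name, and file name are hypothetical and used only for illustration; this is not the actual YarnShuffleService code.

```java
import java.io.File;
import java.util.List;

// Hypothetical helper that mirrors the behavior described in this issue.
final class RecoveryFileLocator {

    // Returns the location to use for the recovery file.
    // configuredLocalDirs is assumed to be non-empty and to come from the config,
    // not from the NodeManager's current view of healthy disks.
    static File findRecoveryFile(List<String> configuredLocalDirs, String fileName) {
        // Reuse an existing recovery file if one is present in any configured directory.
        for (String dir : configuredLocalDirs) {
            File candidate = new File(dir, fileName);
            if (candidate.exists()) {
                return candidate;
            }
        }
        // Otherwise fall back to the first configured directory,
        // with no check of whether that directory is healthy.
        return new File(configuredLocalDirs.get(0), fileName);
    }
}
```

The last line is where the problem shows up: nothing checks whether the first configured directory is still usable.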

      Removing the bad directory from the config fixes this, but Spark should get the current list from YARN instead of relying on the list in the config. There are examples of this in https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java, but I'm not sure of the right way for Spark to implement that.
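
As a rough illustration of the idea only, the sketch below skips unusable directories before picking one. It assumes a plain local filesystem probe (a hypothetical `isUsable` check) as a stand-in for the NodeManager's own disk health information; it does not call any YARN API, and asking YARN directly would still be the preferable fix.

```java
import java.io.File;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; names are illustrative and not part of Spark or Hadoop.
final class HealthyDirPicker {

    // Local usability probe. This is an assumption standing in for the
    // NodeManager's disk health checks, which may apply stricter criteria.
    static boolean isUsable(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    // Returns the first directory that passes the probe, or empty if none do.
    static Optional<File> firstUsableDir(List<String> localDirs) {
        return localDirs.stream()
                .map(File::new)
                .filter(HealthyDirPicker::isUsable)
                .findFirst();
    }
}
```

A local probe like this can disagree with what the NodeManager considers healthy, which is why the issue asks for the list to come from YARN itself.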

            People

              Assignee: Unassigned
              Reporter: Terra Field (tfield)
              Votes: 0
              Watchers: 3
