Spark / SPARK-15371

YARNShuffleService doesn't get current local-dirs from NodeManager

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.6.2, 2.0.0
    • Fix Version/s: None
    • Component/s: Shuffle, YARN
    • Labels:
      None

      Description

      In YarnShuffleService.java, YarnShuffleService reads the list of local directories from the YARN configuration (yarn.nodemanager.local-dirs). For recovery, it looks for an existing LevelDB file in each of those directories; if it doesn't find one, it creates one in the first directory in the list. Because it uses only the configured list rather than asking YARN for the current list of healthy local dirs, YarnShuffleService keeps trying to use the first directory even when the NodeManager already knows that location is bad.
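A minimal Java sketch of the selection logic described above, to make the failure mode concrete. It is not Spark's actual source: the class and method names are hypothetical, the recovery file name `registeredExecutors.ldb` is an assumption, and Hadoop's `Configuration`/`Path` handling is replaced with plain `java.io.File` so the example runs standalone.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class RecoveryFileSelection {
    // Assumed recovery file name, used only for this illustration.
    static final String RECOVERY_FILE_NAME = "registeredExecutors.ldb";

    // Approximates the described behavior: scan every configured local dir
    // for an existing recovery file; if none is found, fall back to the FIRST
    // configured dir, with no check of whether the NodeManager considers it healthy.
    static File findRecoveryFile(List<String> configuredLocalDirs) {
        for (String dir : configuredLocalDirs) {
            File candidate = new File(dir, RECOVERY_FILE_NAME);
            if (candidate.exists()) {
                return candidate;
            }
        }
        return new File(configuredLocalDirs.get(0), RECOVERY_FILE_NAME);
    }

    public static void main(String[] args) {
        // Hypothetical paths: the first entry stands in for a failed disk.
        List<String> dirs = Arrays.asList("/bad/disk1/yarn/local", "/data2/yarn/local");
        // With no recovery file present anywhere, the first (bad) dir is chosen.
        System.out.println(findRecoveryFile(dirs));
    }
}
```

The fallback to index 0 is exactly where a known-bad first directory gets picked up again on every restart.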

      Removing the bad directory from the config works around this, but Spark should get the current list from YARN instead of relying on the list in the config. TestDiskFailures.java in the Hadoop source has examples of querying the current local dirs (https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestDiskFailures.java), but I'm not sure what the right way is for Spark to implement that.
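One possible direction, sketched in Java under stated assumptions: filter the configured directories through a health check before choosing where the recovery file lives. `looksUsable` is a hypothetical local stand-in (the path exists, is a directory, and is writable); it is not the NodeManager's disk-health API, and the NodeManager's own view of which dirs are healthy may differ. Class, method, and file names are illustrative only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class HealthyDirSelection {
    // Assumed recovery file name, used only for this illustration.
    static final String RECOVERY_FILE_NAME = "registeredExecutors.ldb";

    // Hypothetical stand-in for "ask YARN which local dirs are healthy".
    static boolean looksUsable(String dir) {
        File f = new File(dir);
        return f.isDirectory() && f.canWrite();
    }

    static File chooseRecoveryFile(List<String> configuredLocalDirs, Predicate<String> isHealthy) {
        // Keep only the dirs the health check accepts, preserving config order.
        List<String> healthy = new ArrayList<>();
        for (String dir : configuredLocalDirs) {
            if (isHealthy.test(dir)) {
                healthy.add(dir);
            }
        }
        if (healthy.isEmpty()) {
            throw new IllegalStateException("no healthy local dirs among " + configuredLocalDirs);
        }
        // Reuse an existing recovery file if one lives on a healthy dir.
        for (String dir : healthy) {
            File candidate = new File(dir, RECOVERY_FILE_NAME);
            if (candidate.exists()) {
                return candidate;
            }
        }
        // Otherwise create it in the first HEALTHY dir, not the first configured one.
        return new File(healthy.get(0), RECOVERY_FILE_NAME);
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/bad/disk1/yarn/local", System.getProperty("java.io.tmpdir"));
        System.out.println(chooseRecoveryFile(dirs, HealthyDirSelection::looksUsable));
    }
}
```

Scanning only healthy dirs means a recovery file stranded on a failed disk is ignored; this trades recovered executor state for a shuffle service that can start on a working disk.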


              People

              • Assignee:
                Unassigned
                Reporter:
                jfield Jeff Field
              • Votes:
                0
                Watchers:
                3
