SPARK-3788: Yarn dist cache code is not friendly to HDFS HA, Federation


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: YARN
    • Labels:
      None

      Description

      There are two bugs here.

      1. The compareFs() method in ClientBase treats the 'host' part of the URI as a real, resolvable host. With HA and Federation, that part is a namespace name, which does not resolve to any address. As a result, compareFs() always reports the two file systems as different in those setups.

      2. In prepareLocalResources(), files are added to the distributed cache through the single FileSystem object instantiated at the start of the method. With Federation that does not work: the qualified URL's scheme may differ from the non-qualified one, in which case that shared FileSystem instance cannot handle the path.
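
      To illustrate the first bug, here is a minimal sketch of a comparison that tolerates namespace names. It is not the actual patch: the class name FsCompareSketch and the method sameFileSystem are hypothetical, and it uses only java.net types rather than Hadoop's. The idea is to skip DNS resolution when the two host strings already match, and to fall back to resolution only when they differ.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical sketch; names are illustrative, not taken from Spark.
class FsCompareSketch {
    /** Returns true if both URIs appear to point at the same file system. */
    static boolean sameFileSystem(URI src, URI dst) {
        // Different (or missing) schemes can never be the same file system.
        if (src.getScheme() == null || !src.getScheme().equalsIgnoreCase(dst.getScheme())) {
            return false;
        }
        String srcHost = src.getHost();
        String dstHost = dst.getHost();
        // With HA or Federation the "host" may be a namespace name that does not
        // resolve, so only attempt resolution when the raw strings differ.
        if (srcHost != null && dstHost != null && !srcHost.equalsIgnoreCase(dstHost)) {
            try {
                srcHost = InetAddress.getByName(srcHost).getCanonicalHostName();
                dstHost = InetAddress.getByName(dstHost).getCanonicalHostName();
            } catch (UnknownHostException e) {
                return false;
            }
        }
        boolean hostsMatch = (srcHost == null)
                ? dstHost == null
                : srcHost.equalsIgnoreCase(dstHost);
        return hostsMatch && src.getPort() == dst.getPort();
    }
}
```

      With this shape, two paths under hdfs://nameservice1 compare as equal without any lookup, whereas a comparison that always resolves the host would fail on the unresolvable name and report them as different.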

      Fixes are pretty trivial.
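
      For the second bug, the general idea is to look up a file system per qualified path instead of reusing one shared instance. The sketch below is an assumption-laden stand-in, not Spark or Hadoop code: PerPathFsSketch, FsHandle, and forUri are hypothetical names, and the handle only records which scheme and authority it is bound to. In Hadoop itself, the analogous per-path lookup is Path.getFileSystem(Configuration).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Spark or Hadoop.
class PerPathFsSketch {
    /** Stand-in for a file system client bound to one scheme and authority. */
    static final class FsHandle {
        final String key;
        FsHandle(String key) { this.key = key; }
    }

    // One handle per scheme://authority pair, created on first use.
    private final Map<String, FsHandle> cache = new HashMap<>();

    /** Picks the handle that matches the qualified URI, not a shared default. */
    FsHandle forUri(URI qualified) {
        String authority = qualified.getAuthority() == null ? "" : qualified.getAuthority();
        String key = qualified.getScheme() + "://" + authority;
        return cache.computeIfAbsent(key, FsHandle::new);
    }
}
```

      Under Federation, a path first seen as viewfs://cluster/... and its qualified form under hdfs://ns2/... would then get two distinct handles, which is the property the shared instance lacks.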

            People

            • Assignee: Marcelo Masiero Vanzin (vanzin)
            • Reporter: Marcelo Masiero Vanzin (vanzin)
