Spark / SPARK-3788

Yarn dist cache code is not friendly to HDFS HA, Federation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: YARN
    • Labels: None

    Description

      There are two bugs here.

      1. The compareFs() method in ClientBase considers the 'host' part of the URI to be an actual host. In the case of HA and Federation, that's a namespace name, which doesn't resolve to anything. So in those cases, compareFs() always says the file systems are different.

      2. prepareLocalResources() adds every file to the distributed cache through the single FileSystem object instantiated at the start of the method. With Federation that doesn't work: the qualified URL's scheme may differ from the non-qualified one, so that shared FileSystem instance cannot be used for the qualified path.
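
      To illustrate the first bug, here is a minimal sketch of a comparison that never tries to resolve the authority as a hostname. It is not the actual ClientBase code: the class name FsCompare and the method sameFileSystem are hypothetical, and it assumes that comparing the scheme and authority as plain text is enough to tell two file systems apart.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not the Spark ClientBase code.
final class FsCompare {
    private FsCompare() {}

    /**
     * Returns true when both URIs name the same file system, judged purely
     * on the URI text. No DNS lookup is attempted, so a logical HA or
     * Federation name such as "ns1" is compared like any other authority.
     */
    static boolean sameFileSystem(URI a, URI b) {
        // Schemes are case-insensitive; a missing scheme only matches a missing scheme.
        if (!equalsIgnoreCase(a.getScheme(), b.getScheme())) {
            return false;
        }
        // The authority is "host[:port]" or a logical name; compare it as text.
        return equalsIgnoreCase(a.getAuthority(), b.getAuthority());
    }

    private static boolean equalsIgnoreCase(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }
}
```

      With this approach hdfs://ns1/user/a and hdfs://ns1/tmp/b compare as the same file system even though "ns1" does not resolve. One trade-off of this sketch: an explicit default port and an omitted port are treated as different authorities.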

      Fixes are pretty trivial.
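
      For the second bug, the idea can be sketched as resolving a file system handle from each qualified path instead of reusing one shared instance. FsHandle and FsResolver below are hypothetical stand-ins, not Hadoop or Spark classes; in Hadoop's API the analogous step would presumably be Path.getFileSystem(Configuration) on the qualified path rather than a single FileSystem.get(Configuration) call up front.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a file system client; not a Hadoop class.
final class FsHandle {
    final String scheme;
    final String authority;

    FsHandle(String scheme, String authority) {
        this.scheme = scheme;
        this.authority = authority;
    }

    /** A handle can only serve paths with its own scheme and authority. */
    boolean canServe(URI path) {
        return scheme.equalsIgnoreCase(path.getScheme())
                && Objects.equals(authority, path.getAuthority());
    }
}

// Hypothetical resolver: one handle per (scheme, authority) pair.
final class FsResolver {
    private final Map<String, FsHandle> cache = new HashMap<>();

    /** Looks up the handle from the qualified path itself, not from a shared default. */
    FsHandle forPath(URI qualified) {
        if (qualified.getScheme() == null) {
            throw new IllegalArgumentException("path must be fully qualified: " + qualified);
        }
        String key = qualified.getScheme().toLowerCase(Locale.ROOT)
                + "://" + qualified.getAuthority();
        return cache.computeIfAbsent(
                key, k -> new FsHandle(qualified.getScheme(), qualified.getAuthority()));
    }
}
```

      In this sketch a handle obtained for viewfs://cluster/ reports that it cannot serve hdfs://ns1/user/app.jar, while forPath on the qualified URI returns one that can.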

          People

            vanzin Marcelo Masiero Vanzin