Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-3788

Yarn dist cache code is not friendly to HDFS HA, Federation

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: YARN
    • Labels:
      None

      Description

      There are two bugs here.

      1. The compareFs() method in ClientBase considers the 'host' part of the URI to be an actual host. In the case of HA and Federation, that's a namespace name, which doesn't resolve to anything. So in those cases, compareFs() always says the file systems are different.

      2. In prepareLocalResources(), when adding a file to the distributed cache, that is done with the common FileSystem object instantiated at the start of the method. In the case of Federation that doesn't work: the qualified URL's scheme may differ from the non-qualified one, so the FileSystem instance will not work.

      Fixes are pretty trivial.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                vanzin Marcelo Masiero Vanzin
                Reporter:
                vanzin Marcelo Masiero Vanzin
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: