Spark / SPARK-3560

In yarn-cluster mode, the same jars are distributed through multiple mechanisms.


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: YARN
    • Labels:
      None

      Description

      In yarn-cluster mode, jars given to spark-submit's --jars argument should be distributed to executors through the YARN distributed cache only, not through Spark's own file-fetching mechanism.

      Currently, Spark tries to distribute the jars both ways, which can cause executor failures when the fetch attempts to overwrite the read-only distributed-cache symlinks without write permission.

      It looks like this was introduced by SPARK-2260, which sets spark.jars in yarn-cluster mode. Setting spark.jars is necessary for standalone cluster deploy mode, but harmful for yarn-cluster deploy mode.
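      A sketch of a submission that hits the affected path (the application class, jar names, and paths below are hypothetical; only --jars and --master are taken from the report):

      ```shell
      # In yarn-cluster mode, the jars listed in --jars are uploaded to the YARN
      # distributed cache and symlinked into each container's working directory.
      # Because spark.jars is also set (per SPARK-2260), executors additionally
      # try to fetch the same jars, colliding with the read-only symlinks.
      spark-submit \
        --master yarn-cluster \
        --class com.example.MyApp \
        --jars /local/libs/dep1.jar,/local/libs/dep2.jar \
        myapp.jar
      ```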


              People

              • Assignee:
                mshen (Min Shen)
              • Reporter:
                sandyr (Sandy Ryza)
              • Votes:
                0
              • Watchers:
                5
