SPARK-21714: SparkSubmit in YARN client mode downloads remote files and then re-uploads them

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: Spark Submit
    • Labels:
      None

      Description

      SPARK-10643 added the ability for spark-submit to download remote files in client mode.

      However, in YARN mode this introduced a bug: spark-submit downloads the files for the client, and then the YARN client re-uploads them to HDFS and uses those copies. This should not happen when the remote file is already on HDFS. It wastes resources and defeats the distributed cache: if the original object was public, it would have been shared by many users, but after the download and re-upload it becomes private.
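
      The intended behavior can be sketched as a small decision function. This is only an illustration of the rule described above, not the actual SparkSubmit code or the fix that was merged; the function name, its parameters, and the set of Hadoop-readable schemes are all hypothetical.

```python
from urllib.parse import urlparse

# Assumption: schemes that the cluster's Hadoop FileSystem can read directly.
# The real set depends on the cluster configuration.
HADOOP_READABLE_SCHEMES = frozenset({"hdfs"})


def should_download_locally(uri, master, deploy_mode,
                            hadoop_schemes=HADOOP_READABLE_SCHEMES):
    """Return True if the submitting client should fetch `uri` to local disk.

    Hypothetical helper illustrating the behavior the issue asks for.
    """
    # A path without a scheme is treated as a local file.
    scheme = urlparse(uri).scheme or "file"

    # Local resources never need to be downloaded.
    if scheme in ("file", "local"):
        return False

    # Assumption: only client mode downloads remote files on the client.
    if deploy_mode != "client":
        return False

    # On YARN, a resource that Hadoop can already read should be left in
    # place so the distributed cache can share it between users.
    if master == "yarn" and scheme in hadoop_schemes:
        return False

    return True
```

      Under these assumptions, `should_download_locally("hdfs://nn:8020/libs/app.jar", "yarn", "client")` returns `False`, so the file stays on HDFS and keeps its original visibility, while an `http://` resource would still be downloaded for the client.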


              People

              • Assignee:
                jerryshao Saisai Shao
                Reporter:
                tgraves Thomas Graves
