[SPARK-21714] SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.2.1, 2.3.0
Component/s: Spark Submit
Labels: None

Description

~~SPARK-10643~~ added the ability for spark-submit to download remote files in client mode.

However, in YARN mode this introduced a bug: spark-submit downloads the remote files for the client, but the YARN client then re-uploads them to HDFS and uses those copies. This should not happen when the remote file is already on HDFS. It wastes resources and defeats the distributed cache: if the original object was public, it would have been shared by many users, but once we download and re-upload it, it becomes private.
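
The expected behavior can be sketched as a small decision function. This is only an illustration of the rule described above, not the actual SparkSubmit code or the merged fix: the function name `needs_local_download`, the constant `CLUSTER_READABLE_SCHEMES`, and the choice of `hdfs` as the only cluster-readable scheme are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical: schemes the YARN cluster can read directly without a
# client-side copy. The real set depends on the Hadoop configuration.
CLUSTER_READABLE_SCHEMES = {"hdfs"}


def needs_local_download(uri: str, master: str, deploy_mode: str) -> bool:
    """Return True if the client should fetch `uri` to local disk first.

    Illustrative rule only:
      - local paths never need a download;
      - only client mode downloads remote files;
      - on YARN, files already on a cluster-readable filesystem are left
        in place so the distributed cache can share them.
    """
    scheme = urlparse(uri).scheme or "file"
    if scheme in ("file", "local"):
        return False  # already on the local filesystem
    if deploy_mode != "client":
        return False  # assumption: only client mode downloads
    if master == "yarn" and scheme in CLUSTER_READABLE_SCHEMES:
        return False  # skip the download and re-upload round trip
    return True


if __name__ == "__main__":
    for uri in ("hdfs://nn:8020/apps/lib.jar", "http://repo.example/lib.jar"):
        print(uri, needs_local_download(uri, "yarn", "client"))
```

Under this sketch, an `hdfs://` resource submitted to YARN in client mode stays where it is, so a public file keeps its public visibility in the distributed cache, while an `http://` resource is still downloaded because YARN cannot localize it directly.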

Issue Links

is broken by

SPARK-10643 Support remote application download in client mode spark submit

Resolved

relates to

SPARK-21689 Spark submit will not get kerberos token token when hbase class not found

Closed

links to

[Github] Pull Request #19074 (jerryshao)

https://github.com/apache/spark/pull/18962

People

Assignee: Saisai Shao

Reporter: Thomas Graves

Votes: 0

Watchers: 3

Dates

Created: 11/Aug/17 18:54

Updated: 30/Aug/17 19:33

Resolved: 25/Aug/17 16:59