Spark / SPARK-3560

In yarn-cluster mode, the same jars are distributed through multiple mechanisms.


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: YARN
    • Labels:
      None

      Description

      In yarn-cluster mode, jars given to spark-submit's --jars argument should be distributed to executors through the YARN distributed cache only, not through Spark's own file-fetching mechanism.

      Currently, Spark tries to distribute the jars both ways, which can cause executor failures when the fetch attempts to overwrite the read-only distributed-cache symlinks without write permission.

      It looks like this was introduced by SPARK-2260, which sets spark.jars in yarn-cluster mode. Setting spark.jars is necessary for standalone cluster deploy mode, but harmful for yarn-cluster deploy mode.
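      A sketch of a submission that hits the affected path (the application class, jar names, and paths below are hypothetical; only --jars and --master are taken from the report):

      ```shell
      # In yarn-cluster mode, the jars listed in --jars are uploaded to the YARN
      # distributed cache and symlinked into each container's working directory.
      # Because spark.jars is also set (per SPARK-2260), executors additionally
      # try to fetch the same jars, colliding with the read-only symlinks.
      spark-submit \
        --master yarn-cluster \
        --class com.example.MyApp \
        --jars /local/libs/dep1.jar,/local/libs/dep2.jar \
        myapp.jar
      ```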


              People

              • Assignee:
                mshen (Min Shen)
              • Reporter:
                sandyr (Sandy Ryza)
              • Votes:
                0
              • Watchers:
                5
