[SPARK-1900] Fix running PySpark files on YARN - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.0.0
Component/s: Deploy, PySpark
Labels: None

Description

If I run the following on a YARN cluster

bin/spark-submit sheep.py --master yarn-client

it fails because of a mismatch in paths: `spark-submit` thinks that `sheep.py` resides on HDFS, and balks when it can't find the file there. A natural workaround is to add the `file:` prefix to the path:

bin/spark-submit file:/path/to/sheep.py --master yarn-client

However, this also fails, this time because Python does not understand URI schemes.

This PR fixes the problem by automatically resolving all paths passed as command-line arguments to `spark-submit`. This has the added benefit of keeping file and jar paths consistent across different cluster modes.
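The idea of resolving paths can be illustrated with a minimal Python sketch. This is not the code from the PR; the helper names `resolve_uri` and `local_path` are hypothetical, and the sketch assumes POSIX-style paths with no characters that need percent-encoding. A path without a scheme is treated as a local file and given an explicit `file:` prefix, and the prefix is stripped again before the path is handed to Python.

```python
import os
from urllib.parse import urlparse


def resolve_uri(path):
    """Hypothetical helper: make the scheme of a path explicit.

    Paths that already carry a scheme (file:, hdfs:, ...) are returned
    unchanged. Anything else is assumed to be a local file and becomes
    an absolute file: URI.
    """
    if urlparse(path).scheme:
        return path
    return "file:" + os.path.abspath(path)


def local_path(uri):
    """Hypothetical helper: strip the file: scheme so Python can open the file."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("", "file"):
        raise ValueError("not a local file: " + uri)
    return parsed.path


if __name__ == "__main__":
    for arg in ["sheep.py", "file:/path/to/sheep.py", "hdfs://namenode/apps/sheep.py"]:
        print(arg, "->", resolve_uri(arg))
```

With helpers like these, `sheep.py` and `file:/path/to/sheep.py` both end up as `file:` URIs for the submission logic and as plain paths for the Python interpreter, while an `hdfs:` path is left alone.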

People

Assignee: Andrew Or

Reporter: Andrew Or

Votes: 0

Watchers: 2

Dates

Created: 22/May/14 08:31

Updated: 05/Nov/14 10:45

Resolved: 25/May/14 02:08