[SPARK-17855] Spark worker throws exception when uber jar's HTTP URL contains a query string - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.2, 2.0.1
Fix Version/s: 2.1.0
Component/s: Spark Core
Labels: None

Description

spark-submit supports jar URLs that use the HTTP protocol.

If the URL contains a query string, the *worker.DriverRunner.downloadUserJar* method throws a "Did not see expected jar" exception. This happens because the method checks for the existence of a downloaded jar whose name includes the query string.

This is a problem when the jar is hosted on a web service that requires additional information to retrieve the file. For example, to download a jar from an S3 bucket over HTTP, the URL carries the signature, timestamp, and other parameters in its query string.

https://s3.amazonaws.com/deploy/spark-job.jar
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=<your-access-key-id>/20130721/us-east-1/s3/aws4_request
&X-Amz-Date=20130721T201207Z
&X-Amz-Expires=86400
&X-Amz-SignedHeaders=host
&X-Amz-Signature=<signature-value>

The worker will look for a jar named

"spark-job.jar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=<your-access-key-id>/20130721/us-east-1/s3/aws4_request&X-Amz-Date=20130721T201207Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=<signature-value>"

instead of

"spark-job.jar"

Hence, the query string should be removed before checking that the jar exists.

I created a PR to fix this; I would appreciate a review.
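To illustrate the idea, here is a minimal sketch of deriving the local jar name from the URL's path component only, so that the query string never becomes part of the file name. It is written in Java against the standard java.net.URI class. The class name JarNames and the method jarFileName are hypothetical names used for illustration; this is not necessarily how the linked pull request implements the fix. The placeholder values from the example URL are replaced with plain text because angle brackets are not legal URI characters.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Spark.
final class JarNames {
    private JarNames() {}

    /**
     * Returns the last segment of the URL's path, ignoring any query
     * string or fragment. URI.create throws IllegalArgumentException
     * if the string is not a syntactically valid URI.
     */
    static String jarFileName(String jarUrl) {
        // getPath() excludes everything from the first '?' onward,
        // so slashes inside query parameters cannot affect the result.
        String path = URI.create(jarUrl).getPath();
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("URL has no path: " + jarUrl);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

With this sketch, JarNames.jarFileName("https://s3.amazonaws.com/deploy/spark-job.jar?X-Amz-Expires=86400") returns "spark-job.jar", which is the name the worker should check for in the driver directory.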

Issue Links

links to

[Github] Pull Request #15420 (invkrh)

People

Assignee: Hao Ren

Reporter: Hao Ren

Votes: 0

Watchers: 3

Dates

Created: 10/Oct/16 13:03

Updated: 14/Oct/16 11:53

Resolved: 14/Oct/16 11:52
