Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.6.2, 2.0.1
None
Description
spark-submit supports jar URLs with the http protocol.
However, if the URL contains a query string, the *worker.DriverRunner.downloadUserJar* method throws a "Did not see expected jar" exception. This happens because the method checks for a downloaded jar whose name still includes the query string.
This is a problem when the jar is hosted on a web service that needs additional information in the URL to serve the file. For example, to download a jar from an S3 bucket over HTTP, the URL carries the signature, date, and other parameters in its query string:
https://s3.amazonaws.com/deploy/spark-job.jar
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=<your-access-key-id>/20130721/us-east-1/s3/aws4_request
&X-Amz-Date=20130721T201207Z
&X-Amz-Expires=86400
&X-Amz-SignedHeaders=host
&X-Amz-Signature=<signature-value>
The worker will then look for a jar named
"spark-job.jar?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=<your-access-key-id>/20130721/us-east-1/s3/aws4_request&X-Amz-Date=20130721T201207Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=<signature-value>"
instead of
"spark-job.jar"
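The mismatch can be reproduced with a minimal Python sketch. It assumes, for illustration only, that the file name is taken as everything after the last `/` in the raw URL string; the worker's actual code may derive the name differently. The helper `naive_jar_name` is hypothetical, and the URL is a shortened placeholder whose credential slashes are percent-encoded as `%2F`, as they are in real presigned URLs.

```python
# Shortened presigned-style URL (placeholder values, not real credentials).
URL = (
    "https://s3.amazonaws.com/deploy/spark-job.jar"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=KEY%2F20130721%2Fus-east-1%2Fs3%2Faws4_request"
    "&X-Amz-Signature=SIG"
)


def naive_jar_name(url: str) -> str:
    """Hypothetical helper: take everything after the last '/' of the raw URL."""
    return url.rsplit("/", 1)[-1]


name = naive_jar_name(URL)
print(name)
# The query string is still attached, so a check for "spark-job.jar" fails.
print(name == "spark-job.jar")  # False
```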
Hence, the query string should be removed before deriving the jar name and checking that the jar exists.
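A minimal Python sketch of the suggested fix, assuming the jar name should be the last segment of the URL's path component. Parsing the URL with `urllib.parse.urlsplit` separates the path from the query string and fragment before the name is taken. The helper `jar_file_name` is hypothetical and is not the code from the PR.

```python
import posixpath
from urllib.parse import urlsplit


def jar_file_name(url: str) -> str:
    """Hypothetical helper: return the last path segment of `url`,
    ignoring any query string or fragment."""
    path = urlsplit(url).path  # e.g. "/deploy/spark-job.jar"
    return posixpath.basename(path)


url = (
    "https://s3.amazonaws.com/deploy/spark-job.jar"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=KEY%2F20130721%2Fus-east-1%2Fs3%2Faws4_request"
    "&X-Amz-Expires=86400"
    "&X-Amz-Signature=SIG"
)
print(jar_file_name(url))  # spark-job.jar
```

Because `urlsplit` ends the path at the first `?`, this also works when the query string contains unencoded `/` characters, as in the example URL in this report.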
I have created a PR to fix this; could someone review it?