Details
Type: Bug
Status: In Progress
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.0.0, 3.0.1, 3.0.2, 3.1.0
None
None
Description
In AWS, one can generate so-called presigned URLs. spark-submit accepts URLs for the driver program, e.g. http://my-web-server/driver.py. A presigned URL, however, carries a query string after the path, e.g. http://my-web-server/driver.py?signature.
The check for whether the given URL is a Python driver simply tests whether the whole string ends in .py, which the presigned URL does not, as it ends in signature. As a result, spark-submit does not recognize the presigned URL as a Python driver.
The relevant check is in SparkSubmit.scala, Line 1051 (commit tagged v3.0.1):
Here is a more realistic example URL:
A fix could be to parse the given path as a java.net.URI and check whether its path component ends in .py (as opposed to the whole string).
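The proposed fix might look roughly like the sketch below. It is not the code from SparkSubmit.scala: the object and method names (PythonResourceCheck, endsWithPy, pathEndsWithPy) are hypothetical, and the fallback to the raw string when the input does not parse as a URI is an assumption made here so that plain local paths keep working.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper, not Spark's actual implementation.
object PythonResourceCheck {

  // Suffix check on the whole string, as described in this report.
  def endsWithPy(res: String): Boolean =
    res != null && res.endsWith(".py")

  // Suggested alternative: look only at the path component of the URI.
  // If the string cannot be parsed as a URI, or the URI has no path
  // (e.g. an opaque URI), fall back to the raw string (an assumption).
  def pathEndsWithPy(res: String): Boolean = {
    if (res == null) {
      false
    } else {
      val path = Try(new URI(res).getPath).toOption
        .flatMap(p => Option(p))
        .getOrElse(res)
      path.endsWith(".py")
    }
  }
}
```

With this sketch, PythonResourceCheck.endsWithPy("http://my-web-server/driver.py?signature") is false, while PythonResourceCheck.pathEndsWithPy on the same URL is true, because the query string is no longer part of the comparison.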
To work around this issue, I am currently appending a fragment to the URL so that it ends in .py, i.e. http://my-web-server/driver.py?signature#.py, which does work.
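To illustrate why the workaround passes a suffix check, the following small sketch shows how java.net.URI splits the workaround URL into its components. The object name WorkaroundUri is hypothetical, and the URL is the placeholder from this report, not a real presigned URL.

```scala
import java.net.URI

// Hypothetical demo object; it only inspects the URL and makes no network call.
object WorkaroundUri {
  val uri = new URI("http://my-web-server/driver.py?signature#.py")

  val path: String = uri.getPath         // path component only
  val query: String = uri.getQuery       // the part after '?', before '#'
  val fragment: String = uri.getFragment // the part after '#'

  // The raw string ends in ".py" only because of the appended fragment.
  val rawEndsWithPy: Boolean = uri.toString.endsWith(".py")

  def main(args: Array[String]): Unit = {
    println(s"path=$path query=$query fragment=$fragment rawEndsWithPy=$rawEndsWithPy")
  }
}
```

The path is /driver.py, the query is signature, and the fragment is .py, so the signature in the query string is left unchanged while the whole string now ends in .py.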