  1. Spark
  2. SPARK-34438

Python Driver is not correctly detected using presigned URLs


    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 3.0.1, 3.0.2, 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels:
      None

      Description

      In AWS one can generate so-called presigned URLs. spark-submit accepts URLs for the driver program, e.g. http://my-web-server/driver.py. A presigned URL additionally carries a query string containing the signature, e.g. http://my-web-server/driver.py?signature.

      The check for whether the given URL is a Python driver simply tests whether the whole string ends in .py, which a presigned URL does not: it ends in signature.

      The relevant check is in SparkSubmit.scala, Line 1051 (commit tagged v3.0.1):

      https://github.com/apache/spark/blob/v3.0.1/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1051 
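      The failure mode can be reproduced outside of Spark. The following is a minimal Python sketch of a whole-string suffix check as described above; it is not the Spark source (which is Scala), and the function name naive_is_python is made up for illustration.

```python
def naive_is_python(resource: str) -> bool:
    """Whole-string suffix check, as described in this report (illustrative only)."""
    return resource.endswith(".py")


plain = "http://my-web-server/driver.py"
presigned = "http://my-web-server/driver.py?signature"

print(naive_is_python(plain))      # True
print(naive_is_python(presigned))  # False: the string ends in "signature"
```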

      Here is a more realistic example URL:

      https://bucket-name.s3.us-east-1.amazonaws.com/driver.py?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIATBNPKWPCNUMWMLUR%2F20210214%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210214T062047Z&X-Amz-Expires=172800&X-Amz-SignedHeaders=host&X-Amz-Signature=49ef39b6bb7090001af9312692788892551916a6ac0ff6c961ce52efb9acc235

      A fix could be to parse the given path as a java.net.URI and check whether its path component ends in .py (as opposed to the whole string).
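      The idea behind the proposed fix, sketched in Python with urllib.parse.urlsplit standing in for java.net.URI. This is only an illustration of the approach, not a patch for SparkSubmit.scala; path_is_python is a hypothetical name, and the example URL is a shortened form of the one above.

```python
from urllib.parse import urlsplit


def path_is_python(resource: str) -> bool:
    """Check only the path component, ignoring any query string or fragment."""
    return urlsplit(resource).path.endswith(".py")


presigned = (
    "https://bucket-name.s3.us-east-1.amazonaws.com/driver.py"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=172800"
)

print(urlsplit(presigned).path)   # /driver.py
print(path_is_python(presigned))  # True
```

      Because only the path is inspected, a non-Python resource whose query string happens to end in .py would no longer be misclassified either.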

      To work around this issue I am currently appending a fragment to the URL so that it ends in .py, i.e. http://my-web-server/driver.py?signature#.py, which does work.
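      For anyone scripting this workaround, a small Python helper could look like the sketch below. The name with_py_fragment is hypothetical, and the helper assumes the presigned URL does not already carry a fragment.

```python
from urllib.parse import urlsplit


def with_py_fragment(url: str) -> str:
    """Append '#.py' so that a whole-string suffix check sees '.py' at the end."""
    if urlsplit(url).fragment:
        raise ValueError("URL already has a fragment")
    return url + "#.py"


patched = with_py_fragment("http://my-web-server/driver.py?signature")
print(patched)                  # http://my-web-server/driver.py?signature#.py
print(urlsplit(patched).query)  # signature
```

      The fragment is a separate URL component, so the signed query string itself is left unchanged.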

            People

            • Assignee:
              Unassigned
            • Reporter:
              scravy Julian Fleischer
            • Votes:
              0
            • Watchers:
              2
