Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Deploy, PySpark
    • Labels: None

    Description

      If I run the following on a YARN cluster

      bin/spark-submit sheep.py --master yarn-client
      

      it fails because of a path mismatch: `spark-submit` assumes that `sheep.py` resides on HDFS and balks when it cannot find the file there. A natural workaround is to add the `file:` prefix to the path:

      bin/spark-submit file:/path/to/sheep.py --master yarn-client
      

      However, this also fails, this time because Python does not understand URI schemes.

      This PR fixes the problem by automatically resolving all paths passed as command-line arguments to `spark-submit`. This has the added benefit of keeping file and jar paths consistent across different cluster modes.
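
      To illustrate the idea, here is a minimal Python sketch of this kind of path resolution. It is not the code from the PR: the helper names `resolve_uri` and `to_local_path` are hypothetical, and the sketch assumes POSIX-style paths. A path with no scheme is treated as a local file and given an explicit `file:` URI, and the scheme is stripped again before the path is handed to the Python interpreter.

```python
import os
from urllib.parse import urlparse


def resolve_uri(path):
    """Hypothetical helper: return `path` unchanged if it already carries a
    URI scheme (file:, hdfs:, ...); otherwise treat it as a local file and
    return an absolute file: URI."""
    if urlparse(path).scheme:
        return path
    return "file:" + os.path.abspath(path)


def to_local_path(uri):
    """Hypothetical helper: strip a file: scheme so that Python, which does
    not understand URI schemes, can open the file directly."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("", "file"):
        raise ValueError("not a local file: " + uri)
    return parsed.path


if __name__ == "__main__":
    uri = resolve_uri("sheep.py")   # becomes an absolute file: URI
    print(uri)
    print(to_local_path(uri))       # plain path suitable for the interpreter
```

      With helpers like these, a bare `sheep.py` and an explicit `file:/path/to/sheep.py` end up in the same form, so the rest of the submission logic would not need to guess which file system a path refers to.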

          People

            andrewor14 Andrew Or