Spark / SPARK-28095

PySpark with Kubernetes doesn't parse arguments with spaces as expected.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Kubernetes, PySpark, Spark Core
    • Environment: Python 2.7.13, Spark 2.4.3, Kubernetes

    Description

      When arguments are passed to a bash script that runs spark-submit on a Python file setting up a PySpark context, strings containing spaces are split and processed as separate arguments. This occurs even when the argument is enclosed in double quotes, escaped with backslashes, or written with unicode escape characters.
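
      The quoting behaviour can be illustrated without Spark. Below is a minimal sketch using Python's shlex, which follows POSIX shell tokenization: the double quotes protect the space only for the shell that first parses the command; if an intermediate layer re-expands the value unquoted (for example a wrapper forwarding $@ instead of "$@" — an assumption here, since scripts/spark-k8s.sh is not shown), the path is split at the space.

      import shlex

      # As typed at the prompt: the double-quoted path survives as one token.
      entered = ('./scripts/spark-k8s.sh v0.0.32 '
                 '--job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" '
                 '--job pages')
      print(shlex.split(entered)[-4:])
      # ['--job-args', 'cos://.../Github/Load Balancer', '--job', 'pages']

      # The same value with the quotes already consumed, re-parsed by a shell:
      forwarded = ('spark-submit main.py '
                   '--job-args cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer '
                   '--job pages')
      print(shlex.split(forwarded)[-5:])
      # ['--job-args', 'cos://.../Github/Load', 'Balancer', '--job', 'pages']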

       

      Example

      Command entered (this uses an IBM-specific driver, Stocator, hence the cos:// URL):

      ./scripts/spark-k8s.sh v0.0.32 --job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" --job pages

       

      Error Message

       

      + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.30.83.253 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/work-dir/main.py --job-args cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer --job pages
      19/06/18 19:28:35 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
      19/06/18 19:28:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      usage: main.py [-h] --job JOB --job-args JOB_ARGS
      main.py: error: unrecognized arguments: Balancer
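
      For reference, the reported failure can be reproduced with a plain argparse parser matching the usage line above. The real main.py is not attached to this issue, so the parser below is only an assumption based on that usage string; the argument vector is copied from the exec line in the error output, where the path has already been split at the space.

      import argparse

      # Mirrors the reported usage: main.py [-h] --job JOB --job-args JOB_ARGS
      parser = argparse.ArgumentParser(prog="main.py")
      parser.add_argument("--job", required=True)
      parser.add_argument("--job-args", required=True)

      # argv as it appears in the exec line: the path arrives as two tokens.
      argv = ["--job-args",
              "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load",
              "Balancer",
              "--job", "pages"]

      parser.parse_args(argv)
      # -> main.py: error: unrecognized arguments: Balancer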
      


    People

      Assignee: Unassigned
      Reporter: Emma Dickson
