SPARK-2678

`Spark-submit` overrides user application options


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.0.1, 1.0.2
    • Fix Version/s: 1.1.0
    • Component/s: Deploy
    • Labels: None

    Description

      Here is an example:

      ./bin/spark-submit --class Foo some.jar --help
      

      Since --help appears after the primary resource (i.e. some.jar), it should be recognized as a user application option. But it is actually captured by spark-submit, which shows its own help message instead.

      When invoking spark-submit directly, the constraints are:

      1. Options before the primary resource should be recognized as spark-submit options
      2. Options after the primary resource should be recognized as user application options (see the sketch after this list)

      The tricky part is how to handle scripts like spark-shell that delegate to spark-submit. These scripts allow users to specify both spark-submit options like --master and user-defined application options together. For example, say we'd like to write a new script start-thriftserver.sh to start the Hive Thrift server; basically we may do this:

      $SPARK_HOME/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal "$@"
      

      Then users may call this script like this:

      ./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf key=value
      

      Notice that all options are captured by $@. If we put $@ before spark-internal, they are all recognized as spark-submit options, so --hiveconf won't be passed to HiveThriftServer2; if we put it after spark-internal, they should all be recognized as options of HiveThriftServer2, but because of this bug, --master is still recognized as a spark-submit option, which happens to produce the right behavior.

      Although all scripts that currently use spark-submit work correctly, we should still fix this bug, because it creates option name collisions between spark-submit and user applications: every time we add a new option to spark-submit, some existing user applications may break. However, fixing this bug may introduce incompatible changes.

      The suggested solution is to use -- as a separator between spark-submit options and user application options. For the Hive Thrift server example above, users should call the script this way:

      ./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf key=value
      

      SparkSubmitArguments should then be responsible for splitting the two sets of options and passing them on correctly.


            People

              Assignee: Kousuke Saruta (sarutak)
              Reporter: Cheng Lian (liancheng)
              Votes: 0
              Watchers: 4
