Description
Here is an example:
./bin/spark-submit --class Foo some.jar --help
Since --help appears after the primary resource (i.e., some.jar), it should be recognized as a user application option. But it is actually consumed by spark-submit, which shows the spark-submit help message instead.
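The root cause can be illustrated with a hypothetical sketch (this is not Spark's actual code, just an illustration of the observed behavior): a parser that matches known flags anywhere in the argument list, instead of stopping at the primary resource, always wins the collision.

// Hypothetical illustration of the bug: scanning the entire argument list
// for known spark-submit flags, regardless of whether they appear before or
// after the primary resource.
object BuggySubmit {
  def main(args: Array[String]): Unit = {
    // "--help" matches even when it follows "some.jar", so the user
    // application never sees it.
    if (args.contains("--help")) {
      println("Usage: spark-submit [options] <app jar> [app options]")
      sys.exit(0)
    }
    // ... normal submission logic would follow here ...
  }
}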
When invoking spark-submit directly, the constraints are (see the sketch after this list):
- Options before the primary resource should be recognized as spark-submit options
- Options after the primary resource should be recognized as user application options
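A minimal sketch of this positional rule, assuming a hypothetical subset of value-taking options (illustrative only, not the actual SparkSubmit parser):

// Illustrative sketch: consume spark-submit options until the first bare
// token (the primary resource); everything after it belongs to the user
// application.
object PositionalSplit {
  // Hypothetical subset of spark-submit options that take a value.
  private val optionsWithValue = Set("--class", "--master", "--deploy-mode", "--name")

  def split(args: List[String]): (List[String], Option[String], List[String]) =
    args match {
      case opt :: value :: tail if optionsWithValue(opt) =>
        val (submit, resource, app) = split(tail)
        (opt :: value :: submit, resource, app)   // an option together with its value
      case flag :: tail if flag.startsWith("-") =>
        val (submit, resource, app) = split(tail)
        (flag :: submit, resource, app)           // a boolean flag, e.g. --verbose
      case resource :: tail =>
        (Nil, Some(resource), tail)               // first bare token = primary resource
      case Nil =>
        (Nil, None, Nil)
    }
}

With this rule, split(List("--class", "Foo", "some.jar", "--help")) yields (List("--class", "Foo"), Some("some.jar"), List("--help")), so --help stays with the user application.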
The tricky part is how to handle scripts like spark-shell that delegate to spark-submit. These scripts allow users to specify spark-submit options like --master together with user-defined application options. For example, say we'd like to write a new script start-thriftserver.sh to start the Hive Thrift server; basically we would do this:
$SPARK_HOME/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal "$@"
Then users may call this script like:
./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf key=value
Notice that all options are captured by $@. If we put it before spark-internal, they are all recognized as spark-submit options, and thus --hiveconf won't be passed to HiveThriftServer2; if we put it after spark-internal, they should all be recognized as options of HiveThriftServer2, but because of this bug, --master is still recognized as a spark-submit option, which happens to produce the right behavior.
Although all scripts currently using spark-submit work correctly, we should still fix this bug, because it causes option name collisions between spark-submit and user applications: every time we add a new option to spark-submit, some existing user applications may break. However, fixing this bug may introduce incompatible changes.
The suggested solution is to use -- as a separator between spark-submit options and user application options. For the Hive Thrift server example above, users would call the script this way:
./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf key=value
SparkSubmitArguments would then be responsible for splitting the two sets of options and passing them on correctly.
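A minimal sketch of that split, assuming the logic lives in SparkSubmitArguments (the helper below is hypothetical):

// Hypothetical sketch of the proposed "--" handling: everything before the
// first "--" is parsed as spark-submit options; everything after it is passed
// through verbatim to the user application.
def splitAtSeparator(args: Array[String]): (Array[String], Array[String]) = {
  val (submitArgs, rest) = args.span(_ != "--")
  (submitArgs, rest.drop(1))  // drop the "--" separator itself, if present
}

For the example above, splitAtSeparator(Array("--master", "spark://some-host:7077", "--", "--hiveconf", "key=value")) yields (Array("--master", "spark://some-host:7077"), Array("--hiveconf", "key=value")).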
Issue Links
- is duplicated by:
  - SPARK-2894 spark-shell doesn't accept flags (Resolved)
  - SPARK-2880 spark-submit processes app cmdline options (Resolved)
- is related to:
  - SPARK-2110 Misleading help displayed for interactive mode pyspark --help (Resolved)
  - SPARK-2874 Spark SQL related scripts don't show complete usage message (Resolved)