Description
There are two issues with the Spark action's argument parsing in SparkMain:
Driver and executor extra classpaths: equals sign used
When the user specifies --conf spark.executor.extraClassPath=XYZ or --conf spark.driver.extraClassPath=ABC, the --conf option is added to sparkArgs first. When the code then evaluates spark.executor.extraClassPath=XYZ, it applies special logic and sets addToSparkArgs = false, so the value itself is not added. As a result, an extra, dangling --conf is left in sparkArgs.
For example, --conf spark.executor.extraClassPath=XYZ --conf otherProperty=ABC becomes --conf --conf otherProperty=ABC, which later causes the Spark job submission to fail.
We might need to remove the preceding --conf from sparkArgs if the option currently being evaluated is EXECUTOR_CLASSPATH or DRIVER_CLASSPATH.
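The proposed fix could be sketched roughly as follows. This is a minimal, standalone illustration and not the actual SparkMain code: the class name ConfArgSketch, the method filterArgs, and the constant names are hypothetical. It simply drops the classpath value, whereas the real code handles it separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names below are illustrative, not taken from SparkMain.
final class ConfArgSketch {
    static final String CONF_OPTION = "--conf";
    static final String EXECUTOR_CLASSPATH = "spark.executor.extraClassPath=";
    static final String DRIVER_CLASSPATH = "spark.driver.extraClassPath=";

    // Copies opts into sparkArgs, skipping classpath settings and the
    // "--conf" token that was added just before them.
    static List<String> filterArgs(List<String> opts) {
        List<String> sparkArgs = new ArrayList<>();
        for (String opt : opts) {
            boolean addToSparkArgs = true;
            if (opt.startsWith(EXECUTOR_CLASSPATH) || opt.startsWith(DRIVER_CLASSPATH)) {
                // The classpath value is assumed to be handled elsewhere.
                addToSparkArgs = false;
                int last = sparkArgs.size() - 1;
                if (last >= 0 && CONF_OPTION.equals(sparkArgs.get(last))) {
                    // Remove the dangling "--conf" that preceded this value.
                    sparkArgs.remove(last);
                }
            }
            if (addToSparkArgs) {
                sparkArgs.add(opt);
            }
        }
        return sparkArgs;
    }

    public static void main(String[] args) {
        List<String> opts = Arrays.asList(
                "--conf", "spark.executor.extraClassPath=XYZ",
                "--conf", "otherProperty=ABC");
        System.out.println(filterArgs(opts)); // prints [--conf, otherProperty=ABC]
    }
}
```

With this change, the example above yields --conf otherProperty=ABC instead of --conf --conf otherProperty=ABC.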
User provided files and archives: equals sign used
For the following workflow XML snippet:
<spark-opts>--files=${nameNode}/home/share/hive-site.xml --num-executors 4 --executor-memory 7g --driver-memory 7g</spark-opts>
the --files=${nameNode}/home/share/hive-site.xml option was placed into sparkArgs without any modification in previous Oozie versions, because there was no special handling for the --files option.
If the user specifies --files=${nameNode}/home/share/hive-site.xml --num-executors 4, the SparkMain code treats --num-executors as a file path or name. That caused the issue I described in my previous comment. We might need to change the handling logic for FILES_OPTION and ARCHIVES_OPTION to match that of DRIVER_CLASSPATH_OPTION.
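One way to accept both the --files=value and the --files value forms is sketched below. This is only an illustration under stated assumptions: the class FilesOptSketch and the method extractFiles are hypothetical and not part of SparkMain, and hdfs://nn stands in for ${nameNode}.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; not the actual SparkMain handling of FILES_OPTION.
final class FilesOptSketch {
    static final String FILES_OPTION = "--files";

    // Returns the value of --files, given either "--files=value" or
    // "--files value"; returns null if the option is absent.
    static String extractFiles(List<String> opts) {
        Iterator<String> it = opts.iterator();
        while (it.hasNext()) {
            String opt = it.next();
            if (opt.startsWith(FILES_OPTION + "=")) {
                // Equals-sign form: the value is part of the same token.
                return opt.substring(FILES_OPTION.length() + 1);
            }
            if (opt.equals(FILES_OPTION) && it.hasNext()) {
                // Space-separated form: the value is the next token.
                return it.next();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> opts = Arrays.asList(
                "--files=hdfs://nn/home/share/hive-site.xml",
                "--num-executors", "4");
        System.out.println(extractFiles(opts)); // prints hdfs://nn/home/share/hive-site.xml
    }
}
```

Because the equals-sign form is checked first, the token that follows it (here --num-executors) is never consumed as a file name. The same approach would apply to --archives.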
Attachments
Issue Links
- is related to OOZIE-3228 [Spark action] Can't load properties from spark-defaults.conf (Closed)
- links to