Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
Description
spark-defaults.conf is meant to hold the global configuration for a Spark cluster. For example, in our HDP 2.2 environment it contains:
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
along with many other useful settings. The expectation is that when a user starts a Spark shell, these settings are picked up and everything just works. Unfortunately, this does not happen with the Mahout Spark shell: it ignores the Spark configuration entirely, and the user has to copy-paste a long list of options into MAHOUT_OPTS.
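For illustration, the workaround currently looks roughly like the following (a hedged sketch: the property name is a real Spark setting, but the exact quoting and the command name are assumptions, not a tested recipe):

# Hypothetical workaround: restate each spark-defaults.conf entry by hand
# as a -Dspark.* JVM system property, since that is all MAHOUT_OPTS carries.
export MAHOUT_OPTS="-Dspark.driver.extraJavaOptions=-Dhdp.version=2.2.0.0-2041"
mahout spark-shell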
This happens because org.apache.mahout.sparkbindings.shell.Main is executed directly by the launcher script:
"$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" "org.apache.mahout.sparkbindings.shell.Main" $@
By contrast, the Spark shell is invoked indirectly through spark-submit in the spark-shell script:
"$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
SparkSubmit contains an additional initialization layer that loads the properties file (see the SparkSubmitArguments#mergeDefaultSparkProperties method).
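The gist of that layer, paraphrased as a shell sketch (the real logic is Scala inside SparkSubmitArguments; the file location follows standard Spark conventions):

# Shell paraphrase of the idea behind mergeDefaultSparkProperties: turn each
# "key value" pair in spark-defaults.conf into a --conf option.
# (The real Scala code also skips keys the user already set explicitly.)
DEFAULTS="${SPARK_CONF_DIR:-$SPARK_HOME/conf}/spark-defaults.conf"
CONF_ARGS=()
while read -r key value; do
  case "$key" in ''|\#*) continue ;; esac  # skip blank lines and comments
  CONF_ARGS+=(--conf "$key=$value")
done < "$DEFAULTS"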
So there are two possible solutions:
- adopt proper Spark-like initialization logic, i.e. load and merge spark-defaults.conf the way spark-submit does
- use a thin wrapper over spark-submit, as H2O Sparkling Water does with its sparkling-shell script (see the sketch below)
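A minimal sketch of the second option (MAHOUT_SHELL_JAR is an assumed path to the jar containing the shell's Main class; the spark-submit flags themselves are real):

# Hypothetical thin wrapper, modeled on sparkling-shell: delegating to
# spark-submit means spark-defaults.conf is loaded and merged automatically.
exec "$SPARK_HOME"/bin/spark-submit \
  --class org.apache.mahout.sparkbindings.shell.Main \
  "$MAHOUT_SHELL_JAR" \
  "$@"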
Issue Links
- relates to MAHOUT-1951 "Drivers don't run with remote Spark" (Resolved)