MAHOUT-1762

Pick up $SPARK_HOME/conf/spark-defaults.conf on startup


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      spark-defaults.conf is intended to hold global configuration for a Spark cluster. For example, in our HDP 2.2 environment it contains:

      spark.driver.extraJavaOptions      -Dhdp.version=2.2.0.0-2041
      spark.yarn.am.extraJavaOptions     -Dhdp.version=2.2.0.0-2041
      

      along with many other useful settings. A user who starts the Spark shell reasonably expects it to work out of the box. Unfortunately, this does not happen with the Mahout Spark shell, because it ignores the Spark configuration and the user has to copy-paste many options into MAHOUT_OPTS.
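      As a rough illustration of that workaround (the property names come from the HDP example above; whether the Mahout shell's SparkConf actually picks up spark.* JVM system properties set this way is an assumption, not confirmed by this issue):

      # Hand-copying spark-defaults.conf entries into MAHOUT_OPTS (illustrative)
      export MAHOUT_OPTS="-Dspark.driver.extraJavaOptions=-Dhdp.version=2.2.0.0-2041 \
        -Dspark.yarn.am.extraJavaOptions=-Dhdp.version=2.2.0.0-2041"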

      The configuration is ignored because org.apache.mahout.sparkbindings.shell.Main is executed directly in the initialization script:

      "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" "org.apache.mahout.sparkbindings.shell.Main" $@
      

      In contrast, the Spark shell is invoked indirectly through spark-submit in the spark-shell script:

      "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
      

      SparkSubmit contains an additional initialization layer that loads the properties file (see the SparkSubmitArguments#mergeDefaultSparkProperties method).
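      In shell terms, the effect of that merging step is roughly the following sketch (the real logic lives in Scala and also honors --properties-file and already-set properties; forwarding entries into MAHOUT_OPTS as -D flags is an assumption about how Mahout could consume them):

      # Sketch: read spark.* key/value pairs from spark-defaults.conf and
      # forward them as JVM system properties. Comment and blank lines are
      # skipped because they do not match the spark.* pattern.
      SPARK_DEFAULTS="${SPARK_HOME}/conf/spark-defaults.conf"
      if [ -r "$SPARK_DEFAULTS" ]; then
        while read -r key value; do
          case "$key" in
            spark.*) MAHOUT_OPTS="$MAHOUT_OPTS -D${key}=${value}" ;;
          esac
        done < "$SPARK_DEFAULTS"
      fi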

      So there are two possible solutions:

      • use proper Spark-like initialization logic
      • use a thin wrapper script, as H2O Sparkling Water does (sparkling-shell); see the sketch below
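      A minimal wrapper in the sparkling-shell style might look like the following (hypothetical: the jar name is illustrative, and whether the existing Main class runs unchanged under spark-submit is an assumption):

      #!/usr/bin/env bash
      # Hypothetical mahout-spark-shell wrapper in the style of sparkling-shell:
      # delegating to spark-submit lets SparkSubmit load spark-defaults.conf.
      exec "$SPARK_HOME"/bin/spark-submit \
        --class org.apache.mahout.sparkbindings.shell.Main \
        "$MAHOUT_HOME"/mahout-spark-shell.jar \
        "$@"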


            People

              Assignee: rawkintrevo (Trevor Grant)
              Reporter: sergeant (Sergey Tryuber)
              Votes: 5
              Watchers: 6
