Oozie / OOZIE-2170

Oozie should automatically set configs to make Spark jobs show up in the Spark History Server


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.2.0
    • Component/s: action
    • Labels:
      None

      Description

      If you use "yarn-cluster" as the Spark action's master, the Spark jobs don't show up in the Spark History Server, and the Spark AM doesn't link to it properly.

      Currently, the user has to set the following in their Spark action in the workflow.xml:

      <spark-opts>--conf spark.yarn.historyServer.address=http://SPH:18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
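The `<spark-opts>` value above is just three `--conf` settings: event logging on, the directory the History Server reads event logs from, and the History Server address the AM links to. The hypothetical helper below assembles that string from the two site-specific values. It is not part of Oozie, and `SPH` and `NN` remain the placeholder hosts used above.

```python
from typing import Dict


def history_server_spark_opts(history_server: str, event_log_dir: str) -> str:
    """Build a <spark-opts> value that makes a yarn-cluster job visible in the
    Spark History Server (hypothetical helper, for illustration only)."""
    conf: Dict[str, str] = {
        # Where the Spark AM should link to once the application finishes
        "spark.yarn.historyServer.address": history_server,
        # Where event logs are written; must be the directory the server reads
        "spark.eventLog.dir": event_log_dir,
        # Without this, no event log is written at all
        "spark.eventLog.enabled": "true",
    }
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


print(history_server_spark_opts(
    "http://SPH:18088", "hdfs://NN:8020/user/spark/applicationHistory"))
```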
      

      It would be nice if Oozie did this automatically via one or more oozie-site.xml configs. We could do something similar to how the Hadoop configs are set up, where Oozie loads a Spark .conf file from a directory chosen based on the RM specified in the <job-tracker>.
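One way the proposal could work is sketched below. It is a minimal sketch under several assumptions, not Oozie's implementation. A comma-separated `authority=directory` mapping, with `*` as a fallback, picks a Spark conf directory from the `<job-tracker>` value, modeled on how Oozie maps Hadoop configuration directories. The file in that directory is assumed to use whitespace-separated `key value` lines, as in Spark's `spark-defaults.conf`. All function names are hypothetical.

```python
import shlex
from typing import Dict, List, Optional, Set


def parse_conf_mapping(value: str) -> Dict[str, str]:
    """Parse 'authority=dir' pairs such as 'rm1:8032=spark-conf-rm1,*=spark-conf'."""
    mapping: Dict[str, str] = {}
    for entry in value.split(","):
        authority, sep, conf_dir = entry.strip().partition("=")
        if sep:  # skip empty or malformed entries
            mapping[authority.strip()] = conf_dir.strip()
    return mapping


def resolve_conf_dir(mapping: Dict[str, str], job_tracker: str) -> Optional[str]:
    """Prefer an exact match on the <job-tracker> value, else fall back to '*'."""
    return mapping.get(job_tracker.strip(), mapping.get("*"))


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Read whitespace-separated 'key value' lines, ignoring blanks and '#' comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def _user_conf_keys(user_opts: str) -> Set[str]:
    """Collect property names the user already passes as '--conf key=value'."""
    tokens = shlex.split(user_opts)
    return {
        tokens[i + 1].split("=", 1)[0]
        for i, tok in enumerate(tokens)
        if tok == "--conf" and i + 1 < len(tokens)
    }


def merge_spark_opts(defaults: Dict[str, str], user_opts: str) -> str:
    """Prepend '--conf' entries for defaults the user has not set explicitly."""
    already_set = _user_conf_keys(user_opts)
    extra: List[str] = []
    for key, value in defaults.items():
        if key not in already_set:
            extra += ["--conf", f"{key}={value}"]
    parts = [shlex.quote(token) for token in extra]
    if user_opts.strip():
        parts.append(user_opts.strip())
    return " ".join(parts)
```

In this sketch, explicit user settings win because defaults the user already sets are skipped. That avoids depending on how spark-submit resolves a repeated `--conf` for the same key.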

      While we're at it, it might be good to document how to use Spark on YARN:

      1. Include the spark-assembly jar with your workflow (unfortunately, this jar is not published to Maven)
      2. Specify "yarn-cluster" as the master
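The two requirements above could be checked before a workflow is submitted. The hypothetical helper below does that. It assumes the caller passes the action's master value and the file names found in the workflow's lib/ directory as plain strings.

```python
import fnmatch
from typing import Iterable, List


def spark_on_yarn_problems(master: str, lib_files: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems: List[str] = []
    if master.strip() != "yarn-cluster":
        problems.append(f"master is {master!r}, expected 'yarn-cluster'")
    # The spark-assembly jar is not published to Maven, so it has to be
    # shipped with the workflow by hand; look for it by file name.
    if not any(fnmatch.fnmatchcase(name, "spark-assembly*.jar") for name in lib_files):
        problems.append("no spark-assembly*.jar found among the workflow's lib/ files")
    return problems
```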

      Also, the Spark example should delete the output directory in a <prepare> element.
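The sketch below generates a Spark action whose `<prepare>` deletes the output directory, so that a rerun is not tripped up by output left over from an earlier run. It also sets "yarn-cluster" as the master. It uses Python's `xml.etree.ElementTree`. Element names and their order follow the `uri:oozie:spark-action:0.1` schema as we understand it. The function name and the app name, class, jar, and path values are hypothetical placeholders, not the contents of the actual Oozie example.

```python
import xml.etree.ElementTree as ET


def build_spark_action(app_name: str, main_class: str, jar: str,
                       output_dir: str, spark_opts: str = "") -> ET.Element:
    """Build a <spark> action body that clears output_dir before the job runs."""
    spark = ET.Element("spark", {"xmlns": "uri:oozie:spark-action:0.1"})
    ET.SubElement(spark, "job-tracker").text = "${jobTracker}"
    ET.SubElement(spark, "name-node").text = "${nameNode}"
    # <prepare> runs before the job; <delete> removes the old output directory
    prepare = ET.SubElement(spark, "prepare")
    ET.SubElement(prepare, "delete", {"path": output_dir})
    ET.SubElement(spark, "master").text = "yarn-cluster"
    ET.SubElement(spark, "name").text = app_name
    ET.SubElement(spark, "class").text = main_class
    ET.SubElement(spark, "jar").text = jar
    if spark_opts:  # optional, e.g. the History Server settings above
        ET.SubElement(spark, "spark-opts").text = spark_opts
    return spark


action = build_spark_action("Spark-Example", "example.Main",
                            "${nameNode}/apps/spark/lib/example.jar",
                            "${nameNode}/user/test/output")
print(ET.tostring(action, encoding="unicode"))
```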

        Attachments

        1. OOZIE-2170.patch
          25 kB
          Robert Kanter
        2. OOZIE-2170.patch
          25 kB
          Robert Kanter
        3. OOZIE-2170.patch
          24 kB
          Robert Kanter


            People

            • Assignee: Robert Kanter (rkanter)
            • Reporter: Robert Kanter (rkanter)
            • Votes: 0
            • Watchers: 4
