Oozie / OOZIE-2479

SparkContext Not Using Yarn Config


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.2.0
    • Fix Version/s: None
    • Component/s: workflow
    • Labels:
      None
    • Environment:

      Oozie 4.2.0.2.3.4.0-3485
      Spark 1.4.1
      Scala 2.10.5
      HDP 2.3

    Description

      The Spark action does not appear to use the jobTracker setting from job.properties (or from the YARN config) when creating the SparkContext. With the jobTracker property set to myOtherDomain:8050 (matching the yarn.resourcemanager.address setting), the Oozie UI (job > action > action configuration) shows myOtherDomain:8050 being submitted, but drilling down into the Hadoop job history logs shows an error indicating that the default 0.0.0.0:8032 is being used instead:

      job.properties

      nameNode=hdfs://myDomain:8020
      jobTracker=myOtherDomain:8050
      queueName=default
      master=yarn # have also tried yarn-cluster and yarn-client
       
      oozie.use.system.libpath=true
      oozie.wf.application.path=${nameNode}/bmp/
      oozie.action.sharelib.for.spark=spark2 # I've added the updated spark libs I need in here
      
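
      For context, 8032 is Hadoop's built-in default ResourceManager port. When master is yarn, Spark's YARN client resolves the ResourceManager from the yarn-site.xml visible on its classpath, not from the job-tracker value, which Oozie uses only to submit the launcher job. A hypothetical yarn-site.xml entry (host and port assumed from the job.properties above, not verified) that would need to be visible to the launcher container:

      <!-- yarn-site.xml fragment (assumed values): Spark's YARN client reads the
           ResourceManager address from this property; if it is absent from the
           classpath, Hadoop falls back to its compiled-in default, 0.0.0.0:8032. -->
      <property>
          <name>yarn.resourcemanager.address</name>
          <value>myOtherDomain:8050</value>
      </property>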

      workflow

      <workflow-app xmlns='uri:oozie:workflow:0.5' name='MyWorkflow'>
          <start to='spark-node' />
          <action name='spark-node'>
              <spark xmlns="uri:oozie:spark-action:0.1">
                  <job-tracker>${jobTracker}</job-tracker>
                  <name-node>${nameNode}</name-node>
                  <prepare>
                      <delete path="${nameNode}/bmp/output"/>
                  </prepare>
                  <master>${master}</master>
                  <name>My Workflow</name>
                <class>uk.co.bmp.drivers.MyDriver</class>
                  <jar>${nameNode}/bmp/lib/bmp.spark-assembly-1.0.jar</jar>
                  <spark-opts>--conf spark.yarn.historyServer.address=http://myDomain:18088 --conf spark.eventLog.dir=hdfs://myDomain/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
                  <arg>${nameNode}/bmp/input/input_file.csv</arg>
              </spark>
              <ok to="end" />
              <error to="fail" />
          </action>
          <kill name="fail">
              <message>Workflow failed, error
                  message[${wf:errorMessage(wf:lastErrorNode())}]
              </message>
          </kill>
          <end name='end' />
      </workflow-app>
      
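
      One possible workaround (untested, a sketch based on Spark's generic spark.hadoop.* passthrough, which copies such properties into the Hadoop Configuration used by the YARN client) is to pin the ResourceManager address explicitly in spark-opts:

      <!-- Untested workaround sketch: the spark.hadoop.* prefix injects the
           property into the Hadoop Configuration, overriding the 0.0.0.0:8032
           default; host and port are taken from this report's job.properties. -->
      <spark-opts>--conf spark.hadoop.yarn.resourcemanager.address=myOtherDomain:8050 --conf spark.yarn.historyServer.address=http://myDomain:18088 --conf spark.eventLog.dir=hdfs://myDomain/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>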

      Error

      Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception,Call From myDomain/ipAddress to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused. For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
      ...
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
      ...
      

      Where is it pulling 8032 from? Why does it not use the port configured in the job.properties?
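
      The 8032 almost certainly comes from Hadoop itself: org.apache.hadoop.yarn.conf.YarnConfiguration defines DEFAULT_RM_ADDRESS = "0.0.0.0:8032", which is returned whenever yarn.resourcemanager.address is missing from the Configuration the Spark YARN client was built with. A minimal standalone sketch of that fallback (not real Hadoop code; Configuration is modeled as a plain Map for illustration):

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Sketch of YARN's ResourceManager address resolution: if the key is not
      // present in the loaded configuration (e.g. yarn-site.xml was not on the
      // launcher's classpath), the compiled-in default 0.0.0.0:8032 is used.
      public class RmAddressLookup {
          static final String RM_ADDRESS_KEY = "yarn.resourcemanager.address";
          static final String DEFAULT_RM_ADDRESS = "0.0.0.0:8032";

          static String resolveRmAddress(Map<String, String> conf) {
              return conf.getOrDefault(RM_ADDRESS_KEY, DEFAULT_RM_ADDRESS);
          }

          public static void main(String[] args) {
              Map<String, String> empty = new HashMap<>();          // no yarn-site.xml seen
              Map<String, String> configured = new HashMap<>();
              configured.put(RM_ADDRESS_KEY, "myOtherDomain:8050"); // value from yarn-site.xml

              System.out.println(resolveRmAddress(empty));          // 0.0.0.0:8032
              System.out.println(resolveRmAddress(configured));     // myOtherDomain:8050
          }
      }
      ```

      This would explain the observed behaviour: the jobTracker value reaches the launcher's action configuration, but the SparkContext consults its own Hadoop Configuration and falls back to the default.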

            People

            • Assignee: satishsaley (Satish Saley)
            • Reporter: OneDeadEar (Breandán Mac Parland)
            • Votes: 2
            • Watchers: 7
