Uploaded image for project: 'Oozie'
  1. Oozie
  2. OOZIE-2847

Oozie Ha timing issue

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.3.0, 5.0.0b1, 5.0.0
    • Fix Version/s: 5.0.0
    • Component/s: HA
    • Labels:
      None
    • Flags:
      Patch

      Description

      Oozie Ha timing issue

      When Oozie is launching the mapper, it is writing a job id into a file on hdfs. Let's assume the ApplicationMaster is killed, and Oozie will make a second try, during recovery. On the second try, Oozie is trying to see if the previously written job id on hdfs matches the current job id. In most occasion, this will match. However, in the event when Oozie launcher is killed right in the middle when Oozie is in the process of writing id in the file, the Oozie file in hdfs is created, but the id has yet to be written to the file. During the next recovery, Oozie will mistakenly think the id exists in the file while the file is actually empty, therefore throwing this exception:

      2015-07-10 05:56:58,137|beaver.machine|INFO|5208|1344|MainThread|------------------------------------------------------------------------------------------------------------------------------------
      2015-07-10 05:56:58,137|beaver.machine|INFO|5208|1344|MainThread|Console URL       : http://dal-ha21:8088/proxy/application_1436507526035_0001/
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Error Code        : JA018
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Error Message     : Hadoop job Id mismatch, action file [hdfs://hdp2-ha2/user/hadoopqa/oozie-hado/0000003-150710041341636-oozie-hado-W/pig-node--pig/0000003-150710041341636-oozie-hado-W@pig-node@0] declares Id [null] current Id [job_1436507526035_0001]
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|External ID       : job_1436507526035_0001
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|External Status   : FAILED/KILLED
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Name              : pig-node
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Retries           : 0
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Tracker URI       : dal-ha21:8032
      2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Type              : pig
      2015-07-10 05:56:58,158|beaver.machine|INFO|5208|1344|MainThread|Started           : 2015-07-10 05:55:19 GMT
      2015-07-10 05:56:58,160|beaver.machine|INFO|5208|1344|MainThread|Status            : ERROR
      2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|Ended             : 2015-07-10 05:56:42 GMT
      2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|External Stats    : null
      2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|External ChildIDs : null
      2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|------------------------------------------------------------------------------------------------------------------------------------
      Exception:
      2015-07-10 05:56:18,658 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hdp2-ha2:8020]
      2015-07-10 05:56:18,665 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://hdp2-ha2:8020/user/hadoopqa/.staging/job_1436507526035_0001/job_1436507526035_0001_1.jhist
      2015-07-10 05:56:18,693 WARN [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Unable to parse prior job history, aborting recovery
      java.io.IOException: Incompatible event log version: null
      	at org.apache.hadoop.mapreduce.jobhistory.EventReader.<init>(EventReader.java:71)
      	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:139)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.parsePreviousJobHistory(MRAppMaster.java:1206)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.processRecovery(MRAppMaster.java:1175)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1039)
      	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
      	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
      2015-07-10 05:56:18,737 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hdp2-ha2:8020]
      2015-07-10 05:56:18,745 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://hdp2-ha2:8020/user/hadoopqa/.staging/job_1436507526035_0001/job_1436507526035_0001_1.jhist
      

        Attachments

        1. OOZIE-2847-5.0.patch
          6 kB
          Denes Bodo
        2. OOZIE-2847-4.3.patch
          7 kB
          Denes Bodo
        3. OOZIE-2847.patch
          4 kB
          Péter Gergő Barna

          Activity

            People

            • Assignee:
              dionusos Denes Bodo
              Reporter:
              bpgergo Péter Gergő Barna
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 1h
                1h
                Remaining:
                Remaining Estimate - 1h
                1h
                Logged:
                Time Spent - Not Specified
                Not Specified