Oozie / OOZIE-2579

Bulk kill tests in TestBulkWorkflowXCommand might fail because of a race condition

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.0
    • Component/s: None
    • Labels:
      None

      Description

      TestBulkWorkflowXCommand contains tests that perform bulk kills.

      These tests fail intermittently because externalChildIDs is set to "00000001-dummy-oozie-wrkf-W", which cannot be parsed as a Hadoop job ID and causes the kill action to fail:

      org.apache.oozie.action.ActionExecutorException: IllegalArgumentException: JobId string : 00000001-dummy-oozie-wrkf-W is not properly formed
      	at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
      	at org.apache.oozie.action.hadoop.JavaActionExecutor.kill(JavaActionExecutor.java:1614)
      	at org.apache.oozie.command.wf.ActionKillXCommand.execute(ActionKillXCommand.java:146)
      	at org.apache.oozie.command.wf.ActionKillXCommand.execute(ActionKillXCommand.java:1)
      	at org.apache.oozie.command.XCommand.call(XCommand.java:287)
      	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:331)
      	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:260)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.IllegalArgumentException: JobId string : 00000001-dummy-oozie-wrkf-W is not properly formed
      	at org.apache.hadoop.mapreduce.JobID.forName(JobID.java:154)
      	at org.apache.hadoop.mapred.JobID.forName(JobID.java:78)
      	at org.apache.oozie.action.hadoop.MapReduceActionExecutor.getRunningJob(MapReduceActionExecutor.java:342)
      	at org.apache.oozie.action.hadoop.JavaActionExecutor.kill(JavaActionExecutor.java:1604)
      	... 10 more
      

      Because ActionKillXCommand runs on a separate thread, it races with the main test logic. The test expects the job status to be "KILLED", but ActionKillXCommand sometimes gets the chance to update it to "FAILED" first.

      Solution: set a proper (parseable) job id:

      action.setExternalChildIDs("job_201601011800_0001");
      

      See https://hadoop.apache.org/docs/r2.6.2/api/org/apache/hadoop/mapred/JobID.html
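
      To illustrate why the dummy value is rejected while the proposed one is accepted, here is a minimal stand-alone sketch. It does not call Hadoop; it only approximates the "job_<identifier>_<number>" shape that the stack trace suggests JobID.forName expects. The class name JobIdFormatSketch and the method isParseable are hypothetical and exist only for this illustration.

```java
// Hypothetical helper, not part of Oozie or Hadoop. It approximates the
// "job_<identifier>_<number>" shape that a parseable job ID needs.
final class JobIdFormatSketch {

    // Returns true if the string has three "_"-separated parts, starts
    // with "job", and ends with a part that parses as an integer.
    static boolean isParseable(String jobId) {
        if (jobId == null) {
            return false;
        }
        String[] parts = jobId.split("_");
        if (parts.length != 3 || !parts[0].equals("job")) {
            return false;
        }
        try {
            Integer.parseInt(parts[2]);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Value used by the failing tests: no "_" separators, no "job" prefix.
        System.out.println(isParseable("00000001-dummy-oozie-wrkf-W")); // prints "false"
        // Value proposed in this issue.
        System.out.println(isParseable("job_201601011800_0001"));       // prints "true"
    }
}
```

      With a value of the second form, the kill path should get past job ID parsing, so the asynchronous ActionKillXCommand no longer fails on it.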

        Attachments

        1. OOZIE-2579-001.patch
          0.7 kB
          Peter Bacsko
        2. OOZIE-2579-002.patch
          0.7 kB
          Peter Bacsko

            People

            • Assignee:
              pbacsko Peter Bacsko
              Reporter:
              pbacsko Peter Bacsko
            • Votes:
              0
              Watchers:
              4
