Oozie: OOZIE-1401, sub-task of OOZIE-1118 (improve logic of purge service)

PurgeCommand should purge the workflow jobs w/o end_time


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 5.0.0b1
    • Component/s: bundle, coordinator, workflow
    • Labels: None

    Description

      Currently, PurgeXCommand does not purge workflow jobs whose end_time is null, and it should handle these jobs as well. Such jobs typically appear when jobs stay stuck for a long time after Hadoop restarts or database failures. One way to handle them is to check last_modified_time instead whenever end_time is not available.
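      A minimal sketch of the proposed fallback, written as an in-memory Java predicate rather than Oozie's actual code: a job's age is taken from end_time when it is set and from last_modified_time otherwise. The types `WorkflowRow` and `JobStatus` and the method `isPurgeable` are hypothetical. Restricting candidates to KILLED, FAILED, and SUCCEEDED jobs is an assumption borrowed from the eligibility rule stated further down in this description, so that running jobs are never purged.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical job states; only the three terminal ones are purge candidates below.
enum JobStatus { PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED, FAILED }

// Hypothetical, simplified stand-in for a workflow job row (not Oozie's WorkflowJobBean).
final class WorkflowRow {
    final String id;
    final JobStatus status;
    final Instant endTime;          // may be null, e.g. for jobs stuck after a Hadoop restart
    final Instant lastModifiedTime; // fallback timestamp used when endTime is null

    WorkflowRow(String id, JobStatus status, Instant endTime, Instant lastModifiedTime) {
        this.id = id;
        this.status = status;
        this.endTime = endTime;
        this.lastModifiedTime = lastModifiedTime;
    }
}

final class PurgeEligibility {
    // Assumption: only terminal jobs are eligible, mirroring the rule in the issue.
    private static final Set<JobStatus> TERMINAL =
            EnumSet.of(JobStatus.SUCCEEDED, JobStatus.KILLED, JobStatus.FAILED);

    // Returns end_time when present, otherwise last_modified_time.
    static Instant referenceTime(WorkflowRow w) {
        return w.endTime != null ? w.endTime : w.lastModifiedTime;
    }

    // True when the job is terminal and its reference time is older than the retention window.
    static boolean isPurgeable(WorkflowRow w, Instant now, int olderThanDays) {
        if (!TERMINAL.contains(w.status)) {
            return false;
        }
        Instant ref = referenceTime(w);
        if (ref == null) {
            return false; // no timestamp at all: leave the row alone
        }
        Instant cutoff = now.minus(Duration.ofDays(olderThanDays));
        return ref.isBefore(cutoff);
    }
}
```

      For example, with a one-day retention, a KILLED job with no end_time whose last_modified_time is three days old is purged, while a RUNNING job is not, however old it is.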

      The current query:

      select w from WorkflowJobBean w where w.endTimestamp < :endTime
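      The current query skips these jobs because of SQL's three-valued logic: when endTimestamp is NULL, `w.endTimestamp < :endTime` evaluates to UNKNOWN rather than TRUE, so the row is never selected. The sketch below models that comparison in Java and shows one hypothetical JPQL variant that uses `COALESCE` to fall back to the last-modified timestamp. The persistent field names `lastModifiedTimestamp` and `statusStr` are assumptions about WorkflowJobBean, the class `PurgeQuerySketch` is hypothetical, and this is not the query that shipped with the fix.

```java
import java.time.Instant;

final class PurgeQuerySketch {
    // Current query from the issue: a NULL endTimestamp never satisfies "<".
    static final String CURRENT =
            "select w from WorkflowJobBean w where w.endTimestamp < :endTime";

    // Hypothetical variant: use lastModifiedTimestamp when endTimestamp is NULL and
    // only consider terminal statuses. Field names are assumptions, not Oozie's schema.
    static final String WITH_FALLBACK =
            "select w from WorkflowJobBean w"
            + " where COALESCE(w.endTimestamp, w.lastModifiedTimestamp) < :endTime"
            + " and w.statusStr in ('SUCCEEDED', 'KILLED', 'FAILED')";

    // Models SQL "<" under three-valued logic: a null result stands for UNKNOWN.
    static Boolean sqlLessThan(Instant left, Instant right) {
        if (left == null || right == null) {
            return null;
        }
        return left.isBefore(right);
    }

    // A WHERE clause keeps a row only when its predicate is TRUE (not FALSE, not UNKNOWN).
    static boolean whereKeeps(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    // Models COALESCE(a, b): the first non-null argument.
    static Instant coalesce(Instant a, Instant b) {
        return a != null ? a : b;
    }
}
```

      COALESCE keeps the change to a single expression. An explicit `or (w.endTimestamp is null and w.lastModifiedTimestamp < :endTime)` clause is an alternative that leaves `w.endTimestamp` unwrapped, so an index on that column, if one exists, can still be used.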
      

      There is also an issue in the following case:

      • a parent workflow has its end_time set;
      • the parent is otherwise eligible for PurgeXCommand: its end_time is older than the configured number of days, and its status is KILLED, FAILED, or SUCCEEDED;
      • the parent has a child workflow whose parent_id is set to the id of the parent workflow;
      • the child workflow's end_time is NULL.

      In this case, PurgeXCommand#fetchTerminatedWorkflow() throws a NullPointerException like this:

      2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
      2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Execute command [purge] key [null]
      2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
      2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception, 
      java.lang.NullPointerException
      	at org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
      	at org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
      	at org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
      	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
      	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
      	at org.apache.oozie.command.XCommand.call(XCommand.java:286)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
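      The stack trace does not show which expression on line 249 of PurgeXCommand dereferences null. One plausible shape, assumed here for illustration only, is a date comparison such as `endTime.before(cutoff)` applied to the child workflow whose end_time is NULL. The hypothetical `ChildAgeCheck` class contrasts that pattern with a null-safe variant that reuses the last_modified_time fallback proposed above.

```java
import java.util.Date;

final class ChildAgeCheck {
    // Hypothetical failing pattern: dereferences endTime without a null check,
    // so it throws NullPointerException for a child whose end_time is NULL.
    static boolean isOlderUnsafe(Date endTime, Date cutoff) {
        return endTime.before(cutoff);
    }

    // Null-safe variant: use lastModified when endTime is missing, and treat a
    // row with neither timestamp as not old enough to purge.
    static boolean isOlderSafe(Date endTime, Date lastModified, Date cutoff) {
        Date ref = (endTime != null) ? endTime : lastModified;
        return ref != null && ref.before(cutoff);
    }
}
```

      Returning false when both timestamps are missing is a conservative choice: such a child, and therefore its parent, is kept rather than purged on incomplete data.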
      

      Attachments

        1. OOZIE-1401-001.patch
          4 kB
          Attila Sasvári
        2. amend-OOZIE-1401-001.patch
          19 kB
          Attila Sasvári
        3. amend-OOZIE-1401-002.patch
          20 kB
          Attila Sasvári
        4. OOZIE-1401.amend.003.patch
          22 kB
          Andras Piros


            People

              Assignee: Attila Sasvári (asasvari)
              Reporter: Mona Chitnis (chitnis)
              Votes: 0
              Watchers: 8
