Oozie / OOZIE-1401 (sub-task of OOZIE-1118: improve logic of purge service)

PurgeCommand should purge the workflow jobs w/o end_time


    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 5.0.0b1
    • Component/s: bundle, coordinator, workflow
    • Labels: None

      Description

      Currently, the PurgeXCommand logic does not handle workflow jobs whose end_time is null, and the command needs to purge those jobs as well. Such jobs typically appear when a job stays stuck for a long time after a Hadoop restart or a DB failure. One option is to check last_modified_time instead when end_time is not available.

      The current query:

      select w from WorkflowJobBean w where w.endTimestamp < :endTime
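A minimal sketch of the proposed fallback, expressed as an in-memory predicate rather than a JPA query. It assumes a simplified row type: `WorkflowRow`, `effectiveEndTime` and `olderThan` are hypothetical names, not Oozie's `WorkflowJobBean` API. The age check uses end_time when it is set and last_modified_time otherwise, and keeps the strict `<` comparison of the current query.

```java
import java.time.Duration;
import java.time.Instant;

public class PurgeEligibility {

    // Hypothetical, simplified workflow row; not Oozie's WorkflowJobBean.
    static final class WorkflowRow {
        final String id;
        final Instant endTime;          // may be null for jobs stuck after a Hadoop restart or DB failure
        final Instant lastModifiedTime;

        WorkflowRow(String id, Instant endTime, Instant lastModifiedTime) {
            this.id = id;
            this.endTime = endTime;
            this.lastModifiedTime = lastModifiedTime;
        }
    }

    // end_time when present, otherwise last_modified_time (the fallback proposed in this issue).
    static Instant effectiveEndTime(WorkflowRow w) {
        return w.endTime != null ? w.endTime : w.lastModifiedTime;
    }

    // Mirrors "w.endTimestamp < :endTime" with the fallback applied;
    // a row with neither timestamp is never selected.
    static boolean olderThan(WorkflowRow w, Instant cutoff) {
        Instant t = effectiveEndTime(w);
        return t != null && t.isBefore(cutoff);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-09-29T08:00:00Z");
        Instant cutoff = now.minus(Duration.ofDays(1)); // purge jobs older than 1 day

        WorkflowRow finished = new WorkflowRow("wf-1", now.minus(Duration.ofDays(3)), now.minus(Duration.ofDays(3)));
        WorkflowRow stuck = new WorkflowRow("wf-2", null, now.minus(Duration.ofDays(5)));
        WorkflowRow recent = new WorkflowRow("wf-3", null, now.minus(Duration.ofHours(2)));

        System.out.println(olderThan(finished, cutoff)); // true
        System.out.println(olderThan(stuck, cutoff));    // true, via last_modified_time
        System.out.println(olderThan(recent, cutoff));   // false, modified too recently
    }
}
```

In the actual command the same rule would have to be pushed into the database query; this sketch only illustrates the intended selection semantics.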
      

      There is also an issue when:

      • a parent workflow has its end_time set
      • the parent is otherwise eligible for PurgeXCommand: its end_time is older than the configured number of days, and its status is KILLED, FAILED, or SUCCEEDED
      • the parent has a child workflow whose parent_id is set to the id of the parent workflow
      • the child workflow has its end_time = NULL

      In this case, PurgeXCommand#fetchTerminatedWorkflow() throws a NullPointerException like this:

      2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
      2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Execute command [purge] key [null]
      2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
      2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception, 
      java.lang.NullPointerException
      	at org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
      	at org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
      	at org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
      	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
      	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
      	at org.apache.oozie.command.XCommand.call(XCommand.java:286)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
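The stack trace does not show which expression at PurgeXCommand.java:249 is null, but the scenario points at the child's missing end_time. A minimal sketch of one possible policy that never dereferences a child's end_time directly, under stated assumptions: the `Wf` type and the `childrenByParentId` map are hypothetical stand-ins for the database rows and the parent_id relation, and the rule that a parent is purged only when all of its descendants are eligible is an assumption, not necessarily what the committed patch does. Eligibility reuses the last_modified_time fallback proposed in the description.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ParentChildPurge {

    // Terminal statuses listed in this issue.
    static final Set<String> TERMINAL = Set.of("SUCCEEDED", "KILLED", "FAILED");

    // Hypothetical, simplified workflow row; not Oozie's WorkflowJobBean.
    static final class Wf {
        final String id;
        final String status;
        final Instant endTime;       // may be null
        final Instant lastModified;

        Wf(String id, String status, Instant endTime, Instant lastModified) {
            this.id = id;
            this.status = status;
            this.endTime = endTime;
            this.lastModified = lastModified;
        }
    }

    // Eligible on its own: terminal status and an effective end time older than the cutoff.
    // The null checks replace a direct dereference of endTime.
    static boolean selfEligible(Wf w, Instant cutoff) {
        Instant t = w.endTime != null ? w.endTime : w.lastModified;
        return TERMINAL.contains(w.status) && t != null && t.isBefore(cutoff);
    }

    // A workflow is purged only if it and every descendant are eligible.
    static boolean purgeable(Wf w, Map<String, List<Wf>> childrenByParentId, Instant cutoff) {
        if (!selfEligible(w, cutoff)) {
            return false;
        }
        for (Wf child : childrenByParentId.getOrDefault(w.id, List.of())) {
            if (!purgeable(child, childrenByParentId, cutoff)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-09-29T08:00:00Z");
        Instant cutoff = now.minus(Duration.ofDays(1));

        Wf parent = new Wf("wf-parent", "SUCCEEDED", now.minus(Duration.ofDays(3)), now.minus(Duration.ofDays(3)));
        Wf killedChild = new Wf("wf-child-1", "KILLED", null, now.minus(Duration.ofDays(4)));
        Wf runningChild = new Wf("wf-child-2", "RUNNING", null, now.minus(Duration.ofHours(1)));

        // Child with end_time = NULL but an old last_modified_time: the parent can be purged.
        System.out.println(purgeable(parent, Map.of("wf-parent", List.of(killedChild)), cutoff)); // true
        // A still-running child keeps the parent in the database.
        System.out.println(purgeable(parent, Map.of("wf-parent", List.of(killedChild, runningChild)), cutoff)); // false
    }
}
```

Treating a child with no usable timestamp as not eligible keeps the parent and its children in the database instead of aborting the whole purge run with an exception.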
      

        Attachments

        1. OOZIE-1401.amend.003.patch (22 kB, Andras Piros)
        2. amend-OOZIE-1401-002.patch (20 kB, Attila Sasvári)
        3. amend-OOZIE-1401-001.patch (19 kB, Attila Sasvári)
        4. OOZIE-1401-001.patch (4 kB, Attila Sasvári)


              People

              • Assignee: Attila Sasvári (asasvari)
              • Reporter: Mona Chitnis (chitnis)
              • Votes: 0
              • Watchers: 9
