Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Fixed
- trunk
- None
Description
Currently, PurgeXCommand does not handle workflow jobs whose end_time is null, so those jobs are never purged. This happens with jobs that have been stuck for a long time after a Hadoop restart or a DB failure. The command needs to take care of these jobs as well, for example by checking last_modified_time when end_time is not available.
The current query:
select w from WorkflowJobBean w where w.endTimestamp < :endTime
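As a rough illustration of the last_modified_time fallback suggested above, the sketch below shows one possible query variant and the equivalent in-memory check. This is not the actual patch: the `lastModifiedTimestamp` field name, the class name `PurgeCutoffSketch`, and both helper methods are assumptions made for the example.

```java
import java.util.Date;

public class PurgeCutoffSketch {
    // Hypothetical variant of the current query. The lastModifiedTimestamp
    // field name is an assumption and is not confirmed by this issue.
    static final String QUERY_WITH_FALLBACK =
        "select w from WorkflowJobBean w where w.endTimestamp < :endTime"
        + " or (w.endTimestamp is null and w.lastModifiedTimestamp < :endTime)";

    // Picks the timestamp to compare against the purge cutoff:
    // end time when present, otherwise the last modified time.
    static Date effectiveTime(Date endTime, Date lastModifiedTime) {
        return endTime != null ? endTime : lastModifiedTime;
    }

    // True only when a usable timestamp exists and it is before the cutoff.
    static boolean olderThanCutoff(Date endTime, Date lastModifiedTime, Date cutoff) {
        Date t = effectiveTime(endTime, lastModifiedTime);
        return t != null && t.before(cutoff);
    }
}
```

With this shape, a job that has no end time but was last modified before the cutoff is treated the same way as a job that ended before the cutoff.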
There is also an issue when all of the following hold:
- a parent workflow has its end_time set
- the parent is otherwise eligible for PurgeXCommand: its end_time is older than the configured number of days, and its status is KILLED, FAILED, or SUCCEEDED
- a child workflow has its parent_id set to the id of the parent workflow
- the child workflow has end_time = NULL
In this case, PurgeXCommand#fetchTerminatedWorkflow() throws a NullPointerException like this:
2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Execute command [purge] key [null]
2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception,
java.lang.NullPointerException
    at org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
    at org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
    at org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
    at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
    at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
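The parent/child scenario above could be guarded with a null-safe eligibility check along these lines. This is only a sketch under stated assumptions, not the code in PurgeXCommand: the `Job` holder class, the `isPurgeable` method, and the choice to apply the same status rule to child workflows are all hypothetical.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

public class ChildPurgeCheckSketch {
    // Statuses named in the description as eligible for purging.
    static final Set<String> TERMINAL =
        new HashSet<>(Arrays.asList("KILLED", "FAILED", "SUCCEEDED"));

    // Hypothetical minimal stand-in for a workflow job record.
    static final class Job {
        final String status;
        final Date endTime;          // may be null for stuck jobs
        final Date lastModifiedTime; // fallback when endTime is null

        Job(String status, Date endTime, Date lastModifiedTime) {
            this.status = status;
            this.endTime = endTime;
            this.lastModifiedTime = lastModifiedTime;
        }
    }

    // Never dereferences a null timestamp: a job without any usable
    // timestamp is simply reported as not purgeable.
    static boolean isPurgeable(Job job, Date cutoff) {
        if (job == null || !TERMINAL.contains(job.status)) {
            return false;
        }
        Date t = job.endTime != null ? job.endTime : job.lastModifiedTime;
        return t != null && t.before(cutoff);
    }
}
```

Returning false for a missing timestamp, rather than throwing, keeps one malformed child record from aborting the whole purge run. Whether a child that is not in a terminal status should block purging of its parent is a separate design decision that this sketch does not settle.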
Attachments
Issue Links
- duplicates OOZIE-148 GH-126: PurgeCommand should purge the workflow jobs w/o end_time (Closed)