Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 5.0.0b1
    • Component/s: bundle, coordinator, workflow
    • Labels: None

      Description

      Currently, PurgeXCommand does not purge workflow jobs whose end_time is null, so these jobs are never cleaned up. This typically affects jobs that have been stuck for a long time after a Hadoop restart or a DB failure. PurgeXCommand needs to handle these jobs as well, for example by checking last_modified_time when end_time is not available.

      The current query:

      select w from WorkflowJobBean w where w.endTimestamp < :endTime
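
      The fallback proposed above can be sketched in plain Java. This is only an illustration under stated assumptions, not the change made in the attached patches: the class PurgeEligibility and its methods are hypothetical names, and the timestamps are modelled as java.time.Instant rather than Oozie's own bean fields.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration only; not part of Oozie. */
final class PurgeEligibility {

    private PurgeEligibility() {
    }

    /** Returns endTime when it is set, otherwise falls back to lastModifiedTime. */
    static Instant effectiveEndTime(Instant endTime, Instant lastModifiedTime) {
        return endTime != null ? endTime : lastModifiedTime;
    }

    /**
     * Returns true when the job's effective end time is strictly older than
     * the purge cutoff (now minus olderThanDays). A job with neither
     * timestamp set is never reported as purgeable.
     */
    static boolean isOlderThan(Instant endTime, Instant lastModifiedTime,
                               Instant now, int olderThanDays) {
        Instant effective = effectiveEndTime(endTime, lastModifiedTime);
        if (effective == null) {
            return false; // nothing to compare against, so keep the job
        }
        Instant cutoff = now.minus(Duration.ofDays(olderThanDays));
        return effective.isBefore(cutoff);
    }
}
```

      With this sketch, a stuck job that has no end_time but was last modified weeks ago is treated as old enough to purge, while a job with neither timestamp is left alone. A status check (KILLED, FAILED, SUCCEEDED) would still be needed on top of it.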
      

      There is also an issue when all of the following hold:

      • a parent workflow has its end_time set
      • the parent is otherwise eligible for PurgeXCommand: its end_time is older than the configured number of days, and its status is KILLED, FAILED, or SUCCEEDED
      • a child workflow has its parent_id set to the id of the parent workflow
      • the child workflow has end_time = NULL

      In this case, PurgeXCommand#fetchTerminatedWorkflow() throws a NullPointerException like this:

      2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
      2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Execute command [purge] key [null]
      2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
      2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception, 
      java.lang.NullPointerException
      	at org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
      	at org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
      	at org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
      	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
      	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
      	at org.apache.oozie.command.XCommand.call(XCommand.java:286)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
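
      A null-safe check over child workflows might look like the sketch below. It assumes the exception comes from dereferencing a child's missing end time, which the trace alone does not confirm. ChildPurgeCheck, Workflow, and allChildrenPurgeable are hypothetical names and do not reflect the actual PurgeXCommand code.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical sketch for illustration only; not the actual PurgeXCommand logic. */
final class ChildPurgeCheck {

    /** Minimal stand-in for a workflow row; either timestamp may be null. */
    static final class Workflow {
        final String id;
        final Instant endTime;
        final Instant lastModifiedTime;

        Workflow(String id, Instant endTime, Instant lastModifiedTime) {
            this.id = id;
            this.endTime = endTime;
            this.lastModifiedTime = lastModifiedTime;
        }
    }

    private ChildPurgeCheck() {
    }

    /**
     * Returns true only if every child is older than the cutoff. A child
     * without an end time is judged by its last modified time instead of
     * being dereferenced blindly; a child with neither blocks the purge.
     */
    static boolean allChildrenPurgeable(List<Workflow> children, Instant cutoff) {
        for (Workflow child : children) {
            Instant effective = child.endTime != null ? child.endTime : child.lastModifiedTime;
            if (effective == null || !effective.isBefore(cutoff)) {
                return false;
            }
        }
        return true;
    }
}
```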
      

        Attachments

        1. OOZIE-1401-001.patch
          4 kB
          Attila Sasvari
        2. amend-OOZIE-1401-001.patch
          19 kB
          Attila Sasvari
        3. amend-OOZIE-1401-002.patch
          20 kB
          Attila Sasvari
        4. OOZIE-1401.amend.003.patch
          22 kB
          Andras Piros

              People

              • Assignee: Attila Sasvari (asasvari)
              • Reporter: Mona Chitnis (chitnis)
              • Votes: 0
              • Watchers: 9
