Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
There are a few places where the job status is not updated properly:
1. Receiving events out of order.
For example, suppose "oozie.service.EventHandlerService.batch.size" is set to 50 and "oozie.service.EventHandlerService.worker.threads" is set to 15. This means 15 threads process events in batches of 50.
The 51st event can then be processed before the 49th event.
If the 49th is a job-started event and the 51st is a job-completed event, the job status will be overridden to running.
2.
    case COORDINATOR_ACTION:
        CoordinatorActionBean ca = jpaService.execute(new CoordActionGetForSLAJPAExecutor(slaCalc.getId()));
        if (ca.isTerminalWithFailure()) {
            isEndMiss = ended = true;
            slaCalc.setActualEnd(ca.getLastModifiedTime());
        }
        if (ca.getExternalId() != null) {
            wf = jpaService.execute(new WorkflowJobGetForSLAJPAExecutor(ca.getExternalId()));
            if (wf.getEndTime() != null) {
                ended = true;
                if (wf.getEndTime().getTime() > slaCalc.getExpectedEnd().getTime()) {
                    isEndMiss = true;
                }
            }
            slaCalc.setActualEnd(wf.getEndTime());
            slaCalc.setActualStart(wf.getStartTime());
        }
Oozie checks the workflow (wf) status and updates the SLA status with the coordinator (coord) job status.
There can be a case where the coord is still running but the wf has already ended.
3. HistoryPurgeWorker updates the end time but does not update the status.
4. There are a few other locking issues.
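
To illustrate point 1, here is a minimal sketch of one way to stop a late "started" event from overriding a job that has already completed. It is not the fix applied in Oozie. The class JobStatusTracker, its Status enum, and the apply method are hypothetical names made up for this example, and the state set is a simplification of Oozie's real status values.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; these are not Oozie's actual classes. */
final class JobStatusTracker {

    /** Simplified job states (an assumption; Oozie's real status enums are richer). */
    enum Status { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    private static final Set<Status> TERMINAL =
            EnumSet.of(Status.SUCCEEDED, Status.FAILED, Status.KILLED);

    private Status current = Status.PREP;

    /**
     * Applies a status event unless the job has already reached a terminal state.
     * Returns true if the event was applied, false if it was ignored as stale.
     */
    synchronized boolean apply(Status incoming) {
        if (TERMINAL.contains(current)) {
            return false; // a late "started" event must not override a finished job
        }
        current = incoming;
        return true;
    }

    synchronized Status current() {
        return current;
    }

    public static void main(String[] args) {
        JobStatusTracker job = new JobStatusTracker();
        job.apply(Status.SUCCEEDED); // the "completed" event is processed first
        job.apply(Status.RUNNING);   // the "started" event arrives late and is ignored
        System.out.println(job.current()); // prints SUCCEEDED
    }
}
```

This guard only protects terminal states. Ordering between non-terminal events would need something more, for example comparing event timestamps, which this sketch does not attempt.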