Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
trunk
-
None
-
None
Description
In chronological order:
Server 1:
The job's SLA eventProcessed is set to 0101 => start and end SLA processed.
Server 2:
Receives the above job's status event and processes the remaining 'duration' SLA. eventProcessed is now 0111, but it is then bumped to 1000 by
SLACalculatorMemory.addJobStatus() : 762
    if (slaCalc.getEventProcessed() == 7) {
        slaInfo.setEventProcessed(8);
        slaMap.remove(jobId);
    }
Back to Server 1: (doing periodic SLA checks)
SLACalculatorMemory.updateJobSla() : 483
    if ((eventProc & 1) == 0) { // first bit (start-processed) unset
        if (reg.getExpectedStart() != null) {
            if (reg.getExpectedStart().getTime() + jobEventLatency < System.currentTimeMillis()) {
                // goes ahead and enqueues another START_MISS event and DURATION_MET event
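Conclusion: once eventProcessed is 1000, the least significant bit ('start') and the bit next to it ('duration') are both 0, so Server 1 treats those events as unprocessed. The checks on those two bits need to be fixed to avoid duplicate events.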
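The sequence above can be reproduced in a small standalone sketch. It assumes the bit layout implied by the description (bit 0 = start, bit 1 = duration, bit 2 = end, value 8 = all processed). The class, constants, and the guarded checks are hypothetical illustrations, not the code from the committed patch.

```java
class SlaEventBitsSketch {
    // Assumed layout, inferred from the description above (not copied from Oozie).
    static final int START = 1;      // 0001
    static final int DURATION = 2;   // 0010
    static final int END = 4;        // 0100
    static final int ALL = START | DURATION | END; // 0111 == 7
    static final int DONE = 8;       // 1000, written once all three are processed

    // Same shape as the quoted check in updateJobSla(): looks only at bit 0.
    static boolean startPendingOriginal(int eventProc) {
        return (eventProc & START) == 0;
    }

    // Hypothetical guard: the DONE value means nothing is pending.
    static boolean startPendingGuarded(int eventProc) {
        return eventProc != DONE && (eventProc & START) == 0;
    }

    // Hypothetical guard for the 'duration' bit, same idea.
    static boolean durationPendingGuarded(int eventProc) {
        return eventProc != DONE && (eventProc & DURATION) == 0;
    }

    // Replays the state changes from the description and returns the final value.
    static int replay() {
        int eventProc = START | END;   // Server 1: 0101
        eventProc |= DURATION;         // Server 2: 0111
        if (eventProc == ALL) {
            eventProc = DONE;          // Server 2: 1000
        }
        return eventProc;
    }

    public static void main(String[] args) {
        int eventProc = replay();
        // The unguarded check sees bit 0 unset and would enqueue START_MISS again.
        System.out.println("original start pending: " + startPendingOriginal(eventProc));
        System.out.println("guarded start pending:  " + startPendingGuarded(eventProc));
        System.out.println("guarded duration pending: " + durationPendingGuarded(eventProc));
    }
}
```

With the replayed value of 8, the unguarded check reports the start event as pending while the guarded variants do not, which is the duplicate-event condition this issue describes.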
Attachments
Issue Links
- is superseded by
-
OOZIE-1933 SLACalculatorMemory HA changes assume SLARegistrationBean exists for all jobs
- Closed
- links to