Oozie / OOZIE-1911

SLA calculation in HA mode does wrong bit comparison for 'start' and 'duration'

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels: None

      Description

      In chronological order:

      Server 1:
      The job's SLA eventProcessed is set to 0101, meaning the 'start' and 'end' SLAs have been processed.

      Server 2:
      Receives the above job's status event and processes the remaining 'duration' SLA. eventProcessed is now 0111, but it is then set to 1000 by:

      SLACalculatorMemory.addJobStatus() : 762
      if (slaCalc.getEventProcessed() == 7) {
          slaInfo.setEventProcessed(8);
          slaMap.remove(jobId);
      }
      

      Back on Server 1 (doing periodic SLA checks):

      SLACalculatorMemory.updateJobSla() : 483
      if ((eventProc & 1) == 0) { // first bit (start-processed) unset
          if (reg.getExpectedStart() != null) {
              if (reg.getExpectedStart().getTime() + jobEventLatency < System.currentTimeMillis()) {
                  // eventProc is 1000 here, so this branch runs again and
                  // enqueues another START_MISS event and DURATION_MET event
              }
          }
      }
      

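      Conclusion: once eventProcessed has been set to 1000, its two lowest bits are unset, so the checks on the least significant bit ('start') and the bit next to it ('duration') wrongly conclude that those SLAs were never processed. These checks need to be fixed to avoid duplicate events.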
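
      The sequence above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not the code from the attached patch: the class SlaEventBits, its constants, and its methods are hypothetical names introduced only for illustration. It assumes the bit layout implied by the description (1 = start, 2 = duration, 4 = end, 8 = all processed) and shows one possible guard that treats 1000 as "nothing left to process".

```java
// Hypothetical helper, not part of Oozie: models eventProcessed as described in this issue.
final class SlaEventBits {
    static final int START = 1;     // 0001: 'start' SLA processed
    static final int DURATION = 2;  // 0010: 'duration' SLA processed
    static final int END = 4;       // 0100: 'end' SLA processed
    static final int ALL_DONE = 8;  // 1000: stored once all three are processed

    // Mirrors the quoted check: looks only at the lowest bit.
    static boolean startPendingNaive(int eventProc) {
        return (eventProc & START) == 0;
    }

    // Guarded variant (an assumption, not the actual fix): 1000 means already done.
    static boolean startPending(int eventProc) {
        return (eventProc & ALL_DONE) == 0 && (eventProc & START) == 0;
    }

    // Same guard applied to the 'duration' bit.
    static boolean durationPending(int eventProc) {
        return (eventProc & ALL_DONE) == 0 && (eventProc & DURATION) == 0;
    }

    public static void main(String[] args) {
        int eventProc = START | END;   // Server 1: 0101
        eventProc |= DURATION;         // Server 2: 0111
        if (eventProc == 7) {
            eventProc = ALL_DONE;      // Server 2: 1000
        }
        System.out.println(startPendingNaive(eventProc)); // true: duplicate START_MISS would be enqueued
        System.out.println(startPending(eventProc));      // false
        System.out.println(durationPending(eventProc));   // false
    }
}
```

      With the guard in place, a job whose eventProcessed is still 0100 (only 'end' processed) continues to report 'start' and 'duration' as pending, while a job at 1000 reports neither.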

        Attachments

        1. OOZIE-1911-4.patch
          29 kB
          Mona Chitnis

              People

              • Assignee:
                chitnis Mona Chitnis
                Reporter:
                chitnis Mona Chitnis