OOZIE-1911: SLA calculation in HA mode does wrong bit comparison for 'start' and 'duration'

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels: None

    Description

      In chronological order:

      Server 1:
      The job's SLA eventProcessed is set to 0101, i.e. the Start and End SLAs have been processed.

      Server 2:
      Receives the above job's status event and processes the remaining 'duration' SLA. eventProcessed is now 0111, but it is then advanced to 1000 by the following check:

      SLACalculatorMemory.addJobStatus() : 762
      // all three bits set (0111 = 7): store 1000 (8) and drop the job from the in-memory map
      if (slaCalc.getEventProcessed() == 7) {
          slaInfo.setEventProcessed(8);
          slaMap.remove(jobId);
      }
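
The values in the steps above can be read as a bitmask. The following is a minimal sketch of that reading, assuming bit 0 is 'start', bit 1 is 'duration', bit 2 is 'end', and 8 (1000) is the value written once all three have been processed. The class and constant names are hypothetical and are not taken from the Oozie source.

```java
// Hypothetical helper: the bit layout is inferred from the description above,
// not copied from Oozie.
final class SlaEventBits {
    static final int START_PROCESSED = 1;    // 0001
    static final int DURATION_PROCESSED = 2; // 0010
    static final int END_PROCESSED = 4;      // 0100
    static final int ALL_PROCESSED = 8;      // 1000, written once 0111 is reached

    // True if the given flag bit is set in eventProcessed.
    static boolean isSet(int eventProcessed, int flag) {
        return (eventProcessed & flag) != 0;
    }

    // Human-readable list of the SLA events recorded in eventProcessed.
    static String describe(int eventProcessed) {
        if (eventProcessed == ALL_PROCESSED) {
            return "all processed";
        }
        StringBuilder sb = new StringBuilder();
        if (isSet(eventProcessed, START_PROCESSED)) sb.append("start ");
        if (isSet(eventProcessed, DURATION_PROCESSED)) sb.append("duration ");
        if (isSet(eventProcessed, END_PROCESSED)) sb.append("end ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe(0b0101)); // start end
        System.out.println(describe(0b0111)); // start duration end
        System.out.println(describe(0b1000)); // all processed
    }
}
```

Note that in 1000 none of the three low bits is set, so a test on a single low bit cannot tell "everything processed" apart from "nothing processed".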
      

      Back to Server 1 (doing periodic SLA checks). It now reads eventProcessed = 1000, whose least significant bit is unset, so the start check below passes again:

      SLACalculatorMemory.updateJobSla() : 483
      if ((eventProc & 1) == 0) { // first bit (start-processed) unset
          if (reg.getExpectedStart() != null) {
              if (reg.getExpectedStart().getTime() + jobEventLatency < System.currentTimeMillis()) {
                  // goes ahead and enqueues another START_MISS event and DURATION_MET event
              }
          }
      }
      

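      Conclusion: the checks on the least significant bit (for 'start') and the bit next to it (for 'duration') need to be fixed so that an eventProcessed value of 1000 is not treated as "not yet processed", which is what causes the duplicate events.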
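
To make the failure concrete, here is a small sketch that replays the sequence above and compares the quoted start check with a guarded variant. The guard (skipping the check when eventProcessed is 8) is only one possible way to express the fix and is an assumption, not necessarily what the attached patch does. The class and method names are hypothetical.

```java
// Hypothetical reproduction of the HA sequence described in this issue.
final class SlaStartCheck {
    static final int ALL_PROCESSED = 8; // 1000

    // Check as quoted from updateJobSla(): looks at bit 0 only.
    static boolean startPendingOriginal(int eventProc) {
        return (eventProc & 1) == 0;
    }

    // Guarded variant (assumption): 1000 means everything is already processed.
    static boolean startPendingGuarded(int eventProc) {
        return eventProc != ALL_PROCESSED && (eventProc & 1) == 0;
    }

    // Same guard applied to the 'duration' bit (bit 1).
    static boolean durationPendingGuarded(int eventProc) {
        return eventProc != ALL_PROCESSED && (eventProc & 2) == 0;
    }

    // Replays the steps: Server 1 leaves 0101, Server 2 adds 'duration',
    // then the value 7 is replaced by 8 as in addJobStatus().
    static int replay() {
        int eventProc = 0b0101;        // Server 1: start and end processed
        eventProc |= 0b0010;           // Server 2: duration processed, now 0111
        if (eventProc == 7) {
            eventProc = ALL_PROCESSED; // now 1000
        }
        return eventProc;
    }

    public static void main(String[] args) {
        int eventProc = replay();
        // Original check wrongly reports that 'start' is still pending.
        System.out.println("original: " + startPendingOriginal(eventProc));
        // Guarded checks report nothing pending, so no duplicate events.
        System.out.println("guarded start: " + startPendingGuarded(eventProc));
        System.out.println("guarded duration: " + durationPendingGuarded(eventProc));
    }
}
```

With the original check, Server 1 would enqueue START_MISS and DURATION_MET a second time. With the guarded checks it would skip both.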

      Attachments

        1. OOZIE-1911-4.patch
          29 kB
          Mona Chitnis

            People

              chitnis Mona Chitnis