OOZIE-1984: SLACalculator in HA mode performs duplicate operations on records with completed jobs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.1.0
    • Component/s: None
    • Labels: None

      Description

      Scenario:

      The periodic SLA run has already processed the start, duration and end events for a job's SLA entry. The job notification for that job arrives afterwards and triggers the SLA listener.

      Buggy part:

      SLACalculatorMemory.java
      
      else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
          // jobid might not exist in slaMap in HA Setting
          SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
                  SLARegQuery.GET_SLA_REG_ALL, jobId);
          if (slaRegBean != null) { // filter out jobs picked by SLA job event listener
                                    // but not actually configured for SLA
              SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
                      SLASummaryQuery.GET_SLA_SUMMARY, jobId);
              slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
              if (slaCalc.getEventProcessed() < 7) {
                  slaMap.put(jobId, slaCalc);
              }
          }
      }
      }
      if (slaCalc != null) {
      ..
      Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
              .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
      byte eventProc = ((Byte) eventProcObj).byteValue();
      ..
      processJobEndSuccessSLA(slaCalc, startTime, endTime);
      

      The method processJobEndSuccessSLA then checks the second least significant bit of eventProc and sends the duration event again. So the bug here is two-fold:

      • the function is still invoked even when all events have already been processed
      • eventProcessed is 8 (binary 1000), so the second least significant bit is unset and the duration event is processed again
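
The bit arithmetic behind the second point can be sketched as follows. This is a minimal illustration, not Oozie code: the class name, constant names and the helper durationPendingNaive are hypothetical, and the only facts taken from the description are that the second least significant bit tracks the duration event and that 8 (binary 1000) means everything was processed.

```java
// Hypothetical helper, not part of Oozie: illustrates the bitmask arithmetic only.
final class EventProcessedBits {
    // From the description: the second least significant bit marks the duration event.
    static final int DURATION_BIT = 0b0010;
    // From the description: 8 (binary 1000) means all events were already processed.
    static final int ALL_PROCESSED = 0b1000;

    /** Naive check that looks only at the duration bit, as the buggy path effectively does. */
    static boolean durationPendingNaive(byte eventProc) {
        return (eventProc & DURATION_BIT) == 0;
    }

    public static void main(String[] args) {
        byte eventProc = (byte) ALL_PROCESSED;
        System.out.println(Integer.toBinaryString(eventProc)); // 1000
        // The duration bit is unset in 1000, so the naive check reports "pending".
        System.out.println(durationPendingNaive(eventProc));   // true
    }
}
```

With eventProc equal to 8 the naive check returns true, which is why the duration event would be sent a second time.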

      Fix: do not invoke the function when eventProc is 8 (binary 1000).
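
One way such a guard might look is sketched below. It is an assumption about the shape of the fix, not the content of the attached patch: SlaEndGuard, ALL_EVENTS_PROCESSED and processIfPending are hypothetical names, and the Runnable stands in for the call to processJobEndSuccessSLA.

```java
// Hypothetical sketch, not the actual OOZIE-1984 patch.
final class SlaEndGuard {
    // From the description: 8 (binary 1000) means all SLA events were already processed.
    static final byte ALL_EVENTS_PROCESSED = 8;

    /**
     * Runs the job-end processing only if some SLA event is still pending.
     * Returns true when the callback was invoked.
     */
    static boolean processIfPending(byte eventProc, Runnable processJobEnd) {
        if (eventProc == ALL_EVENTS_PROCESSED) {
            return false; // nothing left to do; avoids a duplicate duration event
        }
        processJobEnd.run();
        return true;
    }

    public static void main(String[] args) {
        Runnable stub = () -> System.out.println("processing job end");
        System.out.println(processIfPending((byte) 8, stub)); // false, stub is not run
        System.out.println(processIfPending((byte) 1, stub)); // runs the stub, then prints true
    }
}
```

Checking the whole value before dispatching, rather than individual bits inside the callee, keeps the "everything already processed" case out of the job-end path entirely.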

        Attachments

        1. OOZIE-1984.patch
          1 kB
          Mona Chitnis
        2. OOZIE-1984-1.patch
          1.0 kB
          Mona Chitnis

              People

              • Assignee: Unassigned
              • Reporter: Mona Chitnis