  1. Hadoop Map/Reduce
  2. MAPREDUCE-6797

Job history server scans can become blocked on a single, slow entry

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: jobhistoryserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There is one more place in HistoryFileManager where the synchronized block on HistoryFileInfo needs to be removed. We hit the JobHistoryServer contention issue in our environment: the attached stack trace shows HistoryFileManager$JobListCache.addIfAbsent waiting unnecessarily to lock a HistoryFileInfo.

      The synchronized modifier on isMovePending and didMoveFail was already removed by MAPREDUCE-6684.

      HistoryFileInfo firstValue = cache.get(key);
      synchronized (firstValue) {  // <-- synchronized is not needed here
        if (firstValue.isMovePending()) {
          if (firstValue.didMoveFail() &&
              firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
            cache.remove(key);
            // Now lets try to delete it
            try {
              firstValue.delete();
            } catch (IOException e) {
              LOG.error("Error while trying to delete history files" +
                  " that could not be moved to done.", e);
            }
          } else {
            LOG.warn("Waiting to remove " + key
                + " from JobListCache because it is not in done yet.");
          }
        } else {
          cache.remove(key);
        }
      }
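
      For illustration only, here is a minimal, self-contained sketch of the same eviction logic with the lock removed. It is not the committed patch. JobListCacheSketch and FileInfo are hypothetical stand-ins for JobListCache and HistoryFileInfo, the String keys replace JobId, and the volatile state field is an assumption about how isMovePending and didMoveFail stay safe to read without a lock.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for HistoryFileManager.JobListCache (not Hadoop code).
class JobListCacheSketch {

  // Hypothetical stand-in for HistoryFileInfo.
  static class FileInfo {
    enum State { IN_INTERMEDIATE, IN_DONE, MOVE_FAILED }

    // Assumption: state is volatile, so readers need no monitor.
    private volatile State state;
    private final long finishTime;
    private volatile boolean deleted;

    FileInfo(State state, long finishTime) {
      this.state = state;
      this.finishTime = finishTime;
    }

    boolean isMovePending() {
      return state == State.IN_INTERMEDIATE || state == State.MOVE_FAILED;
    }

    boolean didMoveFail() {
      return state == State.MOVE_FAILED;
    }

    long getFinishTime() {
      return finishTime;
    }

    // Stands in for deleting the history files.
    void delete() {
      deleted = true;
    }

    boolean isDeleted() {
      return deleted;
    }
  }

  private final ConcurrentSkipListMap<String, FileInfo> cache =
      new ConcurrentSkipListMap<>();
  private final int maxSize;

  JobListCacheSketch(int maxSize) {
    this.maxSize = maxSize;
  }

  FileInfo addIfAbsent(String key, FileInfo info, long cutoff) {
    FileInfo old = cache.putIfAbsent(key, info);
    if (old == null && cache.size() > maxSize) {
      Iterator<String> keys = cache.navigableKeySet().iterator();
      while (cache.size() > maxSize && keys.hasNext()) {
        String oldestKey = keys.next();
        FileInfo firstValue = cache.get(oldestKey);
        if (firstValue == null) {
          continue; // removed concurrently by another thread
        }
        // No synchronized (firstValue) here: the checks below only read
        // volatile state, so a slow entry cannot block this scan.
        if (firstValue.isMovePending()) {
          if (firstValue.didMoveFail() && firstValue.getFinishTime() <= cutoff) {
            cache.remove(oldestKey);
            firstValue.delete();
          }
          // Otherwise the entry is still moving to done; leave it cached.
        } else {
          cache.remove(oldestKey);
        }
      }
    }
    return old == null ? info : old;
  }

  boolean contains(String key) {
    return cache.containsKey(key);
  }

  int size() {
    return cache.size();
  }
}
```

      In this sketch a thread that holds the monitor of one entry (for example while moving it to done) no longer stalls every caller of addIfAbsent. The trade-off is that the pending and failed checks may observe a state that changes a moment later, which the eviction loop tolerates because the entry is re-examined on the next scan.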
      
      
      Note: the stack trace below is from Hadoop 2.4.0; the problem also exists in the latest Hadoop.
      
      "2144820863@qtp-313351300-38156" daemon prio=10 tid=0x0000000001e13800 nid=0xf133 waiting for monitor entry [0x00007f7c1d8dd000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
              - waiting to lock <0x000000040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
              - locked <0x0000000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
              at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
              at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
              at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
      

        Attachments

        1. jstack
          1.12 MB
          Prabhu Joseph
        2. 0001-MAPREDUCE-6797.patch
          2 kB
          Prabhu Joseph
        3. 0002-MAPREDUCE-6797.patch
          3 kB
          Prabhu Joseph
        4. 0003-MAPREDUCE-6797.patch
          3 kB
          Prabhu Joseph
        5. 0004-MAPREDUCE-6797.patch
          3 kB
          Prabhu Joseph

              People

              • Assignee:
                Prabhu Joseph
                Reporter:
                Prabhu Joseph
              • Votes:
                0
                Watchers:
                8
