Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.4, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None

      Description

      Problem:
      HistoryFileManager.addIfAbsent produces a large amount of log output when the
      number of cached entries whose age is less than mapreduce.jobhistory.max-age-ms
      far exceeds mapreduce.jobhistory.joblist.cache.size.

      Example:
      If the cache contains 50000 entries in total and 10000 entries newer than
      mapreduce.jobhistory.max-age-ms, where
      mapreduce.jobhistory.joblist.cache.size is 20000, the
      HistoryFileManager.addIfAbsent method produces 50000 - 20000 = 30000 lines of
      the message "Waiting to remove <key> from JobListCache because it is not in
      done yet".

      Stack traces are attached.

      Impact:
      In addition to consuming a large amount of disk space, this issue blocks
      JobHistory.getJob for a long time and slows job execution down significantly,
      because getJob is called by RPCs such as
      HistoryClientService.HSClientProtocolHandler.getJobReport.
      This happens because HistoryFileManager.UserLogDir.scanIfNeeded
      eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
      multiple threads call scanIfNeeded simultaneously, one of them acquires the lock
      and the other threads are blocked until the first thread completes the
      long-running HistoryFileManager.addIfAbsent call.

      Solution:

      • Reduce the amount of logging so that HistoryFileManager.addIfAbsent doesn't take too long.
      • Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
        scanning if another thread is already scanning. This changes the semantics of
        some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
        because scanIfNeeded may then leave outdated state in place.
      • Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are
        not blocked by a loop over tens of thousands of entries.

      This patch implemented the first item.
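
The second item (skip scanning if another thread is already scanning) can be sketched as follows. This is a minimal illustration only, not what the patch does: the class SkipIfBusyScanner and the method scanIfIdle are hypothetical names, and the scan is passed in as a Runnable instead of being HistoryFileManager code. A caller that finds a scan in progress returns immediately, which is the semantic change noted above, since that caller may then read state that has not been refreshed yet.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard for illustration; not part of HistoryFileManager. */
final class SkipIfBusyScanner {
  // True while some caller is inside the scan.
  private final AtomicBoolean scanning = new AtomicBoolean(false);
  private final AtomicInteger completedScans = new AtomicInteger();

  /**
   * Runs the scan unless one is already in progress.
   *
   * @return true if this call ran the scan, false if it was skipped
   */
  boolean scanIfIdle(Runnable scan) {
    // Only the caller that flips the flag from false to true may scan.
    if (!scanning.compareAndSet(false, true)) {
      return false; // skipped: the caller continues with possibly stale state
    }
    try {
      scan.run();
      completedScans.incrementAndGet();
      return true;
    } finally {
      // Always release the flag, even if the scan throws.
      scanning.set(false);
    }
  }

  int completedScans() {
    return completedScans.get();
  }
}
```

Unlike a synchronized block, the flag never makes a caller wait, so the cost of a long scan is paid by one thread only. The price is that skipped callers get no guarantee that the scan has finished.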

      1. MAPREDUCE-6436.1.patch (4 kB, Ryu Kobayashi)
      2. MAPREDUCE-6436.2.patch (3 kB, Kai Sasaki)
      3. MAPREDUCE-6436.3.patch (3 kB, Kai Sasaki)
      4. MAPREDUCE-6436.4.patch (3 kB, Kai Sasaki)
      5. stacktrace1.txt (59 kB, Ryu Kobayashi)
      6. stacktrace2.txt (58 kB, Ryu Kobayashi)
      7. stacktrace3.txt (60 kB, Ryu Kobayashi)

          Activity

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 16m 18s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | javac | 8m 4s | There were no new javac warning messages.
          +1 | javadoc | 10m 12s | There were no new javadoc warning messages.
          +1 | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings.
          -1 | checkstyle | 0m 30s | The applied patch generated 1 new checkstyle issues (total was 16, now 17).
          -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 | install | 1m 25s | mvn install still works.
          +1 | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse.
          +1 | findbugs | 0m 53s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 | mapreduce tests | 5m 53s | Tests passed in hadoop-mapreduce-client-hs.
          Total runtime: 44m 15s



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12745774/MAPREDUCE-6436.1.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / ee36f4f
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-hs.txt
          whitespace https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/artifact/patchprocess/whitespace.txt
          hadoop-mapreduce-client-hs test log https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt
          Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5894/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

          Thanks for working on this issue, Ryu Kobayashi! It looks like the log is only printed for HistoryFileInfo entries in state IN_INTERMEDIATE or MOVE_FAILED. I think most HistoryFileInfo entries should be in state IN_DONE.

          HistoryFileManager.addIfAbsent method produces 50000 - 20000 = 30000 lines of "Waiting to remove <key> from JobListCache because it is not in done yet" message

          The above statement may not describe a valid case unless you have a performance issue in HDFS that causes HistoryFileInfo#moveToDone to take a very long time.
          The direct cause of your issue may be an HDFS performance issue, but we can still improve the logging to print fewer messages.
          About your patch: changing scanIfNeeded to be non-blocking may not be good, because the following code in HistoryFileManager#getFileInfo expects jobListCache to have the entry for the given job after scanIntermediateDirectory returns, which requires scanIfNeeded to block.

              // OK so scan the intermediate to be sure we did not lose it that way
              scanIntermediateDirectory();
              fileInfo = jobListCache.get(jobId);
              if (fileInfo != null) {
                return fileInfo;
              }
          

          Also, the implementation of scanIfNeeded makes sure that scanIntermediateDirectory(p) is called only once for each change in modification time.

              if (modTime != newModTime) {
                Path p = fs.getPath();
                try {
                  scanIntermediateDirectory(p);
                  // If scanning fails, we will scan again. We assume the failure is
                  // temporary.
                  modTime = newModTime;
                } catch (IOException e) {
                  LOG.error("Error while trying to scan the directory " + p, e);
                }
              } else {
                if (LOG.isDebugEnabled()) {
                  LOG.debug("Scan not needed of " + fs.getPath());
                }
              }
          

          So the performance overhead for scanIfNeeded won't be that much.
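
The guard quoted above can be reduced to a small self-contained sketch. ModTimeGuard and DirectoryScan are hypothetical names used only for illustration, the initial modTime of -1 is an assumption, and the boolean return value is added here so the behavior can be observed. The sketch shows the two properties discussed: a successful scan is not repeated while the modification time stays the same, and a failed scan leaves modTime untouched so the next call scans again.

```java
import java.io.IOException;

/** Hypothetical stand-in for the directory scan; not a Hadoop API. */
interface DirectoryScan {
  void scan() throws IOException;
}

/** Hypothetical reduction of the modTime check; for illustration only. */
final class ModTimeGuard {
  // Assumed initial value meaning "nothing scanned yet".
  private long modTime = -1L;

  /**
   * Scans only when the observed modification time differs from the one
   * recorded after the last successful scan.
   *
   * @return true if a scan ran and succeeded, false otherwise
   */
  synchronized boolean scanIfNeeded(long newModTime, DirectoryScan scan) {
    if (modTime == newModTime) {
      return false; // scan not needed
    }
    try {
      scan.scan();
      // Recorded only after success, so a failure is retried next time.
      modTime = newModTime;
      return true;
    } catch (IOException e) {
      return false; // modTime unchanged: the next call scans again
    }
  }
}
```

With this shape, repeated calls are cheap when nothing changed, but every caller still waits on the lock while one scan is running.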

          We can make a patch to print fewer log messages. The following log is printed for HistoryFileInfo in both the IN_INTERMEDIATE state and the MOVE_FAILED state. Can we add two counters: one for IN_INTERMEDIATE and the other for MOVE_FAILED?
          Also, we can save the first key for a HistoryFileInfo in state IN_INTERMEDIATE and the first key for a HistoryFileInfo in state MOVE_FAILED, and print these two keys in the logs.

                          } else {
                            LOG.warn("Waiting to remove " + key
                                + " from JobListCache because it is not in done yet.");
                          }
          
          lewuathe Kai Sasaki added a comment -

          zhihai xu Hello, Zhihai. I'm sorry for the late response. Ryu Kobayashi asked me to take over this JIRA, so I'll update the current patch soon. Thank you.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | reexec | 0m 0s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | mvninstall | 8m 4s | trunk passed
          +1 | compile | 0m 18s | trunk passed with JDK v1.8.0_66
          +1 | compile | 0m 18s | trunk passed with JDK v1.7.0_85
          +1 | checkstyle | 0m 10s | trunk passed
          +1 | mvnsite | 0m 24s | trunk passed
          +1 | mvneclipse | 0m 14s | trunk passed
          +1 | findbugs | 0m 37s | trunk passed
          +1 | javadoc | 0m 13s | trunk passed with JDK v1.8.0_66
          +1 | javadoc | 0m 17s | trunk passed with JDK v1.7.0_85
          +1 | mvninstall | 0m 22s | the patch passed
          +1 | compile | 0m 17s | the patch passed with JDK v1.8.0_66
          +1 | javac | 0m 17s | the patch passed
          +1 | compile | 0m 19s | the patch passed with JDK v1.7.0_85
          +1 | javac | 0m 19s | the patch passed
          -1 | checkstyle | 0m 10s | Patch generated 1 new checkstyle issues in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs (total was 16, now 17).
          +1 | mvnsite | 0m 25s | the patch passed
          +1 | mvneclipse | 0m 13s | the patch passed
          -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 | findbugs | 0m 45s | the patch passed
          +1 | javadoc | 0m 14s | the patch passed with JDK v1.8.0_66
          +1 | javadoc | 0m 16s | the patch passed with JDK v1.7.0_85
          +1 | unit | 5m 48s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_66.
          +1 | unit | 6m 12s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_85.
          -1 | asflicense | 0m 22s | Patch generated 14 ASF License warnings.
          Total runtime: 27m 12s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12745774/MAPREDUCE-6436.1.patch
          JIRA Issue MAPREDUCE-6436
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d5be684a9740 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3c4a34e
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6199/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt
          whitespace https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6199/artifact/patchprocess/whitespace-eol.txt
          JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6199/testReport/
          asflicense https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6199/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs
          Max memory used 76MB
          Powered by Apache Yetus http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6199/console

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | reexec | 0m 0s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | mvninstall | 8m 44s | trunk passed
          +1 | compile | 0m 19s | trunk passed with JDK v1.8.0_66
          +1 | compile | 0m 20s | trunk passed with JDK v1.7.0_91
          +1 | checkstyle | 0m 10s | trunk passed
          +1 | mvnsite | 0m 28s | trunk passed
          +1 | mvneclipse | 0m 14s | trunk passed
          +1 | findbugs | 0m 39s | trunk passed
          +1 | javadoc | 0m 15s | trunk passed with JDK v1.8.0_66
          +1 | javadoc | 0m 18s | trunk passed with JDK v1.7.0_91
          +1 | mvninstall | 0m 23s | the patch passed
          +1 | compile | 0m 20s | the patch passed with JDK v1.8.0_66
          +1 | javac | 0m 20s | the patch passed
          +1 | compile | 0m 20s | the patch passed with JDK v1.7.0_91
          +1 | javac | 0m 20s | the patch passed
          -1 | checkstyle | 0m 10s | Patch generated 5 new checkstyle issues in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs (total was 16, now 21).
          +1 | mvnsite | 0m 26s | the patch passed
          +1 | mvneclipse | 0m 14s | the patch passed
          +1 | whitespace | 0m 0s | Patch has no whitespace issues.
          +1 | findbugs | 0m 48s | the patch passed
          +1 | javadoc | 0m 16s | the patch passed with JDK v1.8.0_66
          +1 | javadoc | 0m 17s | the patch passed with JDK v1.7.0_91
          +1 | unit | 6m 17s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_66.
          +1 | unit | 6m 19s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_91.
          -1 | asflicense | 0m 24s | Patch generated 14 ASF License warnings.
          Total runtime: 28m 53s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775224/MAPREDUCE-6436.2.patch
          JIRA Issue MAPREDUCE-6436
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 421014cad788 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3c4a34e
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6200/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6200/testReport/
          asflicense https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6200/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs
          Max memory used 76MB
          Powered by Apache Yetus http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6200/console

          This message was automatically generated.

          lewuathe Kai Sasaki added a comment -

          zhihai xu Sorry for bothering you again. Could you review this?

          zxu zhihai xu added a comment -

          Kai Sasaki, thanks for working on this issue. About the patch: we don't need to calculate the count for the entries being removed.
          Can we do all the calculations in the else section?

          if (firstValue.didMoveFail() &&
              firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
            .......
          } else {
            if (firstValue.didMoveFail()) {
              if (moveFailedCount == 0) {
                firstMoveFailedKey = key;
              }
              moveFailedCount += 1;
            } else {
              if (inIntermediateCount == 0) {
                firstInIntermediateKey = key;
              }
              inIntermediateCount += 1;
            }
          }
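
A self-contained sketch of this counting approach is shown below, under stated assumptions: PendingRemovalSummary, record, and messages are hypothetical names, keys are plain strings rather than the real key type, and the wording of the summary message is made up for illustration and is not the text of any committed patch. The point is that one summary line per state replaces one warning per entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical accumulator for illustration; not part of HistoryFileManager. */
final class PendingRemovalSummary {
  private int moveFailedCount = 0;
  private int inIntermediateCount = 0;
  private String firstMoveFailedKey = null;
  private String firstInIntermediateKey = null;

  /** Records one entry that cannot be removed from the cache yet. */
  void record(String key, boolean moveFailed) {
    if (moveFailed) {
      if (moveFailedCount == 0) {
        firstMoveFailedKey = key; // remember only the first key as an example
      }
      moveFailedCount += 1;
    } else {
      if (inIntermediateCount == 0) {
        firstInIntermediateKey = key;
      }
      inIntermediateCount += 1;
    }
  }

  /** Builds at most one summary line per state, instead of one per entry. */
  List<String> messages() {
    List<String> out = new ArrayList<>();
    if (inIntermediateCount > 0) {
      out.add(format("IN_INTERMEDIATE", firstInIntermediateKey, inIntermediateCount));
    }
    if (moveFailedCount > 0) {
      out.add(format("MOVE_FAILED", firstMoveFailedKey, moveFailedCount));
    }
    return out;
  }

  // Message wording is an assumption made for this sketch.
  private static String format(String state, String firstKey, int count) {
    return "Waiting to remove " + count + " " + state + " entries (first key: "
        + firstKey + ") from JobListCache because they are not in done yet.";
  }
}
```

In the scenario from the description, 30000 warning lines would shrink to at most two, and the loop inside the synchronized block does only counter updates.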
          
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote | Subsystem | Runtime | Comment
          0 | reexec | 0m 0s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | mvninstall | 8m 20s | trunk passed
          +1 | compile | 0m 18s | trunk passed with JDK v1.8.0_66
          +1 | compile | 0m 19s | trunk passed with JDK v1.7.0_91
          +1 | checkstyle | 0m 11s | trunk passed
          +1 | mvnsite | 0m 25s | trunk passed
          +1 | mvneclipse | 0m 14s | trunk passed
          +1 | findbugs | 0m 39s | trunk passed
          +1 | javadoc | 0m 15s | trunk passed with JDK v1.8.0_66
          +1 | javadoc | 0m 17s | trunk passed with JDK v1.7.0_91
          +1 | mvninstall | 0m 22s | the patch passed
          +1 | compile | 0m 17s | the patch passed with JDK v1.8.0_66
          +1 | javac | 0m 17s | the patch passed
          +1 | compile | 0m 19s | the patch passed with JDK v1.7.0_91
          +1 | javac | 0m 19s | the patch passed
          -1 | checkstyle | 0m 11s | Patch generated 5 new checkstyle issues in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs (total was 16, now 21).
          +1 | mvnsite | 0m 25s | the patch passed
          +1 | mvneclipse | 0m 14s | the patch passed
          +1 | whitespace | 0m 0s | Patch has no whitespace issues.
          +1 | findbugs | 0m 47s | the patch passed
          +1 | javadoc | 0m 15s | the patch passed with JDK v1.8.0_66
          +1 | javadoc | 0m 17s | the patch passed with JDK v1.7.0_91
          +1 | unit | 5m 55s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.8.0_66.
          +1 | unit | 6m 12s | hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_91.
          -1 | asflicense | 0m 23s | Patch generated 14 ASF License warnings.
          Total runtime: 27m 42s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12777379/MAPREDUCE-6436.3.patch
          JIRA Issue MAPREDUCE-6436
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 0098dd90cfb6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 7fb212e
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6207/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6207/testReport/
          asflicense https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6207/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs
          Max memory used 75MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6207/console

          This message was automatically generated.

          zxu zhihai xu added a comment -

          Thanks for updating the patch, Kai Sasaki! The new patch looks good except for the checkstyle issues.

          ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:265: Line is longer than 80 characters (found 97).
          ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:267: Line is longer than 80 characters (found 102).
          ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:268: Line is longer than 80 characters (found 118).
          ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:271: Line is longer than 80 characters (found 94).
          ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:272: Line is longer than 80 characters (found 114).
          

          Could you fix the above checkstyle issues?

          djp Junping Du added a comment -

          I think the impact of this issue could be more severe than our description above: "In addition to large disk consumption, this issue blocks JobHistory.getJob() long time and slows job execution down significantly because getJob is called by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport. This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When multiple threads call scanIfNeeded simultaneously, one of them acquires lock and the other threads are blocked until the first thread completes long-running HistoryFileManager.addIfAbsent call. "
          It could cause a serious OOM in the JHS, because a REST call of getJobs() could get blocked by some getJob() while unexpectedly caching other completedJob() results from previous calls. Isn't it? zhihai xu and Kai Sasaki, maybe we should set this JIRA as a blocker for 2.6.4 and 2.7.3?
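
The blocking behavior quoted above can be illustrated with a minimal sketch. The class `SynchronizedScanSketch` and its methods are hypothetical and are not taken from HistoryFileManager; the sketch only assumes that a slow per-entry loop runs inside a synchronized method, so concurrent callers wait for each other and each one then repeats the loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a scanner whose slow loop runs under one lock. */
class SynchronizedScanSketch {
  private final AtomicInteger active = new AtomicInteger();
  private final AtomicInteger maxActive = new AtomicInteger();
  private final AtomicInteger completedScans = new AtomicInteger();
  private long entriesVisited = 0; // guarded by this object's monitor

  /** Callers queue on the monitor; each one then repeats the slow loop. */
  synchronized void scanIfNeeded(int entries) {
    int now = active.incrementAndGet();
    maxActive.accumulateAndGet(now, Math::max);
    for (int i = 0; i < entries; i++) {
      entriesVisited++; // placeholder for per-entry work such as writing a log line
    }
    active.decrementAndGet();
    completedScans.incrementAndGet();
  }

  int maxConcurrentScans() { return maxActive.get(); }

  int completedScans() { return completedScans.get(); }

  synchronized long entriesVisited() { return entriesVisited; }

  /** Starts the given number of threads, each calling scanIfNeeded once. */
  static SynchronizedScanSketch runWithThreads(int threads, int entries) {
    SynchronizedScanSketch scanner = new SynchronizedScanSketch();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> scanner.scanIfNeeded(entries));
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for scans", e);
      }
    }
    return scanner;
  }

  public static void main(String[] args) {
    SynchronizedScanSketch s = runWithThreads(4, 30000);
    System.out.println("max concurrent scans: " + s.maxConcurrentScans());
    System.out.println("completed scans: " + s.completedScans());
    System.out.println("entries visited: " + s.entriesVisited());
  }
}
```

In this sketch the scans never overlap, so the last caller waits for all earlier loops before running its own. The longer the per-entry work takes, the longer every queued caller is delayed.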

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 7m 28s trunk passed
          +1 compile 0m 15s trunk passed with JDK v1.8.0_66
          +1 compile 0m 18s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 9s trunk passed
          +1 mvnsite 0m 24s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 0m 34s trunk passed
          +1 javadoc 0m 13s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 16s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 21s the patch passed
          +1 compile 0m 15s the patch passed with JDK v1.8.0_66
          +1 javac 0m 15s the patch passed
          +1 compile 0m 18s the patch passed with JDK v1.7.0_91
          +1 javac 0m 18s the patch passed
          +1 checkstyle 0m 8s the patch passed
          +1 mvnsite 0m 24s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 0m 42s the patch passed
          +1 javadoc 0m 12s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 16s the patch passed with JDK v1.7.0_91
          -1 unit 5m 32s hadoop-mapreduce-client-hs in the patch failed with JDK v1.8.0_66.
          +1 unit 5m 58s hadoop-mapreduce-client-hs in the patch passed with JDK v1.7.0_91.
          -1 asflicense 0m 20s Patch generated 14 ASF License warnings.
          25m 38s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.mapreduce.v2.hs.TestHistoryFileManager



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12777649/MAPREDUCE-6436.4.patch
          JIRA Issue MAPREDUCE-6436
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ca50a68ad968 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d8a4542
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6208/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-jdk1.8.0_66.txt
          unit test logs https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6208/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-jdk1.8.0_66.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6208/testReport/
          asflicense https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6208/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs
          Max memory used 75MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6208/console

          This message was automatically generated.

          lewuathe Kai Sasaki added a comment -

          Junping Du Yes, as described above, the slowness of scanIfNeeded slows down HistoryClientService.HSClientProtocolHandler.getJobReport, which is called from the job client. In some cases, this causes a performance issue for the job.
          But the result is usually returned from the JobListCache retained by HistoryFileManager, and in that case scanIntermediateDirectory is not required. So we cannot say that the performance issue occurs immediately whenever there are a lot of failed and pending job logs in the intermediate directory.
          I'm not sure whether we should set the JIRA as a blocker, though...

          zxu zhihai xu added a comment -

          Thanks Junping Du and Kai Sasaki! I changed it to a blocker, because it may let more people notice this potential performance issue.
          +1 for the latest patch. Will commit it shortly.

          zxu zhihai xu added a comment -

          Committed it to trunk, branch-2, branch-2.6 and branch-2.7! Thanks Kai Sasaki for the contributions! Thanks Junping Du for the additional review!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8968 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8968/)
          MAPREDUCE-6436. JobHistory cache issue. Contributed by Kai Sasaki (zxu: rev 5b7078d06921893200163a3d29c8901c3c0107cb)

          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #694 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/694/)
          MAPREDUCE-6436. JobHistory cache issue. Contributed by Kai Sasaki (zxu: rev 5b7078d06921893200163a3d29c8901c3c0107cb)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java
          aw Allen Wittenauer added a comment -

          Committed it to trunk, branch-2, branch-2.6 and branch-2.7!

          You missed branch-2.8...

          zxu zhihai xu added a comment -

          Thanks for catching that, Allen Wittenauer. I just found out that we branched out 2.8. Will commit it to branch-2.8 shortly.

          lewuathe Kai Sasaki added a comment -

          zhihai xu Thank you so much!

          Do we need to create another JIRA as a follow-up, to optimize the JobHistoryServer RPC by implementing the item below?

          Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are
          not blocked by a loop at scale of tens of thousands.
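
A related option from the issue description, skipping the scan when another thread is already scanning, can be sketched as follows. The class `SkipIfScanningSketch` and the method `scanIfIdle` are hypothetical names used only for illustration. The sketch assumes callers can tolerate reading state that a concurrent scan has not finished refreshing, which is the semantic change the description warns about.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical gate that lets one scan run and lets other callers skip. */
class SkipIfScanningSketch {
  private final AtomicBoolean scanning = new AtomicBoolean(false);

  /**
   * Runs the scan unless one is already in progress.
   *
   * @return true if this call ran the scan, false if it skipped
   */
  boolean scanIfIdle(Runnable scan) {
    if (!scanning.compareAndSet(false, true)) {
      return false; // a scan is in progress; the caller continues with possibly stale state
    }
    try {
      scan.run();
      return true;
    } finally {
      scanning.set(false); // always reopen the gate, even if the scan throws
    }
  }

  public static void main(String[] args) {
    SkipIfScanningSketch gate = new SkipIfScanningSketch();
    boolean[] nestedRan = new boolean[1];
    // The nested call arrives while the outer scan is still running.
    boolean outerRan = gate.scanIfIdle(() -> {
      nestedRan[0] = gate.scanIfIdle(() -> { });
    });
    System.out.println("outer ran: " + outerRan + ", nested ran: " + nestedRan[0]);
  }
}
```

The nested call in `main` stands in for a second caller arriving mid-scan. A caller that skips does not wait, but it also gets no guarantee that the entry it is looking for has been picked up yet.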

          zxu zhihai xu added a comment -

          Thanks Kai Sasaki for the suggestion! There is a task, MoveIntermediateToDoneRunnable, which calls scanIntermediateDirectory periodically, so most of the time the job will be found in the cache jobListCache. Also, making scanIfNeeded asynchronous may change the behavior of RPC calls: a call might fail to find job information that could be found before. I am thinking of another way to improve performance by reducing the number of calls to scanIntermediateDirectory:
          In getFileInfo, add a scanOldDirsForJob call before scanIntermediateDirectory, which means calling scanOldDirsForJob twice:
          once before scanIntermediateDirectory and once after it.

            public HistoryFileInfo getFileInfo(JobId jobId) throws IOException {
              // FileInfo available in cache.
              HistoryFileInfo fileInfo = jobListCache.get(jobId);
              if (fileInfo != null) {
                return fileInfo;
              }
              // call scanOldDirsForJob before scanIntermediateDirectory
              fileInfo = scanOldDirsForJob(jobId);
              if (fileInfo != null) {
                return fileInfo;
              }
          
              // OK so scan the intermediate to be sure we did not lose it that way
              scanIntermediateDirectory();
              fileInfo = jobListCache.get(jobId);
              if (fileInfo != null) {
                return fileInfo;
              }
          
              // Intermediate directory does not contain job. Search through older ones.
              fileInfo = scanOldDirsForJob(jobId);
              if (fileInfo != null) {
                return fileInfo;
              }
              return null;
            }
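
The effect of the proposed lookup order can be checked with a small in-memory model. Everything in `LookupOrderSketch` is a hypothetical simplification: plain maps stand in for the cache, the done directories and the intermediate directory, and the `checkDoneFirst` flag is an assumption added to compare the two orders. It is not HistoryFileManager code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the two getFileInfo lookup orders. */
class LookupOrderSketch {
  final Map<String, String> jobListCache = new HashMap<>();
  final Map<String, String> doneDirs = new HashMap<>();
  final Map<String, String> intermediateDir = new HashMap<>();
  int intermediateScans = 0;

  /** Stand-in for the expensive scan: loads every intermediate entry into the cache. */
  void scanIntermediateDirectory() {
    intermediateScans++;
    jobListCache.putAll(intermediateDir);
  }

  /** Stand-in for the lookup in the done directories. */
  String scanOldDirsForJob(String jobId) {
    return doneDirs.get(jobId);
  }

  /** checkDoneFirst = true models the proposed order, false the existing one. */
  String getFileInfo(String jobId, boolean checkDoneFirst) {
    String info = jobListCache.get(jobId);
    if (info != null) {
      return info;
    }
    if (checkDoneFirst) {
      info = scanOldDirsForJob(jobId);
      if (info != null) {
        return info; // found without touching the intermediate directory
      }
    }
    scanIntermediateDirectory();
    info = jobListCache.get(jobId);
    if (info != null) {
      return info;
    }
    return scanOldDirsForJob(jobId);
  }

  public static void main(String[] args) {
    LookupOrderSketch existing = new LookupOrderSketch();
    existing.doneDirs.put("job_1", "done/job_1");
    existing.getFileInfo("job_1", false);

    LookupOrderSketch proposed = new LookupOrderSketch();
    proposed.doneDirs.put("job_1", "done/job_1");
    proposed.getFileInfo("job_1", true);

    System.out.println("existing order scans: " + existing.intermediateScans);
    System.out.println("proposed order scans: " + proposed.intermediateScans);
  }
}
```

In this model, a job that is already in the done directories but not in the cache avoids the intermediate scan under the proposed order. The cost is one extra done-directory lookup for jobs that are still in the intermediate directory or cannot be found at all.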
          
          lewuathe Kai Sasaki added a comment -

          zhihai xu Thanks for clarifying. I created another JIRA for reducing unnecessary calls to scanIntermediateDirectory: MAPREDUCE-6573
          Please discuss it on that JIRA. If it is not necessary, it can be voided. Thanks anyway.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8973 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8973/)
          Update CHANGES.txt to move MAPREDUCE-6436 from YARN to MAPREDUCE (zxu: rev 7092d47fc0b3b792dd31f967c01d460dc089f60b)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-mapreduce-project/CHANGES.txt
          zxu zhihai xu added a comment -

          Just committed it to branch-2.8!

          hudson Hudson added a comment -

          ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #698 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/698/)
          Update CHANGES.txt to move MAPREDUCE-6436 from YARN to MAPREDUCE (zxu: rev 7092d47fc0b3b792dd31f967c01d460dc089f60b)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-mapreduce-project/CHANGES.txt
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.


            People

            • Assignee:
              lewuathe Kai Sasaki
              Reporter:
              ryu_kobayashi Ryu Kobayashi
            • Votes:
              0
              Watchers:
              9
