Hadoop YARN / YARN-4414

Nodemanager connection errors are retried at multiple levels

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      This is related to YARN-3238. We ran into more scenarios where connection errors, such as NoRouteToHostException, are retried at multiple levels. The fix for YARN-3238 was too specific, and I think we need a more general solution that catches the wider array of connection errors that can occur, so that they are not retried at both the RPC layer and the NM proxy layer.
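
To make the cost of retrying at two levels concrete, here is a toy model in plain Java. It is not Hadoop code: the class, the helper names, and the retry counts (3 at the proxy layer, 10 at the RPC layer) are assumptions chosen for illustration. It only counts how many connection attempts are made when an inner retry loop is wrapped by an outer one.

```java
import java.net.NoRouteToHostException;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of two stacked retry layers; hypothetical, not Hadoop code. */
final class NestedRetryDemo {

  /** One remote call that may fail with a connection error. */
  interface Call {
    void run() throws Exception;
  }

  /** Runs the call once, then retries up to maxRetries more times. */
  static void withRetries(int maxRetries, Call call) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        call.run();
        return;
      } catch (Exception e) {
        if (attempt >= maxRetries) {
          throw e; // this layer gives up and reports the failure upward
        }
      }
    }
  }

  /** Counts connection attempts against a host that never answers. */
  static int countAttempts(int proxyRetries, int rpcRetries) {
    AtomicInteger attempts = new AtomicInteger();
    try {
      // Outer loop models the proxy layer, inner loop models the RPC layer.
      withRetries(proxyRetries, () ->
          withRetries(rpcRetries, () -> {
            attempts.incrementAndGet();
            throw new NoRouteToHostException("simulated");
          }));
    } catch (Exception expected) {
      // Both layers gave up, which is the case being measured.
    }
    return attempts.get();
  }

  public static void main(String[] args) {
    System.out.println(countAttempts(3, 10)); // prints 44: both layers retry
    System.out.println(countAttempts(3, 0));  // prints 4: inner retries disabled
  }
}
```

The attempts multiply rather than add, which is why the same error should be retried at only one of the two layers.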

      1. YARN-4414.1.patch
        6 kB
        Chang Li
      2. YARN-4414.1.2.patch
        6 kB
        Chang Li
      3. YARN-4414.1.2.patch
        6 kB
        Chang Li
      4. YARN-4414.1.3.patch
        6 kB
        Chang Li
      5. YARN-4414.2.patch
        6 kB
        Chang Li
      6. YARN-4414.3.patch
        6 kB
        Chang Li

        Issue Links

          Activity

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9094 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9094/)
          YARN-4414. Nodemanager connection errors are retried at multiple levels. (jlowe: rev 13de8359a1c6d9fc78cd5013c860c1086d86176f)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java
          jlowe Jason Lowe added a comment -

          Thanks, Chang! I committed this to trunk, branch-2, branch-2.8, branch-2.7, and branch-2.6.

          jlowe Jason Lowe added a comment -

          +1 committing this.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 50s trunk passed
          +1 compile 2m 13s trunk passed with JDK v1.8.0_66
          +1 compile 2m 20s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 31s trunk passed
          +1 mvnsite 1m 7s trunk passed
          +1 mvneclipse 0m 27s trunk passed
          +1 findbugs 2m 41s trunk passed
          +1 javadoc 0m 59s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 3s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 57s the patch passed
          +1 compile 2m 31s the patch passed with JDK v1.8.0_66
          +1 javac 2m 31s the patch passed
          +1 compile 2m 29s the patch passed with JDK v1.7.0_91
          +1 javac 2m 29s the patch passed
          +1 checkstyle 0m 31s the patch passed
          +1 mvnsite 1m 6s the patch passed
          +1 mvneclipse 0m 25s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 49s the patch passed
          +1 javadoc 0m 54s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 0s the patch passed with JDK v1.7.0_91
          +1 unit 2m 30s hadoop-yarn-common in the patch passed with JDK v1.8.0_66.
          +1 unit 9m 16s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
          +1 unit 2m 20s hadoop-yarn-common in the patch passed with JDK v1.7.0_91.
          +1 unit 9m 30s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 26s Patch does not generate ASF License warnings.
          58m 25s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12781685/YARN-4414.3.patch
          JIRA Issue YARN-4414
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c0c93733bbb6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / de37f37
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10234/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10234/console

          This message was automatically generated.

          lichangleo Chang Li added a comment -

          Thanks Jason Lowe for spotting that! Updated to .3, which removes the commented-out line.

          jlowe Jason Lowe added a comment -

          Some commented code that was left in the new unit test:

          +    //shouldThrowNMNotYetReadyException = true;
          

          Otherwise patch looks good to me.

          lichangleo Chang Li added a comment -

          Jason Lowe, could you help review the latest patch? Thx

          lichangleo Chang Li added a comment -

          Hi Xianyin Xin, RM HA already disables IPC retries. Also, the client should try really hard to connect to the RM because failing to do so is catastrophic, whereas failing to connect to an NM is not. I think we should change only NMProxy in this JIRA.

          xinxianyin Xianyin Xin added a comment -

          Hi Chang Li, do we also need to revisit the two layers of retries in RMProxy? IIUC, the proxy layer retries for up to 15 min with a retry interval of 30 sec, and behind the scenes the RM proxy calculates a maximum retry count from those two values. Each IPC-layer retry takes more than 1 sec and, by default, there are 10 of them, so the actual total wait time is 15 min + (15 / 0.5) * 10 * (more than 1 sec), which is much more than 15 min.
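
The estimate in this comment can be worked through as a small calculation. The sketch below takes the figures at face value (15 min maximum wait, 30 sec interval, 10 IPC retries, and exactly 1 sec per IPC retry as a lower bound); the class and method names are hypothetical, and whether RMProxy actually behaves this way is the question being asked, not something the code establishes.

```java
import java.time.Duration;

/** Lower-bound arithmetic for the two-layer wait time; names are hypothetical. */
final class RetryWaitEstimate {

  /**
   * maxWait / interval gives the number of proxy-level attempts; each of those
   * is assumed to trigger ipcRetries IPC-level retries of perIpcRetry each.
   */
  static Duration lowerBound(Duration maxWait, Duration interval,
                             int ipcRetries, Duration perIpcRetry) {
    long proxyAttempts = maxWait.toMillis() / interval.toMillis();
    Duration ipcOverhead = perIpcRetry.multipliedBy(proxyAttempts * ipcRetries);
    return maxWait.plus(ipcOverhead);
  }

  public static void main(String[] args) {
    Duration total = lowerBound(Duration.ofMinutes(15), Duration.ofSeconds(30),
                                10, Duration.ofSeconds(1));
    // 15 min / 30 sec = 30 proxy attempts; 30 * 10 * 1 sec = 5 min of IPC retries.
    System.out.println(total.toMinutes()); // prints 20
  }
}
```

Under these assumptions the wait is at least 20 min rather than 15 min, and it grows further if each IPC retry takes longer than 1 sec.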

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 17s trunk passed
          +1 compile 2m 8s trunk passed with JDK v1.8.0_66
          +1 compile 2m 17s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 1m 3s trunk passed
          +1 mvneclipse 0m 27s trunk passed
          +1 findbugs 2m 24s trunk passed
          +1 javadoc 0m 57s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 58s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 55s the patch passed
          +1 compile 2m 6s the patch passed with JDK v1.8.0_66
          +1 javac 2m 6s the patch passed
          +1 compile 2m 20s the patch passed with JDK v1.7.0_91
          +1 javac 2m 20s the patch passed
          +1 checkstyle 0m 30s the patch passed
          +1 mvnsite 0m 59s the patch passed
          +1 mvneclipse 0m 22s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 35s the patch passed
          +1 javadoc 0m 48s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 54s the patch passed with JDK v1.7.0_91
          +1 unit 2m 0s hadoop-yarn-common in the patch passed with JDK v1.8.0_66.
          +1 unit 8m 46s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
          +1 unit 2m 14s hadoop-yarn-common in the patch passed with JDK v1.7.0_91.
          +1 unit 9m 10s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          54m 23s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12781032/YARN-4414.2.patch
          JIRA Issue YARN-4414
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e3615dc81c84 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 52b7757
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10193/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10193/console

          This message was automatically generated.

          lichangleo Chang Li added a comment -

          Thanks Jason Lowe for the review!
          Updated the .2 patch to remove getNMProxy2 and implement getProxy() in terms of getProxy(Configuration).
          I set the NM address to a dummy port, 1234, so that it triggers a connection error and RPC-level retries.
          BaseContainerManagerTest sets it to

          "0.0.0.0:" + ServerSocketUtil.getPort(49162, 10);

          which is a normal address, so the RPC retry could not be triggered.
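
As a rough illustration of why pointing at a port with no listener exercises the connection-error path, the following sketch uses only the JDK socket API. It is not the test from the patch: the class and method names are hypothetical, it uses the loopback address rather than the test's configuration, and it assumes nothing else grabs the probed port between the two calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Shows that connecting to a port without a listener fails fast; hypothetical names. */
final class UnreachablePortDemo {

  /** Finds a port that was free a moment ago by binding to it and releasing it. */
  static int findUnusedPort() {
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns true if connecting to the loopback port fails with ConnectException. */
  static boolean connectionRefused(int port) {
    try (Socket socket = new Socket()) {
      socket.connect(
          new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 2000);
      return false; // something is listening after all
    } catch (ConnectException e) {
      return true; // the kind of error a retry layer would then react to
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(connectionRefused(findUnusedPort()));
  }
}
```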

          jlowe Jason Lowe added a comment -

          Thanks for the patch, Chang! I'm a bit curious about the naming convention of the patches. Why .1.2 and .1.3 instead of just .2 and .3? In the future, I'd recommend using the patch naming conventions as described in http://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch to be consistent with other contributors and help reduce confusion.

          As for the patch, the main change looks OK to me, but I have some nits with the test:

          • Why are we explicitly setting the NM port to 1234? Shouldn't we inherit the same NM port setting from the base conf as the other connection retry tests already do?
          • getNMProxy2 should just be getNMProxy, overloaded for the Configuration parameter.
          • Rather than copying the entire method, getProxy() should be implemented in terms of getProxy(Configuration).
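
The last two review points describe a common overloading pattern, which might look roughly like the sketch below. It does not reproduce TestNMProxy: the class, the String standing in for a proxy object, the Map standing in for Hadoop's Configuration, and the `example.nm.address` key are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Overload-by-delegation sketch; every name here is hypothetical. */
final class ProxyFactorySketch {
  private final Map<String, String> baseConf;

  ProxyFactorySketch(Map<String, String> baseConf) {
    this.baseConf = new HashMap<>(baseConf);
  }

  /** No-argument overload: delegates with the base configuration. */
  String getProxy() {
    return getProxy(baseConf);
  }

  /** All construction logic lives here once, so the overloads cannot drift apart. */
  String getProxy(Map<String, String> conf) {
    return "proxy->" + conf.getOrDefault("example.nm.address", "unset");
  }

  public static void main(String[] args) {
    Map<String, String> base = new HashMap<>();
    base.put("example.nm.address", "host-a:1");
    ProxyFactorySketch factory = new ProxyFactorySketch(base);

    Map<String, String> custom = new HashMap<>();
    custom.put("example.nm.address", "host-b:2");

    System.out.println(factory.getProxy());       // prints proxy->host-a:1
    System.out.println(factory.getProxy(custom)); // prints proxy->host-b:2
  }
}
```

A test that needs a different configuration calls the parameterized overload, while existing callers keep using the no-argument one unchanged.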
          lichangleo Chang Li added a comment -

          Oops, my bad. I intended to name the latest patch .1.3.
          I removed the .2.2 patch and re-uploaded the latest as .1.3.

          lichangleo Chang Li added a comment -

          .2.2 fixes the whitespace issue.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 56s trunk passed
          +1 compile 1m 47s trunk passed with JDK v1.8.0_66
          +1 compile 2m 4s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 0m 59s trunk passed
          +1 mvneclipse 0m 27s trunk passed
          +1 findbugs 2m 11s trunk passed
          +1 javadoc 0m 46s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 54s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 55s the patch passed
          +1 compile 1m 43s the patch passed with JDK v1.8.0_66
          +1 javac 1m 43s the patch passed
          +1 compile 2m 4s the patch passed with JDK v1.7.0_91
          +1 javac 2m 4s the patch passed
          +1 checkstyle 0m 27s the patch passed
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 26s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 findbugs 2m 25s the patch passed
          +1 javadoc 0m 44s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 53s the patch passed with JDK v1.7.0_91
          +1 unit 1m 52s hadoop-yarn-common in the patch passed with JDK v1.8.0_66.
          +1 unit 8m 35s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
          +1 unit 2m 7s hadoop-yarn-common in the patch passed with JDK v1.7.0_91.
          +1 unit 9m 2s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 23s Patch does not generate ASF License warnings.
          51m 26s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12777496/YARN-4414.1.2.patch
          JIRA Issue YARN-4414
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8103ad6e33cd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 7fb212e
          findbugs v3.0.0
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/9966/artifact/patchprocess/whitespace-eol.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/9966/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Max memory used 76MB
          Powered by Apache Yetus 0.1.0 http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/9966/console

          This message was automatically generated.

          lichangleo Chang Li added a comment -

          Jason Lowe, the .1 patch disables RPC-level retries to the NM server and adds the retry for ConnectTimeoutException back to the NM proxy layer.
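
One way to picture "retry these exceptions at the proxy layer" is a lookup by exception class, sketched below in plain Java. This is not the patch's retry policy: the class name is hypothetical, `SocketTimeoutException` stands in for Hadoop's ConnectTimeoutException (which is unavailable without the Hadoop libraries), and apart from NoRouteToHostException and the timeout case, the listed exception types are assumptions.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Decides at one layer whether a failure is worth retrying; hypothetical sketch. */
final class ProxyRetryClassifier {

  // Connection-level failures assumed to be retriable for this illustration.
  private static final Class<?>[] RETRIABLE = {
      ConnectException.class,
      NoRouteToHostException.class,
      UnknownHostException.class,
      SocketTimeoutException.class // stand-in for ConnectTimeoutException
  };

  /** Returns true if the exception is an instance of any retriable type. */
  static boolean shouldRetry(Exception e) {
    for (Class<?> type : RETRIABLE) {
      if (type.isInstance(e)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(shouldRetry(new NoRouteToHostException("no route"))); // prints true
    System.out.println(shouldRetry(new IllegalStateException("bug")));       // prints false
  }
}
```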

          jlowe Jason Lowe added a comment -

          I noticed that the HA proxies for the namenode and resourcemanager explicitly disable connection retries in the RPC layer by default, since the HA proxy will do the retries. I think the same should apply to nodemanager proxies, since we're seeing even connection timeouts retried too often in the RPC layer, given that a container allocation is worthless after 10 minutes by default. By disabling retries in the RPC layer, we can add ConnectTimeoutException back to the list of exceptions retried at the NM proxy layer and simply retry all appropriate exceptions at the NM proxy layer.
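
The proposal amounts to handing the RPC layer a copy of the configuration in which its own connection retries are switched off. A minimal sketch of that idea follows, using `java.util.Properties` as a stand-in for Hadoop's Configuration. The two property names are assumed here to be the IPC client retry settings; the comment does not name them, so check them against your Hadoop version. The class and method names are hypothetical.

```java
import java.util.Properties;

/** Copies a configuration and zeroes the RPC-layer retry settings; hypothetical sketch. */
final class NoIpcRetryConf {

  // Assumed property names for the IPC client's connection retries.
  static final String MAX_RETRIES = "ipc.client.connect.max.retries";
  static final String MAX_RETRIES_ON_TIMEOUTS =
      "ipc.client.connect.max.retries.on.timeouts";

  /** Returns a copy with RPC-layer retries disabled; the original is left untouched. */
  static Properties withoutIpcRetries(Properties original) {
    Properties copy = new Properties();
    copy.putAll(original);
    copy.setProperty(MAX_RETRIES, "0");
    copy.setProperty(MAX_RETRIES_ON_TIMEOUTS, "0");
    return copy;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(MAX_RETRIES, "10");
    Properties forNmProxy = withoutIpcRetries(conf);
    System.out.println(forNmProxy.getProperty(MAX_RETRIES)); // prints 0
    System.out.println(conf.getProperty(MAX_RETRIES));       // prints 10
  }
}
```

Working on a copy matters because the same configuration object is typically shared with other clients that should keep their own retry behaviour.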


            People

            • Assignee:
              lichangleo Chang Li
              Reporter:
              jlowe Jason Lowe
            • Votes:
              0
              Watchers:
              12
