  Hadoop YARN / YARN-3646

Applications are getting stuck sometimes in case of retry policy forever

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.1, 3.0.0-alpha1
    • Component/s: client
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We have set yarn.resourcemanager.connect.wait-ms to -1 to use the FOREVER retry policy.

      The YARN client retries infinitely on exceptions from the RM because it uses the FOREVER retry policy. The problem is that it retries on all kinds of exceptions (such as ApplicationNotFoundException), even those that are not connection failures. As a result, my application makes no further progress.

      The YARN client should not retry infinitely on non-connection failures.

      We have written a simple YARN client that requests the application report for an invalid or old application ID. The ResourceManager throws ApplicationNotFoundException because the ID is invalid or old, but because of the FOREVER retry policy the client keeps retrying getApplicationReport() and the ResourceManager keeps throwing ApplicationNotFoundException.

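      import org.apache.hadoop.yarn.api.records.ApplicationId;
      import org.apache.hadoop.yarn.api.records.ApplicationReport;
      import org.apache.hadoop.yarn.client.api.YarnClient;
      import org.apache.hadoop.yarn.conf.YarnConfiguration;

      private void testYarnClientRetryPolicy() throws Exception {
          YarnConfiguration conf = new YarnConfiguration();
          // -1 selects the FOREVER retry policy
          conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
          YarnClient yarnClient = YarnClient.createYarnClient();
          yarnClient.init(conf);
          yarnClient.start();
          // An application ID the RM does not know (invalid or old)
          ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
          // Hangs: each ApplicationNotFoundException from the RM is retried forever
          ApplicationReport report = yarnClient.getApplicationReport(appId);
      }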
      

      RM logs:

      15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0
      org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM.
      	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
      	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
      	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
      	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
      	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
      
      ....
      
      15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0
      ....
      
      
      Attachments
      1. YARN-3646.patch (6 kB, Raju Bairishetti)
      2. YARN-3646.002.patch (3 kB, Raju Bairishetti)
      3. YARN-3646.001.patch (8 kB, Raju Bairishetti)

          Activity
          rohithsharma Rohith Sharma K S added a comment -

          Which version of Hadoop are you using? I don't see this problem in trunk or branch-2.

          raju.bairishetti Raju Bairishetti added a comment -

          Thanks for the quick response.

          I have reproduced it with the Apache 2.6.0 release (HDP 2.2.4 distribution). We are using version 2.5.0.

          There is no exceptionToPolicyMap for the FOREVER retry policy; the exceptionToPolicyMap is populated only for the other retry policies.

          RetryPolicies.java

          static class RetryForever implements RetryPolicy {
            @Override
            public RetryAction shouldRetry(Exception e, int retries, int failovers,
                boolean isIdempotentOrAtMostOnce) throws Exception {
              // Never inspects e: every exception, connection-related or not, is retried
              return RetryAction.RETRY;
            }
          }
          

          RMProxy.java

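          if (waitForEver) {
            // Returns before the exceptionToPolicyMap below is built,
            // so no exception-specific policy applies in this mode
            return RetryPolicies.RETRY_FOREVER;
          }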
          
          ...
          
              Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicyMap =
                  new HashMap<Class<? extends Exception>, RetryPolicy>();
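A minimal, dependency-free model of the control flow described above may make the failure mode easier to see. It is a simplified sketch, not Hadoop's code: Decision, Policy, byException, createPolicyBeforeFix and AppNotFound are hypothetical stand-ins for RetryAction, RetryPolicy, RetryPolicies.retryByException, RMProxy's policy creation and ApplicationNotFoundException. It assumes that exceptions missing from the map fall back to a try-once-then-fail default, a part the excerpt elides.

```java
import java.io.EOFException;
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;

// Simplified model of the pre-fix behavior; all names here are hypothetical stand-ins
class PreFixRetryModel {
  enum Decision { RETRY, FAIL }

  // Stand-in for Hadoop's RetryPolicy (failover arguments omitted)
  interface Policy {
    Decision shouldRetry(Exception e, int retries);
  }

  // Stand-in for ApplicationNotFoundException
  static class AppNotFound extends Exception {}

  // Like RetryForever.shouldRetry in the excerpt: ignores the exception type
  static final Policy RETRY_FOREVER = (e, retries) -> Decision.RETRY;
  static final Policy TRY_ONCE_THEN_FAIL = (e, retries) -> Decision.FAIL;

  // Bounded policy: retry while fewer than maxRetries retries have been made
  static Policy upToMaxRetries(int maxRetries) {
    return (e, retries) -> retries < maxRetries ? Decision.RETRY : Decision.FAIL;
  }

  // Picks a policy by the exception's exact class, falling back to a default
  static Policy byException(Policy defaultPolicy,
                            Map<Class<? extends Exception>, Policy> map) {
    return (e, retries) ->
        map.getOrDefault(e.getClass(), defaultPolicy).shouldRetry(e, retries);
  }

  static Policy createPolicyBeforeFix(boolean waitForEver, int maxRetries) {
    if (waitForEver) {
      return RETRY_FOREVER; // early return: the exception map is never built
    }
    Policy bounded = upToMaxRetries(maxRetries);
    Map<Class<? extends Exception>, Policy> map = new HashMap<>();
    map.put(EOFException.class, bounded);
    map.put(ConnectException.class, bounded);
    return byException(TRY_ONCE_THEN_FAIL, map);
  }

  public static void main(String[] args) {
    Exception notFound = new AppNotFound();
    // Forever mode: a non-network error is retried even after 1000 attempts
    System.out.println(createPolicyBeforeFix(true, 3).shouldRetry(notFound, 1000)); // RETRY
    // Bounded mode: the same error is not in the map and fails on the first attempt
    System.out.println(createPolicyBeforeFix(false, 3).shouldRetry(notFound, 0)); // FAIL
  }
}
```

In forever mode the model returns RETRY for an application-not-found error no matter how many attempts have been made, which matches the repeated getApplicationReport calls in the RM log; in bounded mode the same error fails immediately.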
          
          rohithsharma Rohith Sharma K S added a comment -

          Thanks for the explanation. I can reproduce the problem on my machines too; when I tested last time, my configuration settings were wrong.

          rohithsharma Rohith Sharma K S added a comment -

          RetryPolicies.RETRY_FOREVER should also use exceptionToPolicyMap.
          Raju Bairishetti, feel free to take up this JIRA.

          rohithsharma Rohith Sharma K S added a comment -

          I had copied yarn.resourcemanager.connect.wait-ms from the description, but the actual configuration property is yarn.resourcemanager.connect.max-wait.ms.

          rohithsharma Rohith Sharma K S added a comment -

          Setting RetryPolicies.RETRY_FOREVER as the default policy of exceptionToPolicyMap is not sufficient; RetryPolicies.RetryForever.shouldRetry() should also check for connect exceptions and handle them. Otherwise shouldRetry() always returns RetryAction.RETRY.
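The approach suggested here, making the forever policy itself distinguish connection failures, might look roughly like the sketch below. ConnectAwareRetryForever, Decision and CONNECTION_FAILURES are hypothetical names, and treating ConnectException and EOFException as the connection failures is an illustrative assumption rather than the exact set RMProxy uses.

```java
import java.io.EOFException;
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

// Sketch of a RetryForever variant that inspects the exception itself (hypothetical names)
class ConnectAwareRetryForever {
  enum Decision { RETRY, FAIL }

  // Hypothetical subset of exception types treated as "RM not reachable"
  static final List<Class<? extends Exception>> CONNECTION_FAILURES =
      Arrays.<Class<? extends Exception>>asList(ConnectException.class, EOFException.class);

  // Retries forever only for connection failures; everything else fails at once
  static Decision shouldRetry(Exception e, int retries) {
    for (Class<? extends Exception> type : CONNECTION_FAILURES) {
      if (type.isInstance(e)) {
        return Decision.RETRY; // keep waiting for the RM, as FOREVER intends
      }
    }
    return Decision.FAIL; // e.g. an application-not-found error reaches the caller
  }

  public static void main(String[] args) {
    System.out.println(shouldRetry(new ConnectException("Connection refused"), 500)); // RETRY
    System.out.println(shouldRetry(new IllegalArgumentException("unknown app"), 0)); // FAIL
  }
}
```

Because the check uses isInstance, subclasses of the listed types are retried as well; a map keyed on exact exception classes would not behave that way.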

          devaraj.k Devaraj K added a comment -

          You can probably avoid this situation by setting a bigger value for "yarn.resourcemanager.connect.max-wait.ms" (as below) if you want to wait a long time, with retries, to establish a connection to the RM.

              conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, Integer.MAX_VALUE);
          

          Anyway, it seems this issue needs to be fixed.
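To put rough numbers on this workaround, the sketch below computes how long Integer.MAX_VALUE milliseconds is and how many retry intervals fit into it, assuming the 30 * 1000 ms default for yarn.resourcemanager.connect.retry-interval.ms that is quoted later in this thread. It is simplified: it assumes one retry per interval and ignores the time each connection attempt itself takes. MaxWaitArithmetic and its methods are hypothetical names.

```java
// Back-of-the-envelope numbers for the Integer.MAX_VALUE workaround (hypothetical helper)
class MaxWaitArithmetic {
  static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

  // Length of the max-wait budget in days
  static double maxWaitDays(long maxWaitMs) {
    return (double) maxWaitMs / MS_PER_DAY;
  }

  // Whole retry intervals that fit into the budget (ignores time spent per attempt)
  static long retryIntervals(long maxWaitMs, long retryIntervalMs) {
    return maxWaitMs / retryIntervalMs;
  }

  public static void main(String[] args) {
    long maxWaitMs = Integer.MAX_VALUE;  // value from the workaround snippet
    long retryIntervalMs = 30 * 1000;    // assumed default retry interval
    System.out.println("days: " + maxWaitDays(maxWaitMs));
    System.out.println("intervals: " + retryIntervals(maxWaitMs, retryIntervalMs));
  }
}
```

This works out to roughly 24.9 days and 71,582 whole intervals, so the client waits effectively indefinitely for the RM while, per Raju's analysis above, still taking the non-forever path that uses exceptionToPolicyMap.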

          sriksun Srikanth Sundarrajan added a comment -

          You can probably avoid this situation by setting a bigger value

          Would this not cause the client to wait too long (well after the RM has come back online)?

          devaraj.k Devaraj K added a comment -

          Would this not cause the client to wait too long (well after the RM has come back online)?

          "yarn.resourcemanager.connect.max-wait.ms" is the maximum time to wait to establish a connection to the RM. If the RM comes online before this time, the client connects on its next retry instead of waiting out the full period: the IPC client internally retries the connection to the RM every "yarn.resourcemanager.connect.retry-interval.ms" (default 30 * 1000 ms) and throws an exception only if it cannot connect within "yarn.resourcemanager.connect.max-wait.ms".

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Raju Bairishetti, would you like to provide a patch?

          /cc Xuan Gong, Jian He, who wrote most of this code.

          Targeting 2.7.1/2.8.0, though 2.8.0 is the more likely one. We can see whether it can also go into earlier releases, depending on their schedules.

          raju.bairishetti Raju Bairishetti added a comment -

          Setting RetryPolicies.RETRY_FOREVER as the default policy of exceptionToPolicyMap is not sufficient; RetryPolicies.RetryForever.shouldRetry() should also check for connect exceptions and handle them. Otherwise shouldRetry() always returns RetryAction.RETRY.

          Do we need to check the exception in shouldRetry() if we have a separate exceptionToPolicyMap that contains only connection-exception entries (like exceptionToPolicyMap.put(connectionException, FOREVER policy))?

          It seems we do not even need exceptionToPolicyMap for the FOREVER policy if we handle the exception in the shouldRetry() method.

          Thoughts?
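The map-only option raised here, leaving RetryForever unconditional and routing only connection-related exceptions to it, can be sketched as a simplified, dependency-free model. Decision, Policy and createPolicy are hypothetical names, and mapping EOFException and ConnectException is an illustrative assumption (Junping Du later quotes an EOFException entry from the patch, but this is not the patch itself).

```java
import java.io.EOFException;
import java.net.ConnectException;
import java.util.HashMap;
import java.util.Map;

// Simplified model of the map-only approach; all names are hypothetical stand-ins
class MapBasedRetryModel {
  enum Decision { RETRY, FAIL }

  // Stand-in for Hadoop's RetryPolicy (failover arguments omitted)
  interface Policy {
    Decision shouldRetry(Exception e, int retries);
  }

  // RetryForever stays unconditional; the map decides when it applies
  static final Policy RETRY_FOREVER = (e, retries) -> Decision.RETRY;
  static final Policy TRY_ONCE_THEN_FAIL = (e, retries) -> Decision.FAIL;

  static Policy upToMaxRetries(int maxRetries) {
    return (e, retries) -> retries < maxRetries ? Decision.RETRY : Decision.FAIL;
  }

  static Policy createPolicy(boolean waitForEver, int maxRetries) {
    // Choose the connection policy first: forever or bounded
    Policy connectPolicy = waitForEver ? RETRY_FOREVER : upToMaxRetries(maxRetries);
    Map<Class<? extends Exception>, Policy> map = new HashMap<>();
    map.put(EOFException.class, connectPolicy);
    map.put(ConnectException.class, connectPolicy);
    // Exact-class lookup; anything unmapped is tried once and then fails
    return (e, retries) ->
        map.getOrDefault(e.getClass(), TRY_ONCE_THEN_FAIL).shouldRetry(e, retries);
  }

  public static void main(String[] args) {
    Policy forever = createPolicy(true, 3);
    System.out.println(forever.shouldRetry(new ConnectException("Connection refused"), 1000)); // RETRY
    System.out.println(forever.shouldRetry(new IllegalStateException("unknown app"), 0)); // FAIL
  }
}
```

Compared with checking inside shouldRetry(), this keeps RetryForever generic and puts the decision about which failures count as connection failures in one place shared by the forever and bounded modes.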

          raju.bairishetti Raju Bairishetti added a comment -

          Vinod Kumar Vavilapalli, I will provide a patch shortly.
          I am not able to assign the issue to myself. Can anyone help assign it to me?

          hadoopqa Hadoop QA added a comment -



          +1 overall

          Vote  Subsystem        Runtime  Comment
          0     pre-patch        14m 43s  Pre-patch trunk compilation is healthy.
          +1    @author          0m 0s    The patch does not contain any @author tags.
          +1    tests included   0m 0s    The patch appears to include 3 new or modified test files.
          +1    javac            7m 37s   There were no new javac warning messages.
          +1    javadoc          9m 44s   There were no new javadoc warning messages.
          +1    release audit    0m 23s   The applied patch does not increase the total number of release audit warnings.
          +1    checkstyle       2m 1s    There were no new checkstyle issues.
          +1    whitespace       0m 0s    The patch has no lines that end in whitespace.
          +1    install          1m 35s   mvn install still works.
          +1    eclipse:eclipse  0m 32s   The patch built with eclipse:eclipse.
          +1    findbugs         3m 2s    The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1    common tests     22m 17s  Tests passed in hadoop-common.
          +1    yarn tests       1m 56s   Tests passed in hadoop-yarn-common.
                Total            63m 53s



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12733743/YARN-3646.patch
          Optional Tests javac unit findbugs checkstyle javadoc
          git revision trunk / 93972a3
          hadoop-common test log https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/7994/artifact/patchprocess/testrun_hadoop-yarn-common.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/7994/testReport/
          Java 1.7.0_55
          uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/7994/console

          This message was automatically generated.

          rohithsharma Rohith Sharma K S added a comment -

          It seems we do not even need exceptionToPolicyMap for the FOREVER policy if we handle the exception in the shouldRetry() method.

          Makes sense to me. I will review the patch, thanks.

          rohithsharma Rohith Sharma K S added a comment -

          Thanks for working on this issue. The patch looks good to me overall.
          Nit: can the test be moved to the YARN package, since the issue is in YARN? Otherwise, a change to RMProxy will not trigger this test.

          rohithsharma Rohith Sharma K S added a comment -

          I also verified it on a one-node cluster by enabling and disabling the RETRY_FOREVER policy.

          raju.bairishetti Raju Bairishetti added a comment -

          Thanks Rohith Sharma K S for the review.

          It looks like it is mainly an issue with the retry policy.

          raju.bairishetti Raju Bairishetti added a comment -

          Added a new unit test in hadoop-yarn-client. Rohith Sharma K S, could you please review?

          When I ran the test without starting the RM, it timed out.

          When I ran the test with the RM started, the client got ApplicationNotFoundException for the old/invalid appId:

                rm = new ResourceManager();
                rm.init(conf);
                rm.start();
          
          hadoopqa Hadoop QA added a comment -



          +1 overall

          Vote  Subsystem        Runtime  Comment
          0     pre-patch        14m 46s  Pre-patch trunk compilation is healthy.
          +1    @author          0m 0s    The patch does not contain any @author tags.
          +1    tests included   0m 0s    The patch appears to include 4 new or modified test files.
          +1    javac            7m 35s   There were no new javac warning messages.
          +1    javadoc          9m 43s   There were no new javadoc warning messages.
          +1    release audit    0m 22s   The applied patch does not increase the total number of release audit warnings.
          +1    checkstyle       2m 44s   There were no new checkstyle issues.
          +1    whitespace       0m 1s    The patch has no lines that end in whitespace.
          +1    install          1m 35s   mvn install still works.
          +1    eclipse:eclipse  0m 33s   The patch built with eclipse:eclipse.
          +1    findbugs         3m 48s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1    common tests     23m 54s  Tests passed in hadoop-common.
          +1    yarn tests       6m 54s   Tests passed in hadoop-yarn-client.
          +1    yarn tests       1m 56s   Tests passed in hadoop-yarn-common.
                Total            73m 55s



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12734062/YARN-3646.001.patch
          Optional Tests javac unit findbugs checkstyle javadoc
          git revision trunk / ce53c8e
          hadoop-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-common.txt
          hadoop-yarn-client test log https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-yarn-client.txt
          hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8017/artifact/patchprocess/testrun_hadoop-yarn-common.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8017/testReport/
          Java 1.7.0_55
          uname Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8017/console

          This message was automatically generated.

          rohithsharma Rohith Sharma K S added a comment -

          Thanks for updating the patch. Some comments on the tests:

          1. I think we can remove the tests added in the hadoop-common project, since the yarn-client test verifies the required functionality. Also, the hadoop-common test mocked the RMProxy functionality, so it passed even without the RMProxy fix.
          2. The code never reaches Assert.fail(""); better to remove it.
          3. Catch ApplicationNotFoundException instead of Throwable. Better still, add expected = ApplicationNotFoundException.class to the @Test annotation, as below.
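            import org.apache.hadoop.yarn.api.records.ApplicationId;
            import org.apache.hadoop.yarn.client.api.YarnClient;
            import org.apache.hadoop.yarn.conf.YarnConfiguration;
            import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
            import org.apache.hadoop.yarn.server.resourcemanager.ResourceManager;
            import org.junit.Test;

            // Passes only if getApplicationReport() fails fast with
            // ApplicationNotFoundException instead of retrying until the timeout
            @Test(timeout = 30000, expected = ApplicationNotFoundException.class)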
              public void testClientWithRetryPolicyForEver() throws Exception {
                YarnConfiguration conf = new YarnConfiguration();
                conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
            
                ResourceManager rm = null;
                YarnClient yarnClient = null;
                try {
                  // start rm
                  rm = new ResourceManager();
                  rm.init(conf);
                  rm.start();
            
                  yarnClient = YarnClient.createYarnClient();
                  yarnClient.init(conf);
                  yarnClient.start();
            
                  // create invalid application id
                  ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
            
                  // RM should throw ApplicationNotFoundException exception
                  yarnClient.getApplicationReport(appId);
                } finally {
                  if (yarnClient != null) {
                    yarnClient.stop();
                  }
                  if (rm != null) {
                    rm.stop();
                  }
                }
              }
            
          4. Can you rename the test after the functionality it actually tests, e.g. testShouldNotRetryForeverForNonNetworkExceptions?
          raju.bairishetti Raju Bairishetti added a comment -

          Rohith Sharma K S, thanks for the review and comments. Attached a new patch.

          rohithsharma Rohith Sharma K S added a comment -

          +1, LGTM (non-binding). Waiting for the Jenkins report.

          hadoopqa Hadoop QA added a comment -



          +1 overall

          Vote  Subsystem        Runtime  Comment
          0     pre-patch        14m 34s  Pre-patch trunk compilation is healthy.
          +1    @author          0m 0s    The patch does not contain any @author tags.
          +1    tests included   0m 0s    The patch appears to include 1 new or modified test files.
          +1    javac            7m 32s   There were no new javac warning messages.
          +1    javadoc          9m 37s   There were no new javadoc warning messages.
          +1    release audit    0m 23s   The applied patch does not increase the total number of release audit warnings.
          +1    checkstyle       0m 38s   There were no new checkstyle issues.
          +1    whitespace       0m 0s    The patch has no lines that end in whitespace.
          +1    install          1m 34s   mvn install still works.
          +1    eclipse:eclipse  0m 33s   The patch built with eclipse:eclipse.
          +1    findbugs         2m 6s    The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1    yarn tests       6m 51s   Tests passed in hadoop-yarn-client.
          +1    yarn tests       1m 55s   Tests passed in hadoop-yarn-common.
                Total            45m 47s



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12734115/YARN-3646.002.patch
          Optional Tests javac unit findbugs checkstyle javadoc
          git revision trunk / 4aa730c
          hadoop-yarn-client test log https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-client.txt
          hadoop-yarn-common test log https://builds.apache.org/job/PreCommit-YARN-Build/8023/artifact/patchprocess/testrun_hadoop-yarn-common.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8023/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/8023/console

          This message was automatically generated.

          devaraj.k Devaraj K added a comment -

          +1, latest patch looks good to me.

          Thanks Raju Bairishetti for reporting and contributing, and thanks Rohith Sharma K S for the review.

          djp Junping Du added a comment -

          Overall, the patch LGTM too.
          Just one minor issue:

          exceptionToPolicyMap.put(EOFException.class, retryPolicy);
          

          Do we need to apply RetryPolicies.RETRY_FOREVER to EOFException too? I don't think so. Jian He and Xuan Gong, any comments here?
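          To make the question concrete, here is a minimal, self-contained Java sketch of exception-based retry dispatch, loosely modeled on the exceptionToPolicyMap line quoted above. The names RetryByExceptionSketch, Policy and Decision are hypothetical stand-ins, not Hadoop's RetryPolicy API. Only the mapped connection-type exceptions receive the connect policy (here RETRY_FOREVER); anything unmapped, such as an application-not-found error, falls through to a fail-fast default. The EOFException entry is the one being questioned.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of exception-based retry dispatch; not Hadoop's RetryPolicy API.
class RetryByExceptionSketch {

  /** What the caller should do after a failed call. */
  enum Decision { RETRY, FAIL }

  /** Hypothetical policies named after the ones discussed in this issue. */
  enum Policy {
    RETRY_FOREVER, TRY_ONCE_THEN_FAIL;

    Decision decide() {
      return this == RETRY_FOREVER ? Decision.RETRY : Decision.FAIL;
    }
  }

  private final Map<Class<? extends Exception>, Policy> exceptionToPolicyMap =
      new HashMap<>();
  private final Policy defaultPolicy;

  RetryByExceptionSketch(Policy connectPolicy, Policy defaultPolicy) {
    this.defaultPolicy = defaultPolicy;
    // Only connection-level failures get the (possibly infinite) connect policy.
    // EOFException is included to mirror the line questioned above.
    exceptionToPolicyMap.put(EOFException.class, connectPolicy);
    exceptionToPolicyMap.put(ConnectException.class, connectPolicy);
    exceptionToPolicyMap.put(NoRouteToHostException.class, connectPolicy);
    exceptionToPolicyMap.put(UnknownHostException.class, connectPolicy);
  }

  /** Looks up the policy by the exception's exact class; unmapped classes use the default. */
  Decision shouldRetry(Exception e) {
    Policy policy = exceptionToPolicyMap.getOrDefault(e.getClass(), defaultPolicy);
    return policy.decide();
  }

  public static void main(String[] args) {
    RetryByExceptionSketch dispatcher =
        new RetryByExceptionSketch(Policy.RETRY_FOREVER, Policy.TRY_ONCE_THEN_FAIL);
    // A connection failure keeps retrying: prints RETRY
    System.out.println(dispatcher.shouldRetry(new ConnectException("RM unreachable")));
    // An application-level error (stand-in for ApplicationNotFoundException) fails fast: prints FAIL
    System.out.println(dispatcher.shouldRetry(new IOException("Application not found")));
  }
}
```

          In this sketch, dropping the EOFException entry is all it takes for that exception to fail fast instead of retrying forever. Because the lookup is by exact class, a subclass of a mapped exception would also fall through to the default.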

          devaraj.k Devaraj K added a comment -

          Committed to trunk, branch-2 and branch-2.7.

          Thanks Raju Bairishetti.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7882 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7882/)
          YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
          devaraj.k Devaraj K added a comment -

          Thanks Junping Du for the comment.
          I committed this patch before noticing your comment. Can we handle adding EOFException to the list as part of another issue/improvement? Thanks.

          djp Junping Du added a comment -

          Sure. I will file a separate JIRA to discuss this.

          djp Junping Du added a comment -

          Filed YARN-3695 to continue the discussion.

          djp Junping Du added a comment -

          Also, congratulations to Raju Bairishetti on contributing their first patch to the Apache Hadoop project!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/935/)
          YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/)
          YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/)
          YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/)
          YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
          • hadoop-yarn-project/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #203 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/203/)
          YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/)
          YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java

            People

            • Assignee:
              raju.bairishetti Raju Bairishetti
              Reporter:
              raju.bairishetti Raju Bairishetti
            • Votes:
              0
              Watchers:
              12

              Dates

              • Created:
                Updated:
                Resolved:

                Development