Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-4074

Client continuously retries to RM When RM goes down before launching Application Master

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: None
    • Labels:
      None

      Description

      Client continuously tries to RM and logs the below messages when the RM goes down before launching App Master.

      I feel exception should be thrown or break the loop after finite no of retries.

      28/03/12 07:15:03 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
      28/03/12 07:15:04 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
      28/03/12 07:15:05 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
      28/03/12 07:15:06 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
      28/03/12 07:15:07 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
      28/03/12 07:15:08 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 5 time(s).
      28/03/12 07:15:09 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 6 time(s).
      28/03/12 07:15:10 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 7 time(s).
      28/03/12 07:15:11 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 8 time(s).
      28/03/12 07:15:12 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 9 time(s).
      28/03/12 07:15:13 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 0 time(s).
      28/03/12 07:15:14 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 1 time(s).
      28/03/12 07:15:15 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 2 time(s).
      28/03/12 07:15:16 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 3 time(s).
      28/03/12 07:15:17 INFO ipc.Client: Retrying connect to server: linux-f330.site/10.18.40.182:8032. Already tried 4 time(s).
      
      1. MAPREDUCE-4074-3.patch
        11 kB
        xieguiming
      2. MAPREDUCE-4074-2.patch
        10 kB
        xieguiming
      3. MAPREDUCE-4074-1.patch
        9 kB
        xieguiming
      4. MAPREDUCE-4074.patch
        5 kB
        xieguiming

        Activity

        Hide
        xieguiming added a comment -

        The reason is below function:

        ClientServiceDelegate.java
        
          private synchronized Object invoke(String method, Class argClass,
              Object args) throws YarnRemoteException {
            Method methodOb = null;
            try {
              methodOb = MRClientProtocol.class.getMethod(method, argClass);
            } catch (SecurityException e) {
              throw new YarnException(e);
            } catch (NoSuchMethodException e) {
              throw new YarnException("Method name mismatch", e);
            }
            while (true) {
              try {
                return methodOb.invoke(getProxy(), args);
              } catch (YarnRemoteException yre) {
                LOG.warn("Exception thrown by remote end.", yre);
                throw yre;
              } catch (InvocationTargetException e) {
                if (e.getTargetException() instanceof YarnRemoteException) {
                  LOG.warn("Error from remote end: " + e
                      .getTargetException().getLocalizedMessage());
                  LOG.debug("Tracing remote error ", e.getTargetException());
                  throw (YarnRemoteException) e.getTargetException();
                }
                LOG.debug("Failed to contact AM/History for job " + jobId + 
                    " retrying..", e.getTargetException());
                // Force reconnection by setting the proxy to null.
                realProxy = null;
              } catch (Exception e) {
                LOG.debug("Failed to contact AM/History for job " + jobId
                    + "  Will retry..", e);
                // Force reconnection by setting the proxy to null.
                realProxy = null;
              }
            }
          }
        

        When RM goes down, and will throw the java.lang.reflect.UndeclaredThrowableException, and will continuously retry.

        Show
        xieguiming added a comment - The reason is below function: ClientServiceDelegate.java private synchronized Object invoke( String method, Class argClass, Object args) throws YarnRemoteException { Method methodOb = null ; try { methodOb = MRClientProtocol.class.getMethod(method, argClass); } catch (SecurityException e) { throw new YarnException(e); } catch (NoSuchMethodException e) { throw new YarnException( "Method name mismatch" , e); } while ( true ) { try { return methodOb.invoke(getProxy(), args); } catch (YarnRemoteException yre) { LOG.warn( "Exception thrown by remote end." , yre); throw yre; } catch (InvocationTargetException e) { if (e.getTargetException() instanceof YarnRemoteException) { LOG.warn( "Error from remote end: " + e .getTargetException().getLocalizedMessage()); LOG.debug( "Tracing remote error " , e.getTargetException()); throw (YarnRemoteException) e.getTargetException(); } LOG.debug( "Failed to contact AM/History for job " + jobId + " retrying.." , e.getTargetException()); // Force reconnection by setting the proxy to null . realProxy = null ; } catch (Exception e) { LOG.debug( "Failed to contact AM/History for job " + jobId + " Will retry.." , e); // Force reconnection by setting the proxy to null . realProxy = null ; } } } When RM goes down, and will throw the java.lang.reflect.UndeclaredThrowableException, and will continuously retry.
        Hide
        xieguiming added a comment -

        So, I think we should do the timeout processing.

        Show
        xieguiming added a comment - So, I think we should do the timeout processing.
        Hide
        xieguiming added a comment -

        Retry Scenarios:
        1, AM finished, and will contact HS, and need to retry
        2, If RM goes down, the IPC will retry automaticly. and no need retry here.
        3, If HS goes down, the IPC will retry automaticly. and no need retry here.

        So, we only need to consider scenario 1.

        Show
        xieguiming added a comment - Retry Scenarios: 1, AM finished, and will contact HS, and need to retry 2, If RM goes down, the IPC will retry automaticly. and no need retry here. 3, If HS goes down, the IPC will retry automaticly. and no need retry here. So, we only need to consider scenario 1.
        Hide
        xieguiming added a comment -

        Add one scenario:
        4, AM attempt, and will contack other AM, and need to retry.

        SO, we only need to consider scenario 1 and 4.

        Show
        xieguiming added a comment - Add one scenario: 4, AM attempt, and will contack other AM, and need to retry. SO, we only need to consider scenario 1 and 4.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12522407/MAPREDUCE-4074.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2211//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2211//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522407/MAPREDUCE-4074.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2211//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2211//console This message is automatically generated.
        Hide
        xieguiming added a comment -

        add 3 unit test cases.

        Show
        xieguiming added a comment - add 3 unit test cases.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12522906/MAPREDUCE-4074-1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
        org.apache.hadoop.yarn.server.TestDiskFailures
        org.apache.hadoop.yarn.server.TestContainerManagerSecurity
        org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
        org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry
        org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
        org.apache.hadoop.mapred.TestMiniMRClasspath
        org.apache.hadoop.mapreduce.v2.TestMRJobs
        org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
        org.apache.hadoop.mapred.TestMiniMRBringup
        org.apache.hadoop.mapred.TestMiniMRChildTask
        org.apache.hadoop.mapred.TestReduceFetch
        org.apache.hadoop.mapred.TestClusterMRNotification
        org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
        org.apache.hadoop.mapred.TestJobCounters
        org.apache.hadoop.mapreduce.TestChild
        org.apache.hadoop.mapred.TestMiniMRClientCluster
        org.apache.hadoop.ipc.TestSocketFactory
        org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
        org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
        org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
        org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
        org.apache.hadoop.mapred.TestClientRedirect
        org.apache.hadoop.mapred.TestLazyOutput
        org.apache.hadoop.mapred.TestJobCleanup
        org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
        org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
        org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner
        org.apache.hadoop.conf.TestNoDefaultsJobConf
        org.apache.hadoop.mapreduce.v2.TestRMNMInfo
        org.apache.hadoop.mapred.TestClusterMapReduceTestCase
        org.apache.hadoop.mapreduce.v2.TestNonExistentJob
        org.apache.hadoop.mapred.TestJobSysDirWithDFS
        org.apache.hadoop.mapreduce.v2.TestUberAM
        org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
        org.apache.hadoop.mapred.TestJobName
        org.apache.hadoop.mapreduce.security.TestJHSSecurity

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2233//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2233//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522906/MAPREDUCE-4074-1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.server.TestDiskFailures org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.mapred.TestMiniMRClasspath org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers org.apache.hadoop.mapred.TestMiniMRBringup org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.TestClusterMRNotification org.apache.hadoop.mapred.TestReduceFetchFromPartialMem org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.ipc.TestSocketFactory org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapred.TestClientRedirect org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.conf.TestNoDefaultsJobConf org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapred.TestJobSysDirWithDFS org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapred.TestJobName org.apache.hadoop.mapreduce.security.TestJHSSecurity +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2233//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2233//console This message is automatically generated.
        Hide
        Thomas Graves added a comment -

        Still reviewing, but one thing I noticed is that your config setting "client-rm.hs.am.max-retries" doesn't match what other yarn max-retries (like yarn.resurcemanager.am.max-retries and yarn.app.mapreduce.client-am.ipc.max-retries) do. In general if they are set to 3 it actually only tries 3 times total. I realize retries would imply that it tries 4 times total (the first and then 3 more) like yours does, but I think we should keep it consistent across yarn and change yours to only try 3 times total if its set to 3.

        I personally would prefer the setting "yarn.app.mapreduce.client-rm.hs.am.max-retries" just be called yarn.app.mapreduce.client.max-retries or atleast use - instead of . so something more like yarn.app.mapreduce.client-rm-hs-am.max-retries. I'm open to other ideas to just don't like the periods since its saying client to rm or hs or am.

        You should also add the default setting with description to yarn-default.xml

        Show
        Thomas Graves added a comment - Still reviewing, but one thing I noticed is that your config setting "client-rm.hs.am.max-retries" doesn't match what other yarn max-retries (like yarn.resurcemanager.am.max-retries and yarn.app.mapreduce.client-am.ipc.max-retries) do. In general if they are set to 3 it actually only tries 3 times total. I realize retries would imply that it tries 4 times total (the first and then 3 more) like yours does, but I think we should keep it consistent across yarn and change yours to only try 3 times total if its set to 3. I personally would prefer the setting "yarn.app.mapreduce.client-rm.hs.am.max-retries" just be called yarn.app.mapreduce.client.max-retries or atleast use - instead of . so something more like yarn.app.mapreduce.client-rm-hs-am.max-retries. I'm open to other ideas to just don't like the periods since its saying client to rm or hs or am. You should also add the default setting with description to yarn-default.xml
        Hide
        Devaraj K added a comment -

        Cancelling patch to address Thomas review comments.

        Show
        Devaraj K added a comment - Cancelling patch to address Thomas review comments.
        Hide
        xieguiming added a comment -

        modify 2 points:
        1, change to the logic (maxRetries >= 0) to (maxRetries > 0)

        2,add the
        <property>
        <name>yarn.app.mapreduce.client-rm.hs.am.max-retries</name>
        <value>3</value>
        <description>The number of client retries to the RM/HS/AM before
        throwing exception.</description>
        </property>

        to the mapred-default.xml

        Show
        xieguiming added a comment - modify 2 points: 1, change to the logic (maxRetries >= 0) to (maxRetries > 0) 2,add the <property> <name>yarn.app.mapreduce.client-rm.hs.am.max-retries</name> <value>3</value> <description>The number of client retries to the RM/HS/AM before throwing exception.</description> </property> to the mapred-default.xml
        Hide
        xieguiming added a comment -

        Hi Thomas,

        Thanks for you review,
        I have reworked by your opnions.

        And the new property "yarn.app.mapreduce.client-rm.hs.am.max-retries" is like "yarn.app.mapreduce.client-am.ipc.max-retries" present.

        Please check again. thanks.

        Show
        xieguiming added a comment - Hi Thomas, Thanks for you review, I have reworked by your opnions. And the new property "yarn.app.mapreduce.client-rm.hs.am.max-retries" is like "yarn.app.mapreduce.client-am.ipc.max-retries" present. Please check again. thanks.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12523070/MAPREDUCE-4074-2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
        org.apache.hadoop.yarn.server.TestDiskFailures
        org.apache.hadoop.yarn.server.TestContainerManagerSecurity
        org.apache.hadoop.yarn.server.resourcemanager.security.TestApplicationTokens
        org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
        org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry
        org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
        org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
        org.apache.hadoop.mapred.TestMiniMRClasspath
        org.apache.hadoop.mapreduce.v2.TestMRJobs
        org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
        org.apache.hadoop.mapred.TestMiniMRBringup
        org.apache.hadoop.mapred.TestMiniMRChildTask
        org.apache.hadoop.mapred.TestReduceFetch
        org.apache.hadoop.mapred.TestClusterMRNotification
        org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
        org.apache.hadoop.mapred.TestJobCounters
        org.apache.hadoop.mapreduce.TestChild
        org.apache.hadoop.mapred.TestMiniMRClientCluster
        org.apache.hadoop.ipc.TestSocketFactory
        org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
        org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
        org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
        org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
        org.apache.hadoop.mapred.TestClientRedirect
        org.apache.hadoop.mapred.TestLazyOutput
        org.apache.hadoop.mapred.TestJobCleanup
        org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
        org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
        org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner
        org.apache.hadoop.conf.TestNoDefaultsJobConf
        org.apache.hadoop.mapreduce.v2.TestRMNMInfo
        org.apache.hadoop.mapred.TestClusterMapReduceTestCase
        org.apache.hadoop.mapreduce.v2.TestNonExistentJob
        org.apache.hadoop.mapred.TestJobSysDirWithDFS
        org.apache.hadoop.mapred.TestClientServiceDelegate
        org.apache.hadoop.mapreduce.v2.TestUberAM
        org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
        org.apache.hadoop.mapred.TestJobName
        org.apache.hadoop.mapreduce.security.TestJHSSecurity

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2244//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2244//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12523070/MAPREDUCE-4074-2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.server.TestDiskFailures org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.resourcemanager.security.TestApplicationTokens org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestNMExpiry org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.mapred.TestMiniMRClasspath org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers org.apache.hadoop.mapred.TestMiniMRBringup org.apache.hadoop.mapred.TestMiniMRChildTask org.apache.hadoop.mapred.TestReduceFetch org.apache.hadoop.mapred.TestClusterMRNotification org.apache.hadoop.mapred.TestReduceFetchFromPartialMem org.apache.hadoop.mapred.TestJobCounters org.apache.hadoop.mapreduce.TestChild org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.ipc.TestSocketFactory org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapred.TestClientRedirect org.apache.hadoop.mapred.TestLazyOutput org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.conf.TestNoDefaultsJobConf org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapred.TestJobSysDirWithDFS org.apache.hadoop.mapred.TestClientServiceDelegate org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapred.TestJobName org.apache.hadoop.mapreduce.security.TestJHSSecurity +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2244//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2244//console This message is automatically generated.
        Hide
        Thomas Graves added a comment -

        Can you please update the test now to account for the > 0 rather then the >=.

        Show
        Thomas Graves added a comment - Can you please update the test now to account for the > 0 rather then the >=.
        Hide
        Thomas Graves added a comment -

        Sorry for the split on the review, I was going through the various cases.

        So if you can make the update to the test and
        can you also please update the name of the config from "yarn.app.mapreduce.client-rm.hs.am.max-retries" to "yarn.app.mapreduce.client.max-retries" and in mapred-default.xml add to the description to say "The number of client retries to the RM/HS/AM before throwing exception. This is the layer above the ipc." or similar.

        Show
        Thomas Graves added a comment - Sorry for the split on the review, I was going through the various cases. So if you can make the update to the test and can you also please update the name of the config from "yarn.app.mapreduce.client-rm.hs.am.max-retries" to "yarn.app.mapreduce.client.max-retries" and in mapred-default.xml add to the description to say "The number of client retries to the RM/HS/AM before throwing exception. This is the layer above the ipc." or similar.
        Hide
        xieguiming added a comment -

        Hi Thomas,
        I have complete the modification by your opinions.
        please check, thanks.

        Show
        xieguiming added a comment - Hi Thomas, I have complete the modification by your opinions. please check, thanks.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12523303/MAPREDUCE-4074-3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2253//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2253//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12523303/MAPREDUCE-4074-3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2253//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2253//console This message is automatically generated.
        Hide
        Thomas Graves added a comment -

        +1. Thanks xieguiming! I've committed this to trunk, branch-2, and branch-0.23.

        Show
        Thomas Graves added a comment - +1. Thanks xieguiming! I've committed this to trunk, branch-2, and branch-0.23.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2179 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2179/)
        MAPREDUCE-4074. Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2179 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2179/ ) MAPREDUCE-4074 . Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2105 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2105/)
        MAPREDUCE-4074. Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2105 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2105/ ) MAPREDUCE-4074 . Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2122 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2122/)
        MAPREDUCE-4074. Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2122 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2122/ ) MAPREDUCE-4074 . Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #233 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/233/)
        merge -r 1327976:1327977 from branch-2. FIXES: MAPREDUCE-4074 (Revision 1327978)

        Result = SUCCESS
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327978
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #233 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/233/ ) merge -r 1327976:1327977 from branch-2. FIXES: MAPREDUCE-4074 (Revision 1327978) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327978 Files : /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1020 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1020/)
        MAPREDUCE-4074. Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972)

        Result = FAILURE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1020 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1020/ ) MAPREDUCE-4074 . Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972) Result = FAILURE tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1055 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1055/)
        MAPREDUCE-4074. Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972)

        Result = FAILURE
        tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1055 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1055/ ) MAPREDUCE-4074 . Client continuously retries to RM When RM goes down before launching Application Master (xieguiming via tgraves) (Revision 1327972) Result = FAILURE tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1327972 Files : /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java

          People

          • Assignee:
            xieguiming
            Reporter:
            Devaraj K
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development