Hadoop YARN / YARN-4497

RM might fail to restart when recovering apps whose attempts are missing

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags:
      Reviewed

      Description

      Found the following problem while discussing YARN-3480.

      If the RM fails to store some attempts in the RMStateStore, those attempts will be missing from the store. For example, when storing attempt1, attempt2 and attempt3, the RM may successfully store attempt1 and attempt3 but fail to store attempt2. When the RM restarts, RMAppImpl#recover recovers attempts one by one: first attempt1, then attempt2. While recovering attempt2, it calls ((RMAppAttemptImpl)this.currentAttempt).recover(state), which looks up the attempt's ApplicationAttemptStateData but cannot find it, so the check at assert attemptState != null fails (RMAppAttemptImpl#recover, line 880).
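      The failure mode above can be sketched in a few lines. This is a minimal, self-contained simulation with hypothetical names (RecoverySketch and firstMissingAttempt are not YARN APIs): the store holds attempt 1 and attempt 3, attempt 2 was never persisted, and recovering in order hits the gap where the assert on a null attempt state would fire.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the recovery gap: attempts are recovered in
// ascending id order, and the first id with no persisted state is the point
// where RMAppAttemptImpl#recover would fail its `assert attemptState != null`.
public class RecoverySketch {

    // Returns the first attempt id whose state is missing from the store,
    // or -1 if every attempt up to attemptCount was persisted.
    public static int firstMissingAttempt(Map<Integer, String> storedAttempts,
                                          int attemptCount) {
        for (int i = 1; i <= attemptCount; i++) {
            if (!storedAttempts.containsKey(i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<Integer, String> stored = new HashMap<>();
        stored.put(1, "attempt1-state");
        stored.put(3, "attempt3-state"); // attempt2 failed to store
        System.out.println(firstMissingAttempt(stored, 3)); // prints 2
    }
}
```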

      1. YARN-4497.01.patch
        6 kB
        Jun Gong
      2. YARN-4497.02.patch
        8 kB
        Jun Gong
      3. YARN-4497.03.patch
        8 kB
        Jun Gong
      4. YARN-4497.04.patch
        8 kB
        Jun Gong

        Issue Links

          Activity

          rohithsharma Rohith Sharma K S added a comment -

          Wondering when it can happen that attempt1 is stored, attempt2 is not stored, and attempt3 is stored? One way is to manually delete the attempt2 node from ZooKeeper.

          hex108 Jun Gong added a comment -

          In RMStateStore#notifyStoreOperationFailedInternal, RMStateStore might skip store errors, so it might fail to store attempt2 for some reason (e.g. a network error) while the app continues running and starts a new attempt, attempt3; RMStateStore then stores attempt3 successfully (assuming the network has recovered by then).

          rohithsharma Rohith Sharma K S added a comment -

          Currently, if any error happens while storing into the RMStateStore, the RMStateStore is FENCED, so no more attempts are stored in the state store. And the RMStateStore state machine only has a transition from ACTIVE to FENCED; there is no FENCED to ACTIVE.

          If I am missing anything in the flow, could you explain in more detail?

          rohithsharma Rohith Sharma K S added a comment -

          I got your point: if RM HA is not configured and fail-fast is false, this can happen.

          hex108 Jun Gong added a comment -

          Yes, that is the problem.

          hex108 Jun Gong added a comment -

          The patch deals with two cases:

          1. The attempt is missing.
          When recovering an attempt, remove it from app.attempts if we cannot find its ApplicationAttemptStateData in the RMStateStore. If no ApplicationAttemptStateData is found, the corresponding AM was never launched (launching the AM happens after receiving the event RMAppAttemptEventType.ATTEMPT_NEW_SAVED, and we must not have received that event).

          2. The attempt's final state is missing (storing the final state failed).
          When recovering these attempts, we set their state to FAILED (or any other final state, or add a state UNKNOWN if needed), so the attempt can handle the event RMAppAttemptEventType.RECOVER cleanly.
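          A minimal sketch of the two cases above, with hypothetical names (AttemptStateSketch and its recover method are not the actual patch): attempt ids absent from the store are simply not recovered (case 1), and stored attempts whose final state was never persisted, represented here by a null value, are marked FAILED (case 2).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: storedState maps attempt id -> recovered final state,
// where null means the attempt's state data exists but its final state was
// never stored. Ids absent from the map were never stored at all (the AM was
// never launched), so they are dropped implicitly.
public class AttemptStateSketch {

    public static Map<Integer, String> recover(Map<Integer, String> storedState) {
        Map<Integer, String> recovered = new TreeMap<>();
        for (Map.Entry<Integer, String> e : storedState.entrySet()) {
            // Case 2: final state missing -> treat the attempt as FAILED so
            // the RECOVER event can be handled cleanly.
            recovered.put(e.getKey(),
                e.getValue() == null ? "FAILED" : e.getValue());
        }
        return recovered;
    }
}
```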

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 34s trunk passed
          +1 compile 0m 25s trunk passed with JDK v1.8.0_66
          +1 compile 0m 29s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 16s trunk passed
          +1 mvnsite 0m 36s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 10s trunk passed
          +1 javadoc 0m 20s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 26s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 23s the patch passed with JDK v1.8.0_66
          +1 javac 0m 23s the patch passed
          +1 compile 0m 28s the patch passed with JDK v1.7.0_91
          +1 javac 0m 28s the patch passed
          -1 checkstyle 0m 15s Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 241, now 241).
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 16s the patch passed
          +1 javadoc 0m 19s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 24s the patch passed with JDK v1.7.0_91
          -1 unit 63m 42s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
          -1 unit 65m 13s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 19s Patch does not generate ASF License warnings.
          146m 12s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_91 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12779998/YARN-4497.01.patch
          JIRA Issue YARN-4497
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux a8c845834578 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ad997fa
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10126/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10126/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10126/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10126/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/10126/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10126/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10126/console

          This message was automatically generated.

          jianhe Jian He added a comment -

          Jun Gong, thanks for working on this.
          For the patch, I think making the change below in RMAppImpl#recover may be enough?

          -    for (int i = 0; i < appState.getAttemptCount(); ++i) {
          -      // create attempt
          -      createNewAttempt();
          +    for (ApplicationAttemptId attemptId : appState.attempts.keySet()) {
          +      createNewAttempt(attemptId);
                 ((RMAppAttemptImpl) this.currentAttempt).recover(state);
          hex108 Jun Gong added a comment -

          Jian He, thanks for the review and comments.

          for the patch, I think making below change in RMAppImpl#recover may be enough ?

          There might be some problems:
          1. appState.attempts.keySet() is not sorted by attempt ID, but we need to recover attempts in order, because we use currentAttempt to get the AMBlacklist and we call getNumFailedAppAttempts() in createNewAttempt().
          2. We need to update nextAttemptId after recovering attempts.
          3. We need to handle case 2 from my previous comment (the attempt's final state is missing because storing it failed); otherwise the RM will relaunch the attempt: it will be in LAUNCHED state after recovery, then time out (the attempt has already failed), and the RM will relaunch it.
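          Points 1 and 2 above can be sketched as follows (hypothetical names; OrderedRecoverySketch is not the actual patch): sort the stored attempt ids before recovering, and derive nextAttemptId from the highest recovered id so new attempts continue after it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keySet() iteration order is not guaranteed, so sort
// the stored attempt ids, recover lowest-first, and set nextAttemptId to one
// past the last recovered attempt.
public class OrderedRecoverySketch {

    public static int recoverInOrder(Set<Integer> storedAttemptIds) {
        List<Integer> sorted = new ArrayList<>(storedAttemptIds);
        Collections.sort(sorted); // recover in ascending attempt-id order
        int nextAttemptId = 1;    // default when nothing was stored
        for (int id : sorted) {
            // here the real code would do:
            // createNewAttempt(id);
            // ((RMAppAttemptImpl) currentAttempt).recover(state);
            nextAttemptId = id + 1; // new attempts start after the last one
        }
        return nextAttemptId;
    }
}
```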

          rohithsharma Rohith Sharma K S added a comment - edited

          As a side note: since YARN-3840 removes attempts from the RMStateStore, it is very easy to hit this issue (YARN-4584) regardless of whether RM HA is configured and fail-fast is false.

          About the solution: during recovery it is a bit tricky to tell whether an attempt failed to be stored versus a failed attempt that was removed after the retention interval. So I think you can combine your solution and Jian He's idea, so that we eliminate the removed-after-interval attempts and can assume the remaining missing attempts simply failed to store. Thoughts?
          Regarding iterating appState.attempts, it can be sorted before iterating. If the attempts are sorted, there should be no problem with nextAttemptId.

          About the patch:

          1. attempt.recoveredFinalStatus is always set to FAILED, but these attempts might also have been KILLED or FINISHED.
          2. getNumFailedAppAttempts() is violated if an attempt failed to be stored, since that attempt is removed from attempts. Note also that when an attempt fails to be stored, information such as getNumFailedAppAttempts won't be an exact number, since attempt failures are counted from the attempts.
          hex108 Jun Gong added a comment -

          Rohith Sharma K S Thanks for the comments and suggestion.

          As a side note : since YARN-3840 removes the attempts from RMStateStore, it is very prone to get this issue (YARN-4584) nevertheless of without RM HA is configured and fail fast is false.

          As I commented in YARN-4584, "If attempt 1~28 are removed and attempt 29~31 has been saved to appstore successfully, there will be no NPE for RM recovery." I think we need to analyze the RM log further. Removing attempts causes an NPE only when the RM continues running after failing to operate (e.g. store/remove) on the RMStateStore. Is there any other case that might cause an NPE? If so, maybe we need to fix it.

          About the solution, it is bit tricky to identify during recovery that whether-application-is-failed-to-store VS failed-attempts-were-removed-after-interval.

          I think we do not need to distinguish these two cases, because it makes no difference for recovery.

          So I think you can club both your solution and Jian He's thought together, so that we can eliminate failed-attempts-were-removed-after-interval attempts. And assume that attempts recovered are of failed to store only.

          In RMAppImpl#createNewAttempt(), the first new attempt id is nextAttemptId, which is initialized in RMAppImpl#recover() to the minimum attempt ID found in the RMStateStore. So we have already skipped recovering the failed-attempts-were-removed-after-interval attempts.

          Regarding iterating appState.attempts, it can be sorted before iterating it. If attempts are sorted, then there should not be problem with nextAttemptId.

          Yes, we could sort it. I will update the patch if needed.

          attempt.recoveredFinalStatus is being set to always to FAILED. These attempts might be KILLED/FINISHED also.

          These attempts might actually have been KILLED, but we cannot be sure. If setting them to FAILED is not reasonable, how about adding another state (e.g. UNKNOWN)? My concern is that it would make things more complex.

          getNumFailedAppAttempts() is violated if attempt is failed to store since this attempt is removed from attempts. And also note that if attempts is failed to store, then many information such as getNumFailedAppAttempts also wont be exact number since attempt failure is taken from attempt.

          Yes, the number is not exact. I have not figured out a good way to solve that yet. Since RM failover is not frequent and removed attempts are kept in memory, it might be acceptable.

          sunilg Sunil G added a comment -

          Hi Jun Gong,
          I second the idea of sorting appState.attempts.keySet(), which looks cleaner.

          attempt.recoveredFinalStatus is being set to always to FAILED. These attempts might be KILLED/FINISHED also.

          Yes, there is no clear way to determine this. We cannot rely much on diagnostics either. I feel keeping FAILED is fine until we have clearer information that would let us update it to KILLED. I don't think adding a final state UNKNOWN is a good idea; it is too much complexity for a new final state.

          hex108 Jun Gong added a comment -

          Sunil G, thanks for confirming and for the comments.

          I just uploaded a new patch. Thanks for reviewing.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 27s trunk passed
          +1 compile 0m 28s trunk passed with JDK v1.8.0_66
          +1 compile 0m 31s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 16s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 11s trunk passed
          +1 javadoc 0m 25s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 25s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 32s the patch passed
          +1 compile 0m 25s the patch passed with JDK v1.8.0_66
          +1 javac 0m 25s the patch passed
          +1 compile 0m 28s the patch passed with JDK v1.7.0_91
          +1 javac 0m 28s the patch passed
          -1 checkstyle 0m 16s Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 241, now 241).
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 17s the patch passed
          +1 javadoc 0m 19s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 24s the patch passed with JDK v1.7.0_91
          -1 unit 60m 26s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
          -1 unit 61m 36s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 17s Patch does not generate ASF License warnings.
          139m 34s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_91 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12782177/YARN-4497.02.patch
          JIRA Issue YARN-4497
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9341d4fe214f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8315582
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10272/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10272/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment -
          -1 overall

          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 27s trunk passed
          +1 compile 0m 28s trunk passed with JDK v1.8.0_66
          +1 compile 0m 31s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 16s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 11s trunk passed
          +1 javadoc 0m 25s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 25s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 32s the patch passed
          +1 compile 0m 25s the patch passed with JDK v1.8.0_66
          +1 javac 0m 25s the patch passed
          +1 compile 0m 28s the patch passed with JDK v1.7.0_91
          +1 javac 0m 28s the patch passed
          -1 checkstyle 0m 16s Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 241, now 241).
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 17s the patch passed
          +1 javadoc 0m 19s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 24s the patch passed with JDK v1.7.0_91
          -1 unit 60m 26s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
          -1 unit 61m 36s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 17s Patch does not generate ASF License warnings.
          139m 34s

          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_91 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization

          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12782177/YARN-4497.02.patch
          JIRA Issue YARN-4497
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9341d4fe214f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8315582
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/10272/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10272/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10272/console

          This message was automatically generated.
          bibinchundatt Bibin A Chundatt added a comment -

          Jun Gong, I have shared the YARN-4584 logs with Rohith Sharma K S offline.

          Could you also add test cases, as part of the patch, for the recovery scenarios with conf.setInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS, 2) and with more than 2 AM attempts?
          bibinchundatt Bibin A Chundatt added a comment -
          Also cover the case where the AM is killed by the RM, if possible.
          rohithsharma Rohith Sharma K S added a comment -

          "If attempt 1~28 are removed and attempt 29~31 has been saved to appstore successfully, there will be no NPE for RM recovery." I think we need to analyze the RM log further. Removing attempts will cause an NPE only when the RM continues to run after failing to operate (e.g. store/remove) on the RMStateStore. Is there any other case that might cause an NPE? Maybe we need to fix that too.

          The issue reproduces straightforwardly in the above case. Can you run the test written for this JIRA, without the fix, with a slight change like the one below (similar to YARN-3480)?

          memStore.removeApplicationAttemptInternal(am0.getApplicationAttemptId());
          memStore.removeApplicationAttemptInternal(am1.getApplicationAttemptId());
          

          Reason: while recovering, nextAttemptId is set to firstAttemptIdInStateStore only if submissionContext.getAttemptFailuresValidityInterval() > 0.

              if (submissionContext.getAttemptFailuresValidityInterval() > 0) {
                this.firstAttemptIdInStateStore = appState.getFirstAttemptId();
                this.nextAttemptId = firstAttemptIdInStateStore;
              }
          

          What if submissionContext.getAttemptFailuresValidityInterval() is not set? The attempt id will always start from 1 even though the attempt has been removed.
          Log before recovery:

          2016-01-14 11:16:52,775 INFO  [Thread-2] recovery.RMStateStore (MemoryRMStateStore.java:removeApplicationAttemptInternal(151)) - Removing state for attempt: appattempt_1452750396633_0001_000001
          2016-01-14 11:16:52,775 INFO  [Thread-2] recovery.RMStateStore (MemoryRMStateStore.java:removeApplicationAttemptInternal(151)) - Removing state for attempt: appattempt_1452750396633_0001_000002
          

          After recovery (I have removed attemptState.getState() from the log message to show that the recovered attempt has attempt id 1):

          2016-01-14 11:16:52,885 INFO  [Thread-2] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:recover(886)) - Recovering attempt: appattempt_1452750396633_0001_000001 with final state: 
          2016-01-14 11:16:52,885 ERROR [Thread-2] resourcemanager.ResourceManager (ResourceManager.java:serviceStart(599)) - Failed to load/recover state
          java.lang.NullPointerException
          	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:889)
          

          I will reopen YARN-4584 for a quick fix, i.e. checking the validity interval before removing attempts from the state store. We shall move the discussion there.
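          The attempt-id behaviour described above can be condensed into a toy model. All class and method names below are hypothetical, not the actual RMAppAttemptImpl code; it only mirrors the logic of the quoted recover() snippet.

          ```java
          // Toy model (hypothetical names) of the attempt-id selection during recovery.
          public class NextAttemptIdSketch {

              // Mirrors the quoted recover() snippet: the first attempt id found in the
              // state store is honoured only when the failures validity interval is set.
              static int nextAttemptId(long attemptFailuresValidityInterval,
                                       int firstAttemptIdInStateStore) {
                  if (attemptFailuresValidityInterval > 0) {
                      return firstAttemptIdInStateStore;
                  }
                  // Bug path: ids restart at 1 even though attempts 1..k were removed
                  // from the store, so recovery later looks up state that is gone (NPE).
                  return 1;
              }

              public static void main(String[] args) {
                  // Attempts 1 and 2 were removed; attempt 3 is the first stored one.
                  if (nextAttemptId(10000L, 3) != 3) throw new AssertionError();
                  if (nextAttemptId(0L, 3) != 1) throw new AssertionError();
                  System.out.println("validity interval set -> 3; unset -> 1 (bug path)");
              }
          }
          ```

          With the validity interval unset, the model restarts at attempt id 1, which matches the log below where appattempt_..._000001 is recovered despite having been removed.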

          hex108 Jun Gong added a comment -

          Rohith Sharma K S Thanks for the analysis. Yes, it is a bug.

          rohithsharma Rohith Sharma K S added a comment -

          +1 LGTM, I will wait a couple of days before committing this.
          Sunil G/Jian He, do you have any comments on the patch?

          jianhe Jian He added a comment -

          Looks good to me. One minor comment: I think setRecoveredFinalState and getRecoveredFinalState do not need to acquire the lock, as they happen sequentially.
          Also, this code can be formatted into single lines like below.

                if (preAttempt != null && preAttempt.getRecoveredFinalState() == null) {
                  preAttempt.setRecoveredFinalState(RMAppAttemptState.FAILED);
                }
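          For context, a check of this shape fits into a recovery loop along the following lines. This is a minimal sketch with hypothetical names, not the actual RMAppImpl#recover code: it only illustrates treating an attempt whose stored state is missing as FAILED instead of failing on a null ApplicationAttemptStateData.

          ```java
          import java.util.Map;
          import java.util.TreeMap;

          // Sketch (hypothetical names) of recovering attempts 1..maxAttemptId when
          // some attempt state entries are missing from the state store.
          public class RecoverSketch {

              // storedState maps attempt id -> persisted final state; an id is simply
              // absent when that attempt was never stored (or was removed).
              static Map<Integer, String> recover(Map<Integer, String> storedState,
                                                  int maxAttemptId) {
                  Map<Integer, String> recovered = new TreeMap<>();
                  for (int id = 1; id <= maxAttemptId; id++) {
                      String state = storedState.get(id);
                      if (state == null) {
                          // The attempt's data is gone: instead of asserting (and
                          // NPE-ing, as in this JIRA), record it as FAILED.
                          state = "FAILED";
                      }
                      recovered.put(id, state);
                  }
                  return recovered;
              }

              public static void main(String[] args) {
                  Map<Integer, String> store = new TreeMap<>();
                  store.put(1, "FAILED");   // attempt1 stored
                  store.put(3, "FINISHED"); // attempt3 stored; attempt2 was lost
                  System.out.println(recover(store, 3)); // attempt2 comes back FAILED
              }
          }
          ```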
          
          hex108 Jun Gong added a comment -

          Jian He, thanks for the review. I attached a new patch addressing the above comments.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 31s trunk passed
          +1 compile 0m 31s trunk passed with JDK v1.8.0_66
          +1 compile 0m 32s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 17s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 1m 13s trunk passed
          +1 javadoc 0m 25s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 30s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 33s the patch passed
          +1 compile 0m 25s the patch passed with JDK v1.8.0_66
          +1 javac 0m 25s the patch passed
          +1 compile 0m 27s the patch passed with JDK v1.7.0_91
          +1 javac 0m 27s the patch passed
          +1 checkstyle 0m 16s the patch passed
          +1 mvnsite 0m 33s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 26s the patch passed
          +1 javadoc 0m 20s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 24s the patch passed with JDK v1.7.0_91
          -1 unit 66m 21s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
          -1 unit 67m 21s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 18s Patch does not generate ASF License warnings.
          152m 58s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_91 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12783740/YARN-4497.03.patch
          JIRA Issue YARN-4497
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c2fdfe21927d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / f3427d3
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10364/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10364/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10364/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/10364/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10364/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10364/console

          This message was automatically generated.

          jianhe Jian He added a comment -

          +1, thanks

          rohithsharma Rohith Sharma K S added a comment -

          +1, committing shortly

          rohithsharma Rohith Sharma K S added a comment -

          Jun Gong, would you mind rebasing the patch? Since YARN-4584 went in first, there are a few conflicts.

          hex108 Jun Gong added a comment -

          Rohith Sharma K S, thanks. I just attached a rebased patch.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 20s trunk passed
          +1 compile 0m 26s trunk passed with JDK v1.8.0_66
          +1 compile 0m 32s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 16s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 11s trunk passed
          +1 javadoc 0m 21s trunk passed with JDK v1.8.0_66
          +1 javadoc 0m 28s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 23s the patch passed with JDK v1.8.0_66
          +1 javac 0m 23s the patch passed
          +1 compile 0m 28s the patch passed with JDK v1.7.0_91
          +1 javac 0m 28s the patch passed
          +1 checkstyle 0m 15s the patch passed
          +1 mvnsite 0m 33s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 1m 20s the patch passed
          +1 javadoc 0m 18s the patch passed with JDK v1.8.0_66
          +1 javadoc 0m 24s the patch passed with JDK v1.7.0_91
          -1 unit 65m 41s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
          -1 unit 68m 31s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 18s Patch does not generate ASF License warnings.
          152m 31s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_91 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12783797/YARN-4497.04.patch
          JIRA Issue YARN-4497
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 14143b93487e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / f5c8c85
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10367/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10367/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10367/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/10367/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10367/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10367/console

          This message was automatically generated.

          rohithsharma Rohith Sharma K S added a comment -

          Committed to trunk/branch-2. Thanks Jun Gong for your contribution, and thanks Jian He for the review!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9166 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9166/)
          YARN-4497. RM might fail to restart when recovering apps whose attempts (rohithsharmaks: rev d6258b33a7428a0725ead96bc43f4dd444c7c8f1)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
          • hadoop-yarn-project/CHANGES.txt
          hex108 Jun Gong added a comment -

          Rohith Sharma K S Thanks for the review, comments and commit! Thanks Jian He, Sunil G for the review and comments!


            People

            • Assignee: hex108 Jun Gong
            • Reporter: hex108 Jun Gong
            • Votes: 0
            • Watchers: 11
