Hadoop YARN / YARN-5317

testAMRestartNotLostContainerCompleteMsg may fail

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      java.lang.Exception: test timed out after 30000 milliseconds
      at java.lang.Thread.sleep(Native Method)
      at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:261)
      at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:225)
      at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:207)
      at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:746)
      at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testAMRestartNotLostContainerCompleteMsg(TestAMRestart.java:841)

      see https://builds.apache.org/job/PreCommit-YARN-Build/12204/testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testAMRestartNotLostContainerCompleteMsg/

      1. YARN-5317.01.patch
        1 kB
        sandflee
      2. YARN-5317.02.patch
        3 kB
        sandflee

        Activity

        sandflee sandflee added a comment -
            // launch the new AM
            RMAppAttempt attempt2 = app1.getCurrentAppAttempt();
            nm1.nodeHeartbeat(true);
            MockAM am2 = rm1.sendAMLaunched(attempt2.getAppAttemptId());
        

        Before calling nodeHeartbeat we should wait until the appAttempt is in the SCHEDULED state. cc Jian He
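
The ordering problem described here can be reproduced in a small toy model. The sketch below is not the real MockRM API: `AttemptStateRace`, `ToyAttempt`, `scheduleAsync`, and this `waitForState` are hypothetical stand-ins, and the background thread is an assumption standing in for the asynchronous state transition. It only illustrates why a heartbeat sent before the attempt reaches SCHEDULED has no effect, and how polling for the state first avoids that.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: these names are hypothetical and are not the real MockRM API.
public class AttemptStateRace {
  enum AttemptState { NEW, SCHEDULED, ALLOCATED }

  static class ToyAttempt {
    private final AtomicReference<AttemptState> state =
        new AtomicReference<>(AttemptState.NEW);

    AttemptState getState() {
      return state.get();
    }

    // A background thread stands in for an asynchronous event dispatcher.
    void scheduleAsync(long delayMs) {
      Thread dispatcher = new Thread(() -> {
        try {
          Thread.sleep(delayMs);
          state.compareAndSet(AttemptState.NEW, AttemptState.SCHEDULED);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      dispatcher.setDaemon(true);
      dispatcher.start();
    }

    // The heartbeat allocates the AM container only if the attempt is SCHEDULED.
    boolean nodeHeartbeat() {
      return state.compareAndSet(AttemptState.SCHEDULED, AttemptState.ALLOCATED);
    }
  }

  // Polls until the attempt reaches the wanted state or the timeout expires.
  static boolean waitForState(ToyAttempt attempt, AttemptState wanted, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (attempt.getState() != wanted) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Racy order: heartbeat is sent while the attempt is most likely still NEW.
    ToyAttempt racy = new ToyAttempt();
    racy.scheduleAsync(200);
    System.out.println("early heartbeat allocated: " + racy.nodeHeartbeat());

    // Safe order: wait for SCHEDULED first, then send the heartbeat.
    ToyAttempt safe = new ToyAttempt();
    safe.scheduleAsync(200);
    boolean scheduled = waitForState(safe, AttemptState.SCHEDULED, 5000);
    System.out.println("scheduled: " + scheduled + ", allocated: " + safe.nodeHeartbeat());
  }
}
```

In this model, an early heartbeat leaves the attempt unallocated, so anything that later waits for the launch would keep waiting, which resembles the timeout in the stack trace above.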

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 29s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
        +1    mvninstall  6m 15s   trunk passed
        +1    compile     0m 30s   trunk passed
        +1    checkstyle  0m 20s   trunk passed
        +1    mvnsite     0m 34s   trunk passed
        +1    mvneclipse  0m 13s   trunk passed
        +1    findbugs    0m 52s   trunk passed
        +1    javadoc     0m 20s   trunk passed
        +1    mvninstall  0m 28s   the patch passed
        +1    compile     0m 26s   the patch passed
        +1    javac       0m 26s   the patch passed
        +1    checkstyle  0m 17s   the patch passed
        +1    mvnsite     0m 31s   the patch passed
        +1    mvneclipse  0m 11s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    0m 57s   the patch passed
        +1    javadoc     0m 17s   the patch passed
        +1    unit        35m 44s  hadoop-yarn-server-resourcemanager in the patch passed.
        +1    asflicense  0m 15s   The patch does not generate ASF License warnings.
                          49m 17s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12816993/YARN-5317.01.patch
        JIRA Issue YARN-5317
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 23dd8c83fc88 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 9bdb5be
        Default Java 1.8.0_91
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12255/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12255/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        Thanks for the patch, sandflee!

        Also I noticed the same code snippet appears in testAMRestartWithExistingContainers, so I'm wondering if it could have the same issue. There are also other places that don't have the exact same code snippet but are waiting for an app state of ACCEPTED then doing things relative to the app attempt, so I'm thinking they too could be susceptible to the app-is-accepted-but-attempt-not-scheduled race.

        sunilg Sunil G added a comment -

        Yes. We have faced this problem in a few HA tests and in the priority test cases.
        Ideally, it's better to ensure that the attempt is in the SCHEDULED state before sending the node heartbeat. MockRM#launchAM wraps this whole sequence, so it may be the better option. Sample code:

            MockAM am2 = MockRM.launchAM(app2, rm, nm1);
            am2.registerAppAttempt();
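
The thread does not show what launchAM does internally, so the following is only a sketch of the idea of wrapping the sequence in one helper so that callers cannot send the heartbeat too early. Everything in it (`LaunchHelperSketch`, `ToyAttempt`, `awaitScheduled`, and this `launchAM` signature) is hypothetical and does not mirror MockRM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the class and method names are illustrative, not MockRM's.
public class LaunchHelperSketch {
  static class ToyAttempt {
    private final CountDownLatch scheduled = new CountDownLatch(1);
    private volatile boolean allocated;
    private volatile boolean launched;

    // The attempt becomes schedulable asynchronously, after a delay.
    void scheduleAsync(long delayMs) {
      Thread dispatcher = new Thread(() -> {
        try {
          Thread.sleep(delayMs);
          scheduled.countDown();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      dispatcher.setDaemon(true);
      dispatcher.start();
    }

    // Blocks until the attempt is scheduled or the timeout expires.
    boolean awaitScheduled(long timeoutMs) {
      try {
        return scheduled.await(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }

    // A heartbeat allocates the container only once the attempt is scheduled.
    void nodeHeartbeat() {
      if (scheduled.getCount() == 0) {
        allocated = true;
      }
    }

    // Launching succeeds only after a container has been allocated.
    boolean sendLaunched() {
      if (allocated) {
        launched = true;
      }
      return launched;
    }
  }

  // Bundles wait -> heartbeat -> launch so the steps always run in this order.
  static boolean launchAM(ToyAttempt attempt, long timeoutMs) {
    if (!attempt.awaitScheduled(timeoutMs)) {
      return false;
    }
    attempt.nodeHeartbeat();
    return attempt.sendLaunched();
  }

  public static void main(String[] args) {
    ToyAttempt attempt = new ToyAttempt();
    attempt.scheduleAsync(200);
    System.out.println("launched: " + launchAM(attempt, 5000));
  }
}
```

The design point is that the wait lives inside the helper, so individual tests do not each need to remember it.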
        
        sandflee sandflee added a comment -

        sendAMLaunched is mainly used in two scenarios:
        1. Submit the app, then send AM launched.

            RMApp app1 = rm.submitApp(testAlloc);
            nm1.nodeHeartbeat(true);
            RMAppAttempt attempt1 = app1.getCurrentAppAttempt();
            MockAM am1 = rm.sendAMLaunched(attempt1.getAppAttemptId());
        

        This is OK, because after submitApp the app becomes ACCEPTED and the appAttempt becomes SCHEDULED.
        2. The AM container completes, then send AM launched. Here the test should explicitly wait for the appAttempt to become SCHEDULED before sending the node heartbeat (or use MockRM#launchAM). This seems to happen only in TestAMRestart.

        Also I noticed the same code snippet appears in testAMRestartWithExistingContainers, so I'm wondering if it could have the same issue.

        Yes, it has the same issue, but that test sleeps 3s after the AM container completes, which is enough for the appAttempt to become SCHEDULED. Still, it's reasonable to add this check.

        sandflee sandflee added a comment -

        Thanks Sunil G, could you share which tests failed because of this?

        sunilg Sunil G added a comment -

        Thanks sandflee for the updated patch.
        Earlier we had a few failures along similar lines, so we used launchAM and solved those problems. At present, I am not seeing this same issue in any other places. The approach in the new patch looks fine, and thanks for giving a detailed comment as well.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 17s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 2 new or modified test files.
        +1    mvninstall  7m 21s   trunk passed
        +1    compile     0m 34s   trunk passed
        +1    checkstyle  0m 22s   trunk passed
        +1    mvnsite     0m 41s   trunk passed
        +1    mvneclipse  0m 20s   trunk passed
        +1    findbugs    1m 1s    trunk passed
        +1    javadoc     0m 20s   trunk passed
        +1    mvninstall  0m 34s   the patch passed
        +1    compile     0m 33s   the patch passed
        +1    javac       0m 33s   the patch passed
        -1    checkstyle  0m 19s   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 77 unchanged - 1 fixed = 79 total (was 78)
        +1    mvnsite     0m 37s   the patch passed
        +1    mvneclipse  0m 15s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 5s    the patch passed
        +1    javadoc     0m 19s   the patch passed
        -1    unit        33m 36s  hadoop-yarn-server-resourcemanager in the patch failed.
        +1    asflicense  0m 15s   The patch does not generate ASF License warnings.
                          49m 7s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12817463/YARN-5317.02.patch
        JIRA Issue YARN-5317
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c812955b271a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 7705812
        Default Java 1.8.0_91
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12288/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/12288/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/12288/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12288/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12288/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        +1 lgtm. Filed YARN-5362 for the unrelated TestRMRestart failure.

        Committing this.

        jlowe Jason Lowe added a comment -

        Thanks to sandflee for the contribution and to Sunil G for additional review! I committed this to trunk, branch-2, and branch-2.8.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #10080 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10080/)
        YARN-5317. testAMRestartNotLostContainerCompleteMsg may fail. (jlowe: rev 10b704c5946afe7bfd4a6be40192ce7ca745d817)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
        sandflee sandflee added a comment -

        Thanks Sunil and Jason for the review and commit!


          People

          • Assignee: sandflee
          • Reporter: sandflee
          • Votes: 0
          • Watchers: 6
