
TestRM#testApplicationKillAtAcceptedState fails rarely due to race condition

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0, 3.0.0-alpha4
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: test
    • Labels:
      None

      Description

      We've seen (very rarely) a test failure in TestRM#testApplicationKillAtAcceptedState:

      java.lang.AssertionError: expected:<1> but was:<0>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.junit.Assert.assertEquals(Assert.java:555)
      	at org.junit.Assert.assertEquals(Assert.java:542)
      	at org.apache.hadoop.yarn.server.resourcemanager.TestRM.testApplicationKillAtAcceptedState(TestRM.java:645)
      
      Attachments

      1. YARN-6359.003.patch
        2 kB
        Robert Kanter
      2. YARN-6359.002.patch
        2 kB
        Robert Kanter
      3. YARN-6359.001.patch
        1 kB
        Robert Kanter

        Activity

        vinodkv Vinod Kumar Vavilapalli added a comment -

        2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.

        rkanter Robert Kanter added a comment -

        Thanks Jason Lowe!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11481 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11481/)
        YARN-6359. TestRM#testApplicationKillAtAcceptedState fails rarely due to (jlowe: rev fdf8f8ebca9987a1956ce464fe33ea6a3ad28d72)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
        jlowe Jason Lowe added a comment -

        Thanks to Robert for the contribution and to Karthik for additional review! I committed this to trunk, branch-2, and branch-2.8.

        kasha Karthik Kambatla added a comment -

        +1

        jlowe Jason Lowe added a comment -

        +1 lgtm. Will commit this tomorrow if there are no objections.

        rkanter Robert Kanter added a comment -

        Test failure unrelated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 14m 5s trunk passed
        +1 compile 0m 34s trunk passed
        +1 checkstyle 0m 26s trunk passed
        +1 mvnsite 0m 35s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 1m 5s trunk passed
        +1 javadoc 0m 22s trunk passed
        +1 mvninstall 0m 33s the patch passed
        +1 compile 0m 34s the patch passed
        +1 javac 0m 34s the patch passed
        +1 checkstyle 0m 23s the patch passed
        +1 mvnsite 0m 33s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 9s the patch passed
        +1 javadoc 0m 20s the patch passed
        -1 unit 40m 18s hadoop-yarn-server-resourcemanager in the patch failed.
        +1 asflicense 0m 18s The patch does not generate ASF License warnings.
        Total runtime: 63m 14s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-6359
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12860725/YARN-6359.003.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 980e856a8a17 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / cd014d5
        Default Java 1.8.0_121
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-YARN-Build/15398/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15398/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/15398/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        rkanter Robert Kanter added a comment -

        The 003 patch:

        • Reduces the interval from 500ms to 100ms
        • Changes the timeout from 20s to 10s, which I think is still pretty conservative just in case.
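The effect of the two changes can be worked out with simple arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, the 500 ms / 20 s values are the previous settings implied by the comment, and the 60-second figure is the whole-test timeout mentioned elsewhere in this thread.

```java
public class PollBudgetSketch {

  // Approximate number of condition checks before the wait gives up.
  public static long approxChecks(long intervalMs, long timeoutMs) {
    return timeoutMs / intervalMs;
  }

  public static void main(String[] args) {
    long testTimeoutMs = 60_000; // whole-test timeout discussed in the thread

    // Previous settings: 500 ms interval, 20 s timeout.
    System.out.println("before: ~" + approxChecks(500, 20_000) + " checks");
    // 003 patch: 100 ms interval, 10 s timeout.
    System.out.println("after:  ~" + approxChecks(100, 10_000) + " checks");
    // The wait must give up before the test-level timeout for its own
    // timeout to be the one that fires.
    System.out.println("fits in test timeout: " + (10_000 < testTimeoutMs));
  }
}
```

With a shorter interval the test also notices the updated metric sooner: the extra delay after the condition becomes true is bounded by roughly one interval, so about 100 ms instead of about 500 ms.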
        kasha Karthik Kambatla added a comment - - edited

        Nit: I think we use 100 ms retry in other places; maybe we should use the same?

        rkanter Robert Kanter added a comment -

        Test failure unrelated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 14s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 12m 49s trunk passed
        +1 compile 0m 31s trunk passed
        +1 checkstyle 0m 25s trunk passed
        +1 mvnsite 0m 33s trunk passed
        +1 mvneclipse 0m 15s trunk passed
        +1 findbugs 1m 0s trunk passed
        +1 javadoc 0m 21s trunk passed
        +1 mvninstall 0m 29s the patch passed
        +1 compile 0m 29s the patch passed
        +1 javac 0m 29s the patch passed
        +1 checkstyle 0m 22s the patch passed
        +1 mvnsite 0m 31s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 3s the patch passed
        +1 javadoc 0m 18s the patch passed
        -1 unit 39m 49s hadoop-yarn-server-resourcemanager in the patch failed.
        +1 asflicense 0m 20s The patch does not generate ASF License warnings.
        Total runtime: 61m 4s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-6359
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12859390/YARN-6359.002.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 6c35ddd6a89f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / e1a9980
        Default Java 1.8.0_121
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-YARN-Build/15323/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15323/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/15323/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        rkanter Robert Kanter added a comment -

        Thanks for the review.

        Oops, I totally missed that the test has a 60-second timeout. I had thought there was a waitFor somewhere, but couldn't find it for some reason, so I wrote the loop by hand. I didn't think we needed to check the timeout after the loop because we check the metric afterwards, and that assertion would have failed anyway if the value was wrong.

        In any case, the 002 patch addresses the timeout and the waitFor.

        jlowe Jason Lowe added a comment -

        Thanks for the report and patch!

        The timeout in the loop is 80 seconds, but there's a 60-second timeout for the entire test, which seems weird. Is that why the loop doesn't check whether the timeout occurred after it completes? It'd be nice to use GenericTestUtils#waitFor to have it check for timeouts, print the stack trace if it does time out, etc.
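To make the suggestion concrete, here is a minimal, self-contained sketch of a waitFor-style helper. It is a hypothetical stand-in, not the Hadoop GenericTestUtils class: the name, parameters, and the choice to fail with an AssertionError are assumptions made for illustration. It shows the two behaviors asked for above: the wait fails loudly when the deadline passes, and the failure message carries a dump of all thread stacks.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

  // Hypothetical helper: re-checks the condition every checkEveryMillis
  // and fails with an AssertionError once waitForMillis has elapsed.
  public static void waitFor(BooleanSupplier check, long checkEveryMillis,
      long waitForMillis) {
    long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitForMillis);
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new AssertionError("Timed out after " + waitForMillis
            + " ms waiting for condition\n" + dumpThreads());
      }
      try {
        Thread.sleep(checkEveryMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("Interrupted while waiting for condition", e);
      }
    }
  }

  // Collects the stack of every live thread to help diagnose a timeout.
  public static String dumpThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      sb.append(entry.getKey().getName()).append('\n');
      for (StackTraceElement frame : entry.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    // Condition that becomes true after roughly 150 ms.
    waitFor(() -> System.nanoTime() - start
        > TimeUnit.MILLISECONDS.toNanos(150), 20, 2_000);
    System.out.println("condition met");

    try {
      waitFor(() -> false, 20, 100); // can never succeed
    } catch (AssertionError expected) {
      System.out.println("timed out as expected");
    }
  }
}
```

Whatever helper is used, its timeout should be shorter than the timeout on the test itself. An 80-second wait inside a test limited to 60 seconds can never report its own timeout, because the test runner aborts the test first.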

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 13m 0s trunk passed
        +1 compile 0m 33s trunk passed
        +1 checkstyle 0m 26s trunk passed
        +1 mvnsite 0m 34s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 1m 1s trunk passed
        +1 javadoc 0m 24s trunk passed
        +1 mvninstall 0m 31s the patch passed
        +1 compile 0m 29s the patch passed
        +1 javac 0m 29s the patch passed
        +1 checkstyle 0m 23s the patch passed
        +1 mvnsite 0m 32s the patch passed
        +1 mvneclipse 0m 12s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 7s the patch passed
        +1 javadoc 0m 21s the patch passed
        +1 unit 39m 30s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 17s The patch does not generate ASF License warnings.
        Total runtime: 61m 9s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-6359
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12859196/YARN-6359.001.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux d6a187209ff7 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / c04fb35
        Default Java 1.8.0_121
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15306/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/15306/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        rkanter Robert Kanter added a comment -

        Despite running it over 1000 times, I wasn't able to reproduce this in my environment. However, it seems likely that the problem is due to a race condition between when the metric for killed apps is checked and when that metric is updated. The 001 patch fixes this by adding some looping code, with a timeout, similar to what MockRM#waitForState does. I've verified that this helps solve the problem by (temporarily) adding a sleep to the metrics updating code.
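The race and the looping fix can be illustrated in isolation. The sketch below is not the patch itself: the class, the pollForValue helper, and the AtomicInteger standing in for the killed-apps metric are all hypothetical, and the 200 ms sleep simulates a delayed metric update in the same spirit as the temporary sleep described above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class MetricPollSketch {

  // Re-reads the metric until it equals the expected value or the timeout
  // passes. Returns the last value read so the caller can still assert on it.
  public static int pollForValue(IntSupplier metric, int expected,
      long intervalMs, long timeoutMs) {
    long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    int value = metric.getAsInt();
    while (value != expected && System.nanoTime() - deadline < 0) {
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break; // stop waiting and report the last value read
      }
      value = metric.getAsInt();
    }
    return value;
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in for the killed-apps metric, updated by another thread.
    AtomicInteger appsKilled = new AtomicInteger(0);

    Thread updater = new Thread(() -> {
      try {
        Thread.sleep(200); // simulated delay before the metric is updated
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      appsKilled.incrementAndGet();
    });
    updater.start();

    // A single immediate read can observe the old value; this is the
    // window in which an assertEquals(1, ...) would fail.
    int immediate = appsKilled.get();

    // Polling tolerates the delay.
    int polled = pollForValue(appsKilled::get, 1, 50, 5_000);
    updater.join();

    System.out.println("immediate read: " + immediate);
    System.out.println("after polling:  " + polled);
  }
}
```

Returning the last value instead of throwing keeps the final assertion in the test as the single point of failure, which matches the reasoning that the metric check itself would fail if the value never arrived.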


          People

          • Assignee:
            rkanter Robert Kanter
            Reporter:
            rkanter Robert Kanter
          • Votes:
            0
            Watchers:
            6
