Race in TestRMRestart#testQueueMetricsOnRMRestart

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails intermittently because of a race condition.
      The test validates that metrics are incremented, but it does not wait for all transitions to finish before checking the values.
      It also resets the metrics after kicking off recovery of the second RM. The increments produced by recovery race with this reset, which causes the test to fail intermittently.
      The test needs to wait for the right transitions before asserting.
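
      The "wait for the right transitions" idea can be illustrated with a small, self-contained sketch. This is not the code from YARN-2096.patch; the class WaitBeforeAssertSketch, the waitFor helper, and the appsSubmitted counter are hypothetical stand-ins for the test's metrics and for an asynchronous state transition. It only shows the general pattern of polling for a condition with a timeout before reading the value.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: wait for an asynchronous transition before reading a metric.
final class WaitBeforeAssertSketch {

    /** Polls until the condition holds or the timeout elapses; returns whether it held. */
    static boolean waitFor(BooleanSupplier condition, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the expected state
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    /** Simulates a delayed transition that increments a metric, then reads it safely. */
    static int incrementThenRead() {
        AtomicInteger appsSubmitted = new AtomicInteger(0); // stand-in for a queue metric
        Thread dispatcher = new Thread(() -> {
            try {
                Thread.sleep(50); // the transition finishes some time after it is triggered
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            appsSubmitted.incrementAndGet();
        });
        dispatcher.start();

        // Reading appsSubmitted here, without waiting, would be the race described above.
        if (!waitFor(() -> appsSubmitted.get() == 1, 10, 5000)) {
            throw new IllegalStateException("transition did not complete in time");
        }
        return appsSubmitted.get();
    }

    public static void main(String[] args) {
        System.out.println("appsSubmitted = " + incrementThenRead());
    }
}
```

      Polling with a timeout keeps the check bounded: if the transition never happens, the test fails with a clear message instead of hanging or asserting on a stale value.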

      1. YARN-2096.patch
        2 kB
        Anubhav Dhoot

          Activity

          ozawa Tsuyoshi Ozawa added a comment -

          Good news: TestRMRestart with Anubhav's patch works well. After running the tests hundreds of times, there were no failures. Good job.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1782 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1782/)
          YARN-2096. Race in TestRMRestart#testQueueMetricsOnRMRestart. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597223)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1756 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1756/)
          YARN-2096. Race in TestRMRestart#testQueueMetricsOnRMRestart. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597223)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #564 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/564/)
          YARN-2096. Race in TestRMRestart#testQueueMetricsOnRMRestart. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597223)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #5609 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5609/)
          YARN-2096. Race in TestRMRestart#testQueueMetricsOnRMRestart. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597223)

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          kkambatl Karthik Kambatla (Inactive) added a comment -

          TestFairScheduler failure is unrelated.

          Thanks Anubhav. Just committed this to trunk and branch-2.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12646464/YARN-2096.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3814//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3814//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12646464/YARN-2096.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3795//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3795//console

          This message is automatically generated.

          ozawa Tsuyoshi Ozawa added a comment -

          The change looks good to me too (non-binding).

          kkambatl Karthik Kambatla (Inactive) added a comment - - edited

          Looks good to me. +1 pending Jenkins.

          ozawa Tsuyoshi Ozawa added a comment -

          Thank you for taking this JIRA, Anubhav. I also hit this problem when reviewing YARN-1365. I'll run the tests repeatedly with your patch.

          adhoot Anubhav Dhoot added a comment -

          Fixed two race conditions by:
          1) waiting for the appropriate transitions before checking metrics, and
          2) resetting metrics before the events are triggered.
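
          The second point, resetting before the events are triggered, comes down to ordering. A minimal sketch of that ordering follows. It is not taken from the patch; ResetBeforeTriggerSketch, recoverAndCount, and the appsSubmitted counter are hypothetical names used only for illustration, and a CountDownLatch stands in for whatever signal tells the test that recovery has finished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: reset a metric before starting the work that increments it.
final class ResetBeforeTriggerSketch {

    /** Returns the metric value after one simulated recovery, whatever the stale value was. */
    static int recoverAndCount(int staleValue) {
        AtomicInteger appsSubmitted = new AtomicInteger(staleValue); // left over from an earlier phase
        CountDownLatch recovered = new CountDownLatch(1);

        // Step 1: reset while nothing is running, so no increment can be lost.
        appsSubmitted.set(0);

        // Step 2: only now trigger the asynchronous work that increments the metric.
        Thread recovery = new Thread(() -> {
            appsSubmitted.incrementAndGet();
            recovered.countDown();
        });
        recovery.start();

        // Step 3: wait for the work to finish before reading the value.
        try {
            if (!recovered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("recovery did not finish in time");
            }
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return appsSubmitted.get();
    }

    public static void main(String[] args) {
        System.out.println("appsSubmitted = " + recoverAndCount(7));
    }
}
```

          If steps 1 and 2 were swapped, the reset could land after the increment and wipe it out, which is the kind of interleaving the description attributes to the original test.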


            People

            • Assignee:
              adhoot Anubhav Dhoot
              Reporter:
              adhoot Anubhav Dhoot
            • Votes:
              0
              Watchers:
              5
