  1. Hadoop YARN
  2. YARN-5318

TestRMAdminService#testRefreshNodesResourceWithFileSystemBasedConfigurationProvider fails intermittently.

    Details

    • Type: Test
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      org.junit.ComparisonFailure: expected:<<memory:[4096, vCores:4]>> but was:<<memory:[5120, vCores:5]>>
      at org.junit.Assert.assertEquals(Assert.java:115)
      at org.junit.Assert.assertEquals(Assert.java:144)
      at org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService.testRefreshNodesResourceWithFileSystemBasedConfigurationProvider(TestRMAdminService.java:238)

      https://builds.apache.org/job/PreCommit-YARN-Build/12204/testReport/org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager/TestAMRestart/testAMRestartNotLostContainerCompleteMsg/

          Activity

          hex108 Jun Gong added a comment -

          Thanks sandflee for reporting the issue. I also saw this problem before at https://builds.apache.org/job/PreCommit-YARN-Build/12174/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMAdminService/testRefreshNodesResourceWithFileSystemBasedConfigurationProvider/. I looked into it and found that the failure occurs because the RESOURCE_UPDATE event had not yet been processed when the assertion ran.

          hex108 Jun Gong added a comment -

          I reproduced the issue by adding the following change; with it, the test fails every time.

          diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          index f5361c8..0de10ea 100644
          --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
          @@ -99,6 +99,11 @@ public void run() {
                     }
                     Event event;
                     try {
          +            Thread.sleep(5000);
          +          } catch (InterruptedException e) {
          +            e.printStackTrace();
          +          }
          +          try {
                       event = eventQueue.take();
                     } catch(InterruptedException ie) {
                       if (!stopped) {
          

          With the patch, the test will pass.
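
          The reproduction above can be imitated outside Hadoop. The following is a minimal sketch, not the YARN code: DelayedDispatcherDemo and its methods are hypothetical names, and the values 5120 and 4096 are borrowed from the failure message in the description. A worker thread sleeps before taking each event, so a reader that checks the value immediately after posting the update still sees the old value.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an asynchronous dispatcher; not the Hadoop class. */
final class DelayedDispatcherDemo {
  private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
  private final AtomicInteger memoryMb = new AtomicInteger(5120);
  private final Thread worker;

  DelayedDispatcherDemo(long delayMillis) {
    worker = new Thread(() -> {
      try {
        while (true) {
          Thread.sleep(delayMillis);   // injected delay, as in the diff above
          memoryMb.set(queue.take());  // "process" the resource update
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit when stop() is called
      }
    });
    worker.setDaemon(true);
    worker.start();
  }

  void post(int newMemoryMb) {
    queue.add(newMemoryMb);
  }

  int memoryMb() {
    return memoryMb.get();
  }

  void stop() {
    worker.interrupt();
  }

  /** Posts an update and reads the value right away, like the flaky assertion. */
  static int readImmediatelyAfterPost(long delayMillis) {
    DelayedDispatcherDemo d = new DelayedDispatcherDemo(delayMillis);
    try {
      d.post(4096);
      return d.memoryMb(); // the worker is still sleeping, so the update is pending
    } finally {
      d.stop();
    }
  }

  public static void main(String[] args) {
    System.out.println(readImmediatelyAfterPost(2000));
  }
}
```

          With a delay of a couple of seconds the read returns the stale value, which mirrors the "expected 4096 but was 5120" failure. A test that waits for the event to be handled before asserting would not be affected by the delay.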

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 21s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 7s trunk passed
          +1 compile 0m 31s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 35s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 0m 57s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 17s the patch passed
          +1 mvnsite 0m 33s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 3s the patch passed
          +1 javadoc 0m 18s the patch passed
          -1 unit 44m 28s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          59m 14s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
            hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12816842/YARN-5318.01.patch
          JIRA Issue YARN-5318
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4ae03f814c47 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 30ee57c
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/12229/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/12229/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12229/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12229/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          varun_saxena Varun Saxena added a comment -

          +1 LGTM.
          Will commit it shortly.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #10069 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10069/)
          YARN-5318. Fix intermittent test failure of (varunsaxena: rev c04c5ec5018ebb14a86629e998bee3739014372e)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
          varun_saxena Varun Saxena added a comment -

          Committed to trunk, branch-2 and branch-2.8.
          Thanks Jun Gong for your contribution.

          hex108 Jun Gong added a comment -

          Thanks Varun Saxena for the review and commit!

          sandflee sandflee added a comment -

          I noticed that MockRM#drainEvents only guarantees that there are no events in the RM event queue; it does not guarantee that there are no events in the scheduler event queue. Is this expected? Jason Lowe Jian He

          sandflee sandflee added a comment -

          It seems there is no guarantee that an event has been processed completely even if the event queue is empty.

          varun_saxena Varun Saxena added a comment - - edited

          sandflee,

          It seems there is no guarantee that an event has been processed completely even if the event queue is empty.

          That's correct. Merely draining the main RM dispatcher queue won't lead to processing of the scheduler events sitting in the scheduler event queue.
          But for this specific test case, we check against the resource value in RMNodeImpl, and node events are processed by the main RM dispatcher, which will be drained. So scheduler event processing is not required in this case. The changed value we are checking against will be reflected as soon as the node event is processed.
          Correct me if I have missed something here.

          However, in some test cases draining the scheduler dispatcher can be useful. We can add support for it in whichever JIRA needs it. I guess drainEvents was added to MockRM because only main-dispatcher draining was needed at the time, and since we already had DrainDispatcher, a subclass of AsyncDispatcher that is used for draining, the addition was trivial.
          For the scheduler dispatcher, though, AsyncDispatcher is not used, so some additional code will have to be added.
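
          The distinction between the two queues can be illustrated with a small single-threaded sketch. This is not the YARN implementation: TwoStageDrainSketch, its fields, and its methods are hypothetical, and the queues are drained by hand to keep the example deterministic. Handling a main-queue event updates node-level state and enqueues a follow-up scheduler event, so draining only the main queue is enough for a check on node-level state but not for a check on scheduler state.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical two-stage event pipeline; names are illustrative, not YARN classes. */
final class TwoStageDrainSketch {
  private final Queue<Integer> mainQueue = new ArrayDeque<>();
  private final Queue<Integer> schedulerQueue = new ArrayDeque<>();

  int nodeMemoryMb = 5120;       // state updated by the main-queue handler
  int schedulerMemoryMb = 5120;  // state updated by the scheduler-queue handler

  void post(int newMemoryMb) {
    mainQueue.add(newMemoryMb);
  }

  /** Handles every main-queue event; each one also enqueues a scheduler event. */
  void drainMain() {
    Integer event;
    while ((event = mainQueue.poll()) != null) {
      nodeMemoryMb = event;
      schedulerQueue.add(event);
    }
  }

  /** Handles every scheduler-queue event. */
  void drainScheduler() {
    Integer event;
    while ((event = schedulerQueue.poll()) != null) {
      schedulerMemoryMb = event;
    }
  }

  public static void main(String[] args) {
    TwoStageDrainSketch s = new TwoStageDrainSketch();
    s.post(4096);
    s.drainMain();
    // Node state is current here; scheduler state is still stale.
    System.out.println(s.nodeMemoryMb + " " + s.schedulerMemoryMb);
    s.drainScheduler();
    System.out.println(s.nodeMemoryMb + " " + s.schedulerMemoryMb);
  }
}
```

          In this sketch an assertion on nodeMemoryMb is safe right after drainMain(), while an assertion on schedulerMemoryMb also needs drainScheduler().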

          sandflee sandflee added a comment -

          Thanks Varun Saxena. Yes, for this issue scheduler processing is not required, but there is a chance that the RESOURCE_UPDATE event is taken by the event handler and, just before it is processed, drainEvents is called and returns, so the check fails. This can happen in theory; I am not sure whether it happens in practice. Since this is only used for tests, maybe we can ignore this race condition.

          Not related to this patch: I was thinking about a more general way to handle test failures. Some failures are caused by scheduler events being processed asynchronously. If we had a way to drain all RM events and scheduler events, it might help reduce the chance of test failures.
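
          One way to close the window described above (event already taken from the queue, handler not yet finished) is to track in-flight events rather than queue emptiness. The following is a sketch under that assumption, not how DrainDispatcher or MockRM#drainEvents is implemented; InFlightAwareDispatcher and all of its members are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical dispatcher that counts an event as done only after its handler returns. */
final class InFlightAwareDispatcher {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final AtomicInteger pending = new AtomicInteger();
  private final Thread worker;

  InFlightAwareDispatcher() {
    worker = new Thread(() -> {
      try {
        while (true) {
          Runnable event = queue.take(); // the queue may be empty from here on...
          try {
            event.run();                 // ...while the event is still being handled
          } finally {
            pending.decrementAndGet();   // counted as done only after handling
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit when stop() is called
      }
    });
    worker.setDaemon(true);
    worker.start();
  }

  void post(Runnable event) {
    pending.incrementAndGet(); // count first so the total never dips below zero
    queue.add(event);
  }

  boolean drained() {
    return pending.get() == 0;
  }

  /** Polls until every posted event has been fully handled or the timeout expires. */
  void awaitDrained(long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!drained()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("events not drained within timeout");
      }
      try {
        Thread.sleep(5);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for drain", e);
      }
    }
  }

  void stop() {
    worker.interrupt();
  }

  /** Posts a slow update, waits for the drain, then reads the value. */
  static int demo() {
    InFlightAwareDispatcher d = new InFlightAwareDispatcher();
    AtomicInteger memoryMb = new AtomicInteger(5120);
    try {
      d.post(() -> {
        try {
          Thread.sleep(100); // slow handler widens the take-then-process window
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
        memoryMb.set(4096);
      });
      d.awaitDrained(5000);
      return memoryMb.get();
    } finally {
      d.stop();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

          Because the counter is decremented only after the handler returns, awaitDrained cannot return while the update is still in progress, even though the queue itself is already empty at that point.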

          varun_saxena Varun Saxena added a comment -

          Not related to this patch: I was thinking about a more general way to handle test failures. Some failures are caused by scheduler events being processed asynchronously. If we had a way to drain all RM events and scheduler events, it might help reduce the chance of test failures.

          Agreed. It would be useful to have a mechanism to drain the scheduler event queue as well. It would be better than having fixed-duration sleeps in tests.
          There will be several tests where we check the state of the scheduler after dispatching an event that in turn generates a scheduler event.
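
          Where no drain hook exists, a common alternative to a fixed sleep is to poll for the expected state with a timeout. This is a different technique from the drain mechanism discussed above, shown here only as a sketch; WaitForSketch and its methods are hypothetical names and are not part of this issue's patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for tests; not a Hadoop utility. */
final class WaitForSketch {
  /** Returns true once the condition holds, or false if the timeout expires first. */
  static boolean waitFor(BooleanSupplier condition, long checkEveryMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(checkEveryMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  /** Simulates an asynchronous update and waits for it instead of sleeping blindly. */
  static int demo() {
    AtomicInteger memoryMb = new AtomicInteger(5120);
    Thread updater = new Thread(() -> {
      try {
        Thread.sleep(100); // stands in for asynchronous event processing
        memoryMb.set(4096);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    updater.setDaemon(true);
    updater.start();
    waitFor(() -> memoryMb.get() == 4096, 10, 5000);
    return memoryMb.get();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

          The wait ends as soon as the condition holds, so the test is not slowed by a worst-case sleep and does not fail merely because a fixed sleep was too short.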


            People

            • Assignee: hex108 Jun Gong
            • Reporter: sandflee sandflee
            • Votes: 0
            • Watchers: 6
