Random Failure TestCapacityScheduler#testCSQueueBlocked

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: test
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Random testcase failure in trunk for org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testCSQueueBlocked

      https://builds.apache.org/job/PreCommit-YARN-Build/12694/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacityScheduler/testCSQueueBlocked/

      java.lang.AssertionError: B Used Resource should be 12 GB expected:<12288> but was:<11264>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.junit.Assert.assertEquals(Assert.java:555)
      	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testCSQueueBlocked(TestCapacityScheduler.java:3667)
      
      1. Failure-TestCapacityScheduler-output.txt
        320 kB
        Bibin A Chundatt
      2. Sucess-TestCapacityScheduler-output.txt
        370 kB
        Bibin A Chundatt
      3. YARN-5491.0001.patch
        2 kB
        Bibin A Chundatt

        Activity

        bibinchundatt Bibin A Chundatt added a comment -

        Attaching logs for same... Will try to check what is causing the failure.

        bibinchundatt Bibin A Chundatt added a comment - - edited

        The failure appears to be caused by the sequence below.

        Testcase states:

        1. Queue A holds 2048 MB (1 container) and Queue B holds 13 GB (13 containers).
        2. A 2 GB request (1 container) is submitted to Queue A and a 1 GB request (1 container) is added to Queue B.
        3. An ExpireEvent is sent for 2 containers of app 2 in Queue B.
        4. Intermittently, the container waiting on Queue B is not allocated, most likely because of the limit on how many ANY requests can be served.

        I ran the test locally with the patch about 60 times and it succeeded.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote  Subsystem   Runtime   Comment
        0     reexec      0m 20s    Docker mode activated.
        +1    @author     0m 0s     The patch does not contain any @author tags.
        +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall  7m 0s     trunk passed
        +1    compile     0m 33s    trunk passed
        +1    checkstyle  0m 22s    trunk passed
        +1    mvnsite     0m 39s    trunk passed
        +1    mvneclipse  0m 17s    trunk passed
        +1    findbugs    0m 56s    trunk passed
        +1    javadoc     0m 20s    trunk passed
        +1    mvninstall  0m 31s    the patch passed
        +1    compile     0m 29s    the patch passed
        +1    javac       0m 29s    the patch passed
        +1    checkstyle  0m 20s    the patch passed
        +1    mvnsite     0m 35s    the patch passed
        +1    mvneclipse  0m 14s    the patch passed
        +1    whitespace  0m 0s     The patch has no whitespace issues.
        +1    findbugs    1m 0s     the patch passed
        +1    javadoc     0m 19s    the patch passed
        +1    unit        37m 59s   hadoop-yarn-server-resourcemanager in the patch passed.
        +1    asflicense  0m 17s    The patch does not generate ASF License warnings.
                          52m 50s



        Subsystem       Report/Notes
        Docker Image    yetus/hadoop:9560f25
        JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12823428/YARN-5491.0001.patch
        JIRA Issue      YARN-5491
        Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname           Linux 187b03a002e6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool      maven
        Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision    trunk / b5af9be
        Default Java    1.8.0_101
        findbugs        v3.0.0
        Test Results    https://builds.apache.org/job/PreCommit-YARN-Build/12754/testReport/
        modules         C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output  https://builds.apache.org/job/PreCommit-YARN-Build/12754/console
        Powered by      Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        bibinchundatt Bibin A Chundatt added a comment -

        Varun Saxena/Rohit Sharma
        Could you help review the attached patch?

        varun_saxena Varun Saxena added a comment -

        Sure, will have a look.

        varun_saxena Varun Saxena added a comment -

        The changes look fine to me. IMO, draining events is not really required for this test to pass; an extra schedule after the first container expiry should suffice.
        The issue arises when the containers expired in the test (i.e. the containers ending in 10 and 11) were both assigned to the same node and the other node has no resources left (due to other container allocations).
        A single schedule then places only one of the stuck containers (as it is off-switch).

        So calling schedule after each expiry should fix the issue.

        Will commit it later today.
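
For readers who want to see the mechanism in isolation, here is a minimal sketch of the reasoning above. It is a toy model and not the CapacityScheduler: `ToyQueue`, `expire` and `schedule` are hypothetical names, the only rule modelled is that one scheduling pass places at most one pending container, and the memory figures are chosen only to echo the assertion message, not to reproduce the test's actual queue layout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only: one schedule() pass places at most one pending request. */
public class ScheduleAfterExpiryDemo {
    static final int GB = 1024; // memory is tracked in MB

    /** Hypothetical stand-in for one queue's usage on a single, otherwise full node. */
    static class ToyQueue {
        int usedMb;
        int freeMb;
        final Deque<Integer> pendingMb = new ArrayDeque<>();

        ToyQueue(int usedMb, int freeMb) {
            this.usedMb = usedMb;
            this.freeMb = freeMb;
        }

        /** A container expires: its memory goes back to the node. */
        void expire(int mb) {
            usedMb -= mb;
            freeMb += mb;
        }

        /** One scheduling pass: place at most one pending request, if it fits. */
        void schedule() {
            Integer next = pendingMb.peekFirst();
            if (next != null && next <= freeMb) {
                pendingMb.removeFirst();
                usedMb += next;
                freeMb -= next;
            }
        }
    }

    /** Assumed starting point: 12 GB in use, no free memory, two 1 GB requests waiting. */
    static ToyQueue fullNodeWithTwoPending() {
        ToyQueue q = new ToyQueue(12 * GB, 0);
        q.pendingMb.add(GB);
        q.pendingMb.add(GB);
        return q;
    }

    /** Both containers expire, then a single pass runs: one request stays pending. */
    static int singleScheduleAfterBothExpiries() {
        ToyQueue q = fullNodeWithTwoPending();
        q.expire(GB);
        q.expire(GB);
        q.schedule();
        return q.usedMb;
    }

    /** A pass runs after each expiry: both requests are placed. */
    static int scheduleAfterEachExpiry() {
        ToyQueue q = fullNodeWithTwoPending();
        q.expire(GB);
        q.schedule();
        q.expire(GB);
        q.schedule();
        return q.usedMb;
    }

    public static void main(String[] args) {
        System.out.println("single schedule:     " + singleScheduleAfterBothExpiries() + " MB");
        System.out.println("schedule per expiry: " + scheduleAfterEachExpiry() + " MB");
    }
}
```

In this model the single pass leaves usage 1 GB short of the per-expiry variant, which is the same shape of gap as in the reported assertion. Whether the real test hits it depends on where the expired containers were placed, as described above.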

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10270 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10270/)
        YARN-5491. Fix random failure of (varunsaxena: rev d677b68c2599445fff56db4df26448a8bad0f5dd)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
        varun_saxena Varun Saxena added a comment -

        Committed to trunk, branch-2.
        Thanks Bibin A Chundatt for your contribution.

        ebadger Eric Badger added a comment -

        Varun Saxena, I am seeing this same failure on branch-2.8. Can you commit it to 2.8? The cherry-pick is clean.

        jlowe Jason Lowe added a comment -

        Thanks, Bibin A Chundatt! I committed this to branch-2.8 as well.


          People

          • Assignee: Bibin A Chundatt (bibinchundatt)
          • Reporter: Bibin A Chundatt (bibinchundatt)
          • Votes: 0
          • Watchers: 8
