  1. Hadoop YARN
  2. YARN-5521

TestCapacityScheduler#testKillAllAppsInQueue fails randomly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
      Tests run: 49, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 35.922 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
      testKillAllAppsInQueue(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler)  Time elapsed: 0.146 sec  <<< FAILURE!
      java.lang.AssertionError: null
              at org.junit.Assert.fail(Assert.java:86)
              at org.junit.Assert.assertTrue(Assert.java:41)
              at org.junit.Assert.assertTrue(Assert.java:52)
              at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testKillAllAppsInQueue(TestCapacityScheduler.java:2188)
      
      
      Results :
      
      Failed tests:
        TestCapacityScheduler.testKillAllAppsInQueue:2188 null
      
      Tests run: 49, Failures: 1, Errors: 0, Skipped: 0
      
      1. Failure.txt
        75 kB
        Bibin A Chundatt
      2. YARN-5521.01.patch
        1 kB
        sandflee

        Activity

        bibinchundatt Bibin A Chundatt added a comment -

        Was able to reproduce the same after multiple runs.

        2016-08-15 19:08:40,755 INFO  [main] resourcemanager.MockRM (MockRM.java:waitForState(194)) - App State is : KILLED
        2016-08-15 19:08:40,755 INFO  [SchedulerEventDispatcher:Event Processor] scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(136)) - Application application_1471268320166_0001 requests cleared
        2016-08-15 19:08:40,756 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: ResourceManager entered state STOPPED
        2016-08-15 19:08:40,755 DEBUG [AsyncDispatcher event handler] resourcemanager.RMAppManager (RMAppManager.java:handle(494)) - RMAppManager processing event for application_1471268320166_0001 of type APP_COMPLETED
        2016-08-15 19:08:40,757 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - ResourceManager: stopping services, size=3
        2016-08-15 19:08:40,756 INFO  [SchedulerEventDispatcher:Event Processor] capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(789)) - Application removed - appId: application_1471268320166_0001 user: user_0 queue: a1 #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
        

        The queue metrics are updated only after the app state has been set to KILLED, so when the test reaches the check below

            appsInRoot = scheduler.getAppsInQueue("root");
            assertTrue(appsInRoot.isEmpty());
        

        the number of applications in the queue is still 1.
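
        The ordering in the log above can be reproduced in isolation. The following is only a sketch, not code from the Hadoop tree: the class name, the set standing in for the queue, and the latch are all hypothetical, and the latch is there purely to force the ordering that the real test hits only occasionally (the test thread observes KILLED before the scheduler dispatcher thread has removed the app).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class KilledBeforeRemovedSketch {

  /**
   * Returns {appsSeenRightAfterKilled, appsSeenAfterDispatcherFinished}.
   * The set is a hypothetical stand-in for the scheduler's view of the queue.
   */
  public static int[] observe() {
    Set<String> appsInQueue = ConcurrentHashMap.newKeySet();
    appsInQueue.add("application_0001");

    // Released only after the "test thread" has looked at the queue,
    // which forces the unlucky interleaving deterministically.
    CountDownLatch queueChecked = new CountDownLatch(1);

    // Stand-in for the scheduler dispatcher thread handling the removal event.
    Thread schedulerDispatcher = new Thread(() -> {
      try {
        queueChecked.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      appsInQueue.remove("application_0001");
    });
    schedulerDispatcher.start();

    // At this point the app is already reported as KILLED to the test thread,
    // but the removal event has not been processed yet.
    int seenRightAfterKilled = appsInQueue.size();

    queueChecked.countDown();
    try {
      schedulerDispatcher.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    int seenAfterDispatcher = appsInQueue.size();

    return new int[] {seenRightAfterKilled, seenAfterDispatcher};
  }

  public static void main(String[] args) {
    int[] seen = observe();
    System.out.println("right after KILLED: " + seen[0]
        + ", after dispatcher finished: " + seen[1]);
  }
}
```

        In the real test nothing enforces either ordering, which is why the assertion fails only on some runs.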

        sandflee sandflee added a comment -

        Thanks Bibin A Chundatt. This seems to be caused by the app reaching the KILLED state before the APP_ATTEMPT_REMOVED event has been processed by the scheduler dispatcher. It could be fixed simply by

            rm.waitForState(app.getApplicationId(), RMAppState.KILLED);
            rm.waitForAppRemovedFromScheduler(app.getApplicationId());
            appsInRoot = scheduler.getAppsInQueue("root");
        

        YARN-5375 introduces a more general way. cc Sunil G, Rohith Sharma K S
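
        The idea behind waiting for the app to leave the scheduler can be sketched as a poll-with-timeout loop. This is not the MockRM implementation; it is a minimal, self-contained illustration in which the class name, the waitFor helper, the map standing in for the scheduler's queue, and the 50 ms delay are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForRemovalSketch {

  /** Polls the condition until it holds (true) or the timeout elapses (false). */
  public static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  /** Simulates a kill whose removal event is processed asynchronously, then waits for it. */
  public static boolean killAndWait() {
    // Hypothetical stand-in for the scheduler's app-to-queue bookkeeping.
    Map<String, String> appsInQueue = new ConcurrentHashMap<>();
    appsInQueue.put("application_0001", "a1");

    // Stand-in for the scheduler dispatcher thread.
    ExecutorService schedulerDispatcher = Executors.newSingleThreadExecutor();
    try {
      schedulerDispatcher.execute(() -> {
        try {
          Thread.sleep(50); // the removal event is handled a little later
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        appsInQueue.remove("application_0001");
      });

      // Instead of asserting immediately, wait until the app is really gone.
      boolean removed = waitFor(() -> !appsInQueue.containsKey("application_0001"), 10, 5000);
      return removed && appsInQueue.isEmpty();
    } finally {
      schedulerDispatcher.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("queue empty after wait: " + killAndWait());
  }
}
```

        The timeout keeps a genuinely stuck removal from hanging the test; the caller still has to assert on the returned value.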

        sunilg Sunil G added a comment -

        Yes. Analysis makes sense to me.

        I think it's high time we made progress on YARN-5375; I will help review it. cc Rohith Sharma K S

        varun_saxena Varun Saxena added a comment -

        LGTM. Will commit it pending Jenkins.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 15s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 8m 19s trunk passed
        +1 compile 0m 38s trunk passed
        +1 checkstyle 0m 24s trunk passed
        +1 mvnsite 0m 46s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        +1 findbugs 1m 8s trunk passed
        +1 javadoc 0m 24s trunk passed
        +1 mvninstall 0m 38s the patch passed
        +1 compile 0m 35s the patch passed
        +1 javac 0m 35s the patch passed
        +1 checkstyle 0m 22s the patch passed
        +1 mvnsite 0m 43s the patch passed
        +1 mvneclipse 0m 16s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 16s the patch passed
        +1 javadoc 0m 21s the patch passed
        +1 unit 34m 49s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 15s The patch does not generate ASF License warnings.
        52m 4s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12823711/YARN-5521.01.patch
        JIRA Issue YARN-5521
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 278edabf28b8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 9f29f42
        Default Java 1.8.0_101
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12770/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12770/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        varun_saxena Varun Saxena added a comment -

        Committed to trunk, branch-2.
        Thanks sandflee for your contribution and Sunil G, Bibin A Chundatt for reviews.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10273 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10273/)
        YARN-5521. Fix random failure of (varunsaxena: rev 24249115bff3162c4202387da5bdd8eba13e6961)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java

        sandflee sandflee added a comment -

        Thanks Varun Saxena, Sunil G, Bibin A Chundatt

        rohithsharma Rohith Sharma K S added a comment -

        Right, this needs to be reviewed and committed. Let me also look at YARN-5375. Thanks.


          People

          • Assignee:
            sandflee sandflee
            Reporter:
            varun_saxena Varun Saxena
          • Votes:
            0
            Watchers:
            7

            Dates

            • Created:
              Updated:
              Resolved:

              Development