
TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Saw the following in a precommit build that only changed an unrelated unit test:

      Tests run: 29, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 101.265 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
      testFinishedAppRemovalAfterRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)  Time elapsed: 0.411 sec  <<< FAILURE!
      java.lang.AssertionError: expected null, but was:<org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl@70ebeeba>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotNull(Assert.java:664)
      	at org.junit.Assert.assertNull(Assert.java:646)
      	at org.junit.Assert.assertNull(Assert.java:656)
      	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1653)
      

        Activity

        sandflee sandflee added a comment -

        This can be reproduced simply by adding a sleep to RMAppManager:

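              case APP_COMPLETED:
              {
                // Repro only: delay handling so the test's asserts run first.
                try {
                  Thread.sleep(2000);
                } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                }
                finishApplication(applicationId);
                logApplicationSummary(applicationId);
                checkAppNumCompletedLimit();
              }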
        

        APP_COMPLETED is processed asynchronously, so this can be fixed simply by calling mockRM#drainEvents() before the asserts.
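
        A minimal, self-contained sketch of the "drain before assert" idea. TinyDispatcher, its pending counter, and the "app_1" map entry are hypothetical stand-ins invented for illustration; they are not the YARN dispatcher or MockRM. The slow handler plays the role of the delayed APP_COMPLETED handling in the repro.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainBeforeAssertSketch {

  /** Hypothetical async dispatcher: one thread handles queued events in order. */
  static final class TinyDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Events that were dispatched but whose handler has not finished yet.
    private final AtomicInteger pending = new AtomicInteger();
    private final Thread worker;

    TinyDispatcher() {
      worker = new Thread(() -> {
        try {
          while (true) {
            Runnable event = queue.take();
            try {
              event.run();
            } finally {
              pending.decrementAndGet();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // exit the loop on stop()
        }
      });
      worker.setDaemon(true);
      worker.start();
    }

    void dispatch(Runnable event) {
      pending.incrementAndGet();
      queue.add(event);
    }

    /** Blocks until every dispatched event has finished being handled. */
    void drainEvents() {
      while (pending.get() > 0) {
        sleepQuietly(5);
      }
    }

    void stop() {
      worker.interrupt();
    }
  }

  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if the app entry is gone once the dispatcher is drained. */
  static boolean appRemovedAfterDrain() {
    Map<String, String> apps = new ConcurrentHashMap<>();
    apps.put("app_1", "FINISHED");
    TinyDispatcher dispatcher = new TinyDispatcher();

    // Slow asynchronous cleanup, similar in spirit to the delayed handler.
    dispatcher.dispatch(() -> {
      sleepQuietly(200);
      apps.remove("app_1");
    });

    dispatcher.drainEvents(); // without this, the check below races the handler
    boolean removed = apps.get("app_1") == null;
    dispatcher.stop();
    return removed;
  }

  public static void main(String[] args) {
    System.out.println("removed after drain: " + appRemovedAfterDrain());
  }
}
```

        Removing the drainEvents() call in this sketch makes the check run while the handler is still sleeping, which mirrors the "expected null" failure in the description.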

        sandflee sandflee added a comment -

        I have seen many test failures where an RMApp/RMAppAttempt reaches some state while some events are still unprocessed in the RM event queue or the scheduler event queue, which causes the test to fail. It seems we could implicitly invoke drainEvents() (which should also drain scheduler events) in some MockRM methods such as waitForState. Thoughts? cc Sunil G, Rohith Sharma K S

        void waitForState() {
           .... 
           drainEvents();
        }
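
        The proposal above could look roughly like the following self-contained sketch. TinyHarness, AppState, and the timeout parameter are hypothetical names made up for this illustration and are not MockRM's API; the sketch only shows a waitForState that drains the event queue before returning, so that follow-up events triggered by the state change are handled as well.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class WaitForStateSketch {

  enum AppState { RUNNING, FINISHED }

  /** Hypothetical test harness; only the wait-then-drain shape matters here. */
  static final class TinyHarness {
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    // Events that were dispatched but whose handler has not finished yet.
    private final AtomicInteger pending = new AtomicInteger();
    final AtomicReference<AppState> state = new AtomicReference<>(AppState.RUNNING);
    final AtomicInteger completedCleanups = new AtomicInteger();

    void dispatch(Runnable event) {
      pending.incrementAndGet();
      dispatcher.execute(() -> {
        try {
          event.run();
        } finally {
          pending.decrementAndGet();
        }
      });
    }

    /** Blocks until no dispatched event is queued or running. */
    void drainEvents() {
      while (pending.get() > 0) {
        pause(5);
      }
    }

    /** Waits for the expected state, then drains so follow-up events run too. */
    boolean waitForState(AppState expected, long timeoutMs) {
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      while (state.get() != expected) {
        if (System.nanoTime() - deadline > 0) {
          return false; // timed out before reaching the state
        }
        pause(5);
      }
      drainEvents(); // the implicit drain being proposed
      return true;
    }

    void shutdown() {
      dispatcher.shutdownNow();
    }
  }

  static void pause(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns the number of cleanups visible right after waitForState, or -1 on timeout. */
  static int cleanupsSeenAfterWait() {
    TinyHarness harness = new TinyHarness();
    // The first event flips the state and queues a slower follow-up event.
    harness.dispatch(() -> {
      harness.state.set(AppState.FINISHED);
      harness.dispatch(() -> {
        pause(100);
        harness.completedCleanups.incrementAndGet();
      });
    });
    boolean reached = harness.waitForState(AppState.FINISHED, 5000);
    int seen = reached ? harness.completedCleanups.get() : -1;
    harness.shutdown();
    return seen;
  }

  public static void main(String[] args) {
    System.out.println("cleanups seen after waitForState: " + cleanupsSeenAfterWait());
  }
}
```

        With the drain inside waitForState, a test written against this harness does not need its own drainEvents() call before asserting on the follow-up work.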
        
        sandflee sandflee added a comment -

        Updated the patch to add drainEvents() before the asserts. A very small race window remains, since drainEvents() only guarantees that there are no events left in the event queue, not that every event has been fully processed.
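
        To make the remaining race concrete, here is a small self-contained sketch (all names are hypothetical and it does not use any YARN class). It shows that a queue can already be empty while the dequeued event is still being handled, and that waiting on an in-flight counter instead closes that gap. This is one possible approach for illustration, not a description of how the patch or YARN implements draining.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueEmptyVsProcessedSketch {

  static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /**
   * Returns three observations, in order:
   * [0] queue was empty while the handler was still running,
   * [1] the event counted as handled at that moment,
   * [2] the event counted as handled after waiting for the in-flight count to reach 0.
   */
  static boolean[] observe() {
    BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    AtomicInteger inFlight = new AtomicInteger();
    AtomicBoolean handled = new AtomicBoolean(false);
    CountDownLatch handlerStarted = new CountDownLatch(1);
    CountDownLatch letHandlerFinish = new CountDownLatch(1);

    // Single worker that takes one event and runs it.
    Thread worker = new Thread(() -> {
      try {
        Runnable event = queue.take();
        try {
          event.run();
        } finally {
          inFlight.decrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.setDaemon(true);
    worker.start();

    inFlight.incrementAndGet();
    queue.add(() -> {
      handlerStarted.countDown();     // the event has been dequeued by now
      awaitQuietly(letHandlerFinish); // hold the handler open
      handled.set(true);
    });

    awaitQuietly(handlerStarted);
    boolean queueEmptyDuringHandling = queue.isEmpty(); // a queue-only drain would return here
    boolean handledWhileQueueEmpty = handled.get();     // handler is still blocked

    letHandlerFinish.countDown();
    while (inFlight.get() > 0) {
      Thread.yield(); // wait for the handler itself to finish
    }
    boolean handledAfterInFlightWait = handled.get();

    return new boolean[] {
        queueEmptyDuringHandling, handledWhileQueueEmpty, handledAfterInFlightWait};
  }

  public static void main(String[] args) {
    boolean[] r = observe();
    System.out.println("queue empty during handling: " + r[0]);
    System.out.println("handled while queue empty:   " + r[1]);
    System.out.println("handled after in-flight wait: " + r[2]);
  }
}
```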

        rohithsharma Rohith Sharma K S added a comment -

        Right, it makes sense to add this implicitly in MockRM. That way, anyone writing a new test case would not need to worry about calling drainEvents explicitly.

        rohithsharma Rohith Sharma K S added a comment -

        +1 LGTM. Submitting the patch to HadoopQA.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote  Subsystem    Runtime   Comment
        0     reexec       0m 15s    Docker mode activated.
        +1    @author      0m 0s     The patch does not contain any @author tags.
        +1    test4tests   0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall   11m 33s   trunk passed
        +1    compile      0m 33s    trunk passed
        +1    checkstyle   0m 21s    trunk passed
        +1    mvnsite      0m 38s    trunk passed
        +1    mvneclipse   0m 19s    trunk passed
        +1    findbugs     0m 58s    trunk passed
        +1    javadoc      0m 21s    trunk passed
        +1    mvninstall   0m 32s    the patch passed
        +1    compile      0m 32s    the patch passed
        +1    javac        0m 32s    the patch passed
        +1    checkstyle   0m 19s    the patch passed
        +1    mvnsite      0m 35s    the patch passed
        +1    mvneclipse   0m 16s    the patch passed
        +1    whitespace   0m 0s     The patch has no whitespace issues.
        +1    findbugs     1m 8s     the patch passed
        +1    javadoc      0m 17s    the patch passed
        +1    unit         33m 31s   hadoop-yarn-server-resourcemanager in the patch passed.
        +1    asflicense   0m 17s    The patch does not generate ASF License warnings.
                           53m 2s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12817573/YARN-5362.01.patch
        JIRA Issue YARN-5362
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 49a9ebbfd431 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 438b7c5
        Default Java 1.8.0_91
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12303/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/12303/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        Committed to trunk/branch-2. Thanks sandflee for the patch!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #10089 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10089/)
        YARN-5362. TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail. (rohithsharmaks: rev d6d41e820ac7b3ba73f5e4ea1ee72715dc1ffc9f)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
        sandflee sandflee added a comment -

        Thanks Rohith Sharma K S for the review and commit. I opened YARN-5375 to track implicitly invoking drainEvents in MockRM.

        Naganarasimha Naganarasimha G R added a comment -

        I am still able to see this issue as part of the YARN-5256 build. Maybe one of you could take another look at it?

        java.lang.AssertionError: expected null, but was:<submit_time: 1475828790939 application_submission_context { application_id { id: 1 cluster_timestamp: 1475828790911 } application_name: "" queue: "default" priority { priority: 0 } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" keep_containers_across_application_attempts: false attempt_failures_validity_interval: 0 am_container_resource_request { priority { priority: 0 } resource_name: "*" capability { memory: 1024 virtual_cores: 1 } num_containers: 0 relax_locality: true node_label_expression: "" execution_type_request { execution_type: GUARANTEED enforce_execution_type: false } } } user: "jenkins" start_time: 1475828790939 application_state: RMAPP_FINISHED finish_time: 1475828790998>
        	at org.junit.Assert.fail(Assert.java:88)
        	at org.junit.Assert.failNotNull(Assert.java:664)
        	at org.junit.Assert.assertNull(Assert.java:646)
        	at org.junit.Assert.assertNull(Assert.java:656)
        	at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1656)
        
        sandflee sandflee added a comment -

        Thanks Naganarasimha G R, I'll have a look.

        sunilg Sunil G added a comment -

        +1. I am still getting the same failure as Naganarasimha Garla.
        https://builds.apache.org/job/PreCommit-YARN-Build/13472/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/

        I think the events that come from the StateStore are not fully drained here. YARN-5375 would be a clean solution for this. I think we can make progress there with review.

        templedf Daniel Templeton added a comment -

        Ditto: https://builds.apache.org/job/PreCommit-YARN-Build/13997/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/

        varun_saxena Varun Saxena added a comment -

        This will be fixed by YARN-5548


          People

          • Assignee: sandflee sandflee
          • Reporter: jlowe Jason Lowe
          • Votes: 0
          • Watchers: 9
