YARN-3094: Reset timer for liveness monitors after RM recovery

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When the RM restarts, it recovers RMAppAttempts and registers them with AMLivelinessMonitor if they are not in a final state. The AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps).

      In our system, the recovery process took about 3 minutes, and all AMs timed out.
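
      The failure mode described above, and the effect of resetting the timer once recovery finishes, can be sketched as follows. This is a minimal standalone illustration, not the code from the attached patches: `SketchLivelinessMonitor`, `resetTimer`, `expire`, and `RecoveryDemo` are hypothetical names, and the clock is a plain `LongSupplier` so that time can be moved by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Simplified stand-in for a liveliness monitor (hypothetical, not the Hadoop class). */
class SketchLivelinessMonitor<K> {
    private final Map<K, Long> lastPing = new HashMap<>();
    private final long expiryMs;
    private final LongSupplier clock;

    SketchLivelinessMonitor(long expiryMs, LongSupplier clock) {
        this.expiryMs = expiryMs;
        this.clock = clock;
    }

    /** Start tracking a key; its timer starts at the current clock time. */
    synchronized void register(K key) {
        lastPing.put(key, clock.getAsLong());
    }

    /** Restart every tracked timer from "now", e.g. once recovery has finished. */
    synchronized void resetTimer() {
        long now = clock.getAsLong();
        lastPing.replaceAll((key, oldTime) -> now);
    }

    /** Remove and return every key whose last ping is older than the expiry interval. */
    synchronized List<K> expire() {
        long now = clock.getAsLong();
        List<K> expired = new ArrayList<>();
        Iterator<Map.Entry<K, Long>> it = lastPing.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Long> entry = it.next();
            if (now - entry.getValue() > expiryMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}

class RecoveryDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // hand-driven clock, in milliseconds
        SketchLivelinessMonitor<String> noReset =
            new SketchLivelinessMonitor<>(180_000L, () -> now[0]);
        SketchLivelinessMonitor<String> withReset =
            new SketchLivelinessMonitor<>(180_000L, () -> now[0]);

        // Attempts are registered at the start of recovery.
        noReset.register("attempt_1");
        withReset.register("attempt_1");

        // Recovery takes longer than the 3-minute expiry interval.
        now[0] = 200_000L;
        withReset.resetTimer();

        System.out.println(noReset.expire());   // prints [attempt_1]
        System.out.println(withReset.expire()); // prints []
    }
}
```

      In this sketch the time spent in recovery counts against the AM unless the timers are restarted afterwards, which is the behaviour the issue title asks to change.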

      1. YARN-3094.5.patch
        8 kB
        Jun Gong
      2. YARN-3094.4.patch
        8 kB
        Jun Gong
      3. YARN-3094.3.patch
        8 kB
        Jun Gong
      4. YARN-3094.2.patch
        6 kB
        Jun Gong
      5. YARN-3094.patch
        3 kB
        Jun Gong

        Activity

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Pulled this into 2.6.1. Ran compilation and TestAMLivelinessMonitor before the push. Patch applied cleanly.

        l201514 Siqi Li added a comment -

        The latest patch applies cleanly to the 2.6.0 branch.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #97 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/97/)
        YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2051 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2051/)
        YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #101 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/101/)
        YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2032 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2032/)
        YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #834 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/834/)
        YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #100 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/100/)
        YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #7053 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7053/)
        YARN-3094. Reset timer for liveness monitors after RM recovery. Contributed by Jun Gong (jianhe: rev 0af6a99a3fcfa4b47d3bcba5e5cc5fe7b312a152)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestAMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AbstractLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/AMLivelinessMonitor.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
        • hadoop-yarn-project/CHANGES.txt
        jianhe Jian He added a comment -

        Committed to trunk and branch-2. Thanks Jun Gong!
        Thanks Anubhav Dhoot and Rohith Sharma K S for the review!

        rohithsharma Rohith Sharma K S added a comment -

        +1(non-binding) LGTM

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12696637/YARN-3094.5.patch
        against trunk revision f990e9d.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6516//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6516//console

        This message is automatically generated.

        hex108 Jun Gong added a comment -

        The failed test case seems unrelated. Resubmitting the same patch.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12696477/YARN-3094.4.patch
        against trunk revision 42548f4.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6506//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6506//console

        This message is automatically generated.

        hex108 Jun Gong added a comment -

        Thanks Jian He and Rohith Sharma K S for the review and comments.

        The new patch addresses the problems above.

        rohithsharma Rohith Sharma K S added a comment -

        nit: For the test assertion Assert.assertFalse(expired[0]);, it would be better to leave a comment there. When I first looked at the tests, I felt this assertion was not required, since the initial value and the asserted value are the same. Only later did I understand its purpose.

        rohithsharma Rohith Sharma K S added a comment -

        The patch looks good overall.
        nit: there is an unused import in the test class: org.apache.hadoop.yarn.util.Clock. It can be removed.

        jianhe Jian He added a comment -

        Thanks Jun Gong for the patch, and thanks Anubhav Dhoot for reviewing it!

        One comment from my side:

            Thread.sleep(1000); // make sure that monitor has been working
            Assert.assertEquals(Service.STATE.STARTED, monitor.getServiceState());
        

        Instead of a hard sleep, we can wait for the monitor state to become STARTED.
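
        One way to replace the fixed sleep is a small polling helper that returns as soon as the condition holds and gives up after a timeout. The sketch below is only an illustration under that assumption: `WaitUtil` and `waitFor` are hypothetical names, not part of the patch or of Hadoop.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition instead of sleeping for a fixed time. */
class WaitUtil {
    /**
     * Polls the condition every pollMs milliseconds.
     * Returns true once the condition holds, or false if timeoutMs elapses
     * or the waiting thread is interrupted.
     */
    static boolean waitFor(BooleanSupplier condition, long pollMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the condition became true
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }
}
```

        In the test quoted above, the call might look like `Assert.assertTrue(WaitUtil.waitFor(() -> monitor.getServiceState() == Service.STATE.STARTED, 10, 5000));`, where the poll interval and timeout values are arbitrary choices.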

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12695186/YARN-3094.3.patch
        against trunk revision 5a0051f.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6452//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6452//console

        This message is automatically generated.

        hex108 Jun Gong added a comment -

        Anubhav Dhoot, thank you for the comment. It is really very useful. The new patch addresses it.

        adhoot Anubhav Dhoot added a comment -

        Hi Jun Gong,

        Can you please use the ControlledClock for manipulating time instead of sleeps?
        AbstractLivelinessMonitor should take a Clock as an argument instead of creating a new SystemClock.
        That way loadState can call ControlledClock#setTime instead of sleeping, and AbstractLivelinessMonitor will read the same time.

        Thanks
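
        The clock-injection idea suggested here can be sketched roughly as below. The types are simplified, hypothetical stand-ins (`SketchClock`, `SketchControlledClock`, `ClockedMonitor`), not the Hadoop `Clock`, `ControlledClock`, or `AbstractLivelinessMonitor` classes; the point is only that a monitor which receives its clock as an argument can be tested by setting the time directly.

```java
/** Hypothetical minimal clock abstraction. */
interface SketchClock {
    long getTime();
}

/** Production-style clock backed by wall time. */
class SketchSystemClock implements SketchClock {
    @Override
    public long getTime() {
        return System.currentTimeMillis();
    }
}

/** Test clock whose time only changes when setTime is called. */
class SketchControlledClock implements SketchClock {
    private long time;

    @Override
    public synchronized long getTime() {
        return time;
    }

    public synchronized void setTime(long time) {
        this.time = time;
    }
}

/** Monitor for a single item; it takes its clock as an argument instead of creating one. */
class ClockedMonitor {
    private final SketchClock clock;
    private final long expiryMs;
    private long lastPing;

    ClockedMonitor(SketchClock clock, long expiryMs) {
        this.clock = clock;
        this.expiryMs = expiryMs;
        this.lastPing = clock.getTime();
    }

    /** Record a heartbeat at the current clock time. */
    synchronized void ping() {
        lastPing = clock.getTime();
    }

    /** True if more than expiryMs has passed since the last heartbeat. */
    synchronized boolean isExpired() {
        return clock.getTime() - lastPing > expiryMs;
    }
}

class ControlledClockDemo {
    public static void main(String[] args) {
        SketchControlledClock clock = new SketchControlledClock();
        ClockedMonitor monitor = new ClockedMonitor(clock, 1_000L);

        clock.setTime(1_001L); // jump past the expiry interval without sleeping
        System.out.println(monitor.isExpired()); // prints true

        monitor.ping();
        System.out.println(monitor.isExpired()); // prints false
    }
}
```

        Because both the test and the monitor read the same controlled clock, the test is deterministic and does not depend on how long a sleep actually takes.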

        hex108 Jun Gong added a comment -

        Hi Jian He, could you please help review it? Thank you.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12694477/YARN-3094.2.patch
        against trunk revision 7b82c4a.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6415//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6415//console

        This message is automatically generated.

        hex108 Jun Gong added a comment -

        Added a test case.

        hex108 Jun Gong added a comment -

        Chun Chen, thanks for the suggestion. I think the service start time is very short, so we can ignore it. Moreover, we need to init AMLivelinessMonitor before ApplicationMasterService because the RM recovery process uses it.

        chenchun Chun Chen added a comment -

        Since the RM can't receive pings from AMs until ApplicationMasterService starts, I think it is more accurate to reset the timer in the AMLivelinessMonitor service after ApplicationMasterService starts. I suggest initializing the AMLivelinessMonitor service after ApplicationMasterService in RMActiveServices#serviceInit.

        hex108 Jun Gong added a comment -

        Rohith Sharma K S, thanks for your review. I will add a test case if needed.

        How many RUNNING applications are running in cluster?

        Only several hundred apps are running. The slow recovery might be caused by the many exceptions thrown while storing RMApps' data with RMApplicationHistoryWriter. We will investigate further.

        What is the AM liveliness timeout configured in cluster?

        3 minutes, so that we can detect an AM crash sooner.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12694133/YARN-3094.patch
        against trunk revision 3aab354.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6397//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6397//console

        This message is automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        Thanks, Jun Gong, for reporting the issue and for your contribution.
        The patch looks good to me. Can you add tests for this?

        Could you also give some general information, such as:

        1. How many RUNNING applications are running in cluster?
        2. What is the AM liveliness timeout configured in cluster?

          People

          • Assignee:
            hex108 Jun Gong
            Reporter:
            hex108 Jun Gong
          • Votes:
            0
            Watchers:
            12
