
RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When a node reconnects, RMNodeImpl#ReconnectNodeTransition notifies the scheduler with node_added, node_removed, or node_resource_update events. These events should be sent in sequential order, i.e. the node_added event first, followed by any node_resource_update events.
      But if the node reconnects with a different HTTP port, the order of the scheduler events is node_removed --> node_resource_update --> node_added. As a result, the scheduler cannot find the node, throws an NPE, and the RM exits.

      The node_resource_update event should always be triggered via RMNodeEventType.RESOURCE_UPDATE.
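
The ordering problem in the description can be illustrated with a small, self-contained sketch. It is a toy model only: `ReconnectOrderDemo`, `SchedulerNode`, the node id, and the memory figures are hypothetical names and values chosen for illustration, not the real YARN scheduler classes. It assumes the scheduler keeps a table of known nodes and that a resource update dereferences the table entry without a null check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a scheduler's node table; not the real YARN scheduler API. */
class ReconnectOrderDemo {
  enum EventType { NODE_ADDED, NODE_REMOVED, NODE_RESOURCE_UPDATE }

  /** Hypothetical stand-in for a scheduler-side node record. */
  static final class SchedulerNode {
    int memoryMb;
    SchedulerNode(int memoryMb) { this.memoryMb = memoryMb; }
  }

  /**
   * Replays the events for one node against a table that already holds the
   * node with its old capability, and returns the node's memory afterwards.
   */
  static int replay(List<EventType> events, int oldMb, int newMb) {
    String nodeId = "host-1:45454"; // hypothetical node id
    Map<String, SchedulerNode> nodes = new HashMap<>();
    nodes.put(nodeId, new SchedulerNode(oldMb));
    for (EventType event : events) {
      switch (event) {
        case NODE_ADDED:
          nodes.put(nodeId, new SchedulerNode(newMb));
          break;
        case NODE_REMOVED:
          nodes.remove(nodeId);
          break;
        case NODE_RESOURCE_UPDATE:
          // Throws NullPointerException when the node is not in the table.
          nodes.get(nodeId).memoryMb = newMb;
          break;
      }
    }
    return nodes.get(nodeId).memoryMb;
  }

  public static void main(String[] args) {
    List<EventType> sequential = Arrays.asList(
        EventType.NODE_REMOVED, EventType.NODE_ADDED, EventType.NODE_RESOURCE_UPDATE);
    System.out.println("sequential order: " + replay(sequential, 5120, 10240) + " MB");

    List<EventType> reported = Arrays.asList(
        EventType.NODE_REMOVED, EventType.NODE_RESOURCE_UPDATE, EventType.NODE_ADDED);
    try {
      replay(reported, 5120, 10240);
    } catch (NullPointerException e) {
      System.out.println("reported order: update arrived while the node was absent");
    }
  }
}
```

In this model the update only succeeds when it is processed after the node has been added again, which is the ordering the description asks for.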

      1. 0005-YARN-3222.patch
        7 kB
        Rohith Sharma K S
      2. 0004-YARN-3222.patch
        7 kB
        Rohith Sharma K S
      3. 0003-YARN-3222.patch
        8 kB
        Rohith Sharma K S
      4. 0002-YARN-3222.patch
        5 kB
        Rohith Sharma K S
      5. 0001-YARN-3222.patch
        2 kB
        Rohith Sharma K S

        Activity

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2279 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2279/)
        YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)

        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #338 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/338/)
        YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)

        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #330 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/330/)
        YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)

        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2260 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2260/)
        YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)

        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #321 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/321/)
        YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)

        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1065 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1065/)
        YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)

        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8382 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8382/)
        YARN-3222. Added the missing CHANGES.txt entry. (vinodkv: rev 4620767156ecc43424bc6c7c4d50519e2563cc69)

        • hadoop-yarn-project/CHANGES.txt
        vinodkv Vinod Kumar Vavilapalli added a comment -

        The original commit missed the CHANGES.txt entries; I added them.

        Pulled this into 2.6.1. Ran compilation and TestResourceTrackerService before the push. Patch applied cleanly.

        sjlee0 Sangjin Lee added a comment -

        The merge to 2.6.0 is straightforward.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/)
        YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #122 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/122/)
        YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #113 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/113/)
        YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2054 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2054/)
        YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #856 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/856/)
        YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #122 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/122/)
        YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #7248 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7248/)
        YARN-3222. Fixed NPE on RMNodeImpl#ReconnectNodeTransition when a node is reconnected with a different port. Contributed by Rohith Sharmaks (jianhe: rev b2f1ec312ee431aef762cfb49cb29cd6f4661e86)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
        jianhe Jian He added a comment -

        Committed to trunk and branch-2. Thanks, Rohith!

        jianhe Jian He added a comment -

        Thanks! Committing.

        rohithsharma Rohith Sharma K S added a comment -

        Had a glance at the javac and javadoc warnings; they look unrelated to the patch.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12702276/0005-YARN-3222.patch
        against trunk revision e17e5ba.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        -1 javac. The applied patch generated 1151 javac compiler warnings (more than the trunk's current 185 warnings).

        -1 javadoc. The javadoc tool appears to have generated 43 warning messages.
        See https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavadocWarnings.txt for details.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6828//testReport/
        Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6828//artifact/patchprocess/diffJavacWarnings.txt
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6828//console

        This message is automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        "check you added earlier about sending NodeResourceUpdate event only if the node resource is different"

        Agreed.

        Updated the patch to address the above comment. Kindly review it.

        jianhe Jian He added a comment -

        Thanks, Rohith!
        I think the condition check you added earlier, which sends the NodeResourceUpdate event only if the node resource is different, is useful because it saves some traffic. Would you mind adding that too?

                if (rmNode.getState().equals(NodeState.RUNNING)) {
                  // Update scheduler node's capacity for reconnect node.
                  rmNode.context
                      .getDispatcher()
                      .getEventHandler()
                      .handle(
                          new NodeResourceUpdateSchedulerEvent(rmNode, ResourceOption
                              .newInstance(newNode.getTotalCapability(), -1)));
                }
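
The guard being asked for here, sending the update only when the capability actually changed, might look like the following sketch. `Capability`, `eventsOnReconnect`, and the event string are hypothetical stand-ins introduced for illustration; the real patch works with YARN's own resource and event classes, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Toy model of the "only send the update if the resource differs" check. */
class ResourceUpdateGuardDemo {
  /** Hypothetical, simplified node capability: memory and vcores only. */
  static final class Capability {
    final int memoryMb;
    final int vcores;

    Capability(int memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof Capability)) {
        return false;
      }
      Capability that = (Capability) other;
      return memoryMb == that.memoryMb && vcores == that.vcores;
    }

    @Override
    public int hashCode() {
      return Objects.hash(memoryMb, vcores);
    }
  }

  /** Returns the scheduler events this model would emit on reconnect. */
  static List<String> eventsOnReconnect(boolean running, Capability oldCap, Capability newCap) {
    List<String> events = new ArrayList<>();
    // Skip the update when nothing changed, so no redundant event is queued.
    if (running && !oldCap.equals(newCap)) {
      events.add("NODE_RESOURCE_UPDATE " + newCap.memoryMb + "MB/" + newCap.vcores + "vcores");
    }
    return events;
  }

  public static void main(String[] args) {
    Capability fiveGb = new Capability(5120, 4);
    Capability tenGb = new Capability(10240, 4);
    System.out.println(eventsOnReconnect(true, fiveGb, fiveGb));
    System.out.println(eventsOnReconnect(true, fiveGb, tenGb));
  }
}
```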
        
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12702122/0004-YARN-3222.patch
        against trunk revision 9ae7f9e.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6818//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6818//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6818//console

        This message is automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        Kindly review the updated patch, which fixes points 1 and 2 mentioned in the earlier comment.

        rohithsharma Rohith Sharma K S added a comment -

        To handle the 3rd point, I raised issue YARN-3286.

        rohithsharma Rohith Sharma K S added a comment -

        Had a mail chat with Jian He about the issues observed in this JIRA discussion, and we decided to split this JIRA into two separate JIRAs. The issues observed in ReconnectNodeTransition are:

        1. As per the defect description, the order in which the node_resource_update and node_added events are sent to the scheduler. If a node_added event is sent to the scheduler, RMNode does not need to send a node_resource_update event to the scheduler as well.
        2. If the RMNode state is RUNNING, the node_usable event does not need to be sent.
        3. If a node reconnects with a different capability, RMNode#totalCapability keeps the old capability. It has to be updated with the new capability.

        Issues 1 and 2 will be handled in this JIRA. Issue 3 will be handled in a separate JIRA.

        rohithsharma Rohith Sharma K S added a comment -

        I thought of handling the race in the scenario discussed above in the following way:

        1. If oldNode is the same as newNode and the capability has changed, update the resource in the scheduler first.
          1. ClusterResource=5gb+5gb
          2. Update the resource with the new node capability: ClusterResource=5gb+10gb (new capability).
        2. Remove the node with the new capability.
          1. ClusterResource=15gb-10gb (new capability)=5gb
        3. Add the node with the new capability.
          1. ClusterResource=5gb+10gb=15gb, which is expected, and RMNode#totalCapability is 10gb.

        Does it make sense?
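
The arithmetic in the steps above can be checked with a short sketch. It only tracks the cluster total as an integer number of gigabytes; `ClusterResourceWalkthrough` and `walk` are hypothetical names, and the sketch assumes one other 5gb node stays registered throughout, as in the example.

```java
/** Replays the update, remove, add sequence on a plain cluster total (in GB). */
class ClusterResourceWalkthrough {
  /**
   * Returns the cluster total after each step:
   * [after update, after remove, after add].
   */
  static int[] walk(int otherNodeGb, int oldGb, int newGb) {
    int cluster = otherNodeGb + oldGb; // starting point: both nodes registered

    cluster += newGb - oldGb;          // step 1: update the node to its new capability
    int afterUpdate = cluster;

    cluster -= newGb;                  // step 2: remove the node using the new capability
    int afterRemove = cluster;

    cluster += newGb;                  // step 3: add the node with the new capability
    int afterAdd = cluster;

    return new int[] {afterUpdate, afterRemove, afterAdd};
  }

  public static void main(String[] args) {
    int[] totals = walk(5, 5, 10);
    System.out.println("after update: " + totals[0] + "gb");
    System.out.println("after remove: " + totals[1] + "gb");
    System.out.println("after add:    " + totals[2] + "gb");
  }
}
```

Because the update is applied before the removal, the removal subtracts the same amount the node currently contributes, so the total never drifts.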

        rohithsharma Rohith Sharma K S added a comment -

        "I think we may not need to call sendNodeUsableEventIfNodeStateIsNotRunning to send the node_usable event in ReconnectEvent. As you said earlier, the next heartbeat will trigger this event based on the node's own health report."

        Right, it is not required. I will remove it.

        "The transition is invoked only at running and unhealthy state, so I think this is not possible?"

        I see.

        "Even by sending an event it's still possible that removeNode was removing new capability from cluster resource ?"

        I see a potential risk even if RMNodeResourceUpdateEvent has been sent. Say the AsyncDispatcher holds the events Node_removed and RMNodeResourceUpdate. The AsyncDispatcher fetches Node_removed and puts it in the SchedulerEventDispatcher queue. If the SchedulerEventDispatcher is delayed in processing node_removed, perhaps because many scheduler events are queued, then RMNodeResourceUpdate is processed first. So there is a chance of removing the new capability from the cluster resource.
        Any thoughts on handling this issue?
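
The risk described in this comment can be modelled with a deterministic two-queue sketch. It is a simplification under stated assumptions: `DelayedRemovalRaceDemo`, `Node`, and `clusterAfterRemoval` are hypothetical names, the scheduler queue is a plain `ArrayDeque` drained on the calling thread, and the removal is assumed to subtract whatever capability the shared node object reports at the moment the removal is processed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of a removal that is queued before, but processed after, an update. */
class DelayedRemovalRaceDemo {
  /** Hypothetical shared node object read by the scheduler at processing time. */
  static final class Node {
    int totalCapabilityGb;
    Node(int totalCapabilityGb) { this.totalCapabilityGb = totalCapabilityGb; }
  }

  static void drain(Queue<Runnable> queue) {
    while (!queue.isEmpty()) {
      queue.poll().run();
    }
  }

  /** Returns the cluster total (in GB) once the queued removal has run. */
  static int clusterAfterRemoval(int otherGb, int oldGb, int newGb, boolean schedulerDelayed) {
    Node node = new Node(oldGb);
    int[] cluster = {otherGb + oldGb};
    Queue<Runnable> schedulerQueue = new ArrayDeque<>();

    // Dispatcher event 1: the removal is only enqueued for the scheduler.
    schedulerQueue.add(() -> cluster[0] -= node.totalCapabilityGb);
    if (!schedulerDelayed) {
      drain(schedulerQueue); // scheduler keeps up: removal sees the old capability
    }

    // Dispatcher event 2: the resource update changes the shared node object.
    node.totalCapabilityGb = newGb;

    drain(schedulerQueue);   // delayed scheduler: removal now sees the new capability
    return cluster[0];
  }

  public static void main(String[] args) {
    System.out.println("scheduler keeps up: " + clusterAfterRemoval(5, 5, 10, false) + "gb");
    System.out.println("scheduler delayed:  " + clusterAfterRemoval(5, 5, 10, true) + "gb");
  }
}
```

In the delayed case the model subtracts the new 10gb capability from a total that only ever included the old 5gb, which is the over-subtraction the comment warns about.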

        jianhe Jian He added a comment -

        I have handled this by sending RMNodeResourceUpdateEvent if there is any change in capability

        Even by sending an event, it's still possible that removeNode was removing new capability from cluster resource ?

        jianhe Jian He added a comment -

        thanks for updating.
        I think we may not need to call sendNodeUsableEventIfNodeStateIsNotRunning to send the node_usable event in ReconnectEvent. As you said earlier, the next heartbeat will trigger this event based on the node's own health report.

        It means the node state can be decommissioned/lost/running

        The transition is invoked only at running and unhealthy state, so I think this is not possible?

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12701861/0003-YARN-3222.patch
        against trunk revision ca1c00b.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6800//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6800//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6800//console

        This message is automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        are the test failures related ?

        Yes. Since totalCapability was set directly before sending NodeRemovedEvent, removeNode was removing the new capability from the cluster resource. I have handled this by sending RMNodeResourceUpdateEvent if there is any change in capability.

        we may not need to send the NODE_USABLE event, if the node were already at the running state, right ?

        yes, done

        we can make the following two condition checks consistent as checking for RUNNING

        Here the check is for a state that is not UNHEALTHY. It means the node state can be decommissioned/lost/running. I'd suggest keeping it as it is.

        jianhe Jian He added a comment -

        Actually, we may not need to send the NODE_USABLE event if the node is already in the running state, right?
        Also, we can make the following two condition checks consistent by checking for RUNNING in both.

         if (rmNode.getState() != NodeState.UNHEALTHY) {
                    // Only add new node if old state is not UNHEALTHY
         if (rmNode.getState().equals(NodeState.RUNNING)) {
        
        jianhe Jian He added a comment -

        lgtm, are the test failures related ?

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12701317/0002-YARN-3222.patch
        against trunk revision 48c7ee7.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
        org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
        org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
        org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6779//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6779//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6779//console

        This message is automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        Updated the patch to handle the following scenarios:

        1. Avoid sending the Node_resource_Update event to the scheduler from RMNode when the Node_added event has already been sent.
        2. Send the NODE_USABLE event only if the reconnected node is healthy.
        3. Update the resource totalCapability in RMNode if the reconnected node is the same as the old node.
        rohithsharma Rohith Sharma K S added a comment -

        NODE_USABLE event is sent regardless the reconnected node is healthy or not healthy, which is incorrect, right ?

        Yes, I think it was assumed that if a new node is reconnecting then the NM is healthy. It is better to retain the old state, i.e. UNHEALTHY, and on the first subsequent heartbeat the node state can move from UNHEALTHY to RUNNING.

        I see another potential issue: if the old node is retained, then the RMNode's totalCapability has to be updated with the new RMNode's resource. But in this flow, totalCapability is not updated. As a result, the scheduler has the updated resource value but the RMNode holds a stale one. Any client reading the node capability from the RMNode would end up with the wrong node resource value.

        if (noRunningApps) {
          // some code
          rmNode.context.getDispatcher().getEventHandler().handle(
              new NodeRemovedSchedulerEvent(rmNode));

          if (rmNode.getHttpPort() == newNode.getHttpPort()) {
            if (rmNode.getState() != NodeState.UNHEALTHY) {
              // Only add new node if old state is not UNHEALTHY
              rmNode.context.getDispatcher().getEventHandler().handle(
                  new NodeAddedSchedulerEvent(newNode)); // NEW NODE CAPABILITY SHOULD BE UPDATED TO OLD NODE
            }
          } else {
            // Reconnected node differs, so replace old node and start new node
            rmNode.context.getDispatcher().getEventHandler().handle(
                new RMNodeStartedEvent(newNode.getNodeID(), null, null)); // No need to update totalCapability since old node is replaced with new node.
          }
        }
        
        jianhe Jian He added a comment -

        Looks good to me.
        While looking at this, I may have found another bug: the NODE_USABLE event is sent regardless of whether the reconnected node is healthy or not, which is incorrect, right?

              rmNode.context.getDispatcher().getEventHandler().handle(
                  new NodesListManagerEvent(
                      NodesListManagerEventType.NODE_USABLE, rmNode));
        
        rohithsharma Rohith Sharma K S added a comment -

        Jian He, kindly review the analysis and patch. I had a look at the test failures and don't think they are related to this patch.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12700180/0001-YARN-3222.patch
        against trunk revision fe7a302.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.TestRM
        org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
        org.apache.hadoop.yarn.server.resourcemanager.reservation.TestFairReservationSystem

        The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

        org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6698//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6698//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6698//console

        This message is automatically generated.

        rohithsharma Rohith Sharma K S added a comment -

        Kindly review the patch. Since no tests were added, the patch was verified manually by deploying it in a cluster.
        In the patch, I have moved handlingRunningApplications inside the else block. It need not be common to the noAppsRunning case.

        rohithsharma Rohith Sharma K S added a comment -

        I see two ways of fixing the issue.

        1. Always send the NODE_RESOURCE_UPDATE event to the scheduler via RMNodeEventType.RESOURCE_UPDATE of RMNode.
        2. Once the NODE_ADDED event has been sent to the scheduler, sending a NODE_RESOURCE_UPDATE event for the same node from ReconnectNodeTransition is a duplicate update request, because the scheduler has already updated its resources with the newly added node, i.e. NODE_REMOVED -> NODE_ADDED -> NODE_RESOURCE_UPDATE. So if no applications are running on the node, it is not required to send the node_resource_update request.

        I would prefer the 2nd option because it avoids one duplicate resource update.
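As a rough illustration of the second option, the decision can be written as a small function that returns the scheduler events to emit on reconnect. This is a sketch under assumptions, not the patch itself: `ReconnectPlanSketch`, `plan` and the two boolean inputs are hypothetical, and the branch for nodes with running applications (update only when the capability changed) is an assumed simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only models which scheduler events would be sent.
class ReconnectPlanSketch {

  enum SchedulerEvent { NODE_REMOVED, NODE_ADDED, NODE_RESOURCE_UPDATE }

  // Returns the scheduler events for a reconnecting node, in send order.
  static List<SchedulerEvent> plan(boolean noRunningApps, boolean capabilityChanged) {
    List<SchedulerEvent> events = new ArrayList<>();
    if (noRunningApps) {
      // The node is re-added with its new capability, so a separate
      // resource update would be a duplicate and is skipped.
      events.add(SchedulerEvent.NODE_REMOVED);
      events.add(SchedulerEvent.NODE_ADDED);
    } else if (capabilityChanged) {
      // Assumed: a node that is kept only needs its resources refreshed.
      events.add(SchedulerEvent.NODE_RESOURCE_UPDATE);
    }
    return events;
  }

  public static void main(String[] args) {
    System.out.println(plan(true, true));   // [NODE_REMOVED, NODE_ADDED]
    System.out.println(plan(false, true));  // [NODE_RESOURCE_UPDATE]
    System.out.println(plan(false, false)); // []
  }
}
```

Under this plan NODE_RESOURCE_UPDATE never follows NODE_REMOVED for the same reconnect, so the update cannot reach the scheduler while the node is absent.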

        rohithsharma Rohith Sharma K S added a comment -

        Attaching the logs, which give more information about the issue. In the log below, the RM shut down with an NPE while updating node_resource. Observe the scheduler events dispatched from AsyncDispatcher in org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.*. Here the order is NODE_REMOVED --> NODE_RESOURCE_UPDATE --> NODE_ADDED --> NODE_LABELS_UPDATE.

        2015-02-19 09:14:57,212 INFO  [main] util.RackResolver (RackResolver.java:coreResolve(109)) - Resolved 127.0.0.1 to /default-rack
        2015-02-19 09:14:57,213 INFO  [main] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(313)) - Reconnect from the node at: 127.0.0.1
        2015-02-19 09:14:57,215 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeReconnectEvent.EventType: RECONNECTED
        2015-02-19 09:14:57,215 INFO  [main] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(343)) - NodeManager from node 127.0.0.1(cmPort: 1234 httpPort: 3) registered with capability: <memory:16384, vCores:16>, assigned nodeId 127.0.0.1:1234
        2015-02-19 09:14:57,215 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(412)) - Processing 127.0.0.1:1234 of type RECONNECTED
        2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeRemovedSchedulerEvent.EventType: NODE_REMOVED
        2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStartedEvent.EventType: STARTED
        2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(412)) - Processing 127.0.0.1:1234 of type STARTED
        2015-02-19 09:14:57,266 INFO  [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(424)) - 127.0.0.1:1234 Node Transitioned from NEW to RUNNING
        2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEvent.EventType: NODE_USABLE
        2015-02-19 09:14:57,266 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeResourceUpdateSchedulerEvent.EventType: NODE_RESOURCE_UPDATE
        2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeAddedSchedulerEvent.EventType: NODE_ADDED
        2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEvent.EventType: NODE_USABLE
        2015-02-19 09:14:57,267 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeLabelsUpdateSchedulerEvent.EventType: NODE_LABELS_UPDATE
        2015-02-19 09:14:57,267 INFO  [ResourceManager Event Processor] capacity.CapacityScheduler (CapacityScheduler.java:removeNode(1267)) - Removed node 127.0.0.1:1234 clusterResource: <memory:0, vCores:0>
        2015-02-19 09:14:57,267 FATAL [ResourceManager Event Processor] resourcemanager.ResourceManager (ResourceManager.java:run(688)) - Error in handling event type NODE_RESOURCE_UPDATE to the scheduler
        java.lang.NullPointerException
        	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:548)
        	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:992)
        	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1119)
        	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:120)
        	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:679)
        	at java.lang.Thread.run(Thread.java:745)
        2015-02-19 09:14:57,280 INFO  [ResourceManager Event Processor] resourcemanager.ResourceManager (ResourceManager.java:run(692)) - Exiting, bbye..
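The failure in the log can be modelled with a toy scheduler that keeps its nodes in a map. This is a minimal sketch, not the CapacityScheduler code: `ToyScheduler`, `SchedulerNode` and `replay` are hypothetical names, and it assumes the NPE comes from dereferencing the result of a lookup for a node that has already been removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; these are not the real YARN scheduler classes.
class EventOrderDemo {

  static final class SchedulerNode {
    int memoryMb;
    SchedulerNode(int memoryMb) { this.memoryMb = memoryMb; }
  }

  static final class ToyScheduler {
    private final Map<String, SchedulerNode> nodes = new HashMap<>();

    void nodeAdded(String id, int memoryMb) { nodes.put(id, new SchedulerNode(memoryMb)); }

    void nodeRemoved(String id) { nodes.remove(id); }

    void nodeResourceUpdate(String id, int memoryMb) {
      SchedulerNode node = nodes.get(id); // null if the node is unknown
      node.memoryMb = memoryMb;           // throws NullPointerException when null
    }
  }

  // Replays events against a scheduler that already knows the node.
  // Returns false if the replay hits a NullPointerException.
  static boolean replay(String... events) {
    ToyScheduler scheduler = new ToyScheduler();
    String nodeId = "127.0.0.1:1234";
    scheduler.nodeAdded(nodeId, 8192);
    try {
      for (String event : events) {
        switch (event) {
          case "NODE_REMOVED": scheduler.nodeRemoved(nodeId); break;
          case "NODE_ADDED": scheduler.nodeAdded(nodeId, 16384); break;
          case "NODE_RESOURCE_UPDATE": scheduler.nodeResourceUpdate(nodeId, 16384); break;
          default: throw new IllegalArgumentException("unknown event: " + event);
        }
      }
      return true;
    } catch (NullPointerException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Order seen in the log: the update arrives while the node is absent.
    System.out.println(replay("NODE_REMOVED", "NODE_RESOURCE_UPDATE", "NODE_ADDED")); // false
    // Sequential order: the node exists again before it is updated.
    System.out.println(replay("NODE_REMOVED", "NODE_ADDED", "NODE_RESOURCE_UPDATE")); // true
  }
}
```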
        

          People

          • Assignee:
            rohithsharma Rohith Sharma K S
            Reporter:
            rohithsharma Rohith Sharma K S
          • Votes:
            0
            Watchers:
            15