  1. Hadoop YARN
  2. YARN-4454

NM to nodelabel mapping going wrong after RM restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None

      Description

      The node label mapping for a NodeManager goes wrong when a combination of hostname and NodeId (hostname:port) is used to update the mapping.

      Steps to reproduce

      1. Create a cluster with 2 NMs
      2. Add labels X and Y to the cluster
      3. Replace the label of node 1 using <HOSTNAME1:PORT>,x
      4. Replace the label of node 1 using <HOSTNAME1>,y
      5. Replace the label of node 1 again using <HOSTNAME1:PORT>,x

      Check the cluster label mapping: HOSTNAME1 will be mapped to X.

      Now restart the RM 2 times: the node label mapping of HOSTNAME1:PORT changes to Y.

      2015-12-14 17:17:54,901 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [<ResourcePool_1:exclusivity=true>,<ResourcePool_null:exclusivity=true>]
      2015-12-14 17:17:54,905 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: REPLACE labels on nodes:
      2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
      2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-188:0, labels=[ResourcePool_null]
      2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-187:64318, labels=[ResourcePool_null]
      
      1. 0001-YARN-4454.patch
        7 kB
        Bibin A Chundatt
      2. test.patch
        5 kB
        Bibin A Chundatt

          Activity

          bibinchundatt Bibin A Chundatt added a comment -

          On the second recovery, the ordering goes wrong:

          2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-188:64318, labels=[ResourcePool_1]
          2015-12-14 17:17:54,906 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager:   NM=host-10-19-92-188:0, labels=[ResourcePool_null]
          bibinchundatt Bibin A Chundatt added a comment -

          Attaching test code to reproduce the issue.

          leftnoteasy Wangda Tan added a comment -

          Bibin A Chundatt, thanks for reporting and looking at the issue.

          The root cause of this issue is that when the RM restarts for the first time, it generates a mirror file containing the complete node->label mappings:

          node1:port=x 
          node1=y
          

          When we restart the RM again, we load the mappings, but node1:port is loaded first, so node1=y overwrites the previous entry.

          In: org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager#checkReplaceLabelsOnNode

          Instead of iterating directly over the map:

              for (Entry<NodeId, Set<String>> entry : replaceLabelsToNode.entrySet()) {
                NodeId nodeId = entry.getKey();
          

          We should sort the map so that nodes without a port are handled before nodes with a port specified, which avoids the overwrite.

          Does this make sense to you?
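
The ordering problem described in this comment can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code in CommonNodeLabelsManager: `MirrorEntry`, `replay`, and `hostOnlyFirst` are hypothetical names, port 0 is assumed to stand for a host-only entry (as the `host-10-19-92-188:0` log line in the description suggests), and a host-only replace is modeled as simply overwriting every NM already recorded for that host.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabelReplayOrder {

  /** Hypothetical stand-in for one mirror-file entry; port 0 means "host only". */
  static final class MirrorEntry {
    final String host;
    final int port;
    final String label;

    MirrorEntry(String host, int port, String label) {
      this.host = host;
      this.port = port;
      this.label = label;
    }
  }

  /**
   * Simplified replay (an assumption, not the real RM logic): a host-only
   * entry relabels every NM already recorded for that host, while a
   * host:port entry relabels just that NM.
   */
  static Map<String, String> replay(List<MirrorEntry> entries) {
    Map<String, String> labels = new LinkedHashMap<>();
    for (MirrorEntry e : entries) {
      if (e.port == 0) {
        // Host-level replace: overwrite labels of NMs seen so far on this host.
        labels.replaceAll((node, old) ->
            node.startsWith(e.host + ":") ? e.label : old);
      }
      labels.put(e.host + ":" + e.port, e.label);
    }
    return labels;
  }

  /** Returns a copy ordered by host, then port, so host-only entries come first. */
  static List<MirrorEntry> hostOnlyFirst(List<MirrorEntry> entries) {
    List<MirrorEntry> sorted = new ArrayList<>(entries);
    sorted.sort(Comparator.comparing((MirrorEntry e) -> e.host)
        .thenComparingInt(e -> e.port));
    return sorted;
  }

  public static void main(String[] args) {
    // Same order as the mirror file in the comment: node1:port=x, then node1=y.
    List<MirrorEntry> mirror = List.of(
        new MirrorEntry("node1", 64318, "x"),
        new MirrorEntry("node1", 0, "y"));
    System.out.println("file order:      " + replay(mirror));
    System.out.println("host-only first: " + replay(hostOnlyFirst(mirror)));
  }
}
```

Under this model, replaying in file order leaves node1:64318 labeled y, while handling the host-only entry first keeps it labeled x.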

          bibinchundatt Bibin A Chundatt added a comment -

          Hi Wangda Tan,
          Thank you for looking into the issue. I was checking how to preserve the insertion order. That is not required, since when a host is used, all the other labels get updated in the replace anyway.

          Your solution totally makes sense. Currently, before the internal label update, I have used a TreeMap for the normalized map so that entries are kept sorted by NodeId, as required.

          Please review the patch.
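
A minimal sketch of the TreeMap idea mentioned here, assuming a key type whose natural ordering compares host first and port second. `NodeKey` and `sortedCopy` are hypothetical names used for illustration only (the patch itself is not shown in this thread), and port 0 is assumed to denote a host-only entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

public class SortedNodeLabels {

  /** Hypothetical node identifier; port 0 is assumed to mean "host only". */
  static final class NodeKey implements Comparable<NodeKey> {
    final String host;
    final int port;

    NodeKey(String host, int port) {
      this.host = host;
      this.port = port;
    }

    @Override
    public int compareTo(NodeKey other) {
      // Order by host, then by port, so host:0 precedes host:<real port>.
      int byHost = host.compareTo(other.host);
      return byHost != 0 ? byHost : Integer.compare(port, other.port);
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof NodeKey)) {
        return false;
      }
      NodeKey k = (NodeKey) o;
      return port == k.port && host.equals(k.host);
    }

    @Override
    public int hashCode() {
      return Objects.hash(host, port);
    }

    @Override
    public String toString() {
      return host + ":" + port;
    }
  }

  /** Copies the map into a TreeMap so iteration follows NodeKey ordering. */
  static Map<NodeKey, Set<String>> sortedCopy(Map<NodeKey, Set<String>> mapping) {
    return new TreeMap<>(mapping);
  }

  public static void main(String[] args) {
    Map<NodeKey, Set<String>> mapping = new HashMap<>();
    mapping.put(new NodeKey("host-a", 64318), Set.of("x"));
    mapping.put(new NodeKey("host-a", 0), Set.of("y"));

    // Iteration visits the host-only entry before the host:port entry.
    for (Map.Entry<NodeKey, Set<String>> entry : sortedCopy(mapping).entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```

Copying into a TreeMap makes the iteration order depend only on the keys, not on the order in which the entries were loaded.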

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote  Subsystem   Runtime   Comment
          0     reexec      0m 0s     Docker mode activated.
          +1    @author     0m 0s     The patch does not contain any @author tags.
          +1    test4tests  0m 0s     The patch appears to include 2 new or modified test files.
          +1    mvninstall  9m 10s    trunk passed
          +1    compile     8m 54s    trunk passed with JDK v1.8.0_66
          +1    compile     9m 28s    trunk passed with JDK v1.7.0_91
          +1    checkstyle  1m 3s     trunk passed
          +1    mvnsite     1m 47s    trunk passed
          +1    mvneclipse  0m 46s    trunk passed
          +1    findbugs    3m 40s    trunk passed
          +1    javadoc     1m 18s    trunk passed with JDK v1.8.0_66
          +1    javadoc     1m 25s    trunk passed with JDK v1.7.0_91
          +1    mvninstall  1m 46s    the patch passed
          +1    compile     10m 13s   the patch passed with JDK v1.8.0_66
          +1    javac       10m 13s   the patch passed
          +1    compile     10m 7s    the patch passed with JDK v1.7.0_91
          +1    javac       10m 7s    the patch passed
          +1    checkstyle  1m 3s     the patch passed
          +1    mvnsite     1m 44s    the patch passed
          +1    mvneclipse  0m 44s    the patch passed
          +1    whitespace  0m 0s     Patch has no whitespace issues.
          +1    findbugs    3m 49s    the patch passed
          +1    javadoc     1m 9s     the patch passed with JDK v1.8.0_66
          +1    javadoc     1m 23s    the patch passed with JDK v1.7.0_91
          +1    unit        1m 59s    hadoop-yarn-common in the patch passed with JDK v1.8.0_66.
          -1    unit        66m 59s   hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66.
          -1    unit        9m 29s    hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_66.
          +1    unit        2m 16s    hadoop-yarn-common in the patch passed with JDK v1.7.0_91.
          -1    unit        66m 42s   hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91.
          -1    unit        10m 9s    hadoop-mapreduce-client-app in the patch failed with JDK v1.7.0_91.
          -1    asflicense  0m 24s    Patch generated 1 ASF License warnings.
                            228m 58s



          Reason                            Tests
          JDK v1.8.0_66 Failed junit tests  hadoop.yarn.server.resourcemanager.TestClientRMTokens
                                            hadoop.yarn.server.resourcemanager.TestAMAuthorization
                                            hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt
          JDK v1.7.0_91 Failed junit tests  hadoop.yarn.server.resourcemanager.TestClientRMTokens
                                            hadoop.yarn.server.resourcemanager.TestAMAuthorization
                                            hadoop.mapreduce.v2.app.job.impl.TestJobImpl
                                            hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12778735/0001-YARN-4454.patch
          JIRA Issue YARN-4454
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9df2c49a61d2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0f82b5d
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91.txt https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10050/testReport/
          asflicense https://builds.apache.org/job/PreCommit-YARN-Build/10050/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: .
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10050/console

          This message was automatically generated.

          bibinchundatt Bibin A Chundatt added a comment -

          Test failures are already tracked as part of umbrella JIRA YARN-4474

          bibinchundatt Bibin A Chundatt added a comment -

          Sorry, I mentioned the wrong JIRA ID; it's YARN-4478.

          leftnoteasy Wangda Tan added a comment -

          Looks good, +1. Thanks, Bibin A Chundatt! Committing.

          leftnoteasy Wangda Tan added a comment -

          Committed to branch-2/2.8/trunk. Thanks Bibin A Chundatt!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9009 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9009/)
          YARN-4454. NM to nodelabel mapping going wrong after RM restart. (Bibin (wangda: rev bc038b382cb2ce561ce718405fbcee4382f3b204)

          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
          bibinchundatt Bibin A Chundatt added a comment -

          Wangda Tan, thank you for the review and commit.


            People

            • Assignee:
              bibinchundatt Bibin A Chundatt
              Reporter:
              bibinchundatt Bibin A Chundatt
            • Votes:
              0
              Watchers:
              9
