Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6585

RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      Caused by: java.io.IOException: Not all labels being replaced contained by known label collections, please check, new labels=[abc]
              at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
              at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
              at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
              at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
              at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
              at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
              at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
              at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
              ... 13 more
      
      1. YARN-6585.0003.patch
        10 kB
        Sunil G
      2. YARN-6585.0002.patch
        10 kB
        Sunil G
      3. YARN-6585.0001.patch
        1 kB
        Sunil G

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11853 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11853/)
          YARN-6585. RM fails to start when upgrading from 2.7 for clusters with (epayne: rev 5578af860335ae44c9780082508c3dcf726f60fc)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsRequestPBImpl.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11853 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11853/ ) YARN-6585 . RM fails to start when upgrading from 2.7 for clusters with (epayne: rev 5578af860335ae44c9780082508c3dcf726f60fc) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/AddToClusterNodeLabelsRequestPBImpl.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
          Hide
          eepayne Eric Payne added a comment -

          BTW, the findbugs warning for org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat is is not relevant for this patch.

          Show
          eepayne Eric Payne added a comment - BTW, the findbugs warning for org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat is is not relevant for this patch.
          Hide
          eepayne Eric Payne added a comment -

          Thanks Sunil G. The patch looks good.
          +1. Will commit soon.

          Show
          eepayne Eric Payne added a comment - Thanks Sunil G . The patch looks good. +1. Will commit soon.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 44s Maven dependency ordering for branch
          +1 mvninstall 13m 21s trunk passed
          +1 compile 8m 33s trunk passed
          +1 checkstyle 0m 51s trunk passed
          +1 mvnsite 1m 18s trunk passed
          -1 findbugs 1m 3s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 1 extant Findbugs warnings.
          +1 javadoc 1m 2s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 1m 0s the patch passed
          +1 compile 5m 9s the patch passed
          +1 javac 5m 9s the patch passed
          -0 checkstyle 0m 50s hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 58 unchanged - 0 fixed = 67 total (was 58)
          +1 mvnsite 1m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 20s the patch passed
          +1 javadoc 1m 1s the patch passed
          +1 unit 2m 23s hadoop-yarn-common in the patch passed.
          +1 unit 38m 57s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 32s The patch does not generate ASF License warnings.
          89m 40s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue YARN-6585
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872246/YARN-6585.0003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux fc1ee985dc37 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 99634d1
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16169/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/16169/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16169/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/16169/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 19s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 44s Maven dependency ordering for branch +1 mvninstall 13m 21s trunk passed +1 compile 8m 33s trunk passed +1 checkstyle 0m 51s trunk passed +1 mvnsite 1m 18s trunk passed -1 findbugs 1m 3s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 1 extant Findbugs warnings. +1 javadoc 1m 2s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 1m 0s the patch passed +1 compile 5m 9s the patch passed +1 javac 5m 9s the patch passed -0 checkstyle 0m 50s hadoop-yarn-project/hadoop-yarn: The patch generated 9 new + 58 unchanged - 0 fixed = 67 total (was 58) +1 mvnsite 1m 15s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 20s the patch passed +1 javadoc 1m 1s the patch passed +1 unit 2m 23s hadoop-yarn-common in the patch passed. +1 unit 38m 57s hadoop-yarn-server-resourcemanager in the patch passed. +1 asflicense 0m 32s The patch does not generate ASF License warnings. 89m 40s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6585 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12872246/YARN-6585.0003.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux fc1ee985dc37 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 99634d1 Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-YARN-Build/16169/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/16169/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16169/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/16169/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          sunilg Sunil G added a comment -

          Updating a new patch.

          Show
          sunilg Sunil G added a comment - Updating a new patch.
          Hide
          sunilg Sunil G added a comment -

          Thanks Eric Payne

          I feel that these issues should be fixed as part of a separate JIRA since they are confusing things here and are not related to fixing this problem.

          I think its fine to take this changes in separate ticket. Ideally initNodeLabels changes are enough to support for breakage sceanrios. Also thanks you for sharing the test case which I think will help here.

          I ll update a patch shortwhile.

          Show
          sunilg Sunil G added a comment - Thanks Eric Payne I feel that these issues should be fixed as part of a separate JIRA since they are confusing things here and are not related to fixing this problem. I think its fine to take this changes in separate ticket. Ideally initNodeLabels changes are enough to support for breakage sceanrios. Also thanks you for sharing the test case which I think will help here. I ll update a patch shortwhile.
          Hide
          sunilg Sunil G added a comment -

          Sorry Eric Payne for the delay here. I ll look this today.

          Show
          sunilg Sunil G added a comment - Sorry Eric Payne for the delay here. I ll look this today.
          Hide
          eepayne Eric Payne added a comment -

          Hi Sunil G,

          I have come up with a unit test for this that I think will work well. It creates a 2.7-formatted levelDB file and starts the 2.8 RM while referencing it. I have tested it before and after the fix. Before the fix, it throws a similar exception as was experienced when we discovered this error:

          java.io.IOException: Not all labels being replaced contained by known label collections, please check, new labels=[a]
                  at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
                  at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
                  at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
          ...
          
          TestRMNodeLabelsManager#testBackwardsCompatableMirror
            @Test(timeout = 60000)
            public void testBackwardsCompatableMirror() throws Exception {
              lmgr = new RMNodeLabelsManager();
              Configuration conf = new Configuration();
              File tempDir = File.createTempFile("nlb", ".tmp");
              tempDir.delete();
              tempDir.mkdirs();
              tempDir.deleteOnExit();
              String tempDirName = tempDir.getAbsolutePath();
              conf.set(YarnConfiguration.FS_NODE_LABELS_STORE_ROOT_DIR, tempDirName);
          
              // The following are the contents of a 2.7-formatted levelDB file to be
              // placed in nodelabel.mirror. There are 3 labels: 'a', 'b', and 'c'.
              // host1 is labeled with 'a', host2 is labeled with 'b', and c is not
              // associated with a node.
              byte[] contents =
                {
                    0x09, 0x0A, 0x01, 0x61, 0x0A, 0x01, 0x62, 0x0A, 0x01, 0x63, 0x20, 
                    0x0A, 0x0E, 0x0A, 0x09, 0x0A, 0x05, 0x68, 0x6F, 0x73, 0x74, 0x32, 
                    0x10, 0x00, 0x12, 0x01, 0x62, 0x0A, 0x0E, 0x0A, 0x09, 0x0A, 0x05, 
                    0x68, 0x6F, 0x73, 0x74, 0x31, 0x10, 0x00, 0x12, 0x01, 0x61
                };
              File file = new File(tempDirName + "/nodelabel.mirror");
              file.createNewFile();
              FileOutputStream stream = new FileOutputStream(file);
              stream.write(contents);
              stream.close();
          
              conf.setBoolean(YarnConfiguration.NODE_LABELS_ENABLED, true);
              conf.set(YarnConfiguration.RM_SCHEDULER,
                  "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
              Configuration withQueueLabels = getConfigurationWithQueueLabels(conf);
          
              MockRM rm = initRM(withQueueLabels);
              Set<String> labelNames = lmgr.getClusterNodeLabelNames();
              Map<String, Set<NodeId>> labeledNodes = lmgr.getLabelsToNodes();
          
              Assert.assertTrue(labelNames.contains("a"));
              Assert.assertTrue(labelNames.contains("b"));
              Assert.assertTrue(labelNames.contains("c"));
              Assert.assertTrue(labeledNodes.get("a")
                  .contains(NodeId.newInstance("host1", 0)));
              Assert.assertTrue(labeledNodes.get("b")
                  .contains(NodeId.newInstance("host2", 0)));
          
              rm.stop();
            }
          
          Show
          eepayne Eric Payne added a comment - Hi Sunil G , I have come up with a unit test for this that I think will work well. It creates a 2.7-formatted levelDB file and starts the 2.8 RM while referencing it. I have tested it before and after the fix. Before the fix, it throws a similar exception as was experienced when we discovered this error: java.io.IOException: Not all labels being replaced contained by known label collections, please check, new labels=[a] at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189) ... TestRMNodeLabelsManager#testBackwardsCompatableMirror @Test(timeout = 60000) public void testBackwardsCompatableMirror() throws Exception { lmgr = new RMNodeLabelsManager(); Configuration conf = new Configuration(); File tempDir = File.createTempFile( "nlb" , ".tmp" ); tempDir.delete(); tempDir.mkdirs(); tempDir.deleteOnExit(); String tempDirName = tempDir.getAbsolutePath(); conf.set(YarnConfiguration.FS_NODE_LABELS_STORE_ROOT_DIR, tempDirName); // The following are the contents of a 2.7-formatted levelDB file to be // placed in nodelabel.mirror. There are 3 labels: 'a', 'b', and 'c'. // host1 is labeled with 'a', host2 is labeled with 'b', and c is not // associated with a node. byte [] contents = { 0x09, 0x0A, 0x01, 0x61, 0x0A, 0x01, 0x62, 0x0A, 0x01, 0x63, 0x20, 0x0A, 0x0E, 0x0A, 0x09, 0x0A, 0x05, 0x68, 0x6F, 0x73, 0x74, 0x32, 0x10, 0x00, 0x12, 0x01, 0x62, 0x0A, 0x0E, 0x0A, 0x09, 0x0A, 0x05, 0x68, 0x6F, 0x73, 0x74, 0x31, 0x10, 0x00, 0x12, 0x01, 0x61 }; File file = new File(tempDirName + "/nodelabel.mirror" ); file.createNewFile(); FileOutputStream stream = new FileOutputStream(file); stream.write(contents); stream.close(); conf.setBoolean(YarnConfiguration.NODE_LABELS_ENABLED, true ); conf.set(YarnConfiguration.RM_SCHEDULER, "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler" ); Configuration withQueueLabels = getConfigurationWithQueueLabels(conf); MockRM rm = initRM(withQueueLabels); Set< String > labelNames = lmgr.getClusterNodeLabelNames(); Map< String , Set<NodeId>> labeledNodes = lmgr.getLabelsToNodes(); Assert.assertTrue(labelNames.contains( "a" )); Assert.assertTrue(labelNames.contains( "b" )); Assert.assertTrue(labelNames.contains( "c" )); Assert.assertTrue(labeledNodes.get( "a" ) .contains(NodeId.newInstance( "host1" , 0))); Assert.assertTrue(labeledNodes.get( "b" ) .contains(NodeId.newInstance( "host2" , 0))); rm.stop(); }
          Hide
          eepayne Eric Payne added a comment -

          In similar to GetClusterNodeLabelsResponse, it was also support labels as string earlier

          If I understand correctly, I think you are saying the following:

          1. since GetClusterNodeLabelsResponse has similar APIs, then AddToClusterNodeLabelsRequest should also have them
          2. YARN-3413 removed the APIs for newInstance(Set<String> labels), setNodeLabels(Set<String> labels), and Set<String> getNodeLabels() and replaced them with APIs for newInstance(List<NodeLabel> nodeLabels), void setNodeLabels(List<NodeLabel> nodeLabels), and List<NodeLabel> getNodeLabels(). It is incompatible to change APIs between minor releases, and this should be corrected.

          Is that correct? While both of those are valid points, I feel that these issues should be fixed as part of a separate JIRA since they are confusing things here and are not related to fixing this problem.

          Also, if the concern is compatability with 2.7, the APIs in 2.7 that operated on Set<String> were named [get|set]NodeLabels rather than [get|set]NodeLabelsList

          i was making sure that we have stored labels as string (like in 2.7 version), then read it back to ensure labels are still loaded an readable from RM

          I still need to think more about the unit test, but the deprecated API AddToClusterNodeLabelsRequest newInstance(Set<String> nodeLabels) doesn't write the labels in the 2.7 format. It just turns around and calls the 2.8 API.

          Also, as I said above, I don't think the code in AddToClusterNodeLabelsRequestPBImpl#initLocalNodeLabels is getting exercised by the unit test. I took out the code and the test still passes.

          Show
          eepayne Eric Payne added a comment - In similar to GetClusterNodeLabelsResponse, it was also support labels as string earlier If I understand correctly, I think you are saying the following: since GetClusterNodeLabelsResponse has similar APIs, then AddToClusterNodeLabelsRequest should also have them YARN-3413 removed the APIs for newInstance(Set<String> labels) , setNodeLabels(Set<String> labels) , and Set<String> getNodeLabels() and replaced them with APIs for newInstance(List<NodeLabel> nodeLabels) , void setNodeLabels(List<NodeLabel> nodeLabels) , and List<NodeLabel> getNodeLabels() . It is incompatible to change APIs between minor releases, and this should be corrected. Is that correct? While both of those are valid points, I feel that these issues should be fixed as part of a separate JIRA since they are confusing things here and are not related to fixing this problem. Also, if the concern is compatability with 2.7, the APIs in 2.7 that operated on Set<String> were named [get|set]NodeLabels rather than [get|set]NodeLabelsList i was making sure that we have stored labels as string (like in 2.7 version), then read it back to ensure labels are still loaded an readable from RM I still need to think more about the unit test, but the deprecated API AddToClusterNodeLabelsRequest newInstance(Set<String> nodeLabels) doesn't write the labels in the 2.7 format. It just turns around and calls the 2.8 API. Also, as I said above, I don't think the code in AddToClusterNodeLabelsRequestPBImpl#initLocalNodeLabels is getting exercised by the unit test. I took out the code and the test still passes.
          Hide
          sunilg Sunil G added a comment -

          HI Eric Payne

          Thanks for the comments.

          In similar to GetClusterNodeLabelsResponse, it was also support labels as string earlier. then its moved to list of NodeLabel. Ideally initLocalNodeLabels helps to cover the case.

          In my test case, i was making sure that we have stored labels as string (like in 2.7 version), then read it back to ensure labels are still loaded an readable from RM. Ideally we need not have to have mirror file specific case. Labels as string to be addd to cluster, and read it back in current form (as list of NodeLabel) to ensure all labels are added with exclusivity on. Hence i needed setter and getter. All cases were same in GetClusterNodeLabelsResponse and its PBImpl also.

          Pls share your thoughts.

          Show
          sunilg Sunil G added a comment - HI Eric Payne Thanks for the comments. In similar to GetClusterNodeLabelsResponse , it was also support labels as string earlier. then its moved to list of NodeLabel. Ideally initLocalNodeLabels helps to cover the case. In my test case, i was making sure that we have stored labels as string (like in 2.7 version), then read it back to ensure labels are still loaded an readable from RM. Ideally we need not have to have mirror file specific case. Labels as string to be addd to cluster, and read it back in current form (as list of NodeLabel) to ensure all labels are added with exclusivity on. Hence i needed setter and getter. All cases were same in GetClusterNodeLabelsResponse and its PBImpl also. Pls share your thoughts.
          Hide
          eepayne Eric Payne added a comment -

          I don't understand why we needed to add a deprecated newInstance method

          I'm also confused about the need for void setNodeLabelsList(Set<String> labels) and Set<String> getNodeLabelsList() in ddToClusterNodeLabelsRequest[PBImpl]. AFAICT, the code added to initLocalNodeLabels should be sufficient:

          AddToClusterNodeLabelsRequestPBImp#initLocalNodeLabelsl
              if (this.updatedNodeLabels.isEmpty()) {
                List<String> deprecatedLabelsList = p.getDeprecatedNodeLabelsList();
                for (String l : deprecatedLabelsList) {
                  this.updatedNodeLabels.add(NodeLabel.newInstance(l));
                }
              }
          

          Also, I don't think the unit test is actually testing the above code. I took out the above lines, ran the test, and it still succeeded.

          In fact, I think this will be difficult to test, since AddToClusterNodeLabelsRequestPBImp#initLocalNodeLabelsl is called by FileSystemNodeLabelsStore#loadFromMirror, which is reading the nodelabel.mirror and nodelabel.editlog files from HDFS, and the current test doesn't seem to be mocking any part of that. I'm still thinking about this one ;-/

          Show
          eepayne Eric Payne added a comment - I don't understand why we needed to add a deprecated newInstance method I'm also confused about the need for void setNodeLabelsList(Set<String> labels) and Set<String> getNodeLabelsList() in ddToClusterNodeLabelsRequest[PBImpl] . AFAICT, the code added to initLocalNodeLabels should be sufficient: AddToClusterNodeLabelsRequestPBImp#initLocalNodeLabelsl if ( this .updatedNodeLabels.isEmpty()) { List< String > deprecatedLabelsList = p.getDeprecatedNodeLabelsList(); for ( String l : deprecatedLabelsList) { this .updatedNodeLabels.add(NodeLabel.newInstance(l)); } } Also, I don't think the unit test is actually testing the above code. I took out the above lines, ran the test, and it still succeeded. In fact, I think this will be difficult to test, since AddToClusterNodeLabelsRequestPBImp#initLocalNodeLabelsl is called by FileSystemNodeLabelsStore#loadFromMirror , which is reading the nodelabel.mirror and nodelabel.editlog files from HDFS, and the current test doesn't seem to be mocking any part of that. I'm still thinking about this one ;-/
          Hide
          eepayne Eric Payne added a comment -

          i also had to a newInstance method in addToClusterNodeLabelsRequest to accept labels as string.

          I'm sorry, maybe I'm missing something Sunil G, but I don't understand why we needed to add a deprecated newInstance method that is only called by the new unit test.

          Show
          eepayne Eric Payne added a comment - i also had to a newInstance method in addToClusterNodeLabelsRequest to accept labels as string. I'm sorry, maybe I'm missing something Sunil G , but I don't understand why we needed to add a deprecated newInstance method that is only called by the new unit test.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 37s Maven dependency ordering for branch
          +1 mvninstall 12m 55s trunk passed
          +1 compile 7m 50s trunk passed
          +1 checkstyle 0m 51s trunk passed
          +1 mvnsite 1m 54s trunk passed
          +1 mvneclipse 1m 10s trunk passed
          -1 findbugs 1m 3s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 1 extant Findbugs warnings.
          +1 javadoc 1m 30s trunk passed
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 1m 21s the patch passed
          +1 compile 7m 15s the patch passed
          -1 javac 7m 15s hadoop-yarn-project_hadoop-yarn generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45)
          -0 checkstyle 0m 45s hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 37 unchanged - 0 fixed = 38 total (was 37)
          +1 mvnsite 1m 49s the patch passed
          +1 mvneclipse 1m 7s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 findbugs 1m 4s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
          -1 javadoc 0m 21s hadoop-yarn-api in the patch failed.
          -1 javadoc 0m 29s hadoop-yarn-common in the patch failed.
          +1 unit 0m 26s hadoop-yarn-api in the patch passed.
          +1 unit 2m 27s hadoop-yarn-common in the patch passed.
          -1 unit 39m 55s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 33s The patch does not generate ASF License warnings.
          100m 10s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
            Inconsistent synchronization of org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.updatedNodeLabels; locked 75% of time Unsynchronized access at AddToClusterNodeLabelsRequestPBImpl.java:75% of time Unsynchronized access at AddToClusterNodeLabelsRequestPBImpl.java:[line 67]
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart
            hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue YARN-6585
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868811/YARN-6585.0002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 0100ed64cc3d 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 46d9c4c
          Default Java 1.8.0_131
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html
          javac https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn.txt
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.html
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15989/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/15989/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 20s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 37s Maven dependency ordering for branch +1 mvninstall 12m 55s trunk passed +1 compile 7m 50s trunk passed +1 checkstyle 0m 51s trunk passed +1 mvnsite 1m 54s trunk passed +1 mvneclipse 1m 10s trunk passed -1 findbugs 1m 3s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 1 extant Findbugs warnings. +1 javadoc 1m 30s trunk passed 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 1m 21s the patch passed +1 compile 7m 15s the patch passed -1 javac 7m 15s hadoop-yarn-project_hadoop-yarn generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) -0 checkstyle 0m 45s hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 37 unchanged - 0 fixed = 38 total (was 37) +1 mvnsite 1m 49s the patch passed +1 mvneclipse 1m 7s the patch passed -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply -1 findbugs 1m 4s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) -1 javadoc 0m 21s hadoop-yarn-api in the patch failed. -1 javadoc 0m 29s hadoop-yarn-common in the patch failed. +1 unit 0m 26s hadoop-yarn-api in the patch passed. +1 unit 2m 27s hadoop-yarn-common in the patch passed. -1 unit 39m 55s hadoop-yarn-server-resourcemanager in the patch failed. +1 asflicense 0m 33s The patch does not generate ASF License warnings. 100m 10s Reason Tests FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common   Inconsistent synchronization of org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.AddToClusterNodeLabelsRequestPBImpl.updatedNodeLabels; locked 75% of time Unsynchronized access at AddToClusterNodeLabelsRequestPBImpl.java:75% of time Unsynchronized access at AddToClusterNodeLabelsRequestPBImpl.java: [line 67] Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart   hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6585 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868811/YARN-6585.0002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 0100ed64cc3d 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 46d9c4c Default Java 1.8.0_131 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html javac https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn.txt checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt whitespace https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/whitespace-eol.txt findbugs https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.html javadoc https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt javadoc https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/15989/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15989/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15989/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          sunilg Sunil G added a comment -

          AddToClusterNodeLabelsRequestPBImpl.initLocalNodeLabels() was not handling deprecated fields. Thanks Wangda Tan for clarifying the same.

          Attaching a new patch which handles deprecated labels in string format. However i also had to a newInstance method in addToClusterNodeLabelsRequest to accept labels as string. Added a test case also. Wangda Tan, could you please take a look.

          Show
          sunilg Sunil G added a comment - AddToClusterNodeLabelsRequestPBImpl.initLocalNodeLabels() was not handling deprecated fields. Thanks Wangda Tan for clarifying the same. Attaching a new patch which handles deprecated labels in string format. However i also had to a newInstance method in addToClusterNodeLabelsRequest to accept labels as string. Added a test case also. Wangda Tan , could you please take a look.
          Hide
          leftnoteasy Wangda Tan added a comment -

          Thanks Eric Payne/Nathan Roberts/Sunil G for reporting and investigating this issue.

          Sunil G I felt this fix is not correct, reversing fields is an incompatible change.
          In branch-2.7, we have string node labels in AddToClusterNodeLabelProto as 1st field. In existing branch-2.8, we added a NodeLabelProto to 2nd field and renamed 1st field to "deprecated-". So far this is compatible.

          The problem is, existing implementation:

            private void initLocalNodeLabels() {
              AddToClusterNodeLabelsRequestProtoOrBuilder p = viaProto ? proto : builder;
              List<NodeLabelProto> attributesProtoList = p.getNodeLabelsList();
              this.updatedNodeLabels = new ArrayList<NodeLabel>();
              for (NodeLabelProto r : attributesProtoList) {
                this.updatedNodeLabels.add(convertFromProtoFormat(r));
              }
            }
          

          Inside AddToClusterNodeLabelsRequestPBImpl doesn't read from deprecated node label string field (1st). In FileSystemNodeLabelStore, YARN read from serialized PB message and call new AddToClusterNodeLabelsRequestPBImpl(AddToClusterNodeLabelsRequestProto proto). If it fails to read from 2nd field, it should try to read from the 1st one.

          To make sure we have enough coverage, I suggest an unit test to read from branch-2.7 stored node label file and make sure all fields can be read from branch-2.8 and above.

          Thoughts?

          Show
          leftnoteasy Wangda Tan added a comment - Thanks Eric Payne / Nathan Roberts / Sunil G for reporting and investigating this issue. Sunil G I felt this fix is not correct, reversing fields is an incompatible change. In branch-2.7, we have string node labels in AddToClusterNodeLabelProto as 1st field. In existing branch-2.8, we added a NodeLabelProto to 2nd field and renamed 1st field to "deprecated-". So far this is compatible. The problem is, existing implementation: private void initLocalNodeLabels() { AddToClusterNodeLabelsRequestProtoOrBuilder p = viaProto ? proto : builder; List<NodeLabelProto> attributesProtoList = p.getNodeLabelsList(); this .updatedNodeLabels = new ArrayList<NodeLabel>(); for (NodeLabelProto r : attributesProtoList) { this .updatedNodeLabels.add(convertFromProtoFormat(r)); } } Inside AddToClusterNodeLabelsRequestPBImpl doesn't read from deprecated node label string field (1st). In FileSystemNodeLabelStore, YARN read from serialized PB message and call new AddToClusterNodeLabelsRequestPBImpl(AddToClusterNodeLabelsRequestProto proto) . If it fails to read from 2nd field, it should try to read from the 1st one. To make sure we have enough coverage, I suggest an unit test to read from branch-2.7 stored node label file and make sure all fields can be read from branch-2.8 and above. Thoughts?
          Hide
          sunilg Sunil G added a comment -

          updating basic patch for review.
          cc/Wangda Tan

          Show
          sunilg Sunil G added a comment - updating basic patch for review. cc/ Wangda Tan
          Hide
          sunilg Sunil G added a comment -

          Yes. Order is changed somehow.

          message AddToClusterNodeLabelsRequestProto {
            repeated string deprecatedNodeLabels = 1;
            repeated NodeLabelProto nodeLabels = 2;
          }
          

          This should be changed like

          message AddToClusterNodeLabelsRequestProto {
            repeated NodeLabelProto nodeLabels = 1;
            repeated string deprecatedNodeLabels = 2;
          }
          

          I ll update a patch now.

          Show
          sunilg Sunil G added a comment - Yes. Order is changed somehow. message AddToClusterNodeLabelsRequestProto { repeated string deprecatedNodeLabels = 1; repeated NodeLabelProto nodeLabels = 2; } This should be changed like message AddToClusterNodeLabelsRequestProto { repeated NodeLabelProto nodeLabels = 1; repeated string deprecatedNodeLabels = 2; } I ll update a patch now.
          Hide
          nroberts Nathan Roberts added a comment - - edited

          YARN-6143 changed AddToClusterNodeLabelsRequestProto such that field 1 is now referred to as deprecatedNodeLabels. FileSystemNodeLabelsStore is referencing field 2 (nodeLabels) which will not be present in a 2.7 labelStore.

          CC: Wangda Tan Sunil G

          Show
          nroberts Nathan Roberts added a comment - - edited YARN-6143 changed AddToClusterNodeLabelsRequestProto such that field 1 is now referred to as deprecatedNodeLabels. FileSystemNodeLabelsStore is referencing field 2 (nodeLabels) which will not be present in a 2.7 labelStore. CC: Wangda Tan Sunil G

            People

            • Assignee:
              sunilg Sunil G
              Reporter:
              eepayne Eric Payne
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development