Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: resourcemanager
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      The target of this JIRA is to allow an admin to specify labels on each NM. This covers:

      • Users can set labels on each NM, either by setting yarn-site.xml (YARN-2923) or by using the script approach suggested by Allen Wittenauer (YARN-2729)
      • The NM sends its labels to the RM via the ResourceTracker API
      • The RM sets the labels in the NodeLabelManager when the NM registers or updates labels (see the sketch below)
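
      A self-contained, hypothetical sketch (in Java) of the flow the bullets above describe. All types here are illustrative stand-ins, not the actual YARN classes changed by this patch; see the attached patches for the real API.

          import java.util.Arrays;
          import java.util.HashMap;
          import java.util.HashSet;
          import java.util.Map;
          import java.util.Set;

          public class DistributedNodeLabelFlowSketch {

              // Stand-in for the NM-side provider (config based per YARN-2923,
              // or script based per YARN-2729).
              interface NodeLabelsProvider {
                  Set<String> getNodeLabels();
              }

              // Stand-in for the RM-side label store.
              static class LabelStore {
                  private final Map<String, Set<String>> labelsByNode = new HashMap<>();

                  void replaceLabelsOnNode(String nodeId, Set<String> labels) {
                      labelsByNode.put(nodeId, new HashSet<>(labels));
                  }
              }

              public static void main(String[] args) {
                  // 1. Admin-specified labels surface on the NM through a provider.
                  NodeLabelsProvider provider =
                      () -> new HashSet<>(Arrays.asList("GPU", "SSD"));

                  // 2. The NM ships the labels to the RM as part of its
                  //    register/heartbeat exchange on the ResourceTracker protocol.
                  Set<String> labels = provider.getNodeLabels();

                  // 3. The RM records them for the node.
                  LabelStore rmStore = new LabelStore();
                  rmStore.replaceLabelsOnNode("nm-host-1:45454", labels);

                  System.out.println("Labels registered for nm-host-1:45454: " + labels);
              }
          }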
      Attachments

      1. YARN-2495.20150324-1.patch (80 kB, Naganarasimha G R)
      2. YARN-2495.20150321-1.patch (80 kB, Naganarasimha G R)
      3. YARN-2495.20150320-1.patch (80 kB, Naganarasimha G R)
      4. YARN-2495.20150318-1.patch (77 kB, Naganarasimha G R)
      5. YARN-2495.20150309-1.patch (76 kB, Naganarasimha G R)
      6. YARN-2495.20150305-1.patch (76 kB, Naganarasimha G R)
      7. YARN-2495.20141208-1.patch (75 kB, Naganarasimha G R)
      8. YARN-2495.20141204-1.patch (78 kB, Naganarasimha G R)
      9. YARN-2495.20141126-1.patch (87 kB, Naganarasimha G R)
      10. YARN-2495.20141119-1.patch (87 kB, Naganarasimha G R)
      11. YARN-2495.20141031-1.patch (89 kB, Naganarasimha G R)
      12. YARN-2495.20141030-1.patch (90 kB, Naganarasimha G R)
      13. YARN-2495.20141024-1.patch (73 kB, Naganarasimha G R)
      14. YARN-2495.20141023-1.patch (61 kB, Naganarasimha G R)
      15. YARN-2495_20141022.1.patch (77 kB, Naganarasimha G R)

        Issue Links

          Activity

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2099 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2099/)
          YARN-2495. Allow admin specify labels from each NM (Distributed configuration for node label). (Naganarasimha G R via wangda) (wangda: rev 2a945d24f7de1a7ae6e7bd6636188ce3b55c7f52)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #149 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/149/)
          YARN-2495. Allow admin specify labels from each NM (Distributed configuration for node label). (Naganarasimha G R via wangda) (wangda: rev 2a945d24f7de1a7ae6e7bd6636188ce3b55c7f52)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #140 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/140/)
          YARN-2495. Allow admin specify labels from each NM (Distributed configuration for node label). (Naganarasimha G R via wangda) (wangda: rev 2a945d24f7de1a7ae6e7bd6636188ce3b55c7f52)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2081 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2081/)
          YARN-2495. Allow admin specify labels from each NM (Distributed configuration for node label). (Naganarasimha G R via wangda) (wangda: rev 2a945d24f7de1a7ae6e7bd6636188ce3b55c7f52)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #883 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/883/)
          YARN-2495. Allow admin specify labels from each NM (Distributed configuration for node label). (Naganarasimha G R via wangda) (wangda: rev 2a945d24f7de1a7ae6e7bd6636188ce3b55c7f52)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #149 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/149/)
          YARN-2495. Allow admin specify labels from each NM (Distributed configuration for node label). (Naganarasimha G R via wangda) (wangda: rev 2a945d24f7de1a7ae6e7bd6636188ce3b55c7f52)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
          • hadoop-yarn-project/CHANGES.txt
          Naganarasimha added a comment -

          Wow, it feels great to see my first work on a feature going in. Thanks a lot for providing an opportunity and supporting me till the end, Tan, Wangda. Thanks for the reviews from Allen Wittenauer, Craig Welch, Vinod Kumar Vavilapalli & Jian Fang.

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7467 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7467/)
          YARN-2495. Allow admin specify labels from each NM (Distributed configuration for node label). (Naganarasimha G R via wangda) (wangda: rev 2a945d24f7de1a7ae6e7bd6636188ce3b55c7f52)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceTrackerOnHA.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/NodeLabelsProvider.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerResponse.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java
          Wangda Tan added a comment -

          Committed to trunk/branch-2. Thanks to Naganarasimha G R for working on this, and for the reviews from Allen Wittenauer, Craig Welch, Vinod Kumar Vavilapalli and Jian Fang.

          Wangda Tan added a comment -

          Last call for comments; I'm planning to commit this by the end of this weekend.

          Jian Fang added a comment -

          Great, thanks. Will try them.

          Wangda Tan added a comment -

          is it possible to provide a configuration to hook in different label providers on NM, for example, a third party one? (Sorry if this feature already exists).

          Yes. If you check the patch, how the LabelsProvider is created is left blank, and we have two JIRAs to make it configurable:

          • YARN-2729 for a script-based provider
          • YARN-2923 for a config-based provider

          This should be pluggable, and new providers can be added in the future (a hypothetical third-party provider is sketched below).
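
          As a rough illustration of that pluggability, here is a hypothetical third-party provider in Java. The provider contract is restated as a stand-in interface modeled on the NodeLabelsProvider added by this patch; the real wiring (which configuration key selects the provider, and so on) is what YARN-2729/YARN-2923 would add.

              import java.util.HashSet;
              import java.util.Set;

              // Stand-in for the NM-side provider contract from this patch.
              interface NodeLabelsProvider {
                  Set<String> getNodeLabels();
              }

              // Hypothetical third-party provider: derives labels from cloud
              // instance metadata (hard-coded here for the sketch).
              class CloudMetadataLabelsProvider implements NodeLabelsProvider {
                  @Override
                  public Set<String> getNodeLabels() {
                      Set<String> labels = new HashSet<>();
                      // A real implementation might query the instance metadata
                      // service at NM start-up instead of hard-coding values.
                      labels.add("spot");
                      labels.add("instance-group-1");
                      return labels;
                  }
              }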

          Hide
          aw Allen Wittenauer added a comment -

          That's effectively what the executable interface is for....

          john.jian.fang Jian Fang added a comment -

BTW, I haven't gone through all the details of YARN-2492 yet. Is it possible to provide a configuration to hook in different label providers on the NM, for example a third-party one? (Sorry if this feature already exists.)

          john.jian.fang Jian Fang added a comment -

On each EC2 instance, metadata about that instance, such as its market type (i.e., spot or on-demand), CPUs, and memory, is available when the instance starts up. All this information is injected into yarn-site.xml by our instance controller and is not changed afterwards. Different instances in an EMR cluster could have different static labels, since one EMR Hadoop cluster consists of multiple instance groups, i.e., different types of instances.

I think it is OK that no duplicated data is sent to the RM if the NM labels do not change.

          Thanks.

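As a rough sketch of the static case described above, an NM-side provider could read a comma-separated list injected into yarn-site.xml. The property key below is a placeholder (the config-based key was still being defined in YARN-2923); Configuration.getTrimmedStrings is the only real Hadoop API used.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.conf.Configuration;

    public class YarnSiteLabelsSketch {
      // Placeholder key for illustration only, not a committed name.
      private static final String LABELS_KEY = "yarn.nodemanager.node-labels.static";

      public static Set<String> readStaticLabels(Configuration conf) {
        // getTrimmedStrings returns an empty array when the key is absent,
        // matching the "empty means no labels" convention discussed here.
        return new HashSet<String>(Arrays.asList(conf.getTrimmedStrings(LABELS_KEY)));
      }
    }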
          leftnoteasy Wangda Tan added a comment -

          Hi Jian Fang,
          Thanks for your comments,
I'm not sure I completely understood what you said. Did you mean there are two different types of labels: some that do not change in an NM's lifetime, and some that can be modified while the NM is running? (I think the decommission case you described is better resolved by graceful NM decommission than by node labels.)

Having a centralized node label list is mostly for resource planning; you can take a look at the conversations on YARN-3214 for more details about resource planning.

Regardless of the centralized node label list on the RM side, I think the current implementation in the attached patch should work for you. Even though labels can be modified via heartbeat, you can simply not change them in your own script; if there is no change in an NM's labels, no duplicated data will be sent to the RM side.

          Wangda

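A minimal sketch of the send-only-on-change behavior described above, using plain Java sets. The real logic lives in NodeStatusUpdaterImpl; the field name mirrors the patch's lastUpdatedNodeLabelsToRM, but everything else is illustrative.

    import java.util.Collections;
    import java.util.Set;

    public class HeartbeatLabelsSketch {
      private Set<String> lastUpdatedNodeLabelsToRM = Collections.emptySet();

      // Returns the labels to attach to this heartbeat, or null for
      // "no change since the last heartbeat", so unchanged labels are
      // never re-sent.
      public Set<String> labelsForHeartbeat(Set<String> providerLabels) {
        // Per the thread: a null result from the provider is treated as
        // an empty label set, not as "unknown".
        Set<String> current = (providerLabels == null)
            ? Collections.<String>emptySet()
            : providerLabels;
        if (current.equals(lastUpdatedNodeLabelsToRM)) {
          return null; // unchanged: send nothing
        }
        lastUpdatedNodeLabelsToRM = current;
        return current;
      }
    }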
          Naganarasimha Naganarasimha G R added a comment -

          Hi Jian Fang,
Well, this JIRA is followed by YARN-2729, wherein labels obtained from the script are passed as part of the heartbeat, which makes the distributed label configuration dynamic. Also, as part of this JIRA we have tried to ensure that labels are sent only when they change, so static labels are not re-sent with each heartbeat.
And for your case, if the cluster controller process wants to label a node so that it can gracefully shrink, this can be done in two ways:

• Use the REST API to change the label of the node to some unique label which is not visible to other users
• After YARN-2729, you could have a script with appropriate logic to update the RM with some unique label when the node wants to shrink itself gracefully.
  Hope I have addressed your scenario.
          john.jian.fang Jian Fang added a comment -

In a cloud environment such as Amazon EMR, a Hadoop cluster is launched as a service by a single command line. There is no admin at all and everything is automated. The labels are basically of two types. One is static, for example the nature of an EC2 instance, such as spot or on-demand. The other is dynamic, for example when the cluster controller process marks an instance as a candidate for termination during a graceful shrink, so that the resource manager will not assign new tasks to it.

Most likely, the labels specified from each NM are static and are written into yarn-site.xml by a cluster controller process based on the EC2 metadata available on each EC2 instance. As a result, you should at least define a static label provider (plus a dynamic label provider? not sure) so that these labels are only sent to the resource manager at NM registration time. There is no point in adding the static labels to each heartbeat.

I think the idea of central and distributed label configurations is not ideal in a cloud environment. Usually we have a mix of static labels from each node and dynamic labels that are specified against the resource manager directly. Static and dynamic label concepts are more appropriate, at least for Amazon EMR.

          leftnoteasy Wangda Tan added a comment -

Thanks for the update. Patch LGTM, +1. Will wait and commit in a few days if there are no opposing opinions.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706826/YARN-2495.20150324-1.patch
          against trunk revision 9fae455.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7088//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7088//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Wangda Tan,
Oops, a mistake on my side. I have uploaded the patch with the correction, please check.

          leftnoteasy Wangda Tan added a comment -

Hmm... StringArrayProto.stringElement -> elements is still not changed in the latest patch; could you take a look again?
I meant to remove the "string" prefix, since StringArrayProto already indicates that. Beyond that, patch LGTM.

          Naganarasimha Naganarasimha G R added a comment -

Hi Wangda Tan, the following test case failures are not related to the changes made in the patch.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12706116/YARN-2495.20150321-1.patch
          against trunk revision e1feb4e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7059//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7059//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

Hopefully the final patch. I have taken care of the following:

          • StringArrayProto.stringElement -> elements
          • remove original.setNodeLabels(null) in testNodeHeartbeatRequestPBImplWithNullLabels
          • NodeLabelsProviderService -> NodeLabelsProvider modifications
          • nodeLabelsLastUpdatedToRM -> lastUpdatedNodeLabelsToRM
          leftnoteasy Wangda Tan added a comment -

          2) You decide.
5)/7) You're correct: the provider should only know its labels, not whether they are "updated", so null from the provider should be considered "empty". And areNodeLabelsUpdated only receives non-null parameters; add a comment before areNodeLabelsUpdated to indicate that null values are handled before the call.

          Thanks,

          Naganarasimha Naganarasimha G R added a comment -

Thanks for the review, Tan, Wangda:
1) StringArrayProto.stringElement -> elements: the naming was done based on protoc conventions, but as you mentioned, elements makes more sense in terms of the Java API; will correct it as part of the next patch.

          2)

On second thought, I think optional bool areNodeLabelsAcceptedByRM = 7 [default = false]; should default to true to be more defensive: we need to make sure there's no error when somebody forgets to set this field.


Well, my view as explained earlier is a little different: even if somebody forgets it, a test case will ensure they do not miss it; but please check whether the case I mentioned earlier is a legitimate scenario.
Consider the case where the NM gets upgraded first: it should not happen that the NM sends labels, the older RM ignores the additional labels, and yet the response by default says the labels were accepted. I also felt that, by name/functionality, it should be set to true only after the RM accepts the labels.

3) will be taken care of in the next patch

          4)

NodeLabelsProviderService -> NodeLabelsProvider: like most other modules, we don't need to make "service" a part of the class name; change the subclasses and NodeManager.createNodeLabelsProviderService as well.


Well, I agree with this, but a while back you had asked me to change it to NodeLabelsProviderService (previous comment, 4th point). Will get it done in the next patch.

          5)

          NodeStatusUpdaterImpl.run:
          617 int lastHeartbeatID = 0;
          618 Set<String> nodeLabelsLastUpdatedToRM = null;
          619 if (hasNodeLabelsProvider)
Regardless of hasNodeLabelsProvider, shouldn't nodeLabelsLastUpdatedToRM be null? The default should be "no change" instead of "empty", correct?


nodeLabelsLastUpdatedToRM => lastUpdatedNodeLabelsToRM: it is not the set that is sent as part of the heartbeat (nodeLabelsForHeartbeat is used for that); nodeLabelsLastUpdatedToRM is the reference for comparing whether labels have changed since the last call. And as per the logic, even if the NodeLabelsProvider returns null we consider it as empty labels. So I feel it is correctly set.

6) will take care of it in the next patch

7) areNodeLabelsUpdated: need a null check? And could you add more tests to cover the cases when the newly fetched node labels and/or the last node labels are null?

The null check is taken care of before calling this method; also, you had asked for a comment on this, which I have added in NodeLabelsProvider, and I will do the same here.
Tests covering the case of newly fetched node labels are already there in TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels (lines 271-277).
As explained in the previous comment, even when the NodeLabelsProvider returns null node labels we treat them as an empty set, hence the last node labels can never be null. Null is sent as part of the heartbeat only when lastUpdatedNodeLabelsToRM == nodeLabelsForHeartbeat. Will add comments for this.

Please let me know your comments; I will try to post the patch ASAP.

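For reference, the non-null contract described above can be captured in exactly the kind of comment requested. This is a sketch, not the patch's actual method:

    import java.util.Set;

    public class LabelsDiffSketch {
      /**
       * Both arguments must be non-null: callers normalize a null result
       * from the NodeLabelsProvider to an empty set before calling, so
       * this method never needs its own null handling.
       */
      static boolean areNodeLabelsUpdated(Set<String> previousLabels,
          Set<String> currentLabels) {
        return !currentLabels.equals(previousLabels);
      }
    }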
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705721/YARN-2495.20150320-1.patch
          against trunk revision 91baca1.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7028//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7028//console

          This message is automatically generated.

          leftnoteasy Wangda Tan added a comment -

Went through the patch again; hopefully this is the last round from my side.

          1) StringArrayProto.stringElement -> elements

2) On second thought, I think optional bool areNodeLabelsAcceptedByRM = 7 [default = false]; should default to true to be more defensive: we need to make sure there's no error when somebody forgets to set this field.

3) testNodeHeartbeatRequestPBImplWithNullLabels: remove original.setNodeLabels(null); the test should still pass.

4) NodeLabelsProviderService -> NodeLabelsProvider: like most other modules, we don't need to make "service" a part of the class name; change the subclasses and NodeManager.createNodeLabelsProviderService as well.

          5) NodeStatusUpdaterImpl.run:

          617	        int lastHeartbeatID = 0;
          618	        Set<String> nodeLabelsLastUpdatedToRM = null;
          619	        if (hasNodeLabelsProvider) {
          

Regardless of hasNodeLabelsProvider, shouldn't nodeLabelsLastUpdatedToRM be null? The default should be "no change" instead of "empty", correct?

          6) nodeLabelsLastUpdatedToRM -> lastUpdatedNodeLabelsToRM

7) areNodeLabelsUpdated: need a null check? And could you add more tests to cover the cases when the newly fetched node labels and/or the last node labels are null?

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
Updated patch with the following changes:

          • Typo, lable -> label,
          • NodeStatusUpdaterImpl: no need to call nodeLabelsProvider.getNodeLabels() twice when register/heartbeat
          • HeartBeat -> Heartbeat
          • NodeStatusUpdaterImpl: When labels are rejected by RM, you should log it with diag message.
          • StringArrayProto instead of NodeIdToLabelsProto
          • NodeStatusUpdaterTest.testNMRegistrationWithLabels to testNodeStatusUpdaterForNodeLabels
          • TestResourceTrackerService, lblsMgr -> nodeLabelsMgr
          • Validation for "heartbeat without updating labels"
          leftnoteasy Wangda Tan added a comment -

          ResourceTrackerForLabels.labels -> lastReceivedLabels

          262	    // heartbeat without updating labels
          263	    nm.getNodeStatusUpdater().sendOutofBandHeartBeat();
          264	    resourceTracker.waitTillHeartBeat();
          265	    resourceTracker.resetNMHeartbeatReceiveFlag();
          

Do we need to add some check for this operation?

          Naganarasimha Naganarasimha added a comment -

For 1) I would like to opt for something generic like StringArrayProto, as it can be reused by others in the future if required. Will work on this and place it in yarn_protos.proto.

1. NodeStatusUpdaterTest: Some places need to cover:

My mistake, I did not name the method properly; it was testing all the flows in a single method, testNMRegistrationWithLabels.
The following cases are covered:
1. Register with EMPTY_SET // line 247
2. Heartbeat with updated labels // lines 253-262
3. Heartbeat without updating labels // lines 262-267
4. Provider returning null labels // lines 267-273
Hope it covers all the scenarios; will correct the name in the next patch.

lblsMgr -> nodeLabelsMgr or labelsMgr

Will correct in the next patch.

          leftnoteasy Wangda Tan added a comment -

          Hi Naganarasimha G R,
          For 1)
I suggest using a separate PB message, such as NodeLabelsProto (or something generic like StringArrayProto), which contains a repeated string.
Using NodeIdToLabelsProto without using the nodeId will confuse people.

          About test cases
1. NodeStatusUpdaterTest:
Some cases still need to be covered:

• NM register, should check RTS (ResourceTrackerService) labels (done)
• NM heartbeat, should check RTS labels (TODO)
• NM heartbeat without update, should check RTS received labels (TODO)
• NM heartbeat with update, should check RTS received labels (TODO)

          2. TestResourceTrackerService
The tests generally look very good to me.

          • lblsMgr -> nodeLabelsMgr or labelsMgr
          Naganarasimha Naganarasimha G R added a comment -

1. Well, I think I am missing something, or you have misread my patch.

• In both request protos I have used NodeIdToLabelsProto (optional NodeIdToLabelsProto nodeLabels = 8; and optional NodeIdToLabelsProto nodeLabels = 4;) and not repeated string nodeLabels directly.
• The test cases TestYarnServerApiClasses.testNodeHeartbeatRequestPBImpl, testNodeHeartbeatRequestPBImplWithNullLabels, testRegisterNodeManagerRequestWithNullLabels and testRegisterNodeManagerRequestWithValidLabels validate that the approach in the patch supports null and populated label sets; I manually tested the empty set and, as expected, that worked too. Will get this test case added in the next patch.
• I thought of creating a new proto like "StringSetProto", but felt there was no need to create another proto class just for this purpose, and you too had mentioned using NodeIdToLabelsProto, hence I made use of the existing proto class.

2. Typo, lable -> label: oops, because of the typo in the proto, the generated methods also had issues, hence the proto and the places accessing these methods (6 instances) have this error. Will get it corrected in the next patch.

          3. optional bool areNodeLablesAcceptedByRM = 7 [default = false], I think default should be true.
Personally I felt it should not matter, as I am handling it explicitly in the code. But consider the case where the NM gets upgraded first: it should not happen that the NM sends labels, the older RM ignores the additional labels, and yet the response by default says the labels were accepted. I also felt that, by name/functionality, it should be set to true only after the RM accepts the labels.

4, 5, 6: will get these corrected as part of the next patch.

Also one favor: it would be helpful if you review the test cases and give feedback on them too, as it will reduce my effort in creating multiple patches. I understand that it is a huge patch, but I feel the major aspects/functionality have been stable since the last patch.

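To make the rolling-upgrade argument above concrete, here is a small sketch that models proto2 optional-field semantics with a nullable Boolean. It is illustrative only and does not use the generated protobuf classes:

    public class AcceptedFlagDefaultSketch {
      // An older RM that predates the field never sets it, so the NM
      // effectively sees the declared default.
      static boolean areLabelsAccepted(Boolean fieldFromWire) {
        final boolean PROTO_DEFAULT = false; // the value argued for above
        // With false, the NM conservatively treats its labels as not
        // (yet) accepted and can log a diagnostic; with true it would
        // assume an RM that never even parsed the labels had stored them.
        return (fieldFromWire == null) ? PROTO_DEFAULT : fieldFromWire;
      }
    }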
          leftnoteasy Wangda Tan added a comment -

          Thanks for updating.

1. I saw you're trying to make repeated string nodeLabels carry 3 different values: null, empty, and non-empty. But I'm not sure this works in PB; could you write a test to verify it? (Not-set/set-nothing/set-value on RegisterNodeManagerRequestPBImpl.nodeLabels, create a new RegisterNodeManagerRequestPBImpl from old.getProto(), and call getNodeLabels() to see if it works.)

If this doesn't work, you can create a PB message like StringSetProto and use it in messages like RegisterNodeManagerRequest, which can support null/empty/non-empty.

2. Typo, lable -> label; I found several in your patch.

          3. optional bool areNodeLablesAcceptedByRM = 7 [default = false], I think default should be true.

          4. NodeStatusUpdaterImpl: no need to call nodeLabelsProvider.getNodeLabels() twice when register/heartbeat

          5. HeartBeat -> Heartbeat

6. NodeStatusUpdaterImpl: when labels are rejected by the RM, you should log it with a diag message.

Will include the test review in the next round.

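The round-trip test suggested in point 1 could look roughly like the following. The constructor and accessor names follow the usual PBImpl pattern and the setters discussed in this thread, but the exact signatures should be treated as assumptions until the patch lands; the "GPU" label is just an example value.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNull;
    import static org.junit.Assert.assertTrue;

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashSet;
    import org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.RegisterNodeManagerRequestPBImpl;
    import org.junit.Test;

    public class NodeLabelsRoundTripSketch {
      @Test
      public void testNullEmptyAndValueRoundTrip() {
        RegisterNodeManagerRequestPBImpl original = new RegisterNodeManagerRequestPBImpl();

        // Not set at all: should come back as null ("no labels sent").
        assertNull(new RegisterNodeManagerRequestPBImpl(original.getProto()).getNodeLabels());

        // Explicitly empty: should come back as an empty set.
        original.setNodeLabels(new HashSet<String>());
        assertTrue(new RegisterNodeManagerRequestPBImpl(original.getProto())
            .getNodeLabels().isEmpty());

        // A real value: should survive the proto round trip unchanged.
        original.setNodeLabels(new HashSet<String>(Arrays.asList("GPU")));
        assertEquals(Collections.singleton("GPU"),
            new RegisterNodeManagerRequestPBImpl(original.getProto()).getNodeLabels());
      }
    }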
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705179/YARN-2495.20150318-1.patch
          against trunk revision 968425e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7003//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7003//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
Thanks for finalizing the solution.
+1 for the use of NodeIdToLabelsProto. I have incorporated this in the patch, but ensured that the interfaces still have setNodeLabels/getNodeLabels in NodeHeartbeatRequest and RegisterNodeManagerRequest, while the PBImpls handle the conversion so that NodeIdToLabelsProto is what goes across the wire. So the node labels set will be null if no labels are sent across.

+1 for the handling of "the invalid node label issue of registration/heartbeating". Making the node labels empty will ensure that the scheduler does not wrongly allocate based on old labels. But this patch will only handle the logging of invalid labels in the NM and RM; web UI updates and setting the node labels empty will be handled in a new JIRA, as this one is becoming bulky.

I have incorporated the other review comments (including the NM and NodeStatusUpdaterImpl modifications per Wangda's comments).

          leftnoteasy Wangda Tan added a comment -

          Hi Naganarasimha G R,
For areNodeLabelsSetInReq, one approach to avoid it is to add a PB structure (like NodeLabelsProto, which could be placed in yarn_protos.proto, like NodeIdToLabelsProto) containing repeated string nodeLabels. It will be a part of RegisterProto and HeartbeatProto, and it could be null. "Null" means not set, agree?

Regarding the invalid node label issue of registration/heartbeating: bringing down the whole cluster when a label is invalid is a serious problem, as mentioned by Craig. If the label(s) a node reports are invalid, we can:

• Show/log a diagnostic on the RM (nodes) page and the NM page, saying the label is invalid. (Needs web UI modification; can be done in a separate task.)
• Make the node's labels empty, so that applications can continue to use it.

          Thanks,

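A compact sketch of the tri-state encoding proposed above, modeling the optional wrapper message with a presence flag plus its element list; the decode shape is an assumption, not the committed PB code.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class WrapperSemanticsSketch {
      // wrapper absent           -> null  ("labels not set in request")
      // wrapper present, 0 items -> empty ("clear my labels")
      // wrapper present, N items -> the reported labels
      static Set<String> decodeNodeLabels(boolean hasWrapper, List<String> elements) {
        if (!hasWrapper) {
          return null;
        }
        return new HashSet<String>(elements);
      }
    }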
          Naganarasimha Naganarasimha G R added a comment -

          It's just that I don't like this areNodeLabelsSetInReq flag in the protocol. Are there other ways of achieving this?

The other way out is to always send the set of labels as part of every heartbeat. We wanted to avoid this traffic, hence we initially came up with this approach when we were supporting multiple labels per node (maybe in the future we might support multiple labels again, right?).

I think treating invalid labels as a disaster case will be, well, a disaster. : I liked that sentence.

          How about we let the node run (just like we let an unhealthy node run) and report it in the diagnostics? I'm okay keeping that same behavior during registration too.

Yes, this is similar to the earlier behavior we had in December's patch. Additionally, to report the failure back, we had added a flag to inform the NM whether the RM accepted the labels or not, and the diag message was also set with an appropriate message. Is this approach fine?

          vinodkv Vinod Kumar Vavilapalli added a comment -

Well, as Craig informed, RegisterNodeManagerRequestProto.nodeLabels is already a set, but as an empty set is provided by protoc by default, it is required to inform whether labels are set as part of the request, hence areNodeLabelsSetInReq is required.

          It's just that I don't like this areNodeLabelsSetInReq flag in the protocol. Are there other ways of achieving this?

Well, I am a little confused here. As per Wangda's earlier comment, I understand that it was your suggestion to send shutdown (which I felt was correct in terms of maintenance).

          I think treating invalid labels as a disaster case will be, well, a disaster. How about we let the node run (just like we let an unhealthy node run) and report it in the diagnostics? I'm okay keeping that same behavior during registration too.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          I'm wondering if we might introduce a situation where a script error or other configuration issue could bring down an entire cluster

          Exactly my concern too.

          leftnoteasy Wangda Tan added a comment -

Actually, I've thought about this before, but since node-labels.enabled has already shipped with Hadoop 2.7, we cannot change this.

          leftnoteasy Wangda Tan added a comment -

          Hi Naga,

Well, the modifications side is clear to me, but is it good to allow the configuration to differ between NM and RM? In fact, I wanted to discuss whether to send shutdown during register if the NM is configured differently from the RM, but waited for the base changes to go in before discussing new stuff.

It is not about making the configuration different. My thinking is that the NM doesn't need to understand what "distributed configuration" is; the admin should know it. When the node-label configuration is "distributed", the NM should go ahead and properly configure the script provider, etc.
So we aren't trying to create a difference, we just eliminate one option on the NM side.

I feel it is better to log this as "Error", as we are sending the labels only in case of a change, there has to be some way to identify the labels for a given NM, and currently we are sending out the shutdown signal too.

What I meant are these two lines:

          498	      LOG.info("Node Labels {" + StringUtils.join(",", nodeLabels)
          499	          + "} from Node " + nodeId + " were Accepted from RM");
          

I guess you may have misread my comment.


For the field indicating whether node labels are set in NodeHeartbeatRequest/NodeRegistrationRequest, there are two proposals:

• setAreNodeLabelsSetInReq
• setAreNodeLabelsUpdated

Which one do you prefer, Vinod/Craig? I vote for the latter.

          We should not even accept a node's registration when it reports invalid labels

IIUC, the patch now already rejects a node when it reports invalid labels.

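A sketch of the NM-side simplification suggested above: drop the NM's own notion of distributed configuration and key everything off whether a provider was created. The interface and wiring below are illustrative stand-ins for NodeLabelsProvider and NodeManager's provider factory, not the committed code.

    import java.util.Set;

    public class ProviderWiringSketch {
      interface NodeLabelsProvider {
        Set<String> getNodeLabels();
      }

      // May be null: per the suggestion, the NM keeps no
      // isDistributedNodeLabelsConf flag at all.
      private final NodeLabelsProvider nodeLabelsProvider;

      ProviderWiringSketch(NodeLabelsProvider providerOrNull) {
        this.nodeLabelsProvider = providerOrNull;
      }

      // A null provider simply means "this NM does not report labels";
      // null here doubles as "attach nothing to register/heartbeat".
      Set<String> labelsToReport() {
        return (nodeLabelsProvider == null) ? null : nodeLabelsProvider.getNodeLabels();
      }
    }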
          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda & Vinod Kumar Vavilapalli,
Also one query/suggestion: would it be better to have a single configuration for node labels,
yarn.node-labels.configuration.type=<centralized/distributed/disabled> (with disabled as the default), instead of the currently available two configurations, i.e. "yarn.node-labels.enabled" and "yarn.node-labels.configuration.type", so that we can avoid one more configuration?
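
To make the suggestion concrete, a sketch of how such a single tri-state property might be read; the key and values mirror the proposal above and are hypothetical, not shipped configuration keys:

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical tri-state mode replacing the separate enabled flag and
    // type selector proposed above.
    public class NodeLabelsModeSketch {
      enum Mode { DISABLED, CENTRALIZED, DISTRIBUTED }

      static Mode readMode(Configuration conf) {
        // Unset defaults to "disabled", so the feature stays off unless chosen.
        String value = conf.get("yarn.node-labels.configuration.type", "disabled");
        return Mode.valueOf(value.toUpperCase());
      }
    }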

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
1) IMO the method name was not readable when it was setAreNodeLabelsSet, but I have changed it to setAreNodeLabelsSetInReq, which I feel is sufficient. setAreNodeLabelsUpdated is the same as the earlier name on which Craig had commented (which I also feel is valid):

I would go with areNodeLabelsSet (all "isNodeLabels" => "areNodeLabels" wherever it appears, actually). Wrt "Set" vs "Updated": this is primarily a workaround for the null/empty ambiguity, and I think this name better reflects what is really going on (am I sending a value to act on or not). I also think this is a better contract: the receiver (RM) shouldn't really care about the logic the NM side uses to decide whether or not to set its labels (freshness, "updatedness", whatever), so all that should be communicated in the API is whether or not the value is set, not whether it's an update or whether it's checking freshness, etc. That's a nit, but I think it's a clearer name.

Yes, true; let's finalize the name this time, after which I will start working on the patch, otherwise it will be wasted effort.
          5)

It will be problematic to ask admins to keep NM/RM configuration synchronized, so I don't want (and it is also not necessary for) the NM to depend on the RM's configuration.
So I suggest making these changes: In NodeManager.java, when the user doesn't configure a provider, it should be null. In your patch, you can return null directly, and YARN-2729 will implement the logic of instantiating the provider from config. In NodeStatusUpdaterImpl, avoid using isDistributedNodeLabelsConf; since we will not have "distributedNodeLabelConf" on the NM side if you agree with the previous comment, it will instead check whether the provider is null.

Well, the modifications side is clear to me, but is it good to allow the configuration to differ between NM and RM? In fact I wanted to discuss whether to send a shutdown during registration if the NM is configured differently from the RM, but waited for the base changes to go in before discussing new stuff.

8) You can add an additional comment at line 626 for this. OK, will add a comment in LabelProvider.getLabels. The idea is that the LabelProvider is expected to give the same labels continuously until there is a change, and if null or empty is returned then no label is assumed.

10) updateNodeLabelsInNodeLabelsManager -> updateNodeLabelsFromNMReport: will take care of it in the next patch.
LOG.info(... accepted from RM, use LOG.debug and check isDebugEnabled: I feel it is better to log this as "Error", as we send the labels only when there is a change, there has to be some way to identify the labels for a given NM, and currently we also send out a shutdown signal.

Make errorMessage clear: indicate 1# that these are node labels reported from the NM, and 2# that they failed to be put to the RM, instead of "not properly configured".
I think I have captured the first point, but anyway I will reframe it as "Node Labels <labels> reported from the NM with id <nodeID> were rejected by the RM with exception message <exceptionMsg>".

Another thing we should do: when distributed node label configuration is set, any direct modification of the node-to-labels mapping from RMAdminCLI should be rejected (like -replaceNodeToLabels). Will work on this once 2495 and 2729 are done.

Thanks Vinod Kumar Vavilapalli & Craig Welch for reviewing it.
configuration.type -> configuration-type: will take care of it in the next patch.

          Should RegisterNodeManagerRequestProto.nodeLabels be a set instead?
          Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment?

Well, as Craig informed, RegisterNodeManagerRequestProto.nodeLabels is already a set, but since protoc provides an empty set by default, the request needs to indicate whether labels were set as part of it; hence areNodeLabelsSetInReq is required.
          RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario.
Well, though I am not much aware of rolling upgrades, I don't see any problems in the normal case, because the RM tries to read the labels from the NM's request only when it is a distributed configuration, and areNodeLabelsSetInReq is false by default.
But I had queries about the case where an existing setup is to be modified into a distributed configuration setup:

1. Do we need to send a shutdown during registration if the NM is configured differently from the RM?
2. Will the new configurations be added on the NM and RM and then the rolling upgrade done, or do we do the rolling upgrade first and then reconfigure & restart the RMs and NMs?

How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration?
Well, I have the same opinion as Wangda on this point; I feel we should take the same decision in both flows.

          We should not even accept a node's registration when it reports invalid labels
Well, I am a little confused here. As per Wangda's earlier comment, I understand it was your suggestion to send a shutdown (which I felt was correct in terms of maintenance):

One more suggestion (as suggested by Vinod Kumar Vavilapalli): when there's anything wrong with a node label reported from an NM, we should fail the NM (ask it to shut down and give it a proper diagnostic message). This is because if an NM reports a label that is rejected, even if the RM tells the NM this, the NM cannot handle it properly except by printing some error messages (we don't have smart logic now), which will lead to problems in debugging (an NM reported some label to the RM but the scheduler failed to allocate containers on that NM). To avoid this, a simple way is to shut down the NM so the admin can take a look at what happened.
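
A hedged sketch of that shutdown-on-rejection flow; all names here are hypothetical stand-ins, not ResourceTrackerService code:

    // When reported labels are rejected, the RM answers with a SHUTDOWN
    // action plus a diagnostic message, so the admin can see why the node
    // went down.
    public class InvalidLabelResponseSketch {
      enum NodeAction { NORMAL, SHUTDOWN }

      static final class Response {
        final NodeAction action;
        final String diagnostics;
        Response(NodeAction action, String diagnostics) {
          this.action = action;
          this.diagnostics = diagnostics;
        }
      }

      static Response onLabelReport(boolean labelsValid, String nodeId,
          String labels) {
        if (labelsValid) {
          return new Response(NodeAction.NORMAL, "");
        }
        return new Response(NodeAction.SHUTDOWN,
            "Node Labels {" + labels + "} reported from NM " + nodeId
                + " were rejected by the RM");
      }
    }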

          cwelch Craig Welch added a comment -

I understand the desire for fail-fast behavior to indicate an issue, but I wonder if this should really be a fatal case. I'm wondering if we might introduce a situation where a script error or other configuration issue could bring down an entire cluster (or even just a portion of it) which would otherwise be able to remain functional. It's not clear to me that this should be thought of as a "fatal condition", especially when the potential exists for escalating a rather minor issue into a major one.

          leftnoteasy Wangda Tan added a comment -

I think the two issues are identical, and we should have a consistent way to handle them. If we stop a node on invalid labels during registration, we should stop the node when the same issue happens on a heartbeat after registration.

I think we can either allow both of them to keep running or stop both of them; I'm fine with either approach.

          vinodkv Vinod Kumar Vavilapalli added a comment -

Ah, right. Forgot about that. Given that, it seems we have the following:

1. Node reports with invalid labels during registration - we reject it right away
2. Node gets successfully registered, but then the labels script starts generating invalid labels midway through

I think in case (2) we are better off ignoring the newly reported invalid labels, reporting this in the UI/NodeReport, and letting the node continue running. Thoughts?
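
A sketch of the asymmetric policy proposed for cases (1) and (2); the validator and report hooks are hypothetical stand-ins for RM internals:

    import java.util.Set;

    // Invalid labels at registration reject the node outright; invalid
    // labels on a later heartbeat are ignored and surfaced in the node
    // report instead.
    public class ReportedLabelsPolicySketch {
      interface LabelValidator { boolean isValid(Set<String> labels); }
      interface NodeReportSink { void recordInvalidLabels(Set<String> labels); }

      private final LabelValidator validator;
      private final NodeReportSink reportSink;

      ReportedLabelsPolicySketch(LabelValidator v, NodeReportSink r) {
        this.validator = v;
        this.reportSink = r;
      }

      /** Returns false only when an invalid report should reject registration. */
      boolean accept(boolean isRegistration, Set<String> labels) {
        if (validator.isValid(labels)) {
          return true;                          // labels applied normally
        }
        if (isRegistration) {
          return false;                         // case (1): reject right away
        }
        reportSink.recordInvalidLabels(labels); // case (2): ignore, flag in UI
        return true;                            // node keeps running
      }
    }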

          cwelch Craig Welch added a comment -

          -re

How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration?

As I understand the requirements, it's necessary to handle the case where the derived set of labels changes during the lifetime of the nodemanager; e.g. external libraries might be installed, or some other condition may change which affects the labels, with no nodemanager re-registration involved, and yet the changed labels need to be reflected.
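
A minimal sketch of the send-on-change behavior this implies; names are illustrative, not the NodeStatusUpdater implementation:

    import java.util.Collections;
    import java.util.Objects;
    import java.util.Set;

    // The NM remembers the last label set the RM accepted and attaches
    // labels to a heartbeat only when they differ, so mid-lifetime changes
    // still propagate without re-registration.
    public class LabelHeartbeatSketch {
      private Set<String> lastAcceptedLabels = Collections.emptySet();

      /** Returns labels to send, or null when nothing changed since last ack. */
      Set<String> labelsToSend(Set<String> currentLabels) {
        if (Objects.equals(lastAcceptedLabels, currentLabels)) {
          return null; // unchanged: omit labels from this heartbeat
        }
        return currentLabels;
      }

      void onLabelsAcceptedByRM(Set<String> accepted) {
        this.lastAcceptedLabels = accepted;
      }
    }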

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Quick comments

          • configuration.type -> configuration-type
          • Should RegisterNodeManagerRequestProto.nodeLabels be a set instead?
          • Do we really need NodeHeartbeatRequest.areNodeLabelsSetInReq()? Why not just look at the set as mentioned in the previous comment?
          • RegisterNodeManagerRequest is getting changed. It will be interesting to reason about rolling-upgrades in this scenario.
• How about we simplify things? Instead of accepting labels on both registration and heartbeat, why not restrict it to be just during registration?
          • We should not even accept a node's registration when it reports invalid labels?
          leftnoteasy Wangda Tan added a comment -

          For your comments:
1) For the name, do you think setAreNodeLabelsUpdated is a better name, since it avoids "set" occurring twice? (I understand this needs lots of refactoring; if you have any suggestions, we can finalize the name before renaming.)

5) I made a mistake and sent an incomplete comment :-p, what I wanted to say is:
It will be problematic to ask admins to keep NM/RM configuration synchronized, so I don't want (and it is also not necessary for) the NM to depend on the RM's configuration.
So I suggest making these changes:

• In NodeManager.java: when the user doesn't configure a provider, it should be null. In your patch, you can return null directly, and YARN-2729 will implement the logic of instantiating the provider from config.
• In NodeStatusUpdaterImpl: avoid using isDistributedNodeLabelsConf; since we will not have "distributedNodeLabelConf" on the NM side if you agree with the previous comment, it will instead check whether the provider is null.

Regarding your "fail-fast" concern, it shouldn't be a problem if you agree with the comment I just made. (I know there has been some back-and-forth from my side on this; I feel sorry about that, since this feature is evolving; please feel free to let me know your ideas.)

7) My answer to 5) should address your question here.

8) You can add an additional comment at line 626 for this.

9) Took a look at TestNodeStatusUpdater; your comment makes sense to me. It's a very complex class, so just leave this point as is.

10) A few comments on your added code:

          • updateNodeLabelsInNodeLabelsManager -> updateNodeLabelsFromNMReport
• LOG.info(... accepted from RM: use LOG.debug and check isDebugEnabled.
• Make errorMessage clear: indicate 1# that these are node labels reported from the NM, and 2# that they failed to be put to the RM, instead of "not properly configured".

          In addition:
Another thing we should do: when distributed node label configuration is set, any direct modification of the node-to-labels mapping from RMAdminCLI should be rejected (like -replaceNodeToLabels). This can be done in a separate JIRA.

          Naganarasimha Naganarasimha G R added a comment -

Most of the findbugs warnings reported are from the Fair Scheduler and have nothing to do with the changes in this patch, and the failed tests are due to timeouts, which are also not related to the patch.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12703435/YARN-2495.20150309-1.patch
          against trunk revision 5578e22.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
          org.apache.hadoop.yarn.server.resourcemanager.TestRM

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6894//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6894//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6894//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
Attaching the updated patch; please find the status of the comments below:
1,2,3,4: all have been rectified. A method name having "set" at both the beginning and the end did not sound appropriate, hence I modified it to areNodeLabelsSetInReq and setAreNodeLabelsSetInReq in both heartbeat and register.

5) I think we may not need to check the centralized/distributed configuration here; centralized/distributed is a config on the RM side.
Earlier I had moved this configuration-type check into NM.serviceInit, before calling getNodeLabelsProviderService, but now I have changed the method name to createNodeLabelsProviderService and moved the check back inside this method itself. As part of YARN-2729, the script-based node label provider will be returned from the createNodeLabelsProviderService method.

On the NM side, it should be about how to get node labels: if the user doesn't configure any script file for it, it should be null and no instance of NodeLabelProviderService will be added to the NM.
In the current patch, null will be set only if the configuration type is set as centralized in the NM. Based on earlier feedback from Vinod (in another JIRA), I think we need to fail fast and let the user know of the error at the earliest, so the script node label provider will throw an exception on erroneous conditions (script not configured, no rights to execute, etc.) and ensure the NM fails to start.

So back to code, you can just leave getNodeLabelsProviderService(..), which will be implemented in YARN-2729. If you agree, we need to change the name isDistributedNodeLabelsConf to
Actually I didn't get the intent of these two lines, and it felt like the comment was not complete... Do you want to avoid the configuration-type check in the NM and move it to the script node label provider, or something else?

6) Has been rectified; it was added while analyzing a test case failure.

          7) isDistributedNodeLabels seems not so necessary here, and if you agree with 5), it's better to remove the field
IIUC, point 5 was related to the NM initializing the provider and point 7 is related to NodeStatusUpdaterImpl; if so, I didn't get the relation. Can you please clarify these two points?

8) Add a null check or a comment (provider-returned node labels will always be non-null) for areNodeLabelsUpdated in NodeStatusUpdaterImpl
Before calling areNodeLabelsUpdated, I had already checked for null and set empty labels at line 626 (startStatusUpdater method).

9) Since we already have TestNodeStatusUpdater, it's better to merge TestNodeStatusUpdaterForLabels into it.
Well, there were already too many internal classes extending NodeStatusUpdaterImpl and ResourceTrackerService, and I personally found it very difficult to walk through the test case and try to reuse it; the class had already crossed 1666 lines of code and was losing readability, hence I added a new class. Please let me know; if required, I will merge it into the existing class.

10) Have modified ResourceTrackerService based on your comments and have pushed some common code from register and heartbeat into a common method.

None of the findbugs issues are related to my modifications, and the following test case failure is also not related to them:
TestRMRestart.testRMRestartGetApplicationList

Also, I will be uploading a patch for YARN-2729 to give a view of the complete flow, which should also make it testable.

          leftnoteasy Wangda Tan added a comment -

1) The changes to CommonNodeLabelsManager are unnecessary.

2) NodeHeartbeat's protobuf field name should be consistent with the Java class method name:
nodeLabels -> areNodeLabelsSet

3) NodeHeartbeatRequestPBImpl: getter/setter should be consistent, like get/setAreNodeLabelsSet.

          4) Same comment on RegisterNodeManagerRequestPBImpl

5) Integration with the NM (NodeManager.java):
I think we may not need to check the centralized/distributed configuration here; centralized/distributed is a config on the RM side. On the NM side, it should be about how to get node labels: if the user doesn't configure any script file for it, it should be null and no instance of NodeLabelProviderService will be added to the NM.

          So back to code, you can just leave getNodeLabelsProviderService(..), which will be implemented in YARN-2729.

          If you agree, we need change the name isDistributedNodeLabelsConf to

          6) Why change this to notifyAll? I think it may be unnecessary:

              synchronized (this.heartbeatMonitor) {
                this.heartbeatMonitor.notify();
              }
          

          7) isDistributedNodeLabels seems not so necessary here, and if you agree with 5), it's better to remove the field.

8) Add a null check or a comment (provider-returned node labels will always be non-null) for areNodeLabelsUpdated in NodeStatusUpdaterImpl

9) Since we already have TestNodeStatusUpdater, it's better to merge TestNodeStatusUpdaterForLabels into it.

10) ResourceTrackerService:
When illegal labels are found during registration, this should happen before instantiating RMNodeImpl.
Maybe you can still merge some common code for register/update?

Will include more review of the tests in the next pass. And could you take a look at the failed tests/findbugs, etc.?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12702856/YARN-2495.20150305-1.patch
          against trunk revision 28b85a2.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels
          org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6864//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6864//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6864//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
Attaching a patch with the following changes:

• Name changes is -> are
• Make RegisterNodeManagerRequest consistent with NodeHeartbeatRequest
• Change decentralized-configuration.enabled to an input accepting values of <centralized, distributed> (default is centralized)
• "You can leave an empty provider implementation": taken care of in the patch, please check the approach
• Stop the NM if ResourceTrackerService reports invalid labels
• Analogous to the issue mentioned by Vinod (stop the NM if ResourceTrackerService reports invalid labels), I feel that even during registration the RM should send a shutdown with a proper message. I have handled this in the patch
• On receiving invalid labels for a node, the RM will set the state of the RMNode to NodeState.UNHEALTHY using a new event, RMNodeInValidNodeLabelsUpdateEvent; please advise whether to create a new state or whether NodeState.UNHEALTHY suffices

Script-related issues will be discussed/fixed in the script JIRA (YARN-2729).
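
A sketch of the UNHEALTHY handling described above; the event and state names are taken from this comment, while the surrounding class is hypothetical, not RMNodeImpl code:

    // On an invalid-labels event the node is marked UNHEALTHY rather than
    // being removed, so the admin can inspect it while the scheduler stops
    // placing containers there.
    public class InvalidLabelsEventSketch {
      enum NodeState { RUNNING, UNHEALTHY }

      static final class RMNodeSketch {
        NodeState state = NodeState.RUNNING;
        String diagnostics = "";

        // Models the proposed RMNodeInValidNodeLabelsUpdateEvent outcome.
        void onInvalidNodeLabelsUpdate(String reason) {
          this.state = NodeState.UNHEALTHY;
          this.diagnostics = reason;
        }
      }
    }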

          leftnoteasy Wangda Tan added a comment -

For your questions:
1. It's not first priority; let's just keep it here and move on with the script one.
2. That makes sense. Could you take a look at the health check script; is it the same as what you mentioned? (I guess so)
3. I remember that; makes sense to me. If any merge is needed, we can tackle it in a separate JIRA.
4. I just marked it for 2.8; since 2.7 is planned to be released within 2 weeks, I think it will be hard to fit into that timeframe.

IIRC, this JIRA contains an abstract node label provider only. I suggest not adding more options for configuring the node label provider for now (say, set provider=script-based-provider), to avoid potentially unnecessary switches.

For now, as we discussed, you can leave an empty provider implementation (though a proper impl is needed for tests); the script-based provider can be addressed separately.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
Thanks for discussing and summarizing this; a few points to discuss:

1. YARN-2923 (configuration-based NodeLabelProviderService): what is the scope of this JIRA, as it is again one more classification under distributed? Is it required?
2. Apart from Vinod's suggestion, I would like to add one more change: in some JIRAs we were discussing a fail-fast approach, i.e. in our case, if we configure script-based distributed configuration and the script file doesn't exist, rights are not sufficient, etc., then the NM should fail to start.
3. YARN-2980 is mostly just moving NodeHealthScriptRunner to Hadoop Common, and the code in it differs a lot from what we require. If you remember, earlier I had refactored that code to make it reusable for our scenario, but you had suggested not to touch the existing NodeHealthScriptRunner and instead make ScriptBasedNodeLabelsProvider similar to NodeHealthScript, serving the needs of node labels.
4. The earlier target version was mentioned as 2.6; do we change it to 2.7 or 2.8?
          leftnoteasy Wangda Tan added a comment -

          Craig Welch,
Thanks for your thoughtful input. The configuration switches, including the changes proposed in this patch, are:

                            node-label
                             /     \
                        enable     disable
                       /      \
              centralized     distributed
                   |                  |
           node-label-store          node-manager script
           need to be configured     need to be configured in NM
          

You can take a look at the discussion of why we did YARN-2800; we need to enable/disable this feature (not enable/disable the centralized API).
And there are some differences between the centralized and distributed configurations:

• We don't need to store node->label mappings, since the NM will store them locally.
• We cannot mix them together, as I mentioned in https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14317048&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14317048.

So there are in total 3 options for the node-label feature in the RM's view: {disabled, centralized, distributed}. I don't think we can eliminate any of them, and I didn't see how your suggestion could.

So if you agree, I suggest we move forward with the following changes:

• Change decentralized-configuration.enabled to an input accepting values of {centralized, distributed} (by default centralized).
• Combine the suggestion in https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14317048&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14317048.
• Not sure if https://issues.apache.org/jira/browse/YARN-2980 will impact the implementation of this patch (I haven't looked at it), but it's best to reuse existing code.

Sounds good? Naganarasimha G R please let me know if you have any other ideas.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685787/YARN-2495.20141208-1.patch
          against trunk revision 2fd02af.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6670//console

          This message is automatically generated.

          cwelch Craig Welch added a comment -

So, here's my proposal, Naganarasimha G R and Wangda Tan: take a minute and consider whether or not DECENTRALIZED_CONFIGURATION_ENABLED is more likely to cause difficulty than prevent it, as I'm suggesting, and then you all can decide to keep it or not as you wish. I don't want to hold up the way forward over something which is, on the whole, a detail...

          cwelch Craig Welch added a comment -

My point is that everything necessary to manage labels properly exists without DECENTRALIZED_CONFIGURATION_ENABLED; it is a duplication of existing functionality. The user controls this by:

          1. choosing to specify or not specify a way of managing the nodes at the node manager
2. choosing to set or not set node labels and associations using the centralized APIs

ergo, DECENTRALIZED_CONFIGURATION_ENABLED is completely redundant; it provides no capability not already present. Users will need to understand how the feature works to use it effectively anyway, and there is no value added by requiring that they repeat themselves (both by specifying a way of determining node labels at the node manager level and by having to set this switch). My prediction is that, if the switch is present, its chief function will be to confuse and annoy users when they set up a configuration for the node managers to generate node labels and the labels then don't appear in the cluster as they expect.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
          Thanks for sharing your thoughts

mixing decentralized/centralized is dangerous and will cause non-deterministic results.

My opinion is the same; I feel they should not be mixed.

And I would like to resume work on this (rebase & rework per the review comments) once everyone is in sync on the core objective of this JIRA.
So Craig Welch, can you please share your thoughts on this at the earliest?

          leftnoteasy Wangda Tan added a comment -

          Hi Craig Welch,
          Thanks for jumping in and providing your thoughts, and really sorry for the late response.
I think your biggest concern is about DECENTRALIZED_CONFIGURATION_ENABLED, so let me explain my thinking.

IMHO, mixing decentralized/centralized is dangerous and will cause non-deterministic results. You may think about merging them together, such as some labels set by the admin using RMAdminCLI and some others set by the NM. But I can give you an example that shows it is still non-deterministic even if we have +/- for the ResourceTracker protocol:

• Assume a node has labels x,y (reported +x,+y)
• RMAdmin removes y from the node (-y)
• The NM fails, then restarts, and reports that it has x,y (+x, +y). What should the labels on the node be?

I also don't like adding too many switches to the configuration, but this seems a good way to support both with deterministic behavior.

For your other suggestions:

• Name changes is -> are
• Make RegisterNodeManagerRequest consistent with NodeHeartbeatRequest

I agree with both.

One more suggestion (as suggested by Vinod Kumar Vavilapalli): when there's anything wrong with a node label reported from an NM, we should fail the NM (ask it to shut down and give it a proper diagnostic message). This is because if an NM reports a label that is rejected, even if the RM tells the NM this, the NM cannot handle it properly except by printing some error messages (we don't have smart logic now), which will lead to problems in debugging (an NM reported some label to the RM but the scheduler failed to allocate containers on that NM). To avoid this, a simple way is to shut down the NM so the admin can take a look at what happened.

          Thoughts?
          Wangda

          cwelch Craig Welch added a comment -

Naganarasimha G R I understand the desire to have the feature; it does seem more like a convenience/simplification measure than an introduction of something that can't otherwise be accomplished, but convenience and simplification can matter a great deal, so why not.

There will always be confusion ... Do we need to "AND" or "OR"

What I'm getting at is that I think there are just too many switches and knobs in play once you consider the flag DECENTRALIZED_CONFIGURATION_ENABLED in addition to the other configuration relationships (defining the configuration script to do the update from nodes). I think that the act of configuring something to send node labels from the node manager is sufficient "intent" that it is the desired behavior, and the additional DECENTRALIZED_CONFIGURATION_ENABLED is just an extra ceiling for someone to bump their head against while setting this up. wrt supporting add vs replace behavior, I think that as it's described now the idea is to just support replace from the node script, meaning that it will effectively be the only definition used when it is active (which is fine for many cases). In the future, if there is a need for hybrid configuration of labels, that can become an enhancement. An option would be to use a different parameter for a script which will do add and remove instead of replace, and then, say, have it return + or - (for add and remove) with the label instead of a fixed set of labels for replacement (see the sketch below). From what I see above, the replacement approach, where the script determines the full label set, looks to be the immediate need; the other could be added in a compatible way later if it was needed.
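
A small sketch of the add/remove protocol described above, assuming (hypothetically) that each whitespace-separated token of the script's output is either +label or -label:

import java.util.HashSet;
import java.util.Set;

// Hypothetical parser for an add/remove-style label script: "+label" adds and
// "-label" removes, applied on top of the node's current label set (in
// contrast to the replace semantics, where the output is the full set).
public class AddRemoveLabelParser {
  public static Set<String> apply(Set<String> current, String scriptOutput) {
    Set<String> result = new HashSet<>(current);
    for (String token : scriptOutput.trim().split("\\s+")) {
      if (token.startsWith("+")) {
        result.add(token.substring(1));
      } else if (token.startsWith("-")) {
        result.remove(token.substring(1));
      } // anything else is ignored in this sketch
    }
    return result;
  }
}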

          Naganarasimha Naganarasimha G R added a comment -

          Hi Craig Welch
          Thanks for taking time and reviewing.

          This can already be accomplished using the admin cli from a script or calling a web service from the node

IIUC, the current way to accomplish running the script is by configuring the label script within the health monitor script? Please correct me if I am wrong.
If so, I see the following problems:

• It seems more like a workaround (or more like a break in the health-check functionality) to achieve the same thing
• One important point, as Allen Wittenauer mentioned: the developer needs to take care of security issues, which might be too much effort
• The developer has to put in extra effort to take care of security, error handling, retries, etc., apart from the actual script, which might be very simple
• Documentation and conveying the functionality to the users would be difficult
• What if the admin wants both a health script and a node labels script?

Is this all just to support putting labels into the node manager's configuration file and introducing them that way? Do we have a solid need for that? It's not needed for the dynamic script case, which is all I've seen discussed here from a "requirements" perspective (putting it into the config file / adding it to the heartbeat is implementation; I don't see a requirement for it as such).

As Allen Wittenauer mentioned for this, the main idea is not to support this through a configuration file but to run the configured script periodically and update the RM directly when the labels change. It can also do the error handling as required if the RM doesn't accept them.

          but I don't see any reason to have conditional enablement of the node manager side of things as well as a provider specification and I think it just adds unnecessary complexity and possible surprise at configuration time

Well, this topic is debatable, but there are a few points that I feel need to be addressed if we want to support both central and distributed simultaneously:

1. There will always be confusion about which labels need to be taken: suppose a user updates some labels for a node and the NM label script doesn't have them; then which do we give priority to?
2. Do we need to "AND" or "OR" the labels when they are updated by the script / admin CLI / REST web service?

AFAIU, Wangda Tan wanted to do this in stages, and in the first stage, to avoid confusion, it would be better to keep them mutually exclusive, i.e. either central or distributed will be supported; if distributed, CLI modifications will be disabled (YARN-2728), and if central, labels from the heartbeat will be discarded. A sketch of such a gate follows below.
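
A minimal sketch of that mutually exclusive gate; the property name below is illustrative, not the one the patch defines:

import org.apache.hadoop.conf.Configuration;

// Illustrative only: a single switch decides where node labels come from, so
// the two sources can never be mixed. The key name is hypothetical.
public class NodeLabelConfigType {
  private static final String DISTRIBUTED_KEY =
      "yarn.node-labels.distributed-configuration.enabled"; // hypothetical key

  public static boolean isDistributed(Configuration conf) {
    return conf.getBoolean(DISTRIBUTED_KEY, false);
  }
  // If isDistributed(conf): accept labels from NM register/heartbeat and
  // reject RMAdminCLI/REST label updates (YARN-2728).
  // Else: accept RMAdminCLI/REST updates and ignore heartbeat labels.
}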

+1 for the changes you mentioned in NodeHeartbeatRequest & RegisterNodeLabelManagerResponse. For RegisterNodeManagerRequest, yes, it would be easier to read the interface if both were the same, but as explained in the previous point, only one of the two (central/distributed) is in effect: in the distributed case the NM needs to send whatever it has anyway, and in the non-distributed case the RM doesn't care about it. So there was no need for it.

It would be helpful if Wangda Tan & Vinod Kumar Vavilapalli could further provide their views on the requirement-level comments that have been raised by Craig Welch.

          cwelch Craig Welch added a comment -

          Other comments on the patch as such, assuming we really do need this part of the change...

DECENTRALIZED_CONFIGURATION_ENABLED et al. - I do see the basis for enabling and disabling the enforcement of a centralized list and the discussion around that, but I don't see any reason to have conditional enablement of the node manager side of things as well as a provider specification, and I think it just adds unnecessary complexity and possible surprise at configuration time. At the level of this configuration I think it should just be enabled (and I don't mean just by default, I mean that if we add this, it should just be a way to manage node labels, not conditionally enabled or disabled, any more than the web service or cli are conditionally enabled or disabled, and so we don't have this parameter / its associated branching at all).

I think the default node labels provider service should definitely be a null provider that always returns an empty list and areLabelsUpdated false. This takes out the need to decide "which is default" (a no-op one is), and it allows us to get rid of the extra enabled/disabled configuration above without adding a new configuration (the provider will be specified anyway if the feature is going to be used). A sketch of such a provider follows below.
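
A minimal sketch of such a no-op provider, with illustrative names (the real provider base class and method names are still being settled in this patch):

import java.util.Collections;
import java.util.Set;

// Hypothetical null/no-op provider: it never reports labels, so configuring a
// real provider becomes the only "switch" needed to turn the feature on.
public class NullNodeLabelsProvider /* would extend NodeLabelsProviderService */ {
  public Set<String> getNodeLabels() {
    return Collections.emptySet(); // always empty
  }

  public boolean areLabelsUpdated() {
    return false; // never anything to act on; RM state is left untouched
  }
}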

          NodeHeartbeatRequest -

isNodeLabelsUpdated - I would go with areNodeLabelsSet (all "isNodeLabels" => "areNodeLabels" wherever it appears, actually). wrt "Set" vs "Updated": this is primarily a workaround for the null/empty ambiguity, and I think this name better reflects what is really going on (am I sending a value to act on or not). But I also think that this is a better contract: the receiver (rm) shouldn't really care about the logic the nm side is using to decide whether or not to set its labels (freshness, "updatedness", whatever), so all that should be communicated in the api is whether or not the value is set, not whether it's an update / whether it's checking freshness, etc. That's a nit, but I think it's a clearer name.
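
For context, the null/empty ambiguity mentioned above boils down to this (illustrative snippet, not patch code):

import java.util.Collections;
import java.util.Set;

// On the wire, "NM reported nothing" and "NM reported an empty label set" can
// both arrive as an empty collection, so the receiver cannot distinguish them
// without an explicit boolean such as areNodeLabelsSet.
public class NullEmptyAmbiguity {
  public static void main(String[] args) {
    Set<String> notReported = Collections.emptySet();   // NM said nothing
    Set<String> clearedLabels = Collections.emptySet(); // NM said "no labels"
    System.out.println(notReported.equals(clearedLabels)); // true: ambiguous
  }
}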

RegisterNodeLabelManagerResponse - get/set IsNodeLabelsAcceptedByRM - I would make it get/set AreNodeLabelsAcceptedByRM (and on the impls, etc., of course).

RegisterNodeManagerRequest - missing spaces in args (l 42). Also, assuming we drop the "distributed on/off" config as I'm suggesting, you'll need "areNodeLabelsSet" to be passed here as well. (I also like this better because it harmonizes the API between registration and heartbeat, which is easier to understand because they are doing the same thing / should do it the same way.)

          aw Allen Wittenauer added a comment -

          This can already be accomplished using the admin cli from a script or calling a web service from the node

          Think about the secure cluster case. A whole new level of complexity is required to get this functionality using your proposed method vs. having the NM just run the script itself.

Is this all just to support putting labels into the node manager's configuration file and introducing them that way? Do we have a solid need for that?

          No, this is so we don't have to have hard-coded labels in a file. If we are doing an external software change, we need to be able to reflect that change up the chain. Think rolling upgrade. Think multiple service owners.

FWIW, yes, we do have a solid need for this feature. Almost every ops person I talked to has said they'd likely make use of it for the exact use cases I've highlighted above. Being able to roll out new versions of Java and make scheduling decisions based on their installation is extremely powerful and useful.

          cwelch Craig Welch added a comment -

So, I see the language around the script-based vs the conf-based provider, etc., so I assume that's where the scripting side comes in. However, it's still not clear to me that it's really a good idea to add all of this when there is already a way to accomplish the activity with a script - and already ways to run scripts from the node manager...

          cwelch Craig Welch added a comment -

Sorry if I'm jumping in late and asking redundant questions as a result, but I've gone through the various related jiras, the design documents (incl. updates), and the patch, and I have some requirements-related questions as a result. Just a bit of background to make sure I understand things - it appears that we've settled on two different but related features here:

          1. To enable node labels to be added or removed for a given node without validation against a centralized list of node labels (not on this patch, but relevant to the discussion) and
          2. To enable node managers to specify their node labels based on local configuration and scripting (this patch is specific to that feature).

These are, strictly speaking, orthogonal, but they may be used together and will provide something more as a 'combined feature'.

A couple things about this feature (2): I don't believe that it is necessary to add the node label configuration to the local configuration (yarn-site) or the heartbeat as such to enable configuration of labels for a node from the node in a decentralized fashion (e.g. a script on the node saying "put these labels on me"). This can already be accomplished using the admin cli from a script or calling a web service from the node (most likely the former, but either is possible...), so I don't think we need this change to support the script case; it's already possible to write a script to add a label to a node on the fly today without any changes. To make this dynamic we would need feature 1 from the above, but that's not covered in this patch / is a separate discussion / and also does not require this change. Also, I don't see how this change allows a script to dynamically configure labels unless it was changing the yarn-site or the like (I may have missed it, but I don't see that logic here) - and in any case, it would not be necessary to add this logic to support that sort of configuration, as I pointed out. Is this all just to support putting labels into the node manager's configuration file and introducing them that way? Do we have a solid need for that? It's not needed for the dynamic script case, which is all I've seen discussed here from a "requirements" perspective (putting it into the config file / adding it to the heartbeat is implementation; I don't see a requirement for it as such).

          In a nutshell - do we need change 2 (this), or do we really just need change 1 (eliminating validation of labels against a centralized list, at least as a configurable option)?

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda
All the reported test case failures seem to be unrelated to my changes.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685787/YARN-2495.20141208-1.patch
          against trunk revision 144da2e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 38 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
          org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
          org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6036//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6036//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6036//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6036//console

          This message is automatically generated.

          leftnoteasy Wangda Tan added a comment -

          Naganarasimha G R,
I meant you can leave it empty in this patch and set it after you have completed the script/conf-based patch; just adding a TODO comment should be fine.
          Make sense?

          Naganarasimha Naganarasimha G R added a comment -

          hi Tan, Wangda,
I have set the default implementation class to null as per your comment, but as you mentioned, after the review and based on feedback we need to set the default class to either the configuration-based or the script-based node labels provider.

          leftnoteasy Wangda Tan added a comment -

          Naganarasimha G R,
You don't have to specify the default implementation here. I think you can set the default implementation class to null; once this review is completed, we can decide on either the script-based or the conf-based provider as the default implementation. Make sense?

          Thanks,
          Wangda

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12685133/YARN-2495.20141204-1.patch
          against trunk revision 0653918.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6001//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6001//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
Yes, I wanted to completely remove it, but beyond the test cases it was used in NodeManager.createNodeLabelsProviderService():

Class<? extends NodeLabelsProviderService> labelsProviderClass =
    conf.getClass(YarnConfiguration.NM_NODE_LABELS_PROVIDER_CLASS,
        ConfigurationNodeLabelsProvider.class,
        NodeLabelsProviderService.class);
provider = labelsProviderClass.newInstance();

which I felt is required, and hence I provided a stub class to keep the patch segregated.

          leftnoteasy Wangda Tan added a comment -

          Hi Naganarasimha G R,
What I meant is to completely remove the conf-based node label provider implementation (not leave an unimplemented class in the patch); it should be fine to use a dummy node label provider in tests.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
I have made the required modifications to move the conf-based node label provider implementation to a separate ticket.

          leftnoteasy Wangda Tan added a comment -

          Naganarasimha G R,
Since the size of the patch is growing, it will be hard for new people to review. I suggest moving the conf-based node label provider implementation to a separate ticket under YARN-2492,
and updating the title of this ticket accordingly.

          Thanks,
          Wangda

          leftnoteasy Wangda Tan added a comment -

          Naganarasimha G R,

In this case I feel we need to start discussing YARN-2728: Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup.

Yes, we need to; that is originally why I created that ticket - thanks for taking it. Could you add your proposal to YARN-2728 first? We can go through it before working on a patch.

          Thanks,
          Wangda

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12683672/YARN-2495.20141126-1.patch
          against trunk revision 78f7cdb.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5936//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5936//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

Hi Tan, Wangda, I have corrected the earlier 3 review comments and am uploading the patch.

I was going through YARN-2786, and Vinod had mentioned in one of his comments:

          IAC, the crux of what you want is a per node script based configuration. That IS happening as we speak as part of the tickets I referred to. Further, you don't want a central list of valid labels which I disagree with. But sure, we will make this configurable to or not to require a central list so that you can turn it off. Others like me can turn it on to avoid random-labels-bubbling-up-in-my-cluster madness.

In this case I feel we need to start discussing YARN-2728: Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup.

          leftnoteasy Wangda Tan added a comment -

          Hi Naganarasimha G R,
For #1, sorry, my bad; it has been more than half a month since that comment. I prefer using a consistent approach: we have "isDecentralizedNodeLabelConfigurationEnabled", so let's use it. Do you agree?

The tests look good to me.

          Thanks

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda, Thanks for the review

1) Same as in ResourceTrackerService, it's better to have a field like isDecentralizedNodeLabelConfigurationEnabled (or some other name you like) in NodeStatusUpdaterImpl. It would be clearer than a statement like

Earlier, in one of your comments, you had mentioned:

5) NodeStatusUpdaterImpl:
isDecentralizedNodeLabelsConf may not be needed here; if the nodeLabelsProvider passed in is null, that means isDecentralizedNodeLabelsConf is false.
+ nodeLabelsForHeartBeat = null;
+ if (isDecentralizedNodeLabelsConf) {
...

so I had changed it; maybe earlier you meant something else and I read the comment wrongly?

For 2 & 3 I will rework and share the patch, but I hope the test cases are fine too?

          leftnoteasy Wangda Tan added a comment -

          Hi Naganarasimha G R,
          Thanks for update,
          Several minor comments,
1) Same as in ResourceTrackerService, it's better to have a field like isDecentralizedNodeLabelConfigurationEnabled (or some other name you like) in NodeStatusUpdaterImpl. It would be clearer than a statement like

          +            if (nodeLabelsProvider!=null) {
          

          2) In ResourceTrackerService,
          The message:

               String message =
                   "NodeManager from node " + host + "(cmPort: " + cmPort + " httpPort: "
                       + httpPort + ") " + "registered with capability: " + capability
          -            + ", assigned nodeId " + nodeId;
          +            + ", assigned nodeId " + nodeId + ", node labels { "
          +            + StringUtils.join(",", nodeLabels) + " } ";
          

You should add a check so that the node labels message is only logged when the replace succeeds. Ideally you should use a StringBuilder to do this (see the sketch after point 3 below).

3) A style suggestion: as a convention, binary operators like "=", "!=", "+", etc. should have a space before and after them. I can see several occurrences in the patch.
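
A sketch of the StringBuilder approach from point 2; the nodeLabelsAccepted flag and the parameter types are illustrative, not taken from the patch:

import java.util.Set;

import org.apache.hadoop.util.StringUtils;

// Hypothetical sketch: append the label fragment only when the RM actually
// accepted (replaced) the labels, so the log never claims labels that were
// rejected.
public class RegistrationMessage {
  static String build(String host, int cmPort, int httpPort, String capability,
      String nodeId, Set<String> nodeLabels, boolean nodeLabelsAccepted) {
    StringBuilder message = new StringBuilder();
    message.append("NodeManager from node ").append(host)
        .append("(cmPort: ").append(cmPort)
        .append(" httpPort: ").append(httpPort).append(") ")
        .append("registered with capability: ").append(capability)
        .append(", assigned nodeId ").append(nodeId);
    if (nodeLabelsAccepted) {
      message.append(", node labels { ")
          .append(StringUtils.join(",", nodeLabels)).append(" } ");
    }
    return message.toString();
  }
}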

          Wangda

          Naganarasimha Naganarasimha G R added a comment -

The following 3 test failures do not seem to have been introduced by my modifications:

          TestApplicationClientProtocolOnHA.testGetContainersOnHA:154
          TestApplicationClientProtocolOnHA.testSubmitApplicationOnHA:173
          TestApplicationClientProtocolOnHA.testGetClusterMetricsOnHA:85

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12682448/YARN-2495.20141119-1.patch
          against trunk revision 5bd048e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5880//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5880//console

          This message is automatically generated.

          Naganarasimha Naganarasimha G R added a comment -

Updating the patch with Tan, Wangda's review comments...

          leftnoteasy Wangda Tan added a comment -

... hence shall I create a new class extending TestPBImplRecords in the hadoop-yarn-server-common project?

This is an issue. I suggest keeping your code as-is, but please add checks in your tests for the values you added. And in the future, PB objects in h-y-server-common should have an easier way to be tested, as in h-y-common.
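
Pending such a utility, a minimal hand-rolled round-trip check could look like the following; it assumes the usual PBImpl pattern (a getProto() method plus a proto-taking constructor) and the draft label fields discussed in this patch, so treat the setter/getter names as assumptions:

import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodeHeartbeatRequestPBImpl;
import org.junit.Test;

// Illustrative only: setNodeLabels/getNodeLabels are this patch's draft API,
// not something guaranteed to exist with these exact signatures.
public class TestNodeHeartbeatRequestRoundTrip {
  @Test
  public void labelsSurviveSerialization() {
    NodeHeartbeatRequestPBImpl original = new NodeHeartbeatRequestPBImpl();
    Set<String> labels = new HashSet<>(Arrays.asList("GPU", "LARGE_MEM"));
    original.setNodeLabels(labels);

    // Serialize to protobuf and build a fresh record from the proto.
    NodeHeartbeatRequestPBImpl roundTripped =
        new NodeHeartbeatRequestPBImpl(original.getProto());
    assertEquals(labels, roundTripped.getNodeLabels());
  }
}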

          Naganarasimha Naganarasimha G R added a comment -

The benefits are:
1) You don't have to update test cases for that.
2) The semantics are clear: create a register request with labels or not.

True, and I will be able to revert some unwanted test case modifications. I have corrected it.

I suggest having different options for script-based/config-based, even if we could combine them.

OK, I will have different config params for script-based and config-based.

IIUC, NM_NODE_LABELS_FROM_CONFIG is a list of labels; even if we want to separate the two properties, we cannot remove NM_NODE_LABELS_FROM_CONFIG, correct?

I had searched for it wrongly, and as you mentioned, the name was not good enough for me to recollect it either. Corrected it.

I think it's better to leverage the existing utility class instead of implementing your own. For example, you have set values but not checked them, which is incorrect; using the utility class avoids such problems. Even if you add new fields, the tests will cover them without any changes:

The problem is that TestPBImplRecords is in the hadoop-yarn-common project while NodeHeartbeatRequestPBImpl and the others are in the hadoop-yarn-server-common project. Since we can't add a dependency on hadoop-yarn-server-common in hadoop-yarn-common, shall I create a new class extending TestPBImplRecords in the hadoop-yarn-server-common project?

          leftnoteasy Wangda Tan added a comment -

I have already added the below code in the NodeHeartbeatRequest interface for this (or I didn't get your comment correctly; please elaborate):

I think I misread your patch; your patch should have done what we discussed.

This if check will still be needed to identify which overloaded method to choose, right? I am not able to see any benefit from this.

The benefits are:
1) You don't have to update test cases for that.
2) The semantics are clear: create a register request with labels or not.

NM_NODE_LABELS_FETCH_INTERVAL_MS & DEFAULT_NM_NODE_LABELS_FETCH_INTERVAL_MS are used in both the script-based and the config-based providers; hence I had not moved them under config-based.

I suggest having different options for script-based/config-based, even if we could combine them.

If we reuse it, we can delete NM_NODE_LABELS_FROM_CONFIG, as it's not used.

IIUC, NM_NODE_LABELS_FROM_CONFIG is a list of labels; even if we want to separate the two properties, we cannot remove NM_NODE_LABELS_FROM_CONFIG, correct?
(But I still suggest you change it to: NM_NODE_LABELS_PREFIX + "config-based" + ".node-labels".)

This I wanted to discuss with you: based on your patch changes for labels I had figured out this class, but as I was modifying the existing PB classes I was wondering why these existing PBs are not added here.

I think it's better to leverage the existing utility class instead of implementing your own. For example, you have set values but not checked them, which is incorrect; using the utility class avoids such problems. Even if you add new fields, the tests will cover them without any changes:

          -            ApplicationId.newInstance(1234L, 2)));
          +            ApplicationId.newInstance(1234L, 2)),new HashSet<String>());
          

I will include a review of the unit tests after you upload the new patch.

          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda, Thanks for reviewing and sorry for the delayed reply ...

          Major comments:
          1.
Makes sense to me; I suggest adding a field like node-labels-updated in NodeHeartbeatRequest

I have already added the below code in the NodeHeartbeatRequest interface for this (or I didn't get your comment correctly; please elaborate):

@@ -26,7 +28,8 @@

 public static NodeHeartbeatRequest newInstance(NodeStatus nodeStatus,
     MasterKey lastKnownContainerTokenMasterKey,
-    MasterKey lastKnownNMTokenMasterKey) {
+    MasterKey lastKnownNMTokenMasterKey, Set<String> nodeLabels,
+    boolean isNodeLabelsUpdated) {

@@ -45,4 +50,10 @@ public static NodeHeartbeatRequest newInstance(NodeStatus nodeStatus,
+ public abstract boolean isNodeLabelsUpdated();
+ public abstract void setNodeLabelsUpdated(boolean isNodeLabelsUpdated);

3. Suggest creating an overloaded newInstance function without labels for RegisterNodeManagerRequest to avoid checks like:
(nodeLabelsProvider == null) ? null : nodeLabels);

This if check will still be there in the overloaded case, to identify which overloaded method to choose, right? I am not able to see any benefit from this.

4. YarnConfiguration:
NM_NODE_LABELS_FROM_CONFIG sounds like a boolean value to me; how about calling it NM_NODE_LABELS_PREFIX + "config-based" + ".node-labels"?
          NM_NODE_LABELS_FETCH_INTERVAL_MS should also be a part of config-based.

NM_NODE_LABELS_FETCH_INTERVAL_MS & DEFAULT_NM_NODE_LABELS_FETCH_INTERVAL_MS are used in both the script-based and the config-based providers, hence I had not moved them under config-based?
What's your opinion – is having two separate properties for script-based and config-based better? If we reuse one, then we can delete NM_NODE_LABELS_FROM_CONFIG, as it's not used.

          5. PB tests
I think you can leverage TestPBImplRecords to do all PB-related tests; is that enough?

These I wanted to discuss with you: based on your patch changes for labels I had figured out this class, but as I was modifying the existing PB class I was wondering why these existing PBs are not added here.

The other comments I have reworked; after clarification of the above points I will upload the patch tomorrow...

          Hide
          leftnoteasy Wangda Tan added a comment -

          Major comments:
          1.
          As I commented, https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14184146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14184146, first line.
          I think the statement

if the number of labels is large, the heartbeat message will be heavier, and with more cluster nodes there is more network IO; checking labels from the previous state to the current state for all nodes is done in the RM, whereas in the earlier method each NM was taking care of itself; a lot of read locks need to be held @ NodeLabelsManager

Makes sense to me; I suggest adding a field like node-labels-updated in NodeHeartbeatRequest

          Minor comments:
          1.

          static NodeLabelsProviderService createNodeLabelsProviderService

It's better to make this non-static or protected so that tests can override it. Since it is only used by tests, I don't suggest making it static just for that.

          2.
          NodeStatusUpdater:
          2.1

          +    @SuppressWarnings("unchecked")
          +    Set<String> nodeLabels =
          +        (nodeLabelsProvider == null) ? Collections.EMPTY_SET
          

You can use CommonNodeLabelsManager#EMPTY_STRING_SET to avoid the warning.
          2.2
It's better to rename areLabelsSameAsPrevOutPut -> areNodeLabelsUnchanged; it's shorter and has the same semantics as nodeLabelsUpdated.
          2.3
Need to handle a null value for areNodeLabelsUnchanged (previously areLabelsSameAsPrevOutPut), since others can implement a new NodeLabelsProvider.
          2.4

          + Set<String> nodeLabelsForHeartBeat = null;

Should be put inside the while (...) { } loop.

          3.
Suggest creating an overloaded newInstance function without labels for RegisterNodeManagerRequest to avoid checks like:

          +            (nodeLabelsProvider == null) ? null : nodeLabels);
          

          4.
          YarnConfiguration:

          • ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION -> DECENTRALIZED_NODE_LABELS_CONFIGURATION_ENABLED
          • labels-provider.class -> node-labels-provider.class
• For configuration-file-based options, I suggest adding a common prefix for them, such as:
  NM_NODE_LABELS_PREFIX + "config-based"
• NM_NODE_LABELS_FROM_CONFIG sounds like a boolean value to me; how about calling it NM_NODE_LABELS_PREFIX + "config-based" + ".node-labels"?
          • NM_NODE_LABELS_FETCH_INTERVAL_MS should also be a part of config-based.
          • Remove "+ /** End of Configurations in NodeManager for NodeLabelsFeature*/"

          5. PB tests
I think you can leverage TestPBImplRecords to do all PB-related tests; is that enough?

          6. ConfigurationNodeLabelsProvider

• checkForNodeLabelModification: how about calling it updateNodeLabelsFromConfig? checkForNodeLabelModification sounds more like a read-only operation to me.
          • Remove
            +    } else {
            +      if (LOG.isDebugEnabled())
            +        LOG.debug(errorMsg.toString());
            +    }
            

            Since the error message is handled by ConfigurationMonitorTimerTask

• Change + e.printStackTrace(); to LOG.error(e)

          7. NodeLabelsProviderService:

          @return

Needs to be added for getNodeLabels, or a javadoc warning will be raised.

8. Change NodeHeartbeatResponse#getAcceptNodeLabels -> getIsNodeLabelsAcceptedByRM, same for RegisterResponse (or some better name you can think of, since getAcceptedNodeLabels reads more like a list of labels).

          9. ResourceTrackerService:
          9.1

          +        LOG.info("Node Labels {" + StringUtils.join(",", nodeLabels)
          +            + "} of NodeManager from  " + host
          +            + " is not properly configured: " + ex.getMessage() + " ");
          

Again, this should use error-level logging here.

          9.2
The node label update logic for register/heartbeat is very similar; could you merge the common parts?
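
As a rough illustration of that merge, a standalone sketch of one helper shared by the register and heartbeat paths (types are simplified; the real code would go through RMNodeLabelsManager#replaceLabelsOnNode with a NodeId key):

import java.io.IOException;
import java.util.Set;

// Standalone sketch; not the actual ResourceTrackerService code.
class LabelUpdateSketch {
  interface LabelManager {
    void replaceLabels(String node, Set<String> labels) throws IOException;
  }

  private final LabelManager mgr;

  LabelUpdateSketch(LabelManager mgr) {
    this.mgr = mgr;
  }

  /** Shared by register and heartbeat; returns true if the labels were accepted. */
  boolean updateNodeLabels(String node, Set<String> labels) {
    if (labels == null) {
      return true; // null means the NM reported no change
    }
    try {
      mgr.replaceLabels(node, labels);
      return true;
    } catch (IOException e) {
      // All-or-nothing: any invalid label rejects the whole set.
      System.err.println("Labels " + String.join(",", labels)
          + " from " + node + " rejected: " + e.getMessage());
      return false;
    }
  }
}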

          Will include review of tests in next turn.

          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12678260/YARN-2495.20141031-1.patch
          against trunk revision d1828d9.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
          org.apache.hadoop.yarn.client.api.impl.TestNMClient
          org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
          org.apache.hadoop.yarn.client.TestRMFailover
          org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
          org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot
          org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor
          org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
          org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
          org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
          org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServer
          org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
          org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels
          org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown

          The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
          org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5652//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5652//console

          This message is automatically generated.

          Hide
          Naganarasimha Naganarasimha G R added a comment -

          Submitting the patch

          Hide
          Naganarasimha Naganarasimha G R added a comment -

Updated with fixes for all review comments and test cases, and rebased onto the trunk code.

          Hide
          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda,
I am uploading a patch with all the review comments fixed and with test cases, but I need to rebase it on the latest code in trunk, which I will do tomorrow morning. You can review this patch, and if it is fine I will submit the patch after rebasing tomorrow.

          Hide
          sunilg Sunil G added a comment -

          Hi NGarla_Unused Wangda Tan (No longer used)

          I have one comment here.

          if exception raise when replace labels on node, put the new labels

I assume the current idea is that if one label from the NM is found to be invalid, the complete set of labels sent by the NM in that heartbeat will be treated as an error.

Could it be done in such a way that, given say 5 labels from the NM at a given heartbeat, of which 2 are found to be invalid by the RM, we accept the 3 valid labels and send the 2 rejected labels back to the NM? This may make things work better and be easier to handle.

          How do you feel?

          Hide
          leftnoteasy Wangda Tan added a comment -
if the number of labels is large, the heartbeat message will be heavier, and with more cluster nodes there is more network IO
checking labels from the previous state to the current state for all nodes is done in the RM; in the earlier method each NM was taking care of itself
a lot of read locks need to be held @ NodeLabelsManager
          

Makes sense, let's do it that way.

Also, as we either accept all labels or reject all labels, can we have a flag for whether the RM accepted the labels or not, and modify the response proto when the NodeLabelsManager interface changes?

Also makes sense to me, and I think a corner case is: if an NM asks to remove all labels (passes an empty list), how does the RM set the reject list? So I agree to use a flag saying whether the last sync of node labels succeeded or not.

RegisterNodeManagerRequestProto and NodeHeartbeatRequestProto are modified in this file ... the diff shows one line after the modification, which makes it look like I have modified RegisterNodeManagerResponseProto

          Yes, you're correct, I misread the patch.

So do we need to address this scenario by adding some boolean flag like DeCentralizedConfigEnable in RegisterNodeManagerRequestProto?

I think we shouldn't; in such a case, we should let the RM follow what is configured on the RM side. Basically, it's a configuration error that should be avoided.
So if RM=centralized and NM=decentralized, just print an error and reject such labels.
If RM=decentralized and NM=centralized, the client side will receive an error message.

createNodeStatusUpdater(context, dispatcher, nodeHealthChecker, nodeLabelsProviderService);
is it OK?
          

That's fine; basically, I think we should avoid modifying it everywhere.

          Thanks,
          Wangda

          Hide
          Naganarasimha Naganarasimha G R added a comment -

          3) NodeManager:
createNodeStatusUpdater: I suggest creating an overloaded method without the nodeLabelsProviderService parameter to avoid lots of changes in test/mock classes.

NodeManager, during serviceInit, calls a protected createNodeStatusUpdater method. So even if we add an overloaded method, serviceInit will call the method which takes
nodeLabelsProviderService, or we need to add code like:

if (null == nodeLabelsProviderService) {
  createNodeStatusUpdater(context, dispatcher, nodeHealthChecker);
} else {
  createNodeStatusUpdater(context, dispatcher, nodeHealthChecker,
      nodeLabelsProviderService);
}

is it OK?

          Hide
          Naganarasimha Naganarasimha G R added a comment -

2. NodeHeartbeatRequestPBImpl:
2.3. Every time, set the up-to-date labels in NodeHeartbeatRequest when doing a heartbeat.
#1 and #2 will both need to add more fields in NodeHeartbeatRequest. I suggest doing #3; it's simpler and we can improve it in a further JIRA.

I missed discussing this point in particular. If we take the 3rd approach, on every heartbeat the RM side needs to take either of these 2 approaches:

• Invoke getNodeLabelManager().replaceLabelsOnNode(), which will validate the labels and then replace them even when there is no change in the labels
• ResourceTrackerService validates the labels from the heartbeat against the NodeLabelsManager and, if they have changed, invokes getNodeLabelManager().replaceLabelsOnNode()

Both of these approaches are costly because of the following impacts:

• if the number of labels is large, the heartbeat message will be heavier, and with more cluster nodes there is more network IO
• checking labels from the previous state to the current state for all nodes is done in the RM; in the earlier method each NM was taking care of itself
• a lot of read locks need to be held @ NodeLabelsManager

So based on these points I would prefer option 1, as it is a minimal change and anyway we have already modified the request and response (the proposed rejected-labels list).

          5. NodeStatusUpdaterImpl:

• In the RM ResourceTracker, if an exception is raised when replacing labels on a node, put the new labels into the rejected node labels in the response.
• In the NM NodeStatusUpdater, if the rejected node labels are not null, LOG.error the rejected node labels and print the diagnostic message.

You meant "rejected node labels is not empty", right? Based on comment 2, null cannot be distinguished for repeated fields.
Also, as we either accept all labels or reject all labels, can we have a flag for whether the RM accepted the labels or not, and modify the response proto when the NodeLabelsManager interface changes?

          6) yarn_server_common_service_protos.proto

RegisterNodeManagerRequestProto and NodeHeartbeatRequestProto are modified in this file ... the diff shows one line after the modification, which makes it look like I have modified RegisterNodeManagerResponseProto.

9. There is no need to send a shutdown message when any of the labels are not accepted by RMNodeLabelsManager. Just adding them to a rejected node labels list, plus a diagnostic message, should be enough.

OK, this will be taken care of, but earlier I tried to send a shutdown message because I wanted to avoid the scenario where the RM is configured for central node labels and the NM for distributed.
But as per your earlier comment, "PB cannot tell difference between null and empty for repeated fields.".
So do we need to address this scenario by adding some boolean flag like DeCentralizedConfigEnable in RegisterNodeManagerRequestProto?

          Hide
          leftnoteasy Wangda Tan added a comment -

          Hi Naga,
          Thanks for working on this patch,
          Comments round #1,

          1) YarnConfiguration:
I think we should add a DEFAULT_DECENTRALIZED_NODELABEL_CONFIGURATION_ENABLED = false to avoid hardcoding "false" in implementations.

          2) NodeHeartbeatRequestPBImpl:
I just found that current PB cannot tell the difference between null and empty for repeated fields. And in your implementation, an empty set will always be returned, no matter whether the field was not set or was set to an empty set.
So our definition of null for "not changed" and empty for "no label" does not hold any more.
What we can do is:

1. Add a new field in NodeHeartbeatRequest, like "boolean nodeLabelUpdated".
2. Use the add/removeLabelsOnNodes API provided by RMNodeLabelsManager, passing only the changed labels each time.
3. Every time, set the up-to-date labels in NodeHeartbeatRequest when doing a heartbeat.

#1 and #2 both need to add more fields to NodeHeartbeatRequest. I suggest doing #3; it's simpler and we can improve it in a further JIRA.
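
To see why #1 and #2 need extra fields, a tiny self-contained illustration of the ambiguity (roundTrip stands in for protobuf serialize/deserialize of a repeated field):

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// After a protobuf round-trip, a repeated field that was never set and one
// set to an empty list both come back as an empty collection, so the
// "null = unchanged" convention is lost on the wire.
class PbAmbiguitySketch {
  // Stand-in for serialize + deserialize of a repeated field.
  static Set<String> roundTrip(Set<String> labels) {
    return labels == null ? Collections.<String>emptySet()
        : new HashSet<>(labels);
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(null));            // [] -- meant "unchanged"
    System.out.println(roundTrip(new HashSet<>())); // [] -- meant "no labels"
    // Both print [], so the receiver cannot tell the two intents apart.
  }
}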

          3) NodeManager:

          +    if (conf.getBoolean(
          +        YarnConfiguration.ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION, false)) {
          

Instead of hardcoding "false" here, we should use "DEFAULT_DECENTRALIZED_NODELABEL_CONFIGURATION_ENABLED".

          + addService((Service) provider);

          Why do this type conversion? I think we don't need it.

          createNodeStatusUpdater

I suggest creating an overloaded method without the nodeLabelsProviderService parameter to avoid lots of changes in test/mock classes.

          4) NodeLabelsProviderService:
It should extend AbstractService; there are some default implementations in AbstractService, so we don't need to implement all of them.

          5) NodeStatusUpdaterImpl:
isDecentralizedNodeLabelsConf may not be needed here; if the nodeLabelsProvider passed in is null, that means isDecentralizedNodeLabelsConf is false.

          +            nodeLabelsForHeartBeat = null;
          +            if (isDecentralizedNodeLabelsConf) {
          ...
          

According to my comment 2), I suggest making it simple – if the provider is not null, set NodeHeartbeatRequest.nodeLabels to the labels obtained from the provider.

          +            if (nodeLabelsForHeartBeat != null
          +                && response.getDiagnosticsMessage() != null) {
          +              LOG.info("Node Labels were rejected from RM "
          +                  + response.getDiagnosticsMessage());
          +            }
          

We cannot assume that whenever diagnosticsMessage is not null the node labels were rejected. I suggest adding a rejected-node-labels field to RegisterNMResponse and NodeHeartbeatResponse. The existing behavior in RMNodeLabelsManager is that if any of the labels is invalid, all labels will be rejected. What you should do is:

1. In the RM ResourceTracker, if an exception is raised when replacing labels on a node, put the new labels into the rejected node labels in the response.
2. In the NM NodeStatusUpdater, if the rejected node labels are not null, LOG.error the rejected node labels and print the diagnostic message.

As suggested in 3), create an overloaded constructor to avoid lots of changes in tests.

          6) yarn_server_common_service_protos.proto
I think you mistakenly added nodeLabels to RegisterNodeManagerResponseProto – shouldn't it be in RegisterNodeManagerRequestProto?

          7) ConfigurationNodeLabelsProvider:

          +    String[] nodeLabelsFromScript =
          +        StringUtils.getStrings(conf.get(YarnConfiguration.NM_NODE_LABELS_PREFIX, ""));
          
          1. nodeLabelsFromScript -> nodeLabelsFromConfiguration
          2. YarnConfiguration.NM_NODE_LABELS_PREFIX -> add an option like YarnConfiguration.NM_NODE_LABELS_FROM_CONFIG (NM_NODE_LABELS_PREFIX + "from-config") or some name you prefer – At least it shouldn't be a prefix.

          8) TestEventFlow:
Does just passing a null for nodeLabelsProvider not work?

          9) ResourceTrackerService:

          +    isDecentralizedNodeLabelsConf = conf.getBoolean(
          +        YarnConfiguration.ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION, false);
          

Avoid hardcoding the config default here, as suggested above.

There is no need to send a shutdown message when any of the labels are not accepted by RMNodeLabelsManager. Just adding them to a rejected node labels list, plus a diagnostic message, should be enough.

          +            + ", assigned nodeId " + nodeId + ", node labels { "
          +            + nodeLabels.toString()+" } ";
          

You should use StringUtils.join when you want to turn a set of labels into a String; the output of Set.toString() is not well defined.

More comments will be added once you have addressed the above comments and added tests for them.

          Thanks,
          Wangda

          Hide
          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda

          Add a reject node labels list in NodeHeartbeatRequest – we may not have to handle this list for now. But we can keep it on the interface

rmContext.getNodeLabelManager().replaceLabelsOnNode(labelUpdate); currently throws an exception, and hence I felt that this message should be sent back to the NM
instead of sending the invalid list (which requires interface changes in NodeLabelsManager). So I was thinking of propagating the exception's error message to the NM by making use of NodeHeartbeatResponse.DiagnosticsMessage, and logging this in the NM.
I have handled it in this way; please check the code in ResourceTrackerService and NodeStatusUpdaterImpl (it also contains the other changes which you mentioned yesterday).

          Hide
          leftnoteasy Wangda Tan added a comment -

But currently I have the NodeLabelsProvider as a service (as per your earlier comments) and it is part of the NodeStatusUpdater, so I am planning to have some dummy Service which doesn't give any node labels and is static

I'm not very sure what the dummy service is used for – is it for test purposes? I think if we don't enable decentralized node label configuration, we don't need to initialize such a service at all; NodeStatusUpdater will only get labels from it if decentralized configuration is enabled. If it is for test purposes, that should be fine.

          Thanks,

          Hide
          Naganarasimha Naganarasimha G R added a comment -

          Hi Tan, Wangda
          "ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION" is fine and will work on that. And will first focus on NM stuffs and ResourceTracker changes in RM. But currently i have NodeLabelProvider as service (as per your earlier comments ) and is part of NodeStatus updater, so planning to have some dummy Service which doesn't give any node labels and is static,

          Hide
          leftnoteasy Wangda Tan added a comment -

          Naganarasimha G R,
          One comment before you uploading patch:
I suggest having an option to indicate whether decentralized node label configuration is currently in use. If it is true, the NM will do the following steps: create the NodeLabelsProvider, set up labels in the NodeHeartbeatRequest, etc.
If that makes sense to you, I suggest we call it "ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION" -> (yarn.node-labels.decentralized-configuration.enabled), or do you have another suggestion?
And also, that value will be used by the RM; the RM needs to do similar things, like disabling admin changes to labels on nodes via the RM admin CLI, etc. I think you can first focus on the NM stuff and the ResourceTracker changes in the RM. AdminService-related changes can be split into another JIRA.

          Thanks,
          Wangda

          Hide
          leftnoteasy Wangda Tan added a comment -

          Hi Naga,

you meant NodeHeartBeatResponse, right?

          Yes

Looking forward to your patch.

          Wangda

          Hide
          Naganarasimha Naganarasimha G R added a comment -

          hi Tan, Wangda
Actually what I meant was to update the HeartBeatResponse about the labels' acceptance by the RM, and once NodeStatusUpdater gets a response (+ve or -ve) from the RM, it can set the LabelsProvider with the appropriate flag. But your logic seems much better, because I was handling thread sync unnecessarily in ConfNodeLabelsProvider. Having this logic in NodeStatusUpdater removes the burden on each type of NodeLabelsProvider to have this sync logic, and the NodeLabelsProvider interface will be simple (my earlier thinking was that labels should not be handled by NodeStatusUpdater, hence I kept them in the NodeLabelsProvider).
I was actually about to upload the patch with my logic; as it is not as per your latest comments, I will upload another one by tomorrow afternoon (IST) after correcting it as per your comments.

          Add a reject node labels list in NodeHeartbeatRequest – we may not have to handle this list for now. But we can keep it on the interface

you meant NodeHeartBeatResponse, right?

          Hide
          leftnoteasy Wangda Tan added a comment -

          1)

But I was thinking about one scenario: the labels got changed, and a call to NodeLabelsProvider.getLabels() returned the new labels, but the heartbeat failed for some reason.

If the heartbeat fails, the resource tracker on the NM side cannot get the NodeHeartbeatResponse. But another case I'm thinking of is that labels reported by NMs can be invalid and rejected by the RM; the NM should be notified about such cases.

          So I would suggest do this way:

          • Keep getNodeLabels in NodeHeartbeatRequest and RegisterNodeManagerRequest.
• Add a reject node labels list in NodeHeartbeatRequest – we may not have to handle this list for now, but we can keep it on the interface.
• Add a "lastNodeLabels" in NodeStatusUpdater; it will save the last node labels list obtained from the NodeLabelFetcher. In the while loop of startStatusUpdater, we will check whether the new list fetched from the NodeLabelFetcher differs from our last node labels list. If different, we will set it; if the same, we will skip it and set the labels to null in the next heartbeat.

And the interface of NodeLabelsProvider should be simple – just a getNodeLabels(); NodeStatusUpdater will take care of the other stuff.
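
A minimal sketch of that compare-and-skip step, assuming the simple getNodeLabels()-only provider interface (names simplified, not the actual NodeStatusUpdaterImpl code):

import java.util.Set;

// Standalone sketch of the "send labels only when changed" logic above.
class HeartbeatLabelsSketch {
  private Set<String> lastNodeLabels; // last labels reported to the RM

  /** Returns the labels for the next heartbeat, or null when unchanged. */
  Set<String> labelsForNextHeartbeat(Set<String> current) {
    if (current == null || current.equals(lastNodeLabels)) {
      return null; // null in the request means "no update"
    }
    // In the real flow this would be updated only after the RM accepts them.
    lastNodeLabels = current;
    return current;
  }
}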

          2)

and for "If it's distributed, AdminService should disable admin change labels on nodes via RM admin CLI" I will add a JIRA, but I was wondering how to do this – by configuration with a new parameter?

Yes, we should add a new parameter for it; we may not need this immediately, but we should have one in the future.

I was earlier under the impression that MemoryRMNodeLabelsManager is for distributed configuration and RMNodeLabelsManager is for centralized configuration, and that some factory would take care of this...

Not really; the difference between them is that one persists labels to the filesystem and one does not. We still have to do something for the distributed configuration.

          Any thoughts? Vinod Kumar Vavilapalli

          Thanks,
          Wangda

          Hide
          Naganarasimha Naganarasimha G R added a comment -

Thanks for reviewing, Wangda:

2) It seems NM_LABELS_FETCH_INTERVAL_MS has not been used in the patch; did you forget to do that?

– Earlier I was planning to make only the script-based node labels dynamic and the configuration-based ones static. Now, based on your comment 4, I will make it dynamic and change the configuration name too.

3) Regarding ResourceTrackerProtocol, I think NodeHeartbeatRequest should only report labels when the labels have changed. So there are 3 possible values of node labels in NodeHeartbeatRequest ... And RegisterNodeManagerRequest should report labels every time it registers.

– Yes, this was my plan and I will be doing it the same way.
But I was thinking about one scenario: the labels got changed, and a call to NodeLabelsProvider.getLabels() returned the new labels, but the heartbeat failed for some reason. In that case the NodeLabelsProvider will not be able to detect this, and on the next request getLabels() will return null. So we should have some mechanism by which the NodeLabelsProvider is informed whether the RM accepted the change in labels, so that the appropriate set of labels is provided on a call to getLabels (also, if needed, we can have the RM-rejected labels too, for logging purposes).
Planning to have 3 methods in the NodeLabelsProvider interface (sketched after the list below):

          • getNodeLabels() : to get the labels which can be used for registration
          • getNodeLabelsOnModify() : to get the labels on modification which can be used for heartbeat
          • rmUpdateNodeLabelsStatus(boolean success) : to indicate that next call to getNodeLabelsOnModify can be reset to null
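
A hedged sketch of that provider contract – the method names are from the list above, the signatures are assumptions:

import java.util.Set;

// Sketch only; the eventual NodeLabelsProvider API may differ.
interface NodeLabelsProviderSketch {
  /** Labels to report at registration time. */
  Set<String> getNodeLabels();

  /** Labels to put in a heartbeat, or null when nothing changed. */
  Set<String> getNodeLabelsOnModify();

  /** RM verdict on the last reported labels, so the provider can reset. */
  void rmUpdateNodeLabelsStatus(boolean success);
}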

4.1 Why does this class extend CompositeService? Did you want to add more components to it? If not, AbstractService should be enough. If the purpose of NodeLabelsFetcherService is only to create a NodeLabelsProvider, and the NodeLabelsProvider will take care of periodically reading configuration from yarn-site.xml, I suggest renaming NodeLabelsFetcherService to NodeLabelsProviderFactory and not extending any Service, because the NodeLabelsProvider itself should be a Service. Rename NodeLabelsProvider to NodeLabelsProviderService if your purpose is as I mentioned.

– Your idea seems better; I will try to do it the way you have specified, and hence NodeLabelsFetcherService will become a factory, or I will make it obsolete.
ConfigurationNodeLabelsProvider: I will make it dynamic, i.e. periodically it will read yarn-site and get the labels.

          6) More implementation suggestions:
Since we need central node labels configuration, I suggest leveraging what we already have in the RM admin CLI directly – users can use the RM admin CLI to add/remove node labels. We can disable this when we're ready to do non-central node label configuration. And there should be an option to tell whether distributed node label configuration is used. If it's distributed, AdminService should disable admins changing labels on nodes via the RM admin CLI. I suggest doing this in a separate JIRA.

– I presume "central node labels configuration" means the "cluster valid node labels" stored on the RM side for validation of labels; if so, OK, I will do it in the same way as the RM admin CLI.
And for "If it's distributed, AdminService should disable admin change labels on nodes via RM admin CLI" I will add a JIRA,
but I was wondering how to do this – by configuration with a new parameter? I was earlier under the impression that MemoryRMNodeLabelsManager is for distributed configuration and RMNodeLabelsManager is for centralized configuration, and that some factory would take care of this...

I will handle the other comments.

          leftnoteasy Wangda Tan added a comment -

1) Please remove all script-related configurations in YarnConfiguration

2) It seems NM_LABELS_FETCH_INTERVAL_MS is not used in the patch; did you forget to do that?

3) Regarding ResourceTrackerProtocol, I think NodeHeartbeatRequest should only report labels when the labels have changed. So there are 3 possible values of node labels in NodeHeartbeatRequest:
          1. null: labels not changed
          2. empty array: no label on this node
          3. non-empty array: labels on this node
And RegisterNodeManagerRequest should report labels every time it registers.
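A hedged sketch of how the NM side could honor this tri-state contract; the class and field names here are illustrative, not from the patch:

import java.util.Set;

class NodeLabelsHeartbeatHelper {

  // Last label set successfully handed to a heartbeat.
  private Set<String> lastNodeLabels;

  // Returns what to put in the next NodeHeartbeatRequest:
  //   null          -> labels unchanged, RM skips the update
  //   empty set     -> node now has no labels
  //   non-empty set -> the node's current labels
  Set<String> labelsForNextHeartbeat(Set<String> current) {
    if (lastNodeLabels != null && lastNodeLabels.equals(current)) {
      return null;
    }
    lastNodeLabels = current;
    return current;
  }
}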

          4) NodeLabelsFetcherService:
4.1 Why does this class extend CompositeService? Did you want to add more components to it? If not, AbstractService should be enough. If the purpose of NodeLabelsFetcherService is only to create a NodeLabelsProvider, and the NodeLabelsProvider will take care of periodically reading the configuration from yarn-site.xml, I suggest renaming NodeLabelsFetcherService to NodeLabelsProviderFactory and not extending any Service, because the NodeLabelsProvider itself should be a Service. Rename NodeLabelsProvider to NodeLabelsProviderService if your purpose is as I mentioned.

4.2 This should be located in org.apache.hadoop.yarn.server.nodemanager.nodelabels, and "org.apache.hadoop.yarn.server.nodemanager.nodelabel" should be renamed to "org.apache.hadoop.yarn.server.nodemanager.nodelabels" in the other class definitions.

          5) ConfigurationNodeLabelsProvider
This should be an independent class, located in an individual file.

          6) More implementation suggestions:
Since we need central node labels configuration, I suggest leveraging what we already have in the RM admin CLI directly – users can use the RM admin CLI to add/remove node labels. We can disable this when we're ready to do non-central node label configuration.
And there should be an option to tell whether distributed node label configuration is used. If it's distributed, AdminService should disable admin changes to labels on nodes via the RM admin CLI. I suggest doing this in a separate JIRA.

          Naganarasimha Naganarasimha G R added a comment -

As per Wangda's comments:
1> Raised a new JIRA for "disable central node label configuration"
2> Removed the CommonNodeLabelsManager modifications from the current JIRA
3> Moved ScriptNodeLabelProvider to a separate JIRA for better review

Currently attached a WIP patch (the earlier patch was bifurcated into two JIRAs, YARN-2495 and YARN-2729). Will update with the actual patch at the earliest.

          leftnoteasy Wangda Tan added a comment -

Naganarasimha G R, Allen Wittenauer, let me first give you an overview of what we need to do to support labels in the capacity scheduler; that will help you better understand why we need central node label validation now.
In the existing capacity scheduler (patch of YARN-2496), we can support specifying which labels each queue can access (to make sure important resources can only be used by privileged users) and the proportion of resources on a label (the "marketing" queue can access 80% of GPU resources). Now if users want to leverage the capacity scheduler changes, they MUST specify 1) the labels that can be accessed by the queue and 2) the proportion of resources that can be accessed by a queue for each label.
Back to the central node label validation discussion: without it, we cannot make the capacity scheduler work for now (a user cannot specify capacity for an unknown node label for a queue, etc.). So I still insist on having central node label validation for both centralized and distributed node label configuration, at least for the 2.6 release. This might change in the future; I suggest moving "disable central node label configuration" to a separate task for further discussion.

And I've looked at the patch uploaded by Naganarasimha G R. Thanks for this WIP patch; I took a quick glance at it and have several suggestions:

• According to the above comments, do not change CommonNodeLabelsManager; move the changes that disable central node label validation to a separate patch for further discussion.
• Make this patch contain a NodeLabelProvider only, and create a separate JIRA for ScriptNodeLabelProvider and for an implementation that reads node labels from yarn-site.xml, for easier review.
          vinodkv Vinod Kumar Vavilapalli added a comment -

This is very useful to get in for 2.6. Wangda Tan/Naganarasimha G R, how feasible is it?

          Naganarasimha Naganarasimha G R added a comment -

          Hi All,
          Uploading a WIP patch, just to share the approach...
          Completed

          • User can set labels in each NM (by setting yarn-site.xml or using script suggested by Allen Wittenauer)
          • NM will send labels to RM via ResourceTracker API
          • RM will set labels in NodeLabelManager when NM register/update labels

Pending:

• No test cases written yet, and the test cases modified to resolve compilation issues could perhaps be handled in a better way.
• As per the design doc there was a requirement to specifically support either distributed or centralized configuration, but I was not sure how to achieve this, as the current design does not seem to be specific to central or distributed, and a class was configured to identify the NodeLabelsManager.
• Currently I have not fully ensured that node labels are sent only when the labels fetched from the script differ from the last labels successfully reported to the RM. The responses of node heartbeat and node registration can be used for this. Yet to finish.
• A configuration option has been added to validate against centralized labels, but the approach needs further discussion.
          aw Allen Wittenauer added a comment -

          while modifying the script he will be able to configure the valid labels too.

The script can be updated independently of changing the running configuration files. Changing the XML configs would also require a coordinated reconfigure of the RM. That isn't realistic, especially for things such as rolling upgrades. An HA RM, of course, makes the situation even worse. Additionally, I'm sure the label validation code will spam the RM logs every time it gets an invalid label, which is pretty much a "please fill the log directory" action.

The only scenario I can think of where label validation has a practical use is if AMs and/or containers are allowed to inject labels. But that should be a different control structure altogether and have zero impact on administrator-controlled labels.

Seems like maintenance-wise it might become difficult, for example,

          Label validation actually makes your example worse because now the labels disappear completely. Is it a problem with the script or is it a problem with the label definition?

I feel centralized label validation can be made configurable. Please share your opinions on this.

          Just disable it completely. I'm still waiting to hear what practical application this bug would have.

          Naganarasimha Naganarasimha G R added a comment -

Hi Allen Wittenauer,

          I don't think you understand the use case at all. In fact, it's clear you need to re-read the sample script. It does not get updated with every new JDK. It's smart enough to update the label regardless of the JDK that is installed...

I meant that the script needs to be modified for a new label set. For example, the admin has currently configured JDK labels; if he later wants to add a label for some native lib version, then, since the admin knows all the valid native lib versions in the system (or this list can be fetched automatically), while modifying the script he will be able to configure the valid labels too.

which means the only friction point for operations is going to be updating this 'valid label list' on the RM.

Seems like maintenance-wise it might become difficult. For example, once the valid JDK labels are loaded, admins will forget about this feature; later on, based on requirements, some other admin/person might update the JDK without being aware that such a script exists which updates the labels based on the JDK or native lib version. So he might miss updating the valid labels, and that node might become unusable, or wrong labels will be tagged to it because the new labels are not registered.

So I feel Allen's scenario needs to be addressed. As Tan, Wangda suggested, I feel centralized label validation can be made configurable. Please share your opinions on this.

          aw Allen Wittenauer added a comment -

I understood the use case, but what I did not understand is how it would restrict/deter a user, since he can make one more update: one more label in the central valid label list, like a Java version or JDK version, etc. Since the script will anyway be written/updated to produce a specific set of labels, I feel in most cases the admin can know which labels will appear in the cluster. Is there any other use case where it will be difficult for the admin to list the labels beforehand?

I don't think you understand the use case at all. In fact, it's clear you need to re-read the sample script. It does not get updated with every new JDK. It's smart enough to update the label regardless of the JDK that is installed... which means the only friction point for operations is going to be updating this 'valid label list' on the RM.

          Naganarasimha Naganarasimha G R added a comment -

Thanks Allen Wittenauer, Tan, Wangda & Sunil G,

          For Allen Wittenauer comments:

          I don't fully understand your question, but I'll admit I've been distracted with other JIRAs lately.

You got the first part of my question right; thanks for detailing the scenario.

          If we are rolling out a new version of the JDK, I shouldn't have to tell the system that it's ok to broadcast that JDK version first.

I understood the use case, but what I did not understand is how it would restrict/deter a user, since he can make one more update: one more label in the central valid label list, like a Java version or JDK version, etc. Since the script will anyway be written/updated to produce a specific set of labels, I feel in most cases the admin can know which labels will appear in the cluster. Is there any other use case where it will be difficult for the admin to list the labels beforehand?

          For Tan, Wangda comments:

they will be reported to the RM at NM registration. We may not need to persist any of them, but the RM should know these labels exist to do scheduling.

Does the RM need the full list of valid labels even before all nodes have registered? How would it impact scheduling? How is it different from centralized configuration? In centralized config the user needs to add the new labels and then set the node-to-labels mapping; similarly, in distributed config we can first discover the new labels, update the superset list of labels, and then update the label mapping for a node that wants to update or modify its labels.

Another question is whether we need to check labels when they register; I prefer to pre-set them because this affects scheduling behavior. For example, the maximum-resource and minimum-resource are set up on the RM side, and RackResolver also runs on the RM side.

Maybe I did not get this correctly. Do you mean that when the NM registers for the first time after startup, you want it to have preset labels apart from what is read from the NM's yarn-site.xml/script? I did not understand this clearly; please elaborate.

At least, label checking should be kept configurable in distributed mode; "just ignore all the labels for that node if invalid labels exist" might be a good behavior when it is enabled.

In your earlier statement you said it affects scheduling; if so, how does keeping it configurable solve that? But what was clear was (a small sketch of the agreed validation rule follows the list):

• Support to add and remove valid labels at the centralized level is required
• The RM will do the label validation on NM registration & heartbeat
• If, while validating (during NM registration & heartbeat), one of the labels fails for a given node, then we will just ignore all the labels for that node.
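A minimal sketch of that last rule, assuming the RM keeps a set of cluster-valid labels; returning null here stands for "ignore the report and keep the node's previous labels", and the class and method names are purely illustrative:

import java.util.Set;

final class NodeLabelReportValidator {

  // Accept the reported set only if every label is cluster-valid;
  // otherwise reject the whole report for this node.
  static Set<String> validate(Set<String> clusterValidLabels,
      Set<String> reported) {
    return clusterValidLabels.containsAll(reported) ? reported : null;
  }
}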

          For Sunil G comments:

If any such node label is invalid as per the RM, how will this be reported back to the NM? Error handling?

I have the same doubt, and I feel usability will suffer if the error is not propagated back to the NM, because the script executes in one place and the validation happens in another.

But imagine a 1000-node cluster: with labels potentially changing on every heartbeat, will this be a bottleneck?

We will not be changing labels on every heartbeat; I will try to ensure that a heartbeat sends the updated label set only when the labels differ from the node's previously reported set. But there will still be a contention issue if, say, the script is modified and all 2000 nodes want to update their labels at once.

          sunilg Sunil G added a comment -

          Hi All.

I have a couple of doubts here:

1. In distributed configuration, each NM can specify labels at register/heartbeat (update). I am not sure whether the "valid label" check should happen in the RM or the NM. As per the current design, it looks like all validity checks happen at the RM.
If any such node label is invalid as per the RM, how will this be reported back to the NM? Error handling?

2. If it is possible to change labels at runtime from the NM, I think the same existing interfaces (heartbeat) are used. Do you feel this check may happen more frequently in the RM than in a centralized configuration? In centralized config some command is fired by the admin to change labels, which may not be frequent. But imagine a 1000-node cluster: with labels potentially changing on every heartbeat, will this be a bottleneck?

          leftnoteasy Wangda Tan added a comment -

          Hi Naganarasimha G R,
First, the thing I think we need to take care of is that distributed configuration should be a way to input and store the node labels configuration. In centralized configuration, users can input labels and node-to-labels mappings via the REST API and the RM admin CLI, and these are persisted to some file system like HDFS/ZK.

For distributed configuration, node labels are input by the NMs (either by setting yarn-site.xml or via some script), and they will be reported to the RM at NM registration. We may not need to persist any of them, but the RM should know these labels exist to do scheduling.

Another question is whether we need to check labels when they register; I prefer to pre-set them because this affects scheduling behavior. For example, the maximum-resource and minimum-resource are set up on the RM side, and RackResolver also runs on the RM side. At least, label checking should be kept configurable in distributed mode; "just ignore all the labels for that node if invalid labels exist" might be a good behavior when it is enabled.

For the implementation, I suggest taking a look at YARN-2494 and YARN-2496/YARN-2500; the distributed configuration should be an extension of NodeLabelsManager, and YARN-2496/YARN-2500 shouldn't need to be modified much to support the distributed mode in scheduling.

          Thanks,
          Wangda

          aw Allen Wittenauer added a comment -

          I don't fully understand your question, but I'll admit I've been distracted with other JIRAs lately. Hopefully I'm guessing correctly though...

          The idea that there are 'valid' labels and that one turns them on/off seems to conflict with the goals that I want to see fulfilled with this feature. Let me give a more concrete example, since I know a lot of people are confused as to why labels should be dynamic anyway:

          Here's node A's configuration:

          $ hadoop version
          Hadoop 2.6.0
          Subversion http://github.com/apache/hadoop -r 1e6d81a8869ceeb2f0f81f2ee4b89833f2b22cd4
          Compiled by aw on 2014-10-10T19:31Z
          Compiled with protoc 2.5.0
          From source with checksum 201dad5b10939faa6e5841378c8c94
          This command was run using /sw/hadoop/hadoop-2.6.0-SNAPSHOT/share/hadoop/common/hadoop-common-2.6.0-SNAPSHOT.jar
          $ java -version
          java version "1.6.0_32"
          OpenJDK Runtime Environment (IcedTea6 1.13.4) (rhel-7.1.13.4.el6_5-x86_64)
          OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
          

          Here's node B's configuration:

          $ ~/HADOOP/hadoop-3.0.0-SNAPSHOT/bin/hadoop version
          Hadoop 3.0.0
          Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r 7b29f99ad23b2a87eac17fdcc7b5b29cd6c9b0c0
          Compiled by aw on 2014-10-08T21:12Z
          Compiled with protoc 2.5.0
          From source with checksum ae703b1a38a35d19f4584495dc31944
          This command was run using /Users/aw/HADOOP/hadoop-3.0.0-SNAPSHOT/share/hadoop/common/hadoop-common-3.0.0-SNAPSHOT.jar
          $ java -version
          java version "1.7.0_67"
          Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
          Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
          

          (Ignore the fact that they probably can't be in the same cluster. That's not really relevant.)

          I want to provide a script similar to this one:

          #!/usr/bin/env bash
          
          HADOOPVERSION=$(${HADOOP_PREFIX}/bin/hadoop version 2>/dev/null | /usr/bin/head -1 |/usr/bin/cut -f2 -d\ )
          JAVAVERSION=$(${JAVA_HOME}/bin/java -version 2>&1 |/usr/bin/head -1| /usr/bin/cut -f2 -d\" )
          JAVAMMM=$(echo ${JAVAVERSION}| /usr/bin/cut -f1 -d_)
          
          echo "LABELS=jdk${JAVAVERSION},jdk${JAVAMMM},hadoop_${HADOOPVERSION}"
          

          This script, when run, should tell the RM that:

Node A has labels jdk1.6.0_32, jdk1.6.0, and hadoop_2.6.0.
Node B has labels jdk1.7.0_67, jdk1.7.0, and hadoop_3.0.0.

Users should be able to submit a job that specifies that it only runs with 'hadoop_3.0.0'. Or that it requires 'jdk1.7.0'.

          Making labels either valid or invalid based upon a central configuration defeats the purpose of having a living, breathing cluster able to adapt to change without operations intervention. If we are rolling out a new version of the JDK, I shouldn't have to tell the system that it's ok to broadcast that JDK version first.

          Naganarasimha Naganarasimha G R added a comment -

Hi Tan, Wangda & Allen Wittenauer,
A few queries:
Some requirements make more sense for centralized configuration and not for distributed configuration:

• ADD_LABELS (add valid labels which can be assigned to nodes of the cluster)
• REMOVE_LABELS (remove from the valid labels of the cluster)

Since we set the labels for each node via configuration in yarn-site.xml or dynamic scripting, my opinion is not to support the above cluster-label operations for distributed configuration.
But if we do need to support this:

• Since the valid-labels information is only available with the RM, after a node sends a heartbeat the RM retains only the valid labels for that node (logging any invalid labels), or we can just ignore all the labels for that node if invalid labels exist.
• We need to store the valid cluster node labels in the file, but we need not store the labels of cluster nodes, since a node's labels can be obtained at every registration.

            People

            • Assignee:
              Naganarasimha Naganarasimha G R
              Reporter:
              leftnoteasy Wangda Tan
            • Votes:
0
              Watchers:
17
