Hadoop YARN
YARN-5734

OrgQueue for easy CapacityScheduler queue configuration management

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0
    • Component/s: None
    • Labels: None
    • Release Note:
      The OrgQueue extension to the capacity scheduler provides a programmatic way to change configurations by providing a REST API that users can call to modify queue configurations. This enables automation of queue configuration management by administrators in the queue's `administer_queue` ACL.

      Description

      The current XML-based configuration mechanism in CapacityScheduler makes it very inconvenient to apply changes to queue configurations. We see two main drawbacks in the file-based configuration mechanism:

      1. It makes it very inconvenient to automate queue configuration updates. For example, in our cluster setup, we leverage the queue mapping feature from YARN-2411 to route users to their dedicated organization queues. It can be extremely cumbersome to keep updating the config file to manage the very dynamic mapping between users and organizations.
      2. Even if a user has admin permission on a specific queue, that user is unable to make queue configuration changes such as resizing subqueues, changing queue ACLs, or creating new queues. All of these operations must be performed centrally by the cluster administrators.

      Given these limitations, we realized the need for a more flexible configuration mechanism that allows queue configurations to be stored and managed more dynamically. We developed the feature internally at LinkedIn; it introduces the concept of a MutableConfigurationProvider, which exposes a set of configuration mutation APIs so that queue configurations can be updated externally through REST calls. When performing queue configuration changes, queue ACLs are honored, meaning only queue administrators can make configuration changes to a given queue. MutableConfigurationProvider is a pluggable interface, and we have one implementation of it based on an embedded Derby database.

      This feature has been deployed at LinkedIn's Hadoop cluster for a year now and has gone through several iterations of gathering feedback from users and improving accordingly. With this feature, cluster administrators are able to automate many of the queue configuration management tasks, such as setting queue capacities to shift cluster resources between queues based on established resource consumption patterns, or updating user-to-queue mappings. We have attached our design documentation to this ticket and would like feedback from the community regarding how best to integrate it with the latest version of YARN.
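To illustrate the shape of such a mutation call, here is a hedged Python sketch that builds a queue-capacity update payload. The field names (`update-queue`, `queue-name`, `params`) and the `/ws/v1/cluster/scheduler-conf` endpoint are assumptions based on the REST API documented in this branch's ResourceManagerRest.md changes; consult that document for the authoritative schema.

```python
import json

def capacity_update(queue, capacity):
    """Build a JSON body that sets `capacity` on `queue`.

    The structure mirrors the SchedConfUpdateInfo DAO added on this
    branch, but the exact field names here are an assumption.
    """
    return {
        "update-queue": [
            {
                "queue-name": queue,
                "params": {
                    "entry": [{"key": "capacity", "value": str(capacity)}]
                },
            }
        ]
    }

body = json.dumps(capacity_update("root.marketing", 25))
# An admin in the queue's administer_queue ACL would PUT this body to
# http://<rm-address>/ws/v1/cluster/scheduler-conf (endpoint name assumed).
print(body)
```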

      Attachments

      1. OrgQueue_Design_v0.pdf (220 kB, Jonathan Hung)
      2. OrgQueue_API-Based_Config_Management_v1.pdf (115 kB, Jonathan Hung)
      3. OrgQueueAPI-BasedSchedulerConfigurationManagement_v2.pdf (117 kB, Jonathan Hung)
      4. YARN-5734-YARN-5734.001.patch (127 kB, Jonathan Hung)

        Issue Links

        1. Create YarnConfigurationStore interface and InMemoryConfigurationStore class (Sub-task, Resolved, Jonathan Hung)
        2. Create LeveldbConfigurationStore class using Leveldb as backing store (Sub-task, Resolved, Jonathan Hung)
        3. Implement MutableConfigurationManager for handling storage into configuration store (Sub-task, Resolved, Jonathan Hung)
        4. Add pluggable configuration ACL policy interface and implementation (Sub-task, Resolved, Jonathan Hung)
        5. Create StoreConfigurationProvider to construct a Configuration from the backing store (Sub-task, Resolved, Unassigned)
        6. Changes to allow CapacityScheduler to use configuration store (Sub-task, Resolved, Jonathan Hung)
        7. Create REST API for changing YARN scheduler configurations (Sub-task, Resolved, Jonathan Hung)
        8. Create CLI for changing YARN configurations (Sub-task, Resolved, Jonathan Hung)
        9. Implement a CapacityScheduler policy for configuration changes (Sub-task, Resolved, Jonathan Hung)
        10. Protocol for scheduler configuration changes between client and RM (Sub-task, Resolved, Jonathan Hung)
        11. Disable queue refresh when configuration mutation is enabled (Sub-task, Resolved, Jonathan Hung)
        12. Support global configuration mutation in MutableConfProvider (Sub-task, Resolved, Jonathan Hung)
        13. Support for adding and removing queue mappings (Sub-task, Patch Available, Jonathan Hung)
        14. Add ability to export scheduler configuration XML (Sub-task, Patch Available, Jonathan Hung)
        15. Implement zookeeper based store for scheduler configuration updates (Sub-task, Resolved, Jonathan Hung)
        16. Fix issues on recovery in LevelDB store (Sub-task, Resolved, Jonathan Hung)
        17. Add closing logic to configuration store (Sub-task, Resolved, Jonathan Hung)
        18. Documentation for API based scheduler configuration management (Sub-task, Resolved, Jonathan Hung)
        19. Merge YARN-5734 to trunk/branch-2 (Sub-task, Resolved, Jonathan Hung)
        20. Misc changes to YARN-5734 (Sub-task, Resolved, Jonathan Hung)
        21. Removing queue then failing over results in exception (Sub-task, Resolved, Jonathan Hung)
        22. Merge YARN-5734 branch to trunk branch (Sub-task, Resolved, Xuan Gong)
        23. Merge YARN-5734 branch to branch-3.0 (Sub-task, Resolved, Xuan Gong)
        24. Merge YARN-5734 branch to branch-2 (Sub-task, Resolved, Xuan Gong)
        25. Add HDFSSchedulerConfigurationStore for RM HA (Sub-task, Patch Available, Jiandan Yang)

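Several of the sub-tasks above build backing stores (in-memory, LevelDB, ZooKeeper) behind a common YarnConfigurationStore abstraction. As a rough Python sketch of the idea, not the actual Java interface, the store durably logs a mutation before the scheduler validates it, then confirms or discards it; the method names and exact recovery semantics here are assumptions:

```python
# Plain-Python analogue of a mutable configuration store with
# log-then-confirm semantics. Not the real YarnConfigurationStore API.
class InMemoryConfStore:
    def __init__(self, initial_conf):
        self.conf = dict(initial_conf)
        self.pending = None  # at most one in-flight mutation

    def log_mutation(self, updates):
        # Persist the intended change before applying it, so a crash
        # between log and confirm can be recovered deterministically.
        self.pending = dict(updates)

    def confirm_mutation(self, is_valid):
        # Apply the logged change only if the scheduler accepted it.
        if is_valid and self.pending:
            for key, value in self.pending.items():
                if value is None:
                    self.conf.pop(key, None)  # a null value deletes the key
                else:
                    self.conf[key] = value
        self.pending = None

    def retrieve(self):
        return dict(self.conf)

store = InMemoryConfStore({"yarn.scheduler.capacity.root.a.capacity": "50"})
store.log_mutation({"yarn.scheduler.capacity.root.a.capacity": "75"})
store.confirm_mutation(True)
print(store.retrieve()["yarn.scheduler.capacity.root.a.capacity"])  # prints "75"
```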
          Activity

          andrew.wang Andrew Wang added a comment -

          I added something from the design doc, will also make an update to the site docs.
          subru Subru Krishnan added a comment -

          Jonathan Hung (cc: Min Shen, Xuan Gong, Wangda Tan, Zhe Zhang), can you update the fix versions and release note in anticipation of 2.9.0 release. Thanks.
          jhung Jonathan Hung added a comment -

          Resolving this ticket, since all of the required tasks have been merged to 2.9.0/3.0.0/3.1.0. The consolidated patches can be found at YARN-7241.

          Thanks everyone!
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13057 (See https://builds.apache.org/job/Hadoop-trunk-Commit/13057/)
          YARN-7251. Misc changes to YARN-5734 (jhung: rev 09c5dfe937f0570cd9494b34d210df2d5f0737a7)

          • (edit) hadoop-yarn-project/hadoop-yarn/bin/yarn
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestZKConfigurationStore.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesConfigurationMutation.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java
          • (edit) hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestLeveldbConfigurationStore.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/SchedConfUpdateInfo.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestSchedConfCLI.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestMutableCSConfigurationProvider.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/QueueConfigInfo.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRest.md
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/MutableCSConfigurationProvider.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
          andrew.wang Andrew Wang added a comment -

          Neato, sorry about the noise. If you think this is getting close to done, it might be a good time for a new consolidated patch.
          jhung Jonathan Hung added a comment -

          Hi Andrew Wang, thanks for taking a look. Actually the consolidated patch was a POC; we have since changed the Derby implementation to LevelDB, so we should not need any dependency changes.

          The current YARN-5734 branch has the code we want to eventually merge (not including the still-outstanding sub-tasks), but there are no dependency changes in any of these (here's the current diff --stat for everything committed so far):

          jhung-mn3:hadoop jhung$ git diff 4249172e1419acdb2b69ae3db43dc59da2aa2e03 --stat
           hadoop-yarn-project/hadoop-yarn/bin/yarn                                                                                             |   4 +
           hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd                                                                                         |   5 +
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java                     |  30 +++++
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java                 | 238 ++++++++++++++++++++++++++++++++
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestSchedConfCLI.java             | 160 ++++++++++++++++++++++
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/QueueConfigInfo.java              |  57 ++++++++
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/SchedConfUpdateInfo.java          |  85 ++++++++++++
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/dao/package-info.java                 |  27 ++++
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/YarnWebServiceUtils.java         |  14 ++
           hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml                                               |  61 +++++++++
           .../hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java                 |  31 ++++-
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateVersionIncompatibleException.java                    |   2 +-
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ConfigurationMutationACLPolicy.java                        |  47 +++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ConfigurationMutationACLPolicyFactory.java                 |  49 +++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/DefaultConfigurationMutationACLPolicy.java                 |  45 +++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/MutableConfScheduler.java                                  |  72 ++++++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/MutableConfigurationProvider.java                          |  50 +++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java                            |  86 +++++++++---
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java               |  12 ++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/CSConfigurationProvider.java                 |  47 +++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/FileBasedCSConfigurationProvider.java        |  67 +++++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/InMemoryConfigurationStore.java              | 119 ++++++++++++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/LeveldbConfigurationStore.java               | 361 +++++++++++++++++++++++++++++++++++++++++++++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/MutableCSConfigurationProvider.java          | 301 +++++++++++++++++++++++++++++++++++++++++
           .../main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/QueueAdminConfigurationMutationACLPolicy.java    | 110 +++++++++++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/YarnConfigurationStore.java                  | 204 ++++++++++++++++++++++++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/YarnConfigurationStoreFactory.java           |  46 +++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/ZKConfigurationStore.java                    | 289 +++++++++++++++++++++++++++++++++++++++
           .../src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/package-info.java                            |  29 ++++
           .../hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java         |  51 ++++++-
           .../hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java           |  23 ++++
           .../src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestConfigurationMutationACLPolicies.java                  | 172 ++++++++++++++++++++++++
           .../src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java                        |   4 +-
           .../src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/ConfigurationStoreBaseTest.java              |  92 +++++++++++++
           .../src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestInMemoryConfigurationStore.java          |  30 +++++
           .../src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestMutableCSConfigurationProvider.java      | 106 +++++++++++++++
           .../src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestZKConfigurationStore.java                | 401 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
           .../src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesConfigurationMutation.java                   | 507 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
           38 files changed, 4007 insertions(+), 27 deletions(-)
          andrew.wang Andrew Wang added a comment -

          Hi Jonathan, thanks for working on this. I gave the consolidated patch from Jan 20th a quick look; a few comments:

          Looks like we add a new Derby dependency. Derby has a NOTICE file which we need to fold into ours:

          http://svn.apache.org/repos/asf/db/derby/code/trunk/NOTICE

          This is a release blocker, so it should be a blocker for merge. I didn't check the current branch for any other new dependencies, but their LICENSE and NOTICE files also need to be checked.

          One other small comment: we typically centralize dependency versions in hadoop-project/pom.xml for consistency. I recommend doing this for the Derby version as well.
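As a reference for that suggestion, centralizing a dependency version in hadoop-project/pom.xml typically looks like the following sketch; the property name `derby.version` and the version number shown are illustrative assumptions, not taken from the actual patch:

```xml
<!-- hadoop-project/pom.xml: declare the version once (values illustrative) -->
<properties>
  <derby.version>10.10.2.0</derby.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.derby</groupId>
      <artifactId>derby</artifactId>
      <version>${derby.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
<!-- Module poms then declare the dependency without a version. -->
```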

          leftnoteasy Wangda Tan added a comment -

          Hi Jonathan Hung,

          With this in mind do you still think AdminService is the right place to put the change configuration functionality?

          I would still prefer to use AdminService, we can add different logic to check ACLs inside AdminService. It is still better than adding them to ClientRMService.

          If we make MutableConfigurationManager part of CS only, the ClientRMService/AdminService still needs to access it somehow.

          I think we can make AdminService call CS directly (e.g. adding a method to CS like updateCSConfig), and inside CS we will check and reject the request. Changing the global provider-class looks riskier to me, since all YARN components depend on it. It's better to limit the logic to CS.

          jhung Jonathan Hung added a comment -

          Wangda Tan thanks for the review. Regarding 1 and 3, potentially there are queue admins (but not yarn admins) that will change scheduler configuration. In this case the AdminService will not check (yarn admin) acls, it should delegate it to ConfigurationMutationPolicy. With this in mind do you still think AdminService is the right place to put the change configuration functionality?

          For 3, I will add javadocs and a default implementation of the ConfigurationMutationPolicy (which will just check against queue admin acls). (YARN-5954)

          Regarding 2, do you mean a separate configuration provider (MutableConfigurationManager) for CS, and yarn.resourcemanager.configuration.provider-class for everything else? As it is now, I made a mistake in the current patch: we can actually take -Provider out of RMContext, since yarn.resourcemanager.configuration.provider-class is MutableConfigurationManager, so we can just access it via rmContext.getConfigurationProvider(). If we make MutableConfigurationManager part of CS only, the ClientRMService/AdminService still needs to access it somehow. Also, to avoid having to change it in other places, MutableConfigurationManager currently overrides LocalConfigurationProvider, so the getConfigurationInputStream behavior in all other non-CS places should be the same. As long as MutableConfigurationManager does not overwrite this functionality, we can load things from yarn-site, etc. in the same way. (Also, in the future, if we add store functionality to other non-CS configurations, we can do this through the configuration provider.) Thoughts on this?

          leftnoteasy Wangda Tan added a comment -

          Jonathan Hung,

          This prototype helps me understand the overall architecture design much better, and the workflow looks very neat since most of the changes are isolated to the provider of CapacitySchedulerConfiguration.

          But I think we still need some time/iterations to consolidate the APIs/architecture design. High-level comments below; I haven't dived into the details yet.

          1) ApplicationClientProtocol:
          I think the new interface is mainly for admins to use, so it's better to move it to AdminService/ResourceManagerAdministrationProtocol

          2) ConfigurationProvider / MutableConfigurationManager
          I think it's better to make it used only by CapacitySchedulerConfiguration (inside CapacityScheduler#loadCapacitySchedulerConfiguration). With this, we can:

          • Avoid unnecessary updates to other components like AdminService.
          • Make the change less risky.
          • Avoid adding -Provider to RMContext, which is already (not caused by your patch) overloaded and hard to manage.
          • To make it specific, I suggest moving org.apache.hadoop.yarn.server.resourcemanager.conf to -scheduler.capacity.conf (including the sub-package "store"), since the implementations are all dependent on CS logic.

          3) ConfigurationMutationPolicy
          What's the purpose of this class? From the design doc it is for authorization; could you add javadocs to the class? If we don't plan to add this in the short term, I would prefer not to add the interface and default implementation for now.

          To proceed, I think we can start splitting patches into sub JIRAs. In my mind, the ordering to split tasks could be:

          a. Define the record protocol and PB implementation; the provider/store interface and a simple (probably in-memory) implementation; and the changes to CS to use the provider. This is the largest part, but I think putting them in a single task can make review easier and speed up overall progress.

          b. Implementation of REST API

          c. Implementation of RM Admin CLI

          d/e. Implementations of the store, e.g. Derby-based/RM-state-store-based.

          Xuan Gong, please add your thoughts.

          Thanks,

          leftnoteasy Wangda Tan added a comment -

          Hi Jonathan Hung,
          Really appreciate your effort working on the patch. I'm reviewing it now; it may take some time. Will keep you updated.

          jhung Jonathan Hung added a comment -

          Uploaded an initial patch containing some basic end-to-end functionality.
          Here are yarn-site.xml configurations to get this working:

          • yarn.scheduler.capacity.config.path should be set to a directory inside which the database will be stored (the resource manager user should be able to create subdirectories here)
          • yarn.scheduler.mutable-queue-config.enabled should be true
          • yarn.resourcemanager.configuration.provider-class should be set to org.apache.hadoop.yarn.server.resourcemanager.conf.MutableConfigurationManager
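          Taken together, the three settings above would look something like this in yarn-site.xml (the directory path is only illustrative):

```xml
<!-- Illustrative yarn-site.xml fragment; the config.path value is an example -->
<property>
  <name>yarn.scheduler.capacity.config.path</name>
  <value>/var/hadoop/yarn/cs-config-store</value>
</property>
<property>
  <name>yarn.scheduler.mutable-queue-config.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.configuration.provider-class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.conf.MutableConfigurationManager</value>
</property>
```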

          Here are some working examples which can be run in series, assuming a starting configuration of two queues, root.default (with 100 capacity) and root.test (with 0 capacity):

          curl -X PUT -H 'Content-Type: application/xml' -d '<schedConf>
            <update>
              <name>root.test</name>
              <params>
                <entry>
                  <key>state</key>
                  <value>STOPPED</value>
                </entry>
                <entry>
                  <key>maximum-applications</key>
                  <value>33</value>
                </entry>
              </params>
            </update>
          </schedConf>' --negotiate -u : "http://<rmHost>:8088/ws/v1/cluster/conf/scheduler/mutate"

          Sets the root.test queue's state to STOPPED and its maximum-applications to 33.

          curl -X PUT -H 'Content-Type: application/xml' -d '<schedConf>
            <remove>
              <name>root.test</name>
            </remove>
          </schedConf>' --negotiate -u : "http://<rmHost>:8088/ws/v1/cluster/conf/scheduler/mutate"

          Removes the root.test queue (since it is STOPPED, leveraging YARN-5556)

          curl -X PUT -H 'Content-Type: application/xml' -d '<schedConf>
            <add>
              <name>root.test2</name>
              <params>
                <entry>
                  <key>maximum-applications</key>
                  <value>34</value>
                </entry>
              </params>
            </add>
          </schedConf>' --negotiate -u : "http://<rmHost>:8088/ws/v1/cluster/conf/scheduler/mutate"

          Adds a root.test2 queue. Also sets its maximum-applications to 34.

          This is just a first version, so there are some details that are not yet implemented/tested (e.g. specifying a hierarchical conf update). Xuan Gong and Tan, Wangda, do you mind taking a look to make sure our ideas/interfaces are in alignment?

          jhung Jonathan Hung added a comment -

          Uploaded v2 design doc containing changes based on discussion.

          leftnoteasy Wangda Tan added a comment -

          If the reinitialization fails (i.e. scheduler.reinitialize(X+1)), then we will need to call scheduler.reinitialize(X). In this case we need to call reinitialize twice. Is this acceptable?

          If everything works as expected, a reinitialize failure will not change the queue hierarchy. If there are any cases where the queue structure still gets updated when reinitialize fails, queue configs could end up in a limbo state; we need to fix such cases separately.

          I think we will still need some sort of PluggablePolicy,...

          Makes sense

          Not sure if this is what you meant ..

          I'm not sure what the interface design is, but I think the logic you described should be roughly the same as what I have in mind. We can check the detailed logic while doing patch review.

          I am thinking we can add a scheduler specific ConfigurationProvider option in yarn-site.xml

          Instead of specifying a ConfigurationProvider, I think it might be easier for the end user to specify a config like ...scheduler.dynamic-queue-config.enabled. We can use a different ConfigurationProvider implementation depending on the value of dynamic-queue-config.enabled.
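          The flag-based selection suggested here could look roughly like the following self-contained sketch; the class name and the idea of tying the allowed admin operations to the same flag are illustrative, and the real patch may wire this differently:

```java
// One boolean flag decides both the provider implementation and which
// admin operations are allowed (per the later discussion: refreshQueue for
// file-based, the mutation API for store-based). Names are illustrative.
class SchedulerConfigSource {
    final String providerClass;
    final boolean refreshQueueAllowed; // file-based only
    final boolean mutationApiAllowed;  // store-based only

    private SchedulerConfigSource(String p, boolean refresh, boolean mutate) {
        providerClass = p;
        refreshQueueAllowed = refresh;
        mutationApiAllowed = mutate;
    }

    static SchedulerConfigSource fromFlag(boolean dynamicQueueConfigEnabled) {
        return dynamicQueueConfigEnabled
            ? new SchedulerConfigSource("MutableConfigurationManager", false, true)
            : new SchedulerConfigSource("LocalConfigurationProvider", true, false);
    }
}
```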

          Not sure what you mean by loading the configuration file from xml while setting up the cluster; can you elaborate on that? Do you mean if the store is enabled and the admin wants to wipe it and load a new conf from a file into the store? Do we plan on supporting that?

          If we allow initializing the store-based config from capacity-scheduler.xml, this is not required.

          jhung Jonathan Hung added a comment - - edited

          Thanks for the detailed points, Wangda Tan.

          How to handle bad configuration update?

          The idea of calling scheduler#reinitialize mostly makes sense to me, a couple questions/thoughts:

          • If the reinitialization fails (i.e. scheduler.reinitialize(X+1)), then we will need to call scheduler.reinitialize(X). In this case we need to call reinitialize twice. Is this acceptable?
          • I think we will still need some sort of PluggablePolicy, but in this case it is just an authorization policy so we can leverage YarnAuthorizationProvider.

          By using ConfigurationProvider, it can either get a new CapacitySchedulerConfiguration (CS) or a new AllocationConfiguration (FS).

          Not sure if this is what you meant, but we could have MutableConfigurationManager extend ConfigurationProvider? So we would just have MutableConfigurationManager expose the X+1 configuration when validating the configuration, and either un-expose it (if it failed to reinitialize) or keep it exposed and store it in the backing store (if it reinitialized successfully).

          If file-based solution is specified, no dynamic update queue operation will be allowed. If store-based solution is specified, no refreshQueue CLI will be allowed.

          I agree.

          So I would prefer to add an option to yarn-site.xml to explicitly specify which config source the scheduler will use.

          I am thinking we can add a scheduler-specific ConfigurationProvider option in yarn-site.xml. Then we can infer the config source from there: if the scheduler-specific ConfigurationProvider is MutableConfigurationManager, it will use the store; else, use the file.

          If an admin wants to load the configuration file from xml while setting up the cluster, or wants to switch from xml-file-based config to store-based config, we can provide a CLI to load an XML file and save it to the store.

          Not sure what you mean by loading the configuration file from xml while setting up the cluster; can you elaborate on that? Do you mean if the store is enabled and the admin wants to wipe it and load a new conf from a file into the store? Do we plan on supporting that?
          For switching from xml-based to store-based, I was thinking we could just manually change the scheduler's configuration provider in yarn-site.xml and then restart the RM. Otherwise, if we allow them to do this via CLI, yarn-site.xml will not be consistent with RM behavior (since yarn-site will still say it is file-based but the RM will be store-based).

          leftnoteasy Wangda Tan added a comment -

          Jonathan Hung,

          Discussed with Jian He for my above point #2 again.

          Now we think your original proposal looks better for handling the case when an admin wants to switch from the XML-file-based solution to the API-based solution:

          Initialization will be done by xml even if API-based approach is enabled. Then on crash/restart the config store will be honored. Basically once store is initialized, it will be used as source of truth (and the xml is no longer useful).

          But I think my points are still valid:

          On the other hand, the store-based solution doesn't need the refreshQueue CLI at all, because the content in the store and in memory should always be synced.

          So I would prefer to add an option to yarn-site.xml to explicitly specify which config source the scheduler will use. If the file-based solution is specified, no dynamic queue update operations will be allowed. If the store-based solution is specified, no refreshQueue CLI will be allowed.

          Please share your thoughts.

          Thanks,

          leftnoteasy Wangda Tan added a comment -

          Thanks Jonathan Hung / Min Shen / Ye Zhou / Zhe Zhang for pushing this forward.

          A couple of questions regarding the design:

          1) How to handle bad configuration update?

          The existing design is to update the config first, and then notify the scheduler to do the update. But how do we avoid update failures? IIUC, PluggablePolicy is added to validate the config, but does that mean we have to duplicate some validation logic from the scheduler in PluggablePolicy?

          I have an idea that might simplify the overall process:

          MutableConfigurationManager always maintains the latest-in-use-config (version=X):

          a. When a queue admin requests to update some fields, it merges the latest-in-use-config and the newly updated fields into a new configuration proposal (version=X+1). By using ConfigurationProvider, it can either get a new CapacitySchedulerConfiguration (CS) or a new AllocationConfiguration (FS).
          b. Then it calls the scheduler.reinitialize(...) API, and the scheduler uses exactly the same logic to validate the configuration (including CS#parseQueue, etc.).
          c. If b succeeds, write the ver=X+1 config to the state store and respond to the client that the operation succeeded. The latest-in-use-config is updated to X+1.
          d. If b fails, it reports the failure to the client and the newly updated fields are simply discarded.

          This proposal should still fit the existing overall architecture. The good things are that it avoids a PluggablePolicy implementation (which may require duplicating queue config validation logic), and it avoids writing a bad config to the store.
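          Steps a-d can be sketched as follows. This is a simplified, self-contained illustration of the merge-validate-persist flow; all names are hypothetical, and scheduler.reinitialize is stood in for by a small Validator interface:

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch of steps a-d: merge the pending update into the
// latest in-use config (version X), validate the X+1 proposal through the
// scheduler's own reinitialize logic, and persist only on success.
// All names are illustrative, not the actual patch.
class ConfigMutationFlowSketch {
    interface Validator { // stands in for scheduler.reinitialize(...)
        boolean reinitialize(Map<String, String> proposed);
    }

    private Map<String, String> latestInUse = new HashMap<>();
    private int version = 0; // version X
    private final Map<Integer, Map<String, String>> store = new HashMap<>(); // state-store stand-in

    boolean applyUpdate(Map<String, String> updatedFields, Validator scheduler) {
        // a. merge latest-in-use-config and the update into an X+1 proposal
        Map<String, String> proposal = new HashMap<>(latestInUse);
        proposal.putAll(updatedFields);
        // b. validate with exactly the scheduler's own logic
        if (!scheduler.reinitialize(proposal)) {
            return false; // d. failure: the update is simply discarded
        }
        // c. success: write ver=X+1 to the store, then promote it
        version++;
        store.put(version, proposal);
        latestInUse = proposal;
        return true;
    }

    int currentVersion() {
        return version;
    }
}
```

          Note that a bad proposal never reaches the store in this flow, which is the point of validating before persisting.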

          2) I think the existing design, which supports using two sources of configuration at the same time, is a little confusing. For example:

          • An admin sets up a cluster from scratch, and the RM saves the xml file to the store, but the admin could continue to edit capacity-scheduler.xml on disk and call rmadmin -refreshQueue. What should happen?

          To me this should not be allowed:

          • The existing -refreshQueue was added because under the configuration-file-based solution, the content in the file and in memory could be different; -refreshQueue is a way to sync the two.
          • On the other hand, the store-based solution doesn't need the refreshQueue CLI at all, because the content in the store and in memory should always be synced.

          So I would prefer to add an option to yarn-site.xml to explicitly specify which config source the scheduler will use. If the file-based solution is specified, no dynamic queue update operations will be allowed. If the store-based solution is specified, no refreshQueue CLI will be allowed.

          If an admin wants to load the configuration file from xml while setting up the cluster, or wants to switch from xml-file-based config to store-based config, we can provide a CLI to load an XML file and save it to the store.

          Thoughts?

          jhung Jonathan Hung added a comment -

          Hi Jian He, thanks for the feedback.

          Does add/remove also support a fully qualified queue name, not just a hierarchical structure? I think supporting a single fully qualified queue name would be handy, especially for CLI add/remove

          Sure, I think it makes sense to support both.

          The user may need to provide a new queue structure for initialization; then the xml file will conflict with what's in the config store.

          I don't think I understand this part, can you explain why the user needs to provide a new queue structure?
          Initialization will be done by xml even if API-based approach is enabled. Then on crash/restart the config store will be honored. Basically once store is initialized, it will be used as source of truth (and the xml is no longer useful).
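          The precedence rule stated here can be sketched minimally as follows; the class and method names are illustrative, not part of the patch:

```java
// Minimal sketch of the bootstrap rule: capacity-scheduler.xml seeds the
// config on first startup; once the store is initialized it is the source
// of truth on crash/restart. Names are illustrative.
class ConfigBootstrapSketch {
    static String loadSchedulerConf(boolean storeInitialized,
                                    String storeConf, String xmlConf) {
        if (storeInitialized) {
            return storeConf; // crash/restart: honor the config store
        }
        return xmlConf; // first startup: initialize from the xml file
    }
}
```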

          Is the implementation such that the caller will block until the update is completed, both in the store and in memory?

          Yes, the plan is to block until the update is completed for both. This is to prevent the scenario where the client sends a configuration change, an event is queued, the call returns, and then the RM crashes, at which point the configuration change is lost.

          IIUC, the EmbededDerbyDatabase is suitable for a single RM only. Do you run RM HA in your cluster? Also, I guess Derby does not support fencing? If so, we could potentially have two RMs writing together in a split-brain situation and cause data inconsistency. Therefore, I think ZKRMStateStore might be a better store option by default, especially because of RM HA.

          Currently we are not running RM HA. The reason we have Derby as the default is that we currently have it running in production (and we don't have a working implementation which supports RM HA), so for single-RM clusters we know it works well.

          Regarding PluggableConfigurationPolicy for authorization, has the implementation considered using YarnAuthorizationProvider?

          Took a look at this. I have a couple of comments about it; let me know if it's not what you had in mind.

          • Right now, if I understand correctly, it looks like YarnAuthorizationProvider only supports authorization based on queue ACLs (submit/administer queue). We would need to extend the implementation to support things like fine-grained ACLs (e.g. ACLs by configuration key). In this case we would just extend YarnAuthorizationProvider with something like "SchedulerConfigurationAuthorizationProvider". If so, each component using an authorization provider would need to configure its own implementation, since SchedulerConfigurationAuthorizationProvider does not apply to all components (and it seems all components use the same provider, determined by yarn.authorization-provider).
          • We will probably still need the new pluggable configuration policy, at least for configuration change validation, to make sure the proposed configuration changes make sense.
          jianhe Jian He added a comment -

          Min Shen, Jonathan Hung, Zhe Zhang, very useful feature! Thanks for the contribution. Some questions I had about the design:

          • Does add/remove also support a fully qualified queue name, not just a hierarchical structure? I think supporting a single fully qualified queue name would be handy, especially for CLI add/remove.
          • IIUC, the xml file will still be used for initialization on startup, even if the API-based approach is enabled? Then, if the RM gets restarted, will the RM honor the xml file or the config store for initialization? I feel both scenarios may be possible:
            • If it is a crash-and-restart, we should probably honor the config store.
            • If the RM is going through a rolling upgrade, the user may need to provide a new queue structure for initialization; then the xml file will conflict with what's in the config store.
          • Is the implementation such that the caller will block until the update is completed - both in store and in memory?
          • IIUC, the embedded Derby database is suitable for a single RM only. Do you run RM HA in your cluster? Also, I guess Derby does not support fencing? If so, we could potentially have two RMs writing together in a split-brain situation and cause data inconsistency. Therefore, I think ZKRMStateStore might be a better store option by default, especially because of RM HA.
          • Regarding PluggableConfigurationPolicy for authorization, has the implementation considered using YarnAuthorizationProvider? YarnAuthorizationProvider is an interface which can be implemented by other authorization plugins (e.g. Apache Ranger). Ranger has a nice web portal where it can define arbitrary authorization policies, such as restricting certain users/groups from doing certain operations. It would be useful if it did, as the Ranger plugin would just need to implement the necessary interface and get the config authorization for free.
          jhung Jonathan Hung added a comment -

          Thanks Kai Sasaki, right now we are working on initial patches, and we will have a better idea of how to split tasks once we have a skeleton of the implementation. Regarding the target branch, we will have an option to use the flat configuration file as it is now, so this shouldn't be incompatible.

          Robert Kanter, thanks for the note. As you mentioned, configuration changes shouldn't be too frequent so we don't anticipate this being an issue but we'll definitely keep it in mind.

          rkanter Robert Kanter added a comment -

          Oozie has run into scalability problems with Derby, but I would imagine that Oozie does more frequent reads and writes to Derby than users will be doing with their Configurations, so it probably won't be a problem. Just something to keep in mind.

          lewuathe Kai Sasaki added a comment -

          I'm also interested in flexible queue configuration management, because xml-based configuration is often troublesome for us.

          "We discussed an advanced feature of supporting multi-update transactions."

          I have sometimes faced an inconsistent queue state while updating queue configuration with xml, because the update cannot be done transactionally. We end up with an incomplete queue state in the scheduler in this case.

          "Target branch-2"

          Making the xml file obsolete could be an incompatible change, so might it be better to target 3.x later? Or does it mean only adding a new backend storage implementation?

          Anyway, I want to work on this after the sub-tasks are arranged. Thanks!

          xgong Xuan Gong added a comment -

          Jonathan Hung

          "Create feature branch"

          I have created a feature branch: YARN-5734

          jhung Jonathan Hung added a comment -

          Attached an updated doc containing the design for the scheduler configuration management API and backing store.

          jhung Jonathan Hung added a comment -

          Here are the notes from yesterday's meetup:

          Objective: Aligning queue configuration requirements from YARN-5734 and YARN-5724
          Attendees: Xuan, Wangda, Vinod, Subru, Zhe, Konstantin, Ye, Min, Jonathan, Erik
          10/26/16 2-4pm

          Meeting minutes

          • Overall we are in agreement on adding a mutable API for queue configuration. We discussed many details around APIs and storage implementation.
          • APIs
            • For compatibility we can keep xml-file-based configuration as an option. Subru and Wangda both raised a concern that having 2 sources of truth is hard to maintain; therefore user should choose to use either the xml-file-based configuration approach or the new API-based one.
            • Vinod raised a point that besides REST APIs, CLIs are also important.
            • We also discussed a tricky case of adding new resources to the entire system.
            • We discussed an advanced feature of supporting multi-update transactions, e.g. reducing the capacity of queue A and moving it to queue B.
            • We discussed how to support bulk updates.
            • We discussed how to make the project applicable for both Capacity and Fair schedulers. YARN-2986 should be revisited to provide a common data model for both schedulers.
            • We discussed the case of hierarchical queues.
          • Storage implementation
            • Derby embedded database can be used as default underlying storage implementation
            • Storage implementation should be configurable, e.g. need to use distributed storage to support HA
            • Another option is to use the YARN RM state store. This potentially simplifies how update events are logged (audit logger) and recovered.
            • Need to address other issues, such as scheduler-agnostic REST APIs and user-friendly concurrent updates
            • Target branch-2
          • Action items
            • Combine YARN-5724 and YARN-5734 to one umbrella
            • Create one unified design doc covering
              • Backing store implementations
              • Queue state machine
              • List of supported APIs
            • Create feature branch (and add Min Shen mshen@linkedin.com, Jonathan Hung jyhung2357@gmail.com, Ye Zhou zhouyejoe@gmail.com as branch committers)
            • After feature branch is created, create sub-tasks needed for implementing mutable API configuration provider
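
The multi-update transactions and bulk updates mentioned in the notes can be sketched as a single batched mutation request, so that related changes (e.g. shrinking one queue while growing another) are applied all-or-nothing. The payload shape below is purely hypothetical — the API was still being designed at this point — and only illustrates the grouping idea:

```python
# Hypothetical sketch of a batched queue-configuration mutation, assuming a
# REST endpoint that applies all changes in one transaction. The payload
# field names are illustrative, not the final YARN API.
import json

def build_bulk_update(updates):
    """Group several queue changes into one request body so the backing
    store can apply them atomically (all-or-nothing)."""
    return json.dumps({
        "update-queue": [
            {"queue-name": name, "params": params}
            for name, params in updates.items()
        ]
    })

# Example: shrink root.a and grow root.b in the same transaction, so the
# cluster never observes capacities that do not sum to 100.
body = build_bulk_update({
    "root.a": {"capacity": "30"},
    "root.b": {"capacity": "70"},
})
print(body)
```

The point of the single body is that the store can persist it in one transaction, avoiding the intermediate inconsistent states that per-key updates (or hand-edited xml plus refreshQueues) can expose.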
          zhz Zhe Zhang added a comment -

          Since there is some overlap between this JIRA's objectives and those of YARN-5724, we plan to have a meetup to better discuss these 2 projects. Thanks Tan, Wangda and Xuan Gong for proposing this. Please join in-person or remotely if you are interested.

          When: Wednesday 10/26 2~4pm
          Where: LinkedIn HQ, 950 West Maude Avenue, Sunnyvale, CA. (If you do plan to attend in-person, please email zhz@apache.org)
          Confcall: https://bluejeans.com/654904000

          We will post notes after the meetup.

          jhung Jonathan Hung added a comment -

          Rémy Saissy, glad to hear this is useful for your company. With this enabled, refreshQueue will no longer use the configuration from capacity-scheduler.xml as the latest conf, since calling capacity scheduler's reinitialize will load the capacity scheduler configuration from the backing store (e.g. derby database). The intent behind reset is to clear the configuration from the DB and load it from the xml file.
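
The store-as-source-of-truth and reset semantics described here can be modeled in a few lines. This is an illustrative sketch only (ConfStore and its method names are stand-ins, not the real pluggable interface): mutations go to the backing store, the effective configuration is read from the store rather than the xml file, and reset discards the store contents and re-seeds them from the xml-derived configuration:

```python
# Illustrative model of "the store is the source of truth; reset re-seeds
# from xml". ConfStore is a stand-in for the Derby (or other) backing
# store; the real implementation is behind a pluggable interface.
class ConfStore:
    def __init__(self, xml_conf):
        self.xml_conf = dict(xml_conf)  # what capacity-scheduler.xml says
        self.store = dict(xml_conf)     # seeded from xml on first init

    def mutate(self, key, value):
        # API-based change: persisted to the store; the xml file is untouched.
        self.store[key] = value

    def effective_conf(self):
        # refreshQueues / reinitialize now reads the store, not the xml file.
        return dict(self.store)

    def reset(self):
        # reset semantics: discard store contents, reload from xml.
        self.store = dict(self.xml_conf)

s = ConfStore({"root.default.capacity": "100"})
s.mutate("root.default.capacity", "50")
print(s.effective_conf()["root.default.capacity"])  # the mutated value wins
s.reset()
print(s.effective_conf()["root.default.capacity"])  # back to the xml value
```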

          rémy Rémy Saissy added a comment -

          Hi,
          thanks for this feature, it answers a pain point we have at Criteo.

          Does it completely disable the refreshQueue CLI, which loads the LocalConfigurationProvider content, or will the command line basically perform a call to the /cluster/queue/reset REST endpoint?

          jhung Jonathan Hung added a comment -

          I see, that makes sense. The local param changes sound like something we could leverage.

          Zhe Zhang it seems that there are a few things OrgQueue needs to integrate with so I think a feature branch would be useful here.

          curino Carlo Curino added a comment -

          Jonathan Hung, what I was saying is a bit different, but what you mention makes sense.

          What I was pointing out was that we had a solution to tweak (for ReservationQueue) some of the key params in a very cheap / dynamic way. As part of YARN-4193 we had prototype support for node-labels and did some further scalability work (lock tweaks in CS) to make it scale to many changes per second (300 queues with many node labels updated every second). The insight was to do more "surgical" local changes to specific params, instead of large lock-deadly operations like refreshQueues.

          That said, I agree that some of the work you guys are doing could be used (if low cost enough) to enforce the Plan, and generalize what reservations can "set" in the queues.

          Finally, during our convo with Min Shen I was pointing out that the ReservationSystem can be used to provide a time-varying notion of queues (think a daily sine curve for the queue capacity), which in turn could be used to "multiply" the sellable capacity of the cluster. For example, we could promise highly guaranteed access to the "dev" queue during the day and exclusive access to the "reporting" queue at night (note that this provides much stronger guarantees than over-capacity fair sharing). Integrating this with what you guys have would be neat.

          jhung Jonathan Hung added a comment -

          Carlo Curino, thanks for the comments.

          For 1 and 2, this is in our plans (to do either internally or e.g. in a feature branch). The Derby based storage is one implementation (and eventually we will implement an RMStateStore version).

          I took a quick look at some of the ReservationSystem code - my understanding is that the PlanQueue's capacity/max-capacity is currently mutable in the same way as ParentQueue (i.e. via refreshQueues)? The dynamic part is in the ReservationQueue. So instead of having to setEntitlement for each child of a PlanQueue, we can leverage the MutableConfigurationProvider API to change all child queue capacities of a PlanQueue. Is this what you had in mind? Also changing queue configurations such as user-limit or user-limit-factor of a ReservationQueue can be done via this API (as can other configurations if they are added to ReservationQueue in the future).
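
The idea of replacing per-child setEntitlement calls with one batched capacity change can be sketched as follows. The helper and the mutation shape are hypothetical (names are illustrative, not the MutableConfigurationProvider API): it computes every child's capacity from relative shares and emits a single update for all children of a PlanQueue:

```python
# Hypothetical helper for the PlanQueue case above: instead of calling
# setEntitlement once per child queue, compute every child's capacity and
# emit one batched mutation. The mutation format is illustrative only.
def plan_to_mutation(parent, child_shares):
    """child_shares: {child-queue-name: relative share}. Returns per-queue
    capacity updates normalized so the children of `parent` sum to 100%."""
    total = float(sum(child_shares.values()))
    return {
        f"{parent}.{child}": {"capacity": f"{100.0 * share / total:.1f}"}
        for child, share in child_shares.items()
    }

# Two reservation queues with a 1:3 share split under root.plan.
updates = plan_to_mutation("root.plan", {"res-1": 1, "res-2": 3})
print(updates)  # res-1 gets capacity 25.0, res-2 gets 75.0
```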

          zhz Zhe Zhang added a comment -

          Thanks Min Shen, Ye Zhou, and Jonathan Hung for the proposal! Also thanks Carlo Curino for the very helpful feedback.

          This is potentially a pretty large change, and I think we should use a feature branch for the development. Please share your opinions on this, thanks.

          curino Carlo Curino added a comment -

          Min Shen, I skimmed your doc, but have not read it carefully yet. I am generally a fan of this. At MS we have similar mechanisms for other systems and users seem to like them; also, at our scale the number of daily configuration changes is substantial, and constant refreshes from XML (could be tens daily) sit somewhere between very annoying and impractical. Moreover, in Federation YARN-2915 we would be happy to leverage this functionality, as we want to centralize the configuration of multiple RMs via our centralized FederationPolicyStore; our current practical workaround is to automate downloading the new conf, writing it to the .xml file, and calling refreshQueues.

          A couple of important considerations:

          1. The solution should play nice with HA, so using the RMStateStore (instead of, or beside, Derby) for storing the updated configuration (with the conf.xml kept as a backup, as you do) is, I think, key.
          2. As you do this, please make the Store (e.g., DB) configurable. In our deployments, it would be very nice to use an external RDBMS. Generally I agree with Carl Steinbach that having configs stored in a DB is very convenient, as you can easily maintain a historical record of previous entries, and study how they evolve/relate with each other with simple OLAP queries.
          3. You should also take a look at the ReservationSystem code (YARN-1051, YARN-2572, YARN-2573), as the PlanQueue and ReservationQueue are used to change configurations very dynamically (they focus on capacity/max-capacity only, but we could generalize this if useful).

          Bottom line: the specifics of the code might need to go through a few iterations/tweaks, but the general idea is very welcome IMHO. Also, the fact that you have large scale and long experience in deploying and operating this is very reassuring.
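
The point about a DB making it easy to keep a historical record can be sketched with an append-only table: every update inserts a new version row, the effective configuration is the latest row per key, and history is a plain query. The schema is illustrative only (sqlite stands in for Derby here; the real store layout is an implementation detail):

```python
# Sketch of an append-only config table. sqlite stands in for Derby; the
# actual schema is a design detail of the store implementation. Every
# update is a new row, so the full history is preserved and queryable.
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE conf_history (key TEXT, value TEXT, ts REAL)")

def put(key, value):
    # Append-only write: never overwrite, just add a newer version row.
    db.execute("INSERT INTO conf_history VALUES (?, ?, ?)",
               (key, value, time.time()))

def current(key):
    # The latest row per key is the effective configuration; rowid breaks
    # ties when two writes share a timestamp.
    row = db.execute(
        "SELECT value FROM conf_history WHERE key = ? "
        "ORDER BY ts DESC, rowid DESC LIMIT 1", (key,)).fetchone()
    return row[0] if row else None

put("root.a.capacity", "40")
put("root.a.capacity", "60")   # a later change; the old row is kept
print(current("root.a.capacity"))  # latest value: 60
```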

          mshen Min Shen added a comment -

          Carlo Curino, Subru Krishnan,

          As discussed offline, could you please provide feedback on the design docs we currently have?


            People

            • Assignee: mshen Min Shen
            • Reporter: mshen Min Shen
            • Votes: 2
            • Watchers: 40
