Hadoop YARN / YARN-5734

OrgQueue for easy CapacityScheduler queue configuration management

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.9.0, 3.0.0
    • Release Note: The OrgQueue extension to the capacity scheduler provides a REST API that users can call to modify queue configurations programmatically. This enables automation of queue configuration management by administrators listed in the queue's `administer_queue` ACL.

    Description

      The current XML-based configuration mechanism in CapacityScheduler makes it very inconvenient to apply changes to queue configurations. We see two main drawbacks in the file-based configuration mechanism:

      1. It is very inconvenient to automate queue configuration updates. For example, in our cluster setup, we leverage the queue mapping feature from YARN-2411 to route users to their dedicated organization queues. It is extremely cumbersome to keep updating the config file to manage the very dynamic mapping between users and organizations.
      2. Even if a user has admin permission on a specific queue, that user cannot make queue configuration changes such as resizing subqueues, changing queue ACLs, or creating new queues. All of these operations must be performed centrally by the cluster administrators.

      Given these limitations, we realized the need for a more flexible configuration mechanism that allows queue configurations to be stored and managed more dynamically. We developed a feature internally at LinkedIn that introduces the concept of a MutableConfigurationProvider. It provides a set of configuration mutation APIs, exposed as REST APIs, that allow queue configurations to be updated externally. Queue ACLs are honored when configuration changes are made, which means only queue administrators can change the configuration of a given queue. MutableConfigurationProvider is implemented as a pluggable interface, and we have one implementation of this interface, based on the Derby embedded database.
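
      The idea of a pluggable, ACL-aware mutation interface can be sketched roughly as follows. This is an illustrative sketch only: the method names, the in-memory store, and the rule that an administrator of a parent queue may also modify its subqueues are assumptions made for the example, not the interface from the attached patch or design documents.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical pluggable interface; names and signatures are assumptions for illustration. */
interface MutableConfigurationProvider {
    /** Applies updates to one queue on behalf of a user; rejects users who do not administer it. */
    void mutateQueue(String user, String queuePath, Map<String, String> updates);

    /** Returns the stored value of a queue property, or null if it is unset. */
    String get(String queuePath, String key);
}

/** In-memory stand-in for a persistent implementation such as a Derby-backed one. */
class InMemoryConfigurationProvider implements MutableConfigurationProvider {
    private final Map<String, Set<String>> queueAdmins = new HashMap<>();
    private final Map<String, String> conf = new HashMap<>();

    /** Registers a user as administrator of the given queue path, e.g. "root.eng". */
    void addQueueAdmin(String queuePath, String user) {
        queueAdmins.computeIfAbsent(queuePath, q -> new HashSet<>()).add(user);
    }

    /** Assumption: a user administers a queue if listed on it or on any ancestor queue. */
    private boolean isAdmin(String user, String queuePath) {
        String path = queuePath;
        while (true) {
            Set<String> admins = queueAdmins.get(path);
            if (admins != null && admins.contains(user)) {
                return true;
            }
            int dot = path.lastIndexOf('.');
            if (dot < 0) {
                return false;
            }
            path = path.substring(0, dot); // walk up to the parent queue
        }
    }

    @Override
    public void mutateQueue(String user, String queuePath, Map<String, String> updates) {
        if (!isAdmin(user, queuePath)) {
            throw new SecurityException(user + " may not administer " + queuePath);
        }
        for (Map.Entry<String, String> e : updates.entrySet()) {
            conf.put(queuePath + "." + e.getKey(), e.getValue());
        }
    }

    @Override
    public String get(String queuePath, String key) {
        return conf.get(queuePath + "." + key);
    }
}
```

      In this sketch, a user registered as administrator of `root.eng` could change the capacity of `root.eng.dev`, while the same call from any other user would be rejected with a `SecurityException` and leave the stored configuration untouched.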

      This feature has been deployed on LinkedIn's Hadoop cluster for a year now and has gone through several iterations of gathering user feedback and improving accordingly. With this feature, cluster administrators are able to automate many queue configuration management tasks, such as setting queue capacities to adjust cluster resources between queues based on established resource consumption patterns, or updating the user-to-queue mappings. We have attached our design documentation to this ticket and would like to receive feedback from the community on how best to integrate it with the latest version of YARN.

      Attachments

        1. YARN-5734-YARN-5734.001.patch
          127 kB
          Jonathan Hung
        2. OrgQueueAPI-BasedSchedulerConfigurationManagement_v2.pdf
          117 kB
          Jonathan Hung
        3. OrgQueue_Design_v0.pdf
          220 kB
          Jonathan Hung
        4. OrgQueue_API-Based_Config_Management_v1.pdf
          115 kB
          Jonathan Hung

            People

              Assignee: Min Shen (mshen)
              Reporter: Min Shen (mshen)
              Votes: 2
              Watchers: 43

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 20m