  1. Hadoop YARN
  2. YARN-5734 OrgQueue for easy CapacityScheduler queue configuration management
  3. YARN-10139

ValidateAndGetSchedulerConfiguration API fails when cluster max allocation > default 8GB

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0, 3.2.2, 3.1.4
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      ValidateAndGetSchedulerConfiguration fails when the cluster maximum allocation (yarn.scheduler.maximum-allocation-mb) in yarn-site.xml is set to a value (e.g. 16GB) greater than the 8GB default.

      The validation API uses two configurations: CapacitySchedulerConfiguration and Configuration (yarn-site.xml). When CapacityScheduler is initialized with CapacitySchedulerConfiguration, queue initialization looks up the queue maximum allocation, which is not set, and then falls back to the cluster maximum allocation. That value is also absent from CapacitySchedulerConfiguration (it is present only in YarnConfiguration), so it defaults to 8GB. Re-initialization then fails because the queue maximum allocation would decrease from the previous 16GB to 8GB.

      2020-02-14 07:38:46,087 WARN org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: CapacityScheduler configuration validation failed:java.io.IOException: Failed to re-init queues : Trying to reinitialize root.default.c1.c3 the maximum allocation size can not be decreased! Current setting: <memory:164860, vCores:88>, trying to set it to: <memory:8192, vCores:4>
      

      The CapacityScheduler initialization code reads a YARN configuration value from CapacitySchedulerConfiguration, which causes the issue.
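
      The failure mode described above can be sketched in plain Java. This is an illustration only, not the Hadoop code and not the attached patch: the class MaxAllocationSketch, its methods, and the use of a Map in place of a Hadoop Configuration object are all hypothetical. Only the property name and the 8GB default come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup described in this issue.
class MaxAllocationSketch {
    // Property name and default as stated in the issue description.
    static final String MAX_ALLOC_MB = "yarn.scheduler.maximum-allocation-mb";
    static final int DEFAULT_MAX_ALLOC_MB = 8192;

    // Returns the configured value, or the 8GB default when the key is absent.
    static int maxAllocationMb(Map<String, String> conf) {
        String value = conf.get(MAX_ALLOC_MB);
        return value == null ? DEFAULT_MAX_ALLOC_MB : Integer.parseInt(value);
    }

    // Hypothetical re-init guard: rejects any decrease of the maximum allocation.
    static void checkNotDecreased(int currentMb, int proposedMb) {
        if (proposedMb < currentMb) {
            throw new IllegalStateException(
                "maximum allocation can not be decreased: " + currentMb + " -> " + proposedMb);
        }
    }

    public static void main(String[] args) {
        // yarn-site.xml sets the cluster maximum to 16GB.
        Map<String, String> yarnSite = new HashMap<>();
        yarnSite.put(MAX_ALLOC_MB, "16384");

        // The scheduler-only configuration does not carry the yarn-site.xml key.
        Map<String, String> schedulerOnly = new HashMap<>();

        int current = maxAllocationMb(yarnSite);        // value the running scheduler uses
        int proposed = maxAllocationMb(schedulerOnly);  // silently falls back to the default

        try {
            checkNotDecreased(current, proposed);
            System.out.println("validation passed");
        } catch (IllegalStateException e) {
            System.out.println("validation failed: " + e.getMessage());
        }
    }
}
```

      In this sketch the lookup against the scheduler-only map returns the default, so the guard sees a decrease even though the operator never lowered the setting. Reading the value from the map that holds the yarn-site.xml entry yields the same number as the current setting, and the guard passes.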

      Attachments

        1. YARN-10139-002.patch
          5 kB
          Prabhu Joseph
        2. YARN-10139-001.patch
          3 kB
          Prabhu Joseph

            People

              Assignee: Prabhu Joseph
              Reporter: Prabhu Joseph