  1. Hadoop YARN
  2. YARN-10839

queueMaxAppsDefault, when set, blindly caps the root queue's maxRunningApps to this value, ignoring any individually overridden maxRunningApps settings for child queues in FairScheduler


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.5, 3.3.1
    • Fix Version/s: None
    • Component/s: yarn

    Description

      queueMaxAppsDefault sets the default running app limit for queues (including the root queue) which can be overridden by individual child queues through the maxRunningApps setting.

      Consider a simple FairScheduler XML as follows:

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <allocations>
          <queue name="root">
              <weight>1.0</weight>
              <schedulingPolicy>drf</schedulingPolicy>
              <aclSubmitApps>*</aclSubmitApps>
              <aclAdministerApps>*</aclAdministerApps>
              <queue name="default">
                  <weight>1.0</weight>
                  <schedulingPolicy>drf</schedulingPolicy>
              </queue>
              <queue name="A">
                  <minResources>1024000 mb, 1000 vcores</minResources>
                  <maxRunningApps>15</maxRunningApps>
                  <weight>2.0</weight>
                  <schedulingPolicy>drf</schedulingPolicy>
              </queue>
              <queue name="B">
                  <minResources>512000 mb, 500 vcores</minResources>
                  <maxRunningApps>10</maxRunningApps>
                  <weight>1.0</weight>
                  <schedulingPolicy>drf</schedulingPolicy>
              </queue>
          </queue>
          <queueMaxAppsDefault>3</queueMaxAppsDefault>
          <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
          <queuePlacementPolicy>
              <rule name="specified" create="true"/>
              <rule name="user" create="true"/>
          </queuePlacementPolicy>
      </allocations>
      

      Here:

      • queueMaxAppsDefault is set to 3, so any queue without its own maxRunningApps gets a limit of 3 running apps.
      • The root queue does not have a maxRunningApps limit set.
      • maxRunningApps is set for the child queues: 15 for root.A and 10 for root.B.

      From the above, if users submit jobs to root.A or root.B, they are (incorrectly) capped at 3 running apps, not 15 or 10 respectively, because the root queue (the parent) is itself capped at 3 by the queueMaxAppsDefault setting.

      As a result, users see their apps stuck in the ACCEPTED state.
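
      The capping behavior described above can be illustrated with a small Python model. This is a simplified sketch, not FairScheduler's actual code: the names effective_max_apps, can_run and submit are hypothetical, and the model assumes that an app may start only if every queue on the path from its leaf queue up to root is below its own running-app limit.

```python
# Simplified model of the reported behavior (NOT the real FairScheduler code).
QUEUE_MAX_APPS_DEFAULT = 3  # <queueMaxAppsDefault> from the allocation file

# Explicit <maxRunningApps> values from the allocation file; root has none.
EXPLICIT_MAX_APPS = {"root.A": 15, "root.B": 10}


def effective_max_apps(queue):
    """Return the queue's explicit limit, falling back to the default."""
    return EXPLICIT_MAX_APPS.get(queue, QUEUE_MAX_APPS_DEFAULT)


def ancestors_and_self(queue):
    """For 'root.A', yield 'root.A' and then 'root'."""
    parts = queue.split(".")
    for end in range(len(parts), 0, -1):
        yield ".".join(parts[:end])


def can_run(queue, running):
    """Assumed rule: every queue on the path to root must be under its limit."""
    return all(running.get(q, 0) < effective_max_apps(q)
               for q in ancestors_and_self(queue))


def submit(queue, running):
    """Start the app if allowed, otherwise leave it waiting in ACCEPTED."""
    if not can_run(queue, running):
        return "ACCEPTED"
    for q in ancestors_and_self(queue):
        running[q] = running.get(q, 0) + 1
    return "RUNNING"


running = {}
states = [submit("root.A", running) for _ in range(5)]
print(states)
# ['RUNNING', 'RUNNING', 'RUNNING', 'ACCEPTED', 'ACCEPTED']
```

      In this model, root.A never gets close to its own limit of 15: the fourth submission is blocked because root, which falls back to queueMaxAppsDefault, already has 3 running apps.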

      Either the above FairScheduler XML should have been rejected by the ResourceManager, or the root queue should have been capped at the largest maxRunningApps value defined for a leaf queue.
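
      The first option, rejecting the allocation file, could be based on a check along these lines. This is only a sketch under assumptions: find_conflicts is a hypothetical helper, not an existing ResourceManager API, and it considers only queues directly or indirectly under the given parent.

```python
def find_conflicts(explicit_max_apps, default_max_apps, parent="root"):
    """Return queues under `parent` whose explicit maxRunningApps exceeds
    the parent's effective limit (explicit value, else the default)."""
    parent_limit = explicit_max_apps.get(parent, default_max_apps)
    return sorted(
        queue
        for queue, limit in explicit_max_apps.items()
        if queue.startswith(parent + ".") and limit > parent_limit
    )


# Values taken from the allocation file in this issue.
conflicts = find_conflicts({"root.A": 15, "root.B": 10}, default_max_apps=3)
print(conflicts)  # ['root.A', 'root.B']
```

      A non-empty result would mean the file contains child limits that can never be reached, which is the condition this issue suggests rejecting.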

      Possible solution: if the root queue has no maxRunningApps set and queueMaxAppsDefault is lower than the maxRunningApps of an individual leaf queue, then the root queue should implicitly be capped at the latter instead of queueMaxAppsDefault.
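
      A minimal sketch of that proposal follows. The function implicit_root_cap is hypothetical and only models the rule stated above; where several leaf queues set maxRunningApps, it assumes the largest value is used.

```python
def implicit_root_cap(explicit_max_apps, default_max_apps):
    """Model of the proposed rule for root's running-app limit."""
    # An explicit limit on root always wins.
    if "root" in explicit_max_apps:
        return explicit_max_apps["root"]
    # Otherwise use the default, raised to the largest explicit child limit.
    child_limits = [limit for queue, limit in explicit_max_apps.items()
                    if queue != "root"]
    return max([default_max_apps] + child_limits)


# Values taken from the allocation file in this issue.
cap = implicit_root_cap({"root.A": 15, "root.B": 10}, default_max_apps=3)
print(cap)  # 15
```

      With this rule, root would be capped at 15 rather than 3 for the example allocation file, so root.A and root.B could reach the limits configured for them.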


          People

            Assignee: sahuja Siddharth Ahuja
            Reporter: sahuja Siddharth Ahuja
            Votes: 0
            Watchers: 2
