  1. Hadoop YARN
  2. YARN-11682

Legacy auto created queue in absolute mode has zero capacity after creation during app recovery


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: capacity scheduler
    • Labels: None

    Description

      When a running app is recovered into a legacy auto-created queue configured in absolute mode, the queue's configured min resource is set to zero, because no NodeManagers have registered yet (clusterResource is zero).

      GuaranteedOrZeroCapacityOverTimePolicy.getInitialLeafQueueConfiguration(AbstractAutoCreatedLeafQueue leafQueue)
      ...
      if (availableCapacity >= leafQueueTemplateCapacities
          .getAbsoluteCapacity(nodeLabel)) {
        updateCapacityFromTemplate(capacities, nodeLabel);
        activate(leafQueue, nodeLabel);
      } else {
        updateToZeroCapacity(capacities, nodeLabel, leafQueue);
      }
      
      GuaranteedOrZeroCapacityOverTimePolicy.updateToZeroCapacity(QueueCapacities capacities, String nodeLabel, AbstractLeafQueue leafQueue)
      
      private void updateToZeroCapacity(QueueCapacities capacities,
          String nodeLabel, AbstractLeafQueue leafQueue) {
        capacities.setCapacity(nodeLabel, 0.0f);
        capacities.setMaximumCapacity(nodeLabel,
            leafQueueTemplateCapacities.getMaximumCapacity(nodeLabel));
        leafQueue.getQueueResourceQuotas().
            setConfiguredMinResource(nodeLabel, Resource.newInstance(0, 0));
      }
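
      The side effect that matters here is the overwrite of the configured min resource. The following is a minimal, self-contained sketch of that overwrite, not YARN code: QuotaSketch, its map layout, and the 8192 MB / 4 vcore template value are hypothetical stand-ins for QueueResourceQuotas and the leaf queue template.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for per-label queue quotas; not a YARN class. */
final class QuotaSketch {
    // label -> {memoryMb, vcores}; the layout is illustrative only
    private final Map<String, long[]> configuredMin = new HashMap<>();

    void setConfiguredMin(String label, long memoryMb, long vcores) {
        configuredMin.put(label, new long[] {memoryMb, vcores});
    }

    long[] getConfiguredMin(String label) {
        return configuredMin.getOrDefault(label, new long[] {0L, 0L});
    }

    /** Models only the quota side effect of updateToZeroCapacity. */
    void updateToZeroCapacity(String label) {
        setConfiguredMin(label, 0L, 0L);
    }

    public static void main(String[] args) {
        QuotaSketch quotas = new QuotaSketch();
        quotas.setConfiguredMin("", 8192L, 4L); // assumed template value
        quotas.updateToZeroCapacity("");        // what happens during recovery
        long[] min = quotas.getConfiguredMin("");
        // The template value is no longer stored on the queue.
        System.out.println(min[0] + " MB, " + min[1] + " vcores");
    }
}
```

      In this model the queue itself no longer holds the template value after the call, so anything that later reads only the configured min resource sees zero.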
      
      

      When a NodeManager registers, AbstractCSQueue.updateEffectiveResources(Resource clusterResource) is called. For absolute mode queues, however, the effective min resource is derived from the configured min resource, which is now zero, so the queue keeps zero capacity.

      AbstractCSQueue.updateEffectiveResources(Resource clusterResource)
      ...
      if (getCapacityConfigType().equals(
          CapacityConfigType.ABSOLUTE_RESOURCE)) {
        newEffectiveMinResource = createNormalizedMinResource(
            usageTracker.getQueueResourceQuotas().getConfiguredMinResource(label),
            ((AbstractParentQueue) parent).getEffectiveMinRatio(label));
      ...
      usageTracker.getQueueResourceQuotas().setEffectiveMinResource(label,
          newEffectiveMinResource);
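
      To illustrate why a later NodeManager registration does not help, here is a simplified sketch. It assumes that createNormalizedMinResource amounts to scaling the configured min resource by the parent's effective min ratio; the real method works per resource type and may normalize further. EffectiveMinSketch and effectiveMinMb are hypothetical names.

```java
/** Hypothetical model of the absolute-mode branch; not a YARN class. */
final class EffectiveMinSketch {

    /**
     * Assumed simplification: effective min = configured min scaled by
     * the parent's effective min ratio (memory only, in MB).
     */
    static long effectiveMinMb(long configuredMinMb, float parentMinRatio) {
        return (long) (configuredMinMb * parentMinRatio);
    }

    public static void main(String[] args) {
        // Configured min was zeroed during recovery: the ratio is irrelevant.
        System.out.println(effectiveMinMb(0L, 1.0f));    // prints 0
        // Configured min taken from the template: the ratio scales it.
        System.out.println(effectiveMinMb(8192L, 0.5f)); // prints 4096
    }
}
```

      Under this assumption clusterResource influences the result only through the ratio, so a zero configured min resource yields a zero effective min resource however many nodes register.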
      

      Reinitializing the queue via a config change correctly recalculates the capacity based on the current clusterResource.
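
      The sequence described in this issue can be sketched end to end as follows. The issue does not show how reinitialization restores the value, so this sketch simply assumes it re-reads the template; ReinitSketch and all of its members are hypothetical and only model memory in MB.

```java
/** Hypothetical timeline model of the reported behavior; not a YARN class. */
final class ReinitSketch {
    private final long templateMinMb; // assumed value from the leaf queue template
    private long configuredMinMb;

    ReinitSketch(long templateMinMb) {
        this.templateMinMb = templateMinMb;
        this.configuredMinMb = templateMinMb;
    }

    /** Recovery before any NodeManager registers zeroes the configured min. */
    void zeroOnRecovery() {
        configuredMinMb = 0L;
    }

    /** Assumption: reinitialization re-reads the template value. */
    void reinitialize() {
        configuredMinMb = templateMinMb;
    }

    /** Simplified effective min: configured min scaled by a ratio. */
    long effectiveMinMb(float ratio) {
        return (long) (configuredMinMb * ratio);
    }

    public static void main(String[] args) {
        ReinitSketch queue = new ReinitSketch(8192L);
        queue.zeroOnRecovery();
        // A NodeManager registers; the effective min is still derived from zero.
        System.out.println("after recovery: " + queue.effectiveMinMb(1.0f));
        queue.reinitialize();
        System.out.println("after reinitialize: " + queue.effectiveMinMb(1.0f));
    }
}
```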


          People

            Assignee: Susheel Gupta (susheel_7)
            Reporter: Brian Goerlitz (bgoerlitz)