Hadoop YARN / YARN-9992

Max allocation per queue is zero for custom resource types on RM startup

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      On a newly booted RM, requests for GPUs cannot be scheduled. The request fails with the exception thrown in SchedulerUtils#throwInvalidResourceException:

      throw new InvalidResourceRequestException(
          "Invalid resource request, requested resource type=[" + reqResourceName
              + "] < 0 or greater than maximum allowed allocation. Requested "
              + "resource=" + reqResource + ", maximum allowed allocation="
              + availableResource
              + ", please note that maximum allowed allocation is calculated "
              + "by scheduler based on maximum resource of registered "
              + "NodeManagers, which might be less than configured "
              + "maximum allocation="
              + ResourceUtils.getResourceTypesMaximumAllocation());

      After refreshing the scheduler (e.g. via refreshQueues), GPU scheduling works.

      I think the root cause is that resource-types.xml is loaded into CapacitySchedulerConfiguration only on scheduler refresh (as part of YARN-7738). After a refresh, the call to ResourceUtils#fetchMaximumAllocationFromConfig in CapacitySchedulerConfiguration#getMaximumAllocationPerQueue can therefore read the yarn.resource-types config. In CapacityScheduler#initScheduler, however, resource-types.xml is not loaded into the conf, so the custom resource is not found when max allocations are computed, and its max allocation is 0.
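
      The suspected failure mode can be illustrated with a small standalone model. This is a sketch, not Hadoop code: the class MaxAllocationSketch and its methods are hypothetical, the conf is a plain map rather than a Configuration object, and the yarn.io/gpu name and the .maximum-allocation key suffix are assumptions chosen for illustration. It only shows how a conf without yarn.resource-types yields a maximum of 0 for the custom resource, so any positive request is rejected.

      ```java
      import java.util.HashMap;
      import java.util.LinkedHashMap;
      import java.util.Map;

      /** Simplified, hypothetical model of the reported behavior; not Hadoop code. */
      class MaxAllocationSketch {

          /**
           * Builds a per-resource maximum from a flat key/value conf.
           * Only resource types listed under "yarn.resource-types" are known.
           */
          static Map<String, Long> fetchMaximumAllocation(Map<String, String> conf) {
              Map<String, Long> max = new LinkedHashMap<>();
              String types = conf.getOrDefault("yarn.resource-types", "");
              for (String type : types.split(",")) {
                  String name = type.trim();
                  if (name.isEmpty()) {
                      continue; // nothing configured, so no custom resource is registered
                  }
                  // Assumed key layout, for illustration only.
                  String key = "yarn.resource-types." + name + ".maximum-allocation";
                  String value = conf.getOrDefault(key, String.valueOf(Long.MAX_VALUE));
                  max.put(name, Long.parseLong(value));
              }
              return max;
          }

          /** A request is valid if it is non-negative and within the known maximum. */
          static boolean isValidRequest(Map<String, Long> max, String resource, long requested) {
              long allowed = max.getOrDefault(resource, 0L); // unknown resource: maximum is 0
              return requested >= 0 && requested <= allowed;
          }

          public static void main(String[] args) {
              // Conf as seen at startup in this model: resource-types.xml was never loaded.
              Map<String, String> startupConf = new HashMap<>();

              // Conf as seen after a refresh in this model: resource-types.xml is present.
              Map<String, String> refreshedConf = new HashMap<>();
              refreshedConf.put("yarn.resource-types", "yarn.io/gpu");
              refreshedConf.put("yarn.resource-types.yarn.io/gpu.maximum-allocation", "8");

              // Prints false: the GPU maximum is 0, so a request for 1 GPU is rejected.
              System.out.println(isValidRequest(fetchMaximumAllocation(startupConf), "yarn.io/gpu", 1));
              // Prints true: the GPU maximum is 8 once the resource type is known.
              System.out.println(isValidRequest(fetchMaximumAllocation(refreshedConf), "yarn.io/gpu", 1));
          }
      }
      ```

      In this model, the two confs differ only in whether the resource-type entries are present, which matches the observation that a refresh alone makes GPU requests schedulable.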

        Attachments

        1. YARN-9992.001.patch
          1 kB
          Jonathan Hung

              People

              • Assignee:
                jhung Jonathan Hung
                Reporter:
                jhung Jonathan Hung