  1. Hadoop YARN
  2. YARN-8200

Backport resource types/GPU features to branch-3.0/branch-2


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0
    • Component/s: None
    • Release Note:
      The generic resource types feature allows admins to configure custom resource types beyond memory and CPU. Users can request these resource types, and YARN takes them into account when scheduling.

      This also adds GPU as a native resource type, built on top of the generic resource types feature. It adds support for GPU resource discovery, GPU scheduling and GPU isolation.
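
      As a rough illustration of what "taking a resource type into account" means, the sketch below treats a request and a node's capacity as maps from resource name to amount, and grants the request only if every requested type fits. This is not YARN's scheduler code: the functions `fits` and `allocate` are hypothetical, and the amounts are made up. Only the resource names (`memory-mb`, `vcores`, `yarn.io/gpu`) follow YARN's naming.

```python
from typing import Dict, Optional

# A resource vector: resource type name -> integer amount.
Resources = Dict[str, int]


def fits(request: Resources, available: Resources) -> bool:
    """Return True if every requested resource type is available in full.

    A resource type missing from `available` counts as zero.
    """
    return all(available.get(name, 0) >= amount
               for name, amount in request.items())


def allocate(request: Resources, available: Resources) -> Optional[Resources]:
    """Return the capacity left after granting `request`, or None if it does not fit."""
    if not fits(request, available):
        return None
    return {name: amount - request.get(name, 0)
            for name, amount in available.items()}


# Hypothetical node with two GPUs and a container asking for one of them.
node = {"memory-mb": 8192, "vcores": 8, "yarn.io/gpu": 2}
request = {"memory-mb": 4096, "vcores": 2, "yarn.io/gpu": 1}

remaining = allocate(request, node)
print(remaining)
```

      The point of the generic model is that GPU is just one more key in the map: a request for three GPUs on this node is rejected by the same check that would reject an oversized memory request.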

    Description

      We need GPU scheduling on our YARN clusters to support deep learning workloads. However, our main production clusters run older versions of branch-2 (2.7 in our case). To avoid supporting many widely differing Hadoop versions across multiple clusters, we would like to backport the resource types/resource profiles feature to branch-2, along with the GPU-specific support.

      We have done a trial backport of YARN-3926, plus some miscellaneous patches from YARN-7069 chosen based on issues we uncovered, and the backport went fairly smoothly. We also did a trial backport of most of YARN-6223 (without Docker support).

      For the backports, we could do the development in a feature branch and merge it into branch-2 when ready.

      Attachments

        1. YARN-8200-branch-2.003.patch (857 kB, Jonathan Hung)
        2. YARN-8200-branch-2.002.patch (857 kB, Jonathan Hung)
        3. YARN-8200-branch-3.0.001.patch (357 kB, Jonathan Hung)
        4. YARN-8200-branch-2.001.patch (773 kB, Jonathan Hung)
        5. counter.scheduler.operation.allocate.csv.gpuResources (241 kB, Jonathan Hung)
        6. synth_sls.json (3 kB, Jonathan Hung)
        7. counter.scheduler.operation.allocate.csv.defaultResources (144 kB, Jonathan Hung)

            People

              Assignee: Jonathan Hung (jhung)
              Reporter: Jonathan Hung (jhung)
              Votes: 0
              Watchers: 20
