  1. Hadoop YARN
  2. YARN-6223

[Umbrella] Natively support GPU configuration/discovery/scheduling/isolation on YARN

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: None
    • Labels: None

    Description

      A growing variety of workloads is moving to YARN, including machine learning and deep learning workloads that can be sped up by leveraging GPU computation power. Workloads should be able to request GPUs from YARN as simply as they request CPU and memory.

      To make a complete GPU story, we should support the following pieces:
      1) GPU discovery/configuration: An admin can either configure the GPU resources and architectures on each node or, in the more advanced case, the NodeManager can automatically discover GPU resources and architectures and report them to the ResourceManager.

      2) GPU scheduling: The YARN scheduler should account for GPU as a resource type, just like CPU and memory.

      3) GPU isolation/monitoring: Once a task is launched with GPU resources, the NodeManager should properly isolate and monitor the task's resource usage.
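
      As a rough illustration of point 2, the sketch below treats GPU as one more entry in a resource vector, so the fit check is identical for every resource type. It is a toy model under stated assumptions, not YARN code: the `Node` class, the `fits` and `allocate` helpers, and the resource names `memory_mb`, `vcores`, and `gpu` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node whose capacity was configured or discovered."""
    name: str
    capacity: dict
    used: dict = field(default_factory=dict)

    def available(self, resource: str) -> int:
        # A resource type the node never reported counts as zero capacity.
        return self.capacity.get(resource, 0) - self.used.get(resource, 0)


def fits(node: Node, request: dict) -> bool:
    # Every requested resource type is checked the same way;
    # GPU is not special-cased.
    return all(node.available(r) >= amount for r, amount in request.items())


def allocate(nodes: list, request: dict):
    """Return the first node that fits the request and record usage, else None."""
    for node in nodes:
        if fits(node, request):
            for r, amount in request.items():
                node.used[r] = node.used.get(r, 0) + amount
            return node
    return None


nodes = [
    Node("cpu-node", {"memory_mb": 8192, "vcores": 8}),
    Node("gpu-node", {"memory_mb": 8192, "vcores": 8, "gpu": 2}),
]
request = {"memory_mb": 4096, "vcores": 2, "gpu": 1}
chosen = allocate(nodes, request)
print(chosen.name if chosen else "no node fits")
```

      Because the request is just a mapping from resource name to amount, a node without GPUs is skipped by the same comparison that would skip a node short on memory.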

      For #2, YARN-3926 can support it natively. For #3, YARN-3611 has introduced an extensible framework to support isolation for different resource types and different runtimes.
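
      For point 3, one way to picture isolation is that each container is handed specific GPU devices and is denied access to the rest. The sketch below models only that bookkeeping. `GpuAllocator`, its methods, and the container IDs are hypothetical names, and the code does not show how the YARN-3611 framework or any real enforcement mechanism works.

```python
class GpuAllocator:
    """Hypothetical per-node record of which container holds which GPU."""

    def __init__(self, device_ids):
        self.device_ids = list(device_ids)
        self.owner = {}  # device id -> container id

    def assign(self, container_id, count):
        # Hand out devices that no other container currently owns.
        free = [d for d in self.device_ids if d not in self.owner]
        if count > len(free):
            raise RuntimeError(f"only {len(free)} GPU(s) free, {count} requested")
        granted = free[:count]
        for d in granted:
            self.owner[d] = container_id
        return granted

    def denied_for(self, container_id):
        # Devices the container must not see: everything it does not own.
        return [d for d in self.device_ids if self.owner.get(d) != container_id]

    def release(self, container_id):
        # Drop every device held by the finished container.
        self.owner = {d: c for d, c in self.owner.items() if c != container_id}


gpus = GpuAllocator([0, 1, 2, 3])
a = gpus.assign("container_a", 2)
b = gpus.assign("container_b", 1)
print(a, b, gpus.denied_for("container_a"))
```

      An enforcement layer would then take the denied list for each container and block access to those devices; that step is deliberately left out here.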

      Related JIRAs:
      A couple of JIRAs (YARN-4122/YARN-5517) were filed with similar goals but different solutions:
      For scheduling:

      For isolation:

      Attachments

        1. YARN-6223.Natively-support-GPU-on-YARN-v1.pdf
          169 kB
          Wangda Tan
        2. YARN-6223.wip.1.patch
          31 kB
          Wangda Tan
        3. YARN-6223.wip.2.patch
          69 kB
          Wangda Tan
        4. YARN-6223.wip.3.patch
          128 kB
          Wangda Tan

            People

              Assignee: Wangda Tan (leftnoteasy)
              Reporter: Wangda Tan (leftnoteasy)
              Votes: 4
              Watchers: 55

              Dates

                Created:
                Updated:
                Resolved: