  1. Hadoop YARN
  2. YARN-6223

[Umbrella] Natively support GPU configuration/discovery/scheduling/isolation on YARN

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: None
    • Labels: None

      Description

      A growing variety of workloads is moving to YARN, including machine learning / deep learning workloads that can be sped up by leveraging GPU computation power. Workloads should be able to request GPUs from YARN as simply as they request CPU and memory.

      To make a complete GPU story, we should support the following pieces:
      1) GPU discovery/configuration: An admin can either configure GPU resources and architectures on each node or, in the more advanced case, the NodeManager can automatically discover GPU resources and architectures and report them to the ResourceManager.

      2) GPU scheduling: The YARN scheduler should account for GPU as a resource type, just like CPU and memory.

      3) GPU isolation/monitoring: Once a task is launched with GPU resources, the NodeManager should properly isolate and monitor the task's resource usage.
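
      To illustrate how the three pieces above fit together, here is a minimal, self-contained sketch. It is a toy model, not YARN code: the names Resource, Node, allocate and release are hypothetical and do not correspond to the YARN API or to the attached patches. It assumes GPUs are counted as one more dimension of a resource vector, and that each granted container is bound to specific device indices so that it only sees its own GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector: GPU is counted like memory and vcores."""
    memory_mb: int
    vcores: int
    gpus: int = 0

    def fits_in(self, available):
        # A request fits only if every dimension, GPU included, fits.
        return (self.memory_mb <= available.memory_mb
                and self.vcores <= available.vcores
                and self.gpus <= available.gpus)

    def minus(self, other):
        return Resource(self.memory_mb - other.memory_mb,
                        self.vcores - other.vcores,
                        self.gpus - other.gpus)

    def plus(self, other):
        return Resource(self.memory_mb + other.memory_mb,
                        self.vcores + other.vcores,
                        self.gpus + other.gpus)


class Node:
    """Hypothetical node: GPU device indices stand in for discovery/config."""

    def __init__(self, memory_mb, vcores, gpu_devices):
        self.free_gpus = sorted(gpu_devices)
        self.available = Resource(memory_mb, vcores, len(self.free_gpus))
        self.containers = {}  # container id -> (request, assigned devices)

    def allocate(self, container_id, request):
        """Return the GPU device indices bound to the container, or None."""
        if not request.fits_in(self.available):
            return None
        # Isolation in this sketch: each device index goes to one container.
        devices = self.free_gpus[:request.gpus]
        self.free_gpus = self.free_gpus[request.gpus:]
        self.available = self.available.minus(request)
        self.containers[container_id] = (request, devices)
        return devices

    def release(self, container_id):
        """Return a finished container's resources and devices to the node."""
        request, devices = self.containers.pop(container_id)
        self.free_gpus = sorted(self.free_gpus + devices)
        self.available = self.available.plus(request)
```

      In this sketch a request for one GPU is written Resource(2048, 2, 1), the same way a CPU-only request is written Resource(2048, 2). A real implementation would additionally need to enforce device visibility at the operating-system or container-runtime level, which this model does not attempt.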

      For #2, YARN-3926 can support it natively. For #3, YARN-3611 has introduced an extensible framework to support isolation for different resource types and different runtimes.

      Related JIRAs:
      There are a couple of JIRAs (YARN-4122/YARN-5517) filed with similar goals but different solutions:
      For scheduling:

      For isolation:

        Attachments

        1. YARN-6223.wip.1.patch
          31 kB
          Wangda Tan
        2. YARN-6223.Natively-support-GPU-on-YARN-v1.pdf
          169 kB
          Wangda Tan
        3. YARN-6223.wip.2.patch
          69 kB
          Wangda Tan
        4. YARN-6223.wip.3.patch
          128 kB
          Wangda Tan

              People

              • Assignee: leftnoteasy Wangda Tan
              • Reporter: leftnoteasy Wangda Tan
              • Votes: 4
              • Watchers: 53
