Hadoop YARN / YARN-6223

[Umbrella] Natively support GPU configuration/discovery/scheduling/isolation on YARN

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.1.0
    • None
    • None

    Description

      A growing variety of workloads is moving to YARN, including machine learning and deep learning workloads that can be sped up by leveraging GPU computation power. Workloads should be able to request GPUs from YARN as simply as they request CPU and memory.

      To make the GPU story complete, we should support the following pieces:
      1) GPU discovery/configuration: An admin can either configure GPU resources and architectures on each node or, in the more advanced case, the NodeManager can automatically discover GPU resources and architectures and report them to the ResourceManager.

      2) GPU scheduling: The YARN scheduler should account for GPU as a resource type, just like CPU and memory.

      3) GPU isolation/monitoring: Once a task is launched with GPU resources, the NodeManager should properly isolate and monitor the task's resource usage.
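To illustrate pieces 2 and 3, here is a minimal sketch of a node that counts GPU as one more resource type next to memory and vcores, and that hands each container its own GPU device indices. It is not the YARN API: the class `GpuNodeSketch`, its methods, and the resource names `memory-mb`, `vcores`, and `gpu` are all hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical model of one node's resources; NOT part of the YARN API. */
final class GpuNodeSketch {
    // Remaining capacity per resource type; GPU is just another entry.
    private final Map<String, Long> available = new LinkedHashMap<>();
    // Indices of GPU devices not yet assigned to any container.
    private final TreeSet<Integer> freeGpuDevices = new TreeSet<>();

    GpuNodeSketch(long memoryMb, long vcores, int gpuCount) {
        available.put("memory-mb", memoryMb);
        available.put("vcores", vcores);
        available.put("gpu", (long) gpuCount);
        for (int i = 0; i < gpuCount; i++) {
            freeGpuDevices.add(i);
        }
    }

    /** Scheduling check: every requested resource type must be available. */
    boolean fits(Map<String, Long> request) {
        for (Map.Entry<String, Long> e : request.entrySet()) {
            if (available.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    /**
     * Deducts the request from the node and returns the GPU device indices
     * reserved for the container, or empty if the request does not fit.
     * Assumes non-negative request values.
     */
    Optional<List<Integer>> allocate(Map<String, Long> request) {
        if (!fits(request)) {
            return Optional.empty();
        }
        for (Map.Entry<String, Long> e : request.entrySet()) {
            available.merge(e.getKey(), -e.getValue(), Long::sum);
        }
        List<Integer> devices = new ArrayList<>();
        long gpus = request.getOrDefault("gpu", 0L);
        for (long i = 0; i < gpus; i++) {
            // Lowest free index first; a device is never shared by two containers.
            devices.add(freeGpuDevices.pollFirst());
        }
        return Optional.of(devices);
    }

    long availableOf(String type) {
        return available.getOrDefault(type, 0L);
    }

    public static void main(String[] args) {
        GpuNodeSketch node = new GpuNodeSketch(16384, 8, 2);
        System.out.println(node.allocate(Map.of("memory-mb", 4096L, "vcores", 2L, "gpu", 1L)));
        System.out.println(node.allocate(Map.of("memory-mb", 1024L, "vcores", 1L, "gpu", 2L)));
        System.out.println(node.allocate(Map.of("memory-mb", 1024L, "vcores", 1L, "gpu", 1L)));
    }
}
```

In this sketch the second request is rejected because only one GPU remains, even though memory and vcores would fit. A real NodeManager would additionally have to enforce the device assignment at the OS level, which the sketch does not attempt.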

      For #2, YARN-3926 can support it natively. For #3, YARN-3611 has introduced an extensible framework to support isolation for different resource types and different runtimes.

      Related JIRAs:
      There are a couple of JIRAs (YARN-4122/YARN-5517) that were filed with similar goals but different solutions:
      For scheduling:

      For isolation:

      Attachments

        1. YARN-6223.wip.3.patch
          128 kB
          Wangda Tan
        2. YARN-6223.wip.2.patch
          69 kB
          Wangda Tan
        3. YARN-6223.wip.1.patch
          31 kB
          Wangda Tan
        4. YARN-6223.Natively-support-GPU-on-YARN-v1.pdf
          169 kB
          Wangda Tan

        Issue Links

        Activity

          People

            leftnoteasy Wangda Tan
            leftnoteasy Wangda Tan
            Votes: 4
            Watchers: 55

            Dates

              Created:
              Updated:
              Resolved:
