Hadoop YARN / YARN-6223

[Umbrella] Natively support GPU configuration/discovery/scheduling/isolation on YARN


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0
    • Component/s: None
    • Labels: None

    Description

      A growing variety of workloads is moving to YARN, including machine learning / deep learning workloads that can be sped up by leveraging GPU computation power. Workloads should be able to request GPUs from YARN as simply as they request CPU and memory.

      To complete the GPU story, we should support the following pieces:
      1) GPU discovery/configuration: Either an admin configures GPU resources and architectures on each node or, in a more advanced setup, the NodeManager automatically discovers GPU resources and architectures and reports them to the ResourceManager.

      2) GPU scheduling: The YARN scheduler should account for GPU as a resource type, just like CPU and memory.

      3) GPU isolation/monitoring: Once a task is launched with GPU resources, the NodeManager should properly isolate and monitor the task's resource usage.
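
The scheduling piece (item 2) can be illustrated with a toy model. The following is a minimal Python sketch, not YARN code: the `Resource` class, its field names, and the `schedule` function are hypothetical names invented for this illustration. It only shows the idea that the GPU count becomes one more dimension checked alongside memory and vcores when deciding whether a request fits on a node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector; field names are illustrative only."""
    memory_mb: int
    vcores: int
    gpus: int = 0

    def fits(self, available: "Resource") -> bool:
        # A request fits only if every dimension, GPU included, is satisfied.
        return (self.memory_mb <= available.memory_mb
                and self.vcores <= available.vcores
                and self.gpus <= available.gpus)

    def minus(self, other: "Resource") -> "Resource":
        # Subtract an allocation from the available capacity.
        return Resource(self.memory_mb - other.memory_mb,
                        self.vcores - other.vcores,
                        self.gpus - other.gpus)


def schedule(node_capacity, requests):
    """Greedily place requests on a single node.

    Returns (placed_requests, remaining_capacity). Requests that do not
    fit in any dimension are skipped rather than queued.
    """
    available = node_capacity
    placed = []
    for req in requests:
        if req.fits(available):
            placed.append(req)
            available = available.minus(req)
    return placed, available
```

For example, on a node with 2 GPUs, a request for 1 GPU is placed, a following request for 2 GPUs is skipped because only 1 GPU remains, and a later request for 1 GPU still fits. A real scheduler would also handle queues, multiple nodes, and pending requests, none of which is modeled here.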

      For #2, YARN-3926 can support it natively. For #3, YARN-3611 has introduced an extensible framework to support isolation for different resource types and different runtimes.
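
The isolation piece (item 3) implies that a node tracks which GPU devices each task owns, so that a task can be restricted to exactly those devices. The sketch below is a hypothetical Python illustration of that bookkeeping only. `GpuAllocator` and its methods are invented names and are not part of YARN-3611 or any YARN API, and the actual enforcement mechanism is not described in this issue text.

```python
class GpuAllocator:
    """Hypothetical per-node record of which GPU indices each container owns."""

    def __init__(self, device_ids):
        self._all = sorted(device_ids)
        self._free = list(self._all)
        self._assigned = {}  # container id -> list of GPU indices

    def allocate(self, container_id, count):
        """Assign `count` free devices to a container, lowest index first."""
        if count > len(self._free):
            raise RuntimeError(
                f"requested {count} GPUs, only {len(self._free)} free")
        devices = self._free[:count]
        self._free = self._free[count:]
        self._assigned.setdefault(container_id, []).extend(devices)
        return devices

    def denied_devices(self, container_id):
        """Devices the container must not access: all it does not own."""
        owned = set(self._assigned.get(container_id, []))
        return [d for d in self._all if d not in owned]

    def release(self, container_id):
        """Return a finished container's devices to the free pool."""
        freed = self._assigned.pop(container_id, [])
        self._free = sorted(self._free + freed)
        return freed
```

Keeping both the owned list and the denied list per container is a design choice for this sketch: an isolation layer can then either whitelist the owned devices or blacklist the rest, depending on what the underlying runtime supports.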

      Related JIRAs:
      There are a couple of JIRAs (YARN-4122/YARN-5517) filed with similar goals but different solutions:
      For scheduling:

      For isolation:

      Attachments

        1. YARN-6223.Natively-support-GPU-on-YARN-v1.pdf (169 kB, Wangda Tan)
        2. YARN-6223.wip.1.patch (31 kB, Wangda Tan)
        3. YARN-6223.wip.2.patch (69 kB, Wangda Tan)
        4. YARN-6223.wip.3.patch (128 kB, Wangda Tan)


          People

            Assignee: Wangda Tan (leftnoteasy)
            Reporter: Wangda Tan (leftnoteasy)
            Votes: 4
            Watchers: 60

            Dates

              Created:
              Updated:
              Resolved:
