  1. Hadoop YARN
  2. YARN-7481

GPU locality support for better AI scheduling

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: api, RM, yarn
    • Labels:
      None
    • Target Version/s:
    • Flags:
      Patch

      Description

      We enhance Hadoop with GPU support for better AI job scheduling.

      Currently, YARN-3926 also supports GPU scheduling, treating GPUs as a countable resource.

      However, GPU placement is also very important to the efficiency of deep learning jobs.
      For example, a 2-GPU job running on GPUs {0, 1} could be faster than one running on GPUs {0, 7} if GPUs 0 and 1 are under the same PCI-E switch while GPUs 0 and 7 are not.
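The placement preference in this example can be illustrated with a minimal sketch. It is not taken from the attached patches: the class name `GpuLocalityExample`, the method `pickPair`, and the `switchOfGpu` array (a made-up mapping from GPU index to PCI-E switch id) are all hypothetical, and a real scheduler would likely weigh more than one topology level.

```java
// Illustrative only: names and topology encoding are assumptions, not the patch's API.
final class GpuLocalityExample {
    private GpuLocalityExample() {}

    /**
     * Picks two free GPUs, preferring a pair under the same PCI-E switch.
     *
     * @param switchOfGpu switchOfGpu[i] is the (hypothetical) switch id of GPU i
     * @param free        free[i] is true if GPU i is currently unallocated
     * @return indices of the chosen pair, or null if fewer than two GPUs are free
     */
    static int[] pickPair(int[] switchOfGpu, boolean[] free) {
        int[] fallback = null; // first free pair seen, used if no same-switch pair exists
        for (int a = 0; a < free.length; a++) {
            if (!free[a]) {
                continue;
            }
            for (int b = a + 1; b < free.length; b++) {
                if (!free[b]) {
                    continue;
                }
                if (switchOfGpu[a] == switchOfGpu[b]) {
                    return new int[] {a, b}; // both GPUs share a switch
                }
                if (fallback == null) {
                    fallback = new int[] {a, b};
                }
            }
        }
        return fallback;
    }
}
```

With an assumed 8-GPU node where GPUs 0 to 3 share one switch and GPUs 4 to 7 share another, a node with GPUs 0, 1, and 7 free would yield the pair {0, 1} rather than {0, 7}.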

      We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables fine-grained GPU placement.

      A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage and locality information in a node (up to 64 GPUs per node). In each bit position, '1' means the corresponding GPU is available and '0' means it is not.
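As a rough sketch of how such a bitmap might be manipulated, the helper below stores the 64 availability flags in a Java `long`. The class `GpuBitmap` and all of its methods are hypothetical and are not the API added by the attached patches. The sketch also assumes that bit i (counting from the least significant bit) corresponds to GPU i, which the description does not specify.

```java
// Illustrative only: not the patch's API. Assumes bit i (LSB = bit 0) maps to GPU i.
final class GpuBitmap {
    private GpuBitmap() {}

    private static void checkIndex(int gpuIndex) {
        if (gpuIndex < 0 || gpuIndex > 63) {
            throw new IllegalArgumentException("GPU index must be in [0, 63]: " + gpuIndex);
        }
    }

    /** Builds a bitmap in which exactly the given GPUs are marked available ('1'). */
    static long of(int... gpuIndices) {
        long bitmap = 0L;
        for (int i : gpuIndices) {
            checkIndex(i);
            bitmap |= 1L << i;
        }
        return bitmap;
    }

    /** Returns true if the bit for the given GPU is '1' (available). */
    static boolean isAvailable(long bitmap, int gpuIndex) {
        checkIndex(gpuIndex);
        return (bitmap & (1L << gpuIndex)) != 0;
    }

    /** Clears the GPU's bit, marking it as in use ('0'). */
    static long markAllocated(long bitmap, int gpuIndex) {
        checkIndex(gpuIndex);
        return bitmap & ~(1L << gpuIndex);
    }

    /** Number of GPUs currently marked available. */
    static int countAvailable(long bitmap) {
        return Long.bitCount(bitmap);
    }

    /** True if every GPU requested by position is available on the node. */
    static boolean canSatisfy(long available, long requested) {
        return (available & requested) == requested;
    }
}
```

Under these assumptions, a node with GPUs 0, 1, and 7 free is `GpuBitmap.of(0, 1, 7)`, and a request for the specific placement {0, 1} is satisfiable when `canSatisfy(node, GpuBitmap.of(0, 1))` returns true. A single bitwise AND checks an exact placement, which is what a plain GPU count cannot express.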

        Attachments

        1. hadoop-2.9.0.gpu-port.patch
          936 kB
          Chen Qingcha
        2. hadoop-2.9.0.gpu-port.20180920.patch
          937 kB
          Chen Qingcha
        3. hadoop-2.9.0.gpu-port.20180725.patch
          937 kB
          Chen Qingcha
        4. hadoop-2.7.2.gpu-port-20180711.patch
          981 kB
          Chen Qingcha
        5. hadoop-2.7.2.gpu-port.patch
          967 kB
          Chen Qingcha
        6. hadoop_2.9.0.patch
          912 kB
          Chen Qingcha
        7. GPU locality support for Job scheduling.pdf
          370 kB
          Chen Qingcha
        8. branch-2.7.2.gpu-port-20180723.patch
          982 kB
          Chen Qingcha


            People

            • Assignee: Unassigned
            • Reporter: Chen Qingcha (qinchen@microsoft.com)
            • Votes: 0
            • Watchers: 8

              Dates

              • Created:
                Updated:

                Time Tracking

                • Original Estimate: 1,344h
                • Remaining Estimate: 1,344h
                • Time Spent: Not Specified