Hadoop YARN / YARN-7481

GPU locality support for better AI scheduling

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: api, RM, yarn
    • Labels:
      None
    • Target Version/s:
    • Flags:
      Patch

      Description

      We enhance Hadoop with GPU support for better AI job scheduling.

      YARN-3926 already supports GPU scheduling, but it treats GPUs as a countable resource.

      However, GPU placement also matters a great deal for the efficiency of deep learning jobs.
      For example, a 2-GPU job running on GPUs {0, 1} could be faster than one running on GPUs {0, 7} if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.

      We add support to Hadoop 2.7.2 for GPU locality scheduling, which enables fine-grained GPU placement.

      A 64-bit bitmap is added to the YARN Resource to indicate both GPU usage and locality information in a node (up to 64 GPUs per node). In each bit position, '1' means the corresponding GPU is available and '0' means it is not.
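The bitmap idea described above can be illustrated with a minimal Java sketch. It is not taken from the attached patches: the class `GpuBitmapSketch`, all of its method names, and the assumption that GPUs under the same PCI-E switch have consecutive ids in groups of `groupSize` are hypothetical and used only for illustration. The only thing it takes from the description is the encoding: bit i of a 64-bit value is 1 when GPU i is available.

```java
/** Hypothetical helper for illustration only; not part of the attached patches. */
final class GpuBitmapSketch {
    private GpuBitmapSketch() {}

    /** True if bit `gpu` is 1, i.e. that GPU is available. */
    static boolean isAvailable(long bitmap, int gpu) {
        return ((bitmap >>> gpu) & 1L) == 1L;
    }

    /** Number of available GPUs encoded in the bitmap. */
    static int countAvailable(long bitmap) {
        return Long.bitCount(bitmap);
    }

    /** True if every GPU requested in `request` is available in `node`. */
    static boolean fits(long node, long request) {
        return (node & request) == request;
    }

    /** Marks the requested GPUs as used by clearing their bits. */
    static long allocate(long node, long request) {
        if (!fits(node, request)) {
            throw new IllegalArgumentException("requested GPUs are not all available");
        }
        return node & ~request;
    }

    /** Marks the given GPUs as available again by setting their bits. */
    static long release(long node, long request) {
        return node | request;
    }

    /**
     * Picks `count` free GPUs that all fall in one group of `groupSize`
     * consecutive ids. ASSUMPTION: such a group shares a PCI-E switch.
     * Returns 0 if no single group has enough free GPUs.
     */
    static long pickLocal(long node, int count, int groupSize) {
        if (count < 1 || groupSize < 1 || groupSize > 64) {
            throw new IllegalArgumentException("invalid count or groupSize");
        }
        for (int base = 0; base < 64; base += groupSize) {
            long groupMask = (groupSize == 64) ? -1L : ((1L << groupSize) - 1) << base;
            long free = node & groupMask;
            if (Long.bitCount(free) >= count) {
                long picked = 0L;
                for (int i = 0; i < count; i++) {
                    long lowest = Long.lowestOneBit(free); // lowest free GPU in the group
                    picked |= lowest;
                    free &= ~lowest;
                }
                return picked;
            }
        }
        return 0L;
    }
}
```

With GPUs 0, 1 and 7 free (`0b1000_0011L`) and a hypothetical group size of 4, `pickLocal(node, 2, 4)` selects GPUs {0, 1}. If only GPUs 0 and 7 are free, it returns 0 because no single group holds two free GPUs, which matches the {0, 1} versus {0, 7} example in the description.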

        Attachments

        1. GPU locality support for Job scheduling.pdf
          370 kB
          Chen Qingcha
        2. hadoop-2.7.2.gpu-port.patch
          967 kB
          Chen Qingcha
        3. hadoop_2.9.0.patch
          912 kB
          Chen Qingcha
        4. hadoop-2.7.2.gpu-port-20180711.patch
          981 kB
          Chen Qingcha
        5. hadoop-2.9.0.gpu-port.patch
          936 kB
          Chen Qingcha
        6. branch-2.7.2.gpu-port-20180723.patch
          982 kB
          Chen Qingcha
        7. hadoop-2.9.0.gpu-port.20180725.patch
          937 kB
          Chen Qingcha
        8. hadoop-2.9.0.gpu-port.20180920.patch
          937 kB
          Chen Qingcha

            People

            • Assignee:
              Unassigned
            • Reporter:
              Chen Qingcha (qinchen@microsoft.com)
            • Votes:
              0
            • Watchers:
              10

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated: 1,344h
                Remaining: 1,344h
                Logged: Not Specified