Hadoop YARN / YARN-7481

GPU locality support for better AI scheduling

Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.7.2
    • None
    • api, RM, yarn
    • None
    • Patch

    Description

      We enhance Hadoop with GPU support for better AI job scheduling.

      YARN-3926 already supports GPU scheduling, but it treats GPUs only as a countable resource.

      However, GPU placement also matters for the efficiency of deep learning jobs.
      For example, a 2-GPU job running on GPUs {0, 1} could be faster than the same job running on GPUs {0, 7}, if GPUs 0 and 1 are under the same PCI-E switch while GPUs 0 and 7 are not.
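
      The placement example can be sketched in Java as a simple check of whether all GPUs in a placement share one PCI-E switch. This is only an illustration and is not taken from the attached patches: the class name, the helper methods, and the layout of 4 GPUs per switch are assumptions chosen so that GPUs 0 and 1 share a switch while GPUs 0 and 7 do not.

```java
import java.util.Arrays;

/** Hypothetical topology model, for illustration only. */
final class GpuTopologySketch {
    // Assumption: GPUs are grouped under PCI-E switches in blocks of 4
    // (GPUs 0-3 under switch 0, GPUs 4-7 under switch 1, and so on).
    static final int GPUS_PER_SWITCH = 4;

    /** Returns the index of the (assumed) PCI-E switch that hosts the GPU. */
    static int switchOf(int gpuIndex) {
        return gpuIndex / GPUS_PER_SWITCH;
    }

    /** True when every GPU in the placement sits under the same switch. */
    static boolean sameSwitch(int... gpus) {
        if (gpus.length == 0) {
            return true;
        }
        int first = switchOf(gpus[0]);
        return Arrays.stream(gpus).allMatch(g -> switchOf(g) == first);
    }

    public static void main(String[] args) {
        System.out.println(sameSwitch(0, 1)); // true: both under switch 0
        System.out.println(sameSwitch(0, 7)); // false: switch 0 vs. switch 1
    }
}
```

      A locality-aware scheduler would presumably prefer placements for which such a check succeeds; a real node's topology has to be discovered from the hardware rather than computed from the GPU index.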

      We add support for GPU locality scheduling to Hadoop 2.7.2, which enables fine-grained GPU placement.

      A 64-bit bitmap is added to the YARN Resource. It encodes both GPU usage and locality information for a node (up to 64 GPUs per node): a '1' bit means the GPU at that position is available, and a '0' bit means it is not.
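
      The bitmap semantics can be sketched in Java with a plain long. This is a minimal illustration and not the code in the attached patches: the class and method names are hypothetical, and mapping GPU i to bit i (least significant bit first) is an assumption, since the description does not state the bit order.

```java
/** Hypothetical helper around a 64-bit GPU availability bitmap. */
final class GpuBitmapSketch {
    static final int MAX_GPUS = 64; // one bit per GPU, as in the description

    /** Builds a mask with one bit set per listed GPU (assumed: GPU i -> bit i). */
    static long maskOf(int... gpus) {
        long mask = 0L;
        for (int g : gpus) {
            if (g < 0 || g >= MAX_GPUS) {
                throw new IllegalArgumentException("GPU index out of range: " + g);
            }
            mask |= 1L << g;
        }
        return mask;
    }

    /** '1' means available, '0' means not available. */
    static boolean isAvailable(long bitmap, int gpu) {
        return ((bitmap >>> gpu) & 1L) == 1L;
    }

    /** Clears the requested bits; fails if any requested GPU is not available. */
    static long allocate(long bitmap, long requested) {
        if ((bitmap & requested) != requested) {
            throw new IllegalStateException("requested GPUs are not all available");
        }
        return bitmap & ~requested;
    }

    /** Sets the released bits back to available. */
    static long release(long bitmap, long released) {
        return bitmap | released;
    }

    /** Number of available GPUs, i.e. the countable view of the same resource. */
    static int availableCount(long bitmap) {
        return Long.bitCount(bitmap);
    }

    public static void main(String[] args) {
        long node = 0xFFL;                       // 8 GPUs, all available
        node = allocate(node, maskOf(0, 1));     // place a 2-GPU job on GPUs {0, 1}
        System.out.println(Long.toBinaryString(node)); // 11111100
        System.out.println(availableCount(node));      // 6
    }
}
```

      Keeping availability as a bitmap means the GPU count is still recoverable with a population count, while the positions of the set bits carry the placement information that a plain count loses.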

      Attachments

        1. branch-2.7.2.gpu-port-20180723.patch
          982 kB
          Chen Qingcha
        2. GPU locality support for Job scheduling.pdf
          370 kB
          Chen Qingcha
        3. hadoop_2.9.0.patch
          912 kB
          Chen Qingcha
        4. hadoop-2.7.2.gpu-port.patch
          967 kB
          Chen Qingcha
        5. hadoop-2.7.2.gpu-port-20180711.patch
          981 kB
          Chen Qingcha
        6. hadoop-2.9.0.gpu-port.20180725.patch
          937 kB
          Chen Qingcha
        7. hadoop-2.9.0.gpu-port.20180920.patch
          937 kB
          Chen Qingcha
        8. hadoop-2.9.0.gpu-port.patch
          936 kB
          Chen Qingcha

          People

            Assignee: Unassigned
            Reporter: Chen Qingcha
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated:

              Time Tracking

                Original Estimate: 1,344h
                Remaining Estimate: 1,344h
                Time Spent: Not Specified