Hadoop YARN / YARN-3612

Resource calculation in child tasks is CPU-heavy


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: None

    Description

      While benchmarking a codebase derived from Hadoop 1, I noticed that each child task was making a large number of syscalls. Tracing the tasks with strace showed that they spend much of their time iterating over all the files in /proc to calculate resource usage.
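
      To make the cost concrete, here is a minimal Java sketch of the kind of scan described above, assuming a Linux-style procfs: to find the processes that belong to a task, a tracker has to list every numeric entry under /proc and read each /proc/<pid>/stat just to learn its parent PID, so the work grows with the number of processes on the host rather than with the size of the task's own process tree. The class and method names are hypothetical; this is not Hadoop's ResourceCalculatorPlugin code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not Hadoop code): building a task's process tree
// means reading /proc/<pid>/stat for EVERY process on the host.
public class ProcScanSketch {

    /** Number of stat files read by the last scan (each is at least open + read + close). */
    static int filesRead = 0;

    /** Extracts the parent PID (4th field) from the contents of a /proc/<pid>/stat file. */
    static long parsePpid(String stat) {
        // Field 2 (comm) is parenthesized and may contain spaces, so parse after the last ')'.
        String[] rest = stat.substring(stat.lastIndexOf(')') + 1).trim().split("\\s+");
        return Long.parseLong(rest[1]); // rest[0] = state, rest[1] = ppid
    }

    /** Lists every numeric entry under procRoot and reads its stat file: O(all processes). */
    static Map<Long, Long> scanParents(Path procRoot) throws IOException {
        Map<Long, Long> ppidOf = new HashMap<>();
        filesRead = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(procRoot)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (!name.chars().allMatch(Character::isDigit)) continue; // skip meminfo, self, etc.
                try {
                    String stat = new String(Files.readAllBytes(entry.resolve("stat")), StandardCharsets.UTF_8);
                    filesRead++;
                    ppidOf.put(Long.parseLong(name), parsePpid(stat));
                } catch (IOException | RuntimeException e) {
                    // The process exited, or the file was unreadable, between listing and reading.
                }
            }
        }
        return ppidOf;
    }

    /** Returns rootPid plus all of its descendants, using the full pid -> ppid map. */
    static Set<Long> descendants(Map<Long, Long> ppidOf, long rootPid) {
        Set<Long> tree = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(rootPid);
        while (!queue.isEmpty()) {
            long pid = queue.poll();
            if (!tree.add(pid)) continue;
            for (Map.Entry<Long, Long> e : ppidOf.entrySet()) {
                if (e.getValue() == pid) queue.add(e.getKey());
            }
        }
        return tree;
    }

    public static void main(String[] args) throws IOException {
        Path proc = Paths.get("/proc");
        if (!Files.isDirectory(proc)) {
            System.out.println("no /proc on this system");
            return;
        }
        long self = ProcessHandle.current().pid(); // requires Java 9+
        Map<Long, Long> ppidOf = scanParents(proc);
        System.out.println("stat files read: " + filesRead
                + ", processes in this JVM's tree: " + descendants(ppidOf, self).size());
    }
}
```

      Even when the task's own tree contains a single JVM, the scan touches one stat file per process on the machine; repeating it on every sampling interval in every child task is one plausible source of the syscall volume reported above.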

      As a test, I added a flag to disable the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500 GB sort workload, this reduced total job runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
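
      The report does not say how the flag was implemented. A minimal sketch, assuming the plugin is consulted through a single periodic sampling call: when a hypothetical on/off flag is false, the task skips the /proc walk entirely and reports a sentinel instead of a measured value. The property name `sketch.resource.sampling`, the `GatedResourceSampler` class, and the -1 sentinel are illustrative assumptions, not Hadoop configuration keys or APIs.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: the flag name and this class are illustrative, not Hadoop APIs.
public class GatedResourceSampler {

    /** Sentinel meaning "not measured", so disabled sampling is not mistaken for zero usage. */
    static final long UNAVAILABLE = -1L;

    private final boolean enabled;
    private final LongSupplier expensiveProcScan; // stands in for a full /proc walk
    private int scans = 0;

    GatedResourceSampler(boolean enabled, LongSupplier expensiveProcScan) {
        this.enabled = enabled;
        this.expensiveProcScan = expensiveProcScan;
    }

    /** Returns cumulative CPU time in ms, or UNAVAILABLE without touching /proc when disabled. */
    long cumulativeCpuMillis() {
        if (!enabled) return UNAVAILABLE;
        scans++;
        return expensiveProcScan.getAsLong();
    }

    /** How many times the expensive scan actually ran. */
    int scans() {
        return scans;
    }

    public static void main(String[] args) {
        // Hypothetical system property standing in for whatever flag a real patch would read.
        boolean enabled = Boolean.parseBoolean(System.getProperty("sketch.resource.sampling", "true"));
        GatedResourceSampler sampler = new GatedResourceSampler(enabled, () -> 1234L);
        for (int tick = 0; tick < 3; tick++) { // a task samples periodically while it runs
            System.out.println("cpu ms = " + sampler.cumulativeCpuMillis());
        }
        System.out.println("scans performed = " + sampler.scans());
    }
}
```

      Returning a sentinel rather than 0 keeps "not measured" distinguishable from "used no CPU" in any counters the task reports; the trade-off of disabling sampling is that those per-task resource counters are no longer available.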

      Attachments

        1. MAPREDUCE-4469_rev2.patch (3 kB, Ahmed Radwan)
        2. MAPREDUCE-4469_rev3.patch (9 kB, Ahmed Radwan)
        3. MAPREDUCE-4469_rev4.patch (16 kB, Ahmed Radwan)
        4. MAPREDUCE-4469_rev5.patch (11 kB, Ahmed Radwan)
        5. MAPREDUCE-4469.patch (2 kB, Ahmed Radwan)
        6. YARN-3612.01.patch (8 kB, Varun Saxena)
        7. YARN-3612.02.patch (8 kB, Varun Saxena)

        People

          Assignee: Unassigned
          Reporter: Todd Lipcon (tlipcon)
          Votes: 1
          Watchers: 27
