Hadoop YARN / YARN-3612

Resource calculation in child tasks is CPU-heavy

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • Fix Version/s: None
    • Component/s: None

      Description

      While benchmarking a hadoop-1 derived codebase, I noticed that each child task was making a very large number of syscalls. Running strace on a task showed that it spends a lot of time looping through all the files in /proc to calculate resource usage.

      As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
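
      To illustrate why this kind of sampling is syscall-heavy, here is a minimal sketch of a /proc scan guarded by a disable switch. It is not the Hadoop implementation. The class name ProcScanSketch, its methods, and the resourceCalculationEnabled field are hypothetical names chosen for this example, and the field only stands in for the flag mentioned above. The sketch assumes the Linux /proc/<pid>/stat layout, where fields 14 and 15 are utime and stime in clock ticks.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ProcScanSketch {
    // Hypothetical switch standing in for the flag described in the report;
    // this is NOT a real Hadoop configuration key.
    static boolean resourceCalculationEnabled = true;

    // /proc contains one directory per process, named by its numeric pid.
    static boolean isPidDir(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            if (!Character.isDigit(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns utime + stime (fields 14 and 15 of /proc/<pid>/stat), in clock ticks.
    static long cpuTicks(String statLine) {
        // Field 2 (comm) is parenthesised and may contain spaces,
        // so parsing starts after the last ')'.
        String rest = statLine.substring(statLine.lastIndexOf(')') + 2);
        String[] fields = rest.split(" ");
        // rest begins at field 3 (state), so field 14 is index 11 and field 15 is index 12.
        return Long.parseLong(fields[11]) + Long.parseLong(fields[12]);
    }

    // One sample costs a directory listing plus an open/read/close per process,
    // which is where the syscall volume comes from when sampling is frequent.
    static long sampleTotalCpuTicks(Path procRoot) throws IOException {
        if (!resourceCalculationEnabled) {
            return -1L; // sampling disabled: no filesystem access at all
        }
        long total = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(procRoot)) {
            for (Path entry : entries) {
                if (!isPidDir(entry.getFileName().toString())) {
                    continue;
                }
                try {
                    String line = new String(Files.readAllBytes(entry.resolve("stat"))).trim();
                    total += cpuTicks(line);
                } catch (IOException | RuntimeException e) {
                    // The process exited mid-scan or the line was unexpected; skip it.
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/proc");
        System.out.println("total cpu ticks: " + sampleTotalCpuTicks(root));
    }
}
```

      On Java 11 or later this can be run directly with `java ProcScanSketch.java` on a Linux host. Running it under `strace -c` with the switch on and off gives a rough feel for the per-sample cost, though the numbers will differ from the workload measured above.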

        Attachments

        1. YARN-3612.02.patch
          8 kB
          Varun Saxena
        2. YARN-3612.01.patch
          8 kB
          Varun Saxena
        3. MAPREDUCE-4469_rev5.patch
          11 kB
          Ahmed Radwan
        4. MAPREDUCE-4469_rev4.patch
          16 kB
          Ahmed Radwan
        5. MAPREDUCE-4469_rev3.patch
          9 kB
          Ahmed Radwan
        6. MAPREDUCE-4469_rev2.patch
          3 kB
          Ahmed Radwan
        7. MAPREDUCE-4469.patch
          2 kB
          Ahmed Radwan

            People

            • Assignee: Unassigned
            • Reporter: Todd Lipcon (tlipcon)
            • Votes: 1
            • Watchers: 27
