Hadoop Map/Reduce
MAPREDUCE-3278

0.20: avoid a busy-loop in ReduceTask scheduling

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0
    • Fix Version/s: 1.1.0
    • Component/s: mrv1, performance, task
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Looking at profiling results, it became clear that the ReduceTask has the following busy-loop, which causes it to consume 100% of a CPU during the fetch phase in some configurations:

      • the number of reduce fetcher threads is configured to be larger than the number of hosts
      • therefore "busyEnough()" never returns true
      • the "scheduling" portion of the code cannot schedule any new fetches, since all of the pending fetches in the mapLocations buffer correspond to hosts that are already being fetched (i.e. hosts already in the uniqueHosts map)
      • getCopyResult() immediately returns null, since there are no completed maps

      Hence the ReduceTask spins back and forth between trying to schedule fetches (and failing) and trying to grab completed results (of which there are none), with no waits in between (see the sketch below).
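
      A minimal standalone sketch of the spin described above; the class and method bodies here are illustrative stand-ins, not the actual ReduceTask internals (only the names busyEnough, mapLocations, uniqueHosts, and getCopyResult come from the code being described):

      import java.util.Queue;
      import java.util.concurrent.ConcurrentLinkedQueue;

      // Illustrative sketch of the fetch loop's busy-wait; not the real ReduceTask code.
      public class BusyFetchLoop {
          private final Queue<String> copyResults = new ConcurrentLinkedQueue<>();

          // Stand-in for busyEnough(): with more fetcher threads configured than
          // hosts, it never reports "busy enough", so the loop keeps polling.
          private boolean busyEnough() { return false; }

          // Stand-in for the scheduling step: every pending location in
          // mapLocations points at a host already in uniqueHosts, so nothing
          // new can be scheduled this iteration.
          private void scheduleNewFetches() { }

          // Stand-in for getCopyResult(): returns null immediately while no map
          // output copy has completed.
          private String getCopyResult() { return copyResults.poll(); }

          public void run() {
              while (true) {
                  if (!busyEnough()) {
                      scheduleNewFetches();   // schedules nothing in this state
                  }
                  String result = getCopyResult();
                  if (result == null) {
                      continue;               // no wait anywhere: spins at ~100% CPU
                  }
                  // ... process the completed copy ...
              }
          }
      }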

      Attachments

      1. mr-3278.txt (4 kB, Todd Lipcon)
      2. reducer-cpu-usage.png (99 kB, Todd Lipcon)

        Activity

        Matt Foley added a comment -

        Closed upon release of Hadoop-1.1.0.

        Todd Lipcon added a comment -

        Committed to branch-0.20-security. Thanks, Eli.

        Eli Collins added a comment -

        +1 Nice find.

        I went over your change a couple of times (I wasn't familiar with ReduceTask) and it looks correct to me. I agree that you should be able to correctly wait on copyResultsOrNewEventsLock without a timeout, but being conservative makes sense: at worst you fall back to the bug that is already present.
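
        The conservative approach described here amounts to a guarded wait with a timeout. A minimal sketch of that pattern, assuming a notification is posted whenever a copy completes or new map-completion events arrive (only the name copyResultsOrNewEventsLock comes from this discussion; everything else is illustrative):

        import java.util.Queue;
        import java.util.concurrent.ConcurrentLinkedQueue;

        // Illustrative sketch of the wait-with-timeout pattern; not the actual patch.
        public class GuardedFetchLoop {
            private final Object copyResultsOrNewEventsLock = new Object();
            private final Queue<String> copyResults = new ConcurrentLinkedQueue<>();

            // Called when a fetcher finishes a copy or new completed-map events arrive.
            public void signalProgress(String completedCopy) {
                if (completedCopy != null) {
                    copyResults.add(completedCopy);
                }
                synchronized (copyResultsOrNewEventsLock) {
                    copyResultsOrNewEventsLock.notifyAll();
                }
            }

            // Consumer side: instead of spinning, block until notified or until a
            // short timeout expires. The timeout is the conservative part: a missed
            // notification degrades to the old polling behavior rather than a hang.
            public String awaitCopyResult() throws InterruptedException {
                synchronized (copyResultsOrNewEventsLock) {
                    String result;
                    while ((result = copyResults.poll()) == null) {
                        copyResultsOrNewEventsLock.wait(2000); // timeout value chosen arbitrarily here
                    }
                    return result;
                }
            }
        }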

        Todd Lipcon added a comment -

        Here's a candidate patch which fixed the CPU spinning on my cluster.

        Worth noting that this problem is more severe when the fetcher thread count is configured higher than the number of nodes. But it still happens even if you have fewer fetchers than nodes, as soon as the number of unique nodes holding map output drops below the number of threads.
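
        A hedged sketch of the gating condition this describes (the signature and parameter names are hypothetical, not the real ReduceTask fields): scheduling only pauses while every copier thread has a distinct host to fetch from, and since at most one fetch runs per unique host at a time, the check can never pass once the unique-host count drops below the thread count.

        // Hypothetical shape of the check; illustrative only.
        boolean busyEnough(int numInFlightFetches, int numCopierThreads) {
            // In-flight fetches are capped by the number of unique hosts still
            // holding map output (one fetch per host at a time), so once that
            // count drops below numCopierThreads this never returns true and
            // the scheduling loop keeps polling.
            return numInFlightFetches >= numCopierThreads;
        }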

        Todd Lipcon added a comment -

        Here's a before-and-after of a node running terasort. On the left (unpatched) terasort you can see the point where the reducers start and eat up a ton of CPU. On the right (patched) terasort, the reducers add more iowait but CPU usage is minimal. top showed the reducers in the fetch stage using ~15% CPU instead of ~105% CPU. Total terasort time improved by roughly 10%. I'll upload a patch after a bit more testing.

        Todd Lipcon added a comment -

        AFAIK this only applies to the 0.20 code. The Shuffle was substantially rewritten for 0.21 by MAPREDUCE-318, which also did a big refactor. This JIRA is for a more targeted bug fix on the stable branch.


          People

          • Assignee: Todd Lipcon
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 7
