Hadoop Map/Reduce
MAPREDUCE-1904

Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: tasktracker
    • Labels: None

      Description

      While profiling the TaskTracker with the Sort benchmark, it was observed that threads block on LocalDirAllocator.getLocalPathToRead() while retrieving the index file and the temporary map output file.

      As the LocalDirAllocator is tied to the ServletContext, only one instance is available per TaskTracker HTTP server. Given the jobid and mapid, LocalDirAllocator retrieves the index file path and the temporary map output file path. getLocalPathToRead() is internally synchronized.

      Introducing an LRUCache for this lookup (key = jobid + mapid, value = path to the file) reduces the contention heavily. The size of the LRUCache can be varied based on the environment, and I observed a throughput improvement on the order of 4-7% with the introduction of the LRUCache.
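
      For illustration, here is a minimal sketch of the kind of LRUCache the description refers to, built on java.util.LinkedHashMap in access order. The class and field names are hypothetical, not the patch's actual code:

        import java.util.LinkedHashMap;
        import java.util.Map;

        // Minimal LRU cache sketch: a LinkedHashMap in access order evicts the
        // least-recently-used entry once capacity is exceeded.
        public class LRUCache<K, V> extends LinkedHashMap<K, V> {
            private final int capacity;

            public LRUCache(int capacity) {
                super(capacity, 0.75f, true); // true = access-order iteration
                this.capacity = capacity;
            }

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        }

      Keyed by "jobid + mapid" with the resolved path as the value, repeated shuffle requests can then skip the synchronized getLocalPathToRead() lookup. Note that LinkedHashMap itself is not thread-safe, so the servlet would still need a wrapper such as Collections.synchronizedMap(new LRUCache<String, String>(1000)); that lock is far cheaper than the original one, since it guards only an in-memory map rather than directory scans.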

      1. MAPREDUCE-1904-RC10.patch
        3 kB
        Rajesh Balamohan
      2. TaskTracker- yourkit profiler output .jpg
        237 kB
        Rajesh Balamohan
      3. Thread profiler output showing contention.jpg
        391 kB
        Rajesh Balamohan
      4. profiler output after applying the patch.jpg
        211 kB
        Rajesh Balamohan
      5. MAPREDUCE-1904-trunk.patch
        6 kB
        Rajesh Balamohan
      6. LocalDirAllocator.JPG
        191 kB
        Rajesh Balamohan
      7. LocalDirAllocator_Monitor.JPG
        267 kB
        Rajesh Balamohan

        Activity

        Liyin Liang added a comment -

        This is a great patch. Here is part of the stack when a worker thread is blocked:

        "1797055149@qtp0-98" prio=10 tid=0x0000002aa1a40000 nid=0x333 waiting for monitor entry [0x0000000049dc5000]
           java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:377)
                - waiting to lock <0x00000000a0000090> (a org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext)
                at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:142)
                at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3086)
        

        I wrote a job with one map that outputs 1M of data, and 100 reduces. Each reduce spawns 10 threads that each fetch data from the map side 3,000 times, just like the shuffle phase. When this job runs, most worker threads are blocked on AllocatorPerContext.
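
        (For illustration, a sketch of that kind of synthetic fetch load; the host and servlet URL below are hypothetical placeholders, not taken from the test:)

          import java.net.HttpURLConnection;
          import java.net.URL;

          // Sketch of the synthetic shuffle load described above: each simulated
          // reducer spawns 10 fetcher threads, and each thread requests the same
          // map output 3,000 times from the TaskTracker's MapOutputServlet.
          public class FetchLoad {
              public static void main(String[] args) {
                  // Hypothetical shuffle URL; the real one carries job/map/reduce IDs.
                  final String url = "http://tracker:50060/mapOutput"
                      + "?job=job_201007190941_0001"
                      + "&map=attempt_201007190941_0001_m_000000_0&reduce=0";
                  for (int t = 0; t < 10; t++) {
                      new Thread(new Runnable() {
                          public void run() {
                              for (int i = 0; i < 3000; i++) {
                                  try {
                                      HttpURLConnection c =
                                          (HttpURLConnection) new URL(url).openConnection();
                                      c.getInputStream().close(); // drain and discard
                                  } catch (Exception e) {
                                      e.printStackTrace();
                                  }
                              }
                          }
                      }).start();
                  }
              }
          }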

        With the LRUCache, most worker threads are instead blocked on LOG.info(), as in the following stack:

        "1793911889@qtp0-101" prio=10 tid=0x0000002aa1530000 nid=0x34f2 waiting for monitor entry [0x0000000041d45000]
           java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.log4j.Category.callAppenders(Category.java:204)
                - waiting to lock <0x00000000a01be928> (a org.apache.log4j.spi.RootLogger)
                at org.apache.log4j.Category.forcedLog(Category.java:391)
                at org.apache.log4j.Category.log(Category.java:856)
                at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:133)
                at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3246)
        

        With LRUCache + disabled LOG.info(): this job takes just 37 sec to run.
        Without LRUCache + enabled LOG.info(): this job takes 3 min 19 sec to run.

        BTW, the LRUCache should use mapId as the key instead of (jobId + mapId), because jobId is just part of mapId.
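
        (A sketch illustrating that point, with hypothetical IDs in the standard Hadoop format:)

          // A map task attempt ID already embeds its job ID, so mapId alone
          // identifies the job as well. Hypothetical IDs for illustration:
          public class KeyCheck {
              public static void main(String[] args) {
                  String jobId = "job_201007190941_0001";
                  String mapId = "attempt_201007190941_0001_m_000042_0";
                  String jobPart = jobId.substring("job_".length()); // "201007190941_0001"
                  System.out.println(mapId.contains(jobPart)); // prints: true
              }
          }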

        Rajesh Balamohan added a comment -

        CPU profiler output of a TaskTracker without the patch. The profiler output shows that only ~4% of the time is spent in the internal methods called by LocalDirAllocator.*.getLocalPathToRead(); the rest of the time is spent in the method itself. This, along with the monitor profiling, shows synchronization to be the bottleneck in getLocalPathToRead().

        Rajesh Balamohan added a comment -

        Thanks for the review comments Arun.

        1. For #1, I will post profiler output showing which methods are expensive in getLocalPathToRead().

        2. For #2, the LocalDirAllocator.confChanged() code path need not be executed in this TaskTracker context.

        Reason: in this context, the TaskTracker uses LocalDirAllocator to check for config changes to "mapred.local.dir". Once read, this parameter does not change over the TaskTracker's lifetime, so the check is not mandatory on every invocation (see the sketch below). Corner case: when the TaskTracker goes down and new configs are loaded, the LRUCache would also be repopulated.
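
        (A minimal sketch of that reasoning, assuming the standard Hadoop Configuration API; this is not the patch's actual code:)

          import org.apache.hadoop.conf.Configuration;

          // Since "mapred.local.dir" is fixed for the TaskTracker's lifetime, the
          // value can be read once and the per-call re-validation skipped after that.
          public class LocalDirsHolder {
              private volatile String savedLocalDirs; // null until first read

              public String getLocalDirs(Configuration conf) {
                  String dirs = savedLocalDirs;
                  if (dirs == null) {
                      // First call: read and remember. Later calls reuse the cached
                      // value instead of re-running the confChanged() checks.
                      dirs = conf.get("mapred.local.dir");
                      savedLocalDirs = dirs;
                  }
                  return dirs;
              }
          }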

        Arun C Murthy added a comment -

        Couple of concerns:

        1. I'd like to understand what part of LocalDirAllocator.getLocalPathToRead is expensive... it's fine to add a cache, but it's better to do it after we understand why we really need it.
        2. This patch results in the code path skipping the sanity checks in LocalDirAllocator.confChanged which is called by LocalDirAllocator.getLocalPathToRead. That is a concern. Again, this might be the expensive part of LocalDirAllocator.getLocalPathToRead, but we need to ensure that.

        Don't get me wrong, the focus of this jira is very useful - we just need to fix it the 'right' way.

        Rajesh Balamohan added a comment -

        Attaching the patch for trunk.

        Rajesh Balamohan added a comment -

        Contention on LocalDirAllocator itself is very low, close to 0%.

        Rajesh Balamohan added a comment -

        LocalDirAllocator.AllocatorPerContext is heavily contended.

        Rajesh Balamohan added a comment -

        The patch for the RC10 release is attached here.


          People

          • Assignee: Unassigned
          • Reporter: Rajesh Balamohan
          • Votes: 0
          • Watchers: 6

            Dates

            • Created:
            • Updated:
