Hadoop Common
HADOOP-2980

slow reduce copies - map output locations not being fetched even when map complete

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.15.3
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Maps are long finished. Reduces are stuck looking for map locations. They make progress, but slowly. It almost seems like they get new map locations every minute or so:

      2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
      2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
      2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
      ...
      2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
      2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)

      They get about 50 locations at a time, and this 1-minute delay pattern is surprisingly common.

        Activity

        Alexandre Normand added a comment -

        I googled my way to this ticket after seeing something similar. I'm occasionally seeing failed task attempts whose logs show this:

        2012-12-03 16:46:36,980 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 36 map output(s) where 1 is already in progress
        2012-12-03 16:46:36,980 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 4 outputs (0 slow hosts and29 dup hosts)
        2012-12-03 16:46:38,983 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 2 outputs (0 slow hosts and29 dup hosts)
        2012-12-03 16:47:36,992 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
        2012-12-03 16:47:36,993 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
        ...
        2012-12-03 16:54:37,080 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
        2012-12-03 16:54:37,080 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
        2012-12-03 16:55:37,228 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
        2012-12-03 16:55:37,228 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
        2012-12-03 16:56:37,235 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
        2012-12-03 16:56:37,236 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
        

        Eventually, a task attempt will get killed for not having reported a status in 600 seconds and the retry will complete quickly.

        Since I'm running Hadoop 2.0.0, I'm wondering whether this is the same issue or a completely different one. Also, the fact that this bug remained dormant for so long makes me wonder whether people are just not seeing the issue anymore and whether that could be config-related.

        Sameer Al-Sakran added a comment -

        I'm seeing this behaviour in 0.17.0, except that the waits are anywhere from 1 to 60 minutes. It happens sporadically, but relatively frequently.

        Devaraj Das added a comment -

        Before you spend time instrumenting the code, could you please apply https://issues.apache.org/jira/secure/attachment/12372447/1719.patch and see whether it helps? I checked that the patch applies cleanly to 0.15.3. Thanks!

        Joydeep Sen Sarma added a comment -

        Yeah, it's not the HADOOP-1719 JIRA. The copies are all done, and no new locations come back for another minute. There are no slow copies and no backoffs (which is the 5-minute thing) in the logs.

        It doesn't seem like there are any changes in 0.16 that would resolve the situation, then. I'm sure I can find this with enough additional instrumentation; I just haven't had the time to.

        Is there an IPC timeout of 60 seconds somewhere lower down the stack?

        Devaraj Das added a comment -

        The '50' can be explained: the default probe size (for map-completion events) is 50. The probe size increases with the number of copier threads via the calculation max(numCopiers*5, 50), so one thing you should do is increase the number of copier threads (set mapred.reduce.parallel.copies to something like 20).
        But I am not able to clearly explain the 1-minute interval between the queries. One scenario in which it can happen is discussed in HADOOP-1719 (https://issues.apache.org/jira/browse/HADOOP-1719?focusedCommentId=12522954#action_12522954). The other scenario is when there is a failure to fetch output from a host and the task backs off between 1 and 5 minutes from that host. But you say that you don't see any exceptions...
        BTW, the above issues are fixed in 0.16.
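
        For illustration, a minimal self-contained sketch of the relationship described above (not the actual ReduceTask code; the formula and the property name come from this comment, while the default of 5 copier threads and the class name are assumptions):

        // Illustrative sketch only: how the map-completion-event probe size
        // scales with the copier-thread count (mapred.reduce.parallel.copies).
        public class ProbeSizeSketch {
            public static void main(String[] args) {
                int copiers = args.length > 0 ? Integer.parseInt(args[0]) : 5; // assumed default of 5 copiers
                int probeSize = Math.max(copiers * 5, 50);                     // max(numCopiers*5, 50)
                System.out.println(copiers + " copiers -> " + probeSize + " map-output locations per poll");
                // 5 copiers  -> 50 locations per poll (matches the '50' in the logs above)
                // 20 copiers -> 100 locations per poll
            }
        }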

        Joydeep Sen Sarma added a comment -

        Any guesses on what might be going wrong or what you might want us to log or look at? The TaskTracker logs look clean to me (at least I don't see any exceptions and such), but this is happening all the time and I want to get to the bottom of it.

        Devaraj Das added a comment -

        There is a fixed delay between two consecutive polls to the JobTracker. But as of 0.16, the way it works is that if tasks run out of map output locations, the TaskTracker polls the JobTracker (the minimum delay between two consecutive polls is set to 5 seconds).
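
        As a rough illustration of that polling pattern, here is an assumed sketch (not the actual TaskTracker/ReduceTask code; none of the names below come from the Hadoop source): new map-output locations arrive only when the periodic poll fires, so a large minimum interval shows up as the step-wise arrival of locations seen in the logs above.

        // Assumed sketch of the polling behaviour described above.
        // With a ~60-second interval you get the one-minute steps from the logs;
        // the 0.16 behaviour (poll when the reduce runs out of locations, with a
        // 5-second minimum between polls) shrinks the gap.
        public class MapOutputPollSketch {
            static int outputsStillNeeded = 30;              // pretend 30 map outputs remain

            static void pollJobTrackerForCompletionEvents() {
                // stand-in for the real RPC that fetches new map-completion events
                outputsStillNeeded -= Math.min(outputsStillNeeded, 5);
            }

            public static void main(String[] args) throws InterruptedException {
                long minPollIntervalMs = 5000L;              // 5 s minimum per the comment; ~60000 would match the observed logs
                long lastPoll = 0L;
                while (outputsStillNeeded > 0) {
                    long now = System.currentTimeMillis();
                    if (now - lastPoll >= minPollIntervalMs) {
                        pollJobTrackerForCompletionEvents(); // only here do new locations show up
                        lastPoll = now;
                    }
                    Thread.sleep(1000L);                     // copiers keep fetching already-known outputs meanwhile
                }
            }
        }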

        Joydeep Sen Sarma created issue -

          People

          • Assignee: Unassigned
          • Reporter: Joydeep Sen Sarma
          • Votes: 2
          • Watchers: 3
