Hadoop Common / HADOOP-5241

Reduce tasks get stuck because of over-estimated task size (regression from 0.18)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.2, 0.20.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Red Hat Enterprise Linux Server release 5.2
      JDK 1.6.0_11
      Hadoop 0.19.0

    • Hadoop Flags:
      Reviewed

      Description

      I have a simple MR benchmark job that computes PageRank on about 600 GB of HTML files using a 100 node cluster. For some reason, my reduce tasks get caught in a pending state. The JobTracker's log gets filled with the following messages:

      2009-02-12 15:47:29,839 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_d-59.cs.wisc.edu:localhost/127.0.0.1:33227 has 110125027328 bytes free; but we expect reduce input to take 399642198235
      2009-02-12 15:47:29,852 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_d-67.cs.wisc.edu:localhost/127.0.0.1:48626 has 107537776640 bytes free; but we expect reduce input to take 399642198235
      2009-02-12 15:47:29,885 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_d-73.cs.wisc.edu:localhost/127.0.0.1:58849 has 113631690752 bytes free; but we expect reduce input to take 399642198235
      <SNIP>

      The weird thing is that about 70 reduce tasks complete before it hangs. If I reduce the input data on 100 nodes down to 200 GB, it seems to work. As I scale the amount of input with the number of nodes, I can get it to work some of the time on 50 nodes, and it runs without any problems on 25 nodes or fewer.

      Note that it worked without any problems on Hadoop 0.18 late last year without changing any of the input data or the actual MR code.

      Attachments

      1. 5241_v1.patch
        6 kB
        Sharad Agarwal
      2. hadoop_task_screenshot.png
        78 kB
        Andy Pavlo
      3. hadoop-patched-jobtracker.log.gz
        905 kB
        Andy Pavlo
      4. 5241_v1.patch
        6 kB
        Sharad Agarwal
      5. hadoop-jobtracker.log.gz
        1.02 MB
        Andy Pavlo

        Activity

        Devaraj Das added a comment -

        I committed this to the 0.19 branch.

        Hudson added a comment -

        Integrated in Hadoop-trunk #763 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/763/)
        HADOOP-5241. Fixes a bug in disk-space resource estimation. Makes the estimation formula linear where blowUp = Total-Output/Total-Input. Contributed by Sharad Agarwal.

        Devaraj Das added a comment -

        I just committed this to the 0.20 and 0.21 branches. Thanks, Sharad! (After 0.19.1 is released, we should commit this to 0.19 branch as well)

        Sharad Agarwal added a comment -

        The test case failure is unrelated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12400731/5241_v1.patch
        against trunk revision 746864.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3901/console

        This message is automatically generated.

        Sharad Agarwal added a comment -

        Resubmitting the same patch, so that Hudson can pick the correct one.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12400381/hadoop_task_screenshot.png
        against trunk revision 746864.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3900/console

        This message is automatically generated.

        Sharad Agarwal added a comment -

        "one of my reduce tasks is reporting to be over 100% complete!"

        This is a known bug; see HADOOP-5210 and HADOOP-4000.

        Andy Pavlo added a comment -

        The benchmark is working again without any errors on 100 nodes with the full 600 GB data set. I'm also getting roughly the same speed as before. I think Sharad's patch fixes my problem. I have two other experiments that I'll run tomorrow morning to double-check whether this is truly fixed.

        Lastly, I don't know whether this is a side-effect of the patch, but I noticed that one of my reduce tasks is reporting to be over 100% complete! The setup is as follows: I run the full PageRank experiment on all 100 nodes and then combine the results into a single file using a second MR job with a single reduce task. The execution time is about the same as 0.18, so I'm not seeing any performance problems.

        Andy Pavlo added a comment -

        Screenshot of running job with over 100% completion rate?

        Andy Pavlo added a comment -

        JobTracker logfile running same benchmarks as before without any errors.

        Sharad Agarwal added a comment -

        Attaching a patch with the fix. Andy, could you try out this patch with your installation?

        Sharad Agarwal added a comment -

        The high estimate comes from the way the blow-up ratio is averaged in for each completed task: maps that have 0 input create a lot of skew. In the current code, consider this case:
        1. The first map completes with blowupOnThisTask (output/input) = 1000/200 = 5,
        so mapBlowupRatio = 5.
        2. The second map completes with blowupOnThisTask (output/input) = 50/1 = 50 (here the input was effectively 0 but some output was produced),
        so mapBlowupRatio becomes 50/2 + ((2 - 1) / 2) * 5 = 27.5.
        This is an unreasonable jump in the blow-up ratio from 5 to 27.5.

        The fix I propose is as follows:
        Instead of averaging the per-task blow-up ratios, calculate the blow-up as (cumulative completed map output size) / (cumulative completed map input size). For the example above, this works out as follows:
        1. The first map completes with blowupOnThisTask (output/input) = 1000/200 = 5,
        so mapBlowupRatio = 5.
        2. The second map completes with blowupOnThisTask (output/input) = 50/1 = 50 (again with effectively 0 input and some output),
        so mapBlowupRatio becomes (1000 + 50)/(200 + 1) ~ 5.2.
        This is a reasonable change in the blow-up ratio from 5 to 5.2.

        Andy Pavlo added a comment (edited) -

        I've attached the complete log of the JobTracker for a single MR job submission that hits this issue on 100 nodes.

        Vinod Kumar Vavilapalli added a comment -

        Hm, there does seem to be a bug in the estimation, as Jothi pointed out quite some time back here (https://issues.apache.org/jira/browse/HADOOP-657?focusedCommentId=12622464#action_12622464). The estimate calculated in that case was ~1000 GB!

        Logs would definitely help track this one down.

        Vinod Kumar Vavilapalli added a comment -

        The patch that takes disk space into account for scheduling was committed to 0.19 and later (HADOOP-657), so the difference in behaviour between 0.18 and 0.19 is expected. But it is definitely weird for the estimate to shoot up as high as 400 GB. Can you attach the whole JT log from when this particular problem occurred, so that we can see how the estimates were calculated over time and when they started going wayward?


          People

          • Assignee:
            Sharad Agarwal
          • Reporter:
            Andy Pavlo
          • Votes:
            0
          • Watchers:
            1
